$3.5 Million Build Log (Cryo-Electron Microscope Facility Installation)


I thought maybe some people would find this interesting, as it’s not exactly an everyday occurrence, and yet it has managed to be the bulk of my life for the past couple of years.

Background

I’m a PhD student working in a laboratory that is dedicated to solving high-resolution structures of various proteins and viruses. What that essentially means is that we are trying to build atomic-level 3D maps of macromolecular structures. I was initially drawn to the field because I’m not a particularly good biologist, and the thought of just being able to look at these things made it seem like a much simpler path to understanding them.

Why am I acting like this is interesting to anyone here?

  1. I think the biology is universally fascinating, especially in light of a major biological epidemic impacting everyone’s lives recently. The ability to create high-resolution maps of molecules helps us identify what different proteins do. For a few examples: what if there’s a mutation in a gene that makes everyone die young? One of the ways we figure out why that’s happening is by looking at the protein that the gene makes. Or if we want to see if/how a particular drug works, we can look at the drug with its target to see if it’s working the way we designed it to. Or, in many cases, we don’t actually design the drug, but rather screen thousands of potential chemical compounds against the target protein to see if any of them affect it, and then we investigate how that mechanism works. For instance, here is an image of a protein (I believe a phospholipase) bound to a common drug nearly everyone has tried: aspirin. (The protein is the big colorful chain; the aspirin is the beige molecule in the middle.)

(Source: https://www.tandfonline.com/doi/abs/10.1080/10611860400024078)

  2. After biology, computers are the next “rate-limiting step” in how we can determine structures of biomolecules. I will go into a lot more detail on this, but the main reason is that your average protein is on the size order of 1x10^-9 meters (nanometers). Imagine if you took a grain of sand and cut it into 1 million pieces. Trying to look at a protein would be like trying to look at one of those pieces. And even if we could just look at it, if we want accurate 3D information, we need to look at it (scan it, essentially) from all different angles. The reality is, when we see a protein, it usually looks something like this:

(Note, this is actually a ribosome. Source: CSB Cryo-Electron Microscopy Facility | Center for Structural Biology | Vanderbilt University)

Obviously, we’re going to have to do something to enhance the data. That’s where computers come in. Data massaging can take an inordinate amount of time. In fact, the total amount of data from top to bottom, just to solve a single protein in 3D, can be anywhere from 5-20 TB. And all of that information goes into creating a single 3D model that is typically a couple-megabyte coordinate list of atoms. Besides generating the sample, which can take years, working with the data is typically the next most difficult and time-consuming part, and we use some relatively advanced computational techniques for that. We also use some relatively brute-force techniques, and as a result, the equipment is pretty impressive and, I believe, of interest to the community here.
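
To give a rough sense of where those terabytes come from, here’s some napkin math of my own (the bit depth, movie length, and micrograph count below are assumptions for illustration, not our exact collection parameters):

```python
# Each exposure is saved as a multi-frame "movie" so the sample's motion under
# the beam can be corrected later, and an automated run collects thousands of them.
pixels_per_frame = 24e6      # 24 MP camera (see the hardware section below)
bytes_per_pixel = 2          # assumption: 16-bit data, before any compression
frames_per_movie = 40        # assumption: a typical dose-fractionated movie
movies_per_dataset = 3000    # assumption: a few days of automated collection

movie_gb = pixels_per_frame * bytes_per_pixel * frames_per_movie / 1e9
dataset_tb = movie_gb * movies_per_dataset / 1e3
print(f"~{movie_gb:.1f} GB per movie, ~{dataset_tb:.1f} TB per dataset")
# ~1.9 GB per movie and ~5.8 TB per dataset with these numbers,
# which lands right in that 5-20 TB range.
```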

How we actually look at these things:

There are two traditional methods for getting high-resolution structural information on biomolecules:

  1. X-Ray Crystallography: You generate the protein, and then concentrate it in the presence of hundreds, maybe thousands, of different chemical combinations, and hope that one of them causes the protein to crystallize. Then you shoot it with X-rays and record the diffraction pattern, which can be used to backproject the 3D electron density of the molecule. There are multiple problems with this technique: it’s time consuming, many things don’t crystallize, and you can’t focus X-rays, so you don’t get phase data for the structure, which can make structures extremely tedious to solve (there’s a quick demo of the phase problem after this list). If you want a sense of how difficult that is, here is an X-ray diffraction pattern:

That’s the raw data we get from doing X-ray crystallography and have to use to determine a 3D structure.

  2. NMR spectroscopy of the protein. I honestly know nothing about this process because it’s extremely complicated and not widely used. My understanding is that it’s inherently limited.
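
Since I mentioned the phase problem above: here’s a tiny numpy demo (my own toy example, nothing from an actual crystallography pipeline) of how much of an image lives in the Fourier phases rather than the magnitudes, which is roughly what a diffraction pattern records:

```python
# Keep only the Fourier magnitudes of a simple 2D "molecule" (diffraction-style
# data), scramble the phases, and transform back -- then compare with keeping
# only the phases. numpy only.
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "molecule": a small 2D array with some structure in it.
img = np.zeros((64, 64))
img[16:48, 24:40] = 1.0   # a block
img[28:36, 8:56] = 0.5    # a bar crossing it

F = np.fft.fft2(img)
magnitudes = np.abs(F)

# Magnitudes only, phases replaced with random ones.
random_phases = np.exp(2j * np.pi * rng.random(F.shape))
img_lost_phases = np.fft.ifft2(magnitudes * random_phases).real

# For comparison: correct phases, magnitudes flattened to 1.
img_lost_magnitudes = np.fft.ifft2(np.exp(1j * np.angle(F))).real

def corr(a, b):
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# The phase-only version usually resembles the original far more closely
# than the magnitude-only one does.
print("correlation, magnitudes only:", round(corr(img, img_lost_phases), 3))
print("correlation, phases only:   ", round(corr(img, img_lost_magnitudes), 3))
```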

There’s also one newer technique, known as Cryo-Electron Microscopy. This one is extremely simple compared to the other two (at least in theory): we generate the sample, and we look at it in a microscope. Obviously there are a few things that get in the way, and for some of them we use some pretty heavy-handed computational approaches to solve those problems, which I’ll get into.

Anyways, this is what I’ve been working on part and full time since 2018 (in addition to some crystallography). In 2017, my principal investigator (PI; the PhD in charge of my lab at the university I am attending for graduate school) wrote a National Science Foundation grant to install a cryo-EM microscope facility at my university, which was a pretty huge deal for us because it was the largest instrument grant ($3.5 million) in the history of the university, and there aren’t many of these microscopes in the world.

How Cryo-Electron Microscopy Works

I’ll outline a basic workflow for generating high-resolution 3D structures:

  1. Generate tons of sample (this involves molecular biology I won’t get into)
  2. Get that sample frozen onto a tiny copper grid coated with carbon (it has to be frozen because we are going to shoot it with electrons)
  3. Put that sample into an electron microscope (it has to be an electron microscope because visible light’s wavelength, and therefore its diffraction limit, is far too large to resolve small objects like proteins) and take images of it. Hopefully, it’s on that grid in every possible orientation
  4. Feed those images to a computer to pull and average together 2D particles into distinct classes of orientations
  5. Use that 2D data to perform a Fourier reconstruction of the molecule in 3D
  6. Backproject that Fourier reconstruction into real space to get the actual human-interpretable 3D reconstruction
  7. Iterate steps 4-6 until you get the best possible outcome (there’s a toy sketch of the reconstruction idea right after this list)
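
Here’s a toy 2D sketch of the projection/backprojection idea behind steps 5 and 6, using scikit-image. This is my own illustration, not our actual software: real processing happens in 3D in packages like RELION or cryoSPARC, and those also have to *discover* the particle orientations (the genuinely hard part), whereas here the angles are known.

```python
# Toy 2D analogue: project a known object at many angles, add noise, then
# reconstruct it by filtered backprojection.
# Requires: numpy, scikit-image (pip install scikit-image)
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

# Stand-in for "the protein": a standard test image, downscaled for speed.
obj = rescale(shepp_logan_phantom(), 0.25)

# "Images" of the object from many known angles. In cryo-EM the angles are
# unknown and have to be estimated iteratively.
theta = np.linspace(0.0, 180.0, 90, endpoint=False)
sinogram = radon(obj, theta=theta)

# Real micrographs are dominated by noise, so add some here too.
rng = np.random.default_rng(0)
noisy_sinogram = sinogram + rng.normal(scale=0.05 * sinogram.max(), size=sinogram.shape)

# Filtered backprojection back into real space.
reconstruction = iradon(noisy_sinogram, theta=theta)

# How close did we get?
rms_error = np.sqrt(np.mean((reconstruction - obj) ** 2))
print(f"RMS reconstruction error: {rms_error:.4f}")
```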

The Install

OK, let’s get to the hardware.

Here’s the microscope:

Inside that case is a 200 kV Talos Arctica cryo-electron microscope. It accelerates electrons through 200 kV and shoots them through a sample frozen in vitreous ice. Underneath the sample, the beam hits a 24-megapixel CMOS electron detector. As the electrons encounter the sample, they scatter. We can refocus the scattered electrons at a higher angle and use that to essentially “magnify” the image. The images are images of electron density.
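
As a quick aside on why it has to be electrons at all (and why the accelerating voltage matters), here’s the standard relativistic de Broglie wavelength calculation. Nothing here is specific to our scope; it’s just textbook physics:

```python
# Relativistic de Broglie wavelength of an electron accelerated through a voltage V:
# lambda = h / sqrt(2 * m_e * e * V * (1 + e*V / (2 * m_e * c^2)))
from scipy.constants import h, m_e, e, c
import math

def electron_wavelength_m(voltage_v: float) -> float:
    """Electron wavelength (in meters) for a given accelerating voltage."""
    return h / math.sqrt(2 * m_e * e * voltage_v * (1 + e * voltage_v / (2 * m_e * c**2)))

for kv in (200e3, 300e3):
    print(f"{kv / 1e3:.0f} kV -> {electron_wavelength_m(kv) * 1e12:.2f} pm")
# ~2.51 pm at 200 kV and ~1.97 pm at 300 kV, versus hundreds of nanometers for
# visible light -- which is why an electron microscope can resolve a protein.
```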

Because we have a “gun” up top (this particular model uses a field emission gun as an electron source) that is capable of accelerating electrons through voltages of up to 200 kV (some of the bigger scopes go up to 300 kV), and also because we don’t want those electrons to encounter anything but our sample (as that would add noise), we have to keep the entire column of the microscope under vacuum.

The above image is the field emission gun, mounted at the top of the column. It’s crucial that it does two things:

  1. It needs to emit just a single electron (or as close to a single electron as possible) at a time, so that we can have precise control over how many electrons we are dosing the sample with
  2. It needs to emit those electrons at a consistent 200 kV. If it emits them at significantly different energies, then when we apply current in the lenses to refocus those electrons, they will focus at different spots and subtract from the signal (some rough numbers on this below).
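
To put a rough number on point 2: the usual back-of-the-envelope is that an energy spread dE smears the focus by about Cc * dE / E, where Cc is the lens system’s chromatic aberration coefficient. The values below are generic, textbook-ish numbers I’m assuming for illustration (not the Arctica’s actual specs), and I’m ignoring relativistic corrections:

```python
# Rough chromatic "defocus spread" estimate: delta_f ~ Cc * dE / E.
Cc = 2.7e-3   # chromatic aberration coefficient in meters (~2.7 mm; assumed, typical value)
dE = 0.7      # gun energy spread in eV (assumed, roughly Schottky-FEG territory)
E = 200e3     # accelerating voltage in volts

defocus_spread_m = Cc * dE / E
print(f"defocus spread ~ {defocus_spread_m * 1e9:.1f} nm")   # ~9.5 nm with these numbers
```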

Here’s a closeup of the 200 kV high-tension source for the microscope:

Again, at high voltage, keeping things under vacuum is extremely important, because at these voltages air will start to conduct (arc over). For that reason, the high-tension source is insulated with SF6.

We also have to keep everything extremely cold. When we start shooting our sample with electrons, it will want to start moving. The colder it is, the more electrons it can absorb before it starts moving. Once it starts moving though, getting high-resolution information is impossible, because we can’t align the particles well at that point. The other reason to keep everything cold is that it helps with the column vacuum. Less heat = less kinetic energy of any molecules that may be inside the column.

Looking inside the microscope:

The column is in the center. The pieces sticking off of it are apertures that block out incident electrons (as well as X-rays that are accidentally generated).

On older microscopes, everything was controlled by hand. On this one, the apertures (both their x and y positions, as well as their diameters) are controlled electronically by servos.

This is an ion-getter pump. It’s pretty much the final pump in a series of pumps, and it’s responsible for the deepest vacuum in the system (10^-11 mbar). It applies a strong potential (3-10 kV) to ionize any remaining gas molecules and then drives them into a solid electrode of the opposite charge.
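
For a sense of what 10^-11 mbar actually means, here’s a quick ideal-gas estimate of how many molecules are left bouncing around per cubic centimeter (my own napkin math, assuming roughly room temperature):

```python
# Ideal gas: number density n = P / (k * T).
from scipy.constants import k  # Boltzmann constant

pressure_pa = 1e-11 * 100          # 1e-11 mbar -> Pa (1 mbar = 100 Pa)
temperature_k = 293                # assumption: roughly room temperature
n_per_m3 = pressure_pa / (k * temperature_k)
print(f"~{n_per_m3 * 1e-6:.1e} molecules per cm^3")
# Roughly a few hundred thousand molecules per cm^3, versus ~2.5e19 per cm^3
# in ordinary air -- which is why so little is left to scatter the beam.
```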

Vibration is the enemy, so this whole thing is on air-bags:

Down here are the cameras:

You can also see one of the sensors down there for the active EMI cancellation field system we have. This is the controller for that system:

Getting into the computers…We have a number of computers, for a few different reasons.


Above is the primary control computer. It is not on the general network, for obvious reasons; you have to VPN in from the computer on the right, which is terrible and slow (4 GB of RAM). That computer is for physical control of the microscope, aligning the electron beam, etc. It is not for collecting data.

Here’s the monitor we use to collect data:

That computer contains five dual-processor FPGAs in addition to its dual v4 Xeons, 256 GB of RAM, and an 8 TB RAID 0 array. The FPGAs have direct fiber hookups to the camera, and we can fully automate careful data collection (which can take days) from it. It is unfortunately stuck on Windows Server 2012. Kill me now.

Here’s the rear of that PC (it’s rackmounted) showing the fiber connections.


Each fiber connection handles 1/5th of the 24-megapixel image pretty damn near instantly. The machine also has 10 Gb connections to the camera controller, the microscope PC, and the LAN, as well as 4 power supplies.
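
Some napkin math on what those links have to move (the bit depth and the even per-link split are assumptions on my part, not the camera’s published spec):

```python
# Rough per-frame data volume versus what a 10 Gb/s Ethernet link can carry.
pixels = 24e6
bytes_per_pixel = 2                      # assumption: 16-bit pixels
frame_bytes = pixels * bytes_per_pixel   # ~48 MB per full frame
per_fiber_bytes = frame_bytes / 5        # ~9.6 MB per fiber per frame

ten_gbe_bytes_per_s = 10e9 / 8           # a 10 Gb/s link, ignoring protocol overhead
print(f"frame size: {frame_bytes / 1e6:.0f} MB, per fiber: {per_fiber_bytes / 1e6:.1f} MB")
print(f"a single 10 Gb/s link tops out around {ten_gbe_bytes_per_s / frame_bytes:.0f} full frames/s")
```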

I mentioned earlier that we have on average 5-20 TB of raw data per dataset. I fought an epic pitched battle with campus IT and eventually won (well, at least a victory in my opinion - they agreed to leave me alone). I used a SuperMicro server chassis with 36 drive bays (24 front, 12 rear), which we put 16 TB drives in. Currently, it’s only 1/3rd populated, as we are still in the baby stages, but that’s ~120 TB of storage, and we should be able to triple that.

The rest of the specs:

1x AMD EPYC 7272 (12c/24t)
128 GB RAM
12x 16 TB drives
TrueNAS Core 12 w/RAIDZ2 configuration
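
For anyone wondering how 12 x 16 TB turns into “~120 TB”: two drives per RAIDZ2 vdev go to parity, and the vendor’s decimal terabytes shrink once ZFS reports binary TiB. The exact vdev layout isn’t the point here, so treat the two layouts below as illustrative guesses rather than our actual pool:

```python
# Usable space for a couple of possible RAIDZ2 layouts of 12 x 16 TB drives
# (illustrative only -- not necessarily the pool layout we actually used).
drive_tb = 16
layouts = {
    "1 x 12-wide RAIDZ2": [12],
    "2 x 6-wide RAIDZ2": [6, 6],
}
for name, vdev_widths in layouts.items():
    data_drives = sum(width - 2 for width in vdev_widths)   # RAIDZ2: 2 parity drives per vdev
    usable_tb = data_drives * drive_tb                      # decimal TB, before ZFS overhead
    usable_tib = usable_tb * 1e12 / 2**40                   # what ZFS-style binary units report
    print(f"{name}: {usable_tb} TB ({usable_tib:.0f} TiB) of data space before filesystem overhead")
```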

Above that is the crown jewel, our computational server. Unfortunately, it’s the one I lost the bid to build myself (after another pitched battle with IT). I did, however, get some pretty phenomenal pricing after holding a very hard line.

That’s running:

2x AMD EPYC 7282 CPUs (each 16c/32t)
128 GB RAM
2x Nvidia A40

It’s also pulling 20 amps all on its own.

As of this week, all the hardware is here and functional. I will post a part 2 about software configuration and the hell that is Active Directory when no one in the IT department wants to help you! (Not to mention the hell that is academic software.)
:smiley:

I will also walk through a whole software refinement and post some benchmarks for anyone that might be interested.

Now I’m going to go grab a drink, because this thread is a couple years of my life and it doesn’t seem like all that much to show for it yet lol


Wow, absolutely amazing.
I’d love to hear more about this, can’t wait for part 2 :stuck_out_tongue:

I have some n00b questions though:

That camera seems to have an awful lot of bandwidth; what kind of refresh rate is it running at? How long would a typical exposure be? Is it running “line-based” or “full frame”?

2x Nvidia A40

That computer has more ECC GDDR6 than I have SSD storage for my Linux desktop. Are you running “traditional” algorithms on there, or is this for using machine learning?

Also, “microscope” is a little bit of a misnomer when it basically requires its own room plus supporting infrastructure, isn’t it? :wink:


Does your team participate in any of the CASP stuff?


Just wanted to say thank you for an amazing post

It brought back memories of doing CFD on single core Pentiums at the turn of the millennium. It amazes me how much computing has progressed in 20 years.

Good luck with the project!


Pinned for a week because it’s cool

That is a damn impressive setup.
But then again, looking at small things is incredibly hard.


Being a medicinal chemist myself, I absolutely love this post.


This has to be one of the coolest things I’ve seen in a long time. Thank you so much for sharing!


Coool!

Would it be possible to use liquid helium in this system? You mentioned the temperature being important for the stability.
