As people around the world marveled in July at the most detailed pictures of the cosmos snapped by the James Webb Space Telescope, biologists got their first glimpses of a different set of images — ones that could help revolutionize life sciences research.
The images are the predicted 3-D shapes of more than 200 million proteins, rendered by an artificial intelligence system called AlphaFold. “You can think of it as covering the entire protein universe,” said Demis Hassabis at a July 26 news briefing. Hassabis is cofounder and CEO of DeepMind, the London-based company that created the system. Combining several deep-learning techniques, the computer program is trained to predict protein shapes by recognizing patterns in structures that have already been solved through decades of experimental work using electron microscopes and other methods.
The AI’s first splash came in 2021, with predictions for 350,000 protein structures — including almost all known human proteins. DeepMind partnered with the European Bioinformatics Institute of the European Molecular Biology Laboratory to make the structures available in a public database.
July’s massive new release expanded the library to “almost every organism on the planet that has had its genome sequenced,” Hassabis said. “You can look up a 3-D structure of a protein almost as easily as doing a keyword Google search.”
These are predictions, not actual structures. Yet researchers have used some of the 2021 predictions to develop potential new malaria vaccines, improve understanding of Parkinson’s disease, work out how to protect honeybee health, gain insight into human evolution and more. DeepMind has also focused AlphaFold on neglected tropical diseases, including Chagas disease and leishmaniasis, which can be debilitating or lethal if left untreated.
The release of the vast dataset was greeted with excitement by many scientists. But others worry that researchers will take the predicted structures as the true shapes of proteins. There are still things AlphaFold can’t do — and wasn’t designed to do — that need to be tackled before the protein cosmos completely comes into focus.
Having the new catalog open to everyone is “a huge benefit,” says Julie Forman-Kay, a protein biophysicist at the Hospital for Sick Children and the University of Toronto. In many cases, AlphaFold and RoseTTAFold, another AI researchers are excited about, predict shapes that match up well with protein profiles from experiments. But, she cautions, “it’s not that way across the board.”
Predictions are more accurate for some proteins than for others. Erroneous predictions could leave some scientists thinking they understand how a protein works when really, they don’t. Painstaking experiments remain crucial to understanding how proteins fold, Forman-Kay says. “There’s this sense now that people don’t have to do experimental structure determination, which is not true.”
Proteins start out as long chains of amino acids and fold into a host of curlicues and other 3-D shapes. Some resemble the tight corkscrew ringlets of a 1980s perm or the pleats of an accordion. Others could be mistaken for a child’s spiraling scribbles.
A protein’s architecture is more than just aesthetics; it can determine how that protein functions. For instance, proteins called enzymes need a pocket where they can capture small molecules and carry out chemical reactions. And proteins that work in a protein complex, two or more proteins interacting like parts of a machine, need the right shapes to snap into formation with their partners.
Knowing the folds, coils and loops of a protein’s shape may help scientists decipher how, for example, a mutation alters that shape to cause disease. That knowledge could also help researchers make better vaccines and drugs.
For years, scientists have bombarded protein crystals with X-rays, flash-frozen cells and examined them under high-powered electron microscopes, and used other methods to discover the secrets of protein shapes. Such experimental methods take “a lot of personnel time, a lot of effort and a lot of money. So it’s been slow,” says Tamir Gonen, a membrane biophysicist and Howard Hughes Medical Institute investigator at the David Geffen School of Medicine at UCLA.
Such meticulous and expensive experimental work has uncovered the 3-D structures of more than 194,000 proteins, their data files stored in the Protein Data Bank, supported by a consortium of research organizations. But the accelerating pace at which geneticists are deciphering the DNA instructions for making proteins has far outstripped structural biologists’ ability to keep up, says systems biologist Nazim Bouatta of Harvard Medical School. “The question for structural biologists was, how do we close the gap?” he says.
For many researchers, the dream has been to have computer programs that could examine the DNA of a gene and predict how the protein it encodes would fold into a 3-D shape.
Here comes AlphaFold
Over many decades, scientists made progress toward that AI goal. But “until two years ago, we were really a long way from anything like a good solution,” says John Moult, a computational biologist at the University of Maryland’s Rockville campus.
Moult is one of the organizers of a competition: the Critical Assessment of protein Structure Prediction, or CASP. Organizers give competitors a set of proteins for their algorithms to fold, then compare the machines’ predictions against experimentally determined structures. Most AIs failed to get close to the actual shapes of the proteins.
Then in 2020, AlphaFold showed up in a big way, predicting the structures of 90 percent of test proteins with high accuracy, including two-thirds with accuracy rivaling experimental methods.
Deciphering the structure of single proteins had been the core of the CASP competition since its inception in 1994. With AlphaFold’s performance, “suddenly, that was essentially done,” Moult says.
Since AlphaFold’s 2021 release, more than half a million scientists have accessed its database, Hassabis said in the news briefing. Some researchers, for example, have used AlphaFold’s predictions to help them get closer to completing a massive biological puzzle: the nuclear pore complex. Nuclear pores are key portals that allow molecules in and out of cell nuclei. Without the pores, cells wouldn’t work properly. Each pore is huge, relatively speaking, composed of about 1,000 pieces of 30 or so different proteins. Researchers had previously managed to place about 30 percent of the pieces in the puzzle.
That puzzle is now almost 60 percent complete after researchers combined AlphaFold predictions with experimental techniques to understand how the pieces fit together, the team reported in the June 10 Science.
Now that AlphaFold has pretty much solved how to fold single proteins, this year CASP organizers are asking teams to work on the next challenges: Predict the structures of RNA molecules and model how proteins interact with each other and with other molecules.
For those sorts of tasks, Moult says, deep-learning AI methods “look promising but have not yet delivered the goods.”
Where AI falls short
Being able to model protein interactions would be a big advantage because most proteins don’t operate in isolation. They work with other proteins or other molecules in cells. But AlphaFold’s accuracy at predicting how the shapes of two proteins might change when the proteins interact is “nowhere near” that of its spot-on projections for a slew of single proteins, says Forman-Kay, the University of Toronto protein biophysicist. That’s something AlphaFold’s creators acknowledge too.
The AI was trained to fold proteins by examining the contours of known structures. And many fewer multiprotein complexes than single proteins have been solved experimentally.
Forman-Kay studies proteins that refuse to be confined to any particular shape. These intrinsically disordered proteins are typically as floppy as wet noodles (SN: 2/9/13, p. 26). Some will fold into defined forms when they interact with other proteins or molecules. And they can fold into new shapes when paired with different proteins or molecules to do various jobs.
AlphaFold’s predicted shapes reach a high confidence level for about 60 percent of wiggly proteins that Forman-Kay and colleagues examined, the team reported in a preliminary study posted in February at bioRxiv.org. Often the program depicts the shapeshifters as long corkscrews called alpha helices.
Forman-Kay’s group compared AlphaFold’s predictions for three disordered proteins with experimental data. The structure that the AI assigned to a protein called alpha-synuclein resembles the shape that the protein takes when it interacts with lipids, the team found. But that’s not the way the protein looks all the time.
For another protein, called eukaryotic translation initiation factor 4E-binding protein 2, AlphaFold predicted a mishmash of the protein’s two shapes when working with two different partners. That Frankenstein structure, which doesn’t exist in actual organisms, could mislead researchers about how the protein works, Forman-Kay and colleagues say.
AlphaFold may also be a little too rigid in its predictions. A static “structure doesn’t tell you everything about how a protein works,” says Jane Dyson, a structural biologist at the Scripps Research Institute in La Jolla, Calif. Even single proteins with generally well-defined structures aren’t frozen in space. Enzymes, for example, undergo small shape changes when shepherding chemical reactions.
If you ask AlphaFold to predict the structure of an enzyme, it will show a fixed image that may closely resemble what scientists have determined by X-ray crystallography, Dyson says. “But [it will] not show you any of the subtleties that are changing as the different partners” interact with the enzyme.
“The dynamics are what Mr. AlphaFold can’t give you,” Dyson says.
A revolution in the making
The computer renderings do give biologists a head start on solving problems such as how a drug might interact with a protein. But scientists should remember one thing: “These are models,” not experimentally deciphered structures, says Gonen, at UCLA.
He uses AlphaFold’s protein predictions to help make sense of experimental data, but he worries that researchers will accept the AI’s predictions as gospel. If that happens, “the risk is that it will become harder and harder and harder to justify why you need to solve an experimental structure.” That could lead to reduced funding, talent and other resources for the types of experiments needed to check the computer’s work and forge new ground, he says.
Harvard Medical School’s Bouatta is more optimistic. He thinks that researchers probably don’t need to invest experimental resources in the types of proteins that AlphaFold does a good job of predicting, which should help structural biologists triage where to put their time and money.
“There are proteins for which AlphaFold is still struggling,” Bouatta agrees. Researchers should spend their capital there, he says. “Maybe if we generate more [experimental] data for those challenging proteins, we could use them for retraining another AI system” that could make even better predictions.
He and colleagues have already reverse engineered AlphaFold to make a version called OpenFold that researchers can train to solve other problems, such as those gnarly but important protein complexes.
Massive amounts of DNA generated by the Human Genome Project have made a wide range of biological discoveries possible and opened up new fields of research (SN: 2/12/22, p. 22). Having structural information on 200 million proteins could be similarly revolutionary, Bouatta says.
In the future, thanks to AlphaFold and its AI kin, he says, “we don’t even know what sorts of questions we might be asking.”