An AI system developed by UK-based company DeepMind has achieved the long-sought-after goal of accurately predicting the shape of proteins from their sequence alone, a key part of understanding how the machinery of life works. In a competition, AlphaFold’s predictions for about two-thirds of the proteins were on a par with the results achieved by humans doing expensive and time-consuming lab experiments.
“I was really wowed when I saw it,” says John Moult at the University of Maryland, one of the competition’s organisers. “This is the first time we’ve come close to approaching experimental usefulness, which is pretty extraordinary.”
Proteins are vital for life. Cells are full of machines – from turbines that generate energy to transporters that walk along tracks pulling cargo – that are built from proteins, and the shapes of these machines are crucial. For instance, the coronavirus can enter and infect cells because the spike protein on its surface fits into a receptor on human cells, like a key into a lock.
These shapes depend on the sequence of 20 different amino acids that are chained together to make proteins. It is easy to work out the sequence of any protein because this is determined by the DNA that codes for it. But despite half a century of efforts, biologists hadn’t previously been able to work out the shape of a protein from its sequence alone.
Instead, they have had to rely on experimental methods such as X-ray crystallography, which involves analysing the diffraction pattern formed when an X-ray beam is fired through a protein crystal.
“This is exceptionally difficult,” says John Jumper, who leads the AlphaFold team at DeepMind. Making crystals of some proteins is hard, and interpreting the diffraction patterns can be tricky.
Brute-force computing based on physics alone isn’t an option, because proteins are too complex. Instead many groups worldwide have turned to machine learning, where AI systems are trained using data sets of known protein structures.
For each target protein, groups including DeepMind’s look for variants found in related species and feed their sequences and structures into the AI system, along with the sequence of the target protein. The idea is that the system learns to work out the shape of the target protein by looking at patterns linking sequence and structure.
In 1994, Moult and a colleague set up the CASP (Critical Assessment of protein Structure Prediction) competition to judge the performance of computer predictions. Any group that wants to enter is sent the sequences of proteins whose structure has been determined experimentally but not yet published.
Predicted shapes are scored out of 100 based on how close each amino acid is to the position determined by experiment. A score above 90 is considered to be on a par with results obtained by experiments.
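To give a feel for how a score like this can be computed, here is a minimal sketch in Python. It is modelled loosely on the GDT_TS measure used at CASP, which averages the percentage of amino acids that fall within 1, 2, 4 and 8 ångströms of their experimental positions. The function name and the toy coordinates are made up for illustration, and the sketch assumes the two structures have already been superimposed, whereas the real measure also searches for the best superposition.

```python
import math

# Distance cutoffs in angstroms, as used by the GDT_TS measure.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_like_score(predicted, experimental, cutoffs=CUTOFFS):
    """Return a 0-100 score for a predicted structure.

    Both arguments are lists of (x, y, z) positions, one per amino acid,
    in the same order. Assumption: the structures are already superimposed.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")

    # Distance between each predicted position and its experimental one.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]

    # For each cutoff, the fraction of amino acids that land within it.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]

    # Average the fractions over all cutoffs and scale to 0-100.
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four amino acids, off by 0.5, 1.5, 3.0 and 10.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.5), (3.8, 0.0, 1.5), (7.6, 0.0, 3.0), (11.4, 0.0, 10.0)]
print(gdt_like_score(predicted, experimental))  # 56.25
```

In the toy example, one amino acid in four is within 1 ångström, two are within 2, and three are within both 4 and 8, so the average is 56.25. A perfect prediction would score 100.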
In the 2016 competition, the best team got a median score of around 40 in the hardest category. In 2018, the first version of AlphaFold got a median score of nearly 60 in this category. This year, a redesigned AlphaFold got a median score of 87 in the hardest category. Across all categories, it scored above 90 for two-thirds of the proteins.
While this result is amazing, there were some clear failures, says Moult. For instance, AlphaFold didn’t do well with a protein whose structure is influenced by interactions with other proteins that surround it.
This variability could be an issue, but AlphaFold also provides a measure of how trustworthy its predictions are, so scientists will know which ones to rely on, says Jumper. “This is huge.”
Separate from the competition, Andrei Lupas at the Max Planck Institute for Developmental Biology in Germany had been trying to work out the structure of a particular protein for a decade until DeepMind offered to help. A few tweaks were needed to improve accuracy, but Lupas’s team had the final structure within half an hour of receiving AlphaFold’s prediction. “It’s astonishing,” he says. “It’s really astonishing.”
Lupas thinks that, for the next few years, researchers will still need to do some experimental work to check shape predictions, but that they will eventually be able to rely on computation alone. This will make a huge difference, he says, but the real revolution will come from being able to use computers to predict how proteins interact with other molecules.
“This will completely change the face of medicine,” says Lupas. For instance, AlphaFold was able to predict the shapes of several coronavirus proteins soon after the virus was first sequenced in January, he says. Even better would be to have the ability to predict which of the thousands of existing drugs bind to these proteins and might have a therapeutic effect, without having to do expensive experiments.
DeepMind has revealed few details about AlphaFold so far, but says it will soon publish a paper. The company was unable to say how scientists will be able to get access to the technology, but says it is keen for it to be widely available. “We want to make sure this has the biggest impact,” says Pushmeet Kohli at DeepMind.