AI lab DeepMind has used artificial intelligence to create the most comprehensive map of human proteins to date. The company, a subsidiary of Google’s parent company Alphabet, is releasing the data for free, and some scientists are comparing the potential impact of the work to that of the Human Genome Project, an international effort to map every human gene.
Proteins are long, complex molecules that perform numerous tasks in the body, from forming tissues to fighting disease. Their purpose is determined by their structure, which folds like origami into complex and irregular shapes. Understanding how a protein folds helps explain its function, which in turn helps scientists with tasks ranging from basic research into how the body works to designing new drugs and treatments.
Previously, determining a protein’s structure relied on expensive and time-consuming experiments. Last year, however, DeepMind showed that it can generate accurate predictions of protein structure using AI software called AlphaFold. Now, the company is releasing hundreds of thousands of predictions made by the program to the public.
“I think this is the pinnacle of DeepMind’s entire 10-year lifespan,” Demis Hassabis, the company’s CEO and co-founder, tells The Verge. “From the beginning, this is what we are trying to do. To make breakthroughs in AI, we test in games like Go and Atari, [and] apply them to real-world problems to see if they can be used to accelerate scientific breakthroughs and benefit humanity.”
There are currently around 180,000 protein structures available in the public domain, each of which was determined experimentally and is accessible through the Protein Data Bank. DeepMind’s release covers a range of organisms, including animals such as mice and fruit flies and bacteria such as E. coli. (There is some overlap between DeepMind’s data and existing protein structures, but the nature of the model makes it difficult to quantify exactly how much.) Most importantly, the release contains predictions for 98% of all human proteins, approximately 20,000 different structures known collectively as the human proteome. Although it is not the first public data set of human proteins, it is the most comprehensive and accurate.
Scientists can download the entire human proteome themselves if they want, says John Jumper, AlphaFold’s head of technology. “HumanProteome.zip effectively exists. It appears to be around 50 GB in size,” Jumper tells The Verge. “You can save it to a flash drive if you want, but it doesn’t do much without a computer for analysis!”
After releasing this first tranche of data, DeepMind plans to keep adding to the protein repository, which will be maintained by the European Molecular Biology Laboratory (EMBL), Europe’s leading life sciences laboratory. According to Edith Heard, director general of EMBL, DeepMind hopes to publish predictions for 100 million protein structures by the end of the year.
Hassabis says the data will be made available permanently and free of charge to both scientific and commercial researchers. “Anyone can use it for anything,” the DeepMind CEO said at a press briefing. “They just need to credit the people involved in the citation.”
Understanding the structure of proteins is useful to scientists in a variety of fields. The information can help design new drugs, synthesize new enzymes that break down waste, and create crops that are resistant to viruses and extreme weather. DeepMind’s protein predictions are already being used in medical research, including studies of SARS-CoV-2, the virus that causes COVID-19.
The new data will accelerate these efforts, but scientists say it will still take a long time to turn this information into real-world results. “I don’t think we’re going to change the way we treat patients within this year, but it will definitely have a big impact on the scientific community,” Marcelo C. Sousa, a professor of biochemistry at the University of Colorado, tells The Verge.
DeepMind senior research scientist Kathryn Tunyasuvunakool says scientists will need to get used to having such information at their fingertips. “As a biologist, I can confirm that we have no playbook for looking at even 20,000 structures, so this [amount of data] is very unexpected,” Tunyasuvunakool tells The Verge. “Analyzing hundreds of thousands of structures is crazy.”
Notably, though, DeepMind’s software produces predictions of protein structures rather than experimentally determined models, so additional work is sometimes required to confirm a structure. DeepMind says it spent a lot of time building accuracy metrics into its AlphaFold software.
Predictions of protein structure are still very useful, however. Determining the structure of a protein through experimental methods is expensive and time-consuming, and it relies on a lot of trial and error. This means even a less reliable prediction can save scientists years of work by pointing them in the right direction for research.
Helen Walden, professor of structural biology at the University of Glasgow, tells The Verge that DeepMind’s data will “significantly alleviate” research bottlenecks, but that “the laborious and resource-consuming task of, for example, performing biochemical and biological assessments of drug function” will remain.
Sousa, who has previously used data from AlphaFold in his work, says scientists will feel the impact immediately. “In our collaboration with DeepMind, we had a data set with protein samples that we had for 10 years and we never got to the point of developing a suitable model,” he says. “DeepMind agreed to help us, and we were able to solve the problem in 15 minutes after we had been sitting on it for 10 years.”
Why is protein folding difficult?
Proteins are built from chains of amino acids, which come in 20 different types in the human body. An individual protein can be made up of hundreds of amino acids, each of which can fold and twist in different directions, so the final structure of the molecule has an incredibly large number of possible configurations. One estimate is that a typical protein can fold in 10^300 ways. That is a 1 followed by 300 zeros.
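To get a feel for where a figure like 10^300 can come from, here is a minimal back-of-the-envelope sketch. The chain length and the number of shapes available to each amino acid are illustrative assumptions chosen for this example, not figures from DeepMind or from the estimate quoted above.

```python
# Illustrative assumptions (not from the article): a chain of 300 amino
# acids, each of which can independently adopt one of 10 local shapes.
RESIDUES = 300
CONFORMATIONS_PER_RESIDUE = 10


def count_configurations(residues: int, per_residue: int) -> int:
    """Count overall shapes if every residue picks its shape independently."""
    return per_residue ** residues


total = count_configurations(RESIDUES, CONFORMATIONS_PER_RESIDUE)

# Python integers have arbitrary precision, so the digits can be counted directly.
zeros = len(str(total)) - 1
print(f"A 1 followed by {zeros} zeros")
```

Under these assumptions the count multiplies by 10 for every amino acid added to the chain, which is why checking configurations one by one is not a practical way to find a protein’s final shape.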
Because proteins are too small to examine under a microscope, scientists have had to determine their structures indirectly, using expensive and complex methods such as nuclear magnetic resonance and X-ray crystallography. The idea of determining the structure of a protein just by reading a list of its constituent amino acids has long been theorized but has proved difficult to achieve, which is why many describe it as a “grand challenge” of biology.
However, in recent years, computational methods, especially those using artificial intelligence, have suggested that such an analysis is possible. With these techniques, an AI system is trained on a data set of known protein structures and uses this information to generate its own predictions.
Many groups have been working on this problem for years, but DeepMind’s deep bench of AI talent and access to computing resources have allowed it to accelerate progress dramatically. Last year, the company took part in an international protein folding competition known as CASP and beat the competition. Its results were so accurate that computational biologist John Moult, one of CASP’s co-founders, said that “in some sense the problem [of protein folding] is solved.”
DeepMind’s AlphaFold program has been upgraded since last year’s CASP competition and is now 16 times faster. “We can fold the average protein in minutes, in most cases seconds,” says Hassabis. The company also open-sourced the underlying code of AlphaFold last week, allowing others to build on its work in the future.
Professor Liam McGuffin of the University of Reading, who developed some of the UK’s leading protein folding software, praised AlphaFold for its technical excellence, but noted that the program’s success relies on decades of prior research and public data. “DeepMind has vast resources to keep this database up-to-date,” McGuffin tells The Verge. “It’s better suited to doing this than a single academic group. I think academics would have gotten there eventually, but it would have been slower because there were not enough resources.”
Why is DeepMind interested?
Many scientists The Verge spoke to noted DeepMind’s generosity in releasing this data for free. After all, the lab is owned by Alphabet, Google’s parent company, which is devoting vast resources to commercial medical projects. DeepMind itself loses a lot of money every year, and there have been numerous reports of tensions between the company and its parent over issues like research autonomy and commercial viability.
But Hassabis tells The Verge that the company has always planned to make this information available for free, and that doing so fulfills DeepMind’s founding spirit. He emphasizes that DeepMind’s work is used in many places at Google (“Almost everything we use has some of our technology in it”), but that the company’s main goal has always been fundamental research.
“The understanding at the time of the acquisition was primarily to advance the state of AGI and AI technologies and then use them to accelerate scientific innovation,” Hassabis says. “[Alphabet has] a lot of departments that are focused on making money,” he adds, noting that DeepMind’s research focus “brings all sorts of benefits in terms of reputation and goodwill for the scientific community. There are many ways to get value.”
Hassabis predicts that AlphaFold is a sign of things to come, a project that shows the tremendous potential of artificial intelligence to tackle messy problems like human biology.
“I think we are in a really exciting moment,” he says. “Over the next decade, we and others in the field of AI are hoping to make incredible breakthroughs that will truly accelerate solutions to some of the really big problems we face on the planet.”