"Shaping" the Future of Medicine: How AlphaFold's Protein Structure Prediction Advances Medical Science

(Image Credit: technologynetworks.com)

(Image Credit: nature.com)

August 20, 2024

Kathlyn Phan  

12th Grade

Fountain Valley High School



In recent years, AlphaFold has emerged as a groundbreaking artificial intelligence tool that predicts protein structures with remarkable accuracy. Developed by Google DeepMind, AlphaFold has been widely used by researchers for its 3D protein structure prediction and modeling capabilities since its debut. Although the tool has some limitations, it ranked as the top protein structure prediction method in the CASP14 assessment. Through deep learning, this technology has the potential to advance our understanding of disease and help scientists with drug discovery in ways that could revolutionize medical science.


First introduced in 2018 and substantially redesigned for its 2021 release, AlphaFold uses deep learning to take an amino acid sequence and output a precise 3D protein structure model. Its predictions are stored in the publicly available AlphaFold Protein Structure Database. Since its launch in 2021, the database has grown from a few hundred thousand to over 214 million protein structure models in 2024. Because the database holds precomputed predictions, researchers can look up a structure right away instead of running the model themselves. Because it is public, it has become a widely used data source for medical research. The digital library is also user-friendly: people can download files directly or access the data through its cloud-based tools. This ease of use makes the AlphaFold Protein Structure Database appealing to a wide scientific community.
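For readers who want to download a structure file directly, the short Python sketch below builds a download link from a UniProt accession. The file-naming pattern (AF-<accession>-F1-model_v<version>), the version number 4, and the helper name alphafold_model_url are assumptions for illustration and are not taken from the sources above, so they should be checked against the database's current download page before use. The sketch only builds the link and does not contact the server.

```python
def alphafold_model_url(uniprot_accession, version=4, file_format="pdb"):
    """Build a download link for a predicted structure.

    Assumption: the database names its files
    AF-<accession>-F1-model_v<version>.<format>. Both the pattern and
    the default version number may change over time.
    """
    accession = uniprot_accession.strip().upper()
    if not accession.isalnum():
        raise ValueError("A UniProt accession should contain only letters and digits")
    return (
        "https://alphafold.ebi.ac.uk/files/"
        f"AF-{accession}-F1-model_v{version}.{file_format}"
    )


# P69905 is the UniProt accession for human hemoglobin subunit alpha.
print(alphafold_model_url("P69905"))
```

The resulting link can be pasted into a browser or passed to any download tool.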


This innovative technology is built on an architecture composed of multiple interconnected modules that are trained together using “end-to-end” training. Each module is responsible for a different task. In end-to-end training, an input is fed into the first module, each module carries out its task and passes its result to the next, and the final output is compared with the correct answer so that every module is adjusted at the same time. The benefit of this type of training is that the modules learn to work together to produce a cohesive result, which eliminates the need to tune each module separately.
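The idea of end-to-end training can be illustrated with a toy Python sketch. It is not AlphaFold's code: the two "modules" here are single numbers (w1 and w2), and the task of multiplying an input by 6 is made up for illustration. What it shows is that one error measured on the final output is used to adjust both modules in the same step.

```python
def train_end_to_end(inputs, targets, steps=200, lr=0.01):
    """Train two chained toy modules from a single loss on the final output."""
    w1, w2 = 1.0, 1.0  # one adjustable weight per toy module
    n = len(inputs)
    for _ in range(steps):
        grad_w1 = 0.0
        grad_w2 = 0.0
        for x, target in zip(inputs, targets):
            hidden = w1 * x        # module 1 processes the input
            output = w2 * hidden   # module 2 processes module 1's result
            error = output - target
            # Chain rule: the same output error updates both modules.
            grad_w2 += 2 * error * hidden / n
            grad_w1 += 2 * error * w2 * x / n
        w1 -= lr * grad_w1
        w2 -= lr * grad_w2
    return w1, w2


inputs = [1.0, 2.0, 3.0]
targets = [6.0 * x for x in inputs]  # made-up task: multiply by 6
w1, w2 = train_end_to_end(inputs, targets)
print(w1 * w2)  # the two modules together should now multiply by roughly 6
```

Neither module is told what its own "correct" value is. Each one only has to make the combined result accurate, which is the sense in which the modules learn to work together.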


AlphaFold takes as input a FASTA file containing the target protein’s primary sequence. The sequence first goes through a multiple sequence alignment (MSA) search, in which tools such as JackHMMER and HHBlits search sequence databases such as UniRef90 for related sequences to align with the input. A separate search of known structures selects the top four matches to serve as templates for the next step, the prediction models. In this step, the MSA and templates are fed into five trained neural network models, each of which passes the information through its network multiple times to refine it in a process called “recycling”. As a result, they produce several slightly different 3D protein structure models. The final step is AMBER relaxation, which adjusts each model to remove physically unrealistic geometry. The models are then ranked by the program’s confidence in their accuracy, measured by the average predicted local distance difference test (pLDDT) score.
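The first and last parts of this pipeline are simple enough to sketch in Python. The example below reads a sequence from FASTA text and ranks candidate models by their average pLDDT. It is a minimal illustration rather than AlphaFold's own code: the sequence, the model names, and the per-residue pLDDT scores (on the usual 0 to 100 scale) are all made up.

```python
from statistics import mean


def read_fasta(text):
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def rank_by_mean_plddt(models):
    """Return model names sorted from highest to lowest average pLDDT."""
    return sorted(models, key=lambda name: mean(models[name]), reverse=True)


# Hypothetical input: a made-up five-residue sequence.
fasta_text = ">example_protein\nMKTAY\n"
sequence = read_fasta(fasta_text)["example_protein"]

# Hypothetical per-residue pLDDT scores, one value per residue.
candidate_plddt = {
    "model_1": [91, 88, 92, 85, 79],
    "model_2": [72, 65, 80, 70, 68],
    "model_3": [95, 93, 90, 89, 88],
}

ranking = rank_by_mean_plddt(candidate_plddt)
print(sequence, ranking)
```

With these made-up scores, model_3 has the highest average (91), so it is ranked first.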


Although AlphaFold leverages advanced AI technology, it has limitations that reduce its accuracy and efficiency. DeepMind reports that prediction accuracy decreases when the multiple sequence alignment (MSA) contains fewer than about 30 sequences. This means that enough related sequences must already exist in the sequence databases for AlphaFold to predict a structure reliably. AlphaFold also has difficulty with large unstructured loops, the N- and C-terminal ends of proteins, and flexible domains within proteins.
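The alignment-depth limitation can be turned into a rough check, sketched below in Python. The sketch counts how many sequences cover each position of an alignment and flags the alignment as shallow when the median count is below 30. This is a simplification: the function names and the toy alignment are made up, and plain counting is assumed here in place of whatever depth measure DeepMind actually uses.

```python
from statistics import median


def per_column_depth(alignment):
    """Count the non-gap residues in each column of an alignment."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("All aligned sequences must have the same length")
    return [
        sum(1 for row in alignment if row[col] != "-")
        for col in range(length)
    ]


def is_shallow(alignment, threshold=30):
    """Flag an alignment whose median column depth is below the threshold."""
    return median(per_column_depth(alignment)) < threshold


# Made-up toy alignment; "-" marks a gap.
toy_alignment = ["MKT-LV", "MKTAL-", "M-TALV"]
print(per_column_depth(toy_alignment), is_shallow(toy_alignment))
```

A three-sequence alignment like this one is far below the threshold, so a prediction built on it would deserve extra caution.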


Despite AlphaFold’s flaws, its technology is still pivotal in the biotechnology field because it deepens our understanding of biological processes and disease. Accurate structural predictions help researchers learn how proteins function, how mutations affect them, and how they interact with drugs. A significant amount of the research done with AlphaFold focuses on understanding and treating diseases that affect millions of people. These include Chagas disease and leishmaniasis, which are prevalent in poor and vulnerable communities. AlphaFold has also been used to aid malaria vaccine development and antibiotic resistance research. By expanding our understanding of protein structures, AlphaFold paves the way for personalized medicine and advancements in healthcare that can transform the way we handle diseases and health conditions.

(Image Credit: paperswithcode.com)

Reference Sources

AlphaFold. “AlphaFold Protein Structure Database.” Alphafold.ebi.ac.uk, 2022, https://alphafold.ebi.ac.uk/.

Callaway, Ewen. “Major AlphaFold Upgrade Offers Boost for Drug Discovery.” Nature, vol. 629, 8 May 2024, www.nature.com/articles/d41586-024-01383-z, https://doi.org/10.1038/d41586-024-01383-z. Accessed 10 May 2024.

EMBL-EBI. “What Is AlphaFold? | AlphaFold.” Ebi.ac.uk, 2021, www.ebi.ac.uk/training/online/courses/alphafold/an-introductory-guide-to-its-strengths-and-limitations/what-is-alphafold/#:~:text=AlphaFold%20is%20Google%20DeepMind. Accessed 17 Aug. 2024.

Google DeepMind. “AlphaFold.” Google DeepMind, 5 Aug. 2024, https://deepmind.google/technologies/alphafold/#:~:text=Accelerating%20scientific%20discovery. Accessed 17 Aug. 2024.

Jumper, J., et al. “Overview of the Architecture - E-Learning@VIB.” E-Learning@VIB, 14 Dec. 2021, https://elearning.vib.be/courses/alphafold/lessons/the-alphafold-pipeline/topic/overview-of-the-architecture/#:~:text=Developed%20at%20DeepMind%2C%20the%20AlphaFold. Accessed 17 Aug. 2024.

SCISPACE. “What Are Some Limitations of AlphaFold?” Typeset.io, 2022, https://typeset.io/questions/what-are-some-limitations-of-alphafold-47f2qivh44.

Váradi, Mihály, et al. “AlphaFold Protein Structure Database in 2024: Providing Structure Coverage for over 214 Million Protein Sequences.” Nucleic Acids Research, vol. 52, 2 Nov. 2023, https://doi.org/10.1093/nar/gkad1011. Accessed 8 Dec. 2023.