
Beyond LLMs: The Future of AI-Driven Protein Design

Big Tech's AI race is shifting from LLMs to protein design. Google DeepMind and OpenAI are taking different approaches, sparking anticipation for breakthroughs in life sciences and drug development.
Junhyeok Jeon, AI Researcher
2025.02.05, 7 min read

From LLMs to Protein Design AI: The Expanding AI War Among Big Tech

In recent years, Big Tech companies have been competing in artificial intelligence, focusing primarily on large language models (LLMs). OpenAI's ChatGPT, Google's Gemini, and Meta's LLaMA are among the most notable examples, and several other companies are developing commercially viable LLMs. Meanwhile, the Chinese company DeepSeek has also made waves in the global AI landscape, drawing significant attention.

The 2024 Nobel Prize in Chemistry.

Google DeepMind has been the dominant force in AI research on protein structures. In 2020, DeepMind's AlphaFold 2 transformed protein structure prediction by enabling rapid and accurate prediction of the 3D structures of proteins. This work earned DeepMind's Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry.

Now, OpenAI has officially challenged DeepMind’s dominance. In January 2025, OpenAI announced the development of GPT-4b micro, an AI model specialized in protein design. Meanwhile, in September 2024, Google DeepMind had already introduced AlphaProteo, its protein design AI. The AI war that once revolved around LLMs is now expanding into the domain of protein design AI.

AlphaFold: Revolutionizing Protein Structure Prediction

Google DeepMind AlphaFold

AlphaFold, now widely recognized, is an AI system that predicts the 3D structure of proteins based on their amino acid sequences. AlphaFold has demonstrated reliable accuracy while significantly accelerating protein structure prediction compared to traditional methods such as X-ray crystallography and NMR spectroscopy. Moreover, DeepMind has publicly released millions of AlphaFold-predicted protein structures for free, bringing about a breakthrough in biological research.

However, AlphaFold is primarily designed to predict the folding (structure) of existing proteins rather than creating new ones. If the goal is to design novel protein sequences with specific functions, a different model is required.

  • More technical details about AlphaFold: https://hyperlab.hits.ai/en/blog/google-deepmind-alphafoldlatest
  • Why AlphaFold is so powerful: https://hyperlab.hits.ai/en/blog/MSA_eng

LLM to Protein Design AI, AlphaProteo: Designing Proteins That Bind to Target Proteins

To tackle this challenge, Google DeepMind has developed a new AI model for protein design called AlphaProteo [1, 2]. AlphaProteo is an AI system that designs new proteins capable of binding to specific target proteins. When a user specifies a target protein, AlphaProteo generates a protein sequence that can effectively bind to it.

For example, suppose that the VEGF-A (Vascular Endothelial Growth Factor A) protein is overexpressed in cancer cells. By providing information about VEGF-A, AlphaProteo can design a novel protein sequence that binds tightly to it. Once these two proteins interact, VEGF-A may lose its original function, making the newly generated protein a potential anticancer drug candidate.

According to the published results, AlphaProteo was tested on eight target proteins. For seven of them (all except TNF-alpha, for which it did not produce a successful binder), AlphaProteo demonstrated superior performance in experiments compared to conventional protein design methods [1, 2].

AlphaProteo, which has experimentally demonstrated superior performance [1].

LLM to Protein Design AI, GPT-4b micro: Redesigning Proteins with Specific Functions

As a leader in the LLM market, OpenAI has entered the biological AI race by partnering with the biotech startup Retro Biosciences to develop GPT-4b micro, an AI model for protein design [3]. Unlike AlphaProteo, which focuses on designing new proteins that bind to specific target proteins, GPT-4b micro aims to create novel proteins with specific biological functions.

OpenAI demonstrated the potential of GPT-4b micro by designing improved Yamanaka Factors. Here’s a brief explanation of Yamanaka Factors:

Discovered in 2006 by Dr. Shinya Yamanaka, the Yamanaka Factors refer to four transcription factors (Oct4, Sox2, Klf4, and c-Myc) that can reprogram somatic cells into induced pluripotent stem cells (iPSCs). In other words, the expression of these four transcription factors enables fully differentiated adult cells to revert to an embryonic-like state. This discovery challenged the long-standing biological paradigm that differentiated cells could not be reversed, leading to rapid advancements in stem cell research and regenerative medicine. For this groundbreaking discovery, Dr. Shinya Yamanaka was honored with the 2012 Nobel Prize in Physiology or Medicine.

Dr. Shinya Yamanaka won the 2012 Nobel Prize in Physiology or Medicine for his discovery of Yamanaka factors.

GPT-4b micro designs new proteins that function similarly to the four existing Yamanaka Factors but without side effects or with greater efficiency. OpenAI has stated that it achieved at least 50 times higher reprogramming efficiency than human-designed factors.

However, OpenAI has not yet released official code or a paper on GPT-4b micro. While OpenAI has announced plans to publish its experimental results in a scientific paper, it has not provided a specific timeline for its release.

Comparison of AlphaFold, AlphaProteo, and GPT-4b micro

| Feature | AlphaFold | AlphaProteo | GPT-4b micro |
|---|---|---|---|
| Primary Goal | Protein structure prediction | Designing proteins that bind to target proteins | Generating novel protein sequences with specific functions |
| Input Data | Amino acid sequence | Target protein structure and preferred binding sites | Protein function or information on other proteins with the same function (estimated) |
| Output | 3D protein structure | Designed amino acid sequence | Designed amino acid sequence |
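
To make the comparison concrete, here is a minimal Python sketch that encodes the three models as simple records and separates the structure predictor from the design models. The names `ModelProfile`, `PROFILES`, and `is_valid_sequence` are hypothetical and exist only for illustration; they are not part of any DeepMind or OpenAI API, and the GPT-4b micro inputs remain an estimate, as noted above.

```python
from dataclasses import dataclass

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_valid_sequence(sequence: str) -> bool:
    """Return True if the string is non-empty and uses only standard one-letter codes."""
    return bool(sequence) and set(sequence.upper()) <= STANDARD_AMINO_ACIDS


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical summary record for one model (illustration only)."""
    name: str
    goal: str
    input_data: str
    output: str

    @property
    def is_design_model(self) -> bool:
        # Design models output a new sequence; a predictor outputs a structure.
        return "sequence" in self.output.lower()


PROFILES = [
    ModelProfile(
        name="AlphaFold",
        goal="Protein structure prediction",
        input_data="Amino acid sequence",
        output="3D protein structure",
    ),
    ModelProfile(
        name="AlphaProteo",
        goal="Designing proteins that bind to target proteins",
        input_data="Target protein and preferred binding sites",
        output="Designed amino acid sequence",
    ),
    ModelProfile(
        name="GPT-4b micro",
        goal="Generating novel protein sequences with specific functions",
        input_data="Protein function or related proteins (estimated)",
        output="Designed amino acid sequence",
    ),
]

design_models = [p.name for p in PROFILES if p.is_design_model]
print(design_models)  # ['AlphaProteo', 'GPT-4b micro']
```

The sketch highlights the key asymmetry: AlphaFold takes a sequence in and returns a structure, while the two design models return a sequence that would still need to be validated, for example with `is_valid_sequence`, and then tested experimentally.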

LLM to Protein Design AI: New Opportunities in Life Sciences and Drug Development

If AlphaFold revolutionized biological research through protein structure prediction, AlphaProteo and GPT-4b micro are now opening a new chapter in life sciences and drug development with AI-driven protein design. Given the diverse functions of proteins within living organisms, the potential of protein design AI appears limitless.

  • Accelerating Drug Discovery: Designing protein-based therapeutics targeting specific diseases.
  • Advancing Synthetic Biology: Enhancing production efficiency by designing new proteins.
  • Solving Environmental Issues: Developing eco-friendly proteins, such as plastic-degrading enzymes.

AI is no longer just a tool for biological analysis; it is now a tool for creating biological innovations. The competition between Google DeepMind and OpenAI has only just begun, and it remains to be seen how far the race in protein design AI will push the boundaries of biological and pharmaceutical research.

This year, research and investment in drug development are set to accelerate further, playing a key role in addressing global health challenges. By utilizing Hyper Lab, researchers can efficiently identify early-stage drug candidates with the support of proprietary AI technology. Moreover, a free trial is available, offering valuable assistance to ongoing research.

Would you like to start a free trial of Hyper Lab or schedule a meeting to discuss its implementation? You can easily make a reservation through the links below:

Start your free trial: https://lrl.kr/u7hO

Schedule a meeting: https://abit.ly/6tr1uz


Sources

[1] https://arxiv.org/abs/2409.08022

[2] https://deepmind.google/discover/blog/alphaproteo-generates-novel-proteins-for-biology-and-health-research/

[3] https://tecknexus.com/ai-meets-longevity-openai-retros-gpt-4b-micro/24/