AI-Generated Glowing Protein Code May Have Taken 500 Million Years To Evolve Naturally

It’s thought that proteins first appeared on Earth around 3.7 billion years ago, and since then, nature has forged them into the molecules that exist today. But what if there were a way to mimic that process artificially, only much, much faster?

That’s exactly what a group of researchers from the company EvolutionaryScale claim to have done with the power of artificial intelligence (AI), generating the code for a brand-new fluorescent protein to boot.

Proteins are formed from long strings of amino acids. The technical term for this is a sequence, and differences in said sequences determine the eventual structure and function of the protein.

The researchers write in their paper that “[a] consensus is developing that underlying these sequences is a fundamental language of protein biology that can be understood using language models.” If that were the case, then it could be possible to generate sequences for brand-new proteins, potentially wildly different in structure and function from the ones that already exist.

Their attempt at understanding this language is ESM3, a multimodal generative language model. In plainer terms, it’s a type of generative AI – like OpenAI’s various GPTs – but instead of being prompted to write your homework, as you might do with ChatGPT, this model spits out the code for a protein.

ESM3 was trained on 771 billion unique tokens – the AI term for a unit of data – taken from databases of natural protein sequences and structures, as well as some synthetic sequences. In total, this data contained 3.15 billion protein sequences, 236 million protein structures, and 539 million proteins with function annotations.

The next step was to see if it could generate a brand-new protein sequence. In this case, the team asked the model to generate new fluorescent proteins, prompting it with an incomplete recipe and the task of filling in the gaps.
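To picture what an "incomplete recipe" might look like, here is a minimal sketch in Python. It is an illustration only and not ESM3's actual interface: the function name `mask_sequence`, the `_` placeholder, and the toy ten-letter sequence are all assumptions made up for this example. It simply hides every position of a sequence except a chosen few, leaving gaps that a generative model would be asked to fill.

```python
def mask_sequence(sequence: str, keep_positions: set, mask: str = "_") -> str:
    """Return `sequence` with every position not in `keep_positions` replaced by `mask`.

    Hypothetical helper for illustration; positions are 0-based.
    """
    return "".join(
        residue if index in keep_positions else mask
        for index, residue in enumerate(sequence)
    )


# Toy sequence of one-letter amino acid codes (invented, not a real protein).
toy_sequence = "MSKGEELFTG"

# Keep three positions as the "known" part of the recipe and hide the rest.
prompt = mask_sequence(toy_sequence, keep_positions={0, 4, 5})
print(prompt)  # M___EE____
```

A real prompt for a model such as ESM3 would carry far more information than this, but the idea of leaving blanks for the model to complete is the same.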

And it did it, generating the sequence and structure for a previously unknown variant of green fluorescent protein (GFP) – which is frequently used in cell and molecular biology research – dubbed esmGFP.

According to EvolutionaryScale, this new protein “is a vast evolutionary departure from natural fluorescent proteins,” sharing just 53 percent similarity in sequence compared to the closest naturally existing protein, eqFP578, found in the bubble-tip anemone. The research team claims in their paper that this divergence is “to a degree equivalent to simulating over 500 million years of evolution.”
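For a sense of how a figure like "53 percent" can be arrived at, the sketch below computes a simple percent identity between two sequences that have already been aligned. It is a minimal illustration under stated assumptions, not the method used in the paper: the function name `percent_identity` is hypothetical, gaps are written as `-`, columns containing a gap are skipped, and the short sequences are invented rather than taken from esmGFP or eqFP578. Other definitions of identity use a different denominator and can give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free aligned columns in which both sequences match.

    Assumes the two strings are already aligned (same length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == "-" or residue_b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if residue_a == residue_b:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * matches / compared


# Invented toy sequences: 8 of the 10 positions match.
print(percent_identity("MSKGEELFTG", "MSKGAELFSG"))  # 80.0
```

On this measure, two proteins at 53 percent would match at roughly half of the compared positions.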

Not everybody was so sure, however. Tiffany Taylor, professor of Microbial Ecology and Evolution at the University of Bath, who wasn’t involved in the study, wrote in Live Science in 2024 (when the study was still a preprint) that “AI-driven protein engineering is intriguing, but I can’t help feeling we might be overly confident in assuming we can outsmart the intricate processes honed by millions of years of natural selection.”

Nevertheless, as Taylor said, it’s an interesting concept – but what exactly would it be useful for? EvolutionaryScale’s website says its model is “a tool for scientists to imagine proteins to capture carbon […] enzymes that break down plastic [and] new medicines.”

Still, there’s no guarantee that this will eventually translate into reality. For now, the newly discovered protein remains “generated” in the AI sense only.

The study is published in the journal Science.


