
San Francisco – Anthrogen (YC S24) has announced the launch of Odyssey, a protein language model with more than 102 billion parameters, which the company describes as the largest and most performant of its kind. Developed by a core team of just six people, the model aims to revolutionize protein generation and editing for scientific and research applications. In a recent social media post, Anthrogen co-founder Ankit Singhal called Odyssey "the largest and most performant protein language model ever created."
Odyssey is designed to empower scientists and researchers to "generate and edit proteins, the workhorses of all life on this planet, towards specific functional ends," as highlighted by Singhal. This capability is critical for advancements in areas such as drug discovery, enzyme engineering, and synthetic biology. The model's significant scale and claimed performance position it as a powerful tool in the rapidly evolving field of AI-driven biotechnology.
A key innovation behind Odyssey's efficiency is its "Consensus" mechanism, which replaces traditional self-attention. This approach lets nearby regions of a protein reach agreement before information propagates further, mirroring how structural changes ripple through a protein. The model is also trained with a discrete diffusion objective, which Anthrogen claims is more robust and data-efficient than the standard masked language modeling (MLM) objective.
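To make the contrast between the two training objectives concrete, here is a minimal Python sketch of how each one corrupts a training sequence, assuming a generic absorbing-state (masking) formulation of discrete diffusion. It is not Anthrogen's code: the mask symbol, the 15% MLM rate, the uniform noise schedule, and the example sequence are all illustrative assumptions.

```python
import random

MASK = "#"  # hypothetical mask symbol, chosen because it is not an amino-acid letter


def mlm_corrupt(seq, rng, rate=0.15):
    """MLM-style corruption: mask every position with one fixed probability.

    The 15% default is a common convention, assumed here for illustration.
    """
    return "".join(MASK if rng.random() < rate else aa for aa in seq)


def diffusion_corrupt(seq, rng):
    """Masking-style discrete diffusion corruption.

    A noise level t is drawn uniformly from [0, 1) for each example, then every
    position is masked independently with probability t. The uniform schedule
    is an assumption, not a detail taken from the announcement.
    """
    t = rng.random()
    corrupted = "".join(MASK if rng.random() < t else aa for aa in seq)
    return t, corrupted


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up example sequence
    print("MLM      :", mlm_corrupt(seq, rng))
    for _ in range(3):
        t, corrupted = diffusion_corrupt(seq, rng)
        print(f"t = {t:.2f} :", corrupted)
```

In both cases the model would be trained to recover the original residues at the masked positions. The difference in this sketch is that the diffusion-style corruption exposes the model to everything from lightly masked to almost fully masked sequences, whereas the MLM-style corruption always operates at roughly the same corruption level.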
Singhal says Anthrogen achieved this milestone with "an order of magnitude less funding than our next largest competitor." This lean development strategy challenges the notion that massive financial investment is always required for breakthroughs in large-scale AI models. The protein language model landscape includes other significant players, such as xTrimoPGLM, which also scaled to 100 billion parameters, and Meta's ESM series, underscoring a competitive environment focused on model scale and efficiency.
The potential impact of Odyssey lies in its ability to accelerate the design and understanding of proteins, which are fundamental to biological processes. By enabling the precise generation and modification of proteins, the model could significantly shorten development cycles for new therapeutics and industrial enzymes. Anthrogen's achievement with a small team and innovative architecture marks a notable development in the application of large language models to biological challenges.