Palo Alto, California – The Arc Institute has released "State," its first virtual cell model, designed to predict how various cell types, including stem, cancer, and immune cells, respond to drugs, cytokines, or genetic perturbations. Co-founder Patrick Hsu announced the release on social media, stating, "Today, Arc Institute releases State, our first perturbation prediction AI model and an important step towards our goal of a virtual cell." The model is intended to accelerate drug discovery by simulating cellular responses more accurately than existing computational methods.
"State" is trained on an extensive dataset that combines observational data from nearly 170 million cells with perturbational data from over 100 million cells across 70 cell lines. According to the institute, the model significantly outperforms existing computational approaches: in benchmarks, it was 50% better at distinguishing the effects of different perturbations and twice as accurate at identifying truly differentially expressed genes.
"State" consists of two interconnected modules: the State Embedding (SE) model and the State Transition (ST) model. The SE model represents each cell's transcriptome as a point in a smooth, multidimensional vector space, while the ST model, built on a bidirectional transformer architecture, predicts how cells move between states in response to specific perturbations. This design lets the model flexibly capture both biological and technical heterogeneity without relying on explicit assumptions about how gene expression is distributed.
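To make the two-module design more concrete, here is a minimal Python sketch built on stated assumptions. It is not Arc Institute's code or API: the names `embed_cell`, `self_attention`, and `predict_transition`, the toy dimensions, and the random, untrained weights are all hypothetical. The SE step is stood in for by a simple log-normalize, project, and normalize operation, and "bidirectional" is assumed to mean attention with no causal mask, so every cell in a set can attend to every other cell.

```python
import math
import random

# Hypothetical toy sizes; not taken from the State release.
N_GENES = 6   # genes per toy expression profile
EMB_DIM = 4   # dimension of the toy embedding space

random.seed(0)  # deterministic stand-in weights


def rand_matrix(rows, cols):
    """Random, untrained weights standing in for learned parameters."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


W_EMBED = rand_matrix(EMB_DIM, N_GENES)


def embed_cell(counts):
    """Hypothetical SE-like step: log-normalize counts, project, L2-normalize."""
    total = sum(counts) or 1.0
    logged = [math.log1p(1e4 * c / total) for c in counts]
    z = matvec(W_EMBED, logged)
    norm = math.sqrt(sum(x * x for x in z)) or 1.0
    return [x / norm for x in z]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(tokens):
    """Single-head attention with NO causal mask: each cell attends to all cells.

    Queries, keys, and values are the tokens themselves (no learned projections).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return out


def predict_transition(cell_embeddings, perturbation_vec):
    """Hypothetical ST-like step: condition each cell on the perturbation,
    mix information across the whole set with attention, add a residual."""
    tokens = [[c + p for c, p in zip(cell, perturbation_vec)] for cell in cell_embeddings]
    mixed = self_attention(tokens)
    return [[c + m for c, m in zip(cell, mix)] for cell, mix in zip(cell_embeddings, mixed)]


# Usage: embed three toy control cells, then predict their perturbed states.
control_counts = [
    [5, 0, 3, 1, 0, 2],
    [4, 1, 2, 0, 1, 3],
    [6, 0, 4, 2, 0, 1],
]
control = [embed_cell(c) for c in control_counts]
perturbation = [0.3, -0.2, 0.1, 0.0]  # hypothetical perturbation embedding
predicted = predict_transition(control, perturbation)
print(len(predicted), len(predicted[0]))  # 3 4
```

Because the toy attention has no positional encoding and no mask, reordering the input cells simply reorders the predicted states. A trained model would presumably add learned projections, multiple layers, and a training objective on top of this skeleton.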
The Arc Institute emphasizes that traditional drug discovery has a high failure rate, with approximately 90% of drugs failing clinical trials because of poor efficacy or unintended side effects. A highly predictive virtual cell model like "State" could help researchers identify promising drug candidates more efficiently by running millions of perturbations in silico, potentially flagging off-target effects earlier and improving clinical success rates. The model is currently available for non-commercial use.
This release follows the Arc Institute's earlier Evo 2, an AI model for understanding genetic code that was trained on 9.3 trillion nucleotides from 100,000 species. "State" extends the institute's commitment to using AI to deepen the understanding of disease mechanisms and to speed the translation of scientific discoveries into therapies. The institute has also launched the "Virtual Cell Challenge" to foster further innovation in this field.