Arc Institute, in partnership with NVIDIA and top academic institutions, has introduced Evo 2, a revolutionary AI model for genome design that deciphers and writes the genetic code across all domains of life. Trained on over 9.3 trillion nucleotides from more than 128,000 genomes using the innovative StripedHyena 2 architecture, Evo 2 offers unprecedented predictive and generative capabilities in genomics, promising to transform drug discovery, agriculture, and biotechnology.

Article – Key Points
Introduction of Evo 2:
Announced on February 19, 2025, Evo 2 is an AI-driven genome foundation model that generalizes across DNA, RNA, and proteins in organisms ranging from bacteria to plants, animals, and humans. Unlike its predecessor Evo 1, which was trained only on single-cell prokaryotic genomes, Evo 2 extends its training data to complex eukaryotic sequences and metagenomic data, broadening its scope in biological research.
Data-Driven Training:
Evo 2 was trained on a dataset of over 9.3 trillion nucleotides from more than 128,000 genomes spanning bacteria, archaea, and eukaryotes. With 40 billion parameters and a context window of up to 1 million base pairs, Evo 2 can capture long-range genomic interactions and genetic patterns that were previously out of reach for shorter-context models.
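To put these figures in proportion, the short Python sketch below derives a few ratios from the numbers quoted above. The constants are the article's lower bounds ("over", "more than"), so the results are rough estimates rather than official statistics, and the function name `scale_summary` is purely illustrative.

```python
# Back-of-envelope ratios from the figures quoted in the article.
# All constants are lower bounds as reported; results are estimates only.
TRAINING_NUCLEOTIDES = 9_300_000_000_000  # "over 9.3 trillion nucleotides"
GENOMES = 128_000                         # "more than 128,000 genomes"
PARAMETERS = 40_000_000_000               # "40 billion model parameters"
CONTEXT_WINDOW = 1_000_000                # "up to 1 million base pairs"


def scale_summary():
    """Return simple ratios that relate dataset, model, and context size."""
    return {
        # Average training nucleotides contributed per genome.
        "avg_nucleotides_per_genome": TRAINING_NUCLEOTIDES / GENOMES,
        # Training nucleotides available per model parameter.
        "nucleotides_per_parameter": TRAINING_NUCLEOTIDES / PARAMETERS,
        # How many maximum-length context windows the corpus would fill.
        "full_context_windows": TRAINING_NUCLEOTIDES / CONTEXT_WINDOW,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, the corpus averages roughly 72.7 million nucleotides per genome and about 232.5 nucleotides of training data per model parameter.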
Technical Capabilities:
Built using NVIDIA’s DGX Cloud platform and powered by the novel StripedHyena 2 architecture, Evo 2 combines convolution and attention mechanisms in a hybrid design that trains nearly three times faster than optimized transformer models. The model has demonstrated over 90% accuracy in predicting the impact of mutations in the BRCA1 gene and outperforms existing methods in evaluating noncoding mutations, a notable advance for genomic prediction.
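The article does not describe how Evo 2 scores mutations. One common approach with sequence models is to compare the model's log-likelihood of a variant sequence against that of the reference sequence. The sketch below illustrates that idea only: a tiny first-order Markov model is a hypothetical stand-in for a genomic language model, Evo 2 itself is not used, and the names `train_markov`, `log_likelihood`, and `variant_score` are invented for this example. It also assumes sequences contain only the bases A, C, G, and T.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: inputs contain only these four bases


def train_markov(sequences, pseudocount=1.0):
    """Fit P(next base | previous base) with additive smoothing (toy stand-in model)."""
    counts = {base: Counter() for base in ALPHABET}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev in ALPHABET:
        total = sum(counts[prev].values()) + pseudocount * len(ALPHABET)
        model[prev] = {
            nxt: (counts[prev][nxt] + pseudocount) / total for nxt in ALPHABET
        }
    return model


def log_likelihood(model, seq):
    """Sum of log transition probabilities along the sequence."""
    return sum(math.log(model[prev][nxt]) for prev, nxt in zip(seq, seq[1:]))


def variant_score(model, reference, position, alt_base):
    """Log-likelihood of the mutant minus that of the reference.

    Negative values mean the model finds the variant less plausible
    than the reference sequence.
    """
    mutant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(model, mutant) - log_likelihood(model, reference)


if __name__ == "__main__":
    toy_model = train_markov(["ACGTACGTACGT"])
    # Substitute the G at index 2 of the reference with a T.
    print(variant_score(toy_model, "ACGTACGT", 2, "T"))
```

In this toy setup the substitution breaks the repeating pattern the model was fitted on, so the score comes out negative. A real genomic model would replace the Markov table with its own sequence likelihoods, and a threshold or downstream classifier would be needed to turn scores into predictions.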
Research and Collaborative Development:
Evo 2 is the result of a multidisciplinary collaboration among Arc Institute, NVIDIA, and researchers from Stanford University, UC Berkeley, and UC San Francisco. The project embraces open science: model weights, training code, inference code, and the comprehensive OpenGenome2 dataset are openly available via Arc Institute’s GitHub and the NVIDIA BioNeMo framework. Tools such as Evo Designer and a mechanistic interpretability visualizer give researchers accessible interfaces for exploring the model and building on it.
Innovative Applications:
Evo 2’s versatile capabilities extend to diverse fields such as drug discovery, agriculture, industrial biotechnology, and materials science. It not only predicts the functional effects of genetic mutations but also enables the design of synthetic genomes, which supports the construction of virtual cells and advances generative functional genomics. For example, Evo 2 can aid in designing gene therapies by creating genetic elements with cell-type-specific activity, thereby minimizing side effects in targeted treatments.
Evaluation and Future Potential:
Comparative assessments indicate that Evo 2 significantly outperforms earlier models, particularly in predicting noncoding mutation effects. Its ability to handle long genomic sequences enables multi-scale analyses, from short regulatory elements to entire chromosomes, opening new avenues for therapeutic research and synthetic biology. The preprint available on Arc Institute’s website details these evaluations and underscores Evo 2’s potential as a foundational tool for future research.
Industry Impact and Accessibility:
Described as an “operating system” for biology, Evo 2 broadens access to advanced genome design technology. It is available globally on the NVIDIA BioNeMo platform, including as an NVIDIA NIM microservice for deployment, which puts the model within reach of researchers worldwide. The project is supported by the research environment at Arc Institute, established in 2021 with $650 million in funding, which provides state-of-the-art lab space and long-term funding for its core investigators. NVIDIA further accelerated the project by providing access to 2,000 H100 GPUs via NVIDIA DGX Cloud on AWS, enabling rapid scaling and optimization of the model.