Understanding what genes do is a central challenge in biology. While genome sequencing has surged over the last decades, a large fraction of protein-coding genes, especially in non-model organisms, remain functionally uncharacterized. These "dark" genes form part of what's often referred to as the functional dark proteome. Traditional methods, which rely on finding similar genes in better-studied species, often come up short when genes have diverged or lack known relatives.
In this study, the HFSP Research Grant team developed FANTASIA, an AI-based tool that uses protein language models to predict gene functions without depending solely on sequence similarity. By embedding protein sequences in a language-like space, FANTASIA captures deep contextual relationships and can infer functions even when no close homologs exist. The tool was applied to nearly 1,000 animal genomes and annotated virtually all of their proteins, predicting functions for nearly all of the genes that standard approaches had left unannotated.
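The idea of inferring function from position in an embedding space can be illustrated with a minimal sketch. It is not FANTASIA's actual implementation: the three-dimensional vectors, the reference entries, the function names and the 0.8 similarity threshold are all hypothetical, chosen only to show how a function label might be transferred from the nearest annotated protein. Real protein language model embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def transfer_annotation(query, reference, min_similarity=0.8):
    """Return (protein_id, labels, similarity) for the closest reference
    protein, or None if nothing is similar enough.

    The 0.8 threshold is an illustrative assumption, not a published value.
    """
    best = None
    for protein_id, (embedding, labels) in reference.items():
        similarity = cosine(query, embedding)
        if best is None or similarity > best[2]:
            best = (protein_id, labels, similarity)
    if best is not None and best[2] >= min_similarity:
        return best
    return None


# Hypothetical annotated reference proteins with toy embeddings.
REFERENCE = {
    "ref_kinase": ([0.9, 0.1, 0.0], ["protein kinase activity"]),
    "ref_transporter": ([0.0, 0.2, 0.9], ["transmembrane transport"]),
}

# An unannotated protein whose toy embedding lies close to the kinase.
hit = transfer_annotation([0.8, 0.2, 0.1], REFERENCE)
if hit is not None:
    protein_id, labels, similarity = hit
    print(f"closest: {protein_id}, labels: {labels}, similarity: {similarity:.2f}")
```

In this sketch a query that is far from every reference protein returns None rather than a forced label, which reflects the general design choice of preferring no annotation over an unreliable one.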

This breakthrough opens new doors for exploring the diversity of life, particularly in non-model taxa like invertebrates, where functional annotation has long lagged behind. FANTASIA offers a scalable, generalizable approach that could accelerate research in evolutionary biology, developmental genetics, and even conservation.
HFSP support played a key role in this interdisciplinary collaboration, which bridges the team's expertise in computational biology, genomics, and evolutionary science. As a next step, the Research Grant team plans to expand FANTASIA to include new models and other lineages, such as plants or protists, and to integrate it into pipelines for discovering novel therapeutic or biotechnological applications.