Decoding Evolution: Carnegie Mellon University’s Machine Learning Method Sheds Light on Fundamental Aspects of Genes and Traits

TL;DR:

  • Carnegie Mellon University researchers have developed innovative methods to identify crucial genome segments for understanding species evolution.
  • Their work, published in Science, contributes to the Zoonomia Project, which aims to sequence the genomes of 240 mammals.
  • Coding DNA instructs protein production, driving evolution, but accounts for only 1% of the human genome.
  • Noncoding DNA regions called enhancers control gene activity and were studied using the Tissue-Aware Conservation Inference Toolkit (TACIT).
  • TACIT accurately predicts enhancer activity, aiding in the identification of important regions in newly sequenced genomes.
  • TACIT has potential applications in conservation biology for predicting enhancer function in endangered species.
  • Researchers found links between genomic regions associated with larger brains in mammals and genes implicated in human brain-size disorders.
  • An enhancer linked to social behavior was identified in a specific type of neuron.
  • The study highlights the vast potential of TACIT in uncovering new insights into mammalian evolution.
  • Lead authors include Irene Kaplow, Alyssa Lawler, and Daniel Schaffer; Schaffer is a recent graduate of the undergraduate program in Carnegie Mellon University’s Computational Biology Department.

Main AI News:

Scientists in Carnegie Mellon University’s Computational Biology Department (CBD) have developed methods for identifying the segments of the genome that are key to understanding how traits evolved across species.

Published in the journal Science, the work was led by Assistant Professor Andreas Pfenning of the School of Computer Science and contributes to the Zoonomia Project. That project seeks to sequence the complete genomes of 240 mammals, with the overarching objective of unraveling fundamental aspects of genes and traits.

The knowledge gained from this project not only holds immense implications for safeguarding human health but also plays a pivotal role in preserving biodiversity. Analyzing and deciphering vast volumes of data necessitates cutting-edge artificial intelligence (AI) and machine learning (ML) technologies.

Coding DNA is the portion of the genome that instructs the production of proteins, the indispensable regulators of cellular function. Over time, slight variations emerge in these protein-coding instructions, and those variations are a prime driving force behind evolution.

Yet protein-coding segments make up only about one percent of the roughly three billion nucleotide pairs in the human genome. Among the noncoding remainder are regions known as enhancers, which govern when and where specific genes are active.

To gain deeper insights into the inner workings of these regions, the team at CMU developed an innovative machine-learning approach called the Tissue-Aware Conservation Inference Toolkit (TACIT). While traditional evolutionary models often attribute changes in brain size to mutations in a cluster of genes, enhancers can simply activate or deactivate genes to achieve the same outcome.

A predominant focus of mammalian evolution research lies in the examination of genomic regions that have undergone minimal changes over millions of years. These conserved regions, particularly genes, serve as valuable indicators of fundamental elements within mammalian DNA, shedding light on distinctive traits specific to individual species.

Enhancers, however, pose a challenge for this approach. Over time, an enhancer’s DNA sequence may change while its function is retained. For instance, a well-studied enhancer known as the Islet enhancer exhibits similar gene regulation patterns across a diverse range of species, including humans, mice, zebrafish, and even sponges, despite more than 700 million years of evolutionary divergence. Because function can persist while the underlying sequence drifts, conventional methods that compare individual nucleotides struggle to identify and track enhancers.
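The difficulty with nucleotide-by-nucleotide comparison can be illustrated with a toy sketch. This is not TACIT and uses no real genomic data: the two sequences, the motif, and both helper functions are hypothetical, chosen only to show how a position-wise identity score can be low even when the same short regulatory motif is present in both sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of positions with the same nucleotide (equal-length sequences only)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(seq[i:i + k] == motif for i in range(len(seq) - k + 1))


# Hypothetical motif and sequences, invented for illustration only.
MOTIF = "TGACTCA"
species_a = "AAAA" + MOTIF + "CCCCCCCCC"   # motif near the start
species_b = "GGGGGGGGG" + MOTIF + "TTTT"   # same motif, shifted, different flanks

identity = percent_identity(species_a, species_b)
print(f"position-wise identity: {identity:.2f}")
print("motif copies in A:", count_motif(species_a, MOTIF))
print("motif copies in B:", count_motif(species_b, MOTIF))
```

In this toy case the position-wise score is low although both sequences carry one copy of the motif, which is the kind of situation where a model that scores the whole sequence, rather than individual aligned nucleotides, can be more informative.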

To overcome this challenge, the researchers designed TACIT to predict the activity of enhancers in specific cell types or tissues. These predictions allow scientists to identify crucial enhancer regions within newly sequenced genomes without the need for additional laboratory experiments.

This breakthrough technology holds significant potential applications in the field of conservation biology. TACIT enables predictions regarding the functionality of enhancers in endangered or threatened species, where conducting controlled laboratory experiments is often impractical or impossible.

Irene Kaplow, a postdoctoral associate and Lane Fellow in the Computational Biology Department (CBD) and a lead author of the research paper, described TACIT’s implications: “TACIT provides an unprecedented opportunity to predict the function of parts of the genome outside of genes in species for which we cannot get primary tissue samples, such as the bottlenose dolphin and the critically endangered black rhinoceros. As machine learning methods and techniques for identifying enhancers from specific cell types continue to advance, I anticipate that we will be able to broaden the scope of TACIT, enabling novel insights into mammalian evolution.”

Building on their predictions of genomic function across the 240 mammalian species, the research team used TACIT to identify genomic regions that have evolved in correlation with larger brain sizes in mammals. These regions were found to lie near genes whose mutations have been linked to brain-size disorders in humans. The team also identified an enhancer associated with social behavior across mammalian species; it is active in a particular subtype of neuron, the parvalbumin-positive inhibitory interneuron.

“This is just the beginning of a vast landscape of discovery,” remarked senior author Andreas Pfenning, emphasizing the potential of TACIT. “While we have already uncovered fascinating relationships by applying TACIT to a select number of tissues and traits, there is an abundance of knowledge yet to be uncovered.”

In addition to Kaplow, the paper’s lead authors include Alyssa Lawler, a former Ph.D. student in biological sciences who is now affiliated with the Broad Institute, and Daniel Schaffer, a recent graduate of the CBD’s undergraduate program. Schaffer’s co-first authorship on this publication reflects the undergraduate program’s focus on state-of-the-art computational techniques and hands-on scientific research.

Conclusion:

The research from Carnegie Mellon University’s Computational Biology Department, which uses machine learning to identify crucial genome segments, has significant implications for the market. The ability to predict enhancer activity and uncover important regions within the genome opens up new possibilities in fields such as conservation biology and human health protection.

This research has the potential to drive advancements in pharmaceuticals, personalized medicine, and genetic therapies. Furthermore, the insights gained from understanding the evolutionary traits of species and their genetic makeup can inform various industries, such as agriculture, biotechnology, and environmental conservation.

The Tissue-Aware Conservation Inference Toolkit (TACIT) represents a cutting-edge solution that can revolutionize our understanding of genomics and shape future market opportunities. As this field continues to evolve, businesses and organizations should closely monitor and leverage these developments to stay at the forefront of innovation and capitalize on the potential economic impact.
