The expression of genes in our genome to produce proteins and non-coding RNAs, the building blocks of life, is essential to human biology. The ability to predict how much of a gene is expressed from that gene’s regulatory DNA, or promoter sequence, would therefore help us understand gene expression, regulation, and evolution. It would also help us design new synthetic genes for bioengineering applications such as better cell therapies, gene therapies, and other genomic medicines.
However, the process by which gene transcription is regulated is incredibly complex, and predicting transcriptional regulation has been an open problem in the field for over half a century. In his work, Eeshit used neural networks to predict levels of gene expression from promoter sequences. He then reverse engineered the model to design specific sequences that elicit desired expression levels. Eeshit’s work developing a sequence-to-expression oracle also provided a framework to model and test theories of gene evolution.