the bioinformatics chat

#32 Deep tensor factorization and a pitfall for machine learning methods with Jacob Schreiber


Listen Later

In this episode, we hear from Jacob Schreiber about his algorithm,

Avocado.

Avocado uses deep tensor factorization to break a three-dimensional tensor of

epigenomic data into three orthogonal dimensions corresponding to cell types,
assay types, and genomic loci. Avocado can extract a low-dimensional,
information-rich latent representation from the wealth of experimental data
from projects like the Roadmap Epigenomics Consortium and ENCODE. This
representation allows you to impute genome-wide epigenomics experiments that
have not yet been performed.

Jacob also talks about a pitfall he discovered when trying to predict gene

expression from a mix of genomic and epigenomic data. As you increase the
complexity of a machine learning model, its performance may be increasing for
the wrong reason: instead of learning something biologically interesting, your
model may simply be memorizing the average gene expression for that gene
across your training cell types using the nucleotide sequence.

Links:

  • Avocado on GitHub
  • Multi-scale deep tensor factorization learns a latent representation of the human epigenome (Jacob Schreiber, Timothy Durham, Jeffrey Bilmes, William Stafford Noble)
  • Completing the ENCODE3 compendium yields accurate imputations across a variety of assays and human biosamples (Jacob Schreiber, Jeffrey Bilmes, William Noble)
  • A pitfall for machine learning methods aiming to predict across cell types (Jacob Schreiber, Ritambhara Singh, Jeffrey Bilmes, William Stafford Noble)
  • If you enjoyed this episode, please consider supporting the podcast on Patreon.

    ...more
    View all episodesView all episodes
    Download on the App Store

    the bioinformatics chatBy Roman Cheplyaka

    • 4.7
    • 4.7
    • 4.7
    • 4.7
    • 4.7

    4.7

    35 ratings