Phylogenetic Manifold Regularization: a semi-supervised approach to predict transcription factor binding sites

Faizy Ahsan, Alexandre Drouin, François Laviolette, Doina Precup, Mathieu Blanchette

April 2020

Abstract

The computational prediction of transcription factor binding sites remains a challenging problem in bioinformatics, despite significant methodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundreds of vertebrate species, but this data is underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the approach. Gains in prediction accuracy are obtained over a broad set of transcription factors and cell types.

Type

Conference paper

Publication

International Conference on Bioinformatics and Biomedicine (BIBM)

Alexandre Drouin

Head of AI Frontier Research

Head of AI Frontier Research at AI Frontier Research, located in Montreal, QC, Canada.
