ServiceNow Research

Phylogenetic Manifold Regularization: a semi-supervised approach to predict transcription factor binding sites


The computational prediction of transcription factor binding sites remains a challenging problem in bioinformatics, despite significant methodological developments from the field of machine learning. Such computational models are essential to help interpret the non-coding portion of human genomes, and to learn more about the regulatory mechanisms controlling gene expression. In parallel, massive genome sequencing efforts have produced assembled genomes for hundreds of vertebrate species, but these data remain underused. We present PhyloReg, a new semi-supervised learning approach that can be used for a wide variety of sequence-to-function prediction problems, and that takes advantage of hundreds of millions of years of evolution to regularize predictors and improve accuracy. We demonstrate that PhyloReg can be used to better train a previously proposed deep learning model of transcription factor binding. Simulation studies further help delineate the benefits of the approach. Gains in prediction accuracy are obtained over a broad set of transcription factors and cell types.

International Conference on Bioinformatics and Biomedicine (BIBM)
Alexandre Drouin
Research Lead

Research Lead at Human Decision Support, located in Montreal, QC, Canada.