Our lab develops general machine learning models and algorithms for the integrative analysis of large-scale genomic data, with the goal of understanding the molecular characteristics of biological functions and phenotypes. We design mathematically principled methods, including graph-based semi-supervised learning, transfer learning, string kernels and other kernel methods, sequence alignment methods, and various statistical models, for a unified analysis of heterogeneous biological data. Our current projects center on the following topics:
- Cancer genomics: Development of graph-based learning algorithms, sequence alignment algorithms and association rule mining algorithms for building predictive models and mining biomarkers of cancer phenotypes from transcriptome data (microarray or sequencing), DNA copy number variations, SNPs and protein-protein interactions.
- Phenome-genome association analysis: Development of graph-based learning algorithms for analyzing disease and gene associations in a network context.
- Protein remote homology detection: Development of string kernel algorithms and label propagation algorithms to detect remote protein homologs and study their structures and functions.
- Semi-supervised and transfer learning algorithms: Development of general and scalable graph-based learning, transfer learning, sparse group learning and kernel learning methods.