Applied Mathematics Colloquium: Supervised Learning for Analysing Large-Scale Genome-Wide DNA Polymorphism Data



Supervised learning for analysing large-scale genome-wide DNA polymorphism data


Supervised learning has been applied extensively in many fields; AlphaGo and autopilot systems are perhaps two of the best-known cases. However, its application in population and evolutionary genetics is still in its infancy. Recently, we introduced boosting, a supervised learning approach, to identify positive Darwinian selection in natural populations and to estimate recombination rates along the human genome. We further analysed genome-wide DNA polymorphism data from nearly 10,000 human individuals (UK10K) and obtained a fine-scale genetic map for humans. These results indicate that supervised learning approaches, together with deep learning and reinforcement learning, could play essential roles in analysing large-scale genome-wide DNA polymorphism data.


Professor Haipeng Li
Laboratory of Evolutionary Genomics
CAS-MPG Partner Institute for Computational Biology
Chinese Academy of Sciences

Professor Li is a world-leading researcher, currently working in the Laboratory of Evolutionary Genomics at the CAS-MPG (Chinese Academy of Sciences-Max-Planck-Gesellschaft) Partner Institute for Computational Biology of the Chinese Academy of Sciences in Shanghai. His expertise lies in bioinformatics, phylogenetics, population genetics, computational genomics, large-scale data analysis, human and Drosophila genetics and evolution, the evolution of regulatory elements, molecular evolution, and genetic association studies of complex diseases. He has held, and continues to hold, multiple external grants (1-2 million RMB per year) from sources including NSFC, a Strategic Priority Research Program of the Chinese Academy of Sciences, and a 973 project.