
Summary of Yu's Research Achievements

October 26, 2022
Yu’s primary research interest was computational biology, with contributions spanning five major areas: (1) genome rearrangements, for which Yu developed models, theory, and algorithms to estimate evolutionary distances and to solve various edit distance problems abstracted from whole-genome comparison; (2) phylogeny reconstruction, for which Yu designed algorithms and tools to infer phylogenetic trees from genomic, transcriptomic, and epigenomic data, with applications ranging from ancestral genome reconstruction and cell differentiation to tumor phylogeny; (3) genome/haplotype assembly, for which Yu developed key theory and designed novel data structures and efficient algorithms that ultimately enabled accurate assembly of genomes/haplotypes from error-prone long-read data; (4) metagenomics, for which Yu developed a suite of tools for binning contigs/reads and assembling microbial genomes/haplotypes, based on novel assembly-graph-based algorithms and machine learning approaches that model both coverage and connectivity information; and (5) computational mass spectrometry, for which Yu developed new probabilistic event models and deep neural networks for identifying peptides from mass spectrometry data and for predicting spectral libraries in quantitative proteomic analysis. Yu also introduced bootstrapping and jackknifing to the statistical assessment of bacterial identification by MALDI-TOF MS, achieving more accurate bacterial typing and bacterial mixture analysis. In addition to computational biology, Yu worked on theoretical computer science, data mining, and machine learning. For example, Yu proved hardness results and designed randomized approximation algorithms for two variants of the biclustering problem; Yu also collaborated on developing algorithmic solutions for shortest-path-related problems on graphs with billions of vertices.

Yu’s work was algorithmically innovative, led to paradigm shifts in methodology, and has been widely used in biological, biomedical, and biochemical research. To name a few examples, Yu generalized the concept of the de Bruijn graph (a fundamental data structure in sequence analysis, particularly in genome assembly) by introducing manifold de Bruijn graphs, which overcame the long-standing dilemma of constructing a de Bruijn graph with variable (rather than fixed) k-mer sizes. This theoretical breakthrough led to the ABruijn and Flye algorithms, which transformed the area of long-read genome assembly and were used in over 1000 laboratories. Yu proposed a new matching-based distance metric for (phylogenetic) trees and gave a fast polynomial-time algorithm to calculate this distance. The new metric was much more robust than the existing, commonly used Robinson-Foulds distance, and became the new standard in phylogenetics. The series of tools and software Yu developed for metagenomic binning, including MetaCoAG, MetaBCC-LR, GraphBin, GraphBin2, RepBin, and ConDiGA, have been downloaded thousands of times and are widely used by the community for analyzing large-scale metagenomic data and annotating microbial and viral taxonomy.

The full list of Yu’s papers with full text can be found here. Please find more information about Yu’s research via his group website (https://cgg-anu.github.io) and his personal webpage (http://users.cecs.anu.edu.au/~u1024708).