Research interest
Molecular biology built on model organisms by its nature reminds us that we are all descendants of a common ancestor, as described by Darwin. Thus, we can understand human biology better by studying model species. This practice often leads us to believe that biology is all about conservation. However, Darwin’s original emphasis was on providing a theory (i.e., natural selection) of how species diverge from one another. Numerous comparative analyses in the genomic era have demonstrated that species can evolve by various mechanisms, such as protein substitutions, regulatory changes and gene gain. The scientific community has invested extensive effort in protein evolution and regulatory changes. By contrast, gene gain, or gene origination, has received less attention, although H. J. Muller speculated on its importance nearly a century ago.
In 1993, Manyuan Long characterized the first new gene (Jingwei) in Drosophila. In the subsequent decade, the field was biased toward new genes generated by retroposition, which represent only a small portion of all retrogenes. I developed a novel pipeline to analyze syntenic genomic alignments and assigned evolutionary ages to the majority (>90%) of annotated genes in both Drosophila and mammals. This unparalleled dataset of new genes enabled a large-scale functional study, which led to the unexpected discovery that development frequently recruits new genes.
In the short term, at least two types of questions await us, neither of which has been fully addressed before. First, we found that an excess of new genes is transcribed in the developing brain in the human lineage compared to mouse. Does this pattern still hold if we compare human with other primates? What about other fast-evolving brains, such as those of songbirds? Second, between-species sequence analyses often suggest that adaptive selection drives the evolution of new genes. However, we still do not know how new genes initially became fixed in the population. The rapidly accumulating human resequencing data give us an opportunity to tackle this question. The answer may significantly help us understand how different human ethnic groups have differentiated from one another.
In the long term, we hope to trace how a new gene emerges mechanistically, how it becomes fixed in the population, what functions it typically performs and why, and how and why some genes are lost from the genome. In brief, we would like to have a complete picture of the life history of genes. In this research direction, we aim not only to build up the concepts governing gene gain and loss, but also to understand how gene gain and loss contribute to differences at the species or population level.
The development of next-generation and subsequent generations of sequencing technology is producing all kinds of omics data, including genomes, transcriptomes and regulatomes, both between and within species (especially for human). Other high-throughput functional genomic data, such as proteomic data, are also rapidly accumulating. We have never before had such a rich and diverse dataset for computational genomic analysis. From this perspective, as computational biologists, we truly live in a golden age.