Invited Speaker: Lu John Zhi
I will present incRNA, an integrative machine-learning method for whole-genome identification of non-coding RNAs (ncRNAs). It combines a large amount of expression data, epigenetic data, RNA secondary-structure stability, and evolutionary conservation at the protein and nucleic-acid levels. Using this model, we were able to separate known ncRNAs from coding sequences and other genomic elements with high accuracy (AUC > 93% on an independent validation set) and to identify thousands of novel ncRNA candidates in C. elegans and Arabidopsis.
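As a rough illustration of the kind of integrative classification described above, the following Python sketch combines several per-region feature scores in a simple logistic regression and evaluates it by AUC on a held-out set. It is not the incRNA implementation: the feature names, the synthetic data, and the choice of logistic regression are all assumptions made purely for illustration.

```python
import math
import random

# Hypothetical feature names, loosely mirroring the data types named above.
FEATURES = ["expression", "structure_stability",
            "nucleotide_conservation", "protein_conservation"]
# Assumed direction of each feature for ncRNAs (synthetic, not measured):
# lower protein-level conservation, higher values for the other three.
SIGNS = [1.0, 1.0, 1.0, -1.0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_dataset(rng, n):
    """Draw n synthetic regions, alternating ncRNA (1) and other (0)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 0.5 if label == 1 else -0.5
        X.append([rng.gauss(shift * s, 1.0) for s in SIGNS])
        y.append(label)
    return X, y


def predict(w, b, x):
    """Probability-like score that region x is an ncRNA."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, lr=0.05, epochs=50):
    """Fit logistic-regression weights by plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
X_train, y_train = make_dataset(rng, 400)
X_val, y_val = make_dataset(rng, 400)   # independent validation set
w, b = train_logistic(X_train, y_train)
validation_auc = auc([predict(w, b, x) for x in X_val], y_val)
print(f"validation AUC: {validation_auc:.3f}")
```

The point of the sketch is only that individually weak lines of evidence can be combined into a single score, and that the AUC is computed on data not used for training.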
In addition, we characterized the novel ncRNA candidates and found that they have distinct expression patterns across developmental stages, tend to fall into novel RNA structural families, and are targeted by specific transcription factors. Overall, our study identifies many new potential ncRNAs in both of these model organisms.