TetraRNADB

A database of the landscape of human tetra-classification RNAs


RNAs are usually divided into protein-coding RNAs and non-coding RNAs according to their protein-coding potential, and are believed to function either through the encoded protein or as RNA itself. However, a growing body of research suggests that RNA function is not necessarily unitary but can be dual. RNAs with both coding and non-coding functions are called cncRNAs or bi-functional RNAs [1-5].

Here we extract inherent nucleotide sequence features, inherent peptide features, structural features and expert features from known human mRNAs, lncRNAs and two kinds of bi-functional RNAs to build a tetra-classification model with the random forest algorithm (the TetraRNA model), and use it to obtain the landscape of human tetra-classification RNAs.
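The individual features are not listed on this page. As a rough illustration of what sequence-derived features of this kind might look like, the sketch below computes transcript length, GC content and the longest forward-strand open reading frame. The function names and the choice of these three features are assumptions made for illustration, not the TetraRNA feature set.

```python
# Illustrative sketch only: these three features are assumed examples,
# not the features actually used by the TetraRNA model.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Return the length (nt, stop codon included) of the longest
    ATG-to-stop open reading frame on the forward strand."""
    seq = seq.upper().replace("U", "T")  # accept RNA or cDNA input
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def sequence_features(seq: str) -> dict:
    """Collect the hypothetical features into one record per transcript."""
    return {
        "length": len(seq),
        "gc_content": gc_content(seq),
        "longest_orf": longest_orf_length(seq),
    }
```

A record such as `sequence_features("CCAUGGCCUAAGG")` could then serve as one row of a feature table passed to a random forest implementation.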

pipeline

According to our TetraRNA model, human RNAs are divided into four categories: protein-coding RNAs (mRNAs), long non-coding RNAs (lncRNAs), bi-functional RNAs in which the non-coding function is primary and the coding function is secondary (translated non-coding RNAs, tr-ncRNAs), and bi-functional RNAs in which the coding function is primary and the non-coding function is secondary (untranslated mRNAs, untr-mRNAs). The final classification contains 83729 mRNAs, 33425 lncRNAs, 23235 tr-ncRNAs and 951 untr-mRNAs.

The TetraRNA database provides the tetra-classification probabilities for human RNAs based on our model. It displays the original type and the new tetra type of each human RNA, and also presents primary analyses of the four classes of human RNAs.
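To show how four class probabilities might be turned into a single tetra type, the minimal sketch below picks the class with the highest probability. The decision rule (a plain argmax), the function name and the dictionary layout are assumptions; the page does not state how the database derives the tetra type from the probabilities.

```python
# Minimal sketch, assuming the tetra type is the most probable class.
# The record layout and the function name are hypothetical.

TETRA_CLASSES = ("mRNA", "lncRNA", "tr-ncRNA", "untr-mRNA")


def assign_tetra_class(probabilities: dict) -> tuple:
    """Return (class_name, probability) for the most probable class."""
    missing = set(TETRA_CLASSES) - set(probabilities)
    if missing:
        raise ValueError(f"missing class probabilities: {sorted(missing)}")
    total = sum(probabilities[name] for name in TETRA_CLASSES)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"probabilities sum to {total}, expected 1.0")
    best = max(TETRA_CLASSES, key=lambda name: probabilities[name])
    return best, probabilities[best]


# Made-up probabilities for a single transcript, for illustration only.
example = {"mRNA": 0.15, "lncRNA": 0.30, "tr-ncRNA": 0.50, "untr-mRNA": 0.05}
label, confidence = assign_tetra_class(example)
```

With these made-up values the transcript would be labelled a tr-ncRNA. Keeping all four probabilities alongside the label lets borderline cases be inspected later.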

References


[1] Huang, Y. et al. cncRNAdb: a manually curated resource of experimentally supported RNAs with both protein-coding and noncoding function. Nucleic Acids Res 49, D65–D70 (2021).

[2] Li, J. & Liu, C. Coding or Noncoding, the Converging Concepts of RNAs. Front Genet 10, 496 (2019).

[3] Hubé, F. & Francastel, C. Coding and Non-coding RNAs, the Frontier Has Never Been So Blurred. Front Genet 9, 140 (2018).

[4] Kumari, P. & Sampath, K. cncRNAs: Bi-functional RNAs with protein coding and non-coding functions. Semin Cell Dev Biol 47–48, 40–51 (2015).

[5] Ulveling, D., Francastel, C. & Hubé, F. Identification of potentially new bifunctional RNA based on genome-wide data-mining of alternative splicing events. Biochimie 93, 2024–2027 (2011).

How to cite


TetraRNA, a Tetra-class Machine Learning Model for Deciphering the Coding Potential Derivation of RNA World.