A vast pool of lineage-specific microproteins encoded by long non-coding RNAs in plants

Small open reading frames (smORFs), sequences of fewer than 100 codons, were previously regarded as non-functional or junk DNA. More recently, studies have shown that some smORFs located on long non-coding RNAs (lncRNAs) encode functional microproteins that play important roles in different cellular processes. This study investigated the functions and evolution of smORFs present in lncRNAs in the moss Physcomitrium patens. The researchers analysed smORF conservation across plant taxa using the lncRNAs from P. patens as a reference set. They identified thousands of evolutionarily conserved smORFs; however, they observed a rapid decline in smORF conservation at the transition from mosses to other plant lineages. In contrast, annotated proteins were more conserved than smORFs across these lineages. Comparative analysis of pairwise nucleotide alignments between P. patens and Physcomitrium sp. showed that the distributions of evolutionary rates for lncRNAs and mRNAs were markedly different, with mRNAs evolving significantly more slowly. Analysis of smORFs in lncRNAs with overlapping conserved regions in P. patens and Physcomitrium sp. revealed that a large proportion of highly conserved smORFs were found in only a small number of moss species, suggesting that a fraction of the smORFs encode lineage- and/or species-specific peptides or microproteins. The authors propose that a group of smORFs is maintained by selection in a species- or lineage-specific manner. Using nanopore RNA sequencing, they showed that the transcriptional level of conserved smORFs is, on average, higher than that of non-conserved smORFs. Mass spectrometry then identified 195 translated smORFs, 82 of which were novel species-specific sequences. Finally, they investigated a smORF-encoded peptide, PSEP3, and showed that its overexpression resulted in considerable changes in the moss proteome and increased cell death.
Taken together, these results indicate that smORFs are diverse and appear to have an important role in the plant proteome. Future studies may reveal further functions of these short open reading frames.

How was PEAKS used?

For analysis of smORF translation, five peptidomic data sets were analysed. The tandem mass spectra from the peptidomic samples were searched individually with PEAKS Studio 8.0 against a custom database containing 32 926 proteins from annotated genes of the moss genome, 85 chloroplast proteins, 42 moss mitochondrial proteins and predicted smORF peptides. The search parameters were a fragment mass tolerance of 0.05 Da and a parent ion tolerance of 10 ppm. The results were filtered at a 1% false discovery rate (FDR) and with a significance threshold of 20. For quantification analysis, proteins were extracted from wild-type and mutant lines with PSEP3 knockout or overexpression and then labelled. The raw data files from LC-MS/MS were analysed with PEAKS Studio 8.0. A database search was performed against a custom database built from the Phytozome proteomic database combined with chloroplast and mitochondrial proteins, with a fragment mass tolerance of 0.05 Da, a parent ion tolerance of 10 ppm, carbamidomethylation as a fixed modification, and oxidation (M) and deamidation (NQ) as variable modifications. The results were filtered at a 1% FDR. PEAKS Q was used for iTRAQ quantification, and normalisation was performed using the average abundance across all peptides. Proteins were considered differentially expressed if their fold change was greater than 1.2 and their significance score was above the threshold of 20.
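
The thresholds quoted above can be illustrated with a short Python sketch. This is not the PEAKS implementation, and the function names are hypothetical. It assumes that the significance score is on the -10*log10(P) scale, so that a score of 20 corresponds to P = 0.01, and that fold change is treated symmetrically, so a ratio of 0.5 counts as a two-fold change. The summary does not state whether the cutoffs are inclusive; the sketch treats the significance cutoff as inclusive and the fold-change cutoff as strict.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_parent_tolerance(observed_mz: float, theoretical_mz: float,
                            tol_ppm: float = 10.0) -> bool:
    """Parent ion check against a relative (ppm) tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_fragment_tolerance(observed_mz: float, theoretical_mz: float,
                              tol_da: float = 0.05) -> bool:
    """Fragment ion check against an absolute (Da) tolerance."""
    return abs(observed_mz - theoretical_mz) <= tol_da


def score_to_pvalue(score: float) -> float:
    """Convert a -10*log10(P) score to P (assumed score scale)."""
    return 10 ** (-score / 10)


def is_differential(ratio: float, significance: float,
                    min_fold: float = 1.2,
                    min_significance: float = 20.0) -> bool:
    """Apply the fold-change and significance cutoffs to one protein.

    ratio is the abundance ratio between two conditions (must be > 0).
    Symmetric handling of down-regulation is an assumption of this sketch.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    fold = ratio if ratio >= 1 else 1 / ratio
    return fold > min_fold and significance >= min_significance


if __name__ == "__main__":
    print(within_parent_tolerance(500.004, 500.0))  # about 8 ppm error
    print(within_fragment_tolerance(175.119, 175.100))
    print(score_to_pvalue(20))
    print(is_differential(1.5, 25.0))
```

The parent ion tolerance is relative, so the allowed absolute error grows with m/z, whereas the fragment tolerance is a fixed window in daltons.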

Fesenko I, Shabalina SA, Mamaeva A, Knyazev A, Glushkevich A, Lyapina I, Ziganshin R, Kovalchuk S, Kharlampieva D, Lazarev V, Taliansky M, Koonin EV. A vast pool of lineage-specific microproteins encoded by long non-coding RNAs in plants. Nucleic Acids Res. 2021 Oct 11;49(18):10328-10346. doi:10.1093/nar/gkab816. PMID: 34570232; PMCID: PMC8501992.

Abstract

Pervasive transcription of eukaryotic genomes results in expression of long non-coding RNAs (lncRNAs) most of which are poorly conserved in evolution and appear to be non-functional. However, some lncRNAs have been shown to perform specific functions, in particular, transcription regulation. Thousands of small open reading frames (smORFs, <100 codons) located on lncRNAs potentially might be translated into peptides or microproteins. We report a comprehensive analysis of the conservation and evolutionary trajectories of lncRNAs-smORFs from the moss Physcomitrium patens across transcriptomes of 479 plant species. Although thousands of smORFs are subject to substantial purifying selection, the majority of the smORFs appear to be evolutionary young and could represent a major pool for functional innovation. Using nanopore RNA sequencing, we show that, on average, the transcriptional level of conserved smORFs is higher than that of non-conserved smORFs. Proteomic analysis confirmed translation of 82 novel species-specific smORFs. Numerous conserved smORFs containing low complexity regions (LCRs) or transmembrane domains were identified, the biological functions of a selected LCR-smORF were demonstrated experimentally. Thus, microproteins encoded by smORFs are a major, functionally diverse component of the plant proteome.