Unraveling the hidden universe of small proteins in bacterial genomes. Issue 2 (22nd February 2019)
- Record Type:
- Journal Article
- Title:
- Unraveling the hidden universe of small proteins in bacterial genomes. Issue 2 (22nd February 2019)
- Main Title:
- Unraveling the hidden universe of small proteins in bacterial genomes
- Authors:
- Miravet‐Verde, Samuel
Ferrar, Tony
Espadas‐García, Guadalupe
Mazzolini, Rocco
Gharrab, Anas
Sabido, Eduard
Serrano, Luis
Lluch‐Senar, Maria
- Abstract:
- Identification of small open reading frames (smORFs) encoding small proteins (≤ 100 amino acids; SEPs) is a challenge in the fields of genome annotation and protein discovery. Here, by combining a novel bioinformatics tool (RanSEPs) with "‐omics" approaches, we were able to describe 109 bacterial small ORFomes. Predictions were first validated by performing an exhaustive search of SEPs present in the Mycoplasma pneumoniae proteome via mass spectrometry, which illustrated the limitations of shotgun approaches. Then, RanSEPs predictions were validated and compared with other tools using proteomic datasets from different bacterial species and SEPs from the literature. We found that up to 16 ± 9% of proteins in an organism could be classified as SEPs. Integration of RanSEPs predictions with transcriptomics data showed that some annotated non‐coding RNAs could in fact encode SEPs. A functional study of SEPs highlighted an enrichment in the membrane, translation, metabolism, and nucleotide‐binding categories. Additionally, 9.7% of the SEPs included a predicted N‐terminal signal peptide. We envision RanSEPs as a tool to unmask the hidden universe of small bacterial proteins.
Synopsis: RanSEPs is a random forest‐based computational approach capable of predicting small encoded proteins in a species‐specific context. Running this tool on 109 bacterial genomes indicated that up to 16 ± 9.5% of the proteins in a genome could be SEPs. Integration of transcriptomics and proteomics from 12 bacterial species showed that high‐throughput experimental characterization of small proteins (SEPs) presents multiple limitations and false positive detections. RanSEPs is a computational approach that assigns coding potential scores to SEP candidates in a species‐specific manner based on sequence features. After running RanSEPs on 109 bacterial genomes, we determined that between 6 and 25% of the proteins of a bacterial genome could be SEPs. Function prediction of RanSEPs‐predicted SEPs revealed an enrichment in translation, metabolism and nucleotide‐binding proteins.
- Is Part Of:
- Molecular systems biology. Volume 15:Issue 2(2019)
- Journal:
- Molecular systems biology
- Issue:
- Volume 15:Issue 2(2019)
- Issue Display:
- Volume 15, Issue 2 (2019)
- Year:
- 2019
- Volume:
- 15
- Issue:
- 2
- Issue Sort Value:
- 2019-0015-0002-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2019-02-22
- Subjects:
- mass spectroscopy -- mycoplasmas -- protein prediction -- random forest classifier -- small proteins
Molecular biology -- Periodicals
Systems biology -- Periodicals
572.8
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1744-4292
http://www.nature.com/msb/index.html
http://onlinelibrary.wiley.com/
- DOI:
- 10.15252/msb.20188290
- Languages:
- English
- ISSNs:
- 1744-4292
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5900.856300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21976.xml