Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Issue 1 (December 2016)
- Record Type:
- Journal Article
- Title:
- Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive. Issue 1 (December 2016)
- Main Title:
- Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive
- Authors:
- Nellore, Abhinav
Jaffe, Andrew
Fortin, Jean-Philippe
Alquicira-Hernández, José
Collado-Torres, Leonardo
Wang, Siruo
Phillips III, Robert
Karbhari, Nishika
Hansen, Kasper
Langmead, Ben
Leek, Jeffrey
- Abstract:
- Background: Gene annotations, such as those in GENCODE, are derived primarily from alignments of spliced cDNA sequences and protein sequences. The impact of RNA-seq data on annotation has been confined to major projects like ENCODE and Illumina Body Map 2.0. Results: We aligned 21,504 Illumina-sequenced human RNA-seq samples from the Sequence Read Archive (SRA) to the human genome and compared detected exon-exon junctions with junctions in several recent gene annotations. We found 56,861 junctions (18.6%) in at least 1000 samples that were not annotated, and their expression associated with tissue type. Junctions well expressed in individual samples tended to be annotated. Newer samples contributed few novel well-supported junctions, with the vast majority of detected junctions present in samples before 2013. We compiled junction data into a resource called intropolis available at http://intropolis.rail.bio. We used this resource to search for a recently validated isoform of the ALK gene and characterized the potential functional implications of unannotated junctions with publicly available TRAP-seq data. Conclusions: Considering only the variation contained in annotation may suffice if an investigator is interested only in well-expressed transcript isoforms. However, genes that are not generally well expressed and nonetheless present in a small but significant number of samples in the SRA are likelier to be incompletely annotated. The rate at which evidence for novel junctions has been added to the SRA has tapered dramatically, even to the point of an asymptote. Now is perhaps an appropriate time to update incomplete annotations to include splicing present in the now-stable snapshot provided by the SRA.
- Is Part Of:
- Genome biology. Volume 17:Issue 1(2016)
- Journal:
- Genome biology
- Issue:
- Volume 17:Issue 1(2016)
- Issue Display:
- Volume 17, Issue 1 (2016)
- Year:
- 2016
- Volume:
- 17
- Issue:
- 1
- Issue Sort Value:
- 2016-0017-0001-0000
- Page Start:
- 1
- Page End:
- 14
- Publication Date:
- 2016-12
- Subjects:
- RNA-seq -- Splicing -- Intron
Genomes -- Periodicals
Biology -- Periodicals
Molecular biology -- Periodicals
572.8633
- Journal URLs:
- http://www.genomebiology.com
http://link.springer.com/
- DOI:
- 10.1186/s13059-016-1118-6
- Languages:
- English
- ISSNs:
- 1474-760X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9970.xml