Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data. Issue 8 (1st August 2015)
- Record Type:
- Journal Article
- Title:
- Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data. Issue 8 (1st August 2015)
- Main Title:
- Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data
- Authors:
- Matthews, Beverley B
dos Santos, Gilberto
Crosby, Madeline A
Emmert, David B
St. Pierre, Susan E
Gramates, L Sian
Zhou, Pinglei
Schroeder, Andrew J
Falls, Kathleen
Strelets, Victor
Russo, Susan M
Gelbart, William M
- Abstract:
- Abstract: We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3′ UTRs (up to 15–18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.
- Is Part Of:
- G3. Volume 5:Issue 8(2015)
- Journal:
- G3
- Issue:
- Volume 5:Issue 8(2015)
- Issue Display:
- Volume 5, Issue 8 (2015)
- Year:
- 2015
- Volume:
- 5
- Issue:
- 8
- Issue Sort Value:
- 2015-0005-0008-0000
- Page Start:
- 1721
- Page End:
- 1736
- Publication Date:
- 2015-08-01
- Subjects:
- transcriptome -- alternative splice -- lncRNA -- transcription start site -- exon junction
Genetics -- Research -- Periodicals
Genomics -- Periodicals
Genetics
Genomics
Genes
Genetics -- Research
Genomics
Electronic journals
Periodical
Periodicals
Fulltext
Internet Resources
Periodicals
572.8
- Journal URLs:
- https://academic.oup.com/g3journal
http://bibpurl.oclc.org/web/43467
http://www.g3journal.org
http://www.oxfordjournals.org/
- DOI:
- 10.1534/g3.115.018929
- Languages:
- English
- ISSNs:
- 2160-1836
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22174.xml