Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data. Issue 1 (December 2015)
- Record Type:
- Journal Article
- Title:
- Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data. Issue 1 (December 2015)
- Main Title:
- Optimally choosing PWM motif databases and sequence scanning approaches based on ChIP-seq data
- Authors:
- Dabrowski, Michal
Dojer, Norbert
Krystkowiak, Izabella
Kaminska, Bozena
Wilczynski, Bartek - Abstract:
- Abstract Background For many years now, binding preferences of Transcription Factors have been described by so called motifs, usually mathematically defined by position weight matrices or similar models, for the purpose of predicting potential binding sites. However, despite the availability of thousands of motif models in public and commercial databases, a researcher who wants to use them is left with many competing methods of identifying potential binding sites in a genome of interest and there is little published information regarding the optimality of different choices. Thanks to the availability of large number of different motif models as well as a number of experimental datasets describing actual binding of TFs in hundreds of TF-ChIP-seq pairs, we set out to perform a comprehensive analysis of this matter. Results We focus on the task of identifying potential transcription factor binding sites in the human genome. Firstly, we provide a comprehensive comparison of the coverage and quality of models available in different databases, showing that the public databases have comparable TFs coverage and better motif performance than commercial databases. Secondly, we compare different motif scanners showing that, regardless of the database used, the tools developed by the scientific community outperform the commercial tools. Thirdly, we calculate for each motif a detection threshold optimizing the accuracy of prediction. Finally, we provide an in-depth comparison ofAbstract Background For many years now, binding preferences of Transcription Factors have been described by so called motifs, usually mathematically defined by position weight matrices or similar models, for the purpose of predicting potential binding sites. However, despite the availability of thousands of motif models in public and commercial databases, a researcher who wants to use them is left with many competing methods of identifying potential binding sites in a genome of interest and there is little published information regarding the optimality of different choices. Thanks to the availability of large number of different motif models as well as a number of experimental datasets describing actual binding of TFs in hundreds of TF-ChIP-seq pairs, we set out to perform a comprehensive analysis of this matter. Results We focus on the task of identifying potential transcription factor binding sites in the human genome. Firstly, we provide a comprehensive comparison of the coverage and quality of models available in different databases, showing that the public databases have comparable TFs coverage and better motif performance than commercial databases. Secondly, we compare different motif scanners showing that, regardless of the database used, the tools developed by the scientific community outperform the commercial tools. Thirdly, we calculate for each motif a detection threshold optimizing the accuracy of prediction. Finally, we provide an in-depth comparison of different methods of choosing thresholds for all motifs a priori. Surprisingly, we show that selecting a common false-positive rate gives results that are the least biased by the information content of the motif and therefore most uniformly accurate. Conclusion We provide a guide for researchers working with transcription factor motifs. It is supplemented with detailed results of the analysis and the benchmark datasets athttp://bioputer.mimuw.edu.pl/papers/motifs/ . … (more)
- Is Part Of:
- BMC bioinformatics. Volume 16:Issue 1(2015)
- Journal:
- BMC bioinformatics
- Issue:
- Volume 16:Issue 1(2015)
- Issue Display:
- Volume 16, Issue 1 (2015)
- Year:
- 2015
- Volume:
- 16
- Issue:
- 1
- Issue Sort Value:
- 2015-0016-0001-0000
- Page Start:
- 1
- Page End:
- 14
- Publication Date:
- 2015-12
- Subjects:
- Transcription factor binding -- Binding site -- Sequence motif -- Motif database
Bioinformatics -- Periodicals
Computational biology -- Periodicals
570.285 - Journal URLs:
- http://www.biomedcentral.com/bmcbioinformatics/ ↗
http://www.pubmedcentral.nih.gov/tocrender.fcgi?journal=13 ↗
http://link.springer.com/ ↗ - DOI:
- 10.1186/s12859-015-0573-5 ↗
- Languages:
- English
- ISSNs:
- 1471-2105
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - Digital store
British Library HMNTS - ELD Digital store - Ingest File:
- 9957.xml