Application of the mol2vec Technology to Large‐size Data Visualization and Analysis. Issue 6 (26th February 2020)
- Record Type:
- Journal Article
- Title:
- Application of the mol2vec Technology to Large‐size Data Visualization and Analysis. Issue 6 (26th February 2020)
- Main Title:
- Application of the mol2vec Technology to Large‐size Data Visualization and Analysis
- Authors:
- Shibayama, Shojiro
Marcou, Gilles
Horvath, Dragos
Baskin, Igor I.
Funatsu, Kimito
Varnek, Alexandre
- Abstract:
- Generative Topographic Mapping (GTM) is a dimensionality reduction method, which is widely used for both data visualization and structure‐activity modeling. Large dimensionality of the initial data space may require significant computational resources and slow down the GTM construction. Therefore, it may be meaningful to reduce the number of descriptors used for encoding molecular structures. The Principal Component Analysis (PCA), a standard preprocessing tool, suffers from the information loss upon the dimensionality reduction. As an alternative, we propose to use substructure vector embedding provided by the mol2vec technique. In addition to the data dimensionality reduction, this technology also accounts for proximity of substructures in molecular graphs. In this study, dimensionality of large descriptor spaces of ISIDA fragment descriptors or Morgan fingerprints were reduced using either the PCA or the mol2vec method. The latter significantly speeds up GTM training without compromising its predictive power in bioactivity classification tasks.
- Is Part Of:
- Molecular informatics. Volume 39:Issue 6(2020)
- Journal:
- Molecular informatics
- Issue:
- Volume 39:Issue 6(2020)
- Issue Display:
- Volume 39, Issue 6 (2020)
- Year:
- 2020
- Volume:
- 39
- Issue:
- 6
- Issue Sort Value:
- 2020-0039-0006-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2020-02-26
- Subjects:
- Generative topographic mapping -- QSAR -- fragment descriptors -- mol2vec -- substructure vector embedding -- distributed representation
Cheminformatics -- Periodicals
QSAR (Biochemistry) -- Periodicals
Structure-activity relationships (Biochemistry) -- Periodicals
Drugs -- Structure-activity relationships -- Periodicals
615.19
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1868-1751
http://www3.interscience.wiley.com/journal/123236613/home
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/minf.201900170
- Languages:
- English
- ISSNs:
- 1868-1743
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5900.817750
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13167.xml