Improved similarity assessment and spectral clustering for unsupervised linking of data extracted from bridge inspection reports. (January 2022)
- Record Type:
- Journal Article
- Title:
- Improved similarity assessment and spectral clustering for unsupervised linking of data extracted from bridge inspection reports. (January 2022)
- Main Title:
- Improved similarity assessment and spectral clustering for unsupervised linking of data extracted from bridge inspection reports
- Authors:
- Liu, Kaijian
El-Gohary, Nora
- Abstract:
- Textual bridge inspection reports are important data sources for supporting data-driven bridge deterioration prediction and maintenance decision making. Information extraction methods are available to extract data/information from these reports to support data-driven analytics. However, directly using the extracted data/information in data analytics is still challenging because, even within the same report, there exist multiple data records that describe the same entity, which increases the dimensionality of the data and adversely affects the performance of the analytics. The first step to address this problem is to link the multiple records that describe the same entity and same type of instances (e.g., all cracks on a specific bridge deck), so that they can be subsequently fused into a single unified representation for dimensionality reduction without information loss. To address this need, this paper proposes a spectral clustering-based method for unsupervised data linking. The method includes: (1) a concept similarity assessment method, which allows for assessing concept similarity even when corpus or semantic information is not available for the application at hand; (2) a record similarity assessment method, which captures and uses similarity assessment dependencies to reduce the number of falsely-linked records; and (3) an improved spectral clustering method, which uses iterative bi-partitioning to better link records in an unsupervised way and to address the transitive closure problem. The proposed data linking method was evaluated in linking records extracted from ten bridge inspection reports. It achieved an average precision, recall, and F-1 measure of 96.2%, 88.3%, and 92.1%, respectively.
- Is Part Of:
- Advanced engineering informatics. Volume 51(2022)
- Journal:
- Advanced engineering informatics
- Issue:
- Volume 51(2022)
- Issue Display:
- Volume 51, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 51
- Issue:
- 2022
- Issue Sort Value:
- 2022-0051-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-01
- Subjects:
- Data linking/linkage -- Similarity assessment -- Unsupervised machine learning -- Spectral clustering -- Bridges -- Deterioration prediction -- Maintenance decision making
Computer-aided engineering -- Periodicals
Engineering -- Data processing -- Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14740346
http://books.google.com/books?id=KhFVAAAAMAAJ
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.aei.2021.101496
- Languages:
- English
- ISSNs:
- 1474-0346
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 0696.851100
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 20994.xml
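
The abstract describes linking records by spectral clustering with iterative bi-partitioning. The sketch below illustrates that general idea only; it is not the authors' implementation. The pairwise similarity matrix `W`, the `threshold` parameter, the stopping rule (stop splitting once every pair in a cluster is at least `threshold` similar, so that no cluster is held together merely by a chain of pairwise links), and the use of power iteration to approximate the Fiedler vector are all assumptions made for illustration. The function names `fiedler_split` and `link_records` are hypothetical.

```python
import math
import random


def fiedler_split(W, members):
    """Split `members` in two by the sign of an approximate Fiedler vector."""
    # Degrees inside the sub-graph; self-similarity is ignored (assumption).
    deg = [max(sum(W[i][j] for j in members if j != i), 1e-12) for i in members]
    root = [math.sqrt(d) for d in deg]
    norm = math.sqrt(sum(deg))
    top = [r / norm for r in root]  # known leading eigenvector of M

    def apply(x):
        # y = (M + I) x with M = D^-1/2 W D^-1/2; the shift keeps eigenvalues >= 0.
        y = []
        for a, i in enumerate(members):
            s = sum(W[i][j] * x[b] / root[b]
                    for b, j in enumerate(members) if j != i)
            y.append(s / root[a] + x[a])
        return y

    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    x = [rng.uniform(-1.0, 1.0) for _ in members]
    for _ in range(200):
        # Remove the leading eigenvector, then take one power-iteration step.
        dot = sum(t * v for t, v in zip(top, x))
        x = apply([v - dot * t for v, t in zip(x, top)])
        size = math.sqrt(sum(v * v for v in x)) or 1.0
        x = [v / size for v in x]

    left = [m for m, v in zip(members, x) if v >= 0]
    right = [m for m, v in zip(members, x) if v < 0]
    return left, right


def link_records(W, threshold):
    """Iteratively bi-partition until every cluster is pairwise similar."""
    clusters = []
    stack = [list(range(len(W)))] if W else []
    while stack:
        members = stack.pop()
        coherent = all(W[i][j] >= threshold
                       for i in members for j in members if i < j)
        if len(members) < 2 or coherent:
            clusters.append(sorted(members))
            continue
        left, right = fiedler_split(W, members)
        if not left or not right:  # degenerate split: keep the cluster as is
            clusters.append(sorted(members))
            continue
        stack += [left, right]
    return sorted(clusters)


# Hypothetical similarities: records 0-2 describe one entity, records 3-4 another.
W = [
    [0.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 0.0],
]
groups = link_records(W, threshold=0.5)
```

With the hypothetical matrix above, the two blocks of mutually similar records end up in separate groups. In practice the similarity values would come from concept and record similarity assessment of the kind the abstract describes, which this sketch does not attempt to reproduce.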