An online multi-source summarization algorithm for text readability in topic-based search. (March 2021)
- Record Type:
- Journal Article
- Title:
- An online multi-source summarization algorithm for text readability in topic-based search. (March 2021)
- Main Title:
- An online multi-source summarization algorithm for text readability in topic-based search
- Authors:
- Curiel, Arturo
Gutiérrez-Soto, Claudio
Rojano-Cáceres, José-Rafael
- Abstract:
- Highlights: An online extractive multi-source summarization algorithm. Suitable for synthesizing topic-related documents. Produces highly readable summaries while preserving topic information. Language agnostic, shown to work on both English and Spanish sources. Efficient, with execution times strictly below O(n²).
Abstract: Web search users are likely to face problems related to the availability of large amounts of data. As the quantity of online content grows, the risk of missing relevant information during search can only increase. Moreover, external variables such as the users' reading proficiency level can further complicate the task. This article proposes an online multi-document summarization algorithm for text readability, as a means to simplify web search. The algorithm is designed to work over collections of topic-related documents, such as those returned as the results of a web query. Contrary to most modern approaches, the algorithm requires no preliminary training. The algorithm was tested on both English- and Spanish-language documents, using different metrics of term and sentence relevance. The results were compared against summaries created by both human summarizers and third-party Automatic Text Summarization (ATS) systems in terms of two variables: readability and information content. In both variables, the results show generalized gains with respect to both the human summarizers and the third-party ATS systems. Furthermore, the algorithm achieved these results with a time complexity strictly lower than O(n²), well below traditional machine learning approaches.
- Is Part Of:
- Computer speech & language. Volume 66(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 66(2021)
- Issue Display:
- Volume 66, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 66
- Issue:
- 2021
- Issue Sort Value:
- 2021-0066-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-03
- Subjects:
- Automatic text summarization -- Text readability -- Online algorithm -- Information retrieval
00-01 -- 99-00
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101143
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15413.xml