Unsupervised abstractive summarization via sentence rewriting. (March 2023)
- Record Type:
- Journal Article
- Title:
- Unsupervised abstractive summarization via sentence rewriting. (March 2023)
- Main Title:
- Unsupervised abstractive summarization via sentence rewriting
- Authors:
- Zhang, Zhihao
Liang, Xinnian
Zuo, Yuan
Li, Zhoujun
- Abstract:
- Unsupervised extractive summarization aims to extract salient sentences from the document without a labeled corpus. Existing methods have achieved promising progress, thanks to the power of large-scale pre-trained language models and high-quality contextualized representations. However, extractive summaries often fail to maintain smooth transitions between sentences and struggle to form a coherent and fluent text due to splicing of sentences. Nevertheless, to the best of our knowledge, very few studies currently focus on unsupervised abstractive summarization. Inspired by the intuitive human process of writing summaries, which involves extracting salient sentences first and then reconstructing them, in this paper, we propose an Extract-then-Abstract framework to generate more coherent and human-like summaries. Specifically, we first adopt an extractive summarization model as the summarizer to generate an extractive summary in the extraction stage. Then in the abstraction stage, we propose a BART-based sentence rewrite model to generate a more coherent and fluent abstractive summary. To this end, we design a novel parallel data creation method for our rewrite model by proposing an effective sentence sampling strategy without any manual annotation cost. Extensive experiments including automatic evaluation and human evaluation demonstrate that our framework consistently outperforms strong baselines for unsupervised abstractive summarization and can generate more coherent and human-like summaries while maintaining competitive ROUGE scores for unsupervised extractive summarization.
- Is Part Of:
- Computer speech & language. Volume 78(2023)
- Journal:
- Computer speech & language
- Issue:
- Volume 78(2023)
- Issue Display:
- Volume 78, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 78
- Issue:
- 2023
- Issue Sort Value:
- 2023-0078-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-03
- Subjects:
- Unsupervised abstractive summarization -- Sentence rewrite model -- Pre-trained language model -- Coherence text
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101467
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24451.xml