Improved feature decay algorithms for statistical machine translation. (22nd January 2022)
- Record Type:
- Journal Article
- Title:
- Improved feature decay algorithms for statistical machine translation. (22nd January 2022)
- Main Title:
- Improved feature decay algorithms for statistical machine translation
- Authors:
- Poncelas, Alberto
Maillette de Buy Wenniger, Gideon
Way, Andy
- Abstract:
- In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. In a scenario where the test set is accessible when the model is being built, training instances can be selected so that they are the most relevant for the test set. Feature Decay Algorithms (FDA) are a data-selection technique that has demonstrated excellent performance in a number of tasks. The method maximizes the diversity of the n-grams in the training set by devaluing those that have already been included. We focus on this method to investigate in more depth how to select better training instances. We give an overview of FDA and propose improvements in terms of both speed and quality. Using German-to-English parallel data, we first propose a novel approach that decreases the execution time of FDA when multiple computation units are available. In addition, we obtain improvements in translation quality by extending FDA with information from the parallel corpus that is generally ignored.
- Is Part Of:
- Natural language engineering. Volume 28:Number 1(2022)
- Journal:
- Natural language engineering
- Issue:
- Volume 28:Number 1(2022)
- Issue Display:
- Volume 28, Issue 1 (2022)
- Year:
- 2022
- Volume:
- 28
- Issue:
- 1
- Issue Sort Value:
- 2022-0028-0001-0000
- Page Start:
- 71
- Page End:
- 91
- Publication Date:
- 2022-01-22
- Subjects:
- Machine translation -- Data selection -- Statistical methods
Natural language processing (Computer science) -- Periodicals
Software engineering -- Periodicals
006.35
- Journal URLs:
- http://journals.cambridge.org/action/displayJournal?jid=NLE
- DOI:
- 10.1017/S1351324920000467
- Languages:
- English
- ISSNs:
- 1351-3249
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD Digital store
- Ingest File:
- 20537.xml
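The abstract describes FDA as a method that maximizes n-gram diversity in the selected training set by devaluing n-grams that have already been included. The Python sketch below illustrates only that basic greedy idea and is not the authors' implementation or their improved variants. The function names `ngrams` and `fda_select` are hypothetical. The initial value of 1.0 per test-set n-gram, the exponential decay factor of 0.5, and the division of each sentence's score by its length are simplifying assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, max_n=3):
    """Return the set of all n-grams (as tuples) of length 1..max_n in a token list."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def fda_select(candidates, test_sentences, k, max_n=3, decay=0.5):
    """Greedily select up to k candidate indices in the spirit of FDA.

    Simplifying assumptions (hypothetical, not taken from the paper):
    every test-set n-gram starts with value 1.0, its value is multiplied by
    `decay` once per already-selected sentence containing it, and a sentence's
    score is the sum of its test-set n-gram values divided by its length.
    """
    test_features = set()
    for sent in test_sentences:
        test_features |= ngrams(sent, max_n)

    # Only n-grams that also occur in the test set contribute to a score.
    cand_features = [ngrams(c, max_n) & test_features for c in candidates]
    selected_counts = Counter()  # number of selected sentences containing each n-gram
    remaining = list(range(len(candidates)))
    selected = []

    def score(idx):
        total = sum(decay ** selected_counts[f] for f in cand_features[idx])
        return total / max(len(candidates[idx]), 1)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the earliest index
        remaining.remove(best)
        selected.append(best)
        selected_counts.update(cand_features[best])  # decay these n-grams
    return selected


if __name__ == "__main__":
    test = [["the", "cat", "sat"], ["a", "dog", "ran"]]
    pool = [["the", "cat", "sat"], ["the", "cat", "sat"], ["a", "dog", "ran"]]
    print(fda_select(pool, test, k=2))             # [0, 2]
    print(fda_select(pool, test, k=2, decay=1.0))  # [0, 1]
```

With decay, the duplicate of the first sentence loses value once its n-grams are selected, so the sentence covering the unseen test n-grams is chosen next. With `decay=1.0` (no devaluation), the duplicate ties and is picked instead. This is the diversity effect the abstract refers to. The linear rescoring loop is kept for clarity; it makes no claim about how FDA is implemented efficiently or parallelized.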