Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution. Issue 1 (31st December 2022)
- Record Type:
- Journal Article
- Title:
- Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution. Issue 1 (31st December 2022)
- Main Title:
- Multi-distribution noise quantisation: an extreme compression scheme for transformer according to parameter distribution
- Authors:
- Yu, Zaiyang
Li, Shuang
Sun, Linjun
Liu, Liang
Haining, Wang
- Abstract:
- With the development of deep learning, neural networks are widely used in various fields, but improved model performance has come with a considerable increase in parameters and computation. Model quantisation is a technique that converts floating-point computation into low-bit fixed-point computation, which can effectively reduce computational cost, parameter size, and memory consumption but often brings a considerable loss of accuracy. This paper mainly addresses the problem that the parameter distribution is too concentrated during quantisation-aware training (QAT). In the QAT process, we use a piecewise function to gather statistics on the parameter distributions and, based on those statistics, simulate the effect of quantisation noise in each round of training. Experimental results show that, by quantising the Transformer network, we lose less accuracy and significantly reduce the storage cost of the model; compared with the full-precision LSTM network, our model achieves higher accuracy at a similar storage cost. Compared with other quantisation methods on the language modelling task, our approach is also more accurate. We validated the effectiveness of our method on the WikiText-103 and Penn Treebank datasets. The experiments show that our method greatly reduces the storage cost while maintaining high model performance.
- Is Part Of:
- Connection science. Volume 34:Issue 1(2022)
- Journal:
- Connection science
- Issue:
- Volume 34:Issue 1(2022)
- Issue Display:
- Volume 34, Issue 1 (2022)
- Year:
- 2022
- Volume:
- 34
- Issue:
- 1
- Issue Sort Value:
- 2022-0034-0001-0000
- Page Start:
- 990
- Page End:
- 1004
- Publication Date:
- 2022-12-31
- Subjects:
- Compression algorithms -- natural language processing -- transformer -- vector quantisation appendices
Neural computers -- Periodicals
Artificial intelligence -- Periodicals
Cognitive science -- Periodicals
Connectionism -- Periodicals
006.3
- Journal URLs:
- http://www.tandfonline.com/toc/ccos20/current
http://www.tandfonline.com/
- DOI:
- 10.1080/09540091.2021.2024510
- Languages:
- English
- ISSNs:
- 0954-0091
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3417.662450
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 23302.xml