AdaDerivative optimizer: Adapting step-sizes by the derivative term in past gradient information. (March 2023)
- Record Type:
- Journal Article
- Title:
- AdaDerivative optimizer: Adapting step-sizes by the derivative term in past gradient information. (March 2023)
- Main Title:
- AdaDerivative optimizer: Adapting step-sizes by the derivative term in past gradient information
- Authors:
- Zou, Weidong
Xia, Yuanqing
Cao, Weipeng
- Abstract:
- AdaBelief fully utilizes "belief" to iteratively update the parameters of deep neural networks. However, the reliability of the "belief" is determined by the gradient's prediction accuracy, and the key to this prediction accuracy is the selection of the smoothing parameter $\beta_1$. AdaBelief also suffers from the overshoot problem, which occurs when the value of parameters exceeds the value of the target and cannot be changed along the gradient direction. In this paper, we propose AdaDerivative to eliminate the overshoot problem of AdaBelief. The key to AdaDerivative is that the "belief" of AdaBelief is replaced by the derivative term's exponential moving average (EMA), which can be constructed as $(1 - \beta_2) \sum_{i=1}^{t} \beta_2^{t-i} (g_i - g_{i-1})^2$ based on the past and current gradients. We validate the performance of AdaDerivative on a variety of tasks, including image classification, language modeling, node classification, image generation, and object detection tasks. Extensive experimental results demonstrate that AdaDerivative can achieve state-of-the-art performance.
- Is Part Of:
- Engineering applications of artificial intelligence. Volume 119(2023)
- Journal:
- Engineering applications of artificial intelligence
- Issue:
- Volume 119(2023)
- Issue Display:
- Volume 119, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 119
- Issue:
- 2023
- Issue Sort Value:
- 2023-0119-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-03
- Subjects:
- Deep neural networks -- Optimization algorithms -- Adam -- Stochastic gradient descent
Engineering -- Data processing -- Periodicals
Artificial intelligence -- Periodicals
Expert systems (Computer science) -- Periodicals
Ingénierie -- Informatique -- Périodiques
Intelligence artificielle -- Périodiques
Systèmes experts (Informatique) -- Périodiques
Artificial intelligence
Engineering -- Data processing
Expert systems (Computer science)
Periodicals
620.00285
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09521976
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.engappai.2022.105755
- Languages:
- English
- ISSNs:
- 0952-1976
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3755.704500
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25680.xml
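
Note on the abstract's formula: the derivative-term EMA described in the abstract, (1 − β2) Σ β2^(t−i) (g_i − g_{i−1})², can be illustrated with a small sketch. The Python below is not the authors' implementation. It only evaluates that one quantity for a scalar gradient sequence, once by the usual EMA recursion and once by the sum as written. The function names, the default `beta2`, and the choice g_0 = 0 for the gradient before the first step are assumptions made for illustration; the record does not state them, and the rest of the optimizer (first moment, bias correction, parameter update) is deliberately left out because the abstract does not specify it.

```python
def derivative_ema(grads, beta2=0.999, g0=0.0):
    """Recursive EMA of squared gradient differences (hypothetical helper).

    s_t = beta2 * s_{t-1} + (1 - beta2) * (g_t - g_{t-1})**2, with s_0 = 0.
    g0 is an ASSUMED value for the gradient before the first step.
    Returns the list [s_1, ..., s_t].
    """
    s = 0.0
    prev = g0
    history = []
    for g in grads:
        diff = g - prev  # the "derivative term": change between successive gradients
        s = beta2 * s + (1.0 - beta2) * diff * diff
        history.append(s)
        prev = g
    return history


def derivative_ema_closed_form(grads, beta2=0.999, g0=0.0):
    """Direct sum (1 - beta2) * sum_{i=1..t} beta2**(t-i) * (g_i - g_{i-1})**2."""
    padded = [g0] + list(grads)  # padded[i] is g_i, padded[0] is the assumed g_0
    t = len(grads)
    return (1.0 - beta2) * sum(
        beta2 ** (t - i) * (padded[i] - padded[i - 1]) ** 2
        for i in range(1, t + 1)
    )


if __name__ == "__main__":
    grads = [1.0, 0.5, 0.25, 0.3]
    print(derivative_ema(grads)[-1])
    print(derivative_ema_closed_form(grads))
```

With s_0 = 0 the recursion unrolls to exactly the sum in the abstract, so the two functions agree up to floating-point rounding. The recursive form is the one an optimizer would keep as running state, since it needs only the previous gradient and the previous EMA value.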