Variational posterior approximation using stochastic gradient ascent with adaptive stepsize. (April 2021)
- Record Type:
- Journal Article
- Title:
- Variational posterior approximation using stochastic gradient ascent with adaptive stepsize. (April 2021)
- Main Title:
- Variational posterior approximation using stochastic gradient ascent with adaptive stepsize
- Authors:
- Lim, Kart-Leong
Jiang, Xudong
- Abstract:
- Highlights: Variational inference, most notably stochastic variational inference, relies on the closed-form coordinate ascent algorithm. A challenge is how to scale up the learning. We propose using Stochastic Gradient Ascent (SGA) as the scalable learner for the variational inference of the Variational Bayes Dirichlet Process Mixture (DPM). To achieve speed while maintaining performance for variational inference of the DPM using SGA, we adopt and compare two stochastic optimization techniques for our SGA: natural gradient ascent and the momentum method. We show that our new stochastic gradient ascent approach to variational inference is compatible with deep ConvNet features when applied to large-scale datasets such as Caltech256 and SUN397. Lastly, we justify our speed and performance claims by comparing against closed-form coordinate ascent learning on these datasets.
Abstract: Scalable algorithms for variational posterior approximation allow Bayesian nonparametric models such as the Dirichlet process mixture to scale up to larger datasets at a fraction of the cost. Recent algorithms, notably stochastic variational inference, perform local learning from minibatches. The main problem with stochastic variational inference is that it relies on closed-form solutions. Stochastic gradient ascent is a modern approach to machine learning and is widely deployed in the training of deep neural networks. In this work, we explore using stochastic gradient ascent as a fast algorithm for the posterior approximation of the Dirichlet process mixture. However, stochastic gradient ascent alone is not optimal for learning. To achieve both speed and performance, we turn our focus to stepsize optimization in stochastic gradient ascent. As an intermediate approach, we first optimize the stepsize using the momentum method. Finally, we introduce Fisher information to allow an adaptive stepsize in our posterior approximation. In the experiments, we show that our stochastic gradient ascent approach does not sacrifice performance for speed when compared to closed-form coordinate ascent learning. Lastly, our approach is also compatible with deep ConvNet features and scales to datasets with a large number of classes, such as Caltech256 and SUN397.
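The three stepsize rules named in the abstract (plain SGA, momentum, and a Fisher-information-scaled step) can be contrasted on a toy problem. The following is a minimal sketch, not the authors' method: it assumes a one-parameter Poisson log-likelihood as a stand-in for the DPM variational objective, and the names `minibatch_grad`, `fisher_info` and `sga`, along with all parameter values, are hypothetical. For a single parameter, dividing the gradient by the Fisher information gives the natural-gradient step.

```python
import numpy as np

# Hypothetical toy stand-in for a variational objective: the average Poisson
# log-likelihood x*log(lam) - lam of a single rate parameter lam > 0.
# This is NOT the DPM objective from the paper; it only illustrates stepsize rules.


def minibatch_grad(lam, batch):
    """Minibatch estimate of d/dlam of the average of x*log(lam) - lam."""
    return float(np.mean(batch / lam - 1.0))


def fisher_info(lam):
    """Fisher information of one Poisson(lam) observation: I(lam) = 1/lam."""
    return 1.0 / lam


def sga(data, lam0, rho, n_iters, batch_size, rng, method="plain", beta=0.9):
    """Stochastic gradient ascent on lam with one of three step rules.

    plain:    lam += rho * g
    momentum: v = beta * v + rho * g; lam += v   (heavy-ball momentum)
    natural:  lam += rho * g / I(lam)            (Fisher-scaled, i.e. natural gradient)
    """
    lam, velocity = float(lam0), 0.0
    for _ in range(n_iters):
        batch = rng.choice(data, size=batch_size)  # minibatch sampled with replacement
        g = minibatch_grad(lam, batch)
        if method == "plain":
            step = rho * g
        elif method == "momentum":
            velocity = beta * velocity + rho * g
            step = velocity
        elif method == "natural":
            # For this model g / I(lam) = mean(batch) - lam, so the effective
            # stepsize rho * lam adapts to the current parameter value.
            step = rho * g / fisher_info(lam)
        else:
            raise ValueError(f"unknown method: {method}")
        lam = max(lam + step, 1e-6)  # keep the rate strictly positive
    return lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.poisson(20.0, size=10_000).astype(float)
    for method in ("plain", "momentum", "natural"):
        est = sga(data, lam0=1.0, rho=0.1, n_iters=50, batch_size=32,
                  rng=np.random.default_rng(1), method=method)
        print(f"{method:8s} lam after 50 steps: {est:.2f} (data mean {data.mean():.2f})")
```

In this toy setting the Fisher-scaled update reduces to lam ← (1 − rho)·lam + rho·mean(batch), a running average of minibatch means whose effective stepsize adapts to lam. The plain step scales like (mean − lam)/lam, so its progress depends on how the parameter is scaled. The momentum variant accumulates past steps and can overshoot before settling.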
- Is Part Of:
- Pattern recognition. Volume 112(2021)
- Journal:
- Pattern recognition
- Issue:
- Volume 112(2021)
- Issue Display:
- Volume 112, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 112
- Issue:
- 2021
- Issue Sort Value:
- 2021-0112-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-04
- Subjects:
- Dirichlet process mixture -- Stochastic gradient ascent -- Fisher information -- Scalable algorithm
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2020.107783
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15784.xml