Entropy-SGD: biasing gradient descent into wide valleys (20th December 2019)

* This article is an updated version of Chaudhari P, Choromanska A, Soatto S, LeCun Y, Baldassi C, Borgs C, Chayes J, Sagun L, and Zecchina R 2017 Entropy-SGD: biasing gradient descent into wide valleys. Proc. of the International Conference of Learning and Representations (ICLR 2017). Code: https://github.com/ucla-vision/entropy-sgd

Record Type:: Journal Article
Title:: Entropy-SGD: biasing gradient descent into wide valleys (20th December 2019)
Main Title:: Entropy-SGD: biasing gradient descent into wide valleys
Authors:: Chaudhari, Pratik
Choromanska, Anna
Soatto, Stefano
LeCun, Yann
Baldassi, Carlo
Borgs, Christian
Chayes, Jennifer
Sagun, Levent
Zecchina, Riccardo
Abstract:: This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and, under certain assumptions, show improved generalization over SGD using uniform stability. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.
Is Part Of:: Journal of statistical mechanics. (2019:Dec.)
Journal:: Journal of statistical mechanics
Issue:: (2019:Dec.)
Issue Display:: Volume 1000060 (2019)
Year:: 2019
Volume:: 1000060
Issue Sort Value:: 2019-1000060-0000-0000
Page Start:
Page End:
Publication Date:: 2019-12-20
Subjects:: Statistical mechanics -- Periodicals
Mechanics -- Statistical methods -- Periodicals
530.1305
Journal URLs:: http://ioppublishing.org/
DOI:: 10.1088/1742-5468/ab39d9
Languages:: English
ISSNs:: 1742-5468
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 14316.xml
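
The abstract describes the algorithm as two nested loops: an inner Langevin-dynamics loop that estimates the gradient of the local entropy, and an outer loop that updates the weights. The following is a minimal scalar sketch of that structure, not the authors' implementation (their code is at the repository linked in the title note). The function name `entropy_sgd`, every parameter name, all default values, and the quadratic test loss are assumptions chosen for illustration only.

```python
import math
import random


def entropy_sgd(grad, x0, outer_steps=200, inner_steps=20, lr=0.5,
                inner_lr=0.1, gamma=1.0, noise=1e-2, alpha=0.75, seed=0):
    """Toy scalar sketch of the nested-loop idea (all defaults are assumptions).

    grad  : function returning the loss gradient at a scalar point
    gamma : strength of the coupling that keeps the inner iterate near x
    noise : scale of the Gaussian noise in the Langevin step
    alpha : weight of the newest iterate in the running average
    """
    rng = random.Random(seed)
    x = float(x0)
    for _ in range(outer_steps):
        x_prime = x  # inner iterate, restarted at the current weights
        mu = x       # running average of the inner iterates
        for _ in range(inner_steps):
            # Loss gradient plus the term pulling x_prime back toward x.
            d = grad(x_prime) - gamma * (x - x_prime)
            # Langevin step: gradient descent plus Gaussian noise.
            x_prime += -inner_lr * d + math.sqrt(inner_lr) * noise * rng.gauss(0.0, 1.0)
            mu = (1.0 - alpha) * mu + alpha * x_prime
        # Outer step: gamma * (x - mu) stands in for the gradient of the
        # negative local entropy, estimated from the inner-loop average.
        x -= lr * gamma * (x - mu)
    return x


# Hypothetical usage on the quadratic loss 0.5 * (x - 3)^2, whose gradient is x - 3.
x_final = entropy_sgd(lambda x: x - 3.0, x0=-2.0)
```

On this convex toy loss the sketch simply drifts toward the minimizer at 3; it illustrates the loop structure only and says nothing about the wide-valley behaviour the paper studies on deep networks.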