Entropic gradient descent algorithms and wide flat minima (29th December 2021)
* This article is an updated version of: Pittorino F, Lucibello C, Feinauer C, Perugini G, Baldassi C, Demyanenko E and Zecchina R 2021 Entropic gradient descent algorithms and wide flat minima Int. Conf. on Learning Representations.

Record Type:: Journal Article
Title:: Entropic gradient descent algorithms and wide flat minima (29th December 2021)
Main Title:: Entropic gradient descent algorithms and wide flat minima
Authors:: Pittorino, Fabrizio
Lucibello, Carlo
Feinauer, Christoph
Perugini, Gabriele
Baldassi, Carlo
Demyanenko, Elizaveta
Zecchina, Riccardo
Abstract:: The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities with respect to sharp ones. In this work we first discuss the relationship between alternative measures of flatness: the local entropy, which is useful for analysis and algorithm development, and the local energy, which is easier to compute and was shown empirically in extensive tests on state-of-the-art networks to be the best predictor of generalization capabilities. We show semi-analytically in simple controlled scenarios that these two measures correlate strongly with each other and with generalization. Then, we extend the analysis to the deep learning scenario by extensive numerical validations. We study two algorithms, entropy-stochastic gradient descent and replicated-stochastic gradient descent, that explicitly include the local entropy in the optimization objective. We devise a training schedule by which we consistently find flatter minima (using both flatness measures), and improve the generalization error for common architectures (e.g. ResNet, EfficientNet).
Is Part Of:: Journal of statistical mechanics. (2021:Dec.)
Journal:: Journal of statistical mechanics
Issue:: (2021:Dec.)
Issue Display:: Volume 1000084 (2021)
Year:: 2021
Volume:: 1000084
Issue Sort Value:: 2021-1000084-0000-0000
Page Start:
Page End:
Publication Date:: 2021-12-29
Subjects:: deep learning -- machine learning -- message-passing algorithms
Statistical mechanics -- Periodicals
Mechanics -- Statistical methods -- Periodicals
530.1305
Journal URLs:: http://ioppublishing.org/
DOI:: 10.1088/1742-5468/ac3ae8
Languages:: English
ISSNs:: 1742-5468
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 20931.xml