An analytic theory of shallow networks dynamics for hinge loss classification* (29th December 2021)

* This article is an updated version of: Pellegrini F and Biroli G 2020 An analytic theory of shallow networks dynamics for hinge loss classification Advances in Neural Information Processing Systems vol 33 ed H Larochelle, M Ranzato, R Hadsell, M F Balcan and H Lin (New York: Curran Associates) pp 5356–67.

Record Type:: Journal Article
Title:: An analytic theory of shallow networks dynamics for hinge loss classification (29th December 2021)
Main Title:: An analytic theory of shallow networks dynamics for hinge loss classification
Authors:: Pellegrini, Franco
Biroli, Giulio
Abstract:: Neural networks have been shown to perform incredibly well in classification tasks over structured high-dimensional datasets. However, the learning dynamics of such networks is still poorly understood. In this paper we study in detail the training dynamics of a simple type of neural network: a single hidden layer trained to perform a classification task. We show that in a suitable mean-field limit this case maps to a single-node learning problem with a time-dependent dataset determined self-consistently from the average nodes population. We specialize our theory to the prototypical case of linearly separable data and a linear hinge loss, for which the dynamics can be explicitly solved in the infinite dataset limit. This allows us to address in a simple setting several phenomena appearing in modern networks such as slowing down of training dynamics, crossover between rich and lazy learning, and overfitting. Finally, we assess the limitations of mean-field theory by studying the case of a large but finite number of nodes and of training samples.
Is Part Of:: Journal of statistical mechanics. (2021:Dec.)
Journal:: Journal of statistical mechanics
Issue:: (2021:Dec.)
Issue Display:: Volume 1000084 (2021)
Year:: 2021
Volume:: 1000084
Issue Sort Value:: 2021-1000084-0000-0000
Page Start:
Page End:
Publication Date:: 2021-12-29
Subjects:: machine learning
Statistical mechanics -- Periodicals
Mechanics -- Statistical methods -- Periodicals
530.1305
Journal URLs:: http://ioppublishing.org/
DOI:: 10.1088/1742-5468/ac3a76
Languages:: English
ISSNs:: 1742-5468
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 20931.xml