Geometric compression of invariant manifolds in neural networks. Issue 4 (26th April 2021)
- Record Type:
- Journal Article
- Title:
- Geometric compression of invariant manifolds in neural networks. Issue 4 (26th April 2021)
- Main Title:
- Geometric compression of invariant manifolds in neural networks
- Authors:
- Paccolat, Jonas
Petrini, Leonardo
Geiger, Mario
Tyloo, Kevin
Wyart, Matthieu
- Abstract:
- We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions but the labels vary only within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) and trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the $d_\perp = d - d_\parallel$ uninformative directions. These are effectively compressed by a factor $\lambda \sim p$, where $p$ is the size of the training set. We quantify the benefit of such a compression on the test error $\epsilon$. For large initialization of the weights (the lazy training regime), no compression occurs, and for regular boundaries separating labels we find that $\epsilon \sim p^{-\beta}$, with $\beta_{\mathrm{Lazy}} = d/(3d-2)$. Compression improves the learning curves so that $\beta_{\mathrm{Feature}} = (2d-1)/(3d-2)$ if $d_\parallel = 1$ and $\beta_{\mathrm{Feature}} = (d + d_\perp/2)/(3d-2)$ if $d_\parallel > 1$. We test these predictions for a stripe model where boundaries are parallel interfaces ($d_\parallel = 1$) as well as for a cylindrical boundary ($d_\parallel = 2$). Next, we show that compression shapes the neural tangent kernel (NTK) evolution in time, so that its top eigenvectors become more informative and display a larger projection on the labels. Consequently, kernel learning with the frozen NTK at the end of training outperforms the initial NTK. We confirm these predictions both for a one-hidden-layer fully connected network trained on the stripe model and for a 16-layer convolutional neural network trained on the Modified National Institute of Standards and Technology database (MNIST), for which we also find $\beta_{\mathrm{Feature}} > \beta_{\mathrm{Lazy}}$. The great similarities found in these two cases support the idea that compression is central to the training of MNIST, and put forward kernel principal component analysis on the evolving NTK as a useful diagnostic of compression in deep networks.
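The learning-curve exponents quoted in the abstract can be evaluated for concrete dimensions. The following is a minimal sketch that only transcribes the three formulas stated above; the function names `beta_lazy` and `beta_feature` are illustrative and do not come from the article, and the input check simply enforces the abstract's condition that the informative dimension is at least 1 and smaller than d.

```python
from fractions import Fraction


def beta_lazy(d: int) -> Fraction:
    """Lazy-regime exponent from the abstract: d / (3d - 2)."""
    return Fraction(d, 3 * d - 2)


def beta_feature(d: int, d_par: int) -> Fraction:
    """Feature-regime exponent from the abstract.

    d     : input dimension
    d_par : number of informative directions (1 <= d_par < d)
    """
    if not 1 <= d_par < d:
        raise ValueError("expected 1 <= d_par < d")
    if d_par == 1:
        # Stripe-like case: (2d - 1) / (3d - 2)
        return Fraction(2 * d - 1, 3 * d - 2)
    # General case: (d + d_perp / 2) / (3d - 2), with d_perp = d - d_par
    d_perp = d - d_par
    return (d + Fraction(d_perp, 2)) / (3 * d - 2)


if __name__ == "__main__":
    for d, d_par in [(3, 1), (3, 2), (10, 1), (10, 2)]:
        print(d, d_par, beta_lazy(d), beta_feature(d, d_par))
```

Exact fractions are used so that the two regimes can be compared without rounding; for d = 3, for instance, the lazy exponent is 3/7 while the feature exponents are 5/7 (one informative direction) and 1/2 (two informative directions).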
- Is Part Of:
- Journal of statistical mechanics. Issue 4(2021:Apr.)
- Journal:
- Journal of statistical mechanics
- Issue:
- Issue 4(2021:Apr.)
- Issue Display:
- Volume 1000076, Issue 4 (2021)
- Year:
- 2021
- Volume:
- 1000076
- Issue:
- 4
- Issue Sort Value:
- 2021-1000076-0004-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-04-26
- Subjects:
- deep learning -- learning theory -- machine learning
Statistical mechanics -- Periodicals
Mechanics -- Statistical methods -- Periodicals
530.1305
- Journal URLs:
- http://ioppublishing.org/
- DOI:
- 10.1088/1742-5468/abf1f3
- Languages:
- English
- ISSNs:
- 1742-5468
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 18145.xml