Locality defeats the curse of dimensionality in convolutional teacher–student scenarios (1st November 2022)
* This article is an updated version of: Favero A, Cagnetta F and Wyart M 2021 Locality defeats the curse of dimensionality in convolutional teacher–student scenarios, Advances in Neural Information Processing Systems vol 34, ed M Ranzato, A Beygelzimer, Y Dauphin, P S Liang and J Wortman Vaughan (New York: Curran Associates) pp 9456–67.

Record Type:: Journal Article
Title:: Locality defeats the curse of dimensionality in convolutional teacher–student scenarios (1st November 2022)
Main Title:: Locality defeats the curse of dimensionality in convolutional teacher–student scenarios
Authors:: Favero, Alessandro
Cagnetta, Francesco
Wyart, Matthieu
Abstract:: Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher–student framework for kernel regression, using 'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent β (that relates the test error $\epsilon_t \sim P^{-\beta}$ to the size of the training set P), whereas translational invariance is not. In particular, if the filter size of the teacher t is smaller than that of the student s, β is a function of s only and does not depend on the input dimension. We confirm our predictions on β empirically. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
Is Part Of:: Journal of statistical mechanics. (2022:Nov.)
Journal:: Journal of statistical mechanics
Issue:: (2022:Nov.)
Issue Display:: Volume 1000095 (2022)
Year:: 2022
Volume:: 1000095
Issue Sort Value:: 2022-1000095-0000-0000
Page Start:
Page End:
Publication Date:: 2022-11-01
Subjects:: analysis of algorithms -- deep learning -- learning theory -- machine learning
Statistical mechanics -- Periodicals
Mechanics -- Statistical methods -- Periodicals
530.1305
Journal URLs:: http://ioppublishing.org/
DOI:: 10.1088/1742-5468/ac98ab
Languages:: English
ISSNs:: 1742-5468
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 24479.xml