WHY ARE CONVOLUTIONAL NETS MORE SAMPLE-EFFICIENT THAN FULLY-CONNECTED NETS?

Zhiyuan Li, Yi Zhang, Sanjeev Arora

Research output: Contribution to conferencePaperpeer-review

Abstract

Convolutional neural networks often dominate fully-connected counterparts in generalization performance, especially on image classification tasks. This is often explained in terms of “better inductive bias.” However, this has not been made mathematically rigorous, and the hurdle is that the sufficiently wide fully-connected net can always simulate the convolutional net. Thus the training algorithm plays a role. The current work describes a natural task on which a provable sample complexity gap can be shown, for standard training algorithms. We construct a single natural distribution on Rd × {±1} on which any orthogonal-invariant algorithm (i.e. fully-connected networks trained with most gradient-based methods from gaussian initialization) requires Ω(d2) samples to generalize while O(1) samples suffice for convolutional architectures. Furthermore, we demonstrate a single target function, learning which on all possible distributions leads to an O(1) vs Ω(d2/ε) gap. The proof relies on the fact that SGD on fully-connected network is orthogonal equivariant. Similar results are achieved for ℓ2 regression and adaptive training algorithms, e.g. Adam and AdaGrad, which are only permutation equivariant.

Original languageAmerican English
StatePublished - 2021
Event9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online
Duration: May 3 2021May 7 2021

Conference

Conference9th International Conference on Learning Representations, ICLR 2021
CityVirtual, Online
Period5/3/215/7/21

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language

Cite this