











Code [GitHub] 
Paper [arXiv] 
Cite [BibTeX] 
Insight 1: The volume of low-rank parameters increases as a function of the number of layers.
There is more probability mass on lower-rank solutions as more layers are added. The effective rank is computed on the effective weight matrix for linear networks and on the kernel for nonlinear networks.
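To illustrate how depth skews the spectrum, here is a minimal NumPy sketch (my own illustration, not the paper's code) comparing the effective rank, measured as the exponential of the entropy of the normalized singular values, of a single random weight matrix against the effective weight of a deep linear stack:

```python
import numpy as np

def effective_rank(W):
    """Effective rank: exponential of the entropy of the
    normalized singular value distribution."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

rng = np.random.default_rng(0)
dim, depth = 32, 8

# Effective weight of a deep linear network: the product of its layer weights.
W_eff = np.eye(dim)
for _ in range(depth):
    W_eff = rng.standard_normal((dim, dim)) / np.sqrt(dim) @ W_eff

shallow = rng.standard_normal((dim, dim)) / np.sqrt(dim)
print(effective_rank(shallow), effective_rank(W_eff))
```

At random initialization the deep product already has a much lower effective rank than a single layer, consistent with the insight above.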
Insight 2: The parameterization of the network ultimately determines which solution the model will converge to.
In a low-rank underdetermined regime, models that reach the same training error can have very different test errors. Networks that are too shallow or too deep perform suboptimally. Conversely, if the underlying solution is full-rank, deep models fail to converge.
Insight 3: Linear overparameterization of nonlinear networks can be used to improve generalization performance.
Linear overparameterization induces low-rank weights without increasing modeling capacity. The figure below shows the singular values of a CNN throughout training for the original (left) and linearly overparameterized (right) models. The overparameterized model overfits less, with lower training accuracy but higher testing accuracy.
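The "no extra capacity" claim follows from the fact that consecutive bias-free linear layers collapse into a single matrix product. A minimal PyTorch sketch of this identity (independent of the overparam package):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A depth-3 stack of bias-free linear layers...
expanded = nn.Sequential(
    nn.Linear(32, 32, bias=False),
    nn.Linear(32, 32, bias=False),
    nn.Linear(32, 64, bias=False),
)

# ...computes the same function as one layer with the product weight:
# y = x W0^T W1^T W2^T = x (W2 W1 W0)^T
collapsed = nn.Linear(32, 64, bias=False)
with torch.no_grad():
    collapsed.weight.copy_(
        expanded[2].weight @ expanded[1].weight @ expanded[0].weight
    )

x = torch.randn(5, 32)
print(torch.allclose(expanded(x), collapsed(x), atol=1e-5))  # True
```

The two models are functionally identical, but gradient descent on the expanded parameterization is biased toward lower-rank effective weights.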
>>> git clone https://github.com/minyoungg/overparam
>>> cd overparam
>>> pip install .
Integrate into your existing PyTorch codebase
from overparam import OverparamLinear, OverparamConv2d
# overparameterized nn.Linear layer
layer = OverparamLinear(32, 32, depth=4)
# overparameterized nn.Conv2d layer (3 layers with 3x3, 3x3, 1x1 kernels)
layer = OverparamConv2d(32, 64, kernel_sizes=(3, 3, 1), stride=1, padding=1)
Automatically linearly overparameterize existing models
import torchvision.models as models
from overparam.utils import overparameterize
model = models.alexnet()
model = overparameterize(model, depth=2)
We would like to thank Anurag Ajay, Lucy Chai, Tongzhou Wang, and Yen-Chen Lin for reading over the manuscript, and Jeffrey Pennington and Alexei A. Efros for fruitful discussions.
Minyoung Huh is funded by DARPA Machine Common Sense and MIT STL. Brian Cheung is funded by an MIT BCS Fellowship.
This research was also partly sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000.
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government.
The U.S. Government is authorized to reproduce and distribute reprints for Government purposes, notwithstanding any copyright notation herein.
Website template edited from Colorful Colorization.