TL;DR: We prove generalization error bounds for noisy gradient methods such as SGLD, continuous Langevin dynamics, and noisy momentum methods.
Abstract: Generalization error (also known as the out-of-sample error) measures how well the hypothesis learned from training data generalizes to previously unseen data. Proving tight generalization error bounds is a central question in statistical learning theory. In this paper, we obtain generalization error bounds for learning general non-convex objectives, a problem that has attracted significant attention in recent years. We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. The new framework combines ideas from both the PAC-Bayesian theory and the notion of algorithmic stability. Applying the Bayes-Stability method, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation for the intriguing phenomenon observed in Zhang et al. (2017a). We also study the setting where the total loss is the sum of a bounded loss and an additional ℓ2 regularization term. We obtain new generalization bounds for continuous Langevin dynamics in this setting by developing a new Log-Sobolev inequality for the parameter distribution at any time. Our new bounds are more desirable when the noise level of the process is not very small, and do not become vacuous even as T tends to infinity.
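As background for the abstract, the SGLD update that the bounds apply to perturbs each mini-batch gradient step with isotropic Gaussian noise. Below is a minimal illustrative sketch; the function name, step size `eta`, and inverse temperature `beta` are placeholders for exposition, not the paper's experimental configuration.

```python
import numpy as np

def sgld_step(w, grad_fn, batch, eta=1e-3, beta=1e4, rng=None):
    """One SGLD update: a mini-batch gradient step plus Gaussian noise.

    w       : current parameters (np.ndarray)
    grad_fn : maps (w, batch) to the mini-batch gradient of the empirical loss
    eta     : step size
    beta    : inverse temperature; larger beta means less injected noise
    """
    if rng is None:
        rng = np.random.default_rng()
    g = grad_fn(w, batch)
    # Noise scale sqrt(2 * eta / beta) makes the iterates track Langevin dynamics.
    noise = rng.standard_normal(w.shape) * np.sqrt(2.0 * eta / beta)
    return w - eta * g + noise
```

Setting the noise term to zero recovers plain mini-batch SGD; the injected noise is what makes the algorithm-dependent stability analysis in the paper applicable.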
Keywords: learning theory, generalization, nonconvex learning, stochastic gradient descent, Langevin dynamics
Data: [MNIST](https://paperswithcode.com/dataset/mnist)