Deep Learning Dynamics at Edge of Stability

By Avrajit Ghosh

Posted by SLIM on May 23, 2025 · 1 min read

Abstract

This work analyzes why deep networks trained with large learning rates often exhibit oscillating loss or fail to converge. We focus on the “Edge of Stability” regime, where gradient descent no longer decreases the loss monotonically and the training dynamics can transition to chaos. Using deep matrix factorization as a testbed, we show how the loss oscillations arise within a low-dimensional space and explain how they depend on the learning rate and network depth. The results shed light on when and why instability emerges during training.
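
To make the setup concrete, below is a minimal sketch (not the paper's code) of the kind of experiment the abstract describes: full-batch gradient descent on a deep matrix factorization loss with a deliberately large step size, while tracking the loss and a power-iteration estimate of the sharpness (largest Hessian eigenvalue). The depth, matrix size, learning rate, and random target are illustrative assumptions; at the Edge of Stability the sharpness is expected to hover near 2/lr while the loss oscillates.

```python
# Illustrative sketch only: parameters below are assumptions, not the paper's settings.
import torch

torch.manual_seed(0)

depth, dim = 3, 10        # number of factors and matrix size (illustrative)
lr, steps = 5e-2, 300     # deliberately large learning rate (illustrative)

# Random low-rank target matrix M.
M = torch.randn(dim, 2) @ torch.randn(2, dim)

# Deep factorization: product of `depth` square factors with small init.
Ws = [(0.3 * torch.randn(dim, dim)).requires_grad_() for _ in range(depth)]

def product(Ws):
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    return P

def loss_fn(Ws):
    # Squared Frobenius loss of the end-to-end product against the target.
    return 0.5 * (product(Ws) - M).pow(2).sum()

def sharpness(Ws, iters=20):
    # Estimate the largest Hessian eigenvalue via power iteration
    # on Hessian-vector products (autograd-based HVPs).
    v = [torch.randn_like(W) for W in Ws]
    lam = torch.tensor(1.0)
    for _ in range(iters):
        loss = loss_fn(Ws)
        grads = torch.autograd.grad(loss, Ws, create_graph=True)
        dot = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(dot, Ws)
        lam = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / lam for h in hv]
    return lam.item()

# Full-batch gradient descent at a large learning rate.
for t in range(steps):
    loss = loss_fn(Ws)
    grads = torch.autograd.grad(loss, Ws)
    with torch.no_grad():
        for W, g in zip(Ws, grads):
            W -= lr * g
    if t % 25 == 0:
        print(f"step {t:4d}  loss {loss.item():10.4f}  "
              f"sharpness ~ {sharpness(Ws):8.2f}  2/lr = {2 / lr:.1f}")
```

Depending on the initialization and learning rate, such a run may converge smoothly, settle into stable loss oscillations, or diverge; sweeping `lr` and `depth` is one way to observe the different regimes the abstract refers to.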


References

A. Ghosh, S. M. Kwon, R. Wang, S. Ravishankar, and Q. Qu, "Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability," in International Conference on Learning Representations (ICLR) 2025.