Samir ADLY: From Cauchy to Nesterov: why do some descents go (much) faster than others?
Why is classical gradient descent sometimes so painfully slow, while small modifications to the algorithm seem, as if by magic, to accelerate convergence dramatically? In this talk, I will take you on a journey that begins in 1847 with Cauchy's first systematic formulation of the gradient method and ends with modern accelerated methods such as Nesterov's.
The common thread will always be the same: behind every first-order algorithm lies a differential equation. By reading gradient methods as well-chosen discretizations of continuous dynamics, we can better understand where the acceleration comes from, what role the parameters really play, and why some schemes are stable and effective while others behave poorly.
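To fix ideas, here is a minimal sketch of the correspondence alluded to above; the step size $h > 0$ and the particular indexing are one common convention (that of Su, Boyd and Candès), not necessarily the one used in the talk. Plain gradient descent is the explicit Euler discretization of the gradient flow, while Nesterov's accelerated scheme can be read as a discretization of a second-order equation with vanishing damping:
\[
x_{k+1} = x_k - h\,\nabla f(x_k)
\quad\longleftrightarrow\quad
\dot{x}(t) = -\nabla f(x(t)),
\]
\[
y_k = x_k + \tfrac{k-1}{k+2}\,(x_k - x_{k-1}),\qquad
x_{k+1} = y_k - h\,\nabla f(y_k)
\quad\longleftrightarrow\quad
\ddot{x}(t) + \tfrac{3}{t}\,\dot{x}(t) + \nabla f(x(t)) = 0.
\]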
I will present this continuous/discrete approach using simple examples, focusing more on the ideas and phenomena than on the technical details of the proofs. Finally, I will discuss some extensions to non-smooth models and problems arising in imaging and learning, as well as several open questions about the true limits of acceleration. The talk is intended for a broad audience of mathematicians, computer scientists, and engineers familiar with the basic concepts of optimization and ordinary differential equations.
