Why Gradient Descent Zigzags and How Momentum Fixes It
Gradient descent has a fundamental limitation: on most real-world loss surfaces, it is inefficient. When the surface has uneven curvature, steep in one direction and flat in another (a common situation in practice), the algorithm struggles to make consistent progress. A high learning rate speeds movement along the flat direction but causes overshooting and oscillation along the steep one, while a low learning rate tames the oscillation at the cost of painfully slow progress along the flat direction. Momentum resolves this tension by accumulating an exponentially weighted running average of past gradients: oscillating gradient components along the steep direction cancel out, while consistent components along the flat direction add up, so the iterates move steadily toward the minimum instead of zigzagging.
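To make this concrete, here is a minimal sketch comparing plain gradient descent with heavy-ball momentum on an ill-conditioned quadratic. The loss function, starting point, learning rate, and momentum coefficient are illustrative assumptions chosen for this sketch, not values taken from the article.

```python
# Minimal sketch: plain gradient descent vs. heavy-ball momentum on an
# ill-conditioned quadratic. All constants below are illustrative choices.
import numpy as np

# f(x, y) = 0.5 * (10 * x**2 + 0.1 * y**2): steep along x, flat along y.
CURVATURES = np.array([10.0, 0.1])

def grad(p):
    """Gradient of the quadratic loss at point p."""
    return CURVATURES * p

def gradient_descent(p0, lr=0.15, steps=100):
    p = np.asarray(p0, dtype=float).copy()
    for _ in range(steps):
        p -= lr * grad(p)  # x overshoots and flips sign each step; y crawls
    return p

def momentum(p0, lr=0.15, beta=0.9, steps=100):
    p = np.asarray(p0, dtype=float).copy()
    v = np.zeros_like(p)
    for _ in range(steps):
        v = beta * v - lr * grad(p)  # oscillating components cancel out,
        p += v                       # consistent components accumulate
    return p

if __name__ == "__main__":
    start = (1.0, 5.0)
    print("plain GD :", gradient_descent(start))
    print("momentum :", momentum(start))
```

With these settings, plain gradient descent should still be far from the optimum along the flat y axis after 100 steps, while momentum lands much closer along both axes, illustrating the trade-off described above.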
