Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch
The Transformer’s attention mechanism has barely changed since 2017. Most efficiency work has tried to replace softmax attention outright. A new paper takes a different route. It keeps softmax attention and bolts on a correction branch. A team of researchers from Northwestern University, Tilde Research, and University of Washington introduce a parameterized Local Linear Attention […]










