
MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget

MiniMax released MSA (MiniMax Sparse Attention), a sparse attention method built directly on Grouped Query Attention (GQA). It targets one bottleneck: the quadratic cost of softmax attention at long context. The MiniMax research team tested it inside a 109B-parameter Mixture-of-Experts model trained with native multimodal data. They also open-sourced an inference kernel and shipped a


From leaked question papers to piracy and CSAM – why is Telegram always in trouble in one country or another?

While this is the first time Telegram has faced such a restriction in India, the platform launched by Russia-born Pavel Durov in August 2013 has come under bans or other punitive actions in several countries in the past.
