AI Paper Summary

Auto Added by WPeMatico

NVIDIA Introduces SANA-WM: A 2.6B-Parameter Open-Source World Model That Generates Minute-Scale 720p Video on a Single GPU

World models (systems that synthesize realistic video sequences from an initial image and a set of actions) are becoming central to embodied AI, simulation, and robotics research. The core challenge is scaling these systems to generate minute-long, high-resolution video without requiring prohibitively large clusters for both training and inference. Most competitive open-source baselines either require […]

NVIDIA Introduces SANA-WM: A 2.6B-Parameter Open-Source World Model That Generates Minute-Scale 720p Video on a Single GPU Read More »

Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models

Pre-training large language models is expensive enough that even modest efficiency improvements can translate into meaningful cost and time savings. Nous Research is releasing Token Superposition Training (TST), a method that substantially reduces pre-training wall-clock time at fixed compute without touching the model architecture, optimizer, tokenizer, parallelism strategy, or training data. At the 10B-A1B mixture-of-experts

Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models Read More »

Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds Accuracy of Models 23–90x Its Size

As LLM-powered applications move into production — and as AI agents take on more consequential tasks like browsing the web, writing and executing code, and interacting with external services — safety moderation has quietly become one of the most operationally expensive parts of the stack. Most developers who’ve deployed a production LLM system know the

Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds Accuracy of Models 23–90x Its Size Read More »

▶

Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon

Researchers at Tilde Research have released Aurora, a new optimizer for training neural networks that addresses a structural flaw in the widely-used Muon optimizer. The flaw quietly kills off a significant fraction of MLP neurons during training and keeps them permanently dead. Aurora comes with a 1.1B parameter pretraining experiment, a new state-of-the-art result on

Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon Read More »

Meta and Stanford Researchers Propose Fast Byte Latent Transformer That Reduces Inference Memory Bandwidth by Over 50% Without Tokenization

A team of researchers from Meta, Stanford University, and the University of Washington have introduced three new methods that substantially accelerate generation in the Byte Latent Transformer (BLT) — a language model architecture that operates directly on raw bytes instead of tokens. Byte-Level Models Are Slow at Inference To understand what this new research solves,

Meta and Stanford Researchers Propose Fast Byte Latent Transformer That Reduces Inference Memory Bandwidth by Over 50% Without Tokenization Read More »

Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs

Scaling large language models (LLMs) is expensive. Every token processed during inference and every gradient computed during training flows through feedforward layers that account for over two-thirds of model parameters and more than 80% of total FLOPs in larger models. A team researchers from Sakana AI and NVIDIA have worked on a new research that

Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs Read More »

NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing

Training a family of large language models (LLMs) has always come with a painful multiplier: every model variant in the family—whether 8B, 30B, or 70B—typically requires its own full training run, its own storage, and its own deployment stack. For a dev team running inference at scale, this means multiplying compute costs by the number

NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing Read More »

Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo

If you’ve ever watched a motion capture system struggle with a person’s fingers, or seen a segmentation model fail to distinguish teeth from gums, you already understand why human-centric computer vision is hard. Humans are not just objects, they come with articulated structure, fine surface details, and enormous variation in pose, clothing, lighting, and ethnicity.

Meta AI Releases Sapiens2: A High-Resolution Human-Centric Vision Model for Pose, Segmentation, Normals, Pointmap, and Albedo Read More »

DeepSeek AI Releases DeepSeek-V4: Compressed Sparse Attention and Heavily Compressed Attention Enable One-Million-Token Contexts

DeepSeek-AI has released a preview version of the DeepSeek-V4 series: two Mixture-of-Experts (MoE) language models built around one core challenge making one-million-token context windows practical and affordable at inference time. The series consists of DeepSeek-V4-Pro, with 1.6T total parameters and 49B activated per token, and DeepSeek-V4-Flash, with 284B total parameters and 13B activated per token.

DeepSeek AI Releases DeepSeek-V4: Compressed Sparse Attention and Heavily Compressed Attention Enable One-Million-Token Contexts Read More »

Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures

Most AI agents today have a fundamental amnesia problem. Deploy one to browse the web, resolve GitHub issues, or navigate a shopping platform, and it approaches every single task as if it has never seen anything like it before. No matter how many times it has stumbled on the same type of problem, it repeats

Google Cloud AI Research Introduces ReasoningBank: A Memory Framework that Distills Reasoning Strategies from Agent Successes and Failures Read More »