Language Model

New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8× Rollout Generation Speedup at 8B and Projects 2.5× End-to-End Speedup at 235B

If you have been running reinforcement learning (RL) post-training on a language model for math reasoning, code generation, or any verifiable task, you have almost certainly stared at a progress bar while your GPU cluster burns through rollout generation. A team of researchers from NVIDIA proposes a precise fix by integrating speculative decoding into the […]

New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8× Rollout Generation Speedup at 8B and Projects 2.5× End-to-End Speedup at 235B Read More »
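The core idea behind the speedup is simple: a small, cheap draft model proposes several tokens at a time, and the large target model only verifies them, accepting matching tokens in one pass instead of generating each one serially. The sketch below illustrates the accept/reject loop with toy deterministic "models" (everything here is a made-up stand-in, not the NeMo RL implementation); with greedy verification, the speculative output is guaranteed to match plain greedy decoding from the target model.

```python
def target_next(prefix):
    # toy "target model": a deterministic greedy next-token rule over a 50-token vocab
    return (sum(prefix) * 31 + 7) % 50

def draft_next(prefix):
    # toy "draft model": agrees with the target most of the time, but not always
    s = sum(prefix)
    return (s * 31 + 7) % 50 if s % 3 else (s + 1) % 50

def greedy_decode(prompt, n_tokens):
    # baseline: one target-model call per generated token
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(target_next(seq))
    return seq

def speculative_decode(prompt, n_tokens, k=4):
    seq = list(prompt)
    end = len(prompt) + n_tokens
    while len(seq) < end:
        # 1) the cheap draft model proposes k tokens autoregressively
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) the target verifies the proposals left to right
        #    (a single batched forward pass in a real system)
        for t in proposal:
            expected = target_next(seq)
            if t != expected:
                seq.append(expected)   # rejection: emit the target's own token
                break
            seq.append(t)              # acceptance: draft token kept "for free"
        else:
            seq.append(target_next(seq))  # all k accepted: one bonus token
    return seq[:end]
```

Because every emitted token is the target's greedy choice, `speculative_decode` produces exactly the same sequence as `greedy_decode`; the speedup in a real system comes from verifying k proposals in one target forward pass instead of k.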

Meta Introduces Autodata: An Agentic Framework That Turns AI Models into Autonomous Data Scientists for High-Quality Training Data Creation

The bottleneck in building better AI models has never been compute alone — it has always been data quality. Meta AI’s RAM (Reasoning, Alignment, and Memory) team is now addressing that bottleneck directly. Meta researchers have introduced Autodata, a framework that deploys AI agents in the role of an autonomous data scientist, tasked with iteratively […]

Meta Introduces Autodata: An Agentic Framework That Turns AI Models into Autonomous Data Scientists for High-Quality Training Data Creation Read More »

A Coding Guide on LLM Post Training with TRL from Supervised Fine Tuning to DPO and GRPO Reasoning

In this tutorial, we walk through a complete, hands-on journey of post-training large language models using the powerful TRL (Transformer Reinforcement Learning) library ecosystem. We start from a lightweight base model and progressively apply four key techniques: Supervised Fine-Tuning (SFT), Reward Modeling (RM), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Also, we […]

A Coding Guide on LLM Post Training with TRL from Supervised Fine Tuning to DPO and GRPO Reasoning Read More »
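Of the four techniques the tutorial covers, DPO has the most self-contained objective: given summed token log-probabilities of a chosen and a rejected completion under the policy and a frozen reference model, the loss is the negative log-sigmoid of a scaled margin. The sketch below computes it for a single preference pair in plain Python (the function name and example numbers are illustrative, not taken from TRL, whose `DPOTrainer` handles this internally):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probs of the chosen/rejected completions
    under the policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    # -log(sigmoid(beta * margin)), written via log1p for numerical stability
    return math.log1p(math.exp(-beta * margin))

# The policy prefers the chosen answer more strongly than the reference does,
# so the margin is positive and the loss falls below log(2).
loss = dpo_loss(pi_chosen=-5.0, pi_rejected=-10.0,
                ref_chosen=-6.0, ref_rejected=-8.0)
```

The loss decreases monotonically as the policy widens its preference gap relative to the reference, which is what drives the gradient during training.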

Qwen AI Releases Qwen-Scope: An Open-Source Sparse AutoEncoders (SAE) Suite That Turns LLM Internal Features into Practical Development Tools

Large language models are remarkably capable, yet frustratingly opaque. When a model misbehaves — generating responses in the wrong language, repeating itself endlessly, or refusing safe requests — developers have few tools for diagnosing why at the level of internal computations. That’s the problem Qwen-Scope is built to solve. Qwen Team […]

Qwen AI Releases Qwen-Scope: An Open-Source Sparse AutoEncoders (SAE) Suite That Turns LLM Internal Features into Practical Development Tools Read More »
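A sparse autoencoder decomposes a dense activation vector into a larger set of features, of which only a few fire at once: a linear encoder plus ReLU (with a negative bias pushing most features to exactly zero), a linear decoder that reconstructs the activation, and an L1 penalty that enforces sparsity. The toy example below (hand-picked tied weights, not Qwen-Scope's trained SAEs) shows a 3-dim activation explained by a single active feature out of 6:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical toy SAE: a 3-dim activation is decomposed into 6 overcomplete
# features; ReLU plus a negative encoder bias keeps most features at zero.
W_enc = [[ 1, 0, 0], [0,  1, 0], [0, 0,  1],
         [-1, 0, 0], [0, -1, 0], [0, 0, -1]]   # 6 x 3 encoder
W_dec = [[1, 0, 0, -1, 0, 0],
         [0, 1, 0, 0, -1, 0],
         [0, 0, 1, 0, 0, -1]]                  # 3 x 6 decoder (tied transpose)
b_enc = [-0.5] * 6

h = [1.0, 0.0, 0.0]                            # activation to explain
features = relu([a + b for a, b in zip(matvec(W_enc, h), b_enc)])
h_hat = matvec(W_dec, features)                # reconstruction of h
l1 = sum(features)                             # sparsity penalty term
```

Only one of the six features is nonzero here, which is the property that makes SAE features usable as interpretable, steerable handles on model behavior.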

Moonshot AI Open-Sources FlashKDA: CUTLASS Kernels for Kimi Delta Attention with Variable-Length Batching and H20 Benchmarks

The team behind Kimi.ai (Moonshot AI) just made a significant contribution to the open-source AI infrastructure space. They released FlashKDA (Flash Kimi Delta Attention), a high-performance CUTLASS-based kernel implementation of the Kimi Delta Attention (KDA) mechanism. The FlashKDA library is available […]

Moonshot AI Open-Sources FlashKDA: CUTLASS Kernels for Kimi Delta Attention with Variable-Length Batching and H20 Benchmarks Read More »

Microsoft Research’s World-R1 Uses Flow-GRPO and 3D-Aware Rewards to Inject Geometric Consistency Into Wan 2.1 Without Architectural Changes

Video foundation models can paint a beautiful frame. They are still notoriously bad at remembering it. Push the camera through a corridor in Wan 2.1 or CogVideoX and walls warp, objects morph, and details vanish — the giveaway that these models are fitting 2D pixel correlations rather than simulating a coherent 3D scene. A team […]

Microsoft Research’s World-R1 Uses Flow-GRPO and 3D-Aware Rewards to Inject Geometric Consistency Into Wan 2.1 Without Architectural Changes Read More »

IBM Releases Two Granite Speech 4.1 2B Models: Autoregressive ASR with Translation and Non-Autoregressive Editing for Fast Inference

IBM released two new open speech recognition models — Granite Speech 4.1 2B and Granite Speech 4.1 2B-NAR — and they make a compelling case for what a ~2B-parameter speech model can do. Both are available on Hugging Face under the Apache 2.0 license. The pair targets a specific problem that enterprise AI teams know well: […]

IBM Releases Two Granite Speech 4.1 2B Models: Autoregressive ASR with Translation and Non-Autoregressive Editing for Fast Inference Read More »

Top 10 KV Cache Compression Techniques for LLM Inference: Reducing Memory Overhead Across Eviction, Quantization, and Low-Rank Methods

As large language models scale to longer context windows and serve more concurrent users, the key-value (KV) cache has emerged as a primary memory bottleneck in production inference systems. For a 30-billion-parameter model with a batch size of 128 and an input length of 1,024 tokens, the resulting KV cache can occupy up to 180 […]

Top 10 KV Cache Compression Techniques for LLM Inference: Reducing Memory Overhead Across Eviction, Quantization, and Low-Rank Methods Read More »
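The KV cache footprint the excerpt alludes to follows from simple arithmetic: per layer, the keys and the values each store one tensor of shape [batch, sequence length, hidden size]. The sketch below works through one plausible dense ~30B configuration (60 layers, hidden size 6144, fp16 — assumed for illustration, not taken from the article):

```python
def kv_cache_bytes(n_layers, batch, seq_len, hidden, dtype_bytes=2):
    # per layer, K and V each store a [batch, seq_len, hidden] tensor
    return 2 * n_layers * batch * seq_len * hidden * dtype_bytes

# hypothetical dense ~30B config: 60 layers, hidden size 6144, fp16
gib = kv_cache_bytes(n_layers=60, batch=128, seq_len=1024, hidden=6144) / 2**30
print(gib)  # 180.0 GiB for this assumed configuration
```

Every compression technique in the survey attacks one factor of this product: eviction shrinks the effective `seq_len`, quantization shrinks `dtype_bytes`, and low-rank methods shrink the per-token `hidden` dimension.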

Meet Talkie-1930: A 13B Open-Weight LLM Trained on Pre-1931 English Text for Historical Reasoning and Generalization Research

What if a language model had never heard of the internet, smartphones, or even World War II? That’s not a hypothetical — it’s exactly what a team of researchers led by Nick Levine, David Duvenaud, and Alec Radford has built. They call it talkie, and it may be the most historically disciplined large language model […]

Meet Talkie-1930: A 13B Open-Weight LLM Trained on Pre-1931 English Text for Historical Reasoning and Generalization Research Read More »