New Releases

Auto Added by WPeMatico

Meta and Stanford Researchers Propose Fast Byte Latent Transformer That Reduces Inference Memory Bandwidth by Over 50% Without Tokenization

A team of researchers from Meta, Stanford University, and the University of Washington have introduced three new methods that substantially accelerate generation in the Byte Latent Transformer (BLT) — a language model architecture that operates directly on raw bytes instead of tokens. Byte-Level Models Are Slow at Inference To understand what this new research solves, […]

Meta and Stanford Researchers Propose Fast Byte Latent Transformer That Reduces Inference Memory Bandwidth by Over 50% Without Tokenization Read More »

Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs

Scaling large language models (LLMs) is expensive. Every token processed during inference and every gradient computed during training flows through feedforward layers that account for over two-thirds of model parameters and more than 80% of total FLOPs in larger models. A team researchers from Sakana AI and NVIDIA have worked on a new research that

Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs Read More »

NVIDIA AI Just Released cuda-oxide: An Experimental Rust-to-CUDA Compiler Backend that Compiles SIMT GPU Kernels Directly to PTX

NVIDIA AI researchers recently released cuda-oxide, an experimental compiler that allows developers to write CUDA SIMT (Single Instruction, Multiple Threads) GPU kernels in standard Rust code. The project compiles Rust directly to PTX (Parallel Thread Execution) — the assembly-like intermediate representation that CUDA uses to target NVIDIA GPUs — without requiring domain-specific languages, foreign function

NVIDIA AI Just Released cuda-oxide: An Experimental Rust-to-CUDA Compiler Backend that Compiles SIMT GPU Kernels Directly to PTX Read More »

NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing

Training a family of large language models (LLMs) has always come with a painful multiplier: every model variant in the family—whether 8B, 30B, or 70B—typically requires its own full training run, its own storage, and its own deployment stack. For a dev team running inference at scale, this means multiplying compute costs by the number

NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing Read More »

💡

Meet GitHub Spec-Kit: An Open Source Toolkit for Spec-Driven Development with AI Coding Agents

If you have spent time using AI coding agents — GitHub Copilot, Claude Code, Gemini CLI — you have probably run into this situation: you describe what you want, the agent generates a block of code that looks correct, compiles, and then subtly misses the actual intent. This “vibe-coding” approach can work for quick prototypes

Meet GitHub Spec-Kit: An Open Source Toolkit for Spec-Driven Development with AI Coding Agents Read More »

⚠

OpenAI Adds Chrome Extension to Codex, Letting Its AI Agent Access LinkedIn, Salesforce, Gmail, and Internal Tools via Signed-In Sessions

OpenAI has launched a Codex Chrome extension for Mac and PC to streamline browser-based workflows that were previously difficult to handle via APIs or plugins. This release follows a trend where most users preferred working in a browser after the launch of “Computer Use,” allowing Codex to operate more effectively across various web-based tasks. What

OpenAI Adds Chrome Extension to Codex, Letting Its AI Agent Access LinkedIn, Salesforce, Gmail, and Internal Tools via Signed-In Sessions Read More »

Meta FAIR Releases NeuralSet: A Python Package for Neuro-AI That Supports fMRI, M/EEG, Spikes, and HuggingFace Embeddings

Researchers at Meta’s FAIR lab have released NeuralSet, a Python framework designed to eliminate one of the most persistent bottlenecks in Neuro-AI research: the painful, fragmented process of getting brain data into a deep learning pipeline. https://kingjr.github.io/files/neuralset.pdf The Problem: Neuroscience Data Is Stuck in the Pre-Deep-Learning Era Neuroscience already has excellent, battle-tested software. Tools like

Meta FAIR Releases NeuralSet: A Python Package for Neuro-AI That Supports fMRI, M/EEG, Spikes, and HuggingFace Embeddings Read More »

smol-audio: A Colab-Friendly Notebook Collection for Fine-Tuning Whisper, Parakeet, Voxtral, Granite Speech, and Audio Flamingo 3

Audio AI has had a breakout year. Automatic speech recognition has gotten dramatically better with models like OpenAI’s Whisper variants, NVIDIA’s Parakeet, and Mistral’s Voxtral. Audio understanding stepped forward with models like NVIDIA’s Audio Flamingo 3. Dialogue-grade text-to-speech arrived via Nari Labs’ Dia-1.6B. And Meta shipped the Perception Encoder Audiovisual (PE-AV), a multimodal encoder capable

smol-audio: A Colab-Friendly Notebook Collection for Fine-Tuning Whisper, Parakeet, Voxtral, Granite Speech, and Audio Flamingo 3 Read More »

Poolside AI Introduces Laguna XS.2 and M.1: Agentic Coding Models Reaching 68.2% and 72.5% on SWE-bench Verified

Poolside AI released the first two models in its Laguna family: Laguna M.1 and Laguna XS.2. Alongside these, the company is releasing pool — a lightweight terminal-based coding agent and a dual Agent Client Protocol (ACP) client-server — the same environment Poolside uses internally for agent RL training and evaluation, now available as a research

Poolside AI Introduces Laguna XS.2 and M.1: Agentic Coding Models Reaching 68.2% and 72.5% on SWE-bench Verified Read More »

OpenAI Releases Privacy Filter: A 1.5B-Parameter Open-Source PII Redaction Model with 50M Active Parameters

OpenAI just quietly dropped something worth paying close attention to. Released on Hugging Face under an Apache 2.0 license, Privacy Filter is an open, bidirectional token-classification model purpose-built for detecting and redacting personally identifiable information (PII) in text. It is small enough to run in a web browser or on a laptop and fast enough

OpenAI Releases Privacy Filter: A 1.5B-Parameter Open-Source PII Redaction Model with 50M Active Parameters Read More »