Tech News

Auto Added by WPeMatico

Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch

The Transformer’s attention mechanism has barely changed since 2017. Most efficiency work has tried to replace softmax attention outright. A new paper takes a different route. It keeps softmax attention and bolts on a correction branch. A team of researchers from Northwestern University, Tilde Research, and University of Washington introduce a parameterized Local Linear Attention […]

Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch Read More »

Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughput Gain

Trajectory’s concurrent multi-LoRA stack reports a 2.81× experiment-throughput gain over single-tenant RL, with all code in the NovaSky-AI/SkyRL GitHub repository. Most language models improve in discontinuous jumps. A team collects data, trains, and ships a new version. This takes months and produces remarkable or catastrophic behavior for users. Trajectory wants to replace that cycle with

Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughput Gain Read More »

Genesis AI Releases Nyx, Quadrants, and Genesis World 1.0 Physics Platform for Scalable Robotics Foundation Model Evaluation

Genesis AI released Genesis World 1.0. The platform consists of four components: the Genesis World physics engine, Nyx (a real-time path-traced renderer), Quadrants (a Python-to-GPU compiler), and a simulation interface. It is designed to accelerate robotics foundation model development through simulation-based evaluation. Robotics model development has two bottlenecks: data and iteration speed. The field has

Genesis AI Releases Nyx, Quadrants, and Genesis World 1.0 Physics Platform for Scalable Robotics Foundation Model Evaluation Read More »

Hermes Agent Ships Tool Search for MCP: Anthropic Evals Show 49% to 74% Accuracy Gain on Opus 4

Nous Research’s open-source Hermes Agent now ships a Tool Search feature. It directly addresses a growing bottleneck in AI agent systems: too many MCP tools filling up the context window. In this explainer article, we will breaks down what Tool Search does, how it works, and when to use it. The Problem: MCP Tools Are

Hermes Agent Ships Tool Search for MCP: Anthropic Evals Show 49% to 74% Accuracy Gain on Opus 4 Read More »

NVIDIA Introduces X-Token: Projection-Guided Cross-Tokenizer KD That Outperforms GOLD by +3.82 Average Points on Llama-3.2-1B

Knowledge distillation (KD) transfers “dark knowledge” from a large teacher model to a smaller student. The student learns from the teacher’s full output probability distribution over tokens, not just correct answers. This is done via per-position Kullback–Leibler (KL) divergence over next-token probability distributions. This formulation requires a shared tokenizer. A practitioner committed to Llama-3.2-1B cannot

NVIDIA Introduces X-Token: Projection-Guided Cross-Tokenizer KD That Outperforms GOLD by +3.82 Average Points on Llama-3.2-1B Read More »

StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

StepFun today released Step 3.7 Flash, a multimodal Mixture-of-Experts model targeting agentic use cases. It adds native vision input and improved tool-use reliability over Step 3.5 Flash. What is Step 3.7 Flash? Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model. It pairs a 196B-parameter language backbone with a 1.8B-parameter vision encoder (ViT)

StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows Read More »

⚡

Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication

GPU communication overhead is a measurable bottleneck in production AI workloads. According to data cited by the mKernel project, communication can consume 43.6% of the forward pass and 32% of end-to-end training time. Across popular Mixture-of-Experts (MoE) models, inter-device communication can account for up to 47% of total execution time. Researchers from UC Berkeley’s UCCL

Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication Read More »

Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights

Most AI agents stop improving once a human stops tuning them. The model is fixed. The scaffold around it is fixed. Hexo Labs wants to move both at once. It released SIA (Self-Improving AI) this week as an open-source framework under an MIT license. The core claim of this research is narrow but concrete. SIA

Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights Read More »

Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters

Liquid AI just shipped LFM2.5-8B-A1B. It is an on-device Mixture-of-Experts (MoE) model built for tool calling. The model holds 8.3B total parameters but activates only 1.5B per token. That sparsity is what lets it run on consumer hardware. The release follows LFM2-8B-A1B, which Liquid AI team published earlier. LFM2.5 is a new family of hybrid

Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters Read More »

Anthropic Ships Claude Opus 4.8 Alongside Dynamic Workflows and Cheaper Fast Mode, With Workflows Capped at 1,000 Subagents

Anthropic just launched Claude Opus 4.8. Also, there two Claude Code updates shipped with it. Dynamic workflows run many subagents in parallel. Fast mode now supports Opus 4.8 at a lower price. Both are research previews. What Dynamic Workflows Actually Are A dynamic workflow is a JavaScript script that orchestrates subagents at scale. Claude writes

Anthropic Ships Claude Opus 4.8 Alongside Dynamic Workflows and Cheaper Fast Mode, With Workflows Capped at 1,000 Subagents Read More »