Tech News

Auto Added by WPeMatico

OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric

Most biology benchmarks ask narrow, fact-based questions with clean answers. Scientists weigh imperfect evidence and make decisions. OpenAI released LifeSciBench and it targets that gap directly. Even the strongest model passes roughly one task in three. The benchmark is far from saturated. What is LifeSciBench LifeSciBench contains 750 expert-authored tasks. They span seven workflows and […]

OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric Read More »

Vercel Releases Eve: An Open-Source AI Agent Framework Where Each Agent is a Directory of Files Mapped to Capabilities

Vercel has released eve, an open-source framework for building, running, and scaling agents. The project is published as the npm package eve, licensed under Apache-2.0. Building an agent should mean defining what it does. It should not mean assembling all the plumbing that an agent needs to run in production. eve is the framework Vercel

Vercel Releases Eve: An Open-Source AI Agent Framework Where Each Agent is a Directory of Files Mapped to Capabilities Read More »

MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget

MiniMax released MSA (MiniMax Sparse Attention), a sparse attention method built directly on Grouped Query Attention (GQA). It targets one bottleneck: the quadratic cost of softmax attention at long context. The MiniMax research team tested it inside a 109B-parameter Mixture-of-Experts model trained with native multimodal data. They also open-sourced an inference kernel and shipped a

MiniMax Sparse Attention (MSA): a Two-Branch Block-Sparse Attention Trained on a 109B-Parameter MoE With a 3T-Token Budget Read More »

OpenAI’s Deployment Simulation Extends Pre-Deployment Risk Assessment to Agentic Coding Through Simulated Tool Calls

OpenAI published a new pre-deployment safety method called Deployment Simulation. The idea is direct. Before a model ships, simulate its deployment first. Replay past conversations through the new candidate model. Then study how it behaves in realistic contexts. OpenAI already uses insights from the method during model development. It has informed mitigations and deployment decisions,

OpenAI’s Deployment Simulation Extends Pre-Deployment Risk Assessment to Agentic Coding Through Simulated Tool Calls Read More »

🪽

Hermes Agent Adds Asynchronous Subagents, So Delegated Work No Longer Blocks the Parent Chat

Nous Research has shipped a change to Hermes Agent. Its delegate tool can now run subagents asynchronously. Per the announcement, delegated work no longer blocks the parent chat. Hermes Agent is an open-source personal agent from Nous Research. A parent agent can spawn child agents, called subagents, to fan out work. Until now, that delegation

Hermes Agent Adds Asynchronous Subagents, So Delegated Work No Longer Blocks the Parent Chat Read More »

Meet Atoms: A Vibe Coding Tool That Uses AI Agents to Build, Deploy, and Market Your App (No Code)

The concept of vibe coding is interesting; you don’t need to be a developer or software engineer to build your own applications. You can describe your idea to an AI in plain language, and it will build, edit, and refine your applications so you don’t have to write code line by line. It sounds simple

Meet Atoms: A Vibe Coding Tool That Uses AI Agents to Build, Deploy, and Market Your App (No Code) Read More »

Google Cloud Introduces Open Knowledge Format (OKF): A Vendor-Neutral Markdown Spec for Giving AI Agents Curated Context

Foundation models keep getting stronger, yet they still stall on the same thing: context. A model can write code or analyze a dataset, but only with the right internal knowledge. That knowledge includes table schemas, metric definitions, runbooks, join paths and it lives scattered across catalogs, wikis, and a few senior engineers’ heads. Google Cloud

Google Cloud Introduces Open Knowledge Format (OKF): A Vendor-Neutral Markdown Spec for Giving AI Agents Curated Context Read More »

Sakana AI Commercializes AB-MCTS in Sakana Marlin, an Enterprise Agent Generating Up to 100-Page Research Reports With Slides

Tokyo-based Sakana AI shipped its first commercial product ‘Sakana Marlin’ this week. Sakana team positions it as a Virtual CSO (Chief Strategy Officer). It is a B2B autonomous research agent built for enterprises. Marlin does not answer in seconds like a chatbot. You give it one research topic. It then runs autonomously for up to

Sakana AI Commercializes AB-MCTS in Sakana Marlin, an Enterprise Agent Generating Up to 100-Page Research Reports With Slides Read More »

Meet Flash-KMeans: An IO-Aware, Exact K-Means That Runs Over 200× Faster Than FAISS on GPUs

k-means has been an offline tool for decades. You run it once to preprocess data, then move on. A team of researchers from UC Berkeley and UT Austin released Flash-KMeans, a new open-source library that targets a different setting. Modern AI pipelines now call k-means inside training and inference loops. At that frequency, latency per

Meet Flash-KMeans: An IO-Aware, Exact K-Means That Runs Over 200× Faster Than FAISS on GPUs Read More »

Z.ai Launches GLM-5.2 With a Usable 1M-Token Context, Two Thinking-Effort Levels, and No Benchmarks at Launch

GLM-5.2 is the latest large language model from Z.ai, becoming the third major release in the GLM-5 line. It follows GLM-5 (February 11), GLM-5-Turbo (March 15), and GLM-5.1 (April 7). That makes four flagship-tier coding releases in roughly four months. Usable 1M-Token Context Window GLM-5.2’s standout spec is a 1,000,000-token context window. Z.ai labels the

Z.ai Launches GLM-5.2 With a Usable 1M-Token Context, Two Thinking-Effort Levels, and No Benchmarks at Launch Read More »