AI Infrastructure

Auto Added by WPeMatico

Databricks Open-Sources Omnigent: A Meta-Harness That Composes, Governs, and Shares AI Agents Across Claude Code, Codex, and Pi

agentic ai, ai, AI (Artificial Intelligence), AI Agents, AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, New Releases, Open Source, Python, Software engineering, Staff, Tech News, Technology

Databricks released Omnigent, an open source ‘meta-harness’ for AI agents. The project ships under the Apache 2.0 license. The Databricks AI team built it with Neon. A harness is the wrapper around a model that turns it into an agent. Claude Code, Codex, and Pi are harnesses. Omnigent sits one level above them. It treats […]

Databricks Open-Sources Omnigent: A Meta-Harness That Composes, Governs, and Shares AI Agents Across Claude Code, Codex, and Pi Read More »

Nous Research Ships Hermes Agent Profile Builder: Identity, Model, Skills, and MCP Servers in One Dashboard Flow

agentic ai, ai, AI (Artificial Intelligence), AI Agents, AI Infrastructure, Applications, Artificial Intelligence, Editors Pick, Model Context Protocol (MCP), New Releases, Software engineering, Staff, Tech News

Nous Research has shipped a Profile Builder for Hermes Agent. It lives inside the project’s local web dashboard. Standing up a distinct agent used to mean several CLI steps. The builder now walks you through one guided flow. In that flow you define an agent’s identity. You pick a model and provider. You choose built-in

Nous Research Ships Hermes Agent Profile Builder: Identity, Model, Skills, and MCP Servers in One Dashboard Flow Read More »

Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, Artificial Intelligence, Editors Pick, Machine Learning, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

This week, Cohere AI team shipped its first developer-facing coding model named ‘North Mini Code‘. ‘North Mini Code’ is open-weight and focused at software engineers. It is a mixture-of-experts (MoE) model with 30B total parameters. Only 3B of those parameters activate per token. The release is positioned around “sovereign” AI. The idea is simple: run

Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding Read More »

Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation

ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, New Releases, Open Source, Software engineering, Staff, Tech News, Technology, Uncategorized

Google AI team including the Google DeepMind researchers have just released DiffusionGemma, an experimental open model for text generation. It uses text diffusion instead of standard autoregressive decoding. The model ships under a permissive Apache 2.0 license. Google positions it for devs and researchers exploring speed-critical, interactive local workflows. Examples include in-line editing, rapid iteration,

Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation Read More »

NVIDIA cuTile Python Tutorial: Building Tiled GPU Kernels for Vector Addition, Matrix Addition, and Matrix Multiplication in Colab

ai, AI (Artificial Intelligence), AI Infrastructure, Applications, Artificial Intelligence, Editors Pick, Staff, Technology, Tutorials

In this tutorial, we implement an advanced hands-on workflow for NVIDIA cuTile Python, a tile-based GPU programming interface for writing efficient CUDA-style kernels directly in Python. We start by preparing a Colab-friendly environment, checking the available GPU, driver, CUDA, and cuTile installations before running any kernel code. We then build tiled examples for vector addition,

NVIDIA cuTile Python Tutorial: Building Tiled GPU Kernels for Vector Addition, Matrix Addition, and Matrix Multiplication in Colab Read More »

A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work per Session vs 33 Seconds for Search

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, AI Paper Summary, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Software engineering, Staff, Tech News, Technology, Uncategorized

A new working research from Perplexity and Harvard offers field evidence on what AI agents do to knowledge work. It draws on production data from two Perplexity products: Search and Computer. The setup is a natural comparison. Search is a conversational answer engine. Computer is an agent that plans and executes tasks end to end.

A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work per Session vs 33 Seconds for Search Read More »

Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs

ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Machine Learning, New Releases, Software engineering, Staff, Tech News, Technology, Uncategorized

Inference speed is becoming a competitive metric for large language models. Xiaomi’s MiMo team just released MiMo-V2.5-Pro-UltraSpeed, built in collaboration with the TileRT systems group. It decodes faster than 1000 tokens per second on a 1-trillion-parameter model. Xiaomi team describes this as a first at trillion-parameter scale. Demos show generation peaks near 1200 tokens per

Xiaomi MiMo and TileRT Push a 1-Trillion-Parameter Model Past 1000 Tokens Per Second on Commodity GPUs Read More »

Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b

agentic ai, ai, AI (Artificial Intelligence), AI Agents, AI Infrastructure, AI Paper Summary, AI Shorts, Applications, Artificial Intelligence, Editors Pick, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

Most search agents are trained as policies over a growing transcript. The model decides how to search. It must also remember what it saw, which evidence matters, and which claims it checked. A team of researchers from University of Illinois Urbana-Champaign, UC Berkeley, and Chroma argues this asks too much. Reinforcement learning ends up optimizing

Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b Read More »

NVIDIA garak Tutorial: Build a Complete Defensive LLM Red-Teaming Workflow with Custom Probes and Detectors

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, Applications, Artificial Intelligence, Editors Pick, Security, Software engineering, Staff, Technology

In this tutorial, we analyze NVIDIA garak as a practical framework for defensive LLM red-teaming. We start by setting up Garak, then move through plugin discovery, dry runs, real-model scans, multi-probe evaluations, report analysis, custom probe creation, custom detector creation, and AVID export. Instead of running only a single scan, we use Garak end-to-end to

NVIDIA garak Tutorial: Build a Complete Defensive LLM Red-Teaming Workflow with Custom Probes and Detectors Read More »

Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, edge ai, Editors Pick, Language Model, Large Language Model, New Releases, Software engineering, Staff, Tech News, Technology

Google DeepMind released Quantization-Aware Training (QAT) checkpoints for the Gemma 4 family. The release targets local deployment on edge devices and consumer GPUs. It follows the Gemma 4 launch in April and a 12B model two days earlier. We compared the available Gemma 4 edge-model formats using only published numbers. The goal was simple. Show

Google DeepMind Releases Gemma 4 QAT Checkpoints: Q4_0 and a New Mobile Format Cut On-Device Memory Read More »