AI Infrastructure

Auto Added by WPeMatico

Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, Applications, Artificial Intelligence, Editors Pick, For Devs, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

Tencent has released TencentDB Agent Memory, an open-source memory system for AI agents. The project ships under the MIT license. It targets a problem familiar to anyone shipping long-horizon agents: context bloat and recall failure. It is symbolic short-term memory along with layered long-term memory. It integrates with OpenClaw as a plugin and with the […]

Tencent Open-Sources TencentDB Agent Memory: A 4-Tier Local Memory Pipeline for AI Agents Read More »

Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification

ai, AI (Artificial Intelligence), AI Infrastructure, AI Paper Summary, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Machine Learning, New Releases, Python, Software engineering, Staff, Tech News, Technology

Instruction-tuned language models refuse harmful requests. But which part of the model is actually responsible — and how does that mechanism get installed during training? A new research from Nous Research team takes a neuron-level look at this question. The Nous research team developed contrastive neuron attribution (CNA), a method that identifies the specific MLP

Nous Research Releases Contrastive Neuron Attribution (CNA): Sparse MLP Circuit Steering Without SAE Training or Weight Modification Read More »

Perplexity Open-Sources Bumblebee: A Read-Only Supply-Chain Scanner for Developer Endpoints

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, For Devs, Machine Learning, Model Context Protocol (MCP), New Releases, Open Source, Security, Software engineering, Staff, Tech News, Technology

Attackers increasingly target the packages, editor extensions, and AI tool configs on developer machines and not just production systems. Perplexity has open-sourced an internal tool it uses to address this problem. Perplexity released Bumblebee on GitHub. The tool is a read-only inventory collector for macOS and Linux developer endpoints. It is written entirely in Go

Perplexity Open-Sources Bumblebee: A Read-Only Supply-Chain Scanner for Developer Endpoints Read More »

What is a Forward Deployed Engineer: The AI Role OpenAI, Anthropic, and Google Are Hiring in 2026

ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, For Devs, Language Model, Software engineering, Staff, Tech News, Technology

What is a Forward Deployed Engineer? The term ‘Forward Deployed Engineer’ (FDE) sounds military. That is intentional. A Forward Deployed Engineer is a software engineer who works embedded with the customer’s technical and operational environment on-site, hybrid, remote, or inside a customer cloud or VPC, depending on the engagement. The FDE does not sit at

What is a Forward Deployed Engineer: The AI Role OpenAI, Anthropic, and Google Are Hiring in 2026 Read More »

Meet Turbovec: A Rust Vector Index with Python Bindings, and Built on Google’s TurboQuant Algorithm

ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Machine Learning, New Releases, Open Source, Python, Software engineering, Staff, Tech News, Technology, Uncategorized, Vector Database

Vector search underpins most retrieval-augmented generation (RAG) pipelines. At scale, it gets expensive. Storing 10 million document embeddings in float32 consumes 31 GB of RAM. For dev teams running local or on-premise inference, that number creates real constraints. A new open-source library called turbovec addresses this directly. It is a vector index written in Rust

Meet Turbovec: A Rust Vector Index with Python Bindings, and Built on Google’s TurboQuant Algorithm Read More »

NVIDIA AI Releases Nemotron-Labs-Diffusion: A Tri-Mode Language Model with 6× Tokens Per Forward Over Qwen3-8B

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, AI Paper Summary, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

NVIDIA researchers have released Nemotron-Labs-Diffusion, a language model family that unifies three decoding modes in one architecture. The model supports autoregressive (AR) decoding, diffusion-based parallel decoding, and self-speculation decoding. It is available in 3B, 8B, and 14B parameter sizes. The family includes base, instruct, and vision-language variants. Sequential Decoding Limits Throughput Standard autoregressive (AR) language

NVIDIA AI Releases Nemotron-Labs-Diffusion: A Tri-Mode Language Model with 6× Tokens Per Forward Over Qwen3-8B Read More »

Meet MemPrivacy: An Edge-Cloud Framework that Uses Local Reversible Pseudonymization to Protect User Data Without Breaking Memory Utility

ai, AI (Artificial Intelligence), AI Infrastructure, AI Paper Summary, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Embedding Model, Language Model, Large Language Model, Machine Learning, New Releases, Staff, Tech News, Technology

As LLM-powered agents move from research to production, one design tension is becoming harder to ignore: the more useful cloud-hosted memory becomes, the more private user data it exposes. Researchers from MemTensor (Shanghai), HONOR Device and Tongji University have introduced MemPrivacy, a framework that attempts to resolve this tension without sacrificing the utility that makes

Meet MemPrivacy: An Edge-Cloud Framework that Uses Local Reversible Pseudonymization to Protect User Data Without Breaking Memory Utility Read More »

Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

ai, AI (Artificial Intelligence), AI Infrastructure, AI Paper Summary, AI Shorts, Applications, Artificial Intelligence, Context Engineering, deep learning, Editors Pick, Language Model, Large Language Model, Machine Learning, Software engineering, Staff, Tech News, Technology

Training large language models on long sequences has a well-known problem: attention is expensive. The scaled dot-product attention (SDPA) at the core of every transformer scales quadratically Θ(N²) in both compute and memory with sequence length N. FlashAttention addressed this through IO-aware tiling that avoids materializing the full N×N attention matrix in high-bandwidth memory, reducing

Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context Read More »

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Artificial Intelligence, Editors Pick, Generative AI, Language Model, Machine Learning, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

Running AI agents in a local script is straightforward. Running them reliably in production across teams, across restarts, with isolated environments per context is a different problem entirely. BerriAI, the company behind the LiteLLM AI Gateway, is now open-sourcing a purpose-built answer to that problem: the LiteLLM Agent Platform. The platform is described as a

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production Read More »

Zyphra Releases ZAYA1-8B-Diffusion-Preview: The First MoE Diffusion Model Converted From an Autoregressive LLM With Up to 7.7x Speedup

ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, Software engineering, Staff, Tech News, Technology

Zyphra, the San Francisco-based AI lab behind the ZAYA1 model family, released ZAYA1-8B-Diffusion-Preview — a preview of its early work in diffusion-language models. The release demonstrates that an existing autoregressive language model can be converted into a discrete diffusion model with no systematic loss of evaluation performance, while delivering substantial inference speedups on AMD hardware.

Zyphra Releases ZAYA1-8B-Diffusion-Preview: The First MoE Diffusion Model Converted From an Autoregressive LLM With Up to 7.7x Speedup Read More »