AI Infrastructure

Kimi AI and kvcache-ai Open Sources ‘AgentENV’: A Distributed System that Powers Agentic Reinforcement Learning (RL) Training for Kimi K3

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Machine Learning, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

Moonshot AI’s Kimi team and kvcache-ai have open-sourced AgentENV (AENV), a distributed platform for running agent environments at scale. AgentENV powers agentic reinforcement learning (RL) training for Kimi K3, Moonshot’s 2.8-trillion-parameter Mixture-of-Experts model. The code ships under an MIT license. Why Environment Infra Holds Back Agentic RL: Agentic RL does not just sample text. It […]

Meet Open Dreamer: A JAX/Flax Reproduction of the Dreamer 4 World Model Pipeline, With the Full Training Recipe Published

ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Embedding Model, Language Model, Large Language Model, Machine Learning, New Releases, physical ai, Staff, Tech News, Technology

A small group of AI researchers (Reactor) have released Open Dreamer, an open implementation of the Dreamer 4 world-model pipeline written in JAX and Flax NNX. What actually shipped: Two repositories were released. next-state/open-dreamer holds the training pipeline: a causal video tokenizer, an action-conditioned latent dynamics model, rollout generation, and FVD scoring. reactor-team/open-dreamer holds a

Designing High-Performance GPU Kernels with TileLang: Tensor-Core GEMM, Fused Softmax, FlashAttention, and Autotuning

ai, AI (Artificial Intelligence), AI Infrastructure, Artificial Intelligence, Editors Pick, Staff, Technology, Tutorials

In this tutorial, we explore TileLang as a high-level Python domain-specific language for designing and compiling performance-oriented GPU kernels through TVM. We begin by validating the CUDA environment and establishing reusable benchmarking and numerical-verification utilities, then progressively implement vector addition, tiled tensor-core matrix multiplication, schedule exploration, fused GEMM epilogues, row-wise softmax, and FlashAttention. Throughout the

How to Build an End-to-End OCR Pipeline with Baidu’s Unlimited-OCR for High-Resolution Images and Multi-Page PDF Parsing

ai, AI (Artificial Intelligence), AI Infrastructure, Applications, Artificial Intelligence, Editors Pick, Machine Learning, OCR, Staff, Technology, Tutorials

In this tutorial, we build a complete workflow for running Baidu’s Unlimited-OCR model on document images and multi-page PDFs. We configure the GPU environment, install the required dependencies, load the 3B-parameter vision-language model with automatic selection of bfloat16 or float16, and generate structured sample documents for testing. We then evaluate both the tiled Gundam inference

AMD to invest up to $5 billion in Anthropic under AI infrastructure deal

ai, AI (Artificial Intelligence), AI Business Strategy, AI chips, AI Hardware & Chips, AI Infrastructure, AI Market Trends, AI Startups & Funding, Artificial Intelligence, Funding, hardware, Infrastructure & Hardware

AMD has agreed to invest up to $5 billion in Anthropic under an infrastructure agreement covering tens of billions of dollars’ worth of AI systems. Anthropic will deploy up to two gigawatts of capacity using AMD’s Instinct MI450-series accelerators, with deployment of the first gigawatt beginning in the first half of 2027. AMD’s investment will

Meet Gigatoken: A Rust BPE Tokenizer that Encodes Text at 24.53 GB/s, up to 989x Faster than HuggingFace Tokenizers

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Artificial Intelligence, Editors Pick, Machine Learning, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

Tokenization is the one part of the language modeling stack that almost nobody profiles. Gigatoken, released by Marcel Rød (a PhD student from Stanford) under an MIT license, argues that this was a mistake. The library encodes text at gigabytes per second on a single machine, against baselines that are already multithreaded Rust. The GPT-2

Unsloth vs Axolotl vs TRL vs LLaMA-Factory: A Fine-Tuning Framework Comparison on Speed, VRAM, and Multi-GPU

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, Applications, Artificial Intelligence, Editors Pick, Machine Learning, Staff, Tech News, Technology

Four open source projects dominate LLM fine-tuning today. Unsloth, Axolotl, TRL, and LLaMA-Factory all wrap the same underlying PyTorch and Hugging Face stack. They diverge on where they spend engineering effort. Unsloth rewrites kernels. Axolotl composes parallelism strategies. TRL defines the trainer APIs the others build on. LLaMA-Factory optimizes for breadth of model coverage and

Poolside releases Laguna S 2.1, a 118B open-weight coding model that matches rivals many times its size

agentic ai, ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, New Releases, Open Source, Software engineering, Staff, Tech News, Technology, Uncategorized

Poolside has released Laguna S 2.1, a 118B-parameter open-weight model built for agentic coding. It is a Mixture-of-Experts (MoE) model with 8B activated parameters per token. It supports a context window of up to 1M tokens in both thinking and no-thinking modes. The weights are on Hugging Face under an OpenMDW-1.1 license, and the model

Poolside Releases Laguna S 2.1, an Open-Weight Agentic Coding Model Punching Above Its Weight Class on SWE-Bench Multilingual

Validating Distributed LLM Serving Benchmarks with NVIDIA srt-slurm, SLURM Recipes, Parameter Sweeps, and Pareto Analysis

ai, AI (Artificial Intelligence), AI Infrastructure, Artificial Intelligence, Editors Pick, Staff, Technology, Tutorials

In this tutorial, we explore NVIDIA’s srt-slurm framework and learn how to use srtctl to convert declarative YAML configurations into reproducible SLURM benchmark workflows for distributed LLM serving. We set up the project in Google Colab, inspect its internal architecture, define a cluster configuration, dry-run built-in and custom recipes, and model a disaggregated prefill-and-decode deployment
