Staff

Auto Added by WPeMatico

StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

agentic ai, ai, AI (Artificial Intelligence), AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, New Releases, Open Source, Software engineering, Staff, Tech News, Technology, Vision Language Model

StepFun today released Step 3.7 Flash, a multimodal Mixture-of-Experts model targeting agentic use cases. It adds native vision input and improved tool-use reliability over Step 3.5 Flash. What is Step 3.7 Flash? Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model. It pairs a 196B-parameter language backbone with a 1.8B-parameter vision encoder (ViT) […]

StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows Read More »

Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication

ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Machine Learning, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

GPU communication overhead is a measurable bottleneck in production AI workloads. According to data cited by the mKernel project, communication can consume 43.6% of the forward pass and 32% of end-to-end training time. Across popular Mixture-of-Experts (MoE) models, inter-device communication can account for up to 47% of total execution time. Researchers from UC Berkeley’s UCCL

Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication Read More »

Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights

agentic ai, ai, AI (Artificial Intelligence), AI Agents, Artificial Intelligence, Editors Pick, New Releases, Open Source, Software engineering, Staff, Tech News, Technology

Most AI agents stop improving once a human stops tuning them. The model is fixed. The scaffold around it is fixed. Hexo Labs wants to move both at once. It released SIA (Self-Improving AI) this week as an open-source framework under an MIT license. The core claim of this research is narrow but concrete. SIA

Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights Read More »

How to Design an End-to-End Ansible Automation Lab with Playbooks, Inventories, Roles, Vault, Dynamic Inventory, and Custom Modules

ai, AI (Artificial Intelligence), AI Infrastructure, Artificial Intelligence, Editors Pick, Machine Learning, Staff, Tutorials

In this tutorial, we build a complete Ansible lab that runs end-to-end in Google Colab or any Linux environment. We start by installing ansible-core, setting up a local workspace, creating an Ansible configuration file, and defining both static and dynamic inventories. We then explore key Ansible concepts, including group variables, host variables, variable precedence, ad

How to Design an End-to-End Ansible Automation Lab with Playbooks, Inventories, Roles, Vault, Dynamic Inventory, and Custom Modules Read More »

Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters

ai, AI (Artificial Intelligence), AI Infrastructure, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, New Releases, Software engineering, Staff, Tech News, Technology, Uncategorized

Liquid AI just shipped LFM2.5-8B-A1B. It is an on-device Mixture-of-Experts (MoE) model built for tool calling. The model holds 8.3B total parameters but activates only 1.5B per token. That sparsity is what lets it run on consumer hardware. The release follows LFM2-8B-A1B, which Liquid AI team published earlier. LFM2.5 is a new family of hybrid

Liquid AI Releases LFM2.5-8B-A1B: An On-Device MoE Model With 8.3B Total and 1.5B Active Parameters Read More »

Anthropic Ships Claude Opus 4.8 Alongside Dynamic Workflows and Cheaper Fast Mode, With Workflows Capped at 1,000 Subagents

agentic ai, ai, AI (Artificial Intelligence), AI Agents, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, New Releases, Software engineering, Staff, Tech News, Technology

Anthropic just launched Claude Opus 4.8. Also, there two Claude Code updates shipped with it. Dynamic workflows run many subagents in parallel. Fast mode now supports Opus 4.8 at a lower price. Both are research previews. What Dynamic Workflows Actually Are A dynamic workflow is a JavaScript script that orchestrates subagents at scale. Claude writes

Anthropic Ships Claude Opus 4.8 Alongside Dynamic Workflows and Cheaper Fast Mode, With Workflows Capped at 1,000 Subagents Read More »

Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate

Perplexity AI’s research team reimplemented their Unigram tokenizer from scratch in Rust and open-sourced the code in pplx-garden, their inference technology repository. At production input lengths, the new encoder cuts p50 latency by roughly 5x versus the Hugging Face tokenizers crate, ~2x versus SentencePiece (C++), and ~1.5x versus IREE’s tokenizer (C), with zero steady-state heap

Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate Read More »

A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

ai, AI (Artificial Intelligence), Applications, Artificial Intelligence, Big Data, Data Science, Editors Pick, Staff, Technology, Tutorials

In this tutorial, we build a complete pgvector playground inside Google Colab and explore how PostgreSQL can work as a powerful vector database for modern AI applications. We start by installing PostgreSQL, compiling the pgvector extension, connecting through Psycopg, and registering vector types for smooth Python integration. Then, we create embeddings with SentenceTransformers, store them

A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System Read More »

Sakana AI Proposes DiffusionBlocks: a Block-wise Training Framework That Converts Residual Networks into Independently Trainable Denoising Modules

ai, AI (Artificial Intelligence), AI Infrastructure, AI Paper Summary, AI Shorts, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, New Releases, Staff, Tech News, Technology

Researchers from Sakana AI and the University of Tokyo propose DiffusionBlocks. It trains transformer-based networks one block at a time. Training memory is reduced by a factor of B, where B is the number of blocks. Performance is maintained across diverse architectures. The Memory Problem in Neural Network Training End-to-end backpropagation requires storing intermediate activations

Sakana AI Proposes DiffusionBlocks: a Block-wise Training Framework That Converts Residual Networks into Independently Trainable Denoising Modules Read More »

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code

agentic ai, ai, AI (Artificial Intelligence), AI Paper Summary, AI Shorts, Applications, Artificial Intelligence, Editors Pick, Language Model, Large Language Model, Machine Learning, New Releases, Open Source, Staff, Tech News, Technology

Reinforcement learning for language agents is growing more complex. Agents now manage multi-turn tool use, long-running contexts, and multi-agent orchestration. The main engineering challenge is connecting existing agent software to training pipelines without breaking how those tools work. NVIDIA’s research team introduced Polar, a rollout framework that lets researchers run reinforcement learning over any agent

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code Read More »