Language Model

Ant Group Releases LingBot-VLA, A Vision Language Action Foundation Model For Real World Robot Manipulation

How do you build a single vision language action model that can control many different dual arm robots in the real world? LingBot-VLA is Ant Group Robbyant’s new Vision Language Action foundation model that targets practical robot manipulation in the real world. It is trained on about 20,000 hours of teleoperated bimanual data collected from 9 […]

Google DeepMind Unveils AlphaGenome: A Unified Sequence-to-Function Model Using Hybrid Transformers and U-Nets to Decode the Human Genome

Google DeepMind is expanding its biological toolkit beyond the world of protein folding. After the success of AlphaFold, Google’s research team has introduced AlphaGenome, a unified deep learning model designed for sequence-to-function genomics. This represents a major shift in how we model the human genome. AlphaGenome does not treat DNA

Alibaba Introduces Qwen3-Max-Thinking, a Test Time Scaled Reasoning Model with Native Tool Use Powering Agentic Workloads

Qwen3-Max-Thinking is Alibaba’s new flagship reasoning model. It does not just scale parameters; it also changes how inference is done, with explicit control over thinking depth and built-in tools for search, memory, and code execution (source: https://qwen.ai/blog?id=qwen3-max-thinking). Model scale, data, and deployment: Qwen3-Max-Thinking is a trillion-parameter MoE flagship LLM pretrained on 36T tokens and built

MBZUAI Releases K2 Think V2: A Fully Sovereign 70B Reasoning Model For Math, Code, And Science

Can a fully sovereign open reasoning model match state-of-the-art systems when every part of its training pipeline is transparent? Researchers from Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) release K2 Think V2, a fully sovereign reasoning model designed to test how far open and fully documented pipelines can push long horizon

Moonshot AI Releases Kimi K2.5: An Open Source Visual Agentic Intelligence Model with Native Swarm Execution

Moonshot AI has released Kimi K2.5 as an open source visual agentic intelligence model. It combines a large Mixture of Experts language backbone, a native vision encoder, and a parallel multi agent system called Agent Swarm. The model targets coding, multimodal reasoning, and deep web research with strong benchmark results on agentic, vision, and coding

DSGym Offers a Reusable Container Based Substrate for Building and Benchmarking Data Science Agents

Data science agents should inspect datasets, design workflows, run code, and return verifiable answers, not just autocomplete Pandas code. DSGym, introduced by researchers from Stanford University, Together AI, Duke University, and Harvard University, is a framework that evaluates and trains such agents across more than 1,000 data science challenges with expert curated ground truth and

How Tree-KG Enables Hierarchical Knowledge Graphs for Contextual Navigation and Explainable Multi-Hop Reasoning Beyond Traditional RAG

In this tutorial, we implement Tree-KG, an advanced hierarchical knowledge graph system that goes beyond traditional retrieval-augmented generation by combining semantic embeddings with explicit graph structure. We show how we can organize knowledge in a tree-like hierarchy that mirrors how humans learn, from broad domains to fine-grained concepts, and then reason across this structure using

NVIDIA Revolutionizes Climate Tech with ‘Earth-2’: The World’s First Fully Open Accelerated AI Weather Stack

For decades, predicting the weather has been the exclusive domain of massive government supercomputers running complex physics-based equations. NVIDIA has shattered that barrier with the release of the Earth-2 family of open models and tools for AI weather and climate prediction accessible to virtually anyone, from tech startups to national meteorological agencies. In a move

StepFun AI Introduces Step-DeepResearch: A Cost-Effective Deep Research Agent Model Built Around Atomic Capabilities

StepFun has introduced Step-DeepResearch, a 32B-parameter end-to-end deep research agent that aims to turn web search into actual research workflows with long-horizon reasoning, tool use, and structured reporting. The model is built on Qwen2.5 32B-Base and is trained to act as a single agent that plans, explores sources, verifies evidence and

A Coding Implementation to Automating LLM Quality Assurance with DeepEval, Custom Retrievers, and LLM-as-a-Judge Metrics

We initiate this tutorial by configuring a high-performance evaluation environment, specifically focused on integrating the DeepEval framework to bring unit-testing rigor to our LLM applications. By bridging the gap between raw retrieval and final generation, we implement a system that treats model outputs as testable code and uses LLM-as-a-judge metrics to quantify performance. We move
