Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

Alibaba’s Qwen team has released Qwen3.7-Plus. The model is now available through Alibaba Cloud’s Bailian platform. Bailian is the console international users access as Model Studio. It offers API services to external developers. The release follows Alibaba’s May unveiling of the Qwen3.7 generation.

Qwen3.7-Plus

Qwen3.7-Plus is a multimodal large language model. The model understands images and video, alongside written prompts. Its sibling, Qwen3.7-Max, is text-only.

This is visual understanding, not generation. The model reads images and video; it does not create them. Alibaba’s image and video generation work sits in separate model families.

Alibaba team describes the release as a step in multimodal hybrid agent technology. An agent is a model that plans and acts across steps. Building on image and video understanding, Qwen3.7-Plus adds five abilities. These are deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration.

Self-programming means the model writes and revises its own code. Tool invocation means it calls external functions or APIs. Verification and testing means it runs outputs and checks results. Autonomous iteration means it loops until the task is done. Together, they describe a model built to act, not just answer.

The Vision Case

Qwen3.7-Plus is the multimodal half of the 3.7 family. Its preview already posted measurable vision results. In Vision Arena, Qwen3.7-Plus-Preview ranked #16 overall. That placed Alibaba as the #5 lab in vision. The model rank and the lab rank are separate figures.

Vision Arena is a neutral leaderboard run by LM Arena. Users vote on image-understanding answers in blind matchups. The #16 result sits behind the top US labs, but inside the field. For image-heavy work, this is the signal that matters. Think OCR at scale, chart reading, or video-frame analysis.

The text-only Max sibling anchors the generation’s reasoning. Max scored 56.6 on the Artificial Analysis Intelligence Index. That was the highest placement for a Chinese model at release.

https://qwen.ai/blog?id=qwen3.7-plus

The Agentic Loop

The clear shift in Qwen3.7 is its agentic focus. Alibaba team is positioning the models for long-running tasks. Bailian, the host platform, adds two relevant pieces.

The first is an Agentic RL (reinforcement learning) mechanism. The platform uses real-world execution feedback to refine model accuracy over time. The second is a set of built-in safety guardrails. These keep autonomous tools inside preset operational limits. That detail matters when an agent runs commands or edits files.

Marktechpost’s Visual Explainer

#mtp-qwen37plus-slider.mtp-root{
–mtp-bg:#ffffff;
–mtp-canvas:#f4f5f0;
–mtp-ink:#15171a;
–mtp-sub:#565b61;
–mtp-line:#e4e6df;
–mtp-green:#76B900;
–mtp-green-ink:#4d7a00;
–mtp-soft:#f0f3e8;
–mtp-serif:’Fraunces’,Georgia,’Times New Roman’,serif;
–mtp-sans:’Inter’,-apple-system,BlinkMacSystemFont,’Segoe UI’,sans-serif;
–mtp-mono:’JetBrains Mono’,’SFMono-Regular’,Menlo,Consolas,monospace;
}
@import url(‘https://fonts.googleapis.com/css2?family=Fraunces:opsz,wght@9..144,400;9..144,500;9..144,600&family=Inter:wght@400;500;600&family=JetBrains+Mono:wght@500;600&display=swap’);

#mtp-qwen37plus-slider, #mtp-qwen37plus-slider *{box-sizing:border-box!important;margin:0;padding:0}
#mtp-qwen37plus-slider hr,#mtp-qwen37plus-slider p:empty,#mtp-qwen37plus-slider del,#mtp-qwen37plus-slider s{display:none!important}

#mtp-qwen37plus-slider.mtp-root{
background:var(–mtp-canvas)!important;
color:var(–mtp-ink)!important;
font-family:var(–mtp-sans);
border:1px solid var(–mtp-line)!important;
border-radius:18px;
padding:22px;
max-width:860px;
margin:24px auto;
line-height:1.55;
-webkit-font-smoothing:antialiased;
}

#mtp-qwen37plus-slider .mtp-card{
background:var(–mtp-bg)!important;
border:1px solid var(–mtp-line)!important;
border-radius:14px;
overflow:hidden;
box-shadow:0 1px 0 rgba(21,23,26,.02),0 18px 40px -28px rgba(21,23,26,.25);
}

#mtp-qwen37plus-slider .mtp-bar{
display:flex;align-items:center;justify-content:space-between;
padding:16px 22px;border-bottom:1px solid var(–mtp-line)!important;
background:var(–mtp-bg)!important;
}
#mtp-qwen37plus-slider .mtp-brand{
font-family:var(–mtp-mono);font-size:12px;letter-spacing:.14em;
text-transform:uppercase;color:var(–mtp-green-ink)!important;font-weight:600;
}
#mtp-qwen37plus-slider .mtp-count{
font-family:var(–mtp-mono);font-size:12px;color:var(–mtp-sub)!important;font-weight:600;
}
#mtp-qwen37plus-slider .mtp-count b{color:var(–mtp-ink)!important}

#mtp-qwen37plus-slider .mtp-stage{position:relative;padding:34px 30px 30px;background:var(–mtp-bg)!important;min-height:300px}
#mtp-qwen37plus-slider .mtp-slide{display:none}
#mtp-qwen37plus-slider .mtp-slide.is-on{display:block;animation:mtpIn .45s cubic-bezier(.2,.7,.2,1)}
@keyframes mtpIn{from{opacity:0;transform:translateY(10px)}to{opacity:1;transform:none}}

#mtp-qwen37plus-slider .mtp-kick{
font-family:var(–mtp-mono);font-size:12px;letter-spacing:.12em;text-transform:uppercase;
color:var(–mtp-green-ink)!important;font-weight:600;display:inline-block;margin-bottom:14px;
}
#mtp-qwen37plus-slider .mtp-kick::before{content:””;display:inline-block;width:22px;height:2px;background:var(–mtp-green)!important;vertical-align:middle;margin-right:9px;border-radius:2px}

#mtp-qwen37plus-slider h2.mtp-h{
font-family:var(–mtp-serif);font-weight:500;font-size:30px;line-height:1.15;color:var(–mtp-ink)!important;margin-bottom:16px;letter-spacing:-.01em;
}
#mtp-qwen37plus-slider .mtp-lead{font-size:16px;color:var(–mtp-sub)!important;max-width:60ch}

/* cover */
#mtp-qwen37plus-slider .mtp-eyebrow{font-family:var(–mtp-mono);font-size:12px;letter-spacing:.16em;text-transform:uppercase;color:var(–mtp-green-ink)!important;font-weight:600;margin-bottom:18px}
#mtp-qwen37plus-slider h1.mtp-title{font-family:var(–mtp-serif);font-weight:600;font-size:48px;line-height:1.05;color:var(–mtp-ink)!important;letter-spacing:-.02em;margin-bottom:10px}
#mtp-qwen37plus-slider .mtp-title small{display:block;font-family:var(–mtp-serif);font-style:italic;font-weight:400;font-size:22px;color:var(–mtp-sub)!important;margin-top:8px;letter-spacing:0}
#mtp-qwen37plus-slider .mtp-meta{margin-top:22px;padding-top:16px;border-top:1px solid var(–mtp-line)!important;font-size:14px;color:var(–mtp-sub)!important}
#mtp-qwen37plus-slider .mtp-meta b{color:var(–mtp-ink)!important;font-weight:600}
#mtp-qwen37plus-slider .mtp-hint{margin-top:14px;font-family:var(–mtp-mono);font-size:12px;color:var(–mtp-green-ink)!important}

/* lists */
#mtp-qwen37plus-slider ul.mtp-list{list-style:none;margin-top:6px}
#mtp-qwen37plus-slider ul.mtp-list li{position:relative;padding:11px 0 11px 26px;border-bottom:1px solid var(–mtp-line)!important;font-size:15.5px;color:var(–mtp-ink)!important}
#mtp-qwen37plus-slider ul.mtp-list li:last-child{border-bottom:0!important}
#mtp-qwen37plus-slider ul.mtp-list li::before{content:””;position:absolute;left:2px;top:18px;width:8px;height:8px;border-radius:50%;background:var(–mtp-green)!important}
#mtp-qwen37plus-slider ul.mtp-list li b{font-weight:600}
#mtp-qwen37plus-slider ul.mtp-list li span{color:var(–mtp-sub)!important}

#mtp-qwen37plus-slider .mtp-note{margin-top:16px;padding:12px 14px;background:var(–mtp-soft)!important;border-left:3px solid var(–mtp-green)!important;border-radius:6px;font-size:14px;color:var(–mtp-ink)!important}

/* two columns */
#mtp-qwen37plus-slider .mtp-cols{display:grid;grid-template-columns:1fr 1fr;gap:16px;margin-top:6px}
#mtp-qwen37plus-slider .mtp-col{border:1px solid var(–mtp-line)!important;border-radius:10px;padding:16px;background:var(–mtp-bg)!important}
#mtp-qwen37plus-slider .mtp-col h3{font-family:var(–mtp-mono);font-size:11px;letter-spacing:.1em;text-transform:uppercase;font-weight:600;margin-bottom:10px;color:var(–mtp-ink)!important}
#mtp-qwen37plus-slider .mtp-col.ok h3{color:var(–mtp-green-ink)!important}
#mtp-qwen37plus-slider .mtp-col ul{list-style:none}
#mtp-qwen37plus-slider .mtp-col li{font-size:14px;padding:6px 0;color:var(–mtp-sub)!important;border-bottom:1px dashed var(–mtp-line)!important}
#mtp-qwen37plus-slider .mtp-col li:last-child{border-bottom:0!important}

/* nav */
#mtp-qwen37plus-slider .mtp-nav{display:flex;align-items:center;justify-content:space-between;padding:16px 22px;border-top:1px solid var(–mtp-line)!important;background:var(–mtp-bg)!important}
#mtp-qwen37plus-slider .mtp-dots{display:flex;gap:8px}
#mtp-qwen37plus-slider .mtp-dot{width:8px;height:8px;border-radius:50%;background:#d6d9d1!important;border:0;padding:0;cursor:pointer;transition:.2s}
#mtp-qwen37plus-slider .mtp-dot.is-on{background:var(–mtp-green)!important;width:22px;border-radius:5px}
#mtp-qwen37plus-slider .mtp-arrows{display:flex;gap:10px}
#mtp-qwen37plus-slider .mtp-btn{width:40px;height:40px;border-radius:50%;border:1px solid var(–mtp-line)!important;background:var(–mtp-ink)!important;color:#fff!important;cursor:pointer;font-size:18px;line-height:1;display:flex;align-items:center;justify-content:center;transition:.18s}
#mtp-qwen37plus-slider .mtp-btn:hover{background:var(–mtp-green)!important;color:var(–mtp-ink)!important}
#mtp-qwen37plus-slider .mtp-btn:disabled{opacity:.32;cursor:default;background:var(–mtp-ink)!important;color:#fff!important}

/* tagline */
#mtp-qwen37plus-slider .mtp-tag{display:flex;align-items:center;gap:12px;margin-top:18px;padding:14px 16px;background:var(–mtp-bg)!important;border:1px solid var(–mtp-line)!important;border-radius:12px}
#mtp-qwen37plus-slider .mtp-tag .mtp-logo{font-family:var(–mtp-serif);font-weight:600;font-size:18px;color:var(–mtp-ink)!important;white-space:nowrap}
#mtp-qwen37plus-slider .mtp-tag .mtp-logo span{color:var(–mtp-green)!important}
#mtp-qwen37plus-slider .mtp-tag .mtp-tag-txt{font-size:13px;color:var(–mtp-sub)!important;border-left:1px solid var(–mtp-line)!important;padding-left:12px}
#mtp-qwen37plus-slider .mtp-tag a{color:var(–mtp-green-ink)!important;text-decoration:none;font-weight:600}

@media(max-width:640px){
#mtp-qwen37plus-slider.mtp-root{padding:12px;border-radius:14px}
#mtp-qwen37plus-slider .mtp-stage{padding:24px 18px;min-height:340px}
#mtp-qwen37plus-slider h1.mtp-title{font-size:34px}
#mtp-qwen37plus-slider .mtp-title small{font-size:18px}
#mtp-qwen37plus-slider h2.mtp-h{font-size:23px}
#mtp-qwen37plus-slider .mtp-cols{grid-template-columns:1fr}
#mtp-qwen37plus-slider .mtp-bar,#mtp-qwen37plus-slider .mtp-nav{padding:12px 16px}
#mtp-qwen37plus-slider .mtp-btn{width:36px;height:36px}
#mtp-qwen37plus-slider .mtp-tag{flex-direction:column;align-items:flex-start;gap:6px}
#mtp-qwen37plus-slider .mtp-tag .mtp-tag-txt{border-left:0!important;padding-left:0}
}

AI Models · Field Guide
1 / 7

Alibaba Qwen · June 2, 2026
Qwen3.7-PlusAlibaba’s multimodal agent model, now on Bailian
A multimodal large language model with image and video understanding, deep reasoning, and agentic features. Available via API on Alibaba Cloud’s Bailian platform, accessed internationally as Model Studio.
Use the arrows or swipe to explore →

01 · What it is
A multimodal large language model

Multimodal — it reads images and video, alongside text input.
Visual understanding, not generation — it reads media, it does not create it.
The multimodal sibling to the text-only Qwen3.7-Max.
Alibaba describes it as multimodal hybrid agent technology.

02 · Capabilities
Five abilities beyond seeing

Deep reasoning — works through problems step by step.
Self-programming — writes and revises its own code.
Tool invocation — calls external functions or APIs.
Verification and testing — runs outputs and checks results.
Autonomous iteration — loops until the task is done.

03 · Vision benchmarks
Where it stands on vision

The preview ranked #16 overall in Vision Arena (LM Arena).
That placed Alibaba as the #5 lab in vision.
Model rank and lab rank are separate figures.
Relevant for OCR, chart reading, and video-frame analysis.

For reference, the text-only Max sibling scored 56.6 on the Artificial Analysis Intelligence Index, the highest Chinese model at release.

04 · The agentic loop
Built for long-running tasks

Bailian adds an Agentic RL (reinforcement learning) mechanism.
It uses real-world execution feedback to refine accuracy.
Built-in safety guardrails keep autonomous tools within limits.
That matters when an agent runs commands or edits files.

05 · Confirmed vs unconfirmed
What we know today

Confirmed

Image and video understanding
Agentic feature set
Bailian API access
Proprietary, API-only

Not yet published

Public price sheet
Context window size
Output token limits
Open weights

06 · Why it matters
The practical read

A vision-capable agent backend through one API.
Suits workloads mixing images, video, and tool use.
A leaderboard rank shows promise, not a guarantee.
Validate accuracy on your own data before committing.


Marktechpost
AI research, news, and developer signal for engineers and data scientists. Read more at marktechpost.com.

(function(){
var root=document.getElementById(‘mtp-qwen37plus-slider’);
if(!root||root.dataset.mtpReady)return; root.dataset.mtpReady=’1′;
var slides=root.querySelectorAll(‘.mtp-slide’);
var prev=root.querySelector(‘#mtp-prev’);
var next=root.querySelector(‘#mtp-next’);
var cur=root.querySelector(‘#mtp-cur’);
var dotWrap=root.querySelector(‘#mtp-dots’);
var i=0,n=slides.length;
for(var d=0;d45){ go(dx