This week, Moonshot AI released Kimi K2.7-Code. It is a coding-focused, agentic model. The model weights ship on Hugging Face under a Modified MIT license. You can also reach it through the Kimi API and Kimi Code.
K2.7-Code targets long-horizon software engineering, not general chat. It plans, edits, runs tools, and debugs across many steps. Moonshot pairs the model with a subscription coding platform around it.
Kimi K2.7-Code
K2.7-Code is a Mixture-of-Experts model. It holds 1T total parameters and activates 32B per token. The design uses 384 experts, with 8 selected per token and 1 shared. It has 61 layers, including 1 dense layer.
Attention uses MLA, and the feed-forward path uses SwiGLU. A MoonViT vision encoder adds 400M parameters for image and video input. The model ships with native INT4 quantization. The context window is 256K tokens (262,144).
Two constraints matters: Thinking mode is mandatory; disabling it returns an API error. Sampling is fixed: temperature 1.0, top_p 0.95, n 1, penalties 0.0. Default max output is 32,768 tokens.
You can self-host with vLLM, SGLang, or KTransformers. The Hugging Face repository is large, roughly 595 GB on disk. This is a server-class deployment target, not a laptop model.
Benchmark
Moonshot team published six benchmark rows. They compare K2.7-Code against K2.6, GPT-5.5, and Claude Opus 4.8. K2.7-Code beats K2.6 on every row. The largest coding jump is Kimi Code Bench v2, from 50.9 to 62.0.
BenchmarkKimi K2.6Kimi K2.7-CodeGPT-5.5Claude Opus 4.8K2.7 vs K2.6Kimi Code Bench v250.962.069.067.4+21.8%Program Bench48.353.669.163.8+11.0%MLS Bench Lite26.735.135.542.8+31.5%Kimi Claw 24/7 Bench42.946.952.850.4+9.3%MCP Atlas69.476.079.481.3+9.5%MCP Mark Verified72.881.192.976.4+11.4%
K2.7-Code does beat Opus 4.8 on MCP Mark Verified, 81.1 versus 76.4. It also lands close to GPT-5.5 on MLS Bench Lite. K2.7-Code ran in Kimi Code CLI, GPT-5.5 in Codex xhigh, and Opus 4.8 in Claude Code xhigh.
Reasoning-Token Efficiency: A Cost Claim, Not Just Quality
Moonshot team reports about 30% lower reasoning-token usage than K2.6. It frames this as ‘less overthinking.’
Reasoning tokens bill as output tokens on most price cards. Agentic coding runs hundreds or thousands of steps. Each plan, retry, and verification pays the thinking cost again. A 30% cut compounds across a long run.
The effect lands in three places at once. First, lower output-token cost per task. Second, faster steps, which helps interactive CLI sessions. Third, more steps before hitting context limits.
Use Cases With Examples
Repo-scale refactors are the main use case. Point the agent at a failing test suite. It reads files, edits across modules, then reruns tests until green.
Code review is a second fit. Feed a pull request diff and ask for risk analysis. The 256K window holds large diffs, logs, and related files together.
MCP tool-use workflows are a third fit. K2.7-Code scored 81.1 on MCP Mark Verified. That suite tests correct tool invocation through the Model Context Protocol. Think CI checks, ticket updates, and file edits in one loop.
Long-context analysis is a fourth fit. The model accepts text, image, and video input. Documentation, screenshots, and a recorded repro can share one prompt.
Marktechpost’s Interactive Explorer
#mtp-k27-demo *{box-sizing:border-box!important;margin:0;padding:0}
#mtp-k27-demo{
background:#111!important;color:#e7e7e7!important;
font-family:-apple-system,BlinkMacSystemFont,”Segoe UI”,Roboto,Helvetica,Arial,sans-serif!important;
border:1px solid #222!important;border-radius:14px!important;
padding:22px!important;max-width:920px;margin:0 auto;line-height:1.5;
}
#mtp-k27-demo .k27-head{display:flex;align-items:center;gap:12px;flex-wrap:wrap;margin-bottom:4px}
#mtp-k27-demo .k27-dot{width:11px;height:11px;border-radius:50%;background:#76B900;box-shadow:0 0 10px #76B900}
#mtp-k27-demo h2{font-size:20px;color:#fff!important;font-weight:700;letter-spacing:-.2px}
#mtp-k27-demo .k27-sub{color:#9aa0a6!important;font-size:13px;margin:2px 0 16px}
#mtp-k27-demo .k27-tabs{display:flex;gap:6px;flex-wrap:wrap;margin-bottom:18px}
#mtp-k27-demo .k27-tab{
background:#181818!important;color:#cfcfcf!important;border:1px solid #2a2a2a!important;
padding:9px 14px;border-radius:9px;cursor:pointer;font-size:13px;font-weight:600;transition:.15s
}
#mtp-k27-demo .k27-tab:hover{border-color:#76B900!important}
#mtp-k27-demo .k27-tab.on{background:#76B900!important;color:#0a0a0a!important;border-color:#76B900!important}
#mtp-k27-demo .k27-panel{display:none}
#mtp-k27-demo .k27-panel.on{display:block}
#mtp-k27-demo .k27-pills{display:flex;gap:8px;flex-wrap:wrap;margin-bottom:16px}
#mtp-k27-demo .k27-pill{
display:flex;align-items:center;gap:7px;background:#181818!important;border:1px solid #2a2a2a!important;
padding:7px 11px;border-radius:20px;cursor:pointer;font-size:12px;color:#bdbdbd!important;user-select:none
}
#mtp-k27-demo .k27-pill .sw{width:11px;height:11px;border-radius:3px}
#mtp-k27-demo .k27-pill.off{opacity:.38}
#mtp-k27-demo .k27-bench{margin-bottom:16px}
#mtp-k27-demo .k27-bname{font-size:13px;color:#dcdcdc!important;font-weight:600;margin-bottom:7px}
#mtp-k27-demo .k27-row{display:flex;align-items:center;gap:10px;margin-bottom:6px}
#mtp-k27-demo .k27-mlabel{width:120px;min-width:120px;font-size:11px;color:#9aa0a6!important}
#mtp-k27-demo .k27-track{flex:1;background:#1c1c1c!important;border-radius:6px;height:22px;overflow:hidden}
#mtp-k27-demo .k27-fill{height:100%;border-radius:6px;width:0;transition:width .7s cubic-bezier(.22,1,.36,1);
display:flex;align-items:center;justify-content:flex-end;padding-right:8px;font-size:11px;font-weight:700;color:#0a0a0a}
#mtp-k27-demo .k27-note{font-size:11px;color:#7c8085!important;margin-top:10px;border-top:1px solid #222!important;padding-top:10px}
#mtp-k27-demo .k27-calc{display:grid;grid-template-columns:1fr 1fr;gap:16px}
#mtp-k27-demo .k27-field{margin-bottom:12px}
#mtp-k27-demo .k27-field label{display:block;font-size:12px;color:#bdbdbd!important;margin-bottom:6px}
#mtp-k27-demo .k27-field .val{color:#76B900!important;font-weight:700}
#mtp-k27-demo input[type=range]{width:100%;accent-color:#76B900;cursor:pointer}
#mtp-k27-demo .k27-out{background:#161616!important;border:1px solid #262626!important;border-radius:11px;padding:16px}
#mtp-k27-demo .k27-line{display:flex;justify-content:space-between;font-size:13px;padding:7px 0;border-bottom:1px dashed #262626!important}
#mtp-k27-demo .k27-line:last-child{border-bottom:0}
#mtp-k27-demo .k27-line b{color:#fff!important}
#mtp-k27-demo .k27-total{font-size:22px;color:#76B900!important;font-weight:800;margin-top:6px}
#mtp-k27-demo .k27-save{background:#13210a!important;border:1px solid #2f4d14!important;border-radius:9px;
padding:10px 12px;font-size:12px;color:#aadd72!important;margin-top:12px}
#mtp-k27-demo .k27-specs{display:grid;grid-template-columns:1fr 1fr;gap:10px}
#mtp-k27-demo .k27-spec{background:#161616!important;border:1px solid #242424!important;border-radius:10px;padding:12px}
#mtp-k27-demo .k27-spec .l{font-size:11px;color:#9aa0a6!important;text-transform:uppercase;letter-spacing:.4px}
#mtp-k27-demo .k27-spec .v{font-size:14px;color:#fff!important;font-weight:600;margin-top:4px}
@media (max-width:640px){
#mtp-k27-demo{padding:16px!important}
#mtp-k27-demo .k27-calc{grid-template-columns:1fr}
#mtp-k27-demo .k27-specs{grid-template-columns:1fr}
#mtp-k27-demo .k27-mlabel{width:84px;min-width:84px}
}
Kimi K2.7-Code — Interactive Explorer
Company-reported benchmarks and official API pricing. Released June 12, 2026. Verified June 12, 2026.
Benchmarks
Cost Calculator
Specs
Source: Moonshot AI Kimi K2.7-Code model card. K2.7-Code ran in Kimi Code CLI; GPT-5.5 in Codex xhigh; Claude Opus 4.8 in Claude Code xhigh. First-party numbers, not an independent leaderboard.
Input tokens / run: 50,000
Output tokens / run: 8,000
Cache hit rate: 50%
Runs / month: 1,000
Reasoning share of output: 40%
Input cost$0.00
Output cost$0.00
Est. monthly total$0.00
$0.00
Rates: cached input $0.19 / 1M, cache-miss input $0.95 / 1M, output $4.00 / 1M (official Kimi pricing). Savings line illustrates K2.7-Code’s reported ~30% lower reasoning-token usage vs K2.6, applied to the reasoning share of output. Estimate only.
Source: Kimi K2.7-Code Hugging Face model card and Kimi API docs.
(function(){
var root = document.getElementById(‘mtp-k27-demo’);
if(!root) return;
// —- data (verified, company-reported) —-
var models = [
{key:’k27′, name:’K2.7-Code’, color:’#76B900′},
{key:’k26′, name:’Kimi K2.6′, color:’#5a7fb5′},
{key:’gpt’, name:’GPT-5.5′, color:’#c98a3a’},
{key:’opus’, name:’Opus 4.8′, color:’#b070c9′}
];
var active = {k27:true,k26:true,gpt:true,opus:true};
var benches = [
{name:’Kimi Code Bench v2′, k27:62.0, k26:50.9, gpt:69.0, opus:67.4},
{name:’Program Bench’, k27:53.6, k26:48.3, gpt:69.1, opus:63.8},
{name:’MLS Bench Lite’, k27:35.1, k26:26.7, gpt:35.5, opus:42.8},
{name:’Kimi Claw 24/7 Bench’,k27:46.9, k26:42.9, gpt:52.8, opus:50.4},
{name:’MCP Atlas’, k27:76.0, k26:69.4, gpt:79.4, opus:81.3},
{name:’MCP Mark Verified’, k27:81.1, k26:72.8, gpt:92.9, opus:76.4}
];
var specs = [
[‘Architecture’,’Mixture-of-Experts’],
[‘Total parameters’,’1T’],
[‘Activated parameters’,’32B’],
[‘Experts’,’384 (8 active, 1 shared)’],
[‘Layers’,’61 (1 dense)’],
[‘Attention / activation’,’MLA / SwiGLU’],
[‘Context length’,’256K (262,144)’],
[‘Vision encoder’,’MoonViT (400M)’],
[‘Inputs’,’Text, image, video’],
[‘Thinking mode’,’Required’],
[‘Default max output’,’32,768 tokens’],
[‘Quantization’,’Native INT4′],
[‘Deployment’,’vLLM, SGLang, KTransformers’],
[‘License’,’Modified MIT’]
];
// —- tabs —-
root.querySelectorAll(‘.k27-tab’).forEach(function(t){
t.addEventListener(‘click’,function(){
root.querySelectorAll(‘.k27-tab’).forEach(function(x){x.classList.remove(‘on’)});
root.querySelectorAll(‘.k27-panel’).forEach(function(x){x.classList.remove(‘on’)});
t.classList.add(‘on’);
root.querySelector(‘[data-panel=”‘+t.dataset.tab+'”]’).classList.add(‘on’);
if(t.dataset.tab===’bench’) setTimeout(renderBars,30);
});
});
// —- benchmark pills —-
var pills = root.querySelector(‘#k27-pills’);
models.forEach(function(m){
var p=document.createElement(‘div’);
p.className=’k27-pill’;
p.innerHTML=”+m.name;
p.addEventListener(‘click’,function(){
active[m.key]=!active[m.key];
p.classList.toggle(‘off’,!active[m.key]);
renderBars();
});
pills.appendChild(p);
});
// —- benchmark charts —-
var charts = root.querySelector(‘#k27-charts’);
benches.forEach(function(b,i){
var wrap=document.createElement(‘div’);wrap.className=’k27-bench’;
var h=”+b.name+”;
models.forEach(function(m){
h+=”
+”+m.name+”
+”;
});
wrap.innerHTML=h;charts.appendChild(wrap);
});
function renderBars(){
benches.forEach(function(b,i){
models.forEach(function(m){
var el=root.querySelector(‘#f-‘+i+’-‘+m.key);
var parent=el.closest(‘.k27-row’);
if(!active[m.key]){parent.style.display=’none’;el.style.width=’0′;return;}
parent.style.display=’flex’;
el.style.width=b[m.key]+’%’;
el.textContent=b[m.key].toFixed(1);
});
});
}
setTimeout(renderBars,60);
// —- specs —-
var sp=root.querySelector(‘#k27-specs’);
specs.forEach(function(s){
var d=document.createElement(‘div’);d.className=’k27-spec’;
d.innerHTML=”+s[0]+”+s[1]+”;
sp.appendChild(d);
});
// —- calculator —-
var R_CACHE=0.19, R_MISS=0.95, R_OUT=4.00; // per 1M tokens
function fmt(n){return ‘$’+n.toLocaleString(‘en-US’,{minimumFractionDigits:2,maximumFractionDigits:2});}
function comma(n){return n.toLocaleString(‘en-US’);}
var I={inp:root.querySelector(‘#k27-in’),out:root.querySelector(‘#k27-out’),
cache:root.querySelector(‘#k27-cache’),runs:root.querySelector(‘#k27-runs’),
think:root.querySelector(‘#k27-think’)};
function calc(){
var inp=+I.inp.value, out=+I.out.value, cache=+I.cache.value/100,
runs=+I.runs.value, think=+I.think.value/100;
root.querySelector(‘#k27-in-v’).textContent=comma(inp);
root.querySelector(‘#k27-out-v’).textContent=comma(out);
root.querySelector(‘#k27-cache-v’).textContent=(cache*100).toFixed(0)+’%’;
root.querySelector(‘#k27-runs-v’).textContent=comma(runs);
root.querySelector(‘#k27-think-v’).textContent=(think*100).toFixed(0)+’%’;
var inRate=cache*R_CACHE+(1-cache)*R_MISS;
var inCost=runs*inp*inRate/1e6;
var outCost=runs*out*R_OUT/1e6;
var total=inCost+outCost;
// illustrative 30% reasoning-token reduction on the reasoning share of output
var reasonOut=out*think;
var saved=runs*(reasonOut*0.30)*R_OUT/1e6;
root.querySelector(‘#k27-r-in’).textContent=fmt(inCost);
root.querySelector(‘#k27-r-out’).textContent=fmt(outCost);
root.querySelector(‘#k27-r-total’).textContent=fmt(total);
root.querySelector(‘#k27-r-big’).textContent=fmt(total)+’ /mo’;
root.querySelector(‘#k27-r-save’).innerHTML=
‘≈ ‘+fmt(saved)+’/mo saved vs K2.6-style reasoning, from ~30% fewer reasoning tokens.’;
}
Object.keys(I).forEach(function(k){I[k].addEventListener(‘input’,calc);});
calc();
})();

