Kimi K2.7 Code: The Open-Source Coding Model That's Closing the Gap With Big Tech

Moonshot AI's Kimi K2.7 Code is an open-source, trillion-parameter coding model with substantially lower list pricing than GPT-5.5 or Claude. Here's what builders should know.

Written by Bhavyadeep
Reviewed by Sakthy
Last updated: June 29, 2026

The AI coding landscape just shifted again. On June 12, 2026, Beijing-based Moonshot AI released Kimi K2.7 Code, a trillion-parameter, open-weight MoE model that the company positions specifically for coding and agent workflows. It's available for anyone to download, use commercially, or self-host under a Modified MIT license.

What makes it notable isn't just the spec sheet. It's the price. At $0.95 per million input tokens and $4.00 per million output tokens on Moonshot's published list pricing, K2.7 Code is roughly 5x cheaper on input than GPT-5.5 or Claude Opus 4.8, and 6.25x to 7.5x cheaper on output (7.5x against GPT-5.5's $30 rate, 6.25x against Opus 4.8's $25 rate). Exact cost multiples will vary by plan, tier, and cache behavior, but the gap is substantial by any measure. For anyone building AI-powered products, that cost curve matters.
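The multiples quoted above follow directly from the list prices cited in this article (the GPT-5.5 and Claude Opus 4.8 rates appear in the comparison section further down). A minimal Python sketch of that arithmetic; the dictionary layout and the function name are illustrative, not part of any vendor SDK:

```python
# List prices in USD per million tokens, as quoted in this article.
LIST_PRICES = {
    "Kimi K2.7 Code": {"input": 0.95, "output": 4.00},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def price_multiple(model: str, kind: str, baseline: str = "Kimi K2.7 Code") -> float:
    """Return how many times more `model` costs than `baseline` for `kind` tokens."""
    return LIST_PRICES[model][kind] / LIST_PRICES[baseline][kind]


if __name__ == "__main__":
    for model in ("GPT-5.5", "Claude Opus 4.8"):
        for kind in ("input", "output"):
            print(f"{model} {kind}: {price_multiple(model, kind):.2f}x")
```

These are list-rate ratios only. Cached-input discounts, batch tiers, and negotiated plans would change the numbers.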

What Kimi K2.7 Code actually is

Kimi K2.7 Code is not a general-purpose chatbot. Moonshot has been explicit: it's a coding-specialized model designed for long-horizon software engineering. Think multi-file refactors, extended debugging sessions, and agentic workflows that run for hundreds of steps. For general tasks like writing, research, or conversation, Moonshot still recommends its predecessor, Kimi K2.6.

Under the hood, K2.7 Code uses a Mixture-of-Experts (MoE) architecture with 1 trillion total parameters but only 32 billion active per token, an efficiency design that keeps inference costs low despite the model's size. It includes 384 experts with 8 selected per token, a 256K-token context window, and a 400-million-parameter vision encoder called MoonViT that allows it to accept image and video input alongside text.
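To make "8 of 384 experts per token" concrete, here is a generic top-k routing sketch in Python. It illustrates the general MoE technique only: Moonshot's actual router, its scoring function, and any shared experts are not described in this article, so the random logits and the softmax-over-selected-experts weighting below are assumptions.

```python
import math
import random

NUM_EXPERTS = 384  # from the article
TOP_K = 8          # from the article


def route_token(router_logits: list[float], k: int = TOP_K) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    max_logit = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - max_logit) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in router scores; a real router computes these from the token's hidden state.
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    weights = route_token(logits)
    print(f"experts used: {len(weights)} of {NUM_EXPERTS} "
          f"({len(weights) / NUM_EXPERTS:.1%})")
    print(f"active parameter share: {32e9 / 1e12:.1%}")
```

The active-parameter share (32B of 1T) is higher than the expert share (8 of 384) because, in MoE models generally, layers such as attention and embeddings run for every token regardless of routing.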

One important quirk: thinking mode is always on. Every API call generates reasoning tokens that count toward output billing, and there's no way to disable it. This is a deliberate design choice by Moonshot to ensure consistent reasoning quality across multi-step coding workflows.
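A quick way to see what always-on thinking means for a bill: reasoning tokens are charged at the same output rate as the visible answer. The sketch below uses the $4.00 list rate from this article; the token counts are hypothetical and the function name is illustrative.

```python
OUTPUT_PRICE_PER_M = 4.00  # USD per million output tokens, list price from the article


def output_cost(reasoning_tokens: int, answer_tokens: int,
                price_per_million: float = OUTPUT_PRICE_PER_M) -> float:
    """Cost of one call's output, with reasoning tokens billed like answer tokens."""
    return (reasoning_tokens + answer_tokens) * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical call: 3,000 reasoning tokens and 500 visible answer tokens.
    total = output_cost(3_000, 500)
    reasoning_only = output_cost(3_000, 0)
    print(f"output cost: ${total:.4f}, of which reasoning: ${reasoning_only:.4f}")
```

In this made-up example, most of the output charge comes from tokens the user never sees, which is why the reasoning-token count matters as much as the per-token rate.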

The benchmark picture (with caveats)

Moonshot published six benchmarks at launch, all showing meaningful improvement over K2.6. The standout numbers, per Moonshot's model card:

  • Kimi Code Bench v2: 62.0 (up 21.8% from K2.6's 50.9)
  • Program Bench: 53.6 (up 11.0% from 48.3)
  • MLS Bench Lite: 35.1 (up 31.5% from 26.7)
  • MCP Mark Verified: 81.1 (up from 72.8)
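The percentage gains in the list are relative improvements over K2.6's score, not percentage-point differences. A short Python check using only the scores quoted above (the function name is illustrative):

```python
# (K2.6 score, K2.7 Code score) as reported by Moonshot, per the list above.
SCORES = {
    "Kimi Code Bench v2": (50.9, 62.0),
    "Program Bench": (48.3, 53.6),
    "MLS Bench Lite": (26.7, 35.1),
    "MCP Mark Verified": (72.8, 81.1),
}


def relative_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    for name, (old, new) in SCORES.items():
        print(f"{name}: +{new - old:.1f} points, +{relative_gain(old, new):.1f}%")
```

The same calculation puts the MCP Mark Verified gain, which the list gives only as raw scores, at about 11.4%.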

That last number is the one drawing the most attention. MCP Mark Verified measures tool invocation accuracy through the Model Context Protocol. In Moonshot's tests, K2.7 Code scored 81.1 on this metric versus Claude Opus 4.8's 76.4. For teams building AI agents that need to interact with databases, APIs, and developer toolchains, that's a notable result, though it has not yet been independently replicated.

The honest caveat: all six launch benchmarks are either Moonshot's proprietary suites or were run under Moonshot's own test conditions. As of publication, there are no independent third-party results on standard public suites like SWE-bench Verified, SWE-bench Pro, or Terminal-Bench 2.0. And where Moonshot did publish comparisons against frontier models, GPT-5.5 still leads K2.7 Code on all six rows. Claude Opus 4.8 leads on five of six, with that MCP Mark score being the exception.

In short: K2.7 Code is a significant upgrade over K2.6 and competitive on specific capabilities, but it isn't beating the top closed models across the board.

The token efficiency story

Beyond raw benchmarks, the practical gain that developers care about most is a roughly 30% reduction in "thinking" tokens compared to K2.6, per Moonshot's launch materials. Since K2.7 Code always runs with reasoning enabled and those tokens bill as output, this isn't a minor detail. It's the real cost cut.

In agentic coding sessions that run for hundreds of steps, each plan, retry, and verification pass pays the thinking cost again. A 30% reduction adds up across a long run, showing up in three places at once: lower output-token cost per task, faster step completion in interactive sessions, and more work done within the same context budget.
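As a rough illustration of how that saving scales over a long agent run, the sketch below totals thinking tokens across steps with and without the 30% reduction. The step count and per-step token figure are hypothetical, and the price is held at K2.7 Code's $4.00 list rate on both sides to isolate the effect of token count alone.

```python
OUTPUT_PRICE_PER_M = 4.00  # USD per million output tokens (list price from the article)
THINKING_REDUCTION = 0.30  # approximate reduction vs K2.6, per Moonshot's launch materials


def run_thinking_tokens(steps: int, thinking_per_step: int,
                        reduction: float = 0.0) -> int:
    """Total thinking tokens across an agent run, after an optional fractional reduction."""
    return round(steps * thinking_per_step * (1.0 - reduction))


def tokens_to_usd(tokens: int, price_per_million: float = OUTPUT_PRICE_PER_M) -> float:
    """Convert an output-token count to USD at the given per-million rate."""
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical run: 300 steps at 2,000 thinking tokens each before the reduction.
    before = run_thinking_tokens(300, 2_000)
    after = run_thinking_tokens(300, 2_000, THINKING_REDUCTION)
    print(f"thinking tokens: {before:,} -> {after:,}")
    print(f"thinking cost:   ${tokens_to_usd(before):.2f} -> ${tokens_to_usd(after):.2f}")
```

In this simplified model the saving grows linearly with the number of steps. It ignores second-order effects, such as shorter transcripts leaving more room in the context window.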

How it compares

Here's where K2.7 Code sits relative to the models builders are most likely evaluating:

Against GPT-5.5 ($5/$30 per million tokens on published list pricing): GPT-5.5 scores 82.7% on Terminal-Bench 2.0 and leads on most coding benchmarks. It's the stronger model by capability on independently verified suites. But K2.7 Code costs roughly 5x less on input and 7.5x less on output based on published list rates, though actual costs will vary by plan and cache behavior. For teams running high-volume agent pipelines, a price gap that size changes the calculus.

Against Claude Opus 4.8 ($5/$25 per million tokens on published list pricing): Opus 4.8 offers a 1M-token context window (roughly 4x K2.7 Code's 256K) and scores 88.6% on SWE-bench Verified. It's the safer choice for proven reliability on independently tested benchmarks. However, Moonshot reports K2.7 Code scored higher than Opus 4.8 on MCP Mark Verified (81.1 vs 76.4 in Moonshot's tests), suggesting potential advantages in tool invocation accuracy for certain agent workflows. Independent confirmation of this result is still pending.

Against GLM-5.2 (open-source, MIT license): Zhipu AI's GLM-5.2 launched the same week and tops the Artificial Analysis open-weight Intelligence Index. It's the broader open-source rival. K2.7 Code is narrower, purpose-built for coding, but paired with a first-party CLI agent (Kimi Code) that makes it more turnkey for developer workflows.

What developers are saying

Developer reaction has been positive but measured. A few notable data points from trusted voices in the AI community:

Jun Song, Qwen ambassador and local LLM ecosystem contributor, ranked models after running agent tests: "Fable > Kimi-2.7 > Opus-4.8 = GLM-5.2 > GPT5.5 > Minimax-M3", placing K2.7 Code above both Claude Opus 4.8 and GPT-5.5 in his real-world testing.

xjdr, an AI infrastructure developer, wrote: "k2.7 has been extremely impressive so far (as was k2.6 before it). Fantastic job Moonshot team."

Community threads on r/LocalLLaMA and r/ChatGPTCoding have surfaced a consistent pattern: developers praise the open-weight availability and the MCP tool-use results, and several report 75-90% reductions in API spend after switching from GPT-equivalent models for batch coding tasks.

The recurring ask: independent SWE-bench results before making definitive claims about where K2.7 Code truly sits relative to the frontier.

The bigger picture: Moonshot AI's momentum

K2.7 Code is the fifth major release in the K2 series in under a year: K2 (July 2025), K2 Thinking (November 2025), K2.5 (January 2026), K2.6 (April 2026), and now K2.7 Code (June 2026). That pace is unusually fast, even by 2026 standards.

Moonshot AI itself is on a steep trajectory. The company raised $2 billion at a $20 billion valuation in May 2026, led by Meituan's venture arm, and is reportedly targeting a $30 billion valuation in its next round. Annual recurring revenue topped $200 million in April, per Bloomberg. The Kimi chatbot has over 36 million monthly active users, and K2.6 is currently the second most-used LLM on OpenRouter globally.

The company was founded in March 2023 by Yang Zhilin, a former Meta AI and Google Brain researcher, alongside Zhou Xinyu and Wu Yuxin, all Tsinghua University alumni. Its backers include Alibaba, Tencent, Meituan, HongShan (formerly Sequoia China), and IDG Capital.

What this means for builders

If you're a non-technical founder using AI-powered tools to build products, you don't need to care about K2.7 Code's architecture or benchmark tables. What you should care about is what they signal.

AI coding capabilities are commoditizing fast. A model that's open-source, commercially licensed, and costs under a dollar per million input tokens can now compete on specific tasks with models from OpenAI and Anthropic that cost roughly 5x to 7.5x more at published list prices. That cost curve flows downstream into every AI-powered tool, platform, and workflow that builders rely on.

One practical note: if you or your team have data residency requirements or want to avoid routing requests through Moonshot's cloud API, self-hosting the open weights via Hugging Face is an option. But it's resource-intensive. Reporting suggests you'll need multiple H100-class GPUs for reasonable latency at the model's full precision. For most non-technical builders, the API is the practical path.
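For a sense of scale, here is a back-of-the-envelope estimate of how many GPUs it takes just to hold a trillion parameters in memory. Everything except the parameter count is an assumption: the article does not state the precision of the released weights, and the 80 GB figure refers to the 80 GB H100 variant. KV cache, activations, and serving overhead are ignored, so real deployments need more.

```python
import math

TOTAL_PARAMS = 1e12   # 1 trillion total parameters, from the article
H100_MEMORY_GB = 80   # assumption: 80 GB H100 variant


def gpus_for_weights(bytes_per_param: float,
                     total_params: float = TOTAL_PARAMS,
                     gpu_memory_gb: float = H100_MEMORY_GB) -> int:
    """Minimum GPU count needed just to hold the weights (no KV cache or activations)."""
    weight_gb = total_params * bytes_per_param / 1e9
    return math.ceil(weight_gb / gpu_memory_gb)


if __name__ == "__main__":
    # Assumed precisions; the article does not say which one the released weights use.
    for label, nbytes in (("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
        print(f"{label}: at least {gpus_for_weights(nbytes)} GPUs for weights alone")
```

Only 32 billion parameters are active per token, but all experts typically have to stay resident because routing changes from token to token. That is why the total parameter count, not the active count, drives the memory estimate.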

For platforms like Emergent that turn descriptions into working products, this trend is unambiguously positive. Better models at lower costs mean better outputs for the people using them, whether they ever touch a line of code or not.

The race between open and closed AI models isn't slowing down. It's accelerating. And every step closer to parity is a step that makes AI-powered building more accessible.

Stay updated on the latest AI developments that matter for builders. Check out the Emergent news section for more.
