What Is Kimi 2.7 Code? Moonshot AI's Open-Source Coding Model Explained

Kimi 2.7 Code is Moonshot AI's open-source 1T agentic coding model. Specs, benchmarks, pricing, limitations, and how it compares to GPT-5.5 and Claude.

Written by Bhavyadeep
Reviewed by Sakthy
Last updated: June 29, 2026

Kimi 2.7 Code is Moonshot AI's newest open-source coding model, released on June 12, 2026. If you have been tracking AI model launches this year, you know the pace is relentless. A new model drops nearly every week, and most of them blur together.

Kimi 2.7 (officially named Kimi K2.7 Code) is worth paying attention to for a specific reason: it is one of the largest open-weight coding models available, it costs a fraction of what proprietary alternatives charge, and it is built specifically for the kind of long, multi-step coding workflows that most models still struggle with. This guide covers what Kimi Code 2.7 actually is, how it works under the hood, where it leads, where it falls short, and how it compares to GPT-5.5 and Claude Opus 4.8.

TL;DR

Kimi Code 2.7 is an open-source AI coding model from Moonshot AI, released June 12, 2026.
1 trillion total parameters, 32 billion active per token (Mixture-of-Experts architecture), 256K context window.
Roughly 30% fewer reasoning tokens than its predecessor K2.6, with higher scores on Moonshot's coding benchmarks.
API pricing: $0.95 per million input tokens, $4.00 per million output tokens.
All published benchmarks are Moonshot first-party. No independent third-party scores exist yet.
Best suited for high-volume agentic coding pipelines where cost and open-weight flexibility matter more than frontier benchmark leadership.

What is Kimi Code 2.7?

Kimi Code 2.7 is a coding-focused, open-source AI model developed by Moonshot AI and released on June 12, 2026. Built on a 1 trillion parameter Mixture-of-Experts (MoE) architecture, it is designed for long-horizon software engineering tasks where the model plans, writes, tests, and debugs code autonomously across many steps.

Moonshot AI is a Beijing-based AI company founded in 2023 by Zhilin Yang, a Tsinghua University alumnus. The company is backed by Alibaba and operates the Kimi chatbot, which has grown to over 36 million monthly active users. The K2 model family is Moonshot's open-weight play, and K2.7 Code is the fifth major release in under a year.

One distinction matters upfront: Kimi K2.7 Code is not a general-purpose chatbot. Moonshot AI explicitly recommends its predecessor, K2.6, for general tasks like writing, analysis, and conversation. K2.7 Code is a coding specialist, tuned for agentic workflows where the model operates across files, tools, and terminal commands over extended sessions.

The model weights are available on Hugging Face under a Modified MIT license. That license permits commercial use, with an attribution requirement that only kicks in above 100 million monthly active users or $20 million in monthly revenue.

Release	Date	Focus
Kimi K2	July 2025	Base 1T MoE model, Apache 2.0
K2-Thinking	November 2025	Chain-of-thought reasoning
K2.5	January 2026	Multimodal + Agent Swarm v1
K2.6	April 2026	12-hour runs, 300-agent swarms
K2.7 Code	June 2026	Coding efficiency, 30% fewer tokens

Five major releases in 11 months signal that Moonshot is iterating aggressively for the developer tooling market. Each release is built on the same 1T MoE backbone while sharpening a different capability.

Technical architecture and key specs

Kimi K2.7 Code uses a Mixture-of-Experts architecture with 1 trillion total parameters and 32 billion activated per token. It includes 384 experts (eight selected plus one shared per token), 61 layers, Multi-head Latent Attention (MLA), SwiGLU activations, and a 400 million parameter MoonViT vision encoder for multimodal input.

1. Architecture overview

The MoE design is the reason K2.7 Code can pack a trillion parameters without trillion-parameter inference costs. Only 32 billion parameters activate on any given token, which means the model's compute cost scales with the active count, not the total. This is the core efficiency argument for the entire K2 family.

The architecture uses 384 experts with eight routed and one shared per token, spread across 61 layers. Attention runs through MLA (Multi-head Latent Attention), which compresses the key-value cache for longer contexts. The vocabulary covers 160,000 tokens, and the feed-forward layers use SwiGLU activation.
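As a rough illustration of the routing described above, the sketch below picks the eight highest-scoring experts out of 384 for a single token and normalizes their weights. The router logits are random stand-ins, and the softmax-over-selected gating and the `route_token` helper are assumptions made for illustration, not Moonshot's published implementation.

```python
import math
import random

NUM_EXPERTS = 384      # routed experts per MoE layer (from the spec above)
TOP_K = 8              # routed experts selected per token
TOTAL_PARAMS = 1e12    # 1T total parameters
ACTIVE_PARAMS = 32e9   # 32B active per token


def route_token(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only. That is one common
    convention; the model's actual gating function may differ.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)          # for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores
selected = route_token(logits)

print(len(selected), "routed experts + 1 shared expert run for this token")
print(f"active fraction of parameters: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Only the selected experts (plus the shared one) do any work for the token, which is why the active parameter count, about 3.2% of the total, drives compute cost.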

For teams already running K2.5 or K2.6, K2.7 Code shares the same backbone. Moonshot AI confirms that existing deployment configurations can be reused directly. Swapping from K2.6 to K2.7 Code is a one-line model ID change.

2. Context window and multimodal input

K2.7 Code supports a 256K token context window (262,144 tokens). That is large enough to hold entire repository-scale codebases, long multi-turn debugging sessions, and the kind of extended agent loops that shorter-context models lose track of.
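A quick way to sanity-check whether a set of files is likely to fit is a back-of-envelope estimate like the one below. The four-characters-per-token ratio and the output reserve are assumptions for illustration; the real tokenizer will give different counts, especially for code.

```python
CONTEXT_WINDOW_TOKENS = 262_144   # 256K, from the spec above
CHARS_PER_TOKEN = 4               # assumption: rough average, not the real tokenizer


def estimate_tokens(text):
    """Crude token estimate: characters divided by CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts, reserve_for_output=32_000):
    """Return (fits, estimated_tokens) for a list of strings.

    reserve_for_output is a hypothetical budget kept free for reasoning
    tokens and the reply.
    """
    used = sum(estimate_tokens(t) for t in texts)
    return used + reserve_for_output <= CONTEXT_WINDOW_TOKENS, used


small_repo = ["x" * 400_000]      # ~100K estimated tokens
large_repo = ["x" * 1_000_000]    # ~250K estimated tokens

print(fits_in_context(small_repo))
print(fits_in_context(large_repo))
```

With these assumptions the smaller input fits comfortably, while the larger one does not once room for output is reserved.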

The model also accepts multimodal input through MoonViT, a 400 million parameter vision encoder. It handles text, images (PNG, JPEG, WEBP, GIF), and video (experimental on the official API). In practice, that means developers can upload a screenshot of a UI bug, a product mockup, or a recorded reproduction alongside a code prompt. For front-end development and visual debugging, this adds a workflow that text-only models cannot match.

3. Mandatory thinking mode

Every call to K2.7 Code reasons before responding. Thinking mode is always on and cannot be disabled. The model also forces preserve_thinking, which retains full reasoning context across multi-turn conversations. Sampling parameters are locked at temperature 1.0 and top_p 0.95.

For developers who need deterministic output or tighter control over generation, this is a real constraint. You cannot toggle thinking off for simple tasks, and you cannot lower the temperature for more predictable responses. The tradeoff is reasoning depth at the cost of output control.
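Because the sampling values are fixed, it can help to catch conflicting settings on the client before a request goes out. The sketch below is a hypothetical guard, not part of any Moonshot SDK; how the API itself treats conflicting values (rejecting or ignoring them) is not covered here.

```python
LOCKED_SAMPLING = {"temperature": 1.0, "top_p": 0.95}  # values stated above


def check_sampling(requested):
    """Return the settings in `requested` that conflict with the locked values.

    The result maps each conflicting name to (requested_value, locked_value).
    Parameters that are not locked, such as max_tokens, are ignored.
    """
    conflicts = {}
    for name, locked in LOCKED_SAMPLING.items():
        if name in requested and requested[name] != locked:
            conflicts[name] = (requested[name], locked)
    return conflicts


conflicts = check_sampling({"temperature": 0.0, "top_p": 0.95, "max_tokens": 1024})
print(conflicts)   # {'temperature': (0.0, 1.0)}
```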

Spec	K2.7 Code	K2.6
Total parameters	1T	1T
Active parameters	32B	32B
Context window	256K	256K
Thinking mode	Always on (mandatory)	Supports both
Reasoning token efficiency	~30% fewer than K2.6	Baseline
Focus	Coding and agentic tasks	General-purpose
License	Modified MIT	Modified MIT

What changed from K2.6 to K2.7 Code

The single most important improvement is token efficiency. Moonshot AI reports that K2.7 Code uses roughly 30% fewer reasoning tokens than K2.6 while scoring higher on every coding benchmark the company published. In concrete terms: a long agentic run that previously consumed around two million reasoning tokens on K2.6 now uses closer to 1.4 million on K2.7 Code. The accuracy went up at the same time.

Benchmark gains over K2.6 (all Moonshot-reported):

Kimi Code Bench v2: 62.0 vs 50.9 (+21.8%)
Program Bench: 53.6 vs 48.3 (+11.0%)
MLS Bench Lite: 35.1 vs 26.7 (+31.5%)
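The percentages in parentheses are relative gains over the K2.6 score, not point differences. A short check using the scores listed above:

```python
# (K2.7 Code score, K2.6 score), as reported by Moonshot AI
scores = {
    "Kimi Code Bench v2": (62.0, 50.9),
    "Program Bench": (53.6, 48.3),
    "MLS Bench Lite": (35.1, 26.7),
}


def relative_gain(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100


for name, (k27, k26) in scores.items():
    print(f"{name}: +{k27 - k26:.1f} points, +{relative_gain(k27, k26):.1f}%")
```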

Agentic benchmarks also improved. On MCP Atlas, K2.7 Code scores 76.0 (up from K2.6's 69.4). On MCP Mark Verified, which tests tool invocation across real-world environments like GitHub, Postgres, and Playwright, K2.7 Code reaches 81.1 (up from 72.8).

An honest caveat belongs here. Every benchmark published for K2.7 Code comes from Moonshot AI's own proprietary suites. As of June 2026, no independent third-party results exist on SWE-bench Verified, SWE-bench Pro, Terminal-Bench, LiveCodeBench, or any other public leaderboard. Multiple developer reports on forums note that the headline numbers do not always replicate in production repositories. Treat the gains as directional and meaningful, but not independently verified.

Want a deeper look at what's changed? Our Kimi 2.7 Code vs Kimi 2.6 comparison covers the benchmark improvements, token efficiency, and new capabilities in detail.

Benchmark comparison with GPT-5.5 and Claude Opus 4.8

The comparison most developers care about is how K2.7 Code stacks up against the proprietary frontier. Moonshot AI published its own head-to-head table, and the numbers tell a nuanced story.

Benchmark	K2.7 Code	GPT-5.5	Claude Opus 4.8
Kimi Code Bench v2	62.0	69.0	67.4
Program Bench	53.6	69.1	63.8
MLS Bench Lite	35.1	35.5	42.8
MCP Mark Verified	81.1	92.9	76.4

K2.7 Code does not beat GPT-5.5 on any of the four benchmarks shown here. It does beat Claude Opus 4.8 on one: MCP Mark Verified, which measures tool invocation accuracy across the Model Context Protocol. That 81.1 vs 76.4 gap matters for teams building agent pipelines that integrate with databases, APIs, and development toolchains.

On MLS Bench Lite, K2.7 Code nearly matches GPT-5.5 (35.1 vs 35.5), which is notable given the price difference.

The methodology matters too. K2.7 Code was tested through Kimi Code CLI with thinking enabled, GPT-5.5 ran in Codex at xhigh mode, and Opus 4.8 ran in Claude Code at xhigh mode. Different harnesses can shift results meaningfully.

K2.7 Code is not trying to win on raw capability. Its pitch is the combination of open weights, dramatically lower cost, strong MCP tool-use performance, and self-hosting flexibility. For teams choosing between models, the question is not "which scores highest" but "which delivers the best results per dollar on my specific workload."

Pricing and access

1. API pricing

K2.7 Code is priced for volume. The Moonshot AI pricing page lists the following rates (pricing as of June 2026):

Cache-miss input: $0.95 per million tokens
Cached input: $0.19 per million tokens
Output: $4.00 per million tokens

For comparison, Claude Opus 4.8 runs at $5/$25 per million input/output tokens. At list prices, that makes K2.7 Code roughly 5.3 times cheaper on cache-miss input and about 6.3 times cheaper on output. The roughly 30% reduction in reasoning tokens relative to K2.6 lowers the per-task cost of long agentic runs further.
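To see how these rates translate into a per-run bill, here is a minimal cost sketch. The workload numbers are invented for illustration, and the sketch assumes reasoning tokens are billed at the output rate, which should be confirmed against the pricing page.

```python
# USD per 1M tokens, from the rates listed above
K27_PRICES = {"input": 0.95, "cached_input": 0.19, "output": 4.00}
OPUS_PRICES = {"input": 5.00, "output": 25.00}


def run_cost(prices, input_tokens, output_tokens, cached_tokens=0):
    """Cost in USD for one run; cached tokens are a subset of input tokens.

    If a price table has no cached rate, cached tokens fall back to the
    normal input rate.
    """
    uncached = input_tokens - cached_tokens
    cached_rate = prices.get("cached_input", prices["input"])
    total = (uncached * prices["input"]
             + cached_tokens * cached_rate
             + output_tokens * prices["output"])
    return total / 1_000_000


# Hypothetical long agentic run: 10M input tokens (8M served from cache),
# 1.4M output tokens including reasoning.
k27 = run_cost(K27_PRICES, 10_000_000, 1_400_000, cached_tokens=8_000_000)
print(f"K2.7 Code: ${k27:.2f}")

print(f"input price ratio:  {OPUS_PRICES['input'] / K27_PRICES['input']:.2f}x")
print(f"output price ratio: {OPUS_PRICES['output'] / K27_PRICES['output']:.2f}x")
```

Cache hits matter as much as the headline rate in this example: the 8M cached tokens cost less than the 2M uncached ones.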

2. Access methods

There are five paths to K2.7 Code, each suited to a different use case:

Method	Best for	Cost
Kimi web app (kimi.com)	Quick questions, testing the model	Free
Kimi Code CLI	Terminal-based agentic coding	Free starter quota, then $19+/month
Kimi API (platform.moonshot.ai)	Integrating into your own tools and workflows	$0.95/$4.00 per 1M tokens
Self-hosting via Hugging Face	Data residency, high-volume, full control	Hardware cost only
Third-party providers (OpenRouter, Cloudflare)	Multi-model routing	Varies by provider

Moonshot also offers a HighSpeed version (kimi-k2.7-code-highspeed) that outputs at approximately 180 tokens per second, reaching up to 260 tokens per second in short-context scenarios. Resource availability for the high-speed model may fluctuate during the initial rollout.

3. Licensing

The Modified MIT license permits commercial use. The attribution requirement only triggers above 100 million monthly active users or $20 million in monthly revenue. Below those thresholds, the model is effectively permissive for startups and mid-market teams.
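The threshold rule as summarized above reduces to a simple either-or check. This sketch encodes only the two numbers quoted in this article; the license text itself governs the exact wording and definitions.

```python
MAU_THRESHOLD = 100_000_000            # monthly active users
REVENUE_THRESHOLD_USD = 20_000_000     # monthly revenue in USD


def attribution_required(monthly_active_users, monthly_revenue_usd):
    """True if either threshold described above is exceeded."""
    return (monthly_active_users > MAU_THRESHOLD
            or monthly_revenue_usd > REVENUE_THRESHOLD_USD)


print(attribution_required(2_000_000, 500_000))        # typical startup
print(attribution_required(150_000_000, 5_000_000))    # very large user base
```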

What Kimi Code 2.7 is good at

Kimi Code 2.7 earns its keep in a specific class of work: long, multi-step coding tasks run at scale where cost per token matters.

Long-horizon agentic coding is the headline capability. The model plans an approach, edits files across a repository, runs shell commands, calls tools, inspects results, and iterates. Tasks like codebase-wide refactors, multi-file feature implementations, and extended debugging sessions are where K2.7 Code is designed to operate. Compared to K2.6, it follows instructions more reliably across extended contexts and achieves higher end-to-end task success rates.

MCP tool use is where K2.7 Code posts its strongest comparative result. The 81.1 score on MCP Mark Verified (beating Claude Opus 4.8's 76.4) measures precise tool invocation across real-world environments including Notion, GitHub, Postgres, Filesystem, and Playwright. For teams building agent pipelines that chain CI checks, ticket updates, and file edits in a single loop, this is the metric that matters most.

Token efficiency compounds across long sessions. The 30% reduction in reasoning tokens means fewer tokens billed per task, faster step completion in interactive CLI sessions, and more headroom within the 256K context window before the model needs to compact its history.

Drop-in compatibility lowers switching costs. K2.7 Code works with Claude Code, Cline, Roo Code, OpenCode, and Aider through Anthropic-compatible API routing. For developers already using any of these tools, testing K2.7 Code requires changing two environment variables (base URL and API key) and nothing else.
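The two-variable switch can be sketched as a small helper that builds the environment for a tool process. The variable names `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` are the ones Claude Code reads; other tools may use different names. The URL and key below are placeholders, not real endpoints or credentials.

```python
import os


def kimi_env(base_url, api_key, base_env=None):
    """Return a copy of the environment with the two overrides applied.

    base_env defaults to the current process environment. The original
    mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url      # where requests are routed
    env["ANTHROPIC_AUTH_TOKEN"] = api_key     # credential for that endpoint
    return env


env = kimi_env(
    "https://example.invalid/anthropic",      # placeholder: use the provider's documented URL
    "sk-placeholder",                         # placeholder API key
    base_env={"PATH": "/usr/bin"},
)
print(sorted(env))

# To launch a tool with these overrides (requires the tool to be installed):
# import subprocess; subprocess.run(["claude"], env=env)
```

Building a separate environment instead of exporting the variables globally keeps the switch scoped to one process, which makes side-by-side comparisons easier.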

Multimodal coding adds a visual dimension. Upload a screenshot of a broken layout, a design mockup, or a recorded bug reproduction alongside a code prompt. The MoonViT vision encoder processes the visual input natively, so the model can generate or fix code based on what it sees.

Honest limitations

K2.7 Code has real constraints that affect whether it fits your workflow. Ignoring them leads to wasted evaluation time.

No independent benchmark verification exists. Every performance number published for K2.7 Code comes from Moonshot AI's own test suites. Until SWE-bench Verified, Terminal-Bench, or another independent leaderboard includes K2.7 Code, the benchmark claims remain directional vendor data. Several developers have publicly noted that the headline numbers do not fully replicate on production repositories.

It is a coding specialist, not a generalist. Moonshot AI explicitly recommends K2.6 for writing, analysis, and conversation. If you route general-purpose queries through K2.7 Code, you are paying for coding-focused optimization on tasks that do not benefit from it.

Mandatory thinking mode limits control. You cannot disable thinking, cannot change the temperature from 1.0, and cannot adjust top_p from 0.95. For tasks that need deterministic or low-variance output, this is a meaningful limitation that competing models handle more flexibly.

Self-hosting demands serious hardware. The INT4 quantized version requires roughly 577GB of VRAM. A full FP16 deployment needs a multi-node cluster. Community quantizations from Unsloth reduce the footprint, but even the smallest practical configuration needs hundreds of gigabytes of RAM or VRAM. Most teams are better served by the API unless data residency or volume economics force the issue.
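A weights-only estimate shows why the numbers are this large. The sketch below multiplies the 1T parameter count by the bits per parameter and ignores the KV cache, activations, and runtime overhead, so it is a lower bound. The quoted 577GB figure for INT4 sits above that bound, presumably because of such overhead or layers kept at higher precision, which is an assumption on our part.

```python
TOTAL_PARAMS = 1e12   # 1T parameters, from the spec above


def weight_footprint_gb(bits_per_param, params=TOTAL_PARAMS):
    """Weights-only size in GB (10^9 bytes).

    Ignores KV cache, activations, and framework overhead.
    """
    return params * bits_per_param / 8 / 1e9


for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_footprint_gb(bits):,.0f} GB")
```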

Context degradation at extreme lengths has been reported. Developers dumping very large repositories into a single context window have observed "lost in the middle" behavior where the model loses track of content injected in the middle of the context. The workaround (sequential file reading through the agent) adds workflow complexity.
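One way to approximate the sequential-reading workaround is to feed files in ordered batches that each stay under a token budget, rather than loading everything at once. The `batch_files` helper and the token counts below are hypothetical; an agent harness would normally handle this for you.

```python
def batch_files(file_tokens, budget):
    """Greedily group (name, tokens) pairs, in order, into batches within budget.

    Raises ValueError if a single file is larger than the budget, since it
    would need to be split before it can be read.
    """
    batches, current, used = [], [], 0
    for name, tokens in file_tokens:
        if tokens > budget:
            raise ValueError(f"{name} alone exceeds the budget of {budget} tokens")
        if used + tokens > budget:
            batches.append(current)      # close the full batch
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        batches.append(current)
    return batches


files = [("api.py", 60_000), ("models.py", 50_000), ("utils.py", 40_000), ("cli.py", 10_000)]
print(batch_files(files, budget=100_000))
# [['api.py'], ['models.py', 'utils.py', 'cli.py']]
```

Keeping each batch well below the full window leaves room for reasoning tokens and reduces how much content ends up in the middle of a very long context.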

Data governance on the hosted API is worth evaluating. Moonshot AI is headquartered in Beijing. Teams sending proprietary code through the hosted API should consider their data residency requirements. Self-hosting resolves this concern but introduces the hardware cost described above.

The context window is smaller than the largest proprietary option. At 256K tokens it is large, but it trails Claude Opus 4.8's one million token window by roughly a factor of four. For tasks that genuinely need a full-repository view held in a single context, that gap matters.

Who should use Kimi Code 2.7 (and who should not)

K2.7 Code fits well if you run high-volume agentic coding pipelines where API cost is a primary constraint. Teams already on K2.6 can upgrade with a one-line model ID swap. If you need an open-weight model for self-hosting, data residency, or vendor independence, Kimi Code 2.7 is one of the strongest options available in June 2026. And if your workflows depend heavily on MCP tool use, K2.7 Code's 81.1 score on MCP Mark Verified puts it ahead of Claude Opus 4.8 on that specific capability.

Skip it if you need the absolute frontier on hard reasoning. Claude Opus 4.8 and GPT-5.5 still lead on independently verified benchmarks. Skip it if you need a general-purpose model for writing, research, or conversation. Skip it if you want proven third-party benchmark results before committing. And skip it if your tasks require a context window larger than 256K tokens.

For non-technical builders, the calculus is different entirely. If you are building applications without writing code, choosing between individual AI models is the wrong layer to optimize. You need a platform that handles model selection, infrastructure, and deployment for you. Emergent, for example, lets you build full-stack apps using GPT, Claude, and Gemini through a single Universal LLM Key. You describe what you want, and Emergent's agents handle the engineering. As more models like Kimi Code 2.7 enter the market, the value of platforms that let you switch between models without rebuilding your entire stack only grows.

Build with the model that fits your project

Kimi Code 2.7 is Moonshot AI's strongest open-source coding model. It is cheaper and more token-efficient than K2.6, competitive with frontier proprietary models on MCP tool use, and available as open weights for anyone who needs to self-host. For teams running cost-sensitive agentic coding at scale, it belongs in the evaluation queue alongside Claude Opus 4.8, GPT-5.5, DeepSeek V4, and GLM-5.1.

Kimi Code 2.7 is not the best model on every benchmark. It is the best value proposition for a specific class of work: high-volume, long-horizon coding pipelines where open weights, low per-token cost, and strong tool-use performance matter more than raw benchmark leadership on proprietary test suites.

The AI model landscape is moving fast. The model you choose today may not be the model you need in six months. Platforms that let you swap models without rebuilding your stack are how you stay flexible. Emergent lets you build full-stack apps using GPT, Claude, and Gemini through a single Universal LLM Key.

Frequently Asked Questions

Your Questions, Answered

Is Kimi Code 2.7 free to use?

The model weights are free to download from Hugging Face under a Modified MIT license. The Kimi web app offers free chat access, and the Kimi Code CLI includes a free starter quota that refreshes weekly. The hosted API is pay-per-token at $0.95 per million input tokens and $4.00 per million output tokens.

What is the difference between Kimi Code 2.7 and Kimi K2.6?

K2.7 Code is a coding-focused specialist built on K2.6. It uses roughly 30% fewer reasoning tokens and scores higher on Moonshot AI's coding benchmarks. K2.6 remains the recommended model for general-purpose tasks like writing, analysis, and conversation.

Can I use Kimi Code 2.7 in Claude Code or Cursor?

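Yes for Claude Code. Kimi K2.7 Code offers an Anthropic-compatible API, so you can route Claude Code, Cline, Roo Code, and other agent frameworks to K2.7 Code by changing your base URL and API key environment variables. No code changes are required. Cursor is not among the tools listed as compatible in this guide, so verify its custom-endpoint support before relying on it.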

Is Kimi Code 2.7 better than GPT-5.5 or Claude Opus 4.8?

On most published benchmarks, GPT-5.5 and Claude Opus 4.8 lead. K2.7 Code beats Opus 4.8 on MCP Mark Verified (81.1 vs 76.4) and costs roughly five to six times less per token than Opus 4.8 at list prices. For cost-sensitive, high-volume agentic coding, K2.7 Code offers the strongest value. For frontier reasoning quality, the proprietary models still lead.

Who made Kimi Code 2.7?

Moonshot AI, a Beijing-based AI company founded in 2023 by Zhilin Yang. The company is backed by Alibaba and operates the Kimi chatbot with over 36 million monthly active users.

Can I run Kimi Code 2.7 locally?

Technically yes, but the hardware requirements are steep. The INT4 quantized version needs roughly 577GB of VRAM. A Mac Studio with 512GB unified memory can run heavily quantized versions through Unsloth, but performance is slow. For most developers, the API at $0.95/$4.00 per million tokens or the Kimi Code CLI is more practical than self-hosting.
