GLM 5.2 vs Kimi K2.7 Code: Which Coding Model Wins in 2026?
GLM 5.2 vs Kimi K2.7 Code compared on benchmarks, pricing, context, and agentic coding. Find which open-weight model fits your workflow.
Two open-weight coding models launched within 24 hours of each other in June 2026, and both claim to rival the closed-source frontier. GLM 5.2 from Z.ai (Zhipu AI) and Kimi K2.7 Code from Moonshot AI are built for the same job: long-horizon, agentic software engineering. But they approach it with different architectures, different context windows, different pricing, and, critically, different levels of public benchmark evidence.
If you build coding agents, run autonomous engineering workflows, or just want the strongest open-weight model you can self-host, this is the comparison that matters. We break down what each model actually delivers, where each one leads, and how to pick between them based on your workload.
What is GLM 5.2?
GLM 5.2 is Z.ai's flagship open-weight coding model, released on June 13, 2026. It is the latest iteration in the GLM-5 line, following GLM-5 (February 2026), GLM-5-Turbo (March 2026), and GLM-5.1 (April 2026).
Z.ai is the international brand for Zhipu AI, a Beijing-based foundation model company spun out of Tsinghua University in 2019. The company completed a Hong Kong Stock Exchange IPO in January 2026, raising approximately $558 million.
GLM 5.2 runs a 744-billion-parameter Mixture-of-Experts (MoE) architecture with roughly 40 billion active parameters per token. Its defining feature is a 1M-token context window, confirmed in Z.ai's developer documentation and the official Hugging Face blog post, enabled by a new sparse-attention optimization called IndexShare. The model ships under a fully permissive MIT license with open weights on Hugging Face.
Two reasoning-effort levels let users balance capability against latency: "High" for routine tasks and "Max" for complex coding and reasoning work.
What is Kimi K2.7 Code?
Kimi K2.7 Code is Moonshot AI's latest coding-focused model, released one day earlier on June 12, 2026. It is the fifth major release in the K2 series, built directly on the K2.6 architecture rather than introducing a new foundation.
Moonshot AI is a Beijing-based company founded in 2023 by Zhilin Yang, backed by Alibaba, and valued at approximately $4.8 billion. The Kimi chatbot now has over 36 million monthly active users.
K2.7 Code uses a 1-trillion-parameter MoE architecture with 32 billion active parameters per token and 384 experts. It supports a 256K-token context window and uses Multi-head Latent Attention (MLA). The model also includes MoonViT, a 400M-parameter vision encoder that accepts image and video input natively.
The model always runs with thinking enabled. You cannot disable the reasoning mode. It also enforces "preserve_thinking," which retains the full reasoning chain across multi-turn conversations. Both design choices optimize for agentic workflows but mean you cannot run it in a cheap, no-reasoning mode for trivial calls.
K2.7 Code is open-weight under a Modified MIT license on Hugging Face.
Architecture comparison
Both models are massive MoE transformers, but the architectural differences shape what each one does well.
The 4x context window gap is the most consequential difference for production work. GLM 5.2's IndexShare optimization makes 1M-token inference practical by reusing the same attention indexer across every four sparse-attention layers, cutting per-token FLOPs by 2.9x at full context length. Kimi K2.7 Code's 256K window is solid for most single-repo work but starts to constrain workflows involving monorepos, large documentation sets, or multi-file refactoring across very large codebases.
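As a toy illustration of the sharing idea only: the sketch below runs an "indexer" (here, a plain top-k selection over hypothetical relevance scores) once per group of four layers and reuses its result for the other layers in the group. The function names, the scores, and the reading that the indexer's output is what gets reused are all assumptions made for illustration. This is not Z.ai's implementation, and it does not attempt to reproduce the 2.9x figure.

```python
def top_k_indices(scores, k):
    """Toy 'indexer': positions of the k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def run_layers(num_layers, scores, k, share_every=4):
    """Pick the tokens each sparse-attention layer attends to.

    The indexer runs only on the first layer of each group of
    `share_every` layers; the rest of the group reuses that selection.
    Returns (selection per layer, number of indexer calls).
    """
    indexer_calls = 0
    selected = None
    per_layer = []
    for layer in range(num_layers):
        if layer % share_every == 0:
            selected = top_k_indices(scores, k)  # fresh selection for this group
            indexer_calls += 1
        per_layer.append(selected)  # later layers in the group reuse it
    return per_layer, indexer_calls


# Hypothetical relevance scores for four context tokens.
selections, calls = run_layers(num_layers=8, scores=[0.1, 0.9, 0.5, 0.7], k=2)
print(calls)  # 2 indexer calls instead of 8
```

In this toy, the saving comes purely from running the selection step a quarter as often. A real model would score tokens from layer-specific activations, which the sketch deliberately leaves out.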
On the other hand, K2.7 Code brings native multimodal capability. Developers can upload screenshots, diagrams, product mockups, or even videos and ask the model to generate or debug code based on them. GLM 5.2 is text-only at launch.
Benchmark comparison: what the numbers actually say
Benchmarks tell part of the story, but there is a critical caveat here: the two models have very different levels of independent verification.
GLM 5.2 has scores on standard public benchmarks (SWE-bench Pro, Terminal-Bench 2.1, FrontierSWE) that Z.ai ran and published in its model documentation and Hugging Face release blog. Third-party outlets including VentureBeat covered these numbers, and Artificial Analysis tracks GLM 5.2 on its own composite evaluations. The SWE-bench and Terminal-Bench scores themselves were produced by Z.ai's evaluation runs, not by an independent neutral harness.
Kimi K2.7 Code's published benchmarks are almost entirely from Moonshot's own proprietary test suites: Kimi Code Bench v2, Program Bench, MLS Bench Lite, and MCP Mark Verified. As of late June 2026, no independent third-party results exist for K2.7 Code on SWE-bench Verified, SWE-bench Pro, Terminal-Bench 2.1, or other standard public leaderboards. VentureBeat, Codersera, and TechTimes have all independently confirmed this gap.
That evidence gap matters when you are making a production decision.
1. Standard coding benchmarks
GLM 5.2 holds the top open-weight position on SWE-bench Pro at 62.1%, ahead of GPT-5.5 (58.6%) and 3.7 points above its own predecessor GLM-5.1 (58.4%), according to Z.ai's published results as reported by VentureBeat. On Terminal-Bench 2.1, it scores 81.0, trailing Claude Opus 4.8 (85.0) by four points but well ahead of the open-weight field.
Kimi K2.7 Code has no SWE-bench Pro or Terminal-Bench 2.1 submissions yet. For context, K2.6 (the prior version) posted 58.6% on SWE-bench Pro and 66.7% on Terminal-Bench 2.0 in April 2026. If K2.7 Code's internal gains translate proportionally to public benchmarks, it could be competitive, but that is an assumption, not a measurement.
2. Agentic and tool-use benchmarks
This is where K2.7 Code makes its strongest case.
K2.7 Code's 81.1% on MCP Mark Verified, as reported by Moonshot, beats Claude Opus 4.8 (76.4%) on correct tool invocation via the Model Context Protocol. If your agent pipeline is MCP-heavy, that number signals a real strength. GLM 5.2 has a slight edge on MCP Atlas (77.0 vs. 76.0), which measures broader tool-use navigation.
The honest read: GLM 5.2 has more evidence on standard public coding benchmarks, although Z.ai produced those scores itself. K2.7 Code has stronger vendor-reported performance on MCP-specific tests. Neither model dominates across the full spectrum.
3. Independent real-world testing
Community benchmarks add useful color beyond lab numbers.
A head-to-head test on daily.dev ran both models through a two-phase challenge: planning and building a feature flag backend service. GLM 5.2 scored 9.0 vs. K2.7 Code's 8.1 on planning, making explicit decisions on edge cases like negative-result caching and rollout bucketing. In the build phase, both models produced nearly identical, fully working services.
A Composio benchmark using Terminal-Bench 2.0 tasks and real tool-use suites (Gmail, GitHub, Slack, Notion) found that GLM 5.2 cost less on coding tasks and scored slightly higher on tool quality, while K2.7 Code cost less on the tool-use run ($1.78 total vs. $2.55).
One independent Rails 8 coding benchmark gave both models the same app-building prompt. GLM 5.2 scored 87/100 (Tier A, rank #6), a large jump from GLM-5.1's 46/100. K2.7 Code scored slightly lower due to a system-prompt regression from K2.6, though both delivered functional outputs.
The pattern from independent testing: GLM 5.2 produces stronger one-shot outputs and plans with more edge-case coverage. K2.7 Code shows slightly more autonomous initiative and extends functionality beyond the original requirements, but occasionally stumbles on configuration details.
Pricing comparison
Both models undercut the closed-source frontier by a wide margin. The pricing picture breaks into two paths: per-token API access and subscription plans.
API pricing
Prices below reflect each vendor's first-party API rates. Third-party providers like OpenRouter may list lower rates due to provider competition and routing discounts.
K2.7 Code is cheaper on input tokens by about 32% ($0.95 vs. $1.40 per million tokens) and slightly cheaper on output ($4.00 vs. $4.40 per million). Its cached-input rate is also lower ($0.19 vs. $0.26 per million), which matters for agentic workflows that repeatedly reference the same codebase context.
For a practical comparison: a 10M-token daily workload at a 70/30 input/output split costs roughly $23 on GLM 5.2 and $18.65 on K2.7 Code. At scale, that difference compounds.
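The arithmetic behind that comparison can be sketched as follows. The rates are the per-million-token prices quoted above. The `cached_share` parameter is an added assumption for exploring cache hits and is not part of the article's example.

```python
# First-party API rates in USD per million tokens, as quoted in the article.
RATES = {
    "GLM 5.2": {"input": 1.40, "cached_input": 0.26, "output": 4.40},
    "Kimi K2.7 Code": {"input": 0.95, "cached_input": 0.19, "output": 4.00},
}


def daily_cost(model, total_tokens_m, input_share=0.7, cached_share=0.0):
    """Estimated daily spend in USD.

    total_tokens_m: total daily volume in millions of tokens.
    input_share:    fraction of that volume that is input.
    cached_share:   fraction of input billed at the cached rate
                    (hypothetical knob; 0.0 matches the article's example).
    """
    rates = RATES[model]
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m - input_m
    cached_m = input_m * cached_share
    fresh_m = input_m - cached_m
    return (
        fresh_m * rates["input"]
        + cached_m * rates["cached_input"]
        + output_m * rates["output"]
    )


for name in RATES:
    print(f"{name}: ${daily_cost(name, 10):.2f} per day")
```

With the defaults, this reproduces the $23 and $18.65 figures. Raising `cached_share` shows how much of the input bill moves to the cheaper cached rate when an agent keeps re-reading the same codebase context.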
Both models are dramatically cheaper than the closed frontier. Claude Opus 4.8 runs $5/$25 per million tokens (input/output). GPT-5.5 costs $5/$30. On a 70/30 input/output blend, GLM 5.2 and K2.7 Code both deliver roughly 80-85% savings for similar categories of work.
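To see where the savings figure comes from, one option is to blend each model's input and output rates at a fixed split and compare the results. The sketch below uses the per-million-token prices quoted in this section. The 70/30 split is carried over from the workload example and is an assumption about your traffic, not a property of the models.

```python
# (input, output) rates in USD per million tokens, as quoted in the article.
RATES = {
    "GLM 5.2": (1.40, 4.40),
    "Kimi K2.7 Code": (0.95, 4.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def blended_rate(model, input_share=0.7):
    """Average USD per million tokens at the given input/output split."""
    input_rate, output_rate = RATES[model]
    return input_rate * input_share + output_rate * (1 - input_share)


def savings_pct(cheaper, pricier, input_share=0.7):
    """Percentage saved by using `cheaper` instead of `pricier`."""
    ratio = blended_rate(cheaper, input_share) / blended_rate(pricier, input_share)
    return (1 - ratio) * 100


for open_model in ("GLM 5.2", "Kimi K2.7 Code"):
    for closed_model in ("Claude Opus 4.8", "GPT-5.5"):
        pct = savings_pct(open_model, closed_model)
        print(f"{open_model} vs {closed_model}: {pct:.1f}% cheaper")
```

At this split the four pairings land between about 79% and 85%. Input-heavy workloads save less with GLM 5.2 (about 72% on input tokens alone against either closed model), so it is worth rerunning the numbers with your own split.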
Subscription plans
Z.ai offers the GLM Coding Plan for use inside supported tools like Claude Code, Cline, OpenClaw, and Kilo Code. Launch-week promotional pricing started at $3/month for Lite and $15 for Pro, though standard rates have since risen (Pro reported around $72/month, Max around $160/month as of late June 2026). The plan meters usage in prompts per cycle, not tokens.
Moonshot offers Kimi Code, its terminal-first coding agent, with plans starting at $19/month under annual billing. K2.7 Code is the default model inside Kimi Code, with a high-speed mode (approximately 180 tokens/second, up to 260 in short contexts) available as a premium option.
For teams processing steady volume, self-hosting either model using the open weights eliminates per-token costs entirely, though both require significant GPU resources (8x H100 80GB minimum for practical inference on either model).
Context window: why the 4x gap matters
GLM 5.2's 1M-token context window is four times the size of K2.7 Code's 256K window. This is not just a spec-sheet number.
For single-file work, short conversations, and most isolated coding tasks, 256K tokens is more than enough. The difference shows up in specific scenarios:
- Monorepo refactoring: A large codebase with dozens of interconnected files can easily exceed 256K tokens when the model needs to hold the full dependency graph in context. GLM 5.2 handles this without truncation.
- Long agent sessions: Agentic workflows that run for hours accumulate context. An agent that forgets the architecture after 200K tokens will quietly reintroduce bugs it fixed earlier. GLM 5.2's larger window extends how long an agent can operate without losing coherence.
- Documentation + code combos: When you need the model to reference API docs, internal wikis, and your codebase simultaneously, context capacity is the bottleneck.
K2.7 Code mitigates its smaller window through token efficiency. Moonshot reports it uses approximately 30% fewer reasoning tokens than K2.6, which fits more useful work into the same 256K budget. For many real-world workflows, that efficiency gain narrows the effective gap.
The practical question is whether your workload regularly exceeds 256K tokens of combined context. If it does, GLM 5.2 is the stronger fit. If it does not, K2.7 Code's lower cost and token efficiency become more relevant.
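One rough way to answer that question is to estimate the token count of the material you plan to send and compare it against each window. The sketch below is a minimal version under loud assumptions: the 4-characters-per-token ratio is a common rule of thumb, not either model's tokenizer; the limits are the nominal 1M and 256K figures taken as round numbers; and the reserve for the model's own reasoning and output is a hypothetical placeholder.

```python
import math
from pathlib import Path

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer
CONTEXT_LIMITS = {   # nominal windows from the article, as round numbers
    "GLM 5.2": 1_000_000,
    "Kimi K2.7 Code": 256_000,
}


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_repo_tokens(root, suffixes=(".py", ".md")):
    """Sum rough token estimates over matching files under `root`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def models_that_fit(context_tokens, reserve_tokens=32_000):
    """Models whose window holds the context plus a reserve for output.

    reserve_tokens is a hypothetical allowance for reasoning and replies.
    """
    return [
        name
        for name, limit in CONTEXT_LIMITS.items()
        if context_tokens + reserve_tokens <= limit
    ]


print(models_that_fit(300_000))  # a 300K-token context only fits GLM 5.2
```

Running `estimate_repo_tokens(".")` on your own repository and passing the result to `models_that_fit` gives a first-pass answer. A real tokenizer count will differ, so treat anything near a limit as undecided.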
Tool use and MCP performance
Model Context Protocol (MCP) compatibility is increasingly important as coding agents interact with external services, databases, and development tools.
Moonshot reports K2.7 Code's score of 81.1% on MCP Mark Verified as the highest open-weight result on that benchmark. That figure beats Claude Opus 4.8 (76.4%) on correct tool invocation accuracy, though it still trails GPT-5.5 (92.9%). Moonshot has clearly optimized K2.7 Code's fine-tuning for the agentic loop: tool calls, MCP workflows, and multi-step execution.
GLM 5.2 leads on MCP Atlas (77.0 vs. 76.0), which measures broader tool-use navigation rather than invocation accuracy alone. Z.ai validated GLM 5.2 at launch with eight coding agents: Claude Code, Cline, OpenCode, Roo Code, Goose, Crush, OpenClaw, and Kilo Code.
If your workflow involves heavy MCP tool calls across a session, K2.7 Code's higher invocation accuracy could translate to fewer retries and cleaner agent runs. If your tool-use pattern is more varied and navigation-dependent, GLM 5.2's broader MCP Atlas score may be the better signal.
Multimodal capabilities
K2.7 Code has a clear structural advantage here. Its MoonViT vision encoder accepts images and videos natively, which means developers can upload UI screenshots, architecture diagrams, error logs rendered as images, or even video walkthroughs and ask the model to generate or debug code from visual input.
GLM 5.2 is text-only at launch. Z.ai does offer multimodal models in the GLM family (GLM-4.5V), but GLM 5.2 itself does not process images or video.
For frontend development, debugging visual regressions, or reverse-engineering existing interfaces, K2.7 Code's multimodal input is a meaningful advantage. For backend-heavy, text-centric workflows, the difference is irrelevant.
Token efficiency and reasoning costs
Moonshot reports that K2.7 Code reduces reasoning-token usage by approximately 30% compared to K2.6, and this is one of its strongest selling points. Reasoning tokens are billed at the standard output rate on both models, so a model that "thinks less" to arrive at the same answer directly reduces your bill.
GLM 5.2 addresses this differently through its selectable effort levels. Setting reasoning effort to "High" instead of "Max" produces fewer reasoning tokens on simpler tasks, giving you manual control over the cost-quality tradeoff. K2.7 Code's always-on thinking mode removes that choice. Every call goes through the full reasoning pipeline, even trivial ones.
For teams running diverse workloads (mixing complex refactors with simple formatting tasks), GLM 5.2's flexibility is valuable. For teams running homogeneous agentic coding pipelines where every task benefits from reasoning, K2.7 Code's automatic efficiency is the better deal.
Licensing and self-hosting
Both models are open-weight, but the licenses differ in a detail that matters for commercial deployment.
GLM 5.2 ships under a standard MIT license with no regional restrictions. You can download, modify, fine-tune, and deploy it commercially with no attribution requirements beyond the license text.
K2.7 Code uses a Modified MIT license. It still permits commercial use, but requires attribution for large-scale deployments. The practical difference is minimal for most teams, but enterprise legal departments may prefer the simpler MIT license.
Self-hosting either model requires serious hardware. Both need a minimum of 8x H100 80GB GPUs for INT4 inference at production quality. For most teams below enterprise scale, the API is more practical than self-hosting.
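A back-of-the-envelope check on that hardware figure: at INT4, each parameter takes half a byte, so weight memory alone follows directly from the parameter counts given earlier. The sketch below covers weights only. KV cache, activations, and runtime overhead are not modeled, so read the headroom as an upper bound on what is left for everything else.

```python
GPU_MEMORY_GB = 80  # H100 80GB
GPU_COUNT = 8

# Total parameter counts in billions, as stated in the article.
PARAMS_BILLION = {"GLM 5.2": 744, "Kimi K2.7 Code": 1000}


def weight_memory_gb(params_billion, bits_per_param=4):
    """Weight-only memory in GB (1 GB = 1e9 bytes) at the given precision."""
    return params_billion * bits_per_param / 8


cluster_gb = GPU_MEMORY_GB * GPU_COUNT  # 640 GB across the node

for name, params in PARAMS_BILLION.items():
    weights = weight_memory_gb(params)
    headroom = cluster_gb - weights
    print(f"{name}: {weights:.0f} GB of weights, {headroom:.0f} GB left of {cluster_gb} GB")
```

Under these assumptions the INT4 weights come to 372 GB for GLM 5.2 and 500 GB for K2.7 Code. Both fit in 640 GB, but the remaining memory has to hold the KV cache for long contexts.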
Decision framework: which model fits your workflow?
Rather than declaring a universal winner, here is how to choose based on your actual use case.
Choose GLM 5.2 when:
- Your codebase regularly exceeds 256K tokens of context.
- You need scores on standard public benchmarks before committing (Z.ai's results are self-run, but on public suites).
- Your pipeline involves planning-heavy tasks where edge-case coverage matters.
- You want selectable reasoning effort to control costs across mixed workloads.
- You prefer a clean MIT license with no attribution requirements.
- You are building from scratch (frontend apps, greenfield projects).
Choose Kimi K2.7 Code when:
- Your agent pipeline is MCP-heavy and tool-invocation accuracy is the priority.
- You process images or video as part of your coding workflow.
- Lower input token cost matters at your volume (32% cheaper per million input tokens).
- Your tasks benefit from mandatory reasoning and preserve_thinking across turns.
- You run cost-sensitive, long-running agentic workflows where token efficiency compounds.
- You work primarily with repository-scale codebases within 256K tokens.
Consider both (and route by task) when:
- Your workload mixes planning and execution phases.
- You use a multi-model routing setup (OpenRouter, ofox, or similar).
- Cost optimization matters, but you also hit occasional long-context tasks.
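The criteria above can be sketched as a simple per-task router. Everything here is hypothetical scaffolding: the `Task` fields, the rule order, and the cheaper-model default are assumptions chosen to mirror the lists, not an API from either vendor or from any routing service.

```python
from dataclasses import dataclass

KIMI_CONTEXT_LIMIT = 256_000  # nominal K2.7 Code window, as a round number


@dataclass
class Task:
    context_tokens: int
    needs_vision: bool = False    # image or video input required
    mcp_heavy: bool = False       # many MCP tool calls expected
    planning_heavy: bool = False  # edge-case-heavy planning work


def choose_model(task):
    """Pick a model name for one task: hard constraints first, then preferences."""
    too_long_for_kimi = task.context_tokens > KIMI_CONTEXT_LIMIT
    if too_long_for_kimi and task.needs_vision:
        # GLM 5.2 is text-only; K2.7 Code tops out at 256K tokens.
        raise ValueError("no single model covers >256K context plus visual input")
    if too_long_for_kimi:
        return "GLM 5.2"
    if task.needs_vision:
        return "Kimi K2.7 Code"
    # Soft preferences; putting MCP ahead of planning is an arbitrary choice.
    if task.mcp_heavy:
        return "Kimi K2.7 Code"
    if task.planning_heavy:
        return "GLM 5.2"
    return "Kimi K2.7 Code"  # assumption: default to the lower input price


print(choose_model(Task(context_tokens=400_000)))
print(choose_model(Task(context_tokens=50_000, needs_vision=True)))
```

The point of the ordering is that context length and visual input are hard limits, while MCP accuracy and planning quality are preferences you may want to weight differently for your own workload.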
What if you do not want to code at all?
Both GLM 5.2 and Kimi K2.7 Code are powerful models, but they are built for developers and teams comfortable with API calls, agent pipelines, and coding infrastructure. If you want to build a working app without writing code or managing model deployments, Emergent takes a different approach.
Emergent is an AI app building platform where you describe what you want in plain language, and the platform's multi-agent architecture builds a full-stack application: React frontend, Python backend, MongoDB database, authentication, payments, and deployment. You own the code and can export it anytime. It is built for founders, operators, and non-technical builders who need production software without the engineering overhead.
Skip the model comparisons and API setup. Describe your app and let Emergent handle the rest.
Frequently Asked Questions
Is GLM 5.2 better than Kimi K2.7 Code?
GLM 5.2 has a stronger public benchmark trail (scores on SWE-bench Pro, Terminal-Bench 2.1, FrontierSWE) and a 4x larger context window. Kimi K2.7 Code has stronger vendor-reported MCP tool-use scores and lower API pricing. Neither model is categorically better. The right choice depends on whether your priority is benchmark breadth and long context (GLM 5.2) or token efficiency and tool-calling accuracy (K2.7 Code).
Can I use GLM 5.2 or Kimi K2.7 Code for free?
Both models have open weights you can download from Hugging Face. Running them requires 8x H100 80GB GPUs minimum. New Z.ai accounts get 20 million free tokens. Neither model has a free production API tier, but third-party providers on OpenRouter offer competitive per-token rates.
Does Kimi K2.7 Code support image and video input?
Yes. K2.7 Code uses MoonViT, a 400M-parameter vision encoder, to accept images and video natively. You can upload UI screenshots, diagrams, or videos alongside text prompts. GLM 5.2 is text-only at launch.
Which model scores higher on SWE-bench?
GLM 5.2 has a published SWE-bench Pro score (62.1%). Kimi K2.7 Code has not submitted to SWE-bench Pro or SWE-bench Verified as of late June 2026. The predecessor K2.6 scored 58.6% on SWE-bench Pro, so K2.7 Code may be competitive, but no verified number exists yet.
Which model is cheaper to run?
Self-hosting the open weights eliminates per-token costs but requires enterprise GPU hardware. For API use, K2.7 Code is cheaper on input tokens ($0.95 vs. $1.40 per million) and cached input ($0.19 vs. $0.26). GLM 5.2's Coding Plan subscription may be cheaper for high-volume single-developer use. Compare your expected token volume against both pricing models before committing.
Can GLM 5.2 replace Claude Opus 4.8?
For many coding workloads, yes, at a fraction of the cost. According to Z.ai's published comparisons, GLM 5.2 trails Opus 4.8 by four points on Terminal-Bench 2.1 (81.0 vs. 85.0) and by less than one percentage point on FrontierSWE (74.4% vs. 75.1%). The gap is real but narrow, and the roughly 80% cost savings may justify the tradeoff depending on your accuracy requirements.