What Is Kimi? The Complete Guide to Moonshot AI's Chatbot and Models
Learn what Kimi is, from Moonshot AI's founding story and model release timeline to K2.6 benchmarks, Agent Swarm, Kimi Work, pricing, and how it compares to ChatGPT and Claude.
Kimi is the AI chatbot and model family from China's Moonshot AI. This guide covers its origins, every major model release, key features, pricing, and where it stands in 2026.
What is Kimi?
Kimi is both a consumer AI chatbot and a family of large language models developed by Beijing-based Moonshot AI. It launched in October 2023, and the name comes from founder Yang Zhilin's English name, "Kimi," which he has used since his university years.
What made Kimi stand out from the start was context length. The first version supported 128,000 tokens of lossless context, which was among the largest context windows available at the time. By March 2024, Moonshot AI upgraded Kimi to handle 2 million Chinese characters in a single prompt, a 10x jump that caused a two-day service outage from the flood of new users.
Today, Kimi is far more than a chatbot. It has expanded into a productivity platform called Kimi Work, with tools for coding, deep research, document creation, slides, spreadsheets, and multi-agent orchestration, as confirmed on Moonshot's official product pages. The underlying models have evolved in parallel, with the Kimi K2.7 Code variant released in June 2026 representing the latest in a rapid series of open-weight releases.
Who built Kimi? The Moonshot AI story
Moonshot AI was founded in March 2023 by Yang Zhilin and Tsinghua University alumni Zhou Xinyu and Wu Yuxin. Yang studied computer science at Tsinghua and earned his PhD at Carnegie Mellon University, where he co-authored the influential Transformer-XL and XLNet papers.
Moonshot is one of China's "four new AI tigers" alongside Zhipu AI, Baichuan, and MiniMax, though it remains the leanest of the group at roughly 80 employees during its early growth phase.
Fundraising has moved fast. Alibaba led a $1 billion round in February 2024 at a $2.5 billion valuation, taking a 36% stake. In May 2026, TechCrunch reported that Moonshot closed a $2 billion round at a $20 billion valuation led by Meituan's Long-Z Investments, with total fundraising over the prior six months reaching $3.9 billion.
As of June 2026, Bloomberg reported the company was in early discussions for a new round targeting a $30 billion valuation, and separately considering a Hong Kong IPO. These later figures have not been independently confirmed by Moonshot. Annual recurring revenue reportedly topped $200 million by April 2026 per the company's financial advisor. Key backers include Alibaba, Tencent, HongShan (formerly Sequoia China), Meituan, and IDG Capital.
Kimi model release timeline (2023 to 2026)
Moonshot AI has maintained a rapid release cadence, shipping a significant model update roughly every two to three months since mid-2025. Here is every major Kimi release to date.
Kimi model release timeline, October 2023 to June 2026
Two releases stand out as the most relevant today.
Kimi K2.6 (April 2026) is the current flagship. According to Moonshot's reported benchmarks, it scores 58.6% on SWE-Bench Pro (comparable to GPT-5.5 on the same test) and 54.0% on Humanity's Last Exam with tools, while costing significantly less per million tokens than comparable closed models. On the agentic side, K2.6 expanded the Agent Swarm system to 300 sub-agents and 4,000 coordinated steps, up from 100 sub-agents in K2.5.
Kimi K2.7 Code (June 2026) is the newest release, focused specifically on coding and agentic tasks. It reports a 21.8% improvement on Kimi Code Bench v2 over K2.6 while using roughly 30% fewer reasoning tokens. One important caveat: as of its release, all published benchmarks for K2.7 are Moonshot's own proprietary suites, with no independent third-party results yet available on standard public leaderboards.
If you're evaluating both versions, our Kimi 2.7 Code vs Kimi 2.6 comparison highlights the biggest differences.
How does Kimi work? Architecture and key technologies
Kimi's performance comes from a set of specific engineering choices. Understanding the architecture helps explain both why it performs well on coding tasks and why it stays cost-effective.
1. Mixture-of-Experts (MoE) architecture
The K2 series uses a 1-trillion-parameter MoE architecture with 32 billion active parameters per token. The model contains 384 routed experts plus one shared expert, with eight experts selected per token across 61 transformer layers. It uses Multi-head Latent Attention (MLA) and SwiGLU activation.
The practical effect: Kimi carries the knowledge capacity of a trillion-parameter model while only computing 32 billion parameters for any given input. This is what makes its API pricing a fraction of dense models with comparable output quality.
2. MuonClip optimizer
Training a trillion-parameter model typically involves loss spikes and manual intervention. Moonshot's custom MuonClip optimizer allowed K2's pre-training to complete on 15.5 trillion tokens with zero loss spikes. The research, published jointly with UCLA, demonstrated that the Muon optimizer (previously known to work only on small models) could improve computational efficiency by a factor of two compared to the standard AdamW optimizer. Moonshot open-sourced the implementation.
3. Kimi Delta Attention (KDA) and Kimi Linear
Released in October 2025, Kimi Linear introduced a new attention mechanism called Kimi Delta Attention. KDA extends Gated DeltaNet with a finer-grained gating mechanism, and the results were significant: it reduces KV cache usage by up to 75% and achieves up to 6x higher decoding throughput at a 1-million-token context window. The architecture uses a 3:1 KDA-to-MLA hybrid ratio that balances quality and efficiency. Moonshot open-sourced the KDA kernel and vLLM implementation on GitHub.
4. Agent Swarm
Agent Swarm is Kimi's multi-agent orchestration system. Rather than processing tasks sequentially with a single agent, Kimi decomposes complex problems into parallelizable subtasks assigned to dynamically instantiated specialist agents.
K2.5 introduced the system with 100 sub-agents and 1,500 tool calls. K2.6 scaled it to 300 sub-agents and 4,000 coordinated steps. Moonshot trained this capability using Parallel-Agent Reinforcement Learning (PARL), which incentivizes parallel execution to prevent the orchestrator from defaulting to single-agent mode. The result is a 4.5x speedup over single-agent execution on tasks requiring broad information gathering.
Documented production use cases include a 13-hour unsupervised optimization of a financial matching engine that produced a 185% throughput improvement across 1,000+ tool calls.
5. MoonViT vision encoder
Kimi K2.5 and later models use MoonViT-3D, a 400-million-parameter vision encoder based on SigLIP-SO-400M. It uses the NaViT packing strategy for variable-resolution images, supporting images up to 4K resolution and video up to 2K in formats including PNG, JPEG, WebP, MP4, MOV, and WebM. The vision capabilities are native to the model's pretraining rather than a bolt-on module added after the fact, which means visual and language understanding developed in tandem.
Kimi Work: the productivity platform
Kimi Work is Moonshot AI's integrated workspace built around Kimi's models. According to the official K2.6 product page, it combines AI chat, coding, research, document creation, slides, spreadsheets, and agent orchestration in a shared-context environment. The pitch is that switching between tools within Kimi Work retains session context, so a research task can flow into a report and then into a slide deck without re-pasting content.
1. Modes
Kimi Work offers four distinct operating modes:
- Instant: Fast Q&A without a reasoning trace. Low latency, useful for quick lookups and simple code completions
- Thinking: Full chain-of-thought reasoning with tool calls. This is the mode that produces Kimi's benchmark scores
- Agent: Multi-step task execution that generates structured outputs like documents, slides, websites, and research reports
- Agent Swarm (beta): Parallel multi-agent orchestration. Available from the Allegretto tier ($39/month) and above
2. Tool suite
The platform integrates ten distinct tools under one roof:
- AI Chat: Shared-context foundation for every session
- Kimi Code: Terminal-first coding agent with CLI and IDE integration, supporting Python, Rust, Go, and more
- Deep Research: Autonomous web research across hundreds of live sources with structured report output
- Docs: Formatted document generation
- Slides: Presentation generation with editable SmartArt (timelines, flowcharts, funnels)
- Sheets: Spreadsheet and formula generation with pivot tables and charts
- Websites: Full-stack web generation from prompts or design mockups
- Document-to-Skills: Convert PDFs into reusable custom skills for consistent output
- Kimi Claw / Claw Groups: Human-in-the-loop intervention during swarm sessions
- Kimi Work desktop agent: Local file access, WebBridge browser automation, and cron scheduling for background tasks
3. Visual coding and design-to-code
Kimi's visual coding capabilities let you upload screenshots, mockups, or design files and have the model convert them into structured, production-ready code. The K2.6 release expanded this to include frontend animations, WebGL shaders, and scroll-triggered interactions. Combined with full-stack generation from natural language prompts, Kimi can produce working websites with authentication, database layers, and deployment configurations from a single instruction.
That said, Kimi's design-to-code workflow still generates code you need to assemble and deploy yourself. If your goal is to go from an idea to a fully deployed, custom web or mobile app without writing or managing any code, Emergent takes a different approach. You describe what you want in plain English, and Emergent's AI agents build the complete application for you, including hosting, database, and deployment. Kimi generates the code; Emergent delivers the finished product.
How to access Kimi?
1. Web, mobile, and desktop
Kimi is available through kimi.com for web access, iOS and Android apps for mobile, and the Kimi Work desktop agent for local file access and background automation. There is also a Chrome extension for browser integration.
2. API access
Developers can access Kimi through an OpenAI-compatible API at platform.moonshot.ai. Pricing as of June 2026:
Kimi API pricing as of June 2026. Verify current rates at platform.moonshot.ai.
Kimi models are also available through third-party providers including OpenRouter, DeepInfra, Fireworks, and Cloudflare Workers AI.
3. Open weights (self-hosting)
K2 series model weights are available on Hugging Face (see the K2.6 model card and K2.7 Code model card). Moonshot describes the license as a Modified MIT license. Per the Verdent Guides analysis of the license terms, commercial use is permitted, with one condition: products exceeding $20 million in monthly revenue or 100 million monthly active users must prominently display Kimi K2 branding. Check the license file in the specific Hugging Face repository for the exact terms before deploying commercially.
Self-hosting requires substantial GPU infrastructure for the full 1T model. Quantized versions are available for more accessible deployments, though expect significant performance trade-offs on consumer hardware.
4. Pricing and plans
Kimi offers a free tier with unlimited basic chat and usage limits on advanced features like Agent Swarm and Deep Research. International paid tiers include:
- Adagio: Free (basic access)
- Moderato: $19/month (unlocks visual mode, priority queue, Deep Research integration)
- Allegretto: $39/month (adds Agent Swarm access)
- Higher tiers available for enterprise needs
In China, Kimi offers six plan tiers ranging from 5.2 yuan (four days) to 399 yuan (annual). App membership covers tool quotas but does not include API token credits, which are billed separately.
Who uses Kimi?
1. Developer and enterprise adoption
Kimi K2.6 is the second most-used LLM on OpenRouter as of May 2026. When K2 first launched in July 2025, it became the fastest-downloaded model on Hugging Face within a single day.
Enterprise evaluations have included Vercel, Factory.ai, and CodeBuddy, all of which reported positive results. Vercel, for instance, reported a 50%+ improvement on their internal Next.js benchmark versus K2.5. The Kimi Code CLI is positioned as a direct competitor to Claude Code and GitHub Copilot, with subscription plans starting at $19/month.
Individual developers have gravitated toward Kimi for cost reasons. One developer interviewed by Thoughtworks noted that tasks costing $10 to $20 with Claude Sonnet 4 could be completed with Kimi K2 for roughly $7 across ten similar tasks.
2. Consumer user base in China
Kimi's consumer trajectory in China has been uneven. It ranked third in monthly active users as of August 2024, dropped to seventh by June 2025 after the DeepSeek wave, and sat at eighth as of April 2026. It trails ByteDance's Doubao, Alibaba's Qwen, DeepSeek, and Tencent's Yuanbao in raw user numbers.
The gap between Kimi's consumer ranking and its developer adoption tells a meaningful story. Moonshot's strength is increasingly on the technical and API side, not in the consumer chatbot race.
3. Open-source community
Moonshot's open-weight strategy has created a compounding ecosystem effect. Each release generates more fine-tuned variants, more production feedback, and more adoption from the developer community, which feeds back into the next release cycle. The pace of iteration, five major releases between July 2025 and June 2026, is faster than most Western frontier labs.
Chinese open-source models as a category grew from 1.2% of global AI usage in late 2024 to nearly 30% by end of 2025, with Kimi and DeepSeek leading that shift.
How Kimi compares to ChatGPT, Claude, and Gemini
Kimi's positioning is distinct from Western frontier models in several ways. The comparison table below uses K2.6, Kimi's current flagship, against the closest competitors as of mid-2026.
Kimi K2.6 vs leading frontier models, as of June 2026. This comparison reflects models available at K2.6's April 2026 launch. Claude Opus 4.8 was released after K2.6, so head-to-head benchmarks are limited
Where Kimi leads (per vendor-reported data)
Agentic coding tasks, cost efficiency (significantly cheaper than Claude Opus for comparable output based on published API pricing), open-weight availability for self-hosting, and the built-in Agent Swarm orchestration system.
Where Kimi trails
Pure mathematical reasoning (GPT-5.4 leads on AIME 2026 and GPQA-Diamond per published benchmarks), single-turn creative writing, multimodal breadth (Kimi's multimodal scores lag behind the best closed models), and enterprise trust/compliance considerations for organizations with strict AI supply-chain requirements.
Where it's comparable
On coding benchmarks like SWE-Bench Verified and SWE-Bench Pro, Kimi K2.6 posts scores in the same range as Claude Opus 4.6 and GPT-5.4, though results vary by benchmark and evaluation conditions. The gap between open-weight and closed-frontier models has narrowed considerably.
Kimi in 2026: from long-context pioneer to open-weight frontrunner
In three years, Kimi has gone from a Chinese chatbot notable for its context window to a trillion-parameter open-weight model family that posts competitive scores against closed models from OpenAI, Anthropic, and Google on vendor-reported coding benchmarks. Moonshot AI's reported $20 billion valuation, reported $200 million ARR, and second-place ranking on OpenRouter tell the story of a company that found its footing by open-sourcing what most competitors keep proprietary.
For developers and teams evaluating AI models in 2026, Kimi belongs on the shortlist for agentic coding, long-horizon autonomous tasks, and any use case where cost efficiency and open-weight flexibility matter. Its weaknesses in pure reasoning and multimodal breadth are real, but the gap is closing with each quarterly release.
If reading about Kimi's capabilities has you thinking about building your own AI-powered tool or app, you don't need to manage models, infrastructure, or code yourself. Emergent lets you describe what you want in plain English, and its AI agents build, deploy, and host the finished product, using models from OpenAI, Claude, and Google AI through one platform. Start Building today.

Emergent turns your idea into a full-stack web or mobile app, no coding required.
- No coding required
- Web & mobile apps
- Deploys instantly
Frequently Asked Questions
Your Questions, Answered
Yes. Kimi.com offers a free tier with unlimited basic chat. Advanced features like Agent Swarm and Deep Research have usage limits on free plans. Paid international plans start at $19/month (Moderato).
The K2 series models are open-weight, meaning you can download the weights from Hugging Face and self-host them. Moonshot describes the license as a Modified MIT license. The branding requirement applies above $20 million in monthly revenue or 100 million monthly active users. Check the license file in the specific Hugging Face repository for exact terms.
Moonshot AI is a privately held Chinese company founded by Yang Zhilin, Zhou Xinyu, and Wu Yuxin. Major investors include Alibaba (approximately 36% stake), Tencent, Meituan, HongShan, and IDG Capital. As of June 2026, the company is reportedly valued at $20 billion per TechCrunch.
Yes. Kimi.com, the mobile apps, and the API are accessible internationally. Some features may vary by region, and pricing differs between mainland China (RMB) and international users (USD). The open-weight models can be self-hosted anywhere with appropriate hardware.
Both are open-weight Chinese AI model families with strong coding performance. Kimi's K2 series focuses on agentic coding and Agent Swarm orchestration, while DeepSeek emphasizes efficient training methodology and reasoning. DeepSeek's V4-Pro is larger and reportedly cheaper to run, while Kimi's Agent Swarm offers a multi-agent parallel execution system that DeepSeek does not have.
Agent Swarm is Kimi's built-in multi-agent system, where the model orchestrates up to 300 specialized sub-agents working in parallel on a single task. It can execute 4,000+ coordinated steps and achieves a 4.5x speedup over single-agent execution. It is available from the Allegretto plan ($39/month) and above.
on emergent today
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.






