What Is Kimi? The Complete Guide to Moonshot AI's Chatbot and Models

Learn what Kimi is, from Moonshot AI's founding story and model release timeline to K2.6 benchmarks, Agent Swarm, Kimi Work, pricing, and how it compares to ChatGPT and Claude.

Written by

Bhavyadeep

Reviewed by

Sakthy

Last updated:

June 29, 2026

min read

Table of Contents

Heading

Kimi is the AI chatbot and model family from China's Moonshot AI. This guide covers its origins, every major model release, key features, pricing, and where it stands in 2026.

TL;DR

Kimi is an AI chatbot and series of large language models built by Moonshot AI, a Beijing-based startup founded in 2023.
It first gained attention for its 128,000-token context window, and has since expanded into agentic coding and productivity tools through the Kimi Work platform.
The current flagship, Kimi K2.6, is a 1-trillion-parameter open-weight model that performs competitively with GPT-5.5 and Claude Opus on vendor-reported coding benchmarks, at significantly lower token costs.
Kimi is free to use at kimi.com, with paid plans starting at $19/month and API access available.
K2 series model weights are available on Hugging Face under Moonshot's Modified MIT license.

What is Kimi?

Kimi is both a consumer AI chatbot and a family of large language models developed by Beijing-based Moonshot AI. It launched in October 2023, and the name comes from founder Yang Zhilin's English name, "Kimi," which he has used since his university years.

What made Kimi stand out from the start was context length. The first version supported 128,000 tokens of lossless context, which was among the largest context windows available at the time. By March 2024, Moonshot AI upgraded Kimi to handle 2 million Chinese characters in a single prompt, a 10x jump that caused a two-day service outage from the flood of new users.

Today, Kimi is far more than a chatbot. It has expanded into a productivity platform called Kimi Work, with tools for coding, deep research, document creation, slides, spreadsheets, and multi-agent orchestration, as confirmed on Moonshot's official product pages. The underlying models have evolved in parallel, with the Kimi K2.7 Code variant released in June 2026 representing the latest in a rapid series of open-weight releases.

Who built Kimi? The Moonshot AI story

Moonshot AI was founded in March 2023 by Yang Zhilin and Tsinghua University alumni Zhou Xinyu and Wu Yuxin. Yang studied computer science at Tsinghua and earned his PhD at Carnegie Mellon University, where he co-authored the influential Transformer-XL and XLNet papers.

Moonshot is one of China's "four new AI tigers" alongside Zhipu AI, Baichuan, and MiniMax, though it remains the leanest of the group at roughly 80 employees during its early growth phase.

Fundraising has moved fast. Alibaba led a $1 billion round in February 2024 at a $2.5 billion valuation, taking a 36% stake. In May 2026, TechCrunch reported that Moonshot closed a $2 billion round at a $20 billion valuation led by Meituan's Long-Z Investments, with total fundraising over the prior six months reaching $3.9 billion.

As of June 2026, Bloomberg reported the company was in early discussions for a new round targeting a $30 billion valuation, and separately considering a Hong Kong IPO. These later figures have not been independently confirmed by Moonshot. Annual recurring revenue reportedly topped $200 million by April 2026 per the company's financial advisor. Key backers include Alibaba, Tencent, HongShan (formerly Sequoia China), Meituan, and IDG Capital.

Kimi model release timeline (2023 to 2026)

Moonshot AI has maintained a rapid release cadence, shipping a significant model update roughly every two to three months since mid-2025. Here is every major Kimi release to date.

Date	Release	Key detail
Oct 2023	Kimi chatbot (v1)	128K token context, first model of this size publicly available
Mar 2024	Kimi 2M context	Upgraded to 2 million Chinese characters
Jul 2024	Context caching	Public beta of context caching
Nov 2024	Video generation	Internal testing of AI video model
Jan 2025	Kimi K1.5	Reasoning model, claimed parity with OpenAI o1
Apr 2025	Kimi-VL	Open-source 16B MoE (3B active), vision-language model
Jun 2025	Kimi-Dev	72B coding model, SOTA on SWE-bench Verified (open source)
Jun 2025	Kimi-Researcher	Autonomous AI research agent
Jul 2025	Kimi K2	1T MoE, 32B active, open-weight, trained on 15.5T tokens
Sep 2025	K2-Instruct-0905	Doubled context to 256K, improved agentic coding
Oct 2025	Kimi Linear	48B MoE, 3B active, introduced Kimi Delta Attention
Nov 2025	Kimi K2 Thinking	256K context, 200-300 sequential tool calls, INT4 quantization
Jan 2026	Kimi K2.5	Native multimodal via MoonViT, Agent Swarm (100 sub-agents)
Apr 2026	Kimi K2.6	300 sub-agents, 4,000 steps, native video, 12-hour autonomous runs
Jun 2026	Kimi K2.7 Code	Coding-focused variant, ~30% fewer thinking tokens vs K2.6

Kimi model release timeline, October 2023 to June 2026

Two releases stand out as the most relevant today.

Kimi K2.6 (April 2026) is the current flagship. According to Moonshot's reported benchmarks, it scores 58.6% on SWE-Bench Pro (comparable to GPT-5.5 on the same test) and 54.0% on Humanity's Last Exam with tools, while costing significantly less per million tokens than comparable closed models. On the agentic side, K2.6 expanded the Agent Swarm system to 300 sub-agents and 4,000 coordinated steps, up from 100 sub-agents in K2.5.

Kimi K2.7 Code (June 2026) is the newest release, focused specifically on coding and agentic tasks. It reports a 21.8% improvement on Kimi Code Bench v2 over K2.6 while using roughly 30% fewer reasoning tokens. One important caveat: as of its release, all published benchmarks for K2.7 are Moonshot's own proprietary suites, with no independent third-party results yet available on standard public leaderboards.

If you're evaluating both versions, our Kimi 2.7 Code vs Kimi 2.6 comparison highlights the biggest differences.

How does Kimi work? Architecture and key technologies

Kimi's performance comes from a set of specific engineering choices. Understanding the architecture helps explain both why it performs well on coding tasks and why it stays cost-effective.

1. Mixture-of-Experts (MoE) architecture

The K2 series uses a 1-trillion-parameter MoE architecture with 32 billion active parameters per token. The model contains 384 routed experts plus one shared expert, with eight experts selected per token across 61 transformer layers. It uses Multi-head Latent Attention (MLA) and SwiGLU activation.

The practical effect: Kimi carries the knowledge capacity of a trillion-parameter model while only computing 32 billion parameters for any given input. This is what makes its API pricing a fraction of dense models with comparable output quality.

2. MuonClip optimizer

Training a trillion-parameter model typically involves loss spikes and manual intervention. Moonshot's custom MuonClip optimizer allowed K2's pre-training to complete on 15.5 trillion tokens with zero loss spikes. The research, published jointly with UCLA, demonstrated that the Muon optimizer (previously known to work only on small models) could improve computational efficiency by a factor of two compared to the standard AdamW optimizer. Moonshot open-sourced the implementation.

3. Kimi Delta Attention (KDA) and Kimi Linear

Released in October 2025, Kimi Linear introduced a new attention mechanism called Kimi Delta Attention. KDA extends Gated DeltaNet with a finer-grained gating mechanism, and the results were significant: it reduces KV cache usage by up to 75% and achieves up to 6x higher decoding throughput at a 1-million-token context window. The architecture uses a 3:1 KDA-to-MLA hybrid ratio that balances quality and efficiency. Moonshot open-sourced the KDA kernel and vLLM implementation on GitHub.

4. Agent Swarm

Agent Swarm is Kimi's multi-agent orchestration system. Rather than processing tasks sequentially with a single agent, Kimi decomposes complex problems into parallelizable subtasks assigned to dynamically instantiated specialist agents.

K2.5 introduced the system with 100 sub-agents and 1,500 tool calls. K2.6 scaled it to 300 sub-agents and 4,000 coordinated steps. Moonshot trained this capability using Parallel-Agent Reinforcement Learning (PARL), which incentivizes parallel execution to prevent the orchestrator from defaulting to single-agent mode. The result is a 4.5x speedup over single-agent execution on tasks requiring broad information gathering.

Documented production use cases include a 13-hour unsupervised optimization of a financial matching engine that produced a 185% throughput improvement across 1,000+ tool calls.

5. MoonViT vision encoder

Kimi K2.5 and later models use MoonViT-3D, a 400-million-parameter vision encoder based on SigLIP-SO-400M. It uses the NaViT packing strategy for variable-resolution images, supporting images up to 4K resolution and video up to 2K in formats including PNG, JPEG, WebP, MP4, MOV, and WebM. The vision capabilities are native to the model's pretraining rather than a bolt-on module added after the fact, which means visual and language understanding developed in tandem.

Kimi Work: the productivity platform

Kimi Work is Moonshot AI's integrated workspace built around Kimi's models. According to the official K2.6 product page, it combines AI chat, coding, research, document creation, slides, spreadsheets, and agent orchestration in a shared-context environment. The pitch is that switching between tools within Kimi Work retains session context, so a research task can flow into a report and then into a slide deck without re-pasting content.

1. Modes

Kimi Work offers four distinct operating modes:

Instant: Fast Q&A without a reasoning trace. Low latency, useful for quick lookups and simple code completions
Thinking: Full chain-of-thought reasoning with tool calls. This is the mode that produces Kimi's benchmark scores
Agent: Multi-step task execution that generates structured outputs like documents, slides, websites, and research reports
Agent Swarm (beta): Parallel multi-agent orchestration. Available from the Allegretto tier ($39/month) and above

2. Tool suite

The platform integrates ten distinct tools under one roof:

AI Chat: Shared-context foundation for every session
Kimi Code: Terminal-first coding agent with CLI and IDE integration, supporting Python, Rust, Go, and more
Deep Research: Autonomous web research across hundreds of live sources with structured report output
Docs: Formatted document generation
Slides: Presentation generation with editable SmartArt (timelines, flowcharts, funnels)
Sheets: Spreadsheet and formula generation with pivot tables and charts
Websites: Full-stack web generation from prompts or design mockups
Document-to-Skills: Convert PDFs into reusable custom skills for consistent output
Kimi Claw / Claw Groups: Human-in-the-loop intervention during swarm sessions
Kimi Work desktop agent: Local file access, WebBridge browser automation, and cron scheduling for background tasks

3. Visual coding and design-to-code

Kimi's visual coding capabilities let you upload screenshots, mockups, or design files and have the model convert them into structured, production-ready code. The K2.6 release expanded this to include frontend animations, WebGL shaders, and scroll-triggered interactions. Combined with full-stack generation from natural language prompts, Kimi can produce working websites with authentication, database layers, and deployment configurations from a single instruction.

That said, Kimi's design-to-code workflow still generates code you need to assemble and deploy yourself. If your goal is to go from an idea to a fully deployed, custom web or mobile app without writing or managing any code, Emergent takes a different approach. You describe what you want in plain English, and Emergent's AI agents build the complete application for you, including hosting, database, and deployment. Kimi generates the code; Emergent delivers the finished product.

How to access Kimi?

1. Web, mobile, and desktop

Kimi is available through kimi.com for web access, iOS and Android apps for mobile, and the Kimi Work desktop agent for local file access and background automation. There is also a Chrome extension for browser integration.

2. API access

Developers can access Kimi through an OpenAI-compatible API at platform.moonshot.ai. Pricing as of June 2026:

Model	Input (per 1M tokens)	Output (per 1M tokens)
Kimi K2.6	~$0.55	~$2.65
Kimi K2.7 Code	$0.95	$4.00

Kimi API pricing as of June 2026. Verify current rates at platform.moonshot.ai.

Kimi models are also available through third-party providers including OpenRouter, DeepInfra, Fireworks, and Cloudflare Workers AI.

3. Open weights (self-hosting)

K2 series model weights are available on Hugging Face (see the K2.6 model card and K2.7 Code model card). Moonshot describes the license as a Modified MIT license. Per the Verdent Guides analysis of the license terms, commercial use is permitted, with one condition: products exceeding $20 million in monthly revenue or 100 million monthly active users must prominently display Kimi K2 branding. Check the license file in the specific Hugging Face repository for the exact terms before deploying commercially.

Self-hosting requires substantial GPU infrastructure for the full 1T model. Quantized versions are available for more accessible deployments, though expect significant performance trade-offs on consumer hardware.

4. Pricing and plans

Kimi offers a free tier with unlimited basic chat and usage limits on advanced features like Agent Swarm and Deep Research. International paid tiers include:

Adagio: Free (basic access)
Moderato: $19/month (unlocks visual mode, priority queue, Deep Research integration)
Allegretto: $39/month (adds Agent Swarm access)
Higher tiers available for enterprise needs

In China, Kimi offers six plan tiers ranging from 5.2 yuan (four days) to 399 yuan (annual). App membership covers tool quotas but does not include API token credits, which are billed separately.

Who uses Kimi?

1. Developer and enterprise adoption

Kimi K2.6 is the second most-used LLM on OpenRouter as of May 2026. When K2 first launched in July 2025, it became the fastest-downloaded model on Hugging Face within a single day.

Enterprise evaluations have included Vercel, Factory.ai, and CodeBuddy, all of which reported positive results. Vercel, for instance, reported a 50%+ improvement on their internal Next.js benchmark versus K2.5. The Kimi Code CLI is positioned as a direct competitor to Claude Code and GitHub Copilot, with subscription plans starting at $19/month.

Individual developers have gravitated toward Kimi for cost reasons. One developer interviewed by Thoughtworks noted that tasks costing $10 to $20 with Claude Sonnet 4 could be completed with Kimi K2 for roughly $7 across ten similar tasks.

2. Consumer user base in China

Kimi's consumer trajectory in China has been uneven. It ranked third in monthly active users as of August 2024, dropped to seventh by June 2025 after the DeepSeek wave, and sat at eighth as of April 2026. It trails ByteDance's Doubao, Alibaba's Qwen, DeepSeek, and Tencent's Yuanbao in raw user numbers.

The gap between Kimi's consumer ranking and its developer adoption tells a meaningful story. Moonshot's strength is increasingly on the technical and API side, not in the consumer chatbot race.

3. Open-source community

Moonshot's open-weight strategy has created a compounding ecosystem effect. Each release generates more fine-tuned variants, more production feedback, and more adoption from the developer community, which feeds back into the next release cycle. The pace of iteration, five major releases between July 2025 and June 2026, is faster than most Western frontier labs.

Chinese open-source models as a category grew from 1.2% of global AI usage in late 2024 to nearly 30% by end of 2025, with Kimi and DeepSeek leading that shift.

How Kimi compares to ChatGPT, Claude, and Gemini

Kimi's positioning is distinct from Western frontier models in several ways. The comparison table below uses K2.6, Kimi's current flagship, against the closest competitors as of mid-2026.

Dimension	Kimi K2.6	GPT-5.4 / GPT-5.5	Claude Opus 4.6 / 4.7	Gemini 3.1 Pro
Architecture	1T MoE, 32B active	Dense (undisclosed)	Dense (undisclosed)	MoE (undisclosed)
Context window	262K tokens	~128K	~200K	2M
API pricing (input/1M)	~$0.55	~$2.50+	~$3.00+	Competitive
SWE-Bench Verified (vendor-reported)	80.2%	Comparable	Comparable	Lower
AIME 2026 (math)	96.4%	99.2% (GPT-5.4)	High	High
Agent Swarm	300 sub-agents, 4K steps	Not built-in	Not built-in	Not built-in
Open weights	Yes (Modified MIT)	No	No	No
Native multimodal	Text, image, video	Text, image, audio	Text, image	Text, image, video, audio

Kimi K2.6 vs leading frontier models, as of June 2026. This comparison reflects models available at K2.6's April 2026 launch. Claude Opus 4.8 was released after K2.6, so head-to-head benchmarks are limited

Where Kimi leads (per vendor-reported data)

Agentic coding tasks, cost efficiency (significantly cheaper than Claude Opus for comparable output based on published API pricing), open-weight availability for self-hosting, and the built-in Agent Swarm orchestration system.

Where Kimi trails

Pure mathematical reasoning (GPT-5.4 leads on AIME 2026 and GPQA-Diamond per published benchmarks), single-turn creative writing, multimodal breadth (Kimi's multimodal scores lag behind the best closed models), and enterprise trust/compliance considerations for organizations with strict AI supply-chain requirements.

Where it's comparable

On coding benchmarks like SWE-Bench Verified and SWE-Bench Pro, Kimi K2.6 posts scores in the same range as Claude Opus 4.6 and GPT-5.4, though results vary by benchmark and evaluation conditions. The gap between open-weight and closed-frontier models has narrowed considerably.

Kimi in 2026: from long-context pioneer to open-weight frontrunner

In three years, Kimi has gone from a Chinese chatbot notable for its context window to a trillion-parameter open-weight model family that posts competitive scores against closed models from OpenAI, Anthropic, and Google on vendor-reported coding benchmarks. Moonshot AI's reported $20 billion valuation, reported $200 million ARR, and second-place ranking on OpenRouter tell the story of a company that found its footing by open-sourcing what most competitors keep proprietary.

For developers and teams evaluating AI models in 2026, Kimi belongs on the shortlist for agentic coding, long-horizon autonomous tasks, and any use case where cost efficiency and open-weight flexibility matter. Its weaknesses in pure reasoning and multimodal breadth are real, but the gap is closing with each quarterly release.

If reading about Kimi's capabilities has you thinking about building your own AI-powered tool or app, you don't need to manage models, infrastructure, or code yourself. Emergent lets you describe what you want in plain English, and its AI agents build, deploy, and host the finished product, using models from OpenAI, Claude, and Google AI through one platform. Start Building today.

Build your app in minutes

Emergent turns your idea into a full-stack web or mobile app, no coding required.

No coding required
Web & mobile apps
Deploys instantly

Frequently Asked Questions

Your Questions, Answered

Is Kimi free to use?

Yes. Kimi.com offers a free tier with unlimited basic chat. Advanced features like Agent Swarm and Deep Research have usage limits on free plans. Paid international plans start at $19/month (Moderato).

Is Kimi open source?

The K2 series models are open-weight, meaning you can download the weights from Hugging Face and self-host them. Moonshot describes the license as a Modified MIT license. The branding requirement applies above $20 million in monthly revenue or 100 million monthly active users. Check the license file in the specific Hugging Face repository for exact terms.

Who owns Moonshot AI?

Moonshot AI is a privately held Chinese company founded by Yang Zhilin, Zhou Xinyu, and Wu Yuxin. Major investors include Alibaba (approximately 36% stake), Tencent, Meituan, HongShan, and IDG Capital. As of June 2026, the company is reportedly valued at $20 billion per TechCrunch.

Is Kimi available outside China?

Yes. Kimi.com, the mobile apps, and the API are accessible internationally. Some features may vary by region, and pricing differs between mainland China (RMB) and international users (USD). The open-weight models can be self-hosted anywhere with appropriate hardware.

How does Kimi compare to DeepSeek?

Both are open-weight Chinese AI model families with strong coding performance. Kimi's K2 series focuses on agentic coding and Agent Swarm orchestration, while DeepSeek emphasizes efficient training methodology and reasoning. DeepSeek's V4-Pro is larger and reportedly cheaper to run, while Kimi's Agent Swarm offers a multi-agent parallel execution system that DeepSeek does not have.

What is Agent Swarm?

Agent Swarm is Kimi's built-in multi-agent system, where the model orchestrates up to 300 specialized sub-agents working in parallel on a single task. It can execute 4,000+ coordinated steps and achieves a 4.5x speedup over single-agent execution. It is available from the Allegretto plan ($39/month) and above.

Start Building
on emergent today

Try Emergent

Build Full-Stack

Web & mobile apps in minutes

Continue with Google

Thank you! Your submission has been received!

Oops! Something went wrong while submitting the form.

By continuing, you agree to our
Terms of Service and Privacy Policy.