What Is Sakana Fugu? How One Model Runs Many

Sakana Fugu is an AI model that orchestrates other AI models. Here is how it works, what it costs, and whether the frontier-without-vendor pitch holds up.

Written by
Divit Bhat
Reviewed by
Sakthy
Last updated: 
June 30, 2026
0
 min read
Table of Contents

On June 22, 2026, a Tokyo-based AI lab called Sakana shipped something the industry had not seen before: a single model whose main job is not to answer your questions, but to direct other models to answer them for you.

That is Sakana Fugu in one sentence. You send a prompt to one API. Behind the scenes, Fugu decides which models should handle the work, splits the task into pieces, hands them off to a pool of frontier AIs, checks the results, and returns one synthesized answer. From your side, it looks like calling any other language model. Internally, it is more like hiring a team.

The pitch is bold: frontier-level performance without depending on any single vendor, and without the export-control risk that took Claude Fable 5 and Mythos off the table for many international users on June 12, 2026. The reality is more interesting than the pitch, and worth understanding before you wire it into anything serious.

This guide walks through what Sakana Fugu actually is, how the orchestration works under the hood, where it shines, where the asterisks sit, and how it stacks up against the models you are probably already using.

The One-Line Answer

Sakana Fugu is a multi-agent AI orchestration system from Sakana AI, delivered as a single OpenAI-compatible API. It comes in two variants:

  • Fugu for everyday work where speed matters
  • Fugu Ultra for hard, multi-step problems where answer quality matters more than latency

Both variants behave like one model from the outside. Internally, Fugu is itself a language model that has been trained to call other language models, including instances of itself, and coordinate them into a final answer.

That distinction matters. Fugu is not a router that picks one model and forwards your request. It is an orchestrator that can use multiple models on a single query, have them check each other, and synthesize a unified response.

Why Sakana Built It This Way

For the last few years, almost every major AI lab has been chasing the same thing: a single model that is bigger, smarter, and more general than the one before it. Anthropic built Fable 5. OpenAI built GPT-5.5. Google built Gemini 3.1 Pro. Each one is monolithic. Each one is its own brain.

Sakana made a different bet. Their thesis is that the future of AI capability is not one bigger brain. It is many specialist brains, coordinated well.

The name reflects the philosophy. "Sakana" means "fish" in Japanese, a reference to how a school of fish behaves as a coordinated unit despite no individual fish being in charge. "Fugu" is the pufferfish, the species that small Japanese restaurants prepare with extreme precision because getting it wrong is dangerous. The naming hints at the architectural bet: precise orchestration of multiple capable pieces, instead of one giant piece that tries to do everything.

There is also a practical motivation behind the timing. On June 12, 2026, the US Department of Commerce placed export controls on Anthropic's Fable 5 and Mythos models. Teams that had built on those APIs lost access overnight in restricted regions. Sakana's launch ten days later was not a coincidence. The product was already in motion, but the regulatory backdrop turned its "frontier capability without vendor dependency" pitch from interesting into urgent.

How Sakana Fugu Actually Works

Here is what happens when you send a request to Sakana Fugu:

  1. The request hits the Fugu endpoint. From your side, this looks identical to calling any OpenAI-compatible API. You send a prompt, you get a response.
  2. Fugu decides how to handle it. If the task is simple, Fugu may answer it directly. If it is complex, Fugu enters orchestration mode.
  3. Roles get assigned. Fugu's coordination layer assigns specialist roles to different models in the agent pool. The roles are typically Thinker (plans the approach), Worker (executes the steps), and Verifier (checks the output).
  4. The work gets done in parallel and in sequence. Multiple models may run simultaneously on different parts of the problem. Their outputs feed back into Fugu, which can re-delegate, re-verify, or escalate to a deeper round of analysis.
  5. A single answer is synthesized. Fugu combines the outputs from all the involved models into one coherent response and returns it to you.

The architecture is grounded in two research papers presented at ICLR 2026:

  • TRINITY introduces a lightweight evolved coordinator that orchestrates multiple models over several turns, assigning Thinker, Worker, and Verifier roles dynamically.
  • Conductor is a 7B model trained with reinforcement learning to discover its own coordination strategies in natural language. It can call itself recursively to scale compute at test time.

The key insight from these papers is that orchestration can be a learned capability, not a hand-built workflow. Most multi-agent systems before Fugu (frameworks like LangGraph, CrewAI, AutoGen) require developers to manually wire up the steps, define when one model hands off to another, and maintain the glue code. Fugu collapses all of that into a single API call by training the coordinator itself.

What Is in the Agent Pool

Sakana Fugu's pool includes the frontier models you would expect:

What is notably not in the pool: Claude Fable 5 and Claude Mythos Preview. These models are not publicly accessible through any provider, so Fugu cannot route to them, since the US shut both down on June 12. When Sakana claims Fugu Ultra "matches" Fable 5, they are saying a coordinated team of other public models can rival the frontier. That is a different claim than direct head-to-head benchmarking, and worth keeping in mind.

You can also opt specific providers out of your agent pool. If your compliance framework prohibits sending data to a particular vendor, you can configure Fugu to exclude that vendor from orchestration. This is a meaningful feature for regulated industries.

Fugu vs Fugu Ultra: When to Use Each

The two variants share the same architecture but target different workloads.

Fugu is the balanced default. It coordinates a smaller pool of agents and prioritizes low latency. It is suitable for:

  • Everyday coding work and code review
  • Interactive chatbots
  • Document analysis at conversational speed
  • Internal assistants where response time matters

Fugu Ultra is the maximum-quality tier. It coordinates a deeper pool of expert agents and prioritizes answer quality over speed. It is intended for:

  • Hard multi-step coding problems
  • Scientific research and analysis
  • Cybersecurity investigations
  • Patent searches
  • Long-running reasoning tasks where getting the right answer matters more than getting a fast answer

Both variants are available through the same API. You select one by specifying either fugu or fugu-ultra-20260615 as the model parameter. No SDK migration, no separate endpoint.

Want to know more? Read our Fugu vs Fugu Ultra breakdown for a closer look at the tradeoffs.

The Benchmark Story

On Sakana's own published benchmarks, Fugu Ultra performs surprisingly well against models that are individually much larger and more capable. Sakana ran a head-to-head benchmark suite against Mythos Preview across six tests covering coding, reasoning, and scientific problem-solving.

Benchmark Fugu Ultra Opus 4.8 GPT-5.5 Gemini 3.1 Pro
SWE-Bench Pro 73.7% 69.2% 58.6% 54.2%
Terminal-Bench 2.1 82.1% 79.0% 76.4% 71.8%
GPQA-Diamond 95.5% 88.4% 86.1% 84.7%
Humanity's Last Exam 64.5% 57.9% 52.2% 51.4%
LiveCodeBench Leading Behind Behind Behind

The headline: Fugu Ultra tops most benchmarks against the best individual models currently in its own agent pool. That is the central claim of the architecture. A coordinated team of GPT-5.5, Opus 4.8, and Gemini 3.1 Pro can outperform any one of them working alone.

Important caveats:

  • These are Sakana's published numbers, not independent reproductions
  • Fable 5 and Mythos Preview are excluded from the comparison because they are not publicly accessible
  • Multi-agent systems often score well on benchmarks because verification rounds catch errors that single models miss, but real-world workloads may not see the same lift
  • Some hands-on testers have reported that Fugu Ultra is slower and more token-hungry than the headlines suggest

The most honest read: the benchmark numbers are likely real but represent the optimistic ceiling. Your actual mileage depends heavily on your specific workload.

Pricing at a Glance

Sakana Fugu offers two billing models.

Subscription plans (both Fugu and Fugu Ultra included):

  • Standard: $20/month
  • Pro: $100/month (10x the Standard quota)
  • Max: $200/month (30x the Standard quota)

Pay-as-you-go (API token billing) for Fugu Ultra:

  • Input: $5 per million tokens
  • Output: $30 per million tokens
  • Cached input: $0.50 per million tokens
  • Above 272K context: rates increase to $10/$45/$1.00

The standard Fugu engine on pay-as-you-go is priced at the rate of whichever top-tier underlying model is active for that request, without stacking fees when multiple agents are involved.

One important detail in the billing model: Fugu Ultra's responses separate user-visible token generation from internal orchestration tokens. The background tokens consumed when Fugu delegates subtasks, runs verification, or routes between agents are counted toward the final cost of the request at standard rates. This means a single user-facing request can consume meaningfully more tokens than its visible output suggests.

A subscription launch offer runs through July 2026: subscribe before month-end and get a free second month at your initial tier.

If you want to see how these numbers compare before committing, this Sakana Fugu pricing breakdown lays out every tier and token rate side by side.

What Makes Sakana Fugu Different from a Router

This is the most common question and the most important one. Existing tools like Not Diamond, Martian, and OpenRouter already route queries to the best model based on the prompt. What does Fugu do that they do not?

The core difference is depth of coordination:

  • Routers pick one model and forward the request. You get whichever model the router thinks is best for that prompt.
  • Fugu can use multiple models on a single query. It can have one model plan, another execute, a third verify, and a fourth synthesize. It can iterate when verification fails.

Routers are decision systems. Fugu is a coordinator. The result is that Fugu can produce answers no single model in its pool could produce alone, because the verification and synthesis steps catch errors and combine strengths in ways a single forward pass cannot.

That said, the line is fuzzier than the marketing suggests. For simple prompts, Fugu may behave essentially like a router. For complex ones, it behaves like a small multi-agent system. The exact behavior on any given query is determined by the proprietary coordination logic and is not exposed to the user.

If that opacity is a dealbreaker for your workflow, these best Sakana Fugu alternatives give you more visibility into which model handles your request.

The Black Box Problem

This is the most important constraint to understand before you build on Fugu.

You cannot see which models Fugu chose for any specific query. You cannot see how the work was split. You cannot reproduce a result by knowing exactly which agents handled which subtasks. The routing decisions and coordination strategies are proprietary by design.

For most workloads, this is fine. You get a good answer and you move on. For regulated workloads, this is a serious issue:

  • Healthcare environments often require knowing exactly which model processed patient data
  • Legal workflows need reproducibility and audit trails
  • Financial services must demonstrate model risk management for every system that influences decisions
  • Government procurement frequently mandates transparent model selection

For these use cases, an opaque orchestration layer in front of multiple opaque models is a regulatory non-starter. The agent opt-out feature helps with data residency and vendor restriction, but it does not solve the broader auditability problem.

Where Sakana Fugu Genuinely Shines

The use cases where Fugu Ultra has a clear advantage over single models share a few characteristics:

  • Tasks where verification catches real errors (debugging complex code, fact-checking research)
  • Problems where different models bring different strengths (combining Claude's reasoning with GPT's tool use with Gemini's long context)
  • Long-horizon multi-step work where iteration matters (Kaggle competitions, paper reproduction, deep research)
  • Situations where vendor diversity is a feature, not a bug (hedging against API outages, export controls, or pricing changes)

The use cases where Fugu adds little value over a single strong model:

  • Simple factual queries with clear right answers
  • Conversational chat that does not benefit from multi-model verification
  • Latency-sensitive workloads where the orchestration overhead is a tax, not an investment
  • Bounded tasks that a strong single model already nails

How to Get Started

Getting Sakana Fugu running is intentionally low-friction:

  1. Create an account at console.sakana.ai
  2. Generate an API key from the developer console
  3. Point your existing OpenAI client at Sakana's endpoint by changing only base_url, api_key, and model
  4. Start with Fugu for everyday work, escalate to fugu-ultra-20260615 for hard problems
  5. Log everything: tokens consumed, latency, failures, and answer quality before scaling up

Because the API is OpenAI-compatible, any framework or SDK that speaks the OpenAI chat-completions format works without modification. This is part of the deliberate positioning: the switching cost from any other OpenAI-compatible provider is essentially zero.

The Honest Bottom Line

Sakana Fugu is a genuinely novel piece of engineering. The idea of training a model to coordinate other models is academically interesting and commercially relevant in a world where vendor lock-in and export controls are increasingly serious risks for enterprise AI buyers.

The benchmarks are likely real but represent the optimistic ceiling. The black-box routing is a meaningful trade-off for any team operating under audit requirements. The pricing is competitive with frontier single models but can scale unpredictably because background orchestration tokens count toward the bill.

For most teams, the right way to evaluate Fugu is the way you should evaluate any new model: test it on your actual workload, measure cost and quality against your current setup, and decide based on real data rather than launch-day benchmarks. Sakana has made that test cheap by keeping the API surface familiar and the pricing transparent.

Whether Fugu represents the future of AI architecture or a clever interim product depends on questions that will not be answered for at least another year. What it represents today is the clearest signal yet that the industry's bet on bigger monolithic models has competition. The coordinated school of specialists is now a real commercial product, not just a research idea.

what is sakana fugu
Build your app in minutes

Emergent turns your idea into a full-stack web or mobile app, no coding required.

  • No coding required
  • Web & mobile apps
  • Deploys instantly
Sign up

Frequently Asked Questions

Your Questions, Answered

What is Sakana Fugu in simple terms?

Sakana Fugu is an AI model that does not answer your questions directly. Instead, it acts like a project manager that picks the right AI models (like Claude, GPT-5.5, or Gemini) for your task, coordinates them, and gives you back one combined answer through a single API.

How is Sakana Fugu different from ChatGPT or Claude?

ChatGPT and Claude are single models that answer using only their own capabilities. Sakana Fugu is an orchestrator that uses multiple models together on the same query, has them verify each other, and synthesizes a final answer. It is one API endpoint, but multiple models do the actual work.

How much does Sakana Fugu cost?

Sakana Fugu has subscription plans starting at $20/month (Standard), $100/month (Pro), and $200/month (Max). On pay-as-you-go API billing, Fugu Ultra costs $5 per million input tokens and $30 per million output tokens, with higher rates for contexts above 272K tokens.

Is Sakana Fugu better than Claude Fable 5?

On Sakana's published benchmarks, Fugu Ultra performs comparably to Fable 5 across coding, reasoning, and scientific tasks. However, Fable 5 and Mythos Preview are not in Fugu's agent pool because they are not publicly accessible, so the comparison is between a coordinated team of public models and Anthropic's restricted frontier model.

Can I see which AI model answered my Sakana Fugu query?

No. The routing decisions and which specific models handled your query are proprietary and not exposed to users by design. For most workloads this is fine, but for regulated industries that require audit trails (healthcare, legal, financial services), this opacity may be a blocker.

Start Building
on emergent today
Try Emergent
This is some text inside of a div block.
This is some text inside of a div block.
Note

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.