Swarm: OpenAI’s Experiment for Multi-Role Agents
Updated: 2026-05-03
On 10 October 2024, OpenAI published a GitHub repository called Swarm with an unusual warning in large letters: do not use in production. That phrase, repeated in the README and in the examples, is not polite humility — it is a statement of intent. Swarm does not aim to compete with CrewAI, LangGraph, or its own Assistants API. It is an open laboratory where the OpenAI team shows, in fewer than five hundred lines of Python, how they themselves think about agent coordination.
Key takeaways
- Swarm reduces the multi-agent vocabulary to two concepts: agents and handoffs.
- Routing emerges from the model, not from an explicit graph or centralised planner.
- At fewer than 500 lines of Python, the design is fully readable and the code easy to adapt.
- It is not a production solution: it lacks persistent state, retries, observability, and robust error handling.
- The philosophical bet is that coordination does not need heavy infrastructure if the models are capable enough.
Why an experimental framework, and why now
Multi-agent frameworks have been accumulating abstractions for over a year. CrewAI talks about crews, tasks, and processes. LangGraph models state machines with explicit nodes, edges, and checkpointers. Each layer solves real problems but also imposes an ontology the developer must learn before writing the first useful line.
Swarm responds in the opposite direction: reduce the vocabulary to two ideas and see how far that gets you.
The two ideas are agents and handoffs. An agent is a configuration with instructions and functions. A handoff is, literally, a function that returns another agent. When the model decides to call that function, control passes to the returned agent and the next conversation round occurs under its instructions. No explicit graph, no state machine, no planner. Routing emerges from the fact that models, given well-named functions, know when to call them.
This is a philosophical bet: coordination between agents does not need heavy infrastructure if the underlying models are capable enough. With GPT-4o as the engine, OpenAI is saying it prefers trusting model reasoning over framework scaffolding.
Swarm’s structure
Swarm’s core code fits in a single file. The client manages the conversation loop: it sends the messages to the model, receives the response, executes any tool calls, and repeats. The only additions over the standard Chat Completions API are the Agent Python object and the handoff logic:
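```python
from swarm import Swarm, Agent

# Swarm() creates an OpenAI client under the hood, so OPENAI_API_KEY must be set.
client = Swarm()

support = Agent(
    name="Support",
    instructions="Resolve technical problems. If the issue is about billing, transfer to Billing.",
)

billing = Agent(
    name="Billing",
    instructions="Handle questions about invoices and payments.",
)

def transfer_to_billing():
    """Transfer the conversation to the Billing agent."""
    # Returning an Agent from a function is what makes it a handoff.
    return billing

# Expose the handoff to the Support agent as a callable tool.
support.functions = [transfer_to_billing]

response = client.run(
    agent=support,
    messages=[{"role": "user", "content": "I have an incorrect charge"}],
)

# Final reply; response.agent holds the agent that was active when the run ended.
print(response.messages[-1]["content"])
```

When the Support agent’s model decides to transfer, it calls transfer_to_billing(). Swarm receives the returned agent and continues the conversation under Billing’s instructions. That is the entire framework.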
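The handoff mechanism can also be sketched without any network calls. In the minimal sketch below, `Agent`, `fake_model`, and `run` are hypothetical stand-ins written for illustration, not Swarm’s source. `fake_model` replaces the chat-completion call with a keyword rule, and `run` executes whichever function the "model" picks, switching the active agent whenever that function returns an `Agent`. `max_turns` is an assumed safety bound so that two agents handing off to each other cannot loop forever.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Agent:
    # Hypothetical stand-in mirroring the fields used above: name, instructions, functions.
    name: str
    instructions: str
    functions: list[Callable] = field(default_factory=list)


def fake_model(agent: Agent, messages: list[dict]) -> Optional[str]:
    """Hypothetical stand-in for the LLM: name a function to call, or None to answer directly."""
    last_user = next(m["content"] for m in reversed(messages) if m["role"] == "user")
    for fn in agent.functions:
        # Crude keyword rule in place of real model reasoning.
        if "charge" in last_user and "billing" in fn.__name__:
            return fn.__name__
    return None


def run(agent: Agent, messages: list[dict], max_turns: int = 5) -> tuple[Agent, list[dict]]:
    history = list(messages)
    for _ in range(max_turns):
        call = fake_model(agent, history)
        if call is None:
            # No tool call: the active agent answers and the loop ends.
            history.append({"role": "assistant", "content": f"[{agent.name}] answering"})
            break
        fn = {f.__name__: f for f in agent.functions}[call]
        result = fn()
        # Record the tool result; for a handoff, this is the new agent's name.
        history.append({"role": "tool", "content": str(getattr(result, "name", result))})
        if isinstance(result, Agent):
            # The handoff: later turns run under the returned agent's instructions.
            agent = result
    return agent, history


support = Agent("Support", "Resolve technical problems.")
billing = Agent("Billing", "Handle questions about invoices and payments.")


def transfer_to_billing():
    return billing


support.functions = [transfer_to_billing]
final_agent, history = run(support, [{"role": "user", "content": "I have an incorrect charge"}])
print(final_agent.name)  # → Billing
```

Nothing in `run` knows about Support or Billing. Routing lives entirely in which functions each agent exposes and in the model’s decision to call them.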
Differences from CrewAI and LangGraph
The three approaches reflect different philosophies about where coordination intelligence should live.
In its hierarchical process, CrewAI externalises coordination into a manager agent that distributes tasks. This is the model closest to a traditional organisational hierarchy. It is useful when roles are stable and tasks well defined, but it adds an extra LLM call for every routing decision.
LangGraph models flow as a directed graph with explicit nodes and edges. The advantage is predictability and visual debugging of the flow; the disadvantage is that you must design the graph upfront, which is not always possible for dynamic flows.
Swarm eliminates both layers and trusts that the base model, with clear instructions and well-named functions, will make routing decisions at each turn. It is more fragile — a less capable model may make incorrect handoff decisions — but also more transparent and easier to reason about.
The OpenAI Assistants API already covers production cases with persistent state and managed threads. Swarm occupies a conceptual, educational space and is not meant to replace it.
Patterns emerging from Swarm
Although Swarm is experimental, several interesting patterns emerge from its examples:
- Triage agent: an initial agent that analyses the request and transfers to the right specialist. The simplest pattern and the one that works best with less powerful models.
- Escalation chain: agents in a chain where each can escalate to the next when the issue exceeds its competence, similar to a support ticket system.
- Simulated parallel agents: several agents processing different aspects of a request and a synthesiser agent combining the responses.
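The escalation chain, in particular, is just a linear sequence of handoffs. The sketch below wires one up with a hypothetical `make_escalation_chain` helper and a stand-in `Agent` dataclass that keeps only the `name`, `instructions`, and `functions` fields used earlier. With real Swarm, the model would decide whether each escalation function gets called. The `while` loop at the end only imitates a model that always escalates.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # Hypothetical stand-in for Swarm's Agent, keeping only the fields this sketch uses.
    name: str
    instructions: str
    functions: list[Callable] = field(default_factory=list)


def _handoff_to(target: Agent) -> Callable[[], Agent]:
    """Create a zero-argument handoff function that returns `target`."""
    def escalate() -> Agent:
        return target
    # A descriptive name matters: a real model chooses tools partly by their names.
    escalate.__name__ = "escalate_to_" + target.name.lower().replace(" ", "_")
    return escalate


def make_escalation_chain(levels: list[str]) -> list[Agent]:
    """Build one agent per level; each can escalate to the next (hypothetical helper)."""
    agents = [
        Agent(name, f"Handle requests at the {name} level. Escalate if they exceed your competence.")
        for name in levels
    ]
    for current, nxt in zip(agents, agents[1:]):
        current.functions.append(_handoff_to(nxt))
    return agents


chain = make_escalation_chain(["Tier 1", "Tier 2", "Engineering"])

# Follow every escalation from the first agent, as a model that always escalates would.
agent, path = chain[0], [chain[0].name]
while agent.functions:
    agent = agent.functions[0]()
    path.append(agent.name)
print(path)  # → ['Tier 1', 'Tier 2', 'Engineering']
```

Each handoff is built by a small factory so that it captures its own successor. A function defined directly inside the `for` loop would close over the loop variable, so every handoff would return the last agent.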
The limit of all these patterns is the same: without persistent state between sessions, without automatic retries, without integrated observability, they are proof-of-concept demonstrations — not production systems. For production multi-agent systems with trace observability, the Assistants API or mature frameworks like LangGraph are the appropriate choice.
Real value of Swarm
Swarm’s real value is not the framework itself but its code. Reading its roughly five hundred lines is one of the best ways to understand what an agent framework actually does beneath its abstractions: manage the message history, execute tool calls, and run the conversation loop. Once you have read the code, much of the “mystery” of how more complex frameworks work disappears.
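Executing tool calls begins by describing each Python function to the model and ends by calling the function the model names. Swarm’s README describes converting functions into a JSON Schema for Chat Completions tools: docstrings become descriptions, parameters without defaults become required, and type hints map to parameter types. The sketch below reproduces that idea with `inspect`. `function_to_schema`, `execute_tool_call`, and `lookup_invoice` are hypothetical names, and the type mapping is deliberately minimal rather than Swarm’s actual implementation.

```python
import inspect
import json
from typing import Callable

# Minimal annotation-to-JSON-Schema mapping; unannotated or unknown types fall back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", list: "array", dict: "object"}


def function_to_schema(func: Callable) -> dict:
    """Describe a Python function as an OpenAI-style tool definition (illustrative sketch)."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            # No default value, so the model must supply this argument.
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def execute_tool_call(functions: list[Callable], name: str, arguments_json: str):
    """Run the function the model named, decoding its JSON-encoded arguments."""
    by_name = {f.__name__: f for f in functions}
    return by_name[name](**json.loads(arguments_json))


def lookup_invoice(invoice_id: str, include_lines: bool = False) -> str:
    """Look up an invoice by its identifier."""
    # Hypothetical tool body; a real one would query a billing system.
    return f"invoice {invoice_id}" + (" with lines" if include_lines else "")


schema = function_to_schema(lookup_invoice)
print(schema["function"]["parameters"]["required"])  # → ['invoice_id']
print(execute_tool_call([lookup_invoice], "lookup_invoice", '{"invoice_id": "A-17"}'))  # → invoice A-17
```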
Swarm also opens a conversation about whether current framework complexity is justified. For many use cases, Swarm’s answer, “trust the model and give it good functions”, may be sufficient. For others, the persistent state, retries, and observability of LangGraph or CrewAI are not optional. Swarm’s merit is forcing that conversation with code rather than arguments.
Conclusion
Swarm is an executable paper, not a production framework. Its most valuable contribution is demonstrating that coordination between agents can be surprisingly simple when the base model is capable, and that most of the complexity in current frameworks exists either to compensate for model fragility or to add features Swarm deliberately omits. For teams that want to understand how multi-agent systems work before adopting a framework, reading Swarm’s code is the best starting point.