The OpenAI Assistants API abstracts common LLM-agent patterns: persistent threads (multi-turn conversations), structured tool calling, a built-in code interpreter, and file search (integrated RAG). It saves you the infrastructure for these patterns but introduces vendor lock-in. This article covers when to use it and when to prefer chat completions plus your own logic.
Components
- Assistant: reusable configuration — model, instructions, tools.
- Thread: persistent conversation that holds the message history.
- Message: a user or assistant message in a thread.
- Run: an execution of an assistant over a thread.
- Tools: code_interpreter, file_search, function calling.
Basic setup
from openai import OpenAI
client = OpenAI()
# Create assistant (once)
assistant = client.beta.assistants.create(
    name="My Support Bot",
    instructions="You are a helpful customer support assistant.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}],
)
# Create thread (per conversation)
thread = client.beta.threads.create()
# Add message
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Hi, I have a problem",
)
# Run
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
# Get messages
messages = client.beta.threads.messages.list(thread_id=thread.id)
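Reading the reply back: messages.list returns newest-first by default, and each message's content is a list of blocks whose text lives in block.text.value. A minimal helper sketch (latest_assistant_text is a hypothetical name; the client is passed in):

```python
def latest_assistant_text(client, thread_id: str) -> str:
    """Return the text of the newest message in a thread.

    messages.list is newest-first by default; a message's content is a
    list of blocks, and text blocks carry their string in .text.value.
    """
    messages = client.beta.threads.messages.list(thread_id=thread_id)
    newest = messages.data[0]
    return "".join(
        block.text.value for block in newest.content if block.type == "text"
    )
```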
Features
Code Interpreter
Python execution in an isolated sandbox:
tools=[{"type": "code_interpreter"}]
The assistant can generate and execute Python. Generated files (charts, CSVs) are returned in the response.
File Search (built-in RAG)
# Upload files
file = client.files.create(
    file=open("docs.pdf", "rb"),
    purpose="assistants",
)
# Create assistant with file_search
assistant = client.beta.assistants.create(
    name="Docs Bot",
    instructions="Answer questions about the docs",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": ["vs_xxx"]
        }
    },
)
OpenAI handles chunking, embeddings, and retrieval, which eliminates a DIY RAG pipeline for simple cases.
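The vector store whose id goes into vector_store_ids has to be created and filled first. A minimal sketch, assuming the SDK's beta vector store endpoints (create and file_batches.upload_and_poll); build_docs_vector_store is a hypothetical helper:

```python
def build_docs_vector_store(client, paths):
    """Create a vector store and index a list of local files into it.

    Returns the store id, which is what tool_resources.file_search
    .vector_store_ids expects.
    """
    store = client.beta.vector_stores.create(name="docs")
    # upload_and_poll blocks until all files are chunked and embedded
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=store.id,
        files=[open(p, "rb") for p in paths],
    )
    return store.id
```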
Function Calling
Custom tools are defined like this:
tools=[{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            }
        }
    }
}]
The assistant pauses the run and asks for your function call; your code executes it and returns the result, and the run continues.
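That round trip can be sketched as a small dispatch loop. A minimal sketch assuming the Python SDK's submit_tool_outputs_and_poll helper; get_weather, TOOL_HANDLERS, and the other names are hypothetical, and the client is passed in:

```python
import json

# Hypothetical local implementation of the get_weather function declared above.
def get_weather(city: str) -> str:
    # A real app would call a weather API here; stubbed for illustration.
    return json.dumps({"city": city, "temp_c": 21})

TOOL_HANDLERS = {"get_weather": get_weather}

def answer_tool_calls(run):
    """Build the tool_outputs payload for a run in requires_action state."""
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        args = json.loads(call.function.arguments)
        result = TOOL_HANDLERS[call.function.name](**args)
        outputs.append({"tool_call_id": call.id, "output": result})
    return outputs

def run_with_tools(client, thread_id: str, assistant_id: str):
    """Run the assistant, answering tool calls until a terminal state."""
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=assistant_id
    )
    while run.status == "requires_action":
        run = client.beta.threads.runs.submit_tool_outputs_and_poll(
            thread_id=thread_id,
            run_id=run.id,
            tool_outputs=answer_tool_calls(run),
        )
    return run
```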
Persistent threads
The Assistants API manages the history:
# The user resumes the conversation 3 days later
thread = client.beta.threads.retrieve("thread_xxx")
client.beta.threads.messages.create(thread_id=thread.id, role="user", content="...")
OpenAI's backend persists it. No database of your own for chat history.
Assistants vs Chat Completions
| Aspect | Assistants API | Chat Completions + DIY |
|---|---|---|
| Thread history | Managed | Your DB |
| File search RAG | Built-in | Your pipeline (vector DB) |
| Code interpreter | Sandbox included | Your containers |
| Function calling | Yes | Yes |
| Vendor lock-in | High | Low |
| Cost transparency | Opaque | Clear |
| Customization | Limited | Full |
Assistants means rapid development; chat completions means more control.
Pricing
On top of standard token costs:
- Code interpreter: $0.03 per session (a session stays active for one hour).
- File search: $0.10 per GB per day of storage, plus assistant tokens.
- Thread storage: "free", but the accumulated history is re-sent and billed as input tokens on every run.
Total cost can exceed the chat-completions approach for high-volume use cases.
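As a back-of-envelope check using the rates listed above (the volumes plugged in are illustrative assumptions, not benchmarks):

```python
# Monthly Assistants-specific overhead, excluding normal token costs.
FILE_SEARCH_GB_DAY = 0.10        # $ per GB per day of vector store storage
CODE_INTERPRETER_SESSION = 0.03  # $ per code interpreter session

def monthly_overhead(storage_gb: float, ci_sessions: int, days: int = 30) -> float:
    """Sum storage and session charges for one month."""
    storage = storage_gb * FILE_SEARCH_GB_DAY * days
    sessions = ci_sessions * CODE_INTERPRETER_SESSION
    return round(storage + sessions, 2)

# e.g. 5 GB of docs plus 1,000 sandbox sessions ≈ $45/month of overhead
print(monthly_overhead(5, 1000))
```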
Where it shines
- Simple chatbots with RAG over small-to-medium docs.
- Rapid prototypes: a POC in hours, not weeks.
- Internal tools: analytics assistants.
- Support bots with a knowledge base.
When NOT to use it
- Serious production with control over the data flow.
- Sophisticated RAG: file_search is a black box.
- Multi-LLM stacks: you are locked to OpenAI.
- Cost-sensitive workloads: chat completions can be cheaper.
- Strict compliance: data living at OpenAI is a concern.
Latency
- File search: adds 2-5 s per query.
- Code interpreter: can be slow (Python sandbox).
- Multi-turn with history: depends on history length.
For realtime use cases, consider alternatives.
Data privacy
OpenAI's stance:
- No training on Assistants data (business tier).
- Retention: 30 days by default, optionally 0 days.
- Residency: US (Azure OpenAI offers EU regions).
For sensitive data, use Azure OpenAI Assistants.
Error handling
Runs can end up in several states:
- queued → in_progress → completed (happy path).
- requires_action: your function call needs a response.
- failed: error (check last_error).
- cancelled: manual cancellation.
- expired: timeout.
Poll via retrieve() or use streaming.
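Polling with retrieve() can be wrapped in a small loop that surfaces failures explicitly. A sketch over the run states listed above (poll_run is a hypothetical helper; the client is passed in):

```python
import time

TERMINAL = {"completed", "failed", "cancelled", "expired"}

def poll_run(client, thread_id: str, run_id: str, interval: float = 1.0):
    """Poll a run until it needs tool output or reaches a terminal state.

    Raises RuntimeError on failure so callers see last_error immediately.
    """
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id, run_id=run_id
        )
        if run.status == "requires_action":
            return run  # caller must submit tool outputs
        if run.status in TERMINAL:
            if run.status == "failed":
                raise RuntimeError(f"Run failed: {run.last_error}")
            return run
        time.sleep(interval)
```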
Streaming
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
) as stream:
    for event in stream:
        # Print text tokens as they arrive
        if event.event == "thread.message.delta":
            for block in event.data.delta.content:
                if block.type == "text":
                    print(block.text.value, end="")
Real-time updates during the run; better UX than blocking polls.
v2 improvements
Assistants API v2 (2024) added:
- File search with vector stores (replaced the retrieval tool).
- Better token usage reporting.
- Improved file management.
- Cleaner handling of multiple files per assistant.
v1 is deprecated; v2 is the standard.
Migration path
From Assistants to custom chat completions:
- Extract the thread history into your own DB.
- Implement a RAG pipeline (pgvector/Pinecone + embeddings).
- Function calling with chat completions (similar API).
- Custom code execution if needed.
More work, but more control and portability.
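The first step, exporting thread history, can be sketched as a paginated pull. This assumes messages.list accepts order/limit/after cursor parameters and pages expose has_more (hedged; export_thread is a hypothetical helper):

```python
def export_thread(client, thread_id: str):
    """Pull a thread's full history, oldest first, for your own DB.

    Pages are capped at 100 messages, so follow the cursor via `after`.
    """
    rows, after = [], None
    while True:
        kwargs = {"thread_id": thread_id, "order": "asc", "limit": 100}
        if after:
            kwargs["after"] = after
        page = client.beta.threads.messages.list(**kwargs)
        for m in page.data:
            text = "".join(
                b.text.value for b in m.content if b.type == "text"
            )
            rows.append({"id": m.id, "role": m.role, "content": text})
        if not page.has_more:
            return rows
        after = page.data[-1].id
```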
Conclusion
The Assistants API is useful for prototypes and simple-to-medium apps. The "managed infrastructure" saves time up front but introduces lock-in and opaque costs. For serious production with complex requirements, chat completions plus custom infrastructure (your own vector DB, orchestration logic in code, tracked costs) give more control. The decision balances speed against control. For startups iterating fast: Assistants. For mature apps at scale: custom. For a mix: Assistants for experiments, custom for the core product.
Follow us at jacar.es for more on OpenAI, agents, and LLM applications.