OpenAI Assistants API: stateful agents without running your own infrastructure


The OpenAI Assistants API abstracts common LLM-agent patterns: persistent threads (multi-turn conversations), structured tool calling, a built-in code interpreter, and file search (integrated RAG). It saves you the infrastructure for these patterns but introduces vendor lock-in. This article covers when to use it and when to prefer chat completions plus your own logic.

Components

Assistant: reusable configuration (model, instructions, tools).

Thread: a persistent conversation that keeps the message history.

Message: a user or assistant message within a thread.

Run: one execution of an assistant over a thread.

Tools: code_interpreter, file_search, function calling.

Basic setup

from openai import OpenAI
client = OpenAI()

# Create assistant (once)
assistant = client.beta.assistants.create(
    name="My Support Bot",
    instructions="You are a helpful customer support assistant.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}]
)

# Create thread (per conversation)
thread = client.beta.threads.create()

# Add message
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Hola, tengo un problema"
)

# Run
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Get messages
messages = client.beta.threads.messages.list(thread_id=thread.id)
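To pull the assistant's reply out of that listing, a small helper can walk the message objects. A sketch: `latest_assistant_text` is a name of my own, and it assumes the v2 message shape, where messages come back newest-first and each carries a list of content blocks exposing `block.text.value`.

```python
def latest_assistant_text(messages):
    """Return the text of the most recent assistant message.

    Assumes messages are ordered newest-first (the default order of
    messages.list) and that text content blocks expose
    block.text.value (Assistants API v2 shape).
    """
    for message in messages:
        if message.role == "assistant":
            return "".join(
                block.text.value
                for block in message.content
                if block.type == "text"
            )
    return None

# reply = latest_assistant_text(messages.data)
```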

Features

Code Interpreter

Python execution in an isolated sandbox:

tools=[{"type": "code_interpreter"}]

The assistant can generate and execute Python; generated files (charts, CSVs) are returned with the messages.
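Collecting those generated files can be sketched like this (`generated_file_ids` is a helper name of my own; it assumes the v2 shape where charts arrive as `image_file` content blocks carrying a `file_id`):

```python
def generated_file_ids(messages):
    """Collect the file IDs that code_interpreter attached to messages.

    Assumes the v2 shape: generated charts come back as image_file
    content blocks carrying a file_id.
    """
    return [
        block.image_file.file_id
        for message in messages
        for block in message.content
        if block.type == "image_file"
    ]

# Download each generated file, e.g. a chart PNG:
# for file_id in generated_file_ids(messages.data):
#     with open(f"{file_id}.png", "wb") as f:
#         f.write(client.files.content(file_id).read())
```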

File Search (integrated RAG)

# Upload files
file = client.files.create(
    file=open("docs.pdf", "rb"),
    purpose="assistants"
)

# Create assistant with file_search
assistant = client.beta.assistants.create(
    name="Docs Bot",
    instructions="Answer questions about the docs",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": ["vs_xxx"]
        }
    }
)

OpenAI handles chunking, embeddings, and retrieval, which eliminates a DIY RAG pipeline for simple cases.
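The `vs_xxx` ID above comes from a vector store you create and fill beforehand. A sketch, assuming the `client.beta.vector_stores` namespace of the openai-python SDK (newer releases move it to `client.vector_stores`); the live helper only builds the `tool_resources` payload:

```python
def file_search_resources(vector_store_ids):
    """Build the tool_resources payload expected by file_search."""
    return {"file_search": {"vector_store_ids": list(vector_store_ids)}}

# Create a vector store and index files into it; upload_and_poll
# blocks until server-side chunking and embedding finish:
# vector_store = client.beta.vector_stores.create(name="docs")
# client.beta.vector_stores.file_batches.upload_and_poll(
#     vector_store_id=vector_store.id,
#     files=[open("docs.pdf", "rb")],
# )
# assistant = client.beta.assistants.create(
#     name="Docs Bot",
#     instructions="Answer questions about the docs",
#     model="gpt-4o",
#     tools=[{"type": "file_search"}],
#     tool_resources=file_search_resources([vector_store.id]),
# )
```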

Function Calling

Define custom tools:

tools=[{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            }
        }
    }
}]

The assistant pauses the run to request your function call; your code executes it, submits the result, and the run continues.
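That pause-and-resume loop can be sketched like this (a sketch: `get_weather` is a stand-in implementation and `run_tool_call` is a helper name of my own; `submit_tool_outputs_and_poll` is the SDK call that resumes the run):

```python
import json

def get_weather(city):
    # Stand-in implementation; real code would call a weather service
    return {"city": city, "forecast": "sunny"}

TOOL_HANDLERS = {"get_weather": get_weather}

def run_tool_call(tool_call):
    """Execute one pending tool call and shape the output payload
    that submit_tool_outputs expects."""
    handler = TOOL_HANDLERS[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)
    return {
        "tool_call_id": tool_call.id,
        "output": json.dumps(handler(**args)),
    }

# When run.status == "requires_action":
# outputs = [run_tool_call(tc) for tc in
#            run.required_action.submit_tool_outputs.tool_calls]
# run = client.beta.threads.runs.submit_tool_outputs_and_poll(
#     thread_id=thread.id, run_id=run.id, tool_outputs=outputs)
```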

Persistent threads

The Assistants API manages history for you:

# The user resumes the conversation three days later
thread = client.beta.threads.retrieve("thread_xxx")
client.beta.threads.messages.create(thread_id=thread.id, role="user", content="...")

OpenAI's backend persists everything; you need no database of your own for chat history.

Assistants vs Chat Completions

| Aspect | Assistants API | Chat Completions + your own |
| --- | --- | --- |
| Thread history | Managed | Your DB |
| File search RAG | Built-in | Your pipeline (vector DB) |
| Code interpreter | Sandbox included | Your containers |
| Function calling | Supported | Supported |
| Vendor lock-in | High | Low |
| Cost transparency | Opaque | Clear |
| Customization | Limited | Full |

Assistants = rapid development. Chat completions = more control.

Pricing

On top of standard token costs:

  • Code interpreter: $0.03 per session (a session stays active for one hour).
  • File search: $0.10 per GB per day of storage, plus the tokens the assistant consumes.
  • Thread storage: "free", but the full history is re-sent as input tokens on every run.

Total cost can exceed the chat-completions approach for high-volume use cases.
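A back-of-envelope calculation with the prices above makes the point (the usage profile is hypothetical, and token costs come on top):

```python
CODE_INTERPRETER_PER_SESSION = 0.03  # $ per session
FILE_SEARCH_PER_GB_DAY = 0.10        # $ per GB per day of storage

def managed_extras_cost(sessions, storage_gb, days=30):
    """Monthly cost of the managed extras, beyond token spend."""
    return (sessions * CODE_INTERPRETER_PER_SESSION
            + storage_gb * FILE_SEARCH_PER_GB_DAY * days)

# 1,000 interpreter sessions plus 5 GB indexed for a month:
print(managed_extras_cost(sessions=1000, storage_gb=5))  # 45.0
```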

Where it shines

  • Simple chatbots with RAG over a small-to-medium document set.
  • Rapid prototypes: a POC in hours, not weeks.
  • Internal tools: analytics assistants.
  • Support bots backed by a knowledge base.

When NOT to use it

  • Serious production systems that need control over the data flow.
  • Sophisticated RAG: file_search is a black box.
  • Multi-LLM stacks: you are locked to OpenAI.
  • Cost-sensitive workloads: chat completions can be cheaper.
  • Strict compliance: data living on OpenAI's side is a concern.

Latency

  • File search: adds 2-5 s per query.
  • Code interpreter: can be slow (Python sandbox).
  • Multi-turn with history: depends on the history length.

For realtime use cases, consider alternatives.

Data privacy

OpenAI:

  • No training on Assistants data (business tier).
  • Retention: 30 days by default, optionally 0 days.
  • Residency: US (Azure OpenAI offers EU regions).

For sensitive data, use Azure OpenAI Assistants.

Error handling

Runs can end in several states:

  • queued → in_progress → completed (happy path).
  • requires_action: your function call needs a response.
  • failed: an error occurred (check last_error).
  • cancelled: manually cancelled.
  • expired: timed out.

Poll via retrieve() or use streaming.
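A handling sketch around those states (`summarize_run` is a helper name of my own; the `.status` and `.last_error` fields are the v2 run shape):

```python
def summarize_run(run):
    """Map a run object to what the caller should do next."""
    if run.status == "completed":
        return "done"
    if run.status == "requires_action":
        return "submit tool outputs"
    if run.status == "failed":
        # last_error carries .code and .message
        return f"failed: {run.last_error.code}"
    # cancelled, expired, or still queued/in_progress
    return run.status

# run = client.beta.threads.runs.retrieve(
#     thread_id=thread.id, run_id=run.id)
# print(summarize_run(run))
```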

Streaming

with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id
) as stream:
    for event in stream:
        # Print text deltas as they arrive
        if event.event == "thread.message.delta":
            for block in event.data.delta.content:
                if block.type == "text":
                    print(block.text.value, end="")

Real-time updates during the run: better UX than blocking on a poll loop.

v2 improvements

Assistants API v2 (2024) added:

  • File search backed by vector stores (replacing the v1 retrieval tool).
  • Better token usage reporting.
  • Improved file management.
  • Cleaner handling of multiple files per assistant.

v1 is deprecated; v2 is the standard.

Migration path

From Assistants to custom chat completions:

  1. Extract the thread history into your own DB.
  2. Implement a RAG pipeline (pgvector/Pinecone plus embeddings).
  3. Move function calling to chat completions (the API is similar).
  4. Add custom code execution if needed.

More work, but more control and portability.
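Step 3 above is the smallest jump: the tool schema has the same shape, but you now carry the history yourself. A minimal sketch (`build_request` is a helper name of my own; `history` is the message list you would previously have left inside a thread):

```python
def build_request(history, tools, model="gpt-4o"):
    """Assemble the payload for chat.completions.create; `history`
    is the message list you now persist in your own DB."""
    return {"model": model, "messages": history, "tools": tools}

# history = [{"role": "user", "content": "What's the weather in Madrid?"}]
# response = client.chat.completions.create(**build_request(history, tools))
# If response.choices[0].message.tool_calls is set, execute the tool,
# append a {"role": "tool", ...} message, and call the API again.
```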

Conclusión

The Assistants API is useful for prototypes and simple-to-medium apps. The "managed infrastructure" saves time up front but introduces lock-in and opaque costs. For serious production with complex requirements, chat completions plus custom infrastructure (your own vector DB, orchestration logic in code, tracked costs) gives more control. The decision balances speed against control: for startups iterating fast, Assistants; for mature apps at scale, custom; for a mix of both, Assistants for experiments and custom for the core product.

Follow us at jacar.es for more on OpenAI, agents, and LLM applications.
