OpenAI Assistants API: Stateful Agents Without Your Own Infrastructure

Digital assistant interface on a modern dark-themed screen

The OpenAI Assistants API abstracts common LLM-agent patterns: persistent threads (multi-turn conversations), structured tool calling, a built-in code interpreter, and file search (integrated RAG). It saves you the infrastructure for these patterns but introduces vendor lock-in. This article covers when to use it and when to prefer chat completions plus your own logic.

Components

Assistant: reusable configuration — model, instructions, tools.

Thread: persistent conversation, maintains message history.

Message: user or assistant message in a thread.

Run: execution of an assistant over a thread.

Tools: code_interpreter, file_search, function_calling.

Basic Setup

from openai import OpenAI
client = OpenAI()

# Create assistant (once)
assistant = client.beta.assistants.create(
    name="My Support Bot",
    instructions="You are a helpful customer support assistant.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}]
)

# Create thread (per conversation)
thread = client.beta.threads.create()

# Add message
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Hi, I have a problem"
)

# Run
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Get messages
messages = client.beta.threads.messages.list(thread_id=thread.id)
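To turn that messages list into a usable reply, a small helper is enough; this sketch assumes the default newest-first ordering of messages.list, with the text inside typed content blocks:

```python
# Pull the newest assistant reply out of a messages list.
# messages.data is newest-first by default; each message's content is a
# list of blocks (text, image_file, ...), so we join the text blocks.
def latest_reply(messages) -> str:
    """Return the text of the newest assistant message, or '' if none."""
    for msg in messages.data:          # newest first
        if msg.role == "assistant":
            parts = [c.text.value for c in msg.content if c.type == "text"]
            return "\n".join(parts)
    return ""

# print(latest_reply(messages))
```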

Features

Code Interpreter

Python execution in an isolated sandbox:

tools=[{"type": "code_interpreter"}]

The assistant can generate and execute Python; generated files (charts, CSVs) are returned with the response.
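A sketch of collecting those generated files, assuming charts arrive as image_file content blocks and other files as message attachments (the Files API then downloads them by ID):

```python
# Gather IDs of files the assistant produced in a thread, so they can be
# downloaded afterwards with the Files API.
def collect_file_ids(messages) -> list[str]:
    """Return file IDs from image_file blocks and message attachments."""
    ids = []
    for msg in messages.data:
        for block in msg.content:
            if block.type == "image_file":      # e.g. a generated chart
                ids.append(block.image_file.file_id)
        for att in getattr(msg, "attachments", []) or []:
            ids.append(att.file_id)             # e.g. a generated CSV
    return ids

# Download each one, e.g.:
# data = client.files.content(file_id).read()
```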

File Search (Integrated RAG)

# Upload files
file = client.files.create(
    file=open("docs.pdf", "rb"),
    purpose="assistants"
)

# Create assistant with file_search
assistant = client.beta.assistants.create(
    name="Docs Bot",
    instructions="Answer questions about the docs",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": ["vs_xxx"]
        }
    }
)

OpenAI handles chunking, embeddings, and retrieval, eliminating the DIY RAG pipeline for simple cases.
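The example above references an existing vector store ("vs_xxx"). A hedged sketch of creating one, assuming the openai-python v1 beta endpoints (upload_and_poll blocks until indexing finishes):

```python
# Create a vector store and index local files into it; the returned ID is
# what goes into tool_resources["file_search"]["vector_store_ids"].
def build_vector_store(client, name: str, paths: list[str]) -> str:
    """Create a vector store and index the given files into it."""
    vs = client.beta.vector_stores.create(name=name)
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vs.id,
        files=[open(p, "rb") for p in paths],
    )
    return vs.id

# vs_id = build_vector_store(client, "docs", ["docs.pdf"])
```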

Function Calling

Custom tools are defined with a JSON schema:

tools=[{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            }
        }
    }
}]

The run pauses in a requires_action state when the assistant requests a function call; your code executes the function, submits the result, and the run continues.
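A sketch of that round-trip, using the hypothetical get_weather tool from the schema above; answer_tool_calls builds the payload that submit_tool_outputs_and_poll expects:

```python
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"          # stand-in for a real weather API call

def answer_tool_calls(run) -> list[dict]:
    """Build the tool_outputs payload for a run in requires_action state."""
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        args = json.loads(call.function.arguments)   # arguments arrive as JSON text
        if call.function.name == "get_weather":
            result = get_weather(args["city"])
        else:
            result = "unknown tool"
        outputs.append({"tool_call_id": call.id, "output": result})
    return outputs

# run = client.beta.threads.runs.submit_tool_outputs_and_poll(
#     thread_id=thread.id, run_id=run.id, tool_outputs=answer_tool_calls(run))
```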

Persistent Threads

Assistants API handles history:

# User continues conversation 3 days later
thread = client.beta.threads.retrieve("thread_xxx")
client.beta.threads.messages.create(thread_id=thread.id, role="user", content="...")

OpenAI's backend persists the thread; no database of your own is needed for the chat history itself.
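One piece you still own is the mapping from your users to their thread IDs; a minimal sketch with SQLite, where create_thread stands in for the actual client.beta.threads.create().id call:

```python
import sqlite3

def get_or_create_thread_id(db, user_id: str, create_thread) -> str:
    """Look up this user's thread ID, creating a new thread if absent."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS threads (user_id TEXT PRIMARY KEY, thread_id TEXT)"
    )
    row = db.execute(
        "SELECT thread_id FROM threads WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row:
        return row[0]
    thread_id = create_thread()   # e.g. lambda: client.beta.threads.create().id
    db.execute("INSERT INTO threads VALUES (?, ?)", (user_id, thread_id))
    db.commit()
    return thread_id
```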

Assistants vs Chat Completions

Aspect             | Assistants API    | Chat Completions + Own
Thread history     | Managed           | Your DB
File search RAG    | Built-in          | Your pipeline (vector DB)
Code interpreter   | Sandbox included  | Your containers
Function calling   | Yes               | Yes
Vendor lock-in     | High              | Low
Cost transparency  | Opaque            | Clear
Customisation      | Limited           | Full

Assistants = rapid development. Chat completions = more control.
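For comparison, the chat completions side of the table means carrying the history yourself; a minimal sketch with the API call abstracted behind a complete callback (a hypothetical stand-in):

```python
# A "thread" with chat completions is just a list you append to and re-send.
history = [{"role": "system", "content": "You are a helpful customer support assistant."}]

def ask(history: list, user_text: str, complete) -> str:
    """Append the user turn, get a completion, record and return the reply."""
    history.append({"role": "user", "content": user_text})
    # complete(history) stands in for e.g.
    # client.chat.completions.create(model="gpt-4o", messages=history)
    #       .choices[0].message.content
    reply = complete(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```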

Pricing

Beyond standard token cost:

  • Code interpreter: $0.03 per session (a session stays active for one hour).
  • File search: $0.10 per GB per day of vector store storage, plus the tokens the assistant consumes.
  • Thread storage: “free”, but the accumulated history is re-sent as context, so it shows up in token consumption.

For high-volume cases, total cost can exceed the chat completions approach.
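A back-of-envelope example with the rates above: 5 GB of file search storage over a 30-day month plus 200 code interpreter sessions.

```python
# Add-on costs on top of standard token spend, using the listed rates.
FILE_SEARCH_PER_GB_DAY = 0.10    # $/GB/day
CODE_INTERPRETER_SESSION = 0.03  # $/session

storage = 5 * FILE_SEARCH_PER_GB_DAY * 30   # 5 GB for 30 days -> $15.00
sessions = 200 * CODE_INTERPRETER_SESSION   # 200 sessions      -> $6.00
total = storage + sessions
print(f"${total:.2f}/month on top of tokens")  # $21.00/month on top of tokens
```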

Cases Where It Shines

  • Simple chatbots with RAG over small-mid docs.
  • Rapid prototyping: POC in hours, not weeks.
  • Internal tools: analytics assistants.
  • Support bots with knowledge base.

When NOT to Use

  • Serious production with control over data flow.
  • Sophisticated RAG: file_search is black box.
  • Multi-LLM stack: locked to OpenAI.
  • Cost-sensitive: chat completions can be cheaper.
  • Strict compliance: data residing with OpenAI is a concern.

Latency

  • File search: adds 2-5s per query.
  • Code interpreter: can be slow (Python sandbox).
  • Multi-turn with history: depends on history length.

For real-time use cases, consider alternatives.

Data Privacy

OpenAI:

  • No training with Assistants data (business tier).
  • Retention: 30 days default, optional 0 days.
  • Residency: US (Azure OpenAI offers EU).

For sensitive data, consider Azure OpenAI Assistants.

Error Handling

Runs can fail in several states:

  • queued → in_progress → completed (happy path).
  • requires_action: your function call needs response.
  • failed: error (check last_error).
  • cancelled: manual cancel.
  • expired: timeout.

Poll via retrieve() or streaming.
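A manual polling sketch for when create_and_poll isn't used: fetch stands in for the retrieve() call, and requires_action is handed back to the caller so tool outputs can be submitted:

```python
import time

TERMINAL = {"completed", "failed", "cancelled", "expired"}

def wait_for_run(fetch, interval: float = 1.0, timeout: float = 120.0):
    """Poll until the run settles; fetch() returns the current run object,
    e.g. lambda: client.beta.threads.runs.retrieve(thread_id=..., run_id=...)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = fetch()
        if run.status in TERMINAL:
            if run.status == "failed":
                raise RuntimeError(f"run failed: {run.last_error}")
            return run
        if run.status == "requires_action":
            return run               # caller submits tool outputs, then polls again
        time.sleep(interval)
    raise TimeoutError("run did not finish in time")
```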

Streaming

with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id
) as stream:
    for event in stream:
        # event.event names the update type, e.g. "thread.message.delta"
        # carries incremental text for the UI
        pass

Real-time updates during the run; better UX than blocking polling.

v2 Improvements

Assistants API v2 (2024) added:

  • File search with vector stores (replacing the v1 retrieval tool).
  • Better token usage reporting.
  • Improved file management.
  • Cleaner multiple files per assistant.

v1 is deprecated; v2 is the standard.

Migration Path

From Assistants to custom chat completions:

  1. Extract thread history to your DB.
  2. Implement RAG pipeline (pgvector/Pinecone + embeddings).
  3. Function calling with chat completions (similar API).
  4. Custom code execution if needed.

More work, but more control and portability.
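A sketch of step 1, assuming the SDK's auto-pagination (iterating messages.list yields every message across pages):

```python
# Flatten a thread's history into plain {role, content} rows ready for
# insertion into your own database.
def export_thread(client, thread_id: str) -> list[dict]:
    rows = []
    for msg in client.beta.threads.messages.list(thread_id=thread_id, order="asc"):
        text = "\n".join(c.text.value for c in msg.content if c.type == "text")
        rows.append({"role": msg.role, "content": text})
    return rows
```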

Conclusion

The Assistants API is useful for prototypes and simple-to-medium apps. The “managed infrastructure” saves initial time but introduces lock-in and opaque costs. For serious production with complex requirements, chat completions plus custom infrastructure (your own vector DB, orchestration logic in code, tracked costs) gives more control. The decision balances speed against control: for startups iterating fast, Assistants; for mature apps at scale, custom; for mixed needs, Assistants for experiments and custom for the core product.

Follow us on jacar.es for more on OpenAI, agents, and LLM applications.
