OpenAI introduced function calling in GPT-3.5-turbo and GPT-4 in June 2023. It is one of the features that most changes how an LLM integrates into an application: instead of parsing free text and praying, the model returns structured JSON you can execute directly. This post covers how it works, which patterns have emerged, and the pitfalls that show up again and again in production.
The Problem It Solves
Before function calling, integrating GPT with an external system followed a fragile pattern:
- You ask the model to respond in JSON with a specific schema.
- You apply regex or a robust parser to the output.
- If the model deviates slightly (a comment outside the JSON, an extra comma), everything fails.
- You add retries with increasingly strict prompts.
Function calling formalises this pattern. You explicitly declare available “functions” and their parameters using JSON Schema. The model decides when to invoke one and returns an object respecting the schema. Parsing is trivial.
How to Declare One
A function is described with three fields:
```json
{
  "name": "get_weather",
  "description": "Gets current weather in a city",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {"type": "string", "description": "City, e.g. 'Madrid'"},
      "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["city"]
  }
}
```
You pass the list of available functions along with the user message:
```python
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}],
    functions=[get_weather_function],
    function_call="auto",
)
```
The response can be a normal message, or an object whose function_call field contains name and arguments (a JSON string with the parameters). Your code receives that JSON, executes the real function, and optionally returns the result to the model so that it can compose the final answer.
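In code, handling that branch might look like the following sketch. The message dict is hypothetical example data shaped like the June 2023 API response, and get_weather is a stand-in implementation:

```python
import json

# Hypothetical response message, shaped like the June 2023 API returns it.
message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_weather",
        # Note: arguments arrive as a JSON *string*, not a dict.
        "arguments": '{"city": "Madrid", "units": "celsius"}',
    },
}

def get_weather(city, units="celsius"):
    # Stand-in for the real implementation (e.g. a weather API call).
    return {"city": city, "temp": 21, "units": units}

AVAILABLE = {"get_weather": get_weather}

if message.get("function_call"):
    call = message["function_call"]
    args = json.loads(call["arguments"])  # may raise on malformed JSON
    result = AVAILABLE[call["name"]](**args)
else:
    # The model chose to answer directly: just use the text content.
    result = message["content"]

print(result)
```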
Use Cases That Have Matured
In the months since launch, clear patterns have emerged:
- Simple agents. A model that can invoke tools (web search, calculator, database) when needed.
- Structured extraction. Convert free text (email, transcribed call, unstructured form) into typed data — a case where fragile regex or traditional NER was used before.
- “Product assistant” conversational APIs. The user asks in natural language; the model decides which backend endpoint to invoke.
- Query routing. Decide between multiple pipelines (FAQ, support ticket, human escalation) based on question nature.
- Code generation calling internal APIs. The model “composes” function calls given a high-level goal.
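For the structured-extraction case, a schema might look like the sketch below. Here save_contact is a hypothetical function that is never actually executed: the function call itself is the typed output you wanted.

```python
# Hypothetical schema for structured extraction: the "function" exists
# only so the model emits arguments matching this shape.
extract_contact = {
    "name": "save_contact",
    "description": "Saves contact data extracted from the user's text",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Full name of the person"},
            "email": {"type": "string", "description": "Email address, if present"},
            "company": {"type": "string"},
        },
        "required": ["name"],
    },
}
```

Passing function_call={"name": "save_contact"} in the request forces the model to use this function, which guarantees typed output on every request instead of leaving the choice to the model.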
The ReAct Pattern With Function Calling
In essence, all agent frameworks (LangChain, LlamaIndex, AutoGen) repeat the same loop:
1. User asks question
2. Model receives question + list of functions
3. Model decides: answer directly or call function
4. If calls: execute real function
5. Pass result to model
6. Model decides again: new function or final answer
7. Repeat until model gives final answer
Function calling makes this loop robust. Without it, it was a house of cards built on text parsing.
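The steps above can be sketched in a few lines. Here the model is replaced by a scripted stub (model_stub and the calculator tool are illustrative, not a real API) so only the control flow is shown:

```python
import json

def model_stub(messages, functions):
    # Stand-in for the API call: the first turn asks for a tool,
    # the second turn (after seeing the result) answers directly.
    if not any(m["role"] == "function" for m in messages):
        return {"function_call": {"name": "calculator",
                                  "arguments": '{"expr": "2+2"}'}}
    return {"content": "The result is 4."}

TOOLS = {"calculator": lambda expr: eval(expr)}  # demo only: never eval untrusted input

def run_agent(question, max_iterations=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iterations):  # hard cap against infinite loops
        msg = model_stub(messages, functions=[])
        if "function_call" not in msg:
            return msg["content"]  # final answer, exit the loop
        call = msg["function_call"]
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        # Feed the tool result back so the model can keep reasoning.
        messages.append({"role": "function", "name": call["name"],
                         "content": json.dumps(result)})
    raise RuntimeError("agent exceeded max_iterations")

print(run_agent("What is 2+2?"))
```

The max_iterations cap matters: it is the difference between a confused agent failing loudly and one looping forever on your API bill.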
Common Errors Seen in Production
After months of use, these are the problems that appear again and again:
- Schemas that are too permissive. If you declare a parameter as a bare string with no enum or precise description, the model improvises. Be strict with types and enumerations.
- Poor descriptions. The description of the function and of each parameter is what the model reads to decide when to invoke it. It's prompt, not documentation. Invest time in it.
- Too many functions at once. More than 10-15 simultaneous functions degrades the model's decisions. Consider categorisation or an initial routing step.
- Trusting that the function is always invoked. The model may decide to answer directly. Handle both cases.
- Not validating arguments. The model conforms to the schema in >95% of cases, but not always. Validate before executing, especially if the function has side effects (deleting data, sending email).
- Infinite loops in agent loops. Without an iteration limit, a confused agent can call itself forever. Set max_iterations.
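On the validation point: you don't need a full library for a basic check. A minimal stdlib-only sketch (validate is our own helper, not part of the OpenAI SDK) might look like this:

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

# Map JSON Schema type names to Python types for isinstance checks.
TYPES = {"string": str, "number": (int, float), "boolean": bool, "object": dict}

def validate(arguments_json, schema):
    """Parse the model's arguments string and check required/type/enum."""
    args = json.loads(arguments_json)  # raises on malformed JSON
    for key in schema["required"]:
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, TYPES[spec["type"]]):
            raise ValueError(f"{key}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{key}: must be one of {spec['enum']}")
    return args

print(validate('{"city": "Madrid", "units": "celsius"}', SCHEMA))
```

For anything with side effects, run a check like this (or a real JSON Schema validator) before executing, and treat a validation failure as a retriable model error, not a crash.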
Cost and Latency
Function calling has extra cost worth knowing:
- Tokens consumed: function declarations are included in every call — they take input tokens. A large schema can add 500-1000 tokens per request.
- Chain latency: each “model → function → model” loop adds roundtrips. An agent requiring 4 steps takes 4× longer than a direct answer.
- Caching: declared functions don’t change between calls; with prompt caching (when available for your model) the cost amortises.
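To get a feel for the token overhead of a declaration, a rough back-of-the-envelope estimate works. The ~4 characters per token ratio used here is a common heuristic for English text, not an exact count:

```python
import json

get_weather_function = {
    "name": "get_weather",
    "description": "Gets current weather in a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City, e.g. 'Madrid'"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def rough_schema_tokens(functions):
    # Very rough heuristic (~4 characters per token for English);
    # use a real tokenizer such as tiktoken for exact counts.
    return len(json.dumps(functions)) // 4

print(rough_schema_tokens([get_weather_function]))
```

Multiply that by your request volume and the "500-1000 tokens per request" figure for large schemas stops being abstract.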
Conclusion
Function calling is one of the best recent additions to the OpenAI API. It solves a real problem (parsing LLM output) with the right abstraction. For any project integrating GPT with systems that expect structured data, it’s the default option. The agent patterns built on this will define how software with AI is built in the coming years.
Follow us on jacar.es for more on LLM integration and tools to build AI products.