OpenAI introduced function calling in GPT-3.5-turbo and GPT-4 in June 2023. It is one of the features that most changes how an LLM integrates into an application: instead of parsing free text and praying, the model returns structured JSON you can execute directly. This post covers how it works, which patterns have emerged, and the pitfalls that show up again and again in production.
The Problem It Solves
Before function calling, integrating GPT with an external system followed a fragile pattern:
- You ask the model to respond in JSON with a specific schema.
- You apply regex or a robust parser to the output.
- If the model deviates slightly (a comment outside the JSON, an extra comma), everything fails.
- You add retries with increasingly strict prompts.
Function calling formalises this pattern. You explicitly declare available “functions” and their parameters using JSON Schema. The model decides when to invoke one and returns an object respecting the schema. Parsing is trivial.
How to Declare One
A function is described with three fields:
```json
{
  "name": "get_weather",
  "description": "Gets current weather in a city",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {"type": "string", "description": "City, e.g. 'Madrid'"},
      "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["city"]
  }
}
```
You pass the list of available functions along with the user message:
```python
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}],
    functions=[get_weather_function],
    function_call="auto",
)
```
The response can be a normal message, or an object whose function_call field contains name and arguments (a JSON string with the parameters). Your code receives that JSON, executes the real function, and optionally returns the result to the model so that it can compose the final answer.
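In code, handling that branch might look like the following sketch. The message dict is hypothetical example data shaped like the June 2023 API response, and get_weather is a stand-in implementation:

```python
import json

# Hypothetical response message, shaped like the June 2023 API returns it.
message = {
    "role": "assistant",
    "content": None,
    "function_call": {
        "name": "get_weather",
        # Note: arguments arrive as a JSON *string*, not a dict.
        "arguments": '{"city": "Madrid", "units": "celsius"}',
    },
}

def get_weather(city, units="celsius"):
    # Stand-in for the real implementation (e.g. a weather API call).
    return {"city": city, "temp": 21, "units": units}

AVAILABLE = {"get_weather": get_weather}

if message.get("function_call"):
    call = message["function_call"]
    args = json.loads(call["arguments"])  # may raise on malformed JSON
    result = AVAILABLE[call["name"]](**args)
else:
    # The model chose to answer directly: just use the text content.
    result = message["content"]

print(result)
```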
Use Cases That Have Matured
In the months since launch, clear patterns have emerged:
- Simple agents. A model that can invoke tools (web search, calculator, database) when needed.
- Structured extraction. Convert free text (email, transcribed call, unstructured form) into typed data — a case where fragile regex or traditional NER was used before.
- “Product assistant” conversational APIs. The user asks in natural language; the model decides which backend endpoint to invoke.
- Query routing. Decide between multiple pipelines (FAQ, support ticket, human escalation) based on question nature.
- Code generation calling internal APIs. The model “composes” function calls given a high-level goal.
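For the structured-extraction case, a schema might look like the sketch below. Here save_contact is a hypothetical function that is never actually executed: the function call itself is the typed output you wanted.

```python
# Hypothetical schema for structured extraction: the "function" exists
# only so the model emits arguments matching this shape.
extract_contact = {
    "name": "save_contact",
    "description": "Saves contact data extracted from the user's text",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Full name of the person"},
            "email": {"type": "string", "description": "Email address, if present"},
            "company": {"type": "string"},
        },
        "required": ["name"],
    },
}
```

Passing function_call={"name": "save_contact"} in the request forces the model to use this function, which guarantees typed output on every request instead of leaving the choice to the model.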
The ReAct Pattern With Function Calling
In essence, all agent frameworks (LangChain, LlamaIndex, AutoGen) repeat the same loop:
1. User asks question
2. Model receives question + list of functions
3. Model decides: answer directly or call function
4. If calls: execute real function
5. Pass result to model
6. Model decides again: new function or final answer
7. Repeat until model gives final answer
Function calling makes this loop robust. Without it, it was a house of cards built on text parsing.
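The steps above can be sketched in a few lines. Here the model is replaced by a scripted stub (model_stub and the calculator tool are illustrative, not a real API) so only the control flow is shown:

```python
import json

def model_stub(messages, functions):
    # Stand-in for the API call: the first turn asks for a tool,
    # the second turn (after seeing the result) answers directly.
    if not any(m["role"] == "function" for m in messages):
        return {"function_call": {"name": "calculator",
                                  "arguments": '{"expr": "2+2"}'}}
    return {"content": "The result is 4."}

TOOLS = {"calculator": lambda expr: eval(expr)}  # demo only: never eval untrusted input

def run_agent(question, max_iterations=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_iterations):  # hard cap against infinite loops
        msg = model_stub(messages, functions=[])
        if "function_call" not in msg:
            return msg["content"]  # final answer, exit the loop
        call = msg["function_call"]
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        # Feed the tool result back so the model can keep reasoning.
        messages.append({"role": "function", "name": call["name"],
                         "content": json.dumps(result)})
    raise RuntimeError("agent exceeded max_iterations")

print(run_agent("What is 2+2?"))
```

The max_iterations cap matters: it is the difference between a confused agent failing loudly and one looping forever on your API bill.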
Common Errors Seen in Production
After months of use, these are the problems that appear again and again:
- Schemas that are too permissive. If you declare a parameter as a bare string with no enum or precise description, the model improvises. Be strict with types and enumerations.
- Poor descriptions. The description of the function and of each parameter is what the model reads to decide when to invoke it. It's prompt, not documentation. Invest time in it.
- Too many functions at once. More than 10-15 simultaneous functions degrades the model's decisions. Consider categorisation or an initial routing step.
- Trusting that the function is always invoked. The model may decide to answer directly. Handle both cases.
- Not validating arguments. The model conforms to the schema in >95% of cases, but not always. Validate before executing, especially if the function has side effects (deleting data, sending email).
- Infinite loops in agent loops. Without an iteration limit, a confused agent can call itself forever. Set max_iterations.
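On the validation point: you don't need a full library for a basic check. A minimal stdlib-only sketch (validate is our own helper, not part of the OpenAI SDK) might look like this:

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

# Map JSON Schema type names to Python types for isinstance checks.
TYPES = {"string": str, "number": (int, float), "boolean": bool, "object": dict}

def validate(arguments_json, schema):
    """Parse the model's arguments string and check required/type/enum."""
    args = json.loads(arguments_json)  # raises on malformed JSON
    for key in schema["required"]:
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, TYPES[spec["type"]]):
            raise ValueError(f"{key}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{key}: must be one of {spec['enum']}")
    return args

print(validate('{"city": "Madrid", "units": "celsius"}', SCHEMA))
```

For anything with side effects, run a check like this (or a real JSON Schema validator) before executing, and treat a validation failure as a retriable model error, not a crash.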
Cost and Latency
Function calling has extra cost worth knowing:
- Tokens consumed: function declarations are included in every call — they take input tokens. A large schema can add 500-1000 tokens per request.
- Chain latency: each “model → function → model” loop adds roundtrips. An agent requiring 4 steps takes 4× longer than a direct answer.
- Caching: declared functions don’t change between calls; with prompt caching (when available for your model) the cost amortises.
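To get a feel for the token overhead of a declaration, a rough back-of-the-envelope estimate works. The ~4 characters per token ratio used here is a common heuristic for English text, not an exact count:

```python
import json

get_weather_function = {
    "name": "get_weather",
    "description": "Gets current weather in a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City, e.g. 'Madrid'"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def rough_schema_tokens(functions):
    # Very rough heuristic (~4 characters per token for English);
    # use a real tokenizer such as tiktoken for exact counts.
    return len(json.dumps(functions)) // 4

print(rough_schema_tokens([get_weather_function]))
```

Multiply that by your request volume and the "500-1000 tokens per request" figure for large schemas stops being abstract.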
Conclusion
Function calling is one of the best recent additions to the OpenAI API. It solves a real problem (parsing LLM output) with the right abstraction. For any project integrating GPT with systems that expect structured data, it’s the default option. The agent patterns built on this will define how software with AI is built in the coming years.
Follow us on jacar.es for more on LLM integration and tools to build AI products.