GPT-5: public availability and early impressions
Actualizado: 2026-05-03
OpenAI released GPT-5 in early August 2025 after nearly a year and a half of rumors, delays, and inflated expectations. After a few weeks of real use in my own and others’ projects, first impressions can be ordered. The leap is not the earthquake the marketing suggested, but it is not trivial either. It is, above all, a release that consolidates lines already opened by o1 and o3 and puts them within reach of anyone with a paid account.
Key takeaways
- GPT-5 is the fusion of the GPT-4o and o3 families: a single model that adapts how much reasoning it applies based on problem complexity.
- The
reasoning_effortparameter (low / medium / high) gives explicit control over how much to think before answering. - It performs better than predecessors on structured reasoning, multi-file code generation, and long-instruction following.
- For casual conversation, summaries, or translations, the difference with GPT-4o is small and the extra cost rarely justifies itself.
- Hallucinations still exist; any important data from the model must be verified against a primary source.
- The August 2025 landscape shows rough parity among labs: GPT-5 is not clearly superior in everything.
What GPT-5 actually is
Initial confusion came from OpenAI having offered two parallel families for a long time:
- GPT-4o and variants: general chat with integrated multimodal.
- o family (o1, o1-mini, o3, o3-mini): step-by-step reasoning.
GPT-5 is the announced fusion of both: a single model that internally adjusts how much reasoning it applies based on problem complexity. In practice, the same API call can solve a quick sum in under a second and a complex mathematical proof while spending thirty seconds of internal reasoning. The reasoning_effort parameter, inherited from the o family, allows explicit control: low, medium, and high. Without the parameter, the model decides by itself.
Where it performs best
Tasks where GPT-5 takes a clear leap over predecessors:
- Mathematical and logical reasoning at university level.
- Code generation requiring planning across multiple files: coordinated refactors come out right on the first attempt more often.
- Legal contract analysis with extraction of dependent clauses.
- Long instruction following and strict output format adherence, a weak point of GPT-4o. If you ask for JSON with a specific schema, it complies with high precision.
This improvement in instruction following has implications for those building agents: the validation effort on outputs goes down.
Where it does not change as much
For casual conversation, email summaries, draft text, or translations, the difference with GPT-4o is small. So small that in many flows the extra cost is not justified. OpenAI implicitly acknowledges this by keeping GPT-4o and its cheaper variants available. The documentation’s own recommendation: use GPT-5 when the task benefits from reasoning and GPT-4o for standard conversation.
Hallucinations still exist. They are less frequent and tend to appear on more specific topics, but they remain a real problem. The classic advice does not change: any important data from the model must be verified against a primary source.
GPT-5 also does not solve the fundamental limitations of transformer models:
- No persistent memory across separate conversations by default (though OpenAI has added an optional memory system with consent).
- Context window of 400 thousand tokens, finite and with attention degradation at the extremes.
- The model cannot execute code by itself; it depends on external tools.
Price and availability
GPT-5 is available in ChatGPT Plus, Pro, and Enterprise, and via the API. Prices published in August put GPT-5 at roughly eight times the per-token cost of GPT-4o, with a GPT-5-mini model at a price comparable to GPT-4o. The API natively supports structured outputs, parallel tool calls, and an incremental response mode that emits reasoning and result interleaved. For those already using o3 with tool use, the transition is trivial; for those coming only from GPT-4o, code needs adaptation.
The competitive landscape
GPT-5’s arrival reopens the question of where the rivals stand. In August 2025 the landscape shows rough parity among leading labs, each standing out in specific areas:
- Claude (Anthropic): reference for structured reasoning and long context.
- Gemini 2.5 Pro (Google): concrete advantages in multimodal tasks.
- Llama 4 (Meta): open-weights versions approaching GPT-4o on standard tasks, with the huge benefit of running on your own hardware.
- Grok 4 (xAI): has climbed positions on specific benchmarks.
GPT-5 is not clearly superior in everything. In advanced math, Claude performs at the same level. In visual tasks, Gemini has concrete advantages.
My read
GPT-5 is a useful update but not essential. For teams already built on GPT-4o with flows that work, the right question is not whether to migrate but which parts of the flow would benefit: those involving multi-step reasoning, complex code generation, or strict long-instruction following. Those parts can be moved to GPT-5 without changing the rest.
For teams starting now, choosing GPT-5 as the default model and GPT-5-mini as the volume model is a reasonable strategy. The combination offers an acceptable compromise between cost and capability.
What no longer makes sense in 2025 is using a single provider for everything: each task has its best-fit model. Mixing a local open model with occasional calls to GPT-5 for the cases that require it is a pattern with reasonable economics that is seen more and more.