GPT-5: public availability and early impressions

Official diagram of OpenAI's corporate structure on Wikimedia Commons, which contextualizes the company responsible for the GPT-5 launch discussed in this article

OpenAI released GPT-5 in early August 2025 after nearly a year and a half of rumors, delays, and inflated expectations. After a few weeks of real use in my own and other people's projects, some first impressions can be put in order. The leap is not the earthquake the marketing suggested, but it is not trivial either. Above all, it is a release that consolidates lines already opened by o1 and o3 and puts them within reach of anyone with a paid account, not just early adopters.

What GPT-5 actually is

Initial confusion about GPT-5 came from the fact that OpenAI had been offering two parallel families for a long time. On one side, GPT-4o and its variants, aimed at general-purpose chat with integrated multimodal input and output. On the other, the o family, with o1, o1-mini, then o3 and o3-mini, aimed at step-by-step reasoning. GPT-5 is the announced fusion of both: a single model that internally adjusts how much reasoning it applies according to the complexity of the problem.

In practice this means the same API call can answer a quick sum in under a second or work through a complex mathematical proof after thirty seconds of internal reasoning. The reasoning_effort parameter, inherited from the o family, gives explicit control over how much the model thinks before answering: the options are low, medium, and high. Without the parameter, the model decides for itself based on the question.
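The split described above can be sketched as a small request builder. The request shape follows the OpenAI Chat Completions API, which accepts reasoning_effort for these models; the choose_effort heuristic is entirely my own illustration, not an official recommendation.

```python
# Sketch: selecting a reasoning effort level per request.
# choose_effort is a hypothetical heuristic; in production you would
# usually let the model decide or tag tasks explicitly.

def choose_effort(prompt: str) -> str:
    """Crude heuristic: prompts that signal multi-step work get more reasoning."""
    hints = ("prove", "refactor", "step by step", "plan")
    if any(h in prompt.lower() for h in hints):
        return "high"
    return "medium" if len(prompt) > 500 else "low"

def build_request(prompt: str) -> dict:
    """Build a Chat Completions payload with an explicit effort level."""
    return {
        "model": "gpt-5",
        "reasoning_effort": choose_effort(prompt),  # low | medium | high
        "messages": [{"role": "user", "content": prompt}],
    }

print(build_request("What is 2 + 2?")["reasoning_effort"])  # low
```

Omitting the reasoning_effort key reproduces the default behavior: the model picks its own level.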

The internal architecture has not been published, but external characteristics suggest a mixture-of-experts model larger than GPT-4o, with a more refined planning layer in reasoning mode than o3's. OpenAI has not released parameter counts, continuing the policy of withholding them since GPT-4.

Where it performs best

The tasks where GPT-5 takes a clear leap over predecessors are those requiring structured reasoning. University-level math problems, code generation that requires planning across several files, analysis of legal contracts with extraction of dependent clauses. In these, the combination of higher raw capacity and explicit reasoning yields results noticeably better than GPT-4o and comparable or superior to o3 with lower latency cost.

In code generation I have noticed a concrete difference versus earlier models: GPT-5 is much better at understanding cross-file context when given a repository or multiple files. Refactors that require coordinated changes in multiple places come out right on the first attempt more often. This is a practical kind of gain: not that it does new things, but that the things that used to fail half the time now fail fifteen percent of the time.

Adherence to long instructions and strict output formats, a weak point of GPT-4o, has also improved. If you ask for JSON with a specific schema, it complies with high precision. If you ask it to follow a multi-step checklist, the order is respected. This has implications for those building agents: the validation effort on outputs goes down.
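Even with better format adherence, a cheap structural check before downstream use is still worthwhile. Here is a minimal sketch using only the standard library; the schema and field names are hypothetical, stand-ins for whatever your agent extracts.

```python
import json

# Minimal validator for a hypothetical contract-extraction schema.
# SCHEMA maps each required field to its expected Python type.
SCHEMA = {"title": str, "parties": list, "effective_date": str}

def validate(raw: str) -> dict:
    """Parse model output and check it has the expected keys and types."""
    data = json.loads(raw)
    for key, typ in SCHEMA.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    return data

good = '{"title": "NDA", "parties": ["A", "B"], "effective_date": "2025-08-01"}'
print(validate(good)["title"])  # NDA
```

With GPT-5 this check passes far more often on the first try, but keeping it in place is what lets you measure that improvement rather than assume it.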

Where it does not change as much

For casual conversation, email summaries, draft text, or translations, the difference with GPT-4o is small. So small that in many flows the extra cost of GPT-5 is not justified. OpenAI implicitly acknowledges this by keeping GPT-4o and its cheaper variants available. The documentation’s own recommendation is to use GPT-5 when the task benefits from reasoning and use GPT-4o for standard conversation.

Hallucinations still exist. They are less frequent and tend to appear on more specific topics, but they remain a real problem. On questions about recent programming libraries, on current events beyond the training cut-off, or on very specialized topics, the model can still invent answers with great confidence. The classic advice does not change: any important data coming out of the model must be verified against a primary source.

It also does not solve the fundamental limitations of transformer models. There is no persistent memory across separate conversations by default, though OpenAI has added an optional memory system with consent. The context window is larger, 400,000 tokens, but still finite and with attention degradation at the extremes. The model cannot execute code by itself; it depends on external tools.
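A finite window still means budgeting input before sending it. A rough pre-flight check might look like this; the 4-characters-per-token ratio is a crude approximation for English text, not the model's real tokenizer, and the reserve size is an arbitrary example.

```python
# Rough context-budget check before assembling a large prompt.
MAX_CONTEXT_TOKENS = 400_000  # advertised window size

def approx_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(documents: list[str], reserve_for_output: int = 8_000) -> bool:
    """Check that all documents plus an output reserve fit in the window."""
    used = sum(approx_tokens(d) for d in documents)
    return used + reserve_for_output <= MAX_CONTEXT_TOKENS

print(fits_in_context(["x" * 1_000_000]))  # True: ~250k tokens plus reserve
```

For anything near the limit, counting with the model's actual tokenizer is the safer choice; this estimate only filters out the obviously oversized cases.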

Price and availability

GPT-5 is available in ChatGPT Plus, Pro, and Enterprise, and via the API. Prices published in August put GPT-5 at roughly eight times the per-token cost of GPT-4o, with a GPT-5-mini model at a price comparable to GPT-4o. The API remains the preferred route for integration into products: GPT-5 natively supports structured outputs, parallel tool calls, and an incremental response mode that emits reasoning and result interleaved. For those already using o3 with tool use, the transition is trivial; for those coming only from GPT-4o, code needs adaptation. OpenAI has tightened some usage policies, and API rate limits are stricter at the lower tiers.
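The roughly 8x ratio is easy to turn into a back-of-envelope budget. The per-million-token prices below are placeholders chosen only to reflect that ratio; check OpenAI's pricing page for the real numbers before relying on them.

```python
# Illustrative input-token prices (USD per 1M tokens). These are
# placeholder values reflecting the rough 8x GPT-5 / GPT-4o ratio,
# not OpenAI's published price list.
PRICE_PER_1M_INPUT = {"gpt-4o": 2.50, "gpt-5": 20.00, "gpt-5-mini": 2.50}

def monthly_cost(model: str, input_tokens_per_month: int) -> float:
    """Estimated monthly input cost for a given traffic volume."""
    return input_tokens_per_month / 1_000_000 * PRICE_PER_1M_INPUT[model]

# Example: 100M input tokens per month across the three models.
for model in PRICE_PER_1M_INPUT:
    print(f"{model}: ${monthly_cost(model, 100_000_000):,.2f}")
```

At volume, the arithmetic explains why the mini variant exists: moving bulk traffic to it keeps the bill in GPT-4o territory while GPT-5 handles the expensive minority of calls.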

Impact on competitors

The arrival of GPT-5 reopens the question of where the rivals stand. Anthropic keeps Claude 3.7 Sonnet as a reference for structured reasoning and remains competitive on many tasks. Google shipped Gemini 2.5 Pro earlier in the year and Gemini 3 is announced for autumn. Meta released Llama 4 in June with open-weights versions. xAI and Grok 4 have climbed positions on specific benchmarks.

The August 2025 landscape is one of rough parity among the leading labs, with each standing out in specific areas. GPT-5 is not clearly superior in everything. In advanced math, Claude still performs at the same level. In visual tasks, Gemini has concrete advantages from its multimodal base training. Among locally runnable models, the large Llama 4 variants come close to GPT-4o on standard tasks, with the huge benefit of running on your own hardware.

How to think about the decision

My reading of the first weeks is that GPT-5 is a useful update but not essential. For teams already built on GPT-4o with flows that work, the right question is not whether to migrate but which parts of the flow would benefit. The answer is usually: those involving multi-step reasoning, complex code generation, or strict long-instruction following. Those parts can be moved to GPT-5 without changing the rest.

For teams starting now, choosing GPT-5 as the default model and GPT-5-mini as the volume model is a reasonable strategy. The combination offers an acceptable compromise between cost and capability, with a simple fallback curve if cost grows. The unified API avoids having to learn three different families.
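The default/volume split above reduces to a one-line routing rule. The task labels here are an illustrative taxonomy I made up for the sketch, not anything prescribed by the API.

```python
# Sketch of the default/volume split: expensive reasoning tasks go to
# GPT-5, everything else to the cheaper mini variant. Task labels are
# hypothetical application-level tags.
HIGH_VALUE = {"refactor", "proof", "contract_analysis"}

def pick_model(task_type: str) -> str:
    """Route high-value reasoning work to GPT-5, bulk traffic to the mini."""
    return "gpt-5" if task_type in HIGH_VALUE else "gpt-5-mini"

print(pick_model("refactor"))  # gpt-5
print(pick_model("summary"))   # gpt-5-mini
```

The fallback curve mentioned above is then a matter of shrinking the HIGH_VALUE set as costs grow, without touching the rest of the pipeline.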

For cost-sensitive projects or those that depend on data sovereignty, open models in 2025 are good enough to cover a large share of cases. Mixing a local open model with occasional calls to GPT-5 for the cases that require it is a pattern I see more and more, and it has reasonable economics. What no longer makes sense in 2025 is using a single provider for everything: each task has its best-fit model.
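The local-first pattern can be sketched as a confidence-gated escalation. Both model callables here are stand-ins: the confidence score, the threshold, and the interfaces are assumptions for illustration, since real confidence estimation for LLM outputs is a harder problem.

```python
from typing import Callable

def hybrid_answer(
    prompt: str,
    local_model: Callable[[str], tuple[str, float]],  # returns (answer, confidence)
    remote_model: Callable[[str], str],
    threshold: float = 0.7,
) -> str:
    """Answer locally when confident; escalate hard cases to the paid model."""
    answer, confidence = local_model(prompt)
    if confidence >= threshold:
        return answer
    return remote_model(prompt)  # occasional paid call for the hard cases

# Stub demo with fake models:
local = lambda p: ("42", 0.9) if "simple" in p else ("?", 0.2)
remote = lambda p: "remote answer"
print(hybrid_answer("simple question", local, remote))  # 42
print(hybrid_answer("hard question", local, remote))    # remote answer
```

The economics come from the ratio: if the local model handles most traffic and only the low-confidence tail reaches GPT-5, the per-request cost stays close to the local baseline.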
