Agents that drive the computer: patterns that work

[Image: official logo of the browser-use project, one of the most widely adopted libraries in 2026 for building language-model-driven browser agents, combining structured DOM capture, screenshots, and model reasoning about the next action at each step of a multi-page flow.]

When Anthropic released computer use in October 2024, many teams tried it for an afternoon, were surprised to see Claude move the mouse and type in a spreadsheet, and then shelved it as a tech curiosity. A year and a half later, with computer use stabilized in Claude 4.5, browser-use established as the standard browser automation library, and OpenAI Operator and Gemini Control rounding out the space, agents that drive the computer have become real tools. Not for everything, but for a slice of cases where they replace brittle RPA macros or scrapers that break every week.

What has changed since 2024

The material change from the first versions is that models now understand graphical interfaces precisely enough to run multi-minute tasks without intervention. Computer use in Claude 4.5 resolves ten-to-fifteen-step flows with reasonable success rates when the interface is standard. Browser-use has matured into a production library with handlers for common failures, session persistence across steps, and structured DOM capture the model can query without reprocessing a full screenshot at every step.

Cost remains a central topic. An agent that clicks through fifteen elements in a web interface consumes screenshots and reasoning tokens at each step. In 2026, a ten-minute computer-use flow costs between fifty cents and two euros depending on model and resolution. This rules out automating tasks that take seconds and that a script or macro solves better, but it makes the approach competitive for multi-minute tasks where maintaining a classic scraper costs human hours every week.
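
That per-run figure is easy to sanity-check yourself. A back-of-the-envelope sketch, where the per-step token counts and per-million-token prices are illustrative assumptions, not published figures:

```python
# Rough cost model for one computer-use run. Every number here is an
# assumption to plug your own figures into, not a quoted price.

def estimate_run_cost(steps, screenshot_tokens=1600, reasoning_tokens=900,
                      input_eur_per_mtok=3.0, output_eur_per_mtok=15.0):
    """Approximate cost in euros for a run of `steps` agent steps."""
    input_tokens = steps * screenshot_tokens    # one screenshot sent per step
    output_tokens = steps * reasoning_tokens    # reasoning plus the chosen action
    return (input_tokens / 1e6 * input_eur_per_mtok
            + output_tokens / 1e6 * output_eur_per_mtok)

cost = estimate_run_cost(15)   # a fifteen-step flow under these assumptions
```

With these defaults a fifteen-step run lands around a quarter of a euro; higher resolutions and chattier models push it toward the top of the range above.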

Reliability has improved a lot but is not perfect. A well-prompted agent on a known interface succeeds on seventy to ninety percent of runs; as soon as unexpected modal alerts, redesigns, or visually dense screens appear, the rate drops. This forces you to design flows with clear checkpoints, screenshots saved as evidence, and retries that carry explicit context about what failed before.
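
One way to implement those retries is to feed the previous failure back into the next attempt's task description. A minimal sketch; `run_flow` is a placeholder for whatever launches your agent:

```python
# Retry wrapper that tells the next attempt what went wrong in the last one.

def run_with_retries(run_flow, task, max_attempts=3):
    last_error = None
    for _ in range(max_attempts):
        prompt = task if last_error is None else (
            f"{task}\n\nA previous attempt failed with: {last_error}. "
            "Take that into account before acting.")
        try:
            return run_flow(prompt)
        except Exception as exc:       # real code would catch narrower errors
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts: {last_error}")

# Demo with a stub that fails twice, then succeeds:
attempts = []
def flaky(prompt):
    attempts.append(prompt)
    if len(attempts) < 3:
        raise ValueError("unexpected modal dialog")
    return "report.csv"

result = run_with_retries(flaky, "Download the monthly report")
```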

Patterns that survive in production

The first pattern that has shown consistent value is the human-interface scraper. Legacy enterprise apps without an API, SaaS panels with poor exports, or proprietary systems where the only way to extract data is click-and-copy. An agent with browser-use walks the flow daily, extracts the data you need, and drops it into CSV or a database. Versus a Selenium scraper with brittle selectors, the agent is more expensive per run but survives minor redesigns better because it understands each screen’s purpose.
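
The shape of that pipeline, with the extraction step injected so a stub can stand in for the real agent during tests (`scrape_to_csv` and the sample rows are illustrative, not part of any library):

```python
import csv
import io

def scrape_to_csv(extract, fieldnames):
    """Run the extraction step and serialize its rows as CSV text."""
    rows = extract()              # in production: run the agent, parse its output
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Stub standing in for the browser agent:
csv_text = scrape_to_csv(
    lambda: [{"supplier": "Acme", "amount": "120.50"}],
    ["supplier", "amount"],
)
```

Keeping the extractor injectable is what lets you swap the expensive agent call for cheap fixtures while developing the rest of the pipeline.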

The second pattern is low-volume administrative task automation. Filling forms in supplier portals, uploading files to platforms with changing interfaces, booking resources in legacy internal systems. Where an RPA macro needs maintenance every two months, an agent absorbs small variations and keeps working. The limit is volume: if you run the task a hundred times a day, agent cost skyrockets and a proper API integration, or even a dedicated hire, becomes the better investment. If you run it five times a day, the agent is reasonable.

The third pattern is the QA exploratory-testing assistant. Instead of writing end-to-end tests that break every DOM change, an agent receives a functional goal and walks the app verifying the flow completes. It produces reports with screenshots and behavior descriptions. It doesn’t replace stable automated tests that are much cheaper to run, but covers well the areas where the team can’t get to tests or where the interface changes too much for tests to be worth it.
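
The report side of this pattern needs very little structure. A minimal sketch of the evidence record, with hypothetical field and class names:

```python
from dataclasses import dataclass, field

@dataclass
class StepEvidence:
    screenshot: str        # path to the saved screenshot for this step
    observation: str       # the agent's description of what it saw

@dataclass
class ExplorationReport:
    goal: str
    steps: list = field(default_factory=list)
    completed: bool = False

    def record(self, screenshot, observation):
        self.steps.append(StepEvidence(screenshot, observation))

report = ExplorationReport(goal="Checkout completes with a saved card")
report.record("step_01.png", "Cart page loaded with one item")
report.record("step_02.png", "Payment form accepted the saved card")
report.completed = True
```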

Anti-patterns you pay dearly for

There are three anti-patterns that by 2026 are documented enough to avoid. The first is using agents for high-volume or low-latency tasks: if you need to process thousands of operations in minutes or respond to events in seconds, the agent is too slow and expensive; build the proper API integration. The second is delegating critical decisions without supervision: approving payments, changing production config, or any irreversible action. Agents succeed almost always, but not one hundred percent of the time, and that residual failure rate on irreversible decisions wrecks the whole business case.
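
A cheap safeguard against the second anti-pattern is an explicit approval gate: the agent may propose anything, but actions on a blocklist only execute after a human confirms. A sketch; the action names and the injected callables are illustrative:

```python
# Actions the agent may propose but never execute without human sign-off.
IRREVERSIBLE = {"approve_payment", "change_prod_config", "delete_account"}

def execute(action, run_action, ask_human):
    """Run `action`, requiring human approval for irreversible ones."""
    if action in IRREVERSIBLE and not ask_human(
            f"Agent wants to run '{action}'. Allow?"):
        return "blocked"
    return run_action(action)

# With a reviewer that denies everything, risky actions never run:
blocked = execute("approve_payment", lambda a: "done", lambda q: False)
routine = execute("fill_form", lambda a: "done", lambda q: False)
```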

The third anti-pattern is pretending the agent will replace a human on tasks requiring judgment. An agent fills structured forms well; it does not evaluate well whether a contract has problematic clauses, whether a candidate fits the culture, or whether a design is good. Confusing click automation with replacing human reasoning leads to expensive deployments that get abandoned once the team realizes reviewing agent output costs more than doing the task fresh.

Typical 2026 architecture

A reasonable computer-use agent deployment in 2026 has several pieces. An orchestration layer that fires the flow on schedule or event, an agent layer with packaged context, a verification layer that checks the result makes sense, and an evidence-persistence layer with screenshots and reasoning traces. Teams that skip verification and evidence layers discover quickly that when something goes wrong they can’t explain why or reproduce the failure.

# Typical browser-use 2026 pattern. Parameter names vary between
# browser-use releases; treat this as a sketch, not the exact signature.
import asyncio

from browser_use import Agent
from langchain_anthropic import ChatAnthropic

agent = Agent(
    task="Download the monthly report from the supplier panel",
    llm=ChatAnthropic(model="claude-4.5-sonnet"),
    # recent releases also expose per-step callbacks for logging
    # and screenshot capture
)

async def main():
    result = await agent.run(max_steps=15)  # Agent.run is async; cap the steps
    verify_and_persist(result)              # your verification + evidence layer

asyncio.run(main())

This skeleton is the reasonable minimum. In production you add retries on failed steps, channel notifications when the agent asks for human help, and a maximum budget per run so a badly managed loop doesn’t eat the monthly quota.
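
The budget cap from that list can be as simple as a counter that the per-step hook charges on every iteration. A sketch with assumed per-step costs:

```python
class BudgetExceeded(RuntimeError):
    pass

class RunBudget:
    """Hard per-run spending ceiling: fail fast instead of eating the quota."""
    def __init__(self, max_eur):
        self.max_eur = max_eur
        self.spent = 0.0

    def charge(self, step_cost_eur):
        self.spent += step_cost_eur
        if self.spent > self.max_eur:
            raise BudgetExceeded(
                f"spent {self.spent:.2f} EUR, cap is {self.max_eur:.2f} EUR")

budget = RunBudget(max_eur=2.0)
for _ in range(10):            # ten steps at an assumed ~0.15 EUR each
    budget.charge(0.15)
```

Raising inside the step hook stops the run immediately, which is exactly the behavior you want from a badly managed loop.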

Real cost and when it pays off

The arithmetic we do today is fairly simple. If a human spends two hours weekly on a repetitive interface task, annual cost is around four thousand euros at mid-range Spanish salaries. If the agent costs twenty euros a month in tokens and three hours of initial development, payoff is in months, not years. If the task only takes half an hour weekly, automating it probably isn’t worth it unless it’s prone to expensive errors.
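
The same arithmetic as a reusable check; the hourly cost is an assumption derived from the salary figure above, and all inputs are your own estimates:

```python
def payback_months(hours_per_week, hourly_cost_eur,
                   agent_eur_per_month, build_hours):
    """Months until the agent's savings cover its build and running cost."""
    monthly_human_cost = hours_per_week * 52 / 12 * hourly_cost_eur
    monthly_saving = monthly_human_cost - agent_eur_per_month
    if monthly_saving <= 0:
        return float("inf")        # the agent never pays for itself
    return build_hours * hourly_cost_eur / monthly_saving

# Two hours a week at an assumed ~38 EUR/h, 20 EUR/month in tokens,
# three hours of initial development:
months = payback_months(2, 38, 20, 3)
```

Under these assumptions the build cost is recovered in well under a month, which is the "months, not years" claim in concrete numbers.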

The tipping point where a custom API integration beats the agent usually sits at daily or higher runs with stable tasks. If the system has an API, even a private one, and the team can spend a week building against it, the integration is cheaper to operate and more reliable. If the system has no API and never will, the agent is the best available option. The common mistake is assuming the agent is always cheaper; it isn’t when volume rises.

My reading

In 2026, agents that drive the computer have found their pragmatic niche: low-to-medium volume tasks on systems without APIs, exploratory test automation, and brittle scraper replacement. Where there used to be expensive-to-maintain RPA macros or repetitive human work with no solution, there’s now a reasonable middle option.

The decision to adopt an agent looks like any tool decision: measure the current cost of the problem, the cost of building a proper solution, and the cost of maintaining it. If the agent saves more than it costs to run, go ahead with solid verification and evidence layers. Otherwise, what you need is probably an API integration, or accepting that the task stays manual. What makes no sense in 2026 is either ignoring the tool out of 2024-hype prejudice or deploying it everywhere because you can. The middle path is where the real value sits.
