For two years, the conversation about how to integrate LLM agents into products reduced to a single answer: a chat with history on the left, a message box at the bottom, and a bubble area in the middle. That was the canonical form, the de facto template, and every new product reused it. Twelve months after the industry started questioning that consensus, UI patterns designed specifically for what agents actually do today are appearing. Some are consolidating quickly. Others are 2025-cycle fashion that won’t survive an honest evaluation.
This post reviews the patterns I’m starting to see work in real products, the ones still experimental, and where I think agent interfaces will stabilize in the coming months.
Why chat stopped being enough
Chat worked as long as the agent did one task at a time and the conversation fit on a screen. As soon as agents started doing background work for minutes, invoking several tools in parallel, and keeping state over files, chat came up short. The conversation window filled with noise, tool messages mixed with model messages, and the user lost sight of what was actually happening.
I saw three concrete symptoms repeating across many products. First, the user hit the stop button because they didn’t know whether the agent was still working or had hung. Second, after a long run, the user didn’t know which files had changed or which commands had been run. Third, when the agent failed, the user had no clear way to see at which step it failed or how to resume.
Those three symptoms pointed to the same problem: chat as a metaphor wasn’t adequate for stateful, long-running, tool-using tasks. Something different was needed.
The progress panel pattern
The first pattern I’ve seen work well is separating conversation from task progress. The conversation stays in one column, and progress is shown in a separate panel listing step by step what the agent is doing, with per-tool state, files touched, commands executed, and summarized outputs. The user sees the execution history without having to read the whole conversation.
This pattern has appeared almost identically in several recent products. Claude Code, Cursor, Cline, and Aider all have variants of the same principle: minimal conversation, rich progress panel. The convergence suggests the pattern works; it isn’t coincidence.
The key to the progress panel is that each step must be collapsible and auditable. The user doesn’t want to read each intermediate output, but they want to open a specific step and examine it when something has gone wrong. It’s the difference between a simple progress bar and a navigable log.
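To make the idea concrete, here is a minimal sketch of what a collapsible, auditable step log could look like as a data structure. The names (`ProgressStep`, `render_panel`) and fields are my own illustration, not any particular product’s API:

```python
from dataclasses import dataclass, field
from enum import Enum


class StepStatus(Enum):
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class ProgressStep:
    """One entry in the progress panel: a tool call with auditable detail."""
    tool: str                      # e.g. "run_command", "edit_file"
    summary: str                   # one-line description shown when collapsed
    status: StepStatus
    files_touched: list[str] = field(default_factory=list)
    output: str = ""               # full output, shown only when expanded


def render_panel(steps: list[ProgressStep], expanded: set[int]) -> str:
    """Render every step collapsed to one line, except the indices the
    user has expanded, which also show files touched and output."""
    lines = []
    for i, step in enumerate(steps):
        marker = {"running": "…", "done": "✓", "failed": "✗"}[step.status.value]
        lines.append(f"[{marker}] {step.tool}: {step.summary}")
        if i in expanded:
            for path in step.files_touched:
                lines.append(f"      file: {path}")
            lines.append(f"      out:  {step.output}")
    return "\n".join(lines)
```

The `expanded` set is the whole point: the default view stays one line per step, and the user opens exactly the step they want to audit.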
The review and approval interface
The second consolidated pattern is the review interface. When an agent is about to execute an action with side effects, like modifying files, running a command, or sending a request to an API, the user should be able to review and approve before it happens. In theory the original chat already covered this with text confirmations, but in practice text confirmation is insufficient because the user doesn’t read the full message.
What works better is showing the exact proposed change with the domain’s visual semantics: a side-by-side diff if it’s files, a table if it’s database rows, a rendered preview if it’s publishable content. Approval stops being a blind button and becomes an informed decision.
This pattern requires the agent to express its intention in a reviewable format before acting. That’s useful architectural pressure: it forces separating reasoning from effect, which makes the system safer and more debuggable.
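A minimal sketch of that separation, using a file edit as the example. The names (`ProposedChange`, `apply_with_review`) are hypothetical; the point is that the side effect happens only after the reviewer has seen the exact diff:

```python
import difflib
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedChange:
    """The agent's intention, expressed as a reviewable artifact
    before any side effect happens."""
    path: str
    old_text: str
    new_text: str

    def diff(self) -> str:
        """Render the intention as a unified diff the user can inspect."""
        return "".join(difflib.unified_diff(
            self.old_text.splitlines(keepends=True),
            self.new_text.splitlines(keepends=True),
            fromfile=f"a/{self.path}", tofile=f"b/{self.path}",
        ))


def apply_with_review(change: ProposedChange,
                      approve: Callable[[str], bool],
                      files: dict[str, str]) -> bool:
    """Show the exact diff to the reviewer; mutate state only on approval."""
    if approve(change.diff()):
        files[change.path] = change.new_text
        return True
    return False
```

Because reasoning produces a `ProposedChange` and only `apply_with_review` has effects, the two halves can be tested, logged, and audited independently.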
The execution explorer
The third pattern, less consolidated but promising, is the execution explorer. An agent operating for hours may have executed dozens of tasks, and the user needs to navigate that history with a clear mental model. A linear list doesn’t work; a tree view is needed where each task has subtasks and each invocation has inputs and outputs.
Products starting to implement this pattern, like OpenAI Agent Responses or execution views in platforms like Modal, are experimenting with hierarchical structures. There’s no consensus yet, but the direction is clear: the linear log doesn’t scale with current complexity.
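The hierarchical structure itself is simple; the hard part is the navigation UX. A sketch of the underlying model, with names of my own choosing:

```python
from dataclasses import dataclass, field


@dataclass
class TaskNode:
    """A node in the execution tree: a task with its inputs and outputs,
    plus any subtasks it spawned."""
    name: str
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)
    children: list["TaskNode"] = field(default_factory=list)


def render_tree(node: TaskNode, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines: the navigable alternative
    to a linear log."""
    lines = [f"{'  ' * depth}{node.name}"]
    for child in node.children:
        lines.extend(render_tree(child, depth + 1))
    return lines
```

The same tree can back very different views: a collapsed outline for orientation, or a drill-down into one node’s `inputs` and `outputs` when debugging a failure.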
Fashion that doesn’t work
There are patterns that appeared with a lot of noise but haven’t settled. The first is voice interface as primary channel for work agents. It fits consumer assistants and hands-busy cases but not technical tasks where the user needs to see code, diffs, and tables.
The second is zero-interface: the agent that learns your habits and acts without confirmation. It sounds ambitious but clashes with the reality that models still make mistakes frequently enough that the user needs control over effectful actions. For high-stakes tasks it remains a fantasy.
The third is the anthropomorphized agent with avatar and facial mimicry. It generates engagement in demos but noise in habitual use. Teams that have tried it in product have ended up removing it because it distracts.
Principles starting to emerge
Five principles repeat across products that work. The first is making agent state visible at all times. The user should never doubt whether it’s working, waiting for input, or finished. State ambiguity is the number one problem.
The second is separating conversation from execution. Talking to the agent and watching what it does are two distinct tasks requiring two distinct visual spaces.
The third is preserving reversibility. Every side-effect step should be reversible or at least audited in enough detail to undo. The user must be able to recover the previous state if they decide the direction was wrong.
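One way to implement that principle is to record an inverse operation alongside every side effect. A hedged sketch (the `UndoJournal` name and shape are illustrative, not a real library):

```python
from typing import Callable


class UndoJournal:
    """Record an inverse operation next to every side effect, so the
    previous state can be recovered step by step."""

    def __init__(self) -> None:
        self._inverses: list[tuple[str, Callable[[], None]]] = []

    def record(self, description: str, inverse: Callable[[], None]) -> None:
        """Register how to undo the effect that is about to happen."""
        self._inverses.append((description, inverse))

    def undo_last(self) -> str:
        """Run the most recent inverse and return its description."""
        description, inverse = self._inverses.pop()
        inverse()
        return description
```

Even when a true inverse is impossible (an email sent, a payment made), the journal’s descriptions double as the audit trail the user needs to understand what happened.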
The fourth is calibrating trust explicitly. The agent should indicate when it’s doing something routine and when something experimental where the user should review more carefully.
The fifth is giving the user the domain vocabulary. If the domain is code, the UI should speak of files, functions, and commits. If support, of tickets and customer conversations. A generic chat UI in a technical domain loses expressive capacity.
How to think about the decision
When a team asks me how to design their agent product’s UI, the question that helps most isn’t which pattern to copy but what tasks users are doing and how long each one takes. If the task takes less than a minute and has no persistent effects, chat is enough. If it takes more than five minutes or has persistent effects, chat stops being enough and you should think in terms of a progress panel, a review interface, and an execution explorer.
The other useful question is how much trust the user has in the agent. If it’s new, over-inform: every step visible, every decision auditable, every action reversible. If the user has used it for months and trusts it, you can simplify and hide detail. The same product probably needs two UI modes for the same functionality depending on user maturity level.
My conviction after watching this cycle is that chat will remain as universal input channel, but serious products will have much richer interfaces behind it. The era of the text bubble as the only metaphor for agents is ending, and teams investing now in alternative patterns will have a competitive advantage when user expectations evolve over the coming months.