Product discovery with AI: practices that stick

Product team in an in-person workshop, working with colored sticky notes on a wall: the classic scene of a synthesis session after discovery interviews. The photo illustrates how, by 2026, generative AI has entered several concrete phases of the process but does not replace this human conversation between the people who design, research, and prioritize, where hypotheses are contrasted with real findings and the team decides which problems deserve product investment.

Product discovery, the continuous process by which a product team figures out which problems are worth solving and how to validate solutions before building them, was one of the territories where generative AI created the most expectation through 2023 and 2024. Two years later, with accumulated experience including failures, we can take a cooler-headed look at which practices have passed the test of time and which should be discarded without nostalgia.

What AI does well in discovery

Interview-transcript synthesis is probably the most useful and least controversial use case. A team doing ten user interviews per sprint generates dozens of hours of audio and corresponding transcripts, and processing that mass of text to extract patterns, relevant quotes, and recurring themes is exactly the kind of task where current models add value without high risk. Synthesis doesn’t replace the researcher; it frees them from tedious work to spend time on interpretation.
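One way to picture this synthesis workflow is a map step over transcript chunks followed by human review of the merged output. The sketch below is illustrative only: the chunk size, prompt wording, and function names are assumptions, not a prescribed tool; the prompts it produces would be sent to whatever model client the team uses.

```python
from textwrap import dedent

def chunk_transcript(text: str, max_chars: int = 6000) -> list[str]:
    """Split a transcript on paragraph boundaries into chunks that fit
    a model context window (6000 chars is an arbitrary example limit)."""
    paragraphs = text.split("\n\n")
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current)
            current = ""
        current += p + "\n\n"
    if current:
        chunks.append(current)
    return chunks

def synthesis_prompts(transcript: str) -> list[str]:
    """Build one extraction prompt per chunk; the researcher reviews
    and merges the model's answers, keeping interpretation human."""
    template = dedent("""\
        Extract from this interview excerpt:
        1. Recurring themes
        2. Notable quotes, verbatim, with speaker
        3. Pain points the user states explicitly
        Do not infer needs the user did not express.

        Excerpt:
        {chunk}""")
    return [template.format(chunk=c) for c in chunk_transcript(transcript)]
```

The explicit "do not infer" instruction matters: it keeps the model in extraction mode and leaves interpretation, the part that shouldn't be delegated, to the researcher.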

Generating questions for interview guides also works well as a first approximation. Asking the model to propose twenty open questions about a concrete problem, with variants by user segment, produces a starting point the researcher refines and filters. The value isn't in accepting the questions as-is, but in not starting from zero and in having a wider set than one brain alone would have produced before the session.
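A guide-generation request of this kind can be captured in a small prompt builder. The function name and wording below are hypothetical; the point is encoding the guardrails (open, non-leading, no solution-pitching) once, so every request carries them.

```python
def guide_prompt(problem: str, segments: list[str], n: int = 20) -> str:
    """Build a prompt asking for open interview questions with
    per-segment variants. Output is raw material for the researcher
    to refine and filter, not a finished guide."""
    seg_lines = "\n".join(f"- {s}" for s in segments)
    return (
        f"Propose {n} open, non-leading interview questions about: {problem}.\n"
        "Avoid yes/no questions and avoid suggesting solutions.\n"
        f"For each question, add one variant for each of these segments:\n"
        f"{seg_lines}"
    )
```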

Exploring cross-industry analogies has been a pleasant surprise. Asking the model how unrelated sectors solve similar problems, or what product patterns have worked in comparable contexts, sometimes yields insights we wouldn't have found manually. It doesn't always work, but as a technique for widening the search space it performs acceptably, especially in the initial phases of a new problem when there's still no clarity on where to look.

Drafting specs, user stories, and acceptance criteria has also found its place. The model produces a reasonably structured first draft of an idea, the product manager edits it, discusses it with engineering and design, and the time from concept to discussable document shrinks. Note that the final document remains a human responsibility; the model just shortens the path to having something tangible on the table.

What AI does badly

Direct generation of product hypotheses without real data has been a repeated failure. Asking a model to suggest what problems a segment's users have, without feeding it real prior research, produces plausible but generic lists that sound like an external consultant voicing the obvious. The problem isn't linguistic quality; without data on the concrete segment, the model produces the internet's statistical consensus, not useful insight.

Automatic prioritization has disappointed in most cases. Several teams tried through 2024 and 2025 to use models to score backlogs against criteria like impact, effort, and risk. Results were superficially coherent but didn’t withstand detailed discussion with stakeholders, because the model lacked full context on internal constraints, strategic partnerships, key-customer timelines, or political motivations that weigh heavily on real decisions. Prioritization remains human conversation with analytical support.

User simulation, where the team asked the model to act as a specific persona and respond to prototypes, had mixed and mostly discouraging results. The model produces plausible answers but converges on a statistical average that doesn't reflect real user diversity, and it generates false positives about adoption because it tends to be agreeable toward proposed ideas. There are documented cases of teams prematurely validating directions that later failed on contact with real users, precisely because the model never objected with the force real humans do.

Detecting unexpressed needs is territory where the model usually falls short. Users often can't articulate what they need, and the good researcher's job is to listen between the lines, observe contradictions, and catch non-verbal signals. These capabilities, the core of serious qualitative discovery, cannot be delegated to transcripts or to the model. Any attempt to automate them produces polished summaries with the interesting observations eliminated in the process.

Practices that have matured

After two years of trial and error, some concrete practices have consolidated as stable. The first is using the model as an analysis partner after real interviews. The researcher transcribes, the model helps synthesize, the human team discusses findings. This pattern leverages AI’s good parts without falling into the failures of delegating core thinking.

The second consolidated practice is using the model to generate counterarguments. After formulating a hypothesis, the team asks the model to build the three best arguments against it, without being nice, as if it were a hostile external critic. This helps harden the hypothesis before investing time in validating it. It works much better than asking for validation, which tends to produce superficial approval.
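This counterargument request can be standardized so the hostile framing isn't lost between uses. A minimal sketch, with assumed wording; what matters is explicitly forbidding the agreeable default and asking for falsifying evidence, not just objections.

```python
def red_team_prompt(hypothesis: str) -> str:
    """Ask the model to attack a hypothesis instead of validating it.
    Framing it as a hostile critic counters the model's tendency
    to approve whatever it is shown."""
    return (
        "Act as a hostile external critic. Do not soften your answer "
        "and do not look for positives.\n"
        f"Hypothesis: {hypothesis}\n"
        "Give the three strongest arguments against this hypothesis, "
        "and for each one, the evidence that would confirm the objection."
    )
```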

The third is generating text variants to test, applied to copy, value-proposition phrasings, and marketing messages. The model produces twenty variants, the team picks five, they're tested with real users, and the learning feeds back into the next iteration. The cycle is fast, the cost is low, and the outcome is usually more refined text than what came out of purely internal marketing sessions.

The fourth mature practice is analyzing customer-support conversations as a discovery source. Support interactions contain a gold mine of information on real user frictions, and processing that corpus with the model's help surfaces patterns that would otherwise stay hidden. Companies with large ticket volumes have started to integrate this source as systematic input to discovery cycles, with concrete results.
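Before sending thousands of tickets to a model, a cheap deterministic pre-filter helps decide which clusters deserve deeper synthesis. The sketch below is one possible first pass under that assumption: the friction terms are illustrative, and in practice a team would curate its own list from known pain areas.

```python
from collections import Counter

def friction_candidates(tickets: list[str],
                        terms: list[str]) -> list[tuple[str, int]]:
    """Count how many support tickets mention each friction-related
    term. The top terms indicate which ticket clusters are worth
    sending to the model (and the team) for deeper analysis."""
    counts = Counter()
    for ticket in tickets:
        lower = ticket.lower()
        for term in terms:
            if term in lower:
                counts[term] += 1
    return counts.most_common()
```

A frequency count is obviously not synthesis; its role is only to rank where the expensive, human-reviewed model pass should look first.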

Common errors to avoid

Several recurring traps deserve explicit mention. The first is confusing speed with rigor. The model accelerates many discovery tasks but doesn’t make them better by itself; an accelerated cycle without adequate human controls quickly produces well-documented bad decisions. Speed must be earned by simplifying tedious tasks, not by eliminating reflective steps.

The second trap is assuming the model understands business context. Without explicit loading of segment information, metrics, partnerships, and constraints, model suggestions are so general they don’t help decide. Investing time in preparing well-curated context, reused across interactions, does more for answer quality than any advanced prompting technique.
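Curated context of this kind can live in a small, reusable template that is prepended to every discovery prompt. The structure below is a sketch under assumed field names; the real value is deciding once, as a team, which segment facts, metrics, and constraints the model must always see.

```python
def build_context(segment: str, metrics: dict[str, str],
                  constraints: list[str]) -> str:
    """Assemble a reusable business-context block to prepend to
    prompts. Fields are illustrative: curate the real ones once,
    then reuse the block across interactions."""
    metric_lines = "\n".join(f"- {k}: {v}" for k, v in metrics.items())
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"## Segment\n{segment}\n\n"
        f"## Key metrics\n{metric_lines}\n\n"
        f"## Constraints\n{constraint_lines}\n"
    )
```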

The third trap is excluding experienced researchers and product managers from model interaction. In several teams we’ve seen that when juniors use AI without senior supervision, the team loses the ability to detect when outputs are plausible but wrong. Senior knowledge remains the critical filter, and that filter requires being present in the process.

How to integrate AI without corrupting the process

For a product team wanting to incorporate AI into discovery without corrupting its fundamentals, the sensible sequence is gradual. Start with post-interview synthesis and drafting, which are lowest-risk and highest-immediate-return areas. Measure time saved and perceived quality of outputs over two or three sprints.

Move next to counterargument generation and analogy exploration, which are higher-leverage areas but require more maturity in critical use of the model. In this phase, spend time documenting which types of queries produce useful results and which don't, so the team develops a shared intuition about when to invoke AI and when not to.

Only afterwards, with months of accumulated experience, should you consider more ambitious integrations such as systematic support-conversation analysis or hybrid workflows where the model intervenes at multiple points in the cycle. By then, the team's error tolerance is realistic and its human-review processes are well established.

When it pays off

AI in product discovery pays off when the team already has a mature process without it. Introducing generative tools into a team that doesn't yet have the discipline of regular interviews, rigorous synthesis, and well-formulated hypotheses only accelerates the chaos. First, solid basic practices; then, tools that amplify them.

It also pays off when discovery-input volume justifies the investment in AI integration. A small team with one weekly interview and little material doesn’t benefit much; a team with five active researchers, multiple segments, and much accumulated material extracts significant returns. The inflection point is usually when manual synthesis becomes the bottleneck.

My reading

Product discovery with AI in 2026 is a mature area, with concrete practices that work and others that have been clearly discarded. The most important lesson is that AI amplifies good processes and accelerates bad ones toward faster, better-documented failures. Teams that were already doing serious discovery now do more, and better, with AI; those without a process who expected AI to replace one have learned it doesn't work that way.

The product manager's and researcher's role remains central, but its profile has changed: less time on mechanical synthesis and drafting tasks, more time on critical decisions, conversations with users, and critique of model outputs. It's a positive change for those who love the craft and a threat only to those who confused activity with value. The good news is that the bar for good discovery has risen without raising the cost for teams that practice it well; the bad news is that those who don't adapt fall behind those who have integrated the practices that work.
