Blameless Post-Mortems: How to Actually Improve

Updated: 2026-05-03

“Doing blameless post-mortems” has been the SRE mantra since Google popularised it. Everyone nods. Few organisations do it well. There is a gap between concept and practice that most teams fail to close, and the result is empty ritual: filled-out templates nobody reads, action items nobody executes, and the same incidents repeating.

This article is about how to run post-mortems that actually produce learning and change, using concrete techniques rather than good intentions.

Key Takeaways

  • Performative blamelessness (saying “no blame” while making it clear who was at fault) destroys the mechanism.
  • The three non-negotiable elements are: factual timeline, honest contributor analysis, and action items with owner and deadline.
  • The “5 whys” assume linear causality; real incidents are multi-causal.
  • Error budget policy and post-mortems are complementary tools: both require that decisions change based on data.
  • The quarterly meta-post-mortem is what converts individual learning into systemic resilience.

Why “Officially Blameless” Post-Mortems Fail

Failure patterns are recognisable:

  • Disguised blame. The post-mortem says “no blame” but the narrative makes clear who was at fault. The implicated person knows.
  • Sanitised official narrative. What really happened is softened not to offend stakeholders. Real learning stays in private conversations.
  • Theatrical action items. “Add more monitoring” / “Improve documentation”. Vague, no owner, no deadline. Never done.
  • Not reading old post-mortems. Each incident seems new because nobody checks if it happened before.
  • Only the big ones go to post-mortem. You lose the learning from near-misses, which are more valuable because they are more frequent.

Recognising these patterns is step one.

The Three Non-Negotiable Elements

A functional post-mortem has:

  1. A factual timeline of what happened, when, who saw it first, what was done.
  2. An honest analysis of contributors — not just “what failed” but “what made it easy or possible to fail”.
  3. Specific action items with owner and deadline, tracked to completion in a centralised system.

Without all three, the post-mortem is worthless.

Timeline: Details Matter

The timeline must answer six questions:

  • T-0: what was happening before the incident. Often reveals a forgotten trigger (deploy, cron, config change).
  • T + n: moment of initial failure. Who saw it first? How? (alert, customer, luck).
  • Escalation: how it reached the right person. If it took too long, that is a process problem, not a personal one.
  • Mitigation: what worked, what didn’t, what was tried first.
  • Recovery: when service came back.
  • Follow-up: when officially closed.

Record times in UTC or in an explicitly declared timezone. Better too many timestamps than too few.
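As an illustration of the UTC rule and the six questions, here is a minimal Python sketch. The phase labels, the TimelineEvent type and the sample incident are all hypothetical and not taken from any real tool or incident. It rejects timestamps without a timezone, converts everything to UTC, and reports which questions are still unanswered.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical phase labels, one per timeline question above.
PHASES = ("t0", "failure", "escalation", "mitigation", "recovery", "follow_up")


@dataclass(frozen=True)
class TimelineEvent:
    at: datetime  # must be timezone-aware
    phase: str    # one of PHASES
    note: str     # an observation, not a judgement


def normalise(events):
    """Return the events sorted by time, with every timestamp in UTC."""
    out = []
    for e in events:
        if e.at.utcoffset() is None:
            raise ValueError(f"timestamp without timezone: {e.note!r}")
        if e.phase not in PHASES:
            raise ValueError(f"unknown phase: {e.phase!r}")
        out.append(TimelineEvent(e.at.astimezone(timezone.utc), e.phase, e.note))
    return sorted(out, key=lambda e: e.at)


def missing_phases(events):
    """List the timeline questions that have no event yet."""
    seen = {e.phase for e in events}
    return [p for p in PHASES if p not in seen]


def elapsed(events, start_phase, end_phase):
    """Time between the first event of each of the two phases."""
    first = {}
    for e in normalise(events):
        first.setdefault(e.phase, e.at)
    return first[end_phase] - first[start_phase]


# Made-up incident: one entry was logged in local time (UTC+2), the rest in UTC.
local = timezone(timedelta(hours=2))
utc = timezone.utc
events = [
    TimelineEvent(datetime(2026, 5, 1, 14, 2, tzinfo=local), "t0", "deploy starts"),
    TimelineEvent(datetime(2026, 5, 1, 12, 10, tzinfo=utc), "failure", "error-rate alert fires"),
    TimelineEvent(datetime(2026, 5, 1, 12, 17, tzinfo=utc), "escalation", "on-call pages service owner"),
    TimelineEvent(datetime(2026, 5, 1, 12, 25, tzinfo=utc), "mitigation", "rollback started"),
    TimelineEvent(datetime(2026, 5, 1, 12, 41, tzinfo=utc), "recovery", "error rate back to baseline"),
]

for e in normalise(events):
    print(e.at.isoformat(), e.phase, e.note)
print("unanswered:", missing_phases(events))
print("failure to recovery:", elapsed(events, "failure", "recovery"))
```

Requiring timezone-aware timestamps is a deliberate choice in this sketch: a naive "14:02" copied from a local terminal is exactly the kind of entry that makes a timeline misleading when read months later.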

The “5 Whys” Problem

The 5 whys is the traditional technique: why did X fail? Because A. Why A? Because B. And so on until you reach the “root cause”. The problem is the assumption of linear, single causality. Real incidents are multi-causal: three services misalign at once, an alert existed but the pager was misconfigured, a runbook existed but wasn’t found.

The better alternative is to think in terms of contributors rather than a single root cause: a list of factors that individually wouldn’t have caused the incident, but together did. Each deserves its own action item.
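A contributor list can be kept as plain data. The sketch below is only an illustration: the Contributor type, the tracker IDs and the wording of the contributors are hypothetical. It flags any contributor that has no action item attached yet.

```python
from dataclasses import dataclass, field


@dataclass
class Contributor:
    description: str
    # IDs of action items in the tracker (hypothetical IDs in this example).
    action_items: list = field(default_factory=list)


def without_action_item(contributors):
    """Return the descriptions of contributors nobody has acted on yet."""
    return [c.description for c in contributors if not c.action_items]


contributors = [
    Contributor("Three services misaligned at the same time", ["OPS-101"]),
    Contributor("Alert existed but the pager was misconfigured", ["OPS-102"]),
    Contributor("Runbook existed but was not found"),
]

for description in without_action_item(contributors):
    print("no action item yet:", description)
```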

Blameless Interview Techniques

In the post-mortem meeting, the facilitator makes the difference. Five techniques that work:

  • Ask “what information did you have”, not “why did you make that decision”. The decision is explained by available information, not the reverse.
  • Chronology before interpretation. First agree what happened at each moment; then discuss why.
  • Refer to people by role, not name, in the document. “The on-call” instead of “John”. This keeps later readers from focusing on who was involved.
  • Normalise human errors. “Anyone in that position with that information would have done the same” — if true, say it explicitly.
  • Separate observations from judgements. “The alert took 7 minutes to fire” (observation) vs “the alert took too long” (judgement).

Action Items That Get Done

Badly defined action items are the post-mortem graveyard. Ones that get done have five characteristics:

  • Specific owner. A person, not a team. If a team owns it, nobody does it.
  • Bounded deadline. “Q1” is too vague. “By 28 February” is concrete.
  • Clear completion criteria. Not “improve monitoring” but “add alert X with threshold Y, reviewed by Z”.
  • Centralised tracking. A system (Jira, Linear, GitHub Issues) where all action items live, with monthly review.
  • Proportionality. Not 20 action items per incident. Prioritise 3-5 that actually move the needle.
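The five characteristics above can be checked mechanically before a post-mortem is closed. The following is a minimal sketch under stated assumptions: the ActionItem fields, the team_names set and the sample items are hypothetical, and a real tracker would supply this data through its own export.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_ITEMS_PER_INCIDENT = 5  # upper end of the 3-5 range above


@dataclass
class ActionItem:
    title: str
    owner: str                # a person's name, not a team
    deadline: Optional[date]  # a calendar date, not "Q1"
    done_when: str            # completion criteria
    tracker_id: str           # ID in the central tracker


def lint(items, team_names):
    """Return a list of readable problems; an empty list means the items pass."""
    problems = []
    for item in items:
        if not item.owner.strip() or item.owner in team_names:
            problems.append(f"{item.title}: owner must be a named person")
        if item.deadline is None:
            problems.append(f"{item.title}: deadline must be a specific date")
        if not item.done_when.strip():
            problems.append(f"{item.title}: completion criteria missing")
        if not item.tracker_id.strip():
            problems.append(f"{item.title}: not in the central tracker")
    if len(items) > MAX_ITEMS_PER_INCIDENT:
        problems.append(
            f"{len(items)} action items; prioritise at most {MAX_ITEMS_PER_INCIDENT}"
        )
    return problems


items = [
    ActionItem("Add alert X", "Ana", date(2026, 2, 28),
               "alert X with threshold Y, reviewed by Z", "OPS-103"),
    ActionItem("Improve monitoring", "Platform team", None, "", ""),
]

for problem in lint(items, team_names={"Platform team"}):
    print(problem)
```

The second item fails every per-item check, which is the point: “improve monitoring” owned by a team with no date is the theatrical action item described earlier.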

The Quarterly Meta-Post-Mortem

Reviewing the accumulated post-mortems every quarter is what separates learning organisations from those that repeat the same cycles. Key questions:

  • Which action items were open and overdue?
  • Which patterns repeat across incidents?
  • Are there structural investments that would have prevented several incidents?
  • Are SLOs and error budgets informing those investment priorities?

Without meta-analysis the cycle is infinite. With it, focus shifts from firefighting to building resilience.
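The first two questions lend themselves to a small script run over the quarter's records. This is a sketch only: the dictionary layout, the contributor tags and the IDs are hypothetical, and it assumes each post-mortem has been tagged with its contributors by hand.

```python
from collections import Counter
from datetime import date


def overdue(action_items, today):
    """IDs of action items that are not done and whose deadline has passed."""
    return [a["id"] for a in action_items if not a["done"] and a["deadline"] < today]


def recurring_tags(postmortems, min_count=2):
    """Contributor tags seen in at least min_count post-mortems, most frequent first."""
    counts = Counter(tag for pm in postmortems for tag in set(pm["tags"]))
    # Sort by count (descending), then by name, so the output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(tag, n) for tag, n in ranked if n >= min_count]


# Made-up quarter of data.
postmortems = [
    {"id": "PM-1", "tags": ["pager-routing", "missing-runbook"]},
    {"id": "PM-2", "tags": ["pager-routing", "config-drift"]},
    {"id": "PM-3", "tags": ["missing-runbook", "pager-routing"]},
]
action_items = [
    {"id": "OPS-101", "deadline": date(2026, 2, 28), "done": True},
    {"id": "OPS-102", "deadline": date(2026, 3, 15), "done": False},
    {"id": "OPS-103", "deadline": date(2026, 6, 30), "done": False},
]

print("overdue:", overdue(action_items, today=date(2026, 4, 1)))
print("recurring:", recurring_tags(postmortems))
```

A tag that appears in most of the quarter's post-mortems is a candidate for the structural investment asked about in the third question.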

Small Incidents Too

Most organisations only run post-mortems for SEV-1 incidents. But the cheapest lessons come from SEV-3 incidents and near-misses: events where something serious almost happened but was caught in time.

A light model for small incidents: five-line timeline, three contributors, one or two specific action items, no formal meeting. The volume of small learning, aggregated, often exceeds that of a few large incidents.

Culture: The Unseen Factor

Techniques help but culture decides. Signals of a healthy culture:

  • A junior engineer can say “I broke production” without fear.
  • Leaders openly discuss their own mistakes.
  • Lessons learned are celebrated, not hidden.
  • Resources for action items are a priority, not an afterthought.

Changing culture takes years. Starting with techniques is the practical route: over time, culture adapts to well-executed rituals.

Conclusion

Blameless post-mortems are a powerful tool when done well. The difference between theatre and real learning is in the details: factual timeline, honest contributor analysis, action items with owner and deadline, continuous tracking, and quarterly pattern review. The bigger cost is in rigour, not technique.

Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.