Blameless Post-Mortems: How to Actually Improve
Updated: 2026-05-03
“Doing blameless post-mortems” has been an SRE mantra since Google popularised it. Everyone nods. Few organisations do it well. There is a gap between concept and practice that most teams fail to close, and the result is empty ritual: filled-out templates nobody reads, action items nobody executes, and the same incidents repeating.
This article is about how to run post-mortems that actually produce learning and change — with concrete techniques, not good intentions.
Key Takeaways
- Performative blamelessness (saying “no blame” while making clear who was involved) destroys the mechanism: people stop reporting honestly.
- The three non-negotiable elements are: factual timeline, honest contributor analysis, and action items with owner and deadline.
- The “5 whys” technique assumes linear, single causality; real incidents are multi-causal.
- Error budget policy and post-mortems are complementary tools: both require that decisions change based on data.
- The quarterly meta-post-mortem is what converts individual learning into systemic resilience.
Why “Officially Blameless” Post-Mortems Fail
Failure patterns are recognisable:
- Disguised blame. The post-mortem says “no blame” but the narrative makes clear who was at fault. The implicated person knows.
- Sanitised official narrative. What really happened is softened so as not to offend stakeholders, and the real learning stays in private conversations.
- Theatrical action items. “Add more monitoring”, “Improve documentation”. Vague, with no owner and no deadline, so they never get done.
- Not reading old post-mortems. Each incident seems new because nobody checks whether it has happened before.
- Only the big ones get a post-mortem. You lose the learning from near-misses, which are more valuable because they are more frequent.
Recognising these patterns is step one.
The Three Non-Negotiable Elements
A functional post-mortem has:
- A factual timeline: what happened, when, who saw it first, and what was done.
- An honest analysis of contributors — not just “what failed” but “what made it easy or possible to fail”.
- Specific action items with owner and deadline, tracked to completion in a centralised system.
Without all three, the post-mortem is a dead letter.
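One way to make “all three” enforceable is to keep each post-mortem as a structured record and refuse to close it until every element is present. The sketch below is a minimal Python illustration; the type and field names (`PostMortem`, `TimelineEntry`, `ActionItem`, `missing_elements`) are hypothetical and do not correspond to any standard schema or tool.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Optional


@dataclass
class TimelineEntry:
    at: datetime   # when it happened (ideally timezone-aware UTC)
    event: str     # what happened, or what was done


@dataclass
class ActionItem:
    description: str
    owner: str                # a named person (hypothetical convention)
    deadline: Optional[date]  # None means "no deadline yet"


@dataclass
class PostMortem:
    title: str
    timeline: List[TimelineEntry] = field(default_factory=list)
    contributors: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)


def missing_elements(pm: PostMortem) -> List[str]:
    """Name each non-negotiable element that is absent or incomplete."""
    missing = []
    if not pm.timeline:
        missing.append("timeline")
    if not pm.contributors:
        missing.append("contributors")
    # Every action item needs both an owner and a deadline.
    if not pm.action_items or any(
        not item.owner or item.deadline is None for item in pm.action_items
    ):
        missing.append("action items with owner and deadline")
    return missing


if __name__ == "__main__":
    draft = PostMortem(title="Checkout latency spike")
    print(missing_elements(draft))
    # ['timeline', 'contributors', 'action items with owner and deadline']
```

A check like this could gate the “closed” status in whatever tracker the team already uses; the point is that an incomplete post-mortem cannot quietly count as done.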
Timeline: Details Matter
The timeline must answer six questions:
- T-0: what was happening before the incident. Often reveals a forgotten trigger (deploy, cron, config change).
- T+n: moment of initial failure. Who saw it first? How? (alert, customer, luck).
- Escalation: how it reached the right person. If it took too long, that is a process problem, not a person problem.
- Mitigation: what worked, what didn’t, what was tried first.
- Recovery: when service was restored.
- Follow-up: when the incident was officially closed.
Record times in UTC or in an explicitly declared timezone. Better too many timestamps than too few.
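Consistent, timezone-explicit timestamps pay off when you derive durations between phases, for instance how long escalation took. A minimal Python sketch, assuming each phase is recorded as a timezone-aware `datetime` under a hypothetical label; phases that were not recorded are simply skipped, and naive timestamps are rejected.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

# Hypothetical labels mirroring the six timeline points, in chronological order.
PHASES = ["T-0", "T+n", "escalation", "mitigation", "recovery", "follow-up"]


def phase_durations(marks: Dict[str, datetime]) -> Dict[str, timedelta]:
    """Time elapsed between consecutive recorded phases."""
    for name, ts in marks.items():
        if name not in PHASES:
            raise ValueError(f"unknown phase: {name}")
        # A naive datetime hides its timezone; require it to be explicit.
        if ts.tzinfo is None:
            raise ValueError(f"{name}: timestamp has no timezone")
    recorded = [p for p in PHASES if p in marks]
    durations = {}
    for earlier, later in zip(recorded, recorded[1:]):
        elapsed = marks[later] - marks[earlier]
        if elapsed < timedelta(0):
            raise ValueError(f"{later} is recorded before {earlier}")
        durations[f"{earlier} -> {later}"] = elapsed
    return durations


if __name__ == "__main__":
    utc = timezone.utc
    marks = {
        "T+n": datetime(2026, 2, 3, 14, 2, tzinfo=utc),
        "escalation": datetime(2026, 2, 3, 14, 9, tzinfo=utc),
        "recovery": datetime(2026, 2, 3, 14, 40, tzinfo=utc),
    }
    for span, elapsed in phase_durations(marks).items():
        print(span, elapsed)
    # T+n -> escalation 0:07:00
    # escalation -> recovery 0:31:00
```

The timestamps in the usage example are invented for illustration. Because aware datetimes compare correctly across timezones, entries recorded in a declared non-UTC zone still produce the right durations.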
The “5 Whys” Problem
The 5 whys is the traditional technique: why did X fail? Because of A. Why A? Because of B. And so on, until you reach the “root cause”. The problem is its assumption of linear, single causality. Real incidents are multi-causal: three services misalign at once, an alert existed but the pager was misconfigured, a runbook existed but nobody found it.
The better alternative is to think in terms of contributors rather than a single root cause: a list of factors that individually would not have caused the incident, but together did. Each one deserves its own action item.
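Keeping contributors as an explicit list, each linked to its own action item, makes gaps easy to spot during review. A minimal Python sketch; the `Contributor` type and the tracker IDs (`OPS-101`, `OPS-102`) are hypothetical, and the three contributors paraphrase the examples given for multi-causal incidents.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Contributor:
    description: str                   # a factor that alone would not have caused the incident
    action_item: Optional[str] = None  # hypothetical tracker ID of its own action item


def uncovered(contributors: List[Contributor]) -> List[str]:
    """Contributors that do not yet have an action item."""
    return [c.description for c in contributors if c.action_item is None]


def shared_items(contributors: List[Contributor]) -> List[str]:
    """Action items claimed by more than one contributor."""
    counts = Counter(c.action_item for c in contributors if c.action_item)
    return sorted(item for item, n in counts.items() if n > 1)


if __name__ == "__main__":
    incident = [
        Contributor("Three services fell out of alignment at once"),
        Contributor("Alert existed but the pager was misconfigured", "OPS-101"),
        Contributor("Runbook existed but nobody found it", "OPS-102"),
    ]
    print(uncovered(incident))
    # ['Three services fell out of alignment at once']
    print(shared_items(incident))
    # []
```

`shared_items` flags the case where one vague ticket is stretched to cover several contributors, which usually means none of them is really addressed.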
Blameless Interview Techniques
In the post-mortem meeting, the facilitator makes the difference. Five techniques that work:
- Ask “what information did you have?”, not “why did you make that decision?”. A decision is explained by the information available at the time, not judged backwards from the outcome.
- Chronology before interpretation. First agree what happened at each moment; then discuss why.
- Refer to people by role, not name, in the document. “The on-call” instead of “John”. This keeps later readers focused on what happened rather than on who.
- Normalise human errors. “Anyone in that position with that information would have done the same” — if true, say it explicitly.
- Separate observations from judgements. “The alert took 7 minutes to fire” (observation) vs “the alert took too long” (judgement).
Action Items That Get Done
Badly defined action items are the post-mortem graveyard. The ones that get done share five characteristics:
- Specific owner. A person, not a team. If it is assigned to a team, nobody does it.
- Bounded deadline. “Q1” is too vague. “By 28 February” lands.
- Clear completion criteria. Not “improve monitoring” — “add alert X with threshold Y, reviewed by Z”.
- Centralised tracking. A system (Jira, Linear, GitHub Issues) where all action items live, with monthly review.
- Proportionality. Not 20 action items per incident. Prioritise 3-5 that actually move the needle.
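The five characteristics can be checked mechanically before a post-mortem is accepted. A minimal Python sketch under stated assumptions: `KNOWN_TEAMS` is a hypothetical list of team names used to tell teams from people, the limit of five comes from the upper end of the 3-5 guideline, and the `ActionItem` fields are illustrative rather than any tracker's real schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

# Hypothetical: names that identify teams rather than people.
KNOWN_TEAMS = {"platform", "sre", "payments"}
MAX_ITEMS_PER_INCIDENT = 5  # upper end of the "3-5" guideline


@dataclass
class ActionItem:
    title: str
    owner: str                # should be a single person
    deadline: Optional[date]  # a calendar date, not "Q1"
    done_when: str            # completion criteria
    ticket: Optional[str]     # ID in the central tracker


def problems(item: ActionItem) -> List[str]:
    """Which of the per-item characteristics this action item is missing."""
    found = []
    if not item.owner or item.owner.lower() in KNOWN_TEAMS:
        found.append("owner must be a person, not a team")
    if item.deadline is None:
        found.append("needs a concrete deadline")
    if not item.done_when.strip():
        found.append("needs completion criteria")
    if not item.ticket:
        found.append("not in the central tracker")
    return found


def review(items: List[ActionItem]) -> Dict[str, List[str]]:
    """Problems per item, plus a proportionality check for the incident."""
    report = {item.title: problems(item) for item in items}
    if len(items) > MAX_ITEMS_PER_INCIDENT:
        report["(incident)"] = [
            f"{len(items)} action items; prioritise at most {MAX_ITEMS_PER_INCIDENT}"
        ]
    return {title: issues for title, issues in report.items() if issues}


if __name__ == "__main__":
    vague = ActionItem("Improve monitoring", "sre", None, "", None)
    concrete = ActionItem(
        "Add alert X", "alice", date(2026, 2, 28),
        "Alert X fires at threshold Y, reviewed by Z", "OPS-123",
    )
    print(review([vague, concrete]))
```

Here “Improve monitoring” fails all four per-item checks, while the concrete item passes and does not appear in the report.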
The Quarterly Meta-Post-Mortem
Reviewing the accumulated post-mortems every quarter is what separates learning organisations from those that repeat the same cycles. Key questions:
- Which action items were open and overdue?
- Which patterns repeat across incidents?
- Are there structural investments that would have prevented several incidents?
- Are SLOs and error budgets informing those investment priorities?
Without meta-analysis the cycle is infinite. With it, focus shifts from firefighting to building resilience.
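The first two questions of the quarterly review lend themselves to a simple aggregation over the quarter’s post-mortems. A minimal Python sketch, assuming each incident carries a set of normalised contributor tags (a hypothetical convention; tags such as `pager-config` are invented) and a list of action items with deadlines. Because tags are a set, each one counts at most once per incident.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Set, Tuple


@dataclass
class Action:
    title: str
    deadline: date
    done: bool = False


@dataclass
class Incident:
    id: str
    # Hypothetical normalised contributor tags, e.g. "pager-config".
    contributor_tags: Set[str] = field(default_factory=set)
    actions: List[Action] = field(default_factory=list)


def overdue(incidents: List[Incident], today: date) -> List[Tuple[str, str]]:
    """(incident id, action title) for open actions past their deadline."""
    return [
        (inc.id, a.title)
        for inc in incidents
        for a in inc.actions
        if not a.done and a.deadline < today
    ]


def recurring(incidents: List[Incident], min_incidents: int = 2) -> Dict[str, int]:
    """Contributor tags that appear in at least `min_incidents` incidents."""
    counts = Counter(tag for inc in incidents for tag in inc.contributor_tags)
    return {tag: n for tag, n in counts.most_common() if n >= min_incidents}


if __name__ == "__main__":
    quarter = [
        Incident("INC-1", {"pager-config", "missing-runbook"},
                 [Action("Fix pager routing", date(2026, 1, 31))]),
        Incident("INC-2", {"pager-config"},
                 [Action("Write runbook", date(2026, 3, 15), done=True)]),
        Incident("INC-3", {"pager-config", "config-drift"}),
    ]
    print(overdue(quarter, today=date(2026, 4, 1)))
    # [('INC-1', 'Fix pager routing')]
    print(recurring(quarter))
    # {'pager-config': 3}
```

A tag that recurs across three incidents in one quarter is a candidate for the structural investment the third question asks about; the code only surfaces it, the judgement stays with the reviewers.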
Small Incidents Too
Most organisations only run post-mortems for SEV-1 incidents. But the cheapest learnings come from SEV-3 incidents and near-misses: events where something serious almost happened but was caught in time.
A light model for small incidents: five-line timeline, three contributors, one or two specific action items, no formal meeting. The volume of small learning, aggregated, often exceeds that of a few large incidents.
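The light model is small enough to enforce with a template. A minimal Python sketch; `light_postmortem` is a hypothetical helper, it treats the five timeline lines and three contributors as upper bounds (an assumption), and the near-miss in the usage example is invented.

```python
from typing import List


def light_postmortem(title: str, timeline: List[str],
                     contributors: List[str], actions: List[str]) -> str:
    """Render a small-incident record, enforcing the light model's limits."""
    if not 1 <= len(timeline) <= 5:
        raise ValueError("timeline: one to five lines")
    if not 1 <= len(contributors) <= 3:
        raise ValueError("contributors: one to three")
    if not 1 <= len(actions) <= 2:
        raise ValueError("action items: one or two")
    lines = [title, "", "Timeline:"]
    lines += [f"  {entry}" for entry in timeline]
    lines += ["Contributors:"] + [f"  - {c}" for c in contributors]
    lines += ["Action items:"] + [f"  - {a}" for a in actions]
    return "\n".join(lines)


if __name__ == "__main__":
    print(light_postmortem(
        "Near-miss: expiring TLS certificate caught in staging",
        ["09:12 UTC staging health check fails",
         "09:20 UTC on-call renews certificate"],
        ["Renewal job failing silently", "No alert on certificate expiry"],
        ["Alert 14 days before expiry (owner: alice, by 28 February)"],
    ))
```

Keeping the format this constrained is what makes it cheap enough to fill in for every near-miss.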
Culture: The Unseen Factor
Techniques help, but culture decides. Signals of a healthy culture:
- A junior engineer can say “I broke production” without fear.
- Leaders openly discuss their own mistakes.
- Lessons learned are celebrated, not hidden.
- Resources for action items are a priority, not an afterthought.
Changing culture takes years. Starting with techniques is the practical path: over time, culture adapts to well-executed rituals.
Conclusion
Blameless post-mortems are a powerful tool when done well. The difference between theatre and real learning lies in the details: a factual timeline, honest contributor analysis, action items with owner and deadline, continuous tracking, and a quarterly review of patterns. The real cost lies in rigour, not in technique.