07 · Framework

SRE Maturity

From on-call heroes to error budgets and chaos
L1 Reactive
What we measure: uptime
How we respond: heroics
How we learn: firefighting

L2 Managed
What we measure: SLA
How we respond: runbooks
How we learn: postmortems

L3 Proactive
What we measure: SLO + error budget
How we respond: automation
How we learn: blameless reviews

L4 Resilient
What we measure: user-journey SLO
How we respond: self-healing
How we learn: game-days

L5 Antifragile
What we measure: business SLI
How we respond: auto-rollback
How we learn: chaos in prod

SRE maturity is not a question of 'do we have an on-call rotation' but of how deeply the discipline is wired into the product. The model is a five-step ladder: Reactive → Managed → Proactive → Resilient → Antifragile.

Three things change at each level: what we measure (uptime → SLA → SLO + error budget → user-journey SLO → business SLI), how we respond (heroics → runbooks → automation → self-healing → auto-rollback), and how we learn (firefighting → postmortems → blameless reviews → game-days → chaos in prod).
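The move from uptime and SLA to 'SLO + error budget' at L3 is mostly arithmetic: the error budget is the share of the window that the SLO allows to be bad. Below is a minimal Python sketch of that calculation, assuming a time-based availability SLO over a rolling 30-day window. The function names, the 30-day default, and the sample figures are illustrative assumptions, not part of the framework.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over the window.

    slo_target is a fraction, e.g. 0.999 for "three nines" (assumed input format).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent; a negative value means overspent."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - bad_minutes) / budget


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows 0.1% of 43,200 minutes.
    print(round(error_budget_minutes(0.999), 1))    # 43.2
    # Hypothetical month with 10.8 bad minutes: three quarters of the budget is left.
    print(round(budget_remaining(0.999, 10.8), 2))  # 0.75
```

A team at L2 can run this against its existing SLA figure to see how much room the number really leaves, which is often the first concrete step toward L3.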

The ladder helps a team honestly name its current step and pick 2-3 practices to adopt in order to move up, without falling into a cargo cult of 'let's do it like Google'.

How to use this model
01 Name the current maturity level without trying to look better than reality.

02 Compare what the team measures, how it responds, and how it learns after incidents.

03 Choose 2-3 practices that move the team to the next level.
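The three steps above can be sketched as a small self-assessment in Python. The ladder data is taken from the level descriptions; everything else is an assumption made for illustration. That includes the names `Level`, `current_level`, and `next_practices`, and the rule that the overall level equals the weakest of the three dimensions. The framework does not prescribe a scoring rule, so this is one conservative reading of 'without trying to look better than reality'.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    number: int
    name: str
    measure: str
    respond: str
    learn: str


# Ladder contents as listed in the level descriptions.
LADDER = [
    Level(1, "Reactive", "uptime", "heroics", "firefighting"),
    Level(2, "Managed", "SLA", "runbooks", "postmortems"),
    Level(3, "Proactive", "SLO + error budget", "automation", "blameless reviews"),
    Level(4, "Resilient", "user-journey SLO", "self-healing", "game-days"),
    Level(5, "Antifragile", "business SLI", "auto-rollback", "chaos in prod"),
]

DIMENSIONS = ("measure", "respond", "learn")


def _validate(scores: dict[str, int]) -> None:
    for dim in DIMENSIONS:
        if dim not in scores:
            raise ValueError(f"missing score for dimension: {dim}")
        if not 1 <= scores[dim] <= len(LADDER):
            raise ValueError(f"score for {dim} must be between 1 and {len(LADDER)}")


def current_level(scores: dict[str, int]) -> Level:
    """Step 01. ASSUMPTION: the overall level is the weakest dimension."""
    _validate(scores)
    return LADDER[min(scores[dim] for dim in DIMENSIONS) - 1]


def next_practices(scores: dict[str, int]) -> list[tuple[str, str]]:
    """Steps 02-03: practices of the next level for each dimension that lags it."""
    level = current_level(scores)
    if level.number == len(LADDER):
        return []  # already at the top step
    target = LADDER[level.number]  # the level one step above the current one
    return [
        (dim, getattr(target, dim))
        for dim in DIMENSIONS
        if scores[dim] < target.number
    ]


if __name__ == "__main__":
    # Hypothetical team: SLOs in place, but response and learning still at L2.
    team = {"measure": 3, "respond": 2, "learn": 2}
    print(current_level(team).name)  # Managed
    print(next_practices(team))      # [('respond', 'automation'), ('learn', 'blameless reviews')]
```

Because the result lists at most one practice per dimension, it never exceeds three items, which keeps it in line with the '2-3 practices' guidance.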
