When the Pager Goes Off
It doesn't matter if you've been writing code for six months or sixteen years. The moment a production alert fires — especially on a Friday afternoon, especially after you just told your team "this deploy is totally safe" — a very predictable psychological journey begins. Psychologists have Kübler-Ross. Developers have this.
Stage 1: Denial
The alert fires. You glance at it and your first instinct is to question the monitoring system itself. "That can't be right." You refresh the dashboard. You check if it's a fluke. You ask Slack if anyone else is seeing this, hoping someone will say "oh that alert is always noisy, ignore it."
Nobody says that. The error rate is very, very real. But denial buys you approximately 90 seconds of comfort, and those 90 seconds are precious.
Stage 2: Anger
The realization sets in. Something is broken in production. Now comes the attribution phase, which is really just anger with a target:
- "Who deployed last?"
- "Which PR touched this?"
- "Why didn't the tests catch this?"
- "Why do we even HAVE this service?"
The anger is not entirely unproductive — it does propel you toward the git log, which is where the actual investigation begins. But let's be honest: for the first few minutes, you're not debugging. You're just upset.
Stage 3: Bargaining
This is the "maybe I can avoid touching the thing" phase. You try the least invasive possible interventions first, with increasing desperation:
- Restart the service. (It doesn't help.)
- Roll back the last config change. (Also doesn't help.)
- Seriously consider whether turning the server off and on again counts as a solution. (It does not.)
- Stare at the logs while mentally bargaining with the universe that this is somehow an upstream problem you can blame on a vendor.
It is not an upstream problem. It is your code.
Stage 4: Despair (a.k.a. The Rubber Duck Phase)
You've been in the codebase for 45 minutes. You've added 12 console.log statements. You've opened four Stack Overflow tabs, and none of them quite matches your situation. The error makes no sense given what the code is doing, and you're starting to question whether you ever understood how computers work at all.
This is the valley. Most developers will tell you, if they're being honest, that this is where the real debugging happens — in the quiet, humbling moment when you stop assuming and start actually reading the error message carefully.
Stage 5: Acceptance (and the Hotfix)
You find it. It's always something. A null check that wasn't there. An assumption about data format that stopped being true. A dependency that updated silently. You fix it, you test it locally, you deploy it with the grim efficiency of someone who has been through this before and will go through it again.
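The missing null check usually looks something like this. A sketch only, not your actual bug: `formatShippingLabel` and the shape of `user` are hypothetical stand-ins for whatever assumption in your code quietly stopped being true.

```javascript
// Before: assumes every user object has an address.
// Accessing .street on undefined throws a TypeError in production.
function formatShippingLabel(user) {
  return `${user.name}\n${user.address.street}\n${user.address.city}`;
}

// After: the one-line fix that took 45 minutes to find.
function formatShippingLabelFixed(user) {
  if (!user.address) {
    return `${user.name}\n(no address on file)`;
  }
  return `${user.name}\n${user.address.street}\n${user.address.city}`;
}
```

The fix is trivial; finding it was not. That asymmetry is the whole genre.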
The commit message is one word: "fix". Or sometimes two: "fix bug". On reflective days, maybe three: "fix production issue". You write the post-mortem, you add the missing test, and you promise yourself — genuinely, sincerely — that you'll never deploy on a Friday again.
You will deploy on a Friday again. But for now, the service is up. That's enough.