Incident Response Procedure
Phase 12: general incident response for Atrium production.
Referenced from SECURITY.md.
Severity Definitions
Procedure
1. Triage
- Identify severity using definitions above
- Assign incident commander (on-call, see runbooks/on-call-rotation.md)
- Create incident channel in Discord: #incident-YYYY-MM-DD-<slug>
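The channel name in the last step can be generated instead of typed by hand. A minimal bash sketch, assuming the slug is derived from a short incident title by lowercasing it and collapsing every other character run to a single hyphen, and that dates are taken in UTC. `incident_slug` and `incident_channel_name` are hypothetical helpers, not existing Atrium tooling:

```bash
# Hypothetical helper: turn a free-form title into a lowercase, hyphenated slug.
incident_slug() {
  # Lowercase, replace every run of non-alphanumerics with one hyphen,
  # then trim leading and trailing hyphens.
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Hypothetical helper: build #incident-YYYY-MM-DD-<slug>.
# Usage: incident_channel_name "<short title>" [YYYY-MM-DD]
# The date defaults to today in UTC (an assumption).
incident_channel_name() {
  local title=$1
  local day=${2:-$(date -u +%F)}
  local slug
  slug=$(incident_slug "$title")
  # A title made only of punctuation would give an empty slug; refuse it.
  if [ -z "$slug" ]; then
    echo "incident_channel_name: title produced an empty slug" >&2
    return 1
  fi
  printf '#incident-%s-%s\n' "$day" "$slug"
}
```

For example, `incident_channel_name "DB outage: API 5xx" 2024-05-01` prints `#incident-2024-05-01-db-outage-api-5xx`.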
2. Communicate
- Post status updates in #ops-alerts and in the incident channel (#incident-*)
3. Mitigate
P0 (fund safety):
```bash
# Emergency pause via Praetor multisig
cast send "$PRAETOR_TIMELOCK" "pause()" --private-key "$MULTISIG_KEY_1"
# Requires 2/3 multisig confirmation within 48h timelock
# For immediate action: use PosternKillSwitch
cast send "$POSTERN_KILL_SWITCH" "revokeAll(address)" "$COMPROMISED_ACCOUNT"
```
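Because the kill switch is the immediate path, an empty or mistyped variable is costly. A hedged pre-flight sketch that checks the variables used by the P0 commands before anything is sent. `p0_preflight` is a hypothetical helper; it never touches the chain, and the address check assumes `COMPROMISED_ACCOUNT` is a plain 0x-prefixed EVM address rather than an ENS name:

```bash
# Hypothetical pre-flight check for the P0 commands.
# Verifies that each expected variable is non-empty and that the account
# looks like a 20-byte hex address. It does not send any transaction.
p0_preflight() {
  local missing=0 var
  for var in PRAETOR_TIMELOCK MULTISIG_KEY_1 POSTERN_KILL_SWITCH COMPROMISED_ACCOUNT; do
    # ${!var:-} reads the variable whose name is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "p0_preflight: $var is not set" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1

  # An EVM address is 0x followed by exactly 40 hex characters.
  local re='^0x[0-9a-fA-F]{40}$'
  if ! [[ $COMPROMISED_ACCOUNT =~ $re ]]; then
    echo "p0_preflight: COMPROMISED_ACCOUNT is not a 0x-prefixed 40-hex address" >&2
    return 1
  fi
}
```

Chaining it as `p0_preflight && cast send ...` means the transaction is only attempted when every check passes.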
P1 (service restoration):
```bash
# Hotfix branch
git checkout -b hotfix/incident-YYYY-MM-DD
# Fix, test, push
git push -u origin hotfix/incident-YYYY-MM-DD
# Vercel auto-deploys preview; promote to production via Vercel UI
```
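The `YYYY-MM-DD` placeholder in the hotfix branch name can be filled in the same way. A small sketch, assuming UTC dates; `hotfix_branch_name` and `start_hotfix` are hypothetical helpers:

```bash
# Hypothetical helper: fill in the YYYY-MM-DD placeholder of the hotfix branch.
# The date defaults to today in UTC (an assumption).
hotfix_branch_name() {
  local day=${1:-$(date -u +%F)}
  local re='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
  # Only checks the YYYY-MM-DD shape; it does not validate the calendar date.
  if ! [[ $day =~ $re ]]; then
    echo "hotfix_branch_name: expected YYYY-MM-DD, got '$day'" >&2
    return 1
  fi
  printf 'hotfix/incident-%s\n' "$day"
}

# Hypothetical helper: create and switch to the hotfix branch.
# Pushing is left for after the fix is tested.
start_hotfix() {
  local branch
  branch=$(hotfix_branch_name "$@") || return 1
  git checkout -b "$branch"
}
```

Two hotfixes started on the same day would get the same branch name; this sketch does not try to resolve that.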
P2/P3 (ticket):
- Create GitHub issue with the `incident` label
- Link to incident channel
- Schedule for next sprint
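Opening the ticket can be scripted with the GitHub CLI. A sketch, assuming `gh` is installed and authenticated for the repository and that an `incident` label already exists there (`gh` may refuse a label that does not exist). `create_incident_issue` and the `GH` override variable are hypothetical:

```bash
# Hypothetical wrapper around `gh issue create` for P2/P3 tickets.
# Usage: create_incident_issue "<title>" "<incident channel name>"
# Set GH=echo to print the arguments instead of creating an issue (dry run).
create_incident_issue() {
  local title=$1 channel=$2
  "${GH:-gh}" issue create \
    --title "$title" \
    --label incident \
    --body "Incident channel: $channel"
}
```

Running it once with `GH=echo` is a quick way to check the title and channel before the real issue is created.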
4. Postmortem
Required for P0 and P1 within 48 hours.
Template: incidents/YYYY-MM-DD-<slug>.md
```markdown
# Incident: <title>
**Date:** YYYY-MM-DD
**Severity:** P0/P1
**Duration:** X hours
**Impact:** <what users experienced>
## Timeline
- HH:MM, Alert fired
- HH:MM, Incident commander assigned
- HH:MM, Root cause identified
- HH:MM, Mitigation applied
- HH:MM, Service restored
## Root Cause (5 Whys)
1. Why did X happen? Because Y.
2. Why did Y happen? Because Z.
...
## Action Items
- [ ] <action>, owner, due date
- [ ] <action>, owner, due date
## Lessons Learned
<what we'll do differently>
```
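Copying the template by hand invites typos in the file path. A sketch that creates `incidents/YYYY-MM-DD-<slug>.md` from the template, fills in date and severity, refuses to overwrite an existing file, and only runs for P0/P1. `new_postmortem` is hypothetical; the remaining placeholders (`<title>`, `X hours`, and so on) are left for the author:

```bash
# Hypothetical scaffold for a new postmortem, following the template.
# Usage: new_postmortem <P0|P1> <slug> [YYYY-MM-DD] [directory]
# Date defaults to today in UTC (an assumption); directory defaults to incidents/.
new_postmortem() {
  local sev=$1 slug=$2
  local day=${3:-$(date -u +%F)}
  local dir=${4:-incidents}

  # Postmortems are only required for P0 and P1.
  case $sev in
    P0 | P1) ;;
    *)
      echo "new_postmortem: only P0/P1 need a postmortem, got '$sev'" >&2
      return 1
      ;;
  esac

  local path="$dir/$day-$slug.md"
  # Never overwrite a postmortem that already exists.
  if [ -e "$path" ]; then
    echo "new_postmortem: $path already exists" >&2
    return 1
  fi

  mkdir -p "$dir"
  # Unquoted EOF so $day and $sev expand; the other placeholders stay literal.
  cat >"$path" <<EOF
# Incident: <title>
**Date:** $day
**Severity:** $sev
**Duration:** X hours
**Impact:** <what users experienced>
## Timeline
- HH:MM, Alert fired
- HH:MM, Incident commander assigned
- HH:MM, Root cause identified
- HH:MM, Mitigation applied
- HH:MM, Service restored
## Root Cause (5 Whys)
1. Why did X happen? Because Y.
2. Why did Y happen? Because Z.
...
## Action Items
- [ ] <action>, owner, due date
- [ ] <action>, owner, due date
## Lessons Learned
<what we'll do differently>
EOF
  # Print the created path so it can be opened or committed.
  printf '%s\n' "$path"
}
```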
5. Follow-Up
- Action items tracked in docs/plan-tracker.md
- Postmortem published to incidents/
- Alert rules updated if detection was slow
- Runbooks updated if response was unclear
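Moving action items into docs/plan-tracker.md is easier with a list of what is still unchecked. A sketch, assuming postmortems live in `incidents/` and keep the `- [ ]` checkbox format from the template. `open_action_items` is hypothetical and prints `file:line:item` for each open item:

```bash
# Hypothetical helper: print every unchecked action item ("- [ ]") found in
# postmortem files, as <file>:<line>:<item>.
# Usage: open_action_items [directory]   (defaults to incidents/)
open_action_items() {
  local dir=${1:-incidents} f
  for f in "$dir"/*.md; do
    # If the glob matched nothing, bash leaves it unexpanded; skip that case.
    [ -e "$f" ] || continue
    # -H always prints the file name, -n the line number.
    # "|| true" keeps files with no open items from producing a failure status.
    grep -Hn '^- \[ ]' "$f" || true
  done
}
```

Checked items (`- [x]`) are not listed, so the output shrinks as items are closed.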