Cloud Outage Response
Detection & Verification
When a Better Stack alert triggers, capture the incident ID, start time, alert source, and initial scope.
Confirm the AWS disruption against the official AWS status page. Note the affected regions and services, and record the status URL.
Confirm alerting tools (Better Stack, Datadog, CloudWatch) aren’t producing false positives.
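The detection steps above amount to filling in a small incident record and checking that more than one tool agrees. A minimal Python sketch, assuming hypothetical field names and an assumed rule that at least two tools must show the problem before it is treated as real (neither comes from the checklist itself):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IncidentRecord:
    """Facts captured at detection time. Field names are illustrative."""
    incident_id: str
    started_at: datetime
    alert_source: str
    initial_scope: str
    affected_regions: list = field(default_factory=list)
    affected_services: list = field(default_factory=list)
    status_url: str = ""
    # Tool name -> whether that tool independently shows the problem.
    tool_signals: dict = field(default_factory=dict)

    def missing_fields(self):
        """Names of verification fields that are still empty."""
        checks = {
            "affected_regions": self.affected_regions,
            "affected_services": self.affected_services,
            "status_url": self.status_url,
        }
        return [name for name, value in checks.items() if not value]

    def is_corroborated(self, minimum=2):
        """Assumed rule: at least `minimum` tools must agree."""
        agreeing = sum(1 for seen in self.tool_signals.values() if seen)
        return agreeing >= minimum


# Placeholder values for illustration only.
record = IncidentRecord(
    incident_id="2024-001",
    started_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    alert_source="Better Stack",
    initial_scope="Elevated API errors",
    tool_signals={"Better Stack": True, "Datadog": True, "CloudWatch": False},
)
```

Calling `record.missing_fields()` lists what still has to be filled in before moving on to communication, and `record.is_corroborated()` flags an alert that only one tool can see.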
Internal Communication
Confirm Incident Commander, Comms Lead, Tech Lead, and Customer Support Lead.
Create #inc-[incident-id]. Set update cadence (e.g., every 15 min). Post kickoff with next-update time.
Notify CTO and CS lead; include link to incident channel and current impact.
Keep a running incident log: capture updates, hypotheses, actions, and timestamps.
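The channel name, update cadence, and kickoff post can be generated rather than typed by hand under pressure. A rough Python sketch, assuming hypothetical function names and that `now` is given in UTC; posting the text to a chat tool is left out because the checklist does not specify one:

```python
from datetime import datetime, timedelta, timezone


def channel_name(incident_id):
    """Follow the #inc-[incident-id] pattern from the checklist."""
    return f"#inc-{incident_id}"


def next_update_time(now, cadence_minutes=15):
    """Time by which the next update is due."""
    return now + timedelta(minutes=cadence_minutes)


def kickoff_message(incident_id, impact, roles, now, cadence_minutes=15):
    """Build the kickoff post. `roles` maps role name to person."""
    due = next_update_time(now, cadence_minutes)
    lines = [
        f"Incident {incident_id} kickoff in {channel_name(incident_id)}",
        f"Current impact: {impact}",
    ]
    lines += [f"{role}: {person}" for role, person in roles.items()]
    # Assumes `now` is in UTC, so the label below is accurate.
    lines.append(
        f"Updates every {cadence_minutes} min; next update at {due:%H:%M} UTC"
    )
    return "\n".join(lines)
```

Stating the next-update time as a clock time, not just a cadence, gives everyone in the channel a concrete deadline to hold the Comms Lead to.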
Customer Communication
Draft the customer message: summarize impact, affected features, workarounds, and next update time.
Post the message and ETA to the Better Stack status page. Link to the status page in all comms.
Show a banner with the message to impacted customer segments; link to the status page.
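A small template helps keep every customer update complete and consistent across the status page, banner, and other channels. The Python sketch below is one possible shape; the function name, the wording of each line, and the rule that impact, affected features, and next update time are mandatory are assumptions for illustration:

```python
def status_update(impact, affected_features, workarounds,
                  next_update, status_page_url):
    """Render one customer-facing update from the checklist's four parts."""
    # Assumed rule: refuse to publish an update with a required part missing.
    if not impact or not affected_features or not next_update:
        raise ValueError("impact, affected features, and next update are required")
    if workarounds:
        workaround_text = "; ".join(workarounds)
    else:
        workaround_text = "None available yet"
    return "\n".join([
        f"Impact: {impact}",
        f"Affected features: {', '.join(affected_features)}",
        f"Workarounds: {workaround_text}",
        f"Next update: {next_update}",
        f"Status page: {status_page_url}",
    ])
```

Because the same text is reused everywhere, the status page link always travels with the message.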
Mitigation & Monitoring
Pause non-critical tasks, queue background jobs, toggle feature flags. Record actions.
Track AWS recovery status and internal metrics. Update the timeline and ETA at each scheduled update.
Verify databases, queues, and integrations are healthy before marking resolved.
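The final verification step works as a gate: the incident is not marked resolved until every dependency check passes. A minimal Python sketch, assuming each check is a hypothetical zero-argument function that returns True when healthy; the real checks (database ping, queue depth, integration probe) depend on your stack and are not shown:

```python
def failing_checks(checks):
    """Return the names of checks that fail. `checks` maps name to callable."""
    failed = []
    for name, check in checks.items():
        try:
            healthy = bool(check())
        except Exception:
            # A check that raises is treated as unhealthy, not skipped.
            healthy = False
        if not healthy:
            failed.append(name)
    return failed


def can_mark_resolved(checks):
    """True only when every check passes."""
    return not failing_checks(checks)
```

Treating a crashing check as a failure is deliberate: during recovery, a probe that cannot run is a reason to keep the incident open, not to close it.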
Post-Incident Review
Apply learnings to this checklist, status templates, and support macros.