Cloud Outage Response
Detection & Verification
Better Stack alert triggered. Capture incident ID, start time, alert source, and initial scope.
Confirm AWS disruption and note affected regions/services. Record the official status URL.
Confirm alerting tools (Better Stack, Datadog, CloudWatch) aren’t producing false positives.
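The detection steps above can be sketched as a small record plus a cross-check between alerting tools. This is a minimal illustration, not part of the checklist: the `IncidentRecord` fields, the `likely_real_outage` helper, and the two-source threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IncidentRecord:
    """Fields the detection step asks you to capture (names are illustrative)."""
    incident_id: str
    start_time: datetime
    alert_source: str
    initial_scope: str
    affected_regions: list[str] = field(default_factory=list)
    status_url: str = ""  # official provider status URL, filled in once confirmed


def likely_real_outage(firing: dict[str, bool], min_sources: int = 2) -> bool:
    """Return True when at least `min_sources` alerting tools are firing.

    The threshold of 2 is an assumption; a single firing tool is treated
    as a possible false positive that needs manual confirmation.
    """
    return sum(1 for is_firing in firing.values() if is_firing) >= min_sources


# Hypothetical example values.
record = IncidentRecord(
    incident_id="example-001",
    start_time=datetime.now(timezone.utc),
    alert_source="Better Stack",
    initial_scope="Elevated API error rate",
)
signals = {"Better Stack": True, "Datadog": True, "CloudWatch": False}
confirmed = likely_real_outage(signals)
```

Requiring agreement between independent tools is one simple way to guard against a misfiring monitor; teams with a single trusted source may prefer a manual check instead.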
Internal Communication
Confirm Incident Commander, Comms Lead, Tech Lead, and Customer Support Lead.
Create the #incident-{{incident ID}} channel.
Post to the #incidents channel to start the conversation. Share the incident-specific channel with team members.
Set update cadence (e.g., every 15 min). Post kickoff with next-update time.
Notify CTO and CS lead; include link to incident channel and current impact.
Capture updates, hypotheses, actions, and timestamps.
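The channel naming, update cadence, and kickoff post described above could be automated along these lines. This is a sketch under assumptions: the function names are hypothetical, timestamps are assumed to be in UTC, and the slug rules reflect Slack's channel-name constraints as commonly documented (lowercase letters, digits, hyphens, underscores, at most 80 characters).

```python
import re
from datetime import datetime, timedelta, timezone


def incident_channel_name(incident_id: str) -> str:
    """Build a channel name following the #incident-{{incident ID}} pattern."""
    # Replace any run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9_-]+", "-", incident_id.lower()).strip("-")
    return f"incident-{slug}"[:80]


def next_update_time(last_update: datetime, cadence_minutes: int = 15) -> datetime:
    """Return when the next update is due, given the agreed cadence."""
    return last_update + timedelta(minutes=cadence_minutes)


def kickoff_message(incident_id: str, impact: str, now: datetime,
                    cadence_minutes: int = 15) -> str:
    """Compose the kickoff post, including the next-update time (assumes UTC)."""
    due = next_update_time(now, cadence_minutes)
    return (
        f"Incident {incident_id} declared. Impact: {impact}. "
        f"Channel: #{incident_channel_name(incident_id)}. "
        f"Next update at {due:%H:%M} UTC."
    )


# Hypothetical example values.
started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
message = kickoff_message("INC 42", "Login failures", started)
```

Stating the next-update time in the kickoff post gives the CTO and support leads a concrete expectation, which tends to reduce ad hoc status requests in the channel.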
Customer Communication
Summarize impact, affected features, workarounds, and next update time.
Post the message and ETA to the Better Stack status page. Link to the status page in all communications.
Show banner and message to impacted segments; link to status page.
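A customer-facing update with the four elements listed above (impact, affected features, workarounds, next update time) could be rendered from a small template. The sketch below is illustrative only: `render_status_update` and its parameters are hypothetical, the URL is a placeholder, and it only produces the text; posting it to the status page is left to whatever tooling you already use.

```python
from datetime import datetime, timezone


def render_status_update(impact: str, affected_features: list[str],
                         workarounds: list[str], next_update: datetime,
                         status_page_url: str) -> str:
    """Render a plain-text status update (next_update is assumed to be UTC)."""
    lines = [
        f"Impact: {impact}",
        "Affected features: " + ", ".join(affected_features),
        "Workarounds: " + ("; ".join(workarounds) if workarounds else "none at this time"),
        f"Next update: {next_update:%Y-%m-%d %H:%M} UTC",
        f"Status page: {status_page_url}",
    ]
    return "\n".join(lines)


# Hypothetical example values; the URL is a placeholder.
update = render_status_update(
    impact="Some users cannot sign in",
    affected_features=["authentication", "payments"],
    workarounds=[],
    next_update=datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc),
    status_page_url="https://status.example.com",
)
```

Reusing the same rendered text for the status page, the in-app banner, and support replies keeps the wording and the next-update time consistent across channels.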
Mitigation & Monitoring
Pause non-critical tasks, queue background jobs, toggle feature flags. Record actions.
Track AWS recovery status and internal metrics. Update timeline and ETA each cadence.
Verify databases, queues, and integrations are healthy before marking resolved.
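The final verification step can be treated as a gate: the incident is marked resolved only when every health check passes. The sketch below assumes each check is a zero-argument callable that returns True when healthy; `ready_to_resolve` and the check names are hypothetical, and the lambdas stand in for real probes.

```python
from typing import Callable


def ready_to_resolve(checks: dict[str, Callable[[], bool]]) -> tuple[bool, list[str]]:
    """Run every check and return (all_healthy, names_of_failing_checks).

    A check that raises is counted as failing, so a broken probe cannot
    accidentally let the incident be closed.
    """
    failing = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            healthy = False
        if not healthy:
            failing.append(name)
    return (not failing, failing)


# Hypothetical stand-ins for real database, queue, and integration probes.
ok, failing = ready_to_resolve({
    "databases": lambda: True,
    "queues": lambda: False,
    "integrations": lambda: True,
})
```

Returning the list of failing checks, rather than a bare boolean, gives the Tech Lead something concrete to record in the incident timeline.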
Post-Incident Review
Apply learnings to this checklist, status templates, and support macros.
Cloud outage response guide
Use this checklist to coordinate detection, communication, mitigation, and recovery during AWS, Azure, or GCP disruptions. Assign roles early, keep updates frequent, and record every action.
When to use
- Provider status page reports a service disruption
- Monitoring shows widespread failures across regions or services
- Customer-visible impact on authentication, payments, or core features
Before you begin
- Alerts wired: Better Stack, Datadog, CloudWatch
- Slack incident channel automation ready (#inc-[id])
- Status page template approved (PM + CTO)
