Cloud Outage Response
Detection & Verification
When a Better Stack alert triggers, capture the incident ID, start time, alert source, and initial scope.
Confirm AWS disruption and note affected regions/services. Record the official status URL.
Confirm alerting tools (Better Stack, Datadog, CloudWatch) aren’t producing false positives.
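One way to guard against false positives is to require that more than one independent alert source agrees before declaring an outage. The sketch below assumes each tool's alert state has already been fetched and reduced to a boolean. The function name `is_confirmed_outage` and the two-source threshold are illustrative assumptions, not part of any of these tools' APIs.

```python
def is_confirmed_outage(source_alerts: dict[str, bool], min_agreeing: int = 2) -> bool:
    """Return True when at least `min_agreeing` sources are currently alerting."""
    # Collect the names of the sources that report a failure.
    firing = [name for name, alerting in source_alerts.items() if alerting]
    return len(firing) >= min_agreeing


# Hypothetical snapshot: two of the three tools are alerting.
alerts = {"Better Stack": True, "Datadog": True, "CloudWatch": False}
confirmed = is_confirmed_outage(alerts)
```

A single alerting source would return `False` here, which is a cue to check that tool's own health before escalating.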
Internal Communication
Confirm Incident Commander, Comms Lead, Tech Lead, and Customer Support Lead.
Create a dedicated Slack channel named #incident-{{incident ID}}.
Post to the #incidents channel to start the conversation, and share the dedicated incident channel with team members.
Set update cadence (e.g., every 15 min). Post kickoff with next-update time.
Notify CTO and CS lead; include link to incident channel and current impact.
Capture updates, hypotheses, actions, and timestamps.
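The channel naming, update cadence, and timeline capture described above can be sketched as a few small helpers. This is a minimal illustration using only the Python standard library. The names `incident_channel`, `next_update_time`, `TimelineEntry`, and `IncidentLog` are hypothetical, and the sketch does not call Slack or any other service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


def incident_channel(incident_id: str) -> str:
    """Build the channel name; Slack channel names are lowercase with no spaces."""
    return "#incident-" + incident_id.strip().lower().replace(" ", "-")


def next_update_time(last_update: datetime, cadence_minutes: int = 15) -> datetime:
    """Return when the next update is due, given the agreed cadence."""
    return last_update + timedelta(minutes=cadence_minutes)


@dataclass
class TimelineEntry:
    timestamp: datetime
    kind: str  # e.g. "update", "hypothesis", or "action"
    note: str


@dataclass
class IncidentLog:
    incident_id: str
    entries: list[TimelineEntry] = field(default_factory=list)

    def record(self, kind: str, note: str, at: Optional[datetime] = None) -> TimelineEntry:
        """Append a timestamped entry; defaults to the current UTC time."""
        entry = TimelineEntry(at or datetime.now(timezone.utc), kind, note)
        self.entries.append(entry)
        return entry
```

Recording every entry with a UTC timestamp keeps the timeline unambiguous when responders are in different time zones.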
Customer Communication
Summarize impact, affected features, workarounds, and next update time.
Post to Better Stack status page with the message and ETA. Link in all comms.
Show banner and message to impacted segments; link to status page.
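To keep customer messages consistent across the status page, banner, and support replies, the summary fields above can be assembled by one helper. The sketch below is an assumption about how such a template might look. `build_status_message` is a hypothetical name, the status URL in the example is a placeholder, and `next_update` is assumed to be given in UTC.

```python
from datetime import datetime, timezone


def build_status_message(
    impact: str,
    affected_features: list[str],
    workarounds: list[str],
    next_update: datetime,
    status_url: str,
) -> str:
    """Format a customer-facing update from the fields the checklist asks for."""
    lines = [
        f"Impact: {impact}",
        "Affected features: " + ", ".join(affected_features),
        "Workarounds: " + ("; ".join(workarounds) if workarounds else "none at this time"),
        f"Next update: {next_update:%Y-%m-%d %H:%M} UTC",  # assumes a UTC datetime
        f"Status page: {status_url}",
    ]
    return "\n".join(lines)


message = build_status_message(
    impact="Some users cannot log in",
    affected_features=["authentication"],
    workarounds=[],
    next_update=datetime(2024, 1, 1, 12, 15, tzinfo=timezone.utc),
    status_url="https://status.example.com",  # placeholder URL
)
```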
Mitigation & Monitoring
Pause non-critical tasks, queue background jobs, toggle feature flags. Record actions.
Track AWS recovery status and internal metrics. Update timeline and ETA each cadence.
Verify databases, queues, and integrations are healthy before marking resolved.
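The final verification step can be treated as a gate: the incident is marked resolved only when every health check passes. The sketch below assumes each check is a callable that returns `True` when healthy. `ready_to_resolve` and the check names are hypothetical, and the real probes for databases, queues, and integrations would need to be supplied by the team.

```python
from typing import Callable


def ready_to_resolve(checks: dict[str, Callable[[], bool]]) -> tuple[bool, list[str]]:
    """Run every check and return (all_healthy, names_of_failing_checks)."""
    failing = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            # A check that raises is treated as unhealthy rather than skipped.
            healthy = False
        if not healthy:
            failing.append(name)
    return (not failing, failing)


# Hypothetical checks standing in for real probes.
ok, failing = ready_to_resolve({
    "databases": lambda: True,
    "queues": lambda: False,
    "integrations": lambda: True,
})
```

Returning the list of failing checks, rather than only a boolean, gives the Tech Lead something concrete to record in the timeline.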
Post-Incident Review
Apply learnings to this checklist, status templates, and support macros.
Cloud outage response guide
Use this checklist to coordinate detection, communication, mitigation, and recovery during AWS, Azure, or GCP disruptions. Assign roles early, keep updates frequent, and record every action.
When to use
- Provider status page reports a service disruption
- Monitoring shows widespread failures across regions or services
- Customer-visible impact such as authentication, payments, or core features
Before you begin
- Alerts wired: Better Stack, Datadog, CloudWatch
- Slack incident channel automation ready (#inc-[id])
- Status page template approved (PM + CTO)
