Cloud Outage Response

Detection & Verification

    When a Better Stack alert triggers, capture the incident ID, start time, alert source, and initial scope.
    Confirm the AWS disruption, note the affected regions and services, and record the official status URL.
    Cross-check alerting tools (Better Stack, Datadog, CloudWatch) to rule out false positives.
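The detection facts above can be kept in one structured record so nothing is lost when the channel gets busy. This is a minimal sketch: the `IncidentRecord` name, its fields, and the "two tools agree" verification rule are hypothetical choices for illustration, not part of Better Stack, Datadog, or CloudWatch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class IncidentRecord:
    """Facts captured at detection time (hypothetical schema)."""

    incident_id: str
    alert_source: str            # e.g. "Better Stack"
    initial_scope: str           # free-text summary of what looks affected
    start_time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    affected_regions: List[str] = field(default_factory=list)
    affected_services: List[str] = field(default_factory=list)
    provider_status_url: Optional[str] = None
    confirmed_by: List[str] = field(default_factory=list)  # tools that corroborate the alert

    def is_verified(self) -> bool:
        # Assumed rule of thumb: the provider status URL is recorded and at
        # least two distinct monitoring tools corroborate the alert.
        return self.provider_status_url is not None and len(set(self.confirmed_by)) >= 2
```

Requiring corroboration from more than one tool is one way to act on the false-positive check; adjust the threshold to match your own monitoring setup.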

Internal Communication

    Assign and confirm the Incident Commander, Comms Lead, Tech Lead, and Customer Support Lead.

    Create the incident channel #inc-{{incident ID}}.

    Post in #incidents to open the conversation, and share the new incident channel with team members.

    Set an update cadence (e.g., every 15 minutes) and post a kickoff message that states the next update time.

    Notify the CTO and CS lead; include a link to the incident channel and a summary of current impact.
    Log updates, hypotheses, actions, and their timestamps as they happen.
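The internal-communication steps involve three small mechanical tasks: naming the channel, computing when the next update is due, and keeping a timestamped log. The sketch below is illustrative only; the function and class names are hypothetical, and the channel prefix is a parameter so it can match whatever naming your automation uses (the guide mentions `#inc-[id]`). The sanitizing rule follows Slack's documented constraint that channel names use only lowercase letters, numbers, hyphens, and underscores, up to 80 characters.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def incident_channel_name(incident_id: str, prefix: str = "inc") -> str:
    """Build a Slack-safe channel name such as 'inc-2024-001'."""
    # Replace any run of disallowed characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9_-]+", "-", incident_id.lower()).strip("-")
    return f"{prefix}-{slug}"[:80]  # Slack caps channel names at 80 characters


def next_update_due(last_update: datetime, cadence_minutes: int = 15) -> datetime:
    """Return when the next status update is due under a fixed cadence."""
    return last_update + timedelta(minutes=cadence_minutes)


class Timeline:
    """Append-only log of updates, hypotheses, and actions (hypothetical helper)."""

    KINDS = {"update", "hypothesis", "action"}

    def __init__(self) -> None:
        self.entries: List[Tuple[datetime, str, str]] = []

    def log(self, kind: str, text: str, at: Optional[datetime] = None) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown entry kind: {kind}")
        self.entries.append((at or datetime.now(timezone.utc), kind, text))

    def render(self) -> str:
        # One line per entry, with times normalized to UTC.
        return "\n".join(
            f"{ts.astimezone(timezone.utc):%H:%M}Z [{kind}] {text}"
            for ts, kind, text in self.entries
        )
```

Rendering the log in UTC keeps timestamps unambiguous when responders sit in different time zones.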

Customer Communication

    Summarize the impact, affected features, workarounds, and next update time.
    Post the message and ETA to the Better Stack status page, and link to it in all communications.
    Show a banner and message to impacted customer segments, linking to the status page.
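A fixed message shape makes customer updates consistent across the status page, banners, and support replies. The sketch below renders the fields this section lists; the function name, field labels, and fallback wording are hypothetical and should be replaced by your approved status page template.

```python
from datetime import datetime, timezone
from typing import List, Optional


def status_message(
    impact: str,
    affected_features: List[str],
    workarounds: List[str],
    next_update: datetime,
    status_url: str,
    eta: Optional[datetime] = None,
) -> str:
    """Render a plain-text customer update (hypothetical template)."""

    def utc(ts: datetime) -> str:
        return f"{ts.astimezone(timezone.utc):%Y-%m-%d %H:%M} UTC"

    lines = [
        f"Impact: {impact}",
        "Affected features: " + (", ".join(affected_features) or "under investigation"),
        "Workarounds: " + ("; ".join(workarounds) or "none available yet"),
        "Estimated resolution: " + (utc(eta) if eta else "not yet known"),
        f"Next update by: {utc(next_update)}",
        f"Status page: {status_url}",
    ]
    return "\n".join(lines)
```

Keeping the status page link as the last line makes it easy to copy into banners and support macros.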

Mitigation & Monitoring

    Pause non-critical tasks, queue background jobs, and toggle feature flags as needed. Record each action.
    Track AWS recovery status and internal metrics. Update the timeline and ETA at each update interval.
    Verify databases, queues, and integrations are healthy before marking resolved.
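The "healthy before resolved" rule can be expressed as a gate that runs every named check and reports which ones fail. This is a sketch under assumptions: the check functions are placeholders you would wire to real probes (for example a database ping or a queue-depth threshold), and treating an exception as a failure is a design choice, not a requirement from the source.

```python
from typing import Callable, Dict, List, Tuple


def ready_to_resolve(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run each named health check; allow resolution only if all pass."""
    failing: List[str] = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            # A check that crashes counts as unhealthy rather than aborting the gate.
            ok = False
        if not ok:
            failing.append(name)
    return (not failing, failing)
```

Returning the list of failing checks, not just a boolean, tells the Tech Lead exactly what still blocks resolution.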

Post-Incident Review

    Apply lessons learned to this checklist, the status templates, and support macros.

Cloud outage response guide

Use this checklist to coordinate detection, communication, mitigation, and recovery during AWS, Azure, or GCP disruptions. Assign roles early, keep updates frequent, and record every action.

When to use

  • Provider status page reports a service disruption
  • Monitoring shows widespread failures across regions or services
  • Customer-visible impact on authentication, payments, or core features

Before you begin

  • Alerts configured: Better Stack, Datadog, CloudWatch
  • Slack incident channel automation ready (#inc-[id])
  • Status page template approved (PM + CTO)
