Deployment Plan Checklist

Pre-Deployment Preparation

    Branch from main and tag the release candidate with a semver-formatted tag (e.g., v2024.45.0-rc.1). Confirm every PR merged since the last release has a changelog entry: release-notes drift is the single most common day-of surprise.
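
    The changelog check above can be automated. The following is a minimal sketch, assuming changelog entries reference their PR as "#<number>"; that convention and the function name prs_missing_changelog are illustrative assumptions, not part of this checklist.

```python
import re
from typing import Iterable, List


def prs_missing_changelog(merged_prs: Iterable[int], changelog_text: str) -> List[int]:
    """Return merged PR numbers that are never referenced as '#<number>' in the changelog."""
    # Collect every PR number the changelog mentions (assumed '#123' style).
    referenced = {int(n) for n in re.findall(r"#(\d+)", changelog_text)}
    # Anything merged but not referenced is a candidate for release-notes drift.
    return sorted(pr for pr in set(merged_prs) if pr not in referenced)
```

    Feeding it the PR numbers merged since the last tag and the changelog text yields the list of PRs to chase before the release is cut.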

    Deploy the release candidate to staging and run the full Playwright/Cypress e2e suite. Investigate every failure instead of rerunning until green; a rerun-the-flaky-test habit hides real regressions on the release path. Have QA smoke-test critical user paths (login, billing, primary workflow) on the staged build.

    Verify the previous container image is still in the registry (not pruned), the previous Helm release is rollback-able, and any DB migration in this release is reversible — or has a documented forward-fix. A rollback plan that's never been tested isn't a rollback plan.

    Send the release window and a plain-language summary of customer-visible changes to #support. Flag any deprecations, UI changes, or known issues so first-line support isn't surprised by a ticket spike.

Release Day Pre-Deploy

    Check PagerDuty / Incident.io for active incidents. Deploying on top of a live SEV1 conflates the rollback signal with the existing incident and turns a ten-minute revert into a two-hour war room.

    Announce the release window, scope, release captain, and rollback contact. Lock main to release-blocking PRs only for the duration of the window so a stray merge doesn't ride along untested.

    Confirm both the release captain and the primary on-call engineer are at keyboards and not in conflicting meetings. Avoid Friday-afternoon and end-of-shift deploys: if the change pages someone at 3am, the people responding should be the people who shipped it.

Deploy

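    Apply schema changes before the application deploy so the new code lands on a compatible schema. For Postgres, use CREATE INDEX CONCURRENTLY (which cannot run inside a transaction block) and avoid changes that rewrite large tables, such as ADD COLUMN with a volatile default (or any default before PostgreSQL 11), because the rewrite holds an ACCESS EXCLUSIVE lock for its full duration. On PostgreSQL 11 and later, ADD COLUMN with a constant default is a metadata-only change. Watch replication lag throughout.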
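
    A pre-merge lint can catch the two Postgres hazards named above. This is a rough sketch, assuming PostgreSQL 11 or later (where only volatile column defaults force a table rewrite); the function name lint_migration, the naive split on semicolons, and the short list of volatile functions are all illustrative assumptions rather than a complete parser.

```python
import re
from typing import List

# Functions that return a different value per call; as a column default they
# force a table rewrite. Illustrative list, not exhaustive.
VOLATILE_DEFAULTS = ("random()", "clock_timestamp()", "gen_random_uuid()", "uuid_generate_v4()")


def lint_migration(sql: str) -> List[str]:
    """Return warnings for statements likely to block writes on a large table."""
    warnings = []
    # Naive statement split: good enough for simple DDL files, wrong for
    # semicolons inside string literals or function bodies.
    for statement in sql.split(";"):
        lowered = " ".join(statement.lower().split())
        if not lowered:
            continue
        # A plain CREATE INDEX blocks writes to the table while it builds.
        if re.match(r"create (unique )?index (?!concurrently)", lowered):
            warnings.append(f"use CREATE INDEX CONCURRENTLY: {lowered}")
        # A volatile default must be evaluated per row, which rewrites the table.
        if "add column" in lowered and " default " in lowered:
            if any(fn in lowered for fn in VOLATILE_DEFAULTS):
                warnings.append(f"volatile default forces a table rewrite: {lowered}")
    return warnings
```

    Run against a migration file in CI, a non-empty result is a prompt for human review, not proof that the migration is unsafe.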

    Route 5% of production traffic to the new backend image. Watch the error-rate and p99 latency dashboards for at least 10 minutes before promoting. The canary catches schema-mismatch and config-drift bugs that slipped past staging.
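
    The promotion decision can be written down as an explicit gate so it is not made by feel. A minimal sketch follows; the Metrics type, the function canary_verdict, and the threshold values (0.5 percentage points of extra errors, 20% extra p99 latency) are hypothetical and should be tuned per service. Only the 10-minute minimum comes from the step above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float    # 99th-percentile latency in milliseconds


def canary_verdict(baseline: Metrics, canary: Metrics, observed_minutes: float,
                   min_minutes: float = 10.0,
                   max_error_rate_delta: float = 0.005,
                   max_latency_ratio: float = 1.2) -> str:
    """Return 'rollback', 'hold', or 'promote' for the canary."""
    error_regressed = canary.error_rate - baseline.error_rate > max_error_rate_delta
    latency_regressed = canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio
    # A regression ends the canary immediately, regardless of elapsed time.
    if error_regressed or latency_regressed:
        return "rollback"
    # Healthy so far, but not observed long enough to trust.
    if observed_minutes < min_minutes:
        return "hold"
    return "promote"
```

    Comparing the canary against the current production baseline, rather than against fixed absolute limits, keeps the gate meaningful when overall traffic is already degraded.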

    Roll out 25% → 50% → 100%, watching golden signals (latency, traffic, errors, saturation) at each step. Hold at each percentage for at least 5 minutes; a regression that affects 1% of requests doesn't show at 5% traffic in 60 seconds.
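
    The reasoning behind the hold time is simple arithmetic: the number of failing requests you can expect to see is total request rate × share of traffic on the new version × share of requests the regression breaks × hold time. A small sketch, assuming a hypothetical production load of 200 requests per second:

```python
def expected_failures(total_rps: float, traffic_fraction: float,
                      failure_fraction: float, hold_seconds: float) -> float:
    """Expected number of failing requests hitting the new version during a hold."""
    return total_rps * traffic_fraction * failure_fraction * hold_seconds


if __name__ == "__main__":
    TOTAL_RPS = 200.0  # hypothetical load; substitute your own
    for fraction, seconds in [(0.05, 60), (0.05, 300), (0.25, 300)]:
        n = expected_failures(TOTAL_RPS, fraction, 0.01, seconds)
        print(f"{fraction:.0%} traffic for {seconds}s: about {n:.0f} failing requests")
```

    Under that assumed load, a regression breaking 1% of requests produces only about 6 failures in one minute at 5% traffic, which is easy to lose in background noise; five minutes at 25% yields about 150, which is hard to miss.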

    Ship the frontend only after the backend is fully rolled out: the new backend stays backward-compatible with the old frontend, but the old backend is not guaranteed to support the new frontend. Invalidate the CDN cache for the index document and confirm the new bundle hash is being served.
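
    Confirming the served bundle hash can be scripted against the fetched index document. A minimal sketch, assuming the build emits a content-hashed entry bundle named like main.<hex-hash>.js; that naming scheme and both function names are assumptions, so adjust the pattern to your bundler's output.

```python
import re
from typing import Optional

# Assumed bundle naming: main.<at least 8 hex chars>.js referenced from a src attribute.
BUNDLE_RE = re.compile(r'src="[^"]*main\.([0-9a-f]{8,})\.js"')


def served_bundle_hash(index_html: str) -> Optional[str]:
    """Extract the entry bundle's content hash from the index document, if present."""
    match = BUNDLE_RE.search(index_html)
    return match.group(1) if match else None


def is_new_bundle_live(index_html: str, expected_hash: str) -> bool:
    """True when the index document references the bundle built for this release."""
    return served_bundle_hash(index_html) == expected_hash
```

    Fetch the index document through the CDN (not from the origin) before calling this, since a stale edge cache is exactly what the check is meant to catch.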

    Execute the synthetic user journey against production: login, primary workflow, billing-read, logout. A green smoke test is the gate for declaring deploy success; a red smoke test triggers the rollback path.

Rollback

    Page the release captain and post a rollback declaration so the team stops merging and starts watching. If customers are affected, open an incident in PagerDuty/Incident.io and assign an IC — don't try to roll back and run comms simultaneously.

    Redeploy the prior known-good image tag (recorded pre-deploy). For Helm, run helm rollback <release> <previous-revision>. If the migration is irreversible, deploy a forward-fix image instead; never run a destructive down-migration against live data.

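    Watch the error-rate and latency dashboards for 10 minutes after rollback. If the rollback doesn't restore baseline metrics, the regression probably lies outside the application image (CDN, third party, infrastructure, or a migration or flag change the rollback did not revert), so widen the investigation beyond the deploy.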

Post-Deploy Monitoring

    Hold the release-captain seat for 30 minutes post-deploy, eyes on Datadog/Grafana. Watch for slow regressions that didn't trigger the canary — N+1 queries on a code path that only fires for paying customers, for example.

    Filter Sentry to errors first seen in the last hour. New signatures from the deploy show up here before they show up in dashboards or support tickets. Triage critical and high severity to a hotfix ticket immediately.
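
    The same filter can be expressed in code once issues have been exported from the error tracker. The sketch below works on plain dictionaries with hypothetical keys (id, first_seen, severity); it does not call the Sentry API, and the severity labels are assumed.

```python
from datetime import datetime, timedelta
from typing import Any, Iterable, List, Mapping

# Assumed severity labels, most urgent first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def new_error_signatures(issues: Iterable[Mapping[str, Any]], now: datetime,
                         window: timedelta = timedelta(hours=1)) -> List[Mapping[str, Any]]:
    """Return issues first seen within the window, most severe first."""
    cutoff = now - window
    # Keep only signatures that did not exist before the deploy window.
    fresh = [issue for issue in issues if issue["first_seen"] >= cutoff]
    # Unknown severities sort last so they are not mistaken for urgent ones.
    return sorted(fresh, key=lambda issue: SEVERITY_ORDER.get(issue["severity"], len(SEVERITY_ORDER)))
```

    Filtering on first-seen time rather than last-seen time is the point: long-standing noisy errors drop out, leaving only signatures the release could have introduced.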

    Enable any flags scheduled to flip with this release in LaunchDarkly / Unleash / Statsig. Flip gradually (internal users → 1% → 10% → 100%) for customer-visible changes; a flag flip is a deploy, not a config change.
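
    A gradual flip relies on stable bucketing: each user hashes to a fixed bucket, so raising the percentage only adds users and never removes anyone who already has the feature. The following is a generic sketch of that idea, not the algorithm used by LaunchDarkly, Unleash, or Statsig; the function names and the 10,000-bucket resolution are assumptions.

```python
import hashlib
from typing import AbstractSet


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def flag_enabled(flag_key: str, user_id: str, rollout_percent: float,
                 internal_users: AbstractSet[str] = frozenset()) -> bool:
    """Decide whether the flag is on for this user at the current rollout stage."""
    # Internal users always see the feature first.
    if user_id in internal_users:
        return True
    # 1% covers buckets 0..99, 10% covers 0..999, 100% covers all of them.
    return bucket(flag_key, user_id) < rollout_percent * 100
```

    Including the flag key in the hash gives each flag its own user ordering, so the same small group of users is not the first to receive every new feature.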

    Coordinate with the support lead — a ticket spike on a specific workflow within 60 minutes of deploy is a regression signal even when dashboards are green. User-reported issues often precede metric anomalies for low-frequency code paths.

Wrap-Up

    Promote the rc tag to the final release tag (e.g., v2024.45.0) on the deployed sha. This is the artifact bisect tooling and rollback playbooks rely on — skipping it makes the next post-mortem harder.
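
    Tag promotion is mechanical enough to script. A minimal sketch, assuming tags shaped like v<major>.<minor>.<patch>-rc.<n> as in the example above; the helper names are hypothetical, and the function only builds the git command so it can be reviewed before anything runs.

```python
import re
from typing import List

# Assumed rc tag shape, e.g. v2024.45.0-rc.1
RC_TAG_RE = re.compile(r"(v\d+\.\d+\.\d+)-rc\.\d+")


def final_tag_for(rc_tag: str) -> str:
    """Strip the -rc.<n> suffix; reject anything that is not an rc tag."""
    match = RC_TAG_RE.fullmatch(rc_tag)
    if match is None:
        raise ValueError(f"not a release-candidate tag: {rc_tag!r}")
    return match.group(1)


def tag_command(rc_tag: str, deployed_sha: str) -> List[str]:
    """Build the git command that places the final tag on the deployed commit."""
    final_tag = final_tag_for(rc_tag)
    message = f"Release {final_tag} (promoted from {rc_tag})"
    return ["git", "tag", "-a", final_tag, deployed_sha, "-m", message]
```

    Passing the deployed sha explicitly, instead of tagging whatever is checked out, keeps the final tag pinned to the commit that is actually running in production.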

    Update the public changelog and the in-app release notes. Update the status page if a user-visible feature shipped. Send the release summary to #engineering and #support so everyone sees what's now in production.

    If the deploy went sideways — rollback, hotfix, smoke-test failure, customer impact — schedule a blameless PIR within 5 business days. Track action items (alert tuning, missing dashboard, automation gap) to closure in Jira/Linear.
