Deployment Checklist
Release checklist for a SaaS engineering team running a scheduled production deploy, from pre-release QA and rollback prep through canary rollout, post-deploy monitoring, and wrap-up. Run by the release captain with the on-call engineer as backup.
Pre-Release Preparation
-
Cut the release branch and tag the RC
Branch from main and push a release-candidate tag (e.g., v2024.45.0-rc.1) that uses semver pre-release syntax. The release captain owns this step. If a late PR merges after the cut, re-tag the RC so the build you test is the build you ship.
-
Verify changelog entries for merged PRs
Cross-check PRs merged since the last release tag against the changelog. Flag breaking changes, deprecations, and customer-visible behavior so support and docs aren't surprised on release day.
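The cross-check above can be partly automated. The following is a minimal sketch, assuming the changelog cites PRs as "#<number>" and that the list of merged PR numbers has already been gathered (for example from the Git host or `git log`); the function name and sample data are hypothetical.

```python
import re


def missing_changelog_entries(merged_prs, changelog_text):
    """Return merged PR numbers that the changelog never mentions, sorted."""
    # Assumption: changelog entries reference PRs as "#<number>", e.g. "(#1042)".
    mentioned = {int(n) for n in re.findall(r"#(\d+)\b", changelog_text)}
    return sorted(set(merged_prs) - mentioned)


if __name__ == "__main__":
    # Hypothetical changelog excerpt and PR list, for illustration only.
    changelog = (
        "## v2024.45.0\n"
        "- Breaking: remove legacy export endpoint (#1042)\n"
        "- Fix invoice rounding (#1050)\n"
    )
    print(missing_changelog_entries([1042, 1047, 1050], changelog))
```

Any PR number this returns still needs a human to decide whether it is customer-visible; the script only finds gaps.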
-
Run the full e2e suite against the RC build
Deploy the RC to staging and run Playwright/Cypress e2e + integration tests. Do not ignore flakes — re-run once, then triage any persistent red. Required status checks must be green before sign-off.
-
QA smoke-test the critical user paths
QA exercises sign-up, login, billing, and the top-three customer workflows in staging. Capture any regressions in Linear/Jira tickets linked to the release ticket.
-
Confirm the rollback plan is viable
Verify that the previous container image is still in the registry (not pruned), that the DB migration is reversible or makes no destructive change, and that the rollback runbook references the correct commit SHA. The quarterly rollback drill is what proves the procedure works; release-day verification only confirms that the artifacts exist.
-
Notify support of customer-visible changes
Send a release summary to #customer-support with the deploy window, scope, and any new error messages or UI changes. Surprise UI changes drive ticket spikes that could have been pre-empted.
Database Migration Review
-
Review the migration for table locks
On large tables, ADD COLUMN with a default, ALTER COLUMN, or non-concurrent index creation can hold locks that block writes, and for some operations reads too, for hours. Use CREATE INDEX CONCURRENTLY, split column adds from defaults, and batch backfills with sleeps between batches to keep replication lag bounded.
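A batched backfill with pauses might look like the sketch below. It uses an in-memory SQLite database only so the example runs anywhere; the `accounts` table, the `plan_tier` column, the batch size, and the pause length are all hypothetical, and a production run would use your own database driver and tuned values.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.5):
    """Fill accounts.plan_tier in small batches; return the rows updated."""
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE accounts SET plan_tier = 'free' "
            "WHERE id IN (SELECT id FROM accounts "
            "WHERE plan_tier IS NULL ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch keeps locks brief
        if cur.rowcount == 0:
            return total  # nothing left to backfill
        total += cur.rowcount
        time.sleep(pause_seconds)  # give replicas time to catch up


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # The column is added as nullable first; the default is applied by the backfill.
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, plan_tier TEXT)")
    conn.executemany("INSERT INTO accounts (id) VALUES (?)", [(i,) for i in range(1, 26)])
    conn.commit()
    print(backfill_in_batches(conn, batch_size=10, pause_seconds=0))
```

Committing after each batch is the design choice that matters here: a single UPDATE over the whole table would hold its locks and generate replication traffic in one burst.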
-
Dry-run the migration against a prod-sized snapshot
Restore last night's prod backup into the staging DB and run the migration end-to-end. Record duration and peak replication lag so the on-call knows what to expect during the real run.
-
Confirm the backward-compatibility strategy
Backend code must read from both old and new schema during the rollout window — expand-migrate-contract, not a flip. Confirm the contract step is scheduled for a follow-up release, not this one.
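As an illustration of reading from both schemas during the rollout window, here is a minimal sketch. It assumes a hypothetical migration that replaces `first_name` / `last_name` with a single `display_name` column, and that rows arrive as dictionaries; none of these names come from the checklist itself.

```python
def read_display_name(row):
    """Prefer the new column; fall back to the old columns while both exist."""
    new_value = row.get("display_name")
    if new_value is not None:
        return new_value
    # Old schema (hypothetical legacy columns), still populated until
    # the contract step removes them in a follow-up release.
    parts = [row.get("first_name"), row.get("last_name")]
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(read_display_name({"display_name": "Ada L."}))
    print(read_display_name({"first_name": "Ada", "last_name": "Lovelace"}))
```

Once the contract step ships, the fallback branch can be deleted along with the old columns.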
Release Day Pre-Deploy
-
Confirm no active SEV1 or SEV2 incidents
Check PagerDuty and the incidents channel. Don't deploy on top of an active incident — you'll obscure the contributing factor and complicate rollback.
-
Confirm release captain and on-call availability
Both the release captain and the primary on-call should be at a keyboard for the duration of the deploy plus the 30-minute monitoring window. No deploys with the on-call boarding a flight.
-
Post the deploy window to #engineering
Announcement includes the window, scope summary, RC tag, and rollback contact. Lock main to release-blocking PRs only until post-deploy monitoring completes.
Deploy
-
Run the database migration
Apply the migration before the backend deploy so the new code lands on a compatible schema. Watch replication lag in Datadog/Grafana — pause if lag climbs past the SLO.
-
Deploy the backend canary at 5% traffic
Route 5% of traffic via the load balancer or service mesh weight. Watch error rate, p99 latency, and saturation for 10 minutes against the golden-signals dashboard.
-
Roll out the backend to 25%, 50%, then 100%
Step the weight up gradually with a watch interval at each stage. If the error rate or latency dashboards show regression at any step, hold and investigate before proceeding.
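The canary and staged-rollout steps above can be sketched as a loop with a gate at each stage. This is an illustration under stated assumptions, not a definitive implementation: `set_weight` and `read_signals` are hypothetical callables you would wire to your load balancer or mesh and to your metrics backend, and the thresholds are placeholders to replace with your own limits.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    error_rate: float       # fraction of failed requests, e.g. 0.002
    p99_latency_ms: float


def is_regression(baseline, current, max_error_delta=0.005, max_latency_ratio=1.2):
    """True if errors or p99 latency exceed the baseline by the allowed margin."""
    return (
        current.error_rate - baseline.error_rate > max_error_delta
        or current.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio
    )


def run_rollout(set_weight, read_signals, baseline, stages=(5, 25, 50, 100)):
    """Step through the stages; return the weight where the rollout held, or None."""
    for weight in stages:
        set_weight(weight)
        # A real run would wait out the watch interval here before reading.
        if is_regression(baseline, read_signals()):
            return weight  # hold at this stage and investigate
    return None


if __name__ == "__main__":
    baseline = Signals(error_rate=0.001, p99_latency_ms=200.0)
    applied = []
    held_at = run_rollout(applied.append, lambda: Signals(0.001, 210.0), baseline)
    print(held_at, applied)
```

Returning the held weight instead of raising keeps the decision to roll back or investigate with the release captain.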
-
Deploy the frontend bundle
The frontend ships after the backend is at 100%: the new backend still serves the old frontend, but the new frontend assumes the new API. Invalidate CDN caches per the release runbook.
-
Run the post-deploy synthetic smoke test
Hit the production synthetic that exercises sign-up, login, and a representative API call. A green CI suite plus a failing prod synthetic usually points to config or environment drift rather than a code defect.
Rollback
-
Declare a rollback and page the on-call
Open an incident in PagerDuty/Incident.io, name an IC, and post to #incidents. Don't try to debug forward — get back to the last known-good state first, then investigate.
-
Redeploy the previous container image
Use the previous image tag captured pre-deploy. If a migration ran, follow the documented down-migration or use the expand-migrate-contract fallback to keep the old code compatible with the new schema.
-
Confirm error rate returns to baseline
Watch the golden-signals dashboard for 15 minutes after rollback. Update the status page once metrics are clean.
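One way to make "returns to baseline" concrete is to require a sustained run of clean samples instead of a single good reading. The sketch below assumes one error-rate sample per minute and a 10% tolerance; both values and the function name are illustrative assumptions.

```python
def back_to_baseline(samples, baseline, tolerance=0.10, window=15):
    """True if the last `window` samples are all within `tolerance` of baseline."""
    if len(samples) < window:
        return False  # not enough post-rollback data yet
    limit = baseline * (1 + tolerance)
    return all(s <= limit for s in samples[-window:])


if __name__ == "__main__":
    # Three elevated minutes right after rollback, then fifteen clean ones.
    post_rollback = [0.05, 0.03, 0.01] + [0.001] * 15
    print(back_to_baseline(post_rollback, baseline=0.001))
```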
Post-Deploy Monitoring
-
Watch error rate and p99 for 30 minutes
Stay on the golden-signals dashboard — latency, traffic, errors, saturation. Sentry / Bugsnag spikes by release tag are the fastest signal of a regression that synthetic tests didn't catch.
-
Flip planned feature flags
Only flip flags listed in the release plan, one at a time, with a few minutes between each so you can attribute any regression to the specific flip. Note the flag owner so cleanup happens in the next quarterly flag review.
-
Watch the support inbound for spikes
Coordinate with the support lead on a Zendesk/Intercom view filtered to the deploy window. A 3x ticket spike in 30 minutes is a release signal even if dashboards look green.
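The 3x rule can be checked mechanically once ticket timestamps are exported. The sketch below assumes the baseline is expressed as the typical number of tickets per 30-minute window; how you measure that baseline, and the function name, are assumptions for illustration.

```python
from datetime import datetime, timedelta


def is_ticket_spike(ticket_times, deploy_time, baseline_per_window,
                    factor=3.0, window=timedelta(minutes=30)):
    """True if tickets in the window after deploy reach factor x the baseline."""
    window_end = deploy_time + window
    count = sum(1 for t in ticket_times if deploy_time <= t < window_end)
    return count >= factor * baseline_per_window


if __name__ == "__main__":
    deploy = datetime(2024, 11, 5, 14, 0)  # hypothetical deploy time
    tickets = [deploy + timedelta(minutes=m) for m in (1, 2, 5, 8, 12, 20, 25, 29, 29, 45)]
    print(is_ticket_spike(tickets, deploy, baseline_per_window=3))
```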
Wrap-Up
-
Tag the deployed sha as the release version
Promote the RC tag to the GA version (e.g., v2024.45.0). Push the tag and confirm the release shows up in GitHub Releases / GitLab Releases.
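Deriving the GA tag from the RC tag in code avoids typos on release day. This is a minimal sketch, assuming tags follow the `vX.Y.Z-rc.N` shape used in the examples above; the helper name is hypothetical.

```python
import re

# Matches tags shaped like v2024.45.0-rc.1 and captures the GA part.
RC_TAG = re.compile(r"^(v\d+\.\d+\.\d+)-rc\.\d+$")


def ga_tag_from_rc(rc_tag):
    """Return the GA tag for a release-candidate tag, or raise ValueError."""
    match = RC_TAG.match(rc_tag)
    if match is None:
        raise ValueError(f"not a release-candidate tag: {rc_tag!r}")
    return match.group(1)


if __name__ == "__main__":
    print(ga_tag_from_rc("v2024.45.0-rc.1"))
```

The resulting name can then be applied to the deployed commit with `git tag -a` and published with `git push origin <tag>`.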
-
Publish release notes and update the changelog
Customer-visible release notes go to the public changelog or status page. Internal-only details (infra changes, refactors) stay in the engineering changelog.
-
File tickets for issues found during deploy
Hotfix candidates, runbook gaps, dashboard gaps, alert tuning — log them in Linear/Jira while the context is fresh. These become the action items if a retro is scheduled.
-
Capture release outcome and retro decision
Record whether the release went clean, had issues but recovered, or required rollback. If anything went sideways, schedule a blameless PIR within 5 business days while contributing factors are still recoverable.