Release Planning Checklist
Two-week run-up to a production release: backlog refinement through deployment readiness. The release captain owns the checklist; tech leads, QA, and product collaborate on individual steps.
Backlog Review & Refinement
-
Confirm tickets have acceptance criteria
Walk the Jira/Linear board for tickets tagged to this release. Each story needs a measurable AC and a definition of done — vague tickets ("improve dashboard performance") are a common source of scope creep mid-sprint. Kick anything ambiguous back to product before locking scope.
-
Flag cross-team dependencies and blockers
Identify tickets that depend on platform, data, or external API changes. A blocked ticket discovered on release day is the most common reason scope ships short. Link the dependency in the ticket and notify the upstream team's tech lead.
-
Tag candidate tickets with the release version
Apply the fix-version label (e.g., v2024.45.0) so the release notes script can pull the right changelog entries automatically.
Release Scope Definition
-
Lock release scope with product and tech lead
Hold a 30-minute scope-lock meeting with PM and tech lead. After this meeting, additions require a written exception from the release captain — otherwise scope creep eats the QA window.
-
Inventory feature flags for gradual rollout
List every flag this release introduces or flips. Include the flag key, the rollout plan (5% canary → 25% → 100%), and the named owner who will clean it up. Flags without owners become 18-month dead code paths.
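Recording the inventory as data makes the "every flag has an owner" rule checkable. The following is a minimal sketch, not a prescribed format: the `FlagEntry` type, its field names, and the example flag keys and owners are all hypothetical and are not tied to any particular feature-flag service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagEntry:
    """One row of the release's flag inventory (hypothetical shape)."""
    key: str
    rollout_percentages: tuple  # e.g. (5, 25, 100)
    owner: str  # named person who will clean the flag up


def inventory_problems(flags):
    """Return one human-readable problem per incomplete inventory entry."""
    problems = []
    for flag in flags:
        if not flag.owner.strip():
            problems.append(f"{flag.key}: no named owner")
        stages = list(flag.rollout_percentages)
        if not stages or stages[-1] != 100:
            problems.append(f"{flag.key}: rollout plan does not end at 100%")
        if stages != sorted(set(stages)):
            problems.append(f"{flag.key}: rollout stages are not strictly increasing")
    return problems


# Illustrative data only.
flags = [
    FlagEntry("new-checkout-flow", (5, 25, 100), "alice"),
    FlagEntry("dashboard-cache", (5, 25), ""),
]
for problem in inventory_problems(flags):
    print(problem)
```

An empty result means every flag has an owner and a rollout plan that reaches 100%.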
-
Capture whether the release ships a DB migration
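If any merged PR includes a schema change, an index addition, or a data backfill, answer Yes. Migrations need their own pre-flight section: on Postgres versions before 11, adding a column with a default to a 50M-row table rewrites the whole table under an exclusive lock, and on newer versions a volatile default (one computed per row, such as a random value) still forces the rewrite.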
Risk & Security Review
-
Review SCA findings for new dependencies
Pull the Snyk or Dependabot report for the release branch. Triage anything CVSS 7+ before merge; defer lower-severity transitive findings only if the package is not on a request path. Document the deferral in the ticket.
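The triage rule above can be written down as a small decision function. This is a sketch under stated assumptions: the `Finding` fields are hypothetical and do not mirror the Snyk or Dependabot report formats, the `on_request_path` flag is assumed to be annotated by a person, and treating low-severity direct dependencies the same as high-severity ones is a choice made here, since the rule only speaks about transitive findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A single SCA finding (hypothetical shape, not a scanner's schema)."""
    package: str
    cvss: float
    transitive: bool
    on_request_path: bool  # assumption: annotated by hand during review


def triage(finding):
    """Return 'triage-before-merge' or 'may-defer' for one finding."""
    if finding.cvss >= 7.0:
        return "triage-before-merge"
    # Lower severity: deferral is only on the table for transitive
    # dependencies that never sit on a request path.
    if finding.transitive and not finding.on_request_path:
        return "may-defer"
    return "triage-before-merge"


print(triage(Finding("libexample", 4.3, transitive=True, on_request_path=False)))
```

A "may-defer" result still needs the deferral documented in the ticket.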
-
Threat-model new external endpoints
For any new public API or webhook, walk OWASP Top 10 with the AppSec partner: authn/authz coverage, input validation, rate-limit posture, PII in logs. A 30-minute pairing session is enough for incremental changes.
-
Document the rollback decision criteria
Spell out the thresholds that trigger an automatic rollback: error-rate spike above baseline, p99 latency regression, customer-support volume. Without pre-agreed thresholds, the call gets argued in Slack at 2am.
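Writing the criteria as code forces the team to agree on actual numbers. The sketch below is illustrative only: the type names are hypothetical, and every threshold value is a placeholder to be replaced with whatever the team agrees on, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Metrics for one observation window (hypothetical shape)."""
    error_rate: float            # fraction of requests that failed
    p99_latency_ms: float
    support_tickets_per_hour: int


@dataclass(frozen=True)
class RollbackThresholds:
    # Placeholder values; agree on real ones before deploy day.
    max_error_rate_ratio: float = 2.0     # current vs. baseline
    max_p99_latency_ratio: float = 1.5    # current vs. baseline
    max_support_tickets_per_hour: int = 20


def rollback_reasons(baseline, current, limits):
    """Return the names of every criterion the current snapshot violates."""
    reasons = []
    if current.error_rate > baseline.error_rate * limits.max_error_rate_ratio:
        reasons.append("error_rate")
    if current.p99_latency_ms > baseline.p99_latency_ms * limits.max_p99_latency_ratio:
        reasons.append("p99_latency")
    if current.support_tickets_per_hour > limits.max_support_tickets_per_hour:
        reasons.append("support_volume")
    return reasons


baseline = Snapshot(error_rate=0.01, p99_latency_ms=400, support_tickets_per_hour=5)
current = Snapshot(error_rate=0.05, p99_latency_ms=450, support_tickets_per_hour=6)
print(rollback_reasons(baseline, current, RollbackThresholds()))
```

A non-empty list means roll back; the list itself is the justification to paste into the incident channel.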
Database Migration Planning
-
Confirm the migration is reversible
Either the migration has a tested down-migration, or there is a written forward-fix plan for the case where rollback after the schema change is impossible. Adding a column is reversible, because the down-migration can drop it; dropping a column whose values were populated from a third-party API is not, because a down-migration can re-create the column but not the data.
-
Plan the backfill in batches
A backfill that runs as a single UPDATE holds locks on every row it touches until it commits, so writes to those rows are blocked for the duration. Chunk by primary key (e.g., 5,000 rows per batch with a 200ms sleep) and watch replication lag during the rehearsal. Use CREATE INDEX CONCURRENTLY for new indexes on Postgres.
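The batching pattern can be sketched as follows. To stay self-contained, the example runs against an in-memory SQLite database from the Python standard library rather than Postgres; the `users` table, its columns, and the `backfill_in_batches` helper are hypothetical. Treat it as an illustration of chunking by primary key with a pause between batches, not as a production migration script.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=5000, pause_seconds=0.2):
    """Backfill users.email_lower in primary-key order, one commit per batch.

    Returns the number of batches executed.
    """
    last_id = 0
    batches = 0
    while True:
        # Find the upper bound of the next chunk of primary keys.
        rows = conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return batches
        upper_id = rows[-1][0]
        with conn:  # commits on success, so each batch is a short transaction
            conn.execute(
                "UPDATE users SET email_lower = lower(email) "
                "WHERE id > ? AND id <= ? AND email_lower IS NULL",
                (last_id, upper_id),
            )
        last_id = upper_id
        batches += 1
        time.sleep(pause_seconds)  # give replicas and other writers room


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, email_lower TEXT)"
)
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"User{i}@Example.com",) for i in range(12)],
)
conn.commit()
print(backfill_in_batches(conn, batch_size=5, pause_seconds=0))
```

The `email_lower IS NULL` guard makes the backfill safe to re-run after an interruption, since rows that were already filled are skipped.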
-
Rehearse migration on a production-sized clone
Restore the latest production snapshot to a staging RDS instance and run the migration end-to-end. Capture wall-clock time and lock duration; if the rehearsal takes longer than the agreed deploy window, split the migration into a pre-deploy step.
Capacity & Resource Planning
-
Confirm release captain and on-call coverage
Check PagerDuty for the deploy window and the four hours after. Release captain plus primary on-call must both be online; a vacation collision is a common reason hotfixes get botched.
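The coverage rule (deploy window plus four hours, for both roles) is an interval check. The sketch below assumes the shifts have already been exported as (start, end) pairs; it does not call the PagerDuty API, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def covers(shifts, start, end):
    """True if the union of (start, end) shifts covers [start, end] with no gap."""
    cursor = start
    for shift_start, shift_end in sorted(shifts):
        if shift_start > cursor:
            break  # nobody is online between cursor and this shift
        cursor = max(cursor, shift_end)
    return cursor >= end


def coverage_ok(captain_shifts, oncall_shifts, deploy_start, deploy_end):
    """Both roles must be online for the window and the four hours after it."""
    watch_end = deploy_end + timedelta(hours=4)
    return covers(captain_shifts, deploy_start, watch_end) and covers(
        oncall_shifts, deploy_start, watch_end
    )


# Illustrative schedule only.
day = datetime(2024, 11, 5)
captain = [(day.replace(hour=9), day.replace(hour=17))]
oncall = [
    (day.replace(hour=8), day.replace(hour=13)),
    (day.replace(hour=14), day.replace(hour=18)),  # one-hour gap at 13:00
]
print(coverage_ok(captain, oncall, day.replace(hour=10), day.replace(hour=12)))
```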
-
Reserve the deploy window on the team calendar
Block Tuesday 10am–12pm (or your team's standard window). Avoid Fridays and the day before a holiday — if something breaks Saturday morning, the on-call carries it alone.
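The two calendar rules can be checked mechanically before the window is booked. This is a small sketch: the `window_problems` helper is hypothetical, and the holiday set is assumed to be maintained by the team.

```python
from datetime import date, timedelta


def window_problems(deploy_day, holidays):
    """Return the reasons a proposed deploy day breaks the calendar rules."""
    problems = []
    if deploy_day.weekday() == 4:  # Monday is 0, so Friday is 4
        problems.append("deploy day is a Friday")
    if deploy_day + timedelta(days=1) in holidays:
        problems.append("deploy day is the day before a holiday")
    return problems


# Illustrative holiday list only.
holidays = {date(2024, 11, 28)}
print(window_problems(date(2024, 11, 27), holidays))
```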
-
Verify staging matches production configuration
Diff the Terraform state for environment-specific variables: instance sizes, secrets, third-party API endpoints, feature-flag defaults. "It worked in staging" failures usually trace back to an undocumented config drift.
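Once the environment variables are extracted into plain key/value maps, the drift check is a dictionary comparison with an allowlist for values that are supposed to differ. The sketch below assumes that extraction has already happened; it does not read Terraform state itself, and the keys and values shown are made up.

```python
MISSING = "<missing>"


def config_drift(staging, production, expected_to_differ):
    """Return {key: (staging_value, production_value)} for unexpected differences."""
    drift = {}
    for key in sorted(staging.keys() | production.keys()):
        if key in expected_to_differ:
            continue  # e.g. instance sizes that are intentionally smaller in staging
        staging_value = staging.get(key, MISSING)
        production_value = production.get(key, MISSING)
        if staging_value != production_value:
            drift[key] = (staging_value, production_value)
    return drift


# Illustrative values only.
staging = {
    "instance_size": "small",
    "payments_api_url": "https://sandbox.example.test",
    "new_search_default": "off",
}
production = {
    "instance_size": "large",
    "payments_api_url": "https://api.example.test",
    "new_search_default": "on",
    "report_signing_key": "set",
}
expected = {"instance_size", "payments_api_url"}
print(config_drift(staging, production, expected))
```

Keeping the allowlist in version control turns "undocumented drift" into a reviewed decision.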
QA & Test Strategy
-
Run the full e2e suite against the release candidate
Cut the release branch, tag -rc.1, and run Playwright/Cypress in the staging pipeline. Investigate every red — "just rerun, it's flaky" is how real regressions slip through. Open tickets for any flakes you defer.
-
Identify regression-risk areas for manual smoke testing
Map the merged PRs to user-facing surfaces. Auth, billing, and primary checkout paths get manual smoke regardless of automated coverage; secondary surfaces only if a PR touched them.
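The mapping rule reduces to a set union: the always-tested surfaces plus anything a merged PR touched. In this sketch the PR identifiers and surface names are hypothetical, and the PR-to-surface mapping is assumed to be filled in by the tech lead.

```python
# Surfaces that get manual smoke testing on every release.
ALWAYS_SMOKE = {"auth", "billing", "checkout"}


def manual_smoke_surfaces(pr_surfaces):
    """Return the sorted surfaces to smoke-test, given {pr_id: {surfaces touched}}."""
    touched = set().union(*pr_surfaces.values())
    return sorted(ALWAYS_SMOKE | touched)


# Illustrative mapping only.
pr_surfaces = {
    "PR-A": {"settings"},
    "PR-B": {"billing", "notifications"},
}
print(manual_smoke_surfaces(pr_surfaces))
```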
-
Capture QA sign-off on the release candidate
QA lead records the verdict, attaches the test report, and notes any deferred bugs. "Pass with notes" is allowed for cosmetic issues with a tracked follow-up; functional regressions are Fail.
-
Halt release and schedule a fix cycle
QA failed the candidate. Notify stakeholders the release is slipping, file fix tickets at SEV2 or above, and schedule a follow-up build for the next release window. Do not let pressure push a known-failing build to production.
Communication & Release Notes
-
Draft customer-facing release notes
Generate the draft from changelog entries, then strip internal jargon. Group by feature/fix/breaking change. Anything marked breaking needs a migration note for API consumers and a 90-day deprecation window if it changes existing endpoints.
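A draft generator along these lines might look like the following. It is a sketch under stated assumptions: the changelog entry shape (`kind`, `summary`, `migration_note`) is hypothetical and does not reflect the actual release notes script, and the output is plain text to be edited by hand afterwards.

```python
SECTION_TITLES = {
    "feature": "Features",
    "fix": "Fixes",
    "breaking": "Breaking changes",
}


def render_release_notes(entries):
    """Group entries by kind and refuse breaking changes without a migration note."""
    grouped = {kind: [] for kind in SECTION_TITLES}
    for entry in entries:
        if entry["kind"] not in grouped:
            raise ValueError(f"unknown entry kind: {entry['kind']}")
        if entry["kind"] == "breaking" and not entry.get("migration_note"):
            raise ValueError(f"breaking change lacks a migration note: {entry['summary']}")
        grouped[entry["kind"]].append(entry)

    lines = []
    for kind, title in SECTION_TITLES.items():
        if not grouped[kind]:
            continue
        lines.append(title)
        for entry in grouped[kind]:
            lines.append(f"- {entry['summary']}")
            if kind == "breaking":
                lines.append(f"  Migration: {entry['migration_note']}")
    return "\n".join(lines)


# Illustrative entries only.
entries = [
    {"kind": "fix", "summary": "Fixed export timeout"},
    {
        "kind": "breaking",
        "summary": "Removed v1 reports endpoint",
        "migration_note": "Use the v2 reports endpoint",
    },
    {"kind": "feature", "summary": "Added saved filters"},
]
print(render_release_notes(entries))
```

Failing loudly on a breaking change without a migration note keeps that requirement from depending on a reviewer's memory.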
-
Brief support on customer-visible changes
15-minute walkthrough with the support lead. Cover what changed, expected ticket categories, and the engineer to escalate to. Without this brief, support backlog spikes the day after release.
-
Post the release window in #engineering
Announcement covers: deploy date and time, scope summary, release captain, rollback contact, link to the runbook. Pin it. Lock main to release-blocking PRs only during the window.
Deployment & Rollback Planning
-
Verify the previous container image is in the registry
Pull-test the prior version's image tag from ECR/GCR. Image lifecycle policies sometimes prune the very tag you need to roll back to — discovering this during an incident is too late.
-
Walk through the rollback runbook end-to-end
Read the runbook out loud with the on-call engineer. Confirm every command works against the current cluster — kubectl contexts, deploy script flags, feature-flag kill switches. Update anything stale before deploy day.
-
Tag the release candidate SHA
Apply the version tag (e.g., v2024.45.0) to the candidate SHA and push. CI builds the final artifact from the tag; never deploy from a moving branch ref.
Monitoring & Support Readiness
-
Confirm dashboards cover the new services
Open the Datadog/Grafana service dashboards. New endpoints need RED-method panels (rate, errors, duration). A new service shipping without a dashboard is a guaranteed blind spot during the post-deploy window.
-
Set SLO alert thresholds for new endpoints
Establish a baseline for p99 latency and configure burn-rate alerts on the error rate. Route them to the on-call rotation in PagerDuty, not to a dead Slack channel; alerts that land where nobody is watching go unnoticed, which is also how cert-renewal failures get missed.
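For reference, a burn rate is the observed error ratio divided by the error budget the SLO allows, so a value of 1.0 means the budget is being spent exactly at the sustainable pace. The sketch below shows only the arithmetic; the function names are hypothetical, and the paging threshold is a placeholder to tune per service, not a recommended value.

```python
def burn_rate(failed_requests, total_requests, slo_target):
    """Observed error ratio divided by the error budget (1 - slo_target)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 0.0  # no traffic in the window, nothing burned
    error_ratio = failed_requests / total_requests
    error_budget = 1.0 - slo_target
    return error_ratio / error_budget


def should_page(rate, threshold):
    """Page the on-call when the budget burns faster than the agreed threshold."""
    return rate >= threshold


# Illustrative numbers: 10 failures in 1,000 requests against a 99.9% SLO.
rate = burn_rate(failed_requests=10, total_requests=1000, slo_target=0.999)
print(should_page(rate, threshold=5.0))
```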
-
Brief on-call on the hotfix process
Walk the on-call through: who decides hotfix vs. rollback, the cherry-pick branch convention, and the abbreviated review path for SEV1 fixes. New on-call members get this brief every release; veterans skim.