Software Update Checklist
Planning and Preparation
Read the upstream changelog and CHANGELOG.md entries for every version between the current and target versions. Flag deprecations, config changes, and required migrations. For semver major bumps, expect breaking changes. For minor and patch bumps, watch for security fixes you need to call out to the support team.
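The sketch below builds the reading list of releases whose notes you need to check and classifies the size of the bump. It assumes plain MAJOR.MINOR.PATCH version strings (an optional leading "v" is tolerated). Pre-release and build metadata are not handled. The function names are illustrative.

```python
# Minimal sketch: which releases' notes must be read, and how big is the jump?
# Assumes plain MAJOR.MINOR.PATCH strings; pre-release/build metadata is not handled.

def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch

def bump_kind(current: str, target: str) -> str:
    """Classify the upgrade as major, minor, patch, or none."""
    cur, tgt = parse(current), parse(target)
    if tgt[0] != cur[0]:
        return "major"
    if tgt[1] != cur[1]:
        return "minor"
    return "patch" if tgt[2] != cur[2] else "none"

def versions_to_review(current: str, target: str, published: list[str]) -> list[str]:
    """Every published release after `current` up to and including `target`, oldest first."""
    lo, hi = parse(current), parse(target)
    return sorted((v for v in published if lo < parse(v) <= hi), key=parse)
```

For example, going from 1.4.1 to 2.0.0 with 1.5.0 in between means reading the notes for both 1.5.0 and 2.0.0. Intermediate minor releases often carry the deprecation warnings that explain the major bump.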
Verify that the previous container image is still in ECR/GCR and is not subject to lifecycle pruning. If the release includes a DB migration, confirm that it is reversible, or document the forward-fix plan. Untested rollbacks are one of the most common reasons a release night turns into an outage.
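An ECR lifecycle rule with `countType` set to `imageCountMoreThan` keeps only the newest `countNumber` matching images. The sketch below checks whether the rollback image stays inside that window. It assumes a single rule that selects every image (`tagStatus: any`) and a tag list you have already sorted newest first, for example from `describe_images` ordered by `imagePushedAt`. Real policies can combine several prioritized rules and tag-prefix filters, and this sketch ignores those. The function name is hypothetical.

```python
# Hypothetical check: does the rollback image survive a "keep the newest N" ECR lifecycle rule?
# Assumes ONE rule selecting all images; multi-rule and tag-prefix policies are not modelled.

def survives_count_rule(images_newest_first: list[str], rollback_tag: str,
                        policy: dict, pending_pushes: int = 0) -> bool:
    selection = policy["rules"][0]["selection"]
    if selection["countType"] != "imageCountMoreThan":
        raise ValueError("sketch only handles imageCountMoreThan rules")
    # Each image pushed before the rollback happens (e.g. the RC) pushes older images down a slot.
    window = max(selection["countNumber"] - pending_pushes, 0)
    return rollback_tag in images_newest_first[:window]

# Example policy in ECR's lifecycle-policy JSON shape.
policy = {
    "rules": [{
        "rulePriority": 1,
        "selection": {"tagStatus": "any", "countType": "imageCountMoreThan", "countNumber": 3},
        "action": {"type": "expire"},
    }]
}
```

Pass `pending_pushes=1` for the release image you are about to push. An image that is safe today can fall out of the window the moment the RC lands.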
Check that runtime versions (Node, Python, JVM, Postgres), Kubernetes API versions, and Terraform provider constraints meet the new release's minimums. A surprise minimum-Postgres bump in a point release has eaten more than one release window.
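A minimal sketch of the minimum-version gate follows. It compares what is deployed against the minimums listed in the new release's notes, using numeric tuple comparison so that "13.12" sorts below "14". The component names and numbers in any call are placeholders, not real requirements. Gathering the deployed versions (from `node --version`, `SELECT version()`, and so on) is left out.

```python
# Minimal sketch: report every component that is missing or below the release's stated minimum.
# Versions are compared numerically, part by part ("13.12" < "14" < "14.2").

def version_tuple(v: str) -> tuple[int, ...]:
    return tuple(int(p) for p in v.split("."))

def unmet_minimums(deployed: dict[str, str], minimums: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {component: (deployed, required)} for each unmet minimum."""
    problems = {}
    for component, required in minimums.items():
        have = deployed.get(component)
        if have is None or version_tuple(have) < version_tuple(required):
            problems[component] = (have or "missing", required)
    return problems
```

An empty result means the environment meets every stated minimum. Anything else belongs in the change ticket before the deploy window is booked.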
Trigger an out-of-band RDS snapshot (or equivalent) in addition to the nightly backup. Record the snapshot identifier so the on-call engineer can restore from a known point if the migration goes sideways.
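The sketch below takes the out-of-band snapshot with a recordable identifier. It calls the RDS `create_db_snapshot` operation on a client you pass in (normally `boto3.client("rds")`). Injecting the client keeps the function testable without AWS. The naming scheme (instance, "pre", release, UTC minute) is an assumption, not a convention from this checklist. The sanitizing step follows RDS's identifier rules: letters, digits, and hyphens, starting with a letter, with no consecutive or trailing hyphens.

```python
import re
from datetime import datetime, timezone

def snapshot_identifier(instance_id: str, release: str, now: datetime) -> str:
    # Hypothetical naming scheme: <instance>-pre-<release>-<YYYYMMDDHHMM>.
    raw = f"{instance_id}-pre-{release}-{now:%Y%m%d%H%M}"
    cleaned = re.sub(r"[^A-Za-z0-9-]", "-", raw)        # dots etc. become hyphens
    cleaned = re.sub(r"-{2,}", "-", cleaned).strip("-")  # no doubled or edge hyphens
    return cleaned[:255].rstrip("-").lower()

def take_pre_release_snapshot(rds_client, instance_id: str, release: str) -> str:
    """Start a manual snapshot and return its identifier for the change record."""
    snap_id = snapshot_identifier(instance_id, release, datetime.now(timezone.utc))
    rds_client.create_db_snapshot(DBSnapshotIdentifier=snap_id, DBInstanceIdentifier=instance_id)
    return snap_id
```

Snapshots complete asynchronously. If the migration must wait for the snapshot, block on `rds_client.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snap_id)` before proceeding.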
Testing and Quality Assurance
Trigger CI on the release-candidate tag (e.g., v2024.45.0-rc.1). Investigate any flaky tests rather than re-running blindly — a habit of "just rerun it" hides real regressions.
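One way to separate flakes from real regressions is to look at repeated CI attempts on the same commit. A test that both passed and failed on identical code is non-deterministic. A test that failed every time is a genuine failure and must not be waved through as "flaky". The input shape (one dict per attempt, mapping test id to "passed" or "failed") is an assumption; adapt it to whatever your CI's JUnit or JSON reports provide.

```python
from collections import defaultdict

def flaky_tests(attempts: list[dict[str, str]]) -> set[str]:
    """Tests that both passed and failed across attempts on the same commit."""
    outcomes: dict[str, set[str]] = defaultdict(set)
    for attempt in attempts:
        for test_id, result in attempt.items():
            outcomes[test_id].add(result)
    # Mixed outcomes on identical code mean the test itself is non-deterministic.
    return {t for t, seen in outcomes.items() if {"passed", "failed"} <= seen}
```

Tests that fail consistently are absent from the result on purpose. They are regressions, and they block the release.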
Deploy the RC to staging and run Playwright/Cypress suites against it. Cover the critical user paths: signup, auth, checkout, primary CRUD flows. Staging should mirror prod config — environment drift breaks this gate.
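Environment drift is easier to catch when it is diffed rather than eyeballed. The sketch below assumes both environments' config can be flattened to key/value pairs, such as environment variables or a flattened Helm values file. It reports every key whose value differs. Keys that are expected to differ, like connection strings, go in `ignore`. The function name is hypothetical.

```python
def config_drift(staging: dict, prod: dict, ignore: set[str] = frozenset()) -> dict[str, tuple]:
    """Keys whose values differ between environments; a missing key shows up as None."""
    keys = (staging.keys() | prod.keys()) - ignore
    return {k: (staging.get(k), prod.get(k)) for k in sorted(keys) if staging.get(k) != prod.get(k)}
```

A non-empty result for keys outside `ignore` means staging is not testing what production will run, and the E2E gate is weaker than it looks.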
Manual exploratory pass by QA on the changed surfaces. The release-notes review in the prep phase determines what gets exercised here. Any blocking defects gate the release.
Run Snyk / Dependabot / Trivy against the RC build. Triage any new high or critical CVEs introduced by the update. SBOM diff against the prior release goes into the change record for SOC 2 evidence.
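To isolate CVEs the update introduced, diff the scan of the RC image against the scan of the prior release. The sketch below assumes both scans come from `trivy image --format json`. It relies on Trivy's JSON layout, where `Results[].Vulnerabilities[]` carries `VulnerabilityID`, `PkgName`, and `Severity`. Verify those field names against the Trivy version you run. Snyk output has a different shape and would need its own parser.

```python
import json

SEVERE = {"HIGH", "CRITICAL"}

def severe_findings(trivy_report: dict) -> set[tuple[str, str]]:
    """(VulnerabilityID, PkgName) pairs rated HIGH or CRITICAL in one Trivy JSON report."""
    found = set()
    for result in trivy_report.get("Results", []):
        # "Vulnerabilities" is absent or null for targets with no findings.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in SEVERE:
                found.add((vuln["VulnerabilityID"], vuln["PkgName"]))
    return found

def new_severe_findings(previous_json: str, candidate_json: str) -> set[tuple[str, str]]:
    """HIGH/CRITICAL findings present in the RC scan but not in the prior release's scan."""
    return severe_findings(json.loads(candidate_json)) - severe_findings(json.loads(previous_json))
```

Pre-existing findings still need an owner. However, only the new set is attributable to this update and belongs in its triage.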
Pre-Deploy Coordination
Check PagerDuty / Incident.io for open incidents. Don't ship into an active production fire — even an unrelated SEV2 will mask new symptoms introduced by the deploy.
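The incident check can be made mechanical. The sketch below assumes you have already fetched open incidents from the PagerDuty REST API (`GET /incidents` with `statuses[]=triggered&statuses[]=acknowledged`) and parsed the JSON. It re-filters on status defensively and returns one line per blocker. Incident.io would need its own fetch. The function name is hypothetical.

```python
OPEN_STATUSES = {"triggered", "acknowledged"}

def deploy_blockers(incidents_response: dict) -> list[str]:
    """One 'id: title' line per open incident; any entry should stop the deploy."""
    return [
        f'{inc["id"]}: {inc["title"]}'
        for inc in incidents_response.get("incidents", [])
        if inc.get("status") in OPEN_STATUSES
    ]
```

The check deliberately ignores severity and service. Per the rule above, an unrelated incident still blocks the deploy.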
Post in #engineering and #customer-support: deploy window, scope of changes, named release captain, named on-call. Update the status page if the change is user-visible or carries downtime risk.
Release captain drives the deploy; primary on-call holds the pager. Both must be available for the full deploy plus a 60-minute monitoring window. No solo deploys.
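The announcement and staffing rules above can be checked before anything is posted. The sketch below uses a hypothetical `DeployPlan` record holding the fields the #engineering and #customer-support post needs. It rejects plans with a missing field or with the same person as release captain and on-call, which would be a solo deploy.

```python
from dataclasses import dataclass

@dataclass
class DeployPlan:          # hypothetical record; field names are illustrative
    window: str            # e.g. "2024-11-05 18:00-19:30 UTC"
    scope: str
    release_captain: str
    on_call: str
    user_visible: bool

def plan_problems(plan: DeployPlan) -> list[str]:
    problems = []
    for name in ("window", "scope", "release_captain", "on_call"):
        if not getattr(plan, name).strip():
            problems.append(f"missing {name}")
    if plan.release_captain and plan.release_captain == plan.on_call:
        problems.append("release captain and on-call must be different people (no solo deploys)")
    return problems

def announcement(plan: DeployPlan) -> str:
    lines = [f"Deploy window: {plan.window}", f"Scope: {plan.scope}",
             f"Release captain: {plan.release_captain}", f"On-call: {plan.on_call}"]
    if plan.user_visible:
        lines.append("Status page: update required (user-visible change)")
    return "\n".join(lines)
```

Post the announcement only when `plan_problems` returns an empty list.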
Production Deploy
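Run the migration ahead of the application deploy so the new schema is live before code that depends on it. This only works if the schema change is also compatible with the version still running. Use CREATE INDEX CONCURRENTLY for index work (it cannot run inside a transaction block), batch backfills with sleeps between batches, and watch replication lag throughout. On Postgres versions before 11, or with a volatile default on any version, ADD COLUMN ... DEFAULT rewrites the whole table, so avoid it on large tables. Split it into add-column, set-default, then backfill, so rows inserted during the backfill already get the default.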
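The sketch below shows the backfill half of the split: fill NULLs in small committed batches, pause between batches, and wait whenever replication lag is high. It is written against the DB-API and demonstrated with `sqlite3` so it runs anywhere. For Postgres via psycopg, the placeholders become `%s`. The `lag_ok` hook is hypothetical; wire it to your replica lag metric. Table and column names are interpolated directly, so pass only trusted identifiers.

```python
import sqlite3
import time
from typing import Callable

def batched_backfill(conn, table: str, column: str, value, batch_size: int = 1000,
                     pause_s: float = 0.5, lag_ok: Callable[[], bool] = lambda: True) -> int:
    """Fill NULLs in `column` one batch at a time; returns the number of rows updated.
    `value` must not be None, or the loop never finishes."""
    total = 0
    while True:
        while not lag_ok():          # hypothetical hook: hold off while replicas are behind
            time.sleep(pause_s)
        # Identifiers are f-string interpolated: trusted names only.
        cur = conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE id IN "
            f"(SELECT id FROM {table} WHERE {column} IS NULL LIMIT ?)",
            (value, batch_size),
        )
        updated = cur.rowcount
        conn.commit()                # short transactions keep locks and WAL bursts small
        if updated == 0:
            return total
        total += updated
        time.sleep(pause_s)          # give replicas and autovacuum room between batches
```

Batch size and pause are knobs to tune against observed lag. Start small and raise them only while the lag stays flat.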
Route 5% of traffic to the new version via an Argo Rollouts canary or load-balancer weights. Hold for at least 10 minutes and watch error rate, p99 latency, and saturation on the canary pods specifically, not just aggregate dashboards.
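Comparing the canary against the stable baseline, rather than against a fixed number, keeps the judgment honest when overall traffic shifts. The sketch below returns "wait", "abort", or "promote" from one window of canary and baseline metrics. The thresholds (1.5× error rate, 1.2× p99, 500-request minimum, 0.1% error-rate floor) are illustrative assumptions, not recommendations from this checklist. Saturation is left out for brevity.

```python
from dataclasses import dataclass

@dataclass
class Window:
    requests: int
    errors: int
    p99_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

def canary_verdict(canary: Window, baseline: Window, max_rate_ratio: float = 1.5,
                   max_p99_ratio: float = 1.2, min_requests: int = 500) -> str:
    if canary.requests < min_requests:
        return "wait"    # too little canary traffic to judge yet
    # Floor the baseline rate so a near-zero baseline doesn't turn one error into a regression.
    baseline_rate = max(baseline.error_rate, 0.001)
    if canary.error_rate > baseline_rate * max_rate_ratio:
        return "abort"
    if canary.p99_ms > baseline.p99_ms * max_p99_ratio:
        return "abort"
    return "promote"
```

The minimum-request guard matters at 5%. On a quiet service, ten minutes of canary traffic can be too small a sample for either verdict.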
Step through 25% → 50% → 100% with a few minutes between increments to let metrics stabilize. Backend goes fully out before frontend so the frontend can rely on new API contracts.
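The ramp can be sketched as a loop over weights with a soak period and a health check at each step. Both `set_weight` and `healthy` are hypothetical callables: one would wrap your load-balancer or mesh API, the other a check like the canary verdict. On any failed check the sketch sends all traffic back to the stable version.

```python
import time
from typing import Callable

RAMP = (25, 50, 100)

def ramp(set_weight: Callable[[int], None], healthy: Callable[[], bool],
         soak_s: float = 300, poll_s: float = 30) -> bool:
    """Step the new version's traffic weight through RAMP; return False after rolling back."""
    for weight in RAMP:
        set_weight(weight)
        deadline = time.monotonic() + soak_s
        while time.monotonic() < deadline:
            if not healthy():
                set_weight(0)   # all traffic back to the stable version
                return False
            time.sleep(poll_s)
    return True
```

With Argo Rollouts, the same schedule is normally declared as `setWeight` and `pause` steps in the Rollout spec rather than scripted. The sketch only shows the control flow.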
Push the new frontend artifact to CloudFront / Vercel / Netlify and invalidate the CDN cache. Confirm the new asset hashes are being served before declaring the deploy complete.
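To confirm the new asset hashes are live, fetch the entry HTML through the CDN and check that it references every hashed bundle from the new build. The sketch assumes the expected filenames come from your build manifest and that assets appear as quoted `src`/`href` attributes ending in `.js` or `.css`. Adjust the regex if your bundler emits other patterns. The function names are hypothetical.

```python
import re
import urllib.request

ASSET_RE = re.compile(r'(?:src|href)="([^"]+\.(?:js|css))"')

def referenced_assets(html: str) -> set[str]:
    return set(ASSET_RE.findall(html))

def serving_new_build(html: str, expected_assets: set[str]) -> bool:
    """True when the served page references every hashed asset from the new build."""
    return expected_assets <= referenced_assets(html)

def fetch(url: str) -> str:
    # Fetch through the public CDN hostname so you see what users see, not the origin.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Run the check from more than one region if your CDN has regional edges. Invalidations can land at different times at different points of presence.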
Abort and roll back if the canary is degraded. Re-deploy the previous container image, revert the migration if it is reversible (or apply the documented forward-fix), and post the abort to #engineering. File a ticket capturing the failure mode for the post-incident review.
Verification and Monitoring
Execute the synthetic user journey against production: signup, login, primary action, logout. A green canary plus a green smoke test is the gate for declaring the deploy successful.
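The smoke test is a fail-fast sequence: run each journey step in order and stop at the first failure, reporting which step broke. The step functions (signup, login, primary action, logout) are hypothetical and would wrap your HTTP client or browser automation. Each one signals failure by raising.

```python
from typing import Callable

def run_journey(steps: list[tuple[str, Callable[[], None]]]) -> tuple[bool, str]:
    """Run named steps in order; stop at the first one that raises."""
    for name, step in steps:
        try:
            step()
        except Exception as exc:  # any failure fails the deploy gate
            return False, f"{name}: {exc}"
    return True, "all steps passed"
```

Use a dedicated synthetic account, and clean up whatever the journey creates so repeated runs don't pollute production data.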
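Watch the four golden signals: latency (p50, p95, p99), traffic, errors, and saturation. Compare against the same time of day in the previous week, not just the prior hour. Track error rate rather than raw error counts. When traffic is also up, a 10% rise in the error rate is easy to write off as ordinary load-driven growth in absolute errors.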
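A worked comparison makes the point. Suppose last week's window had 1,000 errors on 1,000,000 requests (0.10%), and this week's has 1,650 errors on 1,500,000 requests (0.11%). Errors are up 65% and traffic is up 50%, which looks like ordinary growth, but the error rate itself is up 10%. The sketch below computes all three for an `(errors, requests)` pair from each window. The function names are illustrative.

```python
def error_rate(errors: int, requests: int) -> float:
    return errors / requests if requests else 0.0

def week_over_week(now: tuple[int, int], last_week: tuple[int, int]) -> dict[str, float]:
    """Fractional change in errors, traffic, and error rate for same-time-of-day windows."""
    rate_now, rate_then = error_rate(*now), error_rate(*last_week)
    return {
        "error_count_change": now[0] / last_week[0] - 1 if last_week[0] else float("inf"),
        "traffic_change": now[1] / last_week[1] - 1 if last_week[1] else float("inf"),
        "error_rate_change": rate_now / rate_then - 1 if rate_then else float("inf"),
    }
```

With the numbers above, `week_over_week((1650, 1_500_000), (1000, 1_000_000))` gives changes of roughly 0.65, 0.5, and 0.1.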
Filter Sentry / Bugsnag to the new release tag. New error fingerprints, even at low volume, are the early signal — investigate before they become a spike.
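"New fingerprint" means an error grouping never seen before this release. The sketch below assumes you have exported fingerprint-to-event-count maps for a baseline period and for the new release tag, by whatever export your error tracker offers. It lists the fingerprints unique to the release, busiest first.

```python
def new_fingerprints(baseline: dict[str, int], release: dict[str, int]) -> list[tuple[str, int]]:
    """Fingerprints seen in the new release but never in the baseline, highest count first."""
    fresh = [(fp, n) for fp, n in release.items() if fp not in baseline]
    return sorted(fresh, key=lambda item: item[1], reverse=True)
```

Investigate even single-event entries at the top of this list. A low count early in the rollout often means the affected path simply hasn't seen much traffic yet.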
Ping support lead. A spike in inbound after a deploy — even without an error-rate change — is a sign something user-visible regressed. Cosmetic and copy bugs rarely page; users report them.
Wrap-Up
Promote the RC tag to the final release tag (e.g., v2024.45.0). Update the public changelog and post the release summary to #engineering with the deployed commit SHA.
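Deriving the final tag mechanically avoids typos at the end of a long night. The sketch below assumes tags follow the `vX.Y.Z-rc.N` shape of the example above, and it refuses anything else.

```python
import re

RC_TAG = re.compile(r"^(v\d+\.\d+\.\d+)-rc\.(\d+)$")

def final_tag(rc_tag: str) -> str:
    """Map e.g. v2024.45.0-rc.1 to v2024.45.0; reject anything that is not an RC tag."""
    match = RC_TAG.match(rc_tag)
    if not match:
        raise ValueError(f"not a release-candidate tag: {rc_tag!r}")
    return match.group(1)
```

Put the final tag on the exact commit the RC tag points to (`git rev-list -n 1 v2024.45.0-rc.1` prints it), so the released artifact is the one that was tested.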
Reflect any new operational behavior in the service runbook: new env vars, new alerts, new dashboards, deprecated endpoints. Stale runbooks are a SOC 2 finding and an on-call tax.
Attach the approved PR list, QA sign-off, deploy log, and rollback evidence (if any) to the change ticket in Jira / Linear. Vanta / Drata pulls from this for SOC 2 change-management evidence.
Run a post-incident review if the deploy was rolled back or caused a customer-visible incident. Hold a blameless PIR within five business days and track action items to closure in Jira. First impressions of contributing factors are often wrong; the second-order causes usually surface only in writing.
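The five-business-day deadline can be computed rather than counted on a calendar. The sketch below counts forward weekdays from the incident date. Public holidays are not handled, which is an explicit simplification, so subtract them yourself if your team observes them.

```python
from datetime import date, timedelta

def add_business_days(start: date, days: int) -> date:
    """Count forward `days` weekdays from `start`; public holidays are not handled."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current
```

For an incident on Friday 2024-11-08, the PIR is due by Friday 2024-11-15.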
