Software Upgrade Checklist
Pre-Upgrade Preparation
Read the vendor's release notes, KB articles, and known-issues page for the target version. Flag breaking changes, deprecated features, and any prerequisite intermediate versions. Vendors frequently bury config-file format changes and minimum OS / .NET / JDK requirements in the fine print.
Check OS, database, .NET / JDK / Python runtime, and integration partner versions against the new release's compatibility matrix. Confirm any agents (EDR, RMM, backup) are supported on the new version — CrowdStrike and Veeam compatibility lags major OS releases by weeks.
Submit the RFC in ServiceNow / ConnectWise / Jira Service Management with scope, risk classification, maintenance window, rollback plan, and named approvers. Standard changes can use a pre-approved template; normal changes need CAB review before scheduling.
Confirm the window falls outside any change-freeze (quarter-close, retail peak, payroll runs) and aligns with the application owner's blackout calendar. Block calendars for on-call, DBA, and network engineer if their involvement is required.
Send notice to the affected distribution list with start time, expected duration, services impacted, and the status-page URL. For MSP clients, include the vCIO or account manager. Send a 24-hour reminder the day before.
Trigger a Veeam / Datto / native snapshot of the application server, database, and config volumes. Confirm the backup completes successfully and capture the restore point ID — do not rely on last night's scheduled job. For VMs, also take a hypervisor-level snapshot at the start of the window for fast rollback.
Write the named rollback steps: restore from snapshot, revert database schema, redirect DNS / load balancer back to prior nodes. Define the go/no-go decision time and the named decision-maker. A rollback plan that says "restore from backup" without specifying which backup, who runs it, and how long it takes is not a plan.
Test Ring Validation
Deploy to the dev or QA ring first using the same installer and runbook the production change will use. Capture installer exit codes and timing — production estimates come from this run, not the vendor's quoted duration.
Have the app owner or QA execute the documented smoke-test script: login, top three workflows, integrations to upstream / downstream systems, scheduled-job execution. Block production deployment if any test fails.
If smoke tests failed, open a vendor support case with installer logs and stack traces before re-scheduling. Do not push to production hoping the issue is environment-specific — that's how Friday afternoon outages happen.
Production Upgrade Execution
Place the affected hosts in maintenance mode in PRTG / Datadog / LogicMonitor and silence the PagerDuty service so the NOC isn't paged on every health-check failure. Set a timer — re-enable at the end of the window even if you forget.
Stop Windows Task Scheduler jobs, cron entries, and any RMM-scheduled scripts that touch the application or its database. A backup job firing mid-upgrade locks files and corrupts the install.
Trigger an ESXi / Hyper-V / Proxmox snapshot with services stopped — this is the fastest rollback option. Tag the snapshot with the change ticket number; schedule deletion after 7 days of stable running.
Run the installer with logging enabled (msiexec /l*v, install.log, etc.). Watch for prompts that the runbook didn't anticipate — silently accepting an EULA change or a database schema migration without reading it is how upgrades break.
Start services in dependency order — database first, then application tier, then web / load-balanced front-end. Tail the application log during startup; a service that reports "running" while throwing connection errors in the log is not actually up.
Rollback (If Upgrade Failed)
Revert the VM snapshot or restore from the verified backup captured pre-upgrade. Bring services back in dependency order and run smoke tests against the restored version before declaring rollback complete.
Send the same distribution list a status update: rollback complete, current version, root cause under investigation, next attempt TBD. Update the change ticket with rollback timestamp and reason code.
File with the vendor including installer logs, environment details, and exact failure mode. Get a case number before re-scheduling — the next attempt should not repeat the same path without vendor input.
Post-Upgrade Validation
Execute the same smoke-test script used in the test ring: login, top workflows, upstream and downstream integrations. Confirm the application reports the new version number in the about screen or /version endpoint.
Take hosts out of maintenance mode in the monitoring tool, un-silence PagerDuty, and re-enable cron / Task Scheduler jobs. Confirm the next scheduled job actually fires — disabled flags sometimes don't toggle back cleanly.
Stay on the Datadog / PRTG / LogicMonitor dashboard for at least 60 minutes post-cutover. Watch CPU, memory, disk I/O, query latency, and error rates against the prior week's baseline. New versions often regress on a single metric (memory leak, slow query) that only shows up under real load.
Update the configuration record with the new version, upgrade date, change ticket reference, and any config changes (new ports, new service accounts, schema changes). MSPs especially: stale documentation is what makes the next on-call engineer fail.
Review tickets, monitoring trends, and user feedback from the week post-upgrade. Delete the rollback snapshot if the system is stable. Capture lessons learned in the runbook so the next upgrade is shorter.
Use this template in Manifestly
- Cloud Migration Checklist
- Cloud Security Checklist
- User Access Review Checklist
- Data Recovery Checklist
- Containerization Rollout Checklist
- Database Backup Checklist
- Password Management Checklist
- Backup and Restore Checklist
- Network Upgrade Checklist
- Server Backup Checklist
- Business Continuity Plan Checklist
- Problem Management Checklist
- Server Decommissioning Checklist
- Cloud Monitoring Checklist
- Hardware Inventory Checklist
- IT Regulatory Compliance Review
- Release Management Checklist
- Server Maintenance Checklist
- Rollback Plan Checklist
- Customer Support Ticket Workflow
- Quarterly Compliance Reporting Checklist
- Patch Management Checklist
- Hardware Maintenance Checklist
- Server Security Checklist
- IT Emergency Response Checklist
- Incident Management Checklist
- Disaster Recovery Plan Checklist
- User Role Management Checklist
- Software Installation Checklist
- Compliance Audit Checklist
- Access Control Checklist
- Cloud Cost Management Checklist
- IT Staff Performance Review
- Change Management Checklist
- Firewall Configuration Checklist
- Security Audit Checklist
- Quarterly Network Security Review
- Database Migration Checklist
- Employee Onboarding Checklist
- Capacity Planning Checklist
- IT Budgeting Checklist
- Network Monitoring Checklist
- Cloud Deployment Checklist
- Database Installation Checklist
- IT Service Request Checklist
- Database Security Checklist
- System Monitoring Checklist
- Hardware Troubleshooting Checklist
- IT Strategy Checklist
- Patch Deployment Checklist
- Hardware Upgrade Checklist
- Performance Tuning Checklist
- Application Performance Monitoring Checklist
- Employee Training Checklist
- User Onboarding Checklist
- IT Vendor Management Checklist
- Server Build and Hardening Checklist
- IT Policy Review Checklist
- Help Desk Ticket Handling Checklist
- Infrastructure as Code Checklist
- Hardware Disposal Checklist
- IT Resource Allocation Checklist
- Incident Response Checklist
- Network Troubleshooting Checklist
- User Offboarding Checklist
- Change Management Checklist
- Software Installation Checklist
- Software Update Checklist
- Network Upgrade Checklist
- Server Decommissioning Checklist
- Release Management Checklist
- Rollback Plan Checklist
- Software Installation Checklist
- Change Management Checklist
- Cloud Deployment Checklist
- Database Installation Checklist
- Patch Deployment Checklist
- Hardware Upgrade Checklist
- Performance Tuning Checklist
Ready to take control of your recurring tasks?
Start Free 14-Day TrialUse Slack? Sign up with one click
