Software Upgrade Checklist
Steps a sysadmin or MSP engineer follows to plan, execute, and validate a production software upgrade — from change-advisory approval through post-upgrade monitoring and rollback if needed.
Pre-Upgrade Preparation
-
Review the vendor release notes
Read the vendor's release notes, KB articles, and known-issues page for the target version. Flag breaking changes, deprecated features, and any prerequisite intermediate versions. Vendors frequently bury config-file format changes and minimum OS / .NET / JDK requirements in the fine print.
-
Verify compatibility with dependent systems
Check OS, database, .NET / JDK / Python runtime, and integration partner versions against the new release's compatibility matrix. Confirm any agents (EDR, RMM, backup) are supported on the new version — CrowdStrike and Veeam compatibility lags major OS releases by weeks.
-
File the change request with the CAB
Submit the RFC in ServiceNow / ConnectWise / Jira Service Management with scope, risk classification, maintenance window, rollback plan, and named approvers. Standard changes can use a pre-approved template; normal changes need CAB review before scheduling.
Collects list Collects text -
Schedule the maintenance window
Confirm the window falls outside any change-freeze (quarter-close, retail peak, payroll runs) and aligns with the application owner's blackout calendar. Block calendars for on-call, DBA, and network engineer if their involvement is required.
-
Notify stakeholders and end users
Send notice to the affected distribution list with start time, expected duration, services impacted, and the status-page URL. For MSP clients, include the vCIO or account manager. Send a 24-hour reminder the day before.
-
Run a verified backup of affected systems
Trigger a Veeam / Datto / native snapshot of the application server, database, and config volumes. Confirm the backup completes successfully and capture the restore point ID — do not rely on last night's scheduled job. For VMs, also take a hypervisor-level snapshot at the start of the window for fast rollback.
Collects text -
Document the rollback plan
Write the named rollback steps: restore from snapshot, revert database schema, redirect DNS / load balancer back to prior nodes. Define the go/no-go decision time and the named decision-maker. A rollback plan that says "restore from backup" without specifying which backup, who runs it, and how long it takes is not a plan.
Test Ring Validation
-
Apply the upgrade in the test environment
Deploy to the dev or QA ring first using the same installer and runbook the production change will use. Capture installer exit codes and timing — production estimates come from this run, not the vendor's quoted duration.
-
Run the application owner's smoke tests
Have the app owner or QA execute the documented smoke-test script: login, top three workflows, integrations to upstream / downstream systems, scheduled-job execution. Block production deployment if any test fails.
Collects list -
Investigate test failures before proceeding
If smoke tests failed, open a vendor support case with installer logs and stack traces before re-scheduling. Do not push to production hoping the issue is environment-specific — that's how Friday afternoon outages happen.
Production Upgrade Execution
-
Suspend monitoring alerts and on-call paging
Place the affected hosts in maintenance mode in PRTG / Datadog / LogicMonitor and silence the PagerDuty service so the NOC isn't paged on every health-check failure. Set a timer — re-enable at the end of the window even if you forget.
-
Disable scheduled tasks and cron jobs
Stop Windows Task Scheduler jobs, cron entries, and any RMM-scheduled scripts that touch the application or its database. A backup job firing mid-upgrade locks files and corrupts the install.
-
Stop the application and dependent services
-
Take a pre-upgrade VM snapshot
Trigger an ESXi / Hyper-V / Proxmox snapshot with services stopped — this is the fastest rollback option. Tag the snapshot with the change ticket number; schedule deletion after 7 days of stable running.
-
Apply the upgrade package
Run the installer with logging enabled (msiexec /l*v, install.log, etc.). Watch for prompts that the runbook didn't anticipate — silently accepting an EULA change or a database schema migration without reading it is how upgrades break.
-
Review the installer logs for errorsCollects list Collects file
-
Restart application services and verify startup
Start services in dependency order — database first, then application tier, then web / load-balanced front-end. Tail the application log during startup; a service that reports "running" while throwing connection errors in the log is not actually up.
Rollback (If Upgrade Failed)
-
Execute the documented rollback plan
Revert the VM snapshot or restore from the verified backup captured pre-upgrade. Bring services back in dependency order and run smoke tests against the restored version before declaring rollback complete.
-
Notify stakeholders of the rollback
Send the same distribution list a status update: rollback complete, current version, root cause under investigation, next attempt TBD. Update the change ticket with rollback timestamp and reason code.
-
Open a vendor support case
File with the vendor including installer logs, environment details, and exact failure mode. Get a case number before re-scheduling — the next attempt should not repeat the same path without vendor input.
Post-Upgrade Validation
-
Run smoke tests against production
Execute the same smoke-test script used in the test ring: login, top workflows, upstream and downstream integrations. Confirm the application reports the new version number in the about screen or /version endpoint.
-
Re-enable scheduled tasks and monitoring
Take hosts out of maintenance mode in the monitoring tool, un-silence PagerDuty, and re-enable cron / Task Scheduler jobs. Confirm the next scheduled job actually fires — disabled flags sometimes don't toggle back cleanly.
-
Watch dashboards for performance anomalies
Stay on the Datadog / PRTG / LogicMonitor dashboard for at least 60 minutes post-cutover. Watch CPU, memory, disk I/O, query latency, and error rates against the prior week's baseline. New versions often regress on a single metric (memory leak, slow query) that only shows up under real load.
-
Update IT Glue or Hudu documentation
Update the configuration record with the new version, upgrade date, change ticket reference, and any config changes (new ports, new service accounts, schema changes). MSPs especially: stale documentation is what makes the next on-call engineer fail.
-
Close the change ticket and notify stakeholdersCollects list Collects paragraph
-
Hold a 7-day post-upgrade health review
Review tickets, monitoring trends, and user feedback from the week post-upgrade. Delete the rollback snapshot if the system is stable. Capture lessons learned in the runbook so the next upgrade is shorter.
Use this template
Copy it to your account, customize the steps, and run it with your team in minutes.
Browse hundreds of free templates across every team and industry.
Back to template libraryRun Software Upgrade Checklist with your team
Customize the steps, assign roles, set a schedule, and keep a complete record for every run.