Software Upgrade Checklist

Steps a sysadmin or MSP engineer follows to plan, execute, and validate a production software upgrade — from change-advisory approval through post-upgrade monitoring and rollback if needed.

5 sections 26 steps Collects data
1

Pre-Upgrade Preparation

  1. Review the vendor release notes
    • Read the vendor's release notes, KB articles, and known-issues page for the target version. Flag breaking changes, deprecated features, and any prerequisite intermediate versions. Vendors frequently bury config-file format changes and minimum OS / .NET / JDK requirements in the fine print.

  2. Verify compatibility with dependent systems
    • Check OS, database, .NET / JDK / Python runtime, and integration partner versions against the new release's compatibility matrix. Confirm any agents (EDR, RMM, backup) are supported on the new version — CrowdStrike and Veeam compatibility lags major OS releases by weeks.

  3. File the change request with the CAB
    • Submit the RFC in ServiceNow / ConnectWise / Jira Service Management with scope, risk classification, maintenance window, rollback plan, and named approvers. Standard changes can use a pre-approved template; normal changes need CAB review before scheduling.

    Collects list Collects text
  4. Schedule the maintenance window
    • Confirm the window falls outside any change-freeze (quarter-close, retail peak, payroll runs) and aligns with the application owner's blackout calendar. Block calendars for on-call, DBA, and network engineer if their involvement is required.

  5. Notify stakeholders and end users
    • Send notice to the affected distribution list with start time, expected duration, services impacted, and the status-page URL. For MSP clients, include the vCIO or account manager. Send a 24-hour reminder the day before.

  6. Run a verified backup of affected systems
    • Trigger a Veeam / Datto / native snapshot of the application server, database, and config volumes. Confirm the backup completes successfully and capture the restore point ID — do not rely on last night's scheduled job. For VMs, also take a hypervisor-level snapshot at the start of the window for fast rollback.

    Collects text
  7. Document the rollback plan
    • Write the named rollback steps: restore from snapshot, revert database schema, redirect DNS / load balancer back to prior nodes. Define the go/no-go decision time and the named decision-maker. A rollback plan that says "restore from backup" without specifying which backup, who runs it, and how long it takes is not a plan.

2

Test Ring Validation

  1. Apply the upgrade in the test environment
    • Deploy to the dev or QA ring first using the same installer and runbook the production change will use. Capture installer exit codes and timing — production estimates come from this run, not the vendor's quoted duration.

  2. Run the application owner's smoke tests
    • Have the app owner or QA execute the documented smoke-test script: login, top three workflows, integrations to upstream / downstream systems, scheduled-job execution. Block production deployment if any test fails.

    Collects list
  3. Investigate test failures before proceeding
    • If smoke tests failed, open a vendor support case with installer logs and stack traces before re-scheduling. Do not push to production hoping the issue is environment-specific — that's how Friday afternoon outages happen.

3

Production Upgrade Execution

  1. Suspend monitoring alerts and on-call paging
    • Place the affected hosts in maintenance mode in PRTG / Datadog / LogicMonitor and silence the PagerDuty service so the NOC isn't paged on every health-check failure. Set a timer — re-enable at the end of the window even if you forget.

  2. Disable scheduled tasks and cron jobs
    • Stop Windows Task Scheduler jobs, cron entries, and any RMM-scheduled scripts that touch the application or its database. A backup job firing mid-upgrade locks files and corrupts the install.

  3. Stop the application and dependent services
  4. Take a pre-upgrade VM snapshot
    • Trigger an ESXi / Hyper-V / Proxmox snapshot with services stopped — this is the fastest rollback option. Tag the snapshot with the change ticket number; schedule deletion after 7 days of stable running.

  5. Apply the upgrade package
    • Run the installer with logging enabled (msiexec /l*v, install.log, etc.). Watch for prompts that the runbook didn't anticipate — silently accepting an EULA change or a database schema migration without reading it is how upgrades break.

  6. Review the installer logs for errors
    Collects list Collects file
  7. Restart application services and verify startup
    • Start services in dependency order — database first, then application tier, then web / load-balanced front-end. Tail the application log during startup; a service that reports "running" while throwing connection errors in the log is not actually up.

4

Rollback (If Upgrade Failed)

  1. Execute the documented rollback plan
    • Revert the VM snapshot or restore from the verified backup captured pre-upgrade. Bring services back in dependency order and run smoke tests against the restored version before declaring rollback complete.

  2. Notify stakeholders of the rollback
    • Send the same distribution list a status update: rollback complete, current version, root cause under investigation, next attempt TBD. Update the change ticket with rollback timestamp and reason code.

  3. Open a vendor support case
    • File with the vendor including installer logs, environment details, and exact failure mode. Get a case number before re-scheduling — the next attempt should not repeat the same path without vendor input.

5

Post-Upgrade Validation

  1. Run smoke tests against production
    • Execute the same smoke-test script used in the test ring: login, top workflows, upstream and downstream integrations. Confirm the application reports the new version number in the about screen or /version endpoint.

  2. Re-enable scheduled tasks and monitoring
    • Take hosts out of maintenance mode in the monitoring tool, un-silence PagerDuty, and re-enable cron / Task Scheduler jobs. Confirm the next scheduled job actually fires — disabled flags sometimes don't toggle back cleanly.

  3. Watch dashboards for performance anomalies
    • Stay on the Datadog / PRTG / LogicMonitor dashboard for at least 60 minutes post-cutover. Watch CPU, memory, disk I/O, query latency, and error rates against the prior week's baseline. New versions often regress on a single metric (memory leak, slow query) that only shows up under real load.

  4. Update IT Glue or Hudu documentation
    • Update the configuration record with the new version, upgrade date, change ticket reference, and any config changes (new ports, new service accounts, schema changes). MSPs especially: stale documentation is what makes the next on-call engineer fail.

  5. Close the change ticket and notify stakeholders
    Collects list Collects paragraph
  6. Hold a 7-day post-upgrade health review
    • Review tickets, monitoring trends, and user feedback from the week post-upgrade. Delete the rollback snapshot if the system is stable. Capture lessons learned in the runbook so the next upgrade is shorter.

Use this template

Copy it to your account, customize the steps, and run it with your team in minutes.


Sections 5
Steps 26
Category Systems Administration
Price Free to start
Need a different process

Browse hundreds of free templates across every team and industry.

Back to template library

Run Software Upgrade Checklist with your team

Customize the steps, assign roles, set a schedule, and keep a complete record for every run.