Hardware Upgrade Checklist

Steps an IT operations team or MSP runs to plan, install, verify, and support a server or network hardware upgrade through a controlled maintenance window. Covers change-management gating, vendor compatibility, backup verification, UAT, and post-implementation review.

1

Pre-Upgrade Planning

  1. Benchmark current performance against capacity thresholds
    • Pull baseline metrics from your monitoring stack (PRTG, Datadog, LogicMonitor, Auvik) for the 30 days preceding the upgrade — CPU, memory, IOPS, and network utilization at peak. Without a captured baseline, post-upgrade validation has nothing to compare against and 'feels faster' becomes the only signal.

  2. Submit the RFC to the change advisory board
    • File the change request in your ITSM (ServiceNow, Jira Service Management, ConnectWise PSA) with the rollback plan, blast radius, maintenance window, and validation steps. Classify it as a normal change unless it qualifies as a pre-approved standard change. CAB approval is your go/no-go gate.

  3. Verify components against the vendor HCL
    • Confirm the new parts against the vendor's hardware compatibility list — Dell, HPE, Cisco, or the VMware HCL for the target hypervisor. Mismatched HBA firmware or unsupported RAID controllers are a top reason post-upgrade systems fail to boot or lose disk visibility.

  4. Capture a verified restore point
    • Run a fresh backup with Veeam, Datto, Cohesity, or Rubrik and confirm the job completes with a verified restore point — not just a green status icon. Record the job ID; you may need it during rollback.

  5. Notify stakeholders of the maintenance window
    • Send the maintenance window notification to application owners, the on-call rotation, and (for MSPs) client primary contacts at least 72 hours ahead. Include start time, expected duration, services affected, and the rollback decision time.
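The baseline from step 1 is only useful later if it is reduced to a few comparable numbers. A minimal sketch of that reduction follows, assuming the monitoring data has been exported as plain lists of readings per metric. The function name `summarize_baseline`, the metric names, and the sample values are hypothetical, and the 95th percentile is an illustrative choice rather than something the checklist prescribes.

```python
from statistics import quantiles


def summarize_baseline(samples):
    """Summarize each metric as its peak and 95th-percentile reading.

    `samples` maps a metric name to a list of numeric readings; this
    shape is an assumption, not any monitoring vendor's export format.
    """
    summary = {}
    for metric, values in samples.items():
        if len(values) < 2:
            raise ValueError(f"need at least two samples for {metric}")
        # quantiles(..., n=100) returns 99 cut points; index 94 is the
        # 95th percentile.
        p95 = quantiles(values, n=100, method="inclusive")[94]
        summary[metric] = {"peak": max(values), "p95": p95}
    return summary


if __name__ == "__main__":
    # Placeholder readings, not real measurements.
    readings = {
        "cpu_pct": [35, 42, 58, 91, 47],
        "iops": [1200, 1850, 2400, 3100, 1600],
    }
    for metric, stats in summarize_baseline(readings).items():
        print(metric, stats)
```

Storing the summary alongside the change request keeps the comparison values available for UAT and the post-upgrade monitoring period.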

2

Hardware Installation

  1. Place affected hosts into maintenance mode
    • In vCenter, a Hyper-V failover cluster, or Kubernetes, drain workloads off the target host before shutdown. Skipping this step kills active VMs or pods rather than letting DRS or the scheduler migrate them gracefully.

  2. Shut down per the documented sequence
    • Follow the documented shutdown order — application tier, database tier, then OS — labeling cables as you disconnect them. Out-of-order shutdowns can corrupt clustered databases or leave SAN paths in an unclean state.

  3. Install new components per the vendor procedure
    • Follow the vendor procedure exactly: torque specs on heatsinks, slot assignments for memory channels, cabling order on the RAID backplane. Improvised installations are a common source of post-upgrade hardware faults.

  4. Update firmware to vendor-recommended levels
    • Flash BIOS, BMC/iDRAC/iLO, RAID controller, and HBA firmware to the versions called out on the vendor HCL. Firmware mismatches between identical hardware in a cluster cause subtle performance and stability issues that take weeks to diagnose.

  5. Verify hardware enumeration at POST
    • On power-on, watch POST and confirm all CPUs, DIMMs, drives, and NICs enumerate correctly. Drop into iDRAC/iLO/IPMI and check the system inventory against the BOM. Missing or degraded components here are easier to fix before the OS boots.

  6. Engage vendor TAC on missing components
    • If any component is missing or degraded at POST, open a TAC case with the vendor before proceeding. Capture serial numbers, firmware versions, and POST output. Do not boot the OS on partially detected hardware; doing so can mask the failure and create harder-to-diagnose problems later.
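The inventory check in step 5 amounts to comparing two part lists. A minimal sketch follows, assuming the BOM and the inventory reported by the BMC have each been transcribed into a list of part identifiers. The function name `diff_inventory` and the part labels are hypothetical; no vendor API is used here.

```python
from collections import Counter


def diff_inventory(bom, detected):
    """Compare expected BOM part counts with what the BMC reports.

    Both arguments are lists of part identifiers (hypothetical labels),
    with one entry per physical component.
    """
    expected = Counter(bom)
    found = Counter(detected)
    missing = expected - found      # on the BOM but not enumerated
    unexpected = found - expected   # enumerated but not on the BOM
    return dict(missing), dict(unexpected)


if __name__ == "__main__":
    # Placeholder part labels for illustration only.
    bom = ["DIMM-32G"] * 4 + ["NIC-25G"]
    detected = ["DIMM-32G"] * 3 + ["NIC-25G"]
    missing, unexpected = diff_inventory(bom, detected)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

Any non-empty `missing` result is the trigger for step 6: stop and open the vendor case before booting the OS.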

3

Post-Upgrade Verification

  1. Run vendor diagnostics on the upgraded host
    • Run the vendor's diagnostics suite (for example Dell SupportAssist or HPE Smart Storage Administrator), a memory test such as MemTest86, and hypervisor-level health checks. Catch infant-mortality DIMM and disk failures here, not in production.

  2. Return the host to the production cluster
    • Exit maintenance mode and let DRS / the load balancer rebalance workloads onto the upgraded host. Watch the first 15 minutes of production traffic for unexpected resets or path failovers.

  3. Validate the next backup completes against the host
    • Watch or trigger the next scheduled backup against the upgraded host. New HBA firmware, NIC driver versions, or VSS provider changes can break backup paths in ways that don't surface until the job runs.

  4. Update the CMDB with new asset details
    • Update ServiceNow CMDB, IT Glue, Hudu, or ConnectWise with the new serials, MAC addresses, firmware versions, and warranty terms. CMDB drift is how vendor audits and DR plans go wrong six months from now.
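The CMDB update in step 4 is easier to verify if the stored record is compared field by field with what was observed on the upgraded host. A minimal sketch follows, assuming both are available as plain dictionaries. The function name `cmdb_drift`, the field names, and the example values are hypothetical and do not reflect any CMDB product's schema.

```python
# Fields this checklist asks to keep current; the key names are assumptions.
TRACKED_FIELDS = ("serial", "mac", "firmware", "warranty_end")


def cmdb_drift(cmdb_record, observed):
    """Return {field: (cmdb_value, observed_value)} for every mismatch."""
    drift = {}
    for field in TRACKED_FIELDS:
        recorded = cmdb_record.get(field)
        actual = observed.get(field)
        if recorded != actual:
            drift[field] = (recorded, actual)
    return drift


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmdb = {"serial": "OLD123", "mac": "aa:bb:cc:00:00:01", "firmware": "2.1.0"}
    host = {"serial": "NEW456", "mac": "aa:bb:cc:00:00:01", "firmware": "2.4.0"}
    for field, (recorded, actual) in cmdb_drift(cmdb, host).items():
        print(f"{field}: CMDB has {recorded!r}, host reports {actual!r}")
```

An empty result after the update is a reasonable signal that the record matches the hardware.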

4

User Acceptance Testing

  1. Coordinate UAT with application owners
    • Schedule the post-upgrade UAT window with each affected application owner. Give them specific functions to validate — not just 'is it up' but 'do scheduled jobs run' and 'do peak-hour transactions complete in baseline time.'

  2. Collect UAT performance feedback
    • Capture each application owner's pass/fail with specifics. 'Slower than before' without a measurement is not actionable; pull the monitoring panel and compare against the baseline captured in planning.

  3. Execute the documented rollback plan
    • If UAT fails against the rollback criteria, execute the CAB-approved plan within the maintenance window — restore the prior firmware, fail back to the standby host, or restore from the verified backup captured in planning. Do not improvise.

  4. Confirm line-of-business applications are healthy
    • Walk the LOB application list — ERP, EHR, point-of-sale, scheduling, billing — and confirm each is in normal operating range. These drive the business; their owners' sign-off is what matters most.

  5. Obtain change-owner sign-off
    • Capture the change owner's signature confirming the upgrade is complete and accepted. This closes the CAB-approved change and starts the warranty clock on post-upgrade support.
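Step 2's comparison against the planning baseline can be made mechanical. A minimal sketch follows, assuming every metric is one where a higher number is worse (job duration, latency) and that a 10% tolerance is acceptable. The function name `uat_regressions`, the metric names, and the tolerance are hypothetical; the real rollback criteria come from the CAB-approved change.

```python
def uat_regressions(baseline, measured, tolerance=0.10):
    """Flag metrics that got worse than the baseline by more than `tolerance`.

    Assumes higher values are worse. Returns {metric: fractional_change},
    with None for any baseline metric that was not measured during UAT.
    """
    regressions = {}
    for metric, base in baseline.items():
        if base <= 0:
            raise ValueError(f"baseline for {metric} must be positive")
        if metric not in measured:
            regressions[metric] = None  # unmeasured is not a pass
            continue
        change = (measured[metric] - base) / base
        if change > tolerance:
            regressions[metric] = round(change, 3)
    return regressions


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    baseline = {"nightly_job_s": 600, "checkout_p95_ms": 200}
    measured = {"nightly_job_s": 630, "checkout_p95_ms": 260}
    print(uat_regressions(baseline, measured))
```

An empty result supports a pass; anything else gives the application owner a specific number to discuss instead of "slower than before".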

5

Post-Upgrade Support

  1. Monitor performance against the captured baseline
    • For the first 7 days, compare the planning-phase baseline to live performance. Watch for thermal anomalies, unusual interrupt rates, or memory pressure that may indicate a marginal component.

  2. Schedule the post-implementation review
    • Hold the PIR roughly 14 days post-upgrade. Bring monitoring data, ticket volume, and any incident reports tied to the change. The PIR is where lessons learned go from anecdotes to documented improvements.

  3. Update runbooks and topology diagrams
    • Refresh runbooks, network diagrams, and rack elevations to reflect the new hardware. Skipping this step is how the next on-call engineer ends up working from stale documentation at 2 a.m.

  4. Close the change ticket with lessons learned
    • Close the change in ITSM with the actual outcome, deviations from plan, and lessons learned. Open follow-up tickets for any deferred items — outstanding firmware, replacement parts, or documentation updates.
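The 7-day monitoring period and the roughly 14-day PIR can be put on the calendar as soon as the change is signed off. A minimal sketch of that date arithmetic follows, assuming both intervals are counted in calendar days from the sign-off date. The function name `support_milestones` and the example date are hypothetical.

```python
from datetime import date, timedelta

# Intervals taken from the steps above, counted in calendar days (assumption).
MONITORING_DAYS = 7
PIR_DAYS = 14


def support_milestones(signed_off_on):
    """Return the end of close monitoring and the target PIR date."""
    return {
        "monitoring_ends": signed_off_on + timedelta(days=MONITORING_DAYS),
        "pir_due": signed_off_on + timedelta(days=PIR_DAYS),
    }


if __name__ == "__main__":
    # Placeholder sign-off date for illustration only.
    for name, when in support_milestones(date(2024, 3, 1)).items():
        print(name, when.isoformat())
```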
