Hardware Upgrade Checklist

Pre-Upgrade Planning

    Pull baseline metrics from your monitoring stack (PRTG, Datadog, LogicMonitor, Auvik) for the 30 days preceding the upgrade — CPU, memory, IOPS, and network utilization at peak. Without a captured baseline, post-upgrade validation has nothing to compare against and 'feels faster' becomes the only signal.

    File the change request in your ITSM (ServiceNow, Jira Service Management, ConnectWise PSA) with rollback plan, blast radius, maintenance window, and validation steps. Classify as a normal change unless it qualifies as pre-approved standard. CAB approval is your go/no-go gate.

    Confirm the new parts against the vendor's hardware compatibility list — Dell, HPE, Cisco, or the VMware HCL for the target hypervisor. Mismatched HBA firmware or unsupported RAID controllers are a top reason post-upgrade systems fail to boot or lose disk visibility.

    Run a fresh backup with Veeam, Datto, Cohesity, or Rubrik and confirm the job completes with a verified restore point — not just a green status icon. Record the job ID; you may need it during rollback.

    Send the maintenance window notification to application owners, the on-call rotation, and (for MSPs) client primary contacts at least 72 hours ahead. Include start time, expected duration, services affected, and the rollback decision time.

Hardware Installation

    In vCenter, Hyper-V failover cluster, or Kubernetes, drain workloads off the target host before shutdown. Skipping this step kills active VMs or pods rather than letting DRS / the scheduler migrate them gracefully.

    Follow the documented shutdown order — application tier, database tier, then OS — labeling cables as you disconnect them. Out-of-order shutdowns can corrupt clustered databases or leave SAN paths in an unclean state.

    Follow the vendor procedure exactly — torque specs on heatsinks, slot assignments for memory channels, cabling order on the RAID backplane. Improvised installations are the leading source of post-upgrade hardware faults.

    Flash BIOS, BMC/iDRAC/iLO, RAID controller, and HBA firmware to the versions called out on the vendor HCL. Firmware mismatches between identical hardware in a cluster cause subtle performance and stability issues that take weeks to diagnose.

    On power-on, watch POST and confirm all CPUs, DIMMs, drives, and NICs enumerate correctly. Drop into iDRAC/iLO/IPMI and check the system inventory against the BOM. Missing or degraded components here are easier to fix before the OS boots.

    Open a TAC case with the vendor before proceeding. Capture serial numbers, firmware versions, and POST output. Do not boot the OS on partially-detected hardware — you can mask the failure and create harder-to-diagnose problems later.

Post-Upgrade Verification

    Run the vendor's diagnostics suite — Dell SupportAssist, HPE Smart Storage Administrator, memtest86 for memory, plus hypervisor-level health checks. Catch infant-mortality DIMM and disk failures here, not in production.

    Exit maintenance mode and let DRS / the load balancer rebalance workloads onto the upgraded host. Watch the first 15 minutes of production traffic for unexpected resets or path failovers.

    Watch or trigger the next scheduled backup against the upgraded host. New HBA firmware, NIC driver versions, or VSS provider changes can break backup paths in ways that don't surface until the job runs.

    Update ServiceNow CMDB, IT Glue, Hudu, or ConnectWise with the new serials, MAC addresses, firmware versions, and warranty terms. CMDB drift is how vendor audits and DR plans go wrong six months from now.

User Acceptance Testing

    Schedule the post-upgrade UAT window with each affected application owner. Give them specific functions to validate — not just 'is it up' but 'do scheduled jobs run' and 'do peak-hour transactions complete in baseline time.'

    Capture each application owner's pass/fail with specifics. 'Slower than before' without a measurement is not actionable; pull the monitoring panel and compare against the baseline captured in planning.

    If UAT fails against the rollback criteria, execute the CAB-approved plan within the maintenance window — restore the prior firmware, fail back to the standby host, or restore from the verified backup captured in planning. Do not improvise.

    Walk the LOB application list — ERP, EHR, point-of-sale, scheduling, billing — and confirm each is in normal operating range. These drive the business; their owners' sign-off is what matters most.

    Capture the change owner's signature confirming the upgrade is complete and accepted. This closes the CAB-approved change and starts the warranty clock on post-upgrade support.

Post-Upgrade Support

    For the first 7 days, compare the planning-phase baseline to live performance. Watch for thermal anomalies, unusual interrupt rates, or memory pressure that may indicate a marginal component.

    Hold the PIR roughly 14 days post-upgrade. Bring monitoring data, ticket volume, and any incident reports tied to the change. PIR is where lessons-learned go from anecdotes to documented improvements.

    Refresh runbooks, network diagrams, and rack elevations to reflect the new hardware. Skipping this step is how the next on-call engineer at 2am works from stale documentation.

    Close the change in ITSM with the actual outcome, deviations from plan, and lessons learned. Open follow-up tickets for any deferred items — outstanding firmware, replacement parts, or documentation updates.

Use this template in Manifestly

Start a Free 14 Day Trial
Use Slack? Start your trial with one click

Related Systems Administration Checklists
Related Hardware Checklists

Ready to take control of your recurring tasks?

Start Free 14-Day Trial


Use Slack? Sign up with one click

With Slack