Monthly Server Maintenance Checklist

Monthly maintenance window run by a sysadmin or NOC tech to verify backups, apply patches, review capacity, harden the security posture, and inspect the physical hardware on a single server. Designed to run alongside an approved change ticket.

7 sections, 23 steps. Collects data.

Run Setup and Server Identification

Capture server hostname and role
- Record the FQDN as it appears in DNS plus the role tag from the CMDB (web, app, db, file, hypervisor, AD DC). Mismatches between DNS and the CMDB are a common reason a maintenance window touches the wrong host.
Collects text
Record management IP and out-of-band address
- Capture the primary management IP and the iLO / iDRAC / IPMI address. If the host goes unresponsive after a kernel patch, the OOB address is what gets you back in without a deskside trip.
Collects text
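Record primary NIC MAC and rack location
- Record the MAC address of the primary NIC and the row, rack, and U position of the chassis, so the hardware inspection later in this window goes to the right box.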
Collects text
Confirm change ticket and maintenance window
- Verify the RFC is approved by CAB, the blackout calendar is clear, and stakeholders have been notified. A maintenance window run without an approved change ticket is itself an audit finding under SOC 2 CC8.1 and ISO 27001:2013 A.12.1.2 (control A.8.32 in ISO 27001:2022).
Collects datetime
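Before starting work it can help to confirm the clock is actually inside the window the RFC approved. A minimal Python sketch, assuming the window start and end are copied from the approved ticket as timezone-aware datetimes; `inside_change_window` is a hypothetical helper, not part of any PSA or ITSM API.

```python
from datetime import datetime, timezone


def inside_change_window(now: datetime, start: datetime, end: datetime) -> bool:
    """Return True if `now` falls in the approved window [start, end).

    Hypothetical helper: all three values must be timezone-aware so a
    window booked in UTC is never compared against local wall-clock time.
    """
    if any(value.tzinfo is None for value in (now, start, end)):
        raise ValueError("use timezone-aware datetimes")
    if end <= start:
        raise ValueError("window end must be after window start")
    return start <= now < end


# Illustrative window: 02:00-06:00 UTC on the ticket's scheduled date.
window_start = datetime(2024, 6, 8, 2, 0, tzinfo=timezone.utc)
window_end = datetime(2024, 6, 8, 6, 0, tzinfo=timezone.utc)

print(inside_change_window(datetime(2024, 6, 8, 3, 30, tzinfo=timezone.utc), window_start, window_end))  # True
print(inside_change_window(datetime(2024, 6, 8, 6, 0, tzinfo=timezone.utc), window_start, window_end))   # False
```

In a live run, `now` would be `datetime.now(timezone.utc)`. Rejecting naive datetimes is deliberate: a window stored in UTC and a local clock reading would otherwise be off by the host's UTC offset.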

Backup and Restore Verification

Verify the last backup job in Veeam or Datto
- Open the backup console (Veeam, Commvault, Datto, Rubrik, or AWS Backup) and confirm the most recent job ran to completion with no warnings. A green job with skipped files is not a successful backup — drill into the per-file log.
Run a test restore from the latest snapshot
- Restore a representative file or VM to an isolated location and validate it opens cleanly. Backups that have never been restored are not backups; this is the step that catches silent corruption before it matters.
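Opening the restored file proves it is readable. Comparing its hash against the source goes a step further and catches bit-level corruption. A minimal Python sketch, assuming the source file has not changed since the snapshot was taken (otherwise compare against a hash recorded at backup time); `sha256_of` and `restore_matches` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large restores are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True if the restored copy is byte-for-byte identical to the original."""
    if original.stat().st_size != restored.stat().st_size:
        return False  # cheap check first: different sizes mean different contents
    return sha256_of(original) == sha256_of(restored)


if __name__ == "__main__":
    # Demo with throwaway files standing in for the source and the isolated restore.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp, "ledger.db")
        source.write_bytes(b"quarter-end balances")
        good = Path(tmp, "restored_ok.db")
        good.write_bytes(b"quarter-end balances")
        bad = Path(tmp, "restored_bad.db")
        bad.write_bytes(b"quarter-end balanc\x00s")  # same size, one corrupted byte
        print(restore_matches(source, good))  # True
        print(restore_matches(source, bad))   # False
```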
Confirm offsite immutable copy retention
- Validate the 3-2-1 chain (three copies of the data, on two different media, with one copy offsite) and confirm at least one immutable or air-gapped copy meets the documented retention. Ransomware playbooks assume an immutable tier exists; confirm it does for this server.
Collects list
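The 3-2-1 and immutability checks reduce to a few rules over an inventory of copies. A minimal Python sketch, assuming the inventory (counting the production data itself as one of the three copies) is filled in by hand or exported from the backup console; `BackupCopy` and `check_3_2_1` are hypothetical names, and the example retention values are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str            # e.g. "disk", "tape", "object storage"
    offsite: bool
    immutable: bool       # immutable or air-gapped tier
    retention_days: int


def check_3_2_1(copies: list[BackupCopy], required_retention_days: int) -> list[str]:
    """Return the problems found; an empty list means the chain passes."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; 3-2-1 needs 3")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies share one media type; 3-2-1 needs 2")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    if not any(c.immutable and c.retention_days >= required_retention_days for c in copies):
        problems.append(f"no immutable copy retained for {required_retention_days}+ days")
    return problems


inventory = [
    BackupCopy("production volume", "disk", offsite=False, immutable=False, retention_days=0),
    BackupCopy("on-prem repository", "disk", offsite=False, immutable=False, retention_days=14),
    BackupCopy("cloud bucket with object lock", "object storage", offsite=True, immutable=True, retention_days=90),
]
print(check_3_2_1(inventory, required_retention_days=30))   # []
print(check_3_2_1(inventory, required_retention_days=180))  # ['no immutable copy retained for 180+ days']
```

Returning a list of problems rather than a single boolean gives the evidence pack a sentence per gap instead of a bare pass/fail.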
Open a P2 remediation ticket for the failed backup
- Do not proceed into the patching window without a working backup. Open a P2 in the PSA, page the on-call backup engineer, and, if the change owner decides patching must continue without a fresh restore point, document the rollback plan before starting.

OS, Firmware, and Application Patching

Cross-reference pending patches against CISA KEV
- Pull pending patches from WSUS / SCCM / Action1 / Automox and compare against the CISA Known Exploited Vulnerabilities catalog plus EPSS scores. CVSS alone misranks priorities — a CVSS 7.5 on the KEV list outranks a CVSS 9.8 with no observed exploitation.
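The ranking described above can be expressed as a sort key. A minimal Python sketch, assuming the pending-patch list has already been exported from the patch tool and the KEV CVE IDs and EPSS scores have been loaded from CISA's and FIRST's published feeds (loading is left out). The exact tie-break order (KEV membership, then EPSS, then CVSS) is an assumed design choice, and the patch and CVE identifiers are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingPatch:
    patch_id: str
    cve: str
    cvss: float   # base score, 0.0-10.0
    epss: float   # exploitation probability, 0.0-1.0


def prioritize(patches: list[PendingPatch], kev_cves: set[str]) -> list[PendingPatch]:
    """Order patches: KEV-listed first, then higher EPSS, then higher CVSS."""
    return sorted(
        patches,
        key=lambda p: (p.cve in kev_cves, p.epss, p.cvss),
        reverse=True,  # True sorts before False, and higher scores come first
    )


pending = [
    PendingPatch("patch-A", "CVE-PLACEHOLDER-1", cvss=9.8, epss=0.02),
    PendingPatch("patch-B", "CVE-PLACEHOLDER-2", cvss=7.5, epss=0.40),
    PendingPatch("patch-C", "CVE-PLACEHOLDER-3", cvss=5.3, epss=0.01),
]
kev = {"CVE-PLACEHOLDER-2"}

for patch in prioritize(pending, kev):
    print(patch.patch_id, patch.cvss)
# patch-B 7.5
# patch-A 9.8
# patch-C 5.3
```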
Apply OS, firmware, and agent patches
- Stage patches via the patch tool, take a pre-patch snapshot if the platform supports it, then apply OS, BIOS / firmware, and agent (EDR, RMM, monitoring) updates in that order. Reboot once at the end rather than after each — fewer reboots, fewer surprises.
Collects list
Patch hosted applications and the control panel
- Update cPanel / Plesk / IIS / Apache / nginx and any application runtimes (Java, .NET, Node, Python). Vendor-managed control panels often lag the OS patch cadence and are a common foothold; check the vendor's CVE feed even if no automated update is queued.
Roll back the failing patch and notify owners
- Restore the pre-patch snapshot or uninstall the offending update, capture the failure signature for the postmortem, and post in the change channel. Do not leave the server in a half-patched state — either it's at the new baseline or rolled back to the prior known-good state.

Capacity and Database Maintenance

Run database integrity check
- Run DBCC CHECKDB on SQL Server, pg_amcheck on Postgres (PostgreSQL 14 and later), or the equivalent for MySQL / Oracle. Schedule it for a low-traffic period; corruption found here is the reason the prior section verified backups first.
Rebuild fragmented indexes and reclaim disk space
- Rebuild indexes above the 30% fragmentation threshold, vacuum / shrink as appropriate, and rotate or compress old logs. Watch for runaway temp tables and orphaned WAL / transaction log files that silently fill the volume.
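On SQL Server, the fragmentation threshold translates into a short decision per index. A minimal Python sketch that turns rows from `sys.dm_db_index_physical_stats` (its `avg_fragmentation_in_percent` and `page_count` columns) into `ALTER INDEX` statements. The 30% rebuild threshold comes from this step; the 5% reorganize floor and the 1,000-page minimum are assumptions based on commonly cited guidance, so tune them for your environment.

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any embedded ']'."""
    return "[" + name.replace("]", "]]") + "]"


def plan_index_maintenance(
    rows: list[tuple[str, str, str, float, int]],
    rebuild_above: float = 30.0,    # threshold from this checklist
    reorganize_above: float = 5.0,  # assumption: skip lightly fragmented indexes
    min_pages: int = 1000,          # assumption: small indexes are not worth touching
) -> list[str]:
    """Each row is (schema, table, index, fragmentation_pct, page_count)."""
    statements = []
    for schema, table, index, fragmentation_pct, page_count in rows:
        if page_count < min_pages or fragmentation_pct <= reorganize_above:
            continue
        action = "REBUILD" if fragmentation_pct > rebuild_above else "REORGANIZE"
        target = f"{quote_ident(schema)}.{quote_ident(table)}"
        statements.append(f"ALTER INDEX {quote_ident(index)} ON {target} {action};")
    return statements


sample_rows = [
    ("dbo", "Orders", "IX_Orders_OrderDate", 42.0, 5000),
    ("dbo", "Orders", "PK_Orders", 12.5, 8000),
    ("dbo", "Lookup", "PK_Lookup", 80.0, 40),         # too small, skipped
    ("dbo", "Customers", "PK_Customers", 2.0, 9000),  # barely fragmented, skipped
]
for statement in plan_index_maintenance(sample_rows):
    print(statement)
# ALTER INDEX [IX_Orders_OrderDate] ON [dbo].[Orders] REBUILD;
# ALTER INDEX [PK_Orders] ON [dbo].[Orders] REORGANIZE;
```

Review the generated statements before running them. On editions without online index operations, a rebuild holds locks that can block the table while it runs.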
Review CPU, RAM, disk, and network trends
- Pull the 30-day trend from Datadog / Prometheus / SolarWinds. Flag any volume above 80% capacity, sustained CPU above 70%, or memory pressure that triggered swap. The point is to catch the trend before next month's window, not to firefight today.

Security Review

Rotate service account credentials in the vault
- Rotate via HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault — never edit the password directly on the host. Service accounts last rotated years ago because "changing them breaks things" are the most common credential-theft target on a server.
Run an authenticated vulnerability scan
- Run an authenticated Tenable, Qualys, or Rapid7 scan against the host and compare findings to last month. Unauthenticated scans miss most local privilege issues; the credentialed result is what feeds SOC 2 / PCI evidence.
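Comparing against last month is a set difference over finding identifiers. A minimal Python sketch, assuming each scan export has been reduced to a mapping from a finding key (here a hypothetical "plugin ID:port" string) to its severity. Tenable, Qualys, and Rapid7 each use their own export formats, which this sketch does not parse.

```python
SEVERITY_RANK = {"critical": 4, "high": 3, "medium": 2, "low": 1, "info": 0}


def diff_scans(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Split findings into new, resolved, and persisting, most severe first."""

    def by_severity(keys, severities: dict[str, str]) -> list[str]:
        # Sort by descending severity, then by key so the output is stable.
        return sorted(keys, key=lambda k: (-SEVERITY_RANK.get(severities[k], 0), k))

    return {
        "new": by_severity(current.keys() - previous.keys(), current),
        "resolved": by_severity(previous.keys() - current.keys(), previous),
        "persisting": by_severity(current.keys() & previous.keys(), current),
    }


last_month = {"10001:443": "high", "10002:22": "medium"}
this_month = {"10002:22": "medium", "10004:80": "low", "10003:445": "critical"}
print(diff_scans(last_month, this_month))
# {'new': ['10003:445', '10004:80'], 'resolved': ['10001:443'], 'persisting': ['10002:22']}
```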
Review privileged accounts and EDR alerts
- Pull the local Administrators / sudoers / wheel group and reconcile it against the IdP entitlements. Review the last 30 days of CrowdStrike or Defender for Endpoint detections on the host; quietly suppressed alerts are a common audit finding.
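The reconciliation is a two-way set comparison: accounts with local admin rights that the IdP never granted, and IdP grants that never landed on the host. A minimal Python sketch, assuming both lists have already been exported (for example from `net localgroup Administrators` on Windows or `getent group wheel` on Linux, and from the IdP's group membership). Names are compared case-insensitively, which matches Windows account naming but is an assumption on Linux hosts.

```python
def reconcile_admins(local_members: list[str], idp_entitled: list[str]) -> dict[str, list[str]]:
    """Compare privileged-group membership on the host against IdP entitlements."""
    local = {name.casefold() for name in local_members}
    entitled = {name.casefold() for name in idp_entitled}
    return {
        # Privileged on the host but not granted in the IdP: investigate first.
        "unauthorized": sorted(local - entitled),
        # Granted in the IdP but missing on the host: provisioning drift.
        "missing": sorted(entitled - local),
    }


host_admins = ["CORP\\alice", "CORP\\bob", "CORP\\svc_legacy"]
idp_admins = ["corp\\alice", "corp\\bob", "corp\\carol"]
print(reconcile_admins(host_admins, idp_admins))
# {'unauthorized': ['corp\\svc_legacy'], 'missing': ['corp\\carol']}
```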

Hardware and Facility Inspection

Inspect PSUs, fans, and chassis LEDs
- Walk the rack: confirm both power supplies show green, no amber fault LEDs on the chassis or drive carriers, and intake/exhaust airflow is unobstructed. A server with redundant PSUs that is running on only one of them is silently one failure away from an outage.
Check RAID controller and disk health
- Pull RAID status from the controller (PERC, SmartArray, MegaRAID) and SMART data from each disk. A degraded array has lost some or all of its redundancy, and the extra load of a rebuild makes another disk failure more likely; on RAID 5, a second failure loses the array. Treat any non-Healthy state as urgent.
Collects list
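Once the controller state and per-disk SMART attributes are collected, the triage rule is mechanical. A minimal Python sketch, assuming the values were already read (for example from `smartctl -A` output and the controller's CLI) into plain dictionaries keyed by slot. Treating any non-zero raw value of the three watched attributes as a warning is an assumed heuristic. The set of "healthy" labels is also an assumption, since controllers word it differently. SAS and NVMe drives report health through different fields than these ATA attribute names.

```python
WATCHED_ATTRIBUTES = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")
HEALTHY_STATES = {"healthy", "optimal", "ok"}  # assumption: labels vary by controller


def disk_health_issues(array_state: str, disks: dict[str, dict[str, int]]) -> list[str]:
    """Return urgent findings; `disks` maps slot -> {SMART attribute name: raw value}."""
    issues = []
    if array_state.strip().casefold() not in HEALTHY_STATES:
        issues.append(f"array state is {array_state!r}: treat as urgent")
    for slot in sorted(disks):
        attributes = disks[slot]
        for name in WATCHED_ATTRIBUTES:
            raw = attributes.get(name, 0)  # missing attribute treated as clean
            if raw > 0:
                issues.append(f"slot {slot}: {name}={raw}")
    return issues


readings = {
    "0:1:0": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0, "Offline_Uncorrectable": 0},
    "0:1:3": {"Reallocated_Sector_Ct": 12, "Current_Pending_Sector": 8, "Offline_Uncorrectable": 0},
}
for issue in disk_health_issues("Degraded", readings):
    print(issue)
# array state is 'Degraded': treat as urgent
# slot 0:1:3: Reallocated_Sector_Ct=12
# slot 0:1:3: Current_Pending_Sector=8
```

Keying by slot keeps the output aligned with what the vendor case and the hot-swap need: the exact bay to touch.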
Order a replacement drive and escalate to vendor support
- Open a Dell ProSupport / HPE / Lenovo case with the controller log bundle, order the replacement under warranty, and schedule a hot-swap window. Note the failed drive's serial and slot: pulling a healthy drive from a degraded array can collapse it.
Verify rack temperature and HVAC status
- Read the rack-top temperature sensor and compare it to the prior month. ASHRAE TC9.9 recommends a cold-aisle (server inlet) temperature of 64–80°F (18–27°C). A creeping baseline is the early signal of a CRAC unit that needs service before it fails on a Saturday.
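Comparing against both the recommended range and last month's reading catches drift even while the absolute value is still in range. A minimal Python sketch, assuming readings in °F; the 3°F month-over-month drift threshold is a hypothetical value to tune per site.

```python
RECOMMENDED_RANGE_F = (64.0, 80.0)  # cold-aisle range quoted in this step (ASHRAE TC9.9)


def temperature_findings(current_f: float, prior_month_f: float, max_drift_f: float = 3.0) -> list[str]:
    """Flag out-of-range readings and month-over-month creep (max_drift_f is hypothetical)."""
    low, high = RECOMMENDED_RANGE_F
    findings = []
    if not low <= current_f <= high:
        findings.append(f"{current_f:.1f}°F is outside {low:.0f}-{high:.0f}°F")
    drift = current_f - prior_month_f
    if drift > max_drift_f:
        findings.append(f"up {drift:.1f}°F since last month: check the CRAC unit")
    return findings


print(temperature_findings(78.5, 74.0))  # ['up 4.5°F since last month: check the CRAC unit']
print(temperature_findings(82.0, 81.0))  # ['82.0°F is outside 64-80°F']
```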

Sign-Off and Reporting

Record maintenance summary and outstanding issues
- Close the change ticket with the outcome, attach the patch report and vulnerability scan output, and link any P2 follow-ups opened during the window. This evidence pack is what SOC 2 and ISO 27001 auditors sample at the next review.
Collects list Collects paragraph Collects file

Category: Information Technology
