Server Security Checklist

User and Access Management

    Pull the active user list from Entra ID (or AD) and reconcile against the current HR roster. Flag any account whose owner left more than 30 days ago — orphaned accounts with cached Kerberos tickets are the offboarding gap auditors look for first.
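
    A minimal sketch of the reconciliation, assuming the directory export has been reduced to a list of usernames and the HR roster to a mapping of username to termination date. The function name, the data shapes, and the choice to also flag accounts with no HR record are illustrative assumptions, not part of any Entra ID or AD API.

```python
from datetime import date, timedelta


def find_orphaned_accounts(active_accounts, hr_roster, today, grace_days=30):
    """Flag active accounts whose owner left more than grace_days ago.

    active_accounts: iterable of usernames from the directory export (assumed shape).
    hr_roster: dict of username -> termination date, or None if still employed.
    """
    cutoff = today - timedelta(days=grace_days)
    flagged = []
    for user in sorted(active_accounts):
        if user not in hr_roster:
            # Assumption: an account HR has never heard of is also worth a look.
            flagged.append((user, "no HR record"))
            continue
        left_on = hr_roster[user]
        if left_on is not None and left_on < cutoff:
            flagged.append((user, f"owner left {left_on.isoformat()}"))
    return flagged
```

    Passing `today` in explicitly keeps the result reproducible when the same export is re-checked later in the review.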

    Review Domain Admins, Enterprise Admins, Schema Admins, and any custom privileged groups. The bar is named humans only — no service accounts, no "temporary" elevations from six months ago. Membership changes since last review go in the audit log.

    Run the Entra ID conditional access report (or Okta admin export) and confirm every privileged account has MFA enforced via a phishing-resistant factor (FIDO2 or Windows Hello for Business), not SMS. Confirm legacy basic-auth endpoints (IMAP, POP, SMTP AUTH) are blocked org-wide so attackers can't sidestep MFA entirely.

    Pull the service account inventory and rotate any credential older than the policy window (typically 90 days). Coordinate with app owners — rotating a service account that's hardcoded in a forgotten scheduled task is the #1 way to cause a self-inflicted Monday outage.
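
    One way to turn the inventory into a rotation worklist, sketched under the assumption that each entry records a name, an owner, and a last-rotated date. The field names and the function are hypothetical; the 90-day default mirrors the typical policy window mentioned above.

```python
from datetime import date


def credentials_due_for_rotation(inventory, today, max_age_days=90):
    """Return service credentials older than the policy window, oldest first.

    inventory: list of dicts with 'name', 'owner', 'last_rotated' (assumed fields).
    """
    due = []
    for entry in inventory:
        age_days = (today - entry["last_rotated"]).days
        if age_days > max_age_days:
            due.append(
                {"name": entry["name"], "owner": entry["owner"], "age_days": age_days}
            )
    # Oldest first, so the longest-exposed credentials are scheduled first.
    return sorted(due, key=lambda e: e["age_days"], reverse=True)
```

    Carrying the owner through to the output keeps the coordination step with app owners attached to each rotation.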

Network Security

    Export rules from FortiGate / Palo Alto / Meraki and check the hit-counter or last-used timestamp. Rules with zero hits in 90 days are candidates for removal. Document the business justification for any any-any or wide-open inbound rule before next quarter's review.
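
    A rough sketch of the rule review, assuming the export has already been parsed into dicts with `name`, `src`, `dst`, `action`, and a `last_hit` timestamp (None if the rule has never matched). These field names are placeholders; real FortiGate, Palo Alto, and Meraki exports each use their own column names.

```python
from datetime import datetime, timedelta


def review_rules(rules, now, idle_days=90):
    """Split rules into removal candidates and any-any rules needing justification."""
    cutoff = now - timedelta(days=idle_days)
    unused, wide_open = [], []
    for rule in rules:
        last_hit = rule.get("last_hit")
        # No hit at all, or no hit inside the window, makes a removal candidate.
        if last_hit is None or last_hit < cutoff:
            unused.append(rule["name"])
        # Allow rules with any source and any destination need a documented reason.
        if rule["action"] == "allow" and rule["src"] == "any" and rule["dst"] == "any":
            wide_open.append(rule["name"])
    return unused, wide_open
```

    A rule can land in both lists; an unused any-any rule is the easiest removal to justify.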

    From a host on the user VLAN, attempt to reach the server, IoT, and (if applicable) PCI VLANs on prohibited ports. Flat networks expand PCI and HIPAA scope; the segmentation test is what proves the boundary is real, not just configured.
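
    The segmentation test can be sketched as a TCP connect check run from a user-VLAN host, using Python's standard `socket.create_connection`. The target list and function names are hypothetical, and the sketch covers TCP only; UDP services and stateful inspection behaviour need a separate check.

```python
import socket


def port_reachable(host, port, timeout=2.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out, or unroutable: the boundary held for this port.
        return False
    conn.close()
    return True


def segmentation_findings(targets, connect=socket.create_connection):
    """targets: (label, host, port) tuples that policy says must be blocked.

    Anything returned here was reachable and is therefore a finding.
    """
    return [
        (label, host, port)
        for label, host, port in targets
        if port_reachable(host, port, connect=connect)
    ]
```

    The `connect` parameter exists only so the logic can be exercised without touching a real network; in normal use the default is left alone.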

    Verify conditional access policies block legacy auth protocols (IMAP, POP, MAPI, EWS basic, SMTP AUTH). Filter the M365 sign-in log for legacy clients over the last 30 days; any non-zero count is a finding to investigate before the review closes.

System and Software Updates

    Pull the latest Tenable / Qualys / Rapid7 scan and triage anything CVSS 7.0+ with known exploitation. Critical and KEV-listed CVEs go on the emergency change track; the rest land in the next standard maintenance window.
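
    The routing rule above, sketched as a small function. It assumes each scanner finding has been reduced to a CVE identifier, a CVSS base score, and a boolean `kev` flag for Known Exploited Vulnerabilities listing; those field names are illustrative and not the Tenable, Qualys, or Rapid7 export schema. Critical is taken as CVSS 9.0 and above.

```python
def triage(findings, critical_score=9.0):
    """Route findings to the emergency track or the next maintenance window.

    findings: list of dicts with 'cve', 'cvss', 'kev' (assumed fields).
    """
    emergency, standard = [], []
    # Highest score first so each track is already ordered by severity.
    for finding in sorted(findings, key=lambda f: f["cvss"], reverse=True):
        if finding["kev"] or finding["cvss"] >= critical_score:
            emergency.append(finding["cve"])
        else:
            standard.append(finding["cve"])
    return emergency, standard
```

    A KEV-listed finding goes to the emergency track regardless of score, since known exploitation outweighs a moderate base score.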

    Push the patch bundle from WSUS / Intune / Automox to the test ring (typically 5-10% of endpoints, including one of every server role). Wait 48 hours for telemetry before promoting. Pushing straight to production on a Friday afternoon is how 800 users end up unable to log in on Monday morning.
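
    A sketch of how a test ring with that shape could be picked, assuming an inventory of endpoints with unique names and a `role` label (workstations are treated as one more role here). The function and field names are hypothetical; WSUS, Intune, and Automox each have their own grouping mechanisms.

```python
import math
import random


def pick_test_ring(endpoints, fraction=0.10, seed=0):
    """Pick roughly `fraction` of endpoints, with at least one of every role."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    role_of = {ep["name"]: ep["role"] for ep in endpoints}
    by_role = {}
    for name, role in role_of.items():
        by_role.setdefault(role, []).append(name)
    # Start with one representative of every role.
    ring = [rng.choice(sorted(names)) for _, names in sorted(by_role.items())]
    # Top up with random endpoints until the target size is reached.
    target = max(len(ring), math.ceil(len(role_of) * fraction))
    remaining = sorted(set(role_of) - set(ring))
    ring += rng.sample(remaining, target - len(ring))
    return {name: role_of[name] for name in ring}
```

    Seeding the generator means the same inventory yields the same ring, which makes the change record easier to audit.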

    If the test ring surfaces a regression, halt promotion to production. Roll back the affected patches in the test ring, open a vendor case (KB number, repro steps, event log excerpts), and document the exception in the change record. Schedule a re-test once the vendor publishes a fix.

Backup and Recovery

    Pull the last 30 days of Veeam / Datto / Rubrik job reports. Investigate any run of consecutive failures rather than checking only that last night's job was green. A green dashboard with a silently failing offsite replication leg is the classic ransomware-day surprise.
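
    To illustrate why the full 30-day history matters, here is a minimal sketch that finds the longest failure streak per job. It assumes the reports have been flattened to a per-job list of status strings, oldest to newest; the status values and function name are assumptions, not a Veeam, Datto, or Rubrik report format.

```python
def consecutive_failures(history, threshold=2):
    """Return jobs whose longest run of 'Failed' statuses meets the threshold.

    history: dict of job name -> list of statuses, oldest to newest (assumed shape).
    """
    flagged = {}
    for job, statuses in history.items():
        longest = current = 0
        for status in statuses:
            # Extend the streak on failure, reset it on anything else.
            current = current + 1 if status == "Failed" else 0
            longest = max(longest, current)
        if longest >= threshold:
            flagged[job] = longest
    return flagged
```

    A job that failed three nights running and then recovered still shows up here, even though last night's dashboard was green.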

    Verify the 3-2-1 picture (three copies, on two media types, with one offsite) and confirm that at least one copy is immutable: an object-locked S3 bucket, immutable Datto cloud, or write-once tape, whichever you run. The immutable copy is the only one that survives a backup-aware ransomware attack that pivots from the production network into the backup repository.

    Restore one production server's most recent backup into an isolated test network. Boot it, confirm services start, and time the recovery against your stated RTO. A backup that hasn't been restored is a hypothesis, not a backup.

    If the restore drill fails, file a P2 incident in the PSA / ITSM. Capture the failure mode (corrupt media, missing credential, key-management gap, vendor format change) and assign an owner. Re-run the drill within 14 days; until then the documented RTO is unverified.

Logging, Monitoring, and Sign-Off

    Run the Sentinel / Splunk / QRadar source-coverage report. Every domain controller, file server, EDR console, firewall, and identity provider should be reporting within the last 24 hours. A silent source is usually a forwarder that died after the last patch.
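
    The coverage check reduces to comparing an expected-source list against last-seen timestamps. The sketch below assumes both have been exported from the SIEM by some other means; the function name and data shapes are illustrative and do not correspond to a Sentinel, Splunk, or QRadar API.

```python
from datetime import datetime, timedelta


def silent_sources(expected, last_seen, now, max_silence=timedelta(hours=24)):
    """Return expected sources that have not reported within max_silence.

    expected: iterable of source names that should be reporting.
    last_seen: dict of source name -> datetime of the most recent event.
    """
    silent = []
    for source in sorted(expected):
        seen = last_seen.get(source)
        # A source with no events at all counts as silent too.
        if seen is None or now - seen > max_silence:
            silent.append(source)
    return silent
```

    Driving the check from the expected list, not from whatever happens to be in the SIEM, is what catches a source that was never onboarded.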

    Review the top 10 noisiest alert rules from the last 30 days. Suppress, retune, or retire — alert fatigue is what causes the on-call to mute the channel and miss the real one. Keep a tuning log so changes are auditable.
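
    Ranking the noisiest rules is a simple count once the alerts are exported. This sketch assumes each alert is a dict carrying a `rule` name, which is a placeholder for whatever field the SIEM export actually uses.

```python
from collections import Counter


def noisiest_rules(alerts, top_n=10):
    """Return (rule, count) pairs for the top_n most frequently firing rules."""
    counts = Counter(alert["rule"] for alert in alerts)
    return counts.most_common(top_n)
```

    Saving each review's output alongside the tuning log gives a before-and-after record for every suppression or retune.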

    SOC 2 typically expects 12 months of security-relevant logs available for review; PCI DSS requires 1 year with 90 days immediately available. Confirm the SIEM hot/warm/cold tiers match what your auditor was told last cycle.
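
    A small sketch of the tier check against the PCI DSS figures above (about one year retained, 90 days immediately available). The tier names and the assumption that only the hot tier counts as immediately available are illustrative; adjust `online_tiers` to match how your SIEM and your auditor define it.

```python
def retention_gaps(tiers, online_tiers=("hot",), required_total=365, required_online=90):
    """Return a list of retention shortfalls; an empty list means no gap found.

    tiers: dict of tier name -> days retained in that tier (assumed shape).
    """
    total = sum(tiers.values())
    online = sum(days for name, days in tiers.items() if name in online_tiers)
    gaps = []
    if total < required_total:
        gaps.append(f"total retention {total}d < {required_total}d")
    if online < required_online:
        gaps.append(f"immediately available {online}d < {required_online}d")
    return gaps
```

    With 30 days hot, 60 warm, and 275 cold, the total passes but the hot tier alone falls short of 90 days unless warm storage also qualifies as immediately available.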
