Server Configuration Checklist
Hardware and Platform Sizing
Decide whether this server runs on bare metal, on an on-prem hypervisor (VMware vSphere, Hyper-V, Proxmox), or in the cloud (EC2, Azure VM, GCE). The downstream steps for storage, networking, and power differ significantly between platforms.
Match vCPU count and RAM allocation to the workload sizing doc. For database servers, leave 25% headroom for buffer pool growth; for app servers, size to p95 load, not the average. Record the NUMA topology if CPU or memory pinning matters for the workload.
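The sizing arithmetic above can be sketched in a few lines. This is a minimal illustration, not a sizing tool: the sample values are made up, the percentile uses the nearest-rank method, and "25% headroom" is read here as "25% of the allocated RAM stays free above the working set", which is one possible interpretation of the rule.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def ram_with_headroom(working_set_gb, headroom=0.25):
    """RAM to allocate so that `headroom` of it stays free above the working set (assumed reading)."""
    return working_set_gb / (1 - headroom)

# Hypothetical CPU utilisation samples (%), one per measurement interval.
cpu_samples = [22, 25, 31, 28, 35, 40, 38, 90, 27, 30]

print(sum(cpu_samples) / len(cpu_samples))  # average: 36.6
print(percentile(cpu_samples, 95))          # p95: 90
print(ram_with_headroom(48))                # 64.0 GB for a 48 GB working set
```

The gap between the average and the p95 value is the reason the step sizes app servers to p95: a host sized to the average here would saturate during the busiest intervals.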
Separate OS, application, data, and log volumes. For physical servers, configure RAID per workload (RAID 10 for DB, RAID 6 for archival). For cloud, pick the right EBS / managed-disk tier (gp3 vs io2, Premium SSD v2) and set IOPS and throughput explicitly, because the defaults underprovision busy workloads.
Physical-only step: confirm dual PSU on separate PDU feeds, label both ends of every cable, and update the DCIM record (NetBox, Device42) with rack/unit/serial. Skip if cloud or virtualized.
Network Configuration
Reserve the IP in IPAM, create forward (A/AAAA) and reverse (PTR) DNS records, and verify resolution from at least two resolvers. A missing reverse record breaks mail delivery and log correlation, and can stall or fail services that do a reverse lookup when a client connects.
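A forward-and-reverse consistency check might look like the sketch below. It uses only the system resolver through the standard `socket` module, so it does not by itself cover the "two resolvers" requirement; querying named resolvers would need a DNS library, which is not assumed here. The hostname and address are hypothetical, and the lookup functions are injectable so the logic can be exercised without network access.

```python
import socket

def forward_lookup(hostname):
    """IPv4 addresses the system resolver returns for hostname."""
    return set(socket.gethostbyname_ex(hostname)[2])

def reverse_lookup(ip):
    """Primary PTR name the system resolver returns for ip."""
    return socket.gethostbyaddr(ip)[0]

def dns_consistent(hostname, expected_ip,
                   forward=forward_lookup, reverse=reverse_lookup):
    """True if hostname resolves to expected_ip and expected_ip resolves back to hostname."""
    try:
        ips = forward(hostname)
        ptr = reverse(expected_ip)
    except OSError:  # socket.gaierror and socket.herror are both OSError subclasses
        return False
    same_name = ptr.rstrip(".").lower() == hostname.rstrip(".").lower()
    return expected_ip in ips and same_name

# Offline demonstration with stubbed lookups (hypothetical records).
print(dns_consistent(
    "app01.example.internal", "10.0.0.5",
    forward=lambda host: {"10.0.0.5"},
    reverse=lambda ip: "app01.example.internal.",
))  # True
```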
Verify the switch port (or cloud security group / NSG) places the server on the right segment — production, DMZ, management. Cross-check ACLs for required egress (package mirrors, NTP, monitoring) before locking down.
Point chronyd or w32time at the firm's internal NTP source, not pool.ntp.org. Time drift breaks Kerberos auth (the default tolerance is five minutes of clock skew), log correlation, and TLS cert validation, and it is a common silent cause of hard-to-diagnose auth failures.
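On a chrony host, one way to confirm the clock is actually tracking is to read the "System time" line from `chronyc tracking`. The sketch below parses that line from captured text; the sample values, the NTP server name, and the one-second alert limit are all illustrative assumptions, not values from this checklist.

```python
import re

# Matches the "System time" line printed by `chronyc tracking`.
_SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
    re.MULTILINE,
)

def offset_seconds(tracking_output):
    """Signed clock offset in seconds: positive if the system clock is fast, negative if slow."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found")
    value = float(match.group(1))
    return value if match.group(2) == "fast" else -value

def within_tolerance(tracking_output, limit_s=1.0):
    """True if the absolute offset is at or below limit_s (hypothetical alert threshold)."""
    return abs(offset_seconds(tracking_output)) <= limit_s

# Illustrative excerpt in the format chronyc prints; values are made up.
SAMPLE = """\
Reference ID    : 0A000001 (ntp1.example.internal)
System time     : 0.000412000 seconds slow of NTP time
"""

print(offset_seconds(SAMPLE))
print(within_tolerance(SAMPLE))
```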
OS Baseline and Hardening
Use the firm's golden image (Packer-built AMI, vSphere template, MDT image). Do not install from an upstream ISO and hand-tune — that bypasses the baseline hardening already in the image.
Run the CIS Benchmark Level 1 profile via Ansible role, Chef cookbook, or InSpec. Capture the pre/post scan as evidence — auditors for SOC 2 and PCI ask for this artifact specifically.
Pull patches from the firm's WSUS / Satellite / apt mirror, not the public internet. Reboot to confirm kernel and microcode updates take effect. A half-patched server is worse than an unpatched one: the package database reports the fixed version while the old kernel is still running, so the scanner sees a clean host.
Stop and mask unneeded daemons (cups, avahi, rpcbind on most servers). Audit listening ports with ss -tlnp on Linux or netstat -ano on Windows, and confirm each one is intentional.
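The port audit can be made repeatable by comparing the listener list against an allowlist. The following sketch parses captured `ss -tlnp` text; the sample output and the allowed ports are hypothetical, and the parser assumes the usual column order in which the local address is the fourth field.

```python
def listening_ports(ss_output):
    """Extract TCP listening port numbers from `ss -tlnp` text."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a LISTEN row.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # Local address looks like 0.0.0.0:22 or [::]:443; the port follows the last colon.
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit():
            ports.add(int(port))
    return ports

def unexpected_ports(ss_output, allowed):
    """Ports that are listening but not on the allowlist, sorted."""
    return sorted(listening_ports(ss_output) - set(allowed))

# Illustrative capture; processes and PIDs are made up.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      128          0.0.0.0:22        0.0.0.0:*     users:(("sshd",pid=812,fd=3))
LISTEN 0      4096         0.0.0.0:111       0.0.0.0:*     users:(("rpcbind",pid=640,fd=4))
LISTEN 0      511             [::]:443          [::]:*     users:(("nginx",pid=1201,fd=6))
"""

print(unexpected_ports(SAMPLE, allowed={22, 443}))  # [111]
```

Here port 111 is flagged, which matches the advice to mask rpcbind on servers that do not need it.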
Identity, Access, and Endpoint Security
Join Active Directory / Entra ID for Windows; configure SSSD against AD or LDAP for Linux. Avoid local accounts entirely except for a single break-glass account with credentials sealed in the password vault.
Grant access by AD group, never by individual user. Sudo rules go in /etc/sudoers.d/ via configuration management, because direct edits get overwritten on the next run. Route privileged sessions through a privileged access management (PAM) tool such as CyberArk, BeyondTrust, or Teleport where the workload tier requires it.
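As a rough sketch of what configuration management would render into /etc/sudoers.d/, the helper below builds a group-based rule and checks the drop-in file name. The group name and command are hypothetical, and the group-name pattern is a deliberately narrow assumption (AD groups containing spaces or a domain prefix need escaping that is not handled here). Validate any rendered file with `visudo -c -f <file>` before it is deployed.

```python
import re

def sudoers_rule(group, commands=("ALL",)):
    """Render one sudoers line granting `commands` to members of `group`."""
    # Narrow, assumed pattern: simple POSIX-style group names only.
    if not re.fullmatch(r"[a-z_][a-z0-9_-]*", group):
        raise ValueError(f"unsupported group name: {group!r}")
    return f"%{group} ALL=(ALL) {', '.join(commands)}\n"

def dropin_name_ok(name):
    """sudo skips files in sudoers.d whose names contain '.' or end in '~'."""
    return bool(name) and "." not in name and not name.endswith("~")

print(sudoers_rule("linux-admins"), end="")
print(sudoers_rule("web-ops", ("/usr/bin/systemctl restart nginx",)), end="")
print(dropin_name_ok("50-linux-admins"))  # True
print(dropin_name_ok("linux-admins.conf"))  # False: sudo would ignore this file
```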
Install CrowdStrike Falcon, SentinelOne, or Defender for Endpoint and confirm the sensor checks in to the console with the correct host group and policy. An unenrolled server is invisible to the SOC.
Default-deny inbound; explicitly allow only the application's listening ports plus management (SSH/RDP from jump host CIDR only). Use firewalld, ufw, nftables, or Windows Firewall with Advanced Security via GPO.
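The intended policy can be written down as a small decision function before it is translated into firewalld, nftables, or GPO rules. This is a model of the rule logic only, not a firewall configuration; the jump host CIDR and the port sets are hypothetical.

```python
import ipaddress

JUMP_HOST_CIDR = ipaddress.ip_network("10.20.0.0/28")  # hypothetical jump host range
APP_PORTS = {443}          # hypothetical application listener
MGMT_PORTS = {22, 3389}    # SSH and RDP

def inbound_allowed(src_ip, dst_port):
    """Default-deny: allow app ports from anywhere, management ports only from the jump hosts."""
    src = ipaddress.ip_address(src_ip)
    if dst_port in APP_PORTS:
        return True
    if dst_port in MGMT_PORTS and src in JUMP_HOST_CIDR:
        return True
    return False  # everything else is dropped

print(inbound_allowed("203.0.113.7", 443))  # True: application port
print(inbound_allowed("10.20.0.5", 22))     # True: SSH from a jump host
print(inbound_allowed("10.20.0.40", 22))    # False: outside the /28
print(inbound_allowed("10.20.0.5", 8080))   # False: port not on any allowlist
```

Writing the policy this way gives a set of expected answers to test the real firewall against once it is configured.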
Encrypt data at rest with BitLocker for Windows, LUKS for Linux, or platform-managed encryption (EBS, Azure Disk, GCE PD). Escrow the recovery key in the firm's KMS or Vault; losing the key on a production server is its own outage.
Application and Data Layer
Apply the role via Ansible, Chef, Puppet, or DSC — never hand-install in production. The configuration management run is itself the documentation of what's on the box.
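For PostgreSQL, MySQL, SQL Server, or MongoDB: tune memory settings (shared_buffers, innodb_buffer_pool_size, max server memory, the WiredTiger cache size) to the box, not the defaults. PostgreSQL and MySQL ship with defaults of 128 MB, sized for a very small machine, while SQL Server's default is effectively unlimited and can starve the OS.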
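A starting-point calculation for these settings might look like the sketch below. The ratios are assumptions for a dedicated database host, not values from this checklist: 25% of RAM for PostgreSQL shared_buffers is the commonly documented starting value, 70% for the InnoDB buffer pool is one frequently used figure, and the SQL Server reserve (the larger of 4 GB or 10% left for the OS) is a hypothetical rule of thumb. Treat the output as a first guess to refine under real load.

```python
def suggested_memory_mb(total_ram_mb):
    """Rough starting values (MB) for a dedicated database host; all ratios are assumptions."""
    os_reserve_mb = max(4096, total_ram_mb // 10)
    return {
        "postgresql shared_buffers": total_ram_mb * 25 // 100,
        "mysql innodb_buffer_pool_size": total_ram_mb * 70 // 100,
        "sqlserver max server memory": total_ram_mb - os_reserve_mb,
    }

# Example: a host with 64 GB of RAM.
for setting, value in suggested_memory_mb(64 * 1024).items():
    print(f"{setting}: {value} MB")
```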
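Issue the TLS certificate from the firm's internal CA or from a public CA that supports ACME (Let's Encrypt, Sectigo, DigiCert). Set up auto-renewal via certbot or the platform's cert manager. Manual renewal is a leading source of avoidable outages: the certificate quietly expires 90 days or 12 months after issuance, long after anyone remembers the task.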
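Even with auto-renewal in place, an independent expiry check is a cheap safety net. The sketch below computes days to expiry from a certificate's notAfter string, in the format Python's `ssl` module reports it; the date and the 30-day threshold are illustrative assumptions.

```python
import ssl
from datetime import datetime, timezone

def days_until_expiry(not_after, now=None):
    """Whole days from `now` until the certificate's notAfter time."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

def needs_renewal(not_after, threshold_days=30, now=None):
    """True once the certificate is within threshold_days of expiry (hypothetical threshold)."""
    return days_until_expiry(not_after, now) <= threshold_days

# Hypothetical notAfter value, as found in the 'notAfter' field of getpeercert().
NOT_AFTER = "Jun 15 12:00:00 2030 GMT"
check_time = datetime(2030, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

print(days_until_expiry(NOT_AFTER, now=check_time))  # 14
print(needs_renewal(NOT_AFTER, now=check_time))      # True
```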
Monitoring, Backup, and Handoff
Install the Datadog Agent, New Relic infrastructure agent, or Prometheus node_exporter for metrics, and the Splunk forwarder, Fluent Bit, or Elastic Agent for logs. Confirm the host appears in the monitoring console with hostname, tags, and environment label set correctly.
Wire CPU, memory, disk, and service-up alerts to the right PagerDuty or Opsgenie service. A monitored server with no alert routing is worse than no monitoring — it produces a false sense of coverage.
Schedule the backup in Veeam, Commvault, Datto, or AWS Backup against the 3-2-1 standard: three copies of the data, on two different media types, with one copy offsite. Run a test restore to a sandbox host and confirm the data is readable. Backups that haven't been restored aren't backups, they're hopes.
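The 3-2-1 rule is simple enough to express as a check against the backup inventory. In this sketch the `BackupCopy` record and the sample inventory are hypothetical; a real check would read the copy list from the backup product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackupCopy:
    location: str   # where the copy lives
    media: str      # e.g. "disk", "tape", "object-storage"
    offsite: bool   # True if outside the primary site

def meets_3_2_1(copies):
    """At least 3 copies, on at least 2 media types, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({copy.media for copy in copies}) >= 2
        and any(copy.offsite for copy in copies)
    )

inventory = [
    BackupCopy("production volume", "disk", offsite=False),
    BackupCopy("backup appliance", "disk", offsite=False),
    BackupCopy("cloud bucket", "object-storage", offsite=True),
]

print(meets_3_2_1(inventory))      # True
print(meets_3_2_1(inventory[:2]))  # False: two copies, one media type, nothing offsite
```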
Trigger a Tenable, Qualys, or Rapid7 authenticated scan and resolve any High or Critical findings before the host enters production rotation. Also resolve any CVE listed in CISA's Known Exploited Vulnerabilities (KEV) catalog, regardless of its CVSS score.
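The production gate described above reduces to a filter over the scan results. The sketch below assumes the findings have already been exported into a simple record with a severity label and a KEV flag; the `Finding` type and the CVE identifiers are placeholders, not real scanner output.

```python
from dataclasses import dataclass

BLOCKING_SEVERITIES = {"High", "Critical"}

@dataclass(frozen=True)
class Finding:
    cve: str
    severity: str            # "Low", "Medium", "High", or "Critical"
    kev_listed: bool = False  # present in the KEV catalog

def blocking_findings(findings):
    """Findings that must be resolved before go-live: High/Critical, or KEV-listed at any severity."""
    return [f for f in findings if f.severity in BLOCKING_SEVERITIES or f.kev_listed]

def ready_for_production(findings):
    return not blocking_findings(findings)

scan = [
    Finding("CVE-0000-0001", "Low"),
    Finding("CVE-0000-0002", "Medium", kev_listed=True),  # blocks despite Medium severity
    Finding("CVE-0000-0003", "Critical"),
]

print([f.cve for f in blocking_findings(scan)])  # ['CVE-0000-0002', 'CVE-0000-0003']
print(ready_for_production(scan))                # False
```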
