Server Configuration Checklist

Hardware and Platform Sizing

    Decide whether this server runs on bare metal, an on-prem hypervisor (VMware vSphere, Hyper-V, Proxmox), or a cloud platform (EC2, Azure VM, GCE). The downstream steps for storage, networking, and power differ significantly between platforms.

    Match vCPU count and RAM allocation to the workload sizing doc. For database servers, leave 25% headroom for buffer pool growth; for app servers, size to p95 load, not the average. Record the NUMA topology if CPU or memory pinning matters.
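    The sizing arithmetic above can be sketched as follows. This is a minimal illustration, not the firm's sizing method: the function names are hypothetical, the percentile uses the nearest-rank definition, and "25% headroom" is interpreted here as 25% of the allocated RAM staying free.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples (nearest-rank method)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the percentile value in the sorted list.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def db_ram_with_headroom(working_set_gb, headroom=0.25):
    """RAM to allocate so that `headroom` of the total stays free (assumed reading)."""
    return working_set_gb / (1 - headroom)


if __name__ == "__main__":
    cpu_load = [12, 15, 18, 22, 35, 40, 41, 44, 52, 90]  # made-up samples
    print("p95 load:", percentile_nearest_rank(cpu_load, 95))
    print("RAM for a 48 GB working set:", db_ram_with_headroom(48), "GB")
```

    Sizing to the p95 value rather than the mean keeps a handful of busy periods from being averaged away.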

    Separate OS, application, data, and log volumes. For physical servers, configure RAID per workload (RAID 10 for DB, RAID 6 for archival). For cloud, pick the right EBS / managed-disk tier (gp3 vs io2, Premium SSD v2) and set IOPS and throughput explicitly, because the defaults are often too low for I/O-heavy workloads.

    Physical-only step: confirm dual PSU on separate PDU feeds, label both ends of every cable, and update the DCIM record (NetBox, Device42) with rack/unit/serial. Skip if cloud or virtualized.

Network Configuration

    Reserve the IP in IPAM, create forward (A/AAAA) and reverse (PTR) DNS records, and verify resolution from at least two resolvers. Missing reverse DNS breaks mail delivery, log correlation, and services that perform reverse lookups, such as Kerberos and SSH.
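    A forward/reverse consistency check might look like the sketch below. It is an assumption-laden example: the function name is hypothetical, and by default it only queries the system resolver through the standard `socket` module, so checking a second resolver would mean passing in different lookup callables or running it from another host.

```python
import socket


def check_forward_reverse(hostname, expected_ip,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return a list of problems; an empty list means A and PTR records agree."""
    problems = []
    try:
        ip = forward(hostname)
        if ip != expected_ip:
            problems.append(f"forward: {hostname} -> {ip}, expected {expected_ip}")
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"forward lookup failed: {exc}")
    try:
        name, _aliases, _addrs = reverse(expected_ip)
        if name.rstrip(".").lower() != hostname.rstrip(".").lower():
            problems.append(f"reverse: {expected_ip} -> {name}, expected {hostname}")
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup failed: {exc}")
    return problems
```

    The lookup functions are injectable so the check can be exercised without touching live DNS.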

    Verify the switch port (or cloud security group / NSG) places the server on the right segment — production, DMZ, management. Cross-check ACLs for required egress (package mirrors, NTP, monitoring) before locking down.

    Point chronyd or w32time at the firm's internal NTP source, not pool.ntp.org. Time drift breaks Kerberos auth, log correlation, and TLS cert validation, and it is a common silent cause of otherwise unexplained auth failures.
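    On a chrony host, drift can be checked by reading the "System time" line of `chronyc tracking`. The sketch below parses that line from captured output; the function names are hypothetical, and the 300-second figure is the usual default Kerberos clock-skew tolerance, which the firm's realm may override.

```python
import re

KERBEROS_MAX_SKEW_S = 300.0  # common default tolerance; an assumption here

_SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (fast|slow) of NTP time",
    re.MULTILINE,
)


def system_offset_seconds(tracking_output):
    """Return the clock offset in seconds; positive means the clock runs fast."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found in chronyc output")
    offset = float(match.group(1))
    return offset if match.group(2) == "fast" else -offset


def within_tolerance(tracking_output, max_skew_s=KERBEROS_MAX_SKEW_S):
    """True if the absolute offset is below the allowed skew."""
    return abs(system_offset_seconds(tracking_output)) < max_skew_s
```

    In practice a much tighter alert threshold than the Kerberos limit is sensible, since log correlation suffers long before authentication fails.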

OS Baseline and Hardening

    Use the firm's golden image (Packer-built AMI, vSphere template, MDT image). Do not install from an upstream ISO and hand-tune — that bypasses the baseline hardening already in the image.

    Run the CIS Benchmark Level 1 profile via an Ansible role, Chef cookbook, or InSpec. Capture the pre/post scan as evidence, because auditors for SOC 2 and PCI DSS ask for this artifact specifically.

    Pull patches from the firm's WSUS / Satellite / apt mirror, not the public internet. Reboot to confirm kernel and microcode updates take effect. A server in a half-patched state is worse than an unpatched one because it lies to the scanner: the installed package version looks current while the old kernel is still running.

    Stop and mask unneeded daemons (cups, avahi, rpcbind on most servers). Audit listening ports with ss -tlnp on Linux or netstat -ano on Windows and confirm each one is intentional.
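    The port audit can be made repeatable by comparing `ss -tlnp` output against an allowlist. The following is a minimal sketch that works on captured text; the function name and the allowlist are hypothetical, and it covers TCP listeners only.

```python
def unexpected_listeners(ss_output, allowed_ports):
    """Return sorted TCP ports that are listening but not in allowed_ports."""
    unexpected = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows look like: LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(...)
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue  # skips the header row and blank lines
        local = fields[3]
        # rsplit handles both "0.0.0.0:22" and "[::]:22".
        port = int(local.rsplit(":", 1)[1])
        if port not in allowed_ports:
            unexpected.add(port)
    return sorted(unexpected)
```

    On a live host the input could come from `subprocess.run(["ss", "-tlnp"], capture_output=True, text=True).stdout`.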

Identity, Access, and Endpoint Security

    Join Windows servers to Active Directory / Entra ID; on Linux, configure SSSD against AD or LDAP. Avoid local accounts entirely except for a single break-glass account with credentials sealed in the password vault.

    Grant access by AD group, never by individual user. Sudo rules go in /etc/sudoers.d/ via configuration management, because direct edits get overwritten on the next run. Route privileged sessions through a privileged access management (PAM) tool (CyberArk, BeyondTrust, Teleport) where the workload tier requires it.

    Install CrowdStrike Falcon, SentinelOne, or Defender for Endpoint and confirm the sensor checks in to the console with the correct host group and policy. An unenrolled server is invisible to the SOC.

    Default-deny inbound; explicitly allow only the application's listening ports plus management (SSH/RDP from jump host CIDR only). Use firewalld, ufw, nftables, or Windows Firewall with Advanced Security via GPO.
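    The intended policy can be written down as a small decision function and used to sanity-check a rule set before it is pushed. This sketch only models the logic; it does not configure firewalld, nftables, or Windows Firewall, and the function name, ports, and CIDR are hypothetical.

```python
import ipaddress


def is_allowed(src_ip, dst_port, app_ports, mgmt_ports, jump_cidr):
    """Default-deny inbound: app ports are open, management only from the jump host CIDR."""
    if dst_port in app_ports:
        return True
    if dst_port in mgmt_ports:
        return ipaddress.ip_address(src_ip) in ipaddress.ip_network(jump_cidr)
    return False  # everything else is denied


if __name__ == "__main__":
    rules = dict(app_ports={443}, mgmt_ports={22}, jump_cidr="10.20.0.0/24")
    print(is_allowed("10.20.0.5", 22, **rules))   # SSH from the jump host range
    print(is_allowed("192.0.2.9", 22, **rules))   # SSH from elsewhere
```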

    Encrypt disks with BitLocker on Windows, LUKS on Linux, or platform-managed encryption (EBS, Azure Disk, GCE PD). Escrow the recovery key in the firm's KMS or Vault, because losing the key on a production server is its own outage.

Application and Data Layer

    Apply the role via Ansible, Chef, Puppet, or DSC — never hand-install in production. The configuration management run is itself the documentation of what's on the box.

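    For PostgreSQL, MySQL, SQL Server, or MongoDB: size the memory settings (shared_buffers, innodb_buffer_pool_size, max server memory, the WiredTiger cache size) to the box rather than accepting the defaults. PostgreSQL and MySQL ship with 128 MB buffer defaults suited to a small test machine, while SQL Server's default max server memory is effectively unlimited and can starve the OS.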
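    As a starting point for that tuning, the sketch below derives candidate values from total RAM. The ratios are rules of thumb assumed for illustration only (25% of RAM for shared_buffers, about 70% for the InnoDB buffer pool on a dedicated host, and RAM minus the larger of 4 GB or 10% for SQL Server); the function name is hypothetical, and real values should come from the workload sizing doc and load testing.

```python
def suggest_memory_settings(total_ram_mb):
    """Return starting-point memory settings in MB for a dedicated DB host."""
    os_reserve_mb = max(4096, total_ram_mb // 10)  # assumed OS reservation
    return {
        # PostgreSQL: shared_buffers at roughly a quarter of RAM.
        "postgresql.shared_buffers": total_ram_mb // 4,
        # MySQL: innodb_buffer_pool_size at roughly 70% of RAM.
        "mysql.innodb_buffer_pool_size": total_ram_mb * 7 // 10,
        # SQL Server: cap max server memory so the OS keeps a reserve.
        "sqlserver.max_server_memory": total_ram_mb - os_reserve_mb,
    }


if __name__ == "__main__":
    for name, value in suggest_memory_settings(64 * 1024).items():
        print(f"{name} = {value} MB")
```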

    Issue TLS certificates from the firm's internal CA or from a public CA that supports ACME (Let's Encrypt, Sectigo, DigiCert). Set up auto-renewal via certbot or the platform's cert manager. Manual renewal is a leading source of avoidable outages when a certificate quietly expires at the 12-month mark.
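    Even with auto-renewal in place, an independent expiry check catches renewals that silently stopped working. A minimal sketch, assuming the certificate's notAfter timestamp has already been read from the certificate or the monitoring system; the function name and the 30-day window are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after, now=None, renew_before=timedelta(days=30)):
    """True if the certificate expires within renew_before (or already has)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_after - now <= renew_before


if __name__ == "__main__":
    expiry = datetime(2030, 1, 31, tzinfo=timezone.utc)  # made-up expiry date
    print("renew now?", needs_renewal(expiry))
```

    Both timestamps should be timezone-aware; mixing naive and aware datetimes raises a TypeError on subtraction.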

Monitoring, Backup, and Handoff

    Install the Datadog agent, New Relic agent, or Prometheus node_exporter for metrics, and the Splunk forwarder, Fluent Bit, or Elastic Agent for logs. Confirm the host appears in the monitoring console with hostname, tags, and environment label set correctly.

    Wire CPU, memory, disk, and service-up alerts to the right PagerDuty or Opsgenie service. A monitored server with no alert routing is worse than no monitoring — it produces a false sense of coverage.

    Schedule the backup in Veeam, Commvault, Datto, or AWS Backup against the 3-2-1 standard (three copies, two media types, one offsite). Run a test restore to a sandbox host and confirm the data is readable. Backups that haven't been restored aren't backups, they're hopes.
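    The 3-2-1 rule is easy to encode as a check against the backup inventory. This sketch assumes a hypothetical `BackupCopy` record and counts the production copy as one of the three; it does not query any backup product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str   # e.g. "prod-san", "veeam-repo", "s3-archive" (made-up names)
    media: str      # e.g. "disk", "tape", "object"
    offsite: bool


def meets_3_2_1(copies):
    """True if there are >= 3 copies on >= 2 media types with >= 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )
```

    Passing this check says nothing about restorability, so the test restore remains a separate step.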

    Trigger a Tenable, Qualys, or Rapid7 authenticated scan and resolve any High or Critical findings, plus any CVE listed in the CISA Known Exploited Vulnerabilities (KEV) catalog regardless of CVSS score, before the host enters production rotation.
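    The release gate described above can be expressed as a filter over exported scan findings. This is a sketch under stated assumptions: the finding schema (`severity` and `cve` keys) and the function name are hypothetical and do not match any particular scanner's export format, and the KEV list is passed in as a plain set of CVE IDs.

```python
BLOCKING_SEVERITIES = {"High", "Critical"}


def blocking_findings(findings, kev_ids):
    """Return findings that must be resolved before production entry."""
    return [
        f for f in findings
        if f["severity"] in BLOCKING_SEVERITIES or f.get("cve") in kev_ids
    ]


def ready_for_production(findings, kev_ids):
    """True only when no blocking findings remain."""
    return not blocking_findings(findings, kev_ids)
```

    A Medium finding still blocks the host if its CVE is in the KEV set, which is the point of checking the catalog separately from severity.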