Performance Tuning Checklist

Phased tuning runbook a sysadmin or MSP engineer follows to take a server from baseline through OS, storage, database, network, and application tuning, with validation and change-management sign-off. Captures pre-tuning metrics and gates database-specific work on the host's wo...

1

Baseline and Change Scope

  1. Capture the pre-tuning performance baseline
    • Pull at least one full business day of metrics from PRTG, LogicMonitor, Datadog, or the RMM: CPU, memory, disk IOPS and latency, network throughput, and application response time. Attach the export so you can compare it against the post-tuning numbers during validation. A missing baseline is a common reason engineers cannot prove improvement at the quarterly business review (QBR).

  2. Classify the host workload profile
    • The workload drives every downstream tuning choice — swappiness, I/O scheduler, NUMA pinning, buffer pool sizing. Pick the dominant role; mark the host "Mixed / general purpose" only when no role exceeds ~60% of utilization.

  3. Schedule the maintenance window
    • Confirm the window with the application owner and any dependent client (MSP context: notify the account manager so the QBR doesn't surface a surprise). Avoid month-end close, payroll runs, and patch Tuesday collisions.

  4. Open a normal change request in the PSA
    • File the RFC in ConnectWise PSA, ServiceNow, or Autotask with implementation steps, rollback plan, and CAB approver. Tuning that touches kernel parameters or DB memory settings is never a standard change — keep it normal-track so the rollback plan is reviewed.
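
The ~60% rule from step 2 (classify the host workload profile) can be written down as a small check. This is a minimal sketch, not part of any monitoring tool: the function name `classify_workload`, the role labels, and the idea of feeding it per-role utilization figures are all assumptions made for illustration.

```python
MIXED = "Mixed / general purpose"


def classify_workload(utilization_by_role, threshold=0.60):
    """Return the dominant role, or the mixed label when no role exceeds the threshold.

    utilization_by_role maps a hypothetical role label (e.g. "database")
    to its share of observed utilization, in any consistent unit.
    """
    if not utilization_by_role:
        raise ValueError("need at least one role")
    total = sum(utilization_by_role.values())
    if total <= 0:
        raise ValueError("total utilization must be positive")
    # Find the role with the largest share of the total.
    role, value = max(utilization_by_role.items(), key=lambda kv: kv[1])
    return role if value / total > threshold else MIXED


if __name__ == "__main__":
    print(classify_workload({"database": 70, "web": 20, "file": 10}))
    print(classify_workload({"database": 45, "web": 40, "file": 15}))
```

The threshold is a parameter because the runbook states it as approximate; record the value you used in the change request so the classification can be reproduced.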

2

Operating System Tuning

  1. Apply the latest vendor-supported kernel and patches
    • Stay one minor release behind the bleeding edge unless a CVE forces otherwise. Confirm the kernel version is on the application vendor's support matrix (Oracle, SAP, and SQL Server all publish certified kernel ranges).

  2. Tune kernel parameters via sysctl
    • Adjust net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, vm.dirty_ratio, and vm.dirty_background_ratio to match the workload profile captured earlier. Persist changes in /etc/sysctl.d so they survive reboot — runtime-only tuning that disappears at next boot is a frequent post-tuning regression.

  3. Disable unused services and daemons
    • Run systemctl list-unit-files --state=enabled and stop anything not required (cups, avahi, bluetooth on a server, sample web stacks). Document each disabled service in the change record so the next engineer knows it was deliberate.

  4. Configure swap and vm.swappiness
    • Database hosts typically run vm.swappiness between 1 and 10; the kernel default of 60 suits most general-purpose hosts. Verify swap is on a dedicated partition or fast local NVMe; swap on the SAN is a known latency trap under memory pressure.
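
For steps 2 and 4 of this section, one way to keep persisted and runtime kernel settings in step is to generate the /etc/sysctl.d drop-in from a single dictionary and then diff it against what the host reports. The sketch below is illustrative only: the helper names are hypothetical, and every numeric value in `EXAMPLE_DB_HOST` is a placeholder rather than a recommendation.

```python
# Placeholder values for illustration; derive real ones from the workload profile.
EXAMPLE_DB_HOST = {
    "net.core.somaxconn": 4096,
    "net.ipv4.tcp_max_syn_backlog": 8192,
    "vm.dirty_background_ratio": 5,
    "vm.dirty_ratio": 15,
    "vm.swappiness": 10,
}


def render_sysctl_dropin(settings, comment):
    """Render key = value lines suitable for a file under /etc/sysctl.d."""
    lines = [f"# {comment}"]
    lines += [f"{key} = {settings[key]}" for key in sorted(settings)]
    return "\n".join(lines) + "\n"


def parse_sysctl_dropin(text):
    """Parse drop-in text back into a dict of strings, skipping comments."""
    parsed = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed


def find_drift(persisted, runtime):
    """Return {key: (persisted_value, runtime_value)} for every mismatch."""
    return {
        key: (want, runtime.get(key))
        for key, want in persisted.items()
        if runtime.get(key) != want
    }


if __name__ == "__main__":
    print(render_sysctl_dropin(EXAMPLE_DB_HOST, "change record reference goes here"))
```

After writing the rendered text to a drop-in file, `sysctl --system` reloads all drop-ins. Feeding `find_drift` the parsed file and the values read from the running host after a reboot would flag any setting that was only applied at runtime.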

3

Storage and File System

  1. Update RAID controller and HBA firmware
    • Pull the firmware from the vendor portal (Dell iDRAC Lifecycle, HPE SUM, Lenovo XClarity). Match the firmware against the storage vendor's HCL — a mismatched HBA firmware against a SAN array is a top cause of intermittent latency spikes that masquerade as application bugs.

  2. Set the I/O scheduler for the storage class
    • For NVMe use "none" or "mq-deadline"; for SSDs "mq-deadline"; for spinning disk "bfq" or "mq-deadline". Persist via udev rule so the scheduler sticks across reboots and disk replacements.

  3. Tune file system mount options
    • Add noatime on hot-read volumes; on SSD-backed file systems, either mount with discard or schedule a weekly fstrim; align XFS or ext4 stripe parameters to the underlying RAID geometry. Misaligned writes can straddle stripe boundaries, which can roughly double small-write latency on RAID-5/6.

  4. Verify RAID stripe size against workload
    • Database workloads with 8K random I/O perform poorly on 256K stripes; sequential workloads (backup target, video) want the larger stripe. If you cannot rebuild the array, document the mismatch as a known ceiling so it doesn't surface later as a "tuning failed" complaint.
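
The stripe alignment in steps 3 and 4 comes down to simple arithmetic on the RAID geometry. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the controller's per-disk chunk size is known, and the file system block size is 4 KiB. The option strings follow the `stride` / `stripe_width` (ext4) and `su` / `sw` (XFS) conventions; confirm them against the mkfs man pages for your distribution before use.

```python
PARITY_DISKS = {"raid0": 0, "raid5": 1, "raid6": 2}


def data_disk_count(raid_level, total_disks):
    """Number of disks that hold data (not parity or mirror copies)."""
    if raid_level == "raid10":
        return total_disks // 2
    return total_disks - PARITY_DISKS[raid_level]


def stripe_geometry(raid_level, total_disks, chunk_kib, fs_block_kib=4):
    """Derive file system stripe parameters from the RAID layout."""
    n = data_disk_count(raid_level, total_disks)
    if n < 1:
        raise ValueError("array has no data disks")
    if chunk_kib % fs_block_kib:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_kib // fs_block_kib  # file system blocks per chunk
    return {
        "data_disks": n,
        "full_stripe_kib": chunk_kib * n,
        "ext4_options": f"stride={stride},stripe_width={stride * n}",
        "xfs_options": f"su={chunk_kib}k,sw={n}",
    }


if __name__ == "__main__":
    # Six-disk RAID-6 with a 64 KiB chunk: four data disks, 256 KiB full stripe.
    print(stripe_geometry("raid6", 6, 64))
```

Comparing `full_stripe_kib` with the dominant I/O size of the workload gives a quick view of the mismatch step 4 asks you to document when the array cannot be rebuilt.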

4

Database Optimization

  1. Patch the database to the supported release
    • Confirm the engine version against the vendor lifecycle page (Microsoft SQL Server, PostgreSQL, MySQL, Oracle). Apply the latest cumulative update inside the maintenance window with a rollback snapshot taken first.

  2. Rebuild indexes on hot tables
    • Identify fragmentation above 30% (sys.dm_db_index_physical_stats on SQL Server, pgstattuple on Postgres). Rebuild online where the edition supports it; reorganize where it does not. Schedule a follow-up update of statistics after the rebuild.

  3. Review and tune slow-query log entries
    • Pull the top 10 queries by total elapsed time from Query Store, pg_stat_statements, or the slow-query log. Add covering indexes, rewrite SELECT * patterns, and flag any queries that need application-side changes for the dev team — those go on the engineering backlog, not this runbook.

  4. Adjust buffer pool and shared memory
    • SQL Server max server memory typically lands at 75-85% of host RAM with the OS reservation accounted for; PostgreSQL shared_buffers commonly 25% of RAM; MySQL innodb_buffer_pool_size 60-70% on dedicated hosts. Verify huge pages or large pages are enabled where the engine supports it.

  5. Run vacuum, analyze, or equivalent maintenance
    • Postgres VACUUM ANALYZE, SQL Server UPDATE STATISTICS, MySQL ANALYZE TABLE. Confirm autovacuum or auto-stats is enabled going forward — manual maintenance during the tuning window is a one-time payoff; the recurring schedule is what holds gains.
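
The memory guidelines in step 4 (adjust buffer pool and shared memory) translate directly into starting values for a given host. The following is a rough sketch that only encodes the percentages quoted in that step; the function name and engine keys are hypothetical, and the output is a starting point to review against vendor guidance, not a final setting.

```python
# Fractions of host RAM taken from step 4 of this section.
GUIDELINES = {
    "sqlserver": ("max server memory", 0.75, 0.85),
    "postgresql": ("shared_buffers", 0.25, 0.25),
    "mysql": ("innodb_buffer_pool_size", 0.60, 0.70),
}


def suggest_memory_gib(engine, host_ram_gib):
    """Return (setting name, low GiB, high GiB) for the given engine."""
    setting, low, high = GUIDELINES[engine]
    return setting, round(host_ram_gib * low, 1), round(host_ram_gib * high, 1)


if __name__ == "__main__":
    for engine in GUIDELINES:
        print(engine, suggest_memory_gib(engine, 64))
```

The MySQL range applies to dedicated hosts only, as the step notes; on a host classified as mixed, the fractions would need to come down to leave room for the other roles.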

5

Network Performance

  1. Update NIC drivers and firmware
    • Pull from the Intel, Mellanox, or Broadcom driver pages. The in-box driver often lags the vendor release by 12-24 months and can miss TSO / LRO offload fixes, which shows up as a throughput ceiling.

  2. Tune the TCP stack for the link profile
    • Raise net.core.rmem_max and net.core.wmem_max for paths with a high bandwidth-delay product; switch congestion control to BBR on long-haul links where the kernel supports it. On a 1 Gb LAN the defaults are usually fine, so don't tune what isn't broken.

  3. Enable interface bonding or LACP
    • Coordinate with the network team on the switch-side LACP configuration; mismatched bonding modes cause flapping under load. Verify with ethtool and /proc/net/bonding/bond0 that both members are active before declaring done.

  4. Apply QoS marking for priority traffic
    • Mark database replication, backup, and VoIP signaling with appropriate DSCP values; confirm the upstream switch or firewall (Meraki, Catalyst, FortiGate) trusts the markings. Untrusted markings get rewritten at the edge and the QoS work is wasted.
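
Whether step 2 (tune the TCP stack) applies to a given link can be estimated from the bandwidth-delay product: the number of bytes that must be in flight to keep the path full. The sketch below is a minimal illustration with hypothetical function names. The `current_max_bytes` figure in the example is an assumed value; read the real one from net.core.rmem_max on the host.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes: bits per second * seconds / 8."""
    return int(bandwidth_bits_per_s * rtt_ms / 1000 / 8)


def buffer_too_small(bandwidth_bits_per_s, rtt_ms, current_max_bytes):
    """True when the socket buffer ceiling cannot hold one full BDP."""
    return bdp_bytes(bandwidth_bits_per_s, rtt_ms) > current_max_bytes


if __name__ == "__main__":
    assumed_max = 212_992  # assumption: replace with the host's net.core.rmem_max
    # 10 Gbit/s long-haul link with 80 ms round-trip time.
    print(bdp_bytes(10_000_000_000, 80), buffer_too_small(10_000_000_000, 80, assumed_max))
    # 1 Gbit/s LAN with 0.5 ms round-trip time.
    print(bdp_bytes(1_000_000_000, 0.5), buffer_too_small(1_000_000_000, 0.5, assumed_max))
```

With these example inputs the long-haul path needs roughly 100 MB in flight while the LAN path needs about 62 KB, which is consistent with the step's advice to leave 1 Gb LAN defaults alone.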

6

Application Layer Tuning

  1. Patch the application to a vendor-supported version
    • Check the vendor support matrix and any extended-support contracts. Out-of-support application versions are the single most common finding when a tuning engagement turns into a remediation engagement.

  2. Profile the application under representative load
    • Use k6, JMeter, or a vendor-specific load generator to drive expected peak traffic. Compare the captured profile against the baseline; "tuned" without a profile under load is just patched.

  3. Tune connection pool and thread settings
    • Pick the setting that matches the application's runtime: JVM heap, IIS application-pool worker process count, Tomcat maxThreads, or .NET ThreadPool min/max. Connection-pool exhaustion presents as application timeouts that look like network problems to the helpdesk, so this step prevents tickets later.

  4. Configure the application caching layer
    • Redis or Memcached for session and object cache; IIS output caching or Varnish in front of web tiers. Set explicit TTLs and an eviction policy. Caches without bounds eventually consume all available memory and trigger the swap problem you tuned away in step 4 of Operating System Tuning.
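
To make "explicit TTLs and an eviction policy" concrete, here is a small in-process sketch of a cache that is bounded in both time and size. It is not how Redis or Memcached are implemented; the class name `BoundedTTLCache` is hypothetical, and the least-recently-used policy is just one possible choice. In Redis the comparable controls are the `maxmemory` and `maxmemory-policy` settings plus a per-key expiry.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Cache with a per-entry TTL and least-recently-used eviction."""

    def __init__(self, max_items, ttl_seconds, clock=time.monotonic):
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._data = OrderedDict()  # key -> (expires_at, value)

    def set(self, key, value):
        self._data.pop(key, None)
        self._data[key] = (self._clock() + self._ttl, value)
        # Enforce the size bound by dropping the least recently used entries.
        while len(self._data) > self._max_items:
            self._data.popitem(last=False)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def __len__(self):
        return len(self._data)
```

Because both bounds are enforced on every write and read, memory use cannot grow past `max_items` entries regardless of traffic, which is the property the step is asking for.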

7

Validation and Handoff

  1. Re-run the baseline benchmark suite
    • Run the same tools and the same business-day window as the pre-tuning capture so the comparison is apples-to-apples. Different tooling between baseline and validation is the easiest way to fool yourself into thinking tuning worked.

  2. Compare metrics against the pre-tuning baseline
    • Quantify the delta in CPU saturation, p95 disk latency, network throughput, and application response time. Attach the post-tuning export so the QBR deck has real numbers, not adjectives.

  3. Restore the pre-tuning configuration if validation fails
    • If the post-tuning metrics miss the agreed success criteria, execute the rollback plan filed with the change request: revert sysctl, restore database memory settings, and roll back to the snapshot if needed. File a follow-up RFC with a revised tuning hypothesis; do not retry inside the same change window without CAB review.

  4. Update the runbook and CMDB record
    • Capture the final settings in IT Glue, Hudu, or Confluence so the next on-call engineer doesn't reverse-engineer your work at 2 a.m. Update the asset record in ConnectWise or ServiceNow with the post-tuning configuration baseline.

  5. Obtain change-owner sign-off
    • Application owner or vCIO signs off that the tuning meets the agreed success criteria. Close the change in PSA referencing this checklist run and attach the post-tuning metrics file.
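
The comparison in step 2 of this section (p95 latency and the percentage delta against the baseline) can be sketched as follows. The function names and metric keys are hypothetical, and the percentile uses the nearest-rank method, which may differ slightly from what your monitoring tool reports; use the same method for both the baseline and the post-tuning export.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def percent_change(before, after):
    """Signed change from before to after, as a percentage of before."""
    if before == 0:
        raise ValueError("baseline value is zero")
    return (after - before) / before * 100.0


def compare_runs(baseline, post):
    """Percent change for every metric present in both runs."""
    return {
        metric: round(percent_change(baseline[metric], post[metric]), 1)
        for metric in baseline
        if metric in post
    }


if __name__ == "__main__":
    baseline = {"p95_disk_latency_ms": 12.0, "throughput_mbps": 800.0}
    post = {"p95_disk_latency_ms": 9.0, "throughput_mbps": 920.0}
    print(compare_runs(baseline, post))
```

A negative change is an improvement for latency and a regression for throughput, so state the direction next to each number when attaching the result to the change record.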

