Performance Tuning Checklist
Phased tuning runbook a sysadmin or MSP engineer follows to take a server from baseline through OS, storage, database, network, and application tuning, with validation and change-management sign-off. Captures pre-tuning metrics and gates database-specific work on the host's wo...
Baseline and Change Scope
-
Capture the pre-tuning performance baseline
Pull at least one full business day of metrics from PRTG, LogicMonitor, Datadog, or the RMM (CPU, memory, disk IOPS / latency, network throughput, app response time). Attach the export so you can compare against post-tuning numbers in validation. Tuning without a baseline is the most common reason engineers cannot prove improvement at QBR.
Collects file -
Classify the host workload profile
The workload drives every downstream tuning choice — swappiness, I/O scheduler, NUMA pinning, buffer pool sizing. Pick the dominant role; mark the host "Mixed / general purpose" only when no role exceeds ~60% of utilization.
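The ~60% rule in this step can be expressed as a small check. This is a minimal sketch: the role names and utilization shares are hypothetical inputs, and how you attribute utilization to a role depends on your monitoring data.

```python
def classify_workload(shares, threshold=0.60):
    """Return the dominant role, or the mixed label when no role exceeds threshold.

    shares maps a role name to its fraction of host utilization (0.0-1.0).
    """
    mixed = "Mixed / general purpose"
    if not shares:
        return mixed
    role, share = max(shares.items(), key=lambda item: item[1])
    return role if share > threshold else mixed


# Hypothetical utilization splits, for illustration only.
print(classify_workload({"database": 0.72, "web": 0.18, "file": 0.10}))
print(classify_workload({"database": 0.45, "web": 0.40, "file": 0.15}))
```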
Collects list -
Schedule the maintenance window
Confirm the window with the application owner and any dependent client (MSP context: notify the account manager so the QBR doesn't surface a surprise). Avoid month-end close, payroll runs, and Patch Tuesday collisions.
Collects datetime -
Open a normal change request in the PSA
File the RFC in ConnectWise PSA, ServiceNow, or Autotask with implementation steps, rollback plan, and CAB approver. Tuning that touches kernel parameters or DB memory settings is never a standard change — keep it normal-track so the rollback plan is reviewed.
Operating System Tuning
-
Apply the latest vendor-supported kernel and patches
Stay one minor release behind the bleeding edge unless a CVE forces otherwise. Confirm the kernel version is on the application vendor's support matrix (Oracle, SAP, and SQL Server all publish certified kernel ranges).
-
Tune kernel parameters via sysctl
Adjust net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, vm.dirty_ratio, and vm.dirty_background_ratio to match the workload profile captured earlier. Persist changes in /etc/sysctl.d so they survive reboot — runtime-only tuning that disappears at next boot is a frequent post-tuning regression.
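One way to keep the persisted settings reviewable is to render the drop-in file from a single dictionary and attach the output to the change record. The sketch below only builds the file body; the values are placeholders, not recommendations, and should come from the workload profile.

```python
def render_sysctl_dropin(settings):
    """Render key/value pairs as the body of a sysctl.d drop-in file."""
    lines = ["# Managed by the tuning change request; edit via the runbook."]
    for key in sorted(settings):
        lines.append(f"{key} = {settings[key]}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only; derive real ones from the workload.
EXAMPLE_SETTINGS = {
    "net.core.somaxconn": 4096,
    "net.ipv4.tcp_max_syn_backlog": 8192,
    "vm.dirty_ratio": 10,
    "vm.dirty_background_ratio": 5,
}

print(render_sysctl_dropin(EXAMPLE_SETTINGS))
```

Saving the output under /etc/sysctl.d (the file name, for example 90-tuning.conf, is an assumption) and running sysctl --system loads it without a reboot.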
-
Disable unused services and daemons
Run systemctl list-unit-files --state=enabled and stop anything not required (cups, avahi, bluetooth on a server, sample web stacks). Document each disabled service in the change record so the next engineer knows it was deliberate.
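To make the review repeatable, the listing can be filtered against a list of services that should not run on a server. This sketch parses text you paste or capture from the command; the sample listing is illustrative and trimmed, and the unwanted set is an assumption to adapt per host.

```python
# Assumed deny list; adjust to the host's actual role.
UNWANTED = {"cups.service", "avahi-daemon.service", "bluetooth.service"}


def find_unwanted_units(listing, unwanted=UNWANTED):
    """Return enabled units from the listing that appear in the unwanted set."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        # Unit lines start with the unit name followed by its state.
        if len(fields) >= 2 and fields[1] == "enabled" and fields[0] in unwanted:
            found.append(fields[0])
    return found


# Illustrative sample of `systemctl list-unit-files --state=enabled` output.
SAMPLE = """\
UNIT FILE                STATE   VENDOR PRESET
avahi-daemon.service     enabled enabled
cups.service             enabled enabled
sshd.service             enabled enabled

3 unit files listed.
"""

print(find_unwanted_units(SAMPLE))
```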
-
Configure swap and vm.swappiness
Database hosts typically run swappiness of 1-10; general-purpose hosts usually keep the kernel default of 60. Verify swap is on a dedicated partition or fast local NVMe; swap on the SAN is a known latency trap during memory pressure.
Storage and File System
-
Update RAID controller and HBA firmware
Pull the firmware from the vendor portal (Dell iDRAC Lifecycle, HPE SUM, Lenovo XClarity). Match the firmware against the storage vendor's HCL — a mismatched HBA firmware against a SAN array is a top cause of intermittent latency spikes that masquerade as application bugs.
-
Set the I/O scheduler for the storage class
For NVMe use "none" or "mq-deadline"; for SSDs "mq-deadline"; for spinning disk "bfq" or "mq-deadline". Persist via udev rule so the scheduler sticks across reboots and disk replacements.
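A udev rule per storage class might be generated as follows. This is a sketch under assumptions: it picks the first scheduler named above for each class, and the KERNEL match patterns (sd* and nvme* device names) are common defaults that may not fit every host.

```python
# First-choice scheduler per storage class, taken from the guidance in this step.
SCHEDULER_BY_CLASS = {"nvme": "none", "ssd": "mq-deadline", "hdd": "bfq"}

# Assumed device-name patterns; verify against the host's actual block devices.
KERNEL_MATCH = {"nvme": "nvme[0-9]*", "ssd": "sd[a-z]*", "hdd": "sd[a-z]*"}


def udev_rule(storage_class):
    """Build one udev rule line that sets the I/O scheduler for a storage class."""
    scheduler = SCHEDULER_BY_CLASS[storage_class]
    rotational = "1" if storage_class == "hdd" else "0"
    return (
        f'ACTION=="add|change", KERNEL=="{KERNEL_MATCH[storage_class]}", '
        f'ATTR{{queue/rotational}}=="{rotational}", '
        f'ATTR{{queue/scheduler}}="{scheduler}"'
    )


for storage_class in ("nvme", "ssd", "hdd"):
    print(udev_rule(storage_class))
```

The printed lines would go in a rules file under /etc/udev/rules.d (the file name is up to you); confirm the active scheduler afterwards by reading /sys/block/<device>/queue/scheduler.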
-
Tune file system mount options
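Add noatime on hot-read volumes; verify the discard mount option is set or run a weekly fstrim on SSD-backed file systems; align XFS or ext4 stripe parameters to the underlying RAID geometry. Misalignment can roughly double small-write latency on RAID-5/6, because a write that straddles a stripe boundary forces extra read-modify-write cycles.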
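The stripe alignment arithmetic is easy to get wrong by hand. The sketch below derives the ext4 stride and stripe_width values and the XFS su and sw values from the RAID geometry; the example geometry (64 KiB chunk, 4 KiB blocks, five-disk RAID-5) is hypothetical.

```python
PARITY_DISKS = {"raid5": 1, "raid6": 2}


def data_disks(raid_level, total_disks):
    """Number of data-bearing disks in a parity RAID set."""
    parity = PARITY_DISKS[raid_level]
    if total_disks <= parity + 1:
        raise ValueError("too few disks for this RAID level")
    return total_disks - parity


def ext4_stripe_options(chunk_kib, block_kib, raid_level, total_disks):
    """stride = chunk / block size; stripe_width = stride * data disks."""
    stride = chunk_kib // block_kib
    stripe_width = stride * data_disks(raid_level, total_disks)
    return f"stride={stride},stripe_width={stripe_width}"


def xfs_stripe_options(chunk_kib, raid_level, total_disks):
    """su = chunk size; sw = number of data disks."""
    return f"su={chunk_kib}k,sw={data_disks(raid_level, total_disks)}"


# Hypothetical geometry: 64 KiB chunk, 4 KiB blocks, RAID-5 across 5 disks.
print(ext4_stripe_options(64, 4, "raid5", 5))  # stride=16,stripe_width=64
print(xfs_stripe_options(64, "raid5", 5))      # su=64k,sw=4
```

The resulting strings correspond to the -E extended options of mkfs.ext4 and the -d options of mkfs.xfs; confirm the chunk size against the controller configuration before using them.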
-
Verify RAID stripe size against workload
Database workloads with 8K random I/O perform poorly on 256K stripes; sequential workloads (backup target, video) want the larger stripe. If you cannot rebuild the array, document the mismatch as a known ceiling so it doesn't surface later as a "tuning failed" complaint.
Database Optimization
-
Patch the database to the supported release
Confirm the engine version against the vendor lifecycle page (Microsoft SQL Server, PostgreSQL, MySQL, Oracle). Apply the latest cumulative update inside the maintenance window with a rollback snapshot taken first.
-
Rebuild indexes on hot tables
Identify fragmentation above 30% (sys.dm_db_index_physical_stats on SQL Server, pgstattuple on Postgres). Rebuild online where the edition supports it; reorganize where it does not. Schedule a follow-up update of statistics after the rebuild.
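For SQL Server, the rebuild-or-reorganize decision can be sketched as a function that emits a statement for review rather than executing it. The table and index names are hypothetical, and the fragmentation figure is assumed to come from sys.dm_db_index_physical_stats.

```python
def index_maintenance_sql(table, index, fragmentation_pct,
                          online_supported, threshold=30.0):
    """Return a T-SQL statement for a fragmented index, or None if below threshold."""
    if fragmentation_pct <= threshold:
        return None
    if online_supported:
        return f"ALTER INDEX [{index}] ON {table} REBUILD WITH (ONLINE = ON);"
    # Editions without online rebuild fall back to reorganize, per this step.
    return f"ALTER INDEX [{index}] ON {table} REORGANIZE;"


# Hypothetical objects and fragmentation readings.
print(index_maintenance_sql("dbo.Orders", "IX_Orders_CustomerId", 47.5, True))
print(index_maintenance_sql("dbo.Orders", "IX_Orders_OrderDate", 41.0, False))
print(index_maintenance_sql("dbo.Orders", "IX_Orders_Status", 12.0, True))
```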
-
Review and tune slow-query log entries
Pull the top 10 queries by total elapsed time from Query Store, pg_stat_statements, or the slow-query log. Add covering indexes, rewrite SELECT * patterns, and flag any queries that need application-side changes for the dev team — those go on the engineering backlog, not this runbook.
-
Adjust buffer pool and shared memory
SQL Server max server memory typically lands at 75-85% of host RAM with the OS reservation accounted for; PostgreSQL shared_buffers commonly 25% of RAM; MySQL innodb_buffer_pool_size 60-70% on dedicated hosts. Verify huge pages or large pages are enabled where the engine supports it.
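As a worked example of the percentages above, the following sketch converts host RAM into starting targets in MiB. The ranges are the ones stated in this step; treat the output as a starting point to validate under load, not a final value.

```python
def memory_targets_mib(host_ram_gib):
    """Compute starting memory targets in MiB from the ranges in this step."""
    ram_mib = host_ram_gib * 1024

    def pct(percent):
        # Integer arithmetic avoids floating-point rounding surprises.
        return ram_mib * percent // 100

    return {
        "sqlserver_max_server_memory": (pct(75), pct(85)),    # 75-85% of RAM
        "postgresql_shared_buffers": pct(25),                 # about 25% of RAM
        "mysql_innodb_buffer_pool_size": (pct(60), pct(70)),  # 60-70%, dedicated host
    }


# Example: a dedicated host with 64 GiB of RAM.
for name, value in memory_targets_mib(64).items():
    print(name, value)
```

For 64 GiB this gives 49152-55705 MiB for SQL Server, 16384 MiB for PostgreSQL shared_buffers, and 39321-45875 MiB for the InnoDB buffer pool.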
-
Run vacuum, analyze, or equivalent maintenance
Postgres VACUUM ANALYZE, SQL Server UPDATE STATISTICS, MySQL ANALYZE TABLE. Confirm autovacuum or auto-stats is enabled going forward — manual maintenance during the tuning window is a one-time payoff; the recurring schedule is what holds gains.
Network Performance
-
Update NIC drivers and firmware
Pull from the Intel, Mellanox, or Broadcom driver pages. The in-box driver often lags the vendor release by 12-24 months and frequently misses TSO / LRO offload fixes that show up as throughput ceilings.
-
Tune the TCP stack for the link profile
Raise net.core.rmem_max and net.core.wmem_max for paths with a high bandwidth-delay product; switch congestion control to BBR on long-haul links where the kernel supports it. On 1 Gb LAN the defaults are usually fine, so don't tune what isn't broken.
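Whether a path needs larger socket buffers follows from its bandwidth-delay product (bandwidth multiplied by round-trip time). A minimal sketch of that arithmetic, with hypothetical link figures:

```python
def bdp_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product in bytes for a link speed (Mb/s) and RTT (ms).

    Mb/s * 1,000,000 bits * (ms / 1000) s = Mb/s * 1000 * ms bits; divide by 8 for bytes.
    """
    return bandwidth_mbps * 1000 * rtt_ms // 8


# Hypothetical long-haul path: 1 Gb/s with 80 ms round-trip time.
print(bdp_bytes(1000, 80))  # 10000000 bytes, roughly 10 MB in flight

# Hypothetical LAN path: 1 Gb/s with 1 ms round-trip time.
print(bdp_bytes(1000, 1))   # 125000 bytes
```

A buffer ceiling well below the computed figure caps throughput on that path, which is why the long-haul case benefits from tuning and the LAN case usually does not.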
-
Enable interface bonding or LACP
Coordinate with the network team on the switch-side LACP configuration; mismatched bonding modes cause flapping under load. Verify with ethtool and /proc/net/bonding/bond0 that both members are active before declaring done.
-
Apply QoS marking for priority traffic
Mark database replication, backup, and VoIP signaling with appropriate DSCP values; confirm the upstream switch or firewall (Meraki, Catalyst, FortiGate) trusts the markings. Untrusted markings get rewritten at the edge and the QoS work is wasted.
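Where marking is done by the application itself rather than by the host firewall or switch, the DSCP value occupies the upper six bits of the IP TOS byte. The sketch below shows that conversion and one way to apply it to a socket; the class-to-traffic mapping is a hypothetical example, not a recommendation, and mark_socket assumes a platform that exposes socket.IP_TOS.

```python
import socket


def dscp_to_tos(dscp):
    """Shift a 6-bit DSCP value into the upper bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def mark_socket(sock, dscp):
    """Apply a DSCP marking to an IPv4 socket (assumes socket.IP_TOS is available)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


# Hypothetical mapping for illustration: CS3 = 24, AF21 = 18, AF11 = 10.
EXAMPLE_CLASSES = {"voip_signaling": 24, "db_replication": 18, "backup": 10}

for name, dscp in EXAMPLE_CLASSES.items():
    print(name, dscp, dscp_to_tos(dscp))
```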
Application Layer Tuning
-
Patch the application to a vendor-supported version
Check the vendor support matrix and any extended-support contracts. Out-of-support application versions are the single most common finding when a tuning engagement turns into a remediation engagement.
-
Profile the application under representative load
Use k6, JMeter, or a vendor-specific load generator to drive expected peak traffic. Compare the captured profile against the baseline; "tuned" without a profile under load is just patched.
-
Tune connection pool and thread settings
Tune the settings that match the app's runtime: JVM heap and Tomcat maxThreads for Java, application-pool worker count for IIS, ThreadPool min/max for .NET. Size the connection pool alongside them. Connection-pool exhaustion presents as application timeouts that look like network problems to the helpdesk, so this step prevents tickets later.
-
Configure the application caching layer
Redis or Memcached for session and object cache; IIS output caching or Varnish in front of web tiers. Set explicit TTLs and an eviction policy. Caches without bounds eventually consume all available memory and trigger the swap problem you tuned away in the "Configure swap and vm.swappiness" step.
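To illustrate what "explicit TTLs and an eviction policy" means in practice, here is a minimal in-process sketch of a cache bounded by both entry count and age, using least-recently-used eviction. It is not how Redis or Memcached are implemented; in Redis the analogous controls are the maxmemory and maxmemory-policy settings plus per-key expiry.

```python
import time
from collections import OrderedDict


class BoundedTTLCache:
    """Cache with a maximum entry count (LRU eviction) and a per-entry TTL."""

    def __init__(self, max_items, ttl_seconds, clock=time.monotonic):
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def set(self, key, value):
        self._data.pop(key, None)
        self._data[key] = (self._clock() + self._ttl, value)
        # Evict least recently used entries once the bound is exceeded.
        while len(self._data) > self._max_items:
            self._data.popitem(last=False)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value


cache = BoundedTTLCache(max_items=2, ttl_seconds=300)
cache.set("session:1", {"user": "a"})
cache.set("session:2", {"user": "b"})
cache.set("session:3", {"user": "c"})  # evicts session:1, the oldest entry
print(cache.get("session:1"), cache.get("session:3"))
```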
Validation and Handoff
-
Re-run the baseline benchmark suite
Run the same tools and the same business-day window as the pre-tuning capture so the comparison is apples-to-apples. Different tooling between baseline and validation is the easiest way to fool yourself into thinking tuning worked.
-
Compare metrics against the pre-tuning baseline
Quantify the delta in CPU saturation, p95 disk latency, network throughput, and application response time. Attach the post-tuning export so the QBR deck has real numbers, not adjectives.
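The comparison can be reduced to two small calculations: a p95 over each sample set and the percent change between baseline and post-tuning values. This is a sketch using the nearest-rank percentile method; the latency samples are made up for illustration.

```python
def p95(samples):
    """95th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def percent_change(before, after):
    """Signed percent change from before to after (negative means a reduction)."""
    return (after - before) / before * 100.0


# Made-up disk latency samples in milliseconds.
baseline = [4, 5, 5, 6, 6, 7, 7, 8, 9, 20]
post_tuning = [3, 4, 4, 4, 5, 5, 5, 6, 6, 15]

before, after = p95(baseline), p95(post_tuning)
print(before, after, percent_change(before, after))  # 20 15 -25.0
```

Use the same percentile method for both captures; switching methods between baseline and validation skews the delta in the same way switching tools does.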
Collects list Collects paragraph Collects file -
Restore the pre-tuning configuration
If the comparison shows a regression or misses the agreed success criteria, execute the rollback plan filed with the change request: revert sysctl, restore database memory settings, roll the snapshot if needed. File a follow-up RFC with a revised tuning hypothesis; do not retry inside the same change window without CAB review.
-
Update the runbook and CMDB record
Capture the final settings in IT Glue, Hudu, or Confluence so the next on-call engineer doesn't reverse-engineer your work at 2 a.m. Update the asset record in ConnectWise or ServiceNow with the post-tuning configuration baseline.
-
Obtain change-owner sign-off
Application owner or vCIO signs off that the tuning meets the agreed success criteria. Close the change in PSA referencing this checklist run and attach the post-tuning metrics file.
Collects signature