Performance Tuning Checklist

Phased tuning runbook a sysadmin or MSP engineer follows to take a server from baseline through OS, storage, database, network, and application tuning, with validation and change-management sign-off. Captures pre-tuning metrics and gates database-specific work on the host's wo...

7 sections 30 steps Collects data

Baseline and Change Scope

Capture the pre-tuning performance baseline
- Pull at least one full business day of metrics from PRTG, LogicMonitor, Datadog, or the RMM (CPU, memory, disk IOPS / latency, network throughput, app response time). Attach the export so you can compare against post-tuning numbers in validation. Tuning without a baseline is the most common reason engineers cannot prove improvement at QBR.
Collects file
Classify the host workload profile
- The workload drives every downstream tuning choice — swappiness, I/O scheduler, NUMA pinning, buffer pool sizing. Pick the dominant role; mark the host "Mixed / general purpose" only when no role exceeds ~60% of utilization.
Collects list
Schedule the maintenance window
- Confirm the window with the application owner and any dependent client (MSP context: notify the account manager so the QBR doesn't surface a surprise). Avoid month-end close, payroll runs, and patch Tuesday collisions.
Collects datetime
Open a normal change request in the PSA
- File the RFC in ConnectWise PSA, ServiceNow, or Autotask with implementation steps, rollback plan, and CAB approver. Tuning that touches kernel parameters or DB memory settings is never a standard change — keep it normal-track so the rollback plan is reviewed.

Operating System Tuning

Apply the latest vendor-supported kernel and patches
- Stay one minor release behind the bleeding edge unless a CVE forces otherwise. Confirm the kernel version is on the application vendor's support matrix (Oracle, SAP, and SQL Server all publish certified kernel ranges).
Tune kernel parameters via sysctl
- Adjust net.core.somaxconn, net.ipv4.tcp_max_syn_backlog, vm.dirty_ratio, and vm.dirty_background_ratio to match the workload profile captured earlier. Persist changes in /etc/sysctl.d so they survive reboot — runtime-only tuning that disappears at next boot is a frequent post-tuning regression.
Disable unused services and daemons
- Run systemctl list-unit-files --state=enabled and stop anything not required (cups, avahi, bluetooth on a server, sample web stacks). Document each disabled service in the change record so the next engineer knows it was deliberate.
Configure swap and vm.swappiness
- Database hosts typically run swappiness of 1-10; general-purpose hosts run 60. Verify swap is on a dedicated partition or fast local NVMe — swap on the SAN is a known latency trap during memory pressure.

Storage and File System

Update RAID controller and HBA firmware
- Pull the firmware from the vendor portal (Dell iDRAC Lifecycle, HPE SUM, Lenovo XClarity). Match the firmware against the storage vendor's HCL — a mismatched HBA firmware against a SAN array is a top cause of intermittent latency spikes that masquerade as application bugs.
Set the I/O scheduler for the storage class
- For NVMe use "none" or "mq-deadline"; for SSDs "mq-deadline"; for spinning disk "bfq" or "mq-deadline". Persist via udev rule so the scheduler sticks across reboots and disk replacements.
Tune file system mount options
- Add noatime on hot-read volumes; verify discard or run weekly fstrim on SSD-backed file systems; align XFS or ext4 stripe parameters to the underlying RAID geometry. Misalignment doubles small-write latency on RAID-5/6.
Verify RAID stripe size against workload
- Database workloads with 8K random I/O perform poorly on 256K stripes; sequential workloads (backup target, video) want the larger stripe. If you cannot rebuild the array, document the mismatch as a known ceiling so it doesn't surface later as a "tuning failed" complaint.

Database Optimization

Patch the database to the supported release
- Confirm the engine version against the vendor lifecycle page (Microsoft SQL Server, PostgreSQL, MySQL, Oracle). Apply the latest cumulative update inside the maintenance window with a rollback snapshot taken first.
Rebuild indexes on hot tables
- Identify fragmentation above 30% (sys.dm_db_index_physical_stats on SQL Server, pgstattuple on Postgres). Rebuild online where the edition supports it; reorganize where it does not. Schedule a follow-up update of statistics after the rebuild.
Review and tune slow-query log entries
- Pull the top 10 queries by total elapsed time from Query Store, pg_stat_statements, or the slow-query log. Add covering indexes, rewrite SELECT * patterns, and flag any queries that need application-side changes for the dev team — those go on the engineering backlog, not this runbook.
Adjust buffer pool and shared memory
- SQL Server max server memory typically lands at 75-85% of host RAM with the OS reservation accounted for; PostgreSQL shared_buffers commonly 25% of RAM; MySQL innodb_buffer_pool_size 60-70% on dedicated hosts. Verify huge pages or large pages are enabled where the engine supports it.
Run vacuum, analyze, or equivalent maintenance
- Postgres VACUUM ANALYZE, SQL Server UPDATE STATISTICS, MySQL ANALYZE TABLE. Confirm autovacuum or auto-stats is enabled going forward — manual maintenance during the tuning window is a one-time payoff; the recurring schedule is what holds gains.

Network Performance

Update NIC drivers and firmware
- Pull from Intel, Mellanox, or Broadcom driver pages — the in-box driver lags by 12-24 months and frequently misses TSO / LRO offload fixes that show up as throughput ceilings.
Tune the TCP stack for the link profile
- Raise net.core.rmem_max and wmem_max for high-bandwidth-delay paths; switch congestion control to BBR on long-haul links where supported. On 1 Gb LAN the defaults are usually fine — don't tune what isn't broken.
Enable interface bonding or LACP
- Coordinate with the network team on the switch-side LACP configuration; mismatched bonding modes cause flapping under load. Verify with ethtool and /proc/net/bonding/bond0 that both members are active before declaring done.
Apply QoS marking for priority traffic
- Mark database replication, backup, and VoIP signaling with appropriate DSCP values; confirm the upstream switch (Meraki, Catalyst, FortiGate) trusts the markings. Untrusted markings get rewritten at the edge and the QoS work is wasted.

Application Layer Tuning

Patch the application to a vendor-supported version
- Check the vendor support matrix and any extended-support contracts. Out-of-support application versions are the single most common finding when a tuning engagement turns into a remediation engagement.
Profile the application under representative load
- Use k6, JMeter, or a vendor-specific load generator to drive expected peak traffic. Compare the captured profile against the baseline; "tuned" without a profile under load is just patched.
Tune connection pool and thread settings
- JVM heap, IIS application-pool worker count, Tomcat maxThreads, .NET ThreadPool min/max — pick one per app. Connection-pool exhaustion presents as application timeouts that look like network problems to the helpdesk, so this step prevents tickets later.
Configure the application caching layer
- Redis or Memcached for session and object cache; IIS output caching or Varnish in front of web tiers. Set explicit TTLs and an eviction policy — caches without bounds eventually consume all available memory and trigger the swap problem you tuned away in step OS-4.

Validation and Handoff

Re-run the baseline benchmark suite
- Run the same tools and the same business-day window as the pre-tuning capture so the comparison is apples-to-apples. Different tooling between baseline and validation is the easiest way to fool yourself into thinking tuning worked.
Compare metrics against the pre-tuning baseline
- Quantify the delta in CPU saturation, p95 disk latency, network throughput, and application response time. Attach the post-tuning export so the QBR deck has real numbers, not adjectives.
Collects list Collects paragraph Collects file
Restore the pre-tuning configuration
- Execute the rollback plan filed with the change request: revert sysctl, restore database memory settings, roll the snapshot if needed. File a follow-up RFC with revised tuning hypothesis — do not retry inside the same change window without CAB review.
Update the runbook and CMDB record
- Capture the final settings in IT Glue, Hudu, or Confluence so the next on-call engineer doesn't reverse-engineer your work at 2 a.m. Update the asset record in ConnectWise or ServiceNow with the post-tuning configuration baseline.
Obtain change-owner sign-off
- Application owner or vCIO signs off that the tuning meets the agreed success criteria. Close the change in PSA referencing this checklist run and attach the post-tuning metrics file.
Collects signature

Use this template

Copy it to your account, customize the steps, and run it with your team in minutes.

Use this workflow Start free trial

Sections 7

Steps 30

Category Systems Administration

Price Free to start

Need a different process

Browse hundreds of free templates across every team and industry.

Back to template library

Related templates

More workflows your team can run.

Systems Administration

Run Performance Tuning Checklist with your team

Customize the steps, assign roles, set a schedule, and keep a complete record for every run.