Network Monitoring Checklist
Network Inventory and Configuration
Export the live device list from NinjaOne, Datto RMM, ConnectWise Automate, or Auvik — whatever is the system of record. Reconcile against IT Glue or Hudu documentation; rogue or undocumented devices are the most common audit finding at this step.
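The reconciliation above can be sketched as a set comparison, assuming both systems can export device identifiers (for example MAC addresses or serial numbers) as plain lists. The function name and inputs are hypothetical and do not reflect any specific RMM or documentation-platform API.

```python
def reconcile(live_devices, documented_devices):
    """Compare two collections of device identifiers (e.g. MACs or serials)."""
    # Normalize case and whitespace so cosmetic differences do not count as drift.
    live = {d.strip().lower() for d in live_devices}
    documented = {d.strip().lower() for d in documented_devices}
    return {
        "undocumented": sorted(live - documented),  # seen live, missing from docs
        "stale": sorted(documented - live),         # in docs, not seen live
    }
```

Entries under `undocumented` are the rogue-device candidates; entries under `stale` are usually decommissioned gear that never left the documentation.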
Compare running firmware on FortiGate, Meraki, Catalyst, Aruba, and SonicWall gear against the vendor's current recommended release. Note any device more than two minor releases behind — these are change-window candidates, not emergency patches unless a CVE applies.
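A rough sketch of the "more than two minor releases behind" check, assuming version strings follow a `major.minor.patch` pattern. Vendor numbering schemes differ, so the parsing here is an assumption and the function names are hypothetical.

```python
def minor_releases_behind(running, recommended):
    """Return how many minor releases `running` trails `recommended`.

    Returns None when the major versions differ, since that comparison
    needs a human to read the vendor's upgrade path.
    """
    run_major, run_minor = (int(p) for p in running.split(".")[:2])
    rec_major, rec_minor = (int(p) for p in recommended.split(".")[:2])
    if run_major != rec_major:
        return None
    return max(rec_minor - run_minor, 0)


def needs_review(running, recommended, max_lag=2):
    """Flag devices more than `max_lag` minor releases behind, or on another major train."""
    lag = minor_releases_behind(running, recommended)
    return lag is None or lag > max_lag
```

A flagged device is only a change-window candidate; whether it becomes an emergency patch still depends on a separate CVE check.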
Rotate local admin and enable secrets in CyberArk, Delinea, Passportal, or Hudu Vault. Common gotcha: a service account hardcoded in a backup script that breaks the moment the password rotates — confirm dependent services before the change.
Walk the trunk configuration on each access switch against the documented VLAN plan. PCI scope, guest WiFi isolation, and IoT/printer segmentation are the three places drift quietly accumulates.
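One way to make the trunk walk mechanical is to compare allowed-VLAN lists as sets. This sketch assumes the VLAN lists are written in the common comma-and-range form (such as `10,20,30-35`); the function names are hypothetical.

```python
def expand_vlans(spec):
    """Expand a VLAN list such as '10,20,30-35' into a set of integers."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def trunk_drift(configured, planned):
    """Report VLANs allowed on the trunk but not planned, and vice versa."""
    conf_set, plan_set = expand_vlans(configured), expand_vlans(planned)
    return {
        "unexpected": sorted(conf_set - plan_set),
        "missing": sorted(plan_set - conf_set),
    }
```

Anything under `unexpected` on a trunk that touches PCI, guest, or IoT segments deserves a closer look before it is written off as harmless.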
Spot-check that 802.1X is enforcing, not in monitor mode, on production switchports. RADIUS misconfigurations frequently cause silent fallback to MAB, which defeats the control.
Monitoring and Alerting
Pull the polling status report from PRTG, Auvik, LogicMonitor, or SolarWinds Orion. Devices that are reachable but not polling are usually credential mismatches or ACLs blocking the NMS source IP — fix at the device, not the monitoring server.
Adjust CPU, memory, and interface utilization thresholds based on the trailing 30-day baseline so the noise floor reflects current load. Static thresholds copied from a vendor template are the leading cause of alert fatigue.
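As an illustration of deriving a threshold from the baseline rather than a template, the sketch below uses mean plus a multiple of the standard deviation. This is one common approach, not the only one; the multiplier `k` and the 100% ceiling are assumptions to tune per metric.

```python
from statistics import mean, pstdev


def baseline_threshold(samples, k=3.0, ceiling=100.0):
    """Alert threshold = baseline mean + k standard deviations, capped at `ceiling`.

    `samples` are utilization percentages from the trailing window
    (for example 30 days of 5-minute polls).
    """
    if not samples:
        raise ValueError("need at least one sample")
    return min(mean(samples) + k * pstdev(samples), ceiling)
```

A device that idles at 15% with little variance gets a much lower threshold than one that routinely swings between 40% and 80%, which is the point of baselining.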
Review unresolved events in Sentinel, Splunk, or Elastic for the prior seven days. Focus on authentication failures against the management plane, unexpected config changes, and IPS signature hits; these surface incidents that the threshold alerts miss.
Verify each service in PagerDuty or Opsgenie maps to the correct escalation policy and that holiday coverage is filled. Stale schedules — someone who left the team six months ago still on the Tier 2 rota — are how a P1 sits unacknowledged at 2am.
Pull the most-fired alerts from the past month and confirm each has a current runbook in IT Glue or Hudu with the actual diagnostic command, not generic advice. The Tier 1 tech at 3am should not need to invent the fix.
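The runbook check above can be sketched as a frequency count followed by a lookup, assuming alert names and runbook titles can be exported as strings that match each other. That naming convention and the function name are hypothetical.

```python
from collections import Counter


def top_alerts_missing_runbooks(fired_alert_names, runbook_titles, top_n=10):
    """Return (alert, count) pairs among the top_n most-fired alerts with no runbook."""
    counts = Counter(fired_alert_names)
    documented = set(runbook_titles)
    return [
        (name, count)
        for name, count in counts.most_common(top_n)
        if name not in documented
    ]
```

This only confirms a runbook exists; whether it contains the actual diagnostic command still needs a human read.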
Security and Compliance
Use credentialed scanning in Tenable, Qualys, or Rapid7 — unauthenticated scans miss most of the meaningful findings. Confirm scan windows are coordinated so the IPS does not blackhole the scanner mid-run.
Sort findings by CVSS and exploitability (KEV catalog membership trumps raw score). Critical findings on internet-facing assets get an emergency change; internal mediums roll into the next monthly patch ring.
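A minimal sketch of the ordering described above, where KEV membership outranks raw CVSS. The field names `in_kev` and `cvss` are hypothetical and do not correspond to any particular scanner's export schema.

```python
def prioritize(findings):
    """Sort findings so KEV-listed items come first, then by CVSS descending."""
    # `not f["in_kev"]` is False (sorts first) for KEV-listed findings.
    return sorted(findings, key=lambda f: (not f["in_kev"], -f["cvss"]))
```

With this ordering, a KEV-listed 7.5 lands above a 9.8 that has no known exploitation, which matches the "KEV trumps raw score" rule.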
For each finding that warrants an emergency change, open an emergency RFC in ServiceNow or ConnectWise PSA with the CVE, affected hosts, mitigation, and rollback plan. Skip CAB only with documented executive approval; an emergency change is still a tracked change.
Check signature subscription status on FortiGate IPS or Palo Alto Threat Prevention, and policy version in CrowdStrike, SentinelOne, or Defender for Endpoint. Lapsed subscriptions silently stop updating without breaking traffic.
Walk the test → pilot → production rings and confirm last month's KB rollouts completed without app regressions. The point of three rings is catching a bad KB at pilot; if 100% of the fleet is patched on day 1, the rings are not real.
Run a 30-minute tabletop with the on-call rota against a recent realistic scenario — ransomware on a file server, phishing-driven token theft, or a public-facing service compromise. Capture gaps and feed them back into the runbook.
Performance Optimization
Pull NetFlow or sFlow from the core in Auvik, Kentik, or SolarWinds NTA. Backup jobs running during business hours and unsanctioned cloud sync clients are the usual top talkers — both have policy fixes, not bandwidth fixes.
Confirm DSCP markings (EF for voice, AF41 for video) survive end-to-end across LAN and SD-WAN. ISPs often strip or rewrite markings at the handoff unless the SD-WAN edge re-marks them; Teams and Zoom call-quality complaints often trace back to this.
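When verifying markings in a packet capture, some tools show the full ToS/Traffic Class byte rather than the DSCP value. The sketch below converts between the two, using the standard code points EF = 46 and AF41 = 34; the helper names are hypothetical.

```python
DSCP = {"EF": 46, "AF41": 34}  # standard code points for voice and video


def dscp_to_tos(dscp):
    """DSCP occupies the upper six bits of the ToS byte; the low two bits are ECN."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def tos_to_dscp(tos):
    """Recover the DSCP value from a ToS byte, discarding the ECN bits."""
    return (tos & 0xFF) >> 2
```

So a capture showing ToS `0xb8` on voice traffic means EF is intact, while `0x00` on the far side of the WAN suggests the marking was stripped in transit.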
Run iperf3 between sites at off-peak hours. Compare against last month's results and the contracted circuit speed; sustained drops below the floor are an ISP ticket, not a firewall tuning exercise.
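The comparison above could be scripted roughly as follows, assuming a TCP test run with `iperf3 --json` and the output parsed into a dict. The checklist does not define the floor, so `floor_ratio` (a fraction of the contracted speed) is a hypothetical parameter, as are the function names.

```python
def received_mbps(iperf3_result):
    """Receiver-side throughput in Mbit/s from parsed `iperf3 --json` TCP output."""
    return iperf3_result["end"]["sum_received"]["bits_per_second"] / 1e6


def assess(mbps, contracted_mbps, last_month_mbps, floor_ratio=0.8):
    """Compare a result against the contracted speed and last month's figure."""
    floor = contracted_mbps * floor_ratio  # assumed acceptable minimum
    change_pct = (mbps - last_month_mbps) / last_month_mbps * 100
    return {
        "below_floor": mbps < floor,
        "change_vs_last_month_pct": round(change_pct, 1),
    }
```

A single low reading is not a sustained drop; it is worth keeping each month's figures so the trend, not one sample, drives the ISP ticket.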
Pull 95th-percentile utilization for each WAN circuit. Anything sustained above 70% is an upgrade conversation — the budget cycle is long, so flag now rather than the week the link saturates.
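For tools that export raw samples rather than a computed percentile, the 95th-percentile figure can be derived with the nearest-rank method, sketched below. The 70% limit comes from the step above; the function names and the Mbit/s units are assumptions.

```python
def percentile_95(samples):
    """Nearest-rank 95th percentile: the smallest sample with at least 95% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) without float error
    return ordered[rank - 1]


def needs_upgrade_conversation(samples_mbps, capacity_mbps, limit=0.70):
    """True when 95th-percentile utilization exceeds `limit` of circuit capacity."""
    return percentile_95(samples_mbps) / capacity_mbps > limit
```

Using the 95th percentile rather than the peak ignores the top 5% of samples, so short bursts such as a one-off large transfer do not trigger the upgrade flag.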
Backup and Recovery
Confirm Oxidized, RANCID, or the RMM-native backup successfully captured every device's running-config in the past 24 hours. Devices that authenticate with TACACS+ often silently stop backing up when the TACACS+ shared secret rotates.
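A small sketch of the 24-hour freshness check, assuming the backup tool can report a last-successful-backup timestamp per device. The mapping format and function name are hypothetical and not tied to Oxidized, RANCID, or any RMM.

```python
from datetime import datetime, timedelta, timezone


def stale_backups(last_backup_by_device, now=None, max_age=timedelta(hours=24)):
    """Return devices whose last successful backup is missing or older than max_age.

    `last_backup_by_device` maps device name -> timezone-aware datetime, or None
    if the device has never been backed up.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, stamp in last_backup_by_device.items()
        if stamp is None or now - stamp > max_age
    )
```

Feeding this the full inventory, not just the devices the backup tool already knows about, is what catches devices that were never enrolled.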
Pick a switch or firewall and restore last night's backup to a lab device, not the live one. The 3-2-1 rule is meaningless if the restore path has not been exercised; format-versus-archive mismatches always surface here, never in a tabletop.
If the restore drill fails, file a P2 ticket assigned to the backup engineer with the drill notes and the device on which the restore failed. Do not close this monthly checklist with a known-broken backup; the next ransomware day finds whatever was deferred.
Verify the offsite tier in Veeam, Datto, or AWS Backup uses object lock or equivalent immutability. A backup writable from production is not ransomware-resilient regardless of how many copies exist.
Reconcile the DR runbook against any new VLANs, circuits, or vendor changes from this cycle. RPO and RTO commitments only hold if the runbook reflects the current network — annual reviews catch this too late.