Network Monitoring Checklist
Network Inventory and Configuration
Export the live device list from NinjaOne, Datto RMM, ConnectWise Automate, or Auvik — whatever is the system of record. Reconcile against IT Glue or Hudu documentation; rogue or undocumented devices are the most common audit finding at this step.
Compare running firmware on FortiGate, Meraki, Catalyst, Aruba, and SonicWall gear against the vendor's current recommended release. Note any device more than two minor releases behind — these are change-window candidates, not emergency patches unless a CVE applies.
Rotate local admin and enable secrets in CyberArk, Delinea, Passportal, or Hudu Vault. Common gotcha: a service account hardcoded in a backup script that breaks the moment the password rotates — confirm dependent services before the change.
Walk the trunk configuration on each access switch against the documented VLAN plan. PCI scope, guest WiFi isolation, and IoT/printer segmentation are the three places drift quietly accumulates.
Spot-check that 802.1x is enforcing — not in monitor mode — on production switchports. RADIUS misconfigurations frequently cause silent fallback to MAB, which defeats the control.
Monitoring and Alerting
Pull the polling status report from PRTG, Auvik, LogicMonitor, or SolarWinds Orion. Devices that are reachable but not polling are usually credential mismatches or ACLs blocking the NMS source IP — fix at the device, not the monitoring server.
Adjust CPU, memory, and interface utilization thresholds based on the trailing 30-day baseline so the noise floor reflects current load. Static thresholds copied from a vendor template are the leading cause of alert fatigue.
Review unresolved events in Sentinel, Splunk, or Elastic for the prior seven days. Focus on auth failures against management plane, unexpected config changes, and IPS signature hits — these surface incidents the threshold alerts miss.
Verify each service in PagerDuty or Opsgenie maps to the correct escalation policy and that holiday coverage is filled. Stale schedules — someone who left the team six months ago still on the Tier 2 rota — are how a P1 sits unacknowledged at 2am.
Pull the most-fired alerts from the past month and confirm each has a current runbook in IT Glue or Hudu with the actual diagnostic command, not generic advice. The Tier 1 tech at 3am should not need to invent the fix.
Security and Compliance
Use credentialed scanning in Tenable, Qualys, or Rapid7 — unauthenticated scans miss most of the meaningful findings. Confirm scan windows are coordinated so the IPS does not blackhole the scanner mid-run.
Sort findings by CVSS and exploitability (KEV catalog membership trumps raw score). Critical findings on internet-facing assets get an emergency change; internal mediums roll into the next monthly patch ring.
Open an emergency RFC in ServiceNow or ConnectWise PSA with the CVE, affected hosts, mitigation, and rollback plan. Skip CAB only with documented executive approval — emergency change is still tracked change.
Check signature subscription status on FortiGate IPS or Palo Alto Threat Prevention, and policy version in CrowdStrike, SentinelOne, or Defender for Endpoint. Lapsed subscriptions silently stop updating without breaking traffic.
Walk the test → pilot → production rings and confirm last month's KB rollouts completed without app regressions. The point of three rings is catching a bad KB at pilot — if 100% of fleet is patched on day 1, the rings are not real.
Run a 30-minute tabletop with the on-call rota against a recent realistic scenario — ransomware on a file server, phishing-driven token theft, or a public-facing service compromise. Capture gaps and feed them back into the runbook.
Performance Optimization
Pull NetFlow or sFlow from the core in Auvik, Kentik, or SolarWinds NTA. Backup jobs running during business hours and unsanctioned cloud sync clients are the usual top talkers — both have policy fixes, not bandwidth fixes.
Confirm DSCP markings (EF for voice, AF41 for video) survive end-to-end across LAN and SD-WAN. ISPs strip markings at the handoff unless the SD-WAN edge re-marks — Teams and Zoom call quality complaints often trace back to this.
Run iperf3 between sites at off-peak hours. Compare against last month's results and the contracted circuit speed; sustained drops below the floor are an ISP ticket, not a firewall tuning exercise.
Pull 95th-percentile utilization for each WAN circuit. Anything sustained above 70% is an upgrade conversation — the budget cycle is long, so flag now rather than the week the link saturates.
Backup and Recovery
Confirm Oxidized, RANCID, or the RMM-native backup successfully captured every device's running-config in the past 24 hours. Devices that authenticate with TACACS often silently stop backing up when the TACACS shared secret rotates.
Pick a switch or firewall and restore last night's backup to a lab device, not the live one. The 3-2-1 rule is meaningless if the restore path has not been exercised — the format-vs-archive mismatches always surface here, never in tabletop.
File a P2 ticket assigned to the backup engineer with the drill notes and the device on which the restore failed. Do not close this monthly checklist with a known-broken backup — the next ransomware day finds whatever was deferred.
Verify the offsite tier in Veeam, Datto, or AWS Backup uses object lock or equivalent immutability. A backup writable from production is not ransomware-resilient regardless of how many copies exist.
Reconcile the DR runbook against any new VLANs, circuits, or vendor changes from this cycle. RPO and RTO commitments only hold if the runbook reflects the current network — annual reviews catch this too late.
Use this template in Manifestly
- Cloud Migration Checklist
- Cloud Security Checklist
- User Access Review Checklist
- Data Recovery Checklist
- Containerization Rollout Checklist
- Database Backup Checklist
- Password Management Checklist
- Backup and Restore Checklist
- Network Upgrade Checklist
- Server Backup Checklist
- Business Continuity Plan Checklist
- Problem Management Checklist
- Server Decommissioning Checklist
- Cloud Monitoring Checklist
- Hardware Inventory Checklist
- IT Regulatory Compliance Review
- Release Management Checklist
- Server Maintenance Checklist
- Rollback Plan Checklist
- Customer Support Ticket Workflow
- Software Upgrade Checklist
- Quarterly Compliance Reporting Checklist
- Patch Management Checklist
- Hardware Maintenance Checklist
- Server Security Checklist
- IT Emergency Response Checklist
- Incident Management Checklist
- Disaster Recovery Plan Checklist
- User Role Management Checklist
- Software Installation Checklist
- Compliance Audit Checklist
- Access Control Checklist
- Cloud Cost Management Checklist
- IT Staff Performance Review
- Change Management Checklist
- Firewall Configuration Checklist
- Security Audit Checklist
- Quarterly Network Security Review
- Database Migration Checklist
- Employee Onboarding Checklist
- Capacity Planning Checklist
- IT Budgeting Checklist
- Cloud Deployment Checklist
- Database Installation Checklist
- IT Service Request Checklist
- Database Security Checklist
- System Monitoring Checklist
- Hardware Troubleshooting Checklist
- IT Strategy Checklist
- Patch Deployment Checklist
- Hardware Upgrade Checklist
- Performance Tuning Checklist
- Application Performance Monitoring Checklist
- Employee Training Checklist
- User Onboarding Checklist
- IT Vendor Management Checklist
- Server Build and Hardening Checklist
- IT Policy Review Checklist
- Help Desk Ticket Handling Checklist
- Infrastructure as Code Checklist
- Hardware Disposal Checklist
- IT Resource Allocation Checklist
- Incident Response Checklist
- Network Troubleshooting Checklist
- User Offboarding Checklist
Ready to take control of your recurring tasks?
Start Free 14-Day TrialUse Slack? Sign up with one click
