Network Maintenance Checklist

Pre-Maintenance Preparation

    Pull the open RFCs in ConnectWise / ServiceNow / Jira Service Management and confirm which changes are bundled into this maintenance window. Blast radius matters more than ticket size — even one-line firewall ACL changes get CAB review.

    Post the window in #it-announcements and notify any client tenants on the affected SD-WAN circuits. Avoid scheduling the window during month-end and end-of-quarter freezes.

    Export running-config from firewalls (Palo Alto, Fortinet, pfSense), core switches, and edge routers to the config archive. This is the rollback point if a patch wedges a device.

Patching and Firmware

    Pull this month's vulnerability queue from Tenable / Qualys / Rapid7 and cross-reference it with the CISA KEV list and EPSS scores. Don't just sort by CVSS: actively exploited mid-severity CVEs jump the line over high-severity ones with no exploit in the wild.
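
    The ordering described above can be sketched as a sort key. This is a minimal illustration, not a scanner integration: the Finding class, its field names, and the placeholder CVE identifiers are all hypothetical, and the tie-break order (KEV membership, then EPSS, then CVSS) is one reasonable reading of the rule rather than a fixed standard.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one scanner finding (field names are assumptions)."""
    cve_id: str
    cvss: float   # base score, 0.0 to 10.0
    epss: float   # estimated exploit probability, 0.0 to 1.0
    in_kev: bool  # True if the CVE appears on the CISA KEV list


def patch_priority(finding: Finding) -> Tuple[bool, float, float]:
    # Tuples sort element by element, and False sorts before True,
    # so KEV entries come first, then higher EPSS, then higher CVSS.
    return (not finding.in_kev, -finding.epss, -finding.cvss)


def prioritize(findings: List[Finding]) -> List[Finding]:
    return sorted(findings, key=patch_priority)


# Placeholder identifiers, not real CVEs.
queue = [
    Finding("CVE-EXAMPLE-A", cvss=9.8, epss=0.02, in_kev=False),
    Finding("CVE-EXAMPLE-B", cvss=6.5, epss=0.91, in_kev=True),
    Finding("CVE-EXAMPLE-C", cvss=7.5, epss=0.40, in_kev=False),
]
ordered = [f.cve_id for f in prioritize(queue)]
```

    With this key, the mid-severity KEV entry (CVE-EXAMPLE-B) sorts ahead of the 9.8-rated finding that has no known exploitation.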

    Update firewall, switch, AP, and SD-WAN firmware in the maintenance window. Apply to a non-prod stack first when available; check vendor advisories for HA-pair upgrade order (Palo Alto active/passive sequence is a common gotcha).

    Trigger the patch ring in NinjaOne / Action1 / Automox / Intune. Pilot ring first (5-10% of fleet), then broad rollout 24-48 hours later if no regressions.

    Pull compliance reports from the RMM and EDR consoles. Anything below 95% deployment after 48 hours triggers manual remediation.
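
    The 95% / 48-hour rule can be expressed as a small check. This is a sketch only: the function names and parameters are hypothetical, and the patched and total counts are assumed to come from whatever export the RMM console provides.

```python
def deployment_rate(patched: int, total: int) -> float:
    """Fraction of in-scope devices that report the patch as installed."""
    if total <= 0:
        raise ValueError("total must be positive")
    return patched / total


def needs_manual_remediation(
    patched: int,
    total: int,
    hours_since_rollout: float,
    threshold: float = 0.95,   # 95% deployment target from the checklist
    grace_hours: float = 48.0,  # no escalation before the 48-hour mark
) -> bool:
    # Escalate only once the grace period has passed AND the rate is still short.
    past_grace = hours_since_rollout >= grace_hours
    return past_grace and deployment_rate(patched, total) < threshold
```

    For example, needs_manual_remediation(930, 1000, 50) is True (93% after 50 hours), while the same counts at 24 hours return False because the grace period has not elapsed.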

    Walk the laggard list: stuck reboots, offline laptops, devices whose users are on PTO. Use Intune / Jamf remote commands or have the helpdesk reach out directly.

Security and Vulnerability Review

    Run an authenticated scan via Tenable / Qualys against servers, network gear, and a sample of endpoints. Unauthenticated scans miss most of what matters.

    In CrowdStrike / SentinelOne / Defender, clear the open detections queue, escalate anything tagged for analyst review, and confirm no machines have been in isolation longer than the SLA.

    Audit the firewall rulebase: identify any/any rules, rules with zero hits in 90 days, and rules whose ticket reference no longer maps to an active project. Stale firewall rules accumulate fast in MSP environments.
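
    The three stale-rule checks can be sketched against an exported rule list. Everything here is an assumption for illustration: the Rule fields, the flag names, and the idea that active projects are available as a set of ticket IDs. Real exports from Palo Alto, Fortinet, or pfSense use different field names and would need mapping first.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Rule:
    """Hypothetical flattened firewall rule (field names are assumptions)."""
    name: str
    source: str
    destination: str
    hits_90d: int           # hit count over the last 90 days
    ticket: Optional[str]   # change ticket referenced in the rule comment


def audit_flags(rule: Rule, active_tickets: Set[str]) -> List[str]:
    flags = []
    if rule.source == "any" and rule.destination == "any":
        flags.append("any/any")
    if rule.hits_90d == 0:
        flags.append("zero-hits-90d")
    # A missing ticket is treated the same as one that maps to nothing active.
    if rule.ticket is None or rule.ticket not in active_tickets:
        flags.append("stale-ticket")
    return flags


rules = [
    Rule("allow-all-temp", "any", "any", hits_90d=120, ticket="CHG-100"),
    Rule("old-vendor-vpn", "10.1.0.0/24", "10.9.0.5", hits_90d=0, ticket="CHG-042"),
    Rule("web-to-db", "10.2.0.0/24", "10.3.0.10", hits_90d=5400, ticket="CHG-100"),
]
report = {r.name: audit_flags(r, active_tickets={"CHG-100"}) for r in rules}
```

    Rules that come back with an empty flag list need no action; anything else goes into the findings aggregate for review.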

    Cross-check IdP (Okta / Entra ID) admin role assignments against the active employee roster. Confirm break-glass accounts still have hardware keys assigned and rotated credentials in the vault.
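
    The roster cross-check is essentially a set difference. The sketch below assumes admin assignments and the employee roster have already been exported to plain Python structures keyed by email address; the function name and input shapes are hypothetical, and no Okta or Entra ID API is called.

```python
from typing import Dict, Set


def orphaned_admins(
    admin_assignments: Dict[str, Set[str]],  # account -> admin roles held
    active_roster: Set[str],                 # accounts of current employees
    break_glass: Set[str],                   # emergency accounts, reviewed separately
) -> Dict[str, Set[str]]:
    """Return admin accounts that belong to no active employee."""
    # Normalize case so that Alice@ and alice@ compare equal.
    known = {a.lower() for a in active_roster} | {b.lower() for b in break_glass}
    return {
        account: roles
        for account, roles in admin_assignments.items()
        if account.lower() not in known
    }


assignments = {
    "alice@example.com": {"Super Admin"},
    "bob@example.com": {"Help Desk Admin"},
    "breakglass-1@example.com": {"Super Admin"},
}
orphans = orphaned_admins(
    assignments,
    active_roster={"Alice@example.com"},
    break_glass={"breakglass-1@example.com"},
)
```

    Break-glass accounts are excluded here only so they do not show up as false positives; their hardware keys and vault credentials still need the separate check described above.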

    Aggregate findings from scan, EDR, and firewall audit. A critical finding is anything on the KEV list, a confirmed compromise indicator, or a control gap that breaks SOC 2 / PCI evidence.

    If the review turns up a critical finding, page the on-call security lead via PagerDuty / Opsgenie. Spin up the incident channel, assign IC and scribe, and follow the IR runbook.

Performance and Capacity

    Compare this month's Auvik / Datadog / Grafana baselines against the previous 90-day rolling average. Flag any circuit running above 70% sustained utilization for upgrade planning.
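
    "Sustained" is not defined above, so the sketch below picks one interpretation and labels it as an assumption: a circuit counts as sustained above 70% when utilization samples exceed the threshold for a minimum number of consecutive polling intervals. The function names and the default of 12 consecutive samples are hypothetical and should be tuned to the polling interval of the monitoring tool.

```python
from typing import Iterable


def longest_run_above(samples: Iterable[float], threshold: float) -> int:
    """Length of the longest streak of consecutive samples above threshold."""
    longest = 0
    current = 0
    for value in samples:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def is_sustained(
    samples: Iterable[float],
    threshold: float = 0.70,    # 70% utilization from the checklist
    min_consecutive: int = 12,  # assumption: e.g. one hour of 5-minute polls
) -> bool:
    return longest_run_above(samples, threshold) >= min_consecutive


busy_circuit = [0.50] * 5 + [0.80] * 12 + [0.40] * 3
bursty_circuit = [0.80, 0.50] * 20
```

    Here is_sustained(busy_circuit) is True, while is_sustained(bursty_circuit) is False because the spikes never last more than one interval. A percentile-based definition would be an equally valid choice.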

    Check IPsec / WireGuard tunnel uptime, SD-WAN path quality (jitter, packet loss, MTU mismatches), and any flapping interfaces. Document persistent flaps for circuit provider tickets.

    DHCP scope exhaustion is a recurring source of helpdesk tickets that look like 'wifi broken.' Confirm scopes are sized with at least 20% headroom and DNS resolvers are responding in under 50 ms.
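
    The 20% headroom check is simple arithmetic: headroom = (total addresses - active leases) / total addresses. A minimal sketch follows, assuming scope sizes and lease counts have been exported as (total, leased) pairs; the scope names and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def scope_headroom(total_addresses: int, active_leases: int) -> float:
    """Fraction of the scope that is still free."""
    if total_addresses <= 0:
        raise ValueError("total_addresses must be positive")
    return (total_addresses - active_leases) / total_addresses


def undersized_scopes(
    scopes: Dict[str, Tuple[int, int]],  # name -> (total addresses, active leases)
    min_headroom: float = 0.20,          # 20% target from the checklist
) -> List[str]:
    return sorted(
        name
        for name, (total, leased) in scopes.items()
        if scope_headroom(total, leased) < min_headroom
    )


scopes = {
    "office-wifi": (250, 230),  # 8% free
    "voice-vlan": (200, 150),   # 25% free
}
flagged = undersized_scopes(scopes)
```

    In this example only office-wifi is flagged, since 20 free addresses out of 250 is 8% headroom.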

    Note circuits, scopes, or appliances projected to need upgrade within the next two quarters. Feed into the budget planning cycle.

Backup and Recovery Verification

    Pull the Veeam / Datto / Rubrik / AWS Backup job report. A backup that exists is not necessarily a backup that works: confirm 3-2-1 (three copies, two media types, one offsite) is intact and immutable copies haven't been tampered with.

    Pick a random server or shared-drive folder and actually restore it to a sandbox. A regular restore-test cadence is what turns a backup into a recovery capability. Document RPO and RTO actuals against targets.
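
    Recording RPO and RTO actuals against targets can be sketched as below. The definitions used are assumptions for a restore drill: RPO actual is taken as the age of the restore point when the test starts, and RTO actual as the elapsed time of the restore itself. The RestoreTest class, its fields, and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class RestoreTest:
    """Hypothetical record of one restore drill (field names are assumptions)."""
    restore_point: datetime  # timestamp of the backup that was restored
    test_started: datetime   # when the restore was kicked off
    test_finished: datetime  # when the restored data was verified usable


def evaluate(test: RestoreTest, rpo_target: timedelta, rto_target: timedelta) -> Dict:
    rpo_actual = test.test_started - test.restore_point  # data that would be lost
    rto_actual = test.test_finished - test.test_started  # time to recover
    return {
        "rpo_actual": rpo_actual,
        "rto_actual": rto_actual,
        "rpo_met": rpo_actual <= rpo_target,
        "rto_met": rto_actual <= rto_target,
    }


drill = RestoreTest(
    restore_point=datetime(2024, 1, 10, 2, 0),
    test_started=datetime(2024, 1, 10, 14, 0),
    test_finished=datetime(2024, 1, 10, 16, 30),
)
result = evaluate(drill, rpo_target=timedelta(hours=24), rto_target=timedelta(hours=2))
```

    In this sample drill the 12-hour-old restore point meets a 24-hour RPO target, but the 2.5-hour restore misses a 2-hour RTO target, which is worth recording as a finding even though the restore succeeded.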

    A failed restore is itself a finding. Page the backup admin, document the failure mode, and schedule a follow-up restore within 5 business days.

Documentation and Sign-Off

    Update IT Glue to reflect any firmware versions, IP/VLAN changes, firewall rule edits, and new device additions. Out-of-date IT Glue records show up as audit findings during SOC 2 evidence collection.

    The IT lead reviews the run summary, confirms no outstanding criticals, and signs off. This is the artifact that satisfies the change-management evidence requirement for the period.