Network Troubleshooting Checklist

Ticket Intake and Scope

    Record what the user sees — "no internet," "slow Teams calls," "shared drive unreachable" — plus when it started and what changed recently (patch window, ISP work, office move). Vague tickets like "network is down" almost always narrow to one app, one subnet, or one VLAN once you ask.

    Scope drives the next move. One affected user usually points at the endpoint; one VLAN or floor points at a switch or AP; a site-wide outage points at the firewall, ISP, or DNS. Confirm by asking a second user on the same subnet, or check the PRTG, Auvik, or Meraki dashboard for affected devices.

    Site-wide or multi-site impact escalates to P1 — page the on-call engineer via PagerDuty / Opsgenie, post in the NOC channel, and start the incident timeline. Single-user issues stay at standard helpdesk priority.
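    The scope-to-suspect mapping and escalation rule above can be sketched as a small lookup. This is an illustrative Python sketch, not a feature of PRTG, Auvik, or PagerDuty. The `Scope` names are hypothetical. The checklist doesn't spell out two choices here, so both are assumptions: the multi-site suspect list, and standard priority for single-VLAN impact.

```python
from enum import Enum


class Scope(Enum):
    SINGLE_USER = "single user"
    ONE_SEGMENT = "one VLAN, floor, or AP"
    SITE_WIDE = "site-wide"
    MULTI_SITE = "multi-site"


# Likely fault domains, taken from the checklist. Reusing the site-wide
# list for multi-site impact is an assumption.
SUSPECTS = {
    Scope.SINGLE_USER: ["endpoint (NIC, cable, local config)"],
    Scope.ONE_SEGMENT: ["access switch", "access point"],
    Scope.SITE_WIDE: ["firewall", "ISP circuit", "DNS"],
    Scope.MULTI_SITE: ["firewall", "ISP circuit", "DNS"],
}


def triage(scope: Scope) -> tuple[str, list[str]]:
    """Return (priority, suspects) once the impact scope is confirmed."""
    # Site-wide or multi-site impact is P1; treating single-VLAN impact as
    # standard priority is an assumption the checklist leaves open.
    priority = "P1" if scope in (Scope.SITE_WIDE, Scope.MULTI_SITE) else "standard"
    return priority, list(SUSPECTS[scope])
```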

Physical and Endpoint Checks

    Confirm the patch cable is seated at both ends, the switchport link LED is lit (green on most platforms; amber often signals a fault, err-disable, or STP blocking), and PoE devices are drawing power. A surprising fraction of "network down" tickets turn out to be a kicked cable or an exhausted PoE power budget on an aging switch.

    Run ipconfig /all (Windows), ifconfig (macOS), or ip addr (Linux). An APIPA address (169.254.x.x) means DHCP failed; skip ahead to the DNS and DHCP checks. A valid lease with the wrong DNS servers points at the scope options or a static override on the NIC.
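    A minimal sketch of that first read, using Python's standard `ipaddress` module. The function names, and the idea of comparing the client's DNS servers against an expected set, are illustrative assumptions. Feed it the IPv4 address and DNS servers the client reports.

```python
import ipaddress


def classify_ipv4(addr: str) -> str:
    """Rough first-pass reading of the IPv4 address a NIC reports."""
    ip = ipaddress.IPv4Address(addr)
    # 169.254.0.0/16 is the link-local (APIPA) range: the OS picked it
    # itself because no DHCP server answered. Check this before is_private,
    # because Python also counts link-local addresses as private.
    if ip.is_link_local:
        return "apipa: DHCP failed"
    if ip.is_private:
        return "private lease: check gateway and DNS servers next"
    return "non-private address: confirm it is expected (static override?)"


def unexpected_dns(configured: list[str], expected: set[str]) -> list[str]:
    """DNS servers on the NIC that the DHCP scope is not supposed to hand out."""
    return [server for server in configured if server not in expected]
```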

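    Ping the default gateway first to separate LAN from WAN problems. If the gateway responds but a public address such as 8.8.8.8 doesn't, the problem is at or beyond the firewall's WAN side (ISP, WAN circuit). If the gateway itself doesn't respond, suspect a LAN-side switch, VLAN, or cable issue. First confirm that the gateway normally answers ping, because some firewalls drop ICMP on their LAN interface.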
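    The gateway-then-public-host logic can be scripted roughly as below. This sketch rests on a few assumptions. It shells out to the system `ping` with one echo (`-n 1` on Windows, `-c 1` elsewhere). The gateway address in the usage line is hypothetical. It also treats a successful public ping as proof that the path works, even if the gateway ignores ICMP.

```python
import platform
import subprocess


def ping_once(host: str) -> bool:
    """One ICMP echo via the system ping command; True if it exited cleanly.

    Windows ping can exit 0 when a router replies "destination host
    unreachable", so treat this as a first pass, not proof.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            capture_output=True,
            timeout=10,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def localize(gateway_ok: bool, internet_ok: bool) -> str:
    """Map the two ping results to the likely fault domain."""
    if internet_ok:
        # A public host answers, so the IP path works even if the gateway
        # itself ignores ICMP.
        return "IP path OK: move on to DNS and the application"
    if gateway_ok:
        return "WAN side: firewall uplink, ISP, or circuit"
    return "LAN side: switch, VLAN, or cable"
```

    Usage, with a hypothetical gateway address: `localize(ping_once("192.168.1.1"), ping_once("8.8.8.8"))`.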

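    Run tracert (Windows) or traceroute (macOS/Linux) to the destination the user can't reach. Your suspect is the hop where latency jumps and stays high, or the hop after which replies stop entirely. A single hop showing * or a slow reply while later hops answer normally is usually a router rate-limiting ICMP, not a fault. Asymmetric routing or an MPLS handoff is a common gotcha at the ISP boundary.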
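    Reading a trace for the suspect hop can be sketched as below. The sketch assumes you have already pulled one round-trip time per hop out of the tracert/traceroute output (None for a `*`). The 50 ms spike threshold is an arbitrary assumption. The key idea is that a spike only counts if every later hop stays elevated.

```python
from typing import Optional


def suspect_hop(rtts: list[Optional[float]], spike_ms: float = 50.0) -> Optional[tuple[int, str]]:
    """Return (hop number, reason) for the first suspicious hop, or None.

    rtts[i] is the round-trip time in ms for hop i + 1; None means that hop
    printed * (no reply).
    """
    answered = [(i + 1, rtt) for i, rtt in enumerate(rtts) if rtt is not None]
    if not answered:
        return (1, "no hop answered")
    # Replies stop: every hop after the last responder timed out. (A
    # destination that filters ICMP looks the same, so confirm separately.)
    last_hop = answered[-1][0]
    if last_hop < len(rtts):
        return (last_hop + 1, "replies stop")
    # Latency spike that persists: a slow hop followed by fast hops is
    # usually a router deprioritizing ICMP to itself, not a real problem.
    for k in range(1, len(answered)):
        hop, rtt = answered[k]
        baseline = answered[k - 1][1]
        if rtt - baseline >= spike_ms and all(
            later - baseline >= spike_ms for _, later in answered[k + 1:]
        ):
            return (hop, "latency spike")
    return None
```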

Network Device Diagnosis

    Log into the upstream switch (SSH or console, or the Meraki dashboard for cloud-managed gear) and pull its logs: show logging on Cisco IOS, show log on Aruba, or the diagnose commands on FortiGate. Look for err-disable, STP topology change, port flap, or duplex mismatch entries within the incident window.
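    A rough first pass over a saved `show logging` dump might look like the sketch below. The patterns are assumptions based on common Cisco IOS mnemonics (`%PM-4-ERR_DISABLE`, `%LINK-3-UPDOWN`, `%SPANTREE-*`, `%CDP-4-DUPLEX_MISMATCH`), and other platforms log these events differently. The flap threshold of four state changes is arbitrary.

```python
import re
from collections import Counter

# Assumption: Cisco IOS syslog mnemonics; other vendors need their own patterns.
MNEMONICS = {
    "err-disable": re.compile(r"%PM-4-ERR_DISABLE"),
    "stp event": re.compile(r"%SPANTREE-"),
    "duplex mismatch": re.compile(r"%CDP-4-DUPLEX_MISMATCH"),
}
UPDOWN = re.compile(r"%LINK-3-UPDOWN: Interface ([^,\s]+), changed state to (?:up|down)")


def scan_log(lines, flap_threshold=4):
    """Count notable events and list interfaces that changed state often."""
    events = Counter()
    transitions = Counter()
    for line in lines:
        for label, pattern in MNEMONICS.items():
            if pattern.search(line):
                events[label] += 1
        match = UPDOWN.search(line)
        if match:
            transitions[match.group(1)] += 1
    flapping = sorted(name for name, n in transitions.items() if n >= flap_threshold)
    return events, flapping
```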

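    On Cisco, run show interfaces status and show interfaces counters errors; use the equivalent on your platform. Counters accumulate until cleared, so compare two snapshots a few minutes apart. Rising CRC (FCS) errors point at a bad cable, a bad NIC, or a duplex mismatch. Rising output drops point at microbursts or a saturated uplink. An err-disabled port usually means a port-security or BPDU-guard violation, and show interfaces status err-disabled lists the reason.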
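    Because these counters accumulate since the last `clear counters`, a large CRC total can be old news. The sketch below compares two snapshots taken a few minutes apart. The `PortSnapshot` fields are hypothetical names for values you would parse from the command output yourself. Congestion is keyed on output discards because egress queue overflow usually shows up there.

```python
from dataclasses import dataclass


@dataclass
class PortSnapshot:
    """Hypothetical parsed counters for one port at one point in time."""
    fcs_errors: int     # CRC/FCS errors received
    out_discards: int   # frames dropped on egress
    err_disabled: bool


def diagnose_port(before: PortSnapshot, after: PortSnapshot) -> list[str]:
    """Compare two snapshots; counters are cumulative, so only growth matters."""
    findings = []
    if after.err_disabled:
        findings.append("err-disabled: check the reason (port security, BPDU guard, link flap)")
    if after.fcs_errors > before.fcs_errors:
        findings.append("CRC errors rising: suspect cable, NIC, or duplex mismatch")
    if after.out_discards > before.out_discards:
        findings.append("output discards rising: suspect microbursts or a saturated uplink")
    return findings
```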

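    Confirm the access port is in the right VLAN and that the upstream trunk allows and forwards it (show interfaces trunk). Also confirm both ends of the trunk agree on the native VLAN. A mismatch leaks untagged traffic between VLANs. Cisco logs it as %CDP-4-NATIVE_VLAN_MISMATCH, and PVST+ blocks the affected VLANs as inconsistent.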
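    A sketch of the VLAN-path check, assuming you've copied each trunk end's allowed-VLAN list and native VLAN out of `show interfaces trunk` by hand. The helper names and the `TrunkEnd` structure are hypothetical. The parser handles only plain `1,10,20-22` style lists, not keywords like `none`.

```python
from dataclasses import dataclass


def parse_vlan_list(text: str) -> set[int]:
    """Expand a VLAN list such as '1,10,20-22' into a set of VLAN IDs."""
    vlans: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            vlans.update(range(int(low), int(high) + 1))
        elif part:
            vlans.add(int(part))
    return vlans


@dataclass
class TrunkEnd:
    allowed: set[int]
    native: int


def check_vlan_path(access_vlan: int, near: TrunkEnd, far: TrunkEnd) -> list[str]:
    """List reasons the access VLAN would not cross this trunk cleanly."""
    problems = []
    for name, end in (("near", near), ("far", far)):
        if access_vlan not in end.allowed:
            problems.append(f"VLAN {access_vlan} not allowed on {name} end of trunk")
    if near.native != far.native:
        problems.append(f"native VLAN mismatch: {near.native} vs {far.native}")
    return problems
```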

    Check show ip route for the destination prefix, and confirm the active first-hop gateway (HSRP/VRRP) is the peer you expect (show standby brief or show vrrp brief on Cisco). A failover that never failed back is a common cause of intermittent connectivity after a maintenance window. HSRP does not preempt unless it is configured to.
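    `show ip route <destination>` reports the most specific prefix covering the destination, which is why a stray /24 can override a summary or default route. Below is a minimal longest-prefix-match sketch using Python's `ipaddress` module and a made-up route table. It ignores administrative distance and metrics, which real routers also weigh when the same prefix is learned more than one way.

```python
import ipaddress
from typing import Optional


def best_route(dest: str, routes: dict[str, str]) -> Optional[tuple[str, str]]:
    """Longest-prefix match: the most specific prefix containing dest wins."""
    ip = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes.items()
        if ip in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    network, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), next_hop
```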

DNS and DHCP

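    Run nslookup or dig against the internal DNS server (a DC running Windows Server DNS, or BIND) and against an external resolver (1.1.1.1, 8.8.8.8, Quad9). Query both an internal name and a public name. If the internal server resolves internal names but not public ones while the external resolver answers normally, check the internal server's forwarders or root hints. If the external resolver answers but the internal server doesn't respond at all, the DC's DNS service is the suspect.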
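    The comparison above reduces to three lookups and a small decision table. In this sketch you run the lookups by hand (`nslookup <name> <server>` or `dig @<server> <name>`) and pass the results in as booleans. The hostnames and addresses in the docstring are hypothetical. The verdicts for combinations the checklist doesn't mention are assumptions.

```python
def dns_verdict(internal_names_ok: bool, internal_public_ok: bool, external_public_ok: bool) -> str:
    """Interpret three lookups.

    internal_names_ok   - internal server resolves an internal name
                          (e.g. nslookup fileserver.corp.example 10.0.0.10)
    internal_public_ok  - internal server resolves a public name
    external_public_ok  - external resolver resolves the same public name
                          (e.g. dig @1.1.1.1 example.com)
    """
    if internal_names_ok and internal_public_ok:
        return "DNS looks healthy: look at the application or the network path"
    if internal_names_ok and external_public_ok:
        return "internal zone OK, recursion broken: check forwarders / root hints"
    if not internal_names_ok and not internal_public_ok and external_public_ok:
        return "internal server not answering: check the DC's DNS service"
    if not external_public_ok:
        return "external lookups fail: suspect WAN or outbound DNS blocked for clients"
    return "mixed result: check zone data and server logs"
```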

    Open the DHCP console (Windows Server DHCP, ISC Kea, Meraki, FortiGate) and check the scope. A scope at 100% utilization gives new clients APIPA addresses, which looks identical to a "network down" report. As a temporary fix, expand the scope or shorten the lease; a shorter lease frees addresses only as existing leases expire. Investigate the device-count spike afterward.
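    The scope arithmetic is worth doing before changing anything. The sketch uses hypothetical addresses. The stale-lease estimate is a deliberately crude worst-case model, not something the DHCP console reports: it assumes each transient device holds its address until the lease expires and never returns.

```python
import ipaddress


def scope_size(start: str, end: str, excluded: int = 0) -> int:
    """Number of leasable addresses in an inclusive start-end range."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    return last - first + 1 - excluded


def utilization_pct(active_leases: int, size: int) -> float:
    """Share of the scope currently leased, as a percentage."""
    return 100.0 * active_leases / size


def stale_lease_ceiling(transient_per_day: int, lease_days: float) -> float:
    """Worst case for addresses held by devices that left and never return:
    each keeps its lease until expiry, so roughly arrivals x lease length."""
    return transient_per_day * lease_days
```

    Say 15 transient devices arrive a day and the lease is 8 days (the Windows Server DHCP default). Up to 120 addresses can then be held by devices that have already left. A 1-day lease cuts that ceiling to 15.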

    Check Event Viewer (DHCP-Server, DNS-Server channels) or the equivalent on your platform for repeated NACKs, scope-exhausted entries, or zone-transfer failures. Cross-reference timestamps with the user's report.
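    Cross-referencing timestamps is easy to get wrong when the server logs in UTC and the ticket is in local time. Below is a minimal sketch that filters exported events to a window around the report. It assumes both sides have already been converted to the same time zone, and the ±30-minute default is arbitrary.

```python
from datetime import datetime, timedelta


def events_near(events, reported_at: datetime, window: timedelta = timedelta(minutes=30)):
    """Log entries within +/- window of the user's report, oldest first.

    events is an iterable of (timestamp, message) pairs. Both sides must use
    the same time zone; server logs are often UTC while tickets are local.
    """
    return sorted(
        (ts, msg) for ts, msg in events if abs(ts - reported_at) <= window
    )
```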

Wireless Diagnosis

    Have the user plug into a wired port (or test with a known-good wired endpoint nearby). If wired works and wireless doesn't, the problem is the AP, SSID, or RF environment — not the upstream network.

    In the Meraki / UniFi / Aruba / Mist dashboard, confirm the nearest AP is online, on the right firmware, and not stuck with 60+ clients on a single radio. A single overloaded AP is the most common "wifi is slow" cause in conference rooms.
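    If you export per-radio status from the dashboard (as CSV or via its API), the checks above can be sketched as below. The row keys (`ap`, `online`, `firmware`, `band`, `clients`) and the firmware strings are hypothetical; map them to whatever your controller actually exports.

```python
def ap_findings(radios: list[dict], expected_firmware: str, max_clients: int = 60) -> list[str]:
    """Flag offline APs, off-baseline firmware, and overloaded radios.

    Each dict is a hypothetical export row for one radio, with keys:
    ap, online, firmware, band, clients.
    """
    findings = []
    seen_aps = set()
    for r in radios:
        ap = r["ap"]
        # AP-level checks run once per AP, even though rows are per radio.
        if ap not in seen_aps:
            if not r["online"]:
                findings.append(f"{ap}: offline")
            elif r["firmware"] != expected_firmware:
                findings.append(f"{ap}: firmware {r['firmware']}, expected {expected_firmware}")
            seen_aps.add(ap)
        if r["online"] and r["clients"] >= max_clients:
            findings.append(f"{ap} {r['band']}: {r['clients']} clients on one radio")
    return findings
```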

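    Use the controller's RF spectrum view or a tool like Ekahau or NetSpot to confirm two things. First, signal strength at the affected location should be -67 dBm or stronger (-60 dBm passes, -72 dBm fails). Second, the channel shouldn't be clobbered by a neighbor or rogue AP. In dense offices, overlapping 2.4 GHz channels are the usual culprit; only channels 1, 6, and 11 avoid overlapping one another.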
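    The two RF checks reduce to simple comparisons. This sketch assumes 20 MHz channels on the 2.4 GHz band (channels 1 to 13). Channel centers are 5 MHz apart, so channels fewer than five numbers apart overlap, which is why 1, 6, and 11 is the standard plan. The function names are illustrative.

```python
def signal_ok(rssi_dbm: float, floor_dbm: float = -67.0) -> bool:
    """dBm values closer to zero are stronger: -60 passes, -72 fails."""
    return rssi_dbm >= floor_dbm


def overlaps_24ghz(ch_a: int, ch_b: int) -> bool:
    """2.4 GHz channel centers sit 5 MHz apart, so 20 MHz channels need a
    gap of five channel numbers (1/6/11). Same channel counts as overlap."""
    return abs(ch_a - ch_b) < 5


def conflicting_neighbors(my_channel: int, neighbor_channels: list[int]) -> list[int]:
    """Neighbor channels that share or overlap this AP's 2.4 GHz channel."""
    return [ch for ch in neighbor_channels if overlaps_24ghz(my_channel, ch)]
```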

    Confirm the SSID is broadcast on the correct AP group. For 802.1X SSIDs, test a RADIUS authentication from the controller against NPS, ClearPass, or Cisco ISE. Two classic post-maintenance failures: a RADIUS shared secret changed on only one side, or a server-certificate renewal that didn't propagate to clients.
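    One way to catch a certificate renewal problem early is to track the expiry of the RADIUS server certificate. This sketch assumes you have the certificate's end date as printed by `openssl x509 -noout -enddate`, for example after exporting the certificate that NPS or ISE presents for PEAP/EAP-TLS. `ssl.cert_time_to_seconds` from Python's standard library parses that date format.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining on a certificate given its notAfter date string.

    Accepts the 'Jan  5 09:34:43 2030 GMT' form printed by
    `openssl x509 -noout -enddate`, with or without the 'notAfter=' prefix.
    Negative results mean the certificate has already expired.
    """
    if not_after.startswith("notAfter="):
        not_after = not_after[len("notAfter="):]
    expires = ssl.cert_time_to_seconds(not_after.strip())
    current = time.time() if now is None else now
    return (expires - current) / 86400
```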

Resolution and Documentation

    Don't close on "should be working now." Have the original reporter reproduce their workflow — the Teams call, the file open, the SaaS login — and confirm it succeeds. Restored ping is not the same as restored business function.

    Write the ticket close-out in IT Glue / Hudu / Confluence with the symptom, scope, root cause, and the exact fix command or config change. Future-you (or the next on-call) will search this in six months when it recurs.

    If the incident was site-wide or multi-site, hold a 30-minute blameless review within 48 hours. Capture preventive actions — monitoring gap, runbook gap, config drift — as tickets, not as wishes in meeting notes.
