Incident Response Checklist
Detection and Triage
Record where the signal originated — SIEM (Sentinel, Splunk, QRadar), EDR (CrowdStrike, SentinelOne, Defender for Endpoint), email security (Proofpoint, Mimecast, Defender for Office 365), end-user ticket, or third-party notification. Include the alert ID and timestamp so analysts can re-pull the raw event later.
Create the incident record in ServiceNow / Jira Service Management / Halo and spin up a dedicated Teams or Slack channel (e.g., #inc-2025-0142). Keep the channel private until classification — premature broadcast tips off the actor if they have presence in chat.
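Where the channel spin-up is scripted, a minimal sketch using slack_sdk is below; the channel name follows the example above, the user ID is a hypothetical placeholder, and is_private enforces the keep-it-private guidance.

```python
# Sketch: create the private incident channel with slack_sdk. The user ID is a
# hypothetical placeholder; is_private=True keeps the channel private until
# classification, per the guidance above.
from slack_sdk import WebClient

client = WebClient(token="xoxb-...")  # bot token with channel-management scopes

resp = client.conversations_create(name="inc-2025-0142", is_private=True)
channel_id = resp["channel"]["id"]
client.conversations_invite(channel=channel_id, users=["U024BE7LH"])  # IR lead only
```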
Apply the documented SEV rubric. SEV1 = active data exfiltration, ransomware encryption in progress, or executive account takeover. SEV2 = confirmed malware on a managed endpoint, unauthorized admin activity. SEV3 = single-user phishing click without credential entry or persistence. SEV4 = policy violation, no compromise.
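The rubric is mechanical enough to encode directly, which keeps triage consistent across analysts. A minimal Python sketch; the flag names are illustrative placeholders, not a standard schema.

```python
# Minimal sketch of the SEV rubric as a triage function. Flag names are
# illustrative placeholders, not a standard schema.
from dataclasses import dataclass

@dataclass
class TriageSignals:
    active_exfiltration: bool = False           # SEV1
    ransomware_encrypting: bool = False         # SEV1
    exec_account_takeover: bool = False         # SEV1
    malware_on_managed_endpoint: bool = False   # SEV2
    unauthorized_admin_activity: bool = False   # SEV2
    phish_click_no_credentials: bool = False    # SEV3

def classify(s: TriageSignals) -> str:
    if s.active_exfiltration or s.ransomware_encrypting or s.exec_account_takeover:
        return "SEV1"
    if s.malware_on_managed_endpoint or s.unauthorized_admin_activity:
        return "SEV2"
    if s.phish_click_no_credentials:
        return "SEV3"
    return "SEV4"  # policy violation, no compromise

print(classify(TriageSignals(ransomware_encrypting=True)))  # -> SEV1
```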
SEV1 path: page IR lead via PagerDuty / Opsgenie, then notify the CISO, CIO, and Legal per the escalation matrix. Do not wait for business hours. For MSPs, also notify the affected client's primary contact per the MSA notification clause.
Scope and Analysis
Pivot from the initial indicator across Entra ID sign-in logs, EDR process telemetry, and firewall flow data to enumerate affected identities, endpoints, and data stores. Common pivot pattern: hash → other endpoints with same hash; sign-in IP → other accounts from same IP; mailbox rule → other mailboxes with similar rule.
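Two of those pivots expressed as KQL, wrapped in Python for reuse. Table and column names assume the standard Sentinel / Defender Advanced Hunting schemas (DeviceProcessEvents, SigninLogs); adapt them to your workspace and run the strings through whatever query client you already use.

```python
# KQL sketches for two of the pivot patterns above. Table/column names assume
# the standard Sentinel / Advanced Hunting schemas; adapt to your workspace.

def hash_to_endpoints(sha256: str) -> str:
    # hash -> other endpoints that ran the same binary
    return f"""
    DeviceProcessEvents
    | where SHA256 == "{sha256}"
    | summarize FirstSeen=min(Timestamp), LastSeen=max(Timestamp) by DeviceName
    """

def ip_to_accounts(ip: str) -> str:
    # sign-in IP -> other accounts authenticating from the same address
    return f"""
    SigninLogs
    | where IPAddress == "{ip}"
    | summarize SignIns=count() by UserPrincipalName, ResultType
    """
```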
Classification drives the playbook. Ransomware demands isolation + backup integrity checks before anything else. BEC / account takeover demands token revocation and mailbox rule audit. Data exfiltration demands legal hold and breach-notification clock evaluation.
Before any remediation that destroys state: capture memory image and disk image of primary affected hosts (KAPE, FTK Imager, or EDR-native acquisition), export the relevant Entra ID and M365 unified audit logs (which roll off after 90/180 days on lower tiers), and snapshot affected VMs. Maintain chain-of-custody documentation if law enforcement involvement is possible.
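For the Entra ID sign-in portion, a hedged sketch of pulling events through Microsoft Graph before retention rolls them off; the unified audit log itself is exported separately (Purview / Search-UnifiedAuditLog). Assumes an access token with AuditLog.Read.All; error handling omitted.

```python
# Sketch: preserve Entra ID sign-in events via Microsoft Graph (v1.0) before
# they roll off. Assumes a token with AuditLog.Read.All; error handling omitted.
import requests

def export_signin_logs(token: str, since_iso: str) -> list[dict]:
    url = ("https://graph.microsoft.com/v1.0/auditLogs/signIns"
           f"?$filter=createdDateTime ge {since_iso}")
    headers = {"Authorization": f"Bearer {token}"}
    events = []
    while url:
        page = requests.get(url, headers=headers, timeout=30).json()
        events.extend(page.get("value", []))
        url = page.get("@odata.nextLink")  # follow pagination to the end
    return events

# export_signin_logs(token, "2025-01-10T00:00:00Z") -> dump to JSON, hash the
# file, and log it in the evidence register.
```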
Maintain a running timeline with UTC timestamps: initial access, persistence, lateral movement, privilege escalation, action on objectives, and defender actions. The timeline feeds the post-incident report and any regulator submission — start it now, not later.
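A minimal append-only timeline helper; the column set is illustrative, and the non-negotiables are UTC timestamps and writing entries as they happen.

```python
# Sketch: append-only incident timeline with UTC timestamps. Columns are
# illustrative; write each entry as it happens, not from memory afterward.
import csv
import datetime

TIMELINE = "inc-2025-0142-timeline.csv"

def log_event(phase: str, description: str, source: str, operator: str) -> None:
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
    with open(TIMELINE, "a", newline="") as f:
        csv.writer(f).writerow([ts, phase, description, source, operator])

log_event("containment", "Isolated LAPTOP-0423 via EDR", "CrowdStrike", "jdoe")
```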
Containment
Use EDR network containment (CrowdStrike Network Contain, Defender Isolation, SentinelOne Disconnect) rather than physically unplugging — isolation preserves the analyst's tunnel to the host while cutting attacker access. Document each host isolated with timestamp and operator initials.
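Where containment is scripted, a sketch against CrowdStrike's device-actions endpoint is below; the URL, parameter, and body reflect commonly documented usage, so treat them as assumptions and verify against your EDR's current API reference.

```python
# Sketch: network-contain a host via the Falcon device-actions endpoint.
# Endpoint, parameter, and body are as commonly documented; verify against the
# current API reference. Record timestamp and operator in the timeline after.
import requests

def contain_host(token: str, device_id: str) -> None:
    resp = requests.post(
        "https://api.crowdstrike.com/devices/entities/devices-actions/v2",
        params={"action_name": "contain"},
        headers={"Authorization": f"Bearer {token}"},
        json={"ids": [device_id]},
        timeout=30,
    )
    resp.raise_for_status()
```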
In Entra ID / Okta: disable the account, then run "Revoke sessions" — disabling alone does not invalidate existing OAuth refresh tokens. Reset the password and re-register MFA. Audit and remove any inbox forwarding rules or OAuth app grants the actor may have added for persistence.
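The disable-then-revoke order as a Microsoft Graph sketch; both calls are standard Graph v1.0, and the permission named is an assumption to confirm against your tenant.

```python
# Sketch of the disable-then-revoke order via Microsoft Graph v1.0. Assumes a
# token with suitable permissions (e.g., User.ReadWrite.All).
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def disable_and_revoke(token: str, user_id: str) -> None:
    headers = {"Authorization": f"Bearer {token}"}
    # 1. Block new sign-ins
    requests.patch(f"{GRAPH}/users/{user_id}", headers=headers,
                   json={"accountEnabled": False}, timeout=30).raise_for_status()
    # 2. Invalidate existing refresh tokens; disabling alone does not do this
    requests.post(f"{GRAPH}/users/{user_id}/revokeSignInSessions",
                  headers=headers, timeout=30).raise_for_status()
```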
Ransomware path only. Confirm the most recent clean restore point on the immutable / air-gapped copy (Veeam hardened repo, Datto SIRIS, S3 Object Lock). Do not touch the primary backup volume until you have validated a known-good copy exists elsewhere — actors routinely target backup infrastructure first.
Push IOC blocks to the firewall (FortiGate, Palo Alto, Meraki), DNS filter (Cisco Umbrella, DNSFilter), and EDR custom IOC list. Block by hash, domain, and IP. Note that IP blocks decay quickly — actors rotate infrastructure — so prioritize hash and domain over IP.
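A sketch of normalizing IOCs before pushing them out, with the decay point baked in as a review date; the windows are policy choices for illustration, not vendor defaults.

```python
# Sketch: normalize IOCs with a review date that reflects decay. IPs rotate
# fastest, so they get the shortest window. Values are policy choices, not
# vendor defaults; tune to your environment.
from dataclasses import dataclass
import datetime

REVIEW_DAYS = {"sha256": 365, "domain": 90, "ip": 14}

@dataclass
class Ioc:
    value: str
    kind: str                      # "sha256" | "domain" | "ip"
    incident: str                  # e.g. "inc-2025-0142"
    review_after: datetime.date    # re-evaluate or expire the block

def make_ioc(value: str, kind: str, incident: str) -> Ioc:
    today = datetime.date.today()
    return Ioc(value, kind, incident,
               today + datetime.timedelta(days=REVIEW_DAYS[kind]))
```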
Notify the cyber insurance carrier. Required if you lack a recoverable backup, see evidence of data exfiltration, or have any SEV1 indicator. Most cyber policies require carrier notification before engaging counsel or IR firms — engaging an unapproved vendor can void coverage. Use the carrier's panel firm unless explicitly authorized otherwise.
Eradication and Recovery
Hunt for and remove scheduled tasks, services, run keys, WMI subscriptions, malicious OAuth grants, and rogue mailbox forwarding rules. Rotate any credentials cached on compromised hosts — including service accounts, krbtgt (twice, with 10+ hours between), and shared admin passwords stored in the password vault.
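One narrow slice of that hunt as a stdlib-only sketch: dumping the classic Run / RunOnce keys on a live Windows host. Scheduled tasks, services, WMI subscriptions, OAuth grants, and mailbox rules each need their own checks.

```python
# Sketch: enumerate classic Run/RunOnce autostart values (Windows, stdlib only).
# Covers only the registry run keys named above; scheduled tasks, services,
# WMI subscriptions, OAuth grants, and mailbox rules need separate checks.
import winreg

RUN_KEYS = [
    (winreg.HKEY_LOCAL_MACHINE, r"Software\Microsoft\Windows\CurrentVersion\Run"),
    (winreg.HKEY_LOCAL_MACHINE, r"Software\Microsoft\Windows\CurrentVersion\RunOnce"),
    (winreg.HKEY_CURRENT_USER,  r"Software\Microsoft\Windows\CurrentVersion\Run"),
]

for hive, path in RUN_KEYS:
    try:
        with winreg.OpenKey(hive, path) as key:
            for i in range(winreg.QueryInfoKey(key)[1]):  # [1] = value count
                name, value, _ = winreg.EnumValue(key, i)
                print(f"{path}\\{name} = {value}")
    except OSError:
        pass  # key absent on this host
```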
Wipe and re-image rather than "clean" compromised endpoints — modern malware writes to firmware, alternate data streams, and unallocated space that cleaners miss. Restore servers from a validated backup taken before the earliest indicator of compromise, not from yesterday's backup, which may already be poisoned.
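The restore-point rule in that last sentence, as a sketch: the newest snapshot that still predates the earliest indicator, never simply the newest snapshot.

```python
# Sketch: pick the newest snapshot that predates the earliest indicator of
# compromise. Returns None when no known-clean backup exists (escalate).
import datetime

def pick_restore_point(snapshots: list[datetime.datetime],
                       earliest_ioc: datetime.datetime) -> datetime.datetime | None:
    clean = [s for s in snapshots if s < earliest_ioc]
    return max(clean) if clean else None

snaps = [datetime.datetime(2025, 3, d, tzinfo=datetime.timezone.utc)
         for d in (10, 12, 14)]
ioc = datetime.datetime(2025, 3, 13, tzinfo=datetime.timezone.utc)
print(pick_restore_point(snaps, ioc))  # 2025-03-12: newest snapshot before the IoC
```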
If a known CVE was exploited, patch it across the fleet — not just the affected host. If a configuration weakness (legacy auth enabled, MFA gap, exposed RDP) enabled the incident, close it everywhere it exists. Tracking the fix back to the root cause prevents the same intrusion next month.
Run a fresh EDR scan and authenticated vulnerability scan (Nessus / Qualys / InsightVM) against rebuilt hosts. Monitor for re-infection indicators for at least 72 hours before declaring the incident closed — attackers often return through a second persistence mechanism the initial sweep missed.
Notification and Reporting
Work with Legal to evaluate notification triggers: HIPAA breach notification (60 days), GDPR Article 33 (72 hours to supervisory authority), state breach notification laws (timelines vary; 30 days is common), SEC cyber disclosure (Item 1.05, 4 business days for public companies), CMMC / DFARS (DoD within 72 hours). Document the determination even when the answer is "not required" — auditors will ask.
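The windows above reduce to simple deadline arithmetic, sketched below; note the SEC window counts business days, which a plain timedelta only approximates, and every deadline still gets confirmed with Legal against current regulation text.

```python
# Sketch: compute notification deadlines from the discovery timestamp, using
# the windows cited above. The SEC window is business days, so the timedelta
# here is an approximation; confirm every deadline with Legal.
import datetime

WINDOWS = {
    "GDPR Art. 33 (supervisory authority)": datetime.timedelta(hours=72),
    "DFARS / CMMC (DoD)": datetime.timedelta(hours=72),
    "SEC Item 1.05 (public companies)": datetime.timedelta(days=4),  # business days!
    "HIPAA breach notification": datetime.timedelta(days=60),
}

def deadlines(discovered_utc: datetime.datetime) -> dict[str, datetime.datetime]:
    return {rule: discovered_utc + delta for rule, delta in WINDOWS.items()}

discovered = datetime.datetime(2025, 3, 14, 9, 30, tzinfo=datetime.timezone.utc)
for rule, due in deadlines(discovered).items():
    print(f"{rule}: {due:%Y-%m-%d %H:%M} UTC")
```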
Send a structured update to leadership, affected business units, and (for MSPs) the client point of contact. Cover: what happened, current status, what was affected, what users need to do (password reset, retraining), and when the next update will land. Hold the technical detail for the post-incident report.
Post-incident report structure: executive summary, timeline (UTC), affected scope, root cause, attacker TTPs mapped to MITRE ATT&CK, containment / eradication actions taken, dwell time, and recommendations. The report feeds SOC 2 / ISO 27001 evidence binders and any regulator submission.
Lessons Learned
Blameless review with the responders, IT leadership, and affected business owners. Focus on what the system allowed, not who made which keystroke. Drive to specific gaps: detection coverage, response time, runbook clarity, tool effectiveness.
Every lesson learned becomes a ticket with a named owner and a target completion date — not a wiki bullet that nobody reads. Common remediations: block legacy auth tenant-wide, add a SIEM detection for the missed TTP, tighten conditional access policy, schedule targeted phishing training for affected users.
Fold the new playbook steps, IOCs, and decision points back into the runbook in IT Glue / Hudu / Confluence. If responders had to figure something out from scratch during the incident, that's a runbook gap — capture it now while the memory is fresh.
Run a tabletop within 90 days using a scenario derived from this incident. The point is to verify that the runbook updates, new detections, and remediated controls would actually catch a repeat. SOC 2 and CMMC assessors look for evidence of tabletop cadence — keep the attendance list and scenario document.
