Incident Response Checklist
Steps the on-call incident commander and IR team run when a security or availability incident is declared, from identification through containment to postmortem. Aligned to NIST SP 800-61 phases.
Detection and Triage
-
Open the incident in PagerDuty
The on-call engineer creates the incident in PagerDuty (or Opsgenie / FireHydrant), assigns an incident commander, and opens a dedicated Slack channel using the #inc-YYYYMMDD-name convention. Do not investigate in the alerting channel — noise drowns the timeline.
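A minimal sketch of deriving the channel name from the #inc-YYYYMMDD-name convention. The function name is hypothetical. It assumes the UTC declaration date is used, so responders in different timezones derive the same name. It also assumes Slack's channel-name rules: lowercase letters, digits, hyphens and underscores, at most 80 characters.

```python
import re
from datetime import datetime, timezone


def incident_channel_name(slug: str, declared_at: datetime) -> str:
    """Build a Slack channel name in the inc-YYYYMMDD-name convention (no leading #)."""
    if declared_at.tzinfo is None:
        raise ValueError("declared_at must be timezone-aware")
    # Use the UTC date so responders in different timezones derive the same name.
    date_part = declared_at.astimezone(timezone.utc).strftime("%Y%m%d")
    # Keep lowercase letters, digits, hyphens, and underscores;
    # collapse every other run of characters into a single hyphen.
    clean = re.sub(r"[^a-z0-9_-]+", "-", slug.lower()).strip("-")
    # Slack limits channel names to 80 characters.
    return f"inc-{date_part}-{clean}"[:80]
```

For example, `incident_channel_name("Checkout API 5xx", ...)` declared at 23:30 UTC on 7 March 2024 yields `inc-20240307-checkout-api-5xx`.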
-
Classify severity (P1-P4)
Use the severity matrix: P1 = customer-impacting outage or confirmed data exposure; P2 = major degradation or suspected breach; P3 = limited impact with workaround; P4 = no customer impact. Severity drives paging, comms cadence, and exec notification — err high; you can downgrade later.
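The matrix can be encoded as an ordered set of checks, evaluated most severe first, so the highest matching severity wins. This is a sketch with hypothetical flag names. The matrix does not say how to rate customer impact that has no workaround. As an assumption, this sketch errs high and rates it P2.

```python
from enum import IntEnum


class Severity(IntEnum):
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


def classify(*, customer_outage=False, data_exposure_confirmed=False,
             major_degradation=False, breach_suspected=False,
             customer_impact=False, workaround_available=False) -> Severity:
    """Return the highest severity whose criteria match (checks run P1 first)."""
    if customer_outage or data_exposure_confirmed:
        return Severity.P1
    if major_degradation or breach_suspected:
        return Severity.P2
    if customer_impact:
        # Assumption: impact without a workaround is not covered by the matrix,
        # so err high and treat it as P2 rather than P3.
        return Severity.P3 if workaround_available else Severity.P2
    return Severity.P4
```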
-
Validate the alert is not a false positive
Cross-check the originating signal against a second source: EDR alert against SIEM logs, monitoring against synthetic checks, user report against access logs. Poorly tuned detection rules and stale dashboards are common false-positive sources.
-
Determine if this is a security incident
Security incidents (unauthorized access, malware, data exposure) trigger additional legal, regulatory, and forensic obligations — chain of custody, breach-notification clocks, possible law-enforcement coordination. Operational incidents (outage, capacity, deploy regression) do not.
Investigation and Scoping
-
Assign incident commander, scribe, and comms lead
The IC drives decisions and is not hands-on-keyboard. The scribe maintains the running timeline in Slack with timestamps. The comms lead handles status-page updates and stakeholder notifications. On a P1, all three roles must be filled by separate people.
-
Preserve volatile evidence before remediation
Capture memory first, because it is the most volatile evidence. Pull memory dumps through the EDR (CrowdStrike RTR, SentinelOne) where supported. Then snapshot the disks of affected EC2 / Azure VMs before terminating them, and export the results of relevant SIEM queries with time ranges locked. Once you reboot or wipe, volatile evidence is gone, and so is the chain of custody for any later legal action.
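One common way to support chain of custody is to hash each evidence file at collection time and append a record to a log. Anyone can later re-hash the file and compare. The sketch below assumes a local JSON Lines custody log. The function name and field names are hypothetical, not part of any EDR or SIEM product.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_evidence(path: str, collected_by: str, log_path: str) -> dict:
    """Hash an evidence file with SHA-256 and append a custody entry to a JSON Lines log."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large memory dumps never need to fit in RAM.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            sha256.update(chunk)
    entry = {
        "file": str(Path(path).resolve()),
        "sha256": sha256.hexdigest(),
        "size_bytes": Path(path).stat().st_size,
        "collected_by": collected_by,
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
    }
    # Append-only: earlier entries are never rewritten.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```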
-
Pull SIEM and EDR logs for the incident window
Query Splunk / Datadog / Sumo Logic for authentication events, network flows, and process executions across the suspected window. Pull EDR detections, IdP sign-in logs (Okta / Entra ID), and CloudTrail / Azure Activity Log entries. Default retention is short: CloudTrail event history and the Azure Activity Log each keep 90 days. That is well below requirements such as the 12 months of audit history PCI DSS expects, so pull now and archive to cold storage.
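To decide which sources to export first, compute when the earliest events in the incident window will age out of each source. This is a minimal sketch. The source names and retention values you pass in are assumptions; confirm them against your own plans and settings.

```python
from datetime import datetime, timedelta, timezone


def purge_deadlines(window_start: datetime,
                    retention_days: dict[str, int]) -> list[tuple[str, datetime]]:
    """Return (source, deadline) pairs, soonest deadline first.

    The deadline is when the earliest incident-window events fall out of that
    source's retention, so sources at the top of the list must be exported first.
    """
    deadlines = [
        (source, window_start + timedelta(days=days))
        for source, days in retention_days.items()
    ]
    return sorted(deadlines, key=lambda pair: pair[1])
```

For example, with a window starting 1 March 2024 and a 30-day source alongside two 90-day sources, the 30-day source sorts first with a deadline of 31 March.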
-
Determine the scope of impacted systems and data
Enumerate every host, identity, SaaS tenant, and data store touched by the incident. For credential compromise, check sign-in logs across every IdP-connected app for the affected accounts. Scope drives breach-notification thresholds — undercounting here causes regulatory exposure later.
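For credential compromise, the scoping step reduces to grouping sign-in events by affected account and collecting the apps each account reached. The sketch assumes sign-in events have already been exported as dictionaries. The `user`, `app`, and `outcome` field names are hypothetical, so map them from your IdP's actual export schema. It counts only successful sign-ins; keep failed ones separately as attempted access.

```python
from collections import defaultdict


def apps_touched(sign_ins: list[dict], compromised: set[str]) -> dict[str, set[str]]:
    """Map each compromised account (lowercased) to the apps it successfully signed in to."""
    targets = {user.lower() for user in compromised}
    scope: dict[str, set[str]] = defaultdict(set)
    for event in sign_ins:
        user = event["user"].lower()
        # Successful sign-ins mark apps where data may have been accessed.
        if user in targets and event.get("outcome") == "success":
            scope[user].add(event["app"])
    return dict(scope)
```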
-
Build the incident timeline
Trace events from the first malicious action, through detection, to the current state. Use UTC throughout to avoid timezone confusion across the team. The scribe maintains the timeline in real time; the IC validates it at each shift handoff.
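Entries from different tools and responders often carry different UTC offsets. The sketch below converts them to UTC before sorting, so the order reflects what actually happened. It assumes ISO 8601 timestamps with an explicit offset. It rejects timestamps without an offset rather than guessing a timezone.

```python
from datetime import datetime, timezone


def build_timeline(entries: list[tuple[str, str]]) -> list[str]:
    """Convert (ISO 8601 timestamp, description) pairs to UTC and return them sorted."""
    normalized = []
    for stamp, description in entries:
        ts = datetime.fromisoformat(stamp)
        if ts.tzinfo is None:
            # A timestamp without an offset is ambiguous; force the author to fix it.
            raise ValueError(f"timestamp without UTC offset: {stamp!r}")
        normalized.append((ts.astimezone(timezone.utc), description))
    normalized.sort(key=lambda item: item[0])
    return [f"{ts:%Y-%m-%d %H:%M:%S}Z {desc}" for ts, desc in normalized]
```

An alert logged at 10:15 in UTC-5 sorts after a 14:02 UTC phishing click, because it is really 15:15 UTC.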
Containment
-
Isolate affected hosts via EDR network containment
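Use CrowdStrike network containment, SentinelOne network disconnect, or Defender for Endpoint device isolation rather than pulling cables. The agent stays online for forensics while lateral movement is blocked. For cloud workloads, remember that security groups are allow-only. Move the instance onto an isolation security group whose only rules allow traffic to and from your forensic jump host. Existing tracked connections can persist after a security group change. If you need to cut them immediately, add a deny in the subnet's network ACL, which affects the whole subnet.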
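The rules for an isolation security group can be prepared as data ahead of time. The sketch below builds lists in the `IpPermissions` shape that boto3's EC2 client accepts in `authorize_security_group_ingress` and `authorize_security_group_egress`; it does not call AWS itself. The function name is hypothetical. SSH on port 22 is assumed as the forensic access path; substitute RDP or another port as needed. A newly created security group starts with an allow-all outbound rule, which must be revoked separately.

```python
import ipaddress


def isolation_permissions(jump_host: str, ssh_port: int = 22) -> dict[str, list[dict]]:
    """Build ingress/egress IpPermissions for an isolation security group (IPv4 only)."""
    network = ipaddress.ip_network(jump_host)  # raises ValueError on malformed input
    if network.version != 4:
        raise ValueError("this sketch handles IPv4 only; IPv6 uses Ipv6Ranges/CidrIpv6")
    ip_range = [{"CidrIp": str(network), "Description": "forensic jump host"}]
    return {
        # Inbound: SSH from the jump host only.
        "ingress": [{"IpProtocol": "tcp", "FromPort": ssh_port,
                     "ToPort": ssh_port, "IpRanges": ip_range}],
        # Outbound: all protocols ("-1"), but only to the jump host.
        "egress": [{"IpProtocol": "-1", "IpRanges": ip_range}],
    }
```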
-
Revoke compromised credentials and active sessions
In the IdP, force sign-out and a password reset for every implicated account. Rotate the API tokens and SSH keys associated with each identity, and revoke its OAuth grants. For service accounts, rotate the secret in Vault / Secrets Manager and redeploy. SMS and email MFA can be bypassed via SIM swap and phishing. Move affected users to FIDO2 / passkeys before re-enabling their accounts.
-
Block IOCs at the firewall and DNS layer
Push known-bad IPs and domains to the NGFW (Palo Alto, Fortinet) and the DNS filter (Umbrella, NextDNS). Push file hashes to the EDR custom-IOC list. Cross-reference the IOCs against recent threat-intel feeds before declaring the list complete. Separately, check the exploited CVE against the CISA KEV catalog, which tracks exploited vulnerabilities rather than IOCs.
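Sorting a mixed IOC list by type makes it clear which control each entry belongs in. This sketch uses simplified rules, all of which are assumptions:

- Hashes are recognized only by their hex length (MD5, SHA-1, SHA-256).
- The domain pattern ignores internationalized TLDs.
- Defanged `[.]` notation is re-fanged before matching.

The destination labels are hypothetical names for your own controls. Anything unrecognized goes to manual review.

```python
import ipaddress
import re

HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
# Simplified hostname check: dot-separated labels ending in an alphabetic TLD.
DOMAIN_RE = re.compile(r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}")


def route_ioc(ioc: str) -> tuple[str, list[str]]:
    """Return (ioc_type, destinations) for a single indicator."""
    value = ioc.strip().lower().replace("[.]", ".")  # re-fang "evil[.]com"
    try:
        ipaddress.ip_address(value)  # accepts IPv4 and IPv6
        return "ip", ["ngfw"]
    except ValueError:
        pass
    if len(value) in HASH_LENGTHS and re.fullmatch(r"[0-9a-f]+", value):
        return HASH_LENGTHS[len(value)], ["edr"]
    if DOMAIN_RE.fullmatch(value):
        return "domain", ["dns_filter", "ngfw"]
    return "unknown", ["manual_review"]
```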
Eradication and Recovery
-
Remove malware and persistence mechanisms
For confirmed compromise, rebuild from a known-good image rather than cleaning in place; scheduled tasks, registry run keys, and cron jobs are easy to miss. A host rebuild does not remove cloud-side persistence, so audit IAM separately for rogue roles, users, and access keys. Validate that the gold image predates the initial compromise based on the timeline.
-
Patch the exploited vulnerability
Identify the CVE or misconfiguration that enabled initial access. Push the patch through your patch management tool (Action1, Automox, Intune) to all hosts running the affected version, not just the compromised one. Re-scan with Tenable / Qualys / Wiz to confirm closure.
-
Restore impacted services from clean backups
Restore from immutable backups (Veeam hardened repos, AWS Backup vault lock, Rubrik). Verify the restore point predates compromise based on your timeline — restoring from a backup taken after initial access reintroduces the foothold. Validate RPO and RTO against SLA commitments.
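The "predates compromise" rule means selecting the newest restore point strictly before initial access. The sketch adds an optional safety margin, an assumption, because the initial-access time in the timeline is an estimate. It returns `None` when no clean point exists, so the caller must handle that case explicitly rather than fall back to a tainted backup.

```python
from datetime import datetime, timedelta
from typing import Optional


def latest_clean_restore_point(
    restore_points: list[datetime],
    initial_access: datetime,
    safety_margin: timedelta = timedelta(0),
) -> Optional[datetime]:
    """Return the newest restore point taken before initial access, or None."""
    # The margin moves the cutoff earlier to absorb uncertainty in the timeline.
    cutoff = initial_access - safety_margin
    clean = [point for point in restore_points if point < cutoff]
    return max(clean) if clean else None
```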
-
Verify containment is complete before bringing systems online
Run a fresh EDR scan, re-query SIEM for any IOC matches in the last 24 hours, and confirm no anomalous outbound traffic from the rebuilt hosts. The IC signs off before the comms lead announces resolution.
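The 24-hour IOC re-check is normally a SIEM query. The sketch below shows the same logic over already-exported events, as a stand-in. It assumes each event is a dictionary with an aware `time` datetime and a `message` string; both field names are hypothetical. Plain substring matching is deliberately loose and can over-match, for example `10.0.0.1` inside `10.0.0.15`. Treat hits as leads to confirm, not verdicts.

```python
from datetime import datetime, timedelta, timezone


def recent_ioc_hits(events: list[dict], iocs: set[str], now: datetime,
                    window: timedelta = timedelta(hours=24)) -> list[dict]:
    """Return events inside the lookback window whose message contains any IOC."""
    cutoff = now - window
    needles = [ioc.lower() for ioc in iocs]
    return [
        event for event in events
        if event["time"] >= cutoff
        and any(needle in event["message"].lower() for needle in needles)
    ]
```

An empty result supports, but does not by itself prove, that containment is complete.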
Notification and Regulatory Reporting
-
Engage legal and privacy counsel
For confirmed security incidents, loop in legal before drafting external communications — attorney-client privilege over the investigation depends on counsel directing the work. Privacy counsel evaluates breach-notification triggers under GDPR Article 33, HIPAA, and applicable state laws.
-
File regulatory breach notifications within required windows
GDPR requires notification to the supervisory authority within 72 hours of becoming aware of the breach. U.S. state breach-notification statutes either set deadlines of roughly 30 to 90 days or require notice "without unreasonable delay." Their recipient lists also vary (attorney general, credit bureaus, affected residents). A HIPAA breach affecting 500 or more individuals requires HHS notification within 60 days of discovery. Track each filing separately with submission timestamps.
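The two fixed clocks named in this step can be computed from one timestamp so that every tracker shows the same deadlines. This sketch makes a simplifying assumption: the GDPR awareness time and the HIPAA discovery time are the same moment. In practice counsel decides when each clock started. State deadlines are left out because they vary by statute.

```python
from datetime import datetime, timedelta, timezone


def notification_deadlines(discovered_at: datetime) -> dict[str, datetime]:
    """Return the GDPR Art. 33 and HIPAA (500+ individuals) HHS deadlines."""
    if discovered_at.tzinfo is None:
        raise ValueError("discovered_at must be timezone-aware")
    return {
        # GDPR Art. 33: supervisory authority within 72 hours of awareness.
        "gdpr_supervisory_authority": discovered_at + timedelta(hours=72),
        # HIPAA: HHS within 60 days of discovery when 500+ individuals are affected.
        "hipaa_hhs_500_plus": discovered_at + timedelta(days=60),
    }
```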
-
Notify impacted customers and partners
The comms lead sends notifications using the templates approved by legal. Update the public status page (Statuspage, Instatus) and the talking points for customer success and account managers. Avoid speculating about the cause until forensics is complete; correcting public statements later is worse than initial silence.
Postmortem and Improvement
-
Schedule the blameless postmortem
Hold the postmortem within 5 business days while the timeline is fresh. Invite the IR team, the service owner, legal (for security incidents), and an exec sponsor. Blameless framing focuses on systems and processes, not individuals — the goal is durable fixes, not blame.
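To turn "within 5 business days" into a concrete date, step forward one calendar day at a time and count only weekdays. This is a minimal sketch. It treats Monday through Friday as business days and ignores public holidays; add a holiday set if your organization needs one.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` weekdays after `start` (holidays not considered)."""
    current = start
    added = 0
    while added < days:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            added += 1
    return current
```

An incident resolved on Thursday 7 March 2024 gives a postmortem deadline of Thursday 14 March 2024.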
-
Document root cause, MTTD, and MTTR
Capture the technical root cause, the contributing factors, time to detect (first malicious action to detection), and time to recover (detection to full recovery). Averaged across incidents, these give MTTD and MTTR. Trend them quarterly to see whether detection and response are improving.
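Given each incident's three timestamps, MTTD and MTTR are simple means. The sketch below reports them in hours. It assumes each incident is recorded as a `(first_malicious_action, detected, recovered)` tuple, and the function names are hypothetical. A mean is sensitive to one long-dwell incident, so some teams also track the median.

```python
from datetime import datetime
from statistics import mean


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def mttd_mttr(incidents: list[tuple[datetime, datetime, datetime]]) -> tuple[float, float]:
    """Each incident is (first_malicious_action, detected, recovered); returns mean hours."""
    ttd = [hours_between(first, detected) for first, detected, _ in incidents]
    ttr = [hours_between(detected, recovered) for _, detected, recovered in incidents]
    return mean(ttd), mean(ttr)
```

Two incidents detected after 6 and 2 hours and recovered 4 and 12 hours later give an MTTD of 4.0 hours and an MTTR of 8.0 hours.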
-
File action items with owners and due dates
Every action item lands in Jira / Linear with a named owner and a due date. Track completion in the monthly security ops review; postmortem actions that never ship leave the same failure mode open to recur.
-
Update the IR runbook with lessons learned
Fold detection-rule changes, new IOCs, and process gaps into the runbook in Confluence / Notion / IT Glue. Surface the changes at the next IR tabletop so the team practices against the updated playbook before the next real incident.