Software Project Risk Management Checklist

Quarterly risk-management cycle for a software engineering team — identify, score, and mitigate technical, security, vendor, and schedule risks across the project portfolio. Run by an engineering manager or technical program manager with...


Risk Identification Kickoff

Schedule the pre-mortem workshop
- Block 90 minutes with tech leads, SRE on-call, AppSec, and the product manager. Pre-mortem framing: "It's six months from now and the project failed — what went wrong?" Async brainstorming in a shared doc 24 hours ahead surfaces more than a cold-start meeting.
Pull contributing factors from past PIRs
- Review the last 4 quarters of post-incident reviews in Confluence/Notion. Extract recurring contributing factors — flaky CI, untested rollbacks, certificate expiry, unowned services. Recurring factors are the strongest signal for risks worth registering.
Inventory third-party dependencies and SaaS vendors
- Generate the SBOM (Syft, Trivy, or your registry's built-in tooling) and list paid vendors from the procurement system. Watch for transitive dependencies with critical CVEs (think Log4Shell-class), single-vendor lock-in (auth provider, payments), and packages without a maintained upstream.
Capture risks raised by on-call engineers
- On-call sees the rough edges first — alert noise, runbooks that don't match reality, services with a single SME. Ask the last two rotations directly; don't rely on tickets alone.

Risk Analysis and Scoring

Score each risk on probability and impact
- Rate probability and impact each on a 1–5 scale and multiply them for a risk score from 1 to 25. Impact dimensions: customer-facing downtime, data exposure, revenue, and engineering toil. Anything scoring 15+ goes on the top-tier list and needs a named owner this cycle.
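The scoring rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed tool: the names `score`, `top_tier`, and `TOP_TIER_THRESHOLD` and the sample risks are hypothetical; only the 1–5 scales and the 15+ threshold come from this checklist.

```python
TOP_TIER_THRESHOLD = 15  # from the checklist: 15+ is top tier


def score(probability: int, impact: int) -> int:
    """Return probability x impact, each rated on a 1-5 scale."""
    for name, value in (("probability", probability), ("impact", impact)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return probability * impact


def top_tier(risks: dict[str, tuple[int, int]]) -> list[str]:
    """Return names of risks at or above the threshold, highest score first."""
    scored = {name: score(p, i) for name, (p, i) in risks.items()}
    hits = [name for name, s in scored.items() if s >= TOP_TIER_THRESHOLD]
    return sorted(hits, key=lambda name: scored[name], reverse=True)


# Hypothetical sample data: name -> (probability, impact).
sample = {
    "untested rollback": (4, 5),
    "certificate expiry": (3, 5),
    "flaky CI": (4, 2),
}
print(top_tier(sample))
```

With this sample, the first two risks reach the threshold and would each need a named owner this cycle; "flaky CI" stays in the register without top-tier treatment.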
Classify the project's regulatory scope
- Confirm what regulated data the in-scope services touch. PHI pulls in HIPAA + BAA review; cardholder data pulls in PCI scope; EU resident data pulls in GDPR sub-processor obligations. Misclassification here is the most common reason auditors find a control gap later.
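One way to make the classification repeatable is a small lookup from data type to obligation. The sketch below assumes hypothetical key names (`phi`, `cardholder_data`, `eu_resident_data`) and a hypothetical function `regulatory_scope`; the mapping itself restates the pairs listed above and is not a complete compliance taxonomy.

```python
# Mapping restated from the checklist; key names are hypothetical labels.
OBLIGATIONS = {
    "phi": ["HIPAA", "BAA review"],
    "cardholder_data": ["PCI"],
    "eu_resident_data": ["GDPR sub-processor obligations"],
}


def regulatory_scope(data_types: set[str]) -> list[str]:
    """Return the obligations triggered by the data types a service touches."""
    unknown = data_types - set(OBLIGATIONS)
    if unknown:
        # Fail loudly: an unclassified data type is the gap this step guards against.
        raise ValueError(f"unclassified data types: {sorted(unknown)}")
    scope: list[str] = []
    for data_type in sorted(data_types):
        scope.extend(OBLIGATIONS[data_type])
    return scope


print(regulatory_scope({"phi", "eu_resident_data"}))
```

Raising on an unknown data type, rather than silently returning an empty scope, forces someone to classify it before the service is treated as out of scope.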
Log entries in the risk register
- Single source of truth — Jira, Linear, or a Notion table linked from the engineering wiki. Each entry gets: ID, description, category (technical / security / vendor / schedule / compliance), score, owner, mitigation, status. Avoid private spreadsheets; auditors and successors won't find them.
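If the register lives in a tool with an API or export, the entry shape above can be checked with a small schema. The following is a sketch only: the class name `RiskEntry`, the sample values, and the free-text `status` field are assumptions, and the 1 to 25 score range assumes the 1–5 × 1–5 matrix from the scoring step.

```python
from dataclasses import dataclass

# Categories as listed in the checklist.
CATEGORIES = {"technical", "security", "vendor", "schedule", "compliance"}


@dataclass
class RiskEntry:
    id: str
    description: str
    category: str
    score: int
    owner: str
    mitigation: str
    status: str  # free text here; the checklist does not fix the allowed values

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not 1 <= self.score <= 25:
            raise ValueError("score must be between 1 and 25")
        if not self.owner.strip():
            raise ValueError("every entry needs a named owner")


# Hypothetical entry for illustration only.
entry = RiskEntry(
    id="RISK-001",
    description="Rollback migration for the orders schema has never been run",
    category="technical",
    score=20,
    owner="Jane Doe",
    mitigation="Run the down migration in staging and document the result",
    status="open",
)
print(entry.id, entry.category, entry.score)
```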
Run the SOC 2 / HIPAA / PCI control mapping review
- Map each compliance-relevant risk to the affected control (CC6.x for access, CC7.x for monitoring, CC8.x for change management under SOC 2). Loop in the compliance lead or your Vanta/Drata/Secureframe owner to confirm the gap is registered and an evidence task exists.

Mitigation Planning

Assign a named owner to each top-tier risk
- One human per risk, not a team. The owner drives the mitigation plan, reports status at the monthly review, and closes the entry. Rotate ownership when people change roles — orphaned risks are how a tracked gap becomes a Sev1.
Draft mitigation plans for top-tier risks
- Each plan needs: concrete engineering work (linked tickets), a target completion date, and the residual-risk score after mitigation. Vague mitigations ("improve observability") don't ship; "add SLO burn-rate alert on checkout-service p99" does.
Define rollback triggers and kill-switch flags
- For each release-related risk, document the trigger condition (error rate > X%, p99 > Y ms, customer support tickets > Z/hr) and the operator action (flip the LaunchDarkly flag, redeploy previous container tag, run the rollback migration). The PagerDuty runbook link goes here too.
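A trigger condition written as data is easier to review than one buried in prose. The sketch below is a minimal illustration: the class and function names are hypothetical, and the threshold values are placeholders, since the checklist leaves X, Y, and Z for each team to set. It only reports which conditions are breached; the operator action itself (flag flip, redeploy, rollback migration) stays in the runbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackTrigger:
    max_error_rate_pct: float    # the "X" in the checklist
    max_p99_ms: float            # the "Y"
    max_tickets_per_hour: float  # the "Z"


@dataclass(frozen=True)
class Metrics:
    error_rate_pct: float
    p99_ms: float
    tickets_per_hour: float


def breached_conditions(trigger: RollbackTrigger, metrics: Metrics) -> list[str]:
    """Return the names of trigger conditions the current metrics exceed."""
    checks = [
        ("error_rate", metrics.error_rate_pct, trigger.max_error_rate_pct),
        ("p99_latency", metrics.p99_ms, trigger.max_p99_ms),
        ("support_tickets", metrics.tickets_per_hour, trigger.max_tickets_per_hour),
    ]
    return [name for name, observed, limit in checks if observed > limit]


# Hypothetical thresholds and readings for illustration only.
trigger = RollbackTrigger(max_error_rate_pct=2.0, max_p99_ms=800.0, max_tickets_per_hour=10.0)
reading = Metrics(error_rate_pct=3.5, p99_ms=620.0, tickets_per_hour=4.0)

breaches = breached_conditions(trigger, reading)
if breaches:
    print("Rollback trigger hit:", ", ".join(breaches))
```

Returning the list of breached conditions, rather than a bare boolean, gives the operator the reason for the rollback alongside the decision.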
Confirm residual risk is within appetite
- After applying mitigations, re-score each top-tier risk. Anything still scoring 15+ is residual exposure leadership needs to accept explicitly — it doesn't go away because you wrote a plan.

Monitoring and Control

Wire risk indicators into Datadog or Grafana
- If the risk has a leading indicator (Dependabot critical-CVE count, certificate days-to-expiry, p99 latency budget burn), it goes on a dashboard with an alert routing to the risk owner — not a deprecated #alerts channel. "Backup nightly green for 18 months" without a restore test is not monitoring.
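As an example of a leading indicator, certificate days-to-expiry reduces to a date subtraction and a threshold. The sketch below assumes a hypothetical 30-day warning window and hypothetical function names; how the value reaches Datadog or Grafana depends on your setup and is not shown.

```python
from datetime import datetime, timezone

WARN_DAYS = 30  # hypothetical threshold; tune to your renewal lead time


def days_to_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days remaining until the certificate expires (negative if expired)."""
    return (not_after - now).days


def needs_alert(not_after: datetime, now: datetime, warn_days: int = WARN_DAYS) -> bool:
    """True when the certificate is inside the warning window or already expired."""
    return days_to_expiry(not_after, now) <= warn_days


# Hypothetical dates for illustration only.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = datetime(2024, 1, 21, tzinfo=timezone.utc)
print(days_to_expiry(expires, now), needs_alert(expires, now))
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.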
Hold the monthly risk register review
- 30 minutes, calendar-recurring. Owners report status on their entries, retire mitigated risks, add new ones surfaced since last cycle. Skipping the review is how registers become museum pieces.
Re-test rollback and restore procedures
- Run a quarterly drill in a non-prod environment: restore the latest backup, redeploy the previous container tag, run the down migration. The first restore attempt usually fails on a rotated credential or a missing IAM permission; finding that during a drill is the point.

Stakeholder Communication

Brief the CTO on accepted residual risk
- For any risk still scoring 15+ after mitigation (the top-tier threshold), schedule a 15-minute briefing with engineering leadership. Capture the explicit accept/reject decision in the register so it's defensible at the next audit walkthrough.
Post the risk summary in #engineering
- One Slack post per cycle: top three risks, owners, target dates, and a link to the register. Async visibility prevents "nobody told me" surprises during release weeks.
Hold the quarterly risk retrospective
- Look back at the cycle: which risks materialized despite mitigation, which we missed entirely, and which controls actually held. Feed the answers into next quarter's identification step — that's how risk management compounds instead of resetting.