Quarterly DevOps Security Review

Infrastructure Security

    Apply the current Patch Tuesday rollup to all DCs and Hyper-V hosts, and the current vendor patch to ESXi hosts, using the three-ring deployment (test → pilot → prod over 7-14 days). Capture KB IDs and confirm every reboot completed; a DC left pending reboot is the classic cause of a Monday-morning login outage.

    Pull the running config from FortiGate / Palo Alto / Meraki and diff it against the documented baseline. Flag any-any rules, expired temporary exceptions, and rules with zero hits in the last 90 days for cleanup.
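    The three cleanup flags reduce to one pass over the exported rule base. This is a minimal sketch. It assumes the FortiGate / Palo Alto / Meraki export has already been normalized into the hypothetical `Rule` record below, whose field names are illustrative and match no vendor's schema. It also assumes `last_hit` comes from the firewall's hit-count data. Here "any-any" means both source and destination are `any`.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Rule:
    # Hypothetical normalized schema; map each vendor's export onto it first.
    name: str
    src: str
    dst: str
    service: str
    expires: Optional[date] = None   # set only on temporary exceptions
    last_hit: Optional[date] = None  # None means the rule has never matched


def audit(rules: list[Rule], today: date, stale_days: int = 90) -> dict[str, list[str]]:
    """Group rule names by cleanup reason; one rule can appear under several reasons."""
    findings: dict[str, list[str]] = {"any_any": [], "expired": [], "no_hits": []}
    cutoff = today - timedelta(days=stale_days)
    for r in rules:
        if r.src == "any" and r.dst == "any":
            findings["any_any"].append(r.name)
        if r.expires is not None and r.expires < today:
            findings["expired"].append(r.name)
        if r.last_hit is None or r.last_hit < cutoff:
            findings["no_hits"].append(r.name)
    return findings
```

    A rule can land under more than one reason. For example, an expired vendor exception is often also stale. The output is a cleanup worklist, not a set of auto-deletions.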

    Confirm guest, IoT, server, and management VLANs remain isolated per the network diagram. Flat networks expand PCI scope and let a compromised printer pivot into payroll.

    Confirm conditional access policies require MFA for all users and block legacy (basic) authentication for IMAP, POP, SMTP AUTH, and EWS org-wide. Legacy auth cannot carry an MFA challenge, which makes password spraying against it the most common MFA bypass.

    Inventory service accounts in CyberArk / Delinea / BeyondTrust. Rotate any account whose password age exceeds policy and confirm dependent services restart cleanly. The 'temporary' Domain Admin service account from 6 years ago belongs in this audit.
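    Once the vault inventory is exported, picking rotation candidates is a filter on password age. This sketch assumes a hypothetical mapping of account name to last-rotation timestamp, since CyberArk, Delinea, and BeyondTrust each expose this differently. The policy maximum is passed in rather than assumed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def overdue_accounts(
    last_rotated: dict[str, datetime],
    max_age_days: int,
    now: Optional[datetime] = None,
) -> list[str]:
    """Accounts whose password age exceeds the policy maximum, oldest first.

    Timestamps must be timezone-aware; mixing aware and naive datetimes raises TypeError.
    """
    now = now or datetime.now(timezone.utc)
    limit = timedelta(days=max_age_days)
    # Sort by rotation timestamp so the longest-neglected accounts come first.
    overdue = sorted((ts, acct) for acct, ts in last_rotated.items() if now - ts > limit)
    return [acct for _, acct in overdue]
```

    The sketch only builds the list. Confirming that dependent services restart cleanly after each rotation remains a manual step.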

Application Pipeline Security

    Trigger Snyk Code, SonarQube, or Semgrep against the current main branch. Record the highest-severity finding; it drives the remediation gate below.

    Open a tracked issue for each critical finding, assign it to the owning team, and hold the next prod deploy until each one is fixed or the security lead approves an exception with a documented mitigation.
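    The gate rule can be written as a predicate that a CI job evaluates before the prod deploy. This is a minimal sketch with a hypothetical `Finding` record. Pulling findings from the scanner and exception status from the issue tracker is assumed, not shown.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    issue_id: str
    severity: str                     # as reported by the scanner, e.g. "critical", "high"
    fixed: bool = False
    exception_approved: bool = False  # approved by the security lead
    mitigation: str = ""              # documented mitigation backing the exception


def blocking_findings(findings: list[Finding]) -> list[str]:
    """Critical findings that still hold the next prod deploy."""
    blocked = []
    for f in findings:
        if f.severity != "critical" or f.fixed:
            continue
        # An approved exception only counts if it carries a documented mitigation.
        if f.exception_approved and f.mitigation.strip():
            continue
        blocked.append(f.issue_id)
    return blocked


def deploy_allowed(findings: list[Finding]) -> bool:
    return not blocking_findings(findings)
```

    Treating an exception without a written mitigation as still blocking is a deliberate choice in this sketch. It mirrors the requirement that exceptions come with a documented mitigation.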

    Run OWASP ZAP or Burp Suite Pro against staging with an authenticated session, and confirm staging mirrors prod auth flows. DAST against an unauthenticated surface misses the interesting 80%.

    Generate the SBOM with Syft, load it into Dependency-Track, and cross-reference open CVEs by CVSS score. Prioritize anything scoring ≥ 7.0 that also has a known exploit listed in CISA KEV.
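    The prioritization rule intersects two lists. This sketch assumes the SBOM's CVE matches have been flattened into the hypothetical `Vuln` record below. It also assumes `kev_ids` holds the CVE IDs from the CISA KEV JSON feed, taken here from the `cveID` field of each entry in its `vulnerabilities` list. Check that field against the feed before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vuln:
    component: str  # package@version as listed in the SBOM
    cve: str
    cvss: float     # CVSS base score


def prioritize(vulns: list[Vuln], kev_ids: set[str], threshold: float = 7.0) -> list[Vuln]:
    """Vulns scoring >= threshold that also appear in CISA KEV, worst first."""
    urgent = [v for v in vulns if v.cvss >= threshold and v.cve in kev_ids]
    return sorted(urgent, key=lambda v: v.cvss, reverse=True)
```

    Everything outside this list still goes into the normal CVSS-ordered backlog. The sketch only pulls the urgent tier to the front.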

    Run synthetic attack payloads (SQLi, XSS, SSRF, path traversal) through Cloudflare / AWS WAF / F5 and confirm each is blocked and logged. The typical finding is a false negative on injection: the payload reaches the origin unblocked.

    Verify that the Gitleaks pre-commit hook and GitHub secret scanning both fire on a commit and push containing a synthetic AWS key. If the test key reaches main, the gate is broken.

Data Protection and Backup

    Pull the BitLocker / FileVault compliance report from Intune or JAMF and confirm 100% coverage. Verify recovery keys are escrowed in Entra ID or JAMF; without escrow, a locked disk belonging to a departed user cannot be recovered.

    Run SSL Labs or testssl.sh against every public hostname. Flag any host still offering TLS 1.0/1.1 or weak ciphers, and any certificate expiring within 60 days. ACME automation handles renewal, but only for hosts you've onboarded.
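    The 60-day expiry check is easy to script between full SSL Labs runs. This is a minimal sketch using Python's standard `ssl` module. It covers certificate expiry only, so protocol and cipher findings still come from SSL Labs or testssl.sh. The host list is an input you supply.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining, given the 'notAfter' string that getpeercert() returns."""
    expires = ssl.cert_time_to_seconds(not_after)  # format like "Jun 11 12:00:00 2025 GMT"
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the leaf certificate's notAfter."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def expiring_soon(hosts: list[str], window_days: int = 60) -> list[tuple[str, Optional[float]]]:
    """Hosts whose cert expires within window_days; None means connect or verify failed."""
    flagged: list[tuple[str, Optional[float]]] = []
    for host in hosts:
        try:
            left = days_until_expiry(fetch_not_after(host))
        except OSError:
            # ssl.SSLError subclasses OSError; an already-expired cert fails
            # verification here, so a failure is itself a finding.
            flagged.append((host, None))
            continue
        if left < window_days:
            flagged.append((host, round(left, 1)))
    return flagged
```

    Because the default context verifies the chain, an already-expired or mis-issued certificate shows up as `None` rather than a negative day count.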

    Restore a representative VM and a file-share dataset from the immutable copy (Veeam hardened repo, Datto cloud, or S3 Object Lock) into an isolated network. A 3-2-1 backup is only proven by an actual restore; green dashboards have hidden broken backups for 18 months before.
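    The file-share half of the drill can be checked mechanically. Hash the restored tree and compare it with a manifest built from the source. This sketch uses SHA-256 over relative paths. It assumes the source manifest is taken at the same point in time as the backup, for example from a snapshot. Otherwise ordinary churn after the backup shows up as changed files.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX-style) to its SHA-256 hex digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't load into memory at once.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            result[path.relative_to(root).as_posix()] = h.hexdigest()
    return result


def compare(source: dict[str, str], restored: dict[str, str]) -> dict[str, list[str]]:
    """Files missing from the restore, files whose content differs, and unexpected extras."""
    return {
        "missing": sorted(source.keys() - restored.keys()),
        "changed": sorted(p for p in source.keys() & restored.keys() if source[p] != restored[p]),
        "extra": sorted(restored.keys() - source.keys()),
    }
```

    Any non-empty `missing` or `changed` list is a failed drill and feeds the escalation step that follows.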

    If the restore failed, escalate to Veeam / Datto / Rubrik support with the failed job ID and exported logs. Treat the backup chain as broken until the next successful drill.

    Pull SMB / SharePoint / Google Drive ACLs and reconcile against current role assignments. The 'Domain Users gets the project share' pattern from 5 years ago shows up here.
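    The reconciliation is a set difference per share. This sketch assumes ACLs and role assignments have already been exported into hypothetical share-to-principals mappings. The broad-group list is illustrative and should match your directory's actual group names.

```python
# Illustrative list of over-broad principals; adjust to your directory's naming.
BROAD_GROUPS = {"Everyone", "Domain Users", "Authenticated Users"}


def acl_findings(
    acls: dict[str, set[str]],
    expected: dict[str, set[str]],
) -> dict[str, dict[str, list[str]]]:
    """Per share: broad groups present, and principals not backed by a current role."""
    report = {}
    for share, principals in acls.items():
        allowed = expected.get(share, set())
        broad = sorted(principals & BROAD_GROUPS)
        unexpected = sorted(principals - allowed - BROAD_GROUPS)
        if broad or unexpected:
            report[share] = {"broad": broad, "unexpected": unexpected}
    return report
```

    Broad groups are reported even when the role mapping lists them. A role that legitimately needs "Domain Users" on a share is worth a second look anyway.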

    Apply Microsoft Purview / Google DLP labels to PHI, PCI, and PII repositories and confirm the DLP policies enforce against those labels. This is expected for HIPAA covered entities and anything in PCI scope.

Incident Response Readiness

    Reconcile PagerDuty / Opsgenie schedules against the IR plan's named roles (IC, comms lead, scribe, exec liaison). Departed staff still named in the runbook is the typical finding.
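    The reconciliation answers three questions. Who does the plan name for each role? Is that person still employed? Do they appear in the paging schedules? This sketch uses hypothetical inputs. Exporting the PagerDuty / Opsgenie schedules and pulling the active-staff list from HR or the directory is assumed.

```python
def roster_gaps(
    ir_roles: dict[str, str],
    on_call: set[str],
    active_staff: set[str],
) -> dict[str, list[str]]:
    """ir_roles maps an IR plan role (IC, comms lead, ...) to the person named for it."""
    # Roles whose named person has left the organization.
    departed = sorted(role for role, person in ir_roles.items() if person not in active_staff)
    # Roles whose person is still employed but absent from every paging schedule.
    unpaged = sorted(
        role for role, person in ir_roles.items()
        if person in active_staff and person not in on_call
    )
    return {"departed": departed, "not_in_schedule": unpaged}
```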

    Walk the IR team through a named scenario: file server encrypted, backups also hit, vendor under DDoS during recovery. Capture each decision point where the runbook was unclear.

    Test the IR team's out-of-band channel (Signal, a dedicated WhatsApp group, or an SMS bridge). If your primary comms run on the same M365 tenant that is under attack, you have no comms during a tenant-wide compromise.

    Reconcile ServiceNow / Hudu / IT Glue asset records with current business criticality tiers. These tiers drive recovery prioritization during a multi-system incident.

    For each gap surfaced in the tabletop, add a named runbook step or contact, version the document, and circulate to the IR team. Gaps not written down are gaps that recur.

Monitoring and SIEM Hygiene

    Confirm Splunk / Sentinel / Elastic is receiving events from DCs, firewalls, EDR (CrowdStrike / SentinelOne), M365 unified audit, and SaaS apps. A silent source for 30 days means the dashboard is wrong.
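    Silent-source detection compares each expected source's newest event against a gap threshold. This sketch assumes `last_seen` comes from a per-source latest-event query in the SIEM; the SPL or KQL is not shown. It also assumes `expected` is your documented source inventory. The gap threshold is a parameter, so high-volume sources can use a much shorter window than 30 days.

```python
from datetime import datetime, timedelta


def silent_sources(
    last_seen: dict[str, datetime],
    expected: set[str],
    now: datetime,
    max_gap: timedelta,
) -> list[str]:
    """Expected sources with no events within max_gap, or never seen at all."""
    silent = []
    for source in sorted(expected):
        ts = last_seen.get(source)
        if ts is None or now - ts > max_gap:
            silent.append(source)
    return silent
```

    Iterating over the expected inventory rather than the SIEM's own source list is the point. A source that never arrived does not appear in the SIEM's data at all.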

    Pull the top 10 noisiest rules and the alerts the SOC closed as benign. Tune thresholds, add allow-lists for known scanners, and retire rules that haven't produced a true positive in 12 months.
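    Both lists fall out of the SOC's closed-alert export. This sketch assumes each alert is a hypothetical `(rule, disposition, created)` tuple, with disposition `true_positive` or `benign`. Passing the full rule inventory matters, because a rule that never fired does not appear in alert data at all.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional


def tuning_candidates(
    alerts: list[tuple[str, str, datetime]],
    now: datetime,
    all_rules: Optional[set[str]] = None,
    top_n: int = 10,
    tp_window_days: int = 365,
) -> tuple[list[str], list[str]]:
    """Return (noisiest rules by alert volume, rules with no true positive in the window)."""
    volume = Counter(rule for rule, _, _ in alerts)
    noisiest = [rule for rule, _ in volume.most_common(top_n)]
    cutoff = now - timedelta(days=tp_window_days)
    recent_tp = {rule for rule, disp, ts in alerts if disp == "true_positive" and ts >= cutoff}
    # Fall back to rules seen in the alerts only when no inventory is supplied.
    rules = all_rules if all_rules is not None else set(volume)
    retire = sorted(rules - recent_tp)
    return noisiest, retire
```

    A rule can be both noisy and a retirement candidate. Whether to tune or retire it is the reviewer's call; the sketch only surfaces both.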

    Reconcile retention windows against the binding standards (PCI DSS: 12 months, with the most recent 3 months immediately available; HIPAA: 6 years; SOC 2: per policy). A cold-storage tier is fine; deleted is not.
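    The retention check compares configured windows against minimums. This sketch encodes the figures above in days. The HIPAA entry follows this review's 6-year figure. SOC 2 has no fixed number, so its window comes from your own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    total_days: int    # minimum total retention (hot + cold)
    hot_days: int = 0  # minimum immediately searchable window


REQUIREMENTS = {
    "PCI DSS": Requirement(total_days=365, hot_days=90),  # 12 months, latest 3 immediately available
    "HIPAA": Requirement(total_days=6 * 365 + 2),         # 6 years, allowing for up to two leap days
}


def retention_gaps(hot_days: int, total_days: int, scopes: dict[str, Requirement]) -> list[str]:
    """Human-readable gaps between configured retention and each scope's minimum."""
    gaps = []
    for name, req in scopes.items():
        if total_days < req.total_days:
            gaps.append(f"{name}: total {total_days}d < required {req.total_days}d")
        if hot_days < req.hot_days:
            gaps.append(f"{name}: hot {hot_days}d < required {req.hot_days}d")
    return gaps
```

    To include SOC 2, extend the scopes with your policy value, e.g. `{**REQUIREMENTS, "SOC 2": Requirement(total_days=policy_days)}`, where `policy_days` is a placeholder for that value.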

    Confirm S3 Object Lock or Azure Blob immutability policies are set on the log archive bucket (or container) and that the SIEM service account cannot delete from it. Ransomware that erases logs erases your investigation.
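    For the S3 side, the lock settings can be read back with boto3's `get_object_lock_configuration`. This sketch separates the evaluation from the API call so the logic can be checked offline. It does not cover Azure immutability policies or the IAM half of the check.

```python
from typing import Any


def object_lock_findings(resp: dict[str, Any]) -> list[str]:
    """Evaluate the dict returned by boto3's s3.get_object_lock_configuration()."""
    cfg = resp.get("ObjectLockConfiguration", {})
    if cfg.get("ObjectLockEnabled") != "Enabled":
        return ["Object Lock not enabled"]
    findings = []
    retention = cfg.get("Rule", {}).get("DefaultRetention")
    if not retention:
        findings.append("no default retention; objects are locked only if written with explicit retention")
    elif retention.get("Mode") != "COMPLIANCE":
        findings.append("GOVERNANCE mode: principals with s3:BypassGovernanceRetention can still delete")
    return findings


def check_bucket(bucket: str) -> list[str]:
    # Imported here so the evaluator above runs without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        resp = s3.get_object_lock_configuration(Bucket=bucket)
    except ClientError as err:
        # A bucket without Object Lock returns an error rather than an empty config.
        return [f"Object Lock not configured ({err.response['Error']['Code']})"]
    return object_lock_findings(resp)
```

    Whether the SIEM service account can delete is an IAM question. One approach is boto3's IAM `simulate_principal_policy`, run for that principal against `s3:DeleteObjectVersion` and `s3:BypassGovernanceRetention` on the archive bucket. This is a suggested direction, not a complete check.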

    Walk the security lead through the rolling weekly anomaly digests: impossible-travel logins, after-hours admin actions, mass-download events. Reviewed together at quarterly cadence, they surface slow-burn compromise that a single weekly review misses.
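    Impossible travel is usually computed as the implied speed between a user's consecutive logins. This is a minimal sketch that assumes each login already carries geo-IP coordinates. The 900 km/h ceiling, roughly airliner cruise speed, is an illustrative threshold. Geo-IP error on VPN and mobile egress will produce some false positives.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Login:
    user: str
    when: datetime
    lat: float
    lon: float


def haversine_km(a: Login, b: Login) -> float:
    """Great-circle distance between two logins' geolocations."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def impossible_travel(logins: list[Login], max_kmh: float = 900.0) -> list[tuple[Login, Login, float]]:
    """Consecutive logins per user whose implied speed exceeds max_kmh."""
    by_user: dict[str, list[Login]] = {}
    for login in logins:
        by_user.setdefault(login.user, []).append(login)
    flagged = []
    for events in by_user.values():
        events.sort(key=lambda e: e.when)
        for prev, cur in zip(events, events[1:]):
            km = haversine_km(prev, cur)
            hours = (cur.when - prev.when).total_seconds() / 3600
            # Simultaneous logins from the same spot are not travel at all.
            speed = km / hours if hours > 0 else (math.inf if km > 0 else 0.0)
            if speed > max_kmh:
                flagged.append((prev, cur, speed))
    return flagged
```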