CI/CD Pipeline Checklist

Source Code Management

    Reconcile GitHub, GitLab, or Bitbucket org membership against the active-employee list from Entra ID or Okta. Stale collaborator access is one of the most common audit findings: anyone who left the team in the last quarter should already have been removed.
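    Here is a minimal sketch of the reconciliation. It assumes you have already exported org members as a mapping of Git username to corporate email. That mapping is a hypothetical input here; in practice it usually comes from SSO/SCIM data. Accounts with no mapped email are flagged too, because they cannot be reconciled at all.

```python
def find_stale_members(org_members: dict[str, str], active_emails: set[str]) -> list[str]:
    """Return Git usernames whose mapped email is not an active employee.

    org_members: hypothetical export of {git_username: corporate_email}.
    active_emails: emails of active employees from Entra ID or Okta.
    """
    active = {email.lower() for email in active_emails}
    stale = [
        username
        for username, email in org_members.items()
        # Unmapped accounts (empty email) are flagged for manual review.
        if not email or email.lower() not in active
    ]
    return sorted(stale)


if __name__ == "__main__":
    members = {"alice-gh": "Alice@example.com", "bob-gh": "bob@example.com", "svc-bot": ""}
    print(find_stale_members(members, {"alice@example.com"}))  # ['bob-gh', 'svc-bot']
```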

    Require pull requests, at least one approving review, passing status checks, and a linear history. Block force-push and direct commits to main. For SOX-relevant repos, require two reviewers and disable admin bypass.
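    On GitHub, these rules map onto the branch protection REST endpoint (PUT /repos/{owner}/{repo}/branches/{branch}/protection). The sketch below only builds the request body, and the status-check names are hypothetical. You could then send the JSON with something like gh api -X PUT repos/OWNER/REPO/branches/main/protection --input payload.json. GitLab and Bitbucket expose equivalent settings under different names.

```python
import json


def branch_protection_payload(status_checks: list[str], sox_relevant: bool) -> dict:
    """Request body for GitHub's branch protection endpoint.

    status_checks: names of CI jobs that must pass (hypothetical examples below).
    sox_relevant: raises the review count to 2 and disables admin bypass.
    """
    return {
        # Checks must pass, and the branch must be up to date before merging.
        "required_status_checks": {"strict": True, "contexts": status_checks},
        # When True, admins cannot bypass these rules.
        "enforce_admins": sox_relevant,
        # Requiring reviews forces changes through a pull request (no direct commits).
        "required_pull_request_reviews": {
            "required_approving_review_count": 2 if sox_relevant else 1,
        },
        "restrictions": None,  # no additional user/team push restrictions
        "required_linear_history": True,
        "allow_force_pushes": False,
        "allow_deletions": False,
    }


if __name__ == "__main__":
    body = branch_protection_payload(["build", "unit-tests"], sox_relevant=True)
    print(json.dumps(body, indent=2))
```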

    GPG- or Sigstore-signed commits prove provenance, and they hold up against a compromised personal access token: whoever steals the token can push, but cannot produce a valid signature. Configure the branch protection rule to reject unsigned commits on release/* and main.
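    Branch protection enforces this on the server, but a quick local audit of a commit range is also useful. The minimal sketch below uses git's %G? placeholder, where G means a good, trusted signature and N means unsigned. It assumes git is configured to verify your signature type. For Sigstore's gitsign, that typically means setting gpg.format=x509 with gitsign as the x509 program.

```python
import subprocess

# %G? codes that count as acceptable; "G" is a good signature from a trusted key.
# Other codes (e.g. "N" unsigned, "B" bad, "U" unknown validity) are rejected.
DEFAULT_ACCEPTED = frozenset({"G"})


def unsigned_commits(log_output: str, accepted: frozenset = DEFAULT_ACCEPTED) -> list[str]:
    """Parse `git log --format='%H %G?'` output and return SHAs that fail the policy."""
    failing = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        sha, status = line.split()
        if status not in accepted:
            failing.append(sha)
    return failing


def check_range(rev_range: str) -> list[str]:
    """Run git log over a range such as 'origin/main..HEAD' and return failing SHAs."""
    out = subprocess.run(
        ["git", "log", "--format=%H %G?", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return unsigned_commits(out)
```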

    Confirm that infra/, deploy/, and any IaC directories have explicit code owners assigned. Missing CODEOWNERS lines mean any approver can rubber-stamp changes to production-shaping files.
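    Here is a rough coverage check, sketched under one simplifying assumption: patterns are treated as root-anchored directory prefixes, which is narrower than the real gitignore-style CODEOWNERS syntax. It follows the last-match-wins rule. It also treats a bare * catch-all as not explicit, since a catch-all team is exactly the rubber-stamp risk described above. The directory names in the test are examples.

```python
def parse_codeowners(text: str) -> list[tuple[str, list[str]]]:
    """Return (pattern, owners) pairs in file order, skipping comments and blanks."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            pattern, *owners = line.split()
            rules.append((pattern, owners))
    return rules


def _matches(pattern: str, directory: str) -> bool:
    """Simplified matcher: '*' matches everything; other patterns are root-anchored prefixes."""
    if pattern == "*":
        return True
    prefix = pattern.removesuffix("/**").strip("/")
    target = directory.strip("/")
    return target == prefix or target.startswith(prefix + "/")


def uncovered_dirs(codeowners_text: str, required_dirs: list[str]) -> list[str]:
    """Return required directories without an explicit, non-empty owner entry."""
    rules = parse_codeowners(codeowners_text)
    missing = []
    for directory in required_dirs:
        winner = None
        for pattern, owners in rules:  # later matches override earlier ones
            if _matches(pattern, directory):
                winner = (pattern, owners)
        if winner is None or winner[0] == "*" or not winner[1]:
            missing.append(directory)
    return missing
```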

    Check that BackHub, GitProtect, or your scripted mirror to S3 / Azure Blob has completed a successful run within the last 24 hours. Relying on the hosted Git provider as your only copy is a single point of failure that has bitten teams during provider outages.
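    Here is a minimal freshness check. It assumes you can obtain the timestamp of the last successful mirror run, for example the newest object's LastModified under a hypothetical backup prefix, or the backup tool's own status API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def backup_is_fresh(
    last_success: datetime,
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(hours=24),
) -> bool:
    """True if the last successful backup run is no older than max_age."""
    if last_success.tzinfo is None:
        raise ValueError("last_success must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    return now - last_success <= max_age
```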

Build Automation

    Reference the runner / agent image by SHA256 digest, not :latest. Floating tags break reproducibility and let an upstream image change silently alter your build outputs.
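    Here is a lint-style check that flags image references without a digest. It assumes you have already extracted the image strings from your pipeline config. That extraction depends on your CI system and is not shown.

```python
import re

# Matches references that end in a sha256 digest, e.g. "ubuntu:22.04@sha256:<64 hex>".
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(image_refs: list[str]) -> list[str]:
    """Return image references that are not pinned to a sha256 digest."""
    return [ref for ref in image_refs if not _DIGEST_RE.search(ref)]
```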

    Configure the dependency cache keyed on the lockfile hash (package-lock.json, yarn.lock, Pipfile.lock, go.sum). A cold build that blocks the pipeline for 8 minutes on every run is a common and avoidable waste; a hit/miss metric on the cache step tells you when key drift breaks the cache.
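    Most CI systems compute this key for you (GitHub Actions' hashFiles(), for instance), but the idea is simple enough to sketch. Hash whichever lockfiles exist. Any dependency change then produces a new key, and an unchanged lockfile reuses the warm cache. The deps- prefix and the 16-character truncation are arbitrary choices.

```python
import hashlib
from pathlib import Path

LOCKFILES = ("package-lock.json", "yarn.lock", "Pipfile.lock", "go.sum")


def cache_key_from_contents(lockfiles: dict[str, bytes], prefix: str = "deps") -> str:
    """Derive a stable cache key from lockfile names and contents."""
    if not lockfiles:
        raise ValueError("no lockfiles found; an unkeyed cache would go stale silently")
    h = hashlib.sha256()
    for name in sorted(lockfiles):  # sorted, so the key ignores discovery order
        h.update(name.encode() + b"\0" + hashlib.sha256(lockfiles[name]).digest())
    return f"{prefix}-{h.hexdigest()[:16]}"


def cache_key(repo_root: str, prefix: str = "deps") -> str:
    """Read whichever known lockfiles exist at the repo root and key on them."""
    root = Path(repo_root)
    found = {name: (root / name).read_bytes() for name in LOCKFILES if (root / name).is_file()}
    return cache_key_from_contents(found, prefix)
```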

    Pick one of SonarQube, Semgrep, CodeQL, or Snyk Code and gate the build on its result. Treat new High/Critical findings as build-breaking, and suppress them only with documented exception comments, never silently.
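    CodeQL, Semgrep, and Snyk Code can all emit SARIF 2.1.0, so one way to sketch the gate is to read SARIF and fail on new, unjustified error-level results. Two assumptions need checking against your tool. First, that its error level corresponds to your High/Critical bar. Second, that it populates baselineState; results without it are treated as new, which errs toward blocking.

```python
import json


def blocking_findings(sarif: dict) -> list[str]:
    """Return rule IDs of new, error-level results lacking a justified suppression."""
    blocking = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: ignores rule-level default levels; SARIF falls back to "warning".
            if result.get("level", "warning") != "error":
                continue
            # Missing baselineState is treated as "new" (conservative).
            if result.get("baselineState", "new") != "new":
                continue
            justified = any(
                s.get("justification") and s.get("status") != "rejected"
                for s in result.get("suppressions", [])
            )
            if not justified:  # suppressions without a justification still block
                blocking.append(result.get("ruleId", "<unknown rule>"))
    return blocking


def main(path: str) -> int:
    """Load a SARIF file and return a process exit code (1 if anything blocks)."""
    with open(path, encoding="utf-8") as fh:
        findings = blocking_findings(json.load(fh))
    for rule_id in findings:
        print(f"blocking: {rule_id}")
    return 1 if findings else 0
```

    In a pipeline step you would call something like sys.exit(main("results.sarif")). The file name is illustrative.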

    Use Cosign / Sigstore to sign the container image and generate an SLSA provenance attestation. Downstream admission controllers (Kyverno, OPA Gatekeeper) can then refuse to deploy unsigned images.

    Push to your trusted registry (Artifactory, Nexus, ECR, GHCR), tagged with both the commit SHA and the semantic version. Never deploy directly from public Docker Hub; your supply chain should run through a registry you control.
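    Here is a small guard for the publish step, sketched with a hypothetical allowlist of registry hostnames. It refuses any registry outside the list, Docker Hub included, and produces the commit-SHA and semantic-version tags. Docker tags cannot contain +, so the version check leaves out semver build metadata.

```python
import re

# Hypothetical allowlist; replace with your Artifactory/Nexus/ECR/GHCR hostnames.
TRUSTED_REGISTRIES = frozenset({"artifactory.example.com", "ghcr.io"})

# MAJOR.MINOR.PATCH with an optional "v" prefix and pre-release suffix (no "+build").
_SEMVER_RE = re.compile(r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?")
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def release_tags(registry: str, repo: str, commit_sha: str, version: str) -> list[str]:
    """Return the commit-SHA tag and the semver tag for a trusted registry."""
    if registry not in TRUSTED_REGISTRIES:
        raise ValueError(f"untrusted registry: {registry}")
    if not _SHA_RE.fullmatch(commit_sha):
        raise ValueError("expected a full 40-character commit SHA")
    if not _SEMVER_RE.fullmatch(version):
        raise ValueError(f"not a semantic version: {version}")
    base = f"{registry}/{repo}"
    return [f"{base}:{commit_sha}", f"{base}:{version}"]
```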

Test Gates

    Set the coverage gate at the level your team has actually agreed to (commonly 70-80% line coverage). Jumping from 65% to 80% overnight just adds noise; pick the floor that matches your current reality and ratchet it up.
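    The ratchet can be as simple as storing the current floor somewhere versioned, such as a hypothetical file in the repo, and raising it whenever a build clears it. The sketch below never lowers the floor and raises it only to whole percentages, so tiny fluctuations do not cause churn:

```python
import math


def coverage_gate(current_pct: float, floor_pct: float) -> tuple[bool, float]:
    """Return (passed, new_floor); the floor only ever moves up."""
    if current_pct < floor_pct:
        return False, floor_pct
    return True, max(floor_pct, float(math.floor(current_pct)))
```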

    Spin up a Testcontainers / Docker Compose / k8s namespace stack with real dependencies (Postgres, Redis, message broker). Mocks-only integration tests catch nothing the unit tests didn't already.

    Run Playwright or Cypress against staging, exercising the top 5 critical user paths (login, primary transaction, checkout, search, billing). Keep the suite small: flaky 200-test E2E suites are one of the most common reasons teams disable test gates.

    Upload screenshots, HAR files, server logs, and DB dumps from failed runs to S3 or the CI artifact store with 30-day retention. Triaging a flake without these is guesswork.

    Open a Jira / Linear ticket linked to the failing run, tag the code owner of the affected path, and mark the pipeline run as blocked. Do not retry until green; that is how flaky tests train teams to ignore real regressions.

Deployment

    File a normal or standard change in ServiceNow / Jira Service Management with a rollback plan, a blast-radius assessment, and CAB approval if required. For pre-approved standard changes, link the CR template ID.

    Deploy to staging with ArgoCD, Flux, Spinnaker, or Octopus, whichever the platform team has standardized on. Confirm staging matches production topology (same node sizes, same DB engine, same networking) so the rehearsal is meaningful.
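    One way to make the parity check mechanical is to diff the same topology attributes for both environments before trusting the staging run. The attribute names and values below are hypothetical; in practice they would come from your IaC outputs or cluster inventory.

```python
# Hypothetical attribute names; align them with what your IaC actually exports.
PARITY_KEYS = ("node_size", "db_engine", "db_version", "networking")


def topology_drift(staging: dict, production: dict, keys=PARITY_KEYS) -> dict:
    """Return {key: (staging_value, production_value)} for every mismatched key."""
    return {
        key: (staging.get(key), production.get(key))
        for key in keys
        if staging.get(key) != production.get(key)
    }
```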

    Start at 1-5% traffic for 10-30 minutes, then 25%, then 100%. Blue/green is fine for stateless services; canary catches gradual degradations (memory leaks, connection pool exhaustion) that flip-cutover deploys mask.
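    You can encode the progression as data, so it is reviewed like any other change. The sketch assumes a 15-minute hold at 25%, because the text does not specify one. The final 100% step has no hold here, since the bake period that follows covers it. Argo Rollouts and Flagger express the same idea with their own step syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryStep:
    weight_pct: int      # share of traffic routed to the new version
    hold_minutes: int    # observation time before the next step


def canary_schedule(initial_pct: int = 5, initial_hold: int = 15, mid_hold: int = 15) -> list[CanaryStep]:
    """1-5% for 10-30 minutes, then 25%, then 100%."""
    if not 1 <= initial_pct <= 5:
        raise ValueError("initial canary weight should be 1-5%")
    if not 10 <= initial_hold <= 30:
        raise ValueError("initial hold should be 10-30 minutes")
    # mid_hold default is an assumption; the checklist gives no hold time for 25%.
    return [
        CanaryStep(initial_pct, initial_hold),
        CanaryStep(25, mid_hold),
        CanaryStep(100, 0),  # the post-rollout bake period is handled separately
    ]
```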

    Watch error rate, p95/p99 latency, saturation, and any custom SLOs in Datadog / Grafana / New Relic for at least 30 minutes after full rollout. Most rollback decisions surface in the first 15 minutes; the second 15 catches slower-moving issues.

    The release manager owns this call, based on the bake-period telemetry. Hold means leaving the canary at its current percentage and re-evaluating; Rollback triggers the next step; Full Release closes the change.
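    Because the release manager owns the decision, automation should at most recommend one. The sketch below uses hypothetical thresholds and only two of the signals listed for the bake period: error rate and p99 latency. A real version would read these from Datadog, Grafana, or New Relic and also cover saturation and custom SLOs.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    HOLD = "hold"
    ROLLBACK = "rollback"
    FULL_RELEASE = "full_release"


@dataclass(frozen=True)
class BakeTelemetry:
    error_rate: float        # fraction of failed requests, e.g. 0.002 == 0.2%
    p99_latency_ms: float
    minutes_observed: int


def recommend(t: BakeTelemetry, max_error_rate: float, max_p99_ms: float,
              min_bake_minutes: int = 30) -> Decision:
    """Suggest a decision; the release manager still makes the call."""
    if t.error_rate > max_error_rate or t.p99_latency_ms > max_p99_ms:
        return Decision.ROLLBACK
    if t.minutes_observed < min_bake_minutes:
        return Decision.HOLD  # healthy so far, but not observed long enough
    return Decision.FULL_RELEASE
```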

    Re-pin the deployment to the prior signed image digest, page the incident commander, and update the CR with the rollback timestamp. For schema-changing releases, confirm the migration is backward-compatible, meaning the prior image can still run against the migrated schema; if it is not, escalate to a DBA before reverting.

Security and Compliance

    Scan images with Trivy, Grype, or the registry's built-in scanner (ECR, Artifactory Xray). Block on Critical CVEs with a known fix; document exceptions for unfixed vulnerabilities with a re-evaluation date.
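    Trivy can do the fixed-only blocking itself (for example, trivy image --severity CRITICAL --ignore-unfixed --exit-code 1). If you also want the exception register enforced, here is a sketch over Trivy's JSON report (--format json). The exceptions mapping, from CVE ID to re-evaluation date, is a hypothetical structure.

```python
from datetime import date


def evaluate_trivy(report: dict, exceptions: dict[str, date],
                   today: date) -> tuple[list[str], list[str]]:
    """Return (blocking, needs_review) CVE IDs for Critical findings.

    blocking: Critical CVEs with a fixed version available (upgrade, don't except).
    needs_review: unfixed Critical CVEs with no exception or an expired re-evaluation date.
    """
    blocking, needs_review = [], []
    for result in report.get("Results", []):
        for vuln in result.get("Vulnerabilities") or []:  # may be null for clean targets
            if vuln.get("Severity") != "CRITICAL":
                continue
            cve = vuln.get("VulnerabilityID", "<unknown>")
            if vuln.get("FixedVersion"):
                blocking.append(cve)
            else:
                review_by = exceptions.get(cve)
                if review_by is None or review_by < today:
                    needs_review.append(cve)
    return blocking, needs_review
```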

    Confirm HashiCorp Vault, AWS Secrets Manager, or Doppler is the source — no plaintext secrets in pipeline variables, env files, or container images. Run gitleaks / trufflehog as a pre-merge guard so this stays true.

    Generate the SBOM with Syft or CycloneDX and check it against your license allowlist. An AGPL or SSPL dependency slipping into a proprietary product is a legal problem that, unless you check at build time, surfaces only during M&A diligence.
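    Here is a sketch of the allowlist check over a CycloneDX JSON SBOM, for example from syft <image> -o cyclonedx-json. The allowlist is hypothetical. As a deliberate simplification, license expressions are sent to manual review rather than parsed.

```python
# Hypothetical allowlist of SPDX license IDs; yours comes from legal.
ALLOWED_LICENSES = frozenset({"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"})


def license_violations(sbom: dict, allowed: frozenset = ALLOWED_LICENSES) -> list[tuple[str, str]]:
    """Return (component, reason) pairs for anything not clearly on the allowlist."""
    problems = []
    for comp in sbom.get("components", []):
        ident = f"{comp.get('name', '?')}@{comp.get('version', '?')}"
        entries = comp.get("licenses") or []
        if not entries:
            problems.append((ident, "no license declared"))
        for entry in entries:
            if "expression" in entry:
                # e.g. "MIT OR AGPL-3.0-only": not parsed here, flagged for review.
                problems.append((ident, f"expression needs review: {entry['expression']}"))
                continue
            lic = entry.get("license", {})
            lic_id = lic.get("id") or lic.get("name") or "unknown"
            if lic_id not in allowed:
                problems.append((ident, lic_id))
    return problems
```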

    Write the commit SHA, image digest, deployer identity, CR number, and rollout timestamps to your immutable audit log (Splunk, Datadog Audit Trail, S3 with object lock). Auditors for SOC 2 CC8.1 and SOX ITGC will ask for this exact set of fields.
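    Emitting the record from the pipeline as one JSON line keeps the field set consistent from deploy to deploy. The sketch below uses hypothetical field names. Shipping the line to Splunk, Datadog, or an object-locked S3 bucket is left out.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeploymentAuditRecord:
    commit_sha: str
    image_digest: str            # e.g. "sha256:<64 hex>"
    deployer: str                # human or workload identity that triggered the deploy
    change_request: str          # CR number from ServiceNow / JSM
    rollout_started_at: str      # ISO 8601 timestamps, UTC
    rollout_completed_at: str

    def __post_init__(self) -> None:
        # Reject obviously malformed identifiers before they reach the audit log.
        if not re.fullmatch(r"[0-9a-f]{40}", self.commit_sha):
            raise ValueError("commit_sha must be a full 40-character SHA")
        if not re.fullmatch(r"sha256:[0-9a-f]{64}", self.image_digest):
            raise ValueError("image_digest must be a sha256 digest")

    def to_json_line(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```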

    File a recurring task to recertify pipeline service-account permissions, runner secrets, and registry push rights. Long-lived tokens that nobody recertifies are a frequent finding in SOC 2 access-review evidence requests.
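    The recurring task is easier to act on if it starts from a list of what is overdue. The sketch below assumes a hypothetical inventory of pipeline credentials with last-recertified dates, plus a 90-day cadence; use whatever cadence your policy states. Credentials that have never been recertified always count as overdue.

```python
from datetime import date, timedelta
from typing import NamedTuple, Optional


class Credential(NamedTuple):
    name: str
    kind: str                          # e.g. "service-account", "runner-secret", "registry-push"
    last_recertified: Optional[date]   # None if it has never been recertified


def overdue_credentials(creds: list[Credential], today: date, max_age_days: int = 90) -> list[str]:
    """Return names of credentials whose last recertification is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [c.name for c in creds if c.last_recertified is None or c.last_recertified < cutoff]
```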