Containerization Rollout Checklist

Steps a sysadmin or infrastructure engineer runs to stand up a production-ready container platform — runtime install, registry setup, image hardening, RBAC, resource governance, and persistent storage with backup discipline.

4 sections, 22 steps

Runtime and Registry Setup

Confirm host kernel and OS support
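- Verify the kernel version (cgroups v2 first appeared in 4.5, but Kubernetes recommends 5.8 or later), confirm the overlay2 storage driver is available, and check that SELinux/AppArmor profiles are compatible. RHEL 7 and Ubuntu 18.04 hosts are common gotchas: both default to cgroups v1. Ubuntu 18.04 needs an explicit boot-parameter change to switch, and RHEL 7's 3.10 kernel cannot run cgroups v2 at all.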
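A minimal sketch of the host pre-check, assuming Python 3.9+ is available on the host. The 5.8 floor is an assumption taken from the Kubernetes cgroups v2 guidance, and the function names are illustrative; adjust both to your own baseline.

```python
import os
import platform
import re


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-105-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_ok(release: str, minimum: tuple[int, int] = (5, 8)) -> bool:
    """Compare the running kernel against an assumed minimum version."""
    return parse_kernel_version(release) >= minimum


def has_cgroup_v2(root: str = "/sys/fs/cgroup") -> bool:
    """On a unified (v2) hierarchy, cgroup.controllers exists at the mount root."""
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


if __name__ == "__main__":
    release = platform.release()
    print(f"kernel {release}: meets minimum = {kernel_ok(release)}")
    print(f"cgroup v2 unified hierarchy mounted = {has_cgroup_v2()}")
```

Run it on each candidate host and record the output as the step's evidence. It does not check overlay2 or SELinux/AppArmor, which still need a manual look.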
Install the container runtime
- Install from the vendor repo (Docker CE and the containerd.io package from Docker's apt/yum repo, or containerd's upstream releases), never the distro default, which often lags upstream by a minor version or two. Pin the version explicitly so unattended-upgrades doesn't bump the runtime mid-week.
Harden the runtime daemon configuration
- Apply CIS Docker Benchmark settings: disable legacy registry v1, enable user namespace remapping, set live-restore, configure log rotation on the json-file driver (default is unbounded — fills /var fast on busy hosts).
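The daemon settings named above can be sketched as a generated /etc/docker/daemon.json. This covers only user namespace remapping, live-restore, and json-file log rotation; the 10m and 3-file rotation values are illustrative assumptions, not CIS-mandated numbers.

```python
import json


def hardened_daemon_config(max_size: str = "10m", max_file: int = 3) -> dict:
    """Build a daemon.json fragment with the hardening options from this step."""
    return {
        "userns-remap": "default",   # remap container root to an unprivileged host user
        "live-restore": True,        # keep containers running while dockerd restarts
        "log-driver": "json-file",
        # log-opts values must be strings in daemon.json
        "log-opts": {"max-size": max_size, "max-file": str(max_file)},
    }


if __name__ == "__main__":
    print(json.dumps(hardened_daemon_config(), indent=2))
```

Merge the output into any existing daemon.json rather than overwriting it, then restart the daemon and confirm with docker info.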
Connect to the container registry
- Configure a registry — Harbor self-hosted, ECR, ACR, GAR, or Artifactory. Set up pull-through cache for Docker Hub to dodge rate-limit outages. Store registry credentials in the orchestrator's secret store, not in /root/.docker/config.json.
Run a smoke-test container
- Pull and run a known-good image (alpine, hello-world, or your team's canary) to confirm DNS resolution, registry auth, and outbound network egress all work end-to-end before any real workloads land.

Image Security and Compliance

Scan base images for CVEs
- Run Trivy, Grype, or Snyk against the base images you plan to allow. Block any image with unpatched critical CVEs from the registry. Distroless and Chainguard images dramatically reduce the attack surface vs. ubuntu:latest.
Document the CVE exception
- If accepting a critical CVE, file the exception with a named owner, a compensating control, and a review date. Exceptions without an expiry become permanent, so set a 90-day re-review on the calendar and tag the exception in the registry.
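One way to keep exceptions from becoming permanent is to store them as records with a computed review date. The sketch below is a hypothetical record shape, not a feature of any scanner or registry; the CVE ID, owner, and control text are placeholders.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CveException:
    cve_id: str
    owner: str
    compensating_control: str
    filed_on: date
    review_after_days: int = 90  # the 90-day re-review from this step

    @property
    def review_date(self) -> date:
        return self.filed_on + timedelta(days=self.review_after_days)

    def is_due(self, today: date) -> bool:
        """True once the exception has reached its review date."""
        return today >= self.review_date


# Placeholder values for illustration only.
EXAMPLE = CveException(
    cve_id="CVE-0000-00000",
    owner="platform-team",
    compensating_control="network policy blocks the vulnerable port",
    filed_on=date(2024, 1, 1),
)

if __name__ == "__main__":
    print(EXAMPLE.review_date.isoformat(), EXAMPLE.is_due(date.today()))
```

A scheduled job that lists every record where is_due returns True gives the re-review a place to surface other than someone's calendar.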
Enable image signing and verification
- Sign images with Cosign or Notation (Notary v2). Configure the orchestrator's admission controller (Kyverno, OPA Gatekeeper, or Connaisseur) to reject unsigned images. Don't ship the signing key on builder nodes — use a KMS-backed key.
Apply RBAC for cluster operations
- Map roles to AD/Entra ID groups via SSO; never issue local kubeconfig accounts to humans. Grant cluster-admin only through a break-glass account stored in PAM. Developers get the edit role bound to their own namespace, with no cluster-wide binding, not even view.
Define network policies between namespaces
- Default-deny ingress and egress per namespace, then allow only required paths. NetworkPolicy objects are enforced only by a CNI plugin that implements them, such as Calico or Cilium; without enforcement, a flat pod network lets a compromised container reach every other workload.
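The default-deny baseline can be sketched as a generated manifest. The policy name default-deny-all and the namespace are illustrative; the output is JSON, which kubectl apply -f accepts alongside YAML.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """NetworkPolicy that selects every pod and allows no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so both are denied
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_policy("team-a"), indent=2))
```

Denying egress also blocks DNS, so the first allow rule added on top of this is usually one for the cluster DNS service.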
Forward container logs to the SIEM
- Ship container stdout/stderr and Kubernetes audit logs to Splunk, Sentinel, or Elastic via Fluent Bit or Vector. Audit logs are typical evidence for SOC 2 CC7.2 and are often the only way to forensically reconstruct a kubectl exec incident.

Resource Governance

Set CPU and memory requests and limits
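- Every pod gets requests and limits. A container without a memory limit can exhaust node memory and trigger the kernel OOM killer against other workloads; pods whose requests equal their limits for every container land in the Guaranteed QoS class and are the last to be evicted under node pressure. Use VPA recommendations as a starting point, not as truth.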
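How requests and limits map to a QoS class can be sketched as below. This is a simplified model of the Kubernetes rules: it compares quantity strings literally (so "500m" and "0.5" would not match) and takes each container as a plain dict, both of which are assumptions for illustration.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Kubernetes defaults an unset request to the limit.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"cpu": "500m", "memory": "256Mi"},
            "limits": {"cpu": "500m", "memory": "256Mi"}}]
    print(qos_class(pod))
```

The class the cluster actually assigned is visible in the pod's status.qosClass field, which is the value to trust over any local calculation.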
Configure horizontal pod autoscaling
- Set HPA on CPU or custom metrics via the metrics-server or Prometheus Adapter. Pair with cluster-autoscaler or Karpenter so pod scale-up actually gets nodes. Tune scale-down stabilization window — aggressive defaults cause flapping under bursty load.
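The scaling decision itself follows the documented HPA formula, desired = ceil(current replicas × current metric / target metric), with changes inside a tolerance band skipped. The sketch below assumes the default 10% tolerance and ignores min/max replica bounds and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Apply the HPA ratio formula to one metric."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target
    print(desired_replicas(4, 90, 60))
```

Working a few of these by hand against real utilization numbers is a quick way to sanity-check a target before tuning the stabilization window.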
Enforce namespace ResourceQuotas
- ResourceQuota caps total CPU/memory/storage per namespace; LimitRange sets per-container defaults and bounds. Without these, one team's runaway CronJob can starve the cluster. Quota a dev namespace tighter than prod to encourage right-sizing.
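A per-namespace guardrail pair might be generated as follows. The object names and every quantity here are placeholder assumptions to be replaced with sizes that fit the namespace; the output is a v1 List in JSON, which kubectl apply -f accepts.

```python
import json


def namespace_guardrails(namespace: str, cpu: str, memory: str, storage: str) -> dict:
    """Build a ResourceQuota and a LimitRange for one namespace."""
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},
        "spec": {"hard": {
            "requests.cpu": cpu,
            "requests.memory": memory,
            "requests.storage": storage,
        }},
    }
    limit_range = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "container-defaults", "namespace": namespace},
        "spec": {"limits": [{
            "type": "Container",
            # Applied to containers that declare nothing; values are illustrative.
            "default": {"cpu": "500m", "memory": "512Mi"},
            "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
        }]},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [quota, limit_range]}


if __name__ == "__main__":
    print(json.dumps(namespace_guardrails("dev", "8", "16Gi", "100Gi"), indent=2))
```

Once a quota on requests.cpu or requests.memory exists, pods that omit those requests are rejected, which is why the LimitRange defaults ship in the same bundle.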
Wire Prometheus and Grafana dashboards
- Deploy kube-prometheus-stack with the standard kubernetes-mixin dashboards. Alert on node memory pressure, persistent OOMKilled events, and throttling > 25%. Page on cluster-level signals; ticket on namespace-level signals.
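The throttling threshold is a ratio of two cAdvisor counters, container_cpu_cfs_throttled_periods_total over container_cpu_cfs_periods_total, taken over the same window. A small sketch of that arithmetic, assuming the two increases have already been read from Prometheus:

```python
def throttle_ratio(throttled_periods: float, total_periods: float) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    if total_periods <= 0:
        return 0.0  # no scheduling periods observed in the window
    return throttled_periods / total_periods


def should_alert(throttled_periods: float, total_periods: float,
                 threshold: float = 0.25) -> bool:
    """Alert when throttling exceeds the 25% threshold from this step."""
    return throttle_ratio(throttled_periods, total_periods) > threshold


if __name__ == "__main__":
    # 30 throttled periods out of 100 in the window
    print(throttle_ratio(30, 100), should_alert(30, 100))
```

In practice the same ratio lives in an alerting rule; working it out by hand is mainly useful when deciding whether 25% is the right line for a given workload.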
Trim image size with multi-stage builds
- Multi-stage builds drop compilers and build artifacts from the runtime image; a Go service can shrink from roughly 800MB to around 20MB. Smaller images mean faster pulls, faster pod startup, and a smaller CVE attack surface.

Persistent Storage and Backup

Provision storage classes via the CSI driver
- Configure a CSI driver — EBS, Azure Disk, GCP PD, Longhorn, or Rook-Ceph. Define StorageClasses with reclaimPolicy=Retain for prod and Delete for dev. WaitForFirstConsumer binding mode prevents zone-mismatch errors on multi-AZ clusters.
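The prod/dev split can be sketched as a small manifest builder. The class names are illustrative, and ebs.csi.aws.com is used only as an example provisioner; substitute the one for your CSI driver.

```python
import json


def storage_class(name: str, provisioner: str, production: bool) -> dict:
    """StorageClass with Retain for prod, Delete for dev, and delayed binding."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "reclaimPolicy": "Retain" if production else "Delete",
        # Bind only once a pod is scheduled, so the volume lands in the pod's zone.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    for sc in (storage_class("prod-ssd", "ebs.csi.aws.com", production=True),
               storage_class("dev-ssd", "ebs.csi.aws.com", production=False)):
        print(json.dumps(sc, indent=2))
```

With Retain, deleting a PVC leaves the PersistentVolume and the backing disk in place, so cleanup of released volumes becomes a manual task.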
Confirm backups follow 3-2-1 with immutability
- Use Velero or Kasten K10 for cluster-state and PV snapshots, replicated to an object-locked S3 bucket in a separate AWS account. A backup that's writable from the production cluster is not ransomware-resilient, a recurring lesson from 2023-2024 Kubernetes ransomware cases.
Remediate the backup gap
- If an immutable offsite copy is not in place, do not promote the cluster to production. Stand up a Velero target with S3 Object Lock in compliance mode, or contract MSP360 or Veeam Kasten as a managed alternative. Re-run the 3-2-1 verification from the previous step before proceeding.
Mount secrets via a Secrets Store CSI driver
- Pull secrets from Vault, AWS Secrets Manager, or Azure Key Vault via the Secrets Store CSI driver. Plain Kubernetes Secrets are base64-encoded, not encrypted; anyone with get permission on secrets in the namespace can read them. Never bake secrets into the image.
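To see why base64 offers no protection, the sketch below decodes the data map of a Secret the way any reader with access could. The key name and value are made up for the demonstration.

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode the base64 values found under a Secret's .data field."""
    return {key: base64.b64decode(value).decode("utf-8")
            for key, value in data.items()}


if __name__ == "__main__":
    # Shape of the .data field in `kubectl get secret -o json` output; value is fake.
    secret_data = {"db-password": "cGFzc3dvcmQ="}
    print(decode_secret_data(secret_data))
```

No key is involved at any point, which is the difference between encoding and encryption.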
Enable encryption at rest for etcd and PVs
- Enable EncryptionConfiguration on the API server (KMS provider via Vault or cloud KMS) so etcd-stored secrets are encrypted. Enable volume-level encryption on the underlying storage. Auditors typically expect both for HIPAA, PCI DSS, and most SOC 2 engagements.
Run a quarterly restore drill
- Restore a representative PV and namespace into an isolated test cluster. Time the restore against your stated RTO. A backup that's never restored is a backup that doesn't work, and the first restore attempt is usually where the credential rotation, format change, or missing key is discovered.
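Timing the drill against the RTO can be as simple as comparing two recorded timestamps. The sketch below assumes start and finish are logged as ISO 8601 strings and the RTO is stated in minutes; both conventions are illustrative.

```python
from datetime import datetime, timedelta


def restore_duration(started_iso: str, finished_iso: str) -> timedelta:
    """Elapsed time between two ISO 8601 timestamps from the drill log."""
    started = datetime.fromisoformat(started_iso)
    finished = datetime.fromisoformat(finished_iso)
    if finished < started:
        raise ValueError("restore finished before it started")
    return finished - started


def meets_rto(started_iso: str, finished_iso: str, rto_minutes: int) -> bool:
    """True when the restore completed within the stated RTO."""
    return restore_duration(started_iso, finished_iso) <= timedelta(minutes=rto_minutes)


if __name__ == "__main__":
    start, end = "2024-05-01T10:00:00", "2024-05-01T10:45:00"
    print(restore_duration(start, end), meets_rto(start, end, rto_minutes=60))
```

Start the clock when the restore is requested rather than when the tool begins copying data, since credential and access delays count against the RTO too.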
