Containerization Rollout Checklist
Runtime and Registry Setup
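Verify the kernel version (cgroups v2 landed in 4.5; Kubernetes recommends 5.8 or later), confirm the overlay2 storage driver is available, and check that SELinux/AppArmor profiles are compatible. RHEL 7 and Ubuntu 18.04 hosts are common gotchas: both default to cgroups v1. Ubuntu 18.04 needs an explicit boot-parameter change (systemd.unified_cgroup_hierarchy=1), while RHEL 7's 3.10 kernel predates cgroups v2, so plan an OS upgrade there instead.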
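The kernel check above can be scripted as part of host preflight. This is a minimal sketch, not a complete preflight tool: the function names are hypothetical, the (5, 8) floor used in the tests is an assumption you should set to your own baseline, and the cgroup check relies on the fact that a unified v2 hierarchy exposes /sys/fs/cgroup/cgroup.controllers.

```python
import re


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release: str, minimum: tuple[int, int]) -> bool:
    """True when the release's (major, minor) is at or above the minimum."""
    return parse_kernel_release(release) >= minimum


def cgroup_mode(controllers_file_exists: bool) -> str:
    """Classify the hierarchy from whether cgroup.controllers exists at the cgroup root."""
    return "v2" if controllers_file_exists else "v1 or hybrid"
```

On a real host you would pass platform.release() to kernel_at_least and os.path.exists("/sys/fs/cgroup/cgroup.controllers") to cgroup_mode.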
Install from the vendor repo (Docker CE and containerd.io from Docker's apt/yum repo), never the distro default, which typically lags upstream releases. Pin the version explicitly (apt-mark hold or yum versionlock) so unattended-upgrades doesn't bump the runtime mid-week.
Apply CIS Docker Benchmark settings: disable legacy registry v1, enable user namespace remapping, set live-restore, configure log rotation on the json-file driver (default is unbounded — fills /var fast on busy hosts).
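As an illustration of the daemon settings named above, the sketch below generates a daemon.json body with live-restore, user namespace remapping, and bounded json-file logs. The function name and the 10m / 3-file rotation values are assumptions for illustration, not CIS-mandated numbers; check the result against the benchmark version you audit against.

```python
import json


def build_daemon_config(max_size: str = "10m", max_file: int = 3) -> dict:
    """Return a dict suitable for writing to /etc/docker/daemon.json."""
    return {
        "live-restore": True,        # keep containers running across daemon restarts
        "userns-remap": "default",   # map container root to an unprivileged host user
        "log-driver": "json-file",
        # log-opts values must be strings in daemon.json, including the file count
        "log-opts": {"max-size": max_size, "max-file": str(max_file)},
    }


def render_daemon_config(config: dict) -> str:
    """Serialize the config as indented JSON."""
    return json.dumps(config, indent=2, sort_keys=True)
```

Write the output of render_daemon_config(build_daemon_config()) to the daemon config file through your configuration management tool, then restart the daemon during a maintenance window.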
Configure a registry — Harbor self-hosted, ECR, ACR, GAR, or Artifactory. Set up pull-through cache for Docker Hub to dodge rate-limit outages. Store registry credentials in the orchestrator's secret store, not in /root/.docker/config.json.
Pull and run a known-good image (alpine, hello-world, or your team's canary) to confirm DNS resolution, registry auth, and outbound network egress all work end-to-end before any real workloads land.
Image Security and Compliance
Run Trivy, Grype, or Snyk against the base images you plan to allow. Block any image with unpatched critical CVEs from the registry. Distroless and Chainguard images dramatically reduce the attack surface vs. ubuntu:latest.
If accepting a critical CVE, file the exception with named owner, compensating control, and review date. Exceptions without expiry become permanent — set a 90-day re-review on the calendar and tag the exception in the registry.
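One way to keep exceptions from becoming permanent is to store them as structured records with a computed review date. The sketch below is hypothetical: the CveException class and its field names are assumptions, and only the 90-day interval comes from the step above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # re-review window from the checklist step


@dataclass(frozen=True)
class CveException:
    cve_id: str
    image: str
    owner: str
    compensating_control: str
    filed_on: date

    @property
    def review_due(self) -> date:
        """Date on which the exception must be re-reviewed."""
        return self.filed_on + REVIEW_INTERVAL

    def needs_review(self, today: date) -> bool:
        """True once the review date has arrived or passed."""
        return today >= self.review_due
```

A scheduled job could load these records and open a ticket for every exception where needs_review(date.today()) is true.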
Sign images with Cosign or Notation (Notary v2). Configure the orchestrator's admission controller (Kyverno, OPA Gatekeeper, or Connaisseur) to reject unsigned images. Don't ship the signing key on builder nodes — use a KMS-backed key.
Map roles to AD/Entra ID groups via SSO — never local kubeconfig accounts for humans. Grant cluster-admin only via break-glass account stored in PAM. Developers get edit on their namespace, not view at cluster scope.
Default-deny ingress and egress per namespace, then allow only required paths. NetworkPolicy objects are enforced only when the CNI plugin implements them (Calico and Cilium both do); without enforcement, a flat pod network lets a compromised container reach every other workload.
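A default-deny policy is small enough to generate per namespace. The sketch below builds the manifest as a Python dict; the function name and the policy name "default-deny-all" are assumptions, while the apiVersion, kind, and spec fields follow the networking.k8s.io/v1 NetworkPolicy schema.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches every pod in the namespace
            # listing both types with no rules denies traffic in both directions
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def to_json(manifest: dict) -> str:
    """Serialize a manifest as JSON, which kubectl apply -f accepts."""
    return json.dumps(manifest, indent=2)
```

Note that denying all egress also blocks DNS lookups, so the first allow rule you add is usually one that permits traffic to the cluster DNS service.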
Ship container stdout/stderr and Kubernetes audit logs to Splunk, Sentinel, or Elastic via Fluent Bit or Vector. Audit logs are required for SOC 2 CC7.2 and the only way to forensically reconstruct a kubectl exec incident.
Resource Governance
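Every pod gets requests and limits. Pods without memory limits can exhaust node memory and trigger the OOM killer; pods whose requests equal their limits for both CPU and memory in every container land in the Guaranteed QoS class and are the last to be evicted under node pressure. Use VPA recommendations as a starting point, not as truth.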
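The QoS rules can be checked before a manifest ever reaches the cluster. The sketch below is a simplified classifier, not the kubelet's implementation: the function name is hypothetical, it compares quantity strings literally (so "500m" and "0.5" count as different), and it assumes the Kubernetes behavior of defaulting a missing request to the limit.

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod from container specs like {"requests": {...}, "limits": {...}}."""
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # A missing request is defaulted to the limit when a limit is present.
        requests = {**limits, **container.get("requests", {})}
        for resource in ("cpu", "memory"):
            if resource in requests or resource in limits:
                any_set = True
            # Guaranteed needs a limit and an equal request for cpu and memory.
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

A CI lint step could call this on each pod template and fail the build when a production workload comes back as BestEffort.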
Set HPA on CPU via metrics-server or on custom metrics via Prometheus Adapter. Pair with cluster-autoscaler or Karpenter so pod scale-up actually gets nodes. Tune the scale-down stabilization window (default 300 seconds): too short a window causes flapping under bursty load.
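To make the stabilization window concrete, the sketch below builds an autoscaling/v2 HorizontalPodAutoscaler with an explicit scale-down window. The function name, the 70% CPU target, and the assumption that the target is a Deployment with the same name as the HPA are illustrative; 300 seconds matches the upstream default for scale-down.

```python
def hpa_manifest(
    name: str,
    namespace: str,
    min_replicas: int,
    max_replicas: int,
    cpu_target_percent: int = 70,
    scale_down_window_seconds: int = 300,
) -> dict:
    """Build an HPA targeting a Deployment of the same name (an assumption)."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_target_percent},
                },
            }],
            # Scale down only after the metric has stayed low for this long.
            "behavior": {"scaleDown": {"stabilizationWindowSeconds": scale_down_window_seconds}},
        },
    }
```

Lengthening scale_down_window_seconds trades slower scale-in for fewer replica oscillations, which is usually the right trade for bursty traffic.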
ResourceQuota caps total CPU/memory/storage per namespace; LimitRange sets per-container defaults and min/max bounds. Without these, one team's runaway CronJob can starve the cluster. Quota a dev namespace tighter than prod to encourage right-sizing.
Deploy kube-prometheus-stack with the standard kubernetes-mixin dashboards. Alert on node memory pressure, persistent OOMKilled events, and CPU throttling above 25% of CFS periods. Page on cluster-level signals; ticket on namespace-level signals.
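The throttling threshold is a ratio of two counters. The sketch below shows the arithmetic, assuming the inputs are increases over the same time window of the cAdvisor metrics container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total; the function names are hypothetical and the 0.25 default mirrors the 25% figure above.

```python
def throttle_ratio(throttled_periods: float, total_periods: float) -> float:
    """Fraction of CFS periods in which the container was throttled."""
    if total_periods <= 0:
        return 0.0  # no scheduling periods observed, so nothing was throttled
    return throttled_periods / total_periods


def should_alert(throttled_periods: float, total_periods: float, threshold: float = 0.25) -> bool:
    """True when the throttled fraction is strictly above the threshold."""
    return throttle_ratio(throttled_periods, total_periods) > threshold
```

In practice the same ratio is usually expressed as a PromQL alert rule; the Python form is only meant to make the calculation explicit.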
Multi-stage builds drop compilers and build artifacts from the runtime image; a Go service can shrink from roughly 800MB to 20MB. Smaller images mean faster pulls, faster pod startup, and a smaller attack surface.
Persistent Storage and Backup
Configure a CSI driver — EBS, Azure Disk, GCP PD, Longhorn, or Rook-Ceph. Define StorageClasses with reclaimPolicy=Retain for prod and Delete for dev. WaitForFirstConsumer binding mode prevents zone-mismatch errors on multi-AZ clusters.
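The prod/dev split described above can be captured in one small generator. This is a sketch under assumptions: the function name is hypothetical and the provisioner string is whatever your CSI driver registers (for example ebs.csi.aws.com for the AWS EBS CSI driver); the field names follow the storage.k8s.io/v1 StorageClass schema.

```python
def storage_class(name: str, provisioner: str, production: bool) -> dict:
    """Build a StorageClass with Retain for prod, Delete for dev."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "reclaimPolicy": "Retain" if production else "Delete",
        # Delay volume binding until a pod is scheduled, so the volume is
        # created in the same zone as the node that will mount it.
        "volumeBindingMode": "WaitForFirstConsumer",
    }
```

With Retain, deleting a PersistentVolumeClaim leaves the underlying volume in place, so someone must own cleanup of released volumes in prod.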
Use Velero or Kasten K10 for cluster-state and PV snapshots, replicated to an object-locked S3 bucket in a separate AWS account. A backup that is writable from the production cluster is not ransomware-resilient, a recurring lesson from Kubernetes ransomware incidents.
If an immutable offsite copy is not in place, do not promote the cluster to production. Stand up a Velero target with S3 Object Lock in compliance mode, or contract a managed alternative such as MSP360 or Veeam Kasten. Re-run the prior verification before proceeding.
Pull secrets from Vault, AWS Secrets Manager, or Azure Key Vault via the Secrets Store CSI driver. Plain Kubernetes Secrets are base64-encoded, not encrypted: anyone with get access to secrets in the namespace can read them. Never bake secrets into the image.
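To see why base64 offers no protection, the sketch below decodes the data field of a Secret exactly as any reader with access could. The function name and the sample value are made up for illustration.

```python
import base64


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode every value in a Secret's data map from base64 to text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


# Sample data field as it would appear in `kubectl get secret -o json` output.
sample = {"password": "aHVudGVyMg=="}
print(decode_secret_data(sample))  # prints {'password': 'hunter2'}
```

No key or credential is involved in the decode step, which is why access control on Secrets and encryption at rest both matter.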
Configure an EncryptionConfiguration on the API server (KMS provider backed by Vault or a cloud KMS) so Secrets stored in etcd are encrypted at rest. Enable volume-level encryption on the underlying storage. Encryption at rest is expected under HIPAA, PCI DSS, and most SOC 2 audits.
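For reference, the shape of an encryption config with a KMS v2 provider can be sketched as follows. The function name, plugin name, and socket path are assumptions that depend on the KMS plugin you deploy; the structure follows the apiserver.config.k8s.io/v1 EncryptionConfiguration format, and the file is passed to the API server with --encryption-provider-config.

```python
def encryption_configuration(plugin_name: str, socket_path: str) -> dict:
    """Build an EncryptionConfiguration that encrypts Secrets through a KMS plugin."""
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [{
            "resources": ["secrets"],
            "providers": [
                # The first provider is used for writes.
                {"kms": {
                    "apiVersion": "v2",
                    "name": plugin_name,
                    "endpoint": f"unix://{socket_path}",
                    "timeout": "3s",
                }},
                # identity lets the API server still read Secrets that were
                # stored unencrypted before this config was applied.
                {"identity": {}},
            ],
        }],
    }
```

Existing Secrets are only encrypted when they are next written, so a one-time rewrite of all Secrets is normally part of the rollout.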
Restore a representative PV and namespace into an isolated test cluster. Time the restore against your stated RTO. Backup that's never restored is a backup that doesn't work — and the first restore attempt is always where the credential rotation, format change, or missing key is discovered.
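Timing the restore against the RTO is simple arithmetic, but writing it down makes the drill result unambiguous. The sketch below is hypothetical: the function names are assumptions, and the timestamps would come from your own drill log.

```python
from datetime import datetime, timedelta


def restore_duration(started: datetime, finished: datetime) -> timedelta:
    """Elapsed time of a restore drill."""
    if finished < started:
        raise ValueError("restore finished before it started")
    return finished - started


def restore_within_rto(started: datetime, finished: datetime, rto: timedelta) -> bool:
    """True when the restore completed within the stated RTO."""
    return restore_duration(started, finished) <= rto
```

Record the measured duration alongside the pass/fail result so that drift toward the RTO is visible from one drill to the next.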
