ashiq-ali/k8s-security-hardening

# k8s-security-hardening

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Ansible](https://img.shields.io/badge/Ansible-2.14%2B-EE0000?logo=ansible)](https://ansible.com) [![Kubernetes](https://img.shields.io/badge/Kubernetes-1.28%2B-326CE5?logo=kubernetes)](https://kubernetes.io) [![CIS Benchmark](https://img.shields.io/badge/CIS-Level%201%20%2B%202-orange)](https://cisecurity.org)

## Architecture

![Architecture Diagram](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/4fa1d170bb181007.svg)

    Ansible Control Node
    ├── playbooks/node-hardening.yml   ──SSH──► K8s Nodes
    ├── playbooks/audit-rbac.yml       ──K8s──► ┌─ Kubernetes Cluster ──────────────────┐
    └── playbooks/network-policies.yml ──K8s──► │ kube-bench (CIS L1 + L2)              │
                                                │ Falco eBPF (syscall threat detection) │
                                                │ RBAC Audit Job (overprivileged SAs)   │
                                                │ Pod Security Admission (PSA labels)   │
                                                │ NetworkPolicies (deny-all + allowlist)│
                                                └───────────────────────────────────────┘
                                                                    │
                                                       CIS Scan Report (HTML/JSON)
                                                       RBAC Audit Report
                                                       Falco Runtime Alerts → Slack

## Table of Contents

- [What this covers](#what-this-covers)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [CIS Benchmark Hardening](#cis-benchmark-hardening)
- [RBAC Audit](#rbac-audit)
- [Falco Runtime Security](#falco-runtime-security)
- [Pod Security Standards](#pod-security-standards)
- [Network Policies](#network-policies)
- [Running the CIS Scan](#running-the-cis-scan)
- [Inventory Setup](#inventory-setup)
- [Security Posture Checklist](#security-posture-checklist)
- [Troubleshooting](#troubleshooting)

## What this covers

The CKS certification covers six domains.
This repo implements all of them:

| CKS Domain | Implementation |
|-----------|----------------|
| Cluster Setup | `node-hardening.yml`: API server flags, etcd encryption, audit logging |
| Cluster Hardening | `audit-rbac.yml`: overprivileged SA detection, namespace isolation |
| System Hardening | `node-hardening.yml`: AppArmor profiles, seccomp, host path restrictions |
| Minimise Microservice Vulnerabilities | `pod-security/`: PSA Restricted profile, no root containers |
| Supply Chain Security | Manifest scanning hooks, image policy webhook references |
| Monitoring, Logging, Runtime Security | Falco rules: suspicious syscalls, container escape attempts |

## Prerequisites

| Requirement | Version / Notes |
|------|---------|
| Ansible | ≥ 2.14 |
| Python | ≥ 3.10 (on control node) |
| kubectl | ≥ 1.28 (with cluster admin access) |
| SSH access | To K8s nodes (for node-level hardening) |

Install Ansible collections:

    ansible-galaxy install -r requirements.yml

## Quick Start

    # 1. Clone
    git clone https://github.com/ashiq-ali/k8s-security-hardening
    cd k8s-security-hardening

    # 2. Configure inventory
    cp inventory/example-hosts.ini inventory/hosts.ini
    # Edit hosts.ini: add your control-plane and worker node IPs

    # 3. Run CIS node hardening
    ansible-playbook playbooks/node-hardening.yml \
      -i inventory/hosts.ini \
      --become \
      --ask-become-pass

    # 4. Audit RBAC
    ansible-playbook playbooks/audit-rbac.yml \
      -i inventory/hosts.ini

    # 5. Apply network policies
    ansible-playbook playbooks/network-policies.yml \
      -i inventory/hosts.ini \
      --extra-vars "target_namespace=production"

    # 6. Run CIS scan report
    ./scripts/cis-scan.sh

## CIS Benchmark Hardening

`playbooks/node-hardening.yml` applies CIS Kubernetes Benchmark Level 1 and Level 2 controls.
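To spot-check that a hardening run took effect, one option is to grep the kube-apiserver static pod manifest for a few of the expected flags. The sketch below is not part of this repo: the function names `check_flag` and `check_all` are hypothetical, the default path assumes a kubeadm-style control plane, and the three flags are only a sample of the hardened set.

```shell
#!/bin/sh
# Sketch (not part of the repo): verify that sample hardening flags are present
# in a kube-apiserver manifest. The default path assumes a kubeadm layout.
MANIFEST="${MANIFEST:-/etc/kubernetes/manifests/kube-apiserver.yaml}"

# check_flag FILE FLAG
# Prints "PASS <flag>" if FLAG occurs in FILE, otherwise "FAIL <flag>"
# and returns non-zero.
check_flag() {
    # -F: fixed string match, -q: quiet,
    # -e: treat the next argument as the pattern even though it starts with "--"
    if grep -Fq -e "$2" "$1" 2>/dev/null; then
        printf 'PASS %s\n' "$2"
    else
        printf 'FAIL %s\n' "$2"
        return 1
    fi
}

# check_all FILE
# Checks a small sample of flags; returns non-zero if any is missing.
check_all() {
    failed=0
    for flag in \
        "--anonymous-auth=false" \
        "--authorization-mode=Node,RBAC" \
        "--audit-log-path=/var/log/kubernetes/audit.log"
    do
        check_flag "$1" "$flag" || failed=1
    done
    return "$failed"
}

# Usage on a control-plane node:
#   check_all "$MANIFEST"
```

This is a plain substring match, so it confirms only that a flag is written in the manifest, not that the running API server picked it up. kube-bench remains the authoritative check.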
### What it hardens

**API Server (`roles/cis-benchmark/tasks/apiserver.yml`):**

    # Applied kube-apiserver flags (via kubeadm config or static pod manifest)
    --anonymous-auth=false
    --audit-log-path=/var/log/kubernetes/audit.log
    --audit-log-maxage=30
    --audit-log-maxbackup=10
    --audit-log-maxsize=100
    --authorization-mode=Node,RBAC
    --encryption-provider-config=/etc/kubernetes/encryption.yaml  # etcd at-rest encryption
    --tls-min-version=VersionTLS12
    --enable-admission-plugins=NodeRestriction,PodSecurity

**Kubelet (`roles/cis-benchmark/tasks/kubelet.yml`):**

    --anonymous-auth=false
    --authorization-mode=Webhook
    --client-ca-file=/etc/kubernetes/pki/ca.crt
    --protect-kernel-defaults=true
    --read-only-port=0  # Disable unauthenticated read-only port
    --streaming-connection-idle-timeout=5m

**etcd (`roles/cis-benchmark/tasks/etcd.yml`):**

    --auto-tls=false
    --peer-auto-tls=false
    --cert-file=/etc/kubernetes/pki/etcd/server.crt
    --key-file=/etc/kubernetes/pki/etcd/server.key
    --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt

**OS-level hardening:**

- Disable swap (K8s requirement, also security hygiene)
- Set kernel parameters: `net.ipv4.conf.all.accept_redirects=0`, `net.ipv4.ip_forward=1` (forwarding stays on because pod networking needs it)
- Restrict `/proc` filesystem permissions
- AppArmor enabled and set to enforce mode

### Running in check mode (no changes)

    ansible-playbook playbooks/node-hardening.yml \
      -i inventory/hosts.ini \
      --check --diff

## RBAC Audit

`playbooks/audit-rbac.yml` identifies overprivileged service accounts and RBAC bindings:

    ansible-playbook playbooks/audit-rbac.yml -i inventory/hosts.ini

The playbook deploys a Job (`manifests/rbac-audit/audit-job.yaml`) that produces a report like this excerpt:

    RBAC Audit Report — 2024-01-15
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    ⚠️  CRITICAL: ServiceAccount default/my-app has cluster-admin
        Binding: ClusterRoleBinding/my-app-admin
        Recommendation: Scope to specific namespace + verbs

    ⚠️  HIGH: ServiceAccount kube-system/coredns can list secrets cluster-wide
        Binding: ClusterRoleBinding/coredns
        Recommendation: Restrict to ConfigMaps only

    ✅ OK: ServiceAccount default/api-service — minimal RBAC
    ✅ OK: ServiceAccount monitoring/prometheus — metrics-reader only

    Summary: 2 critical, 1 high, 0 medium, 14 OK

### Manual RBAC audit commands

    # Find all ClusterRoleBindings granting cluster-admin
    kubectl get clusterrolebindings -o json | \
      jq '.items[] | select(.roleRef.name=="cluster-admin") | .metadata.name'

    # Find ClusterRoles that grant wildcard verbs
    # (any(...) prints each role once; the ? operators skip roles with no rules)
    kubectl get clusterroles -o json | \
      jq '.items[] | select(any(.rules[]?.verbs[]?; . == "*")) | .metadata.name'

    # Find unused service accounts (no pods using them)
    ./scripts/cis-scan.sh --check unused-service-accounts

## Falco Runtime Security

`roles/falco-rules/` installs Falco via Helm and deploys custom rules for Kubernetes-specific threats.

### Install Falco

    helm repo add falcosecurity https://falcosecurity.github.io/charts
    helm repo update
    helm upgrade --install falco falcosecurity/falco \
      --namespace falco --create-namespace \
      --set driver.kind=ebpf \
      --set falcosidekick.enabled=true \
      --set falcosidekick.config.slack.webhookurl="$SLACK_WEBHOOK"

### Custom Rules (`roles/falco-rules/templates/rules.yaml`)

    - rule: Container Escape Attempt via Proc Mount
      desc: Detect access to /proc/*/root which may indicate container escape
      condition: >
        spawned_process and
        proc.name in (sh, bash, python, python3) and
        fd.name startswith /proc and
        fd.name endswith /root and
        not proc.pname in (kubelet)
      output: "Possible container escape (user=%user.name cmd=%proc.cmdline)"
      priority: CRITICAL
      tags: [container, escape]

    - rule: Sensitive File Read in Container
      desc: Detect reads of sensitive host files from within a container
      condition: >
        open_read and container and
        fd.name in (/etc/shadow, /etc/passwd, /root/.ssh/authorized_keys)
      output: "Sensitive file read (file=%fd.name container=%container.name)"
      priority: WARNING

    # allowed_egress_ips must be defined as a Falco list elsewhere in the rules file
    - rule: Unexpected Outbound Connection
      desc: Container making unexpected outbound connections (possible C2)
      condition: >
        outbound and container and
        not fd.rip in (allowed_egress_ips)
      output: "Unexpected outbound (ip=%fd.rip container=%container.name)"
      priority: NOTICE

    # container_started is a macro from Falco's default ruleset; without an event
    # filter like this, the rule would match every event in a privileged container
    - rule: Privileged Container Started
      desc: A privileged container was started
      condition: container_started and container and container.privileged=true
      output: "Privileged container started (container=%container.name image=%container.image)"
      priority: WARNING

Alerts route to Slack via Falco Sidekick. Known-noisy tooling in `kube-system` is allowlisted to suppress false positives.

## Pod Security Standards

`manifests/pod-security-admission/` enforces Pod Security Admission (PSA) across namespaces:

    # Applied to all application namespaces
    apiVersion: v1
    kind: Namespace
    metadata:
      name: production
      labels:
        # Enforce: pods violating restricted policy are rejected
        pod-security.kubernetes.io/enforce: restricted
        pod-security.kubernetes.io/enforce-version: latest
        # Warn: shows warnings for baseline violations
        pod-security.kubernetes.io/warn: baseline
        pod-security.kubernetes.io/warn-version: latest

**Restricted profile enforces:**

- `runAsNonRoot: true`
- `allowPrivilegeEscalation: false`
- `seccompProfile: {type: RuntimeDefault}`
- No `hostPID`, `hostNetwork`, `hostIPC`
- `capabilities: drop: [ALL]`

Apply PSA labels to a namespace:

    ansible-playbook playbooks/network-policies.yml \
      -i inventory/hosts.ini \
      --extra-vars "target_namespace=production apply_psa=true"

## Network Policies

`manifests/network-policies/` provides a baseline of deny-all plus explicit allow rules.
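The checklist later in this README asks for a default-deny policy in every namespace. A minimal sketch of rolling the baseline out across application namespaces follows. The function names and the skip list are assumptions, not part of this repo. `falco` is skipped here on the assumption that Falco Sidekick needs egress to reach Slack.

```shell
#!/bin/sh
# Sketch (not part of the repo): apply the default-deny baseline to every
# non-system namespace. POLICY points at the manifest named in this README.
POLICY="${POLICY:-manifests/network-policies/00-default-deny-all.yaml}"

# filter_app_namespaces
# Reads namespace names on stdin, one per line, with or without the
# "namespace/" prefix printed by `kubectl get namespaces -o name`.
# Drops namespaces on the skip list (an assumption; adjust per cluster).
filter_app_namespaces() {
    sed 's|^namespace/||' |
        grep -Ev '^(kube-system|kube-public|kube-node-lease|falco)$'
}

# apply_default_deny
# Requires kubectl with permission to create NetworkPolicies.
apply_default_deny() {
    kubectl get namespaces -o name | filter_app_namespaces |
        while read -r ns; do
            kubectl apply -f "$POLICY" -n "$ns"
        done
}

# Usage:
#   apply_default_deny
```

Keeping the filter separate from the `kubectl` loop makes the namespace selection easy to dry-run: pipe `kubectl get namespaces -o name` through `filter_app_namespaces` and review the list before applying anything.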
    # Apply baseline (deny all ingress + egress except DNS)
    kubectl apply -f manifests/network-policies/00-default-deny-all.yaml -n production

    # Apply service-specific allow rules
    kubectl apply -f manifests/network-policies/allow-api-to-database.yaml -n production

Policy structure:

    # 00-default-deny-all.yaml: applied first
    spec:
      podSelector: {}                 # matches all pods in namespace
      policyTypes: [Ingress, Egress]
      egress:
        - ports:                      # Allow DNS (required for service discovery)
            - {port: 53, protocol: UDP}   # protocol defaults to TCP, so UDP must be explicit
            - {port: 53, protocol: TCP}

    # allow-api-to-database.yaml
    spec:
      podSelector:
        matchLabels: {app: database}
      ingress:
        - from:
            - podSelector:
                matchLabels: {app: api-service}
          ports:
            - port: 5432

## Running the CIS Scan

`scripts/cis-scan.sh` runs [kube-bench](https://github.com/aquasecurity/kube-bench) against your cluster:

    ./scripts/cis-scan.sh

    # Options:
    ./scripts/cis-scan.sh --target master    # Scan control plane only
    ./scripts/cis-scan.sh --target node      # Scan worker nodes only
    ./scripts/cis-scan.sh --output html      # Generate HTML report
    ./scripts/cis-scan.sh --benchmark gke    # GKE-specific checks

Example output:

    [INFO] 1 Control Plane Components
    [PASS] 1.1.1 Ensure that the API server pod specification file permissions are set to 644 or more restrictive
    [PASS] 1.1.2 Ensure that the API server pod specification file ownership is set to root:root
    [FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set as appropriate
    [WARN] 1.2.9 Ensure that the admission control plugin EventRateLimit is set

    == Summary master ==
    45 checks PASS
    3 checks FAIL
    5 checks WARN
    0 checks INFO

## Inventory Setup

    # inventory/hosts.ini
    [control_plane]
    cp1 ansible_host=10.0.1.10 ansible_user=ubuntu

    [workers]
    worker1 ansible_host=10.0.1.20 ansible_user=ubuntu
    worker2 ansible_host=10.0.1.21 ansible_user=ubuntu

    [k8s:children]
    control_plane
    workers

    [k8s:vars]
    ansible_ssh_private_key_file=~/.ssh/k8s-key
    ansible_python_interpreter=/usr/bin/python3

On managed clusters such as GKE or EKS, where nodes are not reachable over SSH, pass `--skip-tags node-ssh`:

    ansible-playbook playbooks/node-hardening.yml \
      --skip-tags node-ssh  # Applies only K8s API-level changes

## Security Posture Checklist

Run this checklist against any cluster before going to production:

- [ ] kube-bench CIS Level 1: all PASS (or documented exceptions)
- [ ] kube-bench CIS Level 2: known failures reviewed and accepted
- [ ] No service accounts with `cluster-admin` except break-glass accounts
- [ ] No service accounts with wildcard verb permissions in production namespaces
- [ ] All application namespaces have PSA `restricted` label
- [ ] All namespaces have `default-deny-all` NetworkPolicy
- [ ] Falco deployed and alerting to Slack
- [ ] etcd encrypted at rest (`--encryption-provider-config` set)
- [ ] API server audit logging enabled
- [ ] Kubelet anonymous auth disabled (`--anonymous-auth=false`)
- [ ] No `hostPID`, `hostNetwork`, `hostIPC` in production workloads
- [ ] Image pull policy set to `Always` in production (no stale cached images)

## Troubleshooting

**Ansible playbook fails with `Permission denied`**

Ensure the `--become` flag is passed and the SSH user has sudo access.
Test with:

    ansible all -i inventory/hosts.ini -m ping
    ansible all -i inventory/hosts.ini -m command -a "whoami" --become

**Pods rejected after PSA enforcement**

    # Check which PSA policy is failing
    kubectl describe pod <pod-name> -n production
    # Pods created by a Deployment are rejected at creation time, so also check events:
    kubectl get events -n production --field-selector reason=FailedCreate
    # Look for: violates PodSecurity "restricted:latest"
    # Fix the pod spec or relax to baseline for that namespace

**Falco eBPF driver fails to load**

    kubectl logs -n falco daemonset/falco | grep -i ebpf
    # Kernel ≥ 5.8 with BTF (BPF Type Format) is what Falco's modern eBPF probe requires
    # Check: uname -r and ls /sys/kernel/btf/vmlinux
    # Fallback: use kernel module driver instead of eBPF
    # (--namespace targets the release installed above; --reuse-values keeps the Sidekick settings)
    helm upgrade falco falcosecurity/falco --namespace falco --reuse-values --set driver.kind=module

**Network policy blocking legitimate traffic**

    # Debug NetworkPolicy with a throwaway pod in the affected namespace
    kubectl run debug --image=nicolaka/netshoot --rm -it -n <namespace> -- bash
    # Inside: curl <service>:<port>
    # Check: kubectl describe networkpolicy -n <namespace>

*Built to apply CKS-level Kubernetes security hardening to production clusters, based on real-world threat modelling and compliance requirements.*
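For the eBPF driver issue in Troubleshooting, the kernel and BTF prerequisites can be checked on a node before installing Falco. This is a sketch under the thresholds stated there (kernel ≥ 5.8 plus `/sys/kernel/btf/vmlinux`). `version_ge` and `ebpf_preflight` are hypothetical helper names, and only the major and minor version numbers are compared.

```shell
#!/bin/sh
# Sketch (not part of the repo): pre-flight check for the eBPF driver.

# version_ge A B
# Succeeds when kernel version A is at least B, comparing major.minor only.
# Handles suffixes such as "5.15.0-91-generic".
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%[!0-9]*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%[!0-9]*}
    [ "$a_major" -gt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# ebpf_preflight
# Reports whether this node meets the kernel and BTF requirements.
ebpf_preflight() {
    kernel=$(uname -r)
    if version_ge "$kernel" 5.8 && [ -e /sys/kernel/btf/vmlinux ]; then
        echo "ok: kernel $kernel with BTF available"
    else
        echo "not supported: kernel $kernel too old or BTF missing"
        return 1
    fi
}

# Usage on a node:
#   ebpf_preflight || echo "consider the kernel module driver"
```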