# k8s-security-hardening
## Architecture

Ansible Control Node
├── playbooks/node-hardening.yml    ──SSH──► K8s Nodes
├── playbooks/audit-rbac.yml        ──K8s──► ┌─ Kubernetes Cluster ──────────────────┐
└── playbooks/network-policies.yml  ──K8s──► │ kube-bench (CIS L1 + L2)              │
                                             │ Falco eBPF (syscall threat detection) │
                                             │ RBAC Audit Job (overprivileged SAs)   │
                                             │ Pod Security Admission (PSA labels)   │
                                             │ NetworkPolicies (deny-all + allowlist)│
                                             └───────────────────────────────────────┘
                                                                 │
                                                     CIS Scan Report (HTML/JSON)
                                                     RBAC Audit Report
                                                     Falco Runtime Alerts → Slack
## Table of Contents
- [What this covers](#what-this-covers)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [CIS Benchmark Hardening](#cis-benchmark-hardening)
- [RBAC Audit](#rbac-audit)
- [Falco Runtime Security](#falco-runtime-security)
- [Pod Security Standards](#pod-security-standards)
- [Network Policies](#network-policies)
- [Running the CIS Scan](#running-the-cis-scan)
- [Inventory Setup](#inventory-setup)
- [Security Posture Checklist](#security-posture-checklist)
- [Troubleshooting](#troubleshooting)
## What this covers
The CKS certification covers six domains. This repo implements all of them:
| CKS Domain | Implementation |
|-----------|----------------|
| Cluster Setup | `node-hardening.yml` — API server flags, etcd encryption, audit logging |
| Cluster Hardening | `audit-rbac.yml` — overprivileged SA detection, namespace isolation |
| System Hardening | `node-hardening.yml` — AppArmor profiles, seccomp, host path restrictions |
| Minimise Microservice Vulnerabilities | `pod-security/` — PSA Restricted profile, no root containers |
| Supply Chain Security | Manifest scanning hooks, image policy webhook references |
| Monitoring, Logging, Runtime Security | Falco rules — suspicious syscalls, container escape attempts |
## Prerequisites
| Tool | Version |
|------|---------|
| Ansible | ≥ 2.14 |
| Python | ≥ 3.10 (on control node) |
| kubectl | ≥ 1.28 (with cluster admin access) |
| SSH access | To K8s nodes (for node-level hardening) |
Install Ansible collections:
ansible-galaxy install -r requirements.yml
## Quick Start
# 1. Clone
git clone https://github.com/ashiq-ali/k8s-security-hardening
cd k8s-security-hardening
# 2. Configure inventory
cp inventory/example-hosts.ini inventory/hosts.ini
# Edit hosts.ini — add your control-plane and worker node IPs
# 3. Run CIS node hardening
ansible-playbook playbooks/node-hardening.yml \
-i inventory/hosts.ini \
--become \
--ask-become-pass
# 4. Audit RBAC
ansible-playbook playbooks/audit-rbac.yml \
-i inventory/hosts.ini
# 5. Apply network policies
ansible-playbook playbooks/network-policies.yml \
-i inventory/hosts.ini \
--extra-vars "target_namespace=production"
# 6. Run CIS scan report
./scripts/cis-scan.sh
## CIS Benchmark Hardening
`playbooks/node-hardening.yml` applies CIS Kubernetes Benchmark Level 1 and Level 2 controls.
### What it hardens
**API Server (`roles/cis-benchmark/tasks/apiserver.yml`):**
# Applied kube-apiserver flags (via kubeadm config or static pod manifest)
--anonymous-auth=false
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxbackup=10
--audit-log-maxsize=100
--authorization-mode=Node,RBAC
--encryption-provider-config=/etc/kubernetes/encryption.yaml # etcd at-rest encryption
--tls-min-version=VersionTLS12
--enable-admission-plugins=NodeRestriction,PodSecurity
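The `--encryption-provider-config` flag listed above points at a file this README does not show. A minimal sketch of generating one, assuming the `aescbc` provider and encryption of Secrets only; the function name `write_encryption_config` and the key name `key1` are illustrative and not taken from the repo's role:

```shell
#!/bin/sh
# Sketch: write a minimal EncryptionConfiguration that encrypts Secrets.
# Assumption: aescbc with a single key; the repo's role may configure this differently.
write_encryption_config() {
    out="$1"
    # 32 random bytes, base64-encoded, as the aescbc provider expects
    key=$(head -c 32 /dev/urandom | base64 | tr -d '\n')
    (
        umask 077   # a newly created file is readable by its owner only
        cat > "$out" <<EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${key}
      - identity: {}
EOF
    )
}
```

Usage on a control-plane node would look like `write_encryption_config /etc/kubernetes/encryption.yaml`. Keeping `identity: {}` as the last provider lets the API server still read Secrets that were stored before encryption was enabled; those Secrets are only encrypted once they are rewritten.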
**Kubelet (`roles/cis-benchmark/tasks/kubelet.yml`):**
--anonymous-auth=false
--authorization-mode=Webhook
--client-ca-file=/etc/kubernetes/pki/ca.crt
--protect-kernel-defaults=true
--read-only-port=0 # Disable unauthenticated read-only port
--streaming-connection-idle-timeout=5m
**etcd (`roles/cis-benchmark/tasks/etcd.yml`):**
--auto-tls=false
--peer-auto-tls=false
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--key-file=/etc/kubernetes/pki/etcd/server.key
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
**OS-level hardening:**
- Disable swap (K8s requirement, also security hygiene)
- Set kernel parameters: `net.ipv4.conf.all.accept_redirects=0` (hardening) and `net.ipv4.ip_forward=1` (required for pod networking rather than a hardening control)
- Restrict `/proc` filesystem permissions
- AppArmor enabled and set to enforce mode
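The kernel parameters listed above can be spot-checked on a node before or after the playbook runs. The following is a minimal sketch, not part of the repo: `check_sysctl` and the `SYSCTL_ROOT` override are hypothetical names, and the override exists only so the check can be pointed at a fixture directory.

```shell
#!/bin/sh
# Sketch: compare live kernel parameters with expected values.
# SYSCTL_ROOT is a hypothetical override; it defaults to the real /proc/sys tree.
SYSCTL_ROOT="${SYSCTL_ROOT:-/proc/sys}"

check_sysctl() {
    key="$1"; want="$2"
    # Translate dotted names to paths: net.ipv4.ip_forward -> net/ipv4/ip_forward
    path="$SYSCTL_ROOT/$(printf '%s' "$key" | tr '.' '/')"
    if [ ! -r "$path" ]; then
        echo "SKIP $key (not readable)"
        return 0
    fi
    have=$(cat "$path")
    if [ "$have" = "$want" ]; then
        echo "PASS $key=$have"
    else
        echo "FAIL $key=$have (want $want)"
        return 1
    fi
}
```

For example, `check_sysctl net.ipv4.conf.all.accept_redirects 0` prints a `PASS` or `FAIL` line and returns non-zero on a mismatch, so it can be chained in a larger node check.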
### Running in check mode (no changes)
ansible-playbook playbooks/node-hardening.yml \
-i inventory/hosts.ini \
--check --diff
## RBAC Audit
`playbooks/audit-rbac.yml` identifies overprivileged service accounts and RBAC bindings:
ansible-playbook playbooks/audit-rbac.yml -i inventory/hosts.ini
The playbook deploys a Job (`manifests/rbac-audit/audit-job.yaml`) that reports:
RBAC Audit Report — 2024-01-15
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ CRITICAL: ServiceAccount default/my-app has cluster-admin
Binding: ClusterRoleBinding/my-app-admin
Recommendation: Scope to specific namespace + verbs
⚠️ HIGH: ServiceAccount kube-system/coredns can list secrets cluster-wide
Binding: ClusterRoleBinding/coredns
Recommendation: Restrict to ConfigMaps only
✅ OK: ServiceAccount default/api-service — minimal RBAC
✅ OK: ServiceAccount monitoring/prometheus — metrics-reader only
Summary: 2 critical, 1 high, 0 medium, 14 OK
### Manual RBAC audit commands
# Find all ClusterRoleBindings granting cluster-admin
kubectl get clusterrolebindings -o json | \
jq '.items[] | select(.roleRef.name=="cluster-admin") | .metadata.name'
# Find ClusterRoles that grant wildcard verbs (then check which service accounts are bound to them)
kubectl get clusterroles -o json | \
jq -r '.items[] | select(any(.rules[]?.verbs[]?; . == "*")) | .metadata.name'
# Find unused service accounts (no pods using them)
./scripts/cis-scan.sh --check unused-service-accounts
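The cluster-admin query above returns binding names only. A small sketch that narrows the result to ServiceAccount subjects without needing `jq` is shown here; both function names are illustrative and are not part of the repo's scripts.

```shell
#!/bin/sh
# Sketch: list ServiceAccounts bound to cluster-admin through ClusterRoleBindings.
# Function names are hypothetical helpers, not taken from the repo.

# Emits one line per binding: <binding>\t<role>\t<Kind:namespace/name,...>
list_cluster_role_bindings() {
    kubectl get clusterrolebindings -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.roleRef.name}{"\t"}{range .subjects[*]}{.kind}:{.namespace}/{.name},{end}{"\n"}{end}'
}

# Reads those lines on stdin; prints "<binding> <namespace/name>" per ServiceAccount subject
cluster_admin_service_accounts() {
    awk -F '\t' '$2 == "cluster-admin" {
        n = split($3, subjects, ",")
        for (i = 1; i <= n; i++)
            if (subjects[i] ~ /^ServiceAccount:/) {
                sub(/^ServiceAccount:/, "", subjects[i])
                print $1, subjects[i]
            }
    }'
}
```

Run it as `list_cluster_role_bindings | cluster_admin_service_accounts`. Splitting the fetch from the filter keeps the filter usable on saved output, for instance when reviewing a cluster you cannot reach directly.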
## Falco Runtime Security
`roles/falco-rules/` installs Falco via Helm and deploys custom rules for Kubernetes-specific threats.
### Install Falco
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm upgrade --install falco falcosecurity/falco \
--namespace falco --create-namespace \
--set driver.kind=ebpf \
--set falcosidekick.enabled=true \
--set falcosidekick.config.slack.webhookurl="$SLACK_WEBHOOK"
### Custom Rules (`roles/falco-rules/templates/rules.yaml`)
- rule: Container Escape Attempt via Proc Mount
  desc: Detect access to /proc/*/root, which may indicate a container escape
  condition: >
    spawned_process and
    proc.name in (sh, bash, python, python3) and
    fd.name startswith /proc and
    fd.name endswith /root and
    not proc.pname in (kubelet)
  output: "Possible container escape (user=%user.name cmd=%proc.cmdline)"
  priority: CRITICAL
  tags: [container, escape]
- rule: Sensitive File Read in Container
  desc: Detect reads of sensitive files from within a container
  condition: >
    open_read and container and
    fd.name in (/etc/shadow, /etc/passwd, /root/.ssh/authorized_keys)
  output: "Sensitive file read (file=%fd.name container=%container.name)"
  priority: WARNING
- rule: Unexpected Outbound Connection
  desc: Container making unexpected outbound connections (possible C2)
  condition: >
    outbound and container and
    not fd.rip in (allowed_egress_ips)
  output: "Unexpected outbound (ip=%fd.rip container=%container.name)"
  priority: NOTICE
- rule: Privileged Container Started
  desc: A privileged container was started
  # container_started limits the rule to start events; like the other macros used here,
  # it comes from Falco's default ruleset
  condition: container_started and container and container.privileged=true
  output: "Privileged container started (container=%container.name image=%container.image)"
  priority: WARNING
Alerts are routed to Slack through Falcosidekick. Namespaces that run trusted system tooling (such as `kube-system`) are allowlisted to suppress known-noisy alerts.
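The "Unexpected Outbound Connection" rule references a list named `allowed_egress_ips` that is not shown in this README. A hedged sketch of rendering such a list follows; `render_egress_allowlist` is a hypothetical helper and the IP addresses are placeholders, not values from the repo. The quoting style follows the IP lists in Falco's default ruleset.

```shell
#!/bin/sh
# Sketch: render the Falco list that the outbound-connection rule refers to.
# The IPs passed in are placeholders; supply your own egress targets.
render_egress_allowlist() {
    items=""
    for ip in "$@"; do
        # Each IP is written as a double-quoted literal wrapped in single quotes
        items="${items:+$items, }'\"$ip\"'"
    done
    printf '%s\n' "- list: allowed_egress_ips" "  items: [$items]"
}
```

For example, `render_egress_allowlist 10.0.0.10 10.0.0.11` prints a two-line list definition. It should be placed ahead of the rule that uses it in the rules file, since Falco resolves lists and macros in file order.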
## Pod Security Standards
`manifests/pod-security-admission/` enforces PSA across namespaces:
# Applied to all application namespaces
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Enforce: pods violating restricted policy are rejected
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Warn: shows warnings for baseline violations
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest
**Restricted profile enforces:**
- `runAsNonRoot: true`
- `allowPrivilegeEscalation: false`
- `seccompProfile: {type: RuntimeDefault}`
- No `hostPID`, `hostNetwork`, `hostIPC`
- `capabilities: drop: [ALL]`
Apply PSA labels to a namespace:
ansible-playbook playbooks/network-policies.yml \
-i inventory/hosts.ini \
--extra-vars "target_namespace=production apply_psa=true"
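Before enforcing a level on a namespace that already runs workloads, it can help to preview the impact. The sketch below uses a server-side dry run of `kubectl label`; `psa_enforce_label` and `psa_dry_run` are hypothetical helper names, not part of the repo's playbooks.

```shell
#!/bin/sh
# Sketch: preview which existing pods would violate a PSA level before enforcing it.
# Helper names are illustrative only.
psa_enforce_label() {
    level="$1"
    case "$level" in
        privileged|baseline|restricted)
            printf 'pod-security.kubernetes.io/enforce=%s\n' "$level" ;;
        *)
            echo "unknown PSA level: $level" >&2
            return 1 ;;
    esac
}

psa_dry_run() {
    ns="$1"; level="${2:-restricted}"
    label=$(psa_enforce_label "$level") || return 1
    # --dry-run=server evaluates the change on the API server without persisting it;
    # the returned warnings name the pods that would violate the new level
    kubectl label --dry-run=server --overwrite ns "$ns" "$label"
}
```

Calling `psa_dry_run production restricted` leaves the namespace unchanged, so it is safe to run against a live cluster before applying the labels for real.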
## Network Policies
`manifests/network-policies/` provides a baseline deny-all + explicit allow pattern.
# Apply baseline (deny all ingress + egress except DNS)
kubectl apply -f manifests/network-policies/00-default-deny-all.yaml -n production
# Apply service-specific allow rules
kubectl apply -f manifests/network-policies/allow-api-to-database.yaml -n production
Policy structure:
# 00-default-deny-all.yaml (applied first)
spec:
  podSelector: {}    # matches all pods in namespace
  policyTypes: [Ingress, Egress]
  egress:
    - ports:         # Allow DNS (required for service discovery)
        - {port: 53, protocol: UDP}   # protocol defaults to TCP when omitted
        - {port: 53, protocol: TCP}
# allow-api-to-database.yaml
spec:
  podSelector:
    matchLabels: {app: database}
  ingress:
    - from:
        - podSelector:
            matchLabels: {app: api-service}
      ports:
        - port: 5432
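The policy structure above shows only the `spec` portion. As a sketch of what a complete baseline manifest could look like for an arbitrary namespace, the function below prints one to stdout; the function name is hypothetical and the repo's actual file may differ. DNS is allowed over both UDP and TCP because a port entry without `protocol` matches TCP only.

```shell
#!/bin/sh
# Sketch: print a complete default-deny NetworkPolicy for one namespace.
# Assumption: policy name and layout are illustrative, not copied from the repo.
render_default_deny() {
    ns="$1"
    cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
  egress:
    - ports:
        - {port: 53, protocol: UDP}
        - {port: 53, protocol: TCP}
EOF
}
```

The output can be piped straight into the cluster with `render_default_deny production | kubectl apply -f -`, or reviewed first with `kubectl apply --dry-run=client -f -`.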
## Running the CIS Scan
`scripts/cis-scan.sh` runs [kube-bench](https://github.com/aquasecurity/kube-bench) against your cluster:
./scripts/cis-scan.sh
# Options:
./scripts/cis-scan.sh --target master # Scan control plane only
./scripts/cis-scan.sh --target node # Scan worker nodes only
./scripts/cis-scan.sh --output html # Generate HTML report
./scripts/cis-scan.sh --benchmark gke # GKE-specific checks
Example output:
[INFO] 1 Control Plane Components
[PASS] 1.1.1 Ensure that the API server pod specification file permissions are set to 644 or more restrictive
[PASS] 1.1.2 Ensure that the API server pod specification file ownership is set to root:root
[FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set as appropriate
[WARN] 1.2.9 Ensure that the admission control plugin EventRateLimit is set
== Summary master ==
45 checks PASS
3 checks FAIL
5 checks WARN
0 checks INFO
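Text output in the format shown above can be used to gate a pipeline. This is a minimal sketch under the assumption that each result line starts with a bracketed status, as in the example; `count_status` and `gate_on_failures` are hypothetical names and not options of `cis-scan.sh`.

```shell
#!/bin/sh
# Sketch: count result lines in kube-bench text output and fail on any [FAIL].
# Both functions read the report from stdin.
count_status() {
    # $1 is PASS, FAIL, WARN or INFO; grep -c prints 0 and exits 1 on no match
    grep -c "^\[$1]" || true
}

gate_on_failures() {
    fails=$(count_status FAIL)
    echo "kube-bench FAIL count: $fails"
    [ "$fails" -eq 0 ]
}
```

A CI step could then run `./scripts/cis-scan.sh | gate_on_failures`, which returns non-zero whenever at least one check failed.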
## Inventory Setup
# inventory/hosts.ini
[control_plane]
cp1 ansible_host=10.0.1.10 ansible_user=ubuntu
[workers]
worker1 ansible_host=10.0.1.20 ansible_user=ubuntu
worker2 ansible_host=10.0.1.21 ansible_user=ubuntu
[k8s:children]
control_plane
workers
[k8s:vars]
ansible_ssh_private_key_file=~/.ssh/k8s-key
ansible_python_interpreter=/usr/bin/python3
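A quick way to confirm the inventory groups contain what you expect, before any playbook touches a node, is sketched below. `inventory_group_hosts` is a hypothetical helper that handles only the simple INI layout shown above.

```shell
#!/bin/sh
# Sketch: print the entries listed under one group of an INI inventory.
# Handles only the flat layout shown in this README (no ranges, no inline vars sections).
inventory_group_hosts() {
    file="$1"
    awk -v group="[$2]" '
        /^\[/ { in_group = ($0 == group); next }          # section header toggles the flag
        in_group && NF && $1 !~ /^[#;]/ { print $1 }      # first field is the host alias
    ' "$file"
}
```

For example, `inventory_group_hosts inventory/hosts.ini workers` prints one host alias per line. For anything beyond a spot check, `ansible-inventory -i inventory/hosts.ini --graph` shows how Ansible itself resolves the groups.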
For GKE/EKS where you don't have SSH to nodes, use the `--skip-tags node-ssh` flag:
ansible-playbook playbooks/node-hardening.yml \
--skip-tags node-ssh # Applies only K8s API-level changes
## Security Posture Checklist
Run this checklist against any cluster before going to production:
- [ ] kube-bench CIS Level 1 — all PASS (or documented exceptions)
- [ ] kube-bench CIS Level 2 — known failures reviewed and accepted
- [ ] No service accounts with `cluster-admin` except break-glass accounts
- [ ] No service accounts with wildcard verb permissions in production namespaces
- [ ] All application namespaces have PSA `restricted` label
- [ ] All namespaces have `default-deny-all` NetworkPolicy
- [ ] Falco deployed and alerting to Slack
- [ ] etcd encrypted at rest (`--encryption-provider-config` set)
- [ ] API server audit logging enabled
- [ ] Kubelet anonymous auth disabled (`--anonymous-auth=false`)
- [ ] No `hostPID`, `hostNetwork`, `hostIPC` in production workloads
- [ ] Image pull policy set to `Always` in production (no stale cached images)
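Some checklist items can be verified mechanically. As one example, the sketch below lists namespaces that carry no PSA `enforce` label; the function names are hypothetical, and the filter assumes the default column layout of `kubectl get namespaces` (name, status, age) followed by the label column.

```shell
#!/bin/sh
# Sketch: find namespaces without a pod-security enforce label.
# Helper names are illustrative and not part of the repo.
list_namespace_psa() {
    # -L appends the label value as an extra column; it is empty when the label is unset
    kubectl get namespaces --no-headers -L pod-security.kubernetes.io/enforce
}

namespaces_without_psa() {
    # Rows with fewer than four fields have no value in the label column
    awk 'NF < 4 { print $1 }'
}
```

Run it as `list_namespace_psa | namespaces_without_psa` and review the output against your documented exceptions (system namespaces are often left unlabeled on purpose).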
## Troubleshooting
**Ansible playbook fails with `Permission denied`**
Ensure the `--become` flag is passed and the SSH user has sudo access. Test with:
ansible all -i inventory/hosts.ini -m ping
ansible all -i inventory/hosts.ini -m command -a "whoami" --become
**Pods rejected after PSA enforcement**
# Pods rejected at admission are never created, so check the owning controller's events
kubectl get events -n production --field-selector reason=FailedCreate
# Look for: violates PodSecurity "restricted:latest"
# Fix the pod spec or relax to baseline for that namespace
**Falco eBPF driver fails to load**
kubectl logs -n falco daemonset/falco | grep -i ebpf
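# Falco's modern eBPF probe requires kernel ≥ 5.8 and BTF (BPF Type Format)
# Check: uname -r and ls /sys/kernel/btf/vmlinux
# Fallback: use the kernel module driver instead of eBPF
# (the release lives in the falco namespace; --reuse-values keeps the Sidekick settings)
helm upgrade falco falcosecurity/falco --namespace falco --reuse-values --set driver.kind=module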
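The kernel and BTF checks mentioned above can be scripted so they run the same way on every node. A minimal sketch, assuming the 5.8 threshold stated in this section; `kernel_at_least` and `ebpf_ready` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: check the kernel version and BTF availability on a node.
# kernel_at_least MAJOR MINOR [RELEASE]; RELEASE defaults to the running kernel.
kernel_at_least() {
    want_major="$1"; want_minor="$2"; release="${3:-$(uname -r)}"
    major=${release%%.*}          # "5.15.0-91-generic" -> "5"
    rest=${release#*.}            # -> "15.0-91-generic"
    minor=${rest%%[!0-9]*}        # -> "15"
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

ebpf_ready() {
    # Both conditions from this section: kernel >= 5.8 and BTF data exposed by the kernel
    kernel_at_least 5 8 && [ -r /sys/kernel/btf/vmlinux ]
}
```

On a node, `ebpf_ready && echo "eBPF prerequisites met"` gives a quick yes or no before deciding whether to fall back to the kernel module driver.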
**Network policy blocking legitimate traffic**
# Debug NetworkPolicy with an ephemeral pod
kubectl run debug --image=nicolaka/netshoot --rm -it -- bash
# Inside: curl <service>:<port>
# Check: kubectl describe networkpolicy -n <namespace>
*Built to apply CKS-level Kubernetes security hardening to production clusters, based on real-world threat modelling and compliance requirements.*