codewithbrandon/cloud-threat-detection
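A production-grade Kubernetes runtime security platform that combines Prometheus metric alerting, Loki log analysis, and Falco eBPF syscall monitoring into three layers of defense-in-depth detection.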
[**Quick Start**](#-quick-start) • [**Architecture**](#-architecture) • [**Simulating Attacks**](#-simulating-attacks) • [**Alert Reference**](#-alert-reference) • [**Interview Talking Points**](#-interview-talking-points)
## The Problem This Solves
```
Most Kubernetes environments have zero runtime visibility.
A compromised container can exfiltrate data, pivot laterally,
and mine crypto for weeks before anyone notices.
```
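| Before | After |
|--------|-------|
| Shell spawned in a container → **silence** | Shell spawn → Falco fires in **< 1 second** |
| 500 failed logins → **nobody knows** | 10 failed logins / 2 min → Slack alert + playbook link |
| Memory exhaustion → **surprise outage** | 75% memory threshold → warning before the OOM kill |
| Pod crash loop → **users report it** | 3 restarts / 15 min → PagerDuty page fires |
| "What do we do?" → **ad hoc response** | SEC-001, SEC-002 playbooks → 15-minute containment SLA |
## Architecture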
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CLOUD-NATIVE THREAT DETECTION PLATFORM │
│ Kubernetes Namespace: threat-detection │
└─────────────────────────────────────────────────────────────────────────────────┘
╔═══════════════════════════════════════════════════════════════════════════════╗
║ ATTACK SIMULATION LAYER ║
║ ┌──────────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ ║
║ │ brute_force.py │ │ cpu_spike.py │ │ memory_ex.py │ │ kill_chain │ ║
║ │ T1110 BruteForce│ │ T1499 DoS │ │ T1499.004 │ │ Full APT sim│ ║
║ └────────┬─────────┘ └──────┬───────┘ └──────┬───────┘ └──────┬──────┘ ║
╚═══════════╪════════════════════╪═════════════════╪══════════════════╪════════╝
│ HTTP Requests │ │ │
╔═══════════▼════════════════════▼═════════════════▼══════════════════▼════════╗
║ APPLICATION LAYER (Python Flask + Gunicorn) ║
║ ║
║ /health /ready /metrics /login /load /memory /exec /probe ║
║ ║
║ UID 1001 │ ReadOnlyRootFS │ No SA token │ Drop ALL caps │ Seccomp ║
║ Prometheus metrics client → Counter, Gauge, Histogram ║
║ Structured JSON logs → stdout → captured by Promtail ║
╚══════════════════════════════╤═══════════════════════════════════════════════╝
│
┌──────────────────┴──────────────────┐
│ │
╔═══════════▼═══════════════╗ ╔═════════════▼═════════════╗
║ METRICS PIPELINE ║ ║ LOGGING PIPELINE ║
║ ║ ║ ║
║ ┌─────────────────────┐ ║ ║ ┌─────────────────────┐ ║
║ │ Prometheus │ ║ ║ │ Promtail │ ║
║ │ Scrapes /metrics │ ║ ║ │ DaemonSet per node │ ║
║ │ every 15 seconds │ ║ ║ │ Pipeline stages │ ║
║ │ 15-day retention │ ║ ║ │ Drop probe noise │ ║
║ └──────────┬──────────┘ ║ ║ └──────────┬──────────┘ ║
║ │ Evaluates ║ ║ │ Ships to ║
║ │ 12 rules ║ ║ ┌──────────▼──────────┐ ║
║ ┌──────────▼──────────┐ ║ ║ │ Loki │ ║
║ │ Alertmanager │ ║ ║ │ Label-indexed logs │ ║
║ │ Routing by │◄─╫───────╫──│ 4 LogQL alert rules│ ║
║ │ severity + team │ ║ ║ │ 30-day retention │ ║
║ │ Dedup + Inhibition │ ║ ║ └─────────────────────┘ ║
║ └──────────┬──────────┘ ║ ╚═══════════════════════════╝
╚═════════════╪═════════════╝
│
╔═════════════▼═══════════════════════════════════════════════════════════════╗
║ NOTIFICATION LAYER ║
║ ┌─────────────────┐ ┌──────────────────┐ ┌───────────────────────┐ ║
║ │ Slack Webhook │ │ PagerDuty │ │ Email (SMTP) │ ║
║ │ #sec-incidents │ │ Critical pages │ │ Email (configurable) │ ║
║ │ #platform-ops │ │ 30min re-alert │ │ HTML template │ ║
║ └─────────────────┘ └──────────────────┘ └───────────────────────┘ ║
╚═════════════════════════════════════════════════════════════════════════════╝
╔═════════════════════════════════════════════════════════════════════════════╗
║ RUNTIME SECURITY LAYER ── Falco eBPF Syscall Interception ║
║ ║
║ Every node │ Every container │ Every syscall ║
║ ║
║ exec() → shell_spawned_in_container (T1059.004) CRITICAL ║
║ connect() → unexpected_outbound_connection (T1071) HIGH ║
║ open(WRITE) → write_sensitive_file (T1222) CRITICAL ║
║ setuid() → privilege_escalation_attempt (T1068) CRITICAL ║
║ open(/proc) → proc_filesystem_access (T1057) CRITICAL ║
║ ║
║ Falco → Falcosidekick → Alertmanager + Loki + Slack ║
╚═════════════════════════════════════════════════════════════════════════════╝
╔═════════════════════════════════════════════════════════════════════════════╗
║ ENFORCEMENT LAYER ║
║ ┌──────────────────────┐ ┌─────────────────────┐ ┌──────────────────┐ ║
║ │ NetworkPolicy │ │ ResourceQuota │ │ Pod Security │ ║
║ │ Default deny all │ │ 4 CPU / 4Gi hard │ │ Admission │ ║
║ │ 8 allowlist rules │ │ 20 pod limit │ │ restricted │ ║
║ │ Zero-trust east- │ │ No LoadBalancer │ │ profile enforced│ ║
║ │ west traffic │ │ No NodePort │ │ on namespace │ ║
║ └──────────────────────┘ └─────────────────────┘ └──────────────────┘ ║
╚═════════════════════════════════════════════════════════════════════════════╝
```
## Detection Flow
```
ATTACK OCCURS
│
▼
──────────────────────────────────────────────────────────────
LAYER 1 │ METRIC SIGNAL ~15-30s
──────────────────────────────────────────────────────────────
Flask Prometheus client increments counter
failed_logins_total{source_ip="x.x.x.x"} += 1
Prometheus scrapes /metrics every 15s
Alert rule evaluates: increase(failed_logins_total[2m]) > 10
PENDING → FIRING after `for:` duration
│
▼
──────────────────────────────────────────────────────────────
LAYER 2 │ LOG SIGNAL ~5-15s
──────────────────────────────────────────────────────────────
Structured log emitted:
AUTHENTICATION_FAILURE user=admin source_ip=x.x.x.x
Promtail pipeline extracts label: event_type=AUTHENTICATION_FAILURE
Loki ingests log stream with labels
LogQL rule fires: count_over_time > threshold
Loki ruler → Alertmanager → second correlated alert
│
▼
──────────────────────────────────────────────────────────────
LAYER 3 │ RUNTIME SIGNAL (exec/file/network) ~< 1s
──────────────────────────────────────────────────────────────
eBPF probe intercepts exec() syscall
Falco matches rule: shell_spawned_in_container
Falcosidekick fans out to Alertmanager + Loki + Slack
Correlation: same pod, overlapping time window
│
▼
──────────────────────────────────────────────────────────────
RESPONSE │ SOC ACTION
──────────────────────────────────────────────────────────────
Analyst receives Slack alert with direct playbook link
Opens Grafana: correlates metrics + logs + Falco events
Executes playbook: isolate → preserve evidence → eradicate
MTTC target: 15 minutes from first alert
```
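To make the Layer 1 timing concrete, the sketch below replays counter samples taken at the 15-second scrape interval and walks an alert through inactive, pending, and firing. It is an approximation for illustration only: `increase()` here is a plain last-minus-first difference (real PromQL extrapolates and handles counter resets), and the 15-second `for:` value is an assumption, not a value read from `alert-rules.yaml`.

```python
SCRAPE_INTERVAL_S = 15   # scrape interval from the detection flow
WINDOW_S = 120           # the 2-minute range in the alert expression
THRESHOLD = 10           # more than 10 failures per window
FOR_S = 15               # hypothetical `for:` duration (assumption)


def increase(samples, now, window=WINDOW_S):
    """Approximate PromQL increase(): last minus first sample inside the window."""
    in_window = [value for ts, value in samples if now - window <= ts <= now]
    return in_window[-1] - in_window[0] if len(in_window) >= 2 else 0


def evaluate(samples, for_s=FOR_S):
    """Yield (time, state) per scrape; state is inactive, pending, or firing."""
    pending_since = None
    seen = []
    for ts, value in samples:
        seen.append((ts, value))
        if increase(seen, ts) > THRESHOLD:
            if pending_since is None:
                pending_since = ts
            state = "firing" if ts - pending_since >= for_s else "pending"
        else:
            pending_since = None
            state = "inactive"
        yield ts, state


# Counter values seen at each scrape while an attacker adds 75 failures per interval.
samples = [(0, 0), (15, 75), (30, 150), (45, 225)]
for ts, state in evaluate(samples):
    print(f"t={ts:>3}s  {state}")
```

Under these assumptions the rule turns pending on the second scrape and fires on the third, which is why the metric layer reacts in tens of seconds rather than instantly.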
## Repository Structure
```
cloud-threat-detection/
│
├── 📦 app/
│ ├── app.py # Flask app — 10 endpoints, Prometheus metrics, attack surfaces
│ └── requirements.txt # Pinned Python dependencies
│
├── 🐳 docker/
│ ├── Dockerfile # Multi-stage build, UID 1001, readOnlyRootFS, health checks
│ └── .dockerignore
│
├── ☸️ k8s/
│ ├── namespace.yaml # PSA restricted enforcement
│ ├── serviceaccount.yaml # No API token mounted
│ ├── configmap.yaml
│ ├── deployment.yaml # Full securityContext, probes, resource limits
│ ├── service.yaml # ClusterIP only (no external exposure)
│ ├── network-policy.yaml # Default-deny + 8 allowlist rules
│ └── resource-quota.yaml # Namespace CPU/memory/object caps
│
├── 📊 monitoring/
│ ├── prometheus/
│ │ ├── prometheus-config.yaml # Scrape configs, pod discovery, self-monitoring
│ │ ├── alert-rules.yaml # 12 production alert rules across 6 groups
│ │ └── prometheus-deployment.yaml
│ │
│ ├── alertmanager/
│ │ ├── alertmanager-config.yaml # Routing tree, receivers, inhibition rules
│ │ └── alertmanager-deployment.yaml # Webhook simulator included
│ │
│ ├── loki/
│ │ ├── loki-config.yaml # Loki + 4 LogQL alert rules
│ │ └── loki-deployment.yaml # Loki StatefulSet + Promtail DaemonSet
│ │
│ ├── falco/
│ │ ├── falco-rules.yaml # 9 custom rules with MITRE ATT&CK mapping
│ │ └── falco-deployment.yaml # DaemonSet + Falcosidekick + RBAC
│ │
│ └── grafana/
│ └── grafana-deployment.yaml # Datasource provisioning (Prometheus + Loki)
│
├── 💥 attacks/
│ ├── brute_force.py # Sequential + distributed credential stuffing
│ ├── cpu_spike.py # CPU exhaustion with alert monitoring
│ ├── memory_exhaustion.py # Escalating memory pressure + OOM simulation
│ └── suspicious_commands.py # Full kill chain: recon → persistence → C2
│
├── 📋 docs/
│ ├── incident-playbook-brute-force.md # SEC-001 with forensic queries + containment
│ ├── incident-playbook-container-compromise.md # SEC-002 with 15min MTTC target
│ └── threat-model.md # STRIDE + MITRE ATT&CK for Containers
│
├── docker-compose.yaml # Local dev stack (no K8s required)
├── Makefile # deploy / attack / port-forward / verify targets
└── README.md
```
## Quick Start
### Prerequisites
| Tool | Version | Purpose |
|------|---------|---------|
| `kubectl` | 1.28+ | Cluster management |
| `helm` | 3.x | Falco deployment |
| `python3` | 3.10+ | Attack simulation scripts |
| `docker` | 24+ | Image builds |
| CNI | Calico / Cilium | NetworkPolicy enforcement |
### Option A: Full Kubernetes Deployment
```
# Clone
git clone https://github.com/codewithbrandon/cloud-threat-detection.git
cd cloud-threat-detection
# Deploy everything with make
make deploy # namespace + monitoring + app + network policies
make deploy-falco # Falco via Helm (requires Linux node for eBPF)
# Access the dashboards
make port-forward
# Verify stack health
make verify
```
### Option B: Local Docker Compose (no K8s required)
```
# Start the full monitoring stack locally
docker-compose up -d
# Verify that all containers are running
docker-compose ps
# Follow the application logs
docker-compose logs -f app
```
### Access Points
| Service | URL | Credentials |
|---------|-----|-------------|
| Grafana | http://localhost:3000 | anonymous viewer |
| Prometheus | http://localhost:9090 | none |
| Alertmanager | http://localhost:9093 | none |
| App | http://localhost:8080 | n/a |
## Simulating Attacks
### Brute-Force Login: `T1110`
```
# Sequential: rapid fire from a single IP (exercises the per-IP Prometheus threshold)
python3 attacks/brute_force.py \
--target http://localhost:8080 \
--mode sequential --count 25 --rate 5
# Distributed: many IPs (exercises the global Loki LogQL threshold)
python3 attacks/brute_force.py \
--target http://localhost:8080 \
--mode distributed --count 60 --concurrency 4
```
**Triggers:** `ExcessiveFailedLogins` → `BruteForceAttackCritical` → `BruteForceInLogs`
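The two modes exist because they stress different thresholds. The sketch below is a simplified model of that idea, not the logic in `brute_force.py` or the alert rules: it counts failures per source IP and in total for one window. The per-IP threshold of 10 comes from `ExcessiveFailedLogins`; the global threshold of 50 and the spread over 10 addresses are assumptions chosen for illustration.

```python
from collections import Counter

PER_IP_THRESHOLD = 10   # ExcessiveFailedLogins: >10 failures / 2 min / IP
GLOBAL_THRESHOLD = 50   # hypothetical value for the global log-based rule


def classify(failures):
    """failures: one source IP per failed login observed in the window."""
    per_ip = Counter(failures)
    return {
        "per_ip_alert": sorted(ip for ip, n in per_ip.items() if n > PER_IP_THRESHOLD),
        "global_alert": len(failures) > GLOBAL_THRESHOLD,
    }


# Sequential mode: 25 attempts from one address.
sequential = ["10.0.0.5"] * 25
# Distributed mode: 60 attempts spread evenly over 10 addresses (6 each).
distributed = [f"10.0.1.{i % 10}" for i in range(60)]

print(classify(sequential))   # {'per_ip_alert': ['10.0.0.5'], 'global_alert': False}
print(classify(distributed))  # {'per_ip_alert': [], 'global_alert': True}
```

In this model the sequential run trips only the per-IP check, while the distributed run stays under every per-IP limit and is caught only by the global count.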
**Verify:**
```
# Prometheus
curl -s http://localhost:9090/api/v1/query \
--data 'query=sum(increase(failed_logins_total[2m]))by(source_ip)'
# Loki (in Grafana Explore)
{app="threat-detection-app"} |= "AUTHENTICATION_FAILURE" | json
```
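The Loki query above relies on the app writing one JSON object per log line. As a rough sketch of that contract, the code below builds such a line and then mimics the query's two steps: a substring filter followed by JSON parsing. The field names are assumptions for illustration and are not copied from `app.py`.

```python
import json
from datetime import datetime, timezone


def auth_failure_log(user, source_ip):
    """Build one JSON log line; field names are assumed, not taken from app.py."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": "WARNING",
        "event_type": "AUTHENTICATION_FAILURE",
        "user": user,
        "source_ip": source_ip,
    })


def logql_like_filter(lines, needle="AUTHENTICATION_FAILURE"):
    """Mimic `|= "..." | json`: substring match first, then parse the survivors."""
    for line in lines:
        if needle not in line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Simplification: this sketch drops unparsable lines, whereas Loki
            # keeps them and marks them with an error label.
            continue


lines = [
    auth_failure_log("admin", "203.0.113.7"),
    '{"event_type": "REQUEST", "path": "/health"}',
    "AUTHENTICATION_FAILURE but not JSON",
]
for record in logql_like_filter(lines):
    print(record["user"], record["source_ip"])
```

Because the filter runs before parsing, lines that do not contain the event name are never parsed, which keeps the query cheap.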
### CPU Spike: `T1499`
```
python3 attacks/cpu_spike.py \
--target http://localhost:8080 \
--intensity 0.9 --duration 120
```
**Triggers:** `HighCPUUsage` (>75% for 2 min) → `CriticalCPUSpike` (>95% for 1 min)
### Memory Exhaustion: `T1499.004`
```
# Escalate to 480MB in 4 steps (limit: 512MB)
python3 attacks/memory_exhaustion.py \
--target http://localhost:8080 \
--mode escalating --size 480 --steps 4 --hold 30
```
**Triggers:** `HighMemoryUsage` → `MemoryExhaustionCritical` → Kubernetes OOM kill → `PodCrashLoopDetected`
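A quick arithmetic check shows which step should cross which threshold. This sketch assumes the script allocates in equal increments, treats the sizes as MiB, and ignores the app's baseline memory; none of that is confirmed by `memory_exhaustion.py`. The 384Mi and 460Mi thresholds are the ones listed for `HighMemoryUsage` and `MemoryExhaustionCritical`.

```python
LIMIT_MIB = 512
WARNING_MIB = 384    # HighMemoryUsage threshold (75% of the limit)
CRITICAL_MIB = 460   # MemoryExhaustionCritical threshold


def escalation_plan(size_mib=480, steps=4):
    """Return (allocated_mib, tier) per step, assuming equal increments and zero baseline."""
    plan = []
    for i in range(1, steps + 1):
        allocated = size_mib * i // steps
        if allocated > CRITICAL_MIB:
            tier = "critical"
        elif allocated > WARNING_MIB:
            tier = "warning"
        else:
            tier = "ok"
        plan.append((allocated, tier))
    return plan


for allocated, tier in escalation_plan():
    print(f"{allocated:>4} MiB  {allocated / LIMIT_MIB:.0%} of limit  {tier}")
```

With a zero baseline the steps are 120, 240, 360, and 480 MiB, so the third step stays just under the 384Mi warning line and only the final step exceeds 460Mi. In practice the app's own memory footprint adds to each step, which is presumably what pushes the third step into warning territory.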
### Full Kill Chain: `T1059 → T1057 → T1222 → T1071`
```
# Runs: recon → network discovery → persistence → C2 beaconing
python3 attacks/suspicious_commands.py \
--target http://localhost:8080 \
--scenario kill-chain
```
**Triggers:** Falco `shell_spawned_in_container` + `unexpected_outbound_connection` + `write_sensitive_file`
```
# Watch Falco alerts in real time
kubectl logs -n threat-detection -l app=falco -f | \
jq '{rule: .rule, priority: .priority, pod: .output_fields."k8s.pod.name"}'
```
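For scripted checks, the same projection as the `jq` filter can be done in Python. This is a minimal sketch that assumes Falco is emitting JSON output, as the `jq` command also does; the sample event is hand-written and was not captured from a real cluster.

```python
import json


def summarize(line):
    """Return rule, priority, and pod from one Falco JSON event, or None if unparsable."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None  # non-JSON lines, such as startup messages, are skipped
    if not isinstance(event, dict):
        return None
    fields = event.get("output_fields") or {}
    return {
        "rule": event.get("rule"),
        "priority": event.get("priority"),
        "pod": fields.get("k8s.pod.name"),
    }


# Hand-written sample event for illustration only.
sample = json.dumps({
    "rule": "shell_spawned_in_container",
    "priority": "Critical",
    "output_fields": {"k8s.pod.name": "threat-detection-app-0"},
})
print(summarize(sample))
```

To use it on live output, pipe `kubectl logs` into a small script that calls `summarize` on each line of standard input and prints the non-`None` results.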
### Run Everything
```
make attack-all
```
## Alert Reference
| Alert | Trigger Condition | Severity | Channel | Response SLA |
|-------|-------------------|----------|---------|--------------|
| `ExcessiveFailedLogins` | >10 failures / 2 min / IP | ⚠️ WARNING | #security-alerts | 5 min |
| `BruteForceAttackCritical` | >50 failures / 1 min / IP | 🔴 CRITICAL | #security-incidents + page | Immediate |
| `HighCPUUsage` | CPU >75% for 2 min | ⚠️ WARNING | #platform-alerts | 15 min |
| `CriticalCPUSpike` | CPU >95% for 1 min | 🔴 CRITICAL | #platform-oncall + page | 5 min |
| `HighMemoryUsage` | Memory >384Mi for 2 min | ⚠️ WARNING | #platform-alerts | 15 min |
| `MemoryExhaustionCritical` | Memory >460Mi | 🔴 CRITICAL | #platform-oncall + page | 5 min |
| `High5xxErrorRate` | 5xx >5% for 2 min | ⚠️ WARNING | #platform-alerts | 15 min |
| `ServiceUnavailable` | 5xx >50% for 1 min | 🔴 CRITICAL | #platform-oncall + page | 5 min |
| `PodCrashLoopDetected` | >3 restarts / 15 min | 🔴 CRITICAL | #platform-oncall + page | 5 min |
| `SuspiciousActivityDetected` | `suspicious_activity_total` > 0 | ⚠️ WARNING | #security-alerts | 10 min |
| `PrometheusTargetDown` | Scrape target down for 2 min | 🔴 CRITICAL | #platform-oncall | 5 min |
| `WatchdogHeartbeat` | Always firing (dead man's switch) | 🔵 NONE | External monitor | N/A |
## Falco Runtime Rules
| Rule | Syscall Trigger | MITRE Technique | Priority |
|------|-----------------|-----------------|----------|
| `shell_spawned_in_container` | exec() → sh/bash/zsh | T1059.004 | 🔴 CRITICAL |
| `unexpected_outbound_connection` | connect() to a non-allowlisted IP | T1071 | 🟠 HIGH |
| `write_sensitive_file` | open(WRITE) under /etc, /bin, /usr | T1222 | 🔴 CRITICAL |
| `dangerous_binary_in_container` | exec() → wget, curl, nc, nmap | T1105 | 🟠 HIGH |
| `container_running_as_root` | spawned_process, UID=0 | T1078 | 🟠 HIGH |
| `proc_filesystem_access` | open() on /proc/1, /proc/kcore | T1057 | 🔴 CRITICAL |
| `crypto_miner_detected` | exec() → xmrig, stratum+tcp | T1496 | 🔴 CRITICAL |
| `k8s_secret_access_in_container` | open() under /var/run/secrets | T1552 | 🔴 CRITICAL |
| `privilege_escalation_attempt` | setuid()/setgid() succeeds | T1068 | 🔴 CRITICAL |
## Security Controls
```
Container Layer
✅ Non-root user (UID 1001) ✅ Read-only root filesystem
✅ No privilege escalation ✅ Drop ALL Linux capabilities
✅ Seccomp RuntimeDefault profile ✅ Multi-stage minimal image
Pod Layer
✅ No ServiceAccount token mounted ✅ Dedicated ServiceAccount
✅ Pod Security Admission: restricted ✅ Topology spread constraints
✅ Resource limits (CPU + memory) ✅ Liveness + readiness + startup probes
Namespace Layer
✅ Default-deny NetworkPolicy ✅ 8 explicit allowlist rules
✅ ResourceQuota (CPU/mem/objects) ✅ LimitRange (per-container defaults)
✅ No LoadBalancer services ✅ No NodePort services
Runtime Layer
✅ Falco eBPF syscall monitoring ✅ 9 custom rules (MITRE-mapped)
✅ Falcosidekick fan-out routing ✅ 12 Prometheus alert rules
✅ 4 LogQL log-based alert rules ✅ Dead man's switch heartbeat
✅ Alert deduplication + inhibition ✅ Multi-channel notification
```
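The dead man's switch in the runtime layer inverts the usual logic: the `WatchdogHeartbeat` alert fires all the time, and an external monitor raises the alarm when it stops arriving. The class below is a minimal sketch of such a monitor. The 120-second tolerance and the class itself are hypothetical, since the repository only states that an external monitor receives the heartbeat.

```python
import time


class DeadMansSwitch:
    """Report a dead alerting pipeline when an always-firing heartbeat goes quiet."""

    def __init__(self, max_silence_s=120, clock=time.monotonic):
        self.max_silence_s = max_silence_s  # hypothetical tolerance (assumption)
        self._clock = clock
        self._last_seen = clock()

    def heartbeat(self):
        """Call whenever a WatchdogHeartbeat notification is received."""
        self._last_seen = self._clock()

    def pipeline_is_dead(self):
        """True once the heartbeat has been silent for longer than the tolerance."""
        return self._clock() - self._last_seen > self.max_silence_s


switch = DeadMansSwitch()
switch.heartbeat()
print(switch.pipeline_is_dead())  # False right after a heartbeat
```

The monitor has to run outside the cluster it watches; otherwise the failure that silences the heartbeat can silence the monitor too.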
## MITRE ATT&CK Coverage
| Tactic | Techniques Covered | Detection Layer |
|--------|-------------------|-----------------|
| Initial Access | T1190 Exploit Public-Facing App | Loki + Falco |
| Execution | T1059.004 Unix Shell | Falco (exec syscall) |
| Persistence | T1222 File Permissions Modification | Falco (open syscall) |
| Privilege Escalation | T1068, T1611 Container Escape | Falco (setuid syscall) |
| Defense Evasion | T1070 Indicator Removal | Falco (readOnlyFS blocks) |
| Credential Access | T1552 Unsecured Credentials, T1110 Brute Force | Prometheus + Loki + Falco |
| Discovery | T1057 Process Discovery | Falco (exec syscall) |
| Lateral Movement | T1210 Exploitation of Remote Services | Falco + NetworkPolicy |
| Command & Control | T1071 Application Layer Protocol | Falco + NetworkPolicy |
| Exfiltration | T1041 Exfiltration Over C2 Channel | Falco + NetworkPolicy |
| Impact | T1496 Resource Hijacking, T1499 Endpoint DoS | Prometheus |
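## Incident Response Playbooks
| Playbook | Scenario | Trigger | MTTC Target |
|----------|----------|---------|-------------|
| [SEC-001](docs/incident-playbook-brute-force.md) | Brute force / credential stuffing | `BruteForceAttackCritical` | N/A (alert + block) |
| [SEC-002](docs/incident-playbook-container-compromise.md) | Container compromise / runtime attack | `shell_spawned_in_container` | **15 minutes** |
Each playbook includes:
- Detection signal checklist
- Incident timeline template
- Triage decision tree
- Step-by-step containment commands
- Forensic evidence collection (before pod termination)
- Loki / Prometheus investigation queries
- Eradication and recovery procedures
- Lessons-learned template
## Interview Talking Points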
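### Walk me through your detection stack
The platform has three independent detection layers. Prometheus scrapes metrics every 15 seconds; I alert when `failed_logins_total` exceeds 10 failures per IP in 2 minutes, which catches brute-force attempts. Loki aggregates structured logs through Promtail pipeline stages that extract an `event_type` label; I use LogQL `count_over_time` rules to detect distributed attacks that stay below the per-IP threshold. Falco intercepts syscalls through eBPF (`exec()`, `connect()`, and `open()`), which catches post-exploitation behavior that metrics and logs miss entirely. All three feed into Alertmanager, which groups, deduplicates, and routes alerts to Slack, PagerDuty, and email by severity.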
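### How do you handle alert fatigue?
Three mechanisms. First, Alertmanager inhibition rules suppress warning-level alerts when a higher-severity alert for the same event is already firing: `BruteForceAttackCritical` inhibits `ExcessiveFailedLogins`. Second, grouping bundles related alerts into one notification instead of 50. Third, a `drop` pipeline stage in Promtail filters out `/health`, `/ready`, and `/metrics` scrape logs before ingestion, so Loki alert queries are not polluted by expected traffic patterns.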
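The inhibition mechanism can be sketched in a few lines. This is a simplified model of how an Alertmanager inhibit rule behaves (source matcher, target matcher, and a list of labels that must be equal), not the project's actual configuration. The choice of `source_ip` as the shared label is an assumption.

```python
INHIBIT_RULES = [
    # Hypothetical rule mirroring the example in the text; label names are assumed.
    {
        "source": {"alertname": "BruteForceAttackCritical"},
        "target": {"alertname": "ExcessiveFailedLogins"},
        "equal": ["source_ip"],
    },
]


def matches(alert, matchers):
    """True if the alert carries every label/value pair in matchers."""
    return all(alert.get(key) == value for key, value in matchers.items())


def apply_inhibition(firing, rules=INHIBIT_RULES):
    """Return the alerts that would still notify after inhibition."""
    kept = []
    for alert in firing:
        muted = any(
            matches(alert, rule["target"])
            and any(
                matches(src, rule["source"])
                and all(src.get(label) == alert.get(label) for label in rule["equal"])
                for src in firing
            )
            for rule in rules
        )
        if not muted:
            kept.append(alert)
    return kept


firing = [
    {"alertname": "ExcessiveFailedLogins", "source_ip": "203.0.113.7"},
    {"alertname": "BruteForceAttackCritical", "source_ip": "203.0.113.7"},
    {"alertname": "ExcessiveFailedLogins", "source_ip": "198.51.100.2"},
]
for alert in apply_inhibition(firing):
    print(alert["alertname"], alert["source_ip"])
```

Only the warning that shares a `source_ip` with the critical alert is muted; the warning for the other address still notifies, so inhibition reduces noise without hiding a second attacker.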
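### What would you add for production?
Three things. First, Istio or Cilium for pod-to-pod mTLS: NetworkPolicy currently provides network-level isolation but no identity-based encryption. Second, image signing with Cosign, plus an admission webhook that rejects unsigned images; this closes the supply-chain gap in my STRIDE threat model. Third, shipping Kubernetes audit logs to Loki: today I detect container behavior but not API server operations such as anomalous RBAC changes or Secret access patterns.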
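### How do you prove this actually works?
The attack simulation scripts are the test suite. `brute_force.py` triggers `ExcessiveFailedLogins`, and I measure the time from the first request to the Slack notification: the SLA is 2 minutes, and the observed time is typically 35-45 seconds. `suspicious_commands.py` triggers the three kill-chain Falco rules, and I verify that each one appears in the Falco pod logs and in Alertmanager within 10 seconds. The `WatchdogHeartbeat` dead man's switch alert verifies that the alerting pipeline itself is healthy: if it stops firing, we have a bigger problem than any single alert.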
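One way to automate that timing measurement is to poll Alertmanager until the expected alert shows up. The helper below is a sketch under assumptions: the function names, the polling interval, and the timeout are all hypothetical, and it measures time to Alertmanager rather than to Slack. It relies only on the Alertmanager v2 endpoint `/api/v2/alerts`, which returns a JSON list of alerts with a `labels` object.

```python
import json
import time
import urllib.request


def fetch_from_alertmanager(base_url="http://localhost:9093"):
    """Return the current alerts from the Alertmanager v2 API."""
    with urllib.request.urlopen(f"{base_url}/api/v2/alerts", timeout=5) as resp:
        return json.load(resp)


def wait_for_alert(fetch_alerts, alertname, timeout_s=120, interval_s=5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until an alert named `alertname` is present; return elapsed seconds or None."""
    start = clock()
    while True:
        names = {a.get("labels", {}).get("alertname") for a in fetch_alerts()}
        elapsed = clock() - start
        if alertname in names:
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(interval_s)
```

A test run would start the attack script, then call `wait_for_alert(fetch_from_alertmanager, "ExcessiveFailedLogins")` and compare the result with the 2-minute SLA. The clock and sleep parameters are injectable so the helper can be unit-tested without waiting.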
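### Why Loki instead of Elasticsearch?
Loki indexes labels, not full text. For security use cases I know exactly what I'm searching for: a specific event type, pod name, or source IP. Loki's label-based queries are orders of magnitude cheaper at scale. It integrates natively with Prometheus labels, so metric-to-log correlation happens in Grafana without context switching. At the same log volume, storage costs roughly 90% less than Elasticsearch.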
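### Why three detection layers instead of one?
Defense in depth. An attacker who compromises the application and stops it from writing logs still generates syscalls that Falco captures. An attack that stays below the per-IP metric threshold still shows up in the global Loki log counts. A gap in the Falco rules does not make an attack invisible, because Prometheus still picks up the metric signal. Each layer has different blind spots; combining them means an attacker must evade all three at once, which is far harder than evading any one of them.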
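## Quality Gates
Every push and pull request runs the full CI pipeline:
| Check | Tool | What It Catches |
|-------|------|-----------------|
| YAML lint | `yamllint` | Indentation, trailing whitespace, and type errors across all manifests |
| K8s schema validation | `kubeconform` | Invalid Kubernetes API fields, checked against the 1.29 schema |
| Python lint | `ruff` | Import order, unused variables, style, pyupgrade suggestions |
| Python formatting | `black` | Consistent code formatting; any drift fails the build |
| Secret scanning | `gitleaks` | Accidentally committed keys, tokens, and passwords |
| Container CVE scan | `trivy` (image) | OS and library CVEs in the built Docker image (fails on CRITICAL) |
| Filesystem scan | `trivy` (fs) | IaC misconfigurations and secrets in source (informational) |
Trivy results appear in the [GitHub Security tab](https://github.com/codewithbrandon/cloud-threat-detection/security).
### Running Checks Locally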
```
# Install the tools once
pip install ruff==0.4.10 black==24.4.2 yamllint==1.35.1
# Install kubeconform (Linux amd64 asset shown; on macOS download the matching release asset)
curl -sSL https://github.com/yannh/kubeconform/releases/download/v0.6.7/kubeconform-linux-amd64.tar.gz \
| tar -xz -C /usr/local/bin kubeconform
# Run all checks
make lint # yaml + python
make validate-k8s # kubeconform schema validation
make lint-fix # auto-fix python formatting (writes files)
```
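## Threat Model
Full STRIDE analysis and risk register → [docs/threat-model.md](docs/threat-model.md)
**Highest residual risks (accepted by design):**
- Supply-chain attacks: mitigated by pinning base image digests; Cosign signing is the next control
- Distributed brute force below the per-IP threshold: mitigated by a global Loki rule; a WAF is the next control
- Falco rule gaps: mitigated by parallel metric and log detection; continuous rule testing is the process control
**Built to close a real gap in container runtime visibility.**
*Detection without response is just expensive logging. This platform connects the two.*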