ugiordan/architecture-analyzer

GitHub: ugiordan/architecture-analyzer

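A static analysis tool for the Kubernetes ecosystem that extracts architecture metadata and builds multi-language code property graphs to run security queries and produce visual reports.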

Stars: 3 | Forks: 2

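# Architecture Analyzer

A static analysis tool that extracts architecture data from Kubernetes/OpenShift component repositories and generates diagrams, security reports, and code property graphs. It works with any Go-based K8s operator ecosystem.

**[Documentation](https://ugiordan.github.io/architecture-analyzer)** | **[GitHub](https://github.com/ugiordan/architecture-analyzer)**

## Demo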

Architecture Analyzer Demo

## Features

- **26 architecture extractors** covering CRDs, RBAC, deployments, services, network policies, controller watches, dependencies, secrets, Helm charts, Dockerfiles, webhooks, configmaps, HTTP endpoints, ingress, external connections (Go + Python), feature gates, cache architecture, operator config constants, reconcile sequences, Prometheus metrics, status conditions, platform detection, Go CRD extraction, webhook behavior analysis, and programmatic resource operations
- **Go AST extraction** via `go/packages` for operators that list their generated manifests in `.gitignore`. Extracts CRDs from Go types with kubebuilder markers, analyzes webhook method bodies for field-level mutations and validations, and detects programmatic `client.Create/Update/Patch/Delete` calls in reconcile methods. Security-hardened for analyzing untrusted repositories (CGO_ENABLED=0, module isolation, boundedFileSystem).
- **Code property graph** with multi-language parsing (Go, Python, TypeScript, Rust), a typed node model, edge confidence classification, intraprocedural data flow, control flow graphs, Python class hierarchy extraction (NodeClass with BaseClasses and EdgeContains), and two-phase taint propagation
- **26 security queries** across 5 domains (security, testing, upgrade, architecture, netpolicy) that detect webhook gaps, RBAC bugs, secret leaks, taint paths, complexity hotspots, class hierarchies, factory patterns, external API surfaces, and more
- **SARIF ingestion** that maps findings from external scanners (Semgrep, gosec, etc.) to CPG nodes for unified analysis
- **Structural diff engine** that compares code graphs across versions to detect regressions
- **7 renderers** that produce Mermaid diagrams, Structurizr C4 DSL, ASCII security views, and structured markdown reports
- **CycloneDX SBOM generation** from extracted data (Go modules, Python dependencies, Dockerfile base images, deployment container images, operator image constants) with full operational metadata (security contexts, resource limits, health probes)
- **Image and container analysis report** covering GPU/CUDA dependencies, base image registries, multi-arch support, Dockerfile issues, container security contexts, resource limits, health probes, sidecar inventory, and deployment issues
- **CRD contract validation** that detects breaking schema changes across repositories
- **Platform aggregation** that merges multiple component analyses into a cross-repository view

## Architecture

```mermaid
graph LR
    subgraph Inputs
        REPO[Git Repository]
        SARIF[SARIF Files]
    end

    subgraph "Architecture Extractors (26)"
        E1[CRDs & RBAC]
        E2[Services & Deployments]
        E3[Network Policies & Ingress]
        E4[Controller Watches & Dependencies]
        E5[Cache Config & Operator Config]
        E6[Reconcile Sequences & Status Conditions]
        E7[Prometheus Metrics & Platform Detection]
        E8[Secrets, Helm, Dockerfiles, Webhooks, ConfigMaps, HTTP Endpoints, External Connections, Feature Gates]
    end

    subgraph "Code Property Graph"
        PARSE[Multi-Language Parsers<br/>Go, Python, TS, Rust]
        CPG[Typed Node Model<br/>Edge Confidence]
        DF[Data Flow Analysis]
        CFG[Control Flow Graphs]
        TAINT[Taint Propagation Engine]
        DOMAINS[Domain Queries<br/>Security, Testing, Upgrade,<br/>Architecture, NetPolicy]
    end

    subgraph Outputs
        JSON[component-architecture.json]
        GRAPH[code-graph.json]
        FINDINGS[security-findings.json/sarif]
        DIAGRAMS[Diagrams & Reports]
    end

    REPO --> E1 & E2 & E3 & E4 & E5 & E6 & E7 & E8 --> JSON --> DIAGRAMS
    REPO --> PARSE --> CPG --> DF --> CFG --> TAINT --> DOMAINS --> FINDINGS
    SARIF --> CPG
    CPG --> GRAPH

    classDef extractor fill:#3498db,stroke:#2980b9,color:#fff
    classDef cpg fill:#9b59b6,stroke:#8e44ad,color:#fff
    classDef output fill:#2ecc71,stroke:#27ae60,color:#fff
    class E1,E2,E3,E4,E5,E6,E7,E8 extractor
    class PARSE,CPG,DF,CFG,TAINT,DOMAINS cpg
    class JSON,GRAPH,FINDINGS,DIAGRAMS output
```

## Installation

Download the latest binary from [GitHub Releases](https://github.com/ugiordan/architecture-analyzer/releases):

```
# Linux
curl -L https://github.com/ugiordan/architecture-analyzer/releases/latest/download/arch-analyzer-linux-amd64 -o arch-analyzer
chmod +x arch-analyzer

# macOS (Apple Silicon)
curl -L https://github.com/ugiordan/architecture-analyzer/releases/latest/download/arch-analyzer-darwin-arm64 -o arch-analyzer
chmod +x arch-analyzer
```

Or build from source (requires Go 1.25+):

```
git clone https://github.com/ugiordan/architecture-analyzer.git
cd architecture-analyzer
go build -o arch-analyzer ./cmd/arch-analyzer/
```

## Usage

### Analyze a repository (extract + render)

```
./arch-analyzer analyze /path/to/repo --output-dir output/
```

Generates:

- `output/component-architecture.json` (extracted architecture data)
- `output/diagrams/rbac.mmd` (Mermaid RBAC diagram)
- `output/diagrams/component.mmd` (Mermaid component diagram)
- `output/diagrams/dependencies.mmd` (Mermaid dependency diagram)
- `output/diagrams/dataflow.mmd` (Mermaid sequence diagram)
- `output/diagrams/security-network.txt` (ASCII security/network diagram)
- `output/diagrams/c4-context.dsl` (Structurizr C4 DSL)
- `output/diagrams/report.md` (structured markdown report)

### Extract only (no diagrams)

```
./arch-analyzer extract /path/to/repo --output component-architecture.json

# Run only specific extractor groups (faster for targeted analysis)
./arch-analyzer extract /path/to/repo --extractors dockerfiles,kustomize --output component-architecture.json
```

### Generate an SBOM (CycloneDX 1.5)
```
# From an existing extraction
./arch-analyzer sbom component-architecture.json --output sbom.json

# Pipe to stdout
./arch-analyzer sbom component-architecture.json | jq '.components | length'
```

The SBOM includes Go modules, Python dependencies, Dockerfile base images, deployment container images, and operator image constants. Each component carries operational metadata: security context, resource limits, health probes, and Dockerfile issues.

### Image and container analysis report

```
# Single component
./arch-analyzer report component-architecture.json --output report.md

# Cross-component analysis (multiple inputs)
./arch-analyzer report results/*/component-architecture.json --output platform-report.md
```

The report has 10 sections: GPU/CUDA dependencies, base image registries, multi-arch support, Dockerfile issues, security contexts, resource limits, health probes, sidecars, deployment issues, and operator image constants.

### Code graph security scan

```
./arch-analyzer scan /path/to/repo --format json --output findings.json
./arch-analyzer scan /path/to/repo --format sarif --output findings.sarif

# With specific domains
./arch-analyzer scan /path/to/repo --domains security,testing,upgrade

# Import SARIF from external scanners alongside the scan
./arch-analyzer scan /path/to/repo --import-sarif gosec.sarif,semgrep.sarif

# Combine with architecture context for richer queries
./arch-analyzer scan /path/to/repo --with-arch
```

### Export the code property graph

```
./arch-analyzer graph /path/to/repo --output code-graph.json
./arch-analyzer graph /path/to/repo --format dot --output code-graph.dot
```

### Structural diff between code graphs

```
./arch-analyzer diff base.json head.json --format text
./arch-analyzer diff base.json head.json --format json --output diff.json
```

### Ingest external SARIF results

```
./arch-analyzer ingest gosec.sarif --graph code-graph.json --output enriched-graph.json
```

### Full analysis (architecture + code graph + schemas)

```
./arch-analyzer full-analysis /path/to/repo --output-dir output/
./arch-analyzer full-analysis /path/to/repo --import-sarif gosec.sarif --domains security
```

### CRD contract validation

```
# Extract schemas as a baseline
./arch-analyzer extract-schema /path/to/repo --output-dir contracts/schemas

# Validate changes against the baseline
./arch-analyzer validate /path/to/repo --contracts-dir contracts
```

### Aggregate multiple components

```
./arch-analyzer analyze /path/to/repo-a --output-dir results/repo-a
./arch-analyzer analyze /path/to/repo-b --output-dir results/repo-b
./arch-analyzer aggregate results/ --output-dir platform-output/
```

### Platform discovery

```
./arch-analyzer discover /path/to/operator-repo --format json
./arch-analyzer build-config /path/to/operator-repo
```

## Extractors

| Extractor | Source patterns | Extracted data |
|-----------|----------------|----------------|
| CRDs | `config/crd/**`, `deploy/crds/`, `charts/**/crds/`, `manifests/**/crd*` | Group, version, kind, scope, field count, CEL rules |
| RBAC | `config/rbac/`, `deploy/rbac/`, Go kubebuilder markers | ClusterRoles, bindings, rules, kubebuilder RBAC markers |
| Services | `**/service*.yaml` | Name, type, ports, selector |
| Deployments | `**/deployment*.yaml`, `**/manager*.yaml`, `**/statefulset*.yaml` | Containers, security context, environment variables, volumes, resources, probes |
| Network Policies | `**/*networkpolicy*`, `**/*network-polic*`, `**/*netpol*`, `**/network-policies/**` | Pod selector, ingress/egress rules |
| Controller Watches | `**/*_controller.go`, `**/setup.go`, `**/*reconciler*.go` | For/Owns/Watches with GVK resolution |
| Dependencies | `go.mod` | Go version, toolchain, modules (direct dependencies only), internal ODH dependencies, replace directives |
| Secrets | Deployments, services | Secret names, types, references (never values) |
| Helm | `Chart.yaml`, `values.yaml` | Chart metadata, security-relevant defaults |
| Dockerfiles | `Dockerfile*`, `Containerfile*`, `*.Dockerfile`, `*.Containerfile` | Base images, build stages, USER, EXPOSE, FIPS indicators, COPY/ADD instructions with stage provenance tracking, build tool invocations (go build, npm/yarn/pnpm, pip, make) |
| Webhooks | `**/webhook*.yaml`, `**/mutating*`, `**/validating*` | Webhook rules, failure policy, side effects |
| ConfigMaps | `**/configmap*.yaml` | ConfigMap names, data keys |
| HTTP Endpoints | Go source (`http.HandleFunc`, `mux.Route`, `gin.Engine`) | Methods, paths, handlers, middleware |
| Ingress | `**/ingress*`, `**/virtualservice*`, `**/httproute*` | Gateway API, Istio, and K8s Ingress resources |
| External Connections (Go) | Go source (`sql.Open`, `redis.NewClient`, `grpc.Dial`, `sarama.New*`) | Database, object storage, gRPC, and messaging references (credentials redacted) |
| External Connections (Python) | Python source (`psycopg2`, `sqlalchemy`, `boto3`, `requests`, `httpx`, `grpc`, `openai`, `chromadb`, etc.) | Database, object storage, gRPC, messaging, HTTP client, and LLM/ML SDK references |
| Feature Gates | Go source (`DefaultMutableFeatureGate.Add`, `featuregate.Feature` constants) | Gate names, default state, pre-release stage, source location |
| Cache Config | Go source (`ctrl.NewManager`, `cache.Options`) | Cache scope, filtered types, disabled types, implicit informers, GOMEMLIMIT |
| Operator Config | Go source (const/var blocks in controllers, pkg/config) | Classified constants: images, ports, timeouts, environment variables, resources, name patterns |
| Reconcile Sequences | Go source (`Reconcile()` methods) | Ordered sub-resource reconcile steps with conditional guards |
| Prometheus Metrics | Go source (`prometheus.New*`, `promauto.New*`) | Metric names, types (gauge/counter/histogram/summary), help text, labels, namespace |
| Status Conditions | Go source (const blocks in controllers, API types) | Condition type constants, associated reason constants, source location |
| Platform Detection | Go source (controllers, reconcilers, config packages) | Capability structs (IsOpenShift, HasRoute), API discovery checks, conditional resource creation |
| Go CRD Extraction | Go types with `+kubebuilder:object:root=true` markers | Group, version, kind, scope, storage version, hub/spoke conversion, field count, CEL rules |
| Webhook Behavior Analysis | Webhook `Default()` and `Validate*()` method bodies | Field-level mutations, field-level validations, same-receiver method call tracing |
| Programmatic Resource Operations | Go reconcile methods (`client.Create/Update/Patch/Delete`) | Operation type, target kind, API group, types resolved via `go/packages` |

### Cache architecture analysis

The cache analyzer cross-references the controller-runtime cache configuration with controller watches and deployment memory limits. It detects:

- **Cluster-wide informers** for types that should be namespace-scoped or filtered
- **Missing cache filters** on watched types (a potential OOM risk at scale)
- **Implicit informers** created by `client.Get` calls on unwatched types
- **Missing DefaultTransform** (managedFields waste memory)
- **Missing GOMEMLIMIT** in deployments (the Go GC cannot tune for memory pressure)
- **GOMEMLIMIT exceeding** 90% of the container memory limit

This catches real bugs such as [opendatahub-io/data-science-pipelines-operator#992](https://github.com/opendatahub-io/data-science-pipelines-operator/issues/992) and [opendatahub-io/model-registry-operator#457](https://github.com/opendatahub-io/model-registry-operator/issues/457).

## Code Property Graph

The CPG pipeline uses tree-sitter to build a multi-language code graph from source (no compilation required) and runs layered analyses on top of it.

### Multi-language parsing

Four language parsers extract AST-level nodes (functions, call sites, struct literals, HTTP endpoints, database operations) and edges (calls, contains):

| Language | Parser | CFG | Data flow | Taint |
|----------|--------|-----|-----------|-------|
| Go | tree-sitter-go | Yes | Yes | Yes |
| Python | tree-sitter-python | Yes | Yes | Yes |
| TypeScript | tree-sitter-typescript | Yes | Yes | Yes |
| Rust | tree-sitter-rust | Yes | Yes | Yes |

### Typed node model

Nodes carry typed fields rather than string maps. The fields cover function signatures (parameters, return types), call targets, HTTP routes, database operations, struct types, class definitions (with base classes for inheritance tracking), cyclomatic complexity, and entry-point trust levels.

### Edge confidence

Call edges are classified by resolution confidence:

| Confidence | Meaning | Example |
|------------|---------|---------|
| `CERTAIN` | Exact match, same package | Direct function call `doWork()` |
| `INFERRED` | Cross-package short-name match | `utils.Validate()` matched heuristically |
| `UNCERTAIN` | Multiple candidates, interface dispatch | `handler.Process()` with multiple implementations |

Security queries never filter out UNCERTAIN edges; they use confidence to prioritize the review order.

### Intraprocedural data flow

Per-function analysis tracks variable assignments, reads, parameter passing, field access, and return values. It produces `assigns`, `reads`, `passes_to`, `field_access`, and `returns` edges within function bodies.

### Control flow graphs

Basic blocks are built within each function, with branch edges (`true_branch`, `false_branch`, `fallthrough`, `loop_back`, `loop_exit`, `exception`, `entry`, `exit`). This supports path-sensitive analysis, which distinguishes "validation guards the dangerous operation" from "validation on an independent path".

### Taint propagation

The taint engine runs in two phases:

1. **Intraprocedural** (Phase A): per-function taint propagation along data flow edges, filtered by CFG block reachability. Produces function summaries.
2. **Interprocedural** (Phase B): walks the call graph using the Phase A summaries to trace taint across function boundaries and storage links.

Sources: user input handlers, deserialization calls. Sinks: SQL execution, subprocess calls, command execution, template rendering, HTML output, file access, eval usage.

The analysis is bounded by configurable depth (20), path (100), and visit (10K) limits, with truncation diagnostics.

### SARIF ingestion

Ingests SARIF 2.1.0 output from external static analyzers (Semgrep, gosec, Trivy, etc.) and maps findings to CPG nodes. External findings are enriched with architecture context: "Semgrep found SQL injection at handler.go:42" becomes "this function is an untrusted webhook handler with secrets RBAC". Validation: schema validation, path normalization, annotation sanitization, 50K result size limit.

### Structural diff

Compares two code-graph.json files to detect regressions: added functions, removed functions, changed complexity, new call edges, and trust level changes. Useful for PR review automation.

## Security Queries

### Security domain (12 rules)

| Rule | ID | Severity | Description |
|------|----|----------|-------------|
| Webhook Missing Update | CGA-003 | High | Webhooks intercept CREATE but not UPDATE |
| RBAC Precedence Bug | CGA-004 | High | Conflicting RBAC rules across bindings |
| Cert as CA | CGA-005 | High | Certificate used as a CA without proper validation |
| Cross-Namespace Secret | CGA-006 | High | Secret access across namespace boundaries |
| Unfiltered Cache | CGA-007 | Medium | Watched types without cache filters (OOM risk) |
| Plaintext Secrets | CGA-008 | Medium | Hardcoded secrets or credentials in source |
| Weak Serial Entropy | CGA-009 | Medium | Weak randomness in security-sensitive contexts |
| Complexity Hotspot | CGA-010 | Medium | High-complexity functions with security annotations |
| Untrusted Endpoint | CGA-011 | Info | HTTP endpoints without recognizable auth middleware |
| Unprotected Ingress | CGA-012 | High | Ingress routes without TLS or auth |
| Overprivileged Secret Access | CGA-013 | Medium | Secret access broader than needed |
| Uncontrolled Egress | CGA-014 | Medium | Outbound connections without a network policy |

### Testing domain (4 rules)

| Rule | ID | Severity | Description |
|------|----|----------|-------------|
| Untested Security Function | CGA-T01 | Medium | Security-annotated functions without test coverage |
| Fake-Only Integration | CGA-T02 | Low | Integration tests that use only fakes/mocks |
| Missing Error Path | CGA-T03 | Medium | Error return paths without test coverage |
| Consolidation Opportunity | CGA-T04 | Low | Duplicate test patterns that could be consolidated |

### Upgrade domain (4 rules)

| Rule | ID | Severity | Description |
|------|----|----------|-------------|
| Unconverted CRD | CGA-U01 | Medium | CRDs still using v1beta1 |
| Pre-Release API Usage | CGA-U02 | Low | Use of alpha/beta Kubernetes APIs |
| Ungated Feature | CGA-U03 | Medium | Features without feature gate protection |
| Unchecked Version Access | CGA-U04 | Low | Version-dependent code without version checks |

### Architecture domain (4 rules)

| Rule | ID | Severity | Description |
|------|----|----------|-------------|
| Abstraction Layers | CGA-A01 | Info | Surfaces class hierarchies with abstract base classes and implementations |
| External API Surface | CGA-A02 | Info | Functions that use external SDK clients (openai, boto3, chromadb, etc.) |
| Factory Dispatch | CGA-A03 | Info | Factory functions that dispatch to multiple implementation types |
| Unimplemented Interface | CGA-A04 | Low | Abstract base classes with no implementation found in the analyzed source |

### Network Policy domain (2 rules)

| Rule | ID | Severity | Description |
|------|----|----------|-------------|
| Bare Namespace Selector | CGA-N01 | High | NetworkPolicy allows ingress via namespaceSelector without podSelector or port restrictions |
| Tenant Namespace Reach | CGA-N02 | High | Tenant workload namespaces (notebooks, pipelines) can reach control plane services |

## Renderers

| Renderer | Output | Description |
|----------|--------|-------------|
| RBAC | `rbac.mmd` | Mermaid diagram: ServiceAccounts -> Bindings -> Roles -> Resources |
| Component | `component.mmd` | Mermaid diagram: watched and owned CRDs plus dependencies |
| Security/Network | `security-network.txt` | ASCII layered view: network, RBAC, secrets, security contexts |
| Dependencies | `dependencies.mmd` | Mermaid diagram: Go module dependencies (internal ODH highlighted) |
| C4 | `c4-context.dsl` | Structurizr C4 context diagram |
| Data flow | `dataflow.mmd` | Mermaid sequence diagram: controller watches and service connections |
| Report | `report.md` | Structured markdown with tables for all extracted data and cache issues |

## Project Structure

```
architecture-analyzer/
  cmd/arch-analyzer/
    main.go                 # CLI entry point with subcommands
  pkg/
    extractor/              # 26 architecture extractors
    renderer/               # 7 diagram/report renderers
    aggregator/             # Platform-wide aggregation
    validator/              # CRD contract validation
    parser/                 # Multi-language parsers (Go, Python, TypeScript, Rust)
                            # with CFG construction per language
    builder/                # Code property graph builder (call resolution, edge confidence)
    graph/                  # CPG data structures (typed nodes, edges, basic blocks)
    dataflow/               # Taint propagation engine (intraprocedural + interprocedural)
    diff/                   # Structural diff engine for code graph comparison
    sarif/                  # SARIF 2.1.0 ingestion and node mapping
    linker/                 # Storage linker (DB operations to schemas)
    annotator/              # Security annotation engine
    query/                  # Security query engine (base queries + taint-to-sink)
    domains/                # Domain framework with registered query rules
      security/             # 12 security queries
      testing/              # 4 testing queries
      upgrade/              # 4 upgrade queries
      architecture/         # 4 architecture queries
      netpolicy/            # 2 network policy queries
    arch/                   # Architecture data structures
    config/                 # Configuration types
  contracts/
    schemas/                # CRD baseline schemas for validation
  scripts/
    analyze-repo.sh         # Clone + analyze + cleanup
  site/
    docs/                   # MkDocs Material documentation
    mkdocs.yml              # Docs site configuration
  .github/workflows/
    analyze-all.yml         # Scheduled analysis workflow
    extract-schemas.yml     # CRD schema extraction workflow
    validate-contracts.yml  # CRD contract validation on PRs
    docs.yml                # Deploy docs to GitHub Pages
```

## Running Tests

```
go test ./...
```

## Documentation

Full documentation is published at **[ugiordan.github.io/architecture-analyzer](https://ugiordan.github.io/architecture-analyzer)** and covers installation, guides, the CLI reference, architecture, and contributing.

## GitHub Actions

- `analyze-all.yml`: runs weekly (Mondays at 06:00 UTC) or on manual dispatch, analyzes all configured platform repositories, and uploads artifacts
- `extract-schemas.yml`: extracts CRD schemas weekly and opens automated PRs for changes
- `validate-contracts.yml`: validates CRD contract changes on PRs that touch the `contracts/` directory
- `docs.yml`: deploys the documentation to GitHub Pages on push to main
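
A CI job like the workflows above may want to summarize the SARIF file written by `scan --format sarif`. The Go sketch below shows one way to do that by counting results per rule ID. It reads only the standard SARIF 2.1.0 field `runs[].results[].ruleId`. That the tool writes its `CGA-*` identifiers into `ruleId` is an assumption, and `countByRule` and the embedded `sample` are hypothetical names and hand-written data for illustration, not part of the project.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// sarifLog models only the SARIF 2.1.0 fields this sketch reads.
type sarifLog struct {
	Runs []struct {
		Results []struct {
			RuleID string `json:"ruleId"`
		} `json:"results"`
	} `json:"runs"`
}

// countByRule returns the number of results per rule ID across all runs.
// Hypothetical helper; not an API of arch-analyzer.
func countByRule(data []byte) (map[string]int, error) {
	var doc sarifLog
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("parse SARIF: %w", err)
	}
	counts := make(map[string]int)
	for _, run := range doc.Runs {
		for _, res := range run.Results {
			counts[res.RuleID]++
		}
	}
	return counts, nil
}

// sample is hand-written illustrative data, not real tool output.
// It assumes the rule IDs from the tables above appear in ruleId.
const sample = `{"version":"2.1.0","runs":[{"results":[
  {"ruleId":"CGA-003"},
  {"ruleId":"CGA-007"},
  {"ruleId":"CGA-003"}
]}]}`

func main() {
	counts, err := countByRule([]byte(sample))
	if err != nil {
		panic(err)
	}
	// Sort rule IDs so the summary is stable between runs.
	ids := make([]string, 0, len(counts))
	for id := range counts {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	for _, id := range ids {
		fmt.Printf("%s: %d\n", id, counts[id])
	}
}
```

In a real pipeline the bytes would come from `os.ReadFile("findings.sarif")` instead of the embedded sample, and the counts could be compared against a threshold to decide whether the job fails.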
