JashwanthMU/TerraSecure

GitHub: JashwanthMU/TerraSecure

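A Terraform security scanner that combines XGBoost machine learning with AI analysis to detect cloud infrastructure misconfigurations before deployment and provide actionable remediation advice.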

Stars: 10 | Forks: 3

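Intelligent Security for Infrastructure as Code

Stop security issues before they become breaches
ML-Powered Detection • AI-Enhanced Analysis • Trained on Real Breaches

Quick Start • Why TerraSecure? • Features • Documentation • Comparison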

## Table of Contents

- [What is TerraSecure?](#what-is-terrasecure)
- [Why TerraSecure?](#why-terrasecure)
- [Quick Start](#quick-start)
- [Architecture](#architecture)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Output Examples](#output-examples)
- [Performance](#performance)
- [Comparison](#comparison)
- [CI/CD Integration](#cicd-integration)
- [Documentation](#documentation)

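## What is TerraSecure?

TerraSecure is an **intelligent security scanner** for Infrastructure as Code. It combines machine learning with AI-driven analysis to catch misconfigurations before they reach production. Unlike traditional rule-based tools, TerraSecure:

- **Learns patterns** - uses a pre-trained XGBoost model (92.45% accuracy)
- **Explains impact** - AI-generated business context and attack scenarios
- **Reduces noise** - 10.71% false positive rate (compared with about 15% for Checkov)
- **Learns from real breaches** - including the Capital One, Uber, and Tesla incidents

## Why TerraSecure?

### The Problem: Alert Fatigue

Traditional security scanners generate too many false positives. Security teams waste time investigating issues that do not exist while real threats slip through.

### The Solution: Intelligence + Context

**Traditional tools**

- Rule-based only
- 12-15% false positive rate
- No context or explanation
- Generic "fix this" messages
- Alert fatigue

**TerraSecure**

- ML + rules (92% accuracy)
- 10.7% false positive rate
- AI explains business impact
- Specific fixes with code examples
- Actionable intelligence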

### Real Impact

```
BEFORE (Checkov):
! 147 issues found (22 false positives)
! Security team spends 4 hours triaging
! 3 real issues missed in the noise

AFTER (TerraSecure):
✓ 125 issues found (13 false positives)
✓ Security team spends 1 hour triaging
✓ All critical issues caught with AI context
✓ Developers get actionable fixes immediately
```

## Quick Start

### GitHub Actions

Add to `.github/workflows/security.yml`:

```yaml
name: Security Scan
on: [push, pull_request]

permissions:
  security-events: write

jobs:
  terrasecure:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: JashwanthMU/TerraSecure@v2.0.0
```

**Results appear automatically in the GitHub Security tab.**

### Docker

```bash
docker run --rm -v $(pwd):/scan \
  ghcr.io/jashwanthmu/terrasecure:latest /scan
```

### Run Locally

```bash
git clone https://github.com/JashwanthMU/TerraSecure.git
cd TerraSecure
pip install -r requirements.txt
python src/cli.py examples/vulnerable
```

## Architecture

### System Overview

TerraSecure uses a **three-layer detection architecture**:

```mermaid
flowchart TB
    subgraph Input["Input Layer"]
        TF[Terraform Files]
        HCL[HCL Configurations]
        MOD[Terraform Modules]
    end

    subgraph Detection["Detection Engine"]
        RULES[Rule Engine<br/>50+ Security Patterns]
        ML[ML Model<br/>XGBoost 92% Accuracy]
        FEAT[Feature Extractor<br/>50 Security Features]
    end

    subgraph AI["AI Enhancement"]
        BEDROCK[AWS Bedrock<br/>Claude 3 Haiku]
        FALLBACK[Expert Templates<br/>Real Breach Analysis]
        CACHE[Response Cache<br/>90% Cost Savings]
    end

    subgraph Output["Output Formats"]
        TEXT[Text Output<br/>Human-Readable]
        JSON[JSON Output<br/>Automation-Ready]
        SARIF[SARIF 2.1.0<br/>GitHub Security]
    end

    TF --> Detection
    HCL --> Detection
    MOD --> Detection
    Detection --> |Findings|AI
    AI --> Output

    style Input fill:#e1f5ff
    style Detection fill:#ffebee
    style AI fill:#f3e5f5
    style Output fill:#e8f5e9
```

**[View the full architecture →](docs/ARCHITECTURE.md)**

### How It Works

```
1. PARSE   → Extract resources and properties from Terraform files
2. DETECT  → Apply 50+ security patterns + ML risk scoring
3. ANALYZE → AI generates business impact and remediation
4. OUTPUT  → Format as Text/JSON/SARIF for humans or tools
```

### ML Pipeline

```mermaid
flowchart LR
    subgraph Training["Training Pipeline"]
        DATA[265 Samples<br/>Real Breaches]
        FEATURES[50 Security<br/>Features]
        MODEL[XGBoost<br/>5-Fold CV]
        EXPORT[Model Export<br/>177KB]
        DATA --> FEATURES
        FEATURES --> MODEL
        MODEL --> EXPORT
    end

    subgraph Inference["Inference"]
        RESOURCE[Terraform<br/>Resource]
        EXTRACT[Feature<br/>Extraction]
        PREDICT[Risk<br/>Prediction]
        SCORE[Risk Score<br/>0.0 - 1.0]
        RESOURCE --> EXTRACT
        EXPORT --> PREDICT
        EXTRACT --> PREDICT
        PREDICT --> SCORE
    end

    style Training fill:#e3f2fd
    style Inference fill:#fff8e1
```

**Training data:**

- Capital One S3 data breach (2019)
- Uber credential leak (2016)
- Tesla public bucket incident (2018)
- MongoDB ransomware incidents (2017)
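The inference half of the pipeline turns a parsed Terraform resource into a fixed set of security features before the model scores it. The sketch below illustrates only that feature-extraction step and is not TerraSecure's actual extractor. The feature names `s3_public_acl`, `s3_encryption_disabled`, and `s3_versioning_disabled` appear in the sample JSON report; the resource dictionary layout and the `extract_features` helper are assumptions made for illustration.

```python
from typing import Any, Dict

# ACL values treated as public in this sketch (an assumption).
PUBLIC_ACLS = {"public-read", "public-read-write"}


def extract_features(resource: Dict[str, Any]) -> Dict[str, int]:
    """Map one parsed resource to binary security features (hypothetical helper)."""
    # Assumed layout: {"type": ..., "name": ..., "config": {...}}
    config = resource.get("config", {})
    versioning = config.get("versioning", {})
    return {
        "s3_public_acl": int(config.get("acl") in PUBLIC_ACLS),
        "s3_encryption_disabled": int(
            "server_side_encryption_configuration" not in config
        ),
        "s3_versioning_disabled": int(not versioning.get("enabled", False)),
    }


bucket = {
    "type": "aws_s3_bucket",
    "name": "customer_data",
    "config": {"acl": "public-read"},
}
features = extract_features(bucket)
print(features)
```

In the pipeline described above, a vector like this, extended to all 50 features, would be passed to the exported XGBoost model, which returns a risk score between 0.0 and 1.0.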

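## Features

### Machine Learning Detection

**Pre-trained XGBoost model**

- 92.45% accuracy
- 10.71% false positive rate
- 4.00% false negative rate
- 50 security features
- <100ms inference time

**Trained on real breaches**

- Capital One (S3 misconfiguration)
- Uber (hardcoded credentials)
- Tesla (public S3 bucket)
- MongoDB (exposed database)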
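The accuracy, false positive rate, and false negative rate quoted above are all derived from a confusion matrix. The sketch below shows the arithmetic. The counts are an assumption: this README does not publish the confusion matrix, and these values were chosen only because they reproduce the three reported figures. Real figures, for example averages over cross-validation folds, need not come from any single matrix.

```python
from typing import Dict


def rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Derive headline metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),  # safe resources flagged as risky
        "false_negative_rate": fn / (fn + tp),  # risky resources missed
        "recall": tp / (tp + fn),               # equals 1 - false_negative_rate
    }


# Hypothetical counts, not taken from the project.
metrics = rates(tp=24, fp=3, tn=25, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```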

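### AI-Enhanced Analysis

Every finding includes:

- **Explanation** - what is wrong and why it matters
- **Business impact** - financial, regulatory, and reputational risk
- **Attack scenario** - how an attacker could exploit the issue (with real-world examples)
- **Detailed fix** - step-by-step remediation with code

### Multi-Format Output

| Format | Use Case | Features |
|--------|----------|----------|
| **Text** | Human review | Colored output, AI insights |
| **JSON** | Automation | Machine-readable, scriptable |
| **SARIF 2.1.0** | GitHub Security | Code scanning alerts, PR comments |

### 50+ Security Patterns

**Network Security (12 patterns)**

- Security groups open to 0.0.0.0/0
- SSH/RDP exposed to the internet
- Unrestricted egress rules
- Missing network segmentation
- Default security group in use
- VPC flow logs not enabled
- ...and 6 more

**Storage Security (15 patterns)**

- Public S3 buckets
- Unencrypted S3/EBS/RDS
- Missing versioning
- No backup retention policy
- Public snapshots
- Cross-region replication not enabled
- ...and 9 more

**Identity and Access (10 patterns)**

- Wildcards in IAM permissions
- Root account usage
- Missing MFA
- Overly permissive policies
- Inline user policies
- ...and 5 more

**Secrets Management (8 patterns)**

- Hardcoded credentials
- Plaintext environment variables
- Unencrypted secrets
- Exposed API keys
- ...and 4 more

**Monitoring and Compliance (5 patterns)**

- CloudTrail disabled
- No VPC flow logs
- Missing CloudWatch alarms
- Access logging disabled
- Config rules not enabled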
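To make the pattern categories concrete, here is a minimal sketch of how one rule-based check, a security group that exposes SSH or RDP to 0.0.0.0/0, could be written. It is an illustration under assumptions, not TerraSecure's rule engine: the `Finding` type, the `check_open_admin_ports` function, the `CRITICAL` severity, and the dictionary layout of the parsed resource are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

ADMIN_PORTS = {22: "SSH", 3389: "RDP"}
ANYWHERE = "0.0.0.0/0"


@dataclass
class Finding:
    severity: str
    resource: str
    message: str


def check_open_admin_ports(name: str, config: Dict[str, Any]) -> List[Finding]:
    """Flag ingress rules that expose SSH or RDP to the whole internet."""
    findings = []
    for rule in config.get("ingress", []):
        if ANYWHERE not in rule.get("cidr_blocks", []):
            continue
        # A rule covers a port if it lies inside the from_port..to_port range.
        for port, label in ADMIN_PORTS.items():
            if rule["from_port"] <= port <= rule["to_port"]:
                findings.append(Finding(
                    severity="CRITICAL",  # severity chosen for illustration
                    resource=f"aws_security_group.{name}",
                    message=f"{label} (port {port}) is open to {ANYWHERE}",
                ))
    return findings


web_sg = {
    "ingress": [
        {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
        {"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]},
    ]
}
findings = check_open_admin_ports("web", web_sg)
for f in findings:
    print(f"[{f.severity}] {f.resource}: {f.message}")
```

Only the port 22 rule is reported here; the HTTPS rule is open to the internet but does not cover an administrative port.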

## Installation

### Prerequisites

- Python 3.11+
- pip package manager
- At least 512MB of memory

### Option 1: Docker

```bash
docker pull jashwanthmu/terrasecure:latest
```

### Option 2: GitHub Action

```yaml
- uses: JashwanthMU/TerraSecure@v2.0.0
```

### Option 3: Install from Source

```bash
git clone https://github.com/JashwanthMU/TerraSecure.git
cd TerraSecure
pip install -r requirements.txt
python src/cli.py --help
```

## Usage

### Command Line

#### Basic Scan

```bash
# Scan the current directory
terrasecure .

# Scan a specific directory
terrasecure infrastructure/

# Scan a single file
terrasecure main.tf
```

#### Output Formats

```bash
# JSON output
terrasecure . --format json --output report.json

# SARIF for GitHub Security
terrasecure . --format sarif --output results.sarif

# Text with AI insights (default)
terrasecure .
```

#### Policy Enforcement

```bash
# Fail on critical issues
terrasecure . --fail-on critical

# Fail on high or critical issues
terrasecure . --fail-on high

# Fail on any finding
terrasecure . --fail-on any
```

### Docker

```bash
# Basic scan
docker run --rm -v $(pwd):/scan \
  ghcr.io/jashwanthmu/terrasecure:latest /scan

# Generate a SARIF report
docker run --rm \
  -v $(pwd):/scan:ro \
  -v $(pwd):/output \
  ghcr.io/jashwanthmu/terrasecure:latest \
  /scan --format sarif --output /output/results.sarif

# Fail on critical issues
docker run --rm -v $(pwd):/scan \
  ghcr.io/jashwanthmu/terrasecure:latest \
  /scan --fail-on critical
```

### GitHub Actions

#### Basic Integration

```yaml
- name: TerraSecure Scan
  uses: JashwanthMU/TerraSecure@v2.0.0
```

#### Advanced Configuration

```yaml
- name: Security Scan with Policy
  uses: JashwanthMU/TerraSecure@v2.0.0
  with:
    path: 'infrastructure'
    format: 'sarif'
    fail-on: 'high'
    upload-sarif: 'true'
```

#### Block PRs on Critical Issues

```yaml
- name: Block on Critical
  uses: JashwanthMU/TerraSecure@v2.0.0
  with:
    fail-on: 'critical' # PR fails if critical issues found
```

## Output Examples

### Text Output (Human-Readable)

```
╔════════════════════════════════════════════════════════════╗
║                        TerraSecure                         ║
║           AI-Powered Terraform Security Scanner            ║
╚════════════════════════════════════════════════════════════╝

Scan Summary
  Total Resources Scanned: 15
  Resources Passed: 7
  Issues Found: 8

Severity Breakdown:
  Critical: 2
  High: 4
  Medium: 2

Detailed Findings

[CRITICAL] S3 bucket with sensitive naming is publicly accessible
  Resource: aws_s3_bucket.customer_data
  File: infrastructure/storage.tf:12
  ML Risk: 95% | Confidence: 92%
  Triggered: s3_public_acl, s3_encryption_disabled (+13 more)

━━━ AI-Enhanced Analysis ━━━

Explanation:
  This S3 bucket is configured with public access (acl = "public-read"),
  allowing anyone on the internet to discover and potentially access its
  contents. The bucket name suggests it contains sensitive customer data.

Business Impact:
  Public S3 buckets are the leading cause of cloud data breaches.
  Exposure could lead to:
  • Data theft affecting customer privacy
  • GDPR fines up to €20M or 4% of annual revenue
  • Reputational damage and loss of customer trust
  • Competitive intelligence leakage

Attack Scenario:
  Attackers use automated scanners (bucket-stream, S3Scanner) that
  continuously probe for public S3 buckets. Once discovered, they can
  enumerate all objects and download sensitive files within minutes.

  Real Example: Capital One breach (2019) exposed 100M records through
  misconfigured S3, resulting in $190M in settlements and fines.

Detailed Fix:
  Step 1: Change ACL to private
    acl = "private"

  Step 2: Enable block public access
    block_public_acls       = true
    block_public_policy     = true
    ignore_public_acls      = true
    restrict_public_buckets = true

  Step 3: Enable server-side encryption
    server_side_encryption_configuration {
      rule {
        apply_server_side_encryption_by_default {
          sse_algorithm = "AES256"
        }
      }
    }
```

### JSON Output (Automation)

```json
{
  "total_resources": 15,
  "passed": 7,
  "stats": {
    "CRITICAL": 2,
    "HIGH": 4,
    "MEDIUM": 2
  },
  "issues": [
    {
      "severity": "CRITICAL",
      "resource_type": "aws_s3_bucket",
      "resource_name": "customer_data",
      "file": "infrastructure/storage.tf",
      "line": 12,
      "message": "S3 bucket with sensitive naming is publicly accessible",
      "ml_risk_score": 0.95,
      "ml_confidence": 0.92,
      "triggered_features": [
        "s3_public_acl",
        "s3_encryption_disabled",
        "s3_versioning_disabled"
      ],
      "llm_explanation": "This S3 bucket is configured with public access...",
      "llm_business_impact": "Public S3 buckets are the leading cause...",
      "llm_attack_scenario": "Real Example: Capital One breach...",
      "llm_detailed_fix": "Step 1: Change ACL to private..."
    }
  ]
}
```
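Because the JSON report is machine-readable, a pipeline step can apply its own gating logic to it. The sketch below reads a report shaped like the example above and decides whether to fail a build at a given severity threshold. The field names (`issues`, `severity`) come from the sample; the severity ordering, the inclusion of a `LOW` level, and the `should_fail` helper are assumptions, and this is not necessarily how the built-in `--fail-on` flag is implemented.

```python
import json
from typing import Any, Dict

# Assumed ordering from least to most severe; LOW is not shown in the sample report.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def should_fail(report: Dict[str, Any], threshold: str) -> bool:
    """Return True if any issue is at or above the threshold severity."""
    minimum = SEVERITY_ORDER.index(threshold.upper())
    return any(
        SEVERITY_ORDER.index(issue["severity"]) >= minimum
        for issue in report.get("issues", [])
    )


# Inline stand-in for the contents of report.json
report = json.loads("""
{
  "total_resources": 15,
  "issues": [
    {"severity": "CRITICAL", "resource_type": "aws_s3_bucket",
     "resource_name": "customer_data", "ml_risk_score": 0.95}
  ]
}
""")

print(should_fail(report, "high"))
print(should_fail({"issues": [{"severity": "MEDIUM"}]}, "high"))
```

In CI, a script like this would typically load `report.json` from disk and exit with a non-zero status when `should_fail` returns true.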

### SARIF Output (GitHub Security)

The SARIF 2.1.0 format supports:

- Native GitHub Security tab integration
- Code scanning alerts on files
- PR comments with fix suggestions
- Security dashboard metrics

![GitHub Security Tab](https://docs.github.com/assets/cb-77251/mw-1440/images/help/security/security-tab-code-scanning-alerts.webp)

## Performance

### Benchmarks

| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| **Accuracy** | 92.45% | >85% | Exceeds target |
| **Precision** | 89.29% | >80% | Exceeds target |
| **Recall** | 96.00% | >90% | Exceeds target |
| **F1 score** | 92.54% | >85% | Exceeds target |
| **False positive rate** | 10.71% | <15% | Excellent |
| **False negative rate** | 4.00% | <5% | Excellent |
| **Scan speed** | <100ms/resource | <200ms | Fast |
| **Model size** | 177 KB | <1MB | Tiny |

### Scalability

Tested with:

- 10,000+ Terraform resources
- Multi-file configurations
- Nested modules
- Complex dependencies

Memory usage: **<512MB RAM**

## Comparison

### Compared with Leading Tools

| Feature | Checkov | Trivy | **TerraSecure** |
|---------|---------|-------|-----------------|
| **Detection method** | Rules | Rules | **ML + AI** |
| **Accuracy** | ~85% | ~88% | **92.45%** |
| **False positive rate** | ~15% | ~12% | **10.71%** |
| **AI explanations** | No | No | **Full context** |
| **Business impact** | No | No | **Financial + regulatory** |
| **Attack scenarios** | No | No | **Real breaches** |
| **ML risk scoring** | No | No | **50 features** |
| **Trained on real breaches** | No | No | **C1, Uber, Tesla** |
| **Fix examples** | Generic | Generic | **Specific + code** |
| **SARIF output** | Yes | Yes | Yes |
| **GitHub Action** | Yes | Yes | Yes |
| **Docker** | Yes | Yes | Yes |
| **Offline mode** | Yes | Yes | Yes |

### Why Choose TerraSecure?

**Choose TerraSecure if you want:**

- Fewer false positives (10.7% vs 15%)
- AI explanations aimed at stakeholders
- ML-based risk prioritization
- Context from real breaches
- Innovation in security tooling

**Stay with Checkov/Trivy if you need:**

- 5+ years of battle testing
- Very large scale (100k+ resources)
- The widest rule coverage (breadth over depth)

**Best of both:** use **TerraSecure together with Checkov/Trivy** for comprehensive coverage.

## CI/CD Integration

### GitHub Actions

```yaml
name: Security
on: [push, pull_request]

permissions:
  security-events: write

jobs:
  terrasecure:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: JashwanthMU/TerraSecure@v2.0.0
        with:
          path: 'infrastructure'
          fail-on: 'high'
```

### GitLab CI

```yaml
terrasecure:
  image: ghcr.io/jashwanthmu/terrasecure:latest
  script:
    - terrasecure . --format json --output report.json
  artifacts:
    reports:
      codequality: report.json
```

### Jenkins

```groovy
pipeline {
    agent any
    stages {
        stage('Security Scan') {
            steps {
                script {
                    docker.image('ghcr.io/jashwanthmu/terrasecure:latest').inside {
                        sh 'terrasecure . --format json'
                    }
                }
            }
        }
    }
}
```

### Azure DevOps

```yaml
- task: Docker@2
  inputs:
    command: run
    arguments: >
      -v $(Build.SourcesDirectory):/scan
      ghcr.io/jashwanthmu/terrasecure:latest
      /scan --format sarif
```

### CircleCI

```yaml
version: 2.1
jobs:
  security:
    docker:
      - image: ghcr.io/jashwanthmu/terrasecure:latest
    steps:
      - checkout
      - run: terrasecure . --fail-on high
```

## Documentation

### Guides

- [Quick Start Guide](docs/QUICK_START.md) - get started in 5 minutes
- [Docker Guide](DOCKER.md) - container usage and deployment
- [GitHub Action Guide](ACTION_README.md) - CI/CD integration
- [Architecture](docs/ARCHITECTURE.md) - system design and ML model

### Advanced Topics (coming soon)

- [ML Model Training](docs/ML_MODEL.md) - how the model is built
- [AI Enhancement](docs/AI_ENHANCEMENT.md) - AWS Bedrock integration
- [Custom Rules](docs/CUSTOM_RULES.md) - extending detection patterns
- [SARIF Format](docs/SARIF.md) - GitHub Security integration

## Contributing

Contributions are welcome!

- **Bug reports** - Found a problem? [Open an issue](https://github.com/JashwanthMU/TerraSecure/issues/new)
- **Feature requests** - Have an idea? [Start a discussion](https://github.com/JashwanthMU/TerraSecure/discussions)
- **Documentation** - Help improve the docs
- **Code** - Fix bugs or add features

## Acknowledgments

### Data Sources

- [CVE database](https://cve.mitre.org/) - vulnerability intelligence
- [NIST NVD](https://nvd.nist.gov/) - security advisories
- Public breach reports and post-mortems

### Standards

- [SARIF 2.1.0](
