JashwanthMU/TerraSecure
GitHub: JashwanthMU/TerraSecure
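A Terraform security scanner that combines XGBoost machine learning with AI analysis to detect cloud infrastructure misconfigurations before deployment and provide actionable remediation advice.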
## Table of Contents
- [What is TerraSecure?](#what-is-terrasecure)
- [Why TerraSecure?](#why-terrasecure)
- [Quick Start](#quick-start)
- [Architecture](#architecture)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [Output Examples](#output-examples)
- [Performance](#performance)
- [Comparison](#comparison)
- [CI/CD Integration](#cicd-integration)
- [Documentation](#documentation)
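## What is TerraSecure?
TerraSecure is an **intelligent security scanner** for Infrastructure as Code. It combines machine learning with AI-driven analysis to catch misconfigurations before they reach production.
Unlike traditional rule-based tools, TerraSecure:
- **Learns patterns** - uses a pre-trained XGBoost model (92.45% accuracy)
- **Explains impact** - AI-generated business context and attack scenarios
- **Reduces noise** - 10.71% false positive rate (versus roughly 15% for Checkov)
- **Learns from real breaches** - including the Capital One, Uber, and Tesla incidents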
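## Why TerraSecure?
### The problem: alert fatigue
Traditional security scanners produce too many false positives. Security teams waste time investigating non-issues while real threats slip through.
### The solution: intelligence + context
| Traditional tools | TerraSecure |
|-------------------|-------------|
| Rules only | ML + rules (92% accuracy) |
| 12-15% false positive rate | 10.7% false positive rate |
| No context or explanation | AI explains business impact |
| Generic "fix this" messages | Specific fixes with code examples |
| Alert fatigue | Actionable analysis |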
### Real-world impact
```
BEFORE (Checkov):
! 147 issues found (22 false positives)
! Security team spends 4 hours triaging
! 3 real issues missed in the noise
AFTER (TerraSecure):
✓ 125 issues found (13 false positives)
✓ Security team spends 1 hour triaging
✓ All critical issues caught with AI context
✓ Developers get actionable fixes immediately
```
## Quick Start
### GitHub Actions
Add to `.github/workflows/security.yml`:
```yaml
name: Security Scan
on: [push, pull_request]
permissions:
  security-events: write
jobs:
  terrasecure:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: JashwanthMU/TerraSecure@v2.0.0
```
**Results appear automatically in the GitHub Security tab.**
### Docker
```
docker run --rm -v $(pwd):/scan \
ghcr.io/jashwanthmu/terrasecure:latest /scan
```
### Run locally
```
git clone https://github.com/JashwanthMU/TerraSecure.git
cd TerraSecure
pip install -r requirements.txt
python src/cli.py examples/vulnerable
```
## Architecture
### System overview
TerraSecure uses a **three-layer detection architecture**:
```mermaid
flowchart TB
    subgraph Input["Input Layer"]
        TF[Terraform Files]
        HCL[HCL Configurations]
        MOD[Terraform Modules]
    end
    subgraph Detection["Detection Engine"]
        RULES[Rule Engine<br/>50+ Security Patterns]
        ML[ML Model<br/>XGBoost 92% Accuracy]
        FEAT[Feature Extractor<br/>50 Security Features]
    end
    subgraph AI["AI Enhancement"]
        BEDROCK[AWS Bedrock<br/>Claude 3 Haiku]
        FALLBACK[Expert Templates<br/>Real Breach Analysis]
        CACHE[Response Cache<br/>90% Cost Savings]
    end
    subgraph Output["Output Formats"]
        TEXT[Text Output<br/>Human-Readable]
        JSON[JSON Output<br/>Automation-Ready]
        SARIF[SARIF 2.1.0<br/>GitHub Security]
    end
    TF --> Detection
    HCL --> Detection
    MOD --> Detection
    Detection --> |Findings|AI
    AI --> Output
    style Input fill:#e1f5ff
    style Detection fill:#ffebee
    style AI fill:#f3e5f5
    style Output fill:#e8f5e9
```
**[View the full architecture →](docs/ARCHITECTURE.md)**
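The AI Enhancement stage in the diagram (an LLM call, an expert-template fallback, and a response cache) can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not TerraSecure's implementation: the `Explainer` class, the `TEMPLATES` table, the finding fields, and the injected `llm_call` function are all hypothetical, and no real AWS Bedrock call is made.

```python
import hashlib
import json

# Hypothetical canned explanations keyed by rule name.
TEMPLATES = {
    "s3_public_acl": "Public S3 buckets can be read by anyone on the internet.",
}


class Explainer:
    """Cache first, then LLM, then template fallback. All names are illustrative."""

    def __init__(self, llm_call=None):
        self.llm_call = llm_call  # assumed callable(finding) -> str, e.g. a Bedrock wrapper
        self.cache = {}
        self.llm_requests = 0

    @staticmethod
    def cache_key(finding):
        # Key on what determines the explanation, not on file paths or line numbers.
        payload = json.dumps(
            {"rule": finding["rule"], "type": finding["resource_type"]},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def explain(self, finding):
        key = self.cache_key(finding)
        if key in self.cache:
            return self.cache[key]
        text = None
        if self.llm_call is not None:
            try:
                self.llm_requests += 1
                text = self.llm_call(finding)
            except Exception:
                text = None  # network or quota failure: fall through to templates
        if text is None:
            text = TEMPLATES.get(finding["rule"], "No explanation available.")
        self.cache[key] = text
        return text


explainer = Explainer(llm_call=lambda f: f"LLM explanation for {f['rule']}")
a = {"rule": "s3_public_acl", "resource_type": "aws_s3_bucket", "file": "a.tf"}
b = {"rule": "s3_public_acl", "resource_type": "aws_s3_bucket", "file": "b.tf"}
explainer.explain(a)
explainer.explain(b)
print(explainer.llm_requests)  # prints 1: the second finding is served from the cache
```

Keying the cache on rule and resource type rather than on the file location is what lets repeated findings share one response; whether the project keys its cache this way is not stated in the source.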
### How it works
```
1. PARSE → Extract resources and properties from Terraform files
2. DETECT → Apply 50+ security patterns + ML risk scoring
3. ANALYZE → AI generates business impact and remediation
4. OUTPUT → Format as Text/JSON/SARIF for humans or tools
```
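The four stages above can be pictured as a small pipeline. The sketch below is illustrative only: the function names, the `Finding` fields, and the single hard-coded rule are hypothetical rather than taken from TerraSecure's source, and HCL parsing is replaced by an already-parsed list of dictionaries.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Finding:
    severity: str
    resource: str
    message: str
    explanation: str = ""


def parse(raw_resources):
    """PARSE: stand-in for HCL parsing; yields (address, properties) pairs."""
    for res in raw_resources:
        yield f"{res['type']}.{res['name']}", res.get("properties", {})


def detect(address, props):
    """DETECT: one hypothetical rule; a real scanner applies many, plus an ML score."""
    if props.get("acl") in ("public-read", "public-read-write"):
        return Finding("CRITICAL", address, "S3 bucket is publicly accessible")
    return None


def analyze(finding):
    """ANALYZE: placeholder for the AI step; attaches a canned explanation."""
    finding.explanation = "Anyone on the internet can read objects in this bucket."
    return finding


def output(findings, fmt="text"):
    """OUTPUT: render as JSON or plain text."""
    if fmt == "json":
        return json.dumps([asdict(f) for f in findings], indent=2)
    return "\n".join(f"[{f.severity}] {f.resource}: {f.message}" for f in findings)


def scan(raw_resources, fmt="text"):
    findings = []
    for address, props in parse(raw_resources):
        finding = detect(address, props)
        if finding is not None:
            findings.append(analyze(finding))
    return output(findings, fmt)


resources = [
    {"type": "aws_s3_bucket", "name": "customer_data", "properties": {"acl": "public-read"}},
    {"type": "aws_s3_bucket", "name": "logs", "properties": {"acl": "private"}},
]
print(scan(resources))
# prints: [CRITICAL] aws_s3_bucket.customer_data: S3 bucket is publicly accessible
```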
### ML pipeline
```mermaid
flowchart LR
    subgraph Training["Training Pipeline"]
        DATA[265 Samples<br/>Real Breaches]
        FEATURES[50 Security<br/>Features]
        MODEL[XGBoost<br/>5-Fold CV]
        EXPORT[Model Export<br/>177KB]
        DATA --> FEATURES
        FEATURES --> MODEL
        MODEL --> EXPORT
    end
    subgraph Inference["Inference"]
        RESOURCE[Terraform<br/>Resource]
        EXTRACT[Feature<br/>Extraction]
        PREDICT[Risk<br/>Prediction]
        SCORE[Risk Score<br/>0.0 - 1.0]
        RESOURCE --> EXTRACT
        EXPORT --> PREDICT
        EXTRACT --> PREDICT
        PREDICT --> SCORE
    end
    style Training fill:#e3f2fd
    style Inference fill:#fff8e1
```
**Training data:**
- Capital One S3 data breach (2019)
- Uber credential leak (2016)
- Tesla public bucket incident (2018)
- MongoDB ransomware incident (2017)
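The inference half of the pipeline (resource, feature extraction, risk prediction, score between 0.0 and 1.0) might look something like the sketch below. The three feature names are the ones that appear in the sample JSON output later in this README; everything else is an assumption. In particular, the scoring function is a plain logistic function with made-up weights, used here only as a stand-in for the trained XGBoost model so that the example runs without extra dependencies.

```python
import math

# Feature names borrowed from the sample output; the extraction logic is hypothetical.
FEATURES = ["s3_public_acl", "s3_encryption_disabled", "s3_versioning_disabled"]


def extract_features(resource_type, props):
    """Map a parsed resource to a fixed-order binary feature vector."""
    is_bucket = resource_type == "aws_s3_bucket"
    flags = {
        "s3_public_acl": is_bucket
        and str(props.get("acl", "private")).startswith("public"),
        "s3_encryption_disabled": is_bucket
        and "server_side_encryption_configuration" not in props,
        "s3_versioning_disabled": is_bucket
        and not props.get("versioning", {}).get("enabled", False),
    }
    return [1.0 if flags[name] else 0.0 for name in FEATURES]


# Stand-in for the trained model: logistic function with invented weights.
# It only mimics the "feature vector in, score in (0, 1) out" interface.
WEIGHTS = [2.5, 1.5, 0.5]
BIAS = -2.0


def risk_score(vector):
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, vector))
    return 1.0 / (1.0 + math.exp(-z))


risky = extract_features("aws_s3_bucket", {"acl": "public-read"})
safe = extract_features(
    "aws_s3_bucket",
    {
        "acl": "private",
        "server_side_encryption_configuration": {},
        "versioning": {"enabled": True},
    },
)
print(risky, round(risk_score(risky), 2))
print(safe, round(risk_score(safe), 2))
```

Keeping the feature order fixed in one list matters because a tree model trained on 50 columns expects the same columns, in the same order, at inference time.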
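## Features
### Machine learning detection
**Pre-trained XGBoost model**
- 92.45% accuracy
- 10.71% false positive rate
- 4.00% false negative rate
- 50 security features
- Inference time under 100 ms
**Trained on real breaches**
- Capital One (S3 misconfiguration)
- Uber (hard-coded credentials)
- Tesla (public S3 bucket)
- MongoDB (exposed database)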
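### AI-enhanced analysis
Every finding includes:
- **Explanation** - what the problem is and why it matters
- **Business impact** - financial, regulatory, and reputational risk
- **Attack scenario** - how an attacker could exploit the issue (with real-world cases)
- **Detailed fix** - step-by-step remediation with code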
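### Multiple output formats
| Format | Use case | Features |
|--------|----------|----------|
| **Text** | Human review | Colored output, AI insights |
| **JSON** | Automation | Machine-readable, scriptable |
| **SARIF 2.1.0** | GitHub Security | Code scanning alerts, PR comments |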
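To make the SARIF row more concrete, here is a minimal sketch of how a list of findings could be wrapped in a SARIF 2.1.0 log. The envelope (`version`, `runs`, `tool.driver`, `results`, `physicalLocation`) follows the SARIF specification; the input field `rule_id` and the severity-to-level mapping are assumptions made for this example, not TerraSecure's documented behavior.

```python
import json

# SARIF defines the result levels "error", "warning", "note" and "none".
# How TerraSecure maps its severities onto them is an assumption here.
LEVELS = {"CRITICAL": "error", "HIGH": "error", "MEDIUM": "warning", "LOW": "note"}


def to_sarif(issues, tool_name="TerraSecure"):
    """Wrap findings in a minimal SARIF 2.1.0 document."""
    results = []
    for issue in issues:
        results.append(
            {
                "ruleId": issue["rule_id"],  # hypothetical input field
                "level": LEVELS.get(issue["severity"], "warning"),
                "message": {"text": issue["message"]},
                "locations": [
                    {
                        "physicalLocation": {
                            "artifactLocation": {"uri": issue["file"]},
                            "region": {"startLine": issue["line"]},
                        }
                    }
                ],
            }
        )
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


issues = [
    {
        "rule_id": "s3_public_acl",
        "severity": "CRITICAL",
        "message": "S3 bucket with sensitive naming is publicly accessible",
        "file": "infrastructure/storage.tf",
        "line": 12,
    }
]
doc = to_sarif(issues)
print(json.dumps(doc, indent=2))
```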
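### 50+ security patterns
**Network security (12 patterns)**
- Security groups open to 0.0.0.0/0
- SSH/RDP exposed to the internet
- Unrestricted egress rules
- Missing network segmentation
- Default security group in use
- VPC flow logs not enabled
- ...and 6 more
**Storage security (15 patterns)**
- Public S3 buckets
- Unencrypted S3/EBS/RDS
- Missing versioning
- No backup retention policy
- Public snapshots
- Cross-region replication not enabled
- ...and 9 more
**Identity and access (10 patterns)**
- Wildcards in IAM permissions
- Root account usage
- Missing MFA
- Overly permissive policies
- Inline user policies
- ...and 5 more
**Secrets management (8 patterns)**
- Hard-coded credentials
- Plaintext environment variables
- Unencrypted secrets
- Exposed API keys
- ...and 4 more
**Monitoring and compliance (5 patterns)**
- CloudTrail disabled
- No VPC flow logs
- Missing CloudWatch alarms
- Access logging disabled
- Config rules not enabled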
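As an illustration of what one of these patterns could look like in code, the sketch below checks a parsed `aws_security_group` for ingress rules open to `0.0.0.0/0` and calls out SSH and RDP specifically. The attribute names (`ingress`, `from_port`, `to_port`, `cidr_blocks`) are the Terraform AWS provider's; the function name, the severity assignments, and the message wording are hypothetical, and the check ignores the protocol field for brevity.

```python
def open_ingress_findings(name, security_group):
    """Flag ingress rules reachable from anywhere, with SSH/RDP called out."""
    sensitive = {22: "SSH", 3389: "RDP"}
    findings = []
    for rule in security_group.get("ingress", []):
        if "0.0.0.0/0" not in rule.get("cidr_blocks", []):
            continue
        low, high = rule["from_port"], rule["to_port"]
        exposed = [label for port, label in sensitive.items() if low <= port <= high]
        if exposed:
            services = "/".join(exposed)
            findings.append(("CRITICAL", f"{name}: {services} exposed to the internet"))
        else:
            findings.append(("HIGH", f"{name}: ports {low}-{high} open to 0.0.0.0/0"))
    return findings


web_sg = {
    "ingress": [
        {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
        {"from_port": 443, "to_port": 443, "cidr_blocks": ["10.0.0.0/8"]},
        {"from_port": 8080, "to_port": 8090, "cidr_blocks": ["0.0.0.0/0"]},
    ]
}
for severity, message in open_ingress_findings("web", web_sg):
    print(f"[{severity}] {message}")
```

Checking the port range rather than a single port catches rules such as `0-65535`, which expose SSH and RDP without naming either.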
## Installation
### Prerequisites
- Python 3.11+
- pip package manager
- At least 512 MB of RAM
### Option 1: Docker
```
docker pull jashwanthmu/terrasecure:latest
```
### Option 2: GitHub Action
```
- uses: JashwanthMU/TerraSecure@v2.0.0
```
### Option 3: Install from source
```
git clone https://github.com/JashwanthMU/TerraSecure.git
cd TerraSecure
pip install -r requirements.txt
python src/cli.py --help
```
## Usage
### Command line
#### Basic scan
```
# Scan the current directory
terrasecure .
# Scan a specific directory
terrasecure infrastructure/
# Scan a single file
terrasecure main.tf
```
#### Output formats
```
# JSON output
terrasecure . --format json --output report.json
# SARIF for GitHub Security
terrasecure . --format sarif --output results.sarif
# Text with AI insights (default)
terrasecure .
```
#### Policy enforcement
```
# Fail on critical findings
terrasecure . --fail-on critical
# Fail on high or critical findings
terrasecure . --fail-on high
# Fail on any finding
terrasecure . --fail-on any
```
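The `--fail-on` thresholds amount to a severity comparison that decides the process exit code. The sketch below shows one way to express that logic; the `LOW` level, the function name, and the choice of exit code `1` are assumptions, since the source only shows the three flag values and their comments.

```python
# Ordered from least to most severe; "LOW" is assumed, the sample output
# only shows Medium, High and Critical.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def exit_code(found_severities, fail_on):
    """Return 1 if any finding meets the --fail-on threshold, else 0."""
    if fail_on == "any":
        return 1 if found_severities else 0
    threshold = SEVERITY_ORDER.index(fail_on.upper())
    failing = any(SEVERITY_ORDER.index(s) >= threshold for s in found_severities)
    return 1 if failing else 0


found = ["MEDIUM", "HIGH"]
print(exit_code(found, "critical"))  # 0: nothing at or above CRITICAL
print(exit_code(found, "high"))      # 1: a HIGH finding meets the threshold
print(exit_code(found, "any"))       # 1: any finding fails the run
```

A non-zero exit code is what makes a CI job fail, which is why the same flag can block a pull request in the GitHub Actions examples.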
### Docker
```
# Basic scan
docker run --rm -v $(pwd):/scan \
  ghcr.io/jashwanthmu/terrasecure:latest /scan
# Generate a SARIF report
docker run --rm \
  -v $(pwd):/scan:ro \
  -v $(pwd):/output \
  ghcr.io/jashwanthmu/terrasecure:latest \
  /scan --format sarif --output /output/results.sarif
# Fail on critical findings
docker run --rm -v $(pwd):/scan \
  ghcr.io/jashwanthmu/terrasecure:latest \
  /scan --fail-on critical
```
### GitHub Actions
#### Basic integration
```yaml
- name: TerraSecure Scan
  uses: JashwanthMU/TerraSecure@v2.0.0
```
#### Advanced configuration
```yaml
- name: Security Scan with Policy
  uses: JashwanthMU/TerraSecure@v2.0.0
  with:
    path: 'infrastructure'
    format: 'sarif'
    fail-on: 'high'
    upload-sarif: 'true'
```
#### Block PRs on critical issues
```yaml
- name: Block on Critical
  uses: JashwanthMU/TerraSecure@v2.0.0
  with:
    fail-on: 'critical'
    # PR fails if critical issues found
```
## Output Examples
### Text output (human-readable)
```
╔════════════════════════════════════════════════════════════╗
║ TerraSecure ║
║ AI-Powered Terraform Security Scanner ║
╚════════════════════════════════════════════════════════════╝

Scan Summary
Total Resources Scanned: 15
Resources Passed: 7
Issues Found: 8
Severity Breakdown:
  Critical: 2
  High: 4
  Medium: 2

Detailed Findings
[CRITICAL] S3 bucket with sensitive naming is publicly accessible
Resource: aws_s3_bucket.customer_data
File: infrastructure/storage.tf:12
ML Risk: 95% | Confidence: 92%
Triggered: s3_public_acl, s3_encryption_disabled (+13 more)

━━━ AI-Enhanced Analysis ━━━
Explanation:
This S3 bucket is configured with public access (acl = "public-read"),
allowing anyone on the internet to discover and potentially access its
contents. The bucket name suggests it contains sensitive customer data.

Business Impact:
Public S3 buckets are the leading cause of cloud data breaches.
Exposure could lead to:
• Data theft affecting customer privacy
• GDPR fines up to €20M or 4% of annual revenue
• Reputational damage and loss of customer trust
• Competitive intelligence leakage

Attack Scenario:
Attackers use automated scanners (bucket-stream, S3Scanner) that
continuously probe for public S3 buckets. Once discovered, they can
enumerate all objects and download sensitive files within minutes.
Real Example: Capital One breach (2019) exposed 100M records through
misconfigured S3, resulting in $190M in settlements and fines.

Detailed Fix:
Step 1: Change ACL to private
  acl = "private"
Step 2: Enable block public access
  block_public_acls = true
  block_public_policy = true
  ignore_public_acls = true
  restrict_public_buckets = true
Step 3: Enable server-side encryption
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
```
### JSON output (automation)
```json
{
  "total_resources": 15,
  "passed": 7,
  "stats": {
    "CRITICAL": 2,
    "HIGH": 4,
    "MEDIUM": 2
  },
  "issues": [
    {
      "severity": "CRITICAL",
      "resource_type": "aws_s3_bucket",
      "resource_name": "customer_data",
      "file": "infrastructure/storage.tf",
      "line": 12,
      "message": "S3 bucket with sensitive naming is publicly accessible",
      "ml_risk_score": 0.95,
      "ml_confidence": 0.92,
      "triggered_features": [
        "s3_public_acl",
        "s3_encryption_disabled",
        "s3_versioning_disabled"
      ],
      "llm_explanation": "This S3 bucket is configured with public access...",
      "llm_business_impact": "Public S3 buckets are the leading cause...",
      "llm_attack_scenario": "Real Example: Capital One breach...",
      "llm_detailed_fix": "Step 1: Change ACL to private..."
    }
  ]
}
```
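Because the JSON report is meant for automation, a short script can pick out the findings that matter most. The sketch below relies only on field names visible in the sample above (`issues`, `ml_risk_score`, `resource_type`, `resource_name`, `file`, `line`); the helper name and the `min_risk` threshold are hypothetical, and the embedded report is a trimmed copy of the sample rather than real scanner output.

```python
import json


def high_risk_issues(report, min_risk=0.9):
    """Return (resource address, file:line) pairs for issues at or above min_risk."""
    picked = []
    for issue in report["issues"]:
        if issue["ml_risk_score"] >= min_risk:
            address = f"{issue['resource_type']}.{issue['resource_name']}"
            picked.append((address, f"{issue['file']}:{issue['line']}"))
    return picked


# Trimmed copy of the sample report; in practice the report would be read
# from the file written with --output, for example with json.load().
sample = json.loads(
    """
    {
      "issues": [
        {"severity": "CRITICAL", "resource_type": "aws_s3_bucket",
         "resource_name": "customer_data", "file": "infrastructure/storage.tf",
         "line": 12, "ml_risk_score": 0.95}
      ]
    }
    """
)
for address, location in high_risk_issues(sample):
    print(f"{address} at {location}")
# prints: aws_s3_bucket.customer_data at infrastructure/storage.tf:12
```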
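### SARIF output (GitHub Security)
The SARIF 2.1.0 format supports:
- Native integration with the GitHub Security tab
- Code scanning alerts on files
- PR comments with remediation suggestions
- Security dashboard metrics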

## Performance
### Benchmarks
| Metric | Value | Target | Status |
|--------|-------|--------|--------|
| **Accuracy** | 92.45% | >85% | Exceeds target |
| **Precision** | 89.29% | >80% | Exceeds target |
| **Recall** | 96.00% | >90% | Exceeds target |
| **F1 score** | 92.54% | >85% | Exceeds target |
| **False positive rate** | 10.71% | <15% | Excellent |
| **False negative rate** | 4.00% | <5% | Excellent |
| **Scan speed** | <100 ms/resource | <200 ms | Fast |
| **Model size** | 177 KB | <1 MB | Very small |
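The rates in this table are linked by simple identities: the reported false negative rate (4.00%) is 100% minus recall (96.00%), and the reported false positive rate (10.71%) is 100% minus precision (89.29%). The second identity suggests that figure may be FP / (FP + TP), sometimes called the false discovery rate, rather than the textbook FP / (FP + TN); the source does not say which definition was used. The sketch below computes both from a confusion matrix. The counts are hypothetical and are not TerraSecure's evaluation data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),   # share of safe resources flagged
        "false_discovery_rate": fp / (fp + tp),  # share of alerts that are wrong; 1 - precision
        "false_negative_rate": fn / (fn + tp),   # share of risky resources missed; 1 - recall
    }


# Hypothetical counts chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

For alert-fatigue discussions the false discovery rate is arguably the more relevant number, since it answers "what fraction of the alerts I triage are noise".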
### Scalability
Tested with:
- 10,000+ Terraform resources
- Multi-file configurations
- Nested modules
- Complex dependencies

Memory usage: **<512 MB RAM**
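## Comparison
### Compared with leading tools
| Feature | Checkov | Trivy | **TerraSecure** |
|---------|---------|-------|-----------------|
| **Detection method** | Rules | Rules | **ML + AI** |
| **Accuracy** | ~85% | ~88% | **92.45%** |
| **False positive rate** | ~15% | ~12% | **10.71%** |
| **AI explanations** | No | No | **Full context** |
| **Business impact** | No | No | **Financial + regulatory** |
| **Attack scenarios** | No | No | **Real breaches** |
| **ML risk scoring** | No | No | **50 features** |
| **Trained on real breaches** | No | No | **Capital One, Uber, Tesla** |
| **Fix examples** | Generic | Generic | **Specific + code** |
| **SARIF output** | Yes | Yes | Yes |
| **GitHub Action** | Yes | Yes | Yes |
| **Docker** | Yes | Yes | Yes |
| **Offline mode** | Yes | Yes | Yes |
### Why choose TerraSecure?
**Choose TerraSecure if you want:**
- Fewer false positives (10.7% vs 15%)
- AI explanations written for stakeholders
- ML-based risk prioritization
- Context from real breaches
- Innovation in security tooling

**Stay with Checkov/Trivy if you need:**
- More than 5 years of battle-testing
- Very large scale (100k+ resources)
- The widest rule coverage (breadth over depth)

**Best approach:**
Use **TerraSecure together with Checkov/Trivy** for comprehensive coverage!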
## CI/CD Integration
### GitHub Actions
```yaml
name: Security
on: [push, pull_request]
permissions:
  security-events: write
jobs:
  terrasecure:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: JashwanthMU/TerraSecure@v2.0.0
        with:
          path: 'infrastructure'
          fail-on: 'high'
```
### GitLab CI
```yaml
terrasecure:
  image: ghcr.io/jashwanthmu/terrasecure:latest
  script:
    - terrasecure . --format json --output report.json
  artifacts:
    reports:
      codequality: report.json
```
### Jenkins
```groovy
pipeline {
    agent any
    stages {
        stage('Security Scan') {
            steps {
                script {
                    docker.image('ghcr.io/jashwanthmu/terrasecure:latest').inside {
                        sh 'terrasecure . --format json'
                    }
                }
            }
        }
    }
}
```
### Azure DevOps
```yaml
- task: Docker@2
  inputs:
    command: run
    arguments: >
      -v $(Build.SourcesDirectory):/scan
      ghcr.io/jashwanthmu/terrasecure:latest
      /scan --format sarif
```
### CircleCI
```yaml
version: 2.1
jobs:
  security:
    docker:
      - image: ghcr.io/jashwanthmu/terrasecure:latest
    steps:
      - checkout
      - run: terrasecure . --fail-on high
```
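## Documentation
### Guides
- [Quick Start Guide](docs/QUICK_START.md) - get started in 5 minutes
- [Docker Guide](DOCKER.md) - container usage and deployment
- [GitHub Action Guide](ACTION_README.md) - CI/CD integration
- [Architecture](docs/ARCHITECTURE.md) - system design and ML model
### Advanced topics (coming soon)
- [ML Model Training](docs/ML_MODEL.md) - how the model is built
- [AI Enhancement](docs/AI_ENHANCEMENT.md) - AWS Bedrock integration
- [Custom Rules](docs/CUSTOM_RULES.md) - extending the detection patterns
- [SARIF Format](docs/SARIF.md) - GitHub Security integration
## Contributing
Contributions are welcome!
- **Bug reports** - found a problem? [Open an issue](https://github.com/JashwanthMU/TerraSecure/issues/new)
- **Feature requests** - have an idea? [Start a discussion](https://github.com/JashwanthMU/TerraSecure/discussions)
- **Documentation** - help improve the docs
- **Code** - fix bugs or add features
## Acknowledgments
### Data sources
- [CVE database](https://cve.mitre.org/) - vulnerability intelligence
- [NIST NVD](https://nvd.nist.gov/) - security advisories
- Public breach reports and post-mortems
### Standards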
- [SARIF 2.1.0](