ajibona-ayomide/SAAD

GitHub: ajibona-ayomide/SAAD

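SAAD is an authentication-log anomaly detection system that combines a rule engine with machine learning to automatically identify authentication-related security threats in SOC environments and output risk scores.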

Stars: 0 | Forks: 0

# SAAD: Smart Authentication Anomaly Detector

**Group B | Department of Cybersecurity | Supervisor: Mr. Olorunfemi Blessing**

A machine-learning-based authentication threat detection system for SOC environments.

## Quick Start

### 1. Clone and open in VSCode

```
git clone <repository-url>
cd SAAD
code .
```

### 2. Create and activate a virtual environment

```
python -m venv venv

# Windows
venv\Scripts\activate

# Linux / macOS
source venv/bin/activate
```

### 3. Install dependencies

```
pip install -r requirements.txt
pip install -e .   # makes the saad package importable
```

### 4. Configure log sources

```
# Edit saad/config/log_sources.yaml with your actual log paths
```

### 5. Run the pipeline

```
python main.py                                       # auth logs, 24h window
python main.py --log-type system
python main.py --log-type auth --time-range 1_week
```

### 6. Run the tests

```
pytest                      # all tests
pytest tests/unit/          # unit tests only
pytest tests/integration/   # integration tests only
pytest --cov=saad           # with coverage report
```

### 7. Start the API server

```
python -m saad.api.app
# API is available at http://localhost:5000/api/v1/
```

## Project Structure

```
SAAD/
├── main.py                        ← Pipeline entry point
├── requirements.txt
├── setup.py
├── pytest.ini
├── .env.example
├── .gitignore
│
├── saad/                          ← Main package
│   ├── collectors/                ← Log file readers
│   │   ├── base_collector.py
│   │   ├── auth_log_collector.py
│   │   ├── system_log_collector.py
│   │   └── app_log_collector.py
│   │
│   ├── preprocessing/             ← Clean → Normalize → Features → Validate
│   │   ├── cleaning.py
│   │   ├── normalization.py
│   │   ├── feature_extraction.py
│   │   └── schema_validation.py
│   │
│   ├── analysis/                  ← Rule-based detection
│   │   └── rule_based_analysis.py
│   │
│   ├── ml/                        ← Machine learning layer
│   │   ├── isolation_forest.py
│   │   ├── behavioral_profiling.py
│   │   └── model_utils.py
│   │
│   ├── scoring/                   ← Risk score computation
│   │   └── risk_scorer.py
│   │
│   ├── database/                  ← MySQL / SQLAlchemy
│   │   ├── db_connection.py
│   │   ├── models.py
│   │   └── db_utils.py
│   │
│   ├── api/                       ← Flask REST API
│   │   ├── app.py
│   │   └── routes.py
│   │
│   ├── dashboard/                 ← React SOC UI (served by Flask)
│   │   ├── static/
│   │   └── templates/
│   │
│   ├── utils/                     ← Shared helpers
│   │   ├── logger.py
│   │   ├── helpers.py
│   │   └── serializers.py
│   │
│   ├── config/                    ← Settings and YAML loader
│   │   ├── settings.py
│   │   ├── loader.py
│   │   └── log_sources.yaml
│   │
│   └── schemas/                   ← JSON Schema validation files
│       ├── base_log_schema.json
│       ├── auth_log_schema.json
│       ├── system_log_schema.json
│       └── app_log_schema.json
│
├── tests/
│   ├── unit/                      ← Isolated function/class tests
│   │   ├── test_cleaning.py
│   │   ├── test_helpers.py
│   │   └── test_risk_scorer.py
│   └── integration/               ← Full pipeline tests
│       └── test_pipeline.py
│
├── data/
│   └── sample_logs/               ← Put sample .log files here for testing
│
├── logs/                          ← Runtime log files (auto-created)
├── outputs/                       ← JSON outputs + saved ML models
└── docs/
    └── architecture.md
```

## Pipeline Stages

| # | Stage | Module |
|---|-------|--------|
| 1 | Collect raw logs | `saad/collectors/` |
| 2 | Remove empty and duplicate entries | `preprocessing/cleaning.py` |
| 3 | Normalize fields and parse timestamps | `preprocessing/normalization.py` |
| 4 | Extract IP, username, hour, and keyword flags | `preprocessing/feature_extraction.py` |
| 5 | Validate against JSON Schema | `preprocessing/schema_validation.py` |
| 6 | Rule-based signature detection | `analysis/rule_based_analysis.py` |
| 7 | User behavioral profiling | `ml/behavioral_profiling.py` |
| 8 | Isolation Forest anomaly scoring | `ml/isolation_forest.py` |
| 9 | Weighted risk score (0–100) → Low/Medium/High | `scoring/risk_scorer.py` |

## Environment Variables

| Variable | Purpose | Default |
|----------|---------|---------|
| `SAAD_DB_PASSWORD` | MySQL database password | `changeme` |

Copy `.env.example` to `.env` and fill in your values.

## Testing with Your Dataset

### Step 1: Place your dataset

Copy your labeled Excel file into the `data/` folder:

```
data/test.xlsx
```

### Step 2: Convert to an unlabeled log file

```
python data/prepare_dataset.py
# Optional: python data/prepare_dataset.py --input data/myfile.csv --format csv
```

This generates:

- `data/sample_logs/auth/auth_test_.log`: unlabeled, ready for the pipeline to consume
- `data/ground_truth.json`: the original labels, stored separately (the pipeline never reads this file)

### Step 3: Run the pipeline

```
python main.py --log-type auth
```

### Step 4: Evaluate the results

```
python data/evaluate_results.py
```

This prints the detection rate and false-positive rate for each attack category.

### Expected Dataset Format

Your Excel/CSV file must contain exactly 11 columns, with no header row:

| # | Column | Example |
|---|--------|---------|
| 0 | timestamp | 2024-08-14T04:53:31 |
| 1 | source_ip | 164.218.94.112 |
| 2 | hostname | srv-ldn-02 |
| 3 | username | www-data |
| 4 | auth_method | sudo |
| 5 | attempts | 4 |
| 6 | auth_result | Failed / Success |
| 7 | port | 443 |
| 8 | protocol | RDP / SSH2 / TELNET |
| 9 | message | User www-data failed login via sudo |
| 10 | label | normal / brute_force / geo_anomaly / port_scan / privilege_escalation |
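To make the 11-column layout above concrete, here is a minimal sketch of how a header-less CSV export could be checked and split into unlabeled records and a separate label list. It is not the project's `data/prepare_dataset.py`: the helper name `split_labels` is hypothetical, and the sample row is assembled from the example values in the table, with the protocol and label chosen arbitrarily from the listed options.

```python
import csv
import io
import json

# Column order taken from the dataset format table above.
COLUMNS = [
    "timestamp", "source_ip", "hostname", "username", "auth_method",
    "attempts", "auth_result", "port", "protocol", "message", "label",
]


def split_labels(csv_text):
    """Split header-less 11-column CSV text into unlabeled records and labels.

    Hypothetical helper for illustration; not SAAD's prepare_dataset.py.
    """
    records, labels = [], []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(
                f"row {line_no}: expected {len(COLUMNS)} columns, got {len(row)}"
            )
        record = dict(zip(COLUMNS, (cell.strip() for cell in row)))
        labels.append(record.pop("label"))  # keep the label out of pipeline input
        records.append(record)
    return records, labels


# Sample row built from the table's example values (label chosen arbitrarily).
sample = (
    "2024-08-14T04:53:31,164.218.94.112,srv-ldn-02,www-data,sudo,4,"
    "Failed,443,SSH2,User www-data failed login via sudo,brute_force\n"
)
records, labels = split_labels(sample)
print(json.dumps(records[0], indent=2))
print(labels)
```

Rejecting rows with the wrong column count early makes a stray header row or a shifted column fail loudly instead of silently producing mislabeled data, and popping `label` before the record is stored mirrors the README's rule that labels stay out of the pipeline's input.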

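Tags: Apex, Python, Web API, security rule engine, security operations, anomaly detection, scanning framework, backdoor-free, machine learning, network mapping, reverse-engineering tools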