dong-quan-tran/AegisLog

GitHub: dong-quan-tran/AegisLog

An anomaly detection and intelligent triage tool for authentication and web logs, built on unsupervised learning and LLMs.


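# AegisLog

AegisLog is an AI-based log analysis and triage service focused on authentication and web access logs. Unlike traditional supervised classification approaches such as SentinelTI, AegisLog uses unsupervised anomaly detection, clustering, and AI explanation to help engineers quickly understand and respond to anomalous behavior in their systems.

It sits at the intersection of AI, software engineering, and cybersecurity:

- **AI:** anomaly detection, clustering, semantic log understanding, LLM explanations.
- **Software engineering:** robust pipelines, CLI and API, SQLite tracking, performance on large batches.
- **Cybersecurity:** focus on authentication attacks, reconnaissance/scanning, and misconfigurations with security impact.

## Features

- **Log ingestion and normalization**
  Ingests raw authentication and web access logs from files or HTTP requests and normalizes them into a unified event schema (timestamp, IP, user, path, status, user agent, raw text, etc.).
- **Session and IP behavior modeling**
  Groups individual log events into sessions (user/IP/user agent over time) and per-IP time windows, then computes rich behavioral features such as request count, session duration, failed-to-successful login ratio, status code patterns, unique endpoints visited, and off-hours activity.
- **Unsupervised anomaly detection**
  Uses an unsupervised model (Isolation Forest) trained on mostly-normal behavior to assign an anomaly score to each session/IP without labeled attack data, and maps scores to risk levels (low/medium/high).
- **Incident clustering instead of alert floods**
  Clusters related anomalous sessions into higher-level incidents using behavioral features and optional semantic embeddings of log messages, so you review a handful of incidents rather than thousands of isolated anomalies.
- **LLM-driven explanation and classification**
  For each incident, the AI explainer generates a short human-readable summary (for example, "likely credential stuffing from a single IP") and proposes a category label such as `auth_attack`, `scanner`, `misconfiguration`, or `app_error`.
- **Security-oriented behavior detection**
  Focuses on patterns that matter for security and reliability, including password spraying, credential stuffing, brute-force login attempts, reconnaissance/scanning across many endpoints, and sudden error spikes on sensitive paths.
- **Triage workflow and feedback loop**
  Stores incidents, anomaly scores, and explanations in SQLite and lets analysts mark incidents as "real incident" or "benign", which supports threshold tuning and simple learning from past triage decisions.
- **Developer-friendly CLI and HTTP API**
  Provides a CLI for initializing the database, training models, and analyzing log files, plus a FastAPI HTTP API with endpoints for per-session anomaly detection and incident-level analysis, suitable for integration into development, SRE, or SecOps workflows.
- **Experiment tracking and evaluation**
  Tracks model versions, feature configurations, and evaluation metrics in SQLite so you can reproducibly compare different anomaly models and feature sets on a small labeled benchmark.

## Tech stack

- Python 3.10+
- FastAPI (HTTP API)
- scikit-learn (anomaly detection, Isolation Forest)
- SQLite (experiment tracking and triage history)
- Pytest (testing)
- (Optional) Sentence-transformer / small embedding model (semantic log analysis)

## Quick start

### 1. Clone the repository

```bash
git clone https://github.com//AegisLog.git
cd AegisLog
```

### 2. Create and activate a virtual environment

On Windows (PowerShell):

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

On Windows (cmd):

```text
python -m venv .venv
.\.venv\Scripts\activate.bat
```

On Linux/macOS:

```bash
python -m venv .venv
source .venv/bin/activate
```

### 3. Install dependencies

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

(`requirements.txt` will be added soon.)

### 4. Run the CLI (dev placeholder)

```bash
python -m aegislog.cli --help
```

### 5. Run the API (dev placeholder)

```bash
uvicorn aegislog.api:app --host 0.0.0.0 --port 8080 --reload
```

## CLI (planned)

AegisLog will provide commands for training and detection.

Initialize the experiment DB:

```bash
python -m aegislog.cli init
```

Train the anomaly model on logs:

```bash
python -m aegislog.cli train --logs-path data/train_logs
```

Analyze a log file and print the top incidents (human-readable):

```bash
python -m aegislog.cli analyze logs/access.log
```

Output incidents as JSON for integration:

```bash
python -m aegislog.cli analyze logs/access.log --json-pretty
```

## HTTP API (planned)

- `GET /health`: Basic health check.
- `POST /detect-sessions`: Scores sessions/IPs and returns anomaly scores.
- `POST /detect-incidents`: Runs detection, clustering, and explanation to produce incidents.

Authentication: future versions will support an API key via the `X-API-KEY` header.

## How it works (high level)

1. **Parse logs**
   Raw auth/access logs are parsed into a normalized event schema (timestamp, IP, user, path, status, user agent, etc.).
2. **Build behavioral features**
   Events are grouped into sessions and per-IP windows, and features such as event count, duration, failed login ratio, status code pattern, and night-time activity are computed.
3. **Detect anomalies**
   An Isolation Forest model trained on mostly-normal data assigns an anomaly score to each session/IP. Scores are mapped to risk levels.
4. **Group into incidents**
   Anomalous sessions/IPs are clustered into incidents so analysts can review a handful of groups instead of thousands of individual events.
5. **Explain incidents (AI explainer)**
   For each incident, AegisLog summarizes key patterns in natural language and suggests likely categories such as credential stuffing, vulnerability scanning, or misconfiguration.

## Project status

Early development. CLI/API commands and models are subject to change.

## Author

- Name: Dong Quan Tran (Johnny)
- GitHub: https://github.com/dong-quan-tran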

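Tags: AI security, AMSI bypass, API service, AV bypass, Chat Copilot, FastAPI, Isolation Forest, OISF, Python security tools, SQLite, web logs, incident clustering, credential stuffing, threat detection, security incident response, security information and event management, security situational awareness, security operations, password management, anomaly detection, scanning framework, plugin system, search engine crawling, unsupervised learning, log management, network reconnaissance detection, cybersecurity, automated classification, behavior modeling, authentication security, reverse engineering tools, privacy protection, risk scoring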