Zyrakk/noctis

GitHub: Zyrakk/noctis

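An autonomous dark web threat intelligence platform written in Go. It combines multi-source collection, LLM-driven classification and IOC extraction, knowledge graph correlation, and vulnerability priority scoring.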


# Noctis

**Autonomous Dark Web Threat Intelligence Platform**

[![Go 1.25+](https://img.shields.io/badge/Go-1.25%2B-00ADD8?logo=go)](https://go.dev/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/Zyrakk/noctis/actions/workflows/ci.yml/badge.svg)](https://github.com/Zyrakk/noctis/actions)
[![Container](https://img.shields.io/badge/container-ghcr.io%2Fzyrakk%2Fnoctis-blue?logo=github)](https://ghcr.io/zyrakk/noctis)

## What Is Noctis

Noctis is a long-running threat intelligence daemon written in Go. It autonomously collects content from Telegram channels (MTProto), paste sites, dark web forums, and RSS/web feeds, then archives everything to PostgreSQL. Every collected item is classified by severity and category, summarized, and run through an IOC and named-entity extraction pipeline.

Extracted entities form a persistent knowledge graph whose typed relationships span actors, infrastructure, malware families, and vulnerabilities. A correlation engine detects cross-source patterns in the graph. A separate LLM Analyst reviews each candidate correlation and confirms or rejects it with a stated rationale, producing structured analytical notes. IOC threat scores decay exponentially over time under a lifecycle manager that deactivates expired indicators. A daily brief generator synthesizes the past 24 hours of intelligence into a structured report. A natural language query engine translates human questions into SQL and executes them against the live archive.

Vulnerability intelligence is ingested from the NVD, EPSS, and CISA KEV feeds and scored by priority. Collected IOCs are enriched asynchronously through AbuseIPDB, VirusTotal, and crt.sh. An autonomous source discovery engine extracts URLs from ingested content and queues new sources for operator review, so the collection network can grow without anyone searching for sources by hand. Everything is viewable through a React dashboard embedded in the Go binary and protected by API key authentication.

## Architecture

```
+-----------------------------------------------------------------------------------+
| Layer 0: Collectors                                                               |
|                                                                                   |
|  Telegram MTProto        Paste Sites        Dark Web Forums      RSS / Web Feeds  |
|          |                    |                    |                    |         |
|          +--------------------+----------+---------+--------------------+         |
|                                          |                                        |
|                                  CollectorManager                                 |
+------------------------------------------+----------------------------------------+
                                           |
                   +-----------------------+
                   |
+------------------v----------------+
| Layer 1: IngestPipeline           |
|                                   |
| Dedup -> Matcher (keyword/regex)  |
|   -> Archive -> Alert path        |
+------------------+----------------+
                   |
+------------------v--------------------------------------------------------+
| Layer 2: ProcessingEngine (background workers)                            |
|                                                                           |
| Classifier        Groq / llama-4-scout  (category, severity)              |
| Summarizer        Groq / llama-4-scout  (abstractive summary)             |
| IOC Extractor     Groq / llama-4-scout  (IPs, domains, hashes, CVEs)      |
| Entity Extractor  GLM-5-Turbo           (actors, orgs, malware)           |
| Graph Bridge                            (entity -> graph write)           |
| Librarian         GLM-5-Turbo           (sub-classification)              |
| IOC Lifecycle                           (exponential score decay)         |
| Content truncation: 4K classify, 6K summarize, 8K IOC extraction          |
+------------------+--------------------------------------------------------+
                   |
+------------------v--------------------------------------------------------+
| Layer 3: Brain (scheduled intelligence)                                   |
|                                                                           |
| Correlator        rule-based cross-source pattern detection               |
| Analyst           Gemini 3.1 Pro (LLM confirmation of correlations)       |
| Brief Generator   Gemini 3.1 Pro (daily 24-hour intelligence summary)     |
| Query Engine      Gemini 3.1 Pro (natural language -> SQL)                |
+------------------+--------------------------------------------------------+
                   |
+------------------v--------------------------------------------------------+
| Layer 4: Infrastructure                                                   |
|                                                                           |
| Vuln Ingestor          NVD / EPSS / CISA KEV -> priority scoring          |
| Source Value Analyzer                                                     |
| Enrichment Pipeline    AbuseIPDB / VirusTotal / crt.sh                    |
| Discovery Engine       URL extraction -> pending source queue            |
| Triage Worker          LLM-powered URL classification (investigate/trash) |
+------------------+--------------------------------------------------------+
                   |
+------------------v--------------------------------------------------------+
| Layer 5: Dashboard                                                        |
|                                                                           |
| React SPA embedded in binary    14 pages    25+ API endpoints             |
| X-API-Key auth on all data routes                                         |
+---------------------------------------------------------------------------+

Shared data store: PostgreSQL (20 tables, 12 migrations)
Metrics: Prometheus /metrics, /healthz, /readyz (health port, default 8080)
```

## Features

### Collection

- **Telegram MTProto**: Connects to channels and groups via [gotd/td](https://github.com/gotd/td). Joins public channels automatically. Supports private invite links (`t.me/+hash` and `t.me/joinchat/hash` formats) through the `inviteHash` channel config field. Identifier normalization converts URLs to plain usernames. Replays a configurable backlog of historical messages on reconnect. Sessions survive Pod restarts via a PVC.
- **Paste sites**: Scrapes Pastebin and custom paste targets with a configurable HTTP crawler.
- **Dark web forums**: CSS-selector-based scraping with per-forum authentication, pagination, and Tor proxy support.
- **RSS/Web feeds**: Polls RSS, Atom, and scrapable web pages at configurable intervals. RSS feeds are loaded from the config file and from the `sources` table (type `rss`, status `active`), and are refreshed from the database every 30 minutes. Discovery can approve new feeds at runtime without a redeploy. The `sources.last_collected` timestamp is updated after every collection cycle.
- **Autonomous source discovery**: Extracts URLs from ingested content and filters them through a three-tier pipeline: a configurable blacklist, an allowlist (glob patterns plus exact domains), and AI-driven batch triage. Allowlisted URLs are queued immediately; unknown URLs enter the `pending_triage` state and are evaluated periodically by a fast LLM. Domains that accumulate repeated trash decisions are blacklisted automatically.

### Processing

- **LLM classification**: Groq (llama-4-scout) assigns each finding a category, severity, and subcategory on the alert path, and is tuned for throughput. Content is truncated to 4K bytes before classification.
- **Summarization**: Groq (llama-4-scout) generates abstractive summaries of raw collected content (truncated to 6K bytes). Items classified as irrelevant skip summarization.
- **IOC extraction**: Groq (llama-4-scout) extracts IPs, domains, URLs, file hashes (MD5/SHA-1/SHA-256), CVEs, and email addresses together with their source context. Content is truncated to 8K bytes before extraction.
- **Entity extraction**: GLM-5-Turbo identifies named actors, organizations, malware families, and tools and writes them to the knowledge graph.
- **Graph bridge**: Links extracted entities and IOCs with typed relationships across findings.
- **Librarian**: GLM-5-Turbo applies fine-grained sub-classification after the initial classification.
- **IOC lifecycle manager**: Applies configurable exponential decay to IOC threat scores and deactivates expired indicators.

### Intelligence

- **Correlation engine**: Detects cross-source patterns using shared IOCs, entity co-occurrence, and temporal proximity. Produces structured candidate correlations.
- **LLM Analyst**: Gemini 3.1 Pro reviews each candidate correlation and confirms or rejects it with a stated rationale, stored as an analytical note.
- **Daily brief generator**: Gemini 3.1 Pro synthesizes the past 24-hour collection window into a structured intelligence brief.
- **Natural language query engine**: Gemini 3.1 Pro translates human questions into SQL and executes them against the live archive. Trailing semicolons are stripped from LLM-generated SQL before execution.

### Enrichment and Vulnerability Intelligence

- **IOC enrichment**: Asynchronously queries AbuseIPDB (IPs), VirusTotal (hashes and domains), and crt.sh (domain certificate history).
- **Vulnerability intelligence**: Ingests NVD CVE data, EPSS exploit-probability scores, and CISA KEV entries. Applies a priority score that combines CVSS, EPSS, and KEV membership.
- **Source value analysis**: Scores each source by the quality and quantity of actionable intelligence it produces.

### Infrastructure

- **LLM budget management**: Sets a monthly spending limit per LLM provider via `monthlyBudgetUSD`. The `SpendingTracker` monitors cumulative token usage and estimated cost using the configurable `inputCostPer1M` and `outputCostPer1M` rates. When a provider's budget is exhausted, a circuit breaker pauses every worker that uses that provider for 30 minutes. The Groq budget defaults to $10/month.
- **Content truncation**: A configurable `maxContentLength` (default 8192 bytes) truncates content before LLM calls. Each processing stage applies its own limit: 4K for classification, 6K for summarization, 8K for IOC extraction.
- **Single binary**: The whole platform, including the React dashboard, compiles to one Go binary. No sidecar processes are needed.
- **Automatic migrations**: PostgreSQL migrations run at startup.
- **Prometheus metrics**: Exposes collector throughput, LLM call rates, and error counters on a dedicated scrape endpoint.
- **Kubernetes-native**: Ships manifests for the namespace, secrets, PostgreSQL StatefulSet, ConfigMap, and Deployment. A PVC provides session persistence for Telegram authentication.
- **Module registry**: All 16 active modules report structured status (running state, throughput, error count, last activity, AI provider/model) to a shared registry, which is shown on the System Status dashboard page.

## Quick Start

### Prerequisites

- Go 1.25+
- PostgreSQL 14+
- An API key from [Z.ai](https://api.z.ai) (GLM-5-Turbo): required for entity extraction and sub-classification
- A Groq API key (developer tier recommended): required for classification, summarization, and IOC extraction
- A Gemini API key: required for correlation analysis, briefs, and natural language queries

### Build

```
go build -o noctis ./cmd/noctis/
```

### Docker

```
docker pull ghcr.io/zyrakk/noctis:latest
```

Multi-architecture (Intel + ARM):

```
docker buildx build --platform linux/amd64,linux/arm64 \
  -t ghcr.io/zyrakk/noctis:latest --push .
```

### Minimal Run (RSS only, no Telegram credentials)

```
# noctis-config.yaml
noctis:
  logLevel: info
  healthPort: 8080

  database:
    driver: postgres
    dsn: "${NOCTIS_DB_DSN}"

  llm:
    provider: glm
    baseURL: "https://api.z.ai/api/coding/paas/v4"
    model: "glm-5-turbo"
    apiKey: "${NOCTIS_LLM_API_KEY}"

  sources:
    telegram:
      enabled: false
    paste:
      enabled: false
    forums:
      enabled: false
    web:
      enabled: true
      feeds:
        - name: "bleeping-computer"
          url: "https://www.bleepingcomputer.com/feed/"
          type: rss
          interval: 900s

  collection:
    archiveAll: true
    classificationWorkers: 1
    entityExtractionWorkers: 1

  discovery:
    enabled: false

  dashboard:
    enabled: true
    port: 3000
    apiKey: "${NOCTIS_DASHBOARD_API_KEY}"
```

```
export NOCTIS_DB_DSN="postgres://user:pass@localhost:5432/noctis?sslmode=disable"
export NOCTIS_LLM_API_KEY="..."
export NOCTIS_DASHBOARD_API_KEY="..."
./noctis serve --config noctis-config.yaml
```

Migrations run automatically at startup. The dashboard is available at `http://localhost:3000`.

## Configuration

The config file requires a top-level `noctis:` key. Environment variable substitution (`${VAR_NAME}`) is applied before YAML parsing; unset variables expand to an empty string.

### Environment Variables

**Required:**

| Variable | Description |
|---|---|
| `NOCTIS_LLM_API_KEY` | Z.ai GLM-5-Turbo API key |
| `NOCTIS_GROQ_API_KEY` | Groq API key (developer tier recommended) |
| `NOCTIS_GEMINI_API_KEY` | Google Gemini API key |
| `NOCTIS_DASHBOARD_API_KEY` | Dashboard API key (sent in the `X-API-Key` header) |
| `NOCTIS_DB_DSN` | PostgreSQL connection string |

**Telegram (required when Telegram is enabled):**

| Variable | Description |
|---|---|
| `NOCTIS_TG_API_ID` | Telegram API ID |
| `NOCTIS_TG_API_HASH` | Telegram API hash |
| `NOCTIS_TG_PHONE` | Phone number used for Telegram authentication |
| `NOCTIS_TG_PASSWORD` | 2FA password (if enabled) |

**Optional:**

| Variable | Description |
|---|---|
| `NOCTIS_NVD_API_KEY` | NVD API key (higher rate limit) |
| `NOCTIS_ABUSEIPDB_KEY` | AbuseIPDB enrichment key |
| `NOCTIS_VT_KEY` | VirusTotal enrichment key |

### Full Configuration Reference

```
noctis:
  logLevel: info                  # debug | info | warn | error
  metricsPort: 9090               # Prometheus scrape port
  healthPort: 8080                # /healthz, /readyz, /metrics, /auth/qr

  # --- LLM Clients ---

  # GLM-5-Turbo: entity extraction, sub-classification (Librarian)
  llm:
    provider: glm
    baseURL: "https://api.z.ai/api/coding/paas/v4"
    model: "glm-5-turbo"
    apiKey: "${NOCTIS_LLM_API_KEY}"
    maxTokens: 1024
    temperature: 0.1
    timeout: 30s
    retries: 3
    maxConcurrent: 2
    requestsPerMinute: 20
    tokensPerMinute: 1500
    monthlyBudgetUSD: 5.0         # monthly spending limit (0 = unlimited)
    inputCostPer1M: 0.50          # cost per 1M input tokens
    outputCostPer1M: 1.00         # cost per 1M output tokens

  # Groq / llama-4-scout: classification, summarization, IOC extraction
  llmFast:
    provider: groq
    baseURL: "https://api.groq.com/openai/v1"
    model: "meta-llama/llama-4-scout-17b-16e-instruct"
    apiKey: "${NOCTIS_GROQ_API_KEY}"
    maxConcurrency: 5
    tokensPerMinute: 300000
    tokensPerDay: 10000000        # 10M tokens/day
    monthlyBudgetUSD: 10.0        # monthly spending limit ($10/mo for Groq)
    inputCostPer1M: 0.11          # cost per 1M input tokens
    outputCostPer1M: 0.34         # cost per 1M output tokens

  # Gemini 3.1 Pro: analytical reasoning (correlations, briefs, NL queries)
  llmBrain:
    provider: gemini
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai"
    model: "gemini-3.1-pro-preview"
    apiKey: "${NOCTIS_GEMINI_API_KEY}"
    maxConcurrent: 1
    monthlyBudgetUSD: 17.0

  # --- Sources ---
  sources:
    telegram:
      enabled: true
      apiId: ${NOCTIS_TELEGRAM_API_ID}
      apiHash: "${NOCTIS_TELEGRAM_API_HASH}"
      phone: "${NOCTIS_TELEGRAM_PHONE}"
      password: "${NOCTIS_TELEGRAM_PASSWORD}"   # 2FA, optional
      sessionFile: "/data/telegram.session"
      catchupMessages: 100
      channels:
        - username: "RalfHackerChannel"
        - username: "zer0day1ab"
        - inviteHash: "abc123def"               # private channel via t.me/+abc123def

    paste:
      enabled: false
      pastebin:
        enabled: false
        apiKey: "${NOCTIS_PASTEBIN_API_KEY}"
        interval: 60s
      scrapers:
        - name: "paste-custom"
          url: "https://example.com/pastes"
          interval: 300s
          tor: false

    forums:
      enabled: false
      sites:
        - name: "example-forum"
          url: "https://forum.example.onion"
          tor: true
          interval: 1800s
          maxPagesPerCrawl: 5
          requestDelay: 5s
          auth:
            username: "${FORUM_USER}"
            password: "${FORUM_PASS}"
            loginURL: "https://forum.example.onion/login"
            usernameField: "username"
            passwordField: "password"
          scraper:
            threadListSelector: ".thread-list .thread"
            threadContentSelector: ".post-content"
            authorSelector: ".post-author"
            paginationSelector: "a.next-page"

    web:
      enabled: true
      feeds:
        - name: "bleeping-computer"
          url: "https://www.bleepingcomputer.com/feed/"
          type: rss                             # rss | scrape | search
          interval: 900s
        - name: "the-hacker-news"
          url: "https://feeds.feedburner.com/TheHackersNews"
          type: rss
          interval: 900s
        - name: "cisa-advisories"
          url: "https://www.cisa.gov/cybersecurity-advisories/all.xml"
          type: rss
          interval: 1800s

    tor:
      socksProxy: "127.0.0.1:9050"
      requestTimeout: 30s

  # --- Matching ---
  matching:
    rules:
      - name: "ransomware-keywords"
        type: keyword                           # keyword | regex
        patterns: ["ransomware", "lockbit", "blackcat", "alphv"]
        severity: high                          # critical | high | medium | low | info
      - name: "credential-patterns"
        type: regex
        patterns:
          - '(?i)(password|passwd|pwd)\s*[:=]\s*\S+'
          - '(?i)(api[_-]?key|apikey)\s*[:=]\s*\S{20,}'
        severity: critical
      - name: "cve-mentions"
        type: regex
        patterns: ['CVE-20\d{2}-\d{4,}']
        severity: medium

  # --- Collection Workers ---
  collection:
    archiveAll: true              # archive every item, not just matched ones
    classificationWorkers: 4
    entityExtractionWorkers: 1
    librarianWorkers: 1
    classificationBatchSize: 10
    maxContentLength: 8192        # bytes; truncated per-stage (4K/6K/8K)

  # --- Correlation Engine ---
  correlation:
    enabled: true
    intervalMinutes: 30
    minEvidenceThreshold: 2
    temporalWindowHours: 48

  # --- LLM Analyst ---
  analyst:
    enabled: true
    intervalMinutes: 60
    batchSize: 5
    minSignalCount: 2
    promoteThreshold: 0.7

  # --- IOC Lifecycle ---
  iocLifecycle:
    enabled: true
    intervalMinutes: 360
    deactivateThreshold: 0.05     # deactivate when score drops below 5%

  # --- Daily Brief ---
  briefGenerator:
    enabled: true
    scheduleHour: 6               # UTC hour to generate the daily brief

  # --- Vulnerability Intelligence ---
  vuln:
    enabled: true
    intervalHours: 6
    nvdApiKey: "${NOCTIS_NVD_API_KEY}"

  # --- IOC Enrichment ---
  enrichment:
    enabled: true
    intervalMinutes: 60
    batchSize: 20
    abuseipdbKey: "${NOCTIS_ABUSEIPDB_KEY}"
    virusTotalKey: "${NOCTIS_VT_KEY}"

  # --- Source Discovery ---
  discovery:
    enabled: true
    autoApprove: false            # require operator review
    triageEnabled: true
    triageBatchSize: 100
    allowPatterns:
      - "*.onion"
      - "pastebin.com"
      - "ghostbin.*"
    allowDomains:
      - breachforums.st
      - exploit.in
      - xss.is
    domainBlacklist:
      - nvd.nist.gov
      - github.com
      - wikipedia.org

  # --- Database ---
  database:
    driver: postgres
    dsn: "${NOCTIS_DB_DSN}"

  # --- Dashboard ---
  dashboard:
    enabled: true
    port: 3000
    apiKey: "${NOCTIS_DASHBOARD_API_KEY}"

  # --- Graph ---
  graph:
    enabled: true

  # --- Storage ---
  storage:
    artifactPath: "/data/artifacts"
    maxArtifactSizeMB: 50

  # --- Dispatch ---
  dispatch:
    wazuh:
      enabled: false
      endpoint: ""
    webhooks: []                  # [{name, url, minSeverity}]

  crds:
    enabled: false

  networkPolicy:
    enabled: false
```

## Telegram Integration

Noctis connects to Telegram over the MTProto protocol. Authenticate once before starting the daemon:

```
# QR code login (scan with the Telegram mobile app)
./noctis telegram-auth --config config.yaml --qr

# SMS code login
./noctis telegram-auth --config config.yaml --sms
```

The session file is written to the path configured in `sources.telegram.sessionFile`. When stored on a PVC, it survives Pod restarts.

**Runtime channel management.** Channels can be added without a restart:

```
./noctis source add --type telegram_channel --identifier "channelname"
```

The Telegram collector merges the configured channels with database sources of type `telegram_channel`, polls the database every 5 minutes, and subscribes to newly added channels automatically. Public channels are joined via `ChannelsJoinChannel`, so no manual join from the phone app is needed. Private channels are supported through the `inviteHash` field, which accepts the `t.me/+hash` and `t.me/joinchat/hash` formats. Telegram identifier URLs are normalized to plain usernames before storage.

QR authentication is also available through the `/auth/qr` endpoint on the health port.

## Dashboard

The dashboard is a React SPA compiled and embedded into the Go binary. No separate frontend deployment is needed.

**Access:**

```
# Kubernetes
kubectl port-forward deployment/noctis -n noctis 3000:3000
# then open http://localhost:3000

# Standalone: available at http://localhost:3000
```

Authentication uses the `X-API-Key` header rather than a Bearer token. The login page accepts the key set in `dashboard.apiKey`. Key comparison is constant-time to prevent timing attacks. Two public endpoints expose safe aggregate data without authentication.

### Pages

| Page | Description |
|---|---|
| Home | Public landing page with aggregate statistics and the latest findings (no authentication required) |
| Login | API key entry form |
| Overview | Charts: findings over time, category breakdown, source activity, IOC type distribution |
| Intelligence Overview | Cross-source intelligence map: active IOCs, correlations, entity counts, brief status |
| Findings | Filterable table of all archived findings, with category, severity, source, and a full detail panel |
| IOC Explorer | IOC browser with type/active/enriched filters, threat scores, enrichment data, and CSV export |
| Sources | Source registry: approve, reject, and add sources; type filter tabs (All, RSS, Telegram, Web, Other); shows source value scores |
| Entity Graph | Force-directed visualization of the knowledge graph with entity type and relationship filters |
| Analytical Notes | The LLM Analyst's decisions on candidate correlations, with rationale |
| Correlations | Candidate and confirmed correlations with evidence summaries |
| Briefs | Paginated list of daily intelligence briefs with full text |
| Vulnerabilities | NVD/EPSS/KEV vulnerability table with priority scores and a CVE detail view |
| Query | Natural language query interface: enter a question, get the SQL and tabular results |
| System Status | Live status of all 16 modules: running state, throughput, errors, AI provider/model |

### API Endpoints

All endpoints except the public ones require the `X-API-Key` header.

**GET (authenticated):** `/api/stats`, `/api/findings`, `/api/findings/{id}`, `/api/iocs`, `/api/sources`, `/api/categories`, `/api/subcategories`, `/api/timeline`, `/api/entities`, `/api/graph`, `/api/correlations`, `/api/correlation-decisions`, `/api/notes`, `/api/actors/{id}/profile`, `/api/sources/value`, `/api/system/status`, `/api/intelligence/overview`, `/api/briefs`, `/api/briefs/latest`, `/api/vulnerabilities`, `/api/vulnerabilities/{cve}`

**POST (authenticated):** `/api/sources`, `/api/sources/{id}/approve`, `/api/sources/{id}/reject`, `/api/query`, `/api/auth/check`

**GET (public):** `/api/public-stats`, `/api/public-recent`

## Module Registry

All 16 active modules register with `modules.Registry` and expose a structured `ModuleStatus` containing the ID, running state, throughput counters, error count, last-activity timestamp, and AI provider/model information. The System Status dashboard page polls `/api/system/status` to display this in real time.

### Module IDs

| Category | Module IDs |
|---|---|
| Collectors | `collector.telegram`, `collector.rss` |
| Processing | `processor.classifier`, `processor.summarizer`, `processor.ioc_extractor`, `processor.entity_extractor`, `processor.graph_bridge`, `processor.librarian`, `processor.ioc_lifecycle` |
| Brain | `brain.correlator`, `brain.analyst`, `brain.brief_generator`, `brain.query_engine` |
| Infrastructure | `infra.ioc_enrichment`, `infra.vuln_ingestor`, `infra.source_analyzer` |

## Database Schema

PostgreSQL is the only shared data store. Migrations run automatically at startup from the `migrations/` directory.

| Migration | Tables created / changes |
|---|---|
| 001_init | `findings`, `canary_tokens`, `actor_profiles` |
| 002_graph | `entities`, `edges` |
| 003_pivot | `raw_content`, `iocs`, `artifacts`, `sources` |
| 004_cleanup_discovered | Source lifecycle cleanup |
| 005_provenance | `provenance` column on `raw_content` |
| 006_correlations | `ioc_sightings`, `correlations`, `correlation_candidates` |
| 007_phase2 | Sub-classification columns, `analytical_notes`, `correlation_decisions`, source value columns |
| 008_phase3 | IOC lifecycle columns, `intelligence_briefs`, `vulnerabilities` |
| 009_enrichment | IOC enrichment columns on `iocs` |
| 010_triage | `source_triage_log`, `discovered_blacklist` |
| 011_normalize_telegram_identifiers | Normalizes Telegram identifier URLs to usernames |
| 012_purge_legacy_embedly_urls | Purges legacy Embedly URLs from discovered sources |

## Project Structure

```
cmd/noctis/          CLI entry point (serve, telegram-auth, source, search, stats)
internal/
  analyzer/          LLM wrapper: Classify, Summarize, ExtractIOCs, ExtractEntities,
                     SubClassify, EvaluateCorrelation, GenerateBrief, RawCompletion
  archive/           PostgreSQL persistence: 20 tables, IOC lifecycle, brief metrics,
                     vulnerability methods
  brain/             Intelligence layer: Correlator, Analyst, BriefGenerator, QueryEngine
  collector/         Telegram, Paste, Forum, Web collectors + CollectorManager +
                     SourceValueAnalyzer
  config/            Full config struct with 20+ sections; ${ENV_VAR} substitution
  dashboard/         25+ API handlers, embedded React SPA, X-API-Key auth middleware
  database/          pgxpool connection + migration runner
  discovery/         Source discovery engine (URL extraction, blacklist, queue, AI triage)
  dispatcher/        Prometheus metrics
  enrichment/        AbuseIPDB, VirusTotal, crt.sh providers + Enricher
  health/            /healthz, /readyz, QR auth state
  ingest/            IngestPipeline: dedup, keyword/regex matching, archive, alert path
  llm/               OpenAI-compatible HTTP client, SpendingTracker, budget circuit breaker
  matcher/           Keyword and regex rule engine
  models/            Finding, IOC, Severity, Category types
  modules/           ModuleStatus, StatusTracker, Registry
  processor/         ProcessingEngine: Classifier, Summarizer, IOCExtractor,
                     EntityExtractor, GraphBridge, Librarian, IOCLifecycleManager
  vuln/              NVD, EPSS, CISA KEV ingestion + priority scoring
migrations/          001 through 012 (SQL files, applied in order)
prompts/             10 LLM prompt templates (classify, classify_detail, extract_iocs,
                     extract_entities, severity, summarize, evaluate_correlation,
                     daily_brief, stylometry, triage)
deploy/              Kubernetes manifests: namespace, secrets, postgres, configmap,
                     noctis deployment, ingress
web/                 React frontend source (14 pages, built output embedded in binary)
```

## Deployment

### Kubernetes

Tested on k3s. Requires a StorageClass named `nfs-shared` for the PostgreSQL PVC.

**Step 1: Namespace**

```
kubectl apply -f deploy/namespace.yaml
```

**Step 2: Secrets**

```
cp deploy/secrets.yaml.example deploy/secrets.yaml
# fill in real values; never commit this file
kubectl apply -f deploy/secrets.yaml
```

Required secret keys: `NOCTIS_LLM_API_KEY`, `NOCTIS_GROQ_API_KEY`, `NOCTIS_GEMINI_API_KEY`, `NOCTIS_DASHBOARD_API_KEY`, `NOCTIS_DB_PASSWORD`, `NOCTIS_DB_DSN`. If Telegram is enabled, the Telegram keys are also required (`NOCTIS_TELEGRAM_API_ID`, `NOCTIS_TELEGRAM_API_HASH`, `NOCTIS_TELEGRAM_PHONE`, `NOCTIS_TELEGRAM_PASSWORD`).

**Step 3: PostgreSQL**

```
kubectl apply -f deploy/postgres.yaml
kubectl -n noctis get pods -w   # wait for noctis-postgres-0 1/1 Running
```

**Step 4: ConfigMap**

```
kubectl apply -f deploy/configmap.yaml
```

The ConfigMap uses `${NOCTIS_DB_DSN}` and `${NOCTIS_LLM_API_KEY}` tokens that are resolved from environment variables at runtime. The ConfigMap itself contains no credentials.

**Step 5: Noctis**

```
kubectl apply -f deploy/noctis.yaml
kubectl -n noctis get pods -w   # wait for noctis 1/1 Running
```

**Step 6: Verify**

```
kubectl -n noctis logs -f deploy/noctis
kubectl -n noctis port-forward svc/noctis-metrics 8080:8080
curl http://localhost:8080/healthz   # ok
curl http://localhost:8080/readyz    # ready
```

**Dashboard access (Kubernetes)**

```
kubectl -n noctis port-forward deployment/noctis 3000:3000
# open http://localhost:3000
```

For public access, apply the included Traefik IngressRoute manifest:

```
kubectl apply -f deploy/ingress.yaml
```

This routes HTTPS traffic with Let's Encrypt TLS termination. It requires Traefik with a `letsencrypt` certificate resolver and a DNS A record pointing at the cluster.

**Enabling additional sources**

Edit `deploy/configmap.yaml`, set `telegram.enabled: true` (or `paste`/`forums`), add the corresponding API keys to the secret, then:

```
kubectl apply -f deploy/configmap.yaml
kubectl -n noctis rollout restart deploy/noctis
```

### Standalone

```
./noctis serve --config config.yaml
```

Migrations run automatically. No init container or sidecar is needed.

### Build and Push the Image

```
make build
docker build -t ghcr.io/zyrakk/noctis:latest .
docker push ghcr.io/zyrakk/noctis:latest
```

## CLI Reference

All subcommands accept `--config`/`-c` (default: `noctis-config.yaml`).

### `noctis serve`

Starts the daemon. Loads the config, runs migrations, starts all collectors and background modules, and blocks until SIGINT/SIGTERM is received.

```
noctis serve --config config.yaml
```

### `noctis telegram-auth`

One-time interactive Telegram authentication. Writes the session file used by `serve`.

```
noctis telegram-auth --config config.yaml --qr    # QR code
noctis telegram-auth --config config.yaml --sms   # SMS code
```

### `noctis source`

Manages the source registry.

```
noctis source list [--status discovered|approved|active|paused|dead|banned|pending_triage]
                   [--type telegram_channel|telegram_group|forum|paste_site|web|rss]
noctis source add --type --identifier
noctis source approve
noctis source pause
noctis source remove
```

`source add` inserts the source with `active` status. The Telegram collector picks up new channels within 5 minutes, without a restart.

### `noctis search`

Runs a full-text search against the archive.

```
noctis search [text] [--category credential_leak|malware_sample|vulnerability|...]
              [--tag ] [--since 7d|24h] [--author ] [--limit ]
```

### `noctis stats`

Prints collection statistics by source and category.

### `noctis config validate`

Validates the config file and reports errors.

```
noctis config validate --config config.yaml
```

## License

MIT. See [LICENSE](LICENSE).
