S3nna13/Aurelius

GitHub: S3nna13/Aurelius

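A family of coding-agent models implemented in pure PyTorch, offering an end-to-end solution that spans training, alignment, serving, and safety evaluation.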

Stars: 1 | Forks: 0

# Aurelius: Frontier AI Research Platform

[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![PyTorch 2.11+](https://img.shields.io/badge/PyTorch-2.11+-ee4c2c.svg)](https://pytorch.org/)
[![React 19](https://img.shields.io/badge/React-19-61dafb.svg)](https://react.dev/)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.7+-3178c6.svg)](https://www.typescriptlang.org/)
[![Rust](https://img.shields.io/badge/Rust-2024+-dea584.svg)](https://www.rust-lang.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

Every component (transformer core, training pipeline, alignment system, inference engine, API gateway, and frontend) is written by hand. There is no dependency on HuggingFace Transformers, and inference uses neither the flash-attn runtime nor bitsandbytes.

## Tech Stack

| Layer | Location | Language | Responsibilities |
|-------|----------|----------|------------------|
| Rust Engine | `crates/` | Rust 2024 | Tokenization, search, vector similarity, session management, data engine |
| Python Backend | `src/`, `agent/`, `gateway/` | Python 3.12 | Models, training, inference, alignment, API, CLI |
| Node.js BFF | `middle/` | TypeScript | Authentication, rate limiting, WebSocket, SSE, cron, file serving |
| Frontend | `frontend/` | React 19 + TypeScript | Mission control: dashboard, chat, analytics, admin |

**Data flow:** Browser → Node.js BFF (port 3001) → Python API (port 8080) → AureliusTransformer + Rust NAPI

The frontend never talks to Python directly. All API calls are routed through the BFF.

## Model Architecture

| Parameter | Value |
|-----------|-------|
| Type | Decoder-only causal LM |
| Parameters | 1.395B (target) |
| Layers | 24 transformer blocks |
| Hidden dim | 2,048 |
| Attention | Grouped-Query Attention (16 Q heads, 8 KV heads) |
| Head dim | 128 |
| FFN | SwiGLU, d_ff = 5,632 |
| Normalization | Pre-norm RMSNorm |
| Positional encoding | RoPE (θ = 500,000) + YaRN context extension |
| Vocabulary | 8,192 tokens (BPE) |
| Embeddings | Tied input/output |
| KV cache | GQA compression; 8 hot-swappable strategies (KIVI, DuoAttention, EVICT, QUEST, Rocket KV, SAGE, TEAL, INT8) |
| MoE | SparseMoELayer: top-2 routing, 8 experts, shared expert, EP load balancing |
| MTP | Multi-Token Prediction (n=2, shared parameters, staged training) |
| Optimizer | Muon (Newton-Schulz 8+2 steps + Nesterov + RMS rescaling) |
| Checkpoint | safetensors (legacy `.pt` fallback with a deprecation warning) |

## Training

- **Full stack:** pretrain → SFT → DPO → GRPO → RLHF, all from scratch
- **Muon optimizer:** hybrid Newton-Schulz orthogonalization (8+2 steps), Nesterov momentum, RMS rescaling; Polar Express (T=6) in development
- **Liger kernel:** fused RMSNorm, SwiGLU, cross-entropy (~30% throughput improvement)
- **ZClip:** z-score gradient clipping; **BAdam:** block-coordinate fine-tuning
- **Forward replay:** activation checkpointing with selective layer replay
- **Memory-mapped shards:** `.npy` uint16 token arrays, with O(log n) shard lookup via `searchsorted`
- **`torch.compile`:** `AureliusTransformer.from_config(config, compile=True)` (CUDA)

**Training configs:** `train_1b.yaml`, `train_2.7b.yaml`, `train_3b.yaml`, `train_moe_5b.yaml`, `yarn_finetune.yaml`

## Alignment

**PRAXIS / MOSAIC v2:** six-signal, architecture-aware alignment:

- SteeringRewardCorrespondence (SRC), ExpertSafetyAffinity (ESA), MultiTokenAlignmentHorizon (MTAH)
- PrecisionFusion (Bayesian inverse-variance weighting)
- PRAXISLoss = DAPO + KL penalty + constitutional gate

**Full algorithm suite:** REINFORCE++, SAPO, TUR-DPO, AEM, DPO, GRPO, CPO, ORPO, PPO, SimPO, SPIN, KTO, constitutional AI

**MIS-PO** (in development): discrete distributional filtering at the token level (KL-threshold gating) and the trajectory level (reward floor), with a KL penalty term that keeps the policy close to the reference.

## Inference and KV Cache

8 hot-swappable KV cache strategies:

| Strategy | Description |
|----------|-------------|
| DuoAttention | Per-head retrieval/streaming classification; auto-exports a JSON config |
| EVICT (H2O) | Eviction based on attention scores |
| KIVI | INT4/INT8 quantized cache with configurable residual length |
| QUEST | Query-aware sparse KV access |
| Rocket KV | Importance-weighted budget allocation |
| SAGE Attention | SageAttention kernel integration |
| TEAL | Sparsity-based token eviction |
| INT8 Sim | Quantization-noise simulation during fine-tuning |

**MTP speculative decoding** (in development): drafting via the MTP heads with single-pass verification; the target is a 2-3x throughput gain on long-context sequences.

## Agent System

- **ReAct loop:** tool-call parsing, argument validation, bounded-budget termination; AST-walker arithmetic (no dynamic code execution)
- **AbsoluteZero:** self-play curriculum with a task proposer and a solver in a closed feedback loop
- **Planning engine:** workstream DAG, `TaskStatus` / `PlanStatus` StrEnum, `get_workstream(missing_ok)` guard
- **Task scheduler:** cron / interval / delayed jobs; persisted to `~/.cache/aurelius/jobs.json`
- **Neuro-symbolic skills:** LLM reasoning over a symbolic rule engine
- **Reputation system:** Bayesian multi-agent trust scoring, resistant to Sybil attacks
- **13 personas across 5 domains** (GENERAL, CODING, SECURITY, THREAT_INTEL, AGENT); 7 composable facets

## Observability

A complete production-grade observability stack (`src/observability/`):

| Module | Purpose |
|--------|---------|
| `AgentTelemetry` | High-level facade: audit + metrics + tracing in a single call |
| `AuditLogger` | Structured audit trail with retention policies |
| `EventBus` | In-process asynchronous event routing |
| `MetricsCollector` | Labeled counters, histograms, gauges |
| `TraceContext` | W3C-compatible distributed trace propagation |

**SRE metrics** (`src/monitoring/`): golden signals, namely latency (p50/p99), error rate, traffic, and saturation.

**Prometheus:** `/metrics` endpoint with request counts, latency percentiles, and active connections.

### Prometheus Metrics

The `/metrics` endpoint exposes the following counters and gauges:

| Metric | Type | Description |
|--------|------|-------------|
| `aurelius_requests_total` | counter | Total HTTP requests received |
| `aurelius_requests_per_second` | gauge | Current request rate |
| `aurelius_active_connections` | gauge | Concurrent active connections |
| `aurelius_request_duration_ms` | gauge | Request latency p50/p95/p99 |
| `aurelius_uptime_seconds` | gauge | Server uptime |
| `aurelius_http_status_total` | counter | Requests by HTTP status code (label `code`) |
| `aurelius_rate_limit_rejected_total` | counter | Requests rejected by the rate limiter |
| `aurelius_validation_failures_total` | counter | Parameter validation failures (out of range) |
| `aurelius_rate_limiter_backend` | gauge | Rate limiter backend (`0`=memory, `1`=redis) |

## Resilience

Production-grade fault-tolerance primitives (`src/resilience/`):

| Pattern | Description |
|---------|-------------|
| `CircuitBreaker` | CLOSED / OPEN / HALF_OPEN FSM; configurable failure threshold + recovery timeout |
| `Bulkhead` | Semaphore-based concurrency cap; isolates subsystem failures |
| `RetryPolicy` | Exponential backoff with jitter, configurable maximum retries |
| `RateLimiter` | Token bucket; in-memory (single node) or Redis (distributed) |
| `Pipeline` | Composable chain: circuit breaker → bulkhead → retry |

## Security and Safety

**May 2026 audit: 11 critical and 8 high-severity fixes:**

- Sandbox escapes via `object.__subclasses__()` are blocked in all in-process execution modules
- SSRF: private/reserved IP blocklist; URL validation moved ahead of `Request()` construction
- Auth middleware now defaults to `require_auth=True` (fail-closed)
- Shell tool: `shell=True` + blocklist replaced by `shell=False` + `shlex.split()` + explicit allowlist
- PPO trainer: fixed the `prompt_ids` NameError; corrected an off-by-one error in the logit gather
- Constitutional AI: corrected the KL divergence argument order (the alignment signal was previously silently masked)
- Plugin sandbox: exceptions now produce a fail-closed `SandboxResult(success=False)`
- CI gating: `continue-on-error: true` removed from all security scan steps

**Ongoing security:** topological safety (persistent-homology invariants), superposition geometry (polysemanticity detection), 24 adversarial defense modules, jailbreak detector, PII scanner, harm taxonomy (9 categories). All `torch.load` calls use `weights_only=True`. Container images run as a non-root user with pinned base-image digests.

## API and Serving

**Endpoints:**

- `POST /v1/chat/completions`: streaming + non-streaming, OpenAI-compatible
- `GET /v1/models`: model list
- `GET /health`: liveness probe (returns an `engine_loaded` flag; returns 200 whenever the server is up)
- `GET /health/ready`: readiness probe (returns 200 only once the model engine is fully initialized; otherwise 503)
- `GET /healthz`: legacy alias that mirrors `/health` (for backward-compatible health checks)
- `GET /metrics`: Prometheus scrape endpoint
- `WebSocket /ws`: real-time streaming chat

**Gateway features:** CSP / HSTS (production only) / X-Frame-Options: DENY / X-Content-Type-Options / Referrer-Policy headers; IP-based rate limiting (memory or Redis backend); request size limits (1 MiB JSON body, 10 MiB streaming); `X-Request-ID` tracing; host allowlist enforcement; input parameter validation; response sanitization.

**Health and readiness:**

- `GET /health`: liveness. Returns `{"ok": true, "engine_loaded": , "version": "...", "uptime_seconds": ...}`. Returns HTTP 200 while the server is running.
- `GET /health/ready`: readiness. Same JSON schema; returns 503 until the model engine has finished loading. A Kubernetes `readinessProbe` should point at this endpoint.
- `GET /healthz`: legacy alias for existing HEALTHCHECK integrations (behaves identically to `/health`).

**Rate-limit backends:**

- **Memory** (default): in-process token bucket; suitable for single-instance deployments
- **Redis** (set `AURELIUS_RATE_LIMIT_REDIS_URL`): distributed token bucket implemented as a Lua script; keeps limits consistent across multiple API replicas

**Observability:** Prometheus metrics at `/metrics` include request rate, latency percentiles (p50/p95/p99), active connections, uptime, and HTTP status totals, plus gateway-specific counters: rate-limit rejections (`aurelius_rate_limit_rejected_total`), validation failures (`aurelius_validation_failures_total`), and a backend indicator (`aurelius_rate_limiter_backend`: 0=memory, 1=redis).

**Deployment targets:** Docker Compose (`deployment/compose.yaml`), Kubernetes (`k8s/aurelius-deployment.yaml`, `k8s/aurelius-service.yaml`), Helm charts (`deployment/helm/aurelius/`).

## Serving Configuration

Aurelius ships presets tuned for different deployment sizes:

### `production` (default)

The full feature set with research-grade optimizations: speculative decoding (if the checkpoint includes a draft model), automatically selected KV cache strategy, a larger batch size (32), and CUDA graphs where they help. Intended for multi-GPU or high-memory single-GPU servers.

### Single GPU

A memory-conservative configuration for a single GPU:

- Speculative decoding **disabled** (no draft model), which saves 15-25% VRAM
- KV cache strategy: `standard`, meaning plain paged attention with no special tricks
- Default batch size: 16 (configurable via `AURELIUS_BATCH_SIZE_MAX`)
- GPU memory utilization: 85% (safer headroom)
- Torch optimizations: TF32 + cuDNN benchmark enabled

Enable it with:

```
export AURELIUS_SERVING_PROFILE=single-gpu
aurelius serve
```

Manual overrides (these take precedence over the profile):

```
export AURELIUS_SPECULATIVE_DECODING=false
export AURELIUS_KV_CACHE_STRATEGY=standard
export AURELIUS_BATCH_SIZE_MAX=16
aurelius serve
```

## Batch Inference

A high-throughput static batching endpoint for offline/synchronous workloads:

**POST** `/v1/batch/completions`

Request body:

```
{
  "prompts": ["Prompt A", "Prompt B"],
  "temperature": 0.7,
  "max_tokens": 256
}
```

Response:

```
{
  "completions": ["Output A", "Output B"],
  "count": 2
}
```

The batch endpoint tokenizes all prompts, runs a single forward pass, and returns every result. It requires the vLLM backend and honors the configured `AURELIUS_BATCH_SIZE_MAX`. Streaming is not supported; use `/v1/chat/completions` for interactive chat.

## Quick Start

```
git clone https://github.com/S3nna13/Aurelius.git
cd Aurelius
bash scripts/bootstrap.sh          # full setup (Rust + Python + Node)
bash scripts/bootstrap.sh --fast   # skip Rust builds
```

**Prerequisites:** Python 3.12+, Node 22+, Rust 1.81+, npm 10+

### CLI

```
aurelius                                  # interactive chat
aurelius chat --persona aurelius-coding   # coding persona
aurelius chat --react --model-path        # ReAct tool-use loop
aurelius serve --port 8080                # API server
```

### OpenAI Client

```
import openai

# Point the standard OpenAI client at the local Aurelius API server.
client = openai.OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="aurelius",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```

### Docker

```
docker compose up                   # full stack
docker compose --profile cache up   # with Redis
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `AURELIUS_API_KEY` | (none) | Single shared API key |
| `AURELIUS_API_KEYS` | (none) | Multiple keys: `id:key:scope1,scope2;...` |
| `AURELIUS_AUTH_ENABLED` | `true` | Require authentication on non-loopback interfaces |
| `AURELIUS_ALLOWED_HOSTS` | `*` | Comma-separated host allowlist |
| `AURELIUS_RATE_LIMIT` | `120` | Maximum requests per IP per time window |
| `AURELIUS_RATE_WINDOW` | `60` | Rate-limit window (seconds) |
| `AURELIUS_RATE_LIMIT_REDIS_URL` | (none) | Redis URL for distributed rate limiting |
| `AURELIUS_RATE_LIMIT_PREFIX` | `rl:` | Redis key prefix for rate-limit tokens |
| `AURELIUS_SERVING_PROFILE` | `production` | Serving preset: `production` (full features) or `single-gpu` (throughput-optimized, no speculative decoding, smaller batch) |
| `AURELIUS_USE_CUDA_GRAPHS` | `auto` | Enable CUDA graphs for kernel fusion: always/never/auto |
| `AURELIUS_GPU_MEM_UTIL` | `0.90 (production), 0.85 (single-gpu)` | Fraction of GPU memory allocated to the KV cache |
| `AURELIUS_BATCH_SIZE_MAX` | `32` | Maximum batch size for the static batching endpoint; also caps vLLM max_num_seqs |
| `AURELIUS_MODEL_PATH` | (none) | Path to the checkpoint directory |
| `AURELIUS_VERSION` | `0.1.0` | Version string (visible at `/health`) |

## DAIES Scaling Plan

| Phase | Parameters | Active parameters | Strategy | Status |
|-------|------------|-------------------|----------|--------|
| v1 | 1.395B | 1.395B | Muon + grad_ckpt, bs=4 | Training |
| v2 | 2.7B | 2.7B | Muon + grad_ckpt, bs=1 | Planned |
| v3 | 3.0B | 3.0B | 8-bit optim + MLX | Planned |
| v4 | ~5B MoE | ~2B | Sparse MoE + expert offload | Planned |
| v5 | 7-14B | 7-14B | bf16 / 4-bit quant | Future |
| v6 | 32B | ~8B MoE | Expert parallelism, distributed | Future |

Dense checkpoints seed the MoE experts via `src/model/moe_upcycle.py`. The GGUF Q4_K_M export targets 25-35 tok/s on Apple Silicon.

## Directory Structure

```
Aurelius/
├── src/
│   ├── model/             # Transformer, GQA, RoPE, SwiGLU, MoE, MTP (200+ modules)
│   ├── training/          # Muon, ZClip, BAdam, curriculum, RLHF trainers
│   ├── alignment/         # PRAXIS/MOSAIC v2, DPO, GRPO, PPO, MIS-PO
│   ├── inference/         # 8 KV cache strategies, speculative decoding, sampling
│   ├── agent/             # ReAct, AbsoluteZero, tool parser, planner
│   ├── persona/           # 13 personas, 7 facets, routing
│   ├── memory/            # MemCoE, semantic + episodic + unified orchestrator
│   ├── retrieval/         # BM25 + dense hybrid + re-ranking
│   ├── safety/            # Jailbreak, topology safety, superposition geometry, PII
│   ├── security/          # GCG adversarial, backdoor scan, MITRE ATT&CK
│   ├── observability/     # Telemetry, audit, event bus, metrics, trace
│   ├── resilience/        # Circuit breaker, bulkhead, retry, rate limiter, pipeline
│   ├── monitoring/        # SRE golden signals
│   ├── interpretability/  # SAEs, activation patching, probing
│   ├── quantization/      # AWQ, GPTQ, SmoothQuant, NF4, FP8
│   └── reasoning/         # MCTS, chain-of-thought, structured reasoning
├── agent/                 # Planning engine, task scheduler (canonical namespace)
├── gateway/               # FastAPI server + rate limiting + metrics middleware
├── aurelius_cli/          # CLI entry points, pipeline, scheduler commands
├── middle/                # Node.js BFF (TypeScript)
├── frontend/              # React 19 + Vite + TypeScript
├── crates/                # Rust NAPI-rs (11 crates: data-engine, token-counter, search, etc.)
├── k8s/                   # Kubernetes manifests (Deployment, Service)
├── deployment/            # Docker Compose, Helm charts
├── configs/               # Training YAML configs
├── examples/              # Runnable scripts (scheduler, pipeline, SRE metrics)
├── scripts/               # Bootstrap, benchmark, GGUF export, data prep
├── tests/                 # 33,000+ tests across all surfaces
├── data/                  # Training shards (.npy uint16), tokenizer, corpus
└── checkpoints/           # Saved checkpoints (safetensors)
```

## Testing

```
make test            # Python backend
make test-cov        # With coverage
make frontend-test   # Vitest
make middle-test     # Node.js BFF
make rust-test       # Rust crates
make test-all        # All surfaces
make ci              # lint + typecheck + security + all tests
```

## Entry Points

| Command | Description |
|---------|-------------|
| `aurelius` | Interactive chat CLI |
| `aurelius-cli` | Terminal chat with session management |
| `aurelius-shell` | REPL with slash commands |
| `aurelius-api` | Python API server |
| `aurelius-server` | Production serving stack |

## Documentation

| Document | Description |
|----------|-------------|
| [SECURITY.md](SECURITY.md) | Security policy and vulnerability reporting |
| [CONTRIBUTING.md](CONTRIBUTING.md) | Code style, testing, branching strategy |
| [CHANGELOG.md](CHANGELOG.md) | Release history |
| [docs/MODEL_CARD.md](docs/MODEL_CARD.md) | Architecture card |
| [docs/threat_model.md](docs/threat_model.md) | Security threat model |
| [examples/](examples/) | Runnable example scripts |

[MIT License](LICENSE). Copyright © 2025 Aurelius Systems, Inc.

**GitHub:** [https://github.com/S3nna13/Aurelius](https://github.com/S3nna13/Aurelius)
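The Training section mentions memory-mapped `.npy` uint16 shards with O(log n) shard lookup via `searchsorted`. The following is a minimal sketch of that idea, not the repository's loader: the function names `open_shard`, `build_offsets`, and `locate` are hypothetical, and it assumes each shard is a flat one-dimensional token array.

```python
import numpy as np


def open_shard(path):
    """Memory-map one .npy shard so tokens are paged in lazily (assumed uint16)."""
    return np.load(path, mmap_mode="r")


def build_offsets(shard_lengths):
    """Cumulative end offsets: offsets[i] is the global index one past shard i."""
    return np.cumsum(np.asarray(shard_lengths, dtype=np.int64))


def locate(offsets, global_idx):
    """Map a global token index to (shard_id, local_idx) by binary search."""
    if not 0 <= global_idx < int(offsets[-1]):
        raise IndexError(f"token index {global_idx} out of range")
    # side="right" sends an index equal to a shard's end offset to the next shard.
    shard_id = int(np.searchsorted(offsets, global_idx, side="right"))
    start = 0 if shard_id == 0 else int(offsets[shard_id - 1])
    return shard_id, global_idx - start


# Three hypothetical shards holding 5, 3, and 4 tokens.
offsets = build_offsets([5, 3, 4])
shard_id, local_idx = locate(offsets, 7)
```

Because only the cumulative offsets are kept in memory, the lookup cost depends on the number of shards rather than the number of tokens.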
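The Resilience table describes `RateLimiter` as a token bucket with an in-memory backend for single-node deployments. As an illustration only, the sketch below shows a generic in-memory token bucket; the class name `TokenBucket`, its constructor parameters, and the mapping from `AURELIUS_RATE_LIMIT` / `AURELIUS_RATE_WINDOW` to a refill rate are assumptions, not the project's actual implementation.

```python
import time


class TokenBucket:
    """Generic in-memory token bucket (illustrative, single-threaded)."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self._clock = clock          # injectable so tests can control time
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        """Refill according to elapsed time, then try to spend `cost` tokens."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Assumed mapping of the documented defaults: 120 requests per 60-second window.
limit, window = 120, 60
bucket = TokenBucket(capacity=limit, refill_per_sec=limit / window)
first_request_allowed = bucket.allow()
```

A per-IP limiter would keep one such bucket per client address; a Redis backend would move the same refill-and-spend arithmetic into an atomic server-side script.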
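The `CircuitBreaker` entry in the Resilience table names a CLOSED / OPEN / HALF_OPEN state machine with a configurable failure threshold and recovery timeout. The sketch below is one common way to build such a state machine; it reuses the class name from the table for readability, but the constructor parameters, the `call` method, and `CircuitOpenError` are hypothetical and may differ from the code in `src/resilience/`.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock          # injectable so tests can control time
        self.state = "CLOSED"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open")
            self.state = "HALF_OPEN"  # timeout elapsed: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = self._clock()
            raise
        # Any success resets the failure count and closes the circuit.
        self._failures = 0
        self.state = "CLOSED"
        return result


breaker = CircuitBreaker(failure_threshold=2, recovery_timeout=10.0)
value = breaker.call(lambda: "ok")
```

In a chain like the one the table lists for `Pipeline`, the breaker would sit outermost so that an open circuit rejects work before it consumes a bulkhead slot or a retry attempt.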

Tags: AI agents, DLL hijacking, MCTS reasoning, Rust internals, full-stack architecture, large language models, model training, hybrid retrieval