Harshil26202/nexus

GitHub: Harshil26202/nexus

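NEXUS is an AI-native platform that uses GPT-4o agents to automate CI/CD, optimize test selection, and handle incident response, addressing engineering teams' pain points around deployment efficiency and failure recovery.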


# ⚡ NEXUS

### AI-Native Engineering Intelligence Platform

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![Python 3.11](https://img.shields.io/badge/Python-3.11-blue.svg)](https://python.org) [![Next.js 14](https://img.shields.io/badge/Next.js-14-black.svg)](https://nextjs.org) [![FastAPI](https://img.shields.io/badge/FastAPI-0.111-green.svg)](https://fastapi.tiangolo.com) [![Docker](https://img.shields.io/badge/Docker-Compose-blue.svg)](docker-compose.yml) [![Hackathon](https://img.shields.io/badge/Microsoft%20Build%20AI%20Hackathon-2026-purple.svg)](https://devpost.com)

**Submitted to Microsoft Build AI Hackathon 2026 · Theme 6: AI-Driven Production Features**

[Live Demo](#quickstart) · [Architecture](#architecture) · [Features](#features) · [API Docs](http://localhost:8000/docs)

*Your CI/CD pipeline runs 500 tests on every commit. 80% of them are irrelevant. Production went down last night, and it took 3 hours to find the root cause. NEXUS addresses both problems automatically, using AI.*

## The Problem

Every engineering team ships code through a pipeline designed for a world without AI:

| Pain point | Reality | NEXUS solution |
|---|---|---|
| Blind test execution | All 500 tests run on every commit | AI selects only the ~40 tests relevant to your change |
| Static quality gates | "Coverage ≥ 80%", regardless of context | Adaptive AI gates that tighten for Friday deploys and relax for hotfixes |
| Manual incident response | Engineers spend 3+ hours tracing the root cause | AI identifies the offending commit within seconds and drafts the postmortem |
| DevOps tribal knowledge | You need to know the CLIs, dashboards, and runbooks | Ask in plain English: "Why did the auth service fail last night?" |

## Features

### Available now

- **Semantic Diff Analyzer**: GPT-4o reads your actual code diff and identifies blast radius, risk factors, and affected services. It produces a 0-100 risk score with a plain-English explanation.
- **Test Intelligence Engine**: ML (XGBoost) plus LLM reasoning selects the relevant subset of tests and skips those that are provably unaffected. On our demo data, **CI time drops by 67% on average**.
- **Adaptive Quality Gates**: AI evaluates gates in context rather than against static thresholds alone. Example gate prompt: "Block if the auth middleware is modified without a security-review PR label."
- **Incident Intelligence**: When an alert fires, an agent correlates metrics with git history, pinpoints the root-cause commit, and auto-generates a full Google SRE-style postmortem draft.
- **Natural Language DevOps**: Query pipeline data in plain English through a chat interface. No digging through dashboards.
- **Real-Time Dashboard**: Live feed of pipeline events, risk scores, agent activity, and incident status over WebSocket.
- **Post-Deploy Monitoring Agent**: Watches post-deployment metrics, detects regressions, and automatically triggers incident creation.

### Platform features

- REST API with interactive Swagger docs at `/docs`
- WebSocket event stream for real-time UI updates
- Prometheus metrics at `/metrics`, Grafana dashboards on `:3001`
- OpenTelemetry distributed tracing
- Seeded demo data: a fully populated dashboard out of the box

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│  DEVELOPER WORKFLOW                                                 │
│                                                                     │
│  git push ──▶ GitHub PR ──▶ CI trigger                              │
└──────────────────────────────┬──────────────────────────────────────┘
                               │  POST /api/v1/webhooks/github
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│  NEXUS BACKEND (FastAPI · Python 3.11)                              │
│                                                                     │
│  /pipelines  /incidents  /quality-gates  /analytics  /chat          │
│  /webhooks  /ws                                                     │
│                               │                                     │
│                    ┌──────────▼──────────┐                          │
│                    │  Agent Dispatcher   │                          │
│                    └──────────┬──────────┘                          │
└───────────────────────────────┼─────────────────────────────────────┘
                                │  spawns AI agents
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│  MULTI-AGENT SYSTEM (GPT-4o)                                        │
│                                                                     │
│          ┌─────────────────────┐    ┌─────────────────────┐         │
│          │ Semantic Analyzer   │    │ Test Intelligence   │         │
│          │ • Code diff NLU     │    │ • XGBoost + LLM     │         │
│          │ • Blast radius      │    │ • 67% CI savings    │         │
│          │ • Risk score 0–100  │    │ • Smart skip        │         │
│          └─────────────────────┘    └─────────────────────┘         │
│                                                                     │
│          ┌─────────────────────┐    ┌─────────────────────┐         │
│          │ Quality Gate        │    │ Incident Response   │         │
│          │ • Adaptive AI       │    │ • Root cause AI     │         │
│          │ • Context-aware     │    │ • Postmortem gen    │         │
│          │ • Deploy block      │    │ • MTTR: hrs→mins    │         │
│          └─────────────────────┘    └─────────────────────┘         │
│                                                                     │
│          ┌─────────────────────┐    ┌─────────────────────┐         │
│          │ NL DevOps Chat      │    │ Monitoring Agent    │         │
│          │ • Plain English     │    │ • Post-deploy       │         │
│          │ • Pipeline query    │    │ • Auto incidents    │         │
│          │ • No CLI needed     │    │ • Anomaly detect    │         │
│          └─────────────────────┘    └─────────────────────┘         │
└───────────────────────┬─────────────────────────────────────────────┘
                        │
          ┌─────────────┴───────────────┐
          ▼                             ▼
┌──────────────────┐         ┌────────────────────┐
│  PostgreSQL 16   │         │  Redis 7           │
│                  │         │                    │
│  • Pipelines     │         │  • Response cache  │
│  • Incidents     │         │  • Pub/Sub events  │
│  • Quality gates │         │  • Rate limiting   │
│  • Agent tasks   │         │  • WebSocket fanout│
└──────────────────┘         └────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│  FRONTEND (Next.js 14 · TypeScript)                                 │
│                                                                     │
│  Dashboard · Pipelines · Incidents · Quality Gates · Agents · Chat  │
│                                                                     │
│  Live WebSocket stream · Recharts visualizations                    │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│  OBSERVABILITY                                                      │
│  Prometheus → Grafana · OpenTelemetry Collector · Structlog         │
└─────────────────────────────────────────────────────────────────────┘

┌──────────────────────┐
│  AI PROVIDERS        │
│                      │
│  OpenAI (default)    │
│  ─── or ───          │
│  Azure OpenAI        │
│  (swap via .env)     │
└──────────────────────┘
```

## Quickstart

**Prerequisites:** Docker Desktop · an OpenAI API key (get one at [platform.openai.com](https://platform.openai.com))

```
# 1. Clone
git clone https://github.com/Harshil26202/nexus.git
cd nexus

# 2. Configure: a single key is enough to get started
cp .env.example .env
# Open .env and set: OPENAI_API_KEY=sk-...
# Everything else has defaults suitable for local development.

# 3. Start everything
make dev
# Or, without Make:
docker compose up --build -d
```

| Service | URL |
|---|---|
| Dashboard | http://localhost:3000 |
| API + Swagger docs | http://localhost:8000/docs |
| Grafana | http://localhost:3001 (admin / admin) |
| Prometheus | http://localhost:9090 |

### Seed demo data (optional, makes the dashboard look its best)

```
make seed
# Or:
docker compose exec backend python -m app.scripts.seed_demo_data
```

This populates 60 pipeline runs, 5 quality gates, 3 incidents, and agent task history, so the dashboard is fully populated on first launch.

### Common commands

```
make logs      # tail backend + agent worker + frontend logs
make migrate   # run database migrations
make seed      # populate demo data
make test      # run backend + frontend test suites
make lint      # ruff + mypy + eslint
make down      # stop everything
make clean     # stop + remove volumes
```

## Using Azure OpenAI

NEXUS detects automatically which provider to use. If you have Azure credentials:

```
# In .env: set both of these and Azure takes over automatically:
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-azure-key
```

Leave them empty (the default) to use standard OpenAI.

## Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 14, TypeScript, Tailwind CSS, Recharts, TanStack Query |
| Backend API | FastAPI (Python 3.11), AsyncPG, SQLAlchemy 2.0 (async) |
| AI / LLM | OpenAI GPT-4o · Azure OpenAI (drop-in replacement) |
| ML models | XGBoost, scikit-learn, pandas (risk scoring + test prediction) |
| Embeddings | text-embedding-3-large |
| Cache / Pub/Sub | Redis 7 (async, connection pooling) |
| Database | PostgreSQL 16 |
| Message queue | Azure Service Bus (optional, async agent tasks) |
| Vector search | Azure AI Search (optional, semantic incident search) |
| Observability | Prometheus + Grafana + OpenTelemetry + Structlog |
| Auth | JWT + Azure Entra ID (optional SSO) |
| Containerization | Docker Compose (development) |
| Infrastructure as code | Terraform (infrastructure/) |
| CI/CD | GitHub Actions (.github/workflows/) |

## Project Structure

```
nexus/
├── frontend/                          # Next.js 14 dashboard
│   ├── app/(dashboard)/               # Dashboard, Pipelines, Incidents, Agents, Chat
│   ├── components/                    # Reusable UI + WebSocket hooks
│   └── lib/                           # Axios API client, socket utilities
│
├── backend/                           # FastAPI + multi-agent system
│   ├── app/
│   │   ├── agents/                    # 7 specialized AI agents
│   │   │   ├── orchestrator.py        # Master coordinator
│   │   │   ├── semantic_analyzer.py   # Code diff intelligence
│   │   │   ├── test_intelligence.py   # Smart test selection
│   │   │   ├── quality_gate_agent.py  # Adaptive quality gates
│   │   │   ├── incident_response.py   # Root cause + postmortem
│   │   │   ├── monitoring_agent.py    # Post-deploy health
│   │   │   └── nl_devops.py           # Natural language interface
│   │   ├── routers/                   # REST + WebSocket endpoints
│   │   ├── models/                    # SQLAlchemy ORM models
│   │   ├── services/                  # GitHub, notifications, ML
│   │   ├── scripts/                   # seed_demo_data.py
│   │   └── core/                      # Config, DB, Redis, telemetry, auth
│   ├── main.py                        # FastAPI app entrypoint
│   └── requirements.txt
│
├── ml/                                # ML model code (risk scoring, test prediction)
│   ├── risk_scoring/                  # Gradient boosting risk model
│   └── test_prediction/               # XGBoost test failure prediction
│
├── infrastructure/
│   ├── terraform/                     # Azure infrastructure as code
│   ├── kubernetes/                    # AKS manifests
│   └── monitoring/                    # Prometheus, Grafana, OTEL config
│
├── .github/workflows/                 # CI (lint, test, build) + CD pipelines
├── docker-compose.yml                 # Full local stack
├── Makefile                           # Developer convenience commands
└── .env.example                       # Configuration template
```

## API Reference

Once the stack is running, the full interactive API is available at **http://localhost:8000/docs**. Key endpoints:

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/v1/pipelines/` | List pipeline runs with risk scores |
| `GET` | `/api/v1/pipelines/{id}` | Pipeline details + AI analysis |
| `POST` | `/api/v1/webhooks/github` | GitHub webhook receiver |
| `GET` | `/api/v1/incidents/` | Active + resolved incidents |
| `GET` | `/api/v1/quality-gates/` | Gate definitions + results |
| `POST` | `/api/v1/chat/` | Natural language DevOps query |
| `GET` | `/api/v1/analytics/overview` | Platform-wide statistics |
| `WS` | `/ws/pipelines` | Real-time pipeline event stream |
| `GET` | `/health` | Health check |
| `GET` | `/metrics` | Prometheus metrics |

## How the AI Works: End-to-End Flow

```
1. Developer opens a PR on GitHub
        │
        ▼
2. GitHub sends a webhook → POST /api/v1/webhooks/github
        │
        ▼
3. Semantic Analyzer (GPT-4o) reads the raw diff
   → Identifies: blast radius, risk factors, affected services
   → Assigns: risk score 0–100, risk level (low/medium/high/critical)
   → Writes: plain-English summary for the team
        │
        ▼
4. Test Intelligence (XGBoost + GPT-4o) selects relevant tests
   → Skips ~67% of the test suite that can't be affected
   → Prioritizes highest-risk paths
        │
        ▼
5. Quality Gate Agent evaluates gates with AI reasoning
   → Applies adaptive thresholds based on context
   → Returns: pass/fail + deploy recommendation
        │
        ▼
6. Results appear live on the dashboard via WebSocket
   → Risk score, blast radius, gate results, AI recommendation
        │
        ▼
7. If production breaks post-deploy:
   Incident Response Agent traces root commit → postmortem → fix suggestion
```

## Demo Tour

1. Open **http://localhost:3000**: the dashboard shows live pipeline statistics, risk distribution, and agent activity
2. Go to **Pipelines**: browse the 60 runs with their AI risk scores, semantic summaries, and blast radius
3. Click any pipeline: drill into the AI analysis, test selection rationale, and gate results
4. Go to **Incidents**: view the SEV1 (resolved) incident, including the full postmortem, root-cause commit, and suggested fix
5. Go to **Quality Gates**: see the adaptive gates with their natural-language AI prompts
6. Open **Chat** and ask: "Which pipelines were high risk this week?"
7. Open **Grafana** (http://localhost:3001): view infrastructure metrics

## AI Tools Used

- **Claude Code**: architecture design, full-stack implementation
- **GitHub Copilot**: inline suggestions during development

## Team

Built by Harshil Kaneriya at the **Microsoft Build AI Hackathon 2026**.

## License

MIT. See [LICENSE](LICENSE).
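
To script against the endpoints listed in the API reference, the following is a minimal Python sketch using only the standard library. It assumes the default local address `http://localhost:8000` from the quickstart. The helper names (`build_get`, `build_chat_request`, `send`) are hypothetical, and so is the `message` field in the chat body; the real request schema is whatever the Swagger docs at `/docs` show. It also assumes no auth header is needed locally, which may not hold if JWT auth is enabled.

```python
import json
import urllib.request

# Default local backend address from the quickstart services table.
BASE_URL = "http://localhost:8000"


def build_get(path: str) -> urllib.request.Request:
    """Build a GET request for a documented endpoint such as /health."""
    return urllib.request.Request(BASE_URL + path, method="GET")


def build_chat_request(question: str) -> urllib.request.Request:
    """Build a POST request for the /api/v1/chat/ endpoint.

    ASSUMPTION: the JSON field name "message" is hypothetical. Check the
    Swagger docs at /docs for the actual request schema.
    """
    body = json.dumps({"message": question}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/v1/chat/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, once the stack is running (make dev):
#   print(send(build_get("/health")))
#   print(send(build_chat_request("Which pipelines were high risk this week?")))
```

Building the request separately from sending it makes it possible to inspect the URL, method, and body before anything goes over the network.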
