sereno4/llm-mesh-gateway

GitHub: sereno4/llm-mesh-gateway

LLM Mesh Gateway: an enterprise-grade generative AI gateway providing resilience, security, and observability.


LLM Mesh Gateway

An enterprise AI gateway for enterprise-grade generative AI systems

Architecture • Features • Security • Observability • Deployment • Roadmap

## 🎯 Executive Summary

**LLM Mesh Gateway** is a cloud-native AI gateway platform that provides **resilience**, **observability**, **governance**, and **security** for enterprise generative AI applications at scale. Built on Envoy Proxy and designed for Kubernetes, it enables organizations to deploy multi-LLM architectures with production-grade reliability.

## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                              CLIENT LAYER                               │
│         Web Apps • Mobile • Agents • Copilots • Internal Tools          │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                            LLM MESH GATEWAY                             │
├─────────────────────────────────────────────────────────────────────────┤
│  Authentication & Security                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │ JWT Auth    │  │ OPA / Rego  │  │ Rate Limit  │  │ Prompt      │     │
│  │ Layer       │  │ Policies    │  │ Protection  │  │ Guardrails  │     │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘     │
│                                                                         │
│  Traffic Management                                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │ Canary      │  │ Circuit     │  │ Fallback    │  │ Cost        │     │
│  │ Routing     │  │ Breaker     │  │ Engine      │  │ Tracking    │     │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘     │
└─────────────────────────────────────────────────────────────────────────┘
           │                         │                         │
           ▼                         ▼                         ▼
┌─────────────────────┐   ┌─────────────────────┐   ┌─────────────────────┐
│  OpenAI GPT-4       │   │  Anthropic Claude   │   │  Local LLaMA        │
│  Primary Provider   │   │  Automatic Fallback │   │  On-Prem Deployment │
└─────────────────────┘   └─────────────────────┘   └─────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                           OBSERVABILITY PLANE                           │
├─────────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐     │
│  │ Prometheus  │  │ Grafana     │  │OpenTelemetry│  │ Arize       │     │
│  │ Metrics     │  │ Dashboards  │  │ Distributed │  │ Phoenix     │     │
│  │             │  │ & Alerts    │  │ Tracing     │  │ LLM Traces  │     │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘     │
└─────────────────────────────────────────────────────────────────────────┘
```

---

## ✨ Core Features

### 🔄 Resilience & Reliability

| Feature | Description | Technology |
|---------|-------------|------------|
| **Intelligent Routing** | Route by model capability, cost, or latency | Envoy xDS |
| **Automatic Fallback** | Seamless failover between LLM providers | Custom Lua/WASM |
| **Canary Releases** | Gradual rollout of new models | Envoy weighted clusters |
| **Circuit Breaker** | Prevent cascade failures | Envoy outlier detection |
| **Adaptive Routing** | Latency-based provider selection | Custom metrics |

### 🔐 Security & Governance

| Feature | Description | Technology |
|---------|-------------|------------|
| **JWT Authentication** | Token-based access control | Envoy JWT filter |
| **OPA Policy Engine** | Fine-grained authorization | Open Policy Agent |
| **Prompt Injection Detection** | Real-time prompt sanitization | Custom Python middleware |
| **Tool Abuse Detection** | Monitor excessive function calls | Risk scoring engine |
| **Data Exfiltration Prevention** | Block PII leakage | Regex + ML classifiers |
| **Rate Limiting** | Token-bucket per tenant | Envoy local rate limit |

### 📊 Observability & Analytics

| Feature | Description | Technology |
|---------|-------------|------------|
| **Distributed Tracing** | End-to-end request flow | OpenTelemetry |
| **Real-time Metrics** | Latency, cost, token usage | Prometheus + Grafana |
| **LLM Interaction Logging** | Prompt/response audit trail | Phoenix (Arize) |
| **Risk Scoring** | Anomaly detection per request | Custom scoring engine |
| **Cost Attribution** | Per-tenant, per-model billing | Custom metrics |

## ✨ Features

- JWT Authentication
- OPA / Rego Policy Enforcement
- Prompt Injection Detection
- Data Exfiltration Detection
- Tool Abuse Detection
- Circuit Breaker
- Automatic Provider Fallback
- Canary Releases
- Distributed Tracing
- Arize Phoenix Integration
- Grafana Dashboards
- Prometheus Metrics
- OpenTofu Infrastructure
- Kubernetes Ready

## 🔐 Security Layer

### Authentication

- JWT Validation
- Claims Validation
- Role Based Access Control

### Policies

- OPA (Open Policy Agent)
- Rego Policies
- Runtime Enforcement

### Threat Detection

- Prompt Injection
- Jailbreak Attempts
- Tool Abuse
- Data Exfiltration
- PII Detection

### AI Security Gateway

- Risk score
- Event correlation
- Automatic decision
- Export to SIEM

## 🔄 Resilience

### Circuit Breaker

Protects against unavailable providers.

### Automatic Fallback

Grok → Llama local model

### Canary Releases

Progressive distribution of traffic across model versions.

### Rate Limiting

Protects against abuse and usage spikes.

## 📊 Observability

### Prometheus

Collects operational metrics:

- Requests per second
- Latency
- Errors
- Provider utilization
- Risk score

### Grafana

Dashboards for:

- Traffic
- Security
- Resilience
- Costs
- Performance

### OpenTelemetry

End-to-end distributed tracing.

### Arize Phoenix

Monitoring of:

- Traces
- Latency
- Request flow
- Operational quality

## ☁️ Infrastructure as Code

All infrastructure is provisioned with OpenTofu.

### Provisioned Resources

- Kubernetes namespace
- Gateway
- Prometheus
- Grafana
- ConfigMaps
- Services
- Deployments

---

## 🚀 Quick Start

### Prerequisites

- Kubernetes 1.28+
- Helm 3.12+
- OpenTofu 1.6+
- Python 3.11+

### 1. Provision Infrastructure

```bash
# Clone the repository
git clone https://github.com/sereno4/llm-mesh-gateway.git
cd llm-mesh-gateway

# Provision the infrastructure
cd infrastructure/environments/dev
tofu init
tofu plan
tofu apply
```

### 2. Deploy Gateway

```bash
# Install via Helm
helm repo add llm-mesh https://sereno4.github.io/llm-mesh-gateway
helm install gateway llm-mesh/llm-mesh-gateway \
  --namespace llm-mesh \
  --create-namespace \
  --values values.yaml
```

### 3. Configure Providers

```yaml
# config/providers.yaml
providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    models: ["gpt-4", "gpt-3.5-turbo"]
    priority: 1
    timeout_ms: 30000
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    models: ["claude-3-opus", "claude-3-sonnet"]
    priority: 2
    fallback: true
  local:
    endpoint: "http://llama-3-70b.local:8000"
    models: ["llama-3-70b"]
    priority: 3
    on_prem: true
```

### 4. Send First Request

```bash
curl -X POST https://gateway.your-domain.com/v1/chat/completions \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, world!"}],
    "routing_hint": "low_latency"
  }'
```

## 🔒 Security Architecture

```
┌─────────────────────────────────────────────────────────┐
│                  ZERO-TRUST PERIMETER                   │
├─────────────────────────────────────────────────────────┤
│  Layer 1: Transport Security (TLS 1.3, mTLS)            │
│  Layer 2: Authentication (JWT, OIDC, API Keys)          │
│  Layer 3: Authorization (OPA/Rego policies)             │
│  Layer 4: Input Validation (Prompt Guard, Schema)       │
│  Layer 5: Output Filtering (PII Redaction, DLP)         │
│  Layer 6: Audit & Compliance (Logging, Retention)       │
└─────────────────────────────────────────────────────────┘
```

### Risk Scoring Engine

```python
# Example risk scoring.
# LLMRequest, RiskAssessment, Action, and the two detector helpers are
# assumed to be defined elsewhere in the gateway code.
RISK_THRESHOLD = 0.5
TOOL_CALL_THRESHOLD = 20  # illustrative limit on tool calls per request window


def calculate_risk_score(request: LLMRequest) -> RiskAssessment:
    score = 0.0

    # Prompt injection patterns
    if contains_jailbreak_patterns(request.prompt):
        score += 0.4

    # Data exfiltration attempt
    if requests_sensitive_data(request.prompt):
        score += 0.3

    # Tool abuse (tool_call_frequency is assumed to be tracked on the request)
    if request.tool_call_frequency > TOOL_CALL_THRESHOLD:
        score += 0.2

    # Token exhaustion attack
    if request.max_tokens > 10_000:
        score += 0.1

    return RiskAssessment(
        score=score,
        threshold=RISK_THRESHOLD,
        action=Action.BLOCK if score > RISK_THRESHOLD else Action.LOG,
    )
```

## 📈 Observability Stack

### Metrics (Prometheus)

```promql
# p95 LLM request latency by provider (assumes a `provider` label on the histogram)
histogram_quantile(0.95,
  sum by (le, provider) (rate(llm_request_duration_seconds_bucket[5m]))
)

# Token consumption per tenant
sum by (tenant) (llm_tokens_total)

# Cost per model over the last hour (assumes llm_cost_usd is a counter)
sum by (model) (increase(llm_cost_usd[1h]))
```

### Dashboards (Grafana)

- **Executive Overview**: Cost, usage, SLA compliance
- **SRE Operations**: Latency, errors, circuit breaker status
- **Security Operations**: Blocked requests, risk scores, anomalies
- **FinOps**: Cost per tenant, model efficiency, budget alerts

### Tracing (OpenTelemetry)

```json
{
  "trace_id": "4f9e2e...",
  "spans": [
    {"name": "jwt_validation", "duration_ms": 2},
    {"name": "opa_policy_check", "duration_ms": 5},
    {"name": "prompt_guard", "duration_ms": 15},
    {"name": "openai_request", "duration_ms": 1200},
    {"name": "response_filter", "duration_ms": 3}
  ]
}
```

## 🏢 Enterprise Use Cases

| Industry | Use Case | Gateway Features |
|----------|----------|------------------|
| Finance | Trading copilots with PII protection | Data exfiltration detection, OPA policies |
| Healthcare | Clinical decision support | HIPAA compliance, audit trails, on-prem routing |
| Legal | Contract analysis across jurisdictions | Geo-routing, data residency, retention policies |
| SaaS | Multi-tenant AI platform | Per-tenant rate limits, cost attribution, RBAC |
| Government | Secure classified analysis | Air-gapped deployment, custom models only |

## 🗺️ Roadmap

### Q3 2026

- [ ] Multi-provider failover with health checks
- [ ] Circuit breaker with adaptive thresholds
- [ ] Cost tracking and AI FinOps dashboard
- [ ] Runtime security analytics

### Q4 2026

- [ ] Governance dashboard with policy editor
- [ ] Model performance benchmarking suite
- [ ] Fine-tuning pipeline integration
- [ ] SOC 2 compliance documentation

### 2027

- [ ] Federated learning gateway
- [ ] Edge deployment (K3s, IoT)
- [ ] Auto-scaling based on queue depth
- [ ] Multi-region active-active

## 🤝 Contributing

We welcome contributions from the AI infrastructure community. Please see CONTRIBUTING.md for guidelines.

## 📄 License

MIT License - see LICENSE for details.

## 🙏 Acknowledgments

- Envoy Proxy community for the extensible data plane
- OpenTofu team for infrastructure-as-code tooling
- Arize AI for LLM observability foundations

---

Built with ❤️ by @sereno4

What sets this project apart is not simply that it is "an LLM gateway". Its distinguishing feature is the combination of:

- AI security
- Resilience (fallback + circuit breaker)
- Observability (Grafana + Phoenix)
- Governance (OPA/Rego)
- Infrastructure as code (OpenTofu)

Next steps:

- Integration with multiple commercial providers
- FinOps for LLM consumption
- Automated response evaluation
- Drift monitoring
- Advanced detection of anomalous behavior
- A governance layer for generative AI
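The Risk Scoring Engine example in this README relies on request types and detector helpers that are not shown. The following is a minimal, self-contained sketch of the same weighting scheme, not the project's actual implementation. The regex patterns, the `tool_call_frequency` field, and the `TOOL_CALL_THRESHOLD` value are illustrative assumptions; a production detector would likely use the custom middleware and ML classifiers mentioned in the feature tables.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    LOG = "log"
    BLOCK = "block"


@dataclass
class LLMRequest:
    prompt: str
    max_tokens: int = 1024
    tool_call_frequency: int = 0  # hypothetical: tool calls seen in the current window


@dataclass
class RiskAssessment:
    score: float
    threshold: float
    action: Action


# Toy patterns for illustration only; real detection needs far more than regexes.
JAILBREAK_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\bjailbreak\b", re.IGNORECASE),
]
SENSITIVE_PATTERNS = [
    re.compile(r"\b(ssn|social security number|credit card|password)\b", re.IGNORECASE),
]

RISK_THRESHOLD = 0.5      # same threshold as the README example
TOOL_CALL_THRESHOLD = 20  # assumed limit, not taken from the project


def contains_jailbreak_patterns(prompt: str) -> bool:
    return any(p.search(prompt) for p in JAILBREAK_PATTERNS)


def requests_sensitive_data(prompt: str) -> bool:
    return any(p.search(prompt) for p in SENSITIVE_PATTERNS)


def calculate_risk_score(request: LLMRequest) -> RiskAssessment:
    """Sum fixed weights for each triggered signal and pick an action."""
    score = 0.0
    if contains_jailbreak_patterns(request.prompt):
        score += 0.4  # prompt injection
    if requests_sensitive_data(request.prompt):
        score += 0.3  # data exfiltration attempt
    if request.tool_call_frequency > TOOL_CALL_THRESHOLD:
        score += 0.2  # tool abuse
    if request.max_tokens > 10_000:
        score += 0.1  # token exhaustion
    score = round(score, 2)  # avoid float noise such as 0.7999999999999999
    action = Action.BLOCK if score > RISK_THRESHOLD else Action.LOG
    return RiskAssessment(score=score, threshold=RISK_THRESHOLD, action=action)


if __name__ == "__main__":
    benign = LLMRequest(prompt="Hello, world!")
    print(calculate_risk_score(benign).action.name)  # LOG
```

Because the comparison is strict (`score > 0.5`), a request that triggers only the jailbreak and token-exhaustion signals scores exactly 0.5 and is logged rather than blocked. Whether that boundary should block is a policy decision the README does not specify.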

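Tags: API gateway, Envoy Proxy, GET parameters, MIT License, OpenTelemetry, OpenTofu, Python, AI gateway, enterprise AI, observability, reliability, subdomain mutation, security, technical tutorial, no backdoors, service mesh, governance, generative AI, user agent, indexing, custom request headers, reverse-engineering tools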