sattyamjjain/ferrumdeck

GitHub: sattyamjjain/ferrumdeck

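FerrumDeck is a production-grade control plane for AI agent execution, providing deterministic governance, comprehensive observability, and measurable reliability.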


# FerrumDeck

**AgentOps Control Plane**: a production-grade platform for running agentic AI workflows with deterministic governance, comprehensive observability, and measurable reliability.

[![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/0daa93a007222428.svg)](https://github.com/sattyamjjain/ferrumdeck/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/rust-1.80+-orange.svg)](https://www.rust-lang.org/)
[![Python](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/)
[![Next.js](https://img.shields.io/badge/next.js-16.1-black.svg)](https://nextjs.org/)

## Table of Contents

- [Overview](#overview)
- [Key Features](#key-features)
- [Architecture](#architecture)
- [Quick Start](#quick-start)
- [Project Structure](#project-structure)
- [Components](#components)
  - [Control Plane (Rust)](#control-plane-rust)
  - [Data Plane (Python)](#data-plane-python)
  - [Dashboard (Next.js)](#dashboard-nextjs)
- [API Reference](#api-reference)
- [Configuration](#configuration)
- [Security Model](#security-model)
- [Observability](#observability)
- [Evaluation Framework](#evaluation-framework)
- [Development](#development)
- [Deployment](#deployment)
- [License](#license)

## Overview

FerrumDeck solves the critical challenge of running AI agents safely in production. While LLMs are probabilistic and unpredictable, production systems require deterministic governance, audit trails, and budget controls.
### The Problem

- AI agents can make costly mistakes (token spend, wrong tool calls)
- Prompt injection attacks can bypass safety measures
- No visibility into what agents are doing in production
- Difficult to reproduce and debug agent failures
- Compliance requirements demand audit trails

### The Solution

FerrumDeck provides a **dual-plane architecture**:

| Control Plane (Rust) | Data Plane (Python) |
|---------------------|---------------------|
| Deterministic state | Probabilistic execution |
| Policy enforcement | LLM interactions |
| Budget tracking | Tool calls via MCP |
| Audit logging | Step execution |
| Approval gates | Artifact storage |

## Key Features

### Governance

- **Deny-by-Default Tools**: Only explicitly allowed tools can be called
- **Approval Gates**: High-risk actions require human approval before execution
- **Budget Enforcement**: Automatic run termination when limits exceeded (tokens, cost, time)
- **Predictive Budget Forecast**: Deterministic linear + EWMA projection of end-of-run cost after every step, surfacing a `budget_breach_projected` flag on the run API + SSE event (`run.forecast.updated`) before the auto-kill fires. See [`docs/runbooks/budget-forecast.md`](docs/runbooks/budget-forecast.md).
- **Policy Engine**: Configurable rules for tool access and risk management
- **Explicit Conflict Resolution + Decision Traces**: When multiple policies match a tool call, a named precedence function (`Deny > RequiresApproval > BudgetCap > Allow`) picks the winner deterministically, and every decision carries an audit-grade trace of matched verdicts and overrides surfaced on the run API + `policy.decision.explained` SSE event. See [`docs/runbooks/policy-conflict-resolution.md`](docs/runbooks/policy-conflict-resolution.md).
- **Routing-Decision Audit (multi-agent coordination)**: Every time the orchestrator binds a subtask to a concrete agent / role / model, a `RoutingDecision` record (candidates considered, chosen binding, reason code, SHA-256 content hash) is written through the existing immutable audit trail and surfaced on `GET /v1/runs/{id}/routing` plus the `routing.decision.recorded` SSE event. fd-evals replays compare the content hash to detect coordination drift. Anchor: AgensFlow ([arXiv:2605.27466](https://arxiv.org/abs/2605.27466)). See [`docs/runbooks/routing-decision-audit.md`](docs/runbooks/routing-decision-audit.md).
- **Champion-Challenger Promotion Gate**: A registered challenger version cannot replace the live champion until it clears a deterministic gate: configurable metric thresholds (inclusive floors) **plus** a required human approval. Deny-by-default: the challenger stays in shadow until explicitly promoted. The decision + metric evidence (SHA-256 content hash for tamper-evidence) flow through the **same** `PolicyDecision` channel every gate uses and are written to the immutable audit trail. Exposed on `POST /v1/promotions/evaluate` (write scope) + `GET /v1/promotions/{agent_id}`, surfaced on the agent dashboard (champion vs challenger + gate status). See [`docs/runbooks/champion-challenger-promotion.md`](docs/runbooks/champion-challenger-promotion.md).

### Observability

- **OpenTelemetry Integration**: Full distributed tracing with GenAI semantic conventions
- **Cost Tracking**: Real-time token counting and cost calculation per run
- **Jaeger UI**: Visual trace exploration and debugging
- **Audit Trail**: Immutable logging of every action for compliance
- **Tool-call firing rate**: Derived OTel signal (`ferrumdeck.metrics.tool_call_firing_rate`) tracking the share of reasoning steps that invoked at least one tool, per run + per agent over a sliding window. Surfaced on the agent overview tab with a configurable low-firing-rate threshold (default 40%) that flags model regressions or broken tool registries before they propagate. See [`docs/runbooks/tool-call-firing-rate.md`](docs/runbooks/tool-call-firing-rate.md).
- **Debt-vs-tax cost decomposition (§2605.27320)**: Per-call `span_role ∈ {primary, retry, judge, guardrail, escalation, revalidation, monitor}` classification on every LLM/tool call, with two derived rollups per task/run: `agent.cost.token` (primary calls = debt) and `agent.cost.tax` (everything else). Dashboard panel ranks tasks by `tax / (token + tax)` descending so retry / escalation storms are visible at a glance. See [`docs/runbooks/cost-decomposition.md`](docs/runbooks/cost-decomposition.md).

### Reproducibility

- **Versioned Registry**: Agents, tools, and prompts are version-controlled
- **Step-Level Replay**: Debug specific steps with exact inputs
- **Deterministic IDs**: ULID-based identifiers for time-ordered, collision-resistant tracking

### Quality

- **Evaluation Framework**: Deterministic test suites for agent workflows
- **Regression Gating**: CI blocks merges if agent quality degrades
- **Baseline Comparisons**: Track performance across versions
- **Per-harness eval dimension (Harness-Bench)**: fd-evals reports at the `(model × harness_config)` level, since the same model under different harness configs can produce different scores. Each run records its `tools_available`, `permission_tier`, `state_recovery`, and `tracing` config alongside the existing baseline, the dashboard groups results by `(model × harness)` with a side-by-side Recharts bar chart, and `DeltaReport` exposes a per-dimension diff (added/removed tools, tier change, recovery change). See [`docs/runbooks/harness-config.md`](docs/runbooks/harness-config.md).
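The precedence order listed under Governance (`Deny > RequiresApproval > BudgetCap > Allow`) can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not FerrumDeck's Rust implementation: the `Verdict` enum, the `resolve` function, and the shape of the returned trace are hypothetical names chosen for the example.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Hypothetical verdicts; a higher value takes precedence."""
    ALLOW = 0
    BUDGET_CAP = 1
    REQUIRES_APPROVAL = 2
    DENY = 3


def resolve(matched: list[tuple[str, Verdict]]) -> dict:
    """Pick the winning verdict and record which policies it overrode."""
    if not matched:
        # Deny-by-default: no matching policy means the call is denied.
        return {"winner": Verdict.DENY, "policy": None, "overridden": []}
    # max() returns the first maximal element, so ties resolve to the
    # earliest matching policy and the outcome stays deterministic.
    policy, winner = max(matched, key=lambda item: item[1])
    overridden = [name for name, _ in matched if name != policy]
    return {"winner": winner, "policy": policy, "overridden": overridden}


decision = resolve([
    ("allow-read-tools", Verdict.ALLOW),
    ("approve-pr-creation", Verdict.REQUIRES_APPROVAL),
])
print(decision["winner"].name)  # REQUIRES_APPROVAL
```

Encoding precedence as an ordered enum keeps the rule in one place: adding a verdict means choosing its rank, and the `overridden` list gives a simple stand-in for the decision trace.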
## Architecture

```
┌─────────────────────────────────────────────────────────────────────────┐
│                                 Clients                                 │
│                 (Dashboard / CLI / SDK / CI Pipelines)                  │
└─────────────────────────────────────────────────────────────────────────┘
         │                          │                         │
         ▼                          ▼                         ▼
┌─────────────────┐    ┌──────────────────────────────────────────────────┐
│    DASHBOARD    │    │               CONTROL PLANE (Rust)               │
│    (Next.js)    │    │                                                  │
│                 │    │  ┌───────────┐  ┌──────────┐  ┌──────────────┐   │
│ • Runs Monitor  │◀──▶│  │  Gateway  │  │  Policy  │  │   Registry   │   │
│ • Approvals     │    │  │  (Axum)   │  │  Engine  │  │ (Versioned)  │   │
│ • Analytics     │    │  │           │  │          │  │              │   │
│ • Audit Trail   │    │  │ • REST    │  │ • Budget │  │ • Agents     │   │
│ • Evals UI      │    │  │ • SSE     │  │ • Rules  │  │ • Tools      │   │
│                 │    │  │ • Auth    │  │ • Gates  │  │ • Versions   │   │
└─────────────────┘    │  └───────────┘  └──────────┘  └──────────────┘   │
   :3001/:8000         │                                                  │
                       │  ┌───────────┐  ┌──────────┐  ┌──────────────┐   │
                       │  │   Audit   │  │   DAG    │  │     OTEL     │   │
                       │  │    Log    │  │Scheduler │  │    Setup     │   │
                       │  └───────────┘  └──────────┘  └──────────────┘   │
                       └──────────────────────────────────────────────────┘
                                    │
                ┌───────────────────┼───────────────────┐
                ▼                   ▼                   ▼
        ┌───────────────┐   ┌───────────────┐     ┌───────────┐
        │  PostgreSQL   │   │     Redis     │     │  Jaeger   │
        │  (pgvector)   │   │    Streams    │     │    UI     │
        │               │   │               │     │           │
        │ • runs/steps  │   │ • Job Queue   │     │ • Traces  │
        │ • agents/tools│   │ • Pub/Sub     │     │ • GenAI   │
        │ • audit_events│   │               │     │   Spans   │
        └───────────────┘   └───────┬───────┘     └───────────┘
              :5433                 │                :16686
                                    ▼
      ┌───────────────────────────────────────────────────────────┐
      │                    DATA PLANE (Python)                    │
      │                                                           │
      │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐ │
      │  │    Worker    │  │     LLM      │  │    MCP Router    │ │
      │  │              │  │   Executor   │  │                  │ │
      │  │ • Poll Queue │  │              │  │ • GitHub MCP     │ │
      │  │ • Execute    │  │ • Claude     │  │ • Filesystem MCP │ │
      │  │ • Report     │  │ • GPT-4      │  │ • Custom Tools   │ │
      │  │ • Retry      │  │ • litellm    │  │ • Policy Checks  │ │
      │  └──────────────┘  └──────────────┘  └──────────────────┘ │
      └───────────────────────────────────────────────────────────┘
```

### Data Flow

1. **Client** creates a run via `POST /v1/runs`
2. **Gateway** authenticates, validates, creates run in PostgreSQL
3. **Gateway** enqueues first step to Redis Stream
4. **Worker** polls Redis, fetches step details from Gateway
5. **Worker** executes step (LLM call, tool call, etc.) with tracing
6. **Worker** reports result back to Gateway
7. **Gateway** updates state, checks budget, enqueues next step
8. **Repeat** until run completes or fails

### Service Ports

| Service | Port | Description |
|---------|------|-------------|
| Gateway | `8080` | REST API (Rust control plane) |
| Dashboard | `3001` / `8000` | Next.js UI (dev) / Static server |
| PostgreSQL | `5433` | Database (pgvector enabled) |
| Redis | `6379` | Queue and cache |
| Jaeger UI | `16686` | Distributed tracing |
| OTel Collector | `4317` / `4318` | gRPC / HTTP endpoints |

### Receipts schema

The control plane's append-only audit log is documented as a stable receipts substrate compatible with [Foundation Protocol](https://arxiv.org/abs/2605.23218) (Mila + MetaGPT). See [`docs/receipts-schema.md`](docs/receipts-schema.md) for the canonical `AuditEvent` shape, the FP event-substrate mapping (metering / receipt / settlement / policy / provenance / audit), the wrap-don't-replace stance on downstream consumers, and the per-call p95 budget. Drift is gated by the `audit_record_schema_drift` integration test in `rust/crates/fd-audit/tests/`.

## Quick Start

### Prerequisites

- **Rust** 1.80+ ([rustup.rs](https://rustup.rs))
- **Python** 3.12+
- **Docker** & Docker Compose
- **uv** ([docs.astral.sh/uv](https://docs.astral.sh/uv)) - Fast Python package manager

### 1. Clone and Setup

```bash
git clone https://github.com/sattyamjjain/ferrumdeck.git
cd ferrumdeck

# Copy environment file
cp .env.example .env

# Start infrastructure (PostgreSQL, Redis, Jaeger)
make dev-up

# Install all dependencies
make install

# Run database migrations
make db-migrate

# Build everything
make build
```

### 2. Start Services

```bash
# Terminal 1: Start the Gateway (Rust)
make run-gateway
# Gateway running at http://localhost:8080

# Terminal 2: Start a Worker (Python)
make run-worker
```

### 3. Create Your First Run

```bash
# Create an API key (dev mode)
export API_KEY="fd_dev_key_abc123"

# Create a run
curl -X POST http://localhost:8080/v1/runs \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agt_safe_pr_agent",
    "input": {
      "task": "Review the latest changes in the repository"
    }
  }'

# Check run status
curl http://localhost:8080/v1/runs/{run_id} \
  -H "Authorization: Bearer $API_KEY"
```

### 4. Open the Dashboard

```bash
# Start the dashboard (static server)
make run-dashboard
# Open http://localhost:8000

# Or run the Next.js development server
cd nextjs && npm run dev
# Open http://localhost:3001
```

The dashboard provides a complete UI for:

- Monitoring runs in real-time
- Approving/rejecting tool calls
- Managing agents and tools
- Viewing analytics and audit trails

### 5. View Traces

Open Jaeger UI at [http://localhost:16686](http://localhost:16686) to see distributed traces.
## Project Structure

```
ferrumdeck/
├── .github/
│   └── workflows/  # CI/CD pipelines
│       └── ci.yml  # Main CI (lint, test, build, eval gate)
│
├── contracts/  # API Contracts
│   ├── openapi/  # OpenAPI 3.1 specifications
│   │   └── control-plane.openapi.yaml
│   └── jsonschema/  # JSON Schema definitions
│       ├── run.schema.json
│       ├── policy.schema.json
│       ├── tool.schema.json
│       └── workflow.schema.json
│
├── rust/  # Control Plane (Rust)
│   ├── crates/  # Shared libraries
│   │   ├── fd-core/  # IDs, errors, config, time utilities
│   │   ├── fd-policy/  # Policy engine, budgets, rules
│   │   ├── fd-registry/  # Agent/tool versioning
│   │   ├── fd-audit/  # Audit logging, redaction
│   │   ├── fd-storage/  # PostgreSQL repos + Redis queue
│   │   ├── fd-dag/  # DAG scheduler
│   │   └── fd-otel/  # OpenTelemetry setup
│   └── services/
│       └── gateway/  # Axum HTTP API service
│
├── python/  # Data Plane (Python)
│   └── packages/
│       ├── fd-runtime/  # Workflow execution, tracing, client
│       ├── fd-worker/  # Queue consumer, step execution
│       ├── fd-mcp-router/  # MCP tool routing with policy checks
│       ├── fd-mcp-tools/  # MCP server implementations (git, test runner)
│       ├── fd-cli/  # Command-line interface
│       └── fd-evals/  # Evaluation framework with scorers
│
├── nextjs/  # Dashboard (Next.js 16.1)
│   ├── src/
│   │   ├── app/  # App Router pages
│   │   │   └── (dashboard)/  # Dashboard route group
│   │   │       ├── runs/  # Run monitoring & detail
│   │   │       ├── approvals/  # Approval queue
│   │   │       ├── agents/  # Agent registry
│   │   │       ├── tools/  # Tool registry
│   │   │       ├── workflows/  # Workflow management
│   │   │       ├── analytics/  # Usage charts
│   │   │       ├── audit/  # Audit trail viewer
│   │   │       ├── evals/  # Evaluation results
│   │   │       ├── policies/  # Policy management
│   │   │       ├── logs/  # Container logs
│   │   │       └── settings/  # API keys & config
│   │   ├── components/  # React components (shadcn/ui)
│   │   ├── hooks/  # Custom React hooks
│   │   ├── lib/  # API client, utilities
│   │   └── types/  # TypeScript interfaces
│   └── Dockerfile  # Multi-stage production build
│
├── evals/  # Evaluation Suite
│   ├── suites/  # Test suite definitions (YAML)
│   │   ├── smoke.yaml  # Quick smoke tests
│   │   └── regression.yaml  # Full regression suite
│   ├── datasets/  # Test datasets
│   ├── agents/  # Agent configs for testing
│   ├── scorers/  # Scorer configurations
│   └── reports/  # Generated reports (gitignored)
│
├── examples/  # Example Agents
│   └── safe-pr-agent/  # PR review agent example
│       ├── agent.yaml  # Agent configuration
│       └── workflow.yaml  # Multi-step workflow
│
├── deploy/
│   └── docker/
│       ├── compose.dev.yaml  # Local development stack
│       ├── Dockerfile.gateway  # Gateway Docker build
│       └── Dockerfile.worker  # Worker Docker build
│
├── config/
│   └── mcp-config.json  # MCP server configuration
│
├── observability/
│   └── otel/
│       └── collector.yaml  # OTel Collector configuration
│
├── docs/  # Documentation
│   ├── architecture/  # System design docs
│   ├── adr/  # Architecture decisions
│   ├── security/  # Security documentation
│   └── runbooks/  # Operational guides
│
├── Cargo.toml  # Rust workspace manifest
├── pyproject.toml  # Python workspace manifest (uv)
├── Makefile  # Development commands
└── .env.example  # Environment template
```

## Components

### Control Plane (Rust)

#### fd-core: Foundation Primitives

Type-safe IDs, error handling, and configuration.

**ID System** (ULID-based with prefixes):

```rust
TenantId      // ten_01ARZ3NDEKTSV4RRFFQ69G5FAV
AgentId       // agt_01ARZ3NDEKTSV4RRFFQ69G5FAV
RunId         // run_01ARZ3NDEKTSV4RRFFQ69G5FAV
StepId        // stp_01ARZ3NDEKTSV4RRFFQ69G5FAV
PolicyRuleId  // pol_01ARZ3NDEKTSV4RRFFQ69G5FAV
```

**Error Types**:

- `NotFound`, `Validation`, `Unauthorized`, `Forbidden`
- `PolicyDenied`, `BudgetExceeded`, `ApprovalRequired`
- `Database`, `Queue`, `ExternalService`, `Internal`

#### fd-policy: Policy Engine

Governance rules enforcement with deny-by-default security.
**Tool Allowlist**:

```rust
pub struct ToolAllowlist {
    allowed_tools: Vec,      // Explicitly allowed
    approval_required: Vec,  // Require human approval
    denied_tools: Vec,       // Explicitly denied
}
// Priority: Denied > Approval Required > Allowed > Default Deny
```

**Budget System**:

```rust
pub struct Budget {
    max_input_tokens: Option,   // Default: 100,000
    max_output_tokens: Option,  // Default: 50,000
    max_total_tokens: Option,   // Default: 150,000
    max_tool_calls: Option,     // Default: 50
    max_wall_time_ms: Option,   // Default: 5 minutes
    max_cost_cents: Option,     // Default: $5.00
}
```

**Tool Risk Levels**:

| Level | Description | Examples |
|-------|-------------|----------|
| Low | Read-only operations | read_file, list_directory |
| Medium | Limited mutations | write_file (with approval) |
| High | External communications | send_email, create_pr |
| Critical | Security-sensitive | deploy, payment, delete |

#### fd-registry: Versioned Registry

Immutable, version-controlled storage for agents and tools.

```rust
// Agent versions are immutable - changes require new versions
pub struct AgentVersion {
    id: AgentVersionId,
    agent_id: AgentId,
    version: String,        // Semantic version: "1.2.3"
    system_prompt: String,
    model: String,          // "claude-sonnet-4-20250514"
    allowed_tools: Vec,
    model_params: Value,    // temperature, max_tokens, etc.
    changelog: String,
}
```

#### fd-storage: Database & Queue

PostgreSQL repositories with SQLx compile-time checked queries:

- `RunsRepo`, `StepsRepo`, `AgentsRepo`, `ToolsRepo`
- `PoliciesRepo`, `ApiKeysRepo`, `AuditRepo`, `WorkflowsRepo`

Redis Streams for reliable job queuing:

- Consumer groups for horizontal scaling
- Automatic acknowledgment and retry
- Message format: `StepJob` with context

#### fd-audit: Audit Trail

Append-only, immutable event logging:

- Run creation/completion
- Tool calls (allowed/denied)
- Policy decisions
- Approval resolutions
- API key usage

#### Gateway Service

Axum-based HTTP API with middleware:

- **Authentication**: API keys (SHA256 hashed) or OAuth2 JWT
- **Rate Limiting**: Per-tenant request limiting
- **Request ID**: X-Request-ID for distributed tracing

### Data Plane (Python)

#### fd-runtime: Runtime Primitives

**Models**:

```python
class RunStatus(Enum):
    CREATED, QUEUED, RUNNING, WAITING_APPROVAL,
    COMPLETED, FAILED, BUDGET_KILLED, POLICY_BLOCKED

class StepType(Enum):
    LLM, TOOL, RETRIEVAL, SANDBOX, APPROVAL

class Budget(BaseModel):
    max_input_tokens: int = 100_000
    max_output_tokens: int = 50_000
    max_total_tokens: int = 150_000
    max_tool_calls: int = 50
    max_wall_time_ms: int = 300_000  # 5 minutes
    max_cost_cents: int = 500        # $5.00
```

**Control Plane Client**:

```python
client = ControlPlaneClient(base_url, api_key)
run = await client.create_run(agent_id, input_data)
await client.submit_step_result(run_id, step_id, output, status)
```

**Tracing** (GenAI Semantic Conventions):

```python
with trace_llm_call(model="claude-sonnet-4", run_id=run.id) as span:
    response = await llm.complete(messages)
    set_llm_response_attributes(span, response)
    # Automatically tracks: tokens, cost, latency
```

#### fd-worker: Step Executor

Queue consumer that executes individual steps:

```python
async def run_worker():
    consumer = RedisQueueConsumer(redis_url)
    executor = StepExecutor(
        control_plane_url,
        api_key,
        mcp_servers=load_mcp_config(),
        tool_allowlist=allowlist,
    )

    while running:
        job = await consumer.poll()
        if job:
            await executor.execute(job)
```

**Retry Strategy** (exponential backoff):

```python
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(RETRYABLE_EXCEPTIONS),
    stop=stop_after_attempt(3),
    # tenacity wait bounds are in seconds: back off between 1 s and 30 s
    wait=wait_exponential(min=1, max=30),
)
async def execute_with_retry(step):
    ...
```

#### fd-mcp-router: Tool Router

Deny-by-default MCP tool routing:

```python
class MCPRouter:
    async def call_tool(self, tool_name: str, args: dict) -> ToolResult:
        # 1. Check allowlist (deny-by-default)
        status = self.allowlist.check(tool_name)

        if status == "denied":
            return ToolResult(success=False, error="Tool not allowed")

        if status == "requires_approval":
            # Pause and wait for human approval
            ...

        # 2. Find server and execute
        server = self.find_server(tool_name)
        return await server.call(tool_name, args)
```

**Supported MCP Servers**:

- GitHub (`@modelcontextprotocol/server-github`)
- Filesystem (`@modelcontextprotocol/server-filesystem`)
- Custom servers (stdio or HTTP-based)

#### fd-cli: Command Line Interface

```bash
# Runs
fd run create --agent agt_xxx --input '{"task": "..."}'
fd run status
fd run logs --follow

# Registry
fd agent list
fd agent get
fd tool list

# Approvals
fd approval list
fd approval approve
fd approval reject --reason "..."

# Evaluations
fd eval run --dataset evals/datasets/safe-pr-agent.jsonl
fd eval report --output reports/latest.html
```

#### fd-evals: Evaluation Framework

Deterministic testing for agent workflows:

```python
runner = EvalRunner(
    scorers=[
        FilesChangedScorer(),
        PRCreatedScorer(),
        TestPassScorer(),
        LintScorer(),
    ],
    control_plane_url=url,
)

summary = runner.run_eval(
    dataset_path="evals/datasets/safe-pr-agent.jsonl",
    agent_id="agt_safe_pr_agent",
    max_tasks=20,
)
# Returns: pass_rate, avg_score, cost_per_task, regressions
```

#### fd-mcp-tools: MCP Server Implementations

Built-in MCP tool servers for common operations:

```python
# Git operations server
from fd_mcp_tools import GitMCPServer

# Test runner server
from fd_mcp_tools import TestRunnerMCPServer
```

### Dashboard (Next.js)

A professional admin UI built with Next.js 16.1.1, React 19.2, and Tailwind CSS 4.

#### Key Pages

| Page | Description |
|------|-------------|
| `/overview` | Dashboard home with key metrics and recent activity |
| `/runs` | Real-time run monitoring with step timeline visualization |
| `/runs/{runId}` | Detailed run view with step-by-step execution trace |
| `/approvals` | Approval queue with approve/reject actions |
| `/agents` | Agent registry with version management |
| `/tools` | Tool registry and MCP server status |
| `/workflows` | Multi-step workflow definitions and runs |
| `/analytics` | Usage charts, cost tracking, performance metrics |
| `/audit` | Immutable audit trail viewer with filtering |
| `/evals` | Evaluation suite results and comparisons |
| `/policies` | Policy configuration and management |
| `/threats` | Security threat detection and monitoring |
| `/logs` | Container and service logs viewer |
| `/settings` | API key management and configuration |

#### Technology Stack

```
Next.js 16.1.1    # App Router with standalone output
React 19.2.3      # Concurrent features, Server Components
Tailwind CSS 4    # Utility-first styling with dark theme
TanStack Query 5  # Server state with polling (2-3s intervals)
TanStack Table 8  # Data tables with sorting/filtering
Radix UI          # Accessible component primitives
shadcn/ui         # Pre-built component library
Recharts 3        # Analytics visualizations
nuqs 2            # URL state management
sonner 2          # Toast notifications
```

#### Running the Dashboard

```bash
# Development (hot reload)
cd nextjs && npm install && npm run dev
# Open http://localhost:3001

# Production build
npm run build
npm start
# Runs on port 3001

# Static dashboard (simple HTTP server)
make run-dashboard
# Open http://localhost:8000

# Docker
docker build -t ferrumdeck-dashboard nextjs/
docker run -p 3001:3001 \
  -e GATEWAY_URL=http://gateway:8080 \
  -e FD_API_KEY=fd_dev_key_abc123 \
  ferrumdeck-dashboard
```

#### Environment Variables

```bash
GATEWAY_URL=http://localhost:8080  # Control plane URL
FD_API_KEY=fd_dev_key_abc123       # API key for authentication
NEXT_PUBLIC_POLL_INTERVAL=2000     # Polling interval (ms)
```

#### API Proxy (BFF Pattern)

The dashboard proxies all API calls through `/api/v1/*` routes:

```typescript
// src/app/api/v1/[...path]/route.ts
// Forwards requests to GATEWAY_URL with authentication
```

## API Reference

### Authentication

All API requests require authentication via the `Authorization` header:

```
# API Key
Authorization: Bearer fd_tenant_abc123xyz

# Or OAuth2 JWT
Authorization: Bearer eyJhbGciOiJSUzI1NiIs...
```
### Endpoints

#### Runs

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/v1/runs` | Create a new run |
| GET | `/v1/runs` | List runs with filtering |
| GET | `/v1/runs/{runId}` | Get run details |
| POST | `/v1/runs/{runId}/cancel` | Cancel a running run |
| GET | `/v1/runs/{runId}/steps` | List steps in a run |
| POST | `/v1/runs/{runId}/steps/{stepId}` | Submit step result (worker) |
| POST | `/v1/runs/{runId}/check-tool` | Check tool policy before execution |

#### Registry

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/v1/registry/agents` | List agents |
| POST | `/v1/registry/agents` | Create agent |
| GET | `/v1/registry/agents/{agentId}` | Get agent details |
| GET | `/v1/registry/agents/{agentId}/versions` | List agent versions |
| POST | `/v1/registry/agents/{agentId}/versions` | Create agent version |
| GET | `/v1/registry/agents/{agentId}/stats` | Get agent statistics |
| GET | `/v1/registry/tools` | List tools |
| POST | `/v1/registry/tools` | Create tool |
| GET | `/v1/registry/tools/{toolId}` | Get tool details |
| GET | `/v1/registry/mcp-servers` | List MCP servers |

#### Approvals

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/v1/approvals` | List pending approvals |
| PUT | `/v1/approvals/{approvalId}` | Approve or reject |

#### Policies

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/v1/policies` | List policies |
| POST | `/v1/policies` | Create policy |
| GET | `/v1/policies/{policyId}` | Get policy details |
| PATCH | `/v1/policies/{policyId}` | Update policy |
| DELETE | `/v1/policies/{policyId}` | Delete policy |

#### API Keys

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/v1/api-keys` | List API keys |
| GET | `/v1/api-keys/{keyId}` | Get API key details |
| POST | `/v1/api-keys/{keyId}/revoke` | Revoke an API key |

#### Workflows

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/v1/workflows` | Create workflow |
| GET | `/v1/workflows` | List workflows |
| GET | `/v1/workflows/{workflowId}` | Get workflow |
| GET | `/v1/workflows/{workflowId}/runs` | List workflow runs |
| POST | `/v1/workflow-runs` | Execute workflow |
| GET | `/v1/workflow-runs/{runId}` | Get execution status |
| POST | `/v1/workflow-runs/{runId}/cancel` | Cancel workflow run |
| GET | `/v1/workflow-runs/{runId}/executions` | List step executions |
| POST | `/v1/workflow-runs/{runId}/executions` | Create step execution |
| POST | `/v1/workflow-runs/{runId}/executions/{executionId}` | Submit step result |

#### Health & Documentation

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Liveness probe |
| GET | `/ready` | Readiness probe |
| GET | `/docs` | Swagger UI documentation |
| GET | `/api-docs/openapi.json` | OpenAPI specification |

### Example: Create a Run

```bash
curl -X POST http://localhost:8080/v1/runs \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agt_safe_pr_agent",
    "input": {
      "task": "Review PR #123 in repo owner/repo",
      "repository": "owner/repo",
      "pr_number": 123
    },
    "config": {
      "budget": {
        "max_total_tokens": 50000,
        "max_cost_cents": 100
      }
    }
  }'
```

Response:

```json
{
  "id": "run_01ARZ3NDEKTSV4RRFFQ69G5FAV",
  "agent_id": "agt_safe_pr_agent",
  "status": "queued",
  "created_at": "2024-12-24T10:00:00Z"
}
```

## Configuration

### Environment Variables

Create a `.env` file from `.env.example`:

```bash
# ============================================
# Application
# ============================================
FERRUMDECK_ENV=development
FERRUMDECK_LOG_LEVEL=debug
FERRUMDECK_LOG_FORMAT=pretty  # or "json" for production

# ============================================
# Gateway
# ============================================
GATEWAY_HOST=0.0.0.0
GATEWAY_PORT=8080
GATEWAY_WORKERS=4

# ============================================
# Database (PostgreSQL)
# ============================================
DATABASE_URL=postgres://ferrumdeck:ferrumdeck@localhost:5433/ferrumdeck
DATABASE_MAX_CONNECTIONS=20
DATABASE_MIN_CONNECTIONS=5

# ============================================
# Queue (Redis)
# ============================================
REDIS_URL=redis://localhost:6379
REDIS_QUEUE_PREFIX=fd:queue:

# ============================================
# LLM Providers
# ============================================
ANTHROPIC_API_KEY=sk-ant-api03-xxx
OPENAI_API_KEY=sk-xxx
DEFAULT_MODEL=claude-sonnet-4-20250514

# ============================================
# OpenTelemetry
# ============================================
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
OTEL_SERVICE_NAME=ferrumdeck
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=1.0

# ============================================
# Worker
# ============================================
FD_API_KEY=fd_dev_key_abc123
CONTROL_PLANE_URL=http://localhost:8080
WORKER_CONCURRENCY=4
WORKER_MAX_RETRIES=3

# ============================================
# OAuth2 (Optional)
# ============================================
OAUTH2_ENABLED=false
OAUTH2_JWKS_URI=https://your-provider/.well-known/jwks.json
OAUTH2_ISSUER=https://your-provider/
OAUTH2_AUDIENCE=api://ferrumdeck
OAUTH2_TENANT_CLAIM=tenant_id
```

### MCP Server Configuration

Configure MCP servers in `config/mcp-servers.json`:

```json
{
  "servers": [
    {
      "name": "github",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    {
      "name": "filesystem",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/workspace"]
    }
  ],
  "allowlist": {
    "allowed": [
      "read_file",
      "list_directory",
      "search_files",
      "get_file_contents",
      "list_commits",
      "get_pull_request"
    ],
    "approval_required": [
      "write_file",
      "create_file",
      "create_pull_request",
      "create_issue",
      "push_files"
    ],
    "denied": [
      "delete_file",
      "delete_branch",
      "merge_pull_request"
    ]
  }
}
```

## Security Model

### Defense in Depth

FerrumDeck implements multiple security layers:

```
┌─────────────────────────────────────────────────────────┐
│ Layer 1: Authentication                                 │
│ • API Keys (SHA256 hashed, scoped)                      │
│ • OAuth2/JWT with tenant claims                         │
├─────────────────────────────────────────────────────────┤
│ Layer 2: Deny-by-Default Tools                          │
│ • Explicit allowlist required                           │
│ • Risk level classification                             │
│ • Per-agent tool restrictions                           │
├─────────────────────────────────────────────────────────┤
│ Layer 3: Budget Enforcement                             │
│ • Token limits (input, output, total)                   │
│ • Cost limits (in cents)                                │
│ • Time limits (wall clock)                              │
│ • Automatic run termination                             │
├─────────────────────────────────────────────────────────┤
│ Layer 4: Approval Gates                                 │
│ • Human-in-the-loop for sensitive actions               │
│ • Configurable per tool                                 │
│ • Timeout with auto-rejection                           │
├─────────────────────────────────────────────────────────┤
│ Layer 5: Audit Trail                                    │
│ • Immutable event logging                               │
│ • Every action recorded                                 │
│ • Compliance-ready                                      │
└─────────────────────────────────────────────────────────┘
```

### Airlock RASP: Egress DLP

The data-exfiltration shield in `rust/crates/fd-policy/src/airlock/exfiltration.rs` runs in-process on every network-tool dispatch and layers three checks against the outbound payload:

1. **Credential DLP** (`credential_dlp.rs`): scans for cloud keys (AWS access key id, GCP service-account JSON), PATs (GitHub, Slack bot tokens, Stripe live keys, Anthropic and OpenAI keys), and financial account numbers. False positives on PAN and IBAN are suppressed with **Luhn** (mod-10) and **mod-97** checksum gates respectively, so a random 16-digit correlation id is not flagged as a credit card. Matches are recorded with a redacted form (first-4 + last-4 only); the raw secret never reaches audit storage.
2. **Domain allowlist + raw-IP block**: deny-by-default, with subdomain matching and IP-literal rejection to prevent C2-style direct dialing.
3. **Per-domain data budget**: configurable `data_budget_per_domain_bytes` caps cumulative outbound bytes per `(run, domain)` tuple. Further dispatches that would exceed the budget are denied; the violation reuses the existing audit and shadow/enforce-mode plumbing, so an exceedance kills the run the same way a budget-exceeded policy decision does.

Configure via `ExfiltrationConfig`; see `rust/crates/fd-policy/src/airlock/config.rs`. The Anti-RCE matcher, Velocity / Circuit Breaker, and Schema-Drift guard sit on the same `AirlockInspector` as sibling layers.

### Threat Model

**Assumption**: Prompt injection cannot be fully prevented.

**Strategy**: Containment, not prevention.

| Threat | Mitigation |
|--------|-----------|
| Malicious tool calls | Deny-by-default allowlist |
| Token exhaustion | Budget limits with auto-kill |
| Data exfiltration (destination) | Domain allowlist + raw-IP block (Airlock RASP) |
| Credential exfiltration (payload) | Airlock credential DLP: cloud keys, PATs, Luhn-valid PANs, mod-97 IBANs (redacted in audit) |
| Slow-leak exfil to allowed host | Airlock per-domain data budget per run |
| Tool-call payload drift | Airlock schema-drift guard against the registered `ToolVersion` JSON Schema |
| Privilege escalation | Scoped API keys, tenant isolation |
| Audit tampering | Append-only, immutable logging |

## Observability

### OpenTelemetry Integration

FerrumDeck uses OpenTelemetry with GenAI semantic conventions:

**Tracked Attributes**:

```
gen_ai.system = "anthropic" | "openai"
gen_ai.request.model = "claude-sonnet-4-20250514"
gen_ai.usage.input_tokens = 1234
gen_ai.usage.output_tokens = 5678
gen_ai.usage.cost_usd = 0.0234

ferrumdeck.run.id = "run_xxx"
ferrumdeck.step.id = "stp_xxx"
ferrumdeck.agent.id = "agt_xxx"
ferrumdeck.tenant.id = "ten_xxx"
```

### Receiver Attestation (optional, off by default)

FerrumDeck spans are **agent-self-reported**: the agent (or the worker on its behalf) describes what it did.
That is useful, but a self-reported span is an *assertion*, not a *proof* — nothing independently confirms the call happened as described. Receiver attestation is an **optional** cross-check. When enabled, a tool/ service call may carry a minimal, Sello-style **receiver-signed receipt** (`receiver_id`, `tool_name`, a per-call `call_token` binding, an owner-encrypted `payload_ref`, and a signature). The trace plane (`fd_runtime.attestation`) verifies that the receipt (a) has a valid receiver signature and (b) binds to the *same call* the span claims (same tool name + same `call_token`), then annotates the span: ferrumdeck.attestation.attested = true | false ferrumdeck.attestation.status = "attested" | "unverified_no_receipt" | "unverified_signature_invalid" | "unverified_mismatch" | "unverified_unknown_receiver" ferrumdeck.attestation.self_reported_unverified = true | false ferrumdeck.attestation.receiver_id = "github-mcp" ferrumdeck.attestation.call_token = "call_tok_xxx" **Enable it** with the environment switch (off unless explicitly set): export FD_ATTESTATION_ENABLED=true # default: false (existing pipelines unaffected) and supply a `ReceiptVerifier` (keyed per receiver) + the per-call receipt to `trace_tool_call(...)`. When disabled, the verification path is skipped entirely and spans are byte-for-byte identical to before. **Trust model — what attestation DOES and does NOT prove.** Be honest about this; it is deliberately narrow: - ✅ **Does** prove that *a party holding the receiver's key* issued a receipt that binds to this specific call (same tool + same `call_token`), and that the receipt was not altered after signing. - ✅ **Does** give you an honest, additive signal: a span without a verified receipt is flagged `self_reported_unverified = true` instead of being silently trusted. - ❌ Does **not** prove the call's *contents* or *results* are correct — the `payload_ref` is owner-encrypted and the trace plane never decrypts it. 
Attestation proves *binding*, not *semantics*. - ❌ Does **not** provide third-party non-repudiation with the default scheme. The default is **HMAC-SHA256** (a symmetric, shared-secret signature): a valid signature proves the holder of the receiver key produced it, not that *only* the receiver could have. The `ReceiptVerifier` interface is scheme-agnostic so an asymmetric scheme (e.g. Ed25519) can replace HMAC later without changing callers. - ❌ Does **not** enforce anything. Unattested spans are **never dropped** — most spans are unattested today. This is signal for the trace view, not a gate. There is no "attestation required" mode. ### Jaeger UI Access traces at [http://localhost:16686](http://localhost:16686): - Search by run ID, agent ID, or error status - View step execution timeline - Analyze token usage and costs - Debug failures with full context ### Cost Tracking Automatic cost calculation based on model pricing: | Model | Input ($/1M) | Output ($/1M) | |-------|-------------|---------------| | claude-opus-4 | $15.00 | $75.00 | | claude-sonnet-4 | $3.00 | $15.00 | | gpt-4o | $2.50 | $10.00 | | gpt-4o-mini | $0.15 | $0.60 | ## Example Agents ### Safe PR Agent A flagship example demonstrating FerrumDeck's governance features. Located in `examples/safe-pr-agent/`. **Agent Configuration** (`agent.yaml`): name: safe-pr-agent description: | Reads a repository, analyzes code, proposes changes, runs tests in sandbox, and creates a pull request. Every action is permissioned, traced, and cost-accounted. 
default_model: claude-sonnet-4-20250514 # Read-only tools allowed by default allowed_tools: - read_file - list_files - search_code # These require human approval approval_required_tools: - write_file - create_pr # Governance limits budget: max_input_tokens: 50000 max_output_tokens: 20000 max_tool_calls: 30 max_wall_time_ms: 180000 # 3 minutes max_cost_cents: 100 # $1 **Create Your Own Agent**: # Copy the example cp -r examples/safe-pr-agent examples/my-agent # Edit the configuration vim examples/my-agent/agent.yaml # Register with the control plane curl -X POST http://localhost:8080/v1/registry/agents \ -H "Authorization: Bearer $API_KEY" \ -H "Content-Type: application/json" \ -d @examples/my-agent/agent.yaml ## Evaluation Framework ### Running Evaluations # Run full evaluation suite ./scripts/run-evals.sh # Run specific dataset fd eval run \ --dataset evals/datasets/safe-pr-agent.jsonl \ --agent agt_safe_pr_agent \ --output evals/reports/latest.json # Compare against baseline fd eval compare \ --baseline evals/reports/baseline.json \ --current evals/reports/latest.json ### Evaluation Dataset Format {"task_id": "pr-review-001", "input": {"task": "Review PR #1"}, "expected": {"files_changed": true}} {"task_id": "pr-review-002", "input": {"task": "Review PR #2"}, "expected": {"files_changed": true}} ### CI Integration Evaluations run automatically on PRs to `main`: # .github/workflows/evals.yml - name: Run evaluations run: fd eval run --suite smoke --parallel 4 - name: Check for regressions run: | # Compare inside jq so fractional pass rates work; -e exits non-zero when the result is false or null if ! jq -e '.pass_rate >= 80' report.json > /dev/null; then echo "Eval gate FAILED: Pass rate below 80%" exit 1 fi ## Development ### Prerequisites # Install Rust curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh # Install uv (Python package manager) curl -LsSf https://astral.sh/uv/install.sh | sh # Install Docker # See: https://docs.docker.com/get-docker/ ### Common Commands # Start development infrastructure make dev-up # Stop infrastructure make dev-down # Install
all dependencies make install # Build everything make build # Run all tests make test # Format code make fmt # Lint code make lint # Run full CI checks locally make check # Run database migrations make db-migrate # Start gateway make run-gateway # Start worker make run-worker ### Running Tests # All tests make test # Rust tests cargo test --workspace # Python tests uv run pytest python/packages/fd-evals/tests/ -v uv run pytest python/packages/fd-worker/tests/ -v # Specific package cargo test -p fd-policy uv run pytest python/packages/fd-runtime # With coverage cargo tarpaulin --out Html uv run pytest --cov=fd_runtime --cov-report=html # Next.js type checking cd nextjs && npx tsc --noEmit ### Code Quality # All checks make check # Rust cargo fmt --all -- --check cargo clippy --workspace --all-targets -- -D warnings # Python uv run ruff check python/ uv run ruff format --check python/ uv run pyright python/ # Next.js cd nextjs && npm run lint ## Deployment ### Production Checklist - [ ] **Database**: Use managed PostgreSQL with pgvector (RDS, Cloud SQL, etc.) - [ ] **Redis**: Use managed Redis (ElastiCache, Redis Cloud, etc.) - [ ] **TLS**: Enable HTTPS for all API endpoints - [ ] **Secrets**: Use secrets manager for API keys and LLM tokens - [ ] **Monitoring**: Set up CloudWatch/Datadog metrics - [ ] **Logging**: Centralized logging (ELK, CloudWatch Logs) - [ ] **Backups**: Daily PostgreSQL snapshots - [ ] **Rate Limiting**: Configure per-tenant limits - [ ] **OAuth2**: Enable for production authentication - [ ] **Dashboard**: Deploy behind CDN with proper CORS settings - [ ] **Workers**: Scale horizontally with multiple instances ### Docker Deployment # Build all images docker build -t ferrumdeck-gateway -f deploy/docker/Dockerfile.gateway . docker build -t ferrumdeck-worker -f deploy/docker/Dockerfile.worker . 
docker build -t ferrumdeck-dashboard nextjs/ # Run with Docker Compose (development) docker compose --env-file .env -f deploy/docker/compose.dev.yaml up -d # Services will be available at: # Gateway: http://localhost:8080 # Dashboard: http://localhost:3001 # Jaeger: http://localhost:16686 ### Kubernetes Helm charts coming soon. For now, use the Docker images with your preferred orchestration. **Minimum resources per service:** - Gateway: 512MB RAM, 0.5 CPU - Worker: 1GB RAM, 1 CPU (scales horizontally) - Dashboard: 256MB RAM, 0.25 CPU ## Contributing 1. Fork the repository 2. Create a feature branch (`git checkout -b feature/amazing-feature`) 3. Make your changes 4. Run tests (`make check`) 5. Commit (`git commit -m 'Add amazing feature'`) 6. Push (`git push origin feature/amazing-feature`) 7. Open a Pull Request ### Code Style - **Rust**: Follow `rustfmt` defaults, clippy warnings as errors - **Python**: Follow `ruff` rules (see `pyproject.toml`), pyright type checking - **TypeScript**: ESLint with Next.js config - **Commits**: Use conventional commits (`feat:`, `fix:`, `docs:`, etc.) See [AGENTS.md](AGENTS.md) for detailed coding guidelines and single-test commands. ## License Apache-2.0 — see [LICENSE](LICENSE) for details. 
## Acknowledgments

**Rust Control Plane:**

- [Axum](https://github.com/tokio-rs/axum) — Web framework
- [SQLx](https://github.com/launchbadge/sqlx) — Async SQL with compile-time checks
- [Tower](https://github.com/tower-rs/tower) — Middleware framework
- [Tokio](https://github.com/tokio-rs/tokio) — Async runtime

**Python Data Plane:**

- [litellm](https://github.com/BerriAI/litellm) — Unified LLM interface
- [MCP](https://modelcontextprotocol.io/) — Model Context Protocol
- [Pydantic](https://github.com/pydantic/pydantic) — Data validation
- [Tenacity](https://github.com/jd/tenacity) — Retry with backoff

**Dashboard:**

- [Next.js](https://nextjs.org/) — React framework
- [Tailwind CSS](https://tailwindcss.com/) — Utility-first CSS
- [shadcn/ui](https://ui.shadcn.com/) — Component library
- [TanStack Query](https://tanstack.com/query) — Server state management
- [Radix UI](https://www.radix-ui.com/) — Accessible primitives
- [Recharts](https://recharts.org/) — Chart library

**Observability:**

- [OpenTelemetry](https://opentelemetry.io/) — Tracing framework
- [Jaeger](https://www.jaegertracing.io/) — Distributed tracing UI