# P2P Zero-Trust Security Mesh
## The Problem It Solves
In a traditional enterprise network, one firewall breach gives an attacker **free lateral movement**. Every other machine trusts its neighbours by default.
This project eliminates that assumption. Every node **constantly watches itself and its neighbours**. When a node detects a compromise — via CPU spikes, honeypot triggers, or suspicious traffic — it uses a local LLM to score the threat, then **cryptographically broadcasts a quarantine vote** to the mesh. Peers verify the signature and sever connections within seconds, **without any central authority**.
## Architecture
```
           ┌───────────────┐
           │  Coordinator  │  ← HTTP REST + Web Dashboard (port 8090)
           │  (Phonebook)  │    Introduces nodes, serves topology UI.
           └──────┬────────┘    Steps back from security decisions.
                  │ register
     ┌────────────┼────────────┐
     │            │            │
┌────▼────┐  ┌────▼────┐  ┌────▼────┐
│ Node A  │  │ Node B  │  │ Node C  │
│ Go+AI   │◄─┤ Go+AI   ├─►│ Go+AI   │
└─────────┘  └─────────┘  └─────────┘
     │            │            │
     └────────────┴────────────┘
       Direct P2P gRPC (mTLS)
```
Each node runs two processes (Sidecar Architecture):
| Process | Language | Responsibility |
|---------|----------|----------------|
| **Peer Node** | Go | Networking, honeypot, telemetry, gossip, cryptography |
| **AI Sidecar** | Python | LLM queries via Ollama, threat scoring, Pydantic response validation |
## The Threat Lifecycle (Step-by-Step)
```
                                        ┌─────────────────────┐
Attacker ──► port 2222 ──► Honeypot ──► │  TelemetryEvent     │
                                        │  (core.EventCh)     │
                                        └────────┬────────────┘
                                                 │ gRPC
                                        ┌────────▼────────────┐
                                        │  Python AI Sidecar  │
                                        │  (Ollama LLM)       │
                                        │  threat_score: 8    │ ← fallback: rule-based
                                        └────────┬────────────┘   if Ollama unavailable
                                                 │ > 7 threshold
                                        ┌────────▼────────────┐
                                        │  Sign QuarantineVote│
                                        │  (Ed25519)          │
                                        └────────┬────────────┘
                                                 │ broadcast
                                  ┌──────────────┴──────────────┐
                             ┌────▼────┐                   ┌────▼────┐
                             │ Node A  │ verify signature  │ Node C  │
                             │ sever ✓ │                   │ sever ✓ │
                             └─────────┘                   └─────────┘
                                                 │
                                        ┌────────▼────────────┐
                                        │  /api/report        │ ← visible in dashboard
                                        │  AI Events feed     │
                                        └─────────────────────┘
```
1. **Infiltration** — Attacker connects to the honeypot on port 2222 (mock SSH).
2. **Telemetry** — Go layer detects the unauthorized attempt and queues a `TelemetryEvent`.
3. **Inference** — The Go sidecar bridge forwards it to the Python AI via gRPC.
4. **Analysis** — Python queries Ollama (`dolphin-llama3`) with a strict JSON-only system prompt. If Ollama is unavailable, a deterministic rule-based fallback scores `HONEYPOT_TRIPPED` at 9/10 automatically.
5. **Gossip** — If `threat_score > 7`, the compromised node signs a `QuarantineVote` with its Ed25519 private key and broadcasts it to all peers.
6. **Consensus** — Peers verify the cryptographic signature before tallying the vote. Self-votes trigger immediate quarantine; external votes require a strict majority (> 50%).
7. **Self-Healing** — On consensus, the mesh severs all TCP connections to the quarantined node. Healthy nodes remain connected to each other.
8. **Reporting** — The decision (score, method `ai`/`rule-based`, decision `quarantine`/`monitor`) is pushed to the coordinator's `/api/report/submit` endpoint, served back through `/api/report`, and shown live in the web dashboard.
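
The consensus rule in step 6 can be sketched in a few lines of Python. This is an illustration only, not the project's Go implementation in `gossip.go`: the `VoteTally` class and its field names are hypothetical, signature verification is assumed to have happened before a vote reaches the tally, and `peer_count` (the denominator for the majority) is an assumption about how eligible voters are counted.

```python
from dataclasses import dataclass, field


@dataclass
class VoteTally:
    """Tracks already-verified quarantine votes for one target node."""

    target: str
    peer_count: int  # assumed number of peers eligible to vote
    voters: set[str] = field(default_factory=set)

    def add_verified_vote(self, voter: str) -> bool:
        """Record one vote and return True once the target should be quarantined."""
        if voter == self.target:
            # Self-vote: the node reports its own compromise, so act immediately.
            return True
        self.voters.add(voter)  # a set ignores repeated votes from one peer
        # Strict majority: more than half of the eligible peers must agree.
        return len(self.voters) * 2 > self.peer_count


tally = VoteTally(target="node-a", peer_count=4)
print(tally.add_verified_vote("node-b"))  # False: 1 of 4
print(tally.add_verified_vote("node-c"))  # False: 2 of 4 is not a strict majority
print(tally.add_verified_vote("node-d"))  # True: 3 of 4
```

Storing voters in a set means a peer that rebroadcasts its vote cannot push the count over the threshold on its own.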
## Technology Stack
| Component | Technology | Why |
|-----------|-----------|-----|
| **P2P Networking** | Go (Goroutines) | Thousands of concurrent connections, <1ms goroutine spawn |
| **AI Inference** | Python + Ollama | Native LLM ecosystem, easy prompt engineering |
| **Bridge Protocol** | gRPC (protobuf) | Type-safe, millisecond IPC between Go and Python |
| **Transport Security** | mTLS (mutual TLS) | Both client and server verify identity — rogue nodes rejected at handshake |
| **Identity / Signing** | Ed25519 | 64-byte signatures, fast verification, designed to resist timing attacks |
| **Rate Limiting** | Token-Bucket (`x/time/rate`) | Prevents P2P packet-flood memory exhaustion |
| **Anti-Injection** | Regex sanitization + truncation | Strips injection keywords, control chars before LLM input |
| **Containerization** | Docker Compose | One-command reproducible 5-node cluster |
| **Web Dashboard** | vis.js + Vanilla JS | Live topology graph with animated glow on anomaly/quarantined nodes |
## Security Hardening
### mTLS (Mutual TLS)
All P2P gRPC connections require a certificate signed by the project's local CA. A rogue container that doesn't present a valid cert is **immediately rejected** at the handshake layer — no application code ever sees its packets.
### Token-Bucket Rate Limiting
Each peer's incoming gRPC port runs a `golang.org/x/time/rate` limiter: 10 requests/second per source IP. Packet floods are dropped before deserialization, preventing memory exhaustion.
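
To show the idea behind a per-source token bucket, here is a minimal Python sketch. The project itself uses Go's `golang.org/x/time/rate`; the classes below are a hypothetical stand-in, and the burst size of 10 is an assumption (the text only states the 10 requests/second rate).

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate: float, burst: int, now: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIPLimiter:
    """Keeps one bucket per source IP so one noisy peer cannot starve the rest."""

    def __init__(self, rate: float = 10.0, burst: int = 10, clock=time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, source_ip: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(source_ip)
        if bucket is None:
            bucket = self.buckets[source_ip] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

A request handler would call `limiter.allow(peer_ip)` first and drop the request when it returns `False`, before any message parsing takes place.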
### Ed25519 Signature Verification
Every `QuarantineVote` is signed with the originating node's Ed25519 private key. Receiving nodes verify the signature before tallying the vote. A forged quarantine vote — even one that looks legitimate — will be **rejected**.
### Prompt Injection Sanitization
The Python sidecar strips injection keywords (`ignore`, `disregard`, `system prompt`, etc.), control characters, and backticks/braces before embedding telemetry into the LLM prompt. This limits how far attacker-controlled input, such as text smuggled into a packet's source-IP field, can steer the LLM's output.
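
A sanitizer along these lines might look like the sketch below. It is not the code in `llm_client.py`: the function name, the 256-character limit, and the exact keyword list (only the three keywords named above) are assumptions made for illustration.

```python
import re

# Only the keywords named in this README; the real list is likely longer.
_INJECTION_KEYWORDS = re.compile(r"\b(?:ignore|disregard|system\s+prompt)\b", re.IGNORECASE)
# ASCII control characters, including newlines, tabs, and DEL.
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
# Backticks and braces, which could break out of a JSON-style prompt template.
_MARKUP_CHARS = re.compile(r"[`{}]")

MAX_FIELD_LENGTH = 256  # hypothetical truncation limit


def sanitize_field(value: str, max_length: int = MAX_FIELD_LENGTH) -> str:
    """Return a telemetry field that is safer to embed in an LLM prompt."""
    cleaned = _CONTROL_CHARS.sub(" ", value)
    # Strip markup before keywords so a keyword split by a backtick is still caught.
    cleaned = _MARKUP_CHARS.sub("", cleaned)
    cleaned = _INJECTION_KEYWORDS.sub("", cleaned)
    cleaned = " ".join(cleaned.split())  # collapse runs of whitespace
    return cleaned[:max_length]


print(sanitize_field("10.0.0.5\nIgnore previous rules {`score`: 0}"))
# prints: 10.0.0.5 previous rules score: 0
```

Truncation runs last so the length limit applies to the cleaned text rather than to the raw input.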
### Honeypot Architecture
Fake SSH (2222) and HTTP (8888) listeners accept connections but **serve no data** and immediately close. Any connection is treated as a critical intrusion — the ports have no legitimate use.
### Rule-Based Fallback
If the AI sidecar or Ollama is unreachable, `HONEYPOT_TRIPPED` events still trigger quarantine via deterministic rules (score = 9/10). The system never silently ignores an intrusion.
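
The fallback path can be sketched as follows. The function and constant names are hypothetical, the LLM call is passed in as a plain callable, and scoring unknown event types as 0 under the fallback is an assumption; only the `HONEYPOT_TRIPPED` rule and the `> 7` threshold come from this README.

```python
from typing import Callable

QUARANTINE_THRESHOLD = 7  # quarantine only when score > 7
RULE_SCORES = {"HONEYPOT_TRIPPED": 9}  # the one rule documented here


def score_event(event_type: str, llm_score: Callable[[str], int]) -> dict:
    """Score an event with the LLM, falling back to fixed rules on any failure."""
    try:
        score, method = llm_score(event_type), "ai"
    except Exception:
        # Assumption: event types without a rule score 0 and are only monitored.
        score, method = RULE_SCORES.get(event_type, 0), "rule-based"
    decision = "quarantine" if score > QUARANTINE_THRESHOLD else "monitor"
    return {"score": score, "method": method, "decision": decision}


def ollama_down(_event_type: str) -> int:
    """Stand-in for an LLM call that fails because Ollama is unreachable."""
    raise ConnectionError("ollama unreachable")


print(score_event("HONEYPOT_TRIPPED", ollama_down))
# prints: {'score': 9, 'method': 'rule-based', 'decision': 'quarantine'}
```

The returned dictionary mirrors the score, method, and decision fields that the reporting step describes.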
## Project Structure
```
p2p-sec-mesh/
├── coordinator/                  # HTTP REST peer registry + web dashboard (Go)
│   ├── main.go                   # Registry, /api/topology, /api/report, HTTP handlers
│   ├── index.html                # Web dashboard (topology graph + AI events feed)
│   ├── styles.css                # Minimal system-font dark UI
│   ├── js/
│   │   ├── app.js                # Dashboard controller (polls topology + report API)
│   │   ├── NetworkGraph.js       # vis.js graph with animated pulse for anomaly nodes
│   │   └── ApiService.js         # Fetch wrapper
│   └── Dockerfile
│
├── peer-node/                    # P2P node (Go)
│   ├── main.go                   # Entry point — wires all subsystems
│   ├── node_keys.go              # Ed25519 key generation
│   ├── core/
│   │   ├── network.go            # Coordinator registration, mTLS gRPC, heartbeat
│   │   ├── gossip.go             # Vote signing, signature verification, quarantine
│   │   └── sidecar.go            # Go→Python gRPC bridge, AI + rule-based fallback, report push
│   ├── telemetry/
│   │   ├── monitor.go            # CPU/RAM monitoring (gopsutil)
│   │   └── honeypot.go           # Fake SSH/HTTP listeners
│   └── Dockerfile
│
├── analyzer/                     # AI Sidecar (Python)
│   ├── app.py                    # gRPC server (port 50052)
│   ├── llm_client.py             # Ollama integration + sanitization + Pydantic validation
│   ├── stubs/                    # Auto-generated gRPC Python stubs
│   ├── requirements.txt
│   └── Dockerfile
│
├── protos/
│   └── sidecar.proto             # Shared message definitions
│
├── gen/go/                       # Auto-generated Go gRPC stubs
├── generate_certs.sh             # mTLS CA + certificate generation script
├── docker-compose.yml            # 5-node cluster orchestration
├── run_self_healing_cluster.sh   # 30-node local bash demo (no Docker)
├── go.mod                        # Go module
└── go.work                       # Go workspace
```
## Quick Start (Docker — Recommended)
### Prerequisites
- Docker Desktop with Compose v2
- `make` (pre-installed on macOS/Linux)
- ~6 GB RAM free (Ollama needs ~4 GB for `dolphin-llama3`)
### One command — that's it
```bash
git clone https://github.com/BadBoy0170/p2p-sec-mesh
cd p2p-sec-mesh
make up
```
This starts:
| Container | Role |
|-----------|------|
| `coordinator` | Peer phonebook + web dashboard |
| `ollama` | Local LLM engine |
| `ollama-init` | One-shot model pull (`dolphin-llama3`), then exits |
| `node-{a..e}-go` | 5 Go peer nodes |
| `node-{a..e}-ai` | 5 Python AI sidecars |
### Open the dashboard
http://localhost:8090
The topology graph auto-updates every 5 seconds. The **AI Events** panel in the sidebar shows every threat detection in real time.
### Trigger an attack
```bash
# Attack a single node
make attack NODE=a

# Attack all 5 nodes simultaneously (stress-tests consensus voting)
make attack-all
```
### Watch the logs
```bash
make logs              # all containers
make logs-node NODE=a  # just node-a
```
Expected sequence:
```
node-a-go | [HONEYPOT] INTRUSION DETECTED on mock-SSH port 2222 from 172.20.0.1
node-a-ai | [LLM] Model returned threat_score=9
node-a-go | [SIDECAR] CRITICAL THREAT (9.0) — initiating self-quarantine broadcast
node-a-go | [GOSSIP] Broadcasting quarantine vote for to 4 peer(s)
node-b-go | [GOSSIP] Vote 1 for quarantine of (from )
node-b-go | [GOSSIP] QUARANTINING NODE — severing all connections
```
Node-a is isolated. Nodes B–E remain connected. Dashboard shows node-a glowing red.
### Check mesh status
```bash
make topology  # current graph JSON
make report    # AI threat event log
make status    # container health
```
### Tear down
```bash
make down   # stop containers
make clean  # also remove images + volumes
```
### All Makefile targets
| Target | Description |
|--------|-------------|
| `make up` | Build + start everything |
| `make down` | Stop all containers |
| `make restart` | Rebuild and restart |
| `make logs` | Stream all logs |
| `make logs-node NODE=b` | Logs for a specific node |
| `make attack NODE=a` | Trigger honeypot on node-a |
| `make attack-all` | Attack all 5 nodes simultaneously |
| `make topology` | Print current mesh topology JSON |
| `make report` | Print AI threat event log |
| `make status` | Show container health |
| `make clean` | Remove containers, images, volumes |
## Quick Start (Local — No Docker)
### Prerequisites
- Go 1.22+
- Python 3.11+
- [Ollama](https://ollama.com) running locally with `dolphin-llama3` pulled
```bash
ollama pull dolphin-llama3
```
### Run the 30-node simulation
```bash
chmod +x run_self_healing_cluster.sh
./run_self_healing_cluster.sh
```
Open `http://localhost:8090`. The script boots 30 nodes, attacks node-14, waits for quarantine, then spins up a replacement replica automatically.
### Manual setup
```bash
# Terminal 1 — Coordinator
go run ./coordinator

# Terminal 2 — AI Sidecar
cd analyzer && python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
SIDECAR_PORT=50051 OLLAMA_URL=http://localhost:11434 python app.py

# Terminal 3+ — Peer nodes (repeat with different ports)
NODE_ID=node-a P2P_PORT=9001 LOCAL_MODE=true \
HONEYPOT_SSH_PORT=2222 SIDECAR_ADDR=localhost:50051 \
go run ./peer-node
```
## Configuration Reference
| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `COORDINATOR_URL` | `http://localhost:8090` | Coordinator REST endpoint |
| `COORDINATOR_PORT` | `8090` | Coordinator listen port |
| `P2P_PORT` | `9000` | Node's P2P gRPC listen port |
| `NODE_HOSTNAME` | `localhost` | Hostname advertised to peers |
| `NODE_ID` | — | Friendly name (for logging + dashboard) |
| `SIDECAR_ADDR` | `localhost:50052` | Python AI sidecar gRPC address |
| `SIDECAR_PORT` | `50052` | AI sidecar listen port |
| `LOCAL_MODE` | `false` | Skip mTLS (for local dev without certs) |
| `CERT_FILE` | `certs/peer.pem` | mTLS certificate |
| `KEY_FILE` | `certs/peer.key` | mTLS private key |
| `CA_FILE` | `certs/ca.pem` | Certificate Authority cert |
| `HONEYPOT_SSH_PORT` | `2222` | Mock SSH honeypot port |
| `HONEYPOT_HTTP_PORT` | `8888` | Mock HTTP honeypot port |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama API endpoint |
| `OLLAMA_MODEL` | `dolphin-llama3` | LLM model for threat analysis |
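
As an illustration of how the sidecar-side variables and defaults in this table could be read, here is a short sketch. The `SidecarConfig` class and `load_sidecar_config` function are hypothetical names, not taken from `app.py`; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SidecarConfig:
    sidecar_port: int
    ollama_url: str
    ollama_model: str


def load_sidecar_config(env=None) -> SidecarConfig:
    """Read sidecar settings from the environment, using the table's defaults."""
    env = os.environ if env is None else env
    return SidecarConfig(
        sidecar_port=int(env.get("SIDECAR_PORT", "50052")),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "dolphin-llama3"),
    )


# Passing a plain dict instead of os.environ keeps the example reproducible.
print(load_sidecar_config({"SIDECAR_PORT": "50051"}))
```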
## API Reference
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Web dashboard |
| `/healthz` | GET | Health check (`ok`) |
| `/register` | POST | Node self-registration |
| `/peers` | GET | All active peers |
| `/api/telemetry` | POST | Node status push (status, CPU, RAM, peer list) |
| `/api/topology` | GET | Current graph state for the dashboard |
| `/api/report` | GET | AI/rule-based event log (JSON) |
| `/api/report/submit` | POST | Nodes push threat decisions here |
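
For readers who want to exercise the API from a script, the sketch below builds a POST request for `/api/report/submit`. The JSON field names (`node_id`, `score`, `method`, `decision`) are assumptions based on the reporting step described earlier, not a documented schema, and the real peer nodes send this request from Go.

```python
import json
import urllib.request


def build_report_request(
    base_url: str, node_id: str, score: int, method: str, decision: str
) -> urllib.request.Request:
    """Build (but do not send) a threat-decision report for the coordinator."""
    # Field names are hypothetical; check coordinator/main.go for the real schema.
    payload = {"node_id": node_id, "score": score, "method": method, "decision": decision}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/report/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_report_request("http://localhost:8090", "node-a", 9, "rule-based", "quarantine")
print(req.get_method(), req.full_url)
# prints: POST http://localhost:8090/api/report/submit
```

With a coordinator running, `urllib.request.urlopen(req)` would send the report.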
## Design Decisions & Trade-offs
### PKI Architecture
This project uses **per-node PKI (F-01)**. Each node receives a unique certificate with its own SAN (Subject Alternative Name). This prevents a compromised node from spoofing another node at the TLS layer.
### Why Ed25519 over RSA/ECDSA?
Ed25519 public keys are 32 bytes (versus 256 bytes for an RSA-2048 modulus), signing is ~3× faster than ECDSA P-256, and the algorithm is designed for constant-time implementation, which makes it resistant to timing attacks.
### Why majority consensus instead of unanimous vote?
A quarantined node that has already lost connectivity can't vote. Requiring unanimity would deadlock. Strict majority (> 50%) ensures liveness even when some peers are unreachable.
### Why a rule-based fallback?
LLMs can be unavailable (cold start, network issue). The design treats `HONEYPOT_TRIPPED` as having no false positives, since there is no legitimate reason to connect to a honeypot port. The fallback ensures the mesh always responds, even without AI.
## Security Enhancements Implemented
- ~~Replace the shared `peer.pem` with per-node certificates (unique SANs per container).~~ **(F-01: Per-Node PKI implemented)**
- ~~Add certificate rotation via a proper PKI (e.g., Vault, cert-manager).~~
- ~~Use signed/authenticated coordinator responses to prevent MITM peer injection.~~ **(F-03: Signed Coordinator Responses + TOFU pinning implemented)**
- ~~Add reputation scoring to weight votes from long-established peers higher.~~ **(GAP-08 / F-05: Reputation-weighted consensus implemented)**
- ~~Implement vote replay prevention (nonce or monotonic timestamp counter).~~ **(GAP-01 / GAP-12: Timestamp expiry and rotating hash bucket implemented)**
- ~~Persist Ed25519 node identity across reboots.~~ **(GAP-06 / F-04: Persistent Node Identity implemented)**
- **(F-10: Distributed Audit Log)**: Every gossip event is now signed and logged to an append-only `audit.log` on every node for forensics.
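
One plausible reading of the replay prevention listed above (timestamp expiry plus a rotating hash bucket) is sketched below. The `ReplayGuard` class, the 30-second window, and the two-bucket rotation are assumptions for illustration; the project's Go code may differ. The sketch also assumes node clocks are synchronized, so votes stamped in the future are rejected.

```python
import hashlib
import time


class ReplayGuard:
    """Rejects votes that are too old or whose hash was seen recently."""

    def __init__(self, max_age: float = 30.0, clock=time.time) -> None:
        self.max_age = max_age  # hypothetical validity window in seconds
        self.clock = clock
        self.current: set[str] = set()   # hashes seen in the current window
        self.previous: set[str] = set()  # hashes seen in the window before it
        self.rotated_at = clock()

    def accept(self, payload: bytes, timestamp: float) -> bool:
        now = self.clock()
        age = now - timestamp
        if age < 0 or age > self.max_age:
            return False  # stamped in the future, or expired
        if now - self.rotated_at >= self.max_age:
            # Rotate: the oldest bucket only holds votes that have expired by now.
            self.previous, self.current = self.current, set()
            self.rotated_at = now
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self.current or digest in self.previous:
            return False  # replay of a vote already processed
        self.current.add(digest)
        return True
```

Because a hash survives at least one full window after it is recorded, it is only forgotten once the timestamp check alone would reject the vote, which keeps memory bounded without reopening a replay gap.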