gammahazard/vision-labs-v2

GitHub: gammahazard/vision-labs-v2

Stars: 1 | Forks: 0

# Vision Labs — AI-Powered Security Camera System [![Tests](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/0f632f1549162830.svg)](https://github.com/gammahazard/vision-labs-v2/actions/workflows/tests.yml) [![Publish images](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/55e657d6a2162832.svg)](https://github.com/gammahazard/vision-labs-v2/actions/workflows/publish-images.yml) ![License](https://img.shields.io/badge/license-MIT-blue) ![Python](https://img.shields.io/badge/python-3.11-3776AB?logo=python&logoColor=white) ![FastAPI](https://img.shields.io/badge/FastAPI-0.136-009688?logo=fastapi&logoColor=white) ![Redis](https://img.shields.io/badge/Redis-7-DC382D?logo=redis&logoColor=white) ![Docker](https://img.shields.io/badge/Docker-Compose-2496ED?logo=docker&logoColor=white) ![CUDA](https://img.shields.io/badge/CUDA-12.8-76B900?logo=nvidia&logoColor=white) ![Ollama](https://img.shields.io/badge/Ollama-Qwen%203%2014B-000?logo=ollama&logoColor=white) ![YOLO](https://img.shields.io/badge/YOLO-v8-00FFFF) ![InsightFace](https://img.shields.io/badge/InsightFace-buffalo__l-FF6F00) A self-hosted, multi-camera AI security platform that processes live RTSP feeds through person detection, face recognition, vehicle tracking, and an LLM-powered chat assistant — all running locally via Docker Compose with zero cloud dependencies. Built and tested on a dual-GPU workstation (RTX 5070 Ti + RTX 3090) running Ubuntu 24.04 inside WSL2 on Windows. Single-GPU works fine too — defaults are tuned for an 8–12 GB card; tiers are available for smaller and larger rigs. ## Screenshots **Multi-camera home grid** — live feeds, recent activity across every camera, enrolled faces, conditions panel. ![Dashboard home](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/f464ef1fd6162850.png) **Per-camera detail view** — full controls, drawable zones, action labels, sticky face identities. ![Per-camera detail](https://raw.githubusercontent.com/gammahazard/vision-labs-v2/main/docs/images/detailed-view-cam1.png) | Cameras admin | Face enrollment | DVR playback | |:-:|:-:|:-:| | ![Cameras](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/365bc0498b162858.png) | ![Enrol](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/fae0fae794162902.png) | ![DVR](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/4e67bce3ce162908.png) | | Add/edit/pause cameras, ONVIF scan, per-detector toggles | 5-angle multi-pose enrollment wizard | Date + camera filter, click any segment to play | | AI assistant | Telegram alerts | Bot commands | |:-:|:-:|:-:| | ![AI chat](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/905c52023d162911.png) | ![Telegram alerts](https://raw.githubusercontent.com/gammahazard/vision-labs-v2/main/docs/images/telegram-1.png) | ![Telegram bot UI](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/be5fa6c791163000.png) | | Qwen 3 14B + 19 tools, with DVR deep-links | Real-time push with snapshot from each camera | `/snapshot`, `/clip`, `/ask`, `/who`, `/events`, ... | | Grafana + Prometheus | Container monitoring | |:-:|:-:| | ![Grafana](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/a103c3c7b4163007.png) | ![Containers](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/31147d2a2e163011.png) | | Live GPU, Redis, inference-time metrics, opened from the System Monitor tab | Every service's state + uptime in one view, deep-link to Portainer | ### Setup walkthrough

Setup wizard walkthrough

The first-run wizard handles hardware detection, GPU tier recommendation, ONVIF camera discovery, location + retention, and Telegram pairing — about 90 seconds end-to-end. ### Live metrics

Grafana System Monitor with live inference, VRAM, and stream metrics

Grafana — bound to loopback and opened from the dashboard's System Monitor tab — pulls from a Prometheus scrape of every service in the stack: YOLO inference latency per camera, Redis stream lengths, VRAM usage, GPU utilization + temperature + power, and Redis op rate. Captured here on real cam1/cam2 traffic. ### Optional: Locate tool Open-vocabulary visual grounding — drop any image into the AI tab, describe what to find in plain English, and get it back with boxes drawn (downloadable). Powered by `nvidia/LocateAnything-3B`. | Drop an image + a prompt | Get the boxes back | |:-:|:-:| | ![Locate prompt](https://static.pigsec.cn/wp-content/uploads/repos/2026/05/1474bcbdd1163025.png) | ![Locate result](https://raw.githubusercontent.com/gammahazard/vision-labs-v2/main/docs/images/locate-result.png) | **Opt-in + non-commercial.** This is *not* part of the default stack — the model is under NVIDIA's research/non-commercial license. The published `locate-anything` image contains **only our code + public deps** (the model weights are **not** bundled — they download from NVIDIA's HF Hub at runtime, where you accept NVIDIA's license), so anyone can enable it. Pull it (or build from source) and bring it up with the overlay: docker compose -f docker-compose.yml -f docker-compose.locate.yml pull locate-anything docker compose -f docker-compose.yml -f docker-compose.locate.yml up -d locate-anything dashboard ## What you get - **Person + face + vehicle detection** on every camera in real time (YOLOv8s-pose, InsightFace, YOLOv8s) - **Vehicle attribute crops + classification** — per-track HD reservoir of crops, browseable per-drive-by. Optional ConvNeXt-Tiny classifier (color + body + make + model) fills the attributes block at flush time; gated by `ENABLE_CLASSIFIER`, auto-enabled when any cam toggles `detect_vehicle_attributes` on. Weights lazy-download from HF Hub on first vehicle event. - **Per-operator label + retrain workflow** — label tracks via the Browse crops modal (color/body dropdowns + make/model autocomplete), prune bad crops or delete unsalvageable tracks, then fine-tune the color + body heads on your own labels via `scripts/vehicle_attributes/retrain_attributes.py`. The script holds out 20% val and refuses to deploy on regression. - **AI scene descriptions** on every Telegram alert (MiniCPM-V vision LLM) - **19-tool AI assistant** (Qwen 3 14B) — query events, capture live snapshots (with auto vision-model description), set reminders, find DVR segments — all multi-camera aware - **DVR recording** — 1-hour MPEG-TS segments, browseable through the dashboard with date + camera filters - **Drawable zones** with per-time-of-day alert rules (always / night-only / log / ignore / dead zone) - **Up to 20 cameras** out of the box (symmetric `cam1`–`cam20` slots, orchestrator-managed). The real cap is GPU VRAM, not the slot count — a 16 GB card with AI chat off comfortably handles 7+ cameras at 'n' detector models; a 24 GB 3090 with chat off pushes 12+. The wizard estimates a number for your hardware. - **Prometheus + Grafana monitoring** — linked from the dashboard's System Monitor tab (Grafana is bound to `127.0.0.1` for security; reach it on the host or via an SSH tunnel) - **Optional open-vocabulary "Locate" tool** — drop an image in the AI tab, type what to find ("vehicle", "license plate"), get it back with boxes drawn. Powered by `nvidia/LocateAnything-3B`. **Opt-in** via `docker-compose.locate.yml`; the model is **non-commercial** (NVIDIA License) and downloads at runtime — see [below](#optional-locate-tool) ## Architecture at a glance flowchart LR Cam["📷 RTSP cameras
cam1 – cam20"] --> Ing["camera-ingester
(one per slot)"] Ing -->|frames stream| R[(Redis Streams
bus)] R --> Pose["pose-detector
YOLOv8s-pose"] R --> Veh["vehicle-detector
YOLOv8s"] R --> Face["face-recognizer
InsightFace buffalo_l"] R --> Rec["recorder
1h MPEG-TS"] Pose & Veh & Face -->|detections| R R --> Track[tracker] Track -->|events| R Track -.vehicle_sample.-> VA["vehicle-attributes
per-track HD crops + classifier"] VA --> R R --> Dash["dashboard
FastAPI + WebSocket"] Dash <-->|chat + 19 tools| Oll["Ollama
Qwen 3 14B + MiniCPM-V"] Dash --> Web["🌐 Browser"] Dash --> Bot["📲 Telegram bot"] Dash -.cameras:events pub/sub.-> Orch["orchestrator
(holds Docker socket)"] Orch -.docker compose --profile camN up/down.-> Ing The dashboard never touches the Docker socket — only the orchestrator does. Adding a camera in the UI is just an upsert into `cameras:registry` + a pub/sub trigger; the orchestrator reconciles and brings the slot's services up via `docker compose --profile camN up -d`. Bounding the Docker permission to a single service is the most important security choice in this design. For the full data flow and per-service responsibilities, see **[DETAILED_README.md §3](DETAILED_README.md#3-service-map)** and **[CONTEXT.md](CONTEXT.md)**. ## Quick install ### Linux (Ubuntu 22.04 / 24.04, Debian 12) git clone https://github.com/gammahazard/vision-labs-v2 vision-labs && cd vision-labs bash scripts/install-linux.sh # pulls pre-built images from GHCR (~3-5 min) # OR, if you've forked + modified the code: bash scripts/install-linux.sh --build # builds locally (~10-15 min) The script installs Docker + nvidia-container-toolkit, pulls (or builds) all images, starts the stack, and tells you when the dashboard is up. Idempotent — safe to re-run. Pin a specific release with `IMAGE_TAG=v0.1.1 bash scripts/install-linux.sh`. ### Windows 11 (with NVIDIA GPU) # In an ELEVATED PowerShell (right-click -> Run as administrator): powershell.exe -ExecutionPolicy Bypass -File .\scripts\install-windows.ps1 The script installs WSL2, writes a `.wslconfig` with mirrored networking, then prompts a reboot. After reboot, open the new Ubuntu terminal and run `bash scripts/install-linux.sh` from your cloned repo to finish. ### macOS Not supported — the inference pipeline is CUDA-bound. Run Vision Labs on a Linux box or Windows + NVIDIA, then access the dashboard from your Mac via LAN. ### After install Open `http://localhost:8080`, log in with `admin/admin` (forced password rotation on first login), then walk through the setup wizard: GPU probe → recommended hardware tier → add your first camera (ONVIF scan or manual RTSP URL) → done. ## Requirements - **NVIDIA GPU** with driver supporting CUDA 12.8 (R555+). 6 GB VRAM minimum (small tier); 12+ GB recommended for AI chat. Apple Silicon / Intel iGPU / AMD GPU are not supported. - **Docker Engine** (not Docker Desktop). On Windows, runs inside WSL2 Ubuntu — the installer sets this up for you. - **RTSP-capable IP camera** (tested with Reolink RLC-1240A). ONVIF auto-discovery works with Reolink, Hikvision, Dahua, Amcrest, Axis, Unifi G-series. DIY RTSP setups (Pi + mediamtx, go2rtc, OBS) work via manual URL entry. - **QNAP NAS** is optional — only needed if you want to offload DVR recordings to network storage. ## URLs after install | Service | URL | Notes | |---|---|---| | Dashboard | http://localhost:8080 | Main UI, LAN-reachable. `admin/admin` on first run. | | Portainer | https://localhost:9443 | Docker management UI — **host-only** (bound to `127.0.0.1`). From another machine: `ssh -L 9443:localhost:9443 `. | | Grafana | http://localhost:3000 | System metrics — **host-only** (bound to `127.0.0.1`); linked from the dashboard's System Monitor tab. Tunnel: `ssh -L 3000:localhost:3000 `. | | Prometheus | http://localhost:9090 | Raw metrics — host-only (bound to `127.0.0.1`). | ## Learn more - **[DETAILED_README.md](DETAILED_README.md)** — feature deep-dive, manual setup, env vars, dashboard pages, Telegram commands, backup/restore, hardware tiers - **[CONTEXT.md](CONTEXT.md)** — full project context for developers: service-by-service responsibilities, Redis schema, orchestrator behavior, gotchas - **[ARCHITECTURE.md](ARCHITECTURE.md)** — architectural reasoning: why services are split this way, Redis bus design, GPU placement - **[docs/history/](docs/history/)** — historical planning docs (MANUAL_SETUP, PHASES, REFACTOR_PLAN, PACKAGING_PLAN) ## Scope + status This is a **personal / portfolio project**, not a productized product. It runs in the author's home on a dual-GPU workstation and is documented to the level of "another developer could stand it up." A few things to set expectations: - **LAN-only by design.** The dashboard is meant to be reached at `http://:8080` from devices on the same network. There is no built-in TLS terminator; if you want to expose it to the internet, put it behind a reverse proxy you trust (Caddy, nginx-proxy-manager, Cloudflare Tunnel) and set `DASHBOARD_BEHIND_TLS=true` so session cookies flip to `Secure`. - **Single-user authentication.** One admin account, bcrypt-hashed password, HMAC-signed session cookies, brute-force-rate-limited login. No team/role model. Good enough for self-hosted home use; not designed for multi-tenant. - **No support, no warranty.** MIT licensed (see [LICENSE](LICENSE)) — feel free to fork, learn from, or repurpose. Issues filed will be read but not necessarily fixed. ### What's optional The system is overbuilt on purpose so the dual-GPU host has work to do. Several pieces can be removed cleanly: - **AI chat (Qwen 3 14B + 19 tools)** — set `CHAT_MODEL=` (empty) in `.env`. Frees ~10 GB VRAM. The dashboard shows "AI chat disabled on this tier" and the rest of the system works unchanged. - **MiniCPM-V vision scene descriptions** — set `VISION_MODEL=`. Saves ~5 GB VRAM. Telegram alerts still fire; they just don't include the auto-generated "person in a black hoodie walking left" sentence. - **Telegram alerts** — leave `TELEGRAM_BOT_TOKEN` blank. The notification path gates on `is_configured()` and silently no-ops. - **QNAP NAS** — opt-in via `QNAP_ENABLED=true` + the overlay compose file. Default install keeps recordings local. ### Known limitations - macOS is not supported (CUDA-only inference). - The orchestrator service has access to the Docker socket — it's the only one. The dashboard does not. - The setup wizard's GPU probe is best-effort; on very new cards (Blackwell) verify CUDA 12.8+ before relying on auto-tier selection. ## License [MIT](LICENSE) — see the LICENSE file for full text.