GitHub: neuregex/nodesafe
# nodesafe
`nodesafe` scans third-party plugins/nodes before you install them in node-based workflow tools, detecting malicious code with a cascading pipeline that combines static analysis, signature matching, machine learning, and optional semantic analysis with an LLM. Starting point: the ComfyUI ecosystem.
## Why this exists
In **June 2024**, ComfyUI_LLMVISION stole browser credentials and crypto wallets from hundreds of users. In **April 2026**, a botnet compromised 1,000+ ComfyUI instances by auto-installing malicious nodes via the Manager. The custom_nodes ecosystem is large, fast-moving, and largely unverified.
`nodesafe` scans before you install.
## Quick start
```bash
pip install nodesafe
nodesafe scan /path/to/custom_node
```
Or run it without installing:
```bash
uvx nodesafe scan /path/to/custom_node
```
## How it works
A 9-layer cascading pipeline in which each layer is more expensive than the one before it. Most clean nodes pass in under 100 ms; only ambiguous cases escalate to the costlier layers.
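The escalation rule can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not nodesafe's implementation: the `Layer` type, the `always_run` flag, and the `CLEAN_BELOW` / `MALICIOUS_ABOVE` thresholds are hypothetical, and each layer is assumed to return a risk score in [0, 1]. Cheap layers always run; expensive layers run only while the running score sits in the ambiguous band, and the cascade stops as soon as a result is decisive.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical thresholds on a [0, 1] risk score.
CLEAN_BELOW = 0.2
MALICIOUS_ABOVE = 0.8


@dataclass
class Layer:
    name: str
    check: Callable[[str], float]  # returns a risk score in [0, 1]
    always_run: bool = True        # False = only run when the verdict is ambiguous


def run_cascade(source: str, layers: List[Layer]) -> Tuple[str, float, List[str]]:
    """Run layers cheapest-first, escalating only while the verdict is unclear."""
    score = 0.0
    ran: List[str] = []
    for layer in layers:
        if not layer.always_run and score < CLEAN_BELOW:
            continue  # nothing suspicious so far: skip the expensive layer
        ran.append(layer.name)
        score = max(score, layer.check(source))  # keep the strongest signal
        if score >= MALICIOUS_ABOVE:
            break  # decisive: no need to pay for further layers
    if score >= MALICIOUS_ABOVE:
        verdict = "malicious"
    elif score >= CLEAN_BELOW:
        verdict = "suspicious"
    else:
        verdict = "clean"
    return verdict, score, ran


# Toy layers standing in for a cheap pattern match and an expensive review.
toy_layers = [
    Layer("pattern-match", lambda src: 0.5 if "exec(" in src else 0.0),
    Layer("expensive-review", lambda src: 0.9 if "b64decode" in src else 0.1,
          always_run=False),
]

print(run_cascade("print('hi')", toy_layers))
# ('clean', 0.0, ['pattern-match'])
print(run_cascade("exec(base64.b64decode(blob))", toy_layers))
# ('malicious', 0.9, ['pattern-match', 'expensive-review'])
```

Taking the maximum keeps one strong signal from being diluted by layers that found nothing; a real aggregator (such as Layer 5's hand-calibrated score) may combine signals differently.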
| Layer | Technique | Cost |
|-------|-----------|------|
| 0 | Hash matching against malware database | μs |
| 1 | Bloom filter of malicious URLs | μs |
| 2 | Aho-Corasick over dangerous patterns | ms |
| 3 | AST analysis + obfuscation detectors (chr-chain, split-concat, Shannon entropy, suspicious identifiers, Unicode homoglyph, nested decoder chains, file-level minification) | ms |
| 4 | Typosquatting + OSV vulnerability check | ms |
| 5 | Aggregate heuristic risk score (hand-calibrated; ML model pending dataset) | tens of ms |
| 6 | Anomaly detection (Isolation Forest + Autoencoder) | tens of ms |
| 7 | Semantic similarity (CodeBERT embeddings + FAISS) | hundreds of ms |
| 8 | LLM review (optional, local-first via Ollama) | seconds |
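Layer 1 can answer "is this URL on the blocklist?" in microseconds because a Bloom filter costs a fixed number of hash operations and no I/O. The sketch below is a generic textbook Bloom filter, not nodesafe's data structure: the sizing formulas are the standard ones, while the SHA-256 double-hashing scheme and the example URL are assumptions made for illustration. A miss is definitive; a hit can be a false positive and should be confirmed against the full signature list.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Textbook Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        self.m = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k indices from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


blocklist = BloomFilter(expected_items=1000)
blocklist.add("http://malicious.example/payload")  # hypothetical blocklisted URL
print("http://malicious.example/payload" in blocklist)  # True
```

URL normalization (scheme, host case, trailing slashes) before insertion and lookup is assumed but not shown.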
**Current state (v0.5.1):** Layers 0-5 are functional and shipping on PyPI. 87 tests pass across Python 3.10–3.12 × Linux/macOS/Windows. Layer 3 includes 7 obfuscation detectors that catch char-code keyword construction, split-concat, high-entropy literals, suspicious identifier shapes, Unicode homoglyph attacks, nested decoder chains, and minified files. The CLI supports `--batch` for scanning a parent directory containing many nodes at once. Layers 6-8 are on the M3-M4 roadmap.
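Two of the Layer 3 ideas, high-entropy literals and char-code keyword construction, can be sketched with the standard `ast` module, which parses source without executing it. This is an illustration, not nodesafe's detector code: `find_obfuscation`, its thresholds (4.5 bits per character, 40-character minimum, 3 `chr()` terms), and the message format are all hypothetical.

```python
import ast
import math
from collections import Counter
from typing import List


def shannon_entropy(text: str) -> float:
    """Bits per character: H = -sum(p * log2(p)) over character frequencies."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def _chr_terms(node: ast.AST) -> int:
    """Count chr(...) operands in a `+` chain such as chr(101) + chr(118) + ..."""
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "chr"):
        return 1
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return _chr_terms(node.left) + _chr_terms(node.right)
    return 0


def find_obfuscation(source: str, entropy_threshold: float = 4.5,
                     min_literal_len: int = 40, min_chr_terms: int = 3) -> List[str]:
    findings: List[str] = []
    covered = set()  # ids of BinOp nodes inside an already-reported chr() chain
    tree = ast.parse(source)  # parse only: the scanned code is never executed
    for node in ast.walk(tree):  # breadth-first, so outer chains are seen first
        if id(node) in covered:
            continue
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if (len(node.value) >= min_literal_len
                    and shannon_entropy(node.value) >= entropy_threshold):
                findings.append(f"line {node.lineno}: high-entropy string literal")
        elif isinstance(node, ast.BinOp):
            terms = _chr_terms(node)
            if terms >= min_chr_terms:
                findings.append(f"line {node.lineno}: chr() chain of {terms} terms")
                covered.update(id(c) for c in ast.walk(node) if isinstance(c, ast.BinOp))
    return findings


sample = '''
payload = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
name = chr(101) + chr(118) + chr(97) + chr(108)
greeting = "hello"
'''
for finding in find_obfuscation(sample):
    print(finding)
# line 2: high-entropy string literal
# line 3: chr() chain of 4 terms
```

The `chr()` chain in the sample spells `eval` without the keyword ever appearing in the text, which is why plain pattern matching (Layer 2) misses it. Random base64 approaches 6 bits per character while ordinary prose sits noticeably lower, so a length floor plus an entropy threshold is a common heuristic; the exact numbers here are illustrative.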
## Features
- ✓ **Pure static analysis** — never executes scanned code
- ✓ **Zero telemetry by default** — this policy is immutable
- ✓ **Works offline** (after the first signature update)
- ✓ **Multiple output formats**: JSON, Markdown (SARIF coming in v0.6 for GitHub Code Scanning integration)
- ✓ **GitHub Action ready** — see the example workflow
- ✓ **Pre-commit hook ready** — for CI/CD of custom_nodes repositories
- ✓ **Local-first LLM analysis** — Ollama by default, cloud opt-in with BYO key
- ✓ **OSS Apache 2.0** — no freemium, no hidden SaaS, no paid whitelisting
## Usage
### Scan a directory
```bash
nodesafe scan /path/to/custom_node
```
### JSON output
```bash
nodesafe scan /path/to/custom_node --format json
```
### Only cheap layers (fast, no aggregate score)
```bash
nodesafe scan /path/to/custom_node --layers 0,1,2,3
```
### Batch mode (scan a whole `custom_nodes/` folder at once)
```bash
nodesafe scan ComfyUI/custom_nodes --batch
```
Emits a per-node verdict plus an aggregate "worst verdict" line. Use
`--format json` to get an array of per-node summaries for tooling.
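Aggregating a batch run comes down to ordering verdicts by severity and taking the maximum. The sketch assumes three verdicts (`clean`, `suspicious`, `malicious`, the terms this README uses) and a hypothetical JSON shape; nodesafe's actual `--format json` schema may differ. The `fail_on` comparison mirrors the `fail_on` key in the configuration file, and the exit-code convention is an assumption.

```python
import json
from enum import IntEnum
from typing import Dict


class Verdict(IntEnum):
    # Ordered by severity so that max() returns the worst verdict.
    CLEAN = 0
    SUSPICIOUS = 1
    MALICIOUS = 2


def worst_verdict(per_node: Dict[str, Verdict]) -> Verdict:
    # An empty batch is treated as clean.
    return max(per_node.values(), default=Verdict.CLEAN)


def to_json(per_node: Dict[str, Verdict]) -> str:
    # Hypothetical per-node summary array, sorted by node name for stable output.
    return json.dumps(
        [{"node": name, "verdict": v.name.lower()} for name, v in sorted(per_node.items())]
    )


def exit_code(per_node: Dict[str, Verdict], fail_on: str = "suspicious") -> int:
    # Non-zero when the worst verdict meets or exceeds the configured threshold.
    return 1 if worst_verdict(per_node) >= Verdict[fail_on.upper()] else 0


results = {"ComfyUI-Foo": Verdict.CLEAN, "ComfyUI-Bar": Verdict.SUSPICIOUS}  # hypothetical nodes
print(worst_verdict(results).name.lower())  # suspicious
print(to_json(results))
print(exit_code(results), exit_code(results, fail_on="malicious"))  # 1 0
```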
### Update signatures
```bash
nodesafe update
```
### Verify installation
```bash
nodesafe doctor
```
## Retrospective analysis
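Would nodesafe have detected the historical incidents? The table below walks each case through the pipeline on paper; the layers, timings, and scores are estimates, not results from running the scanner on the original samples: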
| Incident | Detection layer | Time | Verdict |
|----------|-----------------|------|---------|
| LLMVISION (Jun 2024) | Layer 2-3 | ~30-50ms | malicious 0.98 |
| Pickai (Mar-Jun 2025) | Layer 2-3 + 5-7 | ~100ms | malicious 0.92 |
| Mining botnet (Apr 2026) | Layer 2-3 + Manager gate | <50ms | malicious 0.95 |
Full analysis in [`docs/retrospective-analysis.md`](docs/retrospective-analysis.md).
## Honest limitations
`nodesafe` is **static analysis**, not a sandbox. Its limits:
- **It does not prevent upstream supply chain attacks** (a legitimate provider being compromised). It detects the malware when it is distributed in nodes, not the original compromise.
- **It is not a replacement for the Manager**: it complements the Manager and would ideally be integrated into it.
- **It does not monitor runtime behavior** — that is the job of an IDS/EDR.
- **False positives happen** — the policy is conservative, but every flag shows exactly what triggered the alert so you can decide.
## Configuration
`~/.config/nodesafe/config.toml` (optional — sane defaults):
```toml
[scanner]
default_layers = "0,1,2,3,4,5,6" # Layer 8 NOT included by default
fail_on = "suspicious"

[llm]
enabled = false # OFF by default. Conscious opt-in.
provider = "local" # local-first if enabled

[llm.local]
endpoint = "http://localhost:11434" # Ollama
model = "qwen2.5-coder:7b-instruct"

[telemetry]
enabled = false # ALWAYS false. Immutable policy.
```
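One way to honour these defaults, including the rule that telemetry can never be switched on, is to merge the user file over built-in defaults and then force the telemetry flag off. This is a sketch under assumptions: `load_config`, `load_config_file`, and `parse_layers` are hypothetical helpers, not nodesafe's API, and the merge is shallow (one level per section). It uses the standard-library `tomllib` (Python 3.11+) and falls back to the `tomli` package on 3.10. `parse_layers` also shows how a comma-separated layer list, as used by `default_layers` and `--layers`, could be validated.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib

from pathlib import Path
from typing import Any, Dict, List

DEFAULTS: Dict[str, Dict[str, Any]] = {
    "scanner": {"default_layers": "0,1,2,3,4,5,6", "fail_on": "suspicious"},
    "llm": {"enabled": False, "provider": "local"},
    "telemetry": {"enabled": False},
}


def load_config(text: str) -> Dict[str, Any]:
    """Merge a TOML document over the defaults, one level deep per section."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in tomllib.loads(text).items():
        if isinstance(values, dict):
            config.setdefault(section, {}).update(values)
    config["telemetry"]["enabled"] = False  # immutable policy: user files cannot enable it
    return config


def load_config_file(path: str = "~/.config/nodesafe/config.toml") -> Dict[str, Any]:
    """The file is optional: fall back to the defaults when it does not exist."""
    file = Path(path).expanduser()
    return load_config(file.read_text(encoding="utf-8") if file.exists() else "")


def parse_layers(spec: str) -> List[int]:
    """Turn "0,1,2,3" into [0, 1, 2, 3], rejecting layers outside 0-8."""
    layers = sorted({int(part) for part in spec.split(",") if part.strip()})
    bad = [n for n in layers if not 0 <= n <= 8]
    if bad:
        raise ValueError(f"unknown layer(s): {bad}")
    return layers


cfg = load_config('[telemetry]\nenabled = true\n\n[scanner]\nfail_on = "malicious"\n')
print(cfg["telemetry"]["enabled"], cfg["scanner"]["fail_on"])  # False malicious
print(parse_layers(cfg["scanner"]["default_layers"]))  # [0, 1, 2, 3, 4, 5, 6]
```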
## Roadmap
- **v0.5.x (shipped):** Layers 0-5 with obfuscation detectors + batch mode. Available now via `pip install nodesafe`.
- **v0.6 (next):** runtime-installation detector (catches nodes that pip-install or git-clone code at runtime, the April 2026 botnet vector) + SARIF output for GitHub Code Scanning.
- **v0.7 (M3):** Layer 6 anomaly detection (Isolation Forest + autoencoder over the feature extractor) once enough labeled samples have been collected to seed a baseline.
- **v0.9 (M3):** Layer 7 semantic similarity (CodeBERT embeddings + FAISS) for polymorphic variant matching.
- **v1.0:** Layer 8 LLM contextual review (local-first via Ollama, cloud opt-in) + first PR to ComfyUI-Manager so scans run before any install by default.
- **v1.5:** public threat report + consolidated community signature contributions.
- **v2+ (Year 2):** `.nodesafe` standard portable to other node-based ecosystems (LangFlow, n8n, Flowise).
Full plan in [`ARCHITECTURE.md`](ARCHITECTURE.md).
## Acknowledgments
Inspired by HuggingFace's `safetensors` push, [Snyk Labs' research](https://labs.snyk.io/resources/hacking-comfyui-through-custom-nodes/) on ComfyUI attack vectors, and the unfortunate work of [u/_roblaughter_](https://www.reddit.com/r/StableDiffusion/) who discovered LLMVISION at his own cost.
## License
Apache 2.0. See [LICENSE](LICENSE).
## Long-term vision
ComfyUI is the most urgent case, not the only one. The full category of node-based tools with executable plugins (LangFlow, Flowise, Node-RED, n8n, etc.) shares the same structural problem. In the long term, `.nodesafe` aspires to become a **portable manifest artifact** that any ecosystem can adopt — analogous to how `.safetensors` became the standard for ML model weights.
Versions 2 and 3 of the project will formalize the standard and work with maintainers of other ecosystems. For now, the focus is relentlessly on ComfyUI.