# Whitespots AI SAST Docker Scanner
Lightweight CLI-first AI SAST scanner that runs from one Docker container and emits structured JSON.
This is intentionally not a platform: no database, queue, dashboard, workers, or cloud services.
For those needs, Whitespots offers a separate self-hosted platform: https://docs.whitespots.io
## Build
Build the CI image from the public GitHub repository:
docker build -t whitespots-ai-sast .
By default, the CI Dockerfile clones `https://github.com/whitespots/Whitespots-AI-SAST.git` at `main`. To build a different branch or tag, pass it as the `SAST_REF` build argument (shown here with the default, `main`):
docker build \
--build-arg SAST_REF=main \
-t whitespots-ai-sast .
Build from the current local checkout with the development Dockerfile:
docker build -f Dockerfile.dev -t whitespots-ai-sast:dev .
Use `Dockerfile.dev` while testing unpushed local changes. The default `Dockerfile` always clones GitHub, so it only includes changes already available in the selected remote branch or tag.
On first run, the container pulls the requested Ollama model if it is not already present. For fully offline local-model scanning, preload an Ollama model into the image at build time:
docker build \
--build-arg PRELOAD_MODEL=true \
--build-arg OLLAMA_MODEL=qwen2.5-coder:0.5b \
-t whitespots-ai-sast .
Avoid `PRELOAD_MODEL=true` on small CI runners; model layers are large. Prefer mounting `/models` at runtime or letting the first scan pull the model.
Pin the Ollama runtime image if CI needs reproducible runtime builds:
docker build \
--build-arg OLLAMA_IMAGE=ollama/ollama:0.11.4 \
-t whitespots-ai-sast .
You can also mount an existing Ollama model store at `/models` instead of baking the model into the image.
Large first-run pulls can take several minutes. Override the pull timeout if needed (the value is in milliseconds; 3600000 is one hour):
docker run --rm \
-e AI_SAST_PULL_TIMEOUT_MS=3600000 \
-v "$(pwd):/scan" \
whitespots-ai-sast scan /scan --local --local-model qwen2.5-coder:0.5b
## Run Local Offline Scan
The Docker image exposes `/usr/bin/whitespots-ai-sast`, so the tool runs as:
whitespots-ai-sast
Scan the current directory with a local model:
docker run --rm \
-v "$(pwd):/scan" \
whitespots-ai-sast scan /scan --local --local-model qwen2.5-coder:0.5b
Scan a subdirectory:
docker run --rm \
-v "$(pwd):/scan" \
whitespots-ai-sast scan /scan/project --local
Write JSON to a file:
docker run --rm \
-v "$(pwd):/scan" \
whitespots-ai-sast scan /scan --local -o /scan/sast-results.json
Use `--scan-mode` to control how much of a large repository is analyzed:
whitespots-ai-sast scan ./repo --scan-mode fast
whitespots-ai-sast scan ./repo --scan-mode balanced
whitespots-ai-sast scan ./repo --scan-mode deep
`deep` is the default and scans every supported source file. `balanced` scans the 50 most interesting files. `fast` scans only the 10 most interesting files.
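In CI it can be handy to choose the mode from the size of the checkout. The helper below is a minimal sketch, not part of the tool: `pick_scan_mode` and its thresholds (500 and 5000 files) are hypothetical and should be tuned to your time budget. It counts all regular files, not only supported source files, so treat the result as a rough guide.

```shell
# Hypothetical helper, not part of whitespots-ai-sast: suggest a --scan-mode
# value from the number of files in a repository. Thresholds are assumptions.
pick_scan_mode() {
  repo="$1"
  # Count regular files, leaving out Git metadata.
  count=$(find "$repo" -type f ! -path '*/.git/*' | wc -l | tr -d ' ')
  if [ "$count" -gt 5000 ]; then
    echo fast
  elif [ "$count" -gt 500 ]; then
    echo balanced
  else
    echo deep
  fi
}
```

It can then feed the documented flag, for example `whitespots-ai-sast scan ./repo --scan-mode "$(pick_scan_mode ./repo)"`.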
Large files are skipped by default; `--max-file-bytes` sets the size threshold in bytes:
whitespots-ai-sast scan ./repo --skip-large-files --max-file-bytes 750000
The scanner ignores common generated and dependency paths by default, including `dist/`, `build/`, `vendor/`, `*.min.js`, and `*.lock`.
The scanner always returns a JSON object with a top-level `results` array:
{
  "results": []
}
Each finding includes `title`, `severity`, `description`, `remediation`, `details`, `file_path`, `line_number`, and `code_snippet`.
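Because the output is plain JSON, a CI job can summarize or gate on it with standard tools. The sketch below assumes `jq` is installed; the function names are hypothetical, and the severity strings (`high`, `critical`) are assumptions, so check the values your scans actually emit before relying on them.

```shell
# Sketch only: requires jq. Function names and severity values are assumptions.

# Print one "severity<TAB>count" line per severity found in a results file.
count_by_severity() {
  jq -r '.results
         | group_by(.severity)
         | map("\(.[0].severity)\t\(length)")
         | .[]' "$1"
}

# Return non-zero if any finding has one of the comma-separated severities
# given as the second argument (compared case-insensitively).
fail_on_severity() {
  jq -e --arg wanted "$2" '
    ($wanted | ascii_downcase | split(",")) as $block
    | [.results[]
       | (.severity // "" | tostring | ascii_downcase) as $s
       | select(any($block[]; . == $s))]
    | length == 0
  ' "$1" > /dev/null
}
```

After a scan written with `-o /scan/sast-results.json`, a pipeline step could run `count_by_severity sast-results.json` for a summary and `fail_on_severity sast-results.json "high,critical"` to fail the build.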
## External Providers
External providers are optional and run only when selected:
whitespots-ai-sast scan ./repo \
--provider openai \
--model gpt-4.1 \
--api-key "$OPENAI_API_KEY"
Or run an external provider through Docker and write the results to a file:
docker run --rm \
-v ai-sast-ollama:/models \
-v "$(pwd):/scan" \
whitespots-ai-sast scan /scan \
--provider grok \
--model grok-4.20-0309-non-reasoning \
--api-key "xai-....." \
-o /scan/sast-results.json
Supported provider names:
- `local`
- `openai`
- `anthropic`
- `llama-remote`
- `grok`
API keys are never logged. Prefer environment variables for keys:
- `OPENAI_API_KEY`
- `ANTHROPIC_API_KEY`
- `LLAMA_REMOTE_API_KEY`
- `GROK_API_KEY`
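A wrapper script can check that the right key is present before starting a scan. The following is a sketch under assumptions: the pairing of provider names to variables is inferred from their names, and `key_var_for_provider` and `provider_key_is_set` are hypothetical helpers, not part of the tool.

```shell
# Map a provider name to the environment variable listed for its key.
# The pairing is inferred from the names; `local` needs no key.
key_var_for_provider() {
  case "$1" in
    openai)       echo OPENAI_API_KEY ;;
    anthropic)    echo ANTHROPIC_API_KEY ;;
    llama-remote) echo LLAMA_REMOTE_API_KEY ;;
    grok)         echo GROK_API_KEY ;;
    local)        echo "" ;;
    *)            echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Succeed only if the key variable for the provider is set and non-empty.
provider_key_is_set() {
  var=$(key_var_for_provider "$1") || return 1
  [ -z "$var" ] && return 0      # local provider: nothing to check
  eval "value=\${$var:-}"        # indirect lookup; $var comes from the fixed list above
  [ -n "$value" ]
}
```

For example, `provider_key_is_set openai && whitespots-ai-sast scan ./repo --provider openai --model gpt-4.1 --api-key "$OPENAI_API_KEY"` skips the scan when the key is missing.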
## Scan Behavior
The scanner:
- Recursively scans files and directories
- Reads `.gitignore`
- Skips binaries and large files
- Detects supported languages by extension
- Uses Tree-sitter where available to identify code symbols
- Ranks security-sensitive regions before AI analysis
- Chunks code by nearby risky sinks and sources
- Deduplicates and validates model findings
Supported languages include JavaScript, TypeScript, Python, Go, Java, PHP, Ruby, Rust, C#, C/C++, and Kotlin.
## Modes
whitespots-ai-sast scan ./repo --mode conservative
whitespots-ai-sast scan ./repo --mode balanced
whitespots-ai-sast scan ./repo --mode aggressive
`conservative` filters harder for concrete evidence. `aggressive` sends more code regions to the model.
For Docker Desktop CPU-only runs, start with `qwen2.5-coder:0.5b`. If you have enough memory and want better findings, try `qwen2.5-coder:1.5b`. Larger tags such as `qwen2.5-coder` may require increasing Docker memory.
Useful CPU tuning options (`--timeout-ms` is in milliseconds, so 600000 is 10 minutes):
whitespots-ai-sast scan ./repo --local \
--local-model qwen2.5-coder:0.5b \
--ctx-size 2048 \
--timeout-ms 600000
Optional second-pass validation:
whitespots-ai-sast scan ./repo --second-pass