
# credsweep
**Zero-dependency secret & cloud-key scanner — one tool, two native runtimes.**

`credsweep` finds leaked credentials, cloud keys, and private keys before they reach a
commit, a CI log, or a public repo. It ships as **two equivalent implementations**, a
POSIX-friendly `bash` script and a cross-platform `PowerShell` script, that share one
ruleset and one output format, so the *same* scan runs on a developer's Mac, a Linux CI
runner, and a Windows build agent with **nothing to install**. No Python, no Go binary, no
`npm install`, no network calls.

Most secret scanners are heavy (Docker images, language runtimes, cloud accounts).
`credsweep` is two files you can drop into any repo and read top to bottom in five minutes.
## Why this exists
- **Two runtimes, one ruleset.** Security teams live in PowerShell; app teams live in bash.
`credsweep` gives both the identical 15-rule detection engine and identical JSON/SARIF
output, so findings are comparable no matter who ran the scan.
- **CI-native.** Emits **SARIF 2.1.0**, so findings show up inline in the GitHub
*Security → Code scanning* tab with zero extra glue.
- **Pre-commit by design.** Non-zero exit on any finding; a ready-to-symlink git hook is
included.
- **Offline & auditable.** No telemetry, no API keys required. The optional AI triage step
is strictly opt-in.
## Detections (15 rules)
| Provider / type | Severity | Provider / type | Severity |
|---|---|---|---|
| AWS Access Key ID (`AKIA…`) | HIGH | AWS Secret Access Key | CRITICAL |
| GCP API key (`AIza…`) | HIGH | Google OAuth secret (`GOCSPX-`) | HIGH |
| GitHub token (`ghp_/gho_/…`) | HIGH | GitHub fine-grained PAT | HIGH |
| Slack token (`xox…`) | HIGH | Slack webhook URL | MEDIUM |
| Stripe secret key (`sk_live_`) | CRITICAL | OpenAI key (`sk-…`) | HIGH |
| Anthropic key (`sk-ant-…`) | HIGH | npm token (`npm_…`) | HIGH |
| Azure Storage connection key | CRITICAL | Azure client secret | HIGH |
| PEM private key block | CRITICAL | JWT | MEDIUM |

Plus an **opt-in entropy rule** (`--entropy` / `-Entropy`) that flags generic
`password=`/`token=`/`secret=` assignments only when the value's Shannon entropy clears a
threshold (3.5 bits/char). This cuts the false positives that make most generic scanners
unusable.
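As a rough illustration of the entropy gate, the sketch below computes Shannon entropy, H = -Σ pᵢ log₂ pᵢ over the character frequencies of a value, and compares it with the 3.5 bits/char threshold. The function names `shannon_entropy` and `clears_threshold` are hypothetical, and the use of `awk` is an assumption made for brevity; the actual scripts may compute this differently.

```bash
# Sketch only: function names are illustrative, not taken from credsweep.
# Prints the Shannon entropy (bits per character) of a single-line string.
shannon_entropy() {
  printf '%s\n' "$1" | awk '
    {
      n = length($0)
      # Count how often each character occurs.
      for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
    }
    END {
      if (n == 0) { print "0.00"; exit }
      h = 0
      # H = -sum(p * log2(p)); awk log() is the natural logarithm.
      for (c in count) { p = count[c] / n; h -= p * log(p) / log(2) }
      printf "%.2f\n", h
    }'
}

# Succeeds (exit 0) when the entropy of $1 is at least the threshold $2
# (defaults to 3.5, the value quoted in this README).
clears_threshold() {
  awk -v h="$(shannon_entropy "$1")" -v t="${2:-3.5}" \
    'BEGIN { exit !(h + 0 >= t + 0) }'
}
```

A repeated character such as `aaaaaaaa` scores 0.00 and is ignored, while a value made of 16 distinct characters scores 4.00 and clears the gate.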
## Install
    git clone https://github.com/aharwelik/credsweep.git
    cd credsweep
    chmod +x credsweep.sh

That's it. Optionally symlink onto your `PATH`:

    ln -s "$PWD/credsweep.sh" /usr/local/bin/credsweep
## Usage
**Bash**

    ./credsweep.sh .                        # scan current tree (human table)
    ./credsweep.sh src --format json        # machine-readable JSON
    ./credsweep.sh . --format sarif > r.sarif
    ./credsweep.sh . --entropy              # add high-entropy generic detection
    ./credsweep.sh . --no-fail              # report but never break the build
    ./credsweep.sh . --exclude-dir fixtures --exclude-dir testdata

**PowerShell** (macOS / Linux / Windows)

    ./credsweep.ps1 .                       # scan current tree
    ./credsweep.ps1 src -Format json
    ./credsweep.ps1 . -Format sarif > r.sarif
    ./credsweep.ps1 . -Entropy
    ./credsweep.ps1 . -NoFail
### Try it on the demo fixtures
The repo ships a generator that writes a throwaway project full of (fake) secrets, so no
secret-shaped literal is ever committed to git:

    bash examples/generate-fixtures.sh   # creates examples/leaky-project/ (gitignored)
    ./credsweep.sh examples              # → 10 findings
### Example output
    credsweep 1.0.0 — 10 finding(s) in examples
    CRITICAL  examples/leaky-project/config.example.env:3  aws-secret-access-key
              aws_…****…EY
    HIGH      examples/leaky-project/config.example.env:2  aws-access-key-id
              AKIA…****…LE
    MEDIUM    examples/leaky-project/config.example.env:9  slack-webhook
              http…****…XX
Matched secrets are **redacted by default** (`first4…****…last2`). Pass `--show-secrets`
only when you truly need the raw value.
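The `first4…****…last2` masking can be sketched in a few lines of POSIX shell. The function name `redact` and the handling of very short values are assumptions for illustration, not the scanner's actual code.

```bash
# Sketch only: `redact` is a hypothetical helper, not credsweep's real function.
# Keeps the first 4 and last 2 characters and masks everything in between.
redact() {
  s=$1
  # Assumption: values of 6 characters or fewer are masked entirely,
  # since showing both ends would reveal the whole value.
  if [ "${#s}" -le 6 ]; then
    printf '****\n'
    return
  fi
  first4=${s%"${s#????}"}   # strip everything after the first 4 characters
  last2=${s#"${s%??}"}      # strip everything before the last 2 characters
  printf '%s…****…%s\n' "$first4" "$last2"
}
```

Calling `redact AKIAIOSFODNN7EXAMPLE` (the placeholder key from AWS documentation) prints `AKIA…****…LE`, the same shape as the example output above.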
## CI integration (GitHub Actions)
`credsweep` ships a workflow (`.github/workflows/ci.yml`) that lints both scripts and runs
the scanner against generated fixtures on every push. To gate *your* repo on secret
leaks and surface them in the Security tab:

    - name: Scan for secrets
      run: ./credsweep.sh . --format sarif > credsweep.sarif
    - uses: github/codeql-action/upload-sarif@v3
      if: always()   # still upload when the scan step exits non-zero on findings
      with:
        sarif_file: credsweep.sarif
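For orientation, the sketch below prints a minimal SARIF 2.1.0 document for a single finding. It shows the general shape that the Code scanning tab consumes, not `credsweep`'s exact output. The helper names `sarif_level` and `sarif_for_finding`, the severity-to-level mapping, and the message text are all assumptions.

```bash
# Sketch only: helper names and the severity mapping are assumptions.
# SARIF result levels are "error", "warning", "note", or "none".
sarif_level() {
  case $1 in
    CRITICAL|HIGH) echo error ;;
    MEDIUM)        echo warning ;;
    *)             echo note ;;
  esac
}

# Usage: sarif_for_finding RULE_ID SEVERITY FILE LINE
# Note: no JSON escaping is done here, so arguments must be plain text.
sarif_for_finding() {
  rule=$1
  level=$(sarif_level "$2")
  file=$3
  line=$4
  cat <<EOF
{
  "\$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": { "driver": { "name": "credsweep", "rules": [{ "id": "$rule" }] } },
    "results": [{
      "ruleId": "$rule",
      "level": "$level",
      "message": { "text": "Possible secret: $rule" },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "$file" },
          "region": { "startLine": $line }
        }
      }]
    }]
  }]
}
EOF
}
```

Each finding maps to one entry in `results`, and the `artifactLocation.uri` plus `region.startLine` pair is what lets GitHub annotate the offending line.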
## Pre-commit hook
    ln -s ../../hooks/pre-commit .git/hooks/pre-commit
Now any `git commit` that would introduce a secret is blocked locally.
## Optional AI triage
For teams that want a written risk summary, `credsweep` can pipe its JSON to an LLM for
plain-English triage (which keys are highest risk, suggested rotation order). This is
**opt-in and offline-by-default** — see [`docs/ai-triage.md`](docs/ai-triage.md). The core
scanner never needs an API key.
## How it works
- A single rule table (`name | severity | case-flag | regex`) drives detection in both
implementations, so adding a provider means editing one row in each file.
- Binary files and noisy directories (`.git`, `node_modules`, `vendor`, `dist`, `target`,
`.terraform`, …) are skipped automatically.
- The entropy gate uses a from-scratch Shannon-entropy calculation (no libraries) so the
generic rule fires on real secrets, not on long-but-low-entropy strings like URLs.
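The rule-table idea can be sketched as a loop that reads `name | severity | case-flag | regex` rows and runs each regex with `grep -E`. The three rows, the `cs`/`ci` flag values, and the function name `scan_file` are illustrative assumptions; the real table and patterns live in the scripts themselves.

```bash
# Sketch only: rows, flag values (cs = case-sensitive, ci = case-insensitive),
# and regexes are illustrative, not credsweep's actual rule table.
RULES='aws-access-key-id|HIGH|cs|AKIA[0-9A-Z]{16}
slack-webhook|MEDIUM|cs|https://hooks\.slack\.com/services/[A-Za-z0-9/]+
pem-private-key|CRITICAL|cs|-----BEGIN [A-Z ]*PRIVATE KEY-----'

# Prints one "SEVERITY file:line rule-name" line per match in the given file.
scan_file() {
  file=$1
  # The regex is the last field, so `read` leaves any "|" inside it intact.
  printf '%s\n' "$RULES" | while IFS='|' read -r name severity caseflag regex; do
    if [ "$caseflag" = "ci" ]; then opts=-niE; else opts=-nE; fi
    # grep -n prefixes each match with "lineno:"; keep only the line number.
    grep "$opts" -e "$regex" -- "$file" | while IFS=: read -r lineno rest; do
      printf '%s %s:%s %s\n' "$severity" "$file" "$lineno" "$name"
    done
  done
}
```

Because detection is driven entirely by the table, supporting a new provider in this sketch is a one-line change to `RULES`.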
## Limitations (honest notes)
- Regex + entropy detection is high-signal but not exhaustive — treat a clean scan as
"no *known patterns* found," not a proof of safety.
- Provider prefixes overlap (e.g. `sk-ant-…` also matches the broad `sk-…` OpenAI rule),
so a single Anthropic key may be reported by two rules. Both say the same thing: rotate it.
- The entropy threshold (3.5 bits/char) is tuned for typical secrets; lower it to flag more
candidate values, at the cost of more false positives.
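The prefix overlap is easy to reproduce. In the sketch below, the two patterns and the rule names `anthropic-key` and `openai-key` are simplified assumptions, not the scanner's real regexes; the point is only that a broad `sk-` pattern also accepts anything a narrower `sk-ant-` pattern accepts.

```bash
# Sketch only: patterns and rule names are simplified stand-ins.
OPENAI_RE='sk-[A-Za-z0-9_-]{20,}'
ANTHROPIC_RE='sk-ant-[A-Za-z0-9_-]{20,}'

# Prints the name of every rule whose pattern matches the given value.
matching_rules() {
  value=$1
  if printf '%s\n' "$value" | grep -qE -e "$ANTHROPIC_RE"; then
    echo anthropic-key
  fi
  if printf '%s\n' "$value" | grep -qE -e "$OPENAI_RE"; then
    echo openai-key
  fi
}
```

Running `matching_rules sk-ant-FAKEFAKEFAKEFAKEFAKE` prints both rule names, while `matching_rules sk-FAKEFAKEFAKEFAKEFAKE` prints only `openai-key`.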
## Author
**Anthony Harwelik** — founder, **Sole Priority LLC** / **BlueTech Green**.
Security & AI tooling, automation, and cloud engineering.
- Email: **aharwelik@gmail.com**
- Web: **https://bluetechgreen.com**
- GitHub: **[@aharwelik](https://github.com/aharwelik)**

Open to consulting and collaboration on security automation and AI-assisted DevSecOps.
## License
[MIT](LICENSE) © Anthony Harwelik