BrandonRoos/hashdetect


# hashdetect

A command-line tool that identifies common hash types by their length and structure, ranks the candidates by confidence, and can export results as JSON.

Given an unknown hash, `hashdetect` tells you what it's most likely to be. Because many hash types share the same shape, it shows *all* plausible matches, ranked by how common each type is in the wild. It also prints the [hashcat](https://hashcat.net/hashcat/) mode and [John the Ripper](https://www.openwall.com/john/) format for each match, so you can move straight from identification to cracking workflows.

## Features

- Detects 9 common hash types: MD5, SHA-1, SHA-224, SHA-256, SHA-384, SHA-512, NTLM, MD4, and bcrypt.
- Ranks ambiguous matches by confidence (e.g. a 32-character hex string could be MD5, NTLM, or MD4; all three are shown, most likely first).
- Human-readable output by default; machine-readable JSON with `--json`.
- Accepts a single hash, a file of hashes, or piped input from stdin.
- Prints the hashcat mode and John the Ripper format for each match.
- Returns proper exit codes for use in shell scripts.

## Installation

Requires Python 3.10 or newer.

```bash
# Clone the repository
git clone https://github.com/BrandonRoos/hashdetect.git
cd hashdetect

# Create and activate a virtual environment
python -m venv .venv

# Windows (PowerShell):
.\.venv\Scripts\Activate.ps1

# macOS / Linux:
source .venv/bin/activate

# Install dependencies (only needed to run the tests)
pip install -r requirements.txt
```

The tool itself uses only the Python standard library, so no dependencies are required just to run it.
## Usage

### Identify a single hash

```bash
python -m hashdetect 5f4dcc3b5aa765d61d8327deb882cf99
```

```text
Possible matches for 5f4dcc3b5aa765d61d8327deb882cf99:
- MD5 (confidence 60%, length 32, hashcat 0, john raw-md5)
- NTLM (confidence 30%, length 32, hashcat 1000, john nt)
- MD4 (confidence 10%, length 32, hashcat 900, john raw-md4)
```

### Read hashes from a file

One hash per line:

```bash
python -m hashdetect -f hashes.txt
```

### Read from stdin (pipe)

```bash
# macOS / Linux
cat hashes.txt | python -m hashdetect
```

```powershell
# Windows PowerShell
type hashes.txt | python -m hashdetect
```

### JSON output

Add `--json` to any of the above for structured output:

```bash
python -m hashdetect 5f4dcc3b5aa765d61d8327deb882cf99 --json
```

```json
[
  {
    "input": "5f4dcc3b5aa765d61d8327deb882cf99",
    "matches": [
      {
        "name": "MD5",
        "confidence": 0.6,
        "length": 32,
        "hashcat_mode": 0,
        "john_format": "raw-md5"
      },
      {
        "name": "NTLM",
        "confidence": 0.3,
        "length": 32,
        "hashcat_mode": 1000,
        "john_format": "nt"
      },
      {
        "name": "MD4",
        "confidence": 0.1,
        "length": 32,
        "hashcat_mode": 900,
        "john_format": "raw-md4"
      }
    ]
  }
]
```

### Help

```bash
python -m hashdetect --help
```

## How it works

Detection happens in two ways:

1. **Structural matching.** Hashes with a distinctive shape, like bcrypt's `$2b$12$...` format, are matched by a regular expression that captures their exact structure. These matches are unambiguous.
2. **Length and character-set matching.** Most raw hashes are just hex strings of a fixed length, so several types can fit the same input. A 32-character hex string, for example, could be MD5, NTLM, or MD4; `hashdetect` returns every known signature that fits and ranks them. Many other algorithms share these shapes too (SHA3-256 and BLAKE2s, for instance, produce 64-character hex digests just like SHA-256), but only the nine types listed under Features are reported.

Confidence is computed from a *prevalence* score assigned to each hash type (how often it appears in practice). Each match's confidence is its prevalence divided by the total prevalence of all matching types, so the scores for a given input always sum to 100%.
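The two detection stages and the confidence formula can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not `hashdetect`'s actual source: the `Signature` class, the `hex_of` and `identify` functions, the pattern list (a subset of the supported types), and the prevalence weights are all hypothetical. MD5, NTLM, and MD4 are weighted 6:3:1 so that the 32-character case reproduces the 60% / 30% / 10% split from the sample output above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """One candidate hash type (hypothetical structure, not hashdetect's own)."""
    name: str
    pattern: re.Pattern[str]  # the whole input must match this pattern
    prevalence: float  # illustrative weight: how common the type is in practice


def hex_of(length: int) -> re.Pattern[str]:
    """Length-and-charset signature: exactly `length` hex digits."""
    return re.compile(rf"[0-9a-fA-F]{{{length}}}")


# Illustrative subset of signatures with made-up prevalence weights.
SIGNATURES = [
    # Structural match: bcrypt's $2<variant>$<cost>$<53 chars> layout.
    Signature("bcrypt", re.compile(r"\$2[abxy]\$\d{2}\$[./A-Za-z0-9]{53}"), 1.0),
    # Length-and-charset matches: several types can share one shape.
    Signature("MD5", hex_of(32), 6.0),
    Signature("NTLM", hex_of(32), 3.0),
    Signature("MD4", hex_of(32), 1.0),
    Signature("SHA-1", hex_of(40), 1.0),
    Signature("SHA-256", hex_of(64), 1.0),
    Signature("SHA-512", hex_of(128), 1.0),
]


def identify(candidate: str) -> list[tuple[str, float]]:
    """Return (name, confidence) pairs, most likely first; confidences sum to 1."""
    text = candidate.strip()
    matches = [s for s in SIGNATURES if s.pattern.fullmatch(text)]
    total = sum(s.prevalence for s in matches)
    ranked = sorted(matches, key=lambda s: s.prevalence, reverse=True)
    # Confidence = this type's prevalence / total prevalence of all matches.
    return [(s.name, s.prevalence / total) for s in ranked]


if __name__ == "__main__":
    for name, confidence in identify("5f4dcc3b5aa765d61d8327deb882cf99"):
        print(f"{name}: {confidence:.0%}")  # MD5: 60%, NTLM: 30%, MD4: 10%
```

In this sketch each pattern is applied with `fullmatch`, so a string that is one character too long or contains a non-hex character matches nothing and `identify` returns an empty list instead of dividing by zero.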
## Exit codes

| Code | Meaning |
|------|---------|
| 0 | At least one match found (or JSON mode, which always exits 0) |
| 1 | No known hash type matched the input (text mode) |
| 2 | No input provided (no hash, no `-f`, no piped stdin) |

## Limitations

`hashdetect` identifies hashes by **shape, not content**. It cannot verify that a string is genuinely a hash of a given type, only that it *could* be, based on length and pattern. A 32-character hex string is reported as a possible MD5 because it has the right form, not because the tool has confirmed it was produced by MD5. Treat the output as a ranked set of hypotheses, not a definitive answer.

## Running the tests

```bash
pytest
```

## Disclaimer

This tool is intended for legitimate security work (penetration testing, forensics, CTFs, and education) on systems and data you own or are authorized to test. You are responsible for complying with all applicable laws.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.