k3rt4s/email-evidence-tools

GitHub: k3rt4s/email-evidence-tools

Stars: 0 | Forks: 0

# email-evidence-tools Python utilities for processing, reducing, scanning, and labeling email archives in mbox or IMAP form. Suitable for security-operations triage of an exported mailbox (phishing, exfiltration, policy violations), internal investigations, incident response, or legal evidence review. Designed for streaming and resume-on-failure so they handle multi-gigabyte archives without blowing memory or losing progress on a network disconnect. **Author:** Jon Bowker **Requires:** Python 3.10+. `pip install -r requirements.txt`. ## Scripts | Script | Purpose | | ----------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `extract_messages_by_address.py` | Stream-scans one or more mbox files and extracts every message where a given address (or domain substring) appears in From/To/Cc/Bcc/Reply-To/Sender/Delivered-To. Outputs a filtered mbox + index CSV. Deduped by Message-ID. Byte-offset checkpoint for resume on large archives. | | `render_mbox_to_markdown.py` | Renders an mbox as a chronological Markdown evidence document with full forensic headers, plain-text body, attachment manifest (each file extracted to disk and hashed). | | `scan_mbox_for_evidence.py` | Scans an mbox file and extracts messages matching configurable evidence keyword categories. | | `label_matching_emails_via_imap.py` | Connects to an IMAP mailbox and applies a label/folder to messages whose address domains match configured domains. | | `strip_attachments_from_mbox.py` | Creates an attachment-free mbox copy and writes an attachment inventory CSV. | | `clean_evidence_csv.py` | Cleans text fields in evidence CSV output by removing HTML tags and normalizing whitespace. | ## Usage python extract_messages_by_address.py --mbox-file "" --address "someone@example.com" python render_mbox_to_markdown.py --mbox-file "" --output-dir "" python scan_mbox_for_evidence.py --mbox-file "" --output-file "evidence_hits.csv" python strip_attachments_from_mbox.py --input-mbox "" python clean_evidence_csv.py --input-file "evidence_hits.csv" --output-file "evidence_hits_clean.csv" python label_matching_emails_via_imap.py --domains "example.com,example.org" --target-label "Labels/Evidence" `extract_messages_by_address.py` accepts multiple `--mbox-file` arguments and treats `--address` as a case-insensitive substring, so passing `@example.com` matches every address at that domain. All scripts also accept their inputs via environment variables for automation; see each script's docstring for the supported variables. ## Data hygiene These tools operate on user-provided email archives that may contain PII, credentials, or sensitive correspondence. Treat the repository as code-only: - Do not commit mbox files, generated CSVs, attachment inventories, checkpoints, or `.env` files. The included `.gitignore` excludes these. - Pass inputs and outputs through command-line arguments or environment variables; never hard-code addresses, domains, or labels into the scripts. - For long-running jobs against large archives, output to a directory outside the repository so accidental commits cannot leak data. ## Structure email-evidence-tools/ ├── clean_evidence_csv.py ├── extract_messages_by_address.py ├── label_matching_emails_via_imap.py ├── render_mbox_to_markdown.py ├── scan_mbox_for_evidence.py ├── strip_attachments_from_mbox.py ├── requirements.txt └── README.md