abcreativ/questio

GitHub: abcreativ/questio

An offline, forensic-grade PDF auditing CLI that gives each document a 0-100 trust score based on multiple structural features, to detect tampering.


# Forensic PDF audit tool for detecting tampered documents

Edit a PDF in Word, LibreOffice, Acrobat, Preview, Sejda, Foxit, or anything else, and you leave traces behind. Producer strings that shouldn't be there. Font tags reset mid-document. End-of-file markers stacked on top of each other. questio reads all of it and scores the document 0 to 100.

It runs on your laptop. Nothing uploads, ever.

Generic mode works on any PDF with zero setup. Train it once on a genuine bank statement, invoice, or contract you already hold, and questio gets surgical: it flags the exact object in the next forgery that does not match.

No cloud. No shared registry. No vendor asking you to trust them. That includes us.

## How it works

### 1. Editor and producer detection

Every PDF carries metadata saying what tool made it. Microsoft Word, LibreOffice, Canva, Apple Pages, and Google Docs are uncommon generators for official documents issued by banks, utilities, or HMRC, or produced by enterprise software. questio flags them on sight.

Adobe Acrobat, Foxit, PDF-XChange, Sejda, and Nitro are editors, not generators. If one shows up in the metadata of a bank statement, somebody opened the file and saved it again.

**Translation:** if your landlord sends a "bank statement" whose producer field says LibreOffice, that is the tool noticing.

### 2. Font session tracking

Every time a PDF is saved, its embedded font subsets get a random six-letter tag (for example `ABCDEF+Helvetica`). A clean single-session document has one set of tags. A document edited and re-saved has two. Three save passes leave three sets. This trace survives flattening and is the most reliable signal questio has.

**Translation:** if someone added a fake line to an invoice, the fake line often carries a different font tag from the rest. questio catches this even when the eye cannot.

### 3. Revision history mapping

PDFs can be saved incrementally, leaving a literal history of every change stacked inside the file. questio counts the end-of-file markers and, when it finds more than one, tells you which revision modified which page and which objects.
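A rough way to picture the end-of-file count: every full or incremental save appends a section that ends in a `%%EOF` marker, so counting those markers and the bytes between them gives a crude revision map. The Python sketch below is a minimal illustration under that assumption, not questio's implementation; `revision_offsets`, `revision_sizes`, and `revision_report` are hypothetical helpers. A naive byte search like this can miscount, for example on some linearized ("fast web view") files or when the bytes `%%EOF` happen to occur inside stream data, and it says nothing about which pages or objects each revision touched.

```python
import re
from pathlib import Path

# Assumption: each full or incremental save ends with a literal "%%EOF" marker.
EOF_MARKER = re.compile(rb"%%EOF")


def revision_offsets(data: bytes) -> list[int]:
    """Byte offsets just past each %%EOF marker, in file order."""
    return [match.end() for match in EOF_MARKER.finditer(data)]


def revision_sizes(data: bytes) -> list[int]:
    """Bytes contributed by each revision: the original body, then each appended update."""
    ends = revision_offsets(data)
    starts = [0] + ends[:-1]
    # zip stops at the shorter list, so a file with no markers yields [].
    return [end - start for start, end in zip(starts, ends)]


def revision_report(path: str) -> list[str]:
    """Readable lines loosely modelled on questio's Revision Map (format is illustrative)."""
    sizes = revision_sizes(Path(path).read_bytes())
    return [
        f"Rev {i}: {size / 1024:.1f} KB" + (" [original]" if i == 1 else " [incremental update]")
        for i, size in enumerate(sizes, start=1)
    ]
```

On a clean single-save file `revision_report` returns one line; each additional line corresponds to one more appended save.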
**Translation:** three end-of-file markers mean three save passes. A clean original has exactly one.

### 4. Acrobat edit signature

Adobe Acrobat's Edit Text tool leaves a particular kind of font object behind (CIDFont). Microsoft Word, Adobe InDesign, Google Chrome, LibreOffice, and most other generators never write these. If one appears in a document that claims to come from a generator that does not use them, Acrobat touched the file.

**Translation:** this is the surgical catch. When it fires, the forger almost certainly used Acrobat's Edit Text tool.

### 5. Content stream fingerprinting

Every generator has its own drawing "handwriting": coordinate precision, operator style, font reference patterns. questio learns these dimensions as a fingerprint and compares them against a stored issuer profile.

**Translation:** if a document claims to come from InDesign but draws text the way Acrobat does, it was edited.

## What questio refuses to do

- **No cloud.** There is no hosted version, no "questio pro", no API key. Your documents never leave your machine.
- **No shared fingerprint registry.** Fingerprints stay on your laptop. A public registry would be a honeypot for forgers: upload a bad fingerprint, make real documents look fake, or vice versa.
- **No court admissibility claims.** questio produces evidence and a score. It does not produce verdicts. Admissibility is your lawyer's job.
- **No GUI.** A GUI invites verdict-seeking users. questio's whole point is that you do the thinking.
- **No telemetry.** Nothing is logged, uploaded, or phoned home. Check the source if you do not trust the package.

## Local-first: why questio never uploads your PDFs

Most PDF-checking tools ask you to upload the file. That creates two problems questio refuses to inherit:

1. **Your sensitive documents reach a third party.** Bank statements, legal contracts, payslips, medical records. Even if the service promises to delete them, you have no way to verify.
2. **A central fingerprint database is a liability.** Anyone who contributes a fingerprint can poison the registry. A malicious actor could upload a fingerprint that makes forged documents score as genuine, or the reverse.

questio sidesteps both by running entirely on your laptop. Your fingerprints live in `~/.questio/fingerprints/` and never leave. You decide which documents are genuine by training the tool on files you verified yourself. The trust chain starts and ends with you.

## Installation

```
pip install questio
```

Or, if you want the latest main branch:

```
pip install git+https://github.com/abcreativ/questio.git
```

**Requires**: Python 3.10+, [exiftool](https://exiftool.org/)

```
brew install exiftool                  # macOS
apt install libimage-exiftool-perl     # Debian/Ubuntu
```

## Usage

### Audit a PDF (default)

```
questio invoice.pdf
```

**Exit codes**

| Code | Meaning |
|------|---------|
| `0` | Authentic: trust score >= 80 |
| `1` | Suspicious: trust score 50 to 79 |
| `2` | Likely tampered: trust score < 50 |
| `3` | Error (file not found, exiftool missing, etc.) |

**Options**

```
--json            Machine-readable JSON output
--verbose, -v     Full structural dump (font details, content stream fingerprint)
--issuer NAME     Verify against a stored issuer fingerprint
```

### Learn an issuer fingerprint

```
questio learn --issuer three-uk genuine-bill.pdf
```

Run it on several known-good samples from the same issuer to build a reliable fingerprint. Each `learn` call adds to the existing profile rather than replacing it.

### Verify against a known issuer

```
questio verify --issuer three-uk suspect-bill.pdf
```

Compares the suspect document's structural dimensions against the stored fingerprint and reports per-dimension confidence scores and any deviations.

### Compare two PDFs directly

```
questio compare reference.pdf suspect.pdf
```

Side-by-side structural diff: font prefixes, content stream fingerprint, object counts, metadata fields. Useful when you have a confirmed genuine copy to compare against.
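The `learn` and `verify` commands above build up and then check a per-issuer profile. The sketch below is a minimal Python illustration of that idea, assuming numeric dimensions and a plain min/max range test; `IssuerProfile`, its methods, the dimension names, and the JSON field names are all hypothetical and are not questio's code or its documented fingerprint schema (see `docs/FINGERPRINT_FORMAT.md` for the real one).

```python
import json
from dataclasses import dataclass, field


@dataclass
class IssuerProfile:
    """Hypothetical in-memory issuer fingerprint: observed values per numeric dimension."""

    issuer: str
    samples: int = 0
    observed: dict[str, list[float]] = field(default_factory=dict)

    def learn(self, features: dict[str, float]) -> None:
        """Fold one known-good sample into the profile; earlier samples are kept, not replaced."""
        self.samples += 1
        for name, value in features.items():
            self.observed.setdefault(name, []).append(value)

    def ranges(self) -> dict[str, tuple[float, float]]:
        """Learned (min, max) range for every dimension seen so far."""
        return {name: (min(vals), max(vals)) for name, vals in self.observed.items()}

    def deviations(self, features: dict[str, float]) -> dict[str, float]:
        """Dimensions of a suspect document that fall outside the learned range.

        Dimensions never seen during training are skipped rather than flagged.
        """
        learned = self.ranges()
        return {
            name: value
            for name, value in features.items()
            if name in learned and not learned[name][0] <= value <= learned[name][1]
        }

    def to_json(self) -> str:
        """Serialize with illustrative field names (not questio's actual schema)."""
        return json.dumps(
            {
                "issuer": self.issuer,
                "samples": self.samples,
                "dimensions": {
                    name: {"observed": vals, "min": min(vals), "max": max(vals)}
                    for name, vals in self.observed.items()
                },
            },
            indent=2,
        )
```

With a single training sample, every range collapses to one value (min equals max), so any small variation in a suspect file shows up as a deviation. That is the same reason Troubleshooting recommends three or more samples.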
## Example output

```
$ questio invoice.pdf

================================================================
  PDF AUDIT -- invoice.pdf
================================================================

  Trust Score: 83/100  ████████████████████████████████░░░░░░░░
  Verdict:     LIKELY AUTHENTIC

  Creator:     Adobe InDesign 19.0
  Producer:    Adobe PDF Library 15.0
  Created:     2024-03-15T14:22:31+00:00
  Modified:    2024-03-15T14:22:35+00:00
  Pages:       1
  Revisions:   1 (%%EOF markers)
  Objects:     47
  Size:        84 KB

  Revision Map
  ----------------------------------------------------------------
  Rev 1: 84.2 KB, 47 objects [original]

  Findings
  ----------------------------------------------------------------
  v Producer          Adobe PDF Library 15.0 (trusted)
  v Font subsets      3 subsets, consistent prefix ABCDEF
  v Revisions         Single revision -- no incremental updates
  ! Timestamp gap     4s create->modify (low confidence -- sub-minute)
  v Content stream    Matches Adobe InDesign profile
  v Font types        TrueType only -- consistent with declared tool
  v Object count      47 objects -- within expected range
  v File structure    Cross-reference table valid
  v Version           PDF 1.6 -- consistent with producer
  v Coord precision   3 decimal places -- matches Adobe profile
```

## Fingerprint format

Fingerprints are stored as JSON at `~/.questio/fingerprints/.json`. Each file records the issuer name, the number of training samples, and arrays of observed values, plus min/max ranges, for these structural dimensions: font subset prefixes, content stream operator distributions, coordinate precision, font reference style, object counts, and producer strings.

Override the storage directory with the `QUESTIO_HOME` environment variable:

```
QUESTIO_HOME=/team/shared/fingerprints questio verify --issuer acme suspect.pdf
```

Fingerprints are automatically migrated from the legacy `~/.pdf-audit/fingerprints/` path on first run.

Do not import fingerprints from untrusted sources. A fingerprint tells questio what to treat as normal.
A poisoned fingerprint from a bad actor could make tampered documents score as genuine.

See [docs/FINGERPRINT_FORMAT.md](docs/FINGERPRINT_FORMAT.md) for the complete field reference and hand-authoring guide.

## How to check whether a PDF has been edited

1. Install questio: `pip install questio`
2. Install exiftool (the only external dependency): `brew install exiftool` on macOS, `apt install libimage-exiftool-perl` on Debian/Ubuntu.
3. Run: `questio suspect.pdf`
4. Read the trust score and findings list. Scores of 80 or above are typically authentic, 50 to 79 are suspicious, and below 50 are likely tampered.
5. For surgical verification against a specific issuer, first train questio on genuine documents from that issuer (ideally three or more): `questio learn --issuer three-uk real_bill.pdf`. Then verify future documents against the trained fingerprint: `questio verify --issuer three-uk suspect_bill.pdf`.

## Who uses questio

- **Journalists** checking the authenticity of leaked documents from sources
- **Fraud investigators** verifying invoices, expense receipts, and supplier documents
- **Paralegals** running initial integrity reviews on client-submitted exhibits
- **Accountants** flagging suspect bank statements, payslips, or tax forms
- **Landlords and property managers** checking proof-of-income submissions
- **Small-business owners** verifying supplier invoices and contractor credentials
- **KYC and compliance teams** adding a pre-manual-review filter to document workflows

## Limitations

questio is a heuristic tool. It makes educated guesses based on structural evidence. It cannot:

- Read the visual content of a PDF (it does not OCR text or compare images)
- Detect a forgery recreated from scratch in the same tool the original was made with
- Catch tampering that happened before the PDF was generated (e.g. a doctored spreadsheet exported to PDF)
- Prove a document is fake. It can only show that the structural evidence is inconsistent with the document's claimed origin.
- Replace human judgment. Always verify findings manually before acting.
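Because questio reports its verdict through the exit codes listed under **Usage**, the score bands in step 4 can drive a simple batch triage. The Python sketch below assumes the `questio` command is installed and on `PATH`; `EXIT_MEANINGS`, `band_for_score`, `label`, and `triage` are hypothetical helpers written for this example, not part of questio. It only sorts files into piles: as the limitations above say, each flagged result still needs a human check.

```python
import subprocess
from pathlib import Path

# Exit codes as documented in the questio README ("Audit a PDF" table).
EXIT_MEANINGS = {
    0: "authentic (trust score >= 80)",
    1: "suspicious (trust score 50 to 79)",
    2: "likely tampered (trust score < 50)",
    3: "error (file not found, exiftool missing, etc.)",
}


def band_for_score(score: float) -> int:
    """Return the exit code the documented table assigns to a 0-100 trust score."""
    if not 0 <= score <= 100:
        raise ValueError(f"trust score out of range: {score}")
    if score >= 80:
        return 0
    if score >= 50:
        return 1
    return 2


def label(returncode: int) -> str:
    """Translate a questio exit code into a readable label."""
    return EXIT_MEANINGS.get(returncode, f"unexpected exit code {returncode}")


def triage(folder: str) -> dict[str, str]:
    """Audit every *.pdf in `folder` with the questio CLI and label each result.

    Raises FileNotFoundError if the `questio` command is not on PATH.
    """
    results: dict[str, str] = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        # Default audit mode; stdout and stderr are captured and ignored here.
        proc = subprocess.run(["questio", str(pdf)], capture_output=True, text=True)
        results[pdf.name] = label(proc.returncode)
    return results
```

For example, `triage("incoming/")` (a hypothetical folder) returns a dict mapping each PDF name to its label. Exit code 3 lands in the error bucket rather than being read as a verdict.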
## Troubleshooting

**exiftool not found**

```
brew install exiftool                  # macOS
apt install libimage-exiftool-perl     # Debian/Ubuntu
```

**False positive on a legitimate PDF**

Run `questio learn` on several known-good samples from the same issuer before using `questio verify`. A fingerprint built from a single sample is too narrow and will flag minor generator variation as suspicious. Three or more samples is the practical minimum.

**Fingerprint not loading**

Check that `~/.questio/fingerprints/.json` exists. If you were using an older version, fingerprints at `~/.pdf-audit/fingerprints/` are migrated automatically on first run. You can also set `QUESTIO_HOME` to point at an existing directory.

**`questio` command not found after install**

Ensure your Python `bin` directory is on `PATH`. Alternatively, install with pipx: `pipx install git+https://github.com/abcreativ/questio.git`.

## FAQ

**Q: How do I check if a PDF has been edited after it was created?**

A: Run `questio document.pdf`. It scores the PDF 0 to 100 and flags ten kinds of tampering traces, including producer metadata that should not be there, font tags stacked from multiple editing sessions, Acrobat-specific font signatures, extra end-of-file markers, timestamp gaps, and content stream patterns inconsistent with the declared generator.

**Q: Can questio detect a forged bank statement?**

A: Yes, when the forgery was made by editing a genuine file (see Limitations). questio catches edits from the common tools: Microsoft Word, LibreOffice, Apple Pages, Canva, Google Docs, Adobe Acrobat, Foxit, PDF-XChange, Sejda. Typically several traces fire at once: producer metadata, font session tags, and timestamp drift. Acrobat's Edit Text tool fires an extra signature on top of the rest.

**Q: Does questio send my PDFs to a cloud service?**

A: No. questio runs entirely on your machine. It never uploads, phones home, or requires an API key.
**Q: What PDF tampering does questio catch that I cannot see by eye?**

A: Font tag drift across editing sessions, Acrobat-specific font object signatures, incremental save history, mismatched producer metadata, permission-lock abuse, and byte-level drawing instructions inconsistent with the declared PDF generator.

**Q: Is questio free and open source?**

A: Yes. It is an MIT-licensed Python CLI. Install it with pip and run it against any PDF; no account required.

## License

MIT (c) 2026 ABCreativ

## Disclaimer

questio is a heuristic forensic tool. Its output is not legal evidence and is not court-admissible. It produces a score and a list of structural anomalies. It does not produce verdicts. Always verify findings manually before making any accusations or decisions. The authors accept no liability for incorrect results.