systemslibrarian/cipher-detective-ai

GitHub: systemslibrarian/cipher-detective-ai

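An interactive analysis platform for classical cryptography education that detects and decodes 81 historical cipher types using a heuristic engine and a Transformer classifier.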


---
title: Cipher Detective AI
emoji: 🕵️‍♂️
colorFrom: indigo
colorTo: yellow
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
tags:
  - cryptography
  - cryptanalysis
  - classical-ciphers
  - cipher
  - transformers
  - text-classification
  - cybersecurity-education
  - gradio
  - nlp
  - machine-learning
  - python
  - substitution-cipher
  - vigenere
  - caesar-cipher
  - educational
---

# 🕵️‍♂️ Cipher Detective AI

Cipher Detective AI is an **educational** showcase of classical cryptanalysis. It uses a transparent heuristic engine and an optional fine-tuned Transformer classifier to detect and decode **81 historical cipher types**, from Caesar and Vigenère to Rail-Fence, Columnar, Playfair, and more. It teaches how weak historical ciphers leak patterns, how cryptanalysis actually works, and **why modern cryptography is fundamentally different**.

It is built as a Hugging Face-native trio:

| Artifact | Repo | Role |
|----------|------|------|
| Space | [systemslibrarian/cipher-detective-ai](https://huggingface.co/spaces/systemslibrarian/cipher-detective-ai) | Live interactive showcase (this app) |
| Dataset | [systemslibrarian/classical-cipher-corpus](https://huggingface.co/datasets/systemslibrarian/classical-cipher-corpus) | Labeled classical cipher samples |
| Model | [systemslibrarian/cipher-detective-classifier](https://huggingface.co/systemslibrarian/cipher-detective-classifier) | Small Transformer classifier |

**This is not an attack tool.** It cannot break modern encryption, recover passwords, or bypass access controls. See [`docs/educational-boundary.md`](docs/educational-boundary.md).

## ✨ Demo

## 🧭 Modes

The Gradio Space has **seven** tabs:

1. **Detect**: paste ciphertext and get a classification, a confidence score, and a full evidence report (frequencies, IoC, entropy, Caesar/Affine candidates, Kasiski/Friedman indicators, transposition signals). One click loads a random example.
2. **Evidence Notebook**: view the raw evidence without a verdict, suitable for teaching cryptanalysis step by step.
3. **Challenge**: generate a practice ciphertext at a chosen difficulty (Caesar, Atbash, Vigenère, Rail-Fence, Columnar, Affine, Substitution).
4. **Try Decode**: ten decryption methods with automatic English-quality scoring:
   - *Auto modes (no key needed):* auto-best-Caesar, auto-best-Affine, auto-Vigenère (Kasiski + Friedman), auto-rail-fence (brute-forces 2–15 rails)
   - *Keyed modes:* Caesar/ROT, Atbash, Vigenère, Beaufort, Affine, Columnar transposition
5. **Compare Mode**: run the **transparent heuristic baseline** and the **Transformer classifier** side by side, with a disagreement analysis.
6. **Solve Substitution**: a hill-climbing solver for monoalphabetic substitution ciphers that scores candidates with mixed bigram + trigram log-probabilities. Educational use only; it converges on texts of 120+ English letters.
7. **About**: project background, educational boundary, and links.

## 🚀 Run locally

```
git clone https://github.com/systemslibrarian/cipher-detective-ai.git
cd cipher-detective-ai
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python app.py
```

The Space then runs locally; Gradio prints its URL to the console on startup. To attach a trained Transformer classifier:

```
export CIPHER_MODEL_ID=systemslibrarian/cipher-detective-classifier  # or a local folder path
python app.py
```

If the model cannot be loaded, the app **always** falls back to the transparent heuristic baseline.

## 🧪 Tests

```
pip install -r requirements-dev.txt
pytest
```

The tests cover cipher round-trips, edge cases (empty and non-alphabetic input), feature signals (IoC, entropy, Kasiski, Friedman, transposition), the heuristic classifier, and the dataset generator schema.

## 🗂️ Generate the dataset

Quick demo (5,000 rows):

```
python scripts/generate_dataset.py --out data/cipher_examples.jsonl --n 5000 --seed 42
```

Public release scale (50,000 rows):

```
python scripts/generate_dataset.py --out data/cipher_examples.jsonl --n 50000 --seed 42
```

Each row includes `id`, `text`, `ciphertext`, `plaintext`, `label`, `cipher`, `key`, `difficulty`, `language`, `text_length`, `attack_methods`, and `educational_note`. See [`hf_cards/dataset_README.md`](hf_cards/dataset_README.md).

## 🧠 Train the model

```
python scripts/train_transformer.py \
  --data data/cipher_examples.jsonl \
  --model distilbert-base-uncased \
  --out cipher_model \
  --epochs 3
```

The output is saved in `cipher_model/`:

- model + tokenizer
- `training_metrics.json` (accuracy, macro precision/recall/F1)
- `label_mapping.json` (`label2id` / `id2label`)

Then, to upload to the Hub:

```
huggingface-cli login
huggingface-cli upload systemslibrarian/cipher-detective-classifier ./cipher_model
```

## 📊 Evaluation

Heuristic baseline only:

```
python scripts/evaluate_baseline.py --data data/cipher_examples.jsonl --out reports/baseline_metrics.json
```

Heuristic vs. Transformer:

```
python scripts/evaluate_baseline.py \
  --data data/cipher_examples.jsonl \
  --model cipher_model \
  --out reports/baseline_metrics.json
```

The report includes accuracy, macro F1, per-class precision/recall/F1, the confusion matrix, and the dataset's label distribution.

## 🏷️ Labels

The classifier covers **81 cipher classes**, including:
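`plaintext`, `caesar_rot`, `atbash`, `vigenere`, `beaufort`, `rail_fence`, `columnar`, `affine`, `substitution`, `playfair`, `four_square`, `two_square`, `hill_cipher`, `bifid`, `trifid`, `adfgx`, `adfgvx`, `enigma`, `lorenz`, `morse_code`, `tap_code`, `navajo_code`, `pigpen`, `baconian`, `polybius`, `straddling_checkerboard`, `chaocipher`, `nihilist`, `porta`, `rot13`, `rot47`, `gronsfeld`, `running_key`, `autokey`, `one_time_pad`, `venona_pad_reuse`, `voynich`, `babington`, and more. For the full label distribution, see [`data/cipher_examples.jsonl`](data/cipher_examples.jsonl).

## 🛣️ Roadmap

- [ ] Publish the `classical-cipher-corpus` dataset (50k rows).
- [ ] Train and publish `cipher-detective-classifier`.
- [ ] Add `screenshots/` images.
- [x] Hill-climbing solver demo for monoalphabetic substitution (educational use only).
- [x] Evaluation buckets by length and difficulty.
- [x] Vigenère auto-solver (Kasiski + Friedman key-length estimation).
- [x] Rail-fence and columnar transposition decoders.
- [x] Beaufort cipher support (encrypt + decrypt).
- [x] Mixed bigram + trigram scoring for the hill climber.
- [x] GitHub Actions → automatic sync to the Hugging Face Space on every push.
- [ ] Calibration plots (heuristic confidence vs. accuracy).
- [ ] Multilingual plaintext sources (clearly labeled).
- [ ] Link showcase pages from Cipher Museum / Crypto Lab.

See [`CHANGELOG.md`](CHANGELOG.md) for what has already shipped.

## 🌐 Ecosystem

Cipher Detective AI is part of a broader cryptography education path:

- **Cipher Museum**: curated cipher history _(link placeholder)_
- **Crypto Compare**: algorithm comparison _(link placeholder)_
- **Crypto Lab**: hands-on experiments _(link placeholder)_
- **Meow Decoder**: a friendly entry point _(link placeholder)_

See [`docs/ecosystem.md`](docs/ecosystem.md).

## 🛡️ Educational boundary

This project teaches classical cryptanalysis. It is **not** intended to:

- break modern encryption (AES, ChaCha20, RSA, ECC, TLS, age, PGP),
- recover passwords or password hashes,
- bypass access controls or DRM,
- support surveillance or unauthorized access,
- make any claims about real-world cryptographic security.

Modern cryptography relies on vetted primitives, protocols, key management, correct implementation, metadata handling, and honest threat models. None of the techniques shown here apply to it. See [`docs/educational-boundary.md`](docs/educational-boundary.md).

## 🔐 Security

See [`SECURITY.md`](SECURITY.md) for how to report vulnerabilities.

## 📜 License

MIT. See [`LICENSE`](LICENSE).

## 📚 Citation

If you use this project in teaching, see [`CITATION.cff`](CITATION.cff).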
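
As a supplement to the evidence signals named under the Detect tab, the sketch below shows roughly what two of them measure: the index of coincidence (IoC) and the Shannon entropy of the letter distribution. It is a minimal illustration under stated assumptions, not the project's implementation; the function names (`letters_only`, `index_of_coincidence`, `shannon_entropy`) are hypothetical and do not come from this repository.

```python
import math
from collections import Counter


def letters_only(text: str) -> str:
    """Keep only A-Z, uppercased (hypothetical helper, not the project's API)."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are identical."""
    s = letters_only(text)
    n = len(s)
    if n < 2:
        return 0.0  # not enough letters to form a pair
    counts = Counter(s)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the letter distribution, in bits per letter."""
    s = letters_only(text)
    n = len(s)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


if __name__ == "__main__":
    sample = "Classical ciphers leak patterns that frequency analysis can detect."
    print(f"IoC:     {index_of_coincidence(sample):.4f}")
    print(f"Entropy: {shannon_entropy(sample):.4f} bits/letter")
```

Typical English text has an IoC near 0.066, while uniformly random letters sit near 1/26 ≈ 0.038. Monoalphabetic substitution and transposition only relabel or reorder letters, so they leave the IoC unchanged; a polyalphabetic cipher such as Vigenère flattens the distribution and pulls the IoC toward the random value.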

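
The auto-best-Caesar mode in the Try Decode tab is described as keyless decoding with automatic English-quality scoring. One common way to do this, sketched here as an assumption rather than as the project's actual method, is to try all 26 shifts and keep the one whose letter counts are closest to English under a chi-squared statistic. The frequency table holds approximate English letter percentages, and the names `caesar_shift`, `chi_squared`, and `best_caesar` are hypothetical.

```python
import string
from collections import Counter

# Approximate English letter frequencies in percent, A..Z (assumed values).
ENGLISH_FREQ = dict(zip(string.ascii_uppercase, [
    8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03, 2.41,
    6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07,
]))


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; leave everything else as-is."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def chi_squared(text: str) -> float:
    """Chi-squared distance between letter counts and English (lower is better)."""
    letters = [ch for ch in text.upper() if ch in ENGLISH_FREQ]
    n = len(letters)
    if n == 0:
        return float("inf")  # nothing to score
    counts = Counter(letters)
    return sum(
        (counts.get(letter, 0) - n * pct / 100) ** 2 / (n * pct / 100)
        for letter, pct in ENGLISH_FREQ.items()
    )


def best_caesar(ciphertext: str) -> tuple[int, str]:
    """Try all 26 shifts and return (key, plaintext) with the lowest score."""
    _, key = min((chi_squared(caesar_shift(ciphertext, -k)), k) for k in range(26))
    return key, caesar_shift(ciphertext, -key)


if __name__ == "__main__":
    secret = caesar_shift("Classical ciphers leak patterns that frequency "
                          "analysis can detect.", 3)
    print(best_caesar(secret))
```

Chi-squared scoring needs a reasonable amount of text to be reliable; on very short inputs several shifts can score similarly, which is one reason a tool might present ranked candidates instead of a single answer.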