idllresearch/malicious-gpt

GitHub: idllresearch/malicious-gpt

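A research dataset from the USENIX Security '24 paper, covering 220 real-world malicious LLM services together with their jailbreak prompts and malicious responses. It is intended for evaluating the threat capabilities of underground AI services and for tracing their backends.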

Stars: 70 | Forks: 4

# Malla: Demystifying Real-world Large Language Model Integrated Malicious Services

[![USENIX Security: Paper](https://img.shields.io/badge/USENIX_Security-paper-maroon.svg)](https://www.usenix.org/conference/usenixsecurity24/presentation/lin-zilong) [![arXiv: Paper](https://img.shields.io/badge/arXiv-paper-red.svg)](https://arxiv.org/abs/2401.03315) [![Dataset: Released](https://img.shields.io/badge/dataset-released-green.svg)](https://github.com/idllresearch/malicious-gpts/) [![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](https://opensource.org/licenses/MIT) ![Artifact Evaluation: Available](https://img.shields.io/badge/ARTIFACT_EVALUATION-AVAILABLE-orange.svg) ![Artifact Evaluation: Functional](https://img.shields.io/badge/ARTIFACT_EVALUATION-FUNCTIONAL-blue.svg) ![Artifact Evaluation: Functional](https://img.shields.io/badge/ARTIFACT_EVALUATION-FUNCTIONAL-purple.svg)

![](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/d3b5c0b94c130234.png)

*The collection, backends, and performance of 220 malicious LLM applications (e.g., WormGPT, FraudGPT, BLACKHATGPT, etc.)*

## Table of Contents

- Environment
- Main Claim 1: [Quality evaluation of Malla-generated content](./quality)
  - Execution
    - Step 1. Check the performance on each metric
    - Step 2. Summarize the results of each metric
  - Dataset
  - Results
- Main Claim 2: [Authorship attribution classification](./authorship)
  - Execution
  - Dataset
  - Results
- Main Claim 3: [Evaluation of the "ignore the above instructions" prompt leaking attack](./jailbreak_prompt_uncovering)
  - Execution
  - Dataset
  - Results
- Other Claims
  - [malPrompt](https://github.com/idllresearch/malicious-gpt/tree/main/mal_prompts): 45 malicious prompts for generating malware and phishing content.
  - [MallaResponse](https://github.com/idllresearch/malicious-gpt/tree/main/malicious_LLM_responses): responses of 207 malicious LLM applications to the 45 malicious prompts.
  - [MallaJailbreak](https://github.com/idllresearch/malicious-gpt/tree/main/jailbreak): 182 jailbreak prompts used by 200 malicious LLM applications.
  - [ULLM-QA](https://github.com/idllresearch/malicious-gpt/tree/main/LLM_responses): a large benchmark dataset of 33,996 prompt-response pairs generated by GPT-3.5, Davinci-002, Davinci-003, GPT-J, Luna AI Llama2 Uncensored, and Pygmalion-13B.
- More Data
  - [List of malicious LLM applications (Mallas)](https://github.com/idllresearch/malicious-gpt/tree/main/malicious_LLM_name_list): a collection of malicious LLM applications, of which 22 come from underground marketplaces, 125 from Poe.com, and 73 from FlowGPT.com.
  - [LLM-related keywords](https://github.com/idllresearch/malicious-gpt/blob/main/keywords/LLM_keywords.txt): a collection of 145 LLM-related keywords.
  - [Malicious LLM topic keywords](https://github.com/idllresearch/malicious-gpt/blob/main/keywords/malla_services_topic_keywords.txt): a collection of 73 malicious LLM topic keywords.
  - [Running screenshots of Mallas](https://github.com/idllresearch/malicious-gpt/blob/main/screenshots_and_ads/running_screenshot): a collection of running screenshots of 7 malicious LLM applications.
  - [Promotional materials used in Malla ads](https://github.com/idllresearch/malicious-gpt/blob/main/screenshots_and_ads/ad): a collection of 4 GIFs and 3 posters used in advertisements for malicious LLM applications.
- Media Coverage

## Environment

Build the environment using [requirements.txt](./requirements.txt) and the following commands.

```
conda create -n malla python=3.8
conda activate malla
pip install -r requirements.txt
```

## Main Claim 1: [Quality evaluation of Malla-generated content](https://github.com/idllresearch/malicious-gpt/tree/main/quality)

```
The quality evaluation results of Malla-generated content across the different metrics are the same as, or very close to, those reported in Table 3.
```

### Execution

The code is stored in the ["quality"](https://github.com/idllresearch/malicious-gpt/tree/main/quality) folder. Below, we use Malla **service** as the example. To evaluate the Mallas on **poe** and **flowgpt**, follow the same steps (for the steps used to check Mallas on Poe and FlowGPT, and an introduction to them, see [`quality/MoreGuide.md`](https://github.com/idllresearch/malicious-gpt/blob/main/quality/MoreGuide.md)).

#### *Step 1. Check the performance on each metric*

**Step 1.1: Code format compliance (F), compilability (C), and validity (V)**

- Check the format compliance and compilability of Python code.
  - Input: the `malicious_LLM_responses/service` folder, which contains 25 raw data files.
  - Output: the `quality/services/Python/results` folder, which stores 25 output files named `synPython_QA-XXX-X.json` (e.g., `synPython_QA-BadGPT-1.json`).

```
cd ./malicious-gpt/quality/services/Python
python scanner.py
```

- Check the format compliance and compilability of C/C++ code.
  - Input: the `malicious_LLM_responses/service` folder.
  - Output: the `quality/services/C++/results` folder, which stores 25 output files named `synC++_QA-XXX-X.json` (e.g., `synC++_QA-BadGPT-1.json`).

```
cd ./malicious-gpt/quality/services/C++
python scanner.py
```

- Check the format compliance and validity of HTML code and pages. The generated web pages are stored in the *services/HTML/html-results* folder.
  - Input: the `malicious_LLM_responses/service` folder.
  - Output: the `quality/services/HTML/results` folder, which stores 25 output files named `synHTML_QA-XXX-X.json` (e.g., `synHTML_QA-BadGPT-1.json`).
  - Note: the run may be interrupted by an error caused by sending requests to the API too frequently. Wait a while and rerun the script.

```
cd ./malicious-gpt/quality/services/HTML
python scanner.py
```

- **Summarize the results above** into files in the **CodeSyn** folder.
  - Input: the folders `quality/services/Python/results`, `quality/services/C++/results`, and `quality/services/HTML/results`.
  - Output: the `quality/services/CodeSyn` folder, which stores 25 output files named `synFinal_QA-XXX-X.json` (e.g., `synFinal_QA-BadGPT-1.json`).

```
cd ./malicious-gpt/quality/services/
python sumCompilable.py
```

**Step 1.2: Code evasiveness (E)**

- Check the evasiveness of Python, C/C++, and HTML code against virus detectors. The files are generated in the **codeDetection** folder.
  - Input: the `malicious_LLM_responses/service` folder.
  - Output: the `quality/services/codeDetection` folder, which stores 25 output files named `VT_QA-XXX-X.json` (e.g., `VT_QA-BadGPT-1.json`).
  - Note: before running the code, add your VirusTotal API key to **VTscanner.py**. The VirusTotal API is free but limits the query rate. We provide the following VirusTotal API keys but do not guarantee that they still work.
    - dbd288d2f3dd1f1dec3b3b1462e8f8598e9ad74fa92b86d47848488d607371bc
    - a23e2c605b96dfd600217c04d25650e3680ac6ab82201c1e88279637877eaeac
    - 36fe08222b6791270d44d9f2c76d2a1556b8233912e496c945235334df4ca970
  - Note: the run may be interrupted by an error caused by sending requests to the API too frequently. Wait a while and rerun the script.

```
cd ./malicious-gpt/quality/services/VirusTotalDetect
python VTscanner.py
```

**Step 1.3: Email format compliance (F) and readability (R)**

- Check the format compliance and readability of emails. The files are generated in the **mailFluency** folder.
  - Environment note: make sure the `textstat` package, version `0.7.3`, is installed in your Python environment.
  - Input: the `malicious_LLM_responses/service` folder.
  - Output: the `quality/services/mailFluency` folder, which stores 25 output files named `fogemail_QA-XXX-X.json` (e.g., `fogemail_QA-BadGPT-1.json`).

```
cd ./malicious-gpt/quality/services/fluency
python fluency_scanner.py
```

**Step 1.4: Email evasiveness (E)**

- Check the evasiveness of emails against phishing detectors. The files are generated in the **mailDetection** folder.
  - Input: the `malicious_LLM_responses/service` folder.
  - Output: the `quality/services/mailDetection` folder, which stores 25 output files named `oop_QA-XXX-X.json` (e.g., `oop_QA-BadGPT-1.json`).
  - Note: before running the code, add your OOPSpam API key to **oopspam_detect.py**. The OOPSpam API is **not free**.

```
cd ./malicious-gpt/quality/services/OOPSpamDetect
python scanner.py
```

#### *Step 2. Summarize the check results of each metric*

Run the following script to obtain the summarized results.

- Input: the folders `quality/services/CodeSyn`, `quality/services/codeDetection`, `quality/services/mailFluency`, and `quality/services/mailDetection`.
- Returns: the final summarized results.

```
cd ./malicious-gpt/quality/services
python quality_evaluation.py
```

### Dataset

**Step 1:** The input is the content generated by the malicious LLMs, stored in [malicious_LLM_responses](https://github.com/idllresearch/malicious-gpt/tree/main/malicious_LLM_responses).

**Step 2:** After running Step 1, you will have four subfolders: `codeSyn`, `codeDetection`, `mailFluency`, and `mailDetection`. We also provide the Step 1 results as intermediate results at:

- Evaluation of Malla services: https://github.com/idllresearch/malicious-gpt/tree/main/quality/services
- Evaluation of Malla projects on Poe: https://github.com/idllresearch/malicious-gpt/tree/main/quality/poe
- Evaluation of Malla projects on FlowGPT: https://github.com/idllresearch/malicious-gpt/tree/main/quality/flowgpt

### Results

**Malla services**

The script is expected to print:

```
BadGPT
Malicious code -> F: 0.35, C: 0.22, E: 0.19 | Mail -> F: 0.80, R: 0.13, E: 0.00 | Website -> F: 0.20, V: 0.13, E: 0.13
-----
CodeGPT
Malicious code -> F: 0.52, C: 0.29, E: 0.22 | Mail -> F: 0.53, R: 0.27, E: 0.00 | Website -> F: 0.20, V: 0.13, E: 0.13
-----
DarkGPT
Malicious code -> F: 1.00, C: 0.65, E: 0.63 | Mail -> F: 1.00, R: 0.87, E: 0.13 | Website -> F: 0.80, V: 0.33, E: 0.33
-----
EscapeGPT
Malicious code -> F: 0.78, C: 0.67, E: 0.67 | Mail -> F: 1.00, R: 0.50, E: 0.25 | Website -> F: 1.00, V: 1.00, E: 1.00
-----
EvilGPT
Malicious code -> F: 1.00, C: 0.54, E: 0.51 | Mail -> F: 1.00, R: 0.93, E: 0.27 | Website -> F: 0.80, V: 0.20, E: 0.13
-----
FreedomGPT
Malicious code -> F: 0.90, C: 0.21, E: 0.21 | Mail -> F: 1.00, R: 0.87, E: 0.13 | Website -> F: 0.60, V: 0.00, E: 0.00
-----
MakerGPT
Malicious code -> F: 0.24, C: 0.11, E: 0.11 | Mail -> F: 0.07, R: 0.00, E: 0.00 | Website -> F: 0.20, V: 0.13, E: 0.13
-----
WolfGPT
Malicious code -> F: 0.89, C: 0.52, E: 0.52 | Mail -> F: 1.00, R: 1.00, E: 0.67 | Website -> F: 0.67, V: 0.13, E: 0.13
-----
XXXGPT
Malicious code -> F: 0.14, C: 0.05, E: 0.05 | Mail -> F: 0.07, R: 0.00, E: 0.00 | Website -> F: 0.40, V: 0.27, E: 0.27
-----
```

**Malla projects on Poe**

The script is expected to print:

```
Quality of content generated by Mallas on Poe.com
Malicious code: F: 0.37+-0.26, C: 0.25+-0.18, E: 0.24+-0.16
Email: F: 0.44+-0.29, R: 0.21+-0.20, E: 0.05+-0.08
Web: F: 0.32+-0.22, V: 0.21+-0.19, E: 0.21+-0.19
```

**Malla projects on FlowGPT**

The script is expected to print:

```
Quality of content generated by Mallas on FlowGPT.com
Malicious code: F: 0.44+-0.29, C: 0.29+-0.19, E: 0.28+-0.18
Email: F: 0.37+-0.31, R: 0.21+-0.21, E: 0.04+-0.07
Web: F: 0.24+-0.27, V: 0.19+-0.24, E: 0.19+-0.24
```

## Main Claim 2: [Authorship attribution classification](https://github.com/idllresearch/malicious-gpt/tree/main/authorship)

```
Our attribution classifier correctly identifies the backends of DarkGPT, EscapeGPT, and FreedomGPT as Davinci-003, GPT-3.5, and Luna AI Llama2 Uncensored, respectively, as described in §6.1. The K-fold cross-validation results are very close to those shown in Section 6.1.
```

### Execution

The code is stored in the ["authorship"](https://github.com/idllresearch/malicious-gpt/tree/main/authorship) folder. Run the following script:

```
python author.py
```

### Dataset

Training set: you can obtain the training data in either of the following ways.

* https://github.com/idllresearch/malicious-gpt/blob/main/authorship/data/training_data.zip (unzip the file and put the training data in the **data** folder)
* https://drive.google.com/drive/folders/1ZhSL_6ze3tEfQ6QikoMil1zzwQgheWlx (put the training data in the **data** folder)

Test set: https://github.com/idllresearch/malicious-gpt/tree/cdf8af3f68c3dbdfbcab8531274dfab6ed0c21c0/authorship

### Results

With five-fold cross-validation, the script prints a precision and recall of 0.87. For identifying the backend LLMs of DarkGPT, EscapeGPT, and FreedomGPT, the identification results obtained with the pre-trained model are printed as follows:

```
Identified Backend:
Backends of DarkGPT -> Davinci_003
Backends of FreedomGPT -> Luna_AI_Llama2_Uncensored
Backends of EscapeGPT -> ChatGPT_3.5.
```

## Main Claim 3: [Evaluation of the "ignore the above instructions" prompt leaking attack](https://github.com/idllresearch/malicious-gpt/tree/main/jailbreak_prompt_uncovering)

```
93.01% (133/143) of the jailbreak prompts are uncovered. The average Jaro-Winkler similarity and semantic textual similarity between the visible jailbreak prompts and the corresponding reference jailbreak prompts in the ground-truth dataset are 0.88 and 0.83, respectively, as detailed in Section 6.1.
```

### Execution

The code is stored in the ["jailbreak_prompt_uncovering"](https://github.com/idllresearch/malicious-gpt/tree/main/jailbreak_prompt_uncovering) folder. Run the following script:

```
python uncoveringMeasure.py
```

### Dataset

Download the ground-truth dataset here: https://github.com/idllresearch/malicious-gpt/tree/main/jailbreak_prompt_uncovering/Poe%2BFlowGPT_visible-groundtruth.json

### Results

The attack success rate (**93.01%**), the average semantic textual similarity (**0.83**), and the average Jaro-Winkler similarity (**0.88**) are displayed and match the figures given in the paper.

## Other Claims

Our repository provides four datasets.

**[malPrompt](https://github.com/idllresearch/malicious-gpt/tree/main/mal_prompts)**: the 45 malicious prompts we collected from Malla listings, covering malicious code generation, phishing email drafting, and phishing website creation.

**[MallaResponse](https://github.com/idllresearch/malicious-gpt/tree/main/malicious_LLM_responses)**: the responses of 9 Malla services and 198 Malla projects to the 45 malicious prompts.

**[MallaJailbreak](https://github.com/idllresearch/malicious-gpt/tree/main/jailbreak)**: the 182 jailbreak prompts we collected or uncovered from 200 Malla services and projects.

**[ULLM-QA](https://github.com/idllresearch/malicious-gpt/tree/main/LLM_responses)**: a large dataset of 33,996 prompt-response pairs generated by GPT-3.5, Davinci-002, Davinci-003, GPT-J, Luna AI Llama2 Uncensored, and Pygmalion-13B, of which 15,114 relate to generating malicious code in Python or in an unspecified language.

## Important Data of Malicious LLM Applications

**[List of malicious LLM applications](https://github.com/idllresearch/malicious-gpt/tree/main/malicious_LLM_name_list)**: a collection of malicious LLM applications, of which 22 come from underground marketplaces, 125 from Poe.com, and 73 from FlowGPT.com.

**[LLM-related keywords](https://github.com/idllresearch/malicious-gpt/blob/main/keywords/LLM_keywords.txt)**: a collection of 145 LLM-related keywords.

**[Malicious LLM topic keywords](https://github.com/idllresearch/malicious-gpt/blob/main/keywords/malla_services_topic_keywords.txt)**: a collection of 73 malicious LLM topic keywords.

## Media Coverage

**Tech Policy Press** (January 18, 2024) [Studying Black Market for Large Language Models, Researchers Find OpenAI Models Power Malicious Services](https://www.techpolicy.press/studying-black-market-for-large-language-models-researchers-find-openai-models-power-malicious-services/)

**Le Monde** (February 22, 2024) [The dark side of AI: Chatbots developed by cyber criminals](https://www.lemonde.fr/en/science/article/2024/02/22/the-dark-side-of-ai-chatbots-developed-by-cyber-criminals_6550302_10.html)

**The Wall Street Journal** (February 28, 2024) [Welcome to the Era of BadGPTs](https://www.wsj.com/articles/welcome-to-the-era-of-badgpts-a104afa8)

**安全内参-网络安全首席知识官** (March 28, 2024) [Reassessing real-world malicious LLM services (重估现实中的恶意大模型服务)](https://www.secrss.com/articles/64772)

**AI Incident Database** (June 27, 2024) [Underground Market for LLMs Powers Malware and Phishing Scams](https://incidentdatabase.ai/cite/736/)

**Fast Company** (September 5, 2024) [The underground world of black-market AI chatbots is thriving](https://www.fastcompany.com/91184474/black-market-ai-chatbots-thriving)

**Cryptopolitan** (September 5, 2024) [A new wave of black-market chatbots emerges and thrives](https://www.cryptopolitan.com/a-new-wave-of-black-market-chatbots-emerges/)

## Citation

If you find the data and information above helpful for your research, please consider citing:

```
@inproceedings{lin2024malla,
  title={Malla: Demystifying Real-world Large Language Model Integrated Malicious Services},
  author={Lin, Zilong and Cui, Jian and Liao, Xiaojing and Wang, XiaoFeng},
  booktitle={33rd USENIX Security Symposium (USENIX Security 24)},
  year={2024},
  publisher = {USENIX Association}
}
```
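For readers unfamiliar with the metrics reported for the prompt-uncovering evaluation (attack success rate and average Jaro-Winkler similarity), the following is a minimal, illustrative sketch of how such numbers can be computed. It is not the repository's `uncoveringMeasure.py`: the function names, the input layout (a list of recovered/reference prompt pairs, with `None` for a failed attempt), and the choice to apply the prefix boost unconditionally are all assumptions made for illustration. Semantic textual similarity is not covered because it requires an embedding model.

```python
from typing import List, Optional, Tuple


def jaro(s1: str, s2: str) -> float:
    """Plain Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def summarize(pairs: List[Tuple[Optional[str], str]]) -> Tuple[float, float]:
    """Return (success rate, mean Jaro-Winkler over recovered prompts).

    Hypothetical input layout: each pair is (recovered_prompt, reference),
    where recovered_prompt is None if the attack failed for that application.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    recovered = [(got, ref) for got, ref in pairs if got is not None]
    success_rate = len(recovered) / len(pairs)
    scores = [jaro_winkler(got, ref) for got, ref in recovered]
    average = sum(scores) / len(scores) if scores else 0.0
    return success_rate, average


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
    print(round(133 / 143 * 100, 2))  # 93.01, the success rate quoted above
```

A pure-Python implementation keeps the sketch dependency-free; a string-similarity library may use slightly different conventions (for example, applying the prefix boost only above a threshold), so scores can differ marginally from those printed by the repository's script.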

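Tags: DLL hijacking, ESC8, FraudGPT, prompt injection, USENIX Security, WormGPT, AI abuse, domain collection, large language models, attribution analysis, malicious AI services, malicious prompts, malware, generative AI, social engineering, index, cybercrime, jailbreak prompts, reverse-engineering tools, black-hat SEO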