Hunyuan-PromptEnhancer/PromptEnhancer

GitHub: Hunyuan-PromptEnhancer/PromptEnhancer

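PromptEnhancer is a prompt rewriting tool that optimizes input prompts for text-to-image and image-to-image tasks in order to improve image generation quality.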

Stars: 3729 | Forks: 325

# PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting

[**Linqing Wang**](https://scholar.google.com/citations?hl=en&view_op=list_works&gmla=AH8HC4z9rmDHYjp5o28xKk8U4ddD_n7BuMnk8UZFP-jygFBtHUSz6pf-5FP32B_yKMpRU9VpDY3iT8eM0zORHA&user=Hy12lcEAAAAJ)† · [**Ximing Xing**](https://ximinng.github.io/) · Zhiyong Xu · [**Yiji Cheng**](https://scholar.google.com/citations?user=Plo8ZSYAAAAJ&hl=en) · Zhiyuan Zhao · Donghao Li · Tiankai Hang · Zhenxi Li · [**Jiale Tao**](https://scholar.google.com/citations?user=WF5DPWkAAAAJ&hl=en) · [**QiXun Wang**](https://github.com/wangqixun) · [**Ruihuang Li**](https://scholar.google.com/citations?user=8CfyOtQAAAAJ&hl=en) · Comi Chen · Xin Li · [**Mingrui Wu**](https://scholar.google.com/citations?user=sbCKwnYAAAAJ&hl=en) · Xinchi Deng · Shuyang Gu · [**Chunyu Wang**](https://scholar.google.com/citations?user=VXQV5xwAAAAJ&hl=en)\* · [**Qinglin Lu**](https://luqinglin.weebly.com/)

Tencent Hunyuan

†Project lead · \*Corresponding author


## Overview

Hunyuan-PromptEnhancer is a prompt rewriting tool that **supports both text-to-image generation and image-to-image editing**. It restructures the input prompt while preserving the original intent, producing clearer, more structured prompts for downstream image generation tasks.

## 🔥🔥🔥 Changelog

- [2025-10-11] ✨ Released the [PromptEnhancer-32B gradio](https://huggingface.co/spaces/PromptEnhancer/PromptEnhancer_32B) demo.
- [2025-09-30] ✨ Released the [PromptEnhancer-Img2Img editing model](https://huggingface.co/PromptEnhancer/PromptEnhancer-Img2img-Edit).
- [2025-09-22] 🚀 Thanks to @mradermacher for adding **GGUF model support**, enabling efficient inference with quantized models!
- [2025-09-18] ✨ Try [PromptEnhancer-32B](https://huggingface.co/PromptEnhancer/PromptEnhancer-32B) for higher-quality prompt enhancement!
- [2025-09-16] ✨ Released the [T2I-Keypoints-Eval dataset](https://huggingface.co/datasets/PromptEnhancer/T2I-Keypoints-Eval).
- [2025-09-07] ✨ Released the [PromptEnhancer-7B model](https://huggingface.co/tencent/HunyuanImage-2.1/tree/main/reprompt).
- [2025-09-07] ✨ Released the [technical report](https://arxiv.org/abs/2509.04545).

## Installation

### Option 1: Standard installation (recommended)

```bash
pip install -r requirements.txt
```

## Model Download

### 🎯 Quick Start

For most users, we recommend starting with the **PromptEnhancer-7B** model:

```bash
# Download PromptEnhancer-7B (13GB) - best balance of quality and efficiency.
# The 7B weights live in the reprompt/ folder of the tencent/HunyuanImage-2.1 repo,
# so the folder is selected with --include (a repo ID cannot contain a sub-path).
# The files land in ./models/promptenhancer-7b/reprompt
huggingface-cli download tencent/HunyuanImage-2.1 --include "reprompt/*" --local-dir ./models/promptenhancer-7b
```

### 📊 Model Comparison and Selection Guide

| Model | Size | Quality | VRAM required | Best for |
|-------|------|---------|---------------|----------|
| **PromptEnhancer-7B** | 13GB | High | 8GB+ | Most users, balanced performance |
| **PromptEnhancer-32B** | 64GB | Highest | 32GB+ | Research use, maximum quality |
| **32B-Q8_0 (GGUF)** | 35GB | Highest | 35GB+ | High-end GPUs (H100, A100) |
| **32B-Q6_K (GGUF)** | 27GB | Excellent | 27GB+ | RTX 4090, RTX 5090 |
| **32B-Q4_K_M (GGUF)** | 20GB | Good | 20GB+ | RTX 3090, RTX 4080 |

### Standard Models (full precision)

```bash
# PromptEnhancer-7B (recommended for most users); files land in ./models/promptenhancer-7b/reprompt
huggingface-cli download tencent/HunyuanImage-2.1 --include "reprompt/*" --local-dir ./models/promptenhancer-7b

# PromptEnhancer-32B (for users who want the highest quality)
huggingface-cli download PromptEnhancer/PromptEnhancer-32B --local-dir ./models/promptenhancer-32b

# PromptEnhancer-Img2Img-Edit (for image editing tasks)
huggingface-cli download PromptEnhancer/PromptEnhancer-Img2img-Edit --local-dir ./models/promptenhancer-img2img-edit
```

### GGUF Models (quantized, memory efficient)

Choose one of the following according to your GPU memory:

```bash
# Q8_0: highest quality (35GB)
huggingface-cli download mradermacher/PromptEnhancer-32B-GGUF PromptEnhancer-32B.Q8_0.gguf --local-dir ./models

# Q6_K: excellent quality (27GB) - recommended for RTX 4090
huggingface-cli download mradermacher/PromptEnhancer-32B-GGUF PromptEnhancer-32B.Q6_K.gguf --local-dir ./models

# Q4_K_M: good quality (20GB) - recommended for RTX 3090/4080
huggingface-cli download mradermacher/PromptEnhancer-32B-GGUF PromptEnhancer-32B.Q4_K_M.gguf --local-dir ./models
```

## Quick Start

### Using HunyuanPromptEnhancer (text-to-image)

```python
from inference.prompt_enhancer import HunyuanPromptEnhancer

# Folder that contains the downloaded 7B model files
models_root_path = "./models/promptenhancer-7b/reprompt"

enhancer = HunyuanPromptEnhancer(models_root_path=models_root_path, device_map="auto")

# Enhance a prompt (Chinese or English)
user_prompt = "Third-person view, a race car speeding on a city track..."
new_prompt = enhancer.predict(
    prompt_cot=user_prompt,
    # Default system prompt is tailored for image prompt rewriting; override if needed
    temperature=0.7,  # >0 enables sampling; 0 uses deterministic generation
    top_p=0.9,
    max_new_tokens=256,
)

print("Enhanced:", new_prompt)
```

### Using PromptEnhancerImg2Img (image editing)

For image-to-image editing tasks, where the editing instruction is enhanced based on the input image:

```python
from inference.prompt_enhancer_img2img import PromptEnhancerImg2Img

# Initialize the image-to-image prompt enhancer
enhancer = PromptEnhancerImg2Img(
    model_path="./models/your-model",
    device_map="auto"
)

# Enhance the editing instruction using the image as context
edit_instruction = "Remove the watermark from the bottom"
image_path = "./examples/sample_image.png"

enhanced_prompt = enhancer.predict(
    edit_instruction=edit_instruction,
    image_path=image_path,
    temperature=0.1,
    top_p=0.9,
    max_new_tokens=2048
)

print("Enhanced editing prompt:", enhanced_prompt)
```

### Using GGUF Models (quantized, faster)

```python
from inference.prompt_enhancer_gguf import PromptEnhancerGGUF

# Auto-detects a Q8_0 model in the models/ folder
enhancer = PromptEnhancerGGUF(
    model_path="./models/PromptEnhancer-32B.Q8_0.gguf",  # Optional: auto-detected
    n_ctx=1024,        # Context window size
    n_gpu_layers=-1,   # Use all GPU layers
)

# Enhance a prompt
user_prompt = "woman in jungle"
enhanced_prompt = enhancer.predict(
    user_prompt,
    temperature=0.3,
    top_p=0.9,
    max_new_tokens=512,
)

print("Enhanced:", enhanced_prompt)
```

### Command-Line Usage (GGUF)

```bash
# Simple usage - auto-detects the model in the models/ folder
python inference/prompt_enhancer_gguf.py

# Or specify the model path
GGUF_MODEL_PATH="./models/PromptEnhancer-32B.Q8_0.gguf" python inference/prompt_enhancer_gguf.py
```

## GGUF Model Advantages

🚀 **Why use GGUF models?**

- **Memory efficient**: 50-75% lower VRAM usage than the full-precision models
- **Faster inference**: optimized for CPU and GPU acceleration via llama.cpp
- **Quality preserved**: Q8_0 and Q6_K retain excellent output quality
- **Easy deployment**: single-file format with no complex dependencies
- **GPU acceleration**: full CUDA support for high-performance inference

| Model | Size | Quality | VRAM usage | Best for |
|-------|------|---------|------------|----------|
| Q8_0 | 35GB | Highest | ~35GB | High-end GPUs (H100, A100) |
| Q6_K | 27GB | Excellent | ~27GB | RTX 4090, RTX 5090 |
| Q4_K_M | 20GB | Good | ~20GB | RTX 3090, RTX 4080 |

## Usage Comparison

| Model | Input type | Use case | Model backend |
|-------|------------|----------|---------------|
| **HunyuanPromptEnhancer** | Text only | Text-to-image generation | Transformers (7B/32B) |
| **PromptEnhancerImg2Img** | Text + image | Image editing tasks | Transformers (32B) |
| **PromptEnhancerGGUF** | Text only | Memory-efficient text-to-image | llama.cpp (quantized) |

## Parameters

### Standard Models (Transformers)

- `models_root_path`: local path or repo ID; supports `trust_remote_code` models.
- `device_map`: device mapping (default `auto`).
- `predict(...)`:
  - `prompt_cot` (str): the input prompt to rewrite.
  - `sys_prompt` (str): optional system prompt; a default tailored to image prompt rewriting is provided.
  - `temperature` (float): `>0` enables sampling; `0` gives deterministic generation.
  - `top_p` (float): nucleus sampling threshold (takes effect when sampling).
  - `max_new_tokens` (int): maximum number of new tokens to generate.

### GGUF Models

- `model_path` (str): path to the GGUF model file (auto-detected if it is in the models/ folder).
- `n_ctx` (int): context window size (default: 8192; 1024 is recommended for short prompts).
- `n_gpu_layers` (int): number of layers to offload to the GPU (-1 means all layers).
- `verbose` (bool): enable verbose llama.cpp logging.

### Image-to-Image Model (PromptEnhancerImg2Img)

- `model_path` (str): path to the pretrained Qwen2.5-VL model.
- `device_map` (str): device mapping used when loading the model (default: `auto`).
- `predict(...)`:
  - `edit_instruction` (str): the original editing instruction.
  - `image_path` (str): path to the input image file.
  - `sys_prompt` (str): optional system prompt (the default is used if None).
  - `temperature` (float): sampling temperature (default: 0.1).
  - `top_p` (float): nucleus sampling threshold (default: 0.9).
  - `max_new_tokens` (int): maximum number of tokens to generate (default: 2048).

## Citation

If you find this project useful, please consider citing:

```bibtex
@article{promptenhancer,
  title={PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting},
  author={Wang, Linqing and Xing, Ximing and Cheng, Yiji and Zhao, Zhiyuan and Li, Donghao and Hang, Tiankai and Li, Zhenxi and Tao, Jiale and Wang, QiXun and Li, Ruihuang and Chen, Comi and Li, Xin and Wu, Mingrui and Deng, Xinchi and Gu, Shuyang and Wang, Chunyu and Lu, Qinglin},
  journal={arXiv preprint arXiv:2509.04545},
  year={2025}
}
```

## Acknowledgements

We thank the following open-source projects and communities for their contributions to open research and exploration: [Transformers](https://huggingface.co/transformers) and [HuggingFace](https://huggingface.co).

## Contact

If you would like to leave a message for our R&D and product teams, feel free to contact our open-source team. You can also reach us by email (hunyuan_opensource@tencent.com).
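As a supplementary note on the GGUF selection table in this README: the choice of quantization by available GPU memory can be sketched as a small lookup. This is only an illustration, not part of the project's code. The helper names `pick_gguf_variant` and `gguf_filename` are hypothetical; the thresholds are the VRAM figures stated in the table, and the file name pattern follows the download commands shown earlier.

```python
from typing import Optional

# (variant, VRAM in GB) pairs copied from the README's GGUF table,
# ordered from highest to lowest quality.
GGUF_VARIANTS = [
    ("Q8_0", 35),
    ("Q6_K", 27),
    ("Q4_K_M", 20),
]


def pick_gguf_variant(vram_gb: float) -> Optional[str]:
    """Return the highest-quality listed variant whose stated VRAM need fits.

    Hypothetical helper: returns None when even Q4_K_M does not fit.
    """
    for name, required_gb in GGUF_VARIANTS:
        if vram_gb >= required_gb:
            return name
    return None


def gguf_filename(variant: str) -> str:
    """Build the file name used in the README's download commands."""
    return f"PromptEnhancer-32B.{variant}.gguf"


if __name__ == "__main__":
    variant = pick_gguf_variant(32)
    if variant is not None:
        print(gguf_filename(variant))
```

The returned file name can be passed as the second argument to `huggingface-cli download mradermacher/PromptEnhancer-32B-GGUF`, or joined with `./models/` and used as `model_path` for `PromptEnhancerGGUF`.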
