Security-FIT/Latium

GitHub: Security-FIT/Latium

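A research framework for knowledge editing and edit detection in large language models, built around the ROME method. It provides a complete experimental pipeline, from weight interventions and causal tracing to structural detection.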


# Latium framework

## Quick start

Use `pipeline.sh` to run the standard end-to-end workflow:

```
# On a GPU host: run a ROME-only smoke benchmark on one model
bash pipeline.sh --models gpt2-medium --n 1 --compute-cov

# On a GPU host: structural benchmark + detector/new-graph post-processing
bash pipeline.sh --structural --models gpt2-medium --n 1 --compute-cov

# Optional: orchestrate the same run from another machine over SSH
bash pipeline.sh --remote user@gpu-host --structural --models gpt2-medium --n 1

# Show all available pipeline options
bash pipeline.sh --help
```

`pipeline.sh` runs locally by default. The default mode runs the ROME-only benchmark through `rome_benchmark.py`; `--structural` switches to the structural benchmark and adds the detector and new-graph post-processing. Add `--remote ` when you want to launch the same workflow from another machine over SSH/tmux. Graphs produced by the structural pipeline are saved under `pipeline_out//graphs/`.

## Running ROME

ROME (and related commands) is driven by the Hydra-based CLI in `src/cli.py`.

**Single intervention:**

```
python -m src.cli command=rome model=gpt2-medium
```

**Batch evaluation:**

```
python -m src.cli command=batch-rome model=gpt2-medium
```

**Compute second-moment statistics** (required before running ROME on a new model):

```
python -m src.cli command=second-moment model=gpt2-medium
```

**Cluster-local smoke test on a GPU host:**

```
# First cold run: download the model/datasets locally and build the second moments
python -m src.cli command=second-moment model=gpt2-medium

# Single local edit smoke test
python -m src.cli command=rome model=gpt2-medium

# Local ROME-only benchmark (the same benchmark family used by the pipeline.sh default mode)
python rome_benchmark.py --models gpt2-medium --n-tests 1 --start-idx 0 --output-dir ./analysis_out_local_rome
```

Notes:

- You can use other models, e.g. gpt2-xl, qwen3-4b, etc. (see the model roadmap below for the full list).
- `python -m src.cli command=rome ...` does **not** compute missing second moments automatically unless `ROME_ALLOW_SECOND_MOMENT_AUTOCOMPUTE=1` is set.
- By default, model downloads are cached under `../models`, dataset downloads under `../datasets`, and computed covariance files are saved under `./data/second_moment_stats`.
- On a GPU host, a truly cold first run can stay silent for several minutes while `command=second-moment` downloads resources and builds the covariance files.

The default configuration lives in `src/config/config.yaml`. Override any value on the command line with Hydra syntax (e.g. `model=gpt2-large`). Alternatively, use the console fallback (no Hydra overhead):

```
python -m src.cli --console rome --config src/config/config.yaml
```

## Running causal tracing

```
python -m src.cli command=causal-trace model=gpt2-medium
```

To inspect the computed noise multiplier without running a full trace:

```
python -m src.cli command=compute-multiplier model=gpt2-medium
```
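The second-moment statistics are required before a ROME edit because, in the ROME paper, the MLP edit is a closed-form rank-one update weighted by the uncentered key covariance C = E[k kᵀ]. This sketch assumes the cached `second_moment_stats` files correspond to that C. It runs on plain NumPy arrays and is not the code in `src/rome/`. The function name `rome_rank_one_update` and all shapes are illustrative.

```python
import numpy as np


def rome_rank_one_update(W, C, k_star, v_star):
    """Sketch of the closed-form rank-one edit described in the ROME paper.

    W      : (d_v, d_k) weight of the edited MLP projection, used as v = W @ k
    C      : (d_k, d_k) uncentered second moment E[k k^T] of the layer's input keys
    k_star : (d_k,) key vector that represents the subject
    v_star : (d_v,) value vector that encodes the new fact
    """
    # Solve C u = k* rather than forming C^{-1} explicitly.
    u = np.linalg.solve(C, k_star)
    # Residual the layer must add at k*, scaled so that W_hat @ k* == v*.
    lam = (v_star - W @ k_star) / (u @ k_star)
    # Rank-one update: outer product of the residual and C^{-1} k*.
    return W + np.outer(lam, u)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_k, d_v = 8, 4
    K = rng.normal(size=(d_k, 1000))   # sampled keys, one per column
    C = K @ K.T / K.shape[1]           # empirical uncentered second moment
    W = rng.normal(size=(d_v, d_k))
    k_star, v_star = rng.normal(size=d_k), rng.normal(size=d_v)
    W_hat = rome_rank_one_update(W, C, k_star, v_star)
    print(np.allclose(W_hat @ k_star, v_star))  # True: the edited layer maps k* to v*
```

The update direction C⁻¹k* is what keeps the change small on the keys summarized by C. Without the statistics, the edit direction cannot be formed, which matches the CLI refusing to run `command=rome` when they are missing.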
## Remote covariance pipeline

`covariance_a100_remote.sh` computes second-moment statistics on a remote GPU node (e.g. an A100) and pulls the resulting artifacts back to the local machine.

```
# Run with the default models (deepseek-7b-base, granite4-micro, llama2-7b, mistral-7b-v0.1, mistral-7b-v0.3):
./covariance_a100_remote.sh user@gpu-host

# Override the models:
MODEL_KEYS="gpt2-xl gpt-j-6b" ./covariance_a100_remote.sh user@gpu-host /path/to/Latium optim latium
# Arguments: [remote_repo_path] [remote_branch] [conda_env]
```

The script syncs the model configs and `src/rome/common.py` to the remote host, runs the covariance computation for each model, and downloads the `.pt` artifacts into `data/second_moment_stats/`.

## Layer-selection heuristic

`src/causal_trace/layer_heuristic.py` recommends the best MLP layer for a ROME edit by combining several signals (causal tracing, weight norms, spectral gap, architectural priors).

```
# CSV only (no GPU required):
python -m src.causal_trace.layer_heuristic \
  --csvs analysis_out/causal_trace_deepseek*.csv \
  --num-layers 30

# Full analysis (GPU + model):
python -m src.causal_trace.layer_heuristic \
  --model deepseek-ai/deepseek-llm-7b-base \
  --layer-template 'model.layers.{}.mlp.down_proj' \
  --num-layers 30 \
  --csvs analysis_out/causal_trace_deepseek*.csv
```

## Running the structural benchmark

`structural_benchmark.py` applies ROME edits across the dataset and evaluates all structural detectors (MSD, blind MSD, spectral, IPR) on the modified weights. Results are written as JSON to `analysis_out/`. For the post-hoc detectors and the lightweight payload used by `paper_graphs.ipynb`, run `structural_benchmark.py --posthoc-only ...` or `structural_benchmark.py --paper ...` (`--analysis-profile paper` remains the underlying profile name).

```
python structural_benchmark.py \
  --model gpt2-large \
  --n-tests 30 \
  --start-idx 0 \
  --output-dir ./analysis_out \
  --spectral-top-k 50 \
  --trim-first-layers 2 \
  --trim-last-layers 2 \
  --spectral-neighbor-layers 1
```

Core parameters:

| Parameter | Default | Description |
|---|---|---|
| `--model` | `gpt2-large` | Model name (must match a config in `src/config/model/`) |
| `--n-tests` | `30` | Number of ROME edits to benchmark |
| `--start-idx` | `0` | Start index in the facts dataset |
| `--output-dir` | `./analysis_out` | Output directory for JSON result files |
| `--spectral-top-k` | `50` | Top-K singular values used by the spectral detector |
| `--trim-first-layers` | `2` | Number of layers excluded from the start of the model |
| `--trim-last-layers` | `2` | Number of layers excluded from the end of the model |
| `--n-prompts` | auto | Number of ROME prefix prompts (scaled by model size if omitted) |
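The `--spectral-top-k` and `--trim-*-layers` parameters hint at how a spectral signal can be scored per layer. The hypothetical `spectral_layer_scores` below is a rough illustration only. It does not reproduce the signals defined in `docs/spectral-docs.md`. It drops the trimmed layers, takes the top-k singular values of each remaining weight matrix, and z-scores a largest-to-mean ratio across layers. The synthetic demo, an injected large-norm rank-one update, is an assumption chosen to make the effect visible.

```python
import numpy as np


def spectral_layer_scores(weights, top_k=50, trim_first=2, trim_last=2):
    """Hypothetical per-layer spectral score (illustration only).

    weights : list of 2-D arrays, one candidate weight matrix per layer
    Returns {layer_index: z_score} for the layers kept after trimming.
    """
    kept = range(trim_first, len(weights) - trim_last)
    ratios = {}
    for i in kept:
        # Singular values are returned in descending order; keep the top-k.
        s = np.linalg.svd(weights[i], compute_uv=False)[:top_k]
        # How far the leading singular value sits above the rest of the top-k spectrum.
        ratios[i] = s[0] / s.mean()
    vals = np.array(list(ratios.values()))
    sd = vals.std() or 1.0  # avoid dividing by zero when all ratios are equal
    return {i: (r - vals.mean()) / sd for i, r in ratios.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(64, 32)) for _ in range(12)]
    a, b = rng.normal(size=64), rng.normal(size=32)
    # Inject a large-norm rank-one update into layer 6, mimicking an edit.
    layers[6] = layers[6] + 100 * np.outer(a / np.linalg.norm(a), b / np.linalg.norm(b))
    scores = spectral_layer_scores(layers)
    print(max(scores, key=scores.get))  # 6
```

Trimming happens before the z-score, so the excluded first and last layers cannot shift the cross-layer mean and standard deviation.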
## Detection documentation

Detailed documentation of the detection methods lives in the `docs/` directory:

- `docs/structural-docs.md`: structural detector metrics (L2 difference, relative difference, directional consistency, MSD, IPR, etc.)
- `docs/spectral-docs.md`: spectral detector signals and the math behind the singular-value z-scores and ratio scores

## Rebuilding the final paper graphs

If the `final_n500_bundle/` artifacts are present in the repository root:

```
bash scripts/bundle_graphs/run_all_graphs.sh --bundle-root final_n500_bundle
```

From a directory that contains the downloaded bundle:

```
bash final_n500_bundle/scripts_for_graphs/run_all_graphs.sh
```

The runner rebuilds the per-model paper graphs, the bundle summary graphs, the windowed detector reports, the cohort figures, and the artifact grids, and refreshes the bundle index.

## Model roadmap

| Supported model | Causal tracing | Weight intervention | Mean ES (n=500) | Notes |
|---|---|---|---|---|
| gpt2-medium | :heavy_check_mark: | :heavy_check_mark: | 0.988 | Working |
| gpt2-large | :heavy_check_mark: | :heavy_check_mark: | 0.986 | Working |
| gpt2-xl | :heavy_check_mark: | :heavy_check_mark: | 0.986 | Working |
| gpt-j-6b | :heavy_check_mark: | :heavy_check_mark: | 0.996 | Working |
| qwen3-0.6b | :heavy_check_mark: | :heavy_check_mark: | | |
| qwen3-1.7b | :heavy_check_mark: | :heavy_check_mark: | | |
| qwen3-4b | :heavy_check_mark: | :heavy_check_mark: | 0.992 | |
| qwen3-8b | :heavy_check_mark: | :heavy_check_mark: | 1.000 | |
| granite4-micro | :heavy_check_mark: | :heavy_check_mark: | 0.978 | Non-standard architecture |
| mistral-7b-v0.1 | :heavy_check_mark: | :heavy_check_mark: | 0.948 | |
| mistral-7b-v0.3 | :heavy_check_mark: | :heavy_check_mark: | 0.934 | |
| llama2-7b | :heavy_check_mark: | :heavy_check_mark: | 0.614 | Non-standard architecture |
| falcon-7b | :heavy_check_mark: | :heavy_check_mark: | 0.976 | |
| opt-6.7b | :heavy_check_mark: | :heavy_check_mark: | 0.978 | |
| deepseek-7b-base | :heavy_check_mark: | :heavy_check_mark: | 0.976 | |
| llama3 | | | | Planned |
| gpt-neo | | | | Planned |
| qwen2.5 | | | | Planned |
| baichuan | | | | Planned |
| chatglm | | | | Planned |
| t5 | | | | Planned |
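This sketch assumes the "Mean ES" column follows the Efficacy Score definition from the ROME paper: the fraction of edits for which the edited model assigns the new object a higher probability than the original object on the rewrite prompt. The `EditResult` record and its field names are hypothetical and are not the benchmark's JSON schema.

```python
from dataclasses import dataclass


@dataclass
class EditResult:
    """Hypothetical per-edit record: post-edit probabilities on the rewrite prompt."""
    p_new: float   # P(new object o* | prompt)
    p_true: float  # P(original object o_c | prompt)


def efficacy_score(results):
    """Fraction of edits where the new object outranks the original one."""
    if not results:
        raise ValueError("need at least one edit result")
    return sum(r.p_new > r.p_true for r in results) / len(results)


if __name__ == "__main__":
    demo = [EditResult(0.62, 0.05), EditResult(0.30, 0.41),
            EditResult(0.90, 0.01), EditResult(0.55, 0.20)]
    print(efficacy_score(demo))  # 3 of 4 edits succeed -> 0.75
```

Because ES only checks the ordering of two probabilities, a value such as 0.614 means roughly 39% of edits left the original object ranked higher. It does not measure how large the probability shift was.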
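## Pipeline script

`pipeline.sh` runs either the ROME-only benchmark or the structural benchmark. After cloning the repository, run it directly on a GPU host, or pass `--remote ` to sync the repository and launch the selected mode over SSH/tmux.

```
# Local ROME-only smoke benchmark on one model
bash pipeline.sh --models gpt2-medium --n 1 --compute-cov

# Local structural run with detector/new-graph processing
bash pipeline.sh --structural --models gpt2-medium --n 1 --compute-cov

# Local structural run that also rebuilds the final-bundle paper graphs if the bundle exists
bash pipeline.sh --structural --bundle-graphs --bundle-root final_n500_bundle

# Remote run with environment setup
bash pipeline.sh --remote ubuntu@132.145.129.234 --setup-env

# Remote structural benchmark, N=1 smoke test
bash pipeline.sh --remote user@gpu-host --models gpt2-medium --n 1 --structural

# Compute covariance first, then benchmark
bash pipeline.sh --compute-cov --n 10

# Only specific models
bash pipeline.sh --models gpt2-xl mistral-7b-v0.1 --n 5
```

For structural runs, the current renderer writes its output to `pipeline_out//graphs/`:

- `rome_success_metrics/`: saved ROME metric tables, heatmaps, and bar charts
- `detector_stacked_variants/`: stacked SG/TE detector signal panels
- `detector_layer_window/`: strict and +/- window detector layer scores

| Flag | Default | Description |
|---|---|---|
| `--compute-cov` | off | Compute covariance matrices (otherwise reuse existing ones) |
| `--n ` | 50 | Number of test edits per model |
| `--structural` | off | Run the structural benchmark and render the new single-run graph set under `pipeline_out//graphs/` |
| `--bundle-graphs` | off | After the structural run, rebuild graphs from `--bundle-root` |
| `--bundle-root ` | `final_n500_bundle` | Final bundle root directory used by `--bundle-graphs` |
| `--setup-env` | off | Set up the conda environment and dependencies on the remote host |
| `--remote ` | local (current host) | SSH target for remote execution |
| `--models ` | final paper model set | Override the model list |
| `--output-dir ` | `./pipeline_out` | Output directory |

## Prefix/template spectral variability test

`prefixtest/experiment.py` measures how sensitive the spectral detection pipeline is to the prefixes/templates used during a ROME edit. It edits a single fact 20 times under different prefix strategies (self-generated, template-based, external) while holding every other parameter fixed. The spectral detector is run on each result to produce per-layer signal curves, which show which prefixes amplify or suppress the spectral footprint of the edit. An additional **baseline_unedited** run captures the spectral detector output on the original (unmodified) model weights, so the edited curves can be compared against a clean noise baseline.

### Running the experiment

```
# Default: Qwen/Qwen3-8B, case 0
python prefixtest/experiment.py

# Custom model / case
python prefixtest/experiment.py --model gpt2-large --case-idx 3
```

### Running on a remote GPU via `prefixtest/run_remote.sh`

`prefixtest/run_remote.sh` automates upload, environment setup, and tmux-based execution on a remote machine:

```
# Launch (upload code + second-moment stats, install deps, start in tmux)
./prefixtest/run_remote.sh                 # default: Qwen/Qwen3-8B, case 0
./prefixtest/run_remote.sh gpt2-large 3    # custom model & case

# Monitor progress
./prefixtest/run_remote.sh --status

# Download results when finished
./prefixtest/run_remote.sh --fetch
```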
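Comparing an edited run against **baseline_unedited** amounts to subtracting the baseline curve layer by layer and checking where the difference peaks. This sketch assumes each run is a plain mapping from a run name to a list of per-layer detector values. That layout, the strategy labels in the demo, and the helper name `excess_over_baseline` are hypothetical; they are not the artifact format written by `prefixtest/experiment.py`.

```python
def excess_over_baseline(curves, baseline_key="baseline_unedited"):
    """For each edited run, return (peak_layer, peak_excess) over the unedited baseline.

    curves: dict mapping run name -> list of per-layer detector values (same length).
    """
    baseline = curves[baseline_key]
    summary = {}
    for name, curve in curves.items():
        if name == baseline_key:
            continue
        if len(curve) != len(baseline):
            raise ValueError(f"{name}: expected {len(baseline)} layers, got {len(curve)}")
        # Layer-wise excess of the edited signal over the clean noise floor.
        excess = [c - b for c, b in zip(curve, baseline)]
        peak = max(range(len(excess)), key=excess.__getitem__)
        summary[name] = (peak, excess[peak])
    return summary


if __name__ == "__main__":
    runs = {
        "baseline_unedited": [1.0, 1.1, 0.9, 1.0, 1.0],
        "self_generated":    [1.0, 1.2, 2.5, 1.1, 1.0],  # hypothetical values
        "template":          [1.1, 1.1, 1.4, 1.0, 0.9],  # hypothetical values
    }
    for name, (layer, delta) in excess_over_baseline(runs).items():
        print(f"{name}: peak at layer {layer}, excess {delta:.2f}")
```

Subtracting the unedited curve removes layer-to-layer structure that is already present in the original weights. A prefix strategy then only "amplifies" the footprint if its excess is larger, not just its raw signal.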
### Visualization notebook

`prefixtest/prefixtest.ipynb` is a thin wrapper around `prefixtest/prefixtest_support.py`. It automatically finds the latest artifacts in `prefixtest/artifacts/` or `analysis_out/` and writes its outputs to `prefixtest/output/`. It plots grouped per-layer spectral curves against the unedited baseline, adds composite detector plots, and displays summary tables.

```
prefixtest/prefixtest.ipynb        # open in Jupyter
prefixtest/prefixtest_support.py   # all data-loading and plotting logic
prefixtest/output/                 # saved graphs and summary tables
prefixtest/artifacts/              # selected local experiment artifacts
```

## Error codes

| Error code | Error name | Description |
|---|---|---|
| `1` | Help | Help was invoked. Usually caused by incorrect script usage. |
| `2` | Resource already exists | Attempted to create a resource that already exists. |
| `-1` | Unknown | Unknown error. Please open a GitHub issue with reproduction steps. |
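When scripting around these tools, it can help to translate an exit status back to the names in the table above. This minimal sketch assumes the codes reach the caller as process exit statuses. On POSIX systems an exit status is truncated to the range 0 to 255, so `-1` is observed as `255`. Python's `subprocess` reports a negative return code -N when the child was killed by signal N. The helper name `describe_exit` is hypothetical, and `0` is the usual success status rather than an entry in the table.

```python
import subprocess
import sys

# Names from the error-code table; -1 shows up as 255 once truncated to an exit status.
ERROR_CODES = {
    0: "OK",
    1: "Help",
    2: "Resource already exists",
    255: "Unknown",
}


def describe_exit(returncode):
    """Map a process return code to the documented error name."""
    if returncode < 0:
        # subprocess uses -N when the child was terminated by signal N (POSIX).
        return f"Killed by signal {-returncode}"
    return ERROR_CODES.get(returncode, f"Undocumented exit code {returncode}")


if __name__ == "__main__":
    # Stand-in child process that exits with code 2; replace with a real command.
    proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(2)"])
    print(describe_exit(proc.returncode))  # Resource already exists
```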
