ajsb85/humanify-skill
A JavaScript deobfuscation/un-minification tool that combines scope-safe Babel AST renaming with AI-model semantic naming, running as a Claude Code skill.
# humanify-deobfuscate
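A [Claude Code](https://claude.com/claude-code) **agent skill** that makes minified, bundled, or obfuscated JavaScript readable again by renaming identifiers **scope-safely** through the AST while the model supplies the names. It reaches what plain renaming cannot: webpack export-ids (`.a`/`.b`) and TypeScript class-IIFE constructors, using a deterministic backbone and a convergence loop for large bundles.
It is a from-scratch re-implementation of the [humanify](https://github.com/jehna/humanify) technique: scripts own *correctness* (Babel's `scope.rename`), the model owns *naming*.
## Why this exists
Renaming minified JS by hand or with find-and-replace silently breaks code: shadowed variables merge and names collide. This skill routes every rename through the AST, so it is always safe, and it has been run end to end on a real **1 MB / 124-module / ~10k-identifier webpack bundle**, cutting single-letter identifier tokens by ~63% while keeping the output behaviorally equivalent to the input.
## Install
Requires [Node.js](https://nodejs.org/). Install with the [skills CLI](https://skills.sh):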
```bash
npx skills add ajsb85/humanify-deobfuscate -g
```
Or clone manually into your agent's skills directory (`~/.claude/skills/` for Claude Code),
then install the Babel dependencies:
```bash
cd humanify-deobfuscate/scripts && npm install
```
## Use it
Just ask your agent in natural language — the skill triggers on minified `.js`,
"deobfuscate", "un-minify", "make this readable", "what does this minified code do", etc.
For example:
> "Deobfuscate `vendor.min.js` and tell me what the `parseConfig` module does."
### Single file
```bash
node scripts/extract_identifiers.mjs input.js > identifiers.json   # AST → bindings + context
# (the model names them into renames.json, keyed by byte offset)
node scripts/apply_renames.mjs input.js renames.json readable.js   # scope-safe rename
node --check readable.js
```
### Webpack / browserify bundles
```bash
node scripts/split_bundle.mjs bundle.js list                # survey modules
node scripts/split_bundle.mjs bundle.js dump work 73 102    # dump per module
# (one agent per module names its bindings -> work/mod_.renames.json)
node scripts/split_bundle.mjs bundle.js merge work renames_all.json
node scripts/finish_bundle.mjs bundle.js renames_all.json readable.js   # deterministic backbone
```
See [`SKILL.md`](SKILL.md) for the full workflow, including the parallel-agent scaling
pattern and the convergence loop.
## Contents
| Script | Role |
|---|---|
| `extract_identifiers.mjs` | AST → renameable bindings + context, keyed by byte offset |
| `apply_renames.mjs` | scope-safe rename (`scope.rename`) + codegen |
| `split_bundle.mjs` | bundle mode: `list` / `dump` / `chunk` / `merge` |
| `rename_exports.mjs` | post-pass: webpack export-ids (`.a`/`.b` → real names) |
| `detect_ts_classes.mjs` | post-pass: TS class-IIFE inner ctors + super params |
| `extract_remaining.mjs` | convergence loop: surface remaining nameable bindings |
| `finish_bundle.mjs` / `.sh` | one-command deterministic backbone |
## How it works
The model never edits code. It only suggests one name at a time, and an AST pass applies it
across the binding's whole scope. The rest of this document is the full theory,
architecture, and runbooks. See also [`references/algorithm.md`](references/algorithm.md)
and [`references/agent-prompt.md`](references/agent-prompt.md).
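## Procedures
**Single file**
```bash
node scripts/extract_identifiers.mjs input.js > identifiers.json   # build B + context
# model writes renames.json (keyed by byte offset): { "": "", ... }
node scripts/apply_renames.mjs input.js renames.json readable.js   # apply_ρ (α-safe)
node --check readable.js                                           # verify
```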
**Bundle, end to end**
```bash
node scripts/split_bundle.mjs bundle.js list                         # survey modules (size, bindings, domain hits)
node scripts/split_bundle.mjs bundle.js dump work 73 102 …           # per-module {src,bindings}
node scripts/split_bundle.mjs bundle.js chunk work 400 73            # split giant modules
# one agent per module/chunk → work/mod_.renames.json (ρ_k)
node scripts/split_bundle.mjs bundle.js merge work renames_all.json  # ρ = ⨆ ρ_k
node scripts/finish_bundle.mjs bundle.js renames_all.json readable.js  # apply_ρ ▸ exp ▸ ts ▸ verify
```
**Convergence loop** (repeat until a round names fewer than $\epsilon$ bindings):
```bash
node scripts/rename_exports.mjs readable.js readable.js
node scripts/detect_ts_classes.mjs readable.js > ts.json
node scripts/apply_renames.mjs readable.js ts.json readable.js
node scripts/extract_remaining.mjs readable.js > remaining.json   # S_j (name the real ones, skip scaffolding)
```
## How the skill triggers
`SKILL.md` carries YAML frontmatter — a kebab-case `name` and a trigger-rich
`description` — which Claude indexes. Progressive disclosure keeps the body lean and
defers detail to `references/` and the scripts. The skill fires on requests like
"deobfuscate this bundle," "make this minified file readable," or "what does
`function a(e,t){…}` do," then drives the procedures above. Install:
```bash
npx skills add ajsb85/humanify-deobfuscate -g
```
## License
[MIT](LICENSE) © Alexander Salas Bastidas
## Theory: deobfuscation as α-conversion
The design separates **correctness** (owned by deterministic AST scripts) from **naming
judgment** (owned by the model). Everything below makes that split precise.
Let a source program be a string $P$. Parsing yields an abstract syntax tree $T$ and,
via scope analysis, a finite set of **bindings** $B=\lbrace b_1,\dots,b_n\rbrace $. Each binding $b$
carries a declaration offset $\text{pos}(b)\in\mathbb{N}$ (unique), a scope $\sigma(b)$, a
set of reference sites $R(b)$, and a current name $\text{name}(b)\in\Sigma^*$.
A **rename map** is a partial function keyed by offset,
$$ \rho:\ \mathbb{N}\ \rightharpoonup\ \Sigma^*, $$
where $\rho(\text{pos}(b))$ is the new name of $b$ (undefined ⇒ unchanged). Applying $\rho$
produces $P'$. The soundness guarantee is **α-equivalence**:
$$ P'\ \equiv_\alpha\ P \quad\Longrightarrow\quad \mathrm{eval}(P')=\mathrm{eval}(P), $$
i.e. the program is identical up to consistent renaming of bound identifiers, so its
observable behavior is unchanged. This holds because each binding is renamed *together
with its whole reference set* $R(b)$ inside $\sigma(b)$, and capture is avoided (below).
**Why text replacement is unsound.** A naive substitution $\text{sub}(P,a,a')$ replaces
every textual occurrence of the token $a$. Under shadowing — two bindings $b_1,b_2$ both
named $a$ with $\sigma(b_2)\subsetneq\sigma(b_1)$ — substitution cannot separate them and
conflates $R(b_1)\cup R(b_2)$, violating α-equivalence. We therefore key on **binding
identity** $b$ (its offset), never on the token.
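The failure can be reproduced in a few lines of plain JavaScript. This is a minimal sketch with no Babel involved: the sample function `f`, the target name `i`, and the hand-written `scopeSafe` string are illustrative assumptions, not output of the skill's scripts. The intended rename touches only the inner binding $b_2$; token substitution also rewrites the outer $b_1$, which then collides with the existing parameter `i`.

```javascript
// Two bindings are both named `a`: the outer parameter (b1) and the callback parameter (b2).
const original = `
function f(a, i) {
  const doubled = [1, 2, 3].map(function (a) { return a * 2; });
  return doubled.length + a - i;
}`;

// Intended rename: only the inner binding b2 becomes `i`.
// Applied by hand here; a scope-aware tool rewrites exactly b2 and its references.
const scopeSafe = `
function f(a, i) {
  const doubled = [1, 2, 3].map(function (i) { return i * 2; });
  return doubled.length + a - i;
}`;

// Naive token substitution cannot tell b1 from b2, so it rewrites both.
const naive = original.replace(/\ba\b/g, "i");

// Evaluate a snippet and call f(10, 4).
const run = (src) => new Function(`${src}\nreturn f(10, 4);`)();

console.log(run(original));  // 9
console.log(run(scopeSafe)); // 9
console.log(run(naive));     // 3: f(i, i) makes the former outer `a` read the second argument
```

The scope-safe version keeps the result at 9, while the substituted version silently returns 3 without any syntax error to flag the problem.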
**Capture avoidance.** When applying $\rho(\text{pos}(b))=t$, if $t$ is already visible in
$\sigma(b)$ the rename would capture. Writing $p$ for the underscore prefix (`_`), define
the freshening operator over the names visible in the scope, $\mathrm{vis}(\sigma)$:
$$ \nu_\sigma(t)=t\ \text{ if }\ t\notin\mathrm{vis}(\sigma),\qquad \nu_\sigma(p\ t)\ \text{ otherwise.} $$
— i.e. prepend `_` until free. This makes the applied name unique in scope, preserving
α-equivalence. (`apply_renames.mjs` realizes $\nu$ via Babel's `scope.generateUid`.)
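As a rough illustration of $\nu_\sigma$, the sketch below follows the formula literally. The function name `freshen` and the `visible` set (standing in for $\mathrm{vis}(\sigma)$) are assumptions made for this example; Babel's own helper may pick a different fresh name, for instance by appending a counter.

```javascript
// Prepend "_" until the candidate is not among the names visible in the scope.
// `visible` stands in for vis(σ); here it is a plain Set supplied by the caller.
function freshen(name, visible) {
  let candidate = name;
  while (visible.has(candidate)) {
    candidate = "_" + candidate;
  }
  return candidate;
}

const visible = new Set(["index", "_index", "total"]);
console.log(freshen("count", visible)); // "count"   (already free)
console.log(freshen("total", visible)); // "_total"  (one prefix)
console.log(freshen("index", visible)); // "__index" (two prefixes)
```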
**Normalization.** Model output is arbitrary text; a normalizer maps it to a legal
identifier $\text{norm}:\Sigma^*\to\mathrm{Id}$, where $\mathrm{Id}$ is the set of strings
matching `[A-Za-z_$][\w$]*` that are not reserved words — collapsing separators to
camelCase, dropping illegal characters, and prefixing `_` when the result starts with a
digit or is a reserved word.
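A normalizer along these lines could look like the following. It is a minimal sketch of the rules just listed, not the skill's actual implementation: the function name `normalizeName`, the abbreviated reserved-word list, and the `"_"` fallback for empty input are assumptions.

```javascript
// A subset of JavaScript reserved words; extend as needed.
const RESERVED = new Set([
  "break", "case", "catch", "class", "const", "continue", "default", "delete",
  "do", "else", "export", "extends", "for", "function", "if", "import", "in",
  "new", "return", "super", "switch", "this", "throw", "try", "typeof", "var",
  "void", "while", "with", "yield",
]);

function normalizeName(raw) {
  // Split on anything that cannot appear in an identifier (spaces, dashes, dots, ...).
  const words = raw.split(/[^A-Za-z0-9_$]+/).filter(Boolean);
  // Join as camelCase: the first word is kept as is, later words get a capital.
  let name = words
    .map((w, i) => (i === 0 ? w : w[0].toUpperCase() + w.slice(1)))
    .join("");
  // Assumption: an empty result falls back to "_".
  if (name === "") name = "_";
  // Prefix "_" when the result starts with a digit or is a reserved word.
  if (/^[0-9]/.test(name) || RESERVED.has(name)) name = "_" + name;
  return name;
}

console.log(normalizeName("parse config")); // "parseConfig"
console.log(normalizeName("user-id"));      // "userId"
console.log(normalizeName("2fa token"));    // "_2faToken"
console.log(normalizeName("class"));        // "_class"
```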
## Core algorithm (single file)
**1 — Order largest-scope-first.** With scope span $|\sigma(b)|=e_\sigma-s_\sigma$ (bytes),
bindings are processed under
$$ b_i \prec b_j \iff |\sigma(b_i)|>|\sigma(b_j)|\ \lor\ \big(|\sigma(b_i)|=|\sigma(b_j)|\ \wedge\ \text{pos}(b_i)<\text{pos}(b_j)\big). $$
Outer, longer-lived names are decided first and inform the inner ones.
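The ordering $\prec$ translates directly into a sort comparator. In this sketch the record fields `pos`, `scopeStart`, and `scopeEnd` are hypothetical names chosen for the example; the real script's data shape may differ.

```javascript
// Largest scope span first; ties are broken by the earlier declaration offset.
function compareBindings(x, y) {
  const spanX = x.scopeEnd - x.scopeStart;
  const spanY = y.scopeEnd - y.scopeStart;
  if (spanX !== spanY) return spanY - spanX; // larger span sorts earlier
  return x.pos - y.pos;                      // same span: smaller offset sorts earlier
}

const sampleBindings = [
  { name: "t", pos: 40, scopeStart: 30, scopeEnd: 90 },
  { name: "e", pos: 12, scopeStart: 0, scopeEnd: 200 },
  { name: "n", pos: 35, scopeStart: 30, scopeEnd: 90 },
];

const ordered = [...sampleBindings].sort(compareBindings).map((b) => b.name);
console.log(ordered); // [ 'e', 'n', 't' ]
```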
**2 — Context window.** Each binding is presented with a windowed slice of its scope,
budgeted to $c$ characters (default $c=1500$) and centered on the identifier:
$$ w(b)=P[\ell:r],\quad [\ell,r]=[s_\sigma,e_\sigma]\ \text{ if }\ e_\sigma-s_\sigma\le c,\quad \text{else}\ \big[\text{pos}(b)-\tfrac{c}{2},\ \text{pos}(b)+\tfrac{c}{2}\big], $$
with the window clamped to the scope and snapped to UTF‑8 boundaries.
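A minimal sketch of the window computation follows. It works on JavaScript string indices rather than bytes and leaves out the UTF‑8 snapping, so it is a simplification; `contextWindow` and the field names are assumptions for the example.

```javascript
// Return the whole scope if it fits the budget, else a slice centered on the binding,
// clamped so it never leaves the scope.
function contextWindow(source, binding, budget = 1500) {
  const { pos, scopeStart, scopeEnd } = binding;
  if (scopeEnd - scopeStart <= budget) {
    return source.slice(scopeStart, scopeEnd);
  }
  const half = Math.floor(budget / 2);
  const left = Math.max(scopeStart, pos - half);
  const right = Math.min(scopeEnd, pos + half);
  return source.slice(left, right);
}

const source = "0123456789abcdefghij";
const binding = { pos: 10, scopeStart: 2, scopeEnd: 18 };

console.log(contextWindow(source, binding, 100));              // "23456789abcdefgh" (whole scope)
console.log(contextWindow(source, binding, 6));                // "789abc" (centered on pos)
console.log(contextWindow(source, { ...binding, pos: 3 }, 6)); // "2345" (clamped at scope start)
```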
**3 — Name, 4 — normalize, 5 — freshen, 6 — apply.** The model returns $\rho$; each entry
is normalized by $\text{norm}$, freshened by $\nu_\sigma$, and applied to the binding and
all of $R(b)$ via the scope model, then code is regenerated. Parsing and codegen are
linear, $O(|P|)$; the full pass is $O(|P|+\sum_b |R(b)|)$.
## Bundle mode
A webpack/browserify bundle is one file whose module array exposes functions
$m_0,\dots,m_{K-1}$ over byte ranges $[s_k,e_k)$ that **partition** the binding offsets:
$$ \forall\ k\ne k':\ [s_k,e_k)\cap[s_{k'},e_{k'})=\varnothing . $$
Each module is named independently into a map $\rho_k$ with
$\mathrm{dom}(\rho_k)\subseteq[s_k,e_k)$. Disjoint ranges + injective $\text{pos}$ make the
merge a **disjoint union**, conflict-free in any order:
$$ \rho=\bigsqcup_{k=0}^{K-1}\rho_k,\qquad \mathrm{dom}(\rho_k)\cap\mathrm{dom}(\rho_{k'})=\varnothing\ (k\ne k'). $$
This is exactly what makes the *one-agent-per-module* fan-out safe to run fully in
parallel — see [`references/agent-prompt.md`](references/agent-prompt.md).
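The disjoint-union merge can be sketched as below, assuming each $\rho_k$ is a plain object mapping an offset (as a string key) to a new name. The function name `mergeRenameMaps` and the sample offsets and names are hypothetical. Throwing on a repeated offset makes a violation of disjointness visible instead of letting one module overwrite another.

```javascript
// Merge per-module rename maps; a repeated offset means the ranges were not disjoint.
function mergeRenameMaps(maps) {
  const merged = {};
  for (const map of maps) {
    for (const [offset, name] of Object.entries(map)) {
      if (Object.prototype.hasOwnProperty.call(merged, offset)) {
        throw new Error(`offset ${offset} is claimed by two modules`);
      }
      merged[offset] = name;
    }
  }
  return merged;
}

const rhoA = { "120": "parseConfig", "188": "options" }; // hypothetical module A
const rhoB = { "4310": "retryCount" };                   // hypothetical module B

const forward = mergeRenameMaps([rhoA, rhoB]);
const backward = mergeRenameMaps([rhoB, rhoA]);
console.log(forward); // { '120': 'parseConfig', '188': 'options', '4310': 'retryCount' }
console.log(JSON.stringify(forward) === JSON.stringify(backward)); // true
```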
**Webpack export-ids (`rename_exports.mjs`).** Webpack mangles each module's exports to
single letters via `__webpack_require__.d(exports, "a", () => X)`. Define the per-module
export map
$$ E_k(\ell)=\text{name}(X)\quad\text{from } \texttt{r.d(exports, }\ell\texttt{, () => X)} . $$
For any import binding $v=r(k)$, a member access $v.\ell$ provably denotes export $\ell$ of
module $k$, so it is rewritten $v.\ell\mapsto v.E_k(\ell)$. These are *property* names, not
bindings, so this is a distinct deterministic pass (only member accesses on confirmed
import bindings are touched, so real `point.x` is never altered).
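The rewrite rule can be illustrated on a simplified data model. This sketch represents member accesses as plain records and identifies import bindings by variable name, whereas the real pass works on AST nodes and binding identity; the module id `73` and all export names are made up for the example.

```javascript
// E_k: per-module export maps, module id -> (mangled letter -> real name).
const exportMaps = new Map([
  [73, new Map([["a", "parseConfig"], ["b", "defaultOptions"]])],
]);

// Import bindings confirmed as v = r(k): local variable name -> module id.
const importBindings = new Map([["config", 73]]);

// Rewrite v.l to v.E_k(l) only when v is a confirmed import binding and l is a known export.
function rewriteAccess(access) {
  const moduleId = importBindings.get(access.object);
  if (moduleId === undefined) return access; // ordinary object such as `point`
  const exportsOfModule = exportMaps.get(moduleId);
  const realName = exportsOfModule && exportsOfModule.get(access.property);
  return realName ? { object: access.object, property: realName } : access;
}

const accesses = [
  { object: "config", property: "a" },
  { object: "point", property: "x" },
  { object: "config", property: "z" },
];

const rewritten = accesses.map(rewriteAccess).map((m) => `${m.object}.${m.property}`);
console.log(rewritten); // [ 'config.parseConfig', 'point.x', 'config.z' ]
```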
**TypeScript class IIFEs (`detect_ts_classes.mjs`).** `tsc` lowers `class X extends B {}`
to `var X = (function (_super) { __extends(e, _super); function e(){} return e })(B)`. The
holder `X` gets named, but the inner constructor `e` and, once minified, the super parameter
stay cryptic. The pass recognizes this shape and emits $\rho$-entries (inner ctor ↦ holder
name, super param ↦ `_super`) applied through the same α-safe machinery.
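A before/after pair makes the effect of these entries concrete. The snippet is an illustrative sketch: the `__extends` stand-in is simplified, the class names `Base` and `Circle` are invented, and the "after" text is written by hand rather than produced by the script. Both versions are evaluated to check that the rename leaves behavior unchanged.

```javascript
// Simplified stand-in for the `__extends` helper, plus a base class to inherit from.
const helper = `
var __extends = function (d, b) {
  Object.setPrototypeOf(d, b);
  function C() { this.constructor = d; }
  C.prototype = b.prototype;
  d.prototype = new C();
};
var Base = (function () {
  function Base() {}
  Base.prototype.kind = function () { return "base"; };
  return Base;
})();`;

// Before: the holder `Circle` is named; inner constructor `e` and super parameter `t` are not.
const before = `
var Circle = (function (t) {
  __extends(e, t);
  function e() { return t !== null && t.apply(this, arguments) || this; }
  return e;
})(Base);`;

// After: inner constructor -> holder name, super parameter -> _super.
const after = `
var Circle = (function (_super) {
  __extends(Circle, _super);
  function Circle() { return _super !== null && _super.apply(this, arguments) || this; }
  return Circle;
})(Base);`;

// Build an instance and report an inherited method call plus the instanceof check.
const probe = (src) =>
  new Function(`${helper}\n${src}\nvar c = new Circle(); return c.kind() + ":" + (c instanceof Base);`)();

console.log(probe(before)); // "base:true"
console.log(probe(after));  // "base:true"
```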
## The convergence loop
After the deterministic passes, let $S_j\subseteq B$ be the **short** bindings remaining
after round $j$: the nameable ones (short ∧ meaningful) together with the scaffolding set
$F$ of compiler temporaries, which is never named. Each productive round names a nonempty
subset of $S_j\setminus F$, giving a strictly decreasing chain bounded below by $F$:
$$ S_0\ \supsetneq\ S_1\ \supsetneq\ \cdots\ \supseteq\ F . $$
Because $B$ is finite, the chain stabilizes once nothing nameable is left:
$\exists J,\ \forall j\ge J:\ S_j=F$. In
practice we stop at the first round whose gain falls below a threshold:
$$ |S_j|-|S_{j+1}|<\epsilon\qquad(\epsilon\approx 5). $$
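The stopping rule amounts to a few lines of code. In this sketch `remainingPerRound[j]` plays the role of $|S_j|$; the function name and the sample sizes are hypothetical, not measurements from a real bundle.

```javascript
// Return how many rounds run before the gain |S_j| - |S_{j+1}| first drops below epsilon,
// or null if the recorded rounds never reach that point.
function roundsUntilConverged(remainingPerRound, epsilon = 5) {
  for (let j = 0; j + 1 < remainingPerRound.length; j++) {
    const gain = remainingPerRound[j] - remainingPerRound[j + 1];
    if (gain < epsilon) return j + 1; // round j+1 is the first unproductive one
  }
  return null;
}

// Hypothetical sizes: gains are 220, 90, 30, 3, 0.
const sizes = [400, 180, 90, 60, 57, 57];
console.log(roundsUntilConverged(sizes)); // 4
```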
**Coverage.** With $\rho(\text{pos}(b))=\bot$ meaning "unnamed,"
$$ \text{coverage}=\frac{\big|\lbrace \ b\in B:\rho(\text{pos}(b))\ne\bot\ \rbrace \big|}{|B|}. $$
On a real TypeScript-compiled bundle this saturates near $0.6\text{–}0.7$; the residual
$\approx 1-\text{coverage}$ is $F$ (iterator-protocol scratch, `__extends`/`__values`
temporaries) and is intentionally left unnamed — naming it reduces readability.
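Coverage is a simple ratio over the bindings. The sketch below assumes bindings are records with a `pos` field and that $\rho$ is an object keyed by offset; the five sample bindings and three names are invented for illustration.

```javascript
// Fraction of bindings whose offset has an entry in the rename map.
function coverage(bindings, rho) {
  if (bindings.length === 0) return 0;
  const named = bindings.filter((b) => rho[b.pos] !== undefined).length;
  return named / bindings.length;
}

const allBindings = [{ pos: 10 }, { pos: 20 }, { pos: 30 }, { pos: 40 }, { pos: 50 }];
const rho = { 10: "config", 30: "retryCount", 50: "onError" };

console.log(coverage(allBindings, rho)); // 0.6
```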
**The whole pipeline** is the composition
$$ \text{readable}=\big(\ \text{ts}\circ\text{exp}\circ\text{apply}_\rho\ \big)(\text{bundle}), $$
with $\text{apply}_\rho$ the α-safe binding rename, $\text{exp}$ the export-id pass, and
$\text{ts}$ the class-IIFE pass — exactly what `finish_bundle.mjs` runs in order.
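Note that function composition reads right to left, so $\text{apply}_\rho$ runs first and $\text{ts}$ last. The sketch below uses stub passes that only record their invocation; the `compose` helper and the stub names are assumptions for the example and do not reflect how `finish_bundle.mjs` is written.

```javascript
// compose(f, g, h)(x) = f(g(h(x))): the rightmost function runs first.
const compose = (...fns) => (input) => fns.reduceRight((acc, fn) => fn(acc), input);

// Stub passes: each records its name and returns the code unchanged.
const trace = [];
const applyRho = (code) => { trace.push("apply_rho"); return code; };
const exp = (code) => { trace.push("exp"); return code; };
const ts = (code) => { trace.push("ts"); return code; };

const pipeline = compose(ts, exp, applyRho);
pipeline("/* bundle source */");

console.log(trace); // [ 'apply_rho', 'exp', 'ts' ]
```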
## Scripts reference
| Script | Input → Output | Role |
| --- | --- | --- |
| `extract_identifiers.mjs` | `file.js` → `{identifiers:[{pos,name,kind,line,context}]}` | build $B$ with $w(b)$, ordered by $\prec$ |
| `apply_renames.mjs` | `file.js`, $\rho$ → `out.js` | α-safe apply: `scope.rename` + freshening $\nu_\sigma$ + codegen |
| `split_bundle.mjs` | `bundle.js` `list\|dump\|chunk\|merge` | module survey / per-module dump / chunking / disjoint merge $\bigsqcup\rho_k$ |
| `rename_exports.mjs` | `file.js` → `file.js` | rewrite $v.\ell\mapsto v.E_k(\ell)$ (webpack export-ids) |
| `detect_ts_classes.mjs` | `file.js` → $\rho$ (stdout) | emit class-IIFE rename map |
| `extract_remaining.mjs` | `file.js` → `{remaining:[…]}` | surface $S_j$ (nameable short bindings, scaffolding filtered) |
| `finish_bundle.mjs` / `.sh` | `bundle.js`, $\rho$ → `readable.js` | run $\text{ts}\circ\text{exp}\circ\text{apply}_\rho$ + verify + report |
All scripts are pure Node + Babel (`@babel/parser`, `@babel/traverse`, `@babel/generator`)
with zero native binaries, so they run identically on Linux/macOS/Windows.
## Architecture
Text version of the pipeline:
```text
bundle.js ──▶ parse (Babel) ──▶ AST T + scope tree ──▶ bindings B (keyed by byte offset)
                                                         │
   ┌─────────────────────────────────────────────────────┘
   ▼
split_bundle.mjs   list · dump · chunk ──▶ per-module { src, bindings }
   │
   ▼
naming (you / parallel agents) ──▶ mod_k.renames.json (a map ρ_k)
   │
   ▼
split_bundle.mjs merge   ρ = ⨆ₖ ρ_k ──▶ renames_all.json
   │
   ▼
finish_bundle.mjs   apply_ρ ▸ rename_exports ▸ detect_ts_classes ▸ verify
   │                (α-safe binding) (export ids) (class IIFEs)
   ▼
readable.js + readable.js.remaining.json
   │
   ▼
convergence loop   extract_remaining ▸ name ▸ apply (repeat until gain < ε)
```