cawalch/node-yara-x
This project provides Node.js bindings for VirusTotal's YARA-X pattern-matching engine, letting JavaScript applications run rule-based file and memory scans efficiently.
Stars: 3 | Forks: 2
# @litko/yara-x
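## Features
- High performance: built with [napi-rs](https://napi-rs.com) and [VirusTotal/yara-x](https://github.com/VirusTotal/yara-x)
- Async support: first-class support for asynchronous scanning
- WASM compilation: compile rules to WebAssembly for portable execution
- Zero dependencies: no external runtime dependencies
## Usage
### Installation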
```
npm install @litko/yara-x
```
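### Release Integrity
Releases are built on GitHub-hosted runners and published to npm with trusted publishing and provenance attestations. The native `.node` artifacts are also covered by GitHub artifact attestations.
After installing, verify the npm registry signatures and provenance attestations: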
```
npm audit signatures
```
Verify a downloaded native artifact against the GitHub attestation:
```
gh attestation verify path/to/yara-x.*.node -R cawalch/node-yara-x
```
### Basic Example
```
import { compile } from "@litko/yara-x";
// Compile yara rules
const rules = compile(`
rule test_rule {
strings:
$a = "hello world"
condition:
$a
}
`);
// Scan a buffer
const buffer = Buffer.from("This is a test with hello world in it");
const matches = rules.scan(buffer);
// Process matches
if (matches.length > 0) {
console.log(`Found ${matches.length} matching rules:`);
matches.forEach((match) => {
console.log(`- Rule: ${match.ruleIdentifier}`);
match.matches.forEach((stringMatch) => {
console.log(
` * Match at offset ${stringMatch.offset}: ${stringMatch.data}`,
);
});
});
} else {
console.log("No matches found");
}
```
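The nested result structure shown above can be flattened into one record per string match, which is often easier to log or serialize. The following is a minimal sketch that relies only on the fields used in the example (`ruleIdentifier`, `matches`, `offset`, `data`); `flattenMatches` is a hypothetical helper and not part of the package.

```javascript
// Hypothetical helper: flatten scan results into one record per string match.
// Assumes each result has `ruleIdentifier` and `matches`, and each string
// match has `offset` and `data`, as in the basic example.
function flattenMatches(results) {
  const rows = [];
  for (const result of results) {
    for (const stringMatch of result.matches ?? []) {
      rows.push({
        rule: result.ruleIdentifier,
        offset: stringMatch.offset,
        // String() keeps this working whether `data` is a string or a Buffer.
        data: String(stringMatch.data),
      });
    }
  }
  return rows;
}
```

Passing the array returned by `rules.scan(buffer)` should yield rows such as `{ rule, offset, data }` that can be fed to `console.table` or `JSON.stringify`.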
## Scanning Files
```
import { fromFile } from "@litko/yara-x";
// Load rules from a file
const rules = fromFile("./rules/malware_rules.yar");
try {
// Scan a file directly
const matches = rules.scanFile("./samples/suspicious_file.exe");
console.log(`Found ${matches.length} matching rules`);
} catch (error) {
console.error(`Scanning error: ${error.message}`);
}
```
## Asynchronous Scanning
```
import { compile } from "@litko/yara-x";
async function scanLargeFile() {
const rules = compile(`rule large_file_rule {
strings:
$a = "sensitive data"
condition:
$a
}
`);
try {
// Scan a file asynchronously
const matches = await rules.scanFileAsync("./samples/large_file.bin");
console.log(`Found ${matches.length} matching rules`);
} catch (error) {
console.error(`Async scanning error: ${error.message}`);
}
}
scanLargeFile();
```
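Because `scanFileAsync` returns a promise, several files can be scanned concurrently. The sketch below is one possible way to do that; `scanFiles` is a hypothetical helper, and it assumes only that `rules.scanFileAsync(path)` resolves to an array of matches and rejects with an `Error` on failure.

```javascript
// Hypothetical helper: scan several files concurrently and report a
// per-file outcome instead of failing on the first error.
// `rules` is any object exposing scanFileAsync(path) -> Promise<matches[]>.
async function scanFiles(rules, paths) {
  const settled = await Promise.allSettled(
    // The async arrow turns synchronous throws into rejections as well.
    paths.map(async (path) => rules.scanFileAsync(path)),
  );
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { path: paths[i], ok: true, matches: outcome.value }
      : {
          path: paths[i],
          ok: false,
          error: String(outcome.reason?.message ?? outcome.reason),
        },
  );
}
```

`Promise.allSettled` is used here so that one unreadable file does not discard the results of the others. For very large batches a concurrency limit would probably be worth adding.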
## Variables
```
import { compile } from "@litko/yara-x";
// Create a scanner with variables
const rules = compile(
`
rule variable_rule {
condition:
string_var contains "secret" and int_var > 10 and bool_var
}
`,
{
defineVariables: {
string_var: "this is a secret message",
int_var: 20,
bool_var: true,
},
},
);
// Scan with default variables
let matches = rules.scan(Buffer.from("test data"));
console.log(`Matches with default variables: ${matches.length}`);
// Override variables at scan time
matches = rules.scan(Buffer.from("test data"), {
string_var: "no secrets here",
int_var: 5,
bool_var: false,
});
console.log(`Matches with overridden variables: ${matches.length}`);
```
## Namespaces
```
import { compile, create } from "@litko/yara-x";
// Compile a source into a YARA namespace
const rules = compile(
`
rule namespaced_rule {
strings:
$a = "namespace test"
condition:
$a
}
`,
{ namespace: "alpha" },
);
const [match] = rules.scan(Buffer.from("namespace test"));
console.log(match.namespace); // "alpha"
// Add sources incrementally into separate namespaces
const scanner = create();
scanner.addRuleSource('rule shared { strings: $a = "one" condition: $a }', "one");
scanner.addRuleSource('rule shared { strings: $a = "two" condition: $a }', "two");
```
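When the same rule name exists in several namespaces, it can help to group results by namespace before reporting them. This is a small sketch using only the `namespace` and `ruleIdentifier` fields shown above; `groupByNamespace` and the `"(unknown)"` fallback label are hypothetical.

```javascript
// Hypothetical helper: group matched rule identifiers by namespace.
// Results without a `namespace` field are collected under "(unknown)".
function groupByNamespace(results) {
  const groups = new Map();
  for (const { namespace, ruleIdentifier } of results) {
    const key = namespace ?? "(unknown)";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(ruleIdentifier);
  }
  return groups;
}
```

Calling it on the array returned by `scanner.scan(...)` gives a `Map` from namespace to the list of rule identifiers that matched there.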
## WASM Compilation
```
import { compile, compileToWasm } from "@litko/yara-x";
// Compile rules to WASM
const rule = `
rule wasm_test {
strings:
$a = "compile to wasm"
condition:
$a
}
`;
// Static compilation
compileToWasm(rule, "./output/rules.wasm");
// Or from a compiled rules instance
const compiledRules = compile(rule);
compiledRules.emitWasmFile("./output/instance_rules.wasm");
// Async compilation
await compiledRules.emitWasmFileAsync("./output/async_rules.wasm");
```
## Incremental Rule Building
```
import { create } from "@litko/yara-x";
// Create an empty scanner
const scanner = create();
// Add rules incrementally
scanner.addRuleSource(`
rule first_rule {
strings:
$a = "first pattern"
condition:
$a
}
`);
// Add rules from a file
scanner.addRuleFile("./rules/more_rules.yar");
// Add another rule
scanner.addRuleSource(`
rule another_rule {
strings:
$a = "another pattern"
condition:
$a
}
`);
// Now scan with all the rules
const matches = scanner.scan(Buffer.from("test data with first pattern"));
```
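When rules come from many sources, a single bad source should not necessarily abort the whole build. The sketch below adds sources one at a time and collects failures. It assumes that `addRuleSource(source, namespace)` throws when a source fails to compile, which is not confirmed here, and `addSources` itself is a hypothetical helper.

```javascript
// Hypothetical helper: add several rule sources to a scanner one by one,
// collecting failures instead of stopping at the first one.
// Assumption: addRuleSource throws an Error when a source does not compile.
function addSources(scanner, sources) {
  const failures = [];
  for (const { name, source, namespace } of sources) {
    try {
      if (namespace === undefined) {
        scanner.addRuleSource(source);
      } else {
        scanner.addRuleSource(source, namespace);
      }
    } catch (error) {
      failures.push({ name, message: error.message });
    }
  }
  return failures;
}
```

An empty return value means every source was accepted. Whether a scanner remains usable after a failed `addRuleSource` call depends on the library, so it may be safer to rebuild it from the sources that succeeded.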
## Rule Validation
```
import { validate } from "@litko/yara-x";
// Validate rules without executing them
const result = validate(`
rule valid_rule {
strings:
$a = "valid"
condition:
$a
}
`);
if (result.errors.length === 0) {
console.log("Rules are valid!");
} else {
console.error("Rule validation failed:");
result.errors.forEach((error) => {
console.error(`- ${error.code}: ${error.message}`);
});
}
```
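Validation is handy as a pre-commit or CI check over many rule sources. The following sketch builds a small report from the `errors` array shown above (`code` and `message` fields). `validateSources` is a hypothetical helper, and the `validate` function is passed in as a parameter so that the snippet stays self-contained.

```javascript
// Hypothetical helper: validate several named rule sources and summarize
// the outcome. `validate` is expected to behave like the package's
// validate(): it returns an object with an `errors` array whose entries
// have `code` and `message`.
function validateSources(validate, sources) {
  return Object.entries(sources).map(([name, source]) => {
    const { errors = [] } = validate(source);
    return {
      name,
      valid: errors.length === 0,
      errors: errors.map((e) => `${e.code}: ${e.message}`),
    };
  });
}
```

A CI script could call `validateSources(validate, { "a.yar": sourceA, "b.yar": sourceB })` and exit with a non-zero status if any entry has `valid === false`.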
## Advanced Options
```
import { compile } from "@litko/yara-x";
// Create a scanner with advanced options
const rules = compile(
`
rule advanced_rule {
strings:
$a = /hello[[:space:]]world/ // Using POSIX character class
condition:
$a and test_var > 10
}
`,
{
// Define variables
defineVariables: {
test_var: "20",
},
// Compile these rules into a YARA namespace
namespace: "research",
// Enable relaxed regular expression syntax
relaxedReSyntax: true,
// Enable condition optimization
conditionOptimization: true,
// Ignore specific modules
ignoreModules: ["pe"],
// Error on potentially slow patterns
errorOnSlowPattern: true,
// Error on potentially slow loops
errorOnSlowLoop: true,
// Specify directories for include statements (v1.5.0+)
includeDirectories: ["./rules/includes", "./rules/common"],
// Enable or disable include statements (v1.5.0+)
enableIncludes: true,
},
);
```
## Error Handling
### Compilation Errors
```
import { compile } from "@litko/yara-x";
try {
// This will throw an error due to invalid syntax
const rules = compile(`
rule invalid_rule {
strings:
$a = "unclosed string
condition:
$a
}
`);
} catch (error) {
console.error(`Compilation error: ${error.message}`);
// Output: Compilation error: error[E001]: syntax error
// --> line:3:28
// |
// 3 | $a = "unclosed string
// | ^ expecting `"`, found end of file
}
```
### Scanning Errors
```
import { compile } from "@litko/yara-x";
const rules = compile(`
rule test_rule {
condition:
true
}
`);
try {
// This will throw if the file doesn't exist
rules.scanFile("/path/to/nonexistent/file.bin");
} catch (error) {
console.error(`Scanning error: ${error.message}`);
// Output: Scanning error: Error reading file: No such file or directory (os error 2)
}
```
### Async Errors
```
import { compile, compileToWasm } from "@litko/yara-x";
async function handleAsyncErrors() {
const rules = compile(`
rule test_rule {
condition:
true
}
`);
try {
await rules.scanFileAsync("/path/to/nonexistent/file.bin");
} catch (error) {
console.error(`Async scanning error: ${error.message}`);
}
try {
await compileToWasm(
"rule test { condition: true }",
"/invalid/path/rules.wasm",
);
} catch (error) {
console.error(`WASM compilation error: ${error.message}`);
}
}
handleAsyncErrors();
```
## Compiler Warnings
```
import { compile } from "@litko/yara-x";
// Create a scanner with a rule that generates warnings
const rules = compile(`
rule warning_rule {
strings:
$a = "unused string"
condition:
true // Warning: invariant expression
}
`);
// Get and display warnings
const warnings = rules.getWarnings();
if (warnings.length > 0) {
console.log("Compiler warnings:");
warnings.forEach((warning) => {
console.log(`- ${warning.code}: ${warning.message}`);
});
}
```
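Teams that want warnings to block a build can filter them against an allow-list. This is a sketch under the assumption that each warning exposes a `code` field, as in the example above. `unexpectedWarnings` is a hypothetical helper, and the actual warning codes emitted by the compiler are not listed here.

```javascript
// Hypothetical helper: return the warnings whose code is not explicitly
// allowed. `rules` is any object exposing getWarnings(), which returns
// entries with `code` and `message`.
function unexpectedWarnings(rules, allowedCodes = []) {
  const allowed = new Set(allowedCodes);
  return rules.getWarnings().filter((warning) => !allowed.has(warning.code));
}
```

A build step might throw when the returned array is non-empty, printing each `code` and `message` first.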
## Include Directories
```
import { compile } from "@litko/yara-x";
// Create a main rule that includes other rules
const mainRule = `
include "common/strings.yar"
include "malware/pe_patterns.yar"
rule main_detection {
condition:
common_string_rule or pe_malware_rule
}
`;
// Compile with include directories
const rules = compile(mainRule, {
includeDirectories: [
"./rules", // Base directory
"./rules/common", // Additional include path
"./rules/malware", // Another include path
],
});
// Scan as usual
const matches = rules.scan(Buffer.from("test data"));
```
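## Scan Performance Options
Control scanning behavior for better performance or safety.
### Limiting Matches per Pattern
Limit the number of matches per pattern to prevent excessive memory consumption: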
```
import { compile } from "@litko/yara-x";
const rules = compile(`
rule find_pattern {
strings:
$a = "pattern"
condition:
$a
}
`);
// Limit to 1000 matches per pattern
rules.setMaxMatchesPerPattern(1000);
// Scan data with many occurrences
const data = Buffer.from("pattern ".repeat(10000));
const matches = rules.scan(data);
// Will only return up to 1000 matches per pattern
console.log(`Found ${matches[0].matches.length} matches (limited to 1000)`);
```
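After a capped scan it can be useful to know which results may have been truncated. The sketch below flags rules whose match count has reached the cap. `rulesAtMatchLimit` is a hypothetical helper, and because the cap applies per pattern while the count here is per rule, the result is only a hint for rules with more than one pattern.

```javascript
// Hypothetical helper: list rules whose total number of string matches has
// reached the configured cap. For a single-pattern rule this suggests the
// results were truncated; for multi-pattern rules it is only a hint.
function rulesAtMatchLimit(results, maxMatches) {
  return results
    .filter((result) => result.matches.length >= maxMatches)
    .map((result) => result.ruleIdentifier);
}
```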
### Memory-Mapped File Control
Control whether memory-mapped files are used when scanning:
```
import { compile } from "@litko/yara-x";
const rules = compile(`
rule test {
strings:
$a = "test"
condition:
$a
}
`);
// Disable memory-mapped files for safer scanning
// (slower but safer for untrusted files)
rules.setUseMmap(false);
// Scan file without memory mapping
const matches = rules.scanFile("./sample.bin");
```
### Scan Timeout
Set a timeout on scan operations to prevent runaway scans:
```
import { compile } from "@litko/yara-x";
const rules = compile(`
rule slow_rule {
strings:
$a = /(.+)*\1/
condition:
$a
}
`);
// Set a 5-second timeout
rules.setTimeout(5000);
const matches = rules.scan(Buffer.from("test data"));
```
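A caller usually wants to tell a timed-out or failed scan apart from a clean scan with no matches. The sketch below wraps the two calls shown above and returns a tagged result. It assumes that a scan that exceeds its timeout throws an `Error`, which is not confirmed by the text above, and `scanWithTimeout` is a hypothetical helper.

```javascript
// Hypothetical helper: apply a timeout, run the scan, and return a tagged
// result instead of throwing.
// Assumption: rules.scan() throws an Error when the timeout is exceeded.
function scanWithTimeout(rules, data, timeoutMs) {
  rules.setTimeout(timeoutMs);
  try {
    return { ok: true, matches: rules.scan(data) };
  } catch (error) {
    return { ok: false, error: error.message };
  }
}
```

Callers can then branch on `ok` and, for example, quarantine inputs that repeatedly fail to scan within the limit.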
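### Match Context
You can retrieve the bytes surrounding a match by setting the match context size. This is useful for analyzing a match within its wider context.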
```
import { compile } from "@litko/yara-x";
const rules = compile(`
rule context_rule {
strings:
$a = "secret"
condition:
$a
}
`);
// Request 10 bytes of context before and after the match
rules.setMatchContextSize(10);
const data = Buffer.from("this is a top secret document containing sensitive info");
const matches = rules.scan(data);
if (matches.length > 0) {
matches[0].matches.forEach(match => {
console.log(`Match: ${match.data}`);
// "secret"
console.log(`Context: ${match.contextData}`);
// " a top secret document"
console.log(`Match offset within context: ${match.contextMatchOffset}`);
});
}
```
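The three fields above are enough to render the match highlighted inside its context. The sketch below does so with string slicing. `highlightMatch` is a hypothetical helper, and it assumes the context is text where one character is one byte (for example ASCII), so that `contextMatchOffset` can be used as a string index.

```javascript
// Hypothetical helper: wrap the matched text in brackets inside its context.
// Assumes single-byte text so that byte offsets equal string indices.
function highlightMatch(match) {
  const context = String(match.contextData);
  const start = match.contextMatchOffset;
  const end = start + String(match.data).length;
  return `${context.slice(0, start)}[${context.slice(start, end)}]${context.slice(end)}`;
}

// Example with hand-written values shaped like a scan result:
console.log(
  highlightMatch({
    data: "secret",
    contextData: "a top secret doc",
    contextMatchOffset: 6,
  }),
); // prints "a top [secret] doc"
```

For binary data or multi-byte encodings, slicing the underlying `Buffer` instead of a string would be the safer choice.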
## Performance Benchmarks
`node-yara-x` is designed for high performance, using intelligent scanner caching and an optimized Rust implementation.
### Benchmark Results
- Test environment: MacBook Pro M3 Max, 36GB RAM, release build with LTO enabled
- Methodology: statistical analysis across multiple iterations, with percentile reporting
#### Scanner Creation Performance
| Rule Type      | Mean   | p50    | p95    | p99    |
| -------------- | ------ | ------ | ------ | ------ |
| Simple Rule    | 2.43ms | 2.41ms | 2.87ms | 3.11ms |
| Complex Rule   | 2.57ms | 2.52ms | 2.96ms | 3.06ms |
| Regex Rule     | 7.57ms | 7.47ms | 8.29ms | 8.70ms |
| Multiple Rules | 2.05ms | 2.03ms | 2.24ms | 2.42ms |
#### Scanning Performance by Data Size
| Data Size | Rule Type      | Mean  | Throughput |
| --------- | -------------- | ----- | ---------- |
| 64 bytes  | Simple         | 3μs   | ~21 MB/s   |
| 100KB     | Simple         | 6μs   | ~16.7 GB/s |
| 100KB     | Complex        | 73μs  | ~1.4 GB/s  |
| 100KB     | Regex          | 7μs   | ~14.3 GB/s |
| 100KB     | Multiple Rules | 73μs  | ~1.4 GB/s  |
| 10MB      | Simple         | 204μs | ~49 GB/s   |
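The throughput column follows directly from the data size divided by the mean scan time. The sketch below reproduces that arithmetic so the figures can be rechecked or extended to new measurements. `throughput` is a hypothetical helper, and it uses decimal units (1 KB = 1000 bytes), which is what matches the numbers in the table.

```javascript
// Throughput = bytes scanned / elapsed time.
// Decimal units are assumed: 1 KB = 1000 bytes, 1 GB = 1e9 bytes.
function throughput(bytes, micros) {
  const bytesPerSecond = bytes / (micros / 1e6);
  if (bytesPerSecond >= 1e9) {
    return `${(bytesPerSecond / 1e9).toFixed(1)} GB/s`;
  }
  return `${(bytesPerSecond / 1e6).toFixed(1)} MB/s`;
}

console.log(throughput(64, 3)); // prints "21.3 MB/s"
console.log(throughput(100000, 6)); // prints "16.7 GB/s"
console.log(throughput(100000, 73)); // prints "1.4 GB/s"
console.log(throughput(10000000, 204)); // prints "49.0 GB/s"
```

The 64-byte figure is far lower than the others because the time is dominated by fixed per-call overhead rather than by the amount of data scanned.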
#### Advanced Feature Performance
| Feature            | Mean | Notes                     |
| ------------------ | ---- | ------------------------- |
| Variable Scanning  | 1μs  | Pre-compiled variables    |
| Runtime Variables  | 2μs  | Variables set at scan time |
| Async Scanning     | 11μs | Non-blocking operation    |
## API Reference
### Functions
- `compile(ruleSource: string, options?: CompilerOptions)` - Compiles YARA rules from a string.
- `compileToWasm(ruleSource: string, outputPath: string, options?: CompilerOptions)` - Compiles YARA rules from a string to a WASM file.
- `compileFileToWasm(rulesPath: string, outputPath: string, options?: CompilerOptions)` - Compiles YARA rules from a file to a WASM file.
- `validate(ruleSource: string, options?: CompilerOptions)` - Validates YARA rules without executing them.
- `create()` - Creates an empty rules scanner to which rules can be added incrementally.
- `fromFile(rulePath: string, options?: CompilerOptions)` - Compiles YARA rules from a file.
### YaraX Methods
- `getWarnings()` - Returns the compiler warnings.
- `scan(data: Buffer, variables?: Record)` - Scans a buffer.
- `scanFile(filePath: string, variables?: Record)` - Scans a file.
- `scanAsync(data: Buffer, variables?: Record)` - Scans a buffer asynchronously.
- `scanFileAsync(filePath: string, variables?: Record)` - Scans a file asynchronously.
- `emitWasmFile(filePath: string)` - Writes the compiled rules to a WASM file synchronously.
- `emitWasmFileAsync(filePath: string)` - Writes the compiled rules to a WASM file asynchronously.
- `addRuleSource(rules: string)` - Adds rules from a string to an existing scanner.
- `addRuleFile(filePath: string)` - Adds rules from a file to an existing scanner.
- `defineVariable(name: string, value: string)` - Defines a variable for the YARA compiler.
- `setMaxMatchesPerPattern(maxMatches: number)` - Sets the maximum number of matches per pattern.
- `setUseMmap(useMmap: boolean)` - Enables or disables memory-mapped files for scanning.
- `setTimeout(timeoutMs: number)` - Sets the scan timeout in milliseconds.
- `setMatchContextSize(size: number)` - Sets the number of context bytes to capture around each matched string.
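### CompilerOptions
- `defineVariables?: object` - Defines global variables for the YARA rules.
- `ignoreModules?: string[]` - List of module names to ignore during compilation.
- `bannedModules?: BannedModule[]` - List of modules that may not be used.
- `features?: string[]` - List of features to enable for the YARA rules.
- `relaxedReSyntax?: boolean` - Uses relaxed regular expression syntax.
- `conditionOptimization?: boolean` - Optimizes conditions in the YARA rules.
- `errorOnSlowPattern?: boolean` - Raises an error on slow patterns.
- `errorOnSlowLoop?: boolean` - Raises an error on slow loops.
- `includeDirectories?: string[]` - **(v1.5.0+)** Directories the compiler searches for included files.
- `enableIncludes?: boolean` - **(v1.5.0+)** Enables or disables include statements in YARA rules.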
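Several of these options are useful together when compiling rules that come from an untrusted source. The preset below is one possible combination, not a recommendation from the project. `STRICT_OPTIONS` and `withStrictOptions` are hypothetical names, and only option keys listed above are used.

```javascript
// Hypothetical preset for compiling rules from untrusted sources:
// reject slow patterns and loops, and disallow include statements.
const STRICT_OPTIONS = Object.freeze({
  errorOnSlowPattern: true,
  errorOnSlowLoop: true,
  enableIncludes: false,
});

// Merge caller options with the preset; the strict values always win.
function withStrictOptions(options = {}) {
  return { ...options, ...STRICT_OPTIONS };
}
```

The result can be passed as the second argument to `compile`, for example `compile(source, withStrictOptions({ namespace: "uploads" }))`.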
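## License
This project contains code under two different licenses:
- **MIT License:**
  - The Node.js bindings and other code specific to this module are licensed under the MIT License.
  - See `LICENSE-MIT` for the full text.
- **BSD-3-Clause License:**
  - The included yara-x library is licensed under the BSD-3-Clause License.
  - See `LICENSE-BSD-3-Clause` for the full text.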