cawalch/node-yara-x

GitHub: cawalch/node-yara-x

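This project provides Node.js bindings for VirusTotal's YARA-X pattern-matching engine, letting JavaScript applications run rule-based scans of files and in-memory buffers efficiently.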

Stars: 3 | Forks: 2

# @litko/yara-x

## Features

- High performance: built with [napi-rs](https://napi-rs.com) and [VirusTotal/yara-x](https://github.com/VirusTotal/yara-x)
- Async support: first-class support for asynchronous scanning
- WASM compilation: compile rules to WebAssembly for portable execution
- Zero dependencies: no external runtime dependencies

## Usage

### Installation

```
npm install @litko/yara-x
```

### Release integrity

Releases are built on GitHub-hosted runners and published to npm using trusted publishing with provenance attestations. The native `.node` artifacts are also covered by GitHub artifact attestations.

After installing, verify the npm registry signatures and provenance attestations:

```
npm audit signatures
```

Verify a downloaded native artifact against the GitHub attestation:

```
gh attestation verify path/to/yara-x.*.node -R cawalch/node-yara-x
```

### Basic example

```
import { compile } from "@litko/yara-x";

// Compile yara rules
const rules = compile(`
  rule test_rule {
    strings:
      $a = "hello world"
    condition:
      $a
  }
`);

// Scan a buffer
const buffer = Buffer.from("This is a test with hello world in it");
const matches = rules.scan(buffer);

// Process matches
if (matches.length > 0) {
  console.log(`Found ${matches.length} matching rules:`);
  matches.forEach((match) => {
    console.log(`- Rule: ${match.ruleIdentifier}`);
    match.matches.forEach((stringMatch) => {
      console.log(
        `  * Match at offset ${stringMatch.offset}: ${stringMatch.data}`,
      );
    });
  });
} else {
  console.log("No matches found");
}
```

## Scanning files

```
import { fromFile } from "@litko/yara-x";

// Load rules from a file
const rules = fromFile("./rules/malware_rules.yar");

try {
  // Scan a file directly
  const matches = rules.scanFile("./samples/suspicious_file.exe");
  console.log(`Found ${matches.length} matching rules`);
} catch (error) {
  console.error(`Scanning error: ${error.message}`);
}
```

## Asynchronous scanning

```
import { compile } from "@litko/yara-x";

async function scanLargeFile() {
  const rules = compile(`
    rule large_file_rule {
      strings:
        $a = "sensitive data"
      condition:
        $a
    }
  `);

  try {
    // Scan a file asynchronously
    const matches = await rules.scanFileAsync("./samples/large_file.bin");
    console.log(`Found ${matches.length} matching rules`);
  } catch (error) {
    console.error(`Async scanning error: ${error.message}`);
  }
}

scanLargeFile();
```

## Variables

```
import { compile } from "@litko/yara-x";

// Create a scanner with variables
const rules = compile(
  `
  rule variable_rule {
    condition:
      string_var contains "secret" and int_var > 10 and bool_var
  }
`,
  {
    defineVariables: {
      string_var: "this is a secret message",
      int_var: 20,
      bool_var: true,
    },
  },
);

// Scan with default variables
let matches = rules.scan(Buffer.from("test data"));
console.log(`Matches with default variables: ${matches.length}`);

// Override variables at scan time
matches = rules.scan(Buffer.from("test data"), {
  string_var: "no secrets here",
  int_var: 5,
  bool_var: false,
});
console.log(`Matches with overridden variables: ${matches.length}`);
```

## Namespaces

```
import { compile, create } from "@litko/yara-x";

// Compile a source into a YARA namespace
const rules = compile(
  `
  rule namespaced_rule {
    strings:
      $a = "namespace test"
    condition:
      $a
  }
`,
  { namespace: "alpha" },
);

const [match] = rules.scan(Buffer.from("namespace test"));
console.log(match.namespace); // "alpha"

// Add sources incrementally into separate namespaces
const scanner = create();
scanner.addRuleSource('rule shared { strings: $a = "one" condition: $a }', "one");
scanner.addRuleSource('rule shared { strings: $a = "two" condition: $a }', "two");
```

## WASM compilation

```
import { compile, compileToWasm } from "@litko/yara-x";

// Compile rules to WASM
const rule = `
  rule wasm_test {
    strings:
      $a = "compile to wasm"
    condition:
      $a
  }
`;

// Static compilation
compileToWasm(rule, "./output/rules.wasm");

// Or from a compiled rules instance
const compiledRules = compile(rule);
compiledRules.emitWasmFile("./output/instance_rules.wasm");

// Async compilation
await compiledRules.emitWasmFileAsync("./output/async_rules.wasm");
```

## Incremental rule building

```
import { create } from "@litko/yara-x";

// Create an empty scanner
const scanner = create();

// Add rules incrementally
scanner.addRuleSource(`
  rule first_rule {
    strings:
      $a = "first pattern"
    condition:
      $a
  }
`);

// Add rules from a file
scanner.addRuleFile("./rules/more_rules.yar");

// Add another rule
scanner.addRuleSource(`
  rule another_rule {
    strings:
      $a = "another pattern"
    condition:
      $a
  }
`);

// Now scan with all the rules
const matches = scanner.scan(Buffer.from("test data with first pattern"));
```

## Rule validation

```
import { validate } from "@litko/yara-x";

// Validate rules without executing them
const result = validate(`
  rule valid_rule {
    strings:
      $a = "valid"
    condition:
      $a
  }
`);

if (result.errors.length === 0) {
  console.log("Rules are valid!");
} else {
  console.error("Rule validation failed:");
  result.errors.forEach((error) => {
    console.error(`- ${error.code}: ${error.message}`);
  });
}
```

## Advanced options

```
import { compile } from "@litko/yara-x";

// Create a scanner with advanced options
const rules = compile(
  `
  rule advanced_rule {
    strings:
      $a = /hello[[:space:]]world/ // Using POSIX character class
    condition:
      $a and test_var > 10
  }
`,
  {
    // Define variables
    defineVariables: {
      test_var: "20",
    },
    // Compile these rules into a YARA namespace
    namespace: "research",
    // Enable relaxed regular expression syntax
    relaxedReSyntax: true,
    // Enable condition optimization
    conditionOptimization: true,
    // Ignore specific modules
    ignoreModules: ["pe"],
    // Error on potentially slow patterns
    errorOnSlowPattern: true,
    // Error on potentially slow loops
    errorOnSlowLoop: true,
    // Specify directories for include statements (v1.5.0+)
    includeDirectories: ["./rules/includes", "./rules/common"],
    // Enable or disable include statements (v1.5.0+)
    enableIncludes: true,
  },
);
```

## Error handling

### Compilation errors

```
import { compile } from "@litko/yara-x";

try {
  // This will throw an error due to invalid syntax
  const rules = compile(`
    rule invalid_rule {
      strings:
        $a = "unclosed string
      condition:
        $a
    }
  `);
} catch (error) {
  console.error(`Compilation error: ${error.message}`);
  // Output: Compilation error: error[E001]: syntax error
  //  --> line:3:28
  //   |
  // 3 |       $a = "unclosed string
  //   |                            ^ expecting `"`, found end of file
  // 278: }
}
```

### Scanning errors

```
import { compile } from "@litko/yara-x";

const rules = compile(`
  rule test_rule {
    condition:
      true
  }
`);

try {
  // This will throw if the file doesn't exist
  rules.scanFile("/path/to/nonexistent/file.bin");
} catch (error) {
  console.error(`Scanning error: ${error.message}`);
  // Output: Scanning error: Error reading file: No such file or directory (os error 2)
}
```

### Async errors

```
import { compile, compileToWasm } from "@litko/yara-x";

async function handleAsyncErrors() {
  const rules = compile(`
    rule test_rule {
      condition:
        true
    }
  `);

  try {
    await rules.scanFileAsync("/path/to/nonexistent/file.bin");
  } catch (error) {
    console.error(`Async scanning error: ${error.message}`);
  }

  try {
    await compileToWasm(
      "rule test { condition: true }",
      "/invalid/path/rules.wasm",
    );
  } catch (error) {
    console.error(`WASM compilation error: ${error.message}`);
  }
}

handleAsyncErrors();
```

## Compiler warnings

```
import { compile } from "@litko/yara-x";

// Create a scanner with a rule that generates warnings
const rules = compile(`
  rule warning_rule {
    strings:
      $a = "unused string"
    condition:
      true // Warning: invariant expression
  }
`);

// Get and display warnings
const warnings = rules.getWarnings();
if (warnings.length > 0) {
  console.log("Compiler warnings:");
  warnings.forEach((warning) => {
    console.log(`- ${warning.code}: ${warning.message}`);
  });
}
```

## Include directories

```
import { compile } from "@litko/yara-x";

// Create a main rule that includes other rules
const mainRule = `
  include "common/strings.yar"
  include "malware/pe_patterns.yar"

  rule main_detection {
    condition:
      common_string_rule or pe_malware_rule
  }
`;

// Compile with include directories
const rules = compile(mainRule, {
  includeDirectories: [
    "./rules", // Base directory
    "./rules/common", // Additional include path
    "./rules/malware", // Another include path
  ],
});

// Scan as usual
const matches = rules.scan(Buffer.from("test data"));
```

## Scan performance options

These options control scan behavior to trade off performance and safety.

### Limiting matches per pattern

Cap the number of matches per pattern to prevent excessive memory consumption:

```
import { compile } from "@litko/yara-x";

const rules = compile(`
  rule find_pattern {
    strings:
      $a = "pattern"
    condition:
      $a
  }
`);

// Limit to 1000 matches per pattern
rules.setMaxMatchesPerPattern(1000);

// Scan data with many occurrences
const data = Buffer.from("pattern ".repeat(10000));
const matches = rules.scan(data);

// Will only return up to 1000 matches per pattern
console.log(`Found ${matches[0].matches.length} matches (limited to 1000)`);
```

### Memory-mapped file control

Control whether memory-mapped files are used when scanning:

```
import { compile } from "@litko/yara-x";

const rules = compile(`
  rule test {
    strings:
      $a = "test"
    condition:
      $a
  }
`);

// Disable memory-mapped files for safer scanning
// (slower but safer for untrusted files)
rules.setUseMmap(false);

// Scan file without memory mapping
const matches = rules.scanFile("./sample.bin");
```

### Scan timeout

Set a timeout on scan operations to stop runaway scans:

```
import { compile } from "@litko/yara-x";

// The backslash is doubled because `\1` is not a valid escape in a
// JavaScript template literal; the rule text still receives `\1`.
const rules = compile(`
  rule slow_rule {
    strings:
      $a = /(.+)*\\1/
    condition:
      $a
  }
`);

// Set a 5-second timeout
rules.setTimeout(5000);

const matches = rules.scan(Buffer.from("test data"));
```

### Match context

Setting a match context size returns the bytes surrounding each match, which helps when a match needs to be analyzed in its wider context.

```
import { compile } from "@litko/yara-x";

const rules = compile(`
  rule context_rule {
    strings:
      $a = "secret"
    condition:
      $a
  }
`);

// Request 10 bytes of context before and after the match
rules.setMatchContextSize(10);

const data = Buffer.from("this is a top secret document containing sensitive info");
const matches = rules.scan(data);

if (matches.length > 0) {
  matches[0].matches.forEach((match) => {
    console.log(`Match: ${match.data}`); // "secret"
    console.log(`Context: ${match.contextData}`); // " a top secret document"
    console.log(`Match offset within context: ${match.contextMatchOffset}`);
  });
}
```

## Performance benchmarks

`node-yara-x` gets its performance from scanner caching and an optimized Rust implementation.

### Benchmark results

Test environment: MacBook Pro M3 Max, 36 GB RAM, release build with LTO enabled.

Methodology: statistical analysis across multiple iterations, with percentile reporting.

#### Scanner creation performance

| Rule type      | Mean   | p50    | p95    | p99    |
| -------------- | ------ | ------ | ------ | ------ |
| Simple rule    | 2.43ms | 2.41ms | 2.87ms | 3.11ms |
| Complex rule   | 2.57ms | 2.52ms | 2.96ms | 3.06ms |
| Regex rule     | 7.57ms | 7.47ms | 8.29ms | 8.70ms |
| Multiple rules | 2.05ms | 2.03ms | 2.24ms | 2.42ms |

#### Scanning performance by data size

| Data size | Rule type      | Mean  | Throughput |
| --------- | -------------- | ----- | ---------- |
| 64 bytes  | Simple         | 3μs   | ~21 MB/s   |
| 100KB     | Simple         | 6μs   | ~16.7 GB/s |
| 100KB     | Complex        | 73μs  | ~1.4 GB/s  |
| 100KB     | Regex          | 7μs   | ~14.3 GB/s |
| 100KB     | Multiple rules | 73μs  | ~1.4 GB/s  |
| 10MB      | Simple         | 204μs | ~49 GB/s   |

#### Advanced feature performance

| Feature            | Mean | Notes                      |
| ------------------ | ---- | -------------------------- |
| Variable scanning  | 1μs  | Pre-compiled variables     |
| Runtime variables  | 2μs  | Variables set at scan time |
| Async scanning     | 11μs | Non-blocking operation     |

## API reference

### Functions

- `compile(ruleSource: string, options?: CompilerOptions)` - Compiles yara rules from a string.
- `compileToWasm(ruleSource: string, outputPath: string, options?: CompilerOptions)` - Compiles yara rules from a string to a WASM file.
- `compileFileToWasm(rulesPath: string, outputPath: string, options?: CompilerOptions)` - Compiles yara rules from a file to a WASM file.
- `validate(ruleSource: string, options?: CompilerOptions)` - Validates yara rules without executing them.
- `create()` - Creates an empty rules scanner to which rules can be added incrementally.
- `fromFile(rulePath: string, options?: CompilerOptions)` - Compiles yara rules from a file.

### YaraX methods

- `getWarnings()` - Gets the compiler warnings.
- `scan(data: Buffer, variables?: Record)` - Scans a buffer.
- `scanFile(filePath: string, variables?: Record)` - Scans a file.
- `scanAsync(data: Buffer, variables?: Record)` - Scans a buffer asynchronously.
- `scanFileAsync(filePath: string, variables?: Record)` - Scans a file asynchronously.
- `emitWasmFile(filePath: string)` - Writes the compiled rules to a WASM file synchronously.
- `emitWasmFileAsync(filePath: string)` - Writes the compiled rules to a WASM file asynchronously.
- `addRuleSource(rules: string)` - Adds rules from a string to an existing scanner.
- `addRuleFile(filePath: string)` - Adds rules from a file to an existing scanner.
- `defineVariable(name: string, value: string)` - Defines a variable for the YARA compiler.
- `setMaxMatchesPerPattern(maxMatches: number)` - Sets the maximum number of matches per pattern.
- `setUseMmap(useMmap: boolean)` - Enables or disables memory-mapped files for scanning.
- `setTimeout(timeoutMs: number)` - Sets the scan timeout in milliseconds.
- `setMatchContextSize(size: number)` - Sets the number of context bytes captured around each matched string.

### CompilerOptions

- `defineVariables?: object` - Defines global variables for the YARA rules.
- `ignoreModules?: string[]` - List of module names to ignore during compilation.
- `bannedModules?: BannedModule[]` - List of modules that rules are not allowed to use.
- `features?: string[]` - List of features to enable for the YARA rules.
- `relaxedReSyntax?: boolean` - Uses relaxed regular expression syntax.
- `conditionOptimization?: boolean` - Optimizes conditions in the YARA rules.
- `errorOnSlowPattern?: boolean` - Raises an error on slow patterns.
- `errorOnSlowLoop?: boolean` - Raises an error on slow loops.
- `includeDirectories?: string[]` - **(v1.5.0+)** Directories the compiler searches for included files.
- `enableIncludes?: boolean` - **(v1.5.0+)** Enables or disables include statements in YARA rules.

## License

This project contains code under two different licenses:

- **MIT License:**
  - The Node.js bindings and other code specific to this module are licensed under the MIT License.
  - See `LICENSE-MIT` for the full text.
- **BSD-3-Clause License:**
  - The included yara-x library is licensed under the BSD-3-Clause License.
  - See `LICENSE-BSD-3-Clause` for the full text.