renatahodovan/grammarinator

GitHub: renatahodovan/grammarinator

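A structure-aware fuzzer based on ANTLR v4 grammars. It generates, mutates, and recombines syntactically valid test cases, and integrates natively with libFuzzer and AFL++ for coverage-guided fuzzing.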

Stars: 430 | Forks: 67

# Grammarinator

*ANTLRv4 grammar-based test generator*

.. image:: https://img.shields.io/pypi/v/grammarinator?logo=python&logoColor=white
   :target: https://pypi.org/project/grammarinator/
.. image:: https://img.shields.io/pypi/l/grammarinator?logo=open-source-initiative&logoColor=white
   :target: https://pypi.org/project/grammarinator/
.. image:: https://img.shields.io/github/actions/workflow/status/renatahodovan/grammarinator/main.yml?branch=master&logo=github&logoColor=white
   :target: https://github.com/renatahodovan/grammarinator/actions
.. image:: https://img.shields.io/coveralls/github/renatahodovan/grammarinator/master?logo=coveralls&logoColor=white
   :target: https://coveralls.io/github/renatahodovan/grammarinator
.. image:: https://img.shields.io/readthedocs/grammarinator?logo=read-the-docs&logoColor=white
   :target: http://grammarinator.readthedocs.io/en/latest/

.. start included documentation

*Grammarinator* is a random test generator / fuzzer that creates test cases according to an input ANTLR_ v4 grammar. The motivation behind this grammar-based approach is to leverage the large variety of publicly available `ANTLR v4 grammars`_. It includes both a Python-based and a high-performance C++ generation backend.

.. _ANTLR: http://www.antlr.org
.. _`ANTLR v4 grammars`: https://github.com/antlr/grammars-v4
.. _`trophy page`: https://github.com/renatahodovan/grammarinator/wiki

+--------------------------------------------------------------------------+
| **TL;DR - Key Features**                                                 |
+--------------------------------------------------------------------------+
| *A brief overview of the most important capabilities*                    |
+==========================================================================+
|                                                                          |
| * **Generate** test cases from scratch based on `ANTLR v4 grammars`_, or |
|   **mutate/recombine** existing test cases after parsing them.           |
|                                                                          |
| * Besides black-box test generation, guided fuzzing is supported through |
|   native integration with `libFuzzer`_ and `AFL++`_.                     |
|                                                                          |
| * The AFL++ integration also supports **grammar-aware minimization** of  |
|   test cases through the ``afl-tmin`` tool.                              |
|                                                                          |
| * **Grammar-aware mutation and recombination** without slowing down the  |
|   fuzzing process with parsing (uses pre-parsed input seeds).            |
|                                                                          |
| * Fine-grained **probabilistic generation control** via inline grammar   |
|   weights or an external JSON-based weight configuration (for            |
|   alternatives and quantifiers).                                         |
|                                                                          |
| * Support for inline **semantic predicates** in grammars to dynamically  |
|   enable or disable grammar alternatives during generation.              |
|                                                                          |
| * Multiple **size control strategies**, including limits on the maximum  |
|   recursion depth and the maximum token count.                           |
|                                                                          |
| * Built-in **caching** to filter out repeatedly generated inputs.        |
|                                                                          |
| * Both **grammar-aware and grammar-unaware mutators**, with support for  |
|   selectively enabling and disabling them.                               |
|                                                                          |
| * Extensible **serialization** pipeline with custom serializers for      |
|   formatting tree-based output into concrete test inputs.                |
|                                                                          |
| * Advanced customization hooks:                                          |
|                                                                          |
|   * **Custom models** for programmatic decision guidance                 |
|   * **Custom listeners** for collecting information during generation    |
|   * **Custom transformers** for post-generation tree transformations     |
+--------------------------------------------------------------------------+

.. _libFuzzer: https://llvm.org/docs/LibFuzzer.html
.. _AFL++: https://aflplus.plus

# Requirements

* Python_ >= 3.10
* Java_ SE >= 11 JRE or JDK (the latter is optional)

Additionally, for the C++ backend:

* A C++20 compiler (e.g., GCC >= 11.0, Clang >= 13.0, MSVC >= 2019)
* CMake_ >= 3.10

.. _Python: https://www.python.org
.. _Java: https://www.oracle.com/java/
.. _CMake: https://cmake.org

# Install

To use *Grammarinator* in another project, add it to ``setup.cfg`` as an install requirement (if using setuptools_ with declarative config):

```ini
[options]
install_requires =
    grammarinator
```

To install *Grammarinator* manually (e.g., into a virtual environment), use pip_:

```
pip install grammarinator
```

The above approaches install the latest release of *Grammarinator* from PyPI_. Alternatively, for the development version, clone the project and perform a local install:

```
pip install .
```

.. _setuptools: https://github.com/pypa/setuptools
.. _pip: https://pip.pypa.io
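As a quick sanity check after installation, the prerequisites listed under Requirements can be probed from Python. The following is only a sketch and not part of *Grammarinator*: the helper name ``check_environment`` is hypothetical, and for ``java`` and ``cmake`` it merely checks that an executable is on ``PATH`` without inspecting its version.

```python
import shutil
import sys
from importlib import metadata


def check_environment(min_python=(3, 10)):
    """Report which prerequisites appear to be available.

    Hypothetical helper for illustration only. The Java and CMake checks
    look for an executable on PATH; they do not verify the version.
    """
    try:
        installed = metadata.version("grammarinator")
    except metadata.PackageNotFoundError:
        installed = None  # the distribution is not installed here
    return {
        "python_ok": sys.version_info >= min_python,
        "java_found": shutil.which("java") is not None,
        "cmake_found": shutil.which("cmake") is not None,  # C++ backend only
        "grammarinator_version": installed,
    }


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

A missing ``cmake`` matters only when the C++ backend is going to be built; the Python backend does not need it.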
.. _PyPI: https://pypi.org/

# Usage

As a first step, *Grammarinator* takes an `ANTLR v4 grammar`_ and creates a test generator script in Python3 or C++. Grammarinator supports a subset of the ANTLR grammar features, which is described in the "Grammar overview" section of the documentation. If needed, the generated generator can be subclassed later to customize it further.

Basic command-line syntax to create a test generator (Python or C++):

```
grammarinator-process <grammar-file(s)> -o <output-dir> --no-actions [--language hpp]
```

**Notes**

*Grammarinator* uses the `ANTLR v4 grammar`_ format as its input, which makes existing grammars (lexer and parser rules) easily reusable. However, because a fuzzer and a parser have inherently different goals, inlined code (actions and conditions, header and members blocks) is most probably not reusable and may even prevent proper execution. For first experiments with existing grammar files, ``grammarinator-process`` supports the command-line option ``--no-actions``, which skips all such code blocks during fuzzer generation. Once the inlined code is tuned for fuzzing, that option may be omitted.

.. _`ANTLR v4 grammar`: https://github.com/antlr/grammars-v4

## Python-Based Test Generation

After a fuzzer has been generated and optionally customized, it can be executed by the ``grammarinator-generate`` script (or, of course, instantiated manually in a custom-written driver).

Basic command-line syntax of ``grammarinator-generate``:

```
grammarinator-generate <generator> \
  -r <start-rule> -d <max-depth> \
  -o <output-pattern> -n <number-of-tests> \
  -t <transformer-1> -t <transformer-2>
```

## C++-Based Test Generation

After generating a C++-based fuzzer using ``grammarinator-process`` with the ``--language hpp`` flag, it needs to be built:

```
python3 grammarinator-cxx/dev/build.py --clean \
  --generator <generator-class> \
  --includedir <generator-dir> \
  --tools
```

Once built, the standalone generator can be run as follows:

```
grammarinator-cxx/build/Release/bin/grammarinator-generate-<name> \
  -r <start-rule> -d <max-depth> \
  -o <output-pattern> -n <number-of-tests>
```

Note: The C++ backend can also be used as a custom mutator for libFuzzer. For details, see the *LibFuzzer Integration* section of the documentation.

# Evolutionary Generation

Besides generating test cases from scratch based on the ANTLR grammar, Grammarinator is also able to recombine existing inputs or to mutate only a small portion of them. To use these additional generation approaches, a population of selected test cases has to be prepared. The preparation is done with the ``grammarinator-parse`` tool, which processes the input files with an ANTLR grammar (possibly the same one the generator uses) and builds grammarinator tree representations from them (with the ``.grt*`` extension). These files encode the complete derivation tree of each input and can be reused across different fuzzing strategies.

Basic command-line syntax of ``grammarinator-parse``:

```
grammarinator-parse <input-file(s)> -g <grammar-file(s)> -r <start-rule> \
  -o <output-dir>
```

Given a population of such ``.grt*`` files, ``grammarinator-generate`` or ``grammarinator-generate-<name>`` can make use of them through the ``--population`` CLI option. If the ``--population`` option is set (for either the Python or the C++ generator), *Grammarinator* randomly chooses a strategy (generate, mutate, or recombine) for each new test case. A strategy that is not wanted can be disabled with the ``--no-generate``, ``--no-mutate``, or ``--no-recombine`` option.

**Notes**

Real-life grammars often use recursive rules to express certain patterns. However, when using such rules for generation, we can easily end up in an unexpectedly deep call stack. With the ``--max-depth`` or ``-d`` options, this depth, and with it the size of the generated test cases, can be controlled.

Another specialty of ANTLR grammars is that they support so-called hidden tokens. These rules typically describe elements of the target language that can be placed basically anywhere without breaking the syntax. The most common examples are comments and whitespace. However, when using these grammars, which do not define explicitly where whitespace may or may not appear in rules, to generate test cases, we have to insert the missing spaces manually. This can be done by applying a serializer (with the ``-s`` option) to the tree representation of the output tests. A simple serializer that inserts a space after every unparser rule is provided by *Grammarinator* (``grammarinator.runtime.simple_space_serializer``).

In some cases, we may want to postprocess the output tree itself (without serializing it), for example to enforce some logic that cannot be expressed by a context-free grammar. For this purpose, the transformer mechanism can be used (with the ``-t`` option). Similarly to serializers, a transformer takes a tree as input, but instead of creating a string representation, it is expected to return the modified (transformed) tree object.

As a final thought, one must not forget that the original purpose of grammars is the syntax-wise validation of various inputs. As a consequence, these grammars encode syntactic expectations only and not semantic rules. If we still want to add semantic knowledge to the generated tests, we can inherit custom fuzzers from the generated ones and redefine the methods corresponding to lexer or parser rules in ways that encode the required knowledge (e.g., HTMLCustomGenerator_).

.. _HTMLCustomGenerator: examples/fuzzer/HTMLCustomGenerator.py

# Working Example

The repository contains a minimal example_ that generates HTML files. To give it a try, run the processor first, then generate test cases with the generator.

Using the Python backend:

```
grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
  -o examples/fuzzer/
grammarinator-generate HTMLCustomGenerator.HTMLCustomGenerator \
  -r htmlDocument -d 20 \
  -o examples/tests/test_%d.html -n 100 \
  -s HTMLGenerator.html_space_serializer \
  --sys-path examples/fuzzer/
```

Using the C++ backend:

```
grammarinator-process examples/grammars/HTMLLexer.g4 examples/grammars/HTMLParser.g4 \
  -o examples/fuzzer/ --no-actions --language hpp
python3 grammarinator-cxx/dev/build.py --clean \
  --generator HTMLGenerator \
  --serializer HTMLSpaceSerializer \
  --include HTMLConfig.hpp \
  --includedir examples/fuzzer/ \
  --tools
grammarinator-cxx/build/Release/bin/grammarinator-generate-html \
  -r htmlDocument -d 20 \
  -o examples/tests/test_%d.html -n 100
```

.. _example: examples/

# Compatibility

*Grammarinator* was tested on:

* Linux (Ubuntu 16.04 ... 24.04)
* OS X / macOS (10.12 ... 15.5)
* Windows (Server 2012 R2 / Server version 1809 / Windows 10 / Windows Server 2022)

# Citations

Background on *Grammarinator* is published in:

* Renata Hodovan, Akos Kiss, and Tibor Gyimothy. Grammarinator: A Grammar-Based Open Source Fuzzer. In Proceedings of the 9th ACM SIGSOFT International Workshop on Automating Test Case Design, Selection, and Evaluation (A-TEST 2018), pages 45-48, Lake Buena Vista, Florida, USA, November 2018. ACM. https://doi.org/10.1145/3278186.3278193
* Renata Hodovan, Akos Kiss. Grammarinator Meets LibFuzzer: A Structure-Aware In-Process Approach. In Proceedings of the 20th International Conference on Software Technologies (ICSOFT 2025), pages 178-189, Bilbao, Spain, June 2025. SciTePress. Best paper award. https://doi.org/10.5220/0013571500003964

.. end included documentation

# Copyright and Licence

Licensed under the BSD 3-Clause License_.

.. _License: LICENSE.rst
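The command lines of the Working Example section can also be assembled programmatically, which is convenient when the same grammar is processed and fuzzed repeatedly from a script. The sketch below is illustrative only: the helper names ``process_command`` and ``generate_command`` are hypothetical, and the only flags used are those that appear in the examples of this README. It builds and prints the commands without running them, so it works without *Grammarinator* installed.

```python
import shlex


def process_command(grammars, out_dir, no_actions=False, language=None):
    """Build the argument list for ``grammarinator-process`` (hypothetical helper)."""
    cmd = ["grammarinator-process", *grammars, "-o", out_dir]
    if no_actions:
        cmd.append("--no-actions")
    if language is not None:
        cmd += ["--language", language]
    return cmd


def generate_command(generator, rule, max_depth, out_pattern, count,
                     serializer=None, sys_path=None):
    """Build the argument list for ``grammarinator-generate`` (hypothetical helper)."""
    cmd = ["grammarinator-generate", generator,
           "-r", rule, "-d", str(max_depth),
           "-o", out_pattern, "-n", str(count)]
    if serializer is not None:
        cmd += ["-s", serializer]
    if sys_path is not None:
        cmd += ["--sys-path", sys_path]
    return cmd


if __name__ == "__main__":
    grammars = ["examples/grammars/HTMLLexer.g4",
                "examples/grammars/HTMLParser.g4"]
    steps = [
        process_command(grammars, "examples/fuzzer/"),
        generate_command("HTMLCustomGenerator.HTMLCustomGenerator",
                         "htmlDocument", 20,
                         "examples/tests/test_%d.html", 100,
                         serializer="HTMLGenerator.html_space_serializer",
                         sys_path="examples/fuzzer/"),
    ]
    for step in steps:
        # Only print here; pass each list to subprocess.run(step, check=True)
        # to execute it once Grammarinator is installed.
        print(shlex.join(step))
```

Keeping each command as a list rather than a single string avoids shell quoting issues when paths contain spaces.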

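Tags: ANTLRv4, Bash scripts, C++, Fuzzing, JS file enumeration, pocsuite3, Python, code generation, security testing, open-source testing tools, offensive security, data wiping, data pipelines, no backdoors, test generator, test case mutation, test case recombination, penetration testing tools, compiler testing, parser testing, syntax analysis, grammar-driven testing, software engineering, software testing, reverse engineering tools, random test generation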