MatrixEditor/caterpillar

GitHub: MatrixEditor/caterpillar

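A type-safe Python library for packing and unpacking structured binary data. Its declarative, class-based syntax extends the capabilities of the built-in struct module.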

Stars: 37 | Forks: 4

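# Caterpillar - 🐛

[![python](https://img.shields.io/badge/Python-3.12+-blue?logo=python&logoColor=yellow)](https://www.python.org/downloads/)
[![Latest Version](https://img.shields.io/github/v/release/MatrixEditor/caterpillar.svg?logo=github&label=Latest+Version)](https://pypi.org/project/caterpillar-py/)
[![Build and Deploy Docs](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/97b01cbe66113808.svg)](https://github.com/MatrixEditor/caterpillar/actions/workflows/python-sphinx.yml)
[![Run Tests](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/e89061bc79113809.svg)](https://github.com/MatrixEditor/caterpillar/actions/workflows/python-test.yml)
![GitHub issues](https://img.shields.io/github/issues/MatrixEditor/caterpillar?logo=github)
![GitHub License](https://img.shields.io/github/license/MatrixEditor/caterpillar?logo=github)

Caterpillar is a Python 3.12+ library (3.10+ is also supported) for packing and unpacking structured binary data. It extends the capabilities of [Python Struct](https://docs.python.org/3/library/struct.html) by supporting direct class declarations. More information about the different configuration options will be added in the future. The documentation is [here >](https://matrixeditor.github.io/caterpillar/).

*Caterpillar* is able to:

* pack and unpack data just by processing Python class definitions (including support for powerful bitfields, C++-like templates and C-like unions!),
* apply a wide range of data types (with endianness and architecture configuration),
* dynamically adapt structs based on their inheritance layout,
* reduce memory usage with `__slots__`,
* let you place conditional statements inside class definitions,
* insert proper types into class definitions to support documentation,
* help you write cleaner and more compact code.
* There is also a feature that lets you change the endianness dynamically within a struct!
* You can even extend Caterpillar and write your parsing logic in C or C++.
* All struct definitions are type compliant!!! (tested with pyright)

## Show me the code!

*The following code is type compliant, which means your static type checker will not*
*warn you while you develop with it*.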
For reference, this is the default syntax:

```
from caterpillar.py import *
from caterpillar.types import *

@bitfield(order=LittleEndian)
class Header:
    version : 4                    # 4bit integer
    valid   : 1                    # 1bit flag (boolean)
    ident   : (8, CharFactory)     # 8bit char
    # automatic alignment to 16bits

THE_KEY = b"ITS MAGIC"

@struct(order=LittleEndian, kw_only=True)
class Format:
    magic  : THE_KEY               # Supports string and byte constants directly
    header : Header
    a      : uint8                 # Primitive data types
    b      : Dynamic + int32       # dynamic endian based on global config
    length : uint8                 # String fields with computed lengths
    name   : String(this.length)   #  -> you can also use Prefixed(uint8)

    # custom actions, e.g. for hashes
    _hash_begin : DigestField.begin("hash", Md5_Algo)

    # Sequences with prefixed, computed lengths   -+ part of the MD5 hash
    names  : CString[uint8::]                   #  |
                                                # -+
    # automatic hash creation and verification + default value
    hash   : Md5_Field("hash", verify=True)

# Creation, packing and unpacking remains the same
```
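For contrast, here is a rough sketch of what hand-packing the same `Format` layout looks like with only the standard-library `struct` module. It is not part of Caterpillar; the helper names `pack_header` and `pack_format_prefix` are hypothetical, and the field values are taken from this README's usage example. The bit order inside `Header` (fields filled from the most significant bit of a 16-bit word) is an assumption, chosen because it reproduces the `b'ITS MAGIC0*...'` prefix quoted in that example. The trailing MD5 digest is left out.

```python
import struct

MAGIC = b"ITS MAGIC"


def pack_header(version: int, valid: bool, ident: str) -> bytes:
    """Pack the 4-bit version, 1-bit flag and 8-bit char into 16 bits.

    ASSUMPTION: fields are laid out from the most significant bit down,
    leaving 3 padding bits at the bottom; the word is written little-endian.
    """
    word = ((version & 0xF) << 12) | ((int(valid) & 0x1) << 11) | ((ord(ident) & 0xFF) << 3)
    return struct.pack("<H", word)


def pack_format_prefix(version, valid, ident, a, b, name, names) -> bytes:
    """Build every field of the layout except the trailing MD5 hash."""
    raw_name = name.encode("ascii")
    out = bytearray(MAGIC)
    out += pack_header(version, valid, ident)
    out += struct.pack("<Bi", a, b)                      # uint8 + little-endian int32
    out += struct.pack("<B", len(raw_name)) + raw_name   # length-prefixed string
    out += struct.pack("<B", len(names))                 # uint8 element count
    for item in names:
        out += item.encode("ascii") + b"\x00"            # NUL-terminated strings
    return bytes(out)


blob = pack_format_prefix(2, True, "F", 1, 2, "foo", ["a", "b"])
print(blob)
```

Every offset, prefix and terminator has to be tracked by hand here, and unpacking would need a second, mirrored routine. The class declarations describe the same layout once for both directions.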
```
from caterpillar.py import *
from caterpillar.types import *

@bitfield(order=LittleEndian)
class Header:
    version : int4_t                          # 4bit integer
    valid   : int1_t                          # 1bit flag (boolean)
    ident   : f[str, (8, CharFactory)]        # 8bit char
    # automatic alignment to 16bits

THE_KEY = b"ITS MAGIC"

@struct(order=LittleEndian, kw_only=True)
class Format:
    magic  : f[bytes, THE_KEY] = THE_KEY      # Supports string and byte constants directly
    header : Header
    a      : uint8_t                          # Primitive data types
    b      : f[int, Dynamic + int32]          # dynamic endian based on global config
    length : uint8_t                          # String fields with computed lengths
    name   : f[str, String(this.length)]      #  -> you can also use Prefixed(uint8)

    # custom actions, e.g. for hashes
    _hash_begin : f[None, DigestField.begin("hash", Md5_Algo)] = None

    # Sequences with prefixed, computed lengths        -+ part of the MD5 hash
    names  : f[list[str], CString[uint8::]]          #  |
                                                     # -+
    # automatic hash creation and verification + default value
    hash   : f[bytes, Md5_Field("hash", verify=True)] = b""

# Creation (keyword-only arguments, magic is auto-inferred):
obj = Format(
    header=Header(version=2, valid=True, ident="F"),
    a=1,
    b=2,
    length=3,
    name="foo",
    names=["a", "b"]
)

# Packing the object; reads as 'PACK obj FROM Format'
# objects of struct classes can be packed right away
data_le = pack(obj, Format)
# results in: b'ITS MAGIC0*\x01\x02\x00\x00\x00\x03foo\x02a\x00b\x00)\x9a...'

# Unpacking the binary data, reads as 'UNPACK Format FROM blob'
obj2 = unpack(Format, data_le)
assert obj2.names == obj.names

# to pack with a different endian for fields 'a' and 'b', use 'order'
data_be = pack(obj, Format, order=BigEndian)
assert data_le != data_be
```

The library offers extensive functionality beyond basic struct definitions. For more details on what it can do, explore the official [documentation](https://matrixeditor.github.io/caterpillar/), the [examples](./examples/) and the [test cases](./test/).

## Installation

### PIP installation (Python-only)

```
pip install caterpillar-py
```

### Python-only installation

```
pip install "caterpillar[all]@git+https://github.com/MatrixEditor/caterpillar"
```

### C-extension installation

```
pip install "caterpillar[all]@git+https://github.com/MatrixEditor/caterpillar/#subdirectory=src/ccaterpillar"
```

## Starting Point

Please visit the [documentation](https://matrixeditor.github.io/caterpillar/), which contains a complete tutorial on how to use this library.

## Other Approaches

The following is a list of similar approaches to parsing structured binary data with Python:

* [construct](https://github.com/construct/construct)
* [kaitai_struct](https://github.com/kaitai-io/kaitai_struct)
* [hachoir](https://hachoir.readthedocs.io/en/latest/)
* [mrcrowbar](https://github.com/moralrecordings/mrcrowbar)

The documentation also provides a [comparison](https://matrixeditor.github.io/caterpillar/reference/introduction.html#comparison) with these approaches.

## License

Distributed under the GNU General Public License (V3). See [License](LICENSE) for more information.
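Tags: C struct mapping, DNS resolution, Python 3.12, struct enhancement, binary data processing, binary parsing, bitfields, memory optimization, dynamic parsing, endianness, serialization and deserialization, development library, open-source project, data serialization, file format parsing, struct packing, network protocol analysis, reverse-engineering aid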