lightofbaldr/mojo-yara

# mojo-yara

[![License: Apache-2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
[![Status: Pre-alpha](https://img.shields.io/badge/Status-Pre--alpha-orange)](#status)

## What this is

A pure-Mojo reimplementation of the YARA malware-pattern-matching engine.

YARA is the de-facto open standard for expressing malware signatures and indicators; it's used across threat-intel platforms, IR workflows, EDR rule packs, and forensic toolchains. The C reference (`libyara`) is mature but has lived with its multi-pattern-matching performance ceiling for ~15 years, which is exactly the workload Mojo's SIMD primitives and GPU offload are positioned to dominate.

`mojo-yara` aims for **bit-equal output** with the C reference on the subset of YARA it supports, plus a measurable performance lead on large rule sets and large scan targets.

## Status

**Pre-alpha: v0.1 in build.** See [`docs/PLAN.md`](docs/PLAN.md) for the build phases and ETA. Nothing is shippable yet.

## Goals

- **Drop-in CLI compatibility** with `yara` for the supported rule subset, so existing pipelines work unchanged.
- **Demonstrable correctness:** every release diffed against `libyara`'s output across a real rule + sample corpus. Ship when bit-equal.
- **Demonstrable performance:** SIMD-accelerated literal-string matcher (Aho-Corasick style, vectorized). GPU offload planned for v0.2+ for big-corpus sweeps.
- **Single-binary deploy:** no Python interpreter, no native-extension wheels. Compile once, drop into a container or VM.
- **Classical only.** Pattern matching against externally-provided rules. No machine-learning at any layer.

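
For readers unfamiliar with the Aho-Corasick approach named in the performance goal, the following is a minimal scalar sketch in Python, not Mojo, and not this project's code. It only illustrates the general technique of matching many literal strings in one pass over the input; the function names `build_automaton` and `scan` are hypothetical, and the SIMD vectorization the project targets is not shown.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a list of non-empty byte patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # fail[state] -> fallback state on mismatch
    out = [[]]    # out[state] -> indices of patterns that end at this state

    # Phase 1: insert every pattern into a trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for b in pat:
            nxt = goto[state].get(b)
            if nxt is None:
                nxt = len(goto)
                goto[state][b] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(idx)

    # Phase 2: breadth-first pass that sets failure links.
    # Depth-1 states keep fail == 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            # Inherit matches that end at the failure target (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(data, patterns):
    """Return (offset, pattern_index) for every occurrence of every pattern."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    state = 0
    for pos, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for idx in out[state]:
            matches.append((pos - len(patterns[idx]) + 1, idx))
    return matches
```

A call such as `scan(b"ushers", [b"he", b"she", b"his", b"hers"])` reports each hit as a start offset paired with the index of the matching pattern. The point of the technique is that the input is read once regardless of how many literals the rule set contains, which is why it suits large rule packs.
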
## Scope: v0.1

| | In v0.1 | Queued for v0.2+ |
|---|---|---|
| Rule syntax | Literal ASCII strings, hex with `??`/`?A` wildcards, common conditions (`all of them`, `any of them`, `N of (...)`, boolean ops) | Full PCRE regex strings |
| Targets | File scanning | Process-memory scanning |
| Modules | None | `pe`, `elf`, `hash`, `cuckoo`, `math` |
| Acceleration | SIMD on CPU | GPU offload, distributed scan |
| Output | Plain text (`yara`-compatible), JSON | YARA-X serialized output |

## Build

Requires Mojo 1.0+ via [pixi](https://pixi.sh). Mirrors the build pattern from [`mojo-http`](https://github.com/lightofbaldr/mojo-http).

    git clone https://github.com/lightofbaldr/mojo-yara
    cd mojo-yara
    pixi install         # pulls Mojo + MAX
    pixi run test        # runs unit tests
    pixi run build       # produces build/mojo-yara

## Layout

    src/
      parser/      rule tokenizer + AST
      matcher/     string and hex pattern matchers + Aho-Corasick
      cli/         mojo-yara entrypoint
    tests/
      parser/      AST round-trip tests against curated rules
      matcher/     targeted match tests
      oracle/      bit-equal diff harness vs the C `yara` reference
    corpora/
      rules/       public YARA rules for testing (gitignored beyond a small fixture set)
      samples/     small public test samples (de-fanged where applicable)
    docs/
      PLAN.md      build phases and ETA

## License

Apache-2.0. See `LICENSE`.

## Publication policy

This repository is a general-purpose, classical YARA-compatible pattern scanner, released under Apache-2.0. It does not implement or describe any separately maintained proprietary technology of Light of Baldr LLC. Public release follows patent-counsel review.