# mojo-yara
## What this is
A pure-Mojo reimplementation of the YARA malware-pattern-matching engine. YARA is the
de facto open standard for expressing malware signatures and indicators; it is used across
threat-intel platforms, IR workflows, EDR rule packs, and forensic toolchains. The C
reference (`libyara`) is mature, but its multi-pattern-matching performance ceiling has
stood for ~15 years. Multi-pattern matching is exactly the workload that Mojo's SIMD
primitives and GPU offload are built to accelerate.
`mojo-yara` aims for **bit-equal output** with the C reference on the subset of YARA it
supports, plus a measurable performance lead on large rule sets and large scan targets.
## Status
**Pre-alpha — v0.1 in build.** See [`docs/PLAN.md`](docs/PLAN.md) for the build phases and
ETA. Nothing is shippable yet.
## Goals
- **Drop-in CLI compatibility** with `yara` for the supported rule subset, so existing
pipelines work unchanged.
- **Demonstrable correctness:** every release diffed against `libyara`'s output across a
real rule + sample corpus. Ship when bit-equal.
- **Demonstrable performance:** SIMD-accelerated literal-string matcher (Aho-Corasick
style, vectorized). GPU offload planned for v0.2+ for big-corpus sweeps.
- **Single-binary deploy:** no Python interpreter, no native-extension wheels. Compile once,
drop into a container or VM.
- **Classical only.** Pattern matching against externally-provided rules. No
machine-learning at any layer.
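
To make the literal-string matcher goal concrete, here is a minimal scalar sketch of Aho-Corasick multi-pattern matching. It is written in Python purely for illustration and is not the project's Mojo code: the function names `build_automaton` and `scan` are hypothetical, nothing here is vectorized, and patterns are assumed to be non-empty byte strings.

```python
from collections import deque


def build_automaton(patterns):
    """Build a byte-level Aho-Corasick automaton from non-empty bytes patterns."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # failure link per state
    out = [[]]    # indices of patterns that end at each state

    # Phase 1: insert every pattern into a trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for b in pat:
            nxt = goto[state].get(b)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = nxt
            state = nxt
        out[state].append(idx)

    # Phase 2: breadth-first pass that computes failure links and
    # merges in the matches reachable through them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan(data, patterns):
    """Return sorted (offset, pattern_index) pairs for every match in data."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    hits = []
    for pos, b in enumerate(data):
        # Follow failure links until some state accepts this byte.
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for idx in out[state]:
            hits.append((pos - len(patterns[idx]) + 1, idx))
    return sorted(hits)
```

The point of the automaton is that the target is read once regardless of how many literals the rule set contains; a SIMD variant would aim to keep that property while processing several bytes per step.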
## Scope — v0.1
| Area | In v0.1 | Queued for v0.2+ |
|---|---|---|
| Rule syntax | Literal ASCII strings, hex with `??`/`?A` wildcards, common conditions (`all of them`, `any of them`, `N of (...)`, boolean ops) | Full regex strings (YARA's PCRE-like syntax) |
| Targets | File scanning | Process-memory scanning |
| Modules | None | `pe`, `elf`, `hash`, `cuckoo`, `math` |
| Acceleration | SIMD on CPU | GPU offload, distributed scan |
| Output | Plain text (`yara`-compatible), JSON | YARA-X serialized output |
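
As a rough illustration of the v0.1 rule subset in the table above, the sketch below matches hex patterns with `??` and nibble wildcards such as `?A`, and evaluates an `N of (...)` style condition. It is illustrative Python, not the project's Mojo code. The helper names (`compile_hex`, `find_hex`, `n_of`) are hypothetical, the scan is a naive one, and the pattern is assumed to be space-separated two-character tokens without the surrounding braces.

```python
def compile_hex(pattern):
    """Turn a hex pattern such as "4D 5A ?? ?0" into (mask, value) pairs."""
    compiled = []
    for token in pattern.split():
        if len(token) != 2:
            raise ValueError(f"bad hex token: {token!r}")
        mask = value = 0
        for ch in token:
            mask <<= 4
            value <<= 4
            if ch != "?":
                mask |= 0xF            # this nibble must match exactly
                value |= int(ch, 16)   # raises ValueError on non-hex input
        compiled.append((mask, value))
    return compiled


def find_hex(data, pattern):
    """Return every offset in data where the hex pattern matches."""
    compiled = compile_hex(pattern)
    n = len(compiled)
    return [
        start
        for start in range(len(data) - n + 1)
        if all((data[start + i] & m) == v for i, (m, v) in enumerate(compiled))
    ]


def n_of(n, matched):
    """True when at least n of the per-string match flags are set."""
    return sum(1 for hit in matched if hit) >= n
```

Under this encoding, `all of them` corresponds to `n_of(len(matched), matched)` and `any of them` to `n_of(1, matched)`.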
## Build
Requires Mojo 1.0+ via [pixi](https://pixi.sh). Mirrors the build pattern from
[`mojo-http`](https://github.com/lightofbaldr/mojo-http).
```bash
git clone https://github.com/lightofbaldr/mojo-yara
cd mojo-yara
pixi install      # pulls Mojo + MAX
pixi run test     # runs unit tests
pixi run build    # produces build/mojo-yara
```
## Layout
```
src/
  parser/     rule tokenizer + AST
  matcher/    string and hex pattern matchers + Aho-Corasick
  cli/        mojo-yara entrypoint
tests/
  parser/     AST round-trip tests against curated rules
  matcher/    targeted match tests
  oracle/     bit-equal diff harness vs the C `yara` reference
corpora/
  rules/      public YARA rules for testing (gitignored beyond a small fixture set)
  samples/    small public test samples (de-fanged where applicable)
docs/
  PLAN.md     build phases and ETA
```
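
The idea behind the `oracle/` bit-equal diff harness can be sketched as follows. This is illustrative Python under stated assumptions, not the harness in the repository: `run_scanner` and `bit_equal` are hypothetical names, and treating the exit code as part of "bit-equal" is a choice made for this sketch.

```python
import subprocess


def run_scanner(cmd):
    """Run a scanner command and return (exit code, raw stdout bytes)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    return proc.returncode, proc.stdout


def bit_equal(reference_cmd, candidate_cmd):
    """True when both commands yield byte-identical stdout and the same exit code."""
    return run_scanner(reference_cmd) == run_scanner(candidate_cmd)
```

A run might compare `["yara", "rules.yar", "sample.bin"]` against `["build/mojo-yara", "rules.yar", "sample.bin"]`, assuming the built binary accepts the same positional arguments as `yara` (the drop-in compatibility goal) and that both files exist locally. Comparing raw bytes rather than decoded text keeps the check strict about whitespace and line endings.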
## License
Apache-2.0. See `LICENSE`.
## Publication policy
This repository is a general-purpose, classical YARA-compatible pattern scanner,
released under Apache-2.0. It does not implement or describe any separately-
maintained proprietary technology of Light of Baldr LLC. Public release follows
patent-counsel review.