Aoko7/TriGma-code
# TriGma
[Chinese version (中文说明)](README.zh-CN.md)
This repository is the public, preprint-stage code release of TriGma, a tri-branch pipeline for static malware family classification.
The pipeline combines:
- micro-level byte and opcode representations
- meso-level function-call graph representations
- macro-level PE metadata representations
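To make the micro-level idea concrete, here is a minimal sketch of one common byte-level representation: a normalized 256-bin byte histogram. This is an illustration only. It is an assumption that a histogram is a reasonable stand-in, and it is not necessarily the representation TriGma uses. The function name `byte_histogram` is hypothetical and does not come from this repository.

```python
from __future__ import annotations

from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Return a 256-bin normalized byte-frequency vector for `data`.

    Hypothetical helper for illustration; not part of the TriGma codebase.
    """
    if not data:
        # An empty buffer has no byte distribution; return all zeros.
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


# Example on a small in-memory buffer (no real sample is read here).
features = byte_histogram(b"MZ\x90\x00\x03\x00\x00\x00")
print(len(features))   # 256
print(features[0x00])  # 0.5, since four of the eight bytes are 0x00
```

A fixed-length vector like this can be computed without executing the sample, which is what makes it a static feature.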
## Overview
This repository releases the code and workflow structure behind TriGma before paper publication.
It is designed to help readers understand:
- how PE samples are curated and normalized
- how Ghidra-based static extraction feeds the pipeline
- how `micro`, `meso`, and `macro` branches are constructed
- how pretraining and downstream tri-modal fusion are organized
- how the robustness workflow is implemented in code
This is a code-first release. It is not a public malware dataset release.
## Why This Repository Is Worth Reading
- The end-to-end workflow is preserved, from sample preparation through downstream training to the robustness evaluation code paths.
- Reusable implementation is separated from paper-facing experiment mirrors, so you can read either the engineering layer or the paper workflow layer first.
- The public boundary is conservative: unpublished result bundles, prediction packets, sample-level split exports, raw malware binaries, packed shards, and checkpoints are intentionally excluded.
## Repository Structure
- `src/`: Reusable implementation for preparation, pre-processing, pretraining, fine-tuning, and visualization.
- `scripts/`: Orchestration, preprocessing, evaluation, and utility entrypoints.
- `experiments_trigma_mainline/`: Paper-facing workflow mirror for pipeline, pretraining, clean experiments, and robustness experiments.
- `tests/`: Lightweight public validation coverage.
- `plotting/`: Figure-generation pipeline and plotting config.
- `docs/`: Codebase map, data pipeline guide, reproducibility notes, and release guidance.
## Quick Start
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 scripts/run_finetune_repro.py --dry-run
pytest tests -q
```
More setup details:
- installation guide: `INSTALL.md`
- runtime asset layout: `RUNTIME_LAYOUT.md`
- quickstart notes: `docs/quickstart.md`
## Choose Your Reading Path
- End-to-end paper workflow: `experiments_trigma_mainline/workflow.md`
- Reusable downstream training code: `src/finetune/README.md`
- Repository map: `docs/codebase_map.md`
- Data pipeline guide: `docs/data_pipeline_guide.md`
- Reproducibility guide: `docs/reproducibility_guide.md`
## Release Scope
This repository includes:
- reusable code
- workflow mirrors
- plotting support
- public-facing documentation
This repository does not include:
- raw malware samples
- attack-generated binaries
- packed dataset shards
- checkpoints
- unpublished result bundles
- prediction packets and evaluation manifests
- sample-level split exports
See:
- release boundary: `docs/OPEN_SOURCE_SCOPE.md`
- included vs excluded summary: `INCLUDED_AND_EXCLUDED.md`
- public doc index: `docs/PUBLIC_DOC_INDEX.md`
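If you fork or extend this repository, the exclusions above can be spot-checked before you publish. The sketch below flags files whose first two bytes are the DOS `MZ` magic that PE files begin with. It is a coarse heuristic and an assumption about how such a check might look, not a tool shipped with this repository. The function name `find_pe_like_files` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def find_pe_like_files(root: Path) -> list[Path]:
    """Return files under `root` whose first two bytes are the DOS `MZ` magic.

    Hypothetical pre-publication check; not part of the TriGma codebase.
    """
    hits: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as handle:
                # PE files begin with the two-byte DOS header magic "MZ".
                if handle.read(2) == b"MZ":
                    hits.append(path)
        except OSError:
            # Unreadable files are skipped rather than treated as hits.
            continue
    return hits
```

Calling `find_pe_like_files(Path("."))` from the repository root returns the list of suspicious paths. An empty list does not prove the tree is clean, because packed shards or archives would not match this magic.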
## Runtime Note
Some scripts still expect external assets under `runtime/` and write local outputs under `outputs/`.
Those paths are placeholders for assets from the original research environment and are intentionally not distributed here.
See:
- `RUNTIME_LAYOUT.md`
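Because the `runtime/` assets are not distributed, it can help to check the local layout before launching a script. The following is a minimal sketch under the assumption that only the two top-level directories named above matter. The helper name `check_runtime_layout` is hypothetical, and the expected contents of `runtime/` are described in `RUNTIME_LAYOUT.md`, not here.

```python
from __future__ import annotations

from pathlib import Path


def check_runtime_layout(repo_root: Path) -> dict[str, bool]:
    """Report whether `runtime/` exists and make sure `outputs/` is available.

    Hypothetical helper for illustration; not part of the TriGma codebase.
    """
    runtime_dir = repo_root / "runtime"
    outputs_dir = repo_root / "outputs"
    # `outputs/` only holds locally generated files, so creating it is safe.
    outputs_dir.mkdir(parents=True, exist_ok=True)
    # `runtime/` holds external assets, so it is only checked, never created.
    return {
        "runtime": runtime_dir.is_dir(),
        "outputs": outputs_dir.is_dir(),
    }
```

If the returned `"runtime"` entry is `False`, scripts that depend on external assets should be expected to fail until those assets are supplied locally.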
## Project Status
This repository is the conservative public release surface ahead of paper publication.
Compared with the internal release candidate, it intentionally omits result-oriented materials that are worth preserving internally but are not yet suitable for publication.
## Citation And Release
- license: `LICENSE`
- citation template: `CITATION.md`
- license record: `LICENSE_STATUS.md`
- publication gate: `docs/OPEN_SOURCE_RELEASE_GATE.md`
- GitHub launch checklist: `docs/GITHUB_RELEASE_CHECKLIST.md`
- short public positioning note: `docs/PREPRINT_PUBLIC_POSITIONING.md`