CERT-Polska/karton-classifier

# Classifier karton service

File type classifier for the Karton framework. Entrypoint for samples.

Classifies the type of samples labeled as `kind: raw`, which makes them available to subsystems that only receive samples of a specific type (e.g. `raw` => `runnable:win32:exe`).

**Author**: CERT.pl

**Maintainers**: psrok1, msm, nazywam

**Consumes:**

```
{
    "type":    "sample",
    "kind":    "raw",
    "payload": {
        "magic":  "output from 'file' command",
        "sample":
    }
}
```

**Produces:**

```
{
    "type":      "sample",
    "stage":     "recognized",
    "kind":      "runnable"      # Executable format default for OS platform
              || "document"      # Office document
              || "archive"       # Archive containing samples (zip, e-mails)
              || "dump"          # Dump from sandbox
              || "script"        # Script (js/vbs/bat...)
              || "misc",         # No platform or extension
    "platform":  "win32"
              || "win64"
              || "linux"
              || "android"
              || "macos"
              || "freebsd"
              || "netbsd"
              || "openbsd"
              || "solaris",
    "extension": "*",            # Expected file extension
    "mime":      "*",            # Expected file mimetype
    ... (other fields are derived from incoming task)
}
```

**Warning:** the output `mime` field is not deterministic across libmagic versions and can change depending on the version you're using. We don't recommend creating consumers that listen on it directly.

## Usage

First of all, make sure you have set up the core system: https://github.com/CERT-Polska/karton

Then install karton-classifier from PyPI and run it:

```
$ pip install karton-classifier

$ karton-classifier
```

## YARA rule classifiers

Since karton-classifier v2.1.0 it's possible to extend the classifier logic using YARA rules. You can enable this by passing `--yara-rules` with the path to the directory containing the rules.

Each rule **has to** specify the resulting `kind` in its meta section. The other meta attributes (`platform` and `extension`) are supported but optional.
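The meta requirements above can be expressed as a small check. This is only an illustrative sketch: `classification_from_meta` is a hypothetical helper, not part of karton-classifier, and it assumes the meta section has already been read into a plain dictionary of strings.

```python
from typing import Dict, Mapping

# Hypothetical names for illustration; not taken from karton-classifier.
REQUIRED_META = ("kind",)
OPTIONAL_META = ("platform", "extension")


def classification_from_meta(meta: Mapping[str, str]) -> Dict[str, str]:
    """Pick the classification attributes out of a rule's meta section."""
    missing = [key for key in REQUIRED_META if not meta.get(key)]
    if missing:
        raise ValueError(f"rule meta is missing required attribute(s): {missing}")

    result = {"kind": meta["kind"]}
    for key in OPTIONAL_META:
        # Optional attributes are copied only when the rule defines them.
        if key in meta:
            result[key] = meta[key]
    # Anything else (e.g. "description") is ignored in this sketch.
    return result
```

A rule that only sets `kind` passes the check, while a rule that sets `platform` without `kind` is rejected.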
A working rule looks like this:

```
rule pe_file
{
    meta:
        description = "classifies incoming windows executables"
        kind = "runnable"
        platform = "win32"
        extension = "exe"
    strings:
        $mz = "MZ"
    condition:
        $mz at 0 and uint32(uint32(0x3C)) == 0x4550
}
```

Some caveats to consider:

* The classifier still processes files normally, so in some cases it may report the same file twice.
* The classifier reports all matching YARA rules, so N matches on a single file create N tasks.
* The outgoing task includes the name of the matched rule as `rule-name` in the task headers.
* All YARA rules must have a `.yar` extension. All other files in the specified directory are ignored. In particular, the `.yara` extension is not supported.
* Subdirectories are not supported either: all YARA rules must reside directly in the specified directory.
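Before pointing `--yara-rules` at a directory, it can help to preview which files would qualify under the last two caveats. The following is a minimal sketch, not the classifier's own loading code: `find_rule_files` is a hypothetical helper, and it assumes the extension check is an exact, case-sensitive match on `.yar`.

```python
from pathlib import Path
from typing import List


def find_rule_files(rules_dir: str) -> List[Path]:
    """List files that sit directly in rules_dir and end with '.yar'."""
    directory = Path(rules_dir)
    return sorted(
        entry
        # iterdir() does not recurse, so rules in subdirectories are skipped.
        for entry in directory.iterdir()
        # '.yara' and any other extension are filtered out here.
        if entry.is_file() and entry.suffix == ".yar"
    )
```

Comparing the returned list against the full directory contents shows which rule files would be silently ignored, for example those saved as `.yara` or placed in a subdirectory.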