wahidhendrawan/Forensis

# Forensis

Forensis is an open-source web platform for threat analysis and digital forensics operations. It provides a unified workflow for log analysis, network packet inspection, memory triage, multi-engine detection correlation, and secure user administration.

## DFIR Service Architecture

Forensis now uses a service-oriented internal architecture (modular monolith baseline) that maps directly to production microservice boundaries:

- **api-service** responsibilities:
  - auth, users, case registry, artifact registry, job submission, job status APIs.
  - endpoints: `/api/cases`, `/api/jobs`, `/api/jobs/`, `/api/jobs//status`.
- **analysis-workers** responsibilities:
  - dedicated Celery workers for `logs`, `network`, and `memory` analysis jobs.
  - queue-separated workers (`logs`, `network`, `memory`, `rules`) ready for queue-depth autoscaling.
  - staged pipeline: `queued -> parse -> enrich -> persist -> post_rule_match -> complete/failed`.
- **rule-service** responsibilities:
  - Sigma/YARA enrichment orchestration and correlation limits.
  - asynchronous Sigma post-processing for large datasets.
- **ui-service** responsibilities:
  - frontend pages consume job status and show real-time progress during analysis.

The current repository keeps these services in one deployable unit for compatibility, while the domain/service boundaries are ready to be extracted into separate runtime services.

### Investigation Data Model

Core normalized entities (internal ECS-style schema version `forensis-ecs-0.1`):

- `Case`
- `Artifact`
- `AnalysisJob`
- `Finding`
- `RuleMatch`
- `TimelineEvent`

### Recommended Production Database

- **Primary recommendation**: PostgreSQL 16+ (`postgresql+psycopg` driver).
- Why:
  - stronger concurrent write behavior than SQLite under multi-worker load,
  - better indexing, query planning, and operational tooling,
  - safer growth path for case/finding/timeline scale.
- SQLite remains supported for lightweight deployments and development.
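
The database recommendation above could translate into a small startup helper that prefers a configured PostgreSQL URI and falls back to SQLite. The sketch below is illustrative only and is not taken from the Forensis codebase: `resolve_db_uri`, `is_postgres`, and the default SQLite URI are assumptions, while `FORENSIS_DB_URI` is the variable from the project's `.env` example and `instance/forensis.db` is the path used in the SQLite to PostgreSQL cutover command.

```python
import os
from typing import Mapping, Optional

# Assumed default; the cutover command in this README references instance/forensis.db.
DEFAULT_SQLITE_URI = "sqlite:///instance/forensis.db"


def resolve_db_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured database URI, or the SQLite fallback if unset/blank."""
    env = os.environ if env is None else env
    uri = env.get("FORENSIS_DB_URI", "").strip()
    return uri or DEFAULT_SQLITE_URI


def is_postgres(uri: str) -> bool:
    """True when the URI scheme is PostgreSQL, with or without a driver suffix."""
    scheme = uri.split("://", 1)[0]
    return scheme.split("+", 1)[0] == "postgresql"


if __name__ == "__main__":
    uri = resolve_db_uri()
    print("backend:", "postgresql" if is_postgres(uri) else "sqlite")
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`.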
### Migration Versioning

- Alembic is included for DB schema versioning.
- Migration sources:
  - `alembic.ini`
  - `migrations/`
- CI drift guard:
  - `scripts/check_migration_drift.py`
  - `.github/workflows/ci.yml`

## Key Capabilities

### 1. Log Parser and Analyzer

- Parse Apache, Syslog, CSV, JSON, Elastic-like, and Splunk-like log inputs.
- Detect anomalies from suspicious patterns and status behavior.
- Correlate parsed events with Sigma, YARA, Threat Intel, and baseline profiling.
- Display threat score, TI matches, YARA hits, and baseline drift in one view.
- Export results to JSON and CSV.

### 2. Network Traffic Analyzer

- Analyze PCAP and PCAPNG files.
- Build flow summaries (source, destination, ports, protocol, bytes, packets, duration).
- Highlight suspicious communication patterns.
- Run Sigma + YARA + Threat Intel + baseline correlation against network events.
- Include timeline fields (`first_seen`, `last_seen`) in flows and anomaly output.
- Apply packet/flow guardrails to keep large capture processing responsive.
- Export results to JSON and CSV.

### 3. Memory

- Dedicated triage page separate from Helper.
- Accept raw paste or uploads in: TXT, LOG, JSON, NDJSON/JSONL, CSV, TSV, XML, YAML, ZIP, VMEM, MEM, RAW, DMP, IMG, BIN.
- Parse mixed memory tool output and surface suspicious indicators with severity.
- Run enrichment for YARA/TI/baseline on parsed memory artifacts.
- Provide follow-up recommendations and export to JSON/CSV.

### 4. Helper

- Plan Generator for memory, network, and log investigation playbooks.
- Operational Cheatsheets for common DFIR commands.
- Optimized for fast triage handoff and repeatable analyst workflow.

### 5. Sigma Engine and Rule Management

- Built-in SigmaHQ baseline ruleset from repository `SigmaHQ/sigma`, pinned to commit `994da16651194500b607a3007186c29779e1f961` (`rules/` path).
- Automatic local baseline cache bootstrap on startup (no manual sync required for core rules).
- Local Sigma rule correlation for logs, network, and memory artifacts.
- Dashboard actions to sync Sigma rules from remote URLs.
- Rule reload support without restarting the full stack.

### 6. Fast Malicious Detection Stack (Beyond Sigma)

- **YARA Engine** for memory/log/network text artifacts (`yara_rules/`).
- **Threat Intel Enrichment** for IP/domain/hash with local feed + local cache + scoring (`threat_intel/ioc_feed.json`).
- **OTX Public IP Enrichment** for external reputation lookup on public IP indicators from log analysis (configured in Users & Administration).
- **Cross-Source Correlation** (log + network + memory) by time window and shared identity (IP/host).
- **Entity Baseline & Allowlist** per environment (`config/entity_baseline.json`, `config/entity_allowlist.json`).
- **Rule QA Pipeline** with benign/malicious datasets and automated regression checks (`scripts/rule_qa.py`).

### 7. Event Search and Analytics Storage

- **OpenSearch** sink for indexed event search at scale.
- **ClickHouse** sink for high-volume event analytics and timeline aggregations.
- Search API supports OpenSearch backend + database fallback.

### 8. Users and Administration

- Role-based access control (Admin and Analyst).
- User CRUD and group administration.
- Built-in MFA (TOTP) setup, disable, and reset flows.
- Dedicated Users and Security area for account governance.
- OTX API key management in admin panel with masked display and clear/rotate controls.
- CSRF protection for sensitive POST operations (administration and analyzer actions).

### 9. History and Reporting

- Persist analysis history for logs, network, memory playbooks, and memory triage.
- View, delete, and review previous sessions.
- Export report bundle from current in-memory result set.
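
To make the cross-source correlation listed under the detection stack more concrete, here is a minimal sketch that groups events from different sources when they share an IP and occur close together in time. It is not the logic in `correlation_engine.py`: the `Event` shape and the `correlate` function are hypothetical, and the 60-minute default simply mirrors the `FORENSIS_CORRELATION_WINDOW_MINUTES=60` example value.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    source: str          # e.g. "log", "network", or "memory"
    ip: str              # shared identity used as the correlation key
    timestamp: datetime


def _multi_source(cluster: List[Event]) -> bool:
    """A cluster is interesting only if it spans at least two sources."""
    return len({e.source for e in cluster}) > 1


def correlate(events: List[Event], window_minutes: int = 60) -> List[List[Event]]:
    """Cluster events per IP; consecutive events stay in one cluster while the
    gap between them is at most window_minutes. Single-source clusters are dropped."""
    window = timedelta(minutes=window_minutes)
    by_ip: Dict[str, List[Event]] = defaultdict(list)
    for ev in events:
        by_ip[ev.ip].append(ev)

    clusters: List[List[Event]] = []
    for ip_events in by_ip.values():
        ip_events.sort(key=lambda e: e.timestamp)
        current = [ip_events[0]]
        for ev in ip_events[1:]:
            if ev.timestamp - current[-1].timestamp <= window:
                current.append(ev)
            else:
                if _multi_source(current):
                    clusters.append(current)
                current = [ev]
        if _multi_source(current):
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    sample = [
        Event("log", "203.0.113.7", t0),
        Event("network", "203.0.113.7", t0 + timedelta(minutes=10)),
        Event("memory", "198.51.100.2", t0),
    ]
    for cluster in correlate(sample):
        print([f"{e.source}@{e.ip}" for e in cluster])
```

Gap-based clustering is one of several reasonable choices here; a fixed-bucket or sliding-window approach would give different groupings at the window edges.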
## Stack

- Flask
- SQLAlchemy (PostgreSQL recommended, SQLite fallback)
- Flask-Login and Flask-Bcrypt
- PyOTP (MFA)
- Celery + Redis (async processing)
- YARA (via `yara-python`)
- Gunicorn (WSGI runtime)
- Alembic (schema migrations)
- Bootstrap 5 frontend

## Quick Start

### Docker (recommended)

1. Ensure Docker and Docker Compose are installed.
2. Create or edit `.env` in project root:

       FORENSIS_SECRET_KEY=replace_with_strong_secret
       FORENSIS_ADMIN_USER=admin
       FORENSIS_ADMIN_PASSWORD=forensis123
       CELERY_BROKER_URL=redis://redis:6379/0
       CELERY_RESULT_BACKEND=redis://redis:6379/0

       # Optional SigmaHQ baseline controls
       # FORENSIS_SIGMAHQ_REPO=SigmaHQ/sigma
       # FORENSIS_SIGMAHQ_COMMIT=994da16651194500b607a3007186c29779e1f961
       # FORENSIS_SIGMAHQ_RULES_SUBDIR=rules
       # FORENSIS_SIGMAHQ_REFRESH=0

       # Optional detection tuning
       # FORENSIS_CORRELATION_WINDOW_MINUTES=60
       # FORENSIS_TI_CACHE_TTL=43200
       # FORENSIS_TI_CACHE_MAX_ENTRIES=50000
       # FORENSIS_TI_MAX_HITS=500
       # FORENSIS_OTX_TIMEOUT_SECONDS=4
       # FORENSIS_OTX_MAX_LOOKUPS=30
       # FORENSIS_OTX_MAX_SECONDS=12

       # Async post-rule correlation for faster first result render
       # FORENSIS_ASYNC_SIGMA_POSTPROCESS=1

       # Optional Sigma correlation performance guardrails
       # FORENSIS_SIGMA_MAX_EVENTS=900
       # FORENSIS_SIGMA_MAX_EVENTS_NETWORK=250
       # FORENSIS_SIGMA_MAX_EVENTS_LOGS=900
       # FORENSIS_SIGMA_MAX_EVENTS_MEMORY=700
       # FORENSIS_SIGMA_MAX_MATCHES=1500
       # FORENSIS_NETWORK_EVENTS_STORE_LIMIT=900
       # FORENSIS_NETWORK_ANOMALIES_STORE_LIMIT=500
       # FORENSIS_PCAP_MAX_UPLOAD_BYTES=314572800
       # FORENSIS_PCAP_MAX_PACKETS=350000
       # FORENSIS_PCAP_MAX_TRACKED_FLOWS=250000

       # Optional DB backend (example PostgreSQL)
       # FORENSIS_DB_URI=postgresql+psycopg://forensis:forensis_change_me@postgres:5432/forensis

       # Optional OpenSearch sink/search
       # FORENSIS_OPENSEARCH_URL=http://opensearch:9200
       # FORENSIS_OPENSEARCH_INDEX=forensis-events
       # FORENSIS_OPENSEARCH_USERNAME=
       # FORENSIS_OPENSEARCH_PASSWORD=
       # FORENSIS_OPENSEARCH_VERIFY_TLS=false

       # Optional ClickHouse sink/analytics
       # FORENSIS_CLICKHOUSE_URL=http://clickhouse:8123
       # FORENSIS_CLICKHOUSE_DB=forensis
       # FORENSIS_CLICKHOUSE_TABLE=events
       # FORENSIS_CLICKHOUSE_USERNAME=
       # FORENSIS_CLICKHOUSE_PASSWORD=

       # Optional SQLAlchemy pool tuning for PostgreSQL
       # FORENSIS_DB_POOL_SIZE=20
       # FORENSIS_DB_POOL_MAX_OVERFLOW=40
       # FORENSIS_DB_POOL_TIMEOUT=30
       # FORENSIS_DB_POOL_RECYCLE=1800

       # Optional display timezone (default: Asia/Jakarta / GMT+7)
       # FORENSIS_DISPLAY_TZ=Asia/Jakarta

3. Build and run:

       docker compose up -d --build

4. Open:
   - `http://localhost:5000`

Default credentials (if unchanged):

- Username: `admin`
- Password: `forensis123`

### Local run

1. Create virtual environment and install dependencies:

       python -m venv venv
       source venv/bin/activate
       pip install -r requirements.txt

2. Start app:

       python app.py

3. Open:
   - `http://127.0.0.1:5000`

## Main Routes

- `/` (redirects to `/login`)
- `/login`
- `/dashboard`
- `/log-analyzer`
- `/network-analyzer`
- `/memory-triage`
- `/helper`
- `/history`
- `/users`

Additional APIs:

- `/api/search/events`
- `/api/analytics/overview`

## Production Profile (Optional)

Enable optional production-oriented stack components:

    docker compose --profile dfir-prod up -d --build

This profile enables:

- `minio` (object storage for artifact expansion)
- `rabbitmq` (alternate queue backend option)
- dedicated queue workers: `worker_logs`, `worker_network`, `worker_memory`, `worker_rules`
- `opensearch` + `opensearch_dashboards`

`postgres` runs in the default compose stack.

Enable ClickHouse analytics profile:

    docker compose --profile dfir-analytics up -d --build

## SQLite to PostgreSQL Cutover

1. Start PostgreSQL target and set `FORENSIS_DB_URI` to PostgreSQL.
2. Start Forensis once so schema/tables are created.
3. Run migration script:

       python scripts/migrate_sqlite_to_postgres.py \
         --sqlite-path instance/forensis.db \
         --postgres-uri postgresql+psycopg://forensis:forensis_change_me@localhost:5432/forensis

The migration script truncates destination tables, copies rows, and realigns PostgreSQL ID sequences to prevent duplicate primary-key inserts.

Alembic upgrade/drift check:

    alembic upgrade head
    python scripts/check_migration_drift.py

## Job Pipeline (Event-Driven)

For every uploaded artifact, Forensis creates:

1. `Case` (auto-open case if no active case for same analyst/day)
2. `Artifact` (hash + metadata)
3. `AnalysisJob` (state machine tracked)

Execution flow:

1. Upload artifact
2. Create queued job
3. Worker runs parse + enrich
4. Persist normalized results
5. Async Sigma correlation (configurable)
6. Finalize findings/rule matches/timeline and update job state

Job state machine:

- `queued`
- `running`
- `partial` (core result ready, post-rule correlation still running)
- `succeeded`
- `failed`

Job stage examples:

- `artifact_received`
- `parse`
- `enrich`
- `rule_match`
- `post_rule_match`
- `complete`

## Rule QA Regression

Run automated benign/malicious regression checks:

    python scripts/rule_qa.py

Machine-readable output:

    python scripts/rule_qa.py --json

Expectations file:

- `qa_datasets/regression_expectations.json`

## Queue-Depth Worker Autoscaling (KEDA)

Kubernetes manifests for separated queue workers and autoscaling:

- `deploy/k8s/workers-keda.yaml`
- `deploy/k8s/README.md`

## Project Structure

    Forensis/
    ├── app.py
    ├── forensis/
    │   ├── models.py
    │   ├── analyzers/
    │   │   ├── log_analyzer.py
    │   │   ├── network_analyzer.py
    │   │   ├── playbook_engine.py
    │   │   ├── sigma_engine.py
    │   │   ├── yara_engine.py
    │   │   ├── threat_intel.py
    │   │   ├── entity_profile.py
    │   │   ├── correlation_engine.py
    │   │   └── detection_pipeline.py
    │   ├── services/
    │   │   ├── event_search_service.py
    │   │   ├── analytics_service.py
    │   │   ├── job_service.py
    │   │   └── rule_service.py
    │   └── integrations/
    │       └── elk_loki.py
    ├── migrations/
    │   └── versions/
    ├── deploy/
    │   ├── clickhouse/
    │   └── k8s/
    ├── .github/workflows/ci.yml
    ├── templates/
    ├── static/
    ├── sigma_rules/
    ├── yara_rules/
    ├── threat_intel/
    ├── config/
    ├── qa_datasets/
    ├── scripts/
    │   ├── check_migration_drift.py
    │   ├── migrate_sqlite_to_postgres.py
    │   └── rule_qa.py
    ├── instance/
    ├── uploads/
    ├── Dockerfile
    ├── alembic.ini
    ├── docker-compose.yml
    └── requirements.txt

## Security Notes

- Change default admin credentials immediately.
- Use a strong `FORENSIS_SECRET_KEY`.
- Enable MFA for privileged users.
- Keep TI feed, baseline allowlist, and custom YARA rules under version control.
- Review uploaded artifact handling and storage policy before production use.

## License

See [LICENSE](LICENSE).