healert-io/agent
GitHub: healert-io/agent
Stars: 0 | Forks: 0
# Healert Agent
**Kubernetes audit log friction detection agent for the Healert Friction Intelligence Platform.**
Tails the Kubernetes audit log, detects platform bypass events against configurable rules,
and sends friction events to the self-hosted Healert backend. Surfaces in Backstage as
per-service Friction Scores and Heatmaps via `@backstage-community/plugin-healert`.
[](https://opensource.org/licenses/Apache-2.0)
[](https://github.com/healert/agent/releases)
[](https://golang.org)
## Overview
Kubernetes Audit Log (/var/log/k3s-audit.log)
| NDJSON events, tailed from EOF — one line at a time
v
Healert Go Agent <- this repo
isInternalSystemActor() filter controllers, detect human operators
matchRules() evaluate all rules (AND logic per rule)
send() POST /events with API key auth
|
v
Healert Backend github.com/healert/backend
FastAPI + SQLite
Exponential decay scoring
|
v
Backstage Plugin @backstage-community/plugin-healert
FrictionScoreCard + FrictionHeatmap per catalog entity
The agent is a **single Go binary with zero external dependencies**.
It runs as a **local process** (development) or a **Kubernetes DaemonSet** (production).
## Repository Structure
healert-agent/
|
+-- main.go Go agent — 1,498 lines, zero external dependencies
| 9 sections:
| 1. Configuration env var loading and validation
| 2. Rule Types Rule, RuleMatch, RulesConfig structs
| 3. Rules Loader YAML parser, validator, config block
| 4. Audit Types AuditEvent, FrictionEvent structs
| 5. Detection isInternalSystemActor, matchRule,
| matchRules, normaliseWorkloadName
| 6. Description renderDescription, sanitiseLogValue
| 7. Backend Client send(), healthCheck(), 10s timeout
| 8. Log Tailer tailLog(), processLine(), bufio.Reader
| 9. Entry Point main(), banner, health check
|
+-- rules.yaml Detection rules — 520 lines
| config block: global ignore_namespaces
| 5 active rules + 10+ optional rules
| Rule types: TYPE 1 workload, TYPE 2 shared resource,
| TYPE 3 cluster, TYPE 4 network, TYPE 5 storage
|
+-- healert.sh Management script — 2,788 lines, 17 commands
| start [backend|agent|kubernetes]
| stop [backend|agent|kubernetes]
| update kubernetes
| configure [--audit-log|--rules|--namespace]
| configure scoring [--threshold|--half-life|--retention]
| validate, restart, reset, status, logs, test, version, help
|
+-- Dockerfile Multi-stage distroless build — 255 lines
| Stage 1: golang:1.22-alpine (builder)
| Stage 2: gcr.io/distroless/static:nonroot (final)
| Result: ~25MB, no shell, uid=65532
| Features: OCI labels, multi-arch, private registry support
|
+-- daemonset.yaml Kubernetes DaemonSet — 388 lines
| Resources: Namespace, ServiceAccount, NetworkPolicy, DaemonSet
| Security: nonroot uid=65532, readOnlyRootFilesystem, drop ALL
| Features: K8S_NAMESPACE Downward API, system-node-critical priority
| 30s termination grace period, rolling update strategy
|
|
+-- example.audit-policy.yaml (Tells the Kubernetes API server which events to write to the audit log and at what detail level.)
|
|
+-- go.mod Go module (zero external dependencies)
|
+-- .env.example Configuration template
|
+-- LICENSE Apache-2.0, Copyright 2026 Healert OU
## Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| Go | 1.22+ | Compile the agent binary |
| Python 3 | 3.8+ | Backend runtime (managed by healert.sh) |
| pip | Any | Python package manager |
| curl | Any | Health checks and API calls |
| Kubernetes | Any | k3s, kubeadm, EKS, GKE, AKS |
| Audit logging | Enabled | See Audit Log Setup section |
## Quick Start
# 1. Clone and compile
git clone https://github.com/healert/agent.git
cd agent
go build -o healert-agent main.go
# 2. Configure directories
./healert.sh init
# 3. Check dependencies
./healert.sh deps
# 4. Generate API key and configure both sides
./healert.sh setup
# 5. Set audit log path (k3s)
./healert.sh configure --audit-log /var/log/k3s-audit.log
Note: if the audit log file is not found
sudo touch /var/log/k3s-audit.log
sudo chmod 644 /var/log/k3s-audit.log
# 6. Validate rules
./healert.sh validate
# 7. Start backend and agent
./healert.sh start
# 8. Verify pipeline
./healert.sh test
## Audit Log Setup
### k3s
# Create audit policy
sudo mkdir -p /etc/k3s
sudo cp example.audit-policy.yaml /etc/k3s/audit-policy.yaml
# Enable audit logging
sudo mkdir -p /etc/systemd/system/k3s.service.d
sudo tee /etc/systemd/system/k3s.service.d/audit.conf << CONF
[Service]
ExecStart=
ExecStart=/usr/local/bin/k3s server \
--kube-apiserver-arg=audit-log-path=/var/log/k3s-audit.log \
--kube-apiserver-arg=audit-policy-file=/etc/k3s/audit-policy.yaml \
--kube-apiserver-arg=audit-log-maxage=7 \
--kube-apiserver-arg=audit-log-maxbackup=3 \
--kube-apiserver-arg=audit-log-maxsize=100
CONF
sudo systemctl daemon-reload
sudo systemctl restart k3s
sleep 15
# Set permissions
sudo groupadd healert 2>/dev/null || true
sudo usermod -aG healert $USER
sudo chown root:healert /var/log/k3s-audit.log
sudo chmod 640 /var/log/k3s-audit.log
### kubeadm
# Add to kube-apiserver.yaml under spec.containers.command:
# - --audit-log-path=/var/log/kubernetes/audit/audit.log
# - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
## Commands Reference
### Local Mode
| Command | Description |
|---|---|
| `./healert.sh init` | Configure backend and agent directories |
| `./healert.sh deps` | Check and install all dependencies |
| `./healert.sh setup` | Generate API key, configure both sides |
| `./healert.sh setup rotate` | Rotate existing API key |
| `./healert.sh configure` | Update agent settings interactively |
| `./healert.sh configure --audit-log PATH` | Set audit log path |
| `./healert.sh configure --rules PATH` | Set rules.yaml path |
| `./healert.sh configure --namespace NS` | Set Backstage entity namespace |
| `./healert.sh configure scoring` | Update scoring parameters interactively |
| `./healert.sh configure scoring --threshold N` | Points for score=100 (default: 50) |
| `./healert.sh configure scoring --half-life N` | Decay half-life in days (default: 7) |
| `./healert.sh configure scoring --retention N` | Event window in days (default: 30) |
| `./healert.sh configure scoring --reset` | Restore default scoring parameters |
| `./healert.sh start` | Start backend and agent |
| `./healert.sh start backend` | Start backend only |
| `./healert.sh start agent` | Start agent only |
| `./healert.sh stop` | Stop backend and agent |
| `./healert.sh stop backend` | Stop backend only |
| `./healert.sh stop agent` | Stop agent only |
| `./healert.sh restart` | Validate rules, stop and start both |
| `./healert.sh validate` | Validate rules.yaml |
| `./healert.sh reset` | Delete and recreate database |
| `./healert.sh reset --confirm` | Reset without confirmation prompt |
| `./healert.sh status` | Show health and running state |
| `./healert.sh logs` | Tail live logs from all processes |
| `./healert.sh test` | Send test event, verify full pipeline |
| `./healert.sh version` | Show version, copyright, license |
| `./healert.sh help` | Show all commands with descriptions |
### Kubernetes DaemonSet
| Command | Description |
|---|---|
| `./healert.sh start kubernetes` | Deploy agent as Kubernetes DaemonSet |
| `./healert.sh stop kubernetes` | Remove DaemonSet and healert-system namespace |
| `./healert.sh update kubernetes` | Apply latest config with rolling restart |
## Configuration
### Agent
| Variable | Default | Description |
|---|---|---|
| `HEALERT_BACKEND_URL` | `http://localhost:8000` | Backend URL |
| `HEALERT_HOST` | `127.0.0.1` | Backend bind host — set to `0.0.0.0` for DaemonSet mode |
| `AUDIT_LOG_PATH` | `/var/log/k3s-audit.log` | Audit log path |
| `ENTITY_NAMESPACE` | `default` | Fallback namespace for cluster-scoped resources |
| `RULES_PATH` | required | Detection rules file path |
| `HEALERT_API_KEY` | required | Bearer token for backend auth |
| `K8S_NAMESPACE` | auto (Downward API) | Agent namespace — auto-excluded from detection |
### Scoring
| Variable | Default | Description |
|---|---|---|
| `SCORE_CRITICAL_THRESHOLD` | `50` | Weighted points for score=100 |
| `SCORE_DECAY_HALF_LIFE` | `7` | Event weight half-life in days |
| `SCORE_RETENTION_DAYS` | `30` | Event window in days |
**Tuning guide:**
./healert.sh configure scoring --threshold 20 --half-life 3 # strict
./healert.sh configure scoring --threshold 50 --half-life 7 # default
./healert.sh configure scoring --threshold 100 --half-life 14 # lenient
## Detection Rules
### Global Namespace Exclusion
Add system namespaces to the config block to exclude them from ALL rules:
config:
ignore_namespaces:
- kube-system
- kube-public
- kube-node-lease
- cert-manager
- istio-system
- argocd
# Add your system namespaces here
The agent automatically excludes its own namespace (K8S_NAMESPACE) from all detections.
### Active Rules (v0.1.1 Coral)
| Rule | Severity | Type | What It Detects |
|---|---|---|---|
| `kubectl-exec` | High | TYPE 1 Workload | Interactive shell access to pods |
| `pipeline-skip` | High | TYPE 1 Workload | Policy bypass annotation on deployments |
| `config-drift` | High | TYPE 1 Workload | Direct write operations on workload resources |
| `port-forward` | Medium | TYPE 1 Workload | Direct port-forward to pods |
| `emergency-access` | Medium | TYPE 2 Shared | Direct secret access |
### Auto-Namespace Entity Resolution (v0.1.1 Coral)
The agent uses the Kubernetes event namespace directly as the Backstage catalog namespace:
pod in "default" -> component:default/payments-api (auto)
pod in "staging" -> component:staging/payments-api (auto)
pod in "production" -> component:production/payments-api (auto)
Zero configuration required. Works for any number of namespaces.
### Scoring Formula
Score = min(100, round(weighted_total / threshold x 100))
weighted_total = sum(points x 0.5^(age_days / half_life))
| Severity | Points |
|---|---|
| high | 10 |
| medium | 6 |
| low | 3 |
## Kubernetes Production Deployment
### Step 1 — Build and import image
docker build -t ghcr.io/healert-io/agent:0.1.1 .
docker push ghcr.io/healert-io/agent:0.1.1
# For k3s (local registry):
docker save ghcr.io/healert-io/agent:0.1.1 | sudo k3s ctr images import -
### Step 2 — Configure daemonset.yaml
# Set your backend host IP (not 127.0.0.1 — pods cannot reach loopback)
- name: HEALERT_BACKEND_URL
value: "http://192.168.x.x:8000"
# Update image tag
image: ghcr.io/healert-io/agent:0.1.1
### Step 3 — Deploy
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
./healert.sh start kubernetes
### Step 4 — Verify
kubectl get pods -n healert-system -o wide
kubectl logs -n healert-system -l app=healert-agent --tail=20
Expected:
Healert Agent v0.1.1 Coral
Rules: 5 loaded
Ignored namespaces (8):
- kube-system
...
- healert-system <- auto-added from K8S_NAMESPACE
Backend OK -- version=0.1.1 auth=enabled
Tailing "/var/log/k3s-audit.log" from end-of-file
### Update Without Downtime
# After changing daemonset.yaml or rotating API key:
./healert.sh update kubernetes
## Security
| Property | Implementation |
|---|---|
| API key storage | `.env` with `chmod 600` — never committed to git |
| API key injection | Environment variable — never in `ps aux` output |
| Backend binding | `127.0.0.1` by default — not exposed to network |
| Kubernetes Secret | API key stored as K8s Secret — not plain env var |
| Agent user | Runs as nonroot uid=65532 (distroless) |
| Filesystem | `readOnlyRootFilesystem: true` |
| Capabilities | `drop: ALL` — zero Linux capabilities |
| Network | NetworkPolicy: egress to backend:8000 and DNS only |
| Audit log | Read-only hostPath mount |
| Shell execution | Zero exec.Command() calls in agent binary |
| Path validation | Absolute paths only, no `..` traversal |
| Input sanitisation | `sanitiseLogValue()` prevents log injection |
| HTTP timeout | 10 seconds on all outbound requests |
| Script hardening | `set -euo pipefail`, `umask 077` |
| Graceful shutdown | SIGTERM/SIGINT handler — clean exit on rolling update |
| Namespace isolation | Agent auto-excludes its own namespace |
| Priority class | `system-node-critical` — never evicted under pressure |
## Audit Log Paths
| Distribution | Path |
|---|---|
| k3s | `/var/log/k3s-audit.log` |
| kubeadm | `/var/log/kubernetes/audit/audit.log` |
| Vanilla Kubernetes | `/var/log/audit/audit.log` |
## Related Repositories
| Repo | Description |
|---|---|
| [healert-io/backend](https://github.com/healert-io/backend) | FastAPI + SQLite backend |
| [backstage/community-plugins](https://github.com/backstage/community-plugins) | Backstage plugin (`@backstage-community/plugin-healert`) |
## License
Apache License 2.0 -- Copyright 2026 Healert OÜ
See [LICENSE](./LICENSE) for the full license text.