GitHub: kimburkeanalytics/beyond-phishing-llm-threat-actor-tracker
# Beyond Phishing: Tracking Threat Actor Use of LLMs Across the Attack Lifecycle
## Project Summary
This project assesses how threat actors are using large language models beyond basic social engineering. It converts public threat reporting into a structured, source-grounded dataset, then generates a capability matrix, summary tables, source bibliography, and executive intelligence assessment.
The core analytic question is not simply whether adversaries use LLMs, but what LLMs change about adversary capability: scale, speed, quality/precision, skill compression, and workflow chaining.
The project uses a custom LLM-enabled capability taxonomy as the primary analytic model and maps findings to MITRE ATT&CK as a secondary operational layer. This keeps the analysis focused on the defender-relevant "so what" while still making the findings useful for threat hunting, detection engineering, and customer-facing security recommendations.
## Why This Matters
Many discussions of AI-enabled threat activity remain anchored on phishing, translation, and social engineering. Those uses matter, but they do not capture the full operational risk.
This project focuses on the more consequential question: whether LLMs are helping adversaries operate faster, at greater scale, with higher quality, and across more stages of the attack lifecycle.
The central analytic judgment is that LLMs do not need to create entirely new attack categories to matter. They can still materially change the threat environment by compressing the time, skill, and labor required to conduct existing adversary workflows at acceptable quality.
## Analytic Thesis
LLMs increase adversary capability by compressing the time, skill, and labor required to conduct technical and influence operations at acceptable quality.
The key defender concern is not that LLMs magically create elite operators. The concern is that they may make more actors faster, more scalable, more precise, and operationally adequate across more stages of the attack lifecycle.
This project is designed to avoid two common analytic failures:
1. **Hype:** treating every AI misuse case as proof of fully autonomous cyber operations.
2. **Complacency:** reducing LLM-enabled activity to phishing, translation, or generic social engineering.
## What the Tool Does
The tool converts public threat reporting into a structured, source-grounded dataset and generates:
- A capability matrix
- Summary tables by capability category and maturity tier
- An executive intelligence assessment
- A source bibliography for traceability and auditability
## Core Analytic Model
The project assesses each claim across several dimensions:
- Capability category
- Attack lifecycle stage
- MITRE ATT&CK tactic mapping
- Technical depth
- Maturity tier
- Evidence grade
- Confidence level
- Scale effect
- Speed effect
- Quality / precision effect
- Skill compression effect
- Workflow chaining effect
- Underestimation risk
- Defender implication
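As a rough illustration of how one claim could carry these dimensions, the sketch below models a record as a Python dataclass. The field names, the `EFFECTS` tuple, and the placeholder values are assumptions made for this example; they are not taken from the project's actual CSV schema.

```python
from dataclasses import dataclass

# Effect dimensions named in the list above (identifier spellings are assumed).
EFFECTS = ("scale", "speed", "quality_precision",
           "skill_compression", "workflow_chaining")


@dataclass(frozen=True)
class Claim:
    """One source-grounded claim. Field names are illustrative only."""
    claim_id: str
    capability_category: str
    lifecycle_stage: str
    attack_tactic: str
    technical_depth: str
    maturity_tier: int
    evidence_grade: str
    confidence: str
    underestimation_risk: str
    defender_implication: str
    effects: frozenset = frozenset()  # which capability effects were assessed as present

    def __post_init__(self):
        # The maturity model defines tiers 0 through 5.
        if not 0 <= self.maturity_tier <= 5:
            raise ValueError(f"maturity_tier must be 0-5, got {self.maturity_tier}")
        unknown = set(self.effects) - set(EFFECTS)
        if unknown:
            raise ValueError(f"unknown effects: {sorted(unknown)}")


# Placeholder record, not real data from the dataset.
example = Claim(
    claim_id="EXAMPLE-001",
    capability_category="Malware development and modification",
    lifecycle_stage="placeholder",
    attack_tactic="Resource Development",
    technical_depth="placeholder",
    maturity_tier=2,
    evidence_grade="placeholder",
    confidence="placeholder",
    underestimation_risk="placeholder",
    defender_implication="placeholder",
    effects=frozenset({"speed", "skill_compression"}),
)
```

Rejecting out-of-range tiers and unknown effect names at construction time is one way to keep scoring errors out of downstream tables.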
## Capability Categories
The project uses straightforward analytic categories to assess what the LLM appears to enable:
- Reconnaissance and target profiling
- Vulnerability research and exploit support
- Malware development and modification
- Credential and access workflows
- Post-compromise data analysis
- Operational decision support
- Workflow chaining / agentic execution
- OPSEC, evasion, and safeguard bypass
- Social engineering and content generation
- Toolchain integration
- Cross-platform workflow integration
## Maturity Model
| Tier | Meaning |
|---|---|
| 0 | No meaningful LLM use |
| 1 | Content assistance: phishing, translation, persona text, influence content |
| 2 | Technical assistance: coding, debugging, CVE research, malware modification |
| 3 | Operational enablement: supports recon, access, credential theft, data triage, extortion |
| 4 | Workflow chaining: connects multiple attack stages with human supervision |
| 5 | Agentic execution: performs substantial portions of the operation with limited human intervention |
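The tiers in the table can be encoded as an ordered enumeration so that claims compare and sort by maturity. This is a minimal sketch; the member names and the `beyond_content` helper are hypothetical and may not match the project's code.

```python
from enum import IntEnum


class MaturityTier(IntEnum):
    """Tiers 0-5 from the maturity model table (member names are assumed)."""
    NO_MEANINGFUL_USE = 0
    CONTENT_ASSISTANCE = 1
    TECHNICAL_ASSISTANCE = 2
    OPERATIONAL_ENABLEMENT = 3
    WORKFLOW_CHAINING = 4
    AGENTIC_EXECUTION = 5


def beyond_content(tier: int) -> bool:
    """Hypothetical helper: True when a claim is above content assistance (tier 2+)."""
    return MaturityTier(tier) >= MaturityTier.TECHNICAL_ASSISTANCE
```

Because `IntEnum` members compare as integers, a tier read from a CSV can be converted with `MaturityTier(int(value))`, which raises `ValueError` for anything outside 0 to 5.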
## MITRE ATT&CK Mapping
This project uses MITRE ATT&CK as a secondary operational layer, not as the primary analytic model.
ATT&CK is useful for translating findings into defender workflows such as threat hunting, detection engineering, and customer-facing recommendations. However, ATT&CK alone does not explain what the LLM changed about the actor's capability.
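The custom capability taxonomy answers the analytic question: what did the LLM change about the actor's capability?
The ATT&CK mapping answers the operational question: where does the activity fit in defender workflows such as threat hunting and detection engineering?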
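One way to keep the ATT&CK layer consistent is to check each claim's tactic against the Enterprise tactic list. The sketch below assumes claims store the tactic by name; the `tactic_id` helper is hypothetical. Tactic names and IDs can change between ATT&CK releases, so the table should be checked against the current version.

```python
# ATT&CK Enterprise tactics by name. Verify against the current ATT&CK release.
ENTERPRISE_TACTICS = {
    "Reconnaissance": "TA0043",
    "Resource Development": "TA0042",
    "Initial Access": "TA0001",
    "Execution": "TA0002",
    "Persistence": "TA0003",
    "Privilege Escalation": "TA0004",
    "Defense Evasion": "TA0005",
    "Credential Access": "TA0006",
    "Discovery": "TA0007",
    "Lateral Movement": "TA0008",
    "Collection": "TA0009",
    "Command and Control": "TA0011",
    "Exfiltration": "TA0010",
    "Impact": "TA0040",
}

# Case-insensitive lookup table built once.
_BY_LOWER = {name.lower(): tid for name, tid in ENTERPRISE_TACTICS.items()}


def tactic_id(name: str):
    """Return the tactic ID for a tactic name, or None if the name is not recognized."""
    return _BY_LOWER.get(name.strip().lower())
```

A `None` result signals a typo or an unmapped claim, which can then be reviewed by an analyst.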
## Methodology
The workflow is intentionally simple and auditable:
1. Collect public reporting on threat actor LLM use.
2. Convert relevant reporting into structured, source-grounded claims.
3. Score each claim by evidence quality, confidence, technical depth, and maturity tier.
4. Assess capability effects: scale, speed, quality/precision, skill compression, and workflow chaining.
5. Flag claims with high underestimation risk.
6. Generate analytic outputs from the structured dataset.
The goal is not to automate judgment away. The goal is to make judgment more structured, repeatable, and transparent.
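Step 5 can be sketched as a filter over already-scored claims. The code does not decide which claims are underestimated; it only surfaces the ones an analyst has marked. The column names `underestimation_risk` and `maturity_tier`, and the value `"high"`, are assumptions for this example.

```python
def flag_high_risk(claims):
    """Return claims an analyst marked as high underestimation risk, highest tier first.

    `claims` is an iterable of dict rows, e.g. from csv.DictReader.
    Column names and the "high" label are assumed, not the project's schema.
    """
    flagged = [
        c for c in claims
        if c["underestimation_risk"].strip().lower() == "high"
    ]
    return sorted(flagged, key=lambda c: int(c["maturity_tier"]), reverse=True)
```

Keeping the judgment in the dataset and the filtering in code matches the stated goal of structuring judgment instead of automating it.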
## How to Run
Run the full pipeline:
```bash
python src/run_pipeline.py
```
Or run each step manually:
```bash
python src/validate_dataset.py
python src/generate_outputs.py
```
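For orientation, a pipeline runner of this shape could simply call the two scripts in order and stop at the first failure. This is a guess at the general structure, not the contents of `src/run_pipeline.py`; the `STEPS` list and the injectable `run` parameter are assumptions made so the sketch is easy to test.

```python
import subprocess
import sys

# Scripts named in the manual steps above, in the order they are listed.
STEPS = ["src/validate_dataset.py", "src/generate_outputs.py"]


def run_pipeline(steps=STEPS, run=subprocess.run):
    """Run each script with the current interpreter; return the first nonzero exit code."""
    for script in steps:
        result = run([sys.executable, script])
        if result.returncode != 0:
            # Stop early so outputs are not generated from an invalid dataset.
            return result.returncode
    return 0

# Typical use from a script entry point: sys.exit(run_pipeline())
```

Stopping on a failed validation step keeps generated outputs from drifting out of sync with a dataset that did not pass its checks.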
## Outputs
Generated outputs are written to the `outputs/` folder:
- `capability_matrix.csv`
- `claims_by_capability_category.csv`
- `claims_by_maturity_tier.csv`
- `executive_assessment.md`
- `source_bibliography.md`
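A summary table such as `claims_by_maturity_tier.csv` can be produced with the standard library alone. The sketch below assumes each claim row has a `maturity_tier` column and that the output has `maturity_tier` and `claim_count` columns; both are illustrative, and the project's real column names may differ.

```python
import csv
from collections import Counter


def tier_summary(rows):
    """Count claims per maturity tier, including tiers with zero claims."""
    counts = Counter(int(r["maturity_tier"]) for r in rows)
    return [(tier, counts.get(tier, 0)) for tier in range(6)]


def write_tier_summary(rows, out):
    """Write the summary as CSV to any text file object."""
    writer = csv.writer(out)
    writer.writerow(["maturity_tier", "claim_count"])
    writer.writerows(tier_summary(rows))
```

To write to disk, open the target with `open(path, "w", newline="")` as the `csv` module documentation recommends, then pass the file object as `out`.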
## Project Structure
```text
threat-actor-llm-usage-tracker/
├── data/
│   ├── llm_capability_claims.csv
│   └── source_notes.md
├── outputs/
│   ├── capability_matrix.csv
│   ├── claims_by_capability_category.csv
│   ├── claims_by_maturity_tier.csv
│   ├── executive_assessment.md
│   └── source_bibliography.md
├── prompts/
├── src/
│   ├── generate_outputs.py
│   ├── run_pipeline.py
│   └── validate_dataset.py
├── README.md
└── requirements.txt
```
## Source Traceability
Each claim in the structured dataset includes source information, evidence grading, confidence, and an analytic note. The generated source bibliography supports traceability and auditability.
The bibliography should be reviewed alongside `data/llm_capability_claims.csv` and `outputs/executive_assessment.md`.
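Traceability can be enforced mechanically by rejecting claims that lack source or grading fields. The sketch below is a hypothetical check; the column names in `REQUIRED` are assumptions based on the fields described in this section, not the dataset's actual headers.

```python
# Assumed column names for the traceability fields described above.
REQUIRED = ("source", "evidence_grade", "confidence", "analytic_note")


def missing_fields(row):
    """Return the required fields that are absent or blank in one claim row."""
    return [f for f in REQUIRED if not (row.get(f) or "").strip()]


def validate(rows):
    """Map CSV line number to missing fields, for rows that fail the check.

    Numbering starts at 2 on the assumption that the header is line 1
    and no field contains an embedded newline.
    """
    problems = {}
    for line_no, row in enumerate(rows, start=2):
        missing = missing_fields(row)
        if missing:
            problems[line_no] = missing
    return problems
```

An empty result means every claim carries the fields needed to trace it back to a source.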
## Analytic Standard
This project is designed to produce source-grounded, confidence-scored analysis that distinguishes observed behavior from analytic judgment.
The intended standard is not simply to summarize AI threat reporting. The goal is to identify the defender-relevant implication: how LLMs may change adversary speed, scale, quality, precision, skill requirements, and operational workflow integration.