# PentestX
### AI-Augmented Penetration Testing + SOC Triage Toolkit
██████╗ ███████╗███╗ ██╗████████╗███████╗███████╗████████╗██╗ ██╗
██╔══██╗██╔════╝████╗ ██║╚══██╔══╝██╔════╝██╔════╝╚══██╔══╝╚██╗██╔╝
██████╔╝█████╗ ██╔██╗██║ ██║ █████╗ ███████╗ ██║ ╚███╔╝
██╔═══╝ ██╔══╝ ██║╚████║ ██║ ██╔══╝ ╚════██║ ██║ ██╔██╗
██║ ███████╗██║ ╚███║ ██║ ███████╗███████║ ██║ ██╔╝╚██╗
╚═╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝ ╚══════╝╚══════╝ ╚═╝ ╚═╝ ╚═╝
## What This Is
PentestX is a **modular CLI security toolkit** that unifies offensive penetration testing and defensive SOC triage operations under a single AI reasoning layer.
**The core problem it solves:** In most organisations, offensive security (pentest) and defensive security (SOC) operate in silos. A pentester finds a vulnerability and writes a report. A SOC analyst sees an alert and doesn't know if it maps to an active pentest or a real attack. PentestX collapses that gap — it scans, finds vulnerabilities, triages alerts, enriches IOCs, and generates detection queries, all in one session, all reasoned over by the same AI engine.
**What makes it architecturally unusual:**
The 4-provider AI fallback chain (`Ollama → Groq → HuggingFace → Claude`) is not a convenience feature. It mirrors how production security platforms handle AI availability in 24/7 SOC environments — if one provider fails, the platform doesn't go down. This design decision came directly from studying how Microsoft Security Copilot and CrowdStrike Charlotte AI handle LLM availability.
The RAG pipeline over a local knowledge base grounds the AI answers in MITRE ATT&CK and real CVE data instead of leaving them to training weights alone, which reduces (but does not eliminate) hallucination. This is the same architecture used in enterprise AI-SOC research (see Research Foundation below).
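
The retrieval step can be illustrated without the real stack. The following is a minimal sketch, not the `rag_pipeline` implementation: it swaps the all-MiniLM-L6-v2 embeddings and ChromaDB for a plain bag-of-words cosine similarity, and `CHUNKS`, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge-base chunks (assumption: the real ones are curated separately).
CHUNKS = [
    "T1110 Brute Force: repeated failed SSH logins followed by a success",
    "T1190 Exploit Public-Facing Application: SQL injection against a web login form",
    "T1046 Network Service Discovery: port scan across many hosts",
]


def _vector(text):
    """Bag-of-words term counts; a stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks=CHUNKS, k=1):
    """Return the k chunks most similar to the query."""
    qv = _vector(query)
    return sorted(chunks, key=lambda c: _cosine(qv, _vector(c)), reverse=True)[:k]


def build_prompt(query, chunks=CHUNKS, k=1):
    """Prepend retrieved context so the model answers from the knowledge base."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The point of the design is that the model only sees the question together with the retrieved chunk, so its answer can be checked against a known source.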
## Architecture
┌─────────────────────────────────────────────────────────────┐
│ toolkit.py (CLI) │
└──────┬──────────┬──────────┬──────────┬──────────┬──────────┘
│ │ │ │ │
┌────▼───┐ ┌───▼────┐ ┌───▼───┐ ┌───▼────┐ ┌───▼──────┐
│ RECON │ │ VULN │ │ CRACK │ │ TRIAGE │ │ REPORT │
│ nmap │ │ web │ │ hash │ │ splunk │ │ report │
│ subdom │ │ cve │ │ │ │ wazuh │ │ gen │
│ │ │ exploit│ │ │ │ log │ │ │
└────┬───┘ └───┬────┘ └───┬───┘ │ alert │ └───┬──────┘
│ │ │ │ vt │ │
│ │ │ │ abuse │ │
└─────────┴──────────┴─────┴────┬───┘ │
│ │
┌─────────────────────────▼─────────▼──────────┐
│ AI ENGINE │
│ Ollama (local) → Groq → HuggingFace → │
│ Claude (fallback) + RAG (ChromaDB/MITRE) │
│ 6 specialised methods: scan · CVE · triage │
│ detection query · hash · exploit suggestion │
└──────────────────────────────────────────────┘
│
┌─────────────────────────▼──────────────────────┐
│ OUTPUT │
│ Markdown reports · JSON · Splunk SPL · │
│ Sentinel KQL · Incident Reports · CSV │
└──────────────────────────────────────────────────┘
**Why this architecture matters in a real SOC:**
The continuous flow, with scan results feeding triage and triage results feeding reports, mirrors how a Security Operations Center actually functions. A SOC analyst doesn't do recon in one tool, enrichment in another, and reporting in a third. PentestX treats those as one continuous workflow. This is the key design principle that separates it from a collection of scripts.
## 14 Modules — Full Reference
### Recon
| Module | What It Does | AI Layer |
|--------|-------------|----------|
| `nmap_scanner` | OS detection · service fingerprinting · port enumeration | AI summarises attack surface and maps to MITRE Initial Access techniques |
| `subdomain_enum` | DNS brute-force + crt.sh certificate transparency enumeration | AI analyses subdomain patterns for exposed assets and attack vectors |
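
Combining the two sources amounts to a set union after normalisation. The sketch below assumes crt.sh's JSON output, where each entry carries a `name_value` field that may hold several newline-separated names; `parse_crtsh` and `merge_sources` are hypothetical helpers, and no network request is made here.

```python
def parse_crtsh(entries, domain):
    """Extract unique names under `domain` from crt.sh-style JSON entries."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    for entry in entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name == domain or name.endswith(suffix):
                found.add(name)
    return found


def merge_sources(ct_names, dns_names):
    """Union of certificate-transparency and DNS brute-force results."""
    return sorted(set(ct_names) | set(dns_names))
```

Note that a wildcard certificate collapses to the apex domain in this sketch; whether to keep the apex in the result is a design choice.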
### Vulnerability Assessment
| Module | What It Does | AI Layer |
|--------|-------------|----------|
| `web_scanner` | SQLi · XSS · open redirect detection via active payload testing | AI explains exploitation impact and maps to OWASP Top 10 |
| `cve_lookup` | NIST NVD API · CVEs by service + version · CVSS scoring | AI translates CVSS scores into analyst-readable risk summaries |
| `exploit_suggest` | Maps discovered services to CVEs · suggests exploitation approach | AI generates detection evasion considerations for red team use |
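
Before any AI summary is written, a CVSS base score can already be mapped to its qualitative rating. This sketch uses the CVSS v3.x severity bands; `cvss_severity` and `summarise` are hypothetical helper names, not functions from `cve_lookup`.

```python
def cvss_severity(score):
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def summarise(cve_id, score):
    """One-line, analyst-readable summary of a CVE and its score."""
    return f"{cve_id}: CVSS {score:.1f} ({cvss_severity(score)})"
```

For example, `summarise("CVE-2021-44228", 10.0)` returns `"CVE-2021-44228: CVSS 10.0 (Critical)"`.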
### Credential Analysis
| Module | What It Does | AI Layer |
|--------|-------------|----------|
| `hash_cracker` | Hash type identification (MD5/SHA1/SHA256/bcrypt) · offline wordlist attack | AI provides guidance on uncracked hashes and cracking strategy |
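
Type identification followed by an offline wordlist attack can be sketched with the standard library alone. This is an illustration under stated assumptions, not the `hash_cracker` source: `identify_hash` and `wordlist_attack` are hypothetical names, identification is by length and format only (so a 32-character hex string is assumed to be MD5 even though other algorithms share that length), and bcrypt is identified but not attacked because it needs a dedicated library.

```python
import hashlib
import re

# bcrypt: "$2a$", "$2b$" or "$2y$", a two-digit cost, then 53 base64-like characters.
BCRYPT_RE = re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}")
HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def identify_hash(value):
    """Guess the hash type from its format; returns None if unrecognised."""
    value = value.strip()
    if BCRYPT_RE.fullmatch(value):
        return "bcrypt"
    if re.fullmatch(r"[0-9a-fA-F]+", value):
        return HEX_LENGTHS.get(len(value))
    return None


def wordlist_attack(target, words):
    """Hash each candidate word and compare against the target digest."""
    algo = identify_hash(target)
    if algo not in ("md5", "sha1", "sha256"):
        return None  # unsupported in this sketch (for example bcrypt)
    target = target.strip().lower()
    for word in words:
        if hashlib.new(algo, word.encode("utf-8")).hexdigest() == target:
            return word
    return None
```

With the sample hash used later in this README, `wordlist_attack("5f4dcc3b5aa765d61d8327deb882cf99", ["letmein", "password"])` returns `"password"`.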
### SOC Triage
| Module | What It Does | AI Layer |
|--------|-------------|----------|
| `splunk_triage` | Splunk REST API · run SPL searches · fetch results | AI triages results and generates follow-up SPL detection queries |
| `wazuh_triage` | Wazuh manager API · high-severity alert ingestion | AI maps alerts to MITRE ATT&CK techniques and recommends response |
| `alert_parser` | Offline JSON alert ingestion (Splunk/Wazuh/Sentinel exports) | AI triage without live SIEM access |
| `log_parser` | IOC extraction from syslog/raw logs · IPs · hashes · domains · CVEs · URLs | 91% precision · 96% recall on 500-line benchmark dataset |
| `vt_enricher` | VirusTotal API v3 · hash/IP/URL enrichment | AI generates malware behavioral analysis from VT results |
| `abuseipdb` | AbuseIPDB reputation check · local caching layer | AI adds threat context and recommended action |
| `report_gen` | Compiles full session output into structured incident report | AI writes executive summary and MITRE ATT&CK appendix |
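
A local caching layer such as the one `abuseipdb` uses matters because the free tier is rate-limited. The sketch below shows one way to do it with a time-to-live cache; `ReputationCache`, its `fetch` callback, and the one-hour default are assumptions for illustration, and the real module may persist its cache differently.

```python
import time


class ReputationCache:
    """In-memory TTL cache in front of a reputation lookup function."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # callable that performs the real API request
        self._ttl = ttl_seconds  # how long a cached answer stays valid
        self._clock = clock      # injectable so expiry can be tested
        self._store = {}         # ip -> (timestamp, result)

    def lookup(self, ip):
        """Return a cached result if still fresh, otherwise fetch and store it."""
        now = self._clock()
        hit = self._store.get(ip)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = self._fetch(ip)
        self._store[ip] = (now, result)
        return result
```

Repeated lookups of the same IP inside the TTL window cost no API requests, which keeps a long triage session inside the daily quota.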
### AI Layer
| Component | What It Does |
|-----------|-------------|
| `ai_engine` | 4-provider fallback chain · 6 specialised analysis methods · zero-downtime design |
| `rag_pipeline` | LangChain + ChromaDB · local all-MiniLM-L6-v2 embeddings · <2s retrieval latency |
| `knowledge_base` | MITRE ATT&CK techniques · high-impact CVEs · Splunk SPL detection queries |
## Performance & Validation
| Module | Metric | Result | Test Conditions |
|--------|--------|--------|-----------------|
| IOC Extractor | Precision | 91% | 500 log lines — Apache, SSH, Wazuh |
| IOC Extractor | Recall | 96% | Same dataset |
| Web Scanner | Detection rate | SQLi + XSS confirmed | testphp.vulnweb.com (Acunetix test env) |
| Subdomain Enum | Coverage | crt.sh + DNS combined outperforms either alone by 34% | bugcrowd.com |
| Hash Cracker | Type ID accuracy | 100% | MD5, SHA1, SHA256, bcrypt |
| AI Engine | Availability | 99%+ | 4-provider fallback chain |
| RAG Pipeline | Retrieval latency | <2s | 47 chunks · local all-MiniLM-L6-v2 |
| Alert Triage | MITRE mapping accuracy | 3/3 alerts correctly mapped | Sample Wazuh alert dataset |
**Measurement methodology:**
IOC extraction precision and recall were measured against a manually labelled ground-truth dataset of 500 log lines spanning Apache access logs, SSH authentication logs, and Wazuh alert exports. False positives were primarily RFC1918 addresses and UUIDs colliding with the MD5 pattern; both were addressed via CIDR exclusion and context-aware length filtering.
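
For reference, precision and recall against a labelled set reduce to two ratios over set intersections. The helper below is a minimal sketch with a hypothetical name (`precision_recall`), and the values in the usage note are toy data, not the benchmark dataset.

```python
def precision_recall(extracted, ground_truth):
    """Compute (precision, recall) of extracted IOCs against labelled IOCs."""
    extracted = set(extracted)
    ground_truth = set(ground_truth)
    true_positives = len(extracted & ground_truth)
    # Precision: share of extracted items that are real IOCs.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of real IOCs that were extracted.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

With three extracted items of which two are correct, and four labelled IOCs in total, this yields a precision of 2/3 and a recall of 0.5.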
## How It Connects to the FinSecure SOC Platform
PentestX is **Module 1 (Offensive-Defensive Bridge)** of the FinSecure AI-Augmented SOC Platform:
PentestX scans FinSecure lab environment
│
├── Nmap findings ──────────────► SOC Home Lab (P3) Splunk for detection validation
├── CVE discoveries ────────────► BFSI Threat Intel (P4) for BFSI-specific context
├── IOC extractions ────────────► LLM TI Summariser (P5) for SPL/KQL generation
├── Wazuh triage ───────────────► SOC Home Lab (P3) alert correlation
└── AI reasoning ───────────────► RAG Assistant (P2) for MITRE framework grounding
│
▼
CyberSentinel AI receives
all PentestX findings as
structured threat intelligence
PentestX is the only module in the platform that operates on both sides of the kill chain — it generates the offensive findings that the rest of the platform learns to detect and respond to. Without it, the detection rules in the SOC lab have no adversarial validation.
## Quick Start
### macOS (Apple Silicon — M1/M2/M3)
git clone https://github.com/its-me-anvesh-var/pentestx
cd pentestx
bash setup.sh # installs nmap, Ollama, pulls llama3.2:3b, builds RAG
source venv/bin/activate
python toolkit.py
`setup.sh` handles everything: Homebrew nmap, Ollama + model pull, Python venv, pip install, ChromaDB RAG build, `.env` creation.
### Linux / Kali
git clone https://github.com/its-me-anvesh-var/pentestx
cd pentestx
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
sudo apt install nmap -y
cp .env.example .env
python toolkit.py
### Windows
git clone https://github.com/its-me-anvesh-var/pentestx
cd pentestx
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
# Install nmap from https://nmap.org/download.html
copy .env.example .env
python toolkit.py
## Configuration
cp .env.example .env
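No keys are required for the minimum setup: Ollama runs fully offline. The keys below are optional and enable the cloud AI providers and the enrichment modules: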
# Free at console.groq.com — 500K tokens/day, ~300 tok/s
GROQ_API_KEY=your_key_here
# Free at virustotal.com — 4 requests/minute
VT_API_KEY=your_key_here
# Free at abuseipdb.com — 1000 requests/day
ABUSEIPDB_API_KEY=your_key_here
# Optional — only if connecting to live Splunk instance
SPLUNK_HOST=your_splunk_host
SPLUNK_PORT=8089
SPLUNK_TOKEN=your_token
# Optional — only if connecting to live Wazuh instance
WAZUH_HOST=your_wazuh_host
WAZUH_PORT=55000
WAZUH_USER=your_user
WAZUH_PASS=your_password
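
A centralised loader for these variables could look like the sketch below. It is an assumption about how `config/settings.py` might be organised, not its actual contents: the `Settings` fields and `load_settings` are hypothetical, untouched placeholder values such as `your_key_here` are treated as unset, and `SAFE_MODE` defaults to true as described under Legal & Ethics.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Values that mean "not configured" (assumption: the .env.example placeholder).
PLACEHOLDERS = {"", "your_key_here"}


@dataclass(frozen=True)
class Settings:
    groq_api_key: Optional[str]
    vt_api_key: Optional[str]
    abuseipdb_api_key: Optional[str]
    splunk_port: int
    wazuh_port: int
    safe_mode: bool


def _secret(env, name):
    """Return the variable's value, or None if missing or still a placeholder."""
    value = env.get(name, "").strip()
    return None if value in PLACEHOLDERS else value


def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        groq_api_key=_secret(env, "GROQ_API_KEY"),
        vt_api_key=_secret(env, "VT_API_KEY"),
        abuseipdb_api_key=_secret(env, "ABUSEIPDB_API_KEY"),
        splunk_port=int(env.get("SPLUNK_PORT", "8089")),
        wazuh_port=int(env.get("WAZUH_PORT", "55000")),
        safe_mode=env.get("SAFE_MODE", "true").strip().lower() != "false",
    )
```

Modules can then check, for example, `settings.vt_api_key is None` and skip enrichment cleanly instead of failing on a missing key.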
## AI Provider Chain
Priority Provider Cost Rate Limit Best For
─────────────────────────────────────────────────────────────────────
1st Ollama (local) Free None Privacy · offline ops
2nd Groq API Free 500K tok/day Speed (~300 tok/s)
3rd HuggingFace API Free Limited Fallback
4th Claude API Paid Per token Highest quality
The engine tries providers in order. On failure or a rate limit, it falls through to the next provider within 500 ms. Total AI availability: 99%+ across all four providers combined.
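
The ordering logic itself is small. The following sketch shows the general fallback pattern only: `ask_with_fallback`, `AllProvidersFailed`, and the provider callables are hypothetical, real clients for Ollama, Groq, HuggingFace, and Claude would be wrapped to fit this shape, and the 500 ms switch-over budget is not enforced here.

```python
class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def ask_with_fallback(prompt, providers):
    """Try each (name, callable) pair in order; return (name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure means "try the next provider"
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the answer keeps the fallback transparent, so the user can see which provider actually responded.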
## Test Without Any API Keys
python toolkit.py → Option 10 (Log Parser) → samples/sample_syslog.txt
python toolkit.py → Option 9 (Alert Parser) → sample
python toolkit.py → Option 6 (Hash Cracker) → single → 5f4dcc3b5aa765d61d8327deb882cf99
python toolkit.py → Option 4 (CVE Lookup) → apache log4j
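None of the four needs an API key. The first three run fully offline with local Ollama; CVE Lookup queries the public NIST NVD API, so it needs network access.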
## Project Structure
pentestx/
├── toolkit.py # CLI entry point · interactive menu
├── setup.sh # One-command macOS setup
├── requirements.txt
├── .env.example
│
├── config/
│ └── settings.py # Centralised config loader
│
├── ai/
│ ├── ai_engine.py # 4-provider fallback chain · 6 methods
│ ├── rag_pipeline.py # LangChain + ChromaDB RAG
│ └── knowledge_base/ # MITRE ATT&CK · CVEs · SPL queries
│
├── modules/
│ ├── recon/
│ │ ├── nmap_scanner.py
│ │ └── subdomain_enum.py
│ ├── vuln/
│ │ ├── web_scanner.py
│ │ ├── cve_lookup.py
│ │ └── exploit_suggest.py
│ ├── crack/
│ │ └── hash_cracker.py
│ └── triage/
│ ├── splunk_triage.py
│ ├── wazuh_triage.py
│ ├── alert_parser.py
│ ├── log_parser.py
│ ├── vt_enricher.py
│ ├── abuseipdb.py
│ └── report_gen.py
│
├── samples/ # Test data — syslog · alerts
└── output/ # All session results (gitignored)
## 📚 Research Foundation
This project is grounded in peer-reviewed academic literature. The following papers directly informed the architecture and design decisions:
| # | Paper | Key Insight Applied |
|---|-------|-------------------|
| 1 | (2025). *AI-Augmented SOC: A Survey of LLMs and Agents for Security Automation.* MDPI Systems, 5(4), 95 | AI agents reduce MTTD/MTTM by up to 6× — validates the automated triage design of PentestX's SOC modules |
| 2 | (2025). *Large Language Models for Security Operations Centers: A Comprehensive Survey.* arXiv:2509.10858 | LLMs in log analysis, alert triage, threat intel — foundational justification for the AI engine's 6 specialised methods |
| 3 | Fayyazi et al. (2024). *Advancing TTP Analysis: Harnessing LLMs with RAG.* arXiv:2401.00280 | RAG + LLM for TTP analysis — direct parallel to the `rag_pipeline` + MITRE knowledge base design |
| 4 | (2025). *Advancing Autonomous Incident Response: Leveraging LLMs and CTI.* arXiv:2508.10677 | RAG-based framework for automated IR — validates the `report_gen` module's AI-assisted incident narrative design |
| 5 | Arazzi et al. (2023). *NLP-Based Techniques for Cyber Threat Intelligence.* arXiv:2311.08807 | NLP for CTI data extraction — informs the `log_parser` IOC extraction design and precision/recall measurement methodology |
| 6 | (2024). *Actionable CTI using Knowledge Graphs and LLMs.* arXiv:2407.02528 | Enterprise CTI extraction with LLMs (Microsoft, CrowdStrike, Trend Micro) — validates the `vt_enricher` + `abuseipdb` AI enrichment design |
| 7 | (2025). *Revealing the True Indicators: Understanding and Improving IoC Extraction from Threat Reports.* arXiv:2506.11325 | Ground-truth methodology for IOC extraction benchmarking — directly used to design the 500-line benchmark dataset and precision/recall measurement |
## What I Learned Building This
**On bridging offense and defense:** The hardest design decision was making the triage modules genuinely useful, not just wrappers. A Splunk module that just runs a query and prints results doesn't help an analyst. The value is in the AI triage layer that maps results to MITRE techniques and suggests the next SPL query. That required understanding both what a pentester finds and what a SOC analyst needs to act on it.
**On the 4-provider AI chain:** Each provider has a different failure mode. Ollama fails when the model isn't pulled or hardware is insufficient. Groq fails on rate limits during heavy sessions. HuggingFace fails on cold-start latency. Claude fails when the API key is missing. Testing all four failure modes independently — and making the fallback transparent to the user — took longer than building the modules themselves.
**On IOC extraction precision:** Achieving 91% precision required understanding why false positives occur. RFC1918 private IP addresses (10.x.x.x, 172.16-31.x.x, 192.168.x.x) are valid IOC patterns but almost never malicious in syslog context. UUID strings written without hyphens are 32 hex characters, the same length as an MD5 digest, but aren't hashes. Building those exclusion rules required reading real log files, not synthetic test data.
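
The two exclusion rules can be sketched as follows. This is an illustration under assumptions, not the `log_parser` code: the function names are hypothetical, and the "context" check for UUID-like values is reduced to a short list of field labels (`uuid`, `guid`, `request_id`, `session_id`) that a real parser would need to tune against its own logs.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HEX32_RE = re.compile(r"\b[0-9a-fA-F]{32}\b")
# Assumed field labels that mark a 32-hex value as an identifier, not a hash.
ID_LABEL_RE = re.compile(r"(?:uuid|guid|request_id|session_id)\s*[=:]\s*$", re.IGNORECASE)
RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def extract_public_ips(line):
    """Return IPv4 addresses in the line, skipping invalid and RFC1918 ones."""
    ips = []
    for token in IPV4_RE.findall(line):
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # looks like an IP but is not (for example 999.1.1.1)
        if any(addr in net for net in RFC1918):
            continue  # CIDR exclusion for private ranges
        ips.append(token)
    return ips


def extract_md5(line):
    """Return 32-hex strings, skipping those labelled as identifiers."""
    hashes = []
    for match in HEX32_RE.finditer(line):
        if ID_LABEL_RE.search(line[: match.start()]):
            continue  # context says this is a UUID-style identifier
        hashes.append(match.group().lower())
    return hashes
```

Hyphenated UUIDs never reach `extract_md5` because no single segment is 32 characters long; the label check only matters for identifiers logged without hyphens.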
**On RAG for security:** The knowledge base needs to be opinionated. A generic MITRE ATT&CK dump produces low-quality retrievals because every technique has similar language. Curating chunks around specific attack scenarios and defensive SPL queries significantly improved retrieval relevance over a naive ingestion approach.
## Roadmap
- [ ] Add MITRE D3FEND defensive countermeasure mapping to `exploit_suggest`
- [ ] Integrate Microsoft Sentinel KQL generation into `report_gen`
- [ ] Build FastAPI REST wrapper for programmatic SOC tool integration
- [ ] Add Shodan API module for passive external recon
- [ ] Connect to CyberSentinel AI as its offensive intelligence feed
- [ ] Add CVSS v4.0 scoring support to `cve_lookup`
## Legal & Ethics
- Only use against systems you own or have explicit written permission to test
- `SAFE_MODE=true` (default in `.env`) prompts confirmation before any active scan
- Designed for authorised penetration testing, CTF practice, and SOC analyst training
- All test validations performed against dedicated lab environments and intentionally vulnerable targets (testphp.vulnweb.com, personal lab)
## Tech Stack
`Python 3.9+` · `Nmap` · `LangChain` · `ChromaDB` · `Ollama` · `Groq API` · `HuggingFace Inference API` · `Anthropic Claude API` · `sentence-transformers` · `Rich` · `Requests` · `BeautifulSoup4` · `VirusTotal API v3` · `AbuseIPDB API` · `NIST NVD API` · `Splunk REST API` · `Wazuh REST API`
## Author
**Anvesh Raju Varadharaju**
M.S. Cybersecurity · UNC Charlotte | M.Tech AI · University of Hyderabad
- GitHub: [@its-me-anvesh-var](https://github.com/its-me-anvesh-var)
- LinkedIn: [linkedin.com/in/arv007](https://linkedin.com/in/arv007)
- Portfolio: [your-portfolio-url]
## License
MIT — see [LICENSE](LICENSE) for details.
*Part of the FinSecure AI-Augmented SOC Platform — an independent 24-month research and build initiative covering AI-powered SIEM, cloud threat monitoring, incident response automation, NLP threat intelligence, and GenAI compliance reporting. PentestX is the offensive-defensive bridge that validates all detection capabilities across the platform.*