NarahariRaghava/Pre-Deployment-Detection-of-Terraform-Security-Misconfigurations-Using-Machine-Learning


# Terraform Security Misconfiguration Detector

A CLI tool that scans Terraform `.tf` files and flags security risks before they reach production, built after reviewing the infrastructure of 45+ cloud applications and seeing the same misconfigurations surface repeatedly.

![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/fec0e9acb4161532.svg)

## The Problem

Terraform is how most teams manage AWS infrastructure. It's fast and repeatable, but that speed cuts both ways: a single misconfigured resource block, such as an open SSH port, a public S3 bucket, or a wildcard IAM policy, gets deployed just as easily as a correct one.

Most tools catch these issues after deployment, during a cloud security audit or an incident. This tool catches them at the source, before `terraform apply` is ever run.

## What It Does

Point it at a `.tf` file or a project folder. It finds every resource block, runs it through a trained ML model, and tells you what's risky and why.

    Resource : aws_security_group.open_ssh
    Risk     : High (High: 94% | Low: 2% | Medium: 4%)
    Reason   : CIDR range is open to the entire internet (0.0.0.0/0); SSH (port 22) is exposed.

    Resource : aws_db_instance.reporting_db
    Risk     : Medium (High: 13% | Low: 10% | Medium: 76%)
    Reason   : the database is publicly accessible.

    Resource : aws_db_instance.secure_db
    Risk     : Low (High: 1% | Low: 73% | Medium: 26%)
    Reason   : No high-risk indicators detected.

    Summary  : High=3 Medium=1 Low=5

It also generates a colour-coded HTML report you can share or open in a browser.

## How It Works

**2. Extract:** each block is checked against 17 security indicators across common AWS resource types (open CIDR ranges, exposed ports, unencrypted storage, hardcoded secrets, and more). Each indicator is a 1 or 0. The block becomes a row of 17 numbers.

**3. Classify:** that row goes into a trained Random Forest model, which outputs Low, Medium, or High risk along with a per-class confidence score.

**4. Explain:** whichever indicators fired are translated into a plain-English reason, so engineers know exactly what to fix.

## Resource Types Covered

| Resource | What gets flagged |
|---|---|
| `aws_security_group` / `aws_security_group_rule` | SSH/RDP open to internet, DB ports exposed, IPv6 open |
| `aws_s3_bucket` | Public-read ACL, missing public access block |
| `aws_iam_policy` | Wildcard action or resource |
| `aws_db_instance` | Publicly accessible, storage not encrypted |
| `aws_instance` | Public IP assigned, unencrypted EBS, hardcoded credentials |
| `aws_lambda_function` | Hardcoded passwords or tokens in environment variables |
| `aws_lb_listener` | Plain HTTP with no redirect to HTTPS |
| `aws_cloudtrail` | Logging explicitly disabled |

## Models

Three classifiers were trained and compared on a balanced dataset of 300 labeled Terraform snippets:

| Model | Test Accuracy | 5-Fold CV |
|---|---|---|
| Logistic Regression | **81.33%** | 78.67% ± 3.71% |
| Random Forest | 77.33% | 78.33% ± 3.50% |
| Decision Tree | 77.33% | 77.00% ± 3.23% |

Random Forest is the default, providing per-class confidence scores and never misclassifying a High-risk resource as Low. Logistic Regression achieves the highest raw accuracy, which makes sense: the binary feature space is largely linearly separable.

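The Extract and Explain steps described under How It Works can be illustrated with a small sketch. This is not the code in `src/feature_extractor.py`: it assumes a resource block has already been parsed into a plain dict with `type` and `attrs` keys, and it uses only four hypothetical indicators (`open_cidr`, `ssh_exposed`, `db_public`, `db_unencrypted`) in place of the tool's 17. The reason strings are borrowed from the sample output; everything else is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed shape of a parsed block: {"type": "<resource type>", "attrs": {...}}.
Block = Dict[str, Any]


def _ingress(block: Block) -> List[Dict[str, Any]]:
    """Return the ingress rules of a block, or an empty list if it has none."""
    return block.get("attrs", {}).get("ingress", [])


def _open_cidr(block: Block) -> bool:
    """True if any ingress rule allows traffic from 0.0.0.0/0."""
    return any("0.0.0.0/0" in rule.get("cidr_blocks", []) for rule in _ingress(block))


def _ssh_exposed(block: Block) -> bool:
    """True if any ingress rule's port range includes port 22."""
    for rule in _ingress(block):
        low, high = rule.get("from_port"), rule.get("to_port")
        if low is not None and high is not None and low <= 22 <= high:
            return True
    return False


def _db_public(block: Block) -> bool:
    """True for a database instance marked publicly accessible."""
    return (block.get("type") == "aws_db_instance"
            and block.get("attrs", {}).get("publicly_accessible") is True)


def _db_unencrypted(block: Block) -> bool:
    """True for a database instance without storage_encrypted = true."""
    return (block.get("type") == "aws_db_instance"
            and block.get("attrs", {}).get("storage_encrypted") is not True)


# Hypothetical indicator table: (name, plain-English reason, check function).
INDICATORS: List[Tuple[str, str, Callable[[Block], bool]]] = [
    ("open_cidr", "CIDR range is open to the entire internet (0.0.0.0/0)", _open_cidr),
    ("ssh_exposed", "SSH (port 22) is exposed", _ssh_exposed),
    ("db_public", "the database is publicly accessible", _db_public),
    ("db_unencrypted", "database storage is not encrypted", _db_unencrypted),
]


def extract_features(block: Block) -> List[int]:
    """Extract step: turn one block into a row of 0/1 indicator values."""
    return [int(check(block)) for _, _, check in INDICATORS]


def explain(features: List[int]) -> str:
    """Explain step: join the reasons of every indicator that fired."""
    fired = [reason for (_, reason, _), bit in zip(INDICATORS, features) if bit]
    if not fired:
        return "No high-risk indicators detected."
    return "; ".join(fired) + "."


open_ssh: Block = {
    "type": "aws_security_group",
    "attrs": {"ingress": [{"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}]},
}
print(extract_features(open_ssh), explain(extract_features(open_ssh)))
```

Keeping each indicator as a named check paired with its reason means the same table drives both the feature row passed to the classifier and the explanation shown to the engineer, so the two cannot drift apart.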
## Project Structure

    terraform-security-ml/
    ├── data/
    │   ├── generate_dataset.py
    │   ├── terraform_dataset.csv
    │   └── sample_tf/
    │       └── example.tf
    ├── src/
    │   ├── feature_extractor.py
    │   ├── model_trainer.py
    │   ├── predictor.py
    │   └── report_generator.py
    ├── notebooks/
    │   └── exploration.ipynb
    ├── outputs/
    │   ├── evaluation_report.json / .txt
    │   ├── confusion_matrix_*.png
    │   ├── feature_importance_*.png
    │   ├── scan_*.json / .txt / .html
    │   └── models/
    ├── main.py
    └── requirements.txt

## Setup

    cd terraform-security-ml
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

## Usage

Run the tool:

    python main.py

This launches an interactive menu where you can choose to scan a file, scan a directory, train the models, or run demo predictions.

You can also use command-line arguments directly:

    python main.py --file path/to/main.tf    # Scan a single file
    python main.py --dir path/to/project/    # Scan a project directory
    python main.py --train                   # Train the models
    python main.py --predict                 # Run demo predictions

After scanning, open the generated HTML report:

    open outputs/scan_.html
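
For readers who want to see how the four documented flags and the interactive fallback could fit together, here is a minimal sketch using the standard library's `argparse`. It is not the project's `main.py`: the function names `build_parser` and `choose_action`, the returned action labels, and the choice to make the flags mutually exclusive are all assumptions made for illustration.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four flags listed in the Usage section."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Scan Terraform files for security misconfigurations.",
    )
    # Assumption: only one mode may be chosen per invocation.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--file", metavar="PATH", help="scan a single .tf file")
    mode.add_argument("--dir", metavar="PATH", help="scan a project directory")
    mode.add_argument("--train", action="store_true", help="train the models")
    mode.add_argument("--predict", action="store_true", help="run demo predictions")
    return parser


def choose_action(argv: List[str]) -> Tuple[str, Optional[str]]:
    """Map parsed arguments to a (hypothetical action label, target path) pair."""
    args = build_parser().parse_args(argv)
    if args.file:
        return ("scan_file", args.file)
    if args.dir:
        return ("scan_dir", args.dir)
    if args.train:
        return ("train", None)
    if args.predict:
        return ("predict", None)
    # No flag given: fall back to the interactive menu.
    return ("menu", None)
```

Under these assumptions, `choose_action(["--file", "path/to/main.tf"])` selects the single-file scan, while an empty argument list falls through to the interactive menu, matching the behaviour of running `python main.py` with no arguments.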