Luekrit/Cloud-Security-Automation
GitHub: Luekrit/Cloud-Security-Automation
# Cloud Security Automation & Remediation
**AWS | Terraform | IAM | EventBridge | Lambda | CloudTrail | SNS | Python**
A cloud security engineering project demonstrating **event-driven detection, alerting, and governance-aware remediation decision logic** using Terraform and AWS native services.
# Project Overview
This project implements a **self-healing cloud security architecture** that detects high-risk IAM policy attachment activity in near real time.
Instead of relying on manual incident response, the system:
- Monitors AWS API activity using CloudTrail
- Detects security-relevant events using EventBridge
- Triggers response logic using Lambda
- Sends security alerts using SNS email
- Evaluates governance-aware exceptions using IAM tags
- Supports automated remediation in a controlled manner
The current implementation focuses on **AdministratorAccess attachment detection** for IAM users and validates the control through **dry-run testing** before enabling real enforcement.
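The decision flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual `remediate.py`: the `decide` function, the `Decision` dataclass, and the two "not in scope" reason strings are hypothetical names introduced here. The tag name, the `approved_for_remediation` and `decision_reason` fields, and the two remaining reason strings are taken from the alerts shown later in this README.

```python
from dataclasses import dataclass

# ARN of the AWS-managed AdministratorAccess policy.
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"
# Exception tag described in this README.
EXCEPTION_TAG_KEY = "SecurityApproved"


@dataclass
class Decision:
    """Hypothetical result object mirroring the fields shown in the SNS alert."""
    approved_for_remediation: bool
    decision_reason: str
    dry_run: bool


def decide(event_detail: dict, user_tags: dict, dry_run: bool = True) -> Decision:
    """Decide whether an AttachUserPolicy event should be remediated.

    event_detail is the CloudTrail record (the `detail` field of the
    EventBridge event); user_tags is a plain {key: value} dict of the
    target IAM user's tags, assumed to be fetched by the caller.
    """
    params = event_detail.get("requestParameters") or {}

    # Only the validated use case is in scope (assumed reason text).
    if event_detail.get("eventName") != "AttachUserPolicy":
        return Decision(False, "Event is not AttachUserPolicy", dry_run)
    if params.get("policyArn") != ADMIN_POLICY_ARN:
        return Decision(False, "Policy is not AdministratorAccess", dry_run)

    # Governance exception: an approved tag skips remediation.
    if user_tags.get(EXCEPTION_TAG_KEY, "").lower() == "true":
        return Decision(
            False, "User has approved exception tag: SecurityApproved=true", dry_run
        )

    # Approved; the caller would still skip the detach while dry_run is True.
    return Decision(True, "Approved for remediation", dry_run)
```

Keeping the decision a pure function (no AWS calls) is a design choice of this sketch: it lets the tag and policy checks be unit-tested without touching IAM, while the caller handles tag lookup, alerting, and the actual detach.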
# Architecture Diagram

    graph TD
        classDef trigger fill:#ed2c13,stroke:#333,stroke-width:2px;
        classDef logic fill:#326ee6,stroke:#333,stroke-width:2px;
        classDef action fill:#d4772a,stroke:#333,stroke-width:2px;
        classDef final fill:#7330e6,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5;

        subgraph Detection_Layer [1. Detection]
            A[IAM / Security Events<br/>AttachUserPolicy]:::trigger --> B[AWS CloudTrail<br/>Records API Activity]
        end

        subgraph Routing_Layer [2. Filtering]
            B --> C[Amazon EventBridge<br/>Matches Security Event Patterns]:::logic
        end

        subgraph Logic_Layer [3. Logic Engine]
            C --> D[AWS Lambda<br/>remediate.py]:::logic
            D --> D1[Parse Event Metadata]
            D --> D2[Evaluate Risk]
            D --> D3[Check Governance Exceptions]
            D --> D4[Decide Remediate or Skip]
        end

        subgraph Response_Layer [4. Response]
            D4 --> E[SNS Email Alert<br/>Structured Security Notification]:::action
            D4 --> F[CloudWatch Logs<br/>Audit Trail and Debugging]:::action
            D4 --> G[IAM Remediation<br/>Detach Policy in Enforcement Mode]:::action
        end

        subgraph Outcome [5. Desired State]
            G --> H[Least Privilege Preserved]:::final
            E --> H
            F --> H
        end

# Key Security Capabilities

## Event-Driven Threat Detection

The system currently monitors IAM-related API activity, with the main validated use case focused on:

- `AttachUserPolicy`

EventBridge filters matching CloudTrail events in near real time and invokes the Lambda response workflow.

## Governance-Aware Response Logic

The Lambda response engine:

- Parses incoming CloudTrail event metadata
- Identifies targeted IAM users and risky policy attachments
- Evaluates whether remediation is in scope
- Checks approved exception tags such as:
  - `SecurityApproved=true`
- Decides whether to remediate or skip

This creates a more realistic security control by combining technical detection with **governance-aware exception handling**.

## SNS Alerting

When a risky IAM event is detected, the system sends a structured SNS email alert containing:

- Event name
- Actor ARN
- Target user
- Policy ARN
- Dry-run status
- Remediation decision
- Reason for approval or skip

This improves operational visibility before full enforcement is enabled.

## Dry-Run Safety Mode

The control currently operates in **dry-run mode**. This means:

- Risky activity is still detected
- Alerts are still sent
- Decisions are still logged
- No actual IAM detachment is performed yet

This allows safe validation before enabling live remediation.

## Security Logging & Visibility

All remediation decisions and actions are logged to:

- **Amazon CloudWatch Logs**

This provides:

- Audit trail for security actions
- Debugging capability
- Operational visibility
- Evidence for validation and testing

# Attack Simulation

To validate the system, the following scenario is tested:

1. An IAM user is granted the **AdministratorAccess** policy
2. CloudTrail records the IAM policy change
3. EventBridge detects the matching event
4. Lambda evaluates the event
5. SNS sends a structured security alert
6. Lambda either:
   - approves remediation in dry-run mode, or
   - skips remediation if an approved exception tag is present

# Terraform Infrastructure

Infrastructure is deployed using **Terraform with secure best practices**: role assumption instead of long-term credentials, remote state in S3, and state locking in DynamoDB.

## Infrastructure Design

    graph TD
        %% Define Styles
        classDef tool fill:#742fba,stroke:#fff,stroke-width:2px,color:#fff;
        classDef iam fill:#f6a800,stroke:#333,stroke-width:2px;
        classDef aws fill:#232f3e,stroke:#fff,stroke-width:2px,color:#fff;
        classDef storage fill:#3b48cc,stroke:#fff,stroke-width:2px,color:#fff;

        A[Terraform CLI<br/>Local Machine / CI/CD]:::tool -->|sts:AssumeRole| B[TerraformExecutionRole<br/>IAM Role]:::iam
        B -->|Provision Resources| C[AWS Infrastructure<br/>VPC, Lambda, EventBridge]:::aws

        subgraph Remote_Backend [Remote State Management]
            D[S3 Bucket<br/>Remote State Storage]:::storage
            E[DynamoDB Table<br/>State Locking]:::storage
            D ---|Stores| F(terraform.tfstate):::storage
            E ---|Prevents| G(Concurrent Runs):::storage
        end

        C -.->|Update State| D
        A <-->|Check/Update Lock| E

## Key Infrastructure Features

- Modular Terraform architecture
- Secure access using **AssumeRole (no long-term credentials)**
- Remote state storage in **S3**
- State locking using **DynamoDB**
- Reusable modules for IAM, Lambda, EventBridge, SNS, CloudTrail, and S3
- Separate global path in `us-east-1` for IAM event handling (IAM is a global service, and its CloudTrail events are delivered in that region)
- Hardened Lambda execution role with scoped IAM permissions
- Controlled remediation scope limited to test IAM users matching `iam-test-*`

# Phase 3.5: Infrastructure & Lambda Role Hardening

Before enabling live remediation, I completed a hardening pass to improve the project's Terraform state management and Lambda execution role permissions.

This phase focused on reducing operational risk before moving from dry-run testing toward controlled enforcement.

## Remote State & Locking Hardening

Terraform state was moved to a remote backend using:

* **Amazon S3** for remote state storage
* **Amazon DynamoDB** for state locking
* Separate backend paths for bootstrap and environment state
* Environment-specific state separation for safer infrastructure management

This improves reliability by preventing local state drift and reducing the risk of concurrent Terraform runs modifying the same infrastructure.

## Lambda Execution Role Hardening

The Lambda remediation policy was also tightened to reduce the blast radius of automated remediation.

The original policy allowed IAM read and detach actions across all resources. This was acceptable for early testing, but too broad for a realistic security automation workflow.

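To make the difference between a broad and a scoped detach permission concrete, here is a minimal Python sketch that builds an IAM policy statement as a plain dictionary. It is not the project's Terraform policy: the function names, the example account ID, and the use of the `iam:PolicyARN` condition key to pin the policy being detached are assumptions chosen for illustration.

```python
import json

# ARN of the AWS-managed AdministratorAccess policy.
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def scoped_detach_statement(account_id: str, user_pattern: str = "iam-test-*") -> dict:
    """Build a statement that allows detaching only AdministratorAccess,
    and only from IAM users whose names match user_pattern (hypothetical helper)."""
    return {
        "Effect": "Allow",
        "Action": "iam:DetachUserPolicy",
        # Limit the target to test users instead of all resources.
        "Resource": f"arn:aws:iam::{account_id}:user/{user_pattern}",
        # Limit which managed policy may be detached.
        "Condition": {"ArnEquals": {"iam:PolicyARN": ADMIN_POLICY_ARN}},
    }


def is_wildcard_resource(statement: dict) -> bool:
    """Return True when a statement applies to every resource ("*")."""
    resources = statement["Resource"]
    if isinstance(resources, str):
        resources = [resources]
    return "*" in resources


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not a real account.
    broad = {"Effect": "Allow", "Action": "iam:DetachUserPolicy", "Resource": "*"}
    scoped = scoped_detach_statement("123456789012")
    print("broad is wildcard:", is_wildcard_resource(broad))
    print("scoped is wildcard:", is_wildcard_resource(scoped))
    print(json.dumps(scoped, indent=2))
```

A check like `is_wildcard_resource` could also run in CI against rendered policy JSON, so a later change that reintroduces `"Resource": "*"` is caught before it is applied.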
The updated Lambda execution role now limits permissions so the function can:

* Read IAM user details and tags only for controlled test users matching `iam-test-*`
* Detach only the AWS-managed `AdministratorAccess` policy
* Apply remediation only to test IAM users matching the `iam-test-*` naming pattern
* Publish alerts only to the project SNS topic

This improves least-privilege posture while keeping the workflow functional for controlled validation.

## Phase 3.5 Validation

After applying the Terraform changes, the workflow was retested in dry-run mode.

Validation confirmed:

* Terraform applied the IAM policy update successfully with `0 added, 1 changed, 0 destroyed`
* SNS alerting continued to work
* CloudWatch logs confirmed Lambda execution
* Test A: user without an exception tag was approved for remediation in dry-run mode
* Test B: user with `SecurityApproved=true` was detected but skipped for remediation
* `DRY_RUN=true` remained enabled, so no live policy detachment occurred

This confirms the automation can still detect risky IAM activity, send alerts, evaluate exception tags, and make remediation decisions after the Lambda role was restricted.

# Validation Results

This phase validated the end-to-end detection, alerting, and governance-aware exception handling of the project in **dry-run mode**.

## Test Scenario A: Unapproved AdministratorAccess attachment

**Objective:** Confirm that the control detects a high-risk IAM policy attachment, sends an alert, and approves remediation when no exception applies.

**Test action**

- Attached `AdministratorAccess` to `iam-test-user`

**Expected behavior**

- CloudTrail records the IAM API event
- EventBridge matches the event
- Lambda is invoked in `us-east-1`
- SNS email alert is sent
- Remediation is approved
- Because `DRY_RUN=true`, no actual detach occurs

**Observed result**

- Lambda logs showed:
  - `Security detection triggered`
  - `Parsed event`
  - `SNS alert processed`
  - `Dry run enabled - remediation skipped`
- SNS email alert showed:
  - `approved_for_remediation: true`
  - `decision_reason: "Approved for remediation"`

**Evidence**

**Figure 1. Test A: CloudWatch log showing detection, SNS alerting, and dry-run remediation approval**

**Figure 2. Test A: SNS email alert showing remediation approved**

**Outcome**

- Detection worked
- Alerting worked
- Remediation decision logic worked
- Dry-run safety control worked

## Test Scenario B: Approved exception using IAM tag

**Objective:** Confirm that the control still detects and alerts on the risky IAM event, but skips remediation when the target user has an approved exception tag.

**Test action**

- Added IAM user tag:
  - `SecurityApproved = true`
- Attached `AdministratorAccess` to `iam-test-user`

**Expected behavior**

- CloudTrail records the IAM API event
- EventBridge matches the event
- Lambda is invoked in `us-east-1`
- SNS email alert is sent
- Remediation is **not** approved because the target user has an approved exception tag
- Lambda logs the skip reason clearly

**Observed result**

- Lambda logs showed:
  - `Security detection triggered`
  - `Parsed event`
  - `SNS alert processed`
  - `No remediation performed`
- SNS email alert showed:
  - `approved_for_remediation: false`
  - `decision_reason: "User has approved exception tag: SecurityApproved=true"`

**Evidence**

**Figure 3. Test B: CloudWatch log showing alerting and exception-based remediation skip**

**Figure 4. Test B: SNS email alert showing approved exception decision**

**Outcome**

- Detection worked
- Alerting worked
- Governance-aware exception handling worked
- Approved exceptions were skipped correctly

## Validation Summary

These tests confirmed that the control can:

- detect risky IAM policy attachment events
- alert security teams through SNS email
- support safe rollout using dry-run mode
- apply governance-aware exception handling using IAM user tags

This phase demonstrates a more realistic security engineering workflow:

**CloudTrail → EventBridge → Lambda → SNS alert → Dry-run remediation decision**

# Security Principles Demonstrated

This project applies core cloud security engineering practices:

- **Least Privilege Access Control**
- **Event-Driven Security Automation**
- **Infrastructure as Code (IaC) Security**
- **Automated Incident Response**
- **Cloud Identity Protection**

# Technologies Used

- Terraform
- AWS IAM
- AWS CloudTrail
- Amazon EventBridge
- AWS Lambda
- Amazon CloudWatch Logs
- Amazon SNS
- Python

# Why This Project Exists

This project was inspired by a security lesson learned during earlier development, where improper credential handling highlighted how easily cloud misconfigurations can introduce risk.

The goal of this project is to demonstrate how **automation and security engineering practices can prevent those risks from persisting in real environments**.

# Future Improvements

- Add detection for additional IAM abuse scenarios
- Integrate alerting via **SNS / Slack notifications**
- Expand remediation logic for broader security events
- Integrate with **AWS Security Hub or SIEM tools**
- Add anomaly detection for unusual API behavior