kanikevinay/aws-automated-incident-response

GitHub: kanikevinay/aws-automated-incident-response

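An event-driven, serverless security automation tool built on AWS that detects, in real time, security group misconfigurations exposing SSH/RDP ports to the public internet and remediates them automatically.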

# ☁️🛡️ Cloud Guard: Automated Event-Driven Response

![AWS](https://img.shields.io/badge/AWS-Cloud%20Platform-FF9900?style=for-the-badge&logo=amazonaws&logoColor=white) ![Python](https://img.shields.io/badge/Python-3.12-3776AB?style=for-the-badge&logo=python&logoColor=white) ![Lambda](https://img.shields.io/badge/AWS-Lambda-FF9900?style=for-the-badge&logo=awslambda&logoColor=white) ![EventBridge](https://img.shields.io/badge/Amazon-EventBridge-FF4F8B?style=for-the-badge&logo=amazonaws&logoColor=white) ![Slack](https://img.shields.io/badge/Slack-Alerts-4A154B?style=for-the-badge&logo=slack&logoColor=white) ![Status](https://img.shields.io/badge/Status-Live%20%26%20Operational-brightgreen?style=for-the-badge)

**A production-grade, fully serverless security automation pipeline built on AWS.** It detects critical firewall misconfigurations in real time and remediates them automatically, before attackers can exploit them.

*Zero human intervention. Zero delay. Zero tolerance for open admin ports.*

## 📐 Architecture Overview

```
[INSERT ARCHITECTURE DIAGRAM HERE]
(Recommended: A flow diagram showing EC2 Security Group → CloudTrail → EventBridge → Lambda → EC2 Revoke + Slack Alert)
```

## 🚨 The Problem: Open Admin Ports Are a Critical Risk

In a cloud environment, a single misconfigured firewall rule can expose your entire infrastructure to the internet. The most dangerous offenders:

| Port | Protocol | Service | Risk Level |
|------|----------|---------|------------|
| **22** | TCP | SSH (Secure Shell) | 🔴 **Critical** |
| **3389** | TCP | RDP (Remote Desktop Protocol) | 🔴 **Critical** |

When a developer or administrator accidentally (or maliciously) opens these ports to `0.0.0.0/0` (the entire internet), the exposure window starts **immediately**. Automated scanners and botnets continuously probe for open SSH/RDP ports, and brute-force attacks begin within minutes of exposure.

**The traditional response?** Wait for a human analyst to notice the alert, investigate, escalate, and remediate by hand. That process can take **hours**.

## ✅ The Automated Solution: Cloud Guard

**Cloud Guard** shrinks that window to **under 10 seconds**.

An event-driven AWS pipeline monitors every management API call in the environment in real time. The moment a security group rule opens SSH (port 22) or RDP (port 3389) to the public internet, a serverless Lambda function will:

1. **Detect** the violation through an EventBridge rule trigger
2. **Evaluate** whether the new rule exposes a sensitive admin port to `0.0.0.0/0`
3. **Revoke** the offending firewall rule automatically using the AWS SDK
4. **Notify** the security team with a detailed Slack alert, including the exact rule removed, the affected resource, and a timestamp

This is **automated threat response at cloud scale**, the same pattern used in enterprise SOC environments and AWS Security Hub remediations.

## 🏗️ AWS Services Used

| Service | Role in the Pipeline | Resource Name |
|---------|-----------------|---------------|
| **AWS IAM** | Admin identity and least-privilege Lambda permissions | `devsecops-admin` / `LambdaFirewallAccessPolicy` |
| **AWS CloudTrail** | Real-time audit log of all API management events | `security-environment-trail` |
| **Amazon EventBridge** | Event rule trigger matching `AuthorizeSecurityGroupIngress` | `DetectPublicFirewallChanges` |
| **AWS Lambda** | Serverless Python remediation engine | `CloudSecurityGuard` |
| **Amazon EC2 API** | Target of detection and remediation (`revoke_security_group_ingress`) | Security Groups |
| **Slack Webhooks** | Real-time incident alert channel | `#security-alerts` |

## 🔧 Step-by-Step Implementation

### Step 1: CloudTrail, the Audit Foundation

All AWS management events are captured by `security-environment-trail`. This creates a continuous, tamper-resistant audit log of every management API call in the environment, which is the underlying data source for the whole detection pipeline.

### Step 2: EventBridge, the Trigger Engine

An EventBridge rule named `DetectPublicFirewallChanges` watches the CloudTrail event stream and fires the moment it sees an `AuthorizeSecurityGroupIngress` API call. This is the EC2 API action invoked whenever an inbound rule is added to a security group.

**EventBridge rule event pattern:**

```
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["ec2.amazonaws.com"],
    "eventName": ["AuthorizeSecurityGroupIngress"]
  }
}
```

### Step 3: IAM, Least Privilege

The Lambda function's execution role is granted only the minimum permissions it needs to do its job, and nothing more. This follows the **principle of least privilege**, a cornerstone of cloud security.

**Inline policy: `LambdaFirewallAccessPolicy`**

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FirewallRemediationPermissions",
      "Effect": "Allow",
      "Action": [
        "ec2:RevokeSecurityGroupIngress",
        "ec2:DescribeSecurityGroups"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchLogsPermissions",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```

### Step 4: Lambda, the Remediation Brain

The `CloudSecurityGuard` Lambda function is the core of the system. Written in **Python 3.12** and triggered by EventBridge, it parses the CloudTrail event payload, checks for policy violations, and remediates immediately.

**Sensitive configuration (the Slack webhook URL) is stored as a Lambda environment variable and is never hardcoded in source.**

## 🐍 Python 3 Automation Script

```
import boto3
import json
import logging
import os
import urllib.request
import urllib.error

# Configure structured logging for CloudWatch
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Sensitive admin ports that must never be exposed to the public internet
SENSITIVE_PORTS = [22, 3389]
PUBLIC_CIDR = "0.0.0.0/0"


def lambda_handler(event, context):
    """
    CloudSecurityGuard: Automated Firewall Remediation Engine
    Triggered by EventBridge when AuthorizeSecurityGroupIngress is called.
    Checks for public exposure of sensitive admin ports and remediates instantly.
    """
    logger.info("CloudSecurityGuard triggered. Analysing event payload...")
    ec2_client = boto3.client("ec2")

    try:
        # Parse the CloudTrail event detail from the EventBridge wrapper.
        # "or {}" guards against requestParameters being present but null.
        detail = event.get("detail") or {}
        request_params = detail.get("requestParameters") or {}
        group_id = request_params.get("groupId")

        if not group_id:
            logger.warning("No groupId found in event. Skipping.")
            return {"statusCode": 200, "body": "No Security Group ID found in event."}

        logger.info(f"Inspecting Security Group: {group_id}")

        # Fetch the current inbound rules for the security group
        response = ec2_client.describe_security_groups(GroupIds=[group_id])
        security_group = response["SecurityGroups"][0]
        sg_name = security_group.get("GroupName", "Unknown")
        inbound_rules = security_group.get("IpPermissions", [])

        violations_found = False

        for rule in inbound_rules:
            from_port = rule.get("FromPort", -1)
            to_port = rule.get("ToPort", -1)
            ip_ranges = rule.get("IpRanges", [])
            ip_protocol = rule.get("IpProtocol", "")

            for ip_range in ip_ranges:
                cidr = ip_range.get("CidrIp", "")

                # Check: Is a sensitive port exposed to the entire internet?
                if cidr == PUBLIC_CIDR and from_port in SENSITIVE_PORTS:
                    port = from_port
                    protocol = "SSH" if port == 22 else "RDP"

                    logger.warning(
                        f"VIOLATION DETECTED: Port {port} ({protocol}) open to {PUBLIC_CIDR} "
                        f"on Security Group {group_id} ({sg_name}). Initiating remediation..."
                    )

                    # --- AUTOMATED REMEDIATION ---
                    ec2_client.revoke_security_group_ingress(
                        GroupId=group_id,
                        IpPermissions=[
                            {
                                "IpProtocol": ip_protocol,
                                "FromPort": from_port,
                                "ToPort": to_port,
                                "IpRanges": [{"CidrIp": PUBLIC_CIDR}],
                            }
                        ],
                    )

                    logger.info(
                        f"REMEDIATED: Port {port} ({protocol}) rule successfully "
                        f"revoked from {group_id} ({sg_name})."
                    )
                    violations_found = True

                    # --- SLACK ALERT ---
                    send_slack_alert(group_id, sg_name, port, protocol)

        if not violations_found:
            logger.info(
                f"Security Group {group_id} ({sg_name}) inspected. No violations found."
            )

        return {"statusCode": 200, "body": "CloudSecurityGuard scan complete."}

    except Exception as e:
        logger.error(f"CloudSecurityGuard encountered an error: {str(e)}", exc_info=True)
        raise


def send_slack_alert(group_id: str, sg_name: str, port: int, protocol: str):
    """
    Sends a rich-formatted incident alert to the configured Slack channel
    using a webhook URL stored securely in Lambda environment variables.
    """
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        logger.error("SLACK_WEBHOOK_URL environment variable not set. Cannot send alert.")
        return

    port_label = f"Port {port} ({protocol})"
    emoji = "🚨"

    slack_payload = {
        "blocks": [
            {
                "type": "header",
                "text": {
                    "type": "plain_text",
                    "text": f"{emoji} SECURITY INCIDENT: Firewall Violation Auto-Remediated",
                    "emoji": True,
                },
            },
            {"type": "divider"},
            {
                "type": "section",
                "fields": [
                    {
                        "type": "mrkdwn",
                        "text": f"*Security Group ID:*\n`{group_id}`",
                    },
                    {
                        "type": "mrkdwn",
                        "text": f"*Security Group Name:*\n`{sg_name}`",
                    },
                    {
                        "type": "mrkdwn",
                        "text": f"*Offending Rule:*\n`{port_label} → 0.0.0.0/0 (Internet)`",
                    },
                    {
                        "type": "mrkdwn",
                        "text": "*Action Taken:*\n`Rule REVOKED automatically ✅`",
                    },
                ],
            },
            {"type": "divider"},
            {
                "type": "context",
                "elements": [
                    {
                        "type": "mrkdwn",
                        "text": "⚙️ *Cloud Guard* | Automated Incident Response | AWS Lambda `CloudSecurityGuard`",
                    }
                ],
            },
        ]
    }

    payload_bytes = json.dumps(slack_payload).encode("utf-8")

    try:
        req = urllib.request.Request(
            webhook_url,
            data=payload_bytes,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            logger.info(f"Slack alert sent successfully. HTTP Status: {resp.status}")
    except urllib.error.URLError as e:
        logger.error(f"Failed to send Slack alert: {e.reason}")
```

## 🔥 Live Test and Validation

This is where the rubber meets the road. The following test validates the entire pipeline end to end.

### Test Procedure

**1. Identify a test Security Group**

In the AWS Console, navigate to EC2 → Security Groups. Pick any non-production security group as the test target.

**2. Add a "bad" inbound rule (simulate the misconfiguration)**

Manually add one inbound rule:

| Type | Protocol | Port Range | Source | Description |
|------|----------|------------|--------|-------------|
| SSH | TCP | 22 | 0.0.0.0/0 | ⚠️ Test: intentionally bad rule |

**3. Save the rule and start the clock**

Click **Save rules**. The moment the `AuthorizeSecurityGroupIngress` API call lands, CloudTrail captures it, EventBridge fires, and Lambda wakes up.

**4. Watch the rule disappear**

Refresh the Inbound Rules tab. Within **10 seconds** the offending rule is gone, revoked automatically by `CloudSecurityGuard`.

**5. Confirm in Lambda CloudWatch Logs**

Navigate to Lambda → `CloudSecurityGuard` → Monitor → View CloudWatch Logs. Confirm that the execution log shows:

```
[WARNING] VIOLATION DETECTED: Port 22 (SSH) open to 0.0.0.0/0 on Security Group sg-XXXXXXXXX...
[INFO] REMEDIATED: Port 22 (SSH) rule successfully revoked from sg-XXXXXXXXX...
[INFO] Slack alert sent successfully. HTTP Status: 200
```

**6. Verify the Slack alert**

Check the `#security-alerts` Slack channel. A richly formatted incident card should have arrived in real time.

## 💼 Professional Skills Demonstrated

This project is a practical demonstration of the skills most critical to **Cloud Support**, **DevSecOps**, and **SOC** roles:

### 🔐 Security Operations (SOC)

- **Threat detection:** Designed and implemented a detection rule for a specific high-risk API call (`AuthorizeSecurityGroupIngress`), the equivalent of writing a SIEM alert rule in a real SOC
- **Automated response:** Built a SOAR-style playbook that contains the threat with no human intervention
- **Incident notification:** Designed a structured, actionable alert format for security analysts, following best practices for SOC dashboards and on-call alerting

### ☁️ Cloud Engineering and Support

- **AWS service integration:** Demonstrates hands-on proficiency with IAM, CloudTrail, EventBridge, Lambda, and EC2, the core services of the AWS security stack
- **Serverless architecture:** Built and deployed a production-grade serverless function with proper error handling, structured logging, and environment variable management
- **Troubleshooting:** Independently debugged event schemas, IAM permission boundaries, and CloudWatch log traces

### 🔒 DevSecOps Principles

- **Least-privilege IAM:** Wrote a minimal inline policy that grants only the two EC2 permissions required, and nothing more
- **Secrets management:** Stored the webhook credential as a Lambda environment variable, never in source code
- **Infrastructure-as-code mindset:** Every component is explicitly configured, documented, and reproducible
- **Shift-left security:** Automated a security control that traditionally required manual review, cutting MTTD (mean time to detect) and MTTR (mean time to respond) to under 10 seconds

### 🐍 Software Engineering

- **Python 3.12:** Production-grade script with structured logging, exception handling, typed function signatures, and clean separation of concerns
- **AWS SDK (boto3):** Proficient use of EC2 client methods, response parsing, and API call construction
- **Webhook integration:** Built the Slack Block Kit payload from scratch using only Python's standard library (`urllib`), with no third-party dependencies

## 📁 Repository Structure

```
cloud-guard/
├── README.md                        # This file
├── lambda/
│   └── cloud_security_guard.py      # Main Lambda handler (Python 3.12)
├── iam/
│   └── lambda_firewall_policy.json  # Least-privilege IAM inline policy
├── eventbridge/
│   └── event_pattern.json           # EventBridge rule event pattern
├── docs/
│   ├── architecture-diagram.png     # System architecture diagram
│   └── screenshots/                 # Console screenshots for README
└── tests/
    └── test_handler.py              # Unit tests for Lambda handler logic
```

## 🚀 How to Deploy This Project

The account ID, role name, region, and S3 bucket name are left blank in the commands below and must be filled in for your own environment.

**1. Deploy the Lambda function**

```
# Zip the function code (-j drops the lambda/ path so the module sits at the
# archive root, where the handler setting below expects it)
zip -j cloud_security_guard.zip lambda/cloud_security_guard.py

# Create the Lambda function
aws lambda create-function \
  --function-name CloudSecurityGuard \
  --runtime python3.12 \
  --role arn:aws:iam:::role/ \
  --handler cloud_security_guard.lambda_handler \
  --zip-file fileb://cloud_security_guard.zip
```

**2. Set the Slack webhook environment variable**

```
aws lambda update-function-configuration \
  --function-name CloudSecurityGuard \
  --environment "Variables={SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL}"
```

**3. Enable CloudTrail**

```
aws cloudtrail create-trail \
  --name security-environment-trail \
  --s3-bucket-name \
  --is-multi-region-trail

aws cloudtrail start-logging --name security-environment-trail
```

**4. Create the EventBridge rule**

```
aws events put-rule \
  --name DetectPublicFirewallChanges \
  --event-pattern file://eventbridge/event_pattern.json \
  --state ENABLED

aws events put-targets \
  --rule DetectPublicFirewallChanges \
  --targets "Id=CloudSecurityGuardTarget,Arn=arn:aws:lambda:::function:CloudSecurityGuard"
```

Note: when the rule and target are created from the CLI, EventBridge also needs a resource-based permission to invoke the function (granted with `aws lambda add-permission` for the principal `events.amazonaws.com`). The console adds this permission automatically.

## 👤 Author

**KANIKE VINAY**

Final-year undergraduate in Computer Science (Artificial Intelligence)

TRAINEE AT EXPOSYS DATA LABS

Focused on cloud security, DevSecOps, and security operations

[![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/kanikevinay12/) [![GitHub](https://img.shields.io/badge/GitHub-Follow-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/kanikevinay)
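The `lambda_handler` shown earlier flags a rule only when `FromPort` is exactly 22 or 3389 and the source is the IPv4 CIDR `0.0.0.0/0`. As one possible extension (not part of the original function), the sketch below checks whether a rule's port range covers a sensitive port, treats protocol `-1` as "all ports", and also inspects `Ipv6Ranges` for `::/0`. The helper name `exposed_sensitive_ports` and the constant `PUBLIC_CIDRS` are hypothetical; the dictionary keys follow the `IpPermissions` entries returned by `describe_security_groups`.

```python
# Hypothetical helper: a range-aware version of the exposure check.
SENSITIVE_PORTS = {22: "SSH", 3389: "RDP"}
# Assumption: the IPv6 "anywhere" CIDR is treated the same as the IPv4 one.
PUBLIC_CIDRS = {"0.0.0.0/0", "::/0"}


def exposed_sensitive_ports(rule: dict) -> list[int]:
    """Return the sensitive ports that `rule` opens to the public internet.

    `rule` is one entry of a security group's IpPermissions list.
    """
    cidrs = {r.get("CidrIp") for r in rule.get("IpRanges", [])}
    cidrs |= {r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])}
    if not cidrs & PUBLIC_CIDRS:
        return []  # rule is not open to the whole internet

    protocol = rule.get("IpProtocol", "")
    if protocol == "-1":
        # "-1" means all protocols and all ports; FromPort/ToPort are absent.
        return sorted(SENSITIVE_PORTS)
    if protocol not in ("tcp", "6"):
        return []  # this sketch only considers TCP, matching the port table

    from_port = rule.get("FromPort", -1)
    to_port = rule.get("ToPort", -1)
    # A rule is a violation if its port range covers a sensitive port.
    return [p for p in sorted(SENSITIVE_PORTS) if from_port <= p <= to_port]


# Usage: a range rule (20-25) that the exact-match check would miss.
rule = {
    "IpProtocol": "tcp",
    "FromPort": 20,
    "ToPort": 25,
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
}
print(exposed_sensitive_ports(rule))  # [22]
```

Keeping the check as a pure function of the rule dictionary means it can be unit-tested without AWS credentials, for example from `tests/test_handler.py`. Revoking an IPv6 rule would additionally require passing `Ipv6Ranges` instead of `IpRanges` to `revoke_security_group_ingress`.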

*Built with ☁️ + 🛡️ + ☕ on Amazon Web Services*

*"Security is not a product, but a process." - Bruce Schneier*

Tags: AWS, DevSecOps, DPI, upstream proxy, event-driven, automated response, reverse-engineering tools