Abhiram-Rakesh/Three-Tier-EKS-Terraform

GitHub: Abhiram-Rakesh/Three-Tier-EKS-Terraform

An end-to-end DevSecOps sample project demonstrating a secure, observable, production-grade deployment on AWS EKS using IaC, CI/CD, and GitOps.

Stars: 16 | Forks: 15

# H&M Fashion Clone — DevSecOps on AWS EKS

## Architecture Diagram

![Architecture diagram](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/3bfbf7c5c6140003.png)

## Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18 + Nginx 1.25 | SPA served behind an Nginx reverse proxy |
| Backend | Node.js 18 + Express | REST API with JWT authentication |
| Database | PostgreSQL 15 | Relational storage on an EBS-backed PVC |
| Storage | AWS EBS gp2 (dynamic) | Persistent volume for PostgreSQL data |
| Container builds | Docker (multi-stage) | Non-root images, minimal attack surface |
| CI | Jenkins (on EC2) | Six-stage security-hardened pipeline |
| CD / GitOps | ArgoCD (in-cluster) | Polls GitHub, auto-syncs manifests |
| IaC | Terraform >= 1.5 | VPC, EKS, ECR, IAM, EC2 |
| Container registry | AWS ECR (private) | Private image storage with scan-on-push |
| ECR auth | IRSA | IAM roles bound to K8s ServiceAccounts |
| Code quality | SonarQube LTS Community | Static analysis + quality gate |
| Dependency scanning | Trivy | Filesystem and container image vulnerability scans |
| Monitoring | Prometheus + Grafana | Metrics collection and dashboards |
| Ingress | AWS ALB Controller + IngressClass | Internet-facing ALB with path routing |
| HPA | autoscaling/v2 | CPU- and memory-based pod autoscaling |
| Cluster scaling | Cluster Autoscaler | Node-level scale-out |

## Prerequisites

### Local Machine Requirements

#### 1. AWS CLI v2

The AWS CLI lets you interact with AWS services from your terminal.

```
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip awscliv2.zip
sudo ./aws/install
aws --version
# Expected: aws-cli/2.x.x Python/3.x.x Linux/...
```

Configure credentials:

```
aws configure
# AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
# AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Default region name [None]: ap-south-1
# Default output format [None]: json
```

- **Verify:** `aws sts get-caller-identity` returns your account ID and ARN.
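Later steps reuse the account ID repeatedly, so it is worth seeing how it can be pulled out of the identity JSON with `jq` (installed under the supporting tools below). A minimal sketch — the here-doc stands in for real `aws sts get-caller-identity` output, with illustrative values:

```shell
# Stand-in for `aws sts get-caller-identity` output (illustrative values)
cat > /tmp/sts-identity.json <<'EOF'
{
  "UserId": "AIDAEXAMPLEUSERID00",
  "Account": "123456789012",
  "Arn": "arn:aws:iam::123456789012:user/deployer"
}
EOF

# Extract the fields the deployment steps care about
ACCOUNT_ID=$(jq -r '.Account' /tmp/sts-identity.json)
CALLER_ARN=$(jq -r '.Arn' /tmp/sts-identity.json)
echo "Account: ${ACCOUNT_ID}"
echo "ARN:     ${CALLER_ARN}"
```

Against a live account, replace the here-doc with `aws sts get-caller-identity > /tmp/sts-identity.json`.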
#### 2. Terraform >= 1.5

Terraform creates all of the AWS infrastructure declaratively.

```
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update && sudo apt-get install terraform
terraform --version
# Expected: Terraform v1.5.x or higher
```

- **Verify:** `terraform --version` shows 1.5+.

#### 3. kubectl

The Kubernetes command-line tool for interacting with the EKS cluster.

```
KUBECTL_VER=$(curl -fsSL https://dl.k8s.io/release/stable.txt)
curl -fsSLO "https://dl.k8s.io/release/${KUBECTL_VER}/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
# Expected: Client Version: v1.29.x
```

- **Verify:** `kubectl version --client` prints a version number.

#### 4. Helm v3

Helm is the package manager for Kubernetes — used here to install the ALB controller, the autoscaler, Prometheus, and Grafana.

```
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
# Expected: version.BuildInfo{Version:"v3.x.x", ...}
```

- **Verify:** `helm version` shows v3.x.

#### 5. Docker

Docker builds the frontend and backend container images.

```
sudo apt-get install -y docker.io
sudo systemctl enable docker && sudo systemctl start docker
sudo usermod -aG docker $USER
newgrp docker
docker --version
# Expected: Docker version 24.x.x, build ...
```

- **Verify:** `docker run hello-world` completes without permission errors.

#### 6. Supporting tools (jq, git, curl)

```
sudo apt-get install -y jq git curl
jq --version    # jq-1.6
git --version   # git version 2.x.x
curl --version  # curl 7.x.x
```

#### 7. EC2 key pair for Jenkins SSH

An SSH key pair is required to access the Jenkins EC2 instance during setup (Step 12).

```
# Create the key pair in ap-south-1
aws ec2 create-key-pair --key-name hm-eks-key --region ap-south-1 --query KeyMaterial --output text > hm-eks-key.pem
chmod 400 hm-eks-key.pem
```

- **Verify:** `ls -la hm-eks-key.pem` shows `-r--------` permissions.
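The permission check can be rehearsed without creating anything in AWS; a small sketch with a dummy file (`stat -c` is GNU coreutils, as on the Ubuntu hosts used throughout this guide):

```shell
# Stand-in for the downloaded key material
touch /tmp/demo-key.pem
chmod 400 /tmp/demo-key.pem

# ssh refuses private keys that are group- or world-readable;
# mode 400 (owner read-only) satisfies it
stat -c '%a %n' /tmp/demo-key.pem
```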
#### 8. GitHub repository

Fork or clone this repository under your own GitHub account:

```
git clone https://github.com/Abhiram-Rakesh/Three-Tier-EKS-Terraform.git
cd Three-Tier-EKS-Terraform
```

Or create a fresh repository:

```
git init
git remote add origin https://github.com/Abhiram-Rakesh/Three-Tier-EKS-Terraform.git
git add . && git commit -m "Initial commit"
git push -u origin main
```

### AWS IAM Requirements

Your IAM user/role needs the following policies attached so Terraform can create the infrastructure:

| Policy | Purpose |
|---|---|
| `AmazonEKSFullAccess` | Create and manage EKS clusters |
| `AmazonEC2FullAccess` | Provision EC2, VPC, subnets, security groups |
| `AmazonVPCFullAccess` | Create VPCs, subnets, route tables, NAT gateways |
| `AmazonECR_FullAccess` | Create ECR repositories and lifecycle policies |
| `IAMFullAccess` | Create IRSA roles, OIDC providers, policies |
| Internal ECR pull policy | Allows nodes to pull from ECR (added by irsa.tf) |

## Deployment — Step by Step

### Step 1 — Clone the repository

```
git clone https://github.com/Abhiram-Rakesh/Three-Tier-EKS-Terraform.git
cd Three-Tier-EKS-Terraform
```

Expected output:

```
Cloning into 'Three-Tier-EKS-Terraform'...
remote: Enumerating objects: 87, done.
Receiving objects: 100% (87/87), done.
```

- **Success indicator:** `ls` shows Jenkinsfile, terraform/, k8s_manifests/, app/

### Step 2 — Provision the infrastructure (Terraform)

```
cd terraform
terraform init -input=false
terraform plan -out=tfplan -input=false
terraform apply tfplan
```

Expected output (end of apply):

```
Apply complete! Resources: 34 added, 0 changed, 0 destroyed.
Outputs:

cluster_endpoint = "https://XXXXXXXXXXXXXXXX.gr7.ap-south-1.eks.amazonaws.com"
ecr_frontend_url = "123456789012.dkr.ecr.ap-south-1.amazonaws.com/hm-frontend"
ecr_backend_url = "123456789012.dkr.ecr.ap-south-1.amazonaws.com/hm-backend"
ecr_pull_role_arn_frontend = "arn:aws:iam::123456789012:role/hm-shop-frontend-ecr-role"
ecr_pull_role_arn_backend = "arn:aws:iam::123456789012:role/hm-shop-backend-ecr-role"
alb_controller_role_arn = "arn:aws:iam::123456789012:role/hm-shop-alb-controller-role"
jenkins_public_ip = "13.233.x.x"
aws_account_id = "123456789012"
```

Export the outputs as shell variables:

```
export AWS_ACCOUNT_ID=$(terraform output -raw aws_account_id)
export ECR_FRONTEND=$(terraform output -raw ecr_frontend_url)
export ECR_BACKEND=$(terraform output -raw ecr_backend_url)
export JENKINS_IP=$(terraform output -raw jenkins_public_ip)
cd ..
```

- **Success indicator:** `aws eks list-clusters --region ap-south-1` lists `three-tier-cluster`.

### Step 3 — Configure kubectl

```
aws eks update-kubeconfig --region ap-south-1 --name three-tier-cluster
kubectl get nodes
```

Expected output:

```
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-10-xx.ap-south-1.compute.internal   Ready    <none>   2m    v1.29.x
ip-10-0-11-xx.ap-south-1.compute.internal   Ready    <none>   2m    v1.29.x
```

- **Success indicator:** both nodes show `STATUS=Ready`.

### Step 4 — Install the AWS EBS CSI driver

**Why this is critical:** without the EBS CSI driver, the PostgreSQL PersistentVolumeClaim stays `Pending` forever and the pod never starts.

```
aws eks create-addon --cluster-name three-tier-cluster --addon-name aws-ebs-csi-driver --region ap-south-1

# Poll until ACTIVE
watch aws eks describe-addon --cluster-name three-tier-cluster --addon-name aws-ebs-csi-driver --region ap-south-1 --query "addon.status" --output text
```

Expected output (after 2–3 minutes):

```
ACTIVE
```

Verify the driver pods are running:

```
kubectl get pods -n kube-system | grep ebs
```

Expected:

```
ebs-csi-controller-xxxxxxxxx-xxxxx   6/6   Running   0   2m
ebs-csi-node-xxxxx                   3/3   Running   0   2m
ebs-csi-node-xxxxx                   3/3   Running   0   2m
```

- **Success indicator:** `ACTIVE` status and controller pods in `Running`.

### Step 5 — Apply the StorageClass and IngressClass

```
kubectl apply -f k8s_manifests/storageclass.yaml
kubectl apply -f k8s_manifests/ingressclass.yaml
kubectl get storageclass
kubectl get ingressclass
```

Expected output:

```
NAME         PROVISIONER       RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION
hm-ebs-gp2   ebs.csi.aws.com   Retain          WaitForFirstConsumer   true

NAME   CONTROLLER            PARAMETERS   AGE
alb    ingress.k8s.aws/alb   <none>       5s
```

- **Success indicator:** the `hm-ebs-gp2` StorageClass and the `alb` IngressClass appear.

### Step 6 — Install the AWS Load Balancer Controller

```
# Get VPC ID
VPC_ID=$(aws eks describe-cluster --name three-tier-cluster --region ap-south-1 --query "cluster.resourcesVpcConfig.vpcId" --output text)

helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm upgrade --install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=three-tier-cluster \
  --set region=ap-south-1 \
  --set vpcId=${VPC_ID} \
  --set serviceAccount.create=true \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/hm-shop-alb-controller-role" \
  --wait

kubectl get deployment -n kube-system aws-load-balancer-controller
kubectl get pods -n kube-system | grep aws-load-balancer
```

Expected:

```
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
aws-load-balancer-controller   2/2     2            2           60s
```

- **Success indicator:** the deployment shows `2/2 READY`.

### Step 7 — Install the Cluster Autoscaler

```
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=three-tier-cluster \
  --set awsRegion=ap-south-1 \
  --set rbac.serviceAccount.create=true \
  --set rbac.serviceAccount.name=cluster-autoscaler \
  --set "rbac.serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/hm-shop-cluster-autoscaler-role" \
  --wait

kubectl get deployment -n kube-system cluster-autoscaler
```

- **Success indicator:** the `cluster-autoscaler` deployment shows `1/1 READY`.

### Step 8 — Install the Metrics Server (required for autoscaling)

**Why:** without the Metrics Server, the HorizontalPodAutoscaler cannot read CPU/memory metrics and its targets show `<unknown>`.
```
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Wait ~60s then verify
kubectl top nodes
```

Expected output:

```
NAME                                        CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-0-10-xx.ap-south-1.compute.internal   120m         6%     820Mi           27%
ip-10-0-11-xx.ap-south-1.compute.internal   115m         5%     790Mi           26%
```

- **Success indicator:** `kubectl top nodes` shows CPU% and MEMORY% without errors.

### Step 9 — Inject your AWS account ID into the manifests

The K8s manifests contain `<AWS_ACCOUNT_ID>` placeholders in the ECR image URLs and IRSA role ARNs. Replace them with your real account ID:

```
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export GITHUB_USER=<your-github-username>

# Replace in all K8s manifests
find k8s_manifests/ -name "*.yaml" -exec sed -i "s|<AWS_ACCOUNT_ID>|${AWS_ACCOUNT_ID}|g" {} \;

# Replace GitHub username in ArgoCD application
sed -i "s|<GITHUB_USER>|${GITHUB_USER}|g" argocd/application.yaml

# Replace in Jenkinsfile
sed -i "s|<AWS_ACCOUNT_ID>|${AWS_ACCOUNT_ID}|g" Jenkinsfile
sed -i "s|<GITHUB_USER>|${GITHUB_USER}|g" Jenkinsfile

# Commit and push so ArgoCD can read the updated manifests
git add k8s_manifests/ argocd/ Jenkinsfile
git commit -m "CI: Inject AWS Account ID ${AWS_ACCOUNT_ID} into manifests"
git push origin main
```

- **Success indicator:** `grep -r '<AWS_ACCOUNT_ID>' k8s_manifests/` returns nothing.

### Step 10 — Install ArgoCD

```
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Wait for ArgoCD server to be ready
kubectl wait deployment/argocd-server --namespace argocd --for=condition=Available --timeout=180s

# Get initial admin password
ARGOCD_PASS=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
echo "ArgoCD password: ${ARGOCD_PASS}"

# Deploy the hm-shop Application
kubectl apply -f argocd/application.yaml
```

**Expose ArgoCD:** by default the ArgoCD service is `ClusterIP`. Change it to `LoadBalancer` for direct access:

```
kubectl patch svc argocd-server -n argocd -p '{"spec": {"type": "LoadBalancer"}}'

# Wait for the external IP to be assigned (1-3 minutes)
kubectl get svc argocd-server -n argocd --watch
```

Once you have the `EXTERNAL-IP`:

```
ARGOCD_URL=$(kubectl get svc argocd-server -n argocd -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "ArgoCD URL: https://${ARGOCD_URL}"
```

Open `https://<ARGOCD_URL>` in a browser. ArgoCD redirects HTTP to HTTPS with a self-signed certificate, so the browser shows a certificate warning — click **Advanced → Proceed anyway** to continue.

Credentials:

- Username: `admin`
- Password: the `ARGOCD_PASS` output above

**What ArgoCD does automatically:** it polls the repository's `k8s_manifests/` path every 3 minutes. After Jenkins pushes a new image tag (Stage 6, the GitOps stage), ArgoCD detects the commit and applies the manifests automatically, closing the GitOps loop.

- **Success indicator:** the ArgoCD UI shows the `hm-shop` application as `Synced` and `Healthy`.

### Step 11 — Verify the application pods

```
kubectl get pods -n hm-shop --watch
```

Expected output (all Running):

```
NAME                       READY   STATUS    RESTARTS   AGE
backend-xxxxxxxxx-xxxxx    1/1     Running   0          3m
backend-xxxxxxxxx-yyyyy    1/1     Running   0          3m
frontend-xxxxxxxxx-xxxxx   1/1     Running   0          3m
frontend-xxxxxxxxx-yyyyy   1/1     Running   0          3m
postgres-xxxxxxxxx-xxxxx   1/1     Running   0          5m
```

Check the HPAs:

```
kubectl get hpa -n hm-shop
```

Expected:

```
NAME           REFERENCE             TARGETS            MINPODS   MAXPODS   REPLICAS
backend-hpa    Deployment/backend    15%/70%, 20%/80%   2         5         2
frontend-hpa   Deployment/frontend   10%/70%, 15%/80%   2         5         2
postgres-hpa   Deployment/postgres   8%/80%, 12%/85%    1         2         1
```

Check the ingress (the ALB can take up to 5 minutes):

```
kubectl get ingress -n hm-shop
```

Expected:

```
NAME              CLASS   HOSTS   ADDRESS                                        PORTS   AGE
hm-shop-ingress   alb     *       k8s-hmshop-xxxx.ap-south-1.elb.amazonaws.com   80      5m
```

- **Success indicator:** all pods are `Running`, the HPAs show real percentages (not `<unknown>`), and the ingress `ADDRESS` field is populated.

### Step 12 — Set up the Jenkins EC2

#### 12a — Get the Jenkins EC2 IP

```
JENKINS_IP=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=jenkins-server" "Name=instance-state-name,Values=running" --query "Reservations[0].Instances[0].PublicIpAddress" --region ap-south-1 --output text)
echo "Jenkins IP: ${JENKINS_IP}"
```

#### 12b — Required security group ports

| Port | Protocol | Source | Purpose |
|---|---|---|---|
| 22 | TCP | 0.0.0.0/0 | SSH access |
| 8080 | TCP | 0.0.0.0/0 | Jenkins web UI |
| 9000 | TCP | 0.0.0.0/0 | SonarQube web UI |
| 50000 | TCP | 0.0.0.0/0 | Jenkins agent JNLP port |

These are created automatically by Terraform in `vpc.tf`.
#### 12c — Bootstrap the Jenkins EC2

SSH into the Jenkins EC2 and install the tooling:

```
ssh -i hm-eks-key.pem ubuntu@${JENKINS_IP}
```

**Java 17:**

```
sudo apt-get update && sudo apt-get install -y openjdk-17-jdk
java -version
# openjdk version "17.0.x"
```

**Jenkins:**

```
# The jenkins.io-2023.key URL is outdated — Jenkins rotated their signing key.
# Fetch the actual key used to sign the repo directly from a keyserver.
gpg --keyserver keyserver.ubuntu.com --recv-keys 5E386EADB55F01504CAE8BCF7198F4B714ABFC68
gpg --export 5E386EADB55F01504CAE8BCF7198F4B714ABFC68 | sudo tee /usr/share/keyrings/jenkins-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/jenkins-keyring.gpg] https://pkg.jenkins.io/debian-stable binary/" | sudo tee /etc/apt/sources.list.d/jenkins.list > /dev/null
sudo apt-get update && sudo apt-get install -y jenkins
sudo systemctl enable jenkins && sudo systemctl start jenkins
sudo systemctl status jenkins
# Active: active (running)
```

**Docker:**

```
sudo apt-get install -y docker.io
sudo usermod -aG docker jenkins
sudo usermod -aG docker ubuntu
sudo systemctl restart jenkins
docker --version
# Docker version 24.x.x
```

**AWS CLI v2:**

```
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
sudo apt install -y unzip
unzip awscliv2.zip && sudo ./aws/install
aws --version
# aws-cli/2.x.x
```

**kubectl:**

```
KUBECTL_VER=$(curl -fsSL https://dl.k8s.io/release/stable.txt)
curl -fsSLO "https://dl.k8s.io/release/${KUBECTL_VER}/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
```

**Node.js 18:**

```
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs
node --version
```

**SonarScanner 5.0.1.3006:**

```
SONAR_VERSION="5.0.1.3006"
curl -fsSLO "https://binaries.sonarsource.com/Distribution/sonar-scanner-cli/sonar-scanner-cli-${SONAR_VERSION}-linux.zip"
sudo unzip -q "sonar-scanner-cli-${SONAR_VERSION}-linux.zip" -d /opt/
sudo ln -sf "/opt/sonar-scanner-${SONAR_VERSION}-linux/bin/sonar-scanner" /usr/local/bin/sonar-scanner
"/opt/sonar-scanner-${SONAR_VERSION}-linux/bin/sonar-scanner" /usr/local/bin/sonar-scanner sonar-scanner --version ``` **Trivy:** ``` curl -fsSL https://aquasecurity.github.io/trivy-repo/deb/public.key | sudo gpg --dearmor -o /usr/share/keyrings/trivy.gpg echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb generic main" | sudo tee /etc/apt/sources.list.d/trivy.list sudo apt-get update && sudo apt-get install -y trivy trivy --version ``` **vm.max_map_count(SonarQube 所需):** ``` sudo sysctl -w vm.max_map_count=524288 echo 'vm.max_map_count=524288' | sudo tee -a /etc/sysctl.conf ``` #### 12d — 获取 Jenkins 初始密码 ``` sudo cat /var/lib/jenkins/secrets/initialAdminPassword # Output: a32-character hex string e.g. 3d4f2bf07a6c4e8... ``` #### 12e — Jenkins 浏览器设置 1. 打开 `http://:8080` 2. 粘贴初始管理员密码 3. 点击 **Install suggested plugins** 并等待约 3 分钟 4. 创建管理员用户:填写用户名、密码、完整名称、邮箱 5. 点击 **Save and Finish** → **Start using Jenkins** - **成功指示:** Jenkins 仪表盘加载并显示 “Welcome to Jenkins!” ### 第 13 步 — 设置 SonarQube **在 Jenkins EC2 上启动 SonarQube:** ``` ssh -i hm-eks-key.pem ubuntu@${JENKINS_IP} docker run -d --name sonarqube --restart unless-stopped -p 9000:9000 -e SONAR_ES_BOOTSTRAP_CHECKS_DISABLE=true -v sonarqube_data:/opt/sonarqube/data sonarqube:lts-community ``` 等待约 60 秒后,打开 `http://:9000`。 **首次登录:** 1. 用户名:`admin`,密码:`admin` 2. 按提示修改密码 3. 点击 **Create a local project** → 项目键:`hm-fashion-clone`,显示名:`H&M Fashion Clone` 4. 点击 **Set up project for clean code** **生成 Token:** 1. 点击头像(右上角)→ **My Account** → **Security** 2. **Generate Tokens**:名称 `jenkins-token`,类型 `User Token` 3. 
3. Click **Generate** — **copy the token immediately** (it is shown only once)

**Install Jenkins plugins (must be done before adding the SonarQube server):**

Navigate to: **Jenkins → Manage Jenkins → Plugins → Available**

Search for and install:

- `pipeline-stage-view`
- `git`
- `github`
- `github-branch-source`
- `docker-workflow`
- `docker-plugin`
- `sonar`
- `credentials-binding`
- `pipeline-utility-steps`
- `ws-cleanup`
- `build-timeout`
- `timestamper`
- `ansicolor`
- `workflow-aggregator`

Click **Install** and wait for the restart. The `sonar` plugin must be installed before the SonarQube server can be configured in the next step.

- **Success indicator:** after Jenkins restarts, all 14 plugins show **Installed**.

**Add the token to Jenkins:**

1. Jenkins → **Manage Jenkins** → **Configure System**
2. Scroll to **SonarQube servers** → **Add SonarQube**
3. Name: `SonarQube`, server URL: `http://<JENKINS_IP>:9000`
4. Server authentication token → **Add** → **Jenkins** → kind **Secret text** → paste the token
5. Click **Save**

**Create a webhook from SonarQube to Jenkins:**

This is critical for the `waitForQualityGate()` step — without it, the pipeline waits at the quality gate forever.

1. Go to **Administration** → **Configuration** → **Webhooks**
2. Click **Create**
3. Fill in:
   - Name: `Jenkins`
   - URL: `http://<JENKINS_IP>:8080/sonarqube-webhook/`
   - Secret: leave empty (unless one is configured in Jenkins)
4. Click **Create**

- **Success indicator:** the pipeline passes the quality gate stage instead of hanging.

### Step 14 — Configure Jenkins credentials

Navigate to: **Jenkins → Manage Jenkins → Credentials → System → Global credentials → Add Credentials**

| ID | Type | Value | Security note |
|---|---|---|---|
| `aws-access-key` | Secret text | AWS access key ID | Never use personal admin keys |
| `aws-secret-key` | Secret text | AWS secret access key | Rotate every 90 days |
| `sonar-token` | Secret text | SonarQube user token | Regenerate immediately if leaked |
| `git-credentials` | Username/password | GitHub username + PAT | PAT needs `repo` and `admin:repo_hook` scopes |

**IAM user — required policy:**

Attach exactly one AWS managed policy; **do not use inline policies**:

| Policy name | ARN |
|---|---|
| `AmazonEC2ContainerRegistryPowerUser` | `arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser` |

This policy grants `ecr:GetAuthorizationToken` plus read/write access to all repositories, without destructive actions such as deleting repositories or lifecycle policies. EKS/kubectl access is handled by the Jenkins EC2 instance profile; this user is for ECR only.

- **Success indicator:** all 4 credentials appear in the global credentials list.

### Step 15 — Create the Jenkins pipeline job

1. Jenkins dashboard → **New Item**
2. Enter the name: `hm-fashion-pipeline`
3. Select **Pipeline** → click **OK**
4. **Build Triggers**: check **GitHub hook trigger for GITScm polling**
5. **Pipeline**:
   - Definition: `Pipeline script from SCM`
   - SCM: `Git`
   - Repository URL: `https://github.com/<GITHUB_USER>/Three-Tier-EKS-Terraform.git`
   - Credentials: select `git-credentials`
   - Branch: `*/main`
   - Script path: `Jenkinsfile`
6. Click **Save**

- **Success indicator:** the pipeline job appears on the Jenkins dashboard.

### Step 16 — Configure the GitHub webhook

1. Go to `https://github.com/Abhiram-Rakesh/Three-Tier-EKS-Terraform/settings/hooks/new`
2. Fill in:
   - Payload URL: `http://<JENKINS_IP>:8080/github-webhook/`
   - Content type: `application/json`
   - Events: select **Just the push event**
3. Click **Add webhook**
4. GitHub sends a ping — the webhook page should show a green ✓

- **Success indicator:** GitHub shows the green check and Jenkins triggers a build.

### Step 17 — Trigger the first pipeline run

```
git commit --allow-empty -m "CI: trigger first pipeline run"
git push origin main
```

Watch the pipeline in Jenkins (`http://<JENKINS_IP>:8080/job/hm-fashion-pipeline/`):

| Stage | What to look for |
|---|---|
| Stage 1 (SonarQube) | Quality gate result — must be **PASSED** |
| Stage 2 (Trivy FS) | Results table; the pipeline continues |
| Stage 3 (Build) | `Successfully built <image-id>` (both images) |
| Stage 4 (ECR push) | `The push refers to repository [...]` |
| Stage 5 (Trivy image) | Scan results archived as artifacts |
| Stage 6 (GitOps) | A `CI: Update image tags to build-1 [skip ci]` commit appears |

After the pipeline finishes, ArgoCD detects the new commit within 3 minutes and deploys the updated images automatically.

- **Success indicator:** all 6 stages are green and the ArgoCD application shows `Synced`.

### Step 18 — Install the monitoring stack

```
kubectl create namespace monitoring

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Prometheus stack
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --values monitoring/prometheus-values.yaml --wait

# Install Grafana
helm upgrade --install grafana grafana/grafana --namespace monitoring --values monitoring/grafana-values.yaml --wait
```

**Access Grafana:**

Grafana is deployed as a `LoadBalancer` service. Wait for the NLB to be provisioned (~2–5 minutes after install):

```
kubectl get svc grafana -n monitoring --watch
```

Once you have the `EXTERNAL-IP`:

```
GRAFANA_URL=$(kubectl get svc grafana -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "Grafana URL: http://${GRAFANA_URL}"
```

Open `http://<GRAFANA_URL>` in a browser.

Credentials:

- Username: `admin`
- Password: the `adminPassword` set in `monitoring/grafana-values.yaml`

Pre-provisioned dashboards (under the **H&M Shop** folder):

- Kubernetes Cluster (6417)
- Kubernetes Pods (6336)
- Node Exporter Full (1860)
- Nginx Ingress (9614)

- **Success indicator:** Grafana loads and all 4 dashboards show live data.

### Step 19 — Access the application

```
# Get the ALB URL
ALB_URL=$(kubectl get ingress hm-shop-ingress -n hm-shop -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

# Test the API health endpoint
curl http://${ALB_URL}/api/health
```

Expected JSON response:

```
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "service": "hm-backend",
  "database": { "status": "connected", "latency_ms": 2 },
  "uptime_seconds": 120
}
```

Open the app in a browser:

```
echo "Application URL: http://${ALB_URL}"
```

- **Success indicator:** the browser loads the H&M Fashion Clone home page with a product listing.

## Day-to-Day Operations

### Viewing logs

```
# Backend logs (follow)
kubectl logs -f deployment/backend -n hm-shop

# Frontend logs
kubectl logs -f deployment/frontend -n hm-shop

# PostgreSQL logs
kubectl logs -f deployment/postgres -n hm-shop

# All pods in namespace
kubectl logs -f -l app=backend -n hm-shop --all-containers=true
```

### Manual scaling and watching the HPA

```
# Scale backend manually
kubectl scale deployment backend --replicas=4 -n hm-shop

# Watch HPA react
kubectl get hpa -n hm-shop --watch
```

### Viewing security scan results

Trivy results are archived as Jenkins build artifacts, available at:
`http://<JENKINS_IP>:8080/job/hm-fashion-pipeline/<BUILD_NUMBER>/artifact/`

### Manually triggering the pipeline

```
git commit --allow-empty -m "CI: manual trigger"
git push origin main
```

### Checking and forcing ArgoCD sync

```
# Check sync status
kubectl get application hm-shop -n argocd

# Force sync immediately (don't wait 3 minutes)
kubectl patch application hm-shop -n argocd --type merge -p '{"operation":{"sync":{"syncStrategy":{"hook":{"force":true}}}}}'
```

### Rolling restarts

```
kubectl rollout restart deployment/frontend -n hm-shop
kubectl rollout restart deployment/backend -n hm-shop
kubectl rollout status deployment/backend -n hm-shop
```
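The `kubectl logs` commands in this section stream everything; piping through grep/awk narrows them down. A self-contained sketch — the here-doc and the log format are illustrative stand-ins for real `kubectl logs deployment/backend -n hm-shop` output:

```shell
# Stand-in for backend log output (format is illustrative)
cat > /tmp/backend.log <<'EOF'
2024-01-15T10:30:01Z INFO  request GET /api/health 200 2ms
2024-01-15T10:30:02Z ERROR pg connection refused, retrying
2024-01-15T10:30:03Z INFO  request GET /api/products 200 14ms
2024-01-15T10:30:04Z ERROR pg connection refused, retrying
EOF

# Count ERROR lines, then summarize distinct error messages by frequency
grep -c ' ERROR ' /tmp/backend.log
grep ' ERROR ' /tmp/backend.log | cut -d' ' -f3- | sort | uniq -c | sort -rn
```

In real use, replace the here-doc with a pipe: `kubectl logs deployment/backend -n hm-shop | grep ' ERROR '`.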
### Connecting to the PostgreSQL CLI

```
# Get the postgres pod name
POSTGRES_POD=$(kubectl get pod -n hm-shop -l app=postgres -o jsonpath='{.items[0].metadata.name}')

# Open psql
kubectl exec -it ${POSTGRES_POD} -n hm-shop -- psql -U hmuser -d hmshop

# Once inside psql:
\dt                             -- list tables
SELECT COUNT(*) FROM products;
SELECT * FROM orders LIMIT 5;
\q                              -- quit
```

## Tearing Down the Environment

### Manual cleanup (run in order)

```
# 1. Uninstall Helm releases FIRST (triggers controller-managed LB deletion)
helm uninstall grafana -n monitoring
helm uninstall prometheus -n monitoring
helm uninstall cluster-autoscaler -n kube-system
helm uninstall aws-load-balancer-controller -n kube-system

# 2. Delete namespaces (triggers ALB + EBS cleanup)
kubectl delete namespace hm-shop argocd monitoring --timeout=120s

# 3. Explicitly delete all ALBs and NLBs in the VPC via AWS CLI
# Kubernetes controllers sometimes leave LBs behind after namespace deletion.
# Terraform cannot delete the VPC while any LB ENIs still exist in subnets.
VPC_ID=$(aws ec2 describe-vpcs --region ap-south-1 --filters "Name=tag:Name,Values=hm-shop-vpc" --query "Vpcs[0].VpcId" --output text)
echo "VPC: ${VPC_ID}"

LB_ARNS=$(aws elbv2 describe-load-balancers --region ap-south-1 --query "LoadBalancers[?VpcId=='${VPC_ID}'].LoadBalancerArn" --output text)
if [ -n "$LB_ARNS" ]; then
  for ARN in $LB_ARNS; do
    echo "Deleting LB: $ARN"
    aws elbv2 delete-load-balancer --region ap-south-1 --load-balancer-arn $ARN
  done
  echo "Waiting 60s for LBs to finish deleting..."
  sleep 60
else
  echo "No load balancers found in VPC."
fi

# 4. Delete any leftover Target Groups (LBs must be gone first)
TG_ARNS=$(aws elbv2 describe-target-groups --region ap-south-1 --query "TargetGroups[*].TargetGroupArn" --output text)
if [ -n "$TG_ARNS" ]; then
  for TG in $TG_ARNS; do
    aws elbv2 delete-target-group --region ap-south-1 --target-group-arn $TG 2>/dev/null && echo "Deleted TG: $TG"
  done
fi

# 5. Delete any remaining ENIs in the VPC (LBs leave ENIs behind on deletion)
ENI_IDS=$(aws ec2 describe-network-interfaces --region ap-south-1 --filters "Name=vpc-id,Values=${VPC_ID}" --query "NetworkInterfaces[*].NetworkInterfaceId" --output text)
if [ -n "$ENI_IDS" ]; then
  for ENI in $ENI_IDS; do
    echo "Deleting ENI: $ENI"
    aws ec2 delete-network-interface --region ap-south-1 --network-interface-id $ENI 2>/dev/null || echo "  Skipped $ENI (still in use or already gone)"
  done
else
  echo "No leftover ENIs found."
fi

# 6. Terraform destroy
cd terraform
terraform destroy -auto-approve
```

## Environment Variable Reference

| Variable | Where it is set | Value | Notes |
|---|---|---|---|
| `AWS_REGION` | Shell / Jenkinsfile | `ap-south-1` | All resources live in Mumbai |
| `CLUSTER_NAME` | Jenkinsfile env | `three-tier-cluster` | EKS cluster name |
| `AWS_ACCOUNT_ID` | Shell (Step 9) | 12-digit ID | Used to build ECR URLs |
| `REGISTRY` | Jenkinsfile env | `<AWS_ACCOUNT_ID>.dkr.ecr.ap-south-1.amazonaws.com` | Set by the Step 9 substitution |
| `GIT_REPO` | Jenkinsfile env | Your GitHub repo URL | Update before the first run |
| `DB_PASSWORD` | K8s secret `backend-secret` | Set in `k8s_manifests/database/secret.yaml` | Rotate for production |
| `JWT_SECRET` | K8s secret `backend-secret` | Set in `k8s_manifests/database/secret.yaml` | Change before production use |

## Troubleshooting

### 1. PostgreSQL pod stuck in `Pending`

**Symptom:**

```
NAME               READY   STATUS    RESTARTS
postgres-xxx-xxx   0/1     Pending   0
```

**Diagnose:**

```
kubectl describe pod -n hm-shop -l app=postgres | grep -A 5 Events
# Look for: "no volume plugin matched" or "waiting for first consumer"
```

**Fix:**

```
# Verify EBS CSI Driver is ACTIVE
aws eks describe-addon --cluster-name three-tier-cluster --addon-name aws-ebs-csi-driver --region ap-south-1 --query "addon.status" --output text

# If not ACTIVE, check the node role has AmazonEBSCSIDriverPolicy attached
aws iam list-attached-role-policies --role-name three-tier-cluster-node-role --query "AttachedPolicies[].PolicyName"
```
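Checks like "is the add-on ACTIVE yet" recur throughout this guide (add-on status, pod readiness, load balancer IPs). They share one shape: poll a status command until it reports the wanted value or a timeout expires. A generic, hedged sketch — the `probe` function here is a stand-in that becomes ACTIVE on its third call; in real use the probe would be e.g. the `aws eks describe-addon ... --query addon.status` command above:

```shell
# Poll a probe command until it prints the wanted value, or time out.
wait_for() {
  local want="$1" timeout="$2"
  shift 2
  local elapsed=0
  until [ "$("$@")" = "$want" ]; do
    sleep 1
    elapsed=$((elapsed + 1))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
  done
  return 0
}

# Demo probe: reports PENDING twice, then ACTIVE (stand-in for describe-addon)
probe() {
  COUNT=$(( $(cat /tmp/probe-count 2>/dev/null || echo 0) + 1 ))
  echo "$COUNT" > /tmp/probe-count
  if [ "$COUNT" -ge 3 ]; then echo ACTIVE; else echo PENDING; fi
}

rm -f /tmp/probe-count
wait_for ACTIVE 10 probe && echo "Add-on is ACTIVE"
```

Swapping in the real probe, `wait_for ACTIVE 300 aws eks describe-addon ... --output text` replaces the interactive `watch` loop in Step 4.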
### 2. ALB not created (ingress ADDRESS stays empty)

**Symptom:** `kubectl get ingress -n hm-shop` shows no ADDRESS for a long time.

**Diagnose:**

```
kubectl logs -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller --tail=50
```

**Fix:**

```
# Check VPC subnets have the correct tags
aws ec2 describe-subnets --filters "Name=vpc-id,Values=<VPC_ID>" --query "Subnets[*].{ID:SubnetId,Tags:Tags}"
# Public subnets need:  kubernetes.io/role/elb = 1
# Private subnets need: kubernetes.io/role/internal-elb = 1

# Verify the ALB Controller IRSA role
kubectl get sa -n kube-system aws-load-balancer-controller -o yaml | grep role-arn
```

### 3. ArgoCD does not sync

**Symptom:** the ArgoCD UI shows `OutOfSync` or `Unknown`.

**Diagnose:**

```
kubectl describe application hm-shop -n argocd | grep -A 10 "Conditions"
```

**Fix:**

```
# Check repoURL in application.yaml matches your GitHub repo exactly
cat argocd/application.yaml | grep repoURL

# Check ArgoCD can reach GitHub
kubectl exec -it deployment/argocd-server -n argocd -- argocd-util repo ls

# Force a sync
kubectl patch application hm-shop -n argocd --type merge -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
```

### 4. The Jenkins ECR push stage fails (auth error)

**Symptom:** `no basic auth credentials` or `denied: Your authorization token has expired`

**Diagnose:**

```
# Check the Jenkins aws-access-key credential is set
# Jenkins → Manage Jenkins → Credentials → look for aws-access-key
```

**Fix:**

```
# Verify the IAM user has ECR permissions
aws iam list-attached-user-policies --user-name <jenkins-iam-user>
# Should include: AmazonEC2ContainerRegistryPowerUser

# Test ECR login manually on the Jenkins EC2
ssh -i hm-eks-key.pem ubuntu@${JENKINS_IP}
aws ecr get-login-password --region ap-south-1 | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.ap-south-1.amazonaws.com
# Expected: Login Succeeded
```
### 5. HPA shows `<unknown>` targets

**Symptom:**

```
NAME          TARGETS         MINPODS   MAXPODS
backend-hpa   <unknown>/70%   2         5
```

**Diagnose:**

```
kubectl top pods -n hm-shop
# If this fails: Metrics Server is not running
```

**Fix:**

```
# Check Metrics Server pods
kubectl get pods -n kube-system | grep metrics-server

# Reinstall if missing
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Wait 60 seconds then check HPA again
kubectl get hpa -n hm-shop
```

### 6. SonarQube quality gate fails

**Symptom:** Stage 1 fails with `QUALITY GATE STATUS: FAILED`

**Diagnose:**

```
# Open the SonarQube dashboard
# http://<JENKINS_IP>:9000/dashboard?id=hm-fashion-clone
# Look at the Issues tab for specific violations
```

**Fix:**

- Review the code issues on the SonarQube dashboard
- Common causes: code smells, high cognitive complexity, low test coverage
- To let the pipeline proceed temporarily: in SonarQube, go to **Quality Gates → Conditions** and raise the thresholds, or switch to a more lenient gate

### 7. Jenkins triggers an infinite build loop

**Symptom:** the pipeline triggers itself again immediately after succeeding, looping forever.

**Cause:** Jenkins does not natively honor the `[skip ci]` commit-message convention (that is a GitHub Actions feature). The pipeline's image-tag push fires the webhook, which starts another build.

**Fix:** add a commit-message check at the top of the `pipeline` block in the `Jenkinsfile`:

```
pipeline {
    agent any
    stages {
        stage('Check Skip CI') {
            steps {
                script {
                    def commitMsg = sh(script: 'git log -1 --pretty=%B', returnStdout: true).trim()
                    if (commitMsg.contains('[skip ci]')) {
                        currentBuild.result = 'SUCCESS'
                        error('Skipping CI — commit message contains [skip ci]')
                    }
                }
            }
        }
        // ... rest of your stages
    }
}
```

This exits cleanly when it detects the pipeline's own tag-update commit, breaking the loop.
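The same guard can be sanity-checked outside Jenkins with plain git and grep; a self-contained sketch using a scratch repository with a commit shaped like the pipeline's own tag-update commit:

```shell
# Scratch repo with a [skip ci]-style commit
rm -rf /tmp/skipci-demo
git init -q /tmp/skipci-demo
cd /tmp/skipci-demo
git -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "CI: Update image tags to build-1 [skip ci]"

# The guard: skip when the last commit message contains the literal [skip ci]
if git log -1 --pretty=%B | grep -qF '[skip ci]'; then
  echo "skip"    # this branch fires for the commit above
else
  echo "build"
fi
```

`grep -F` matters here: without it, `[skip ci]` is parsed as a bracket expression instead of a literal string.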
### 8. GitHub webhook returns 404

**Symptom:** the GitHub webhook delivery shows a red ✗ with a 404 response.

**Diagnose:** check that Jenkins is reachable from the public internet (not localhost):

```
curl -I http://<JENKINS_IP>:8080/github-webhook/
# Expected: HTTP/1.1 200
```

**Fix:**

```
# Verify port 8080 is open in the Jenkins security group
aws ec2 describe-security-groups --filters "Name=group-name,Values=hm-shop-jenkins-sg" --region ap-south-1 --query "SecurityGroups[0].IpPermissions"

# Verify the Jenkins GitHub plugin is installed
# Jenkins → Manage Jenkins → Plugins → Installed → search "github"

# Verify the webhook URL format (must end with /github-webhook/)
# Correct:   http://13.233.x.x:8080/github-webhook/
# Incorrect: http://13.233.x.x:8080/github-webhook   (no trailing slash)
```
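Since the missing trailing slash is the most common webhook mistake, the payload URL can be linted before it is saved; a minimal sketch:

```shell
# Accept only payload URLs ending in the exact path Jenkins expects
check_webhook_url() {
  case "$1" in
    */github-webhook/) echo "OK" ;;
    *)                 echo "BAD: must end with /github-webhook/" ;;
  esac
}

check_webhook_url "http://13.233.10.20:8080/github-webhook/"   # → OK
check_webhook_url "http://13.233.10.20:8080/github-webhook"    # → BAD
```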