# KubeCrash
## What is KubeCrash?
KubeCrash is a **Kubernetes incident training platform** designed to turn learners into confident operators.
It combines two modes in one experience:
- **Incident Game** — pressure-tested troubleshooting with real `kubectl` command flow
- **CKA Learning Journey** — structured domain-by-domain progression across the CKA blueprint
No local cluster setup required. Everything runs in the browser, with simulation logic tuned for practical decision-making.
## Quick start
Run KubeCrash locally in minutes.
### 1. Backend (FastAPI + WebSocket)
```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload --port 8000
```
### 2. Frontend (React + Vite)
```bash
cd frontend
npm install
npm run dev
```
Open **http://localhost:5173**
## Incident Game — Command loop
1. Pick an incident from the level select screen
2. Read the briefing like an on-call handoff
3. Run `kubectl` commands to isolate root cause
4. Apply the fix before the timer expires
5. Submit and compare results on the leaderboard
### Levels
| # | Title | Concept |
|---|-------|---------|
| 1 | The Crash at Dawn | CrashLoopBackOff, env vars |
| 2 | The Invisible Service | Label selectors, endpoints |
| 3 | The OOM Reaper | OOMKilled, resource limits |
| 4 | The Ghost Image | ImagePullBackOff, rollback |
| 5 | The Dead Node | Node lifecycle, drain |
## CKA Learning Journey
Structured CKA preparation built for operational fluency, not passive reading.
### Features
- 15 structured lessons across **Beginner → Foundation → Intermediate** tracks
- Interactive shell with **simulated `kubectl` output** for each command path
- Per-checkpoint **why-this-command** explanations for fast mental model building
- **Command syntax coach** with live verb/resource/flags breakdown
- **Lesson recap quiz** to lock in understanding before moving forward
- Timed **mini-mocks** and a full **120-minute weighted CKA mock**
- **Adaptive hint modes**: `beginner`, `standard`, `exam`, `adaptive`
- **Realtime architecture diagram** that updates per lesson domain
- Persistent score, streak, badges, and certificate state via localStorage
- Official Kubernetes docs linked directly from lessons
- Built-in 30-day study roadmap
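
The verb/resource/flags breakdown used by the command syntax coach can be illustrated with a small parser. This is a minimal sketch, not the project's actual parser: the `KubectlCommand` class, the `breakdown` function, and the `VALUE_SHORT_FLAGS` set are hypothetical names, and the flag handling is deliberately simplified (long flags are only split on `=`, and only a few short flags are treated as taking a value).

```python
import shlex
from dataclasses import dataclass, field

# Short flags assumed to take a value (illustrative subset, not exhaustive).
VALUE_SHORT_FLAGS = {"-n", "-o", "-l", "-f", "-c"}


@dataclass
class KubectlCommand:
    verb: str = ""
    resource: str = ""
    name: str = ""
    flags: dict = field(default_factory=dict)


def breakdown(command: str) -> KubectlCommand:
    """Split a kubectl command line into verb, resource, name, and flags."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "kubectl":
        raise ValueError("expected a command starting with 'kubectl'")

    parsed = KubectlCommand()
    positionals = []
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            # Long flag: '--key=value', or a bare '--key' treated as boolean.
            key, sep, value = tok.partition("=")
            parsed.flags[key] = value if sep else "true"
        elif tok in VALUE_SHORT_FLAGS and i + 1 < len(tokens):
            # Short flag that consumes the next token as its value.
            parsed.flags[tok] = tokens[i + 1]
            i += 1
        elif tok.startswith("-"):
            # Any other short flag is treated as boolean (for example '-A').
            parsed.flags[tok] = "true"
        else:
            positionals.append(tok)
        i += 1

    # The first three positionals are read as verb, resource, and name.
    parsed.verb, parsed.resource, parsed.name = (positionals + ["", "", ""])[:3]
    return parsed
```

For example, `breakdown("kubectl get pods -n kube-system -o wide")` yields the verb `get`, the resource `pods`, an empty name, and the flags `-n` and `-o` mapped to `kube-system` and `wide`.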
### CKA blueprint coverage
| Domain | Weight |
|--------|--------|
| Troubleshooting | 30% |
| Cluster Architecture, Installation and Configuration | 25% |
| Services and Networking | 20% |
| Workloads and Scheduling | 15% |
| Storage | 10% |
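
One way to turn these weights into a weighted mock is to split a fixed number of questions across the domains in proportion to the table. The sketch below uses the largest-remainder method so the counts always add up to the requested total. The function name `allocate_questions` and the question totals are assumptions for illustration; the README does not state how many questions the mock contains or how KubeCrash distributes them.

```python
# Domain weights in percent, copied from the blueprint table above.
BLUEPRINT_WEIGHTS = {
    "Troubleshooting": 30,
    "Cluster Architecture, Installation and Configuration": 25,
    "Services and Networking": 20,
    "Workloads and Scheduling": 15,
    "Storage": 10,
}


def allocate_questions(total: int, weights: dict) -> dict:
    """Distribute `total` questions proportionally (largest-remainder method)."""
    weight_sum = sum(weights.values())
    # Integer arithmetic avoids floating-point rounding surprises.
    counts = {d: total * w // weight_sum for d, w in weights.items()}
    remainders = {d: total * w % weight_sum for d, w in weights.items()}
    leftover = total - sum(counts.values())
    # Hand the leftover questions to the domains with the largest remainders.
    for domain in sorted(weights, key=lambda d: remainders[d], reverse=True)[:leftover]:
        counts[domain] += 1
    return counts
```

With a hypothetical 20-question mock the split is 6, 5, 4, 3, and 2 questions in table order. Totals that do not divide evenly still sum correctly because the leftover questions go to the domains with the largest fractional share.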
### Lesson tracks
| Track | Lessons |
|-------|---------|
| Beginner | Lesson 0: Kubernetes from Zero |
| Foundation | Lessons 1–6: Env vars, Services, Resources, RBAC, PVCs, Ingress |
| Intermediate | Lessons 7–14: Taints, Rollouts, ConfigMaps, StatefulSets, DNS, Upgrades, TLS |
## KubeCrash Mastery Roadmap
KubeCrash began with focused incident scenarios to make onboarding approachable.
It is now expanding into a full mastery platform with measurable skill growth, portfolio evidence, and role-based pathways.
### Product evolution map
| Stage | Experience | Scope |
|------|------------|-------|
| 1. Onboarding | Incident game intro | 5 fast starter incidents |
| 2. Core Training | CKA learning journey | 15 lessons + mocks |
| 3. Advanced Tracks | Incident case-study academy | 4 tracks × 4 lessons |
| 4. Mastery Platform | Role paths + capstones + skill graph | 60+ labs + 5 projects |
### Curriculum target
| Layer | Target count | Outcome |
|------|--------------|---------|
| Starter incidents | 10 | Build initial confidence in command fluency |
| Foundation labs | 30 | Strong CKA fundamentals across all blueprint domains |
| Advanced incidents | 24 | Multi-signal diagnosis under realistic constraints |
| Role-path missions | 16 | SRE, Platform, Security, DevOps specialization |
| Capstone projects | 5 | Portfolio-grade end-to-end Kubernetes projects |
### Role paths (new)
- SRE Path: observability, SLOs, alerting, incident command, postmortems
- Platform Engineer Path: cluster operations, cost controls, multi-tenant architecture
- Security Engineer Path: RBAC, policy, secrets, supply-chain and audit controls
- DevOps/GitOps Path: release strategy, progressive delivery, rollback governance
### Mastery model and progression rules
Learners unlock new content by demonstrating capability, not just by completing lessons.
#### Skill graph nodes
- Command fluency
- Debug workflow
- Workload reliability
- Networking diagnostics
- Storage reliability
- Security hardening
- Observability reasoning
- Delivery safety
- Cluster operations
#### Unlock logic (default)
1. Foundation Track unlocks immediately
2. Advanced Track requires:
   - 70%+ completion in Foundation
   - Minimum 60% average quiz score in completed lessons
3. Role Path requires:
   - One completed Advanced Track
   - At least 3 saved retrospectives
4. Capstone requires:
   - Two completed Role Paths
   - Mastery score >= 75 in at least 5 skill nodes
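
The default gates above can be expressed as a small rule check. This is a sketch under stated assumptions, not the shipped gating code: the `LearnerProgress` fields and the `unlocked_stages` function are hypothetical names, completion is assumed to be stored as a fraction, and scores are assumed to be on a 0 to 100 scale. Each gate is evaluated independently, exactly as listed.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerProgress:
    foundation_completion: float = 0.0   # fraction of Foundation completed, 0..1
    avg_quiz_score: float = 0.0          # percent, averaged over completed lessons
    advanced_tracks_completed: int = 0
    saved_retrospectives: int = 0
    role_paths_completed: int = 0
    node_mastery: dict = field(default_factory=dict)  # skill node -> score 0..100


def unlocked_stages(p: LearnerProgress) -> list:
    """Return the stages a learner can access under the default unlock rules."""
    stages = ["foundation"]  # rule 1: always unlocked

    # Rule 2: 70%+ Foundation completion and at least a 60% quiz average.
    if p.foundation_completion >= 0.70 and p.avg_quiz_score >= 60:
        stages.append("advanced")

    # Rule 3: one completed Advanced Track and at least 3 saved retrospectives.
    if p.advanced_tracks_completed >= 1 and p.saved_retrospectives >= 3:
        stages.append("role_path")

    # Rule 4: two completed Role Paths and mastery >= 75 in at least 5 nodes.
    strong_nodes = sum(1 for score in p.node_mastery.values() if score >= 75)
    if p.role_paths_completed >= 2 and strong_nodes >= 5:
        stages.append("capstone")

    return stages
```

Keeping the thresholds in one function makes the "(default)" rules easy to adjust or override per cohort without touching lesson content.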
### Scoring model
The mastery score is a weighted sum that combines speed, correctness, and learning behavior.
$$
\text{Mastery} = 0.35\,C + 0.20\,S + 0.20\,R + 0.15\,Q + 0.10\,L
$$
Where:
- $C$ = command correctness score
- $S$ = scenario completion reliability score
- $R$ = retrospective quality/completion score
- $Q$ = quiz understanding score
- $L$ = long-term retention score (repeat challenge delta)
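
As a worked example, the weighted sum can be computed as follows. This is a minimal sketch: the function name `mastery_score` is hypothetical, and it assumes every component is scored on a 0 to 100 scale, which the README implies (a mastery threshold of 75 appears in the unlock rules) but does not state outright. The weights are stored as integer percentages so the arithmetic stays exact.

```python
# Component weights in percent, taken from the mastery formula (they sum to 100).
MASTERY_WEIGHTS = {"C": 35, "S": 20, "R": 20, "Q": 15, "L": 10}


def mastery_score(components: dict) -> float:
    """Weighted mastery score from per-component scores (assumed 0..100 each)."""
    missing = MASTERY_WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    weighted = sum(MASTERY_WEIGHTS[k] * components[k] for k in MASTERY_WEIGHTS)
    return weighted / 100
```

For a learner with C = 80, S = 70, R = 60, Q = 90, and L = 50, the terms are 28 + 14 + 12 + 13.5 + 5, so `mastery_score({"C": 80, "S": 70, "R": 60, "Q": 90, "L": 50})` returns 72.5.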
### Portfolio outputs (must-have)
Every advanced lesson and capstone should produce artifacts:
- Incident brief + timeline
- Root cause analysis
- Retrospective answers
- Action items
- Suggested runbook snippets
- Final score + elapsed time
### 12-week release plan
| Week | Delivery focus | Exit criteria |
|------|----------------|---------------|
| 1 | Expand starter incidents from 5 to 8 | New incidents playable end-to-end |
| 2 | Add 2 more starter incidents (10 total) | Starter onboarding complete |
| 3 | Add 8 foundation labs (phase A) | Lessons + checks + recap quizzes live |
| 4 | Add 8 foundation labs (phase B) | 16 new labs total in roadmap branch |
| 5 | Add 8 foundation labs (phase C) | 24 new labs total |
| 6 | Add 6 foundation labs + polish | 30 foundation labs complete |
| 7 | Build role path framework + progress rules | Path UI + unlock gating functional |
| 8 | Ship SRE + Platform paths | 8 role missions live |
| 9 | Ship Security + DevOps paths | 16 role missions complete |
| 10 | Implement skill graph + mastery scoring | Node scores visible in profile |
| 11 | Build first 3 capstones | Project rubrics + artifact export |
| 12 | Build final 2 capstones + launch prep | 5 capstones + launch readiness |
### Success metrics
- 7-day retention > 35%
- Lesson-to-lesson completion > 60%
- Advanced track completion > 30% of active learners
- Average mastery score gain of +20 points in 30 days
- At least 1 portfolio artifact exported per active learner each week
### Definition of done for KubeCrash
- 60+ labs live with stable validation
- 4 role paths with progression gates
- 5 capstone projects fully rubric-scored
- Skill graph and mastery score visible in UI
- Exportable learner portfolio artifacts
### Revamp execution docs
- [Execution plan](docs/revamp/EXECUTION_PLAN.md)
- [Sprint backlog](docs/revamp/SPRINT_BACKLOG.md)
- [Lab spec template](docs/revamp/LAB_SPEC_TEMPLATE.md)
- [Capstone rubric](docs/revamp/CAPSTONE_RUBRIC.md)
## Tech stack
| Layer | Technology |
|-------|-----------|
| Frontend | React 18, Vite, Zustand, xterm.js |
| Backend | FastAPI, Uvicorn, WebSockets, Pydantic |
| Simulation | Custom kubectl parser + per-level state machines |
| Persistence | localStorage (client-side progress) |
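
To make the "per-level state machines" entry concrete, here is a hedged sketch of how a single incident level could track progress from broken to fixed. The state names, the transition table, and the `LevelMachine` class are assumptions for illustration and are not taken from the KubeCrash source. The sketch matches only on the kubectl verb and assumes a fix counts only after a diagnostic command has been run.

```python
from enum import Enum


class State(Enum):
    BROKEN = "broken"
    DIAGNOSED = "diagnosed"
    FIXED = "fixed"


# (current state, kubectl verb) -> next state. Hypothetical transitions.
TRANSITIONS = {
    (State.BROKEN, "describe"): State.DIAGNOSED,
    (State.BROKEN, "logs"): State.DIAGNOSED,
    (State.DIAGNOSED, "set"): State.FIXED,
    (State.DIAGNOSED, "edit"): State.FIXED,
}


class LevelMachine:
    """Tracks one learner's progress through a single incident level."""

    def __init__(self):
        self.state = State.BROKEN
        self.history = []

    def apply(self, verb: str) -> State:
        """Record a kubectl verb and move to the next state if one is defined."""
        self.history.append(verb)
        # Verbs with no transition from the current state leave it unchanged.
        self.state = TRANSITIONS.get((self.state, verb), self.state)
        return self.state

    @property
    def solved(self) -> bool:
        return self.state is State.FIXED
```

A table-driven design like this keeps each level's rules as data, so a new incident would mostly mean a new transition table rather than new control flow.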
## Project structure
```
KubeCrash/
├── backend/
│   ├── main.py          # FastAPI app + WebSocket handler
│   ├── engine/          # Kubectl parser + scenario engine
│   ├── scenarios/       # Per-level incident definitions
│   └── routers/         # HTTP endpoints (leaderboard, session)
└── frontend/
    └── src/
        ├── components/  # LearningJourney, Terminal, LevelSelect
        ├── hooks/       # useTerminal (xterm lifecycle)
        ├── store/       # Zustand game state
        └── utils/       # kubectlParser (semantic matching)
```
## License
MIT