raman01211/sre-observability-platform


# SRE Observability Platform

![Prometheus](https://img.shields.io/badge/Prometheus-E6522C?logo=prometheus&logoColor=white) ![Grafana](https://img.shields.io/badge/Grafana-F46800?logo=grafana&logoColor=white) ![OpenTelemetry](https://img.shields.io/badge/OpenTelemetry-000000?logo=opentelemetry&logoColor=white) ![Docker](https://img.shields.io/badge/Docker-2496ED?logo=docker&logoColor=white) ![Python](https://img.shields.io/badge/Python-3776AB?logo=python&logoColor=white) ![License](https://img.shields.io/badge/License-MIT-green)

[![Run Locally](https://img.shields.io/badge/Run%20Locally-Quick%20Start-blueviolet?style=for-the-badge)](https://github.com/raman01211/sre-observability-platform)

## Quick Start

    # Prerequisites: Docker and docker-compose

    # Start the full observability stack
    make up

    # Open Grafana dashboard
    make dashboard

    # Trigger a test alert
    make alert-test

    # View logs
    make logs

    # Stop everything
    make down

### URLs

| Service | URL |
|---------|-----|
| Grafana | http://localhost:3000 |
| Prometheus | http://localhost:9090 |
| Alertmanager | http://localhost:9093 |
| Sample App | http://localhost:8000 |

**Default Grafana credentials:** `admin` / `admin`

## Prerequisites

- [Docker](https://docs.docker.com/get-docker/)
- [Docker Compose](https://docs.docker.com/compose/install/) (v2+)

## Components

- **Metrics**: Prometheus (scrape + recording rules) -> Grafana (dashboards + alerting)
- **Logging**: Fluentd/Logstash -> Elasticsearch -> Kibana (ELK Stack)
- **Tracing**: OpenTelemetry Collector -> Jaeger/Tempo -> Grafana
- **APM**: Dynatrace for application performance monitoring
- **Alerting**: Alertmanager -> PagerDuty with SLO-based alerting rules

## Features

- SLO/SLI definition and error-budget tracking with burn-rate alerting
- Multi-cluster monitoring with Thanos/Prometheus federation
- Automated incident response with runbooks and PagerDuty on-call schedules
- Real-time dashboards for engineering teams and management
- Cost visibility and FinOps dashboards
- Log correlation with traces and metrics for faster root cause analysis

## Architecture

    Applications -> OpenTelemetry SDK/Agent
        |-> Prometheus/Grafana (Metrics)
        |-> ELK Stack (Logs)
        |-> Jaeger/Tempo (Traces)
        |-> Dynatrace (APM)

    Alertmanager -> PagerDuty -> On-Call Engineer

## Tech Stack

| Category | Technology |
|----------|-----------|
| Metrics | Prometheus, Thanos, Grafana |
| Logging | Elasticsearch, Logstash, Kibana (ELK) |
| Tracing | OpenTelemetry, Jaeger, Tempo |
| APM | Dynatrace, Application Insights |
| Alerting | Alertmanager, PagerDuty |
| Kubernetes | AKS/EKS, Helm, Kustomize |

## Quick Install

*More at [ramansrivastava.dev](https://ramansrivastava.dev) | [github.com/raman01211](https://github.com/raman01211)*
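
The burn-rate alerting listed under Features can be illustrated with a small calculation. This is a minimal sketch, not code from this repository: the function names `burn_rate` and `should_page`, the 99.9% SLO target, and the 14.4 threshold are all assumptions chosen for illustration. The idea is that a burn rate of 1.0 consumes the error budget exactly over the SLO period, and an alert fires only when both a long and a short window burn faster than the chosen threshold.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A burn rate of 1.0 means the budget would be used up exactly
    over the full SLO period; higher values exhaust it sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_budget = 1.0 - slo_target  # allowed fraction of failed requests
    return error_ratio / error_budget


def should_page(
    long_window_ratio: float,
    short_window_ratio: float,
    slo_target: float,
    threshold: float = 14.4,  # hypothetical fast-burn threshold
) -> bool:
    """Multi-window check: page only if both windows exceed the threshold."""
    return (
        burn_rate(long_window_ratio, slo_target) > threshold
        and burn_rate(short_window_ratio, slo_target) > threshold
    )


if __name__ == "__main__":
    # 2% of requests failing against a 99.9% SLO (0.1% budget)
    print(round(burn_rate(0.02, 0.999), 2))
    print(should_page(0.02, 0.02, 0.999))
```

With a 99.9% target the budget is 0.1% of requests, so a sustained 2% error ratio burns it roughly 20 times faster than allowed. Requiring the short window to agree with the long one keeps the alert from continuing to fire after the error ratio has already recovered.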