mrzasad/complianceShield-pro
# 🛡️ ComplianceShield — PECA/GDPR Data Pipeline
A production-grade **Streamlit application** that intercepts raw data, audits it
against the **PECA 2016** (Pakistan's Prevention of Electronic Crimes Act) and **GDPR**
compliance frameworks, encrypts sensitive PII fields, and produces an append-only
structured audit log.
## Architecture
```
                            Raw Data Source
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────┐
│  Stage 1 · INTERCEPT                                     │
│  DataInterceptor - SHA-256 checksum, batch ID, metadata  │
└──────────────────────────────────┬───────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────┐
│  Stage 2 · AUDIT (GDPR + PECA)                           │
│  ComplianceAuditor - PII detection, rule matching,       │
│  violation scoring, CRITICAL/HIGH/MEDIUM/LOW risk rating │
└──────────────────────────────────┬───────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────┐
│  Stage 3 · ENCRYPT                                       │
│  DataEncryptor - AES-256-GCM (or Fernet / RSA-OAEP+AES)  │
│  All PII fields replaced with ENC:                       │
└──────────────────────────────────┬───────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────┐
│  Stage 4 · LOG                                           │
│  ComplianceLogger - append-only JSON structured log,     │
│  exportable as JSON Lines (SIEM) or CSV                  │
└──────────────────────────────────────────────────────────┘
```
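
Stage 1 is described as attaching a SHA-256 checksum, a batch ID, and metadata to each incoming batch. The repository's actual `DataInterceptor` is not shown here, so the following is only a minimal sketch of that idea. The function name `intercept`, the envelope keys, and the choice of canonical JSON as the hashing input are all assumptions.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone


def intercept(records, source="unknown"):
    """Wrap a batch of raw records with a checksum, batch ID and metadata."""
    # Canonical JSON (sorted keys, fixed separators) so the same records
    # always hash to the same checksum regardless of key order.
    payload = json.dumps(records, sort_keys=True, separators=(",", ":"), default=str)
    return {
        "batch_id": str(uuid.uuid4()),  # hypothetical ID scheme
        "checksum_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "received_at": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "record_count": len(records),
        "records": records,
    }


batch = intercept([{"email": "a@example.com", "phone": "0300-0000000"}], source="demo")
print(batch["batch_id"], batch["checksum_sha256"][:12], batch["record_count"])
```

Recomputing the checksum at a later stage and comparing it with `checksum_sha256` would show whether the batch changed in transit.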
## Compliance Frameworks
### GDPR
| Rule ID | Article | Field | Risk |
|---------|---------|-------|------|
| GDPR-ART25-001 | Art. 25 – Data Minimisation | national_id | HIGH |
| GDPR-ART32-001 | Art. 32 – Security of Processing | credit_card | CRITICAL |
| GDPR-ART35-001 | Art. 35 – DPIA Required | dob | MEDIUM |
| GDPR-ART5-001 | Art. 5 – Purpose Limitation | ip_address | MEDIUM |
| GDPR-ART5-002 | Art. 5 – Lawfulness | email | MEDIUM |
### PECA 2016
| Rule ID | Section | Field | Risk |
|---------|---------|-------|------|
| PECA-SEC14-001 | Sec. 14 – Identity Information | national_id | HIGH |
| PECA-SEC18-001 | Sec. 18 – Data Protection | phone | MEDIUM |
| PECA-SEC34-001 | Sec. 34 – Dignity/Privacy | dob | LOW |
| PECA-SEC14-002 | Sec. 14 – Identity Information | credit_card | CRITICAL |
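
The two rule tables can be read as a lookup from field name to rule ID and risk level. As a rough illustration of how rule matching and an overall risk rating might work, the sketch below copies the rule IDs, fields, and risk levels from the tables. The function `audit_record`, the severity ordering, and the decision to skip values already prefixed with `ENC:` are assumptions, not the repository's `ComplianceAuditor`.

```python
# Rule IDs, fields and risk levels are copied from the two tables above.
RULES = [
    ("GDPR-ART25-001", "national_id", "HIGH"),
    ("GDPR-ART32-001", "credit_card", "CRITICAL"),
    ("GDPR-ART35-001", "dob", "MEDIUM"),
    ("GDPR-ART5-001", "ip_address", "MEDIUM"),
    ("GDPR-ART5-002", "email", "MEDIUM"),
    ("PECA-SEC14-001", "national_id", "HIGH"),
    ("PECA-SEC18-001", "phone", "MEDIUM"),
    ("PECA-SEC34-001", "dob", "LOW"),
    ("PECA-SEC14-002", "credit_card", "CRITICAL"),
]
RISK_ORDER = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def audit_record(record):
    """Return (violations, overall_risk) for one record given as a dict."""
    violations = [
        {"rule_id": rule_id, "field": field, "risk": risk}
        for rule_id, field, risk in RULES
        # Assumption: a field is a violation when it holds a non-empty
        # value that is not already marked as encrypted.
        if record.get(field) and not str(record[field]).startswith("ENC:")
    ]
    # The overall rating is the most severe risk among the matched rules.
    overall = max((v["risk"] for v in violations), key=RISK_ORDER.get, default=None)
    return violations, overall


violations, risk = audit_record({"email": "a@example.com", "credit_card": "4111111111111111"})
print(risk)  # CRITICAL
print([v["rule_id"] for v in violations])
```

A field such as `dob` matches one rule in each framework, so a single record can produce several violations for the same field.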
## Encryption
| Algorithm | Key Size | Mode | Notes |
|-----------|----------|------|-------|
| AES-256-GCM | 256-bit | Authenticated | Default — recommended |
| Fernet (AES-128-CBC) | 128-bit | HMAC-SHA256 | Simple symmetric |
| RSA-OAEP + AES | 2048-bit RSA + 256-bit AES | Hybrid | Key wrapping |
Encrypted values format: `ENC:`
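
For the default AES-256-GCM option, a field-level round trip might look like the sketch below. It uses the third-party `cryptography` package (an assumption about the project's dependencies). The source does not show what follows the `ENC:` prefix, so the payload layout used here, base64 of the 12-byte nonce followed by ciphertext and tag, is purely illustrative. The helper names `encrypt_field` and `decrypt_field` are hypothetical.

```python
import base64
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

PREFIX = "ENC:"


def encrypt_field(key, plaintext):
    """Encrypt one string; returns PREFIX + base64(nonce + ciphertext + tag)."""
    nonce = os.urandom(12)  # 96-bit nonce, must never repeat for the same key
    ciphertext = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)
    return PREFIX + base64.b64encode(nonce + ciphertext).decode("ascii")


def decrypt_field(key, token):
    """Reverse encrypt_field; raises InvalidTag if the value was modified."""
    raw = base64.b64decode(token[len(PREFIX):])
    nonce, ciphertext = raw[:12], raw[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


key = AESGCM.generate_key(bit_length=256)  # in production, fetch from a key store
token = encrypt_field(key, "42101-1234567-1")
print(token.startswith(PREFIX), decrypt_field(key, token))
```

Because GCM is authenticated, a tampered token fails at decryption instead of yielding corrupted plaintext.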
## Quick Start
### Local (Python)
```bash
pip install -r requirements.txt
streamlit run app.py
```
### Docker
```bash
docker compose up --build
# Open http://localhost:8501
```
### Docker (manual)
```bash
docker build -t complianceshield .
docker run -p 8501:8501 complianceshield
```
## Production Extensions
### Apache Spark
Replace `DataInterceptor.intercept()` with a PySpark job:
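```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ComplianceShield").getOrCreate()
df = spark.read.json("s3a://raw-data/landing/")
# compliance_audit_udf is not defined here: it must accept an iterator of Rows
# (one partition) and return an iterator of Rows.
df = df.rdd.mapPartitions(compliance_audit_udf).toDF()
df.write.format("delta").mode("append").save("s3a://processed/compliant/")
```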
### Apache Airflow DAG
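```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# intercept, audit_records, encrypt_fields and write_audit_log are the stage
# callables; define or import them before this point. start_date is a placeholder.
with DAG("compliance_pipeline", schedule="@hourly",
         start_date=datetime(2024, 1, 1), catchup=False) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=intercept)
    audit = PythonOperator(task_id="audit", python_callable=audit_records)
    encrypt = PythonOperator(task_id="encrypt", python_callable=encrypt_fields)
    log_task = PythonOperator(task_id="log", python_callable=write_audit_log)
    ingest >> audit >> encrypt >> log_task
```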
### Key Management (Production)
- Store AES keys in **Azure Key Vault** or **AWS KMS**
- Implement 90-day automatic key rotation
- Use **Hardware Security Modules (HSM)** for RSA private keys
- Log all key access events to the compliance audit trail
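
Two of the points above, the 90-day rotation window and logging of key access events, can be illustrated without calling any cloud API. The sketch below is provider-agnostic and uses only the standard library. The function names, the event fields, and the JSON Lines file target are assumptions. With Azure Key Vault or AWS KMS, rotation and access logging would normally be configured in the service itself.

```python
import json
from datetime import datetime, timedelta, timezone

ROTATION_PERIOD = timedelta(days=90)


def rotation_due(created_at, now=None):
    """True when a key created at `created_at` has reached the 90-day limit."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= ROTATION_PERIOD


def log_key_access(log_path, key_id, action, actor):
    """Append one key-access event to a JSON Lines audit file."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "key_access",
        "key_id": key_id,
        "action": action,
        "actor": actor,
    }
    # Mode "a" only ever appends; existing lines are never rewritten.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(rotation_due(created, now=datetime(2024, 3, 30, tzinfo=timezone.utc)))  # False (89 days)
print(rotation_due(created, now=datetime(2024, 3, 31, tzinfo=timezone.utc)))  # True (90 days)
```

A call such as `log_key_access("audit.jsonl", "aes-key-1", "decrypt", "pipeline")` would add one line to the file. The path and identifiers in that call are placeholders.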
## File Structure
```
compliance_pipeline/
├── app.py                  # Streamlit UI
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── README.md
└── pipeline/
    ├── __init__.py
    ├── interceptor.py      # Stage 1: Data interception + checksums
    ├── auditor.py          # Stage 2: GDPR/PECA rule engine
    ├── encryptor.py        # Stage 3: AES-256-GCM encryption
    ├── logger.py           # Stage 4: Structured audit logging
    └── spark_engine.py     # Spark/Airflow execution simulation
```
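
Stage 4 (`logger.py`) is described as producing a structured log that can be exported as JSON Lines or CSV. The sketch below shows one way those two export formats could be generated from the same entries. The function names and the sample entry fields are hypothetical and are not taken from the repository's `ComplianceLogger`.

```python
import csv
import io
import json


def to_json_lines(entries):
    """Serialise log entries as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(e, sort_keys=True) + "\n" for e in entries)


def to_csv(entries):
    """Serialise log entries as CSV; the header is the sorted union of all keys."""
    fields = sorted({key for e in entries for key in e})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)  # keys missing from an entry become empty cells
    return buf.getvalue()


entries = [
    {"batch_id": "b-001", "stage": "AUDIT", "risk": "HIGH"},
    {"batch_id": "b-001", "stage": "ENCRYPT", "fields_encrypted": 3},
]
print(to_json_lines(entries), end="")
print(to_csv(entries), end="")
```

JSON Lines keeps each entry's own shape, which suits SIEM ingestion, while the CSV export flattens all entries onto one shared header.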

