EIM019/azure-financebank-data-pipeline

# Azure FinanceBank Data Pipeline

## Overview

## Pipeline Flow

1. Read raw transaction data from Azure Blob Storage.
2. Validate required schema columns.
3. Clean branch names, remove duplicates, and handle missing cost values.
4. Calculate profit, monthly profit, quarterly profit, and anomaly signals.
5. Generate reporting charts for branch, product, and monthly performance.
6. Load curated data and anomalies into Azure SQL.

## Tech Stack

- Python
- Pandas
- SQLAlchemy
- Azure Blob Storage
- Azure SQL
- Matplotlib

## Key Files

- `pipelines/ingest/financeBank/src/main.py` - main pipeline entry point.
- `pipelines/ingest/financeBank/src/clean_data.py` - data cleaning helpers.
- `pipelines/ingest/financeBank/src/calc_profit.py` - KPI and profit calculations.
- `pipelines/ingest/financeBank/src/export_sql.py` - SQL loading logic.
- `pipelines/ingest/financeBank/reports/charts/` - generated chart outputs.

## Environment Variables

Create a local `.env` file from `.env.example` and add your own Azure and database credentials. Do not commit real secrets.

## Portfolio Value

This project shows batch ingestion, cloud storage integration, validation, analytics preparation, reporting outputs, and SQL warehouse loading.
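
## Example Sketch

Steps 2 to 4 of the pipeline flow (schema validation, cleaning, and profit calculation) could look roughly like the sketch below. It is not the repository's actual code. The column names, the rule of filling missing cost with 0, and the z-score anomaly check are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import pandas as pd

# Hypothetical schema; the real required columns are defined in the repository.
REQUIRED_COLUMNS = ["transaction_id", "date", "branch", "product", "revenue", "cost"]


def validate_schema(df: pd.DataFrame) -> None:
    """Raise ValueError if any required column is missing."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize branch names, drop duplicate rows, and fill missing cost."""
    out = df.copy()
    out["branch"] = out["branch"].str.strip().str.title()
    out = out.drop_duplicates()
    # Assumption: a missing cost is treated as 0; the project may use another rule.
    out["cost"] = out["cost"].fillna(0.0)
    return out


def add_profit(df: pd.DataFrame) -> pd.DataFrame:
    """Parse dates and add a per-transaction profit column."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out["profit"] = out["revenue"] - out["cost"]
    return out


def monthly_profit(df: pd.DataFrame) -> pd.Series:
    """Total profit per calendar month."""
    return df.groupby(df["date"].dt.to_period("M"))["profit"].sum()


def quarterly_profit(df: pd.DataFrame) -> pd.Series:
    """Total profit per calendar quarter."""
    return df.groupby(df["date"].dt.to_period("Q"))["profit"].sum()


def flag_anomalies(df: pd.DataFrame, z_threshold: float = 3.0) -> pd.DataFrame:
    """Return rows whose profit z-score exceeds the threshold (assumed rule)."""
    std = df["profit"].std(ddof=0)
    if pd.isna(std) or std == 0:
        return df.iloc[0:0]  # no spread in profit, so nothing to flag
    z_scores = (df["profit"] - df["profit"].mean()) / std
    return df[z_scores.abs() > z_threshold]


# Tiny in-memory sample standing in for the raw file read from Blob Storage.
raw = pd.DataFrame(
    {
        "transaction_id": [1, 2, 2, 3],
        "date": ["2024-01-15", "2024-02-10", "2024-02-10", "2024-04-05"],
        "branch": [" north ", "South", "South", "north"],
        "product": ["loan", "card", "card", "loan"],
        "revenue": [100.0, 200.0, 200.0, 150.0],
        "cost": [40.0, 80.0, 80.0, None],
    }
)

validate_schema(raw)
curated = add_profit(clean(raw))
monthly = monthly_profit(curated)
quarterly = quarterly_profit(curated)
anomalies = flag_anomalies(curated)
```

In a full run, `curated` and `anomalies` would be the frames handed to the SQL loading step, and `monthly` would feed the monthly performance chart.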