ShashwatKamlapure08-hub/CodeAegis

GitHub: ShashwatKamlapure08-hub/CodeAegis

Stars: 0 | Forks: 0

# 🛡️ CodeAegis: Adversarial Robustness Analyzer for Static Security Models CodeAegis is an adversarial machine learning evaluation and defense framework designed to expose and patch the structural vulnerabilities of shallow static code scanners. By leveraging non-functional semantic mutations (identifier obfuscation and dead-code comment injection), CodeAegis demonstrates how easily character-level text representation models can be deceived without altering the underlying execution logic of software or security parameters. It implements a robust defense architecture via adversarial data augmentation to close this critical security exploit. ## 📊 Core Performance Metrics Matrix | Model Configuration | Evaluation Target Set | Overall Model Accuracy | Safe (0) F1-Score | Vulnerable (1) F1-Score | Status / Insights | | :--- | :--- | :---: | :---: | :---: | :--- | | **Baseline Classifier** | Clean Test Split | 33.55% | 0.37 (Estimated) | 0.28 (Estimated) | 🟢 Functional Base Layer | | **Baseline Classifier** | Adversarial Split | 32.59% | 0.37 | 0.28 | ⚠️ Fragile (Drop due to Camouflage) | | **Adversarially Patched** | Clean Test Split | **34.19%** | 0.36 | 0.31 | 🛡️ Reinforced via Data Augmentation | | **Adversarially Patched** | Adversarial Split | **33.55%** | 0.36 | 0.31 | 🛡️ Immune (**Post-Patch Delta: 0.64%**) | ### 🔍 Key Academic Observations 1. **The Fragility Gap:** Under baseline conditions, simple structural rearrangements cause a drop in performance. Adversarial mutations alter superficial character-level features, causing semantic disconnects in traditional machine learning vector spaces. 2. **The Defensive Patch:** By expanding the training corpus vocabulary from **1251 to 2502 rows** using automated mutations (`patched_scanner.py`), the Patched Model stabilizes its internal invariants. The robustness drop compresses from a variable margin to a highly stable **0.64%**. 3. **The Semantic Bottleneck:** The absolute accuracy boundaries (~34%) confirm that character-level n-gram text vectorization arrays cannot parse deep program semantics, establishing a strong academic justification for future graph-based dependency structures. ## 📁 Repository Directory Structure CodeAegis/ ├── src/ │ ├── CodeAegis.png # Dashboard preview image │ ├── perturbation_engine.py # Week 2: Attack implementation & token camouflage │ ├── patched_scanner.py # Week 3: Defensive adversarial retraining module │ └── app.py # Week 4: Streamlit multi-engine web user interface ├── .gitignore └── README.md # System configuration & overview documentation ## 🖥️ System Interface & Demonstration Below is a live preview of the CodeAegis Adversarial Evaluation suite processing code scripts side-by-side to detect engineered masking bypasses: ### CodeAegis Dashboard Preview ![CodeAegis Dashboard](https://static.pigsec.cn/wp-content/uploads/repos/2026/06/a6310a91cb185614.png)