Ravindra1t/AppPoet-Local-Optimized

GitHub: Ravindra1t/AppPoet-Local-Optimized

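AppPoet is a hybrid Android malware analysis pipeline that combines static deobfuscation, cloud LLM semantic synthesis, and a local PyTorch MLP neural classifier to return a high-confidence malicious/benign verdict for an APK sample within seconds.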


# 🎭 AppPoet: Hybrid Neuro-Symbolic Android Malware Analysis Pipeline

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![Groq API](https://img.shields.io/badge/LLM-Groq%20Cloud-orange.svg)](https://groq.com/)
[![PyTorch](https://img.shields.io/badge/Neural_Network-PyTorch-red.svg)](https://pytorch.org/)
[![SentenceTransformers](https://img.shields.io/badge/Embeddings-Sentence--Transformers-green.svg)](https://sbert.net/)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

AppPoet is a **hybrid neuro-symbolic malware analysis pipeline** designed for fast, high-fidelity security audits of Android applications. It bridges structured bytecode analysis and semantic LLM synthesis by combining **high-speed programmatic static deobfuscation**, **multi-view natural-language feature summaries**, **Sentence-Transformer embeddings**, and a **PyTorch multilayer perceptron (MLP)** neural network classifier.

## 📐 Pipeline Architecture

AppPoet extracts three discrete behavioral "views" from the target APK file, uses the Groq Cloud API to synthesize semantic behavior descriptions, generates dense embeddings, fuses them, and then performs neural classification.

```mermaid
flowchart TD
    APK["Target Android APK"] --> Decoder["APKTool & Androguard Parser"]
    Decoder --> FeatureExtract["Feature Extraction Engine"]

    subgraph Multi_View_Extraction ["Multi-View Feature Isolation"]
        FeatureExtract --> PermView["Permission View"]
        FeatureExtract --> APIView["Sensitive API View"]
        FeatureExtract --> URLView["Component & URL View"]
    end

    subgraph Deobfuscation_Pass ["High-Speed Static Deobfuscation"]
        APIView --> Deobf["Programmatic Static Deobfuscator"]
        Deobf --> |"Cracked Base64/XOR APIs"| APIView
        Deobf --> |"Cracked Base64/XOR URLs"| URLView
    end

    subgraph LLM_Synthesis ["Semantic LLM Synthesis (Groq Cloud)"]
        PermView --> |"Prompt Formatting"| PermSummary["Permission Behavioral Summary"]
        APIView --> |"Prompt Formatting"| APISummary["API Behavioral Summary"]
        URLView --> |"Prompt Formatting"| URLSummary["URL Behavioral Summary"]
    end

    subgraph Vector_Embedding_Fusion ["Vector Embedding & View Fusion"]
        PermSummary --> Embedder["Sentence-Transformers (all-MiniLM-L6-v2)"]
        APISummary --> Embedder
        URLSummary --> Embedder
        Embedder --> |"3x 384-dim Dense Vectors"| Concat["View Fusion (Concatenation)"]
        Concat --> |"1152-dim Unified Vector"| MLP["PyTorch MLP Classifier"]
    end

    MLP --> |"Authentic Model Probability"| Verdict["Malicious / Benign Binary Verdict"]
    Verdict --> Reporter["Unified Heuristic Diagnostic Report"]
    PermSummary --> Reporter
    APISummary --> Reporter
    URLSummary --> Reporter
```

## 🛠️ Core Technical Features

### 1. ⚡ High-Speed Programmatic Static Deobfuscation (`0.2s`)

Rather than relying on heavy, CPU-bound local model tracing, AppPoet ships a Python-based static deobfuscation module in [deobfuscator.py](file:///c:/Users/rravi/AppPoet/AppPoet_Project/src/orchestrator/deobfuscator.py). It walks the caller bytecode that references cryptography (`Cipher;->doFinal`), reflection (`Method;->invoke`), or string-decryption utilities, and programmatically reverses:

* **Base64 IOC primitives:** decodes obfuscated URLs and API endpoints.
* **Character shifts and XOR masks:** decrypts custom string-packing methods on the fly.

### 2. 🧠 Cloud-Accelerated Groq LLM Inference

AppPoet uses the **Groq Cloud API** (`llama-3.1-8b-instant`) in [qwen_interface.py](file:///c:/Users/rravi/AppPoet/AppPoet_Project/src/llm_engine/qwen_interface.py) to summarize the permission, API, and network profiles. Groq's high-speed LPU inference processes all views in **under 2 seconds total** and returns coherent semantic diagnostic paragraphs, avoiding the local CPU bottleneck.

### 3. 🎯 Authentic Neural Network Classification (`1152-Dim MLP`)

The 1152-dimensional fused dense embedding is passed directly to the custom pre-trained PyTorch MLP model in [pytorch_mlp.py](file:///c:/Users/rravi/AppPoet/AppPoet_Project/src/classifier/pytorch_mlp.py).

* **Input dimension:** `1152` (the concatenation of `3` views × `384`-dimensional `all-MiniLM-L6-v2` semantic embeddings).
* **Architecture:** `1152` input neurons $\rightarrow$ `512` hidden units $\rightarrow$ `1` output neuron with a Sigmoid activation.
* **Output:** the raw probability produced by the model, representing the risk level. **Demo-mode overrides are fully disabled**, so predictions reflect the actual model output.

## 📂 Project Directory Structure

```
AppPoet_Project/
├── .env                          # Secure local configuration (Git-ignored)
├── .gitignore                    # Defines untracked local resources
├── README.md                     # Visual system documentation
├── report.txt                    # Quick-access diagnostic output of last run
├── run_apppoet.py                # Interactive script launcher
├── data/
│   ├── raw_apks/                 # Storage for raw APK binaries
│   └── temp_decoded/             # Temporary decompiled smali sources
├── models/
│   └── apppoet_mlp_weights.pth   # Pre-trained PyTorch MLP weights
├── src/
│   ├── classifier/
│   │   ├── pytorch_mlp.py        # PyTorch Neural Network Classifier
│   │   └── text_embedder.py      # SentenceTransformer Vector Embedder
│   ├── extraction/
│   │   ├── androguard_parser.py  # Native DEX parsing (APIs, URLs, and permissions)
│   │   └── apktool_decoder.py    # Decompilation wrapping using Apktool
│   ├── llm_engine/
│   │   ├── prompt_templates.py   # Cohesive security prompt structures
│   │   └── qwen_interface.py     # Groq API Client and env manager
│   └── orchestrator/
│       ├── apk_inference.py      # Core inference pipeline orchestrator
│       └── deobfuscator.py       # Native programmatic static deobfuscator
```

## 🚀 Installation and Configuration

### Prerequisites

* **Python 3.9+** installed on the system.
* **Apktool** available on the system path (used for resource decoding).
* A **Groq Cloud API key** (available for free at [console.groq.com](https://console.groq.com)).

### 1. Clone the Repository

```
git clone https://github.com/Ravindra1t/AppPoet-Local-Optimized.git
cd AppPoet-Local-Optimized
```

### 2. Configure Environment Variables (Secure Method)

Create a file named `.env` in the root of the `AppPoet_Project` folder. AppPoet is preconfigured to load your credentials from this file, which keeps them out of the Git history:

```
GROQ_API_KEY=gsk_your_groq_api_key_here
```

### 3. Install Dependencies

```
pip install -r requirements.txt
```

## 💻 Usage

Run the main pipeline launcher to analyze any Android APK:

```
py .\run_apppoet.py
```

### Step-by-Step Analysis Flow

1. **APK path input:** the launcher prompts for the path to the target APK.
2. **Bytecode parsing:** programmatically extracts permissions, native components, and sensitive API cross-references (XREFs).
3. **Static deobfuscation:** natively decrypts Base64, XOR, and shifted strings in Smali blocks in about `0.2` seconds.
4. **Semantic synthesis:** queries Groq to build a natural-language behavioral review for each view.
5. **Neural classification:** generates the `1152`-dimensional fused dense embedding and evaluates the actual network classification output.
6. **Report generation:** saves a detailed `report.txt` file in the root directory and a timestamped structured report under `reports/`.

## 📊 Sample Diagnostic Report Output

When execution completes, AppPoet outputs a clear, human-readable report:

```
======================================================================
APPPOET HEURISTIC DIAGNOSTIC REPORT
======================================================================
Analysis Type: Hybrid Inference (Groq API Cloud)
Timestamp: 2026-05-18 11:04:12
Target APK: sample_app.apk
======================================================================
NEURAL NETWORK CLASSIFICATION
------------------------------
MLP Architecture: 1152 -> 512 -> 1 (Sigmoid)
Input Dimensions: 1152 (concatenated 3-view embeddings)

Prediction Results:
  * Binary Verdict: MALICIOUS
  * Confidence Score: 89.62%
  * Threshold: 0.5 (>=0.5 = MALICIOUS, <0.5 = BENIGN)

Risk Assessment: [HIGH] HIGH RISK - Likely Malicious
======================================================================
FULL LLM-GENERATED BEHAVIORAL ANALYSIS
======================================================================
**App Behavioral Profile**
The application operates as a utility but requests highly critical device permissions. It establishes sockets to background IP addresses and dynamically loads third-party modules...

**Threat Indicators**
1. Bytecode execution reveals hidden reflection targeting Landroid/telephony/SmsManager.
2. Multiple Base64 encrypted URLs were statically decoded pointing to C2 endpoints...
======================================================================
EXTRACTED FEATURE SUMMARY
======================================================================
Permission View: ['android.permission.INTERNET', 'android.permission.READ_PHONE_STATE', ...]
API View (Restricted APIs): ['Landroid/telephony/TelephonyManager;->getDeviceId', 'Ljava/lang/reflect/Method;->invoke', ...]
Component View: Activities: 4 | Services: 2 | Receivers: 1
======================================================================
END OF REPORT
======================================================================
```

## 🔒 Security and Best Practices

* **Credential protection:** the `.env` configuration file is listed in `.gitignore`. **Do not** commit the `.env` file or hard-code API keys in source code.
* **Static isolation:** analysis is performed statically. The target APK is decoded but **never executed**, which keeps your host isolated from runtime malware vectors.
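
## 🧪 Illustrative Sketch: Static String Decoding

The static deobfuscation pass is described as reversing Base64-encoded IOCs, XOR masks, and character shifts. The snippet below is a minimal sketch of those three primitives using only the Python standard library. It is not taken from `deobfuscator.py`. The function names (`try_base64`, `xor_decode`, `shift_decode`, `recover_strings`) are hypothetical, and the single-byte XOR key and fixed shift are simplifying assumptions, so the repository's actual implementation may differ.

```python
import base64
import binascii
from typing import Dict, Iterable, Optional


def try_base64(candidate: str) -> Optional[str]:
    """Return the plaintext if `candidate` is strict Base64 that decodes
    to non-empty printable UTF-8 text, otherwise None."""
    try:
        raw = base64.b64decode(candidate, validate=True)
        text = raw.decode("utf-8")
    except (binascii.Error, ValueError):
        # Not valid Base64, or the decoded bytes are not valid UTF-8.
        return None
    return text if text and text.isprintable() else None


def xor_decode(data: bytes, key: int) -> bytes:
    """Undo a single-byte XOR mask (assumption: the key is one byte)."""
    return bytes(b ^ key for b in data)


def shift_decode(text: str, shift: int) -> str:
    """Undo a fixed character shift by subtracting `shift` from each code point."""
    return "".join(chr(ord(c) - shift) for c in text)


def recover_strings(literals: Iterable[str]) -> Dict[str, str]:
    """Map each string literal that decodes cleanly from Base64 to its plaintext."""
    recovered: Dict[str, str] = {}
    for literal in literals:
        decoded = try_base64(literal)
        if decoded is not None:
            recovered[literal] = decoded
    return recovered


if __name__ == "__main__":
    print(try_base64("aGVsbG8="))      # hello
    print(shift_decode("kwws", 3))     # http
    masked = xor_decode(b"getDeviceId", 0x5A)
    print(xor_decode(masked, 0x5A))    # b'getDeviceId'
```

Because XOR is its own inverse, applying `xor_decode` twice with the same key returns the original bytes. A real pass would also need to locate candidate literals and keys in the Smali code, which this sketch leaves out.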
