# SNN-IDS: Hardware-Verified SNN-Equivalent Intrusion Detection on a General-Purpose MCU NPU
[DOI: 10.5281/zenodo.18906060](https://doi.org/10.5281/zenodo.18906060)
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0)
[Python](https://www.python.org/)
[Target: STM32N6570-DK](https://www.st.com/en/evaluation-tools/stm32n6570-dk.html)
[Key Results](#key-results)
**The first hardware-verified deployment of an INT8-quantized ANN (approximately equivalent to a T=1 SNN) for real-time network intrusion detection on a general-purpose MCU NPU.**
## Key Results
| Metric | NSL-KDD (5-class) | UNSW-NB15 (10-class) |
|--------|-------------------|----------------------|
| **Overall accuracy** | 78.86 ± 1.32% | 64.75 ± 0.61% |
| **Macro F1** | 59.20 ± 2.80% | 40.29 ± 0.90% |
| **Inference latency** | 0.46 ms | **0.29 ms** |
| **NPU execution** | 5 HW + 1 Hyb + 2 SW | 4 HW (100% NPU) |
| **Flash / RAM** | 137.7 KB / 1.25 KB | 120.6 KB / 0.50 KB |
| **Evaluation** | 10 seeds, mean ± std | 10 seeds, mean ± std |
**Target board:** STM32N6570-DK (ARM Cortex-M55 @ 800 MHz + Neural-ART NPU, 600 GOPS INT8)
## Novelty
To our knowledge, this work is:
1. **The first IDS deployment on an ARM Cortex-M NPU (Neural-ART)**: prior MCU-class IDS work targeted the MAX78000 (Ngo et al., 2022), an AI-specialized MCU with a fixed-function CNN accelerator
2. **The first empirical validation of T=1 SNN-ANN equivalence on a commodity NPU chip**: final-prediction agreement between the FP32 and INT8 models reaches 99%
3. **The first QCFS activation compiled for a Neural-ART target**: the Floor operator is confirmed to fall back to CPU execution, adding 17.6% latency
## Theoretical Basis
A single-timestep (T=1) SNN with zero initial membrane potential produces a forward pass that is approximately equivalent to an INT8-quantized ANN with ReLU activations:
```
T=1 SNN inference ≈ INT8 Quantized ANN inference
```
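A minimal numpy sketch of this equivalence (illustrative, not code from the repo): at L=1, a shift-free QCFS-style activation coincides exactly with one integrate-and-fire step that starts from zero membrane potential. The common +1/2 pre-shift of Bu et al.'s QCFS is deliberately omitted here to match the zero-initial-potential case stated above.

```python
import numpy as np

def qcfs(x, lam=1.0, L=4):
    # shift-free QCFS-style activation: a clipped ReLU quantized to L levels
    # (the +1/2 pre-shift of Bu et al. is omitted to match the
    #  zero-initial-membrane-potential case described above)
    return lam / L * np.clip(np.floor(x * L / lam), 0, L)

def t1_if(x, theta=1.0):
    # one integrate-and-fire step, membrane starts at 0:
    # v = 0 + x; emit a spike of magnitude theta iff v >= theta
    return theta * (x >= theta).astype(x.dtype)

x = np.linspace(-1.0, 2.0, 13)
# at L=1 the quantized activation and the T=1 IF step coincide exactly
assert np.allclose(t1_if(x, theta=1.0), qcfs(x, lam=1.0, L=1))
```

Increasing L refines the staircase toward a clipped ReLU, which is what the INT8 quantizer approximates as well.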
Key references:
- Bu et al., "Optimal ANN-SNN Conversion" (QCFS), **ICLR 2022**
- Jiang et al., "Unified Optimization Framework", **ICML 2023**
- Bu et al., "Inference-Scale Complexity", **CVPR 2025**
## Architecture
```
IDS_MLP: Linear(d→256) → BN → σ → Linear(256→256) → BN → σ → Linear(256→128) → BN → σ → Linear(128→C)
```
- `d` = 41 (NSL-KDD) or 34 (UNSW-NB15), `C` = 5 or 10
- `σ` = ReLU (Path B) or QCFS with L=4 (Path A)
- BatchNorm is fused into the preceding Linear at export time → the ONNX graph contains only `Gemm` + `Relu`
- 111,365 (NSL-KDD) or 110,218 (UNSW-NB15) parameters
- Inverse-frequency class weighting to handle the severe class imbalance
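The BatchNorm fusion mentioned above folds each inference-mode BN into the preceding Linear, which is what leaves only `Gemm` + `Relu` in the exported graph. A small numpy sketch with randomly generated stand-in parameters (the repo's actual export lives in `src/export_onnx.py`):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 41, 256  # NSL-KDD input dim -> first hidden layer

# Linear parameters and inference-mode BatchNorm statistics (random stand-ins)
W = rng.normal(size=(d_out, d_in))
b = rng.normal(size=d_out)
gamma, beta = rng.normal(size=d_out), rng.normal(size=d_out)
mu = rng.normal(size=d_out)
var, eps = rng.uniform(0.5, 2.0, size=d_out), 1e-5

# BN(Wx + b) = gamma * (Wx + b - mu) / sqrt(var + eps) + beta = W'x + b'
s = gamma / np.sqrt(var + eps)
W_fused = s[:, None] * W
b_fused = s * (b - mu) + beta

x = rng.normal(size=d_in)
y_ref = gamma * (W @ x + b - mu) / np.sqrt(var + eps) + beta
# a single Gemm with the fused weights reproduces Linear followed by BN
assert np.allclose(W_fused @ x + b_fused, y_ref)
```

Because the fusion is exact (a reparameterization, not an approximation), it costs no accuracy and removes one op per layer before quantization.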
## NPU Hardware Benchmarks
All models were benchmarked on the STM32N6570-DK via ST Edge AI Developer Cloud v4.0.0:
| Model | Dataset | Inference | HW | Hyb | SW | Flash | RAM |
|-------|---------|-----------|---:|----:|---:|-------|-----|
| ReLU FP32 (CPU) | NSL-KDD | 1.24 ms | 0 | 0 | 11 | 466.4 KB | 2.17 KB |
| **ReLU INT8 (NPU)** | **NSL-KDD** | **0.46 ms (2.7×)** | 5 | 1 | 2 | 137.7 KB | 1.25 KB |
| ReLU FP32 (CPU) | UNSW-NB15 | 1.23 ms | 0 | 0 | 11 | 461.9 KB | 2.14 KB |
| **ReLU INT8 (NPU)** | **UNSW-NB15** | **0.29 ms (4.2×)** | 4 | 0 | 0 | 120.6 KB | 0.50 KB |
| QCFS INT8 | NSL-KDD | 0.54 ms | 13 | 1 | 14 | 138.0 KB | 2.00 KB |
Key findings:
- **The NPU delivers a 2.7–4.2× speedup over CPU-only execution of the same model**
- **The Floor operator is not supported by the Neural-ART NPU**: it falls back to the CPU and executes as `Floor(float)`
- **ReLU INT8 is the optimal NPU path**: every Gemm+Relu pair runs on the NPU, with no CPU fallback for activations
- **Tree-based models (RF, XGBoost) cannot run on the STM32N6**: `TreeEnsembleClassifier` is rejected by ST Edge AI Core ("NOT IMPLEMENTED")
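To illustrate why the Gemm+Relu path stays NPU-friendly, here is a hedged numpy sketch of symmetric per-tensor INT8 post-training quantization: the matmul accumulates in int32 and needs only one rescale before ReLU, with no float-domain op (like Floor) inside the layer. The scheme ST Edge AI Core actually applies (per-channel scales, zero points, calibration) may differ.

```python
import numpy as np

def quantize_int8(t):
    # symmetric per-tensor INT8 quantization (illustrative PTQ, not ST's exact scheme)
    scale = float(np.max(np.abs(t))) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 4)).astype(np.float32)
x = rng.normal(size=4).astype(np.float32)

qW, sW = quantize_int8(W)
qx, sx = quantize_int8(x)

# integer Gemm accumulated in int32, rescaled once, then ReLU:
# the whole layer is integer arithmetic plus a single multiply
acc = qW.astype(np.int32) @ qx.astype(np.int32)
y_int8 = np.maximum(acc * (sW * sx), 0.0)
y_fp32 = np.maximum(W @ x, 0.0)
assert np.allclose(y_int8, y_fp32, atol=0.2)
```

A QCFS layer, by contrast, introduces an explicit Floor in the float domain, which is the op the compiler reports as a CPU fallback.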
## Reproduction
```
# Setup
python3 -m venv snn-ids-env
source snn-ids-env/bin/activate
pip install -r requirements.txt
# Download datasets
mkdir -p data
# NSL-KDD: place KDDTrain+.txt and KDDTest+.txt in data/
# UNSW-NB15: place the parquet files in data/
# Run experiments
make train # Train ReLU model (single seed, NSL-KDD)
make multiseed # 10-seed experiment (NSL-KDD, ReLU vs QCFS)
make unsw # 10-seed experiment (UNSW-NB15)
make unsw-export # ONNX + INT8 for UNSW-NB15
make tree-baseline # RF + XGBoost baselines
make layerwise # FP32 vs INT8 layer-wise analysis
make quant-ablation # 24-config quantization ablation
make paper # Compile LaTeX paper
# NPU validation (requires a browser)
# Upload models/*.onnx to https://stedgeai-dc.st.com
# Select target: STM32N6570-DK → Benchmark
```
## Project Structure
```
├── src/
│ ├── train.py # ReLU model training (Path B)
│ ├── train_qcfs.py # QCFS model training (Path A)
│ ├── export_onnx.py # ONNX export with BN fusion (NSL-KDD)
│ ├── export_qcfs_onnx.py # QCFS ONNX export
│ ├── export_unsw_onnx.py # ONNX export + INT8 PTQ (UNSW-NB15)
│ ├── quantize.py # INT8 PTQ (ReLU, NSL-KDD)
│ ├── quantize_qcfs.py # INT8 PTQ (QCFS)
│ ├── quantize_ablation.py # 24-config quantization ablation
│ ├── experiment_multiseed.py # 10-seed experiment (NSL-KDD)
│ ├── experiment_unsw.py # 10-seed experiment (UNSW-NB15)
│ ├── tree_baseline.py # RF + XGBoost baselines
│ └── layerwise_analysis.py # FP32 vs INT8 layer-wise comparison
├── results/
│ ├── multiseed_experiment.json # NSL-KDD 10-seed results
│ ├── unsw_multiseed_experiment.json # UNSW-NB15 10-seed results
│ ├── tree_baselines.json # RF + XGBoost results
│ ├── layerwise_analysis.json # FP32 vs INT8 analysis
│ ├── quantize_ablation.json # Quantization ablation
│ └── related_work_table.json # Related work comparison
├── paper/ # LaTeX paper
├── configs/
│ └── default.yaml # Experiment configuration
├── docs/
│ ├── ADR-001-SNN-NPU-GoNoGo-Verification.md
│ └── SNN_RTOS_Telecom_Analysis.md
├── CITATION.cff # Citation metadata
├── requirements.txt # Pinned dependencies
├── Makefile # One-command reproducibility
└── LICENSE # Apache 2.0
```
## Citation
```
@software{tsai2026snnids,
title = {SNN-IDS: Deploying SNN-Equivalent Intrusion Detection on a Commodity MCU NPU},
author = {Tsai, Hsiu-Chi},
year = {2026},
url = {https://github.com/thc1006/SpikeIDS-MCU},
doi = {10.5281/zenodo.18906060},
version = {1.0.0}
}
```
## References
- **QCFS Activation**: Bu et al., "Optimal ANN-SNN Conversion for High-accuracy and Ultra-low-latency Spiking Neural Networks," *ICLR 2022*.
- **Unified ANN-SNN Framework**: Jiang et al., "A Unified Optimization Framework of ANN-SNN Conversion," *ICML 2023*.
- **Inference-Scale Complexity**: Bu et al., "Inference-Scale Complexity in ANN-SNN Conversion," *CVPR 2025*.
- **NSL-KDD Dataset**: Tavallaee et al., *IEEE CISDA*, 2009.
- **UNSW-NB15 Dataset**: Moustafa & Slay, *MilCIS*, 2015.
- **Neural-ART NPU**: STMicroelectronics, STM32N6 Application Note UM3225.
- **HH-NIDS (MAX78000)**: Ngo et al., *Future Internet* 15(1):9, 2022.
- **Akida IDS**: Zahm et al., *CSIAC*, 2024.
## License
Apache License 2.0. See [LICENSE](LICENSE).
Tags: ARM Cortex-M55, INT8 quantization, MCU, MLP, Neural-ART, NPU, NSL-KDD, QCFS activation, SNN, SNN-ANN equivalence, STM32N6, T=1, UNSW-NB15, intrusion detection system, real-time inference, embedded security, microcontrollers, machine learning, deep learning, IoT security, hardware acceleration, network traffic analysis, spiking neural networks, edge computing