pawel-kaczmarek/The-A-Files

GitHub: pawel-kaczmarek/The-A-Files

一个数字音频隐写与语音水印研究工具包，集成多种嵌入算法、语音质量指标和鲁棒性攻击算子，支持端到端的可复现实验。

Stars: 27 | Forks: 6

# A-Files ![logo.png](https://static.pigsec.cn/wp-content/uploads/repos/cas/3e/3ee601c6cdfe28d2a70b499316ac192f054641f3f63c2f66ef4633e5288e1284.png) ## 🧭 目录 * [🔎 关于](#about) * [📦 安装](#install) * [⚡ 用法](#usage) * [🔐 隐写算法](#steganography-algorithms) * [📊 指标](#metrics) * [🤖 基于 AI](#ai-based) * [🏛️ 语音混响](#speech-reverberation) * [🗣️ 语音可懂度](#speech-intelligibility) * [🎧 语音质量](#speech-quality) * [🧪 攻击](#attacks) * [📚 参考文献](#references) * [📄 文章](#articles) * [🔗 软件资源](#links) * [⚖️ 许可证](#licence) * [🧩 依赖项](#dependencies) * [👥 作者](#authors) ## 🔎 关于 The A-Files 是一个面向研究的工具包，用于数字音频隐写和语音信号水印。它将嵌入问题建模为 payload 容量、感知透明度、抵抗信号处理的鲁棒性以及比特级解码可靠性之间的权衡。一个典型的实验从载体波形 `x[n]` 开始，嵌入二进制 payload `b` 生成隐秘波形 `y[n]`，可选择应用信道或攻击变换，然后将解码后的 payload 与原始消息进行比较，同时测量由嵌入过程引入的声学失真。该项目提供： * 用于时域、变换域、扩频、回声/相位和神经方法的嵌入和解码方法 * 客观语音质量、可懂度、混响和基于 AI 的评估指标 * 用于在常见信号处理失真下进行鲁棒性测试的音频攻击算子 The A-Files functions

###### 加载音频作为离散时间波形连同采样率和格式元数据一起加载。该包支持常见的未压缩和无损语音容器，如 WAV 和 FLAC，包括内置的 VCTK 和 LibriSpeech 样本，以便进行可重复的实验。Payload 表示为二进制消息向量，这使得每种方法都能公开独立于存储格式的显式 `encode` 和 `decode` 行为。 ###### 容量容量被评估为可以在有限音频片段中嵌入的可恢复 payload 比特数。已实现的方法包括低复杂度 LSB 替换、相位和回声编码、DCT/DWT/LWT 域嵌入、 patchwork 水印、范数空间和基于 SVD 的方案、自适应 AAC/STC 嵌入、无线信道 DWT-LSB，以及固定解码器对抗隐写。这些算法在压缩、滤波、重采样和同步变化下展现出不同的容量-失真分布和不同的失效模式。 ###### 透明度透明度通过使用客观语音指标比较载体和隐秘信号来测量。该库包括基于能量和频谱的测量方法，如 SNR、分段 SNR、频率加权分段 SNR、对数似然比、加权谱斜率、倒谱距离、Mel 倒谱距离和 SI-SDR。它还包括感知和面向任务的测量方法，如 PESQ、STOI、CSII、NCM、wSTMI、STGI、SRMR、BSD、复合语音增强指标，以及 MOSNet。这些指标共同量化了失真、可懂度损失、与混响相关的退化，以及预测的平均意见分。 ###### 鲁棒性鲁棒性通过在嵌入后应用受控的信号变换并测量 payload 是否仍能被解码来进行测试。攻击集包括加性噪声、幅度缩放、频率裁剪、滤波、裁剪、重采样、量化、音高移动和时间拉伸。这实现了在类似信道退化和对抗性移除尝试下的方法比较，使用解码成功率和客观指标变化作为主要证据。 *由一些很棒的 [GitHub 仓库](#links) 提供支持* ## 📦 安装安装最新的 PyPI 发行版： ``` pip install the-a-files ``` 可选的 AI 功能（如 `FgasMethod` 和 `MosNetMetric`）需要 TensorFlow： ``` pip install "the-a-files[ai]" ``` ## ⚡ 用法运行内置的评估工作流或将该包作为 `taf` 导入： ``` taf-eval direct-no-metrics taf-eval full python -c "from taf.methods.factory import SteganographyMethodFactory; from taf.models.types import MethodType; print(SteganographyMethodFactory.get(16000, MethodType.LSB_METHOD).type())" ``` ## 🔐 隐写算法已实现的方法列表： | 序号 | 名称 | 简要描述 | 参考 | |-----|---------------------------------------------|-------------------------------------------------------------------------|-------------------| | 1 | `LsbMethod.py` | 标准 LSB 编码 | [[1]](#articles) | | 2. | `EchoMethod.py` | 具有单个回声核的回声隐藏 | [[1]](#articles) | | 3. | `PhaseCodingMethod.py` | 相位编码技术 | [[1]](#articles) | | 4. | `ImprovedPhaseCodingMethod.py` | 改进的相位编码技术 | [[19]](#articles) | | 5. | `DctDeltaLsbMethod.py` | DCT delta LSB 嵌入 | [[1]](#articles) | | 6. | `DwtLsbMethod.py` | 基于 DWT 的 LSB 嵌入 | [[1]](#articles) | | 7. | `DctB1Method.py` | 第一频带 DCT 系数嵌入，DCT-b1 | [[2]](#articles) | | 8. | `PatchworkMultilayerMethod.py` | 基于 Patchwork 的多层水印 | [[3]](#articles) | | 9. | `NormSpaceMethod.py` | 范数空间音频水印 | [[4]](#articles) | | 10. | `FsvcMethod.py` | 频率奇异值系数修改，FSVC | [[5]](#articles) | | 11. | `DsssMethod.py` | 直接序列扩频嵌入 | [[6]](#articles) | | 12. | `BlindSvdMethod.py` | 使用熵和对数极坐标变换的盲 SVD 嵌入 | [[20]](#articles) | | 13. | `PrimeFactorInterpolatedMethod.py` | 素因子插值嵌入 | [[21]](#articles) | | 14. | `LwtMethod.py` | 基于 LWT 的嵌入 | [[22]](#articles) | | 15. | `ForegroundBackgroundSegmentationMethod.py` | 前景-背景分割 LSB，FBS-LSB | [[23]](#articles) | | 16. | `FgasMethod.py` | 具有对抗性扰动生成的固定解码器网络，FGAS | [[24]](#articles) | | 17. | `AacStcMethod.py` | 通过 AAC 感知残差和 syndrome-trellis codes 的自适应 +-1 LSB | [[25]](#articles) | | 18. | `WirelessDwtLsbMethod.py` | 无线信道 DWT LSB 消息嵌入 | [[27]](#articles) | 每个方法都扩展了抽象类 ```SteganographyMethod``` ``` from abc import abstractmethod, ABC from typing import List import numpy as np class SteganographyMethod(ABC): @abstractmethod def encode(self, data: np.ndarray, message: List[int]) -> np.ndarray: ... @abstractmethod def decode(self, data_with_watermark: np.ndarray, watermark_length: int) -> List[int]: ... @abstractmethod def type(self) -> str: ... ``` ## 📊 指标已实现的指标列表： #### 🤖 基于 AI | 序号 | 名称 | 简要描述 | 参考 | |-----|-------------------|------------------------------------------------------------|-------------------| | 1. | `MosNetMetric.py` | 用于语音转换评估的 MOSNet 深度学习客观指标 | [[16]](#articles) | #### 🏛️ 语音混响 | 序号 | 名称 | 简要描述 | 参考 | |-----|-----------------|-------------------------------------------------------|-------------------| | 1. | `BsdMetric.py` | Bark 谱失真，BSD | [[7]](#articles) | | 2. | `SrmrMetric.py` | 语音到混响调制能量比，SRMR | [[10]](#articles) | #### 🗣️ 语音可懂度 | 序号 | 名称 | 简要描述 | 参考 | |-----|-----------------|--------------------------------------------------|------------------| | 1. | `CsiiMetric.py` | 相干性和语音可懂度指数，CSII | [[7]](#articles) | | 2. | `NcmMetric.py` | 归一化协方差度量，NCM | [[7]](#articles) | | 3. | `StoiMetric.py` | 短时客观可懂度，STOI | [[9]](#articles) | #### 🎧 语音质量 | 序号 | 名称 | 简要描述 | 参考 | |-----|--------------------------------|-----------------------------------------------------------------------|-------------------| | 1. | `SnrMetric.py` | 信噪比，SNR | [[12]](#articles) | | 2. | `MelCepstralDistanceMetric.py` | 用于客观语音质量评估的 Mel 倒谱距离测量方法 | [[11]](#articles) | | 3. | `SnrSegMetric.py` | 分段信噪比，SNRseg | [[7]](#articles) | | 4. | `FWSnrSegMetric.py` | 频率加权分段 SNR，fwSNRseg | [[7]](#articles) | | 5. | `CepstrumDistanceMetric.py` | 倒谱距离客观语音质量测量，CD | [[7]](#articles) | | 6. | `LlrMetric.py` | 对数似然比，LLR | [[7]](#articles) | | 7. | `WssMetric.py` | 加权谱斜率，WSS | [[7]](#articles) | | 8. | `PesqMetric.py` | 语音质量感知评估，PESQ | [[8]](#articles) | | 9. | `CsigMetric.py` | 复合语音信号失真评级，Csig | [[13]](#articles) | | 10. | `CovlMetric.py` | 复合整体信号质量评级，Covl | [[13]](#articles) | | 11. | `CbakMetric.py` | 复合背景噪声侵入评级，Cbak | [[13]](#articles) | | 12. | `WstmiMetric.py` | 加权时频调制指数，wSTMI | [[14]](#articles) | | 13. | `StgiMetric.py` | 时频 Glimpsing 指数，STGI | [[15]](#articles) | | 14. | `SisdrMetric.py` | 尺度不变信号失真比，SI-SDR | [[17]](#articles) | | 15. | `BSSEvalMetric.py` | BSSEval v4 信号分离评估指标 | [[18]](#articles) | 每个指标都扩展了抽象类 `Metric`。 ``` from abc import ABC, abstractmethod from numbers import Number import numpy as np class Metric(ABC): @abstractmethod def calculate(self, samples_original: np.ndarray, samples_processed: np.ndarray, fs: int, frame_len: float = 0.03, overlap: float = 0.75) -> Number | np.ndarray: ... @abstractmethod def name(self) -> str: ... ``` ## 🧪 攻击音频样本攻击列表： * 低通滤波器 * 加性噪声 * 频率滤波器 * 翻转随机样本 * 裁剪随机样本 * 重采样（降采样、升采样） * 幅度缩放 * 音高移动 * 时间拉伸 ## 📚 参考文献 #### 📄 文章 [1] A. A. Alsabhany, A. H. Ali, F. Ridzuan, A. H. Azni, and M. R. Mokhtar, "Digital Audio Steganography: Systematic Review, Classification, and Analysis of the Current State of the Art," *Computer Science Review*, vol. 38, article 100316, 2020. [doi:10.1016/j.cosrev.2020.100316](https://doi.org/10.1016/j.cosrev.2020.100316) [2] H. T. Hu and L. Y. Hsu, "Robust, Transparent and High-Capacity Audio Watermarking in DCT Domain," *Signal Processing*, vol. 109, pp. 226-235, 2015. [doi:10.1016/j.sigpro.2014.11.011](https://doi.org/10.1016/j.sigpro.2014.11.011) [3] I. Natgunanathan, Y. Xiang, G. Hua, G. Beliakov, and J. Yearwood, "Patchwork-Based Multilayer Watermarking," *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 25, no. 11, pp. 2176-2187, 2017. [doi:10.1109/TASLP.2017.2749001](https://doi.org/10.1109/TASLP.2017.2749001) [4] S. Saadi, A. Merrad, and A. Benziane, "Novel Secured Scheme for Blind Audio/Speech Norm-Space Watermarking by Arnold Algorithm," *Signal Processing*, vol. 154, pp. 74-86, 2019. [doi:10.1016/j.sigpro.2018.08.011](https://doi.org/10.1016/j.sigpro.2018.08.011) [5] J. Zhao, T. Zong, Y. Xiang, L. Gao, W. Zhou, and G. Beliakov, "Desynchronization Attacks Resilient Watermarking Method Based on Frequency Singular Value Coefficient Modification," *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 29, pp. 2282-2295, 2021. [doi:10.1109/TASLP.2021.3092555](https://doi.org/10.1109/TASLP.2021.3092555) [6] R. M. Nugraha, "Implementation of Direct Sequence Spread Spectrum Steganography on Audio Data," in *Proceedings of the 2011 International Conference on Electrical Engineering and Informatics (ICEEI)*, 2011. [doi:10.1109/ICEEI.2011.6021662](https://doi.org/10.1109/ICEEI.2011.6021662) [7] P. C. Loizou, *Speech Enhancement: Theory and Practice*, 2nd ed. CRC Press, 2013. [doi:10.1201/b14529](https://doi.org/10.1201/b14529) [8] M. Wang, C. Boeddeker, R. G. Dantas, and A. Seelan, "PESQ (Perceptual Evaluation of Speech Quality) Wrapper for Python Users," Zenodo, 2022. [doi:10.5281/zenodo.6549559](https://doi.org/10.5281/zenodo.6549559) [9] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech," in *Proceedings of ICASSP 2010*, Dallas, TX, USA, 2010. [doi:10.1109/ICASSP.2010.5495701](https://doi.org/10.1109/ICASSP.2010.5495701) [10] T. H. Falk, C. Zheng, and W.-Y. Chan, "A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech," *IEEE Transactions on Audio, Speech, and Language Processing*, vol. 18, no. 7, pp. 1766-1774, 2010. [doi:10.1109/TASL.2010.2052247](https://doi.org/10.1109/TASL.2010.2052247) [11] R. Kubichek, "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment," in *Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing*, Victoria, BC, Canada, vol. 1, pp. 125-128, 1993. [doi:10.1109/PACRIM.1993.407206](https://doi.org/10.1109/PACRIM.1993.407206) [13] Y. Hu and P. C. Loizou, "Evaluation of Objective Quality Measures for Speech Enhancement," *IEEE Transactions on Audio, Speech, and Language Processing*, vol. 16, no. 1, pp. 229-238, 2008. [doi:10.1109/TASL.2007.911054](https://doi.org/10.1109/TASL.2007.911054) [14] A. Edraki, W.-Y. Chan, J. Jensen, and D. Fogerty, "Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis," *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 29, pp. 210-225, 2021. [doi:10.1109/TASLP.2020.3039929](https://doi.org/10.1109/TASLP.2020.3039929) [15] A. Edraki, W.-Y. Chan, J. Jensen, and D. Fogerty, "A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction," in *Proceedings of Interspeech 2021*, 2021. [doi:10.21437/Interspeech.2021-605](https://doi.org/10.21437/Interspeech.2021-605) [16] C.-C. Lo, S.-W. Fu, W.-C. Huang, X. Wang, J. Yamagishi, Y. Tsao, and H.-M. Wang, "MOSNet: Deep Learning Based Objective Assessment for Voice Conversion," *arXiv preprint*, arXiv:1904.08352, 2019. [arXiv:1904.08352](https://arxiv.org/abs/1904.08352) [17] J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, "SDR - Half-Baked or Well Done?," in *Proceedings of ICASSP 2019*, 2019. [doi:10.1109/ICASSP.2019.8683855](https://doi.org/10.1109/ICASSP.2019.8683855) [18] F.-R. Stöter, A. Liutkus, and N. Ito, "The 2018 Signal Separation Evaluation Campaign," in *Latent Variable Analysis and Signal Separation*, LVA/ICA 2018, pp. 293-305, 2018. [doi:10.5281/zenodo.3376621](https://doi.org/10.5281/zenodo.3376621) [19] G. Yang, "An Improved Phase Coding Audio Steganography Algorithm," *arXiv preprint*, arXiv:2408.13277, 2024. [doi:10.48550/arXiv.2408.13277](https://doi.org/10.48550/arXiv.2408.13277) [20] P. K. Dhar and T. Shimamura, "Blind SVD-Based Audio Watermarking Using Entropy and Log-Polar Transformation," *Journal of Information Security and Applications*, vol. 20, pp. 74-83, 2015. [doi:10.1016/j.jisa.2014.10.007](https://doi.org/10.1016/j.jisa.2014.10.007) [21] F. A. Adhiyaksa, T. Ahmad, A. M. Shiddiqi, B. J. Santoso, H. Studiawan, and B. A. Pratomo, "Reversible Audio Steganography Using Least Prime Factor and Audio Interpolation," in *Proceedings of the 2021 International Seminar on Machine Learning, Optimization, and Data Science (ISMODE)*, pp. 97-102, 2022. [doi:10.1109/ISMODE53584.2022.9743066](https://doi.org/10.1109/ISMODE53584.2022.9743066) [22] S. Mushtaq, S. Mehraj, and S. A. Parah, "Blind and Robust Watermarking Framework for Audio Signals," in *Proceedings of the 2024 11th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO)*, pp. 1-5, 2024. [doi:10.1109/ICRITO61523.2024.10522195](https://doi.org/10.1109/ICRITO61523.2024.10522195) [23] J. Wang and K. Wang, "A Novel Audio Steganography Based on the Segmentation of the Foreground and Background of Audio," *Computers & Electrical Engineering*, vol. 117, article 109247, 2025. [doi:10.1016/j.compeleceng.2024.109247](https://doi.org/10.1016/j.compeleceng.2024.109247) [24] J. Yan, Y. Cheng, Z. Yin, X. Zhang, S. Wang, T. Sun, and X. Jiang, "FGAS: Fixed Decoder Network-Based Audio Steganography with Adversarial Perturbation Generation," *arXiv preprint*, arXiv:2505.22266, 2025. [arXiv:2505.22266](https://arxiv.org/abs/2505.22266) [25] W. Luo, Y. Zhang, and H. Li, "Adaptive Audio Steganography Based on Advanced Audio Coding and Syndrome-Trellis Coding," in *Digital Forensics and Watermarking, IWDW 2017*, Lecture Notes in Computer Science, vol. 10431, pp. 177-186. Springer, 2017. [doi:10.1007/978-3-319-64185-0_14](https://doi.org/10.1007/978-3-319-64185-0_14) [27] A. A. Hamdi, A. A. Eyssa, M. I. Abdalla, M. ElAffendi, A. A. S. AlQahtani, A. A. Ateya, and R. A. Elsayed, "Improving Audio Steganography Transmission over Various Wireless Channels," *Journal of Sensor and Actuator Networks*, vol. 14, no. 6, article 106, 2025. [doi:10.3390/jsan14060106](https://doi.org/10.3390/jsan14060106) #### 🔗 软件资源 | 参考 | 项目 | 范围 | | --- | --- | --- | | [1] | [audio-watermarking](https://github.com/kosta-pmf/audio-watermarking) | 音频水印和隐写术实现参考 | | [2] | [audio-steganography-algorithms](https://github.com/ktekeli/audio-steganography-algorithms) | 音频隐写算法示例 | | [3] | [pysepm](https://github.com/schmiph2/pysepm) | 客观语音增强和质量指标 | | [4] | [PESQ](https://github.com/ludlows/PESQ) | PESQ 的 Python 封装 | | [5] | [pystoi](https://github.com/mpariente/pystoi) | 用于 Python 的 STOI 实现 | | [6] | [SRMRpy](https://github.com/jfsantos/SRMRpy) | SRMR 语音混响指标实现 | | [7] | [mel_cepstral_distance](https://github.com/jasminsternkopf/mel_cepstral_distance) | Mel 倒谱距离实现 | | [8] | [semetrics](https://github.com/nglehuy/semetrics) | 语音增强指标实现 | | [9] | [py-intelligibility](https://github.com/aminEdraki/py-intelligibility) | 语音可懂度指标 | | [10] | [speechmetrics](https://github.com/aliutkus/speechmetrics) | 语音和音频评估指标 | | [11] | [sigsep-mus-eval](https://github.com/sigsep/sigsep-mus-eval) | BSSEval 信号分离指标 | ### ⚖️ 许可证 The A-Files 是一款基于 GPLv3 许可证的开源软件。 ### 🧩 依赖项 PESQ 需要 Microsoft Visual C++ 14.0 或更高版本。你可以通过 Microsoft C++ Build Tools 安装它： https://visualstudio.microsoft.com/visual-cpp-build-tools/ 在某些情况下，你可能还需要 FFmpeg： https://ffmpeg.org/ ### 👥 作者 - Paweł Kaczmarek ([@pawel-kaczmarek](https://github.com/pawel-kaczmarek)) - 军事技术大学，电子学院 - Zbigniew Piotrowski - 军事技术大学，电子学院

标签：Python, 无后门, 语音信号处理, 逆向工具, 隐写术, 音频信息隐藏, 音频水印, 音频质量评估, 鲁棒性测试