Fine-tune All LLMs with All Adapters on All Platforms

Author: Sec-Labs | Published:

Project link

https://github.com/cckuailong/SuperAdapters

SuperAdapters

Related concepts

  • LLMs (Large Language Models): machine-learning language models trained on large text corpora to perform natural language processing tasks; commonly used for text generation, text classification, sentiment analysis, and similar applications.
  • Adapter: a lightweight neural-network module inserted into the intermediate layers of a pretrained model, allowing the model to be fine-tuned for a new task without modifying the original weights.
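The most common adapter type in this project is LoRA (low-rank adaptation). As a minimal numerical illustration of the principle (not the SuperAdapters implementation), a frozen weight matrix W is augmented with a trainable low-rank update B @ A, so the effective weight becomes W + (alpha / r) * B @ A while W itself stays untouched:

```python
# Minimal LoRA sketch: the frozen pretrained path plus a small trainable
# low-rank adapter path. All names here are illustrative.
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """Linear layer with a LoRA adapter.

    x: input vector; W: frozen base weight (out, in);
    A: (r, in) and B: (out, r) are the small trainable matrices.
    """
    base = W @ x                           # frozen pretrained path
    update = (alpha / r) * (B @ (A @ x))   # low-rank adapter path
    return base + update

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 8, 4
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))   # B starts at zero, so training starts at the base model
x = rng.normal(size=d_in)

# With B = 0 the adapter is a no-op: output equals the frozen model's output.
assert np.allclose(lora_forward(x, W, A, B), W @ x)
```

Because only A and B are trained, the number of trainable parameters is a small fraction of the full model, which is what makes adapter fine-tuning cheap.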

Project purpose

SuperAdapters is a general adapter fine-tuning framework: it combines LLMs with adapters so the models can be fine-tuned for new tasks, and it supports fine-tuning on multiple platforms, including Windows, Linux, and Mac M1/2. The LLMs currently supported are Bloom, LLaMA, ChatGLM, and Vicuna. LLaMA and ChatGLM are commonly used in dialogue systems, while Bloom and Vicuna are often used for text generation. With SuperAdapters, users can fine-tune these models for different tasks more conveniently.

SuperAdapters

Fine-tune all LLMs with all adapters on all platforms!

Support

| Model   | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
|---------|------|---------------|----------|---------------|
| Bloom   | ✓    | ✓             | ✓        | ✓             |
| LLaMA   | ✓    | ✓             | ✓        | ✓             |
| ChatGLM | ✓    | ✓             | ✓        | ✓             |

You can fine-tune LLMs on the following platforms:

  • Windows
  • Linux
  • Mac M1/2

Requirements

CentOS:

yum install -y xz-devel

Ubuntu:

apt-get install -y liblzma-dev

MacOS:

brew install xz

If you want to use the GPU on a Mac, please refer to How to use GPU on Mac.

pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
pip install -r requirements.txt
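After installing the requirements, you can check which device PyTorch will use on a Mac (MPS is Apple's GPU backend). This sketch falls back gracefully if torch is not installed yet:

```python
# Detect whether the Apple MPS backend is available; otherwise use the CPU.
try:
    import torch
    use_mps = getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available()
    device = "mps" if use_mps else "cpu"
except ImportError:
    device = "cpu"  # torch not installed yet; run the pip commands above first
print("training device:", device)
```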

LLMs

| Model   | Download link |
|---------|---------------|
| Bloom   | https://huggingface.co/bigscience/bloom-560m |
| LLaMA   | https://huggingface.co/openlm-research/open_llama_3b_600bt_preview |
| ChatGLM | https://huggingface.co/THUDM/chatglm-6b |
| Vicuna  | https://huggingface.co/lmsys/vicuna-7b-delta-v1.1 |

Usage

ChatGLM with LoRA

python finetune.py --model_type chatglm --data "data/train/" --model_path "LLMs/chatglm/chatglm-6b/" --adapter "lora" --output_dir "output/chatglm"
python generate.py --model_type chatglm --instruction "Who are you?" --model_path "LLMs/chatglm/chatglm-6b/" --adapter_weights "output/chatglm" --max_new_tokens 256

LLaMA with LoRA

python finetune.py --model_type llama --data "data/train/" --model_path "LLMs/open-llama/open-llama-3b/" --adapter "lora" --output_dir "output/llama"
python generate.py --model_type llama --instruction "Who are you?" --model_path "LLMs/open-llama/open-llama-3b" --adapter_weights "output/llama" --max_new_tokens 256

Bloom with LoRA

python finetune.py --model_type bloom --data "data/train/" --model_path "LLMs/bloom/bloomz-560m" --adapter "lora" --output_dir "output/bloom"
python generate.py --model_type bloom --instruction "Who are you?" --model_path "LLMs/bloom/bloomz-560m" --adapter_weights "output/bloom" --max_new_tokens 256
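The `--data` flag in each of the commands above points at a `data/train/` directory of instruction data. The exact schema should be checked against the repository's sample data; as an illustration only, assuming a common Alpaca-style record with `instruction`/`input`/`output` fields, a minimal training file could be written like this:

```python
# Hypothetical example record; the field names are an assumption modeled on
# the widespread Alpaca-style instruction format, not confirmed from the repo.
import json
import os
import tempfile

record = {
    "instruction": "Who are you?",
    "input": "",
    "output": "I am a fine-tuned assistant.",
}

train_dir = tempfile.mkdtemp()          # stand-in for data/train/
path = os.path.join(train_dir, "sample.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)

# Round-trip to confirm the file parses as a list of records.
with open(path, encoding="utf-8") as f:
    assert json.load(f)[0]["instruction"] == "Who are you?"
```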

Parameters

Fine-tune

usage: finetune.py [-h] [--data [DATA [DATA ...]]] [--model_type {llama,chatglm,bloom,moss}] [--model_path MODEL_PATH] [--output_dir OUTPUT_DIR] [--adapter {lora,adalora,prompt,p_tuning,prefix}]
                   [--lora_r LORA_R] [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT] [--lora_target_modules LORA_TARGET_MODULES [LORA_TARGET_MODULES ...]] [--adalora_init_r ADALORA_INIT_R]
                   [--adalora_tinit ADALORA_TINIT] [--adalora_tfinal ADALORA_TFINAL] [--adalora_delta_t ADALORA_DELTA_T] [--num_virtual_tokens NUM_VIRTUAL_TOKENS] [--mapping_hidden_dim MAPPING_HIDDEN_DIM]
                   [--epochs EPOCHS] [--learning_rate LEARNING_RATE] [--cutoff_len CUTOFF_LEN] [--val_set_size VAL_SET_SIZE] [--group_by_length] [--logging_steps LOGGING_STEPS] [--save_steps SAVE_STEPS]
                   [--seed SEED] [--device DEVICE] [--fp16] [--fp16_opt_level {O0,O1,O2,O3}] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--per_device_train_batch_size PER_DEVICE_TRAIN_BATCH_SIZE]
                   [--per_device_eval_batch_size PER_DEVICE_EVAL_BATCH_SIZE] [--weight_decay WEIGHT_DECAY] [--adam_beta1 ADAM_BETA1] [--adam_beta2 ADAM_BETA2] [--adam_epsilon ADAM_EPSILON] [--max_grad_norm MAX_GRAD_NORM]
                   [--max_steps MAX_STEPS] [--warmup_steps WARMUP_STEPS] [--gradient_checkpointing] [--resume_from_checkpoint]
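Two of the flags above interact: the batch size seen by each optimizer step is `per_device_train_batch_size` times `gradient_accumulation_steps` (times the number of devices). Accumulation lets you keep per-step memory low while still updating with a larger effective batch:

```python
# Effective batch size per weight update: micro-batches are accumulated
# before each optimizer step, so memory stays low while the update behaves
# like a larger batch.
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    return per_device_batch * grad_accum_steps * num_devices

# e.g. batch 4 with 8 accumulation steps behaves like batch 32 per update
assert effective_batch_size(4, 8) == 32
```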

Generate

usage: generate.py [-h] [--model_type {llama,chatglm,bloom,moss}] [--model_path MODEL_PATH] [--adapter_weights [ADAPTER_WEIGHTS [ADAPTER_WEIGHTS ...]]] [--instruction INSTRUCTION] [--max_new_tokens MAX_NEW_TOKENS]
                   [--temperature TEMPERATURE] [--top_p TOP_P] [--top_k TOP_K] [--repetition_penalty REPETITION_PENALTY] [--num_return_sequences NUM_RETURN_SEQUENCES] [--seed SEED] [--device DEVICE] [--fp16]
                   [--fp16_opt_level {O0,O1,O2,O3}] [--max_length MAX_LENGTH] [--do_sample] [--no_repeat_ngram_size NO_REPEAT_NGRAM_SIZE] [--num_beams NUM_BEAMS] [--group_by_length]
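The sampling flags above follow the standard definitions. As a plain-Python sketch (an illustration of the general technique, not the project's code): `--temperature` rescales logits before the softmax, and `--top_p` (nucleus sampling) restricts sampling to the smallest set of tokens whose cumulative probability reaches p:

```python
import math

def apply_temperature(logits, temperature):
    """Softmax with temperature: lower values sharpen the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest high-probability set summing to >= p."""
    kept, total = [], 0.0
    for tok, pr in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(tok)
        total += pr
        if total >= p:
            break
    return kept

# Lower temperature concentrates probability mass on the top token.
cold = apply_temperature([2.0, 1.0, 0.1], 0.5)
warm = apply_temperature([2.0, 1.0, 0.1], 1.5)
assert cold[0] > warm[0]

# top_p = 0.8 keeps only the tokens covering 80% of the probability mass.
assert top_p_filter({"the": 0.5, "a": 0.3, "cat": 0.15, "x": 0.05}, 0.8) == ["the", "a"]
```

This is why a higher `--temperature` makes output more varied ("creative"), while a lower `--top_p` makes it more conservative.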


Citation

If you use this code or the adapters in this project, please cite the following papers:

@article{houlsby2019parameter,
  title={Parameter-efficient transfer learning for NLP},
  author={Houlsby, Neil and Giurgiu, Andrei and Jastrzebski, Stanislaw and Morrone, Bruna and De Laroussilhe, Quentin and Gesmundo, Andrea and Attariyan, Mona and Gelly, Sylvain},
  journal={arXiv preprint arXiv:1902.00751},
  year={2019}
}

@inproceedings{pmlr-v97-wang19e,
  title =      {Adapters: A Simple Way to Adapt Transformers to {Zero-Shot} Tasks},
  author =       {Wang, Felix and Liu, Xiaodong and Zettlemoyer, Luke and Li, Wei},
  booktitle =      {Proceedings of the 36th International Conference on Machine Learning},
  pages =      {7012--7023},
  year =      {2019},
  editor =      {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume =      {97},
  series =      {Proceedings of Machine Learning Research},
  address =      {Long Beach, California, USA},
  month =      {09--15 Jun},
  publisher =      {PMLR},
  pdf =      {http://proceedings.mlr.press/v97/wang19e/wang19e.pdf},
  url =      {http://proceedings.mlr.press/v97/wang19e.html},
  abstract =      {We introduce adapters, a simple and modular way to use transformers for zero-shot and few-shot learning. Adapters are parameterized functions that are attached to the intermediate layers of a pre-trained transformer. They add task-specific parameters to the model, while preserving pre-trained parameters via fixed layer norms. Adapters are a low-overhead and modular alternative to fine-tuning the entire model or task-specific modules. We evaluate adapters on natural language understanding, question answering, and named entity recognition tasks. We show that adapters match or exceed the performance of fine-tuning while using fewer parameters and allowing for dynamic adaptation to new tasks.}
}
Generate (detailed help)

usage: generate.py [-h] [--instruction INSTRUCTION] [--input INPUT] [--model_type {llama,chatglm,bloom,moss}] [--model_path MODEL_PATH] [--adapter_weights ADAPTER_WEIGHTS] [--load_8bit]
                   [--temperature TEMPERATURE] [--top_p TOP_P] [--top_k TOP_K] [--max_new_tokens MAX_NEW_TOKENS]

optional arguments:
  -h, --help            show this help message and exit
  --instruction INSTRUCTION
  --input INPUT
  --model_type {llama,chatglm,bloom,moss}
  --model_path MODEL_PATH
  --adapter_weights ADAPTER_WEIGHTS
                        The DIR of adapter weights
  --load_8bit
  --temperature TEMPERATURE
                        A higher temperature makes the LLM more creative
  --top_p TOP_P
  --top_k TOP_K
  --max_new_tokens MAX_NEW_TOKENS
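The `--load_8bit` flag loads the base model's weights in 8-bit to cut memory use. A rough sketch of the underlying absmax-quantization idea (an illustration only; real 8-bit inference, e.g. in bitsandbytes, is blockwise and considerably more sophisticated):

```python
# Absmax 8-bit quantization sketch: scale each weight into the int8 range
# [-127, 127] and keep the scale factor for dequantization.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.35, -1.27, 0.08, 0.9]
q, s = quantize_int8(w)
assert all(-127 <= v <= 127 for v in q)
# Dequantized values approximate the originals within one quantization step.
assert all(abs(a - b) <= s for a, b in zip(w, dequantize(q, s)))
```

The memory saving comes from storing one byte per weight instead of two (fp16) or four (fp32), at the cost of a small quantization error per weight.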


Tags: tool sharing, ChatGPT