全部开源,完全可商用的中文版 Llama2 模型及中英文 SFT 数据集,输入格式严格遵循 llama-2-chat 格式,兼容适配所有针对原版 llama-2-chat 模型的优化。不做任何优化的情况下,14G 显存足够用,未来能做到更小资源。
在线试玩
- Demo 地址 / HuggingFace Spaces
https://huggingface.co/spaces/LinkSoul/Chinese-Llama-2-7b
快速测试
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
model_path = "LinkSoul/Chinese-Llama-2-7b"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""
prompt = instruction.format("用中文回答,When is the best time to visit Beijing, and do you have any suggestions for me?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(), max_new_tokens=4096, streamer=streamer)
Docker
你可以使用仓库中的 Dockerfile,来快速制作基于 Nvidia 最新版本的 nvcr.io/nvidia/pytorch:23.06-py3
基础镜像,在任何地方使用容器来运行中文的 LLaMA2 模型应用。
docker build -t linksoul/chinese-llama2-chat .
镜像构建完毕,使用命令运行镜像即可:
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --rm -it -v `pwd`/LinkSoul:/app/LinkSoul -p 7860:7860 linksoul/chinese-llama2-chat
如何训练
DATASET="LinkSoul/instruction_merge_set"
DATA_CACHE_PATH="hf_datasets_cache"
MODEL_PATH="/PATH/TO/TRANSFORMERS/VERSION/LLAMA2"
output_dir="./checkpoints_llama2"
torchrun --nnodes=1 --node_rank=0 --nproc_per_node=8 \
--master_port=25003 \
train.py \
--model_name_or_path ${MODEL_PATH} \
--data_path ${DATASET} \
--data_cache_path ${DATA_CACHE_PATH} \
--bf16 True \
--output_dir ${output_dir} \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy 'no' \
--save_strategy 'steps' \
--save_steps 1200 \
--save_total_limit 5 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type cosine \
--logging_steps 1 \
--fsdp 'full_shard auto_wrap' \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \
--model_max_length 4096 \
--gradient_checkpointing True
项目链接
https://github.com/LinkSoul-AI/Chinese-Llama-2-7b#
© 版权声明
文章版权归作者所有,未经允许请勿转载。