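Training Qwen3-Embedding/Reranker with the ms-swift framework on Ascend
This document walks through setting up an Ubuntu 24.04 environment in a Docker container on the Ascend 910B platform and deploying the Swift training framework. It covers: 1) creating an NPU-enabled Docker container from the Ubuntu image; 2) installing Conda and the CANN toolkit; 3) setting up a Python 3.11 environment and installing the matching torch_npu version; 4) configuring training of the Qwen3-Embedding and Qwen3-Reranker models with the Swift framework.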
1 Create the base container
Pull the image
docker pull ubuntu
Alternatively, the image is shared via Baidu Netdisk as the file ubuntu_24.04.tar
Link: https://pan.baidu.com/s/123tm6MhQqRWanv_R5iCRcA?pwd=9xs3 Extraction code: 9xs3
Create the container
docker run -itd -u root \
--ipc=host \
--network=host \
--privileged \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /var/log/npu/:/usr/slog \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
-v /usr/local/sbin:/usr/local/sbin \
-v /etc/hccn.conf:/etc/hccn.conf \
-v /weight:/weight \
--name swift \
ubuntu:latest \
/bin/bash
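Before running the command above, it may help to confirm that the device nodes and host paths it mounts actually exist. The following is a minimal sketch, not part of the original procedure: the path list is copied from the docker run command, while the helper name missing_paths is hypothetical.

```python
import os

# Device nodes and host paths taken from the docker run command above.
REQUIRED_PATHS = [
    "/dev/davinci_manager",
    "/dev/devmm_svm",
    "/dev/hisi_hdc",
    "/usr/local/bin/npu-smi",
    "/usr/local/Ascend/driver",
    "/usr/local/Ascend/firmware",
    "/etc/hccn.conf",
]


def missing_paths(paths, exists=os.path.exists):
    """Return the subset of paths that are not present on this host."""
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    missing = missing_paths(REQUIRED_PATHS)
    if missing:
        print("Missing on host:", ", ".join(missing))
    else:
        print("All required device nodes and mount sources are present.")
```

If anything is reported missing, the driver and firmware are probably not installed on the host, and the container would start without usable NPU access.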
Enter the container
docker exec -it swift bash
Check the OS version
cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.2 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
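The same check can be scripted. The sketch below parses the KEY=VALUE format shown above and tests for Ubuntu 24.04; the function names are hypothetical, and the expected version is simply the one this guide targets.

```python
def parse_os_release(text):
    """Parse KEY=VALUE lines (os-release format) into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def is_expected_ubuntu(info, version_id="24.04"):
    """True if the parsed os-release describes Ubuntu with the given VERSION_ID."""
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") == version_id


if __name__ == "__main__":
    try:
        with open("/etc/os-release", encoding="utf-8") as fh:
            print("Ubuntu 24.04:", is_expected_ubuntu(parse_os_release(fh.read())))
    except FileNotFoundError:
        print("/etc/os-release not found")
```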
2 Install the environment
2.1 Install dependencies
apt-get update
apt-get install -y gcc g++ make cmake zlib1g zlib1g-dev openssl libsqlite3-dev libssl-dev libffi-dev libbz2-dev libxslt1-dev unzip pciutils net-tools libblas-dev gfortran libblas3 vim zip wget git
# When prompted for a time zone during installation, choose 'Asia/Shanghai' (menu entries 5, then 69 in this setup)
2.2 Install conda
mkdir /root/downloads
cd /root/downloads
wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-py311_25.1.1-2-Linux-aarch64.sh
bash Miniconda3-py311_25.1.1-2-Linux-aarch64.sh
# Answer yes to every prompt during installation
source ~/.bashrc
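2.3 Install CANN
Download the matching software packages from the resource center and upload them to the server
# Add environment variables
export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:${LD_LIBRARY_PATH}
# Single quotes keep ${LD_LIBRARY_PATH} unexpanded until ~/.bashrc is sourced
echo 'export LD_LIBRARY_PATH=/usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/driver:${LD_LIBRARY_PATH}' >> ~/.bashrc
# Make the installers executable
chmod +x Ascend-cann-*
# Install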
./Ascend-cann-toolkit_8.3.RC1.alpha002_linux-aarch64.run --install
echo "source /usr/local/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc
source ~/.bashrc
./Ascend-cann-kernels-910b_8.3.RC1.alpha002_linux-aarch64.run --install
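Because the toolkit and kernels packages need to match, a quick check of the file names can catch a mismatched download. This is a sketch under the assumption that the package names follow the pattern of the two files used above; the regular expression and helper names are illustrative, not an official naming specification.

```python
import re

# Assumed pattern, derived only from the two file names used in this guide.
PACKAGE_RE = re.compile(
    r"^Ascend-cann-(?P<component>[A-Za-z0-9-]+)"
    r"_(?P<version>[^_]+)"
    r"_linux-(?P<arch>[A-Za-z0-9_]+)\.run$"
)


def parse_package(filename):
    """Split a CANN installer file name into component, version and arch."""
    match = PACKAGE_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized package name: {filename}")
    return match.groupdict()


def packages_match(first, second):
    """True if both installers share the same version and architecture."""
    a, b = parse_package(first), parse_package(second)
    return a["version"] == b["version"] and a["arch"] == b["arch"]


if __name__ == "__main__":
    toolkit = "Ascend-cann-toolkit_8.3.RC1.alpha002_linux-aarch64.run"
    kernels = "Ascend-cann-kernels-910b_8.3.RC1.alpha002_linux-aarch64.run"
    print(parse_package(toolkit))
    print("versions match:", packages_match(toolkit, kernels))
```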
3 Python environment
3.1 Create the Python environment
conda create -n swift python=3.11
conda activate swift
echo "conda activate swift" >> ~/.bashrc
3.2 Install dependencies
pip3 install attrs numpy==1.26.4 decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py -i https://pypi.tuna.tsinghua.edu.cn/simple
Install torchvision
pip3 install torchvision==0.20.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
Install torch
pip3 install torch==2.5.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
Install torch_npu
# Install
pip install torch_npu==2.5.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
Verify
python3 -c "import torch;import torch_npu; a = torch.randn(3, 4).npu(); print(a + a);"
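If the one-liner fails, a slightly longer script can show which step went wrong. The sketch below wraps the same tensor addition in a hypothetical check_npu helper and reports import failures (for example when the CANN environment has not been sourced) instead of raising.

```python
def check_npu():
    """Return a one-line status string describing NPU availability."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the NPU backend)
    except Exception as exc:  # ImportError, or loader errors without CANN
        return f"import failed: {exc!r}"
    if not torch.npu.is_available():
        return "torch_npu imported, but no NPU is available"
    count = torch.npu.device_count()
    # Same computation as the one-liner above, copied back to the CPU.
    x = torch.randn(3, 4).npu()
    y = (x + x).cpu()
    return f"OK: {count} NPU(s) visible, result shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(check_npu())
```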
4 swift
Reference: Swift documentation (swift 3.10.0.dev0)
4.1 Installation
pip install ms-swift -U -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install transformers -U -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install deepspeed -i https://pypi.tuna.tsinghua.edu.cn/simple
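Since the ms-swift and transformers commands above install the latest versions, it can be useful to record what was actually installed. A minimal sketch using the standard library follows; the package list simply mirrors the pip commands in this guide, and installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    packages = ["torch", "torch_npu", "torchvision",
                "ms-swift", "transformers", "deepspeed"]
    for name, ver in installed_versions(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```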
4.2 Training Qwen3-Embedding-0.6B
Reference: Embedding training, Swift documentation (swift 3.10.0.dev0)
pip install modelscope -i https://pypi.tuna.tsinghua.edu.cn/simple
# Download the weights
cd /weight
modelscope download --model Qwen/Qwen3-Embedding-0.6B --local_dir ./Qwen3-Embedding-0.6B
# Download the dataset
modelscope download --dataset sentence-transformers/stsb --local_dir ./sentence-transformers/stsb
Create the training script with vim swift_train.sh, then run bash swift_train.sh to start training
nproc_per_node=8
NPROC_PER_NODE=$nproc_per_node \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift sft \
--model /weight/Qwen3-Embedding-0.6B \
--task_type embedding \
--model_type qwen3_emb \
--train_type full \
--dataset /weight/sentence-transformers/stsb:positive \
--split_dataset_ratio 0.05 \
--eval_strategy steps \
--output_dir output \
--eval_steps 20 \
--num_train_epochs 5 \
--save_steps 20 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--learning_rate 6e-6 \
--loss_type infonce \
--label_names labels \
--dataloader_drop_last true \
--deepspeed zero3
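The script above selects --loss_type infonce. As a rough illustration of what an InfoNCE objective computes for a single query, the sketch below treats one candidate as the positive and applies cross-entropy over temperature-scaled similarities. It is not ms-swift's implementation, and the temperature of 0.05 is an arbitrary illustrative value.

```python
import math


def info_nce(similarities, positive_index=0, temperature=0.05):
    """InfoNCE loss for one query.

    similarities: similarity scores between the query and each candidate.
    positive_index: position of the positive candidate in that list.
    """
    logits = [s / temperature for s in similarities]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(v - top) for v in logits))
    # Negative log-probability assigned to the positive candidate.
    return log_sum_exp - logits[positive_index]


if __name__ == "__main__":
    print(info_nce([0.9, 0.1, 0.1]))  # positive ranked first: small loss
    print(info_nce([0.1, 0.9, 0.1]))  # a negative ranked first: large loss
```

The loss shrinks as the positive's similarity rises above the negatives', which is the behavior the training run is optimizing for.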
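4.3 Training Qwen3-Reranker-0.6B
Reference: Reranker training, Swift documentation (swift 3.10.0.dev0)
# Download the weights
cd /weight
modelscope download --model Qwen/Qwen3-Reranker-0.6B --local_dir ./Qwen3-Reranker-0.6B
# Download the dataset
modelscope download --dataset MTEB/scidocs-reranking --local_dir ./MTEB/scidocs-reranking
Create the training script with vim swift_train.sh, then run bash swift_train.sh from /weight to start training (the script uses paths relative to that directory)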
nproc_per_node=4
# 4*47G
# losses: plugin/loss.py
# only support --padding_side left
NPROC_PER_NODE=$nproc_per_node \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model ./Qwen3-Reranker-0.6B \
--task_type generative_reranker \
--loss_type generative_reranker \
--train_type full \
--dataset ./MTEB/scidocs-reranking \
--load_from_cache_file true \
--split_dataset_ratio 0.05 \
--eval_strategy steps \
--padding_side left \
--output_dir output \
--eval_steps 100 \
--num_train_epochs 1 \
--save_steps 200 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 8 \
--dataset_num_proc 8 \
--learning_rate 6e-6 \
--label_names labels \
--dataloader_drop_last true
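The two scripts use different device counts, per-device batch sizes and accumulation steps. Assuming plain data parallelism, where every device processes its own micro-batch, the number of samples per optimizer step can be worked out as below; the helper name is hypothetical.

```python
def global_batch_size(num_devices, per_device_batch, grad_accum_steps):
    """Samples contributing to one optimizer step under data parallelism."""
    return num_devices * per_device_batch * grad_accum_steps


# Values copied from the two training scripts in this guide.
embedding_batch = global_batch_size(num_devices=8, per_device_batch=4, grad_accum_steps=4)
reranker_batch = global_batch_size(num_devices=4, per_device_batch=2, grad_accum_steps=8)

if __name__ == "__main__":
    print("embedding run:", embedding_batch)  # 8 * 4 * 4 = 128
    print("reranker run:", reranker_batch)    # 4 * 2 * 8 = 64
```

When changing the number of NPUs, adjusting --gradient_accumulation_steps so this product stays constant is one way to keep the learning rate setting comparable.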