When the Internet Meets the AI Storage Revolution: Building a High-Performance Data Engine with 3FS + Kunpeng

Notes on porting and practicing the 3FS storage engine on the Kunpeng platform

The Storage Bottleneck of the AI Era, and How 3FS Breaks It

In the wave of large-model training and inference, storage throughput and latency have become the key bottlenecks limiting efficiency. Traditional storage architectures struggle to serve PB-scale data in real time, and commercial solutions such as DDN and VAST Data, while powerful, are closed-source and expensive enough to put them out of reach for small and mid-sized teams. In February 2025, DeepSeek open-sourced 3FS (Fire-Flyer File System), a high-performance parallel file system that pairs 6.6 TiB/s aggregate read throughput with strong consistency semantics, aiming squarely at the pain points of AI data access; industry observers have hailed it as a milestone that "upends storage architecture."

The core strengths of 3FS:

  1. End-to-end AI optimization: it covers data preprocessing, training-sample loading, checkpoint recovery, and inference KVCache, accelerating the whole pipeline with a "storage-compute co-design";
  2. Disaggregated architecture: it aggregates the bandwidth of thousands of SSDs and RDMA networks, with no reliance on data locality;
  3. Open-source ecosystem: it fills the open-source gap in high-performance parallel file systems and lowers the technology barrier for enterprises.

Why the Kunpeng Platform?

With the high performance, energy efficiency, and ecosystem support of its ARM architecture, the Kunpeng platform is a natural fit for AI and big-data workloads. The many-core design and strong compute of Kunpeng processors match the performance demands of 3FS, and the platform's close coupling with Ascend processors brings additional compute for AI scenarios.


Practice: the "Chemistry" Between 3FS and the Kunpeng ARM Platform

We chose TaiShan servers for an in-depth port. The key steps and results follow.

Porting the storage engine from x86 to ARM
  • Code adaptation: we tuned the build toolchain and resolved address-offset differences between x86 and ARM so that 3FS runs on openEuler for ARM.
  • Deployment flow: we supplemented the official documentation to round out the 3FS deployment experience on ARM.

1. Deployment Environment

  1. Node information

    | Node        | IP            | Memory | NVMe SSD | RDMA             | OS                  |
    |-------------|---------------|--------|----------|------------------|---------------------|
    | meta        | 192.168.65.10 | 256GB  | -        | 100GE ConnectX-6 | openEuler 22.03 SP4 |
    | fuseclient1 | 192.168.65.11 | 256GB  | -        | 100GE ConnectX-6 | openEuler 22.03 SP4 |
    | fuseclient2 | 192.168.65.12 | 256GB  | -        | 100GE ConnectX-6 | openEuler 22.03 SP4 |
    | fuseclient3 | 192.168.65.13 | 256GB  | -        | 100GE ConnectX-6 | openEuler 22.03 SP4 |
    | storage1    | 192.168.65.14 | 1TB    | 3TB*8    | 100GE ConnectX-6 | openEuler 22.03 SP4 |
    | storage2    | 192.168.65.15 | 1TB    | 3TB*8    | 100GE ConnectX-6 | openEuler 22.03 SP4 |
    | storage3    | 192.168.65.16 | 1TB    | 3TB*8    | 100GE ConnectX-6 | openEuler 22.03 SP4 |

  2. Service information

    | Service   | Binary                 | Config files                                                                 | NodeID      | Node                               |
    |-----------|------------------------|------------------------------------------------------------------------------|-------------|------------------------------------|
    | monitor   | monitor_collector_main | monitor_collector_main.toml                                                  | -           | meta                               |
    | admin_cli | admin_cli              | admin_cli.toml, fdb.cluster                                                  | -           | meta, storage1, storage2, storage3 |
    | mgmtd     | mgmtd_main             | mgmtd_main_launcher.toml, mgmtd_main.toml, mgmtd_main_app.toml, fdb.cluster  | 1           | meta                               |
    | meta      | meta_main              | meta_main_launcher.toml, meta_main.toml, meta_main_app.toml, fdb.cluster     | 100         | meta                               |
    | storage   | storage_main           | storage_main_launcher.toml, storage_main.toml, storage_main_app.toml         | 10001~10003 | storage1, storage2, storage3       |
    | client    | hf3fs_fuse_main        | hf3fs_fuse_main_launcher.toml, hf3fs_fuse_main.toml                          | -           | meta                               |

2. Environment Preparation

Disable the firewall and set the SELinux mode

**Note: perform the following steps on all nodes.**

Disable the firewall:

systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld


Set SELinux to permissive mode:

vi /etc/selinux/config

SELINUX=permissive
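
The config change takes effect at the next boot. To switch to permissive mode immediately in the running session as well, one quick extra step (standard SELinux tooling):

setenforce 0    # 0 = permissive until reboot
getenforce      # should now print: Permissive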


Configure hosts

**Note: perform the following steps on all nodes.**

Add the IP of the meta node to /etc/hosts on every node:

vi /etc/hosts
192.168.65.10 meta
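
A quick sanity check that the name resolves on each node (expected output shown as a comment):

getent hosts meta    # 192.168.65.10   meta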

Cluster time synchronization

**Note: perform the following steps on all nodes.**

yum install -y ntp ntpdate

Back up the old configuration on all nodes:

cd /etc && mv ntp.conf ntp.conf.bak

**Note: perform the following steps on the meta node only.**

vi /etc/ntp.conf
restrict 127.0.0.1
restrict ::1
restrict 192.168.65.10 mask 255.255.255.0
server 127.127.1.0
fudge 127.127.1.0 stratum 8

# Hosts on local network are less restricted.
restrict 192.168.65.10 mask 255.255.255.0 nomodify notrap

Start the ntpd service:

systemctl start ntpd
systemctl enable ntpd
systemctl status ntpd


**Note: perform the following steps on all nodes except meta.**

vi /etc/ntp.conf
server 192.168.65.10

Force an initial sync against the server on every node except meta. Wait about 5 minutes after ntpd starts on meta, otherwise this fails with "no server suitable for synchronization found":

ntpdate meta

Write the time to the hardware clock on every node except meta so it survives a reboot:

hwclock -w

Install and enable the crontab tooling:

yum install -y crontabs
chkconfig crond on
systemctl start crond
crontab -e

Add a crontab entry to resync with the meta node every 10 minutes:

*/10 * * * * /usr/sbin/ntpdate 192.168.65.10
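
To confirm a client is actually tracking the meta server, you can also query the peer list with ntpq (installed with the ntp package above); the 192.168.65.10 line should show a nonzero reach value:

ntpq -p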

3. Building 3FS

3.1 Setting up the build environment

We use the meta node as the build node.

① Install the build dependencies

**Note: perform this step on all nodes.**

# for openEuler
yum install cmake libuv-devel lz4-devel xz-devel double-conversion double-conversion-devel libdwarf libdwarf-devel libunwind libunwind-devel libaio-devel gflags-devel glog glog-devel gtest-devel gmock-devel gperftools gperftools-devel gperftools-libs openssl-devel gcc gcc-c++ boost* libatomic autoconf libevent-devel libibverbs libibverbs-devel cargo numactl-devel lld rsync glibc-devel python3-devel meson vim jemalloc -y

② Install Rust

**Note: perform this step on the meta node.**

Download the aarch64 tarball from the Rust website and upload it to the server.


If the server can reach the Rust site directly, fetch the tarball with wget:

wget https://static.rust-lang.org/dist/rust-1.85.0-aarch64-unknown-linux-gnu.tar.xz
tar -xvf rust-1.85.0-aarch64-unknown-linux-gnu.tar.xz
cd rust-1.85.0-aarch64-unknown-linux-gnu
sh install.sh
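
A quick check that the toolchain is on PATH (the version strings below assume the 1.85.0 tarball used above):

rustc --version    # rustc 1.85.0 (...)
cargo --version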

③ Install FoundationDB

**Note: perform the following steps on the meta node.**

Fetch the FoundationDB packages from the apple/foundationdb GitHub releases and upload them to the server.

If the server can reach GitHub directly, install the packages with wget or yum:

# Download with wget
wget https://github.com/apple/foundationdb/releases/download/7.3.63/foundationdb-clients-7.3.63-1.el9.aarch64.rpm
wget https://github.com/apple/foundationdb/releases/download/7.3.63/foundationdb-server-7.3.63-1.el9.aarch64.rpm
rpm -ivh foundationdb-clients-7.3.63-1.el9.aarch64.rpm
rpm -ivh foundationdb-server-7.3.63-1.el9.aarch64.rpm
# Or install via yum
yum install https://github.com/apple/foundationdb/releases/download/7.3.63/foundationdb-clients-7.3.63-1.el9.aarch64.rpm -y
yum install https://github.com/apple/foundationdb/releases/download/7.3.63/foundationdb-server-7.3.63-1.el9.aarch64.rpm -y
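
A quick health check that the local fdbserver is up (fdbcli ships with the clients package; expected output shown as a comment):

fdbcli --exec "status minimal"
# The database is available.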

④ Install the BiSheng compiler

**Note: perform the following steps on the meta node.**

  1. Download the 4.2.0 compiler tarball from the BiSheng compiler website and upload it to the server,


or fetch it directly with wget:

    cd /home
    wget https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/BiShengCompiler-4.2.0-aarch64-linux.tar.gz
    tar -xvf BiShengCompiler-4.2.0-aarch64-linux.tar.gz
    
  2. Temporarily add it to PATH:

    export PATH=/home/BiShengCompiler-4.2.0-aarch64-linux/bin:$PATH
    
  3. Check the compiler version:

    clang -v
    BiSheng Enterprise 4.2.0.B009 clang version 17.0.6 (958fd14d28f0)
    Target: aarch64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /home/BiShengCompiler-4.2.0-aarch64-linux/bin
    Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/10.3.1
    Selected GCC installation: /usr/lib/gcc/aarch64-linux-gnu/10.3.1
    Candidate multilib: .;@m64
    Selected multilib: .;@m64
    
    

⑤ Install libfuse

**Note: perform the following steps on the meta node and the FUSE client nodes.**

Release page: libfuse 3.16.1 · libfuse/libfuse · GitHub

# Download the package
cd /home
wget https://github.com/libfuse/libfuse/releases/download/fuse-3.16.1/fuse-3.16.1.tar.gz
# Extract it
tar vzxf fuse-3.16.1.tar.gz
cd fuse-3.16.1
mkdir build
cd build
yum install -y meson
meson setup ..
ninja
ninja install
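
A quick check that the install landed (fusermount3 is installed by ninja install above):

fusermount3 --version
# fusermount3 version: 3.16.1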

3.2 Fetching the source and building

**Note: perform the following steps on the meta node.**

export LD_LIBRARY_PATH=/usr/local/lib64:/usr/local/lib:/usr/local:/usr/lib:/usr/lib64:$LD_LIBRARY_PATH
yum install git -y
cd /home
git clone https://gitee.com/kunpeng_compute/3FS.git
cd 3FS
git checkout origin/openeuler
git submodule update --init --recursive
./patches/apply.sh
cmake -S . -B build -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
cmake --build build -j

Check the build output:

cd build/bin
ll 
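
To confirm the binaries were really built for ARM, file should report aarch64 (output is illustrative):

file admin_cli
# admin_cli: ELF 64-bit LSB executable, ARM aarch64, ...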


4. Deploying 3FS

4.1 Meta node

① Install ClickHouse

**Note: perform the following steps on the meta node.**

  1. Fetch the installer and install:

    cd /home
    curl -k https://clickhouse.com/ | sh
    sudo ./clickhouse install
    

    At the end of installation you are prompted for a password; in this walkthrough we use clickhouse123.

  2. Change the default ClickHouse TCP port:

    chmod 660 /etc/clickhouse-server/config.xml
    vim /etc/clickhouse-server/config.xml
    

    Locate the <tcp_port> tag and change the port to 9123:

    ...
    <tcp_port>9123</tcp_port>
    ...
    
  3. Start ClickHouse:

    clickhouse start
    
  4. Create the metric tables:

    clickhouse-client --port 9123 --password 'clickhouse123' -n  < /home/3FS/deploy/sql/3fs-monitor.sql
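
    A quick check that the schema landed, assuming 3fs-monitor.sql creates its tables in the 3fs database (which is what the monitor config below expects):

    clickhouse-client --port 9123 --password 'clickhouse123' --query "SHOW TABLES FROM 3fs"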
    

② Update the FoundationDB configuration

**Note: perform the following steps on the meta node.**

  1. Update the FoundationDB configuration:

    vim /etc/foundationdb/foundationdb.conf
    # Locate the public-address entry under [fdbserver] and change it to <local IP>:$ID
    # e.g. with the meta node at 192.168.65.10, use 192.168.65.10:$ID
    
    vim /etc/foundationdb/fdb.cluster
    # Replace 127.0.0.1:4500 with <local IP>:4500
    # e.g. with the meta node at 192.168.65.10, use 192.168.65.10:4500
    
  2. Restart the FoundationDB service:

    systemctl restart foundationdb.service
    
  3. Check the FoundationDB listening port:

    ss -tuln
    # Netid      State       Recv-Q      Send-Q            Local Address:Port           Peer Address:Port      Process
    # ...
    # tcp        LISTEN      0           4096             192.168.65.10:4500                 0.0.0.0:*
    # ...
    

③ Start monitor_collector

**Note: perform the following steps on the meta node.**

  1. Copy the binary and config files from the build server (in this article the build node is the meta node):

    mkdir -p /var/log/3fs
    mkdir -p /opt/3fs/{bin,etc}
    rsync -avz meta:/home/3FS/build/bin/monitor_collector_main /opt/3fs/bin/
    rsync -avz meta:/home/3FS/configs/monitor_collector_main.toml /opt/3fs/etc/
    rsync -avz meta:/home/3FS/deploy/systemd/monitor_collector_main.service /usr/lib/systemd/system
    
  2. Edit monitor_collector_main.toml:

    vim /opt/3fs/etc/monitor_collector_main.toml
    
    ...
    [server.base.groups.listener]
    filter_list = ['enp1s0f0np0']    # fill in your RDMA NIC name (see the lookup tip below)
    listen_port = 10000
    listen_queue_depth = 4096
    rdma_listen_ethernet = true
    reuse_port = false
    ...
    [server.monitor_collector.reporter.clickhouse]
    db = '3fs'
    host = '127.0.0.1'
    passwd = 'clickhouse123'
    port = '9123'
    user = 'default'
    
  3. Start the monitor_collector service:

    systemctl start monitor_collector_main
    
  4. Check the monitor_collector status:

    systemctl status monitor_collector_main
    


  5. Check the ports:

    ss -tuln
    # Netid      State       Recv-Q      Send-Q            Local Address:Port            Peer Address:Port     Process
    # ...
    # tcp        LISTEN      0           4096             192.168.65.10:10000                0.0.0.0:*
    # ...
    

    **Note:** there must be exactly one listener at 192.168.65.10:10000 and none at 127.0.0.1:10000; otherwise other services may connect to the wrong endpoint. If multiple listeners show up on port 10000, check whether filter_list was filled in in monitor_collector_main.toml above. The same applies to the configs that follow, so this is not repeated below.
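
Tip: the filter_list entries above and in the configs that follow expect the Ethernet interface name backing the RDMA NIC. With a ConnectX-6 card, either of these usually maps RDMA devices to interface names (rdma is part of iproute2; ibdev2netdev ships with NVIDIA/Mellanox OFED; output is illustrative):

rdma link show
ibdev2netdev
# ibdev2netdev: mlx5_0 port 1 ==> enp1s0f0np0 (Up)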

④ Install the admin client

**Note: perform the following steps on all nodes.**

  1. Copy the binary and config files from the build server:

    mkdir -p /var/log/3fs
    mkdir -p /opt/3fs/{bin,etc}
    rsync -avz meta:/home/3FS/build/bin/admin_cli /opt/3fs/bin
    rsync -avz meta:/home/3FS/configs/admin_cli.toml /opt/3fs/etc
    rsync -avz meta:/etc/foundationdb/fdb.cluster /opt/3fs/etc
    rsync -avz meta:/usr/lib64/libfdb_c.so /usr/lib64/
    
  2. Edit admin_cli.toml:

    vim /opt/3fs/etc/admin_cli.toml
    
    ...
    cluster_id = "stage"
    ...
    [fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'
    
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.65.10:8000"]
    ...
    
  3. View the help:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml help
    

⑤ Start the mgmtd service

**Note: perform the following steps on the meta node.**

  1. Copy the binary and config files from the build server:

    mkdir -p /var/log/3fs
    mkdir -p /opt/3fs/{bin,etc}
    rsync -avz meta:/home/3FS/build/bin/mgmtd_main /opt/3fs/bin
    rsync -avz meta:/home/3FS/configs/{mgmtd_main.toml,mgmtd_main_launcher.toml,mgmtd_main_app.toml} /opt/3fs/etc
    rsync -avz meta:/home/3FS/deploy/systemd/mgmtd_main.service /usr/lib/systemd/system
    
  2. Edit mgmtd_main_app.toml:

    vim /opt/3fs/etc/mgmtd_main_app.toml
    
    allow_empty_node_id = true
    node_id = 1 # set node_id to 1
    
  3. Edit mgmtd_main_launcher.toml:

    vim /opt/3fs/etc/mgmtd_main_launcher.toml
    
    ...
    cluster_id = "stage"
    ...
    [fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'
    ...
    
  4. Edit mgmtd_main.toml:

    vim /opt/3fs/etc/mgmtd_main.toml
    
    ...
    [server.base.groups.listener]
    filter_list = ['enp1s0f0np0'] # fill in your RDMA NIC name
    ...
    [server.base.groups.listener] 
    filter_list = ['enp1s0f0np0'] # fill in your RDMA NIC name
    ...
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.65.10:10000" # monitor_collector IP and port
    ...
    
  5. Initialize the cluster:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "init-cluster --mgmtd /opt/3fs/etc/mgmtd_main.toml 1 1048576 16"
    # Init filesystem, root directory layout: chain table ChainTableId(1), chunksize 1048576, stripesize 16
    #
    # Init config for MGMTD version 1
    
  6. Start the service:

    systemctl start mgmtd_main
    
  7. Check the service status:

    systemctl status mgmtd_main
    


  8. Check the ports:

    ss -tuln
    # Netid      State       Recv-Q      Send-Q            Local Address:Port            Peer Address:Port     Process
    # ...
    # tcp        LISTEN      0           4096             192.168.65.10:8000                0.0.0.0:*
    # tcp        LISTEN      0           4096             192.168.65.10:9000                0.0.0.0:*
    # ...
    
  9. Check the cluster node list:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "list-nodes"
    
    # Id     Type     Status               Hostname  Pid      Tags  LastHeartbeatTime    ConfigVersion  ReleaseVersion
    # 1      MGMTD    PRIMARY_MGMTD        meta      2281735  []    N/A                  1(UPTODATE)   250228-dev-1-999999-923bdd7c
    

⑥ Start the meta server

**Note: perform the following steps on the meta node.**

  1. Copy the binary and config files from the build server:

    mkdir -p /var/log/3fs
    mkdir -p /opt/3fs/{bin,etc}
    rsync -avz meta:/home/3FS/build/bin/meta_main /opt/3fs/bin
    rsync -avz meta:/home/3FS/configs/{meta_main_launcher.toml,meta_main.toml,meta_main_app.toml} /opt/3fs/etc
    rsync -avz meta:/home/3FS/deploy/systemd/meta_main.service /usr/lib/systemd/system
    
  2. Edit meta_main_app.toml:

    vim /opt/3fs/etc/meta_main_app.toml
    
    allow_empty_node_id = true
    node_id = 100 # set node_id to 100
    
  3. Edit meta_main_launcher.toml:

    vim /opt/3fs/etc/meta_main_launcher.toml
    
    ...
    cluster_id = "stage"
    ...
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.65.10:8000"]
    ...
    
  4. Edit meta_main.toml:

    vim /opt/3fs/etc/meta_main.toml
    
    ...
    [server.mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.65.10:8000"]
    
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.65.10:10000"
    ...
    [server.fdb]
    clusterFile = '/opt/3fs/etc/fdb.cluster'
    ...
    [server.base.groups.listener]
    filter_list = ['enp1s0f0np0']
    listen_port = 8001
    ...
    [server.base.groups.listener]
    filter_list = ['enp1s0f0np0']
    listen_port = 9001
    ...
    
  5. Push the meta config to the mgmtd server:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "set-config --type META --file /opt/3fs/etc/meta_main.toml"
    
  6. Start the service:

    systemctl start meta_main
    
  7. Check the service status:

    systemctl status meta_main
    


  8. Check the ports:

    ss -tuln
    # Netid      State       Recv-Q      Send-Q            Local Address:Port            Peer Address:Port     Process
    # ...
    # tcp        LISTEN      0           4096             192.168.65.10:8001                0.0.0.0:*
    # tcp        LISTEN      0           4096             192.168.65.10:9001                0.0.0.0:*
    # ...
    
  9. Check the cluster node list:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "list-nodes"
    # Id     Type     Status               Hostname  Pid      Tags  LastHeartbeatTime    ConfigVersion  ReleaseVersion
    # 1      MGMTD    PRIMARY_MGMTD        meta      2281735  []    N/A                  1(UPTODATE)   250228-dev-1-999999-923bdd7c
    # 100    META     HEARTBEAT_CONNECTED  meta      2281842  []    2025-03-12 17:01:32  1(UPTODATE)   250228-dev-1-999999-923bdd7c
    

4.2 Storage nodes

**Note: perform the following steps on the storage nodes.**

① Prepare the SSDs

  1. List the disks available for mounting:

    lsblk
    # NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
    # ...
    # nvme0n1                   259:0    0  2.9T  0 disk
    # nvme1n1                   259:1    0  2.9T  0 disk 
    # nvme2n1                   259:2    0  2.9T  0 disk
    # nvme3n1                   259:3    0  2.9T  0 disk 
    # nvme4n1                   259:4    0  2.9T  0 disk
    # nvme5n1                   259:5    0  2.9T  0 disk
    # nvme6n1                   259:6    0  2.9T  0 disk
    # nvme7n1                   259:7    0  2.9T  0 disk 
    # ...
    
  2. Create the directories:

    mkdir -p /storage/data{0..7}
    mkdir -p /var/log/3fs
    
  3. Format and mount the disks

    Note: in our environment the 8 NVMe disks are numbered consecutively from 0 to 7, so the loop below can be used directly. Adjust the device names to your own environment instead of copying it blindly! (See the fstab tip after this list for making the mounts persistent.)

    for i in {0..7};do mkfs.xfs -L data${i} /dev/nvme${i}n1;mount -o noatime,nodiratime -L data${i} /storage/data${i};done
    mkdir -p /storage/data{0..7}/3fs
    
  4. Check the formatting and mount result:

    lsblk
    # NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
    # ...
    # nvme0n1                   259:0    0  2.9T  0 disk /storage/data0
    # nvme1n1                   259:1    0  2.9T  0 disk /storage/data1
    # nvme2n1                   259:2    0  2.9T  0 disk /storage/data2
    # nvme3n1                   259:3    0  2.9T  0 disk /storage/data3
    # nvme4n1                   259:4    0  2.9T  0 disk /storage/data4
    # nvme5n1                   259:5    0  2.9T  0 disk /storage/data5
    # nvme6n1                   259:6    0  2.9T  0 disk /storage/data6
    # nvme7n1                   259:7    0  2.9T  0 disk /storage/data7
    # ...
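
The mount commands above do not survive a reboot. A minimal sketch for persisting them in /etc/fstab, using the XFS labels created by mkfs.xfs -L earlier:

for i in {0..7}; do
  echo "LABEL=data${i} /storage/data${i} xfs noatime,nodiratime 0 0" >> /etc/fstab
done
mount -a    # verify the entries parse and mount cleanly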
    

② Raise the maximum number of AIO requests

sysctl -w fs.aio-max-nr=67108864
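
This sysctl is also reset on reboot; to persist it, you can append it to /etc/sysctl.conf:

echo "fs.aio-max-nr = 67108864" >> /etc/sysctl.conf
sysctl -p    # reload and confirm the value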

③ Start the storage server

  1. Copy the binary and config files from the build server:

    mkdir -p /opt/3fs/{bin,etc}
    mkdir -p /var/log/3fs
    rsync -avz meta:/home/3FS/build/bin/storage_main /opt/3fs/bin
    rsync -avz meta:/home/3FS/configs/{storage_main_launcher.toml,storage_main.toml,storage_main_app.toml} /opt/3fs/etc
    rsync -avz meta:/home/3FS/deploy/systemd/storage_main.service /usr/lib/systemd/system
    rsync -avz meta:/usr/lib64/libfdb_c.so /usr/lib64
    
    
  2. Edit storage_main_app.toml:

    vim /opt/3fs/etc/storage_main_app.toml
    
    allow_empty_node_id = true
    node_id = 10001   # set node_id; each storage node uses a different node_id (10001~10003)
    
  3. Edit storage_main_launcher.toml:

    vim /opt/3fs/etc/storage_main_launcher.toml
    
    ...
    cluster_id = "stage"
    ...
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.65.10:8000"]
    ...
    
  4. Edit storage_main.toml:

    vim /opt/3fs/etc/storage_main.toml
    
    ...
    [server.base.groups.listener]
    filter_list = ['enp133s0f0np0']
    listen_port = 8000
    ...
    [server.base.groups.listener]
    filter_list = ['enp133s0f0np0']
    listen_port = 9000
    ...
    [server.mgmtd]
    mgmtd_server_address = ["RDMA://192.168.65.10:8000"]
    ...
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.65.10:10000"
    ...
    [server.targets]
    target_paths = ["/storage/data0/3fs","/storage/data1/3fs","/storage/data2/3fs","/storage/data3/3fs","/storage/data4/3fs","/storage/data5/3fs","/storage/data6/3fs","/storage/data7/3fs"]
    ...
    
  5. Push the storage config to the mgmtd server:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "set-config --type STORAGE --file /opt/3fs/etc/storage_main.toml"
    
  6. Start the service:

    systemctl start storage_main
    
  7. Check the service status:

    systemctl status storage_main
    


  8. Check the cluster node list

    If the storage nodes are not listed yet, wait a minute or two and retry:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "list-nodes"
    # Id     Type     Status               Hostname  Pid      Tags  LastHeartbeatTime    ConfigVersion  ReleaseVersion
    # 1      MGMTD    PRIMARY_MGMTD        meta      2281735  []    N/A                  1(UPTODATE)   250228-dev-1-999999-923bdd7c
    # 100    META     HEARTBEAT_CONNECTED  meta      2281842  []    2025-03-12 17:01:32  1(UPTODATE)   250228-dev-1-999999-923bdd7c
    # 10001  STORAGE  HEARTBEAT_CONNECTED  storage1  3294593  []    2025-03-12 17:38:13  1(UPTODATE)   250228-dev-1-999999-923bdd7c
    # 10002  STORAGE  HEARTBEAT_CONNECTED  storage2  476286   []    2025-03-12 17:38:12  1(UPTODATE)   250228-dev-1-999999-923bdd7c
    # 10003  STORAGE  HEARTBEAT_CONNECTED  storage3  2173767  []    2025-03-12 17:38:12  1(UPTODATE)   250228-dev-1-999999-923bdd7c
    

4.3 Create the admin user, storage targets, and chain table

**Note: perform the following steps on the meta node.**

  1. Create the admin user:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "user-add --root --admin 0 root"
    # Uid                0
    # Name               root
    # Token              AAB7sN/h8QBs7/+B2wBQ03Lp(Expired at N/A)
    # IsRootUser         true
    # IsAdmin            true
    # Gid                0
    

    Here AAB7sN/h8QBs7/+B2wBQ03Lp is the token; save it to /opt/3fs/etc/token.txt.
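
    One way to save it (substitute the token printed by your own cluster):

    echo "AAB7sN/h8QBs7/+B2wBQ03Lp" > /opt/3fs/etc/token.txt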

  2. Install the Python dependencies:

    pip3 install -r /home/3FS/deploy/data_placement/requirements.txt
    
  3. Generate the data placement plan:

    cd /home
    python3 /home/3FS/deploy/data_placement/src/model/data_placement.py \
       -ql -relax -type CR --num_nodes 3 --replication_factor 3 --min_targets_per_disk 6
    

    The options to pay attention to:

    • --num_nodes: number of storage nodes;
    • --replication_factor: replication factor.

    On success, an output/DataPlacementModel-v_* directory is created under the current directory, e.g. /home/output/DataPlacementModel-v_3-b_6-r_6-k_3-λ_3-lb_3-ub_3

  4. Generate the chain table:

    python3 /home/3FS/deploy/data_placement/src/setup/gen_chain_table.py \
       --chain_table_type CR --node_id_begin 10001 --node_id_end 10003 \
       --num_disks_per_node 8 --num_targets_per_disk 6 \
       --target_id_prefix 1 --chain_id_prefix 9 \
       --incidence_matrix_path /home/output/DataPlacementModel-v_3-b_6-r_6-k_3-λ_3-lb_3-ub_3/incidence_matrix.pickle
    

    The options to pay attention to:

    • --node_id_begin: first NodeId of the storage nodes;
    • --node_id_end: last NodeId of the storage nodes;
    • --num_disks_per_node: number of disks mounted on each storage node;
    • --num_targets_per_disk: number of targets per mounted disk;
    • --incidence_matrix_path: path to the file generated in the previous step.

    After it succeeds, check that the output directory contains the following files (the create_target_cmd.txt used in step 5 is generated by the same script):

    -rw-r--r--  1 root root  2387 Mar  6 11:55 generated_chains.csv
    -rw-r--r--  1 root root   488 Mar  6 11:55 generated_chain_table.csv
    -rw-r--r--  1 root root 15984 Mar  6 11:55 remove_target_cmd.txt
    
  5. Create the storage targets:

    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.user_info.token $(<"/opt/3fs/etc/token.txt") < /home/output/create_target_cmd.txt
    
  6. Upload the chains to the mgmtd service:

    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chains /home/output/generated_chains.csv"
    
  7. Upload the chain table to the mgmtd service:

    /opt/3fs/bin/admin_cli --cfg /opt/3fs/etc/admin_cli.toml --config.user_info.token $(<"/opt/3fs/etc/token.txt") "upload-chain-table --desc stage 1 /home/output/generated_chain_table.csv"
    
  8. List the chains:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "list-chains"
    # ChainId    ReferencedBy  ChainVersion  Status   PreferredOrder  Target                          Target   Target
    # 900100001  1             1             SERVING  []              101000300101(SERVING-UPTODATE)  101000200101(SERVING-UPTODATE)  101000100101(SERVING-UPTODATE)
    # 900100002  1             1             SERVING  []              101000300102(SERVING-UPTODATE)  101000200102(SERVING-UPTODATE)  101000100102(SERVING-UPTODATE)
    # ...
    

4.4 FUSE client nodes

**Note: perform the following steps on the FUSE client nodes.**

  1. Copy the binary and config files from the build server:

    mkdir -p /var/log/3fs
    mkdir -p /opt/3fs/{bin,etc}
    rsync -avz meta:/home/3FS/build/bin/hf3fs_fuse_main /opt/3fs/bin
    rsync -avz meta:/home/3FS/build/bin/admin_cli /opt/3fs/bin
    rsync -avz meta:/home/3FS/configs/{hf3fs_fuse_main_launcher.toml,hf3fs_fuse_main.toml,hf3fs_fuse_main_app.toml} /opt/3fs/etc
    rsync -avz meta:/home/3FS/deploy/systemd/hf3fs_fuse_main.service /usr/lib/systemd/system
    rsync -avz meta:/opt/3fs/etc/token.txt /opt/3fs/etc
    rsync -avz meta:/usr/lib64/libfdb_c.so /usr/lib64
    
  2. Create the mount point:

    mkdir -p /3fs/stage
    
  3. Edit hf3fs_fuse_main_launcher.toml:

    vim /opt/3fs/etc/hf3fs_fuse_main_launcher.toml
    
    ...
    cluster_id = "stage"
    mountpoint = '/3fs/stage'
    token_file = '/opt/3fs/etc/token.txt'
    ...
    [mgmtd_client]
    mgmtd_server_addresses = ["RDMA://192.168.65.10:8000"]
    ...
    
  4. Edit hf3fs_fuse_main.toml:

    vim /opt/3fs/etc/hf3fs_fuse_main.toml
    
    ...
    [mgmtd]
    mgmtd_server_addresses = ["RDMA://192.168.65.10:8000"]
    ...
    [common.monitor.reporters.monitor_collector]
    remote_ip = "192.168.65.10:10000"
    ...
    
  5. Push the FUSE config:

    /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml "set-config --type FUSE --file /opt/3fs/etc/hf3fs_fuse_main.toml"
    
  6. Start the service:

    systemctl start hf3fs_fuse_main
    
  7. Check the service status:

    systemctl status hf3fs_fuse_main
    


  8. Check the mount point:

    df -h
    # Filesystem                         Size  Used Avail Use% Mounted on
    # ...
    # hf3fs.stage                         70T  650G   70T   1% /3fs/stage
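
    As a quick smoke test before the full benchmark, write and read back a file through the mount (testfile is an arbitrary name):

    dd if=/dev/zero of=/3fs/stage/testfile bs=1M count=100 oflag=direct
    dd if=/3fs/stage/testfile of=/dev/null bs=1M iflag=direct
    rm /3fs/stage/testfile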
    
    

5. Testing 3FS

Use fio on the three clients to run a concurrent read test against 3FS:

yum install fio -y
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 \
	-rw=read -bs=4M --group_reporting -size=100M -time_based -runtime=3000 \
	-name=2depth_128file_4M_direct_read_bw -directory=/3fs/stage

In this 4M concurrent-read test, each client sustains about 10 GB/s of read bandwidth.


6. Outlook: 3FS + Kunpeng, a New Paradigm for AI Infrastructure

This port not only validates the maturity of 3FS in the ARM ecosystem, it also hints at what is possible across the AI technology stack.

The performance and open-source nature of 3FS offer a new option for the data engines of the AI era, and its successful practice on the Kunpeng platform demonstrates its potential in high-performance storage while giving AI and big-data workloads a fresh choice. As the technology and its ecosystem mature, we expect the storage revolution that 3FS leads to accelerate the rise of AI infrastructure across the board.

Stay tuned for more optimization case studies of 3FS on the Kunpeng platform!
