目前昇腾的多模态大模型推理能力主要集成在MindIE推理引擎的LLM和SD组件

  • 多模态理解:MindIE LLM
  • 多模态生成:MindIE SD

MindIE最新版本支持的多模态模型

LLaVa、Qwen-VL、internVL、internLM-XComposer2、MiniCPM-V2、MiniCPM-LLaMa3-V2.5支持多模态理解VLM模型对接服务化调度、单图url/base64。

Open-Sora推理

1、开启cpu高性能模式
echo performance |tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
sysctl -w vm.swappiness=0
sysctl -w kernel.numa_balancing=0
2、视频生成
python inference.py
./configs/opensora/inference/16x256x256.py
–ckpt-path ./OpenSora-v1-HQ-16x512x512.pth
–prompt-path ./assets/texts/t2v_samples.txt
–use_mindie 1
–device_id 0

参数说明:

  • –ckpt-path:STDIT的权重路径
  • –prompt-path:prompt数据集的路径
  • –use_mindie:是否使用MindIE推理。1代表是,0代表否
  • –device_id:使用哪张卡

模型推理性能

StableDiffusion v1.5

StableDiffusion v2.1

StableDiffusion v1.5 迭代50次精度:

average score: 0.363
category average scores:
[Abstract], average score: 0.280
[Vehicles], average score: 0.363
[Illustrations], average score: 0.359
[Arts], average score: 0.404
[World Knowledge], average score: 0.372
[People], average score: 0.364
[Animals], average score: 0.373
[Artifacts], average score: 0.359
[Food & Beverage], average score: 0.355
[Produce & Plants], average score: 0.358
[Outdoor Scenes], average score: 0.355
[Indoor Scenes], average score: 0.368

StableDiffusion v2.1 迭代50次精度:

average score: 0.376
category average scores:
[Abstract], average score: 0.285
[Vehicles], average score: 0.377
[Illustrations], average score: 0.376
[Arts], average score: 0.414
[World Knowledge], average score: 0.383
[People], average score: 0.381
[Animals], average score: 0.389
[Artifacts], average score: 0.369
[Food & Beverage], average score: 0.369
[Produce & Plants], average score: 0.364
[Outdoor Scenes], average score: 0.366
[Indoor Scenes], average score: 0.381

StableVideoDiffusion

Tips:
MindSpeed-MM是面向大规模分布式训练的昇腾多模态大模型套件,同时支持多模态生成及多模态理解,旨在为华为 昇腾芯片 提供端到端的多模态训练解决方案, 包含预置业界主流模型,数据工程,分布式训练及加速,预训练、微调、在线推理任务等特性。

Logo

鲲鹏昇腾开发者社区是面向全社会开放的“联接全球计算开发者,聚合华为+生态”的社区,内容涵盖鲲鹏、昇腾资源,帮助开发者快速获取所需的知识、经验、软件、工具、算力,支撑开发者易学、好用、成功,成为核心开发者。

更多推荐