Huawei 910b4-1 card startup error: Call aclInit(nullptr) failed : 507008 at file /work/PaddleCustomDevice/backends/npu/runtime/runtime.cc line 403 #72458

Open
JackMacs opened this issue Apr 24, 2025 · 2 comments
@JackMacs

Describe the Bug

The container was started with:

docker run -itd --name intention_embedding \
  --hostname base-model-intention-embedding-1 \
  --privileged --network=qll-net --shm-size=128G \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /root/xiaoe_model/intention_embedding/deploy:/home/qll_chat_intention_embedding/deploy \
  -v /root/xiaoe_model/license:/home/qll_ai/license \
  -e ASCEND_RT_VISIBLE_DEVICES="0" \
  swr.cn-east-3.myhuaweicloud.com/weaver-qianliling/deploy:hw_intention_embedding_private \
  bash

Inside the container, the following command fails:

python -c "import paddle_custom_device; paddle_custom_device.npu.version()"

Error output:

ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I0411 10:12:08.927596 650 init.cc:146] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
Call aclInit(nullptr) failed : 507008 at file /work/PaddleCustomDevice/backends/npu/runtime/runtime.cc line 403
EE1001: 2025-04-11-10:12:09.580.593 The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
faultVersion(0x50a) from driver[FUNC:InitSocTypeFrom910BVersion][FILE:runtime.cc][LINE:1327]
[Init][Version]init soc version failed, ret = 507008[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4692]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
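
When comparing several failing runs, it can help to pull the key fields out of a log like the one above. The following is a minimal sketch, not part of Paddle or CANN: the function name `parse_acl_failure` and the regular expressions are illustrative assumptions based only on the log format shown in this report.

```python
import re

# Excerpt of the log lines quoted in this issue.
LOG = """\
Call aclInit(nullptr) failed : 507008 at file /work/PaddleCustomDevice/backends/npu/runtime/runtime.cc line 403
EE1001: 2025-04-11-10:12:09.580.593 The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
faultVersion(0x50a) from driver[FUNC:InitSocTypeFrom910BVersion][FILE:runtime.cc][LINE:1327]
"""


def parse_acl_failure(log: str) -> dict:
    """Extract the failing call, return code, source location, and faultVersion.

    Hypothetical helper: the patterns match the log format seen in this
    report and may need adjusting for other versions.
    """
    info = {}
    call = re.search(r"Call (\w+)\(.*?\) failed : (\d+) at file (\S+) line (\d+)", log)
    if call:
        info["call"] = call.group(1)
        info["code"] = int(call.group(2))
        info["file"] = call.group(3)
        info["line"] = int(call.group(4))
    fault = re.search(r"faultVersion\((0x[0-9a-fA-F]+)\)", log)
    if fault:
        # The driver reports this value in hexadecimal.
        info["fault_version"] = int(fault.group(1), 16)
    return info


if __name__ == "__main__":
    print(parse_acl_failure(LOG))
```

The `faultVersion` value comes from the `InitSocTypeFrom910BVersion` traceback line, so recording it alongside the return code makes it easier to tell whether different cards or driver versions report the same value.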

Additional Supplementary Information

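Following the Paddle documentation (https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/hardware_support/npu/install_cn.html), I pulled the image and installed the packages. Starting the service or importing the paddle package fails with the error above. Running npu-smi info inside the container works normally.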

@xiaoguoguo626807
Contributor

This looks like an intermittent issue. Could you try a different card?
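
One way to try other cards without recreating the container might be to rerun the reporter's import check once per device ID, changing `ASCEND_RT_VISIBLE_DEVICES` each time. The sketch below assumes the other devices are reachable from inside the container and that a failure shows up either as a non-zero exit code or as the `aclInit(nullptr) failed` text in the output; `probe_device` and `probe_devices` are hypothetical helper names, not Paddle APIs.

```python
import os
import subprocess
import sys

# The check command quoted in the bug report.
CHECK = "import paddle_custom_device; paddle_custom_device.npu.version()"
# Assumption: this text appears in the output whenever initialization fails.
FAILURE_MARKER = "aclInit(nullptr) failed"


def probe_device(device_id, runner=subprocess.run):
    """Run the import check in a fresh process with only one device visible."""
    env = dict(os.environ, ASCEND_RT_VISIBLE_DEVICES=str(device_id))
    result = runner(
        [sys.executable, "-c", CHECK],
        env=env,
        capture_output=True,
        text=True,
    )
    output = (result.stdout or "") + (result.stderr or "")
    # Treat the run as healthy only if it exits cleanly and the marker is absent.
    return result.returncode == 0 and FAILURE_MARKER not in output


def probe_devices(device_ids, runner=subprocess.run):
    """Return a mapping of device ID to True (import worked) or False."""
    return {device_id: probe_device(device_id, runner) for device_id in device_ids}
```

Calling `probe_devices(["0", "1"])` inside the container would report which of those two cards completes the import. Each probe runs in a separate process because the device selection is read from the environment when the runtime initializes.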

@xiaoguoguo626807
Contributor

Previously, this issue occurred less often after the driver was upgraded, so you could try upgrading the driver.
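
Before and after an upgrade, it may be useful to record which driver version the container actually sees through the mounted `/usr/local/Ascend/driver` directory. The sketch below assumes the driver package leaves a `version.info` file of `key=value` lines in that directory with a `Version` key; both the file name and the key are assumptions to verify on the host, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: the driver install writes key=value metadata to this file.
# The directory itself is the one mounted in the docker run command.
VERSION_FILE = Path("/usr/local/Ascend/driver/version.info")


def parse_version_info(text: str) -> dict:
    """Parse key=value lines, skipping blanks, comments, and malformed lines."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def read_driver_version(path: Path = VERSION_FILE):
    """Return the value of the assumed 'Version' key, or None if unavailable."""
    if not path.is_file():
        return None
    return parse_version_info(path.read_text()).get("Version")


if __name__ == "__main__":
    print("driver version:", read_driver_version())
```

If the function returns `None`, either the file is not where this sketch expects it or it does not contain the assumed key, and the version should be checked another way.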
