python test_low_latency.py blocked in H20 #103

Open
cll24 opened this issue Mar 31, 2025 · 1 comment

cll24 commented Mar 31, 2025

After analyzing the code, we found that the process blocks during the instantiation of the buffer.

Image

Image

Image

We got the following log after setting NVSHMEM_DEBUG=INFO:

Image

What does the error "cudaErrorInvalidValue 1 cudaErrorInvalidSymbol 13 cudaErrorInvalidMemcpyDirection 21 cudaErrorNoKernelImageForDevice 209" mean? Is it the root cause of the process blocking?

Here is our NVSHMEM configuration:
Image

Collaborator

sphish commented Apr 1, 2025

The log line "cudaErrorInvalidValue 1 cudaErrorInvalidSymbol 13 cudaErrorInvalidMemcpyDirection 21 cudaErrorNoKernelImageForDevice 209" is a normal NVSHMEM log line.

I suggest using gdb to print the backtrace when the test processes hang.
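
One possible way to follow the gdb suggestion is sketched below. It is only an illustration, not something taken from this project: the helper names gdb_backtrace_cmd and dump_backtrace are hypothetical, and it assumes gdb is installed and that you are allowed to attach to the hung process (same user or root, ptrace not restricted). It uses gdb's standard -p, -batch, and -ex options to attach to a PID, print the backtrace of every thread, and exit.

```python
import shutil
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to `pid`, prints all thread
    backtraces, and exits without entering interactive mode.
    (Hypothetical helper name, for illustration only.)"""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtrace(pid, timeout=60.0):
    """Run gdb against a hung process and return its combined output.
    Assumes gdb is on PATH and attaching to `pid` is permitted."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found on PATH")
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr
```

To use it, find the PID of one hung test process (for example with ps), call dump_backtrace with that PID, and print the returned text. Repeating this for each rank can show whether all processes are waiting at the same call.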
