
low latency mode bandwidth #148


Open · duzeyan opened this issue May 8, 2025 · 5 comments

@duzeyan

duzeyan commented May 8, 2025

In an H20 2-node setup, why is the bandwidth of test_internode much higher than that of test_low_latency?

test_internode.py log (number of tokens: 4096):
[tuning] Best combine: SMs 24, NVL chunk 4, RDMA chunk 32: 62.15 GB/s (RDMA), 205.04 GB/s (NVL)

test_low_latency.py log (number of tokens: 128):
[rank 2] Dispatch + combine bandwidth: 0.26 GB/s, avg_t=83610.61 us, min_t=924.03 us, max_t=377962.68 us
[rank 4] Dispatch + combine bandwidth: 0.26 GB/s, avg_t=83610.93 us, min_t=644.26 us, max_t=377964.57 us
[rank 7] Dispatch + combine bandwidth: 0.26 GB/s, avg_t=83611.20 us, min_t=920.80 us, max_t=377974.52 us
[rank 1] Dispatch + combine bandwidth: 0.26 GB/s, avg_t=83612.06 us, min_t=778.56 us, max_t=377976.93 us
[rank 5] Dispatch + combine bandwidth: 0.26 GB/s, avg_t=84203.44 us, min_t=785.44 us, max_t=395137.12 us
[rank 6] Dispatch + combine bandwidth: 0.26 GB/s, avg_t=84464.68 us, min_t=774.43 us, max_t=402746.15 us
[rank 3] Dispatch + combine bandwidth: 0.26 GB/s, avg_t=86244.62 us, min_t=665.34 us, max_t=454342.25 us
[rank 0] Dispatch + combine bandwidth: 0.26 GB/s, avg_t=86347.94 us, min_t=703.68 us, max_t=413629.70 us
[rank 4] Dispatch bandwidth: 0.10 GB/s, avg_t=73857.00 us | Combine bandwidth: 1.24 GB/s, avg_t=11727.00 us
[rank 7] Dispatch bandwidth: 0.11 GB/s, avg_t=68931.00 us | Combine bandwidth: 0.88 GB/s, avg_t=16481.00 us
[rank 0] Dispatch bandwidth: 0.13 GB/s, avg_t=57557.00 us | Combine bandwidth: 0.50 GB/s, avg_t=29269.00 us
[rank 2] Dispatch bandwidth: 0.10 GB/s, avg_t=74740.00 us | Combine bandwidth: 1.36 GB/s, avg_t=10686.00 us
[rank 5] Dispatch bandwidth: 0.10 GB/s, avg_t=76297.00 us | Combine bandwidth: 1.59 GB/s, avg_t=9131.00 us
[rank 1] Dispatch bandwidth: 0.16 GB/s, avg_t=48329.00 us | Combine bandwidth: 0.39 GB/s, avg_t=37488.00 us
[rank 6] Dispatch bandwidth: 0.12 GB/s, avg_t=63882.00 us | Combine bandwidth: 0.57 GB/s, avg_t=25385.00 us
[rank 3] Dispatch bandwidth: 0.10 GB/s, avg_t=73351.00 us | Combine bandwidth: 1.07 GB/s, avg_t=13643.00 us
[rank 7] Dispatch send/recv time: 1088.61 us | Combine send/recv time: 1319.52 us
[rank 4] Dispatch send/recv time: 229.82 us | Combine send/recv time: 333.26 us
[rank 3] Dispatch send/recv time: 1100.09 us | Combine send/recv time: 1346.97 us
[rank 6] Dispatch send/recv time: 308.76 us | Combine send/recv time: 381.72 us
[rank 2] Dispatch send/recv time: 343.36 us | Combine send/recv time: 445.31 us
[rank 1] Dispatch send/recv time: 728.29 us | Combine send/recv time: 924.38 us
[rank 5] Dispatch send/recv time: 822.57 us | Combine send/recv time: 1049.78 us
[rank 0] Dispatch send/recv time: 29.04 us | Combine send/recv time: 29.63 us

@alpha-baby

Maybe you can look at the bandwidth monitoring of the physical RNICs and check whether all the network cards are actually being used.
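For example, here is a minimal sketch of such monitoring, assuming a Linux host with mlx5-style sysfs counters (the path layout and the 4-octet unit of port_xmit_data follow the standard InfiniBand counter convention). Run it on each node while the test is running; an RNIC that stays idle would point to a NIC-mapping problem rather than congestion.

```python
import glob
import time

# Sample the per-RNIC transmit counters once per second and print the
# throughput of each device, to check whether traffic is spread across
# all network cards. port_xmit_data counts in units of 4 octets, hence
# the multiplication by 4.
def sample_tx_bytes():
    out = {}
    for path in glob.glob('/sys/class/infiniband/*/ports/*/counters/port_xmit_data'):
        dev = path.split('/')[4]  # device name, e.g. mlx5_0
        with open(path) as f:
            out[dev] = out.get(dev, 0) + int(f.read()) * 4
    return out

prev = sample_tx_bytes()
while True:
    time.sleep(1.0)
    cur = sample_tx_bytes()
    for dev in sorted(cur):
        gbps = (cur[dev] - prev[dev]) * 8 / 1e9
        print(f'{dev}: {gbps:6.2f} Gbps TX')
    prev = cur
```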

@zartbot

zartbot commented May 9, 2025

From the normal-kernel result, it seems you have 8x400 Gbps RDMA NICs, but LL mode seems to have significant congestion. Is this an IB network or RoCE?

@duzeyan

duzeyan commented May 12, 2025

We are using RoCE. Do you have any suggestions in this regard, such as tools or benchmark scripts?
Additionally, the internode test results seem to align with expectations, so the relevant network cards should all be in use, right? @alpha-baby

@alpha-baby

Modify test_ll_compatibility from False to True in the file test_internode.py.

Then run test_internode.py and share your test results.
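For reference, this is a one-line edit (the exact location of the flag in test_internode.py may differ across DeepEP versions); judging by its name, it should make the internode test also exercise the low-latency kernels:

```python
# In test_internode.py -- exact location may vary by version:
test_ll_compatibility = True  # was: False
```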

@zartbot

zartbot commented May 14, 2025

From your previous results on the normal kernels, I assume you have 8x400 Gbps RoCE NICs, for a total bandwidth of 3.2 Tbps. What is the topology of these two H20 nodes: are all ports on the same switch? How is ECN configured? You may try to tune your network and the congestion control (CC) settings on the NICs.
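One way to inspect the ECN side, sketched under the assumption of mlx5 NICs (the counter names below are mlx5-specific and may vary by driver version): snapshot the RoCE congestion counters before and after the low-latency test and compare.

```python
import glob
import os

# Dump the congestion-related RoCE hardware counters of every RNIC.
# A large increase in np_ecn_marked_roce_packets or the CNP counters
# during the low-latency test indicates ECN marking (congestion);
# a growing out_of_sequence counter hints at loss or reordering.
COUNTERS = ('np_ecn_marked_roce_packets', 'np_cnp_sent',
            'rp_cnp_handled', 'rp_cnp_ignored', 'out_of_sequence')

for port in sorted(glob.glob('/sys/class/infiniband/*/ports/*')):
    dev = port.split('/')[4]  # device name, e.g. mlx5_0
    for name in COUNTERS:
        path = os.path.join(port, 'hw_counters', name)
        if os.path.exists(path):  # counter set is driver-dependent
            with open(path) as f:
                print(f'{dev} {name}: {f.read().strip()}')
```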
