some questions in ibgda #113

Thunderbrook · 2025-04-08T05:28:53Z

Hi, I have some questions about the following code in ibgda

```cuda
void ibgda_submit_requests(nvshmemi_ibgda_device_qp_t *qp, uint64_t base_wqe_idx,
                           uint32_t num_wqes, int message_idx = 0) {
    nvshmemi_ibgda_device_qp_management_t *mvars = &qp->mvars;
    uint64_t new_wqe_idx = base_wqe_idx + num_wqes;

    // WQE writes must be finished first
    __threadfence();    // (1)

    // Wait for prior WQE slots to be filled first
    auto *ready_idx = reinterpret_cast<unsigned long long int*>(&mvars->tx_wq.ready_head);
    while (atomicCAS(ready_idx, base_wqe_idx, new_wqe_idx) != base_wqe_idx);     // (2)

    // Always post, not in batch
    constexpr int kNumRequestInBatch = 4;
    if (kAlwaysDoPostSend or (message_idx + 1) % kNumRequestInBatch == 0)
        ibgda_post_send(qp, new_wqe_idx);
}
```

(1) My understanding is that the `__threadfence()` here ensures the WQE writes and the doorbell (DB) write are not reordered. But since the observer is the NIC, shouldn't `__threadfence_system()` be used instead?
(2) My understanding is that every thread executing the atomicCAS uses a different compare/swap pair (its own base_wqe_idx and new_wqe_idx), so is an atomic operation actually necessary here?

Thanks

sphish · 2025-04-08T06:40:50Z

The ibgda_submit_requests code here is simplified from NVSHMEM, and I'm also somewhat unclear about the synchronization semantics needed when GPUs and NICs interact with each other. However, NVSHMEM implements it this way, and I believe that as an internal NVIDIA team, they have more insight into these details and can ensure the safety of this approach. The following is just my personal understanding:

(1) If we treat the NIC like another GPU, the `__threadfence()` here is indeed not strong enough to guarantee that the written WQE is visible to it. However, there may be special mechanisms in how the NIC reads WQEs, such as always bypassing the cache, in which case `__threadfence()` would be sufficient.

(2) I think you are right: the atomic op is not necessary. We can give it a try.

Thunderbrook · 2025-04-08T07:16:27Z

Got it, thanks a lot for the reply!
