Hi, I have some questions about the following code in IBGDA:
```cuda
void ibgda_submit_requests(nvshmemi_ibgda_device_qp_t *qp, uint64_t base_wqe_idx,
                           uint32_t num_wqes, int message_idx = 0) {
    nvshmemi_ibgda_device_qp_management_t *mvars = &qp->mvars;
    uint64_t new_wqe_idx = base_wqe_idx + num_wqes;

    // WQE writes must be finished first
    __threadfence();  // (1)

    // Wait for prior WQE slots to be filled first
    auto *ready_idx = reinterpret_cast<unsigned long long int*>(&mvars->tx_wq.ready_head);
    while (atomicCAS(ready_idx, base_wqe_idx, new_wqe_idx) != base_wqe_idx);  // (2)

    // Always post, not in batch
    constexpr int kNumRequestInBatch = 4;
    if (kAlwaysDoPostSend or (message_idx + 1) % kNumRequestInBatch == 0)
        ibgda_post_send(qp, new_wqe_idx);
}
```
(1) My understanding is that the `__threadfence()` here ensures that the WQE writes and the doorbell (DB) write are not reordered. Since the observer is the NIC rather than another GPU thread, shouldn't `__threadfence_system()` be used instead?
(2) My understanding is that every thread calling `atomicCAS` here uses a different compare value (`base_wqe_idx`) and swap value (`new_wqe_idx`), so only one of them can match the current `ready_head` at any time. Is an atomic operation actually necessary in this case?
Thanks
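To make question (1) concrete, here is a minimal host-side C++ analogue of the publish pattern: fill a WQE with ordinary stores, issue a fence, then ring a doorbell that a "NIC" thread polls. It is a sketch under the C++ memory model only; `Wqe`, `SendQueue`, `post`, `nic_poll` and `run_fence_demo` are hypothetical names, and it does not model how a real NIC observes GPU memory over PCIe, which is exactly what the `__threadfence()` vs `__threadfence_system()` question is about. For reference, the CUDA Programming Guide describes `__threadfence()` as ordering the calling thread's writes as observed by all threads in the device, while `__threadfence_system()` extends that to host threads and threads in peer devices.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical, simplified WQE; not a real hardware WQE layout.
struct Wqe {
    uint64_t addr;
    uint32_t len;
};

// Hypothetical send queue: plain WQE slots plus a doorbell word.
struct SendQueue {
    Wqe ring[8]{};                      // ordinary (non-atomic) WQE storage
    std::atomic<uint64_t> doorbell{0};  // number of WQEs posted so far
};

// "GPU" side: write the WQE, fence, then ring the doorbell.
void post(SendQueue& sq, uint64_t idx, Wqe wqe) {
    sq.ring[idx % 8] = wqe;                                 // 1. WQE writes
    std::atomic_thread_fence(std::memory_order_release);    // 2. order them before...
    sq.doorbell.store(idx + 1, std::memory_order_relaxed);  // 3. ...the DB write
}

// "NIC" side: wait for the doorbell, then read the WQE it covers.
Wqe nic_poll(SendQueue& sq, uint64_t expected_db) {
    assert(expected_db > 0);
    while (sq.doorbell.load(std::memory_order_relaxed) < expected_db) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
    return sq.ring[(expected_db - 1) % 8];
}

// Runs one producer and one consumer; true if the consumer saw the full WQE.
bool run_fence_demo() {
    SendQueue sq;
    Wqe seen{};
    std::thread nic([&] { seen = nic_poll(sq, 1); });
    std::thread gpu([&] { post(sq, 0, Wqe{0x1000, 64}); });
    gpu.join();
    nic.join();
    return seen.addr == 0x1000 && seen.len == 64;
}
```

Within one memory model, the release fence before the doorbell store plus the acquire fence after the doorbell load is enough for the consumer to see the complete WQE. The open issue in (1) is whether the NIC's DMA reads participate in the scope that a device-scope `__threadfence()` covers, which this analogue cannot answer.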
The `ibgda_submit_requests` code here is simplified from NVSHMEM, and I'm also not entirely sure what synchronization semantics are required when GPUs and NICs interact. However, this is how NVSHMEM implements it, and as an internal NVIDIA team they likely have more insight into these details and can ensure the approach is safe. What follows is only my personal understanding:
(1) If we treat the NIC like another GPU device, then `__threadfence()` is indeed not strong enough to guarantee that the written WQE is visible to it. However, the NIC may use special mechanisms when it reads WQEs, such as always bypassing the cache, in which case `__threadfence()` would be sufficient.
(2) I think you are right: the atomic operation is not necessary. We can give it a try.
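Point (2) can be checked with a host-side C++ analogue of the `ready_head` hand-off. Because each caller owns a contiguous, disjoint range `[base_wqe_idx, base_wqe_idx + num_wqes)`, at most one caller can find `ready_head` equal to its base, so the hand-off does not strictly need a read-modify-write. The spin does still need an atomic (or, in CUDA, a `volatile`/acquire) load, since a plain load could be hoisted out of the loop, and the final store must be ordered after the WQE writes (a release store here, the preceding `__threadfence()` in the original). This is a sketch under the C++ memory model only; `publish_cas`, `publish_load_store` and `run_in_reverse` are hypothetical names, and CUDA memory scopes are not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Variant A: CAS spin, mirroring the atomicCAS loop in the original code.
void publish_cas(std::atomic<uint64_t>& ready_head, uint64_t base, uint64_t num) {
    uint64_t expected = base;
    // On failure compare_exchange_weak writes the observed value into
    // `expected`, so reset it before retrying.
    while (!ready_head.compare_exchange_weak(expected, base + num,
                                             std::memory_order_acq_rel)) {
        expected = base;
        std::this_thread::yield();
    }
}

// Variant B: no read-modify-write. Safe only because no other worker writes
// ready_head while it equals `base`. The spin still uses an atomic acquire
// load (a plain load would be a data race and could be hoisted out of the
// loop), and the release store publishes this worker's earlier writes.
void publish_load_store(std::atomic<uint64_t>& ready_head, uint64_t base, uint64_t num) {
    while (ready_head.load(std::memory_order_acquire) != base) {
        std::this_thread::yield();
    }
    ready_head.store(base + num, std::memory_order_release);
}

// Starts workers in reverse order so later ranges tend to arrive first and
// must wait their turn; returns the final ready_head.
template <typename Publish>
uint64_t run_in_reverse(Publish publish, uint64_t num_workers, uint64_t num_wqes) {
    assert(num_wqes > 0);  // empty ranges would give two workers the same base
    std::atomic<uint64_t> ready_head{0};
    std::vector<std::thread> workers;
    for (uint64_t w = num_workers; w-- > 0;) {
        workers.emplace_back([&ready_head, publish, w, num_wqes] {
            publish(ready_head, w * num_wqes, num_wqes);
        });
    }
    for (auto& t : workers) t.join();
    return ready_head.load();
}
```

With 8 workers of 3 WQEs each, both `run_in_reverse(publish_cas, 8, 3)` and `run_in_reverse(publish_load_store, 8, 3)` finish with `ready_head == 24`, since each range can only be published after the one before it. The CAS form gives the same in-order hand-off; its extra cost is an RMW on every spin iteration instead of a plain load.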