🐛 Describe the bug

Hello,
I am getting a `CUDA error: device-side assert triggered` in `global_mean_pool`, to such an extent that I cannot:

- print the variable,
- detach it and save it as a tensor to inspect it, or
- wrap the call in a try/except and simply skip the batch; the whole program crashes.

This happens on datasets with more than 15k samples, and only after training for a good 3-4 epochs. There does not seem to be a NaN in the dataset. The error carries over to every subsequent batch, so I cannot get past it.
The follow-up error to this is:

```
failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [45,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [46,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [47,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [48,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [49,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [50,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [51,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [52,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [53,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [54,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [55,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [51,0,0], thread: [56,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [14,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [15,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [16,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [17,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [18,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [19,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [20,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [21,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [22,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [23,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [24,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [25,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [50,0,0], thread: [26,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/package/molclass-0.1.1.dev618/molclass/cubes/scripts/script_helper.py", line 303, in forward
    apool = global_mean_pool(ah, abinput.x_s_batch)
  File "/home/floeuser/miniconda/envs/user_env/lib/python3.9/site-packages/torch_geometric/nn/pool/glob.py", line 63, in global_mean_pool
    return scatter(x, batch, dim=dim, dim_size=size, reduce='mean')
  File "/home/floeuser/miniconda/envs/user_env/lib/python3.9/site-packages/torch_geometric/utils/_scatter.py", line 53, in scatter
    dim_size = int(index.max()) + 1 if index.numel() > 0 else 0
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/package/molclass-0.1.1.dev618/molclass/cubes/scripts/script_helper.py", line 313, in forward
    tt = torch.max(abinput.x_s_batch).to(device='cpu')
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/package/molclass-0.1.1.dev618/molclass/cubes/scripts/multi_gpu_train_regressor_module_script.py", line 109, in batch_train
    batch = batch.to(device)
  File "/home/floeuser/miniconda/envs/user_env/lib/python3.9/site-packages/torch_geometric/data/data.py", line 360, in to
    return self.apply(
  File "/home/floeuser/miniconda/envs/user_env/lib/python3.9/site-packages/torch_geometric/data/data.py", line 340, in apply
    store.apply(func, *args)
  File "/home/floeuser/miniconda/envs/user_env/lib/python3.9/site-packages/torch_geometric/data/storage.py", line 201, in apply
    self[key] = recursive_apply(value, func)
  File "/home/floeuser/miniconda/envs/user_env/lib/python3.9/site-packages/torch_geometric/data/storage.py", line 895, in recursive_apply
    return func(data)
  File "/home/floeuser/miniconda/envs/user_env/lib/python3.9/site-packages/torch_geometric/data/data.py", line 361, in <lambda>
    lambda x: x.to(device=device, non_blocking=non_blocking), *args)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing
```
I checked the values in the batch vector and they seem to be within bounds (along the lines of the check sketched below), so I am not sure why it reports an index out of bounds.
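For reference, a minimal sketch of the kind of bounds check meant here; `check_batch_vector` is a hypothetical helper, and `x`/`batch` stand in for `ah` and `abinput.x_s_batch` from the traceback. Everything is moved to the CPU first, because once the device-side assert has fired, further CUDA calls on the affected tensors also fail.

```python
import torch


def check_batch_vector(x: torch.Tensor, batch: torch.Tensor) -> None:
    """Sanity-check the graph-assignment vector fed to global_mean_pool.

    Checks the value range and length only; note that it would not catch a
    (n, 1)-shaped batch vector, which turned out to be the actual problem here.
    """
    b = batch.detach().cpu()
    assert int(b.min()) >= 0, "negative graph index in batch vector"
    assert b.numel() == x.size(0), "batch vector length != number of node rows"
    print("batch shape:", tuple(b.shape),
          "min:", int(b.min()), "max:", int(b.max()),
          "implied num_graphs:", int(b.max()) + 1)
```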
Also note, this DOES NOT happen if I do not pass a DistributedSampler, or if I turn shuffle off for the data; a sketch of how that part is set up follows below. I can put together a dataset that reproduces this, but that is difficult; I was wondering whether you have noticed this bug, especially on larger (>10k) datasets.
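The exact loader code was not included in the report, so the following is only a minimal sketch of the kind of sampler-plus-shuffle setup being described: `train_dataset`, the batch size, and the random-graph stand-ins are placeholders, and it assumes the standard `torch.utils.data.distributed.DistributedSampler` together with `torch_geometric.loader.DataLoader` (shuffling is delegated to the sampler, so the loader's own `shuffle` stays off).

```python
import torch
import torch.distributed as dist
from torch.utils.data.distributed import DistributedSampler
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

# Stand-in dataset: a list of small random graphs (the real dataset has >15k molecules).
train_dataset = [
    Data(x=torch.randn(6, 16), edge_index=torch.randint(0, 6, (2, 10)))
    for _ in range(128)
]

# Shuffling is handled by the sampler, so the loader itself keeps shuffle=False.
# The fallbacks keep the sketch runnable outside an initialised process group.
train_sampler = DistributedSampler(
    train_dataset,
    num_replicas=dist.get_world_size() if dist.is_initialized() else 1,
    rank=dist.get_rank() if dist.is_initialized() else 0,
    shuffle=True,  # the setting that correlates with the crash described above
)
train_loader = DataLoader(
    train_dataset,
    batch_size=32,  # illustrative value
    sampler=train_sampler,
)

# The configuration that reportedly does not crash: no sampler, no shuffling.
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
```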
OK, I think I may have found the error. Running with `CUDA_LAUNCH_BLOCKING=1` placed the error on `global_mean_pool`, but was not much more insightful than that.

The tensors that went into the pooling function were shaped (n, 1) and (n,). In my experience, any other PyTorch function either reshapes (n, 1) to (n,) or raises an error; but `global_mean_pool` ran fine with the (n, 1) batch vector when the PyTorch sampler (shuffle) was off, and threw the CUDA error with shuffle on.

IMO, it should either report a shape mismatch and fail, or not fail at all; I was not expecting a CUDA error for this.
I applied `torch.squeeze()`, reshaping both tensors to (n,), and that seems to have fixed the issue.
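To make the workaround concrete, here is a minimal, CPU-only sketch; the real tensors (`ah` and `abinput.x_s_batch` in the traceback) come out of the model on the GPU, but the shape handling is the same.

```python
import torch
from torch_geometric.nn import global_mean_pool

# Toy stand-ins: 8 nodes with 4 features each, spread over 2 graphs.
x = torch.randn(8, 4)                            # node features ("ah" in the traceback)
batch = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])   # graph assignment, shape (8,)

bad_batch = batch.unsqueeze(-1)                  # shape (8, 1): the shape that caused the crash
good_batch = bad_batch.squeeze(-1)               # back to shape (8,): the applied fix

out = global_mean_pool(x, good_batch)            # one mean-pooled row per graph
assert out.shape == (2, 4)
```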
It would be great if you could add a safeguard to this function in the future, something along the lines of the check sketched below.
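To illustrate, a hypothetical wrapper (not part of PyG) that performs the kind of validation being requested before delegating to `global_mean_pool`; an equivalent check inside the library would turn the delayed, asynchronous CUDA assert into an immediate and readable error.

```python
from typing import Optional

import torch
from torch_geometric.nn import global_mean_pool


def safe_global_mean_pool(x: torch.Tensor, batch: torch.Tensor,
                          size: Optional[int] = None) -> torch.Tensor:
    """Hypothetical wrapper that fails fast on malformed batch vectors."""
    if batch.dim() != 1:
        raise ValueError(
            f"`batch` must be one-dimensional, got shape {tuple(batch.shape)}; "
            f"call batch.view(-1) or batch.squeeze(-1) first.")
    if batch.numel() != x.size(0):
        raise ValueError(
            f"`batch` has {batch.numel()} entries but `x` has {x.size(0)} rows.")
    return global_mean_pool(x, batch, size)
```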