Please ask your question
I get the error below, but I can't find where it actually occurs and don't know how to track it down. Paddle version: 2.6.2.
LAUNCH INFO 2025-04-24 13:14:14,333 ------------------------- ERROR LOG DETAIL -------------------------
nfo.func(*info.args, **(info.kwargs or {}))
File "/usr/local/lib/python3.9/dist-packages/numba/cuda/cudadrv/driver.py", line 1698, in core
dealloc.add_item(module_unload, handle)
File "/usr/local/lib/python3.9/dist-packages/numba/cuda/cudadrv/driver.py", line 1180, in add_item
self.clear()
File "/usr/local/lib/python3.9/dist-packages/numba/cuda/cudadrv/driver.py", line 1191, in clear
dtor(handle)
File "/usr/local/lib/python3.9/dist-packages/numba/cuda/cudadrv/driver.py", line 327, in safe_cuda_api_call
self._check_ctypes_error(fname, retcode)
File "/usr/local/lib/python3.9/dist-packages/numba/cuda/cudadrv/driver.py", line 395, in _check_ctypes_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuStreamDestroy results in CUDA_ERROR_LAUNCH_FAILED
[2025-04-24 13:14:00,143] [ WARNING] dataloader_iter.py:707 - DataLoader 5 workers exit unexpectedly, pids: 179206, 179222, 179238, 179288, 179336
I0424 13:14:03.881071 176447 process_group_nccl.cc:132] ProcessGroupNCCL destruct
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
what(): (External) CUDA error(719), unspecified launch failure.
[Hint: 'cudaErrorLaunchFailure'. An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointerand accessing out of bounds shared memory. Less common cases can be system specific - more information about these cases canbe found in the system specific user guide. This leaves the process in an inconsistent state and any further CUDA work willreturn the same error. To continue using CUDA, the process must be terminated and relaunched.] (at /data/Eager/Paddle3/paddle/phi/backends/gpu/cuda/cuda_info.cc:296)
C++ Traceback (most recent call last):
0 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()
1 paddle::distributed::ProcessGroupNCCL::~ProcessGroupNCCL()
2 std::_Hashtable<std::string, std::pair<std::string const, std::unique_ptr<phi::GPUContext, std::default_delete<phi::GPUContext> > >, std::allocator<std::pair<std::string const, std::unique_ptr<phi::GPUContext, std::default_delete<phi::GPUContext> > > >, std::__detail::_Select1st, std::equal_to<std::string >, std::hash<std::string >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable()
3 phi::GPUContext::~GPUContext()
4 phi::GPUContext::~GPUContext()
5 phi::GPUContext::Impl::~Impl()
Error Message Summary:
FatalError: Process abort signal is detected by the operating system.
[TimeInfo: *** Aborted at 1745471643 (unix time) try "date -d @1745471643" if you are using GNU date ***]
[SignalInfo: *** SIGABRT (@0x2b13f) received by PID 176447 (TID 0x7fd85a095c00) from PID 176447 ***]
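A possible first step for narrowing this down, as a minimal sketch rather than a confirmed fix: the hint in the log says that error 719 leaves the process in an inconsistent state and that every later CUDA call returns the same error. That suggests the numba `cuStreamDestroy` call and the `GPUContext` destructor are only where the error surfaced, not where it started. Kernel launches are asynchronous by default, so setting the standard CUDA environment variable `CUDA_LAUNCH_BLOCKING=1` before any CUDA library is imported should make the failure show up closer to the operation that caused it. The helper name `enable_sync_cuda_debugging` is hypothetical, and the log does not show whether a numba kernel or a Paddle kernel is at fault.

```python
import os


def enable_sync_cuda_debugging(env=None):
    """Request synchronous CUDA kernel launches (hypothetical helper).

    CUDA_LAUNCH_BLOCKING=1 is a standard CUDA runtime setting. With it,
    each kernel launch blocks until the kernel finishes, so a device-side
    fault is reported at the launch that caused it instead of at some
    later, unrelated CUDA call.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Call this before the first CUDA call in the process, i.e. before
# importing paddle or numba.cuda in the training script.
enable_sync_cuda_debugging()

# Assumption: while debugging it may also help to run the DataLoader with
# num_workers=0, so that any GPU work done inside dataset code runs in the
# main process and appears in the main traceback.
```

Running a single process on one GPU with this setting keeps the traceback easier to read than a full distributed launch.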