Hi, the following code repeatedly creates and deletes (`del`) sm or proxy channels. The VRAM% reported by rocm-smi keeps increasing until it reaches 25%, at which point it triggers `RuntimeError: Call to cudaIpcGetMemHandle(&handle, baseDataPtr) failed. mscclpp/src/registered_memory.cc:102 (Cuda failure: invalid argument)`. Both sm and proxy channels have the problem, and it only appears on ROCm; NVIDIA GPUs are fine.
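A minimal sketch of the reproduction pattern described above (create a channel, `del` it, sample VRAM%, repeat until the 25% mark), written as a generic harness. `create` and `read_vram_percent` are hypothetical callables: in a real run they would wrap mscclpp sm/proxy channel construction and a rocm-smi query, neither of which is shown here. The fakes in the usage part simulate a leak of 1% per cycle so the harness can be exercised without a GPU.

```python
from typing import Callable, List


def watch_for_leak(
    create: Callable[[], object],
    read_vram_percent: Callable[[], float],
    iterations: int,
    limit_percent: float = 25.0,
) -> List[float]:
    """Create and drop an object repeatedly, recording VRAM% after each cycle.

    Stops early once usage reaches `limit_percent` (the level at which the
    report observed cudaIpcGetMemHandle failing on ROCm).
    """
    samples: List[float] = []
    for _ in range(iterations):
        obj = create()  # hypothetical factory, e.g. building an sm or proxy channel
        del obj         # drop the Python reference, as the report does
        usage = read_vram_percent()
        samples.append(usage)
        if usage >= limit_percent:
            break
    return samples


def is_monotonic_growth(samples: List[float]) -> bool:
    """True if usage never decreases and ends higher than it started."""
    return (
        len(samples) > 1
        and all(b >= a for a, b in zip(samples, samples[1:]))
        and samples[-1] > samples[0]
    )


# Usage with fakes: each "channel" is kept alive by a hidden list,
# standing in for device memory that is not released on `del`.
leaked: List[bytearray] = []


def fake_create() -> bytearray:
    buf = bytearray(1)
    leaked.append(buf)  # simulate memory that survives `del`
    return buf


def fake_vram() -> float:
    return float(len(leaked))  # pretend each leaked object costs 1% VRAM


samples = watch_for_leak(fake_create, fake_vram, iterations=100)
print(samples[-1], len(samples), is_monotonic_growth(samples))
```

If usage climbs by a steady step every cycle and never drops back after `del`, the channel's registered memory is likely not being released on that platform.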