ERR_NVGPUCTRPERM when profiling CUDA kernels
I'm trying to profile CUDA kernels with Nsight Compute (ncu), and I get this error, apparently because of missing permissions:
"ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM"
The linked page says that when profiling kernels inside containers (which is the case here with pods, right?), the container has to be launched with --cap-add=SYS_ADMIN, but I'm not sure this is possible with Runpod pods.
Has anyone found a workaround? Surely there is a way to profile kernels on GPUs inside containers?
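One way to confirm the restriction from inside the pod is sketched below. On Linux, the NVIDIA driver exposes its module parameters in /proc/driver/nvidia/params, and "RmProfilingAdminOnly: 1" there means counter access is limited to admin users, which is why a container then needs SYS_ADMIN. Whether that file is visible inside a Runpod pod is an assumption, and the helper name profiling_admin_only is hypothetical.

```python
from pathlib import Path
from typing import Optional

PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """Return True if the driver restricts GPU counters to admins,
    False if it does not, or None if the parameter is not listed."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def main() -> None:
    # Assumption: the host driver's params file is visible inside the container.
    if not PARAMS_PATH.exists():
        print(f"{PARAMS_PATH} not found; cannot determine the setting")
        return
    restricted = profiling_admin_only(PARAMS_PATH.read_text())
    if restricted is None:
        print("RmProfilingAdminOnly is not listed")
    elif restricted:
        # The host-side fix is the driver option NVreg_RestrictProfilingToAdminUsers=0,
        # which a pod user cannot change; otherwise ncu needs admin rights.
        print("GPU counters are admin-only on this host")
    else:
        print("GPU counters are not restricted by the driver")


if __name__ == "__main__":
    main()
```

If it reports admin-only, the remaining options are the ones discussed here: a container started with SYS_ADMIN, or a host whose driver was loaded with the restriction disabled.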
Thank you
Yeah, I don't think privileged GPU containers are possible on Runpod.
But why do you want to profile it, though?
Profiling is a good way to find out how to improve your kernel, e.g. where the stalls are.
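Once counters are accessible, Nsight Compute reports warp stall reasons in its WarpStateStats section, and SourceCounters attributes them to source lines. The minimal sketch below builds such an ncu invocation from Python. The binary ./my_app and the kernel name my_kernel are hypothetical placeholders, and the helper ncu_stall_command is illustrative only.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def ncu_stall_command(app: List[str], kernel: Optional[str] = None,
                      report: str = "stalls") -> List[str]:
    """Build an ncu command line that collects warp stall statistics."""
    cmd = ["ncu",
           "--section", "WarpStateStats",   # stall reasons per warp state
           "--section", "SourceCounters",   # stalls attributed to source lines
           "-o", report]                    # writes <report>.ncu-rep
    if kernel is not None:
        # Profile only the first launch of the named kernel.
        cmd += ["--kernel-name", kernel, "--launch-count", "1"]
    return cmd + app


if __name__ == "__main__":
    cmd = ncu_stall_command(["./my_app"], kernel="my_kernel")
    print(shlex.join(cmd))
    if shutil.which("ncu") is not None:
        # Still fails with ERR_NVGPUCTRPERM unless counter access is permitted.
        subprocess.run(cmd, check=False)
```

Restricting collection to one kernel launch keeps the overhead down, since ncu replays each profiled kernel several times to gather all counters.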
I see.
Maybe try contacting support if you want.
@yhlong00000