ERR_NVGPUCTRPERM when profiling CUDA kernels

I'm trying to profile CUDA kernels with NCU and I ran into this permission error: "ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM". According to the linked page, when profiling kernels in containers (which is the case here with pods, right?), the container has to be launched with --cap-add=SYS_ADMIN, but I'm not sure this is possible with Runpod pods. Have you found a workaround? Surely there is a way to profile kernels on container GPUs? Thank you
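For reference, here is a minimal sketch of how one could check from inside the pod whether the container actually has the SYS_ADMIN capability that the NVIDIA page asks for. It assumes a Linux container where /proc/self/status is readable. The helper names (mask_has_cap, read_cap_eff) are made up for illustration; the only fixed facts are that the kernel reports the effective capability set as a hex mask on the CapEff line and that CAP_SYS_ADMIN is capability number 21. It only reports the current state and does not change any permission.

```python
# Sketch: detect whether the current process holds CAP_SYS_ADMIN.
# Assumes Linux; helper names are illustrative, not part of any library.

CAP_SYS_ADMIN_BIT = 21  # capability number from linux/capability.h


def mask_has_cap(cap_eff_hex, bit=CAP_SYS_ADMIN_BIT):
    """Return True if the given bit is set in a hex capability mask."""
    return bool((int(cap_eff_hex, 16) >> bit) & 1)


def read_cap_eff(status_path="/proc/self/status"):
    """Return the CapEff hex string for this process, or None if absent."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    return None


if __name__ == "__main__":
    try:
        cap_eff = read_cap_eff()
    except OSError:
        cap_eff = None  # e.g. not running on Linux

    if cap_eff is None:
        print("Could not read CapEff from /proc/self/status")
    else:
        has_it = mask_has_cap(cap_eff)
        print(f"CapEff={cap_eff} CAP_SYS_ADMIN={'yes' if has_it else 'no'}")
```

If this prints CAP_SYS_ADMIN=no, the container was most likely started without --cap-add=SYS_ADMIN, which would be consistent with the error above.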
3 Replies
nerdylive (3w ago)
yeah, I don't think privileged GPU containers are possible on Runpod, but why do you want to profile it though?
Alexandre TL (OP, 3w ago)
Profiling is a good way to find out how to improve your kernel, e.g. where your stalls are.
nerdylive (3w ago)
I see. Maybe try contacting support if you want @yhlong00000