Since I don't know how computers work... I'm always running into issues and for some reason I tend to never read the errors or logs.
NOT TODAY!
I only thought about it for a few moments, but I was trying to figure out how to get ncu (the Nsight Compute CLI) to run:
ncu --set full -o output python3 prof.py
I already have ncu installed, but it spits out an error saying something like this:
==ERROR== ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM
ok... so then I prepend sudo
and then...
sudo: ncu: command not found
(???) Well, the reason is actually quite obvious when I think about it: I have all the CUDA paths defined in my non-root user's .bashrc, and sudo doesn't look commands up with that PATH, so of course it can't find ncu.
Instead of relying on the $PATH variable, we should just swap the command name out for its absolute path:
sudo $(which ncu) --set full -o output python3 prof.py
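Why does swapping in the absolute path help? The $(which ...) part is expanded by my own shell, with my own PATH, before sudo ever starts. Here is a small sketch of that mechanism that doesn't need sudo or a GPU. The assumptions: fake_ncu is a made-up stand-in for ncu, and env PATH=/usr/bin:/bin only imitates the kind of trimmed-down PATH sudo typically uses (the real value depends on your sudoers config).

```shell
# Put a throwaway tool in a directory that only the current shell's PATH knows about.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from tool"\n' > "$demo_dir/fake_ncu"
chmod +x "$demo_dir/fake_ncu"
PATH="$demo_dir:$PATH"

# Lookup by name under a restricted PATH fails (this is the "command not found" case).
restricted=$(env PATH=/usr/bin:/bin sh -c 'command -v fake_ncu' || echo "not found")
echo "by name: $restricted"

# $(command -v fake_ncu) is expanded by the current shell first,
# so the restricted child receives an absolute path and needs no lookup at all.
resolved=$(env PATH=/usr/bin:/bin "$(command -v fake_ncu)")
echo "by absolute path: $resolved"

rm -rf "$demo_dir"
```

I used command -v here instead of which because it is built into POSIX shells, but for a plain binary like ncu both should give the same path.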
Now run it again and I get
ModuleNotFoundError: No module named 'torch'
==ERROR== The application returned an error code (1).
OK, now that I'm using my brain again, it's obvious what we need to do. I'm using a venv, but as I realize now, sudo is running things as the root user and resolving python3 differently, so it picks up the system interpreter (which has no torch) instead of the one in my venv. Just do the same thing as before, but for python:
sudo $(which ncu) --set full -o output $(which python3) prof.py
life is good.
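If you end up doing this a lot, the two lookups can be wrapped in a small helper. This is only a sketch: resolve and profile are names I made up, and the ncu flags are the same ones from the command above. Run it from the shell where your venv is active, so python3 resolves to the venv's interpreter.

```shell
# resolve NAME: print where the current (non-root) shell finds NAME, or fail loudly.
resolve() {
    command -v "$1" || { echo "error: $1 not found in PATH" >&2; return 1; }
}

# profile SCRIPT: run ncu under sudo with both binaries pinned to absolute paths.
profile() {
    ncu_bin=$(resolve ncu) || return 1
    py_bin=$(resolve python3) || return 1
    sudo "$ncu_bin" --set full -o output "$py_bin" "$1"
}
```

Then it's just profile prof.py. One caveat: command -v prints a bare name rather than a path for shell builtins and functions, so this only makes sense for real executables like ncu and python3.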
NOTE:
The alternatives suggest doing something more heavy-handed like:
modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0
Or even writing a .conf file in /etc/modprobe.d/ to persist this option... IMO you shouldn't do too much if you are just profiling on your personal machines.
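For reference, if you do want the persistent route, the file would look roughly like this. The file name is my own arbitrary pick (as far as I know any name ending in .conf works), and you will likely need to reload the nvidia module or reboot before it takes effect. Keep in mind it opens the performance counters to every user on the machine.

```shell
# /etc/modprobe.d/nvidia-profiling.conf  (hypothetical file name)
options nvidia NVreg_RestrictProfilingToAdminUsers=0
```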