Monitor GPU VRAM - Which GPU to check?
I am trying to monitor the GPU VRAM usage in a serverless worker. To do this with pynvml I need to provide the index of the GPU. Is there a way I can obtain the index of the GPU my worker is using? I did not see this info in the environment variables. I do see RUNPOD_GPU_COUNT, but I'm not sure if that helps.
It seems RunPod monitors CPU and GPU stats, since they show that information in their web interface. Does the RunPod Python module expose those stats, so we don't have to code our own?
Below is a code snippet that reports VRAM usage as a percentage.
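The snippet itself is not preserved in this thread. As a stand-in, here is a minimal sketch of how VRAM usage as a percentage could be read with pynvml. The function names `vram_percent` and `gpu_vram_percent` are illustrative, and the default index of 0 is exactly the assumption the question is about.

```python
def vram_percent(used_bytes, total_bytes):
    """Return used VRAM as a percentage of total VRAM."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def gpu_vram_percent(index=0):
    """Query one GPU through NVML and return its VRAM usage in percent.

    The index defaults to 0, which is an assumption: it may not be the
    GPU this worker was actually assigned.
    """
    # Imported here so the pure helper above works without pynvml installed.
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # fields are in bytes
        return vram_percent(mem.used, mem.total)
    finally:
        pynvml.nvmlShutdown()
```

On a machine with an NVIDIA driver, `print(f"{gpu_vram_percent(0):.1f}%")` would report usage for the first visible device.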
Thanks! 🙂
16 Replies
Maybe I could use GraphQL with PodTelemetry? Where's my GraphQL experts at? 😉
I've never used GraphQL before. Doesn't the index start from 0?
I'm not clear yet: what kind of index are you looking for?
I could assume that my worker is using the GPU at index 0, but if there are multiple GPUs in the server that might not be accurate. I might be on GPU 3 while another worker is using GPU 0. I am pretty sure I can get that info with GraphQL. I should be able to query by pod ID, and the result includes PodTelemetry, which contains CPU and GPU stats. I'm just struggling with the documentation for it.
Oh, can you figure out what the index is sorted by?
Like, what's sorting the index?
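One way to sidestep guessing an index is to list every device NVML can see from inside the worker and report each one. This is only a sketch: it assumes (not confirmed in this thread) that the container exposes just the GPUs assigned to the worker, so the local indices would start at 0 regardless of the host numbering. `describe_gpu` and `visible_gpu_usage` are hypothetical helper names.

```python
def describe_gpu(index, uuid, used_bytes, total_bytes):
    """Build a small report for one GPU; uuid may be bytes or str."""
    if isinstance(uuid, bytes):  # older pynvml versions return bytes
        uuid = uuid.decode()
    pct = 100.0 * used_bytes / total_bytes if total_bytes else 0.0
    return {"index": index, "uuid": uuid, "vram_percent": round(pct, 1)}


def visible_gpu_usage():
    """Return a report for every GPU visible to this process."""
    # Imported here so describe_gpu works without pynvml installed.
    import pynvml

    pynvml.nvmlInit()
    try:
        reports = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            uuid = pynvml.nvmlDeviceGetUUID(handle)
            reports.append(describe_gpu(i, uuid, mem.used, mem.total))
        return reports
    finally:
        pynvml.nvmlShutdown()
```

If more devices show up than RUNPOD_GPU_COUNT suggests, the assumption does not hold and the UUIDs would need to be matched against another source.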
https://graphql-spec.runpod.io/#definition-PodTelemetry
Yeah, I've seen that. I'm still looking for a good example of making a GraphQL request.
Sorry, bad formatting
use your own input
I would need to provide the pod id
yes
correct
So what do I do? Add podId: ${pod_id} to input?
{"input": {"podId": "MYPODID"}}
it'll be something like this
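For reference, a rough sketch of what such a request could look like from Python, using only the standard library. Everything specific here is an assumption to check against the GraphQL spec linked above: the endpoint URL, the `api_key` query parameter, the `PodFilter` input type, and the `latestTelemetry` and `individualGpuMetrics` field names. `build_payload` and `fetch_pod_telemetry` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the RunPod documentation.
RUNPOD_GRAPHQL_URL = "https://api.runpod.io/graphql"

# Assumed query shape and field names; verify against the GraphQL spec.
POD_TELEMETRY_QUERY = """
query PodTelemetry($input: PodFilter) {
  pod(input: $input) {
    id
    latestTelemetry {
      cpuUtilization
      memoryUtilization
      individualGpuMetrics { id percentUtilization memoryUtilization }
    }
  }
}
"""


def build_payload(pod_id):
    """Build the JSON body; variables match {"input": {"podId": ...}}."""
    return {
        "query": POD_TELEMETRY_QUERY,
        "variables": {"input": {"podId": pod_id}},
    }


def fetch_pod_telemetry(pod_id, api_key):
    """POST the query and return the decoded JSON response."""
    url = RUNPOD_GRAPHQL_URL + "?" + urllib.parse.urlencode({"api_key": api_key})
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(pod_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `fetch_pod_telemetry("MYPODID", api_key)` would then return whatever the API sends back, including any `errors` entry if a field name above is wrong.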
Maybe you should use
That's great, thanks!
I was going to send that data over the web socket but this is much better. I can just have the browser call this once a second and update CPU/GPU graph. 🙂
nice hahah
oh wait, what are you building?
cpu graph 🤔
Yeah, I think it is really coming along. Everything works; I just need to update the CPU/GPU graph and display the result media.
wew a tooncrafter app
cool
ToonCrafter is just one in the market... I will likely try to add a lot of models before going live. My code builds the interface dynamically, so I should be able to add them pretty fast.