RunPod•5mo ago
Encyrption

Monitor GPU VRAM - Which GPU to check?

I am trying to monitor GPU VRAM usage in a serverless worker. To do this with pynvml I need to provide the index of the GPU. Is there a way to obtain the index of the GPU my worker is using? I did not see this info in the ENV variables. I do see RUNPOD_GPU_COUNT, but I'm not sure that helps. RunPod seems to monitor CPU and GPU stats, since it presents that information in the web interface. Does the RunPod Python module expose those stats so we don't have to code our own? Below is a code snippet that reports VRAM usage as a percentage.
import pynvml
import time

# Initialize NVML
pynvml.nvmlInit()

handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # Assuming you have only one GPU

while True:
    # Get the memory information for the GPU
    memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)

    used_vram = memory_info.used // (1024 ** 2)  # Convert bytes to MB
    total_vram = memory_info.total // (1024 ** 2)  # Convert bytes to MB
    vram_usage_percentage = round((used_vram / total_vram) * 100)

    print(f'vram usage: {vram_usage_percentage}%')

    time.sleep(5)
Thanks! 🙂
16 Replies
Encyrption
EncyrptionOP•5mo ago
Maybe I could use GraphQL with PodTelemetry? Where's my GraphQL experts at? 😉
nerdylive
nerdylive•5mo ago
I've never used GraphQL before. Doesn't the index start from 0? I'm not clear yet: what kind of index are you looking for?
Encyrption
EncyrptionOP•5mo ago
Suppose I assume my worker is using the GPU at index 0. If there are multiple GPUs in the server, that might not be accurate: I might be on GPU 3 while another worker is using GPU 0. I am pretty sure I can get that info with GraphQL. I should be able to query by pod ID, and the result includes PodTelemetry, which contains CPU and GPU stats. I'm just struggling with the documentation for it.
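One way to check this concern from inside the worker is to list every GPU that NVML can see, together with its UUID. This is a minimal sketch, not something posted in the thread: whether a RunPod worker container sees only its own GPU or all GPUs on the host is not confirmed here, and the helper names `vram_percent` and `list_visible_gpus` are made up for illustration.

```python
def vram_percent(used_bytes: int, total_bytes: int) -> int:
    """Return VRAM usage as a whole-number percentage."""
    if total_bytes <= 0:
        return 0
    return round(used_bytes / total_bytes * 100)


def list_visible_gpus() -> list:
    """Return index, UUID, and VRAM usage for every GPU NVML can see."""
    import pynvml  # imported lazily so vram_percent works without a GPU

    pynvml.nvmlInit()
    try:
        gpus = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            uuid = pynvml.nvmlDeviceGetUUID(handle)
            if isinstance(uuid, bytes):  # some pynvml releases return bytes
                uuid = uuid.decode()
            memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({
                "index": index,
                "uuid": uuid,
                "vram_percent": vram_percent(memory_info.used, memory_info.total),
            })
        return gpus
    finally:
        pynvml.nvmlShutdown()


# Usage inside the worker (requires pynvml and an NVIDIA driver):
# for gpu in list_visible_gpus():
#     print(gpu["index"], gpu["uuid"], f'{gpu["vram_percent"]}%')
```

If this prints a single device, index 0 is the only choice from the worker's point of view. If it prints several, the UUIDs give something stable to compare against other sources.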
nerdylive
nerdylive•5mo ago
Oh, can you figure out what the index is sorted by? Like, what determines the ordering? https://graphql-spec.runpod.io/#definition-PodTelemetry
Encyrption
EncyrptionOP•5mo ago
Yeah, I've seen that. I'm still looking for a good example of making a GraphQL request.
nerdylive
nerdylive•5mo ago
query pod($input: PodFilter) {
  pod(input: $input) {
    latestTelemetry {
      state
      time
      memoryUtilization
      averageGpuMetrics {
        id
        powerWatts
        memoryUtilization
        percentUtilization
      }
    }
  }
}
Sorry, bad formatting. Use your own input.
Encyrption
EncyrptionOP•5mo ago
I would need to provide the pod id
nerdylive
nerdylive•5mo ago
yes correct
Encyrption
EncyrptionOP•5mo ago
So what do I do? Add podId: ${pod_id} to input?
nerdylive
nerdylive•5mo ago
{"input": {"podId": "MYPODID"}}
"runtime": {
  "uptimeInSeconds": 135,
  "gpus": [
    {
      "id": "GPU-26e2eb9c-c0f5-9870-687c-28cdec1a68ea",
      "gpuUtilPercent": 0,
      "memoryUtilPercent": 0
    }
  ]
},
"latestTelemetry": {
  "individualGpuMetrics": [
    {
      "id": "GPU-26e2eb9c-c0f5-9870-687c-28cdec1a68ea",
      "temperatureCelcius": 33,
      "percentUtilization": 0,
      "memoryUtilization": 0,
      "powerWatts": 74
    }
  ],
It'll be something like this. Maybe you should use
latestTelemetry {
  individualGpuMetrics {
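For the "good example of making a GraphQL request" asked for earlier, here is a rough Python sketch that sends the query and variables from this thread using only the standard library. The endpoint URL and the `api_key` query parameter are assumptions taken from RunPod's public API documentation, not from this thread, and `build_payload` and `fetch_telemetry` are illustrative names.

```python
import json
import urllib.request

# Assumption: endpoint and api_key parameter as described in RunPod's API docs.
GRAPHQL_URL = "https://api.runpod.io/graphql"

# Query built from the fields mentioned in this thread.
TELEMETRY_QUERY = """
query pod($input: PodFilter) {
  pod(input: $input) {
    latestTelemetry {
      individualGpuMetrics {
        id
        percentUtilization
        memoryUtilization
      }
    }
  }
}
"""


def build_payload(pod_id: str) -> dict:
    """Build the JSON body: the query plus the {"input": {"podId": ...}} variables."""
    return {"query": TELEMETRY_QUERY, "variables": {"input": {"podId": pod_id}}}


def fetch_telemetry(pod_id: str, api_key: str) -> dict:
    """POST the query and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{GRAPHQL_URL}?api_key={api_key}",
        data=json.dumps(build_payload(pod_id)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Usage (needs a real pod ID and API key):
# print(fetch_telemetry("MYPODID", "YOUR_API_KEY"))
```

Keeping the payload construction separate from the network call makes it easy to inspect exactly what is sent before involving the API.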
Encyrption
EncyrptionOP•5mo ago
That's great, thanks! I was going to send that data over the web socket but this is much better. I can just have the browser call this once a second and update CPU/GPU graph. 🙂
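To feed a graph from that response, the per-GPU numbers have to be pulled out of the JSON. The sketch below assumes the reply is wrapped in the usual GraphQL `data` envelope with a `pod` object containing the `latestTelemetry` fragment shown above; that wrapper, and the helper name `gpu_metrics`, are assumptions. The thread also does not say whether `memoryUtilization` means VRAM occupancy or memory-controller activity.

```python
def gpu_metrics(response: dict) -> list:
    """Extract per-GPU utilization entries from a pod telemetry response."""
    pod = (response.get("data") or {}).get("pod") or {}
    telemetry = pod.get("latestTelemetry") or {}
    return [
        {
            "id": gpu.get("id"),
            "gpu_percent": gpu.get("percentUtilization"),
            "memory_percent": gpu.get("memoryUtilization"),
        }
        for gpu in telemetry.get("individualGpuMetrics") or []
    ]


# Sample shaped like the fragment posted above (values copied from the thread).
sample = {
    "data": {
        "pod": {
            "latestTelemetry": {
                "individualGpuMetrics": [
                    {
                        "id": "GPU-26e2eb9c-c0f5-9870-687c-28cdec1a68ea",
                        "temperatureCelcius": 33,
                        "percentUtilization": 0,
                        "memoryUtilization": 0,
                        "powerWatts": 74,
                    }
                ]
            }
        }
    }
}

for gpu in gpu_metrics(sample):
    print(gpu["id"], gpu["gpu_percent"], gpu["memory_percent"])
```

Missing or null sections yield an empty list, so a polling loop can skip an update rather than crash while the pod is still starting.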
nerdylive
nerdylive•5mo ago
Nice, haha. Oh wait, what are you building? A CPU graph? 🤔
Encyrption
EncyrptionOP•5mo ago
Yeah, I think it is really coming along. Everything works; I just need to update the CPU/GPU graph and display the result media.
nerdylive
nerdylive•5mo ago
Wow, a ToonCrafter app, cool.
Encyrption
EncyrptionOP•5mo ago
ToonCrafter is just one in the market... I will likely try and add a lot of models before going live. My code builds the interface dynamically so should be able to add them pretty fast.