RuntimeError: The NVIDIA driver on your system is too old (found version 11080). Please update your
I deployed a new version today but keep running into this error. Did something change on RunPod? Thanks!
https://discord.com/channels/912829806415085598/1023588055174611027/1173632165423104051
Maybe filter by the CUDA version, if you are expecting a 12.0+ version of CUDA?
That's my guess.
You can't filter by CUDA version in serverless, only in GPU Cloud. It would be awesome to get all machines onto the latest CUDA version though.
😮
When that happens, the workers get stuck in the running state and keep costing money 😦, since that's part of the caching code that runs before the handler is called. Is there any improvement coming?
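A rough sketch of the kind of guard that would avoid that hang (assuming a Python worker where the heavy caching happens at import time in handler.py; `torch.cuda.is_available()` returns False with a warning, rather than raising, when the driver is too old for the build):

```python
import sys

import torch

# Run this before the expensive download / caching step. If the host driver
# is too old for the CUDA build baked into the image, fail fast instead of
# letting a later CUDA call crash and leave the worker stuck in "running".
if not torch.cuda.is_available():
    print(
        f"CUDA unusable: torch was built for CUDA {torch.version.cuda}, "
        "but the host driver does not support it (or no GPU is visible).",
        file=sys.stderr,
    )
    sys.exit(1)
```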
@flash-singh ?
@Alpay Ariyak: I saw you say:
https://discord.com/channels/912829806415085598/1194109966349500498/1194731299898933348
Do you have any ideas on what to do when you need a certain CUDA version for serverless, but get handed a worker with a lower CUDA version, which then leads to a crash?
It would be great to be able to filter out the old CUDA versions for serverless.
However, I still think there should be a timeout on setting up the worker (the max time allowed before the handler is called).
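For what it's worth, you can approximate that budget on the worker side today. A minimal sketch, Linux-only since it relies on SIGALRM (which only interrupts the main thread, and some native calls may not notice it until they return), with a made-up 300-second limit and a placeholder setup function:

```python
import signal
import sys


def load_and_cache_models():
    # Placeholder for whatever runs before the handler (downloading weights,
    # warming up the model, etc.).
    pass


def _setup_timed_out(signum, frame):
    print("Worker setup exceeded its time budget, exiting.", file=sys.stderr)
    sys.exit(1)


SETUP_TIMEOUT_SECONDS = 300  # made-up value, tune to your model size

signal.signal(signal.SIGALRM, _setup_timed_out)
signal.alarm(SETUP_TIMEOUT_SECONDS)
try:
    load_and_cache_models()
finally:
    signal.alarm(0)  # cancel the alarm once setup is done
```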
You got a worker with a CUDA version lower than 11.8?
I think @ssssteven got a worker with 11.8, but I'm guessing he needs a worker with 12.0+, and it caused a crash that left the worker hanging, with him just paying for the hang time.
I see, the feature to specify worker CUDA version is in the works to my knowledge, but not currently out, so the easiest route would be to try to make everything work with 11.8, as workers with both 11.8 and 12.0+ should be compatible that way.
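If you do pin to 11.8, a quick sanity check that the image actually ships a cu118 build might help (sketch only; the exact version strings are examples):

```python
import torch

# A wheel from the cu118 index reports something like "2.1.2+cu118" and "11.8";
# a cu121 build will trip the old-driver error on hosts that only expose 11.8.
print("torch:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
assert torch.version.cuda and torch.version.cuda.startswith("11.8"), \
    "this image is not built against CUDA 11.8"
```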
Not possible with things like Oobabooga, and the latest xformers requires CUDA 12 as well, so it would be better if all machines were on CUDA 12, which has been out for several months already.
Is v12 a breaking change from v11 for CUDA?
Just wondering, never tried.
Or is v12 always backwards compatible?
12 is backwards compatible
Interesting.. I guess the answer, till CUDA filtering for serverless is out, is 11.8... 😦
Not really an acceptable answer/solution since you can't use Torch 2.1.2 with xformers 0.0.23.post1 on CUDA lower than 12
can we at least implement the timeout?
@flash-singh / @Alpay Ariyak Yeah. I do think you guys need to at least catch failures before handler.py, and either refresh the worker or kill it if it fails to initialize before handler.py runs.
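Until the platform does that, the worker can at least do it to itself: run the init before registering the handler and exit hard if it fails. A rough sketch (`init_models` is a made-up placeholder; `runpod.serverless.start` is the usual entry point for Python workers):

```python
import sys

import runpod


def init_models():
    # Made-up placeholder for the caching / model-loading step that currently
    # hangs when the host driver is older than the image's CUDA build expects.
    pass


try:
    init_models()
except Exception as exc:
    # Exit instead of hanging, so the worker doesn't sit there billing for nothing.
    print(f"Worker init failed: {exc}", file=sys.stderr)
    sys.exit(1)


def handler(job):
    # Only reached if init succeeded.
    return {"ok": True}


runpod.serverless.start({"handler": handler})
```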
What's the worker ID?
@flash-singh I can't find it anymore. It's not in the logs. The endpoint is d7n1ceeuq4swlp and it happened a few mins before I posted this question.
Oops.. it just happened again: A100 80GB - iuot3yjoez7 bjo
@JM @Justin can we track this down
Tracking down hosts with outdated CUDA?
yep
thank you