Serverless multi-GPU
I have a model deployed on 2× 48 GB GPUs and 1 worker. It ran correctly the first time with CUDA distributed, but then it fails with this error:

"error_message": "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)",
"error_traceback": "Traceback (most recent call last):
  File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\"
What can be the issue here?
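For reference, this particular message just means a torch.cat call received tensors living on two different GPUs. A minimal reproduction (a sketch, not the code from this deployment):

```python
# Minimal reproduction of the error above: torch.cat refuses to concatenate
# tensors that live on different devices.
import torch

a = torch.randn(2, 4, device="cuda:0")
b = torch.randn(2, 4, device="cuda:1")

torch.cat([a, b])               # RuntimeError: Expected all tensors to be on the same
                                # device, but found at least two devices, cuda:0 and cuda:1!

torch.cat([a, b.to(a.device)])  # works once everything is moved onto one device
```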
9 Replies
Update: if I stop for a long time and then send a request, it works. It seems to work every time after the worker refreshes. Please help.
What model? What are you running on serverless?
Impossible to help without full information.
So this is my code, where I am trying to run a chat model; get_chat_response is the handler.
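The code itself isn't reproduced in the thread; a handler for a chat model sharded across two GPUs commonly looks roughly like the sketch below (the model name, prompt format, and hard-coded device are assumptions for illustration, not the poster's actual code):

```python
# Rough sketch of a RunPod serverless chat handler, NOT the poster's actual code.
# MODEL_ID and the prompt handling are assumptions for illustration.
import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-13b-chat-hf"  # hypothetical model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# device_map="auto" lets accelerate shard the weights across both 48 GB GPUs.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def get_chat_response(job):
    prompt = job["input"]["prompt"]
    # Hard-coding cuda:0 here can clash with layers the sharding placed on
    # cuda:1; see the device discussion further down the thread.
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

runpod.serverless.start({"handler": get_chat_response})
```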
I am facing a similar issue!
I don't know if I should make any changes to the RunPod source code for multi-GPU?
You usually need to set CUDA_VISIBLE_DEVICES to use more than one GPU, or configure your code to do so; it doesn't happen magically by itself.
Oh, you mean adding devices in the Dockerfile while creating the container?
No, that won't work
Then do you mean exporting the variable before running the code? But I don't understand why it works correctly the first time the worker is spawned.
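For completeness, here is a sketch of the two changes being discussed, assuming the model is loaded with accelerate's device_map="auto" (the model name is hypothetical and this is not verified against the poster's deployment): make CUDA_VISIBLE_DEVICES visible before torch initializes CUDA, for example via the endpoint's environment variables rather than the Dockerfile, and place the inputs on whichever GPU actually holds the model's input embeddings instead of a hard-coded device:

```python
# Sketch of the adjustments discussed above; assumptions, not a confirmed fix.
import os

# CUDA_VISIBLE_DEVICES must be set before torch initializes CUDA, e.g. as an
# endpoint environment variable or at the very top of the handler script.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-13b-chat-hf"  # hypothetical model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def get_chat_response(job):
    prompt = job["input"]["prompt"]
    # Put inputs on whichever GPU holds the input embeddings; with
    # device_map="auto" that is not guaranteed to be cuda:0.
    input_device = model.get_input_embeddings().weight.device
    inputs = tokenizer(prompt, return_tensors="pt").to(input_device)
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```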