ricopella

Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!

I've been using RunPod endpoints for the past month or so with no issues; everything's been working wonderfully. This past week, a handful of my jobs have been failing and I'm not entirely sure why. I have not made any code changes and have not changed the Docker image in my template. I do notice that it seems to be waiting for GPUs to be available, but I'm not sure why; when it finds them, this error is thrown: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
44 Replies
ashleyk
ashleyk12mo ago
This is a code issue, what is the endpoint you are using?
ricopella
ricopellaOP12mo ago
Like my actual endpoint? I've been calling /run to run my jobs, getting back a job ID, and then webhooking back to my server once done. @ashleyk
ashleyk
ashleyk12mo ago
Yes, the actual endpoint. Actually, that makes no difference; there is something wrong with the code that your endpoint is running. I need to know what code you are running.
ricopella
ricopellaOP12mo ago
Oh, sorry, I'm not sure what you're looking for. The Docker image tag? I think when I set up my endpoint and got everything working, it was around this tag of yours: ashleykza/runpod-worker-a1111:2.2.1
ashleyk
ashleyk12mo ago
Yeah, did you build your own image or use someone else's image?
ricopella
ricopellaOP12mo ago
Since then, all I've done is add LoRAs and models to my network volume. Everything was working great. I used yours but built mine on DigitalOcean, tagged it, and have just been using that one.
ashleyk
ashleyk12mo ago
I suggest updating to ashleykza/runpod-worker-a1111:2.3.4.
ricopella
ricopellaOP12mo ago
Ok, so I just change my template to that?
ashleyk
ashleyk12mo ago
When did you build your own image?
ricopella
ricopellaOP12mo ago
Nov 21, 2023 at 10:32 pm
ashleyk
ashleyk12mo ago
Sorry, I am not reading the messages fast enough. I thought you were on ashleykza/runpod-worker-a1111:2.2.1, so if you built it around that time, it's probably a good idea to pull the changes and build it again to get the updates.
ricopella
ricopellaOP12mo ago
No need to apologize, you've been very helpful and I appreciate your guidance. Do I just use your tag in my template, or do I go back through the process of deploying to DigitalOcean, tagging it to my Docker Hub, and using that?
ashleyk
ashleyk12mo ago
There isn't really a need to build your own image unless you customized something, such as adding an additional extension; otherwise you can just use mine.
ricopella
ricopellaOP12mo ago
Ok, no, I didn't make any changes. The only changes I've made have been to my network volume. I will try the 2.3.4 tag of yours. Seems to be working much better! Do your tagged versions ever get removed? Also, do you have a release log or somewhere you post updates so I can keep up with changes?
ashleyk
ashleyk12mo ago
Only inactive versions that haven't been used in a month or more, not active ones that are being pulled by workers. The release log is here: https://github.com/ashleykleynhans/runpod-worker-a1111/releases
GitHub
Releases · ashleykleynhans/runpod-worker-a1111
RunPod Serverless Worker for the Automatic1111 Stable Diffusion API - ashleykleynhans/runpod-worker-a1111
Irina
Irina8mo ago
Hello! I'm making a serverless NLLB endpoint on RunPod. I have built and pushed a Docker image that works perfectly well locally, but when I deploy it on RunPod, it fails with the same error (Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)). Does anybody have any ideas about what's going wrong?
Madiator2011
Madiator20118mo ago
Usually I see this error when the checkpoint file is corrupted.
Irina
Irina8mo ago
Sorry, but I don't have any checkpoint file in my project... or I don't know about it 🙂 I'm a newbie in this field.
digigoblin
digigoblin8mo ago
What is NLLB? Doesn't it use some kind of model?
Irina
Irina8mo ago
Yes, I use facebook/nllb-200-distilled-600M from huggingface
digigoblin
digigoblin8mo ago
This is what @Papa Madiator means by "checkpoint"
Irina
Irina8mo ago
Ok, then what's the difference between running the model locally and serverless (in RunPod)? It works locally.
Madiator2011
Madiator20118mo ago
Yep, because on RunPod you use RunPod GPUs instead of your local one.
Irina
Irina8mo ago
I understand this, of course 🙂 But what does this change? I have a device choice in the code, so the GPU is used when it's possible.
nerdylive
nerdylive8mo ago
What do you mean by "what does this change"?
Irina
Irina8mo ago
I mean, what's the difference if I run exactly the same code on CPU or GPU? Why is it OK with CPU, but not OK with GPU? Sorry if I'm writing something strange 🙂
nerdylive
nerdylive8mo ago
It's fine, I just needed to clear things up so maybe I can give you a better answer. Hmm, what's not working with GPU? Some code is specifically designed to run on CPU, or GPU, or both, but it needs access to the hardware. Maybe if it works with CPU and not GPU, it doesn't have support yet.
Irina
Irina8mo ago
It's working in another environment (Beam, if you've ever heard of it), so the problem is that it doesn't work on RunPod... This conversation has given me some interesting debugging ideas, so I'll try them and report back :))
nerdylive
nerdylive8mo ago
What's working in Beam, sorry? I don't know what type of software or model you are running.
Irina
Irina8mo ago
I'm simply running the facebook/nllb-200-distilled-600M model from Hugging Face. That's all)
nerdylive
nerdylive8mo ago
Hmm, okay, then what's the error on RunPod?
digigoblin
digigoblin8mo ago
Running it how? oobabooga or something else?
Irina
Irina8mo ago
It's in the name of the thread: "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!" I'm running it serverless.
nerdylive
nerdylive8mo ago
Using HF transformers? Which library did you use to run it? It might be a CUDA driver version or template problem. Which Docker image are you using? This error is caused by the model and the inputs not being moved to the same device before calling the model. To resolve it, move both the model and the inputs to the same device before running the model:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model.to(device)
inputs = inputs.to(device)
Found this on Stack Overflow.
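For NLLB specifically, the same idea would look something like this. This is just a rough, untested sketch, assuming the Hugging Face transformers library and the model name you mentioned; 'fra_Latn' is only an example target language, and looking it up via convert_tokens_to_ids is my assumption about the tokenizer:

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# pick one device and put both the model and the inputs on it
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

tokenizer = AutoTokenizer.from_pretrained('facebook/nllb-200-distilled-600M')
model = AutoModelForSeq2SeqLM.from_pretrained('facebook/nllb-200-distilled-600M')
model.to(device)  # move the weights once, up front

inputs = tokenizer('Hello, world!', return_tensors='pt').to(device)  # same device as the model
translated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids('fra_Latn'))  # hypothetical target language
print(tokenizer.batch_decode(translated, skip_special_tokens=True))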
Irina
Irina8mo ago
I'm building the Docker image myself, and I do check the device (DEVICE: str = 'cuda' if torch.cuda.is_available() else 'cpu')
nerdylive
nerdylive8mo ago
What's your code to run it (like in the handler) that causes that error?
Irina
Irina8mo ago
What makes the error is my question 🙂 I have just found out that the problem is in a function that directly uses the transformers library. I can post its code, if that's ok here)
nerdylive
nerdylive8mo ago
Well, maybe your code, or the library you use. Mostly, yeah, that's what I asked. Sure.
Irina
Irina8mo ago
Here is the function:

def translate(self, collection, tgt_lang):
    texts = [t.strip() for t in collection]
    batch_count = len(texts) / BATCH_SIZE
    if batch_count < 1:
        batch_count = 1
    texts_split = np.split(np.array(texts), batch_count)
    result = []
    with torch.inference_mode():
        for batch in texts_split:
            encoded = self.tokenizer.batch_encode_plus(
                list(batch), max_length=self.max_length, truncation=True,
                return_tensors="pt", padding=True)
            for key in encoded.keys():
                encoded[key] = encoded[key].to(self.device)
            translated = self.model.generate(
                **encoded, max_length=self.max_length,
                forced_bos_token_id=self.tokenizer.lang_code_to_id[tgt_lang])
            decoded = self.tokenizer.batch_decode(translated, skip_special_tokens=True)
            result.extend(decoded)
    return [tr if len(tx) > 0 else tx for tr, tx in zip(result, texts)]

This function is a method of my model class.
nerdylive
nerdylive8mo ago
What is torch.inference_mode() for, btw? Try using .to(self.device) on your self.model or the self.tokenizer.
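E.g. somewhere like your __init__, so it happens once instead of on every request. Just a sketch; I'm guessing at your class layout, and the from_pretrained load step is an assumption:

from transformers import AutoModelForSeq2SeqLM

# hypothetical __init__ of your model class - names taken from your snippet
self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
self.model = AutoModelForSeq2SeqLM.from_pretrained('facebook/nllb-200-distilled-600M')
self.model.to(self.device)  # moves the weights in place, so generate() will run on the GPU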
Irina
Irina8mo ago
You can read about inference mode here: https://pytorch.org/cppdocs/notes/inference_mode.html 🙂 It's complicated to explain in two words) I'll try! Thanks!
nerdylive
nerdylive8mo ago
right thanks
Irina
Irina8mo ago
I have changed the line ... translated = self.model.to(self.device).generate( ... ), i.e. added .to(self.device), as you advised me. All other lines are without .to(self.device). And it's working now!!! ☺️ Great thanks!! And if I write encoded[key] = encoded[key] without .to(self.device), it also works! (which is logical)
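So the changed line inside translate() now reads like this (a sketch, same call as in the function above; .to() on an nn.Module moves it in place, so after the first call the model stays on self.device):

# the only changed line in translate()
translated = self.model.to(self.device).generate(
    **encoded, max_length=self.max_length,
    forced_bos_token_id=self.tokenizer.lang_code_to_id[tgt_lang])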
nerdylive
nerdylive8mo ago
Yep. Hope this device points to the GPU; if not, then it will use the CPU 😂