RunPod•5mo ago
BadNoise

Pipeline is not using GPU on serverless

Hi! I'm running bart-large-mnli on serverless, but as far as I can see from the worker stats it's not using the GPU. Do you know what I'm doing wrong? The image is my current handler.py. As the Docker base I'm using "FROM runpod/base:0.6.2-cuda12.2.0"; I also tried "runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04", but still 0% GPU usage. Let me know if you need more details! Thank you 🙂
(image attached)
57 Replies
digigoblin
digigoblin•5mo ago
How are you running the model?
BadNoise
BadNoiseOP•5mo ago
this is the Dockerfile; I'm building + pushing it with Docker and running it on a 24GB GPU on serverless
(image attached)
BadNoise
BadNoiseOP•5mo ago
and this is the model downloader
(image attached)
PatrickR
PatrickR•5mo ago
I have a feeling this line:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
is doing something funky. You should try a few prints right after it:
print(torch.cuda.is_available())
print(torch.cuda.device_count())
print(torch.cuda.memory_allocated())
print(torch.cuda.memory_reserved())
And see if your code thinks it is running on a CPU.
BadNoise
BadNoiseOP•5mo ago
thank you! I'll try it immediately and let you know
BadNoise
BadNoiseOP•5mo ago
@PatrickR this is the output
(image attached)
BadNoise
BadNoiseOP•5mo ago
I can give you the full repo if you need 🙂
digigoblin
digigoblin•5mo ago
Yep, that would be useful for helping us test it
PatrickR
PatrickR•5mo ago
That would be useful, yes! I'd love to test it out and see what is going on.
BadNoise
BadNoiseOP•5mo ago
here it is! thank you so much for your help
PatrickR
PatrickR•5mo ago
Risky click 😆
nerdylive
nerdylive•5mo ago
It's just a zip, right? 😊
BadNoise
BadNoiseOP•5mo ago
if you'd prefer, I can send you the individual files
BadNoise
BadNoiseOP•5mo ago
this is the folder structure
(image attached)
nerdylive
nerdylive•5mo ago
Hmm, can you try some code to move the HF model you're using onto the CUDA GPU? Try searching for code like:
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
model.to(device)
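Expanded into a runnable form, the pattern being described looks roughly like this (the CLIP model name comes from the message above; everything else is illustrative):
import torch
from transformers import CLIPModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# from_pretrained loads the weights on the CPU; .to(device) moves them onto the GPU
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
print(next(model.parameters()).device)  # expect cuda:0 when a GPU is visible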
digigoblin
digigoblin•5mo ago
it's already doing that
nerdylive
nerdylive•5mo ago
Oh. How long does your process take in serverless?
BadNoise
BadNoiseOP•5mo ago
with 5 concurrent requests, ~5s per request
(image attached)
nerdylive
nerdylive•5mo ago
If you try your pipeline on the CPU, does it have the same performance?
BadNoise
BadNoiseOP•5mo ago
let me try again because I don't remember 😅 I'll launch the 32 vCPU and let you know!
nerdylive
nerdylive•5mo ago
Sorry, I'm not quite following the thread from the start... but how did you know it wasn't using the GPU? Right, sure
BadNoise
BadNoiseOP•5mo ago
sure, no problem: I see 100% CPU usage and 0% GPU
nerdylive
nerdylive•5mo ago
Oh... because sometimes the usage on the UI isn't up to date, especially if your job only took a couple of seconds
BadNoise
BadNoiseOP•5mo ago
thanks for the tip, but I'm running stress tests, constantly sending requests for 1 minute to see how many requests it can handle, so it's always running
nerdylive
nerdylive•5mo ago
I see
BadNoise
BadNoiseOP•5mo ago
another strange thing is that on a cheap CPU on a Hugging Face inference endpoint it performs faster than on a 24GB GPU on RunPod (that's also why I think it's not using it) 😅 always ~5 seconds with 5 concurrent requests on a 32 vCPU
nerdylive
nerdylive•5mo ago
Wow... on the GPU it takes more? Hahahah. If you've got your code right and you think it's a GPU problem, feel free to report it via the site's contact button in the left menu then. Btw @BadNoise have you tried this:
export CUDA_VISIBLE_DEVICES=0
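The same idea can be applied from inside the Python process; a minimal sketch (illustrative, and it must run before torch initializes CUDA):
import os
# must be set before CUDA is first initialized, i.e. before importing torch
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch
print(torch.cuda.is_available(), torch.cuda.device_count())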
BadNoise
BadNoiseOP•5mo ago
@nerdylive tried it now, still 100% CPU usage and 0% GPU 😦
Madiator2011
Madiator2011•5mo ago
I might look at it
BadNoise
BadNoiseOP•5mo ago
thank you 🙂
PatrickR
PatrickR•5mo ago
Hey, so I went through this, and I have this input:
{
    "input": {
        "sequence": "The weather is sunny today.",
        "labels": ["weather", "sports", "news"]
    }
}
and this output:
{
    "id": "test-822c3793-23b3-4464-8b65-972bb5776867",
    "status": "COMPLETED",
    "output": {
        "classification_result": {
            "sequence": "The weather is sunny today.",
            "labels": [
                "weather",
                "news",
                "sports"
            ],
            "scores": [
                0.989009439945221,
                0.24655567109584808,
                0.008112689480185509
            ]
        },
        "device": "cuda"
    }
}
Here is my Python code:
import torch
import runpod
from runpod.serverless.utils.rp_validator import validate
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(device)

INPUT_SCHEMA = {
    'sequence': {
        'type': str,
        'required': True
    },
    'labels': {
        'type': list,
        'required': True,
    }
}


def classify_text(sequence, labels):
    model = AutoModelForSequenceClassification.from_pretrained(
        "facebook/bart-large-mnli",
        local_files_only=False  # set to True to load only from the local cache, never the internet
    ).to(device)
    tokenizer = AutoTokenizer.from_pretrained(
        "facebook/bart-large-mnli", local_files_only=False)

    classifier = pipeline(
        "zero-shot-classification",
        model=model,
        tokenizer=tokenizer,
        device=0,
    )

    return classifier(sequence, labels, multi_label=True)


async def handler(job):
    val_input = validate(job['input'], INPUT_SCHEMA)
    if 'errors' in val_input:
        return {"error": val_input['errors']}
    val_input = val_input['validated_input']

    classification_result = classify_text(val_input["sequence"], val_input["labels"])

    return {
        "classification_result": classification_result,
        "device": str(device)
    }


runpod.serverless.start({"handler": handler, "concurrency_modifier": lambda x: 1000})
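A side note on this handler: classify_text reloads bart-large-mnli on every request, so each call pays the model-load cost before inference starts. A minimal sketch of loading the pipeline once at module scope instead (illustrative only, not the code anyone in the thread ran):
import torch
from transformers import pipeline

# load once at import time so every request reuses the same (GPU-resident) model
device = 0 if torch.cuda.is_available() else -1  # pipeline expects a GPU index; -1 means CPU
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=device,
)

def classify_text(sequence, labels):
    return classifier(sequence, labels, multi_label=True)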
nerdylive
nerdylive•5mo ago
Did it work? Did it use the GPU?
PatrickR
PatrickR•5mo ago
So I am getting the GPU to run through CUDA. Yes, the output of the device is GPU. BTW I used the CLI tool runpodctl project create for faster iteration cycles / not having to rebuild Docker constantly.
nerdylive
nerdylive•5mo ago
Hmm okay, cool. What's the difference from BadNoise's code?
PatrickR
PatrickR•5mo ago
I rebuilt the new Docker image based off another image:
FROM runpod/base:0.6.1-cuda12.2.0

COPY builder/requirements.txt /requirements.txt
RUN python3.11 -m pip install --upgrade pip && \
    python3.11 -m pip install --upgrade -r /requirements.txt --no-cache-dir && \
    rm /requirements.txt

ADD . /

CMD python3.11 -u /src/handler.py
yhlong00000
yhlong00000•5mo ago
I think he's trying to use cache_model.py to cache the model locally when building the Docker image. He set local_files_only=True just to make sure it never downloads from the internet.
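A minimal sketch of what such a build-time cache script could look like (cache_model.py is named in the thread; this body is an assumption, not the actual file):
# hypothetical builder/cache_model.py, run during docker build to bake the weights into the image
from transformers import AutoModelForSequenceClassification, AutoTokenizer

AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
AutoTokenizer.from_pretrained("facebook/bart-large-mnli")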
nerdylive
nerdylive•5mo ago
yeah, what's wrong with that?
yhlong00000
yhlong00000•5mo ago
I don't feel anything is wrong with that 😂 I am still wondering what Patrick changed that made it start using the GPU.
nerdylive
nerdylive•5mo ago
ahh, I thought you'd found it already hahah
PatrickR
PatrickR•5mo ago
Sorry, my code was a little bit of a red herring. Here is a screenshot of it running on the GPU though.
(image attached)
nerdylive
nerdylive•5mo ago
I guess it could be a dependency issue (torch) that's causing it not to use the GPU
BadNoise
BadNoiseOP•5mo ago
hi! thank you so much for your help, I will try the suggested Docker image 🙂
yhlong00000
yhlong00000•5mo ago
I think this might be the root cause: in your requirements.txt, you have to set torch==2.2.1
(image attached)
Madiator2011
Madiator2011•5mo ago
Make sure to install the CUDA version, not the CPU one
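A quick way to check which build got installed (standard torch attributes, nothing specific to this repo):
import torch
print(torch.__version__)          # e.g. "2.2.1+cu121" for a CUDA wheel, "2.2.1+cpu" for CPU-only
print(torch.version.cuda)         # CUDA runtime version, or None on a CPU-only build
print(torch.cuda.is_available())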
BadNoise
BadNoiseOP•5mo ago
I'll try setting the torch version manually, because it's strange that I still see 0% GPU usage
(image attached)
BadNoise
BadNoiseOP•5mo ago
so I have to remove torch and use pytorch and pytorch-cuda=12.1, right?
digigoblin
digigoblin•5mo ago
pip3 install --no-cache-dir torch==2.3.0+cu121 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 && \
pip3 install --no-cache-dir xformers==0.0.26.post1 --index-url https://download.pytorch.org/whl/cu121
Assuming your base image is CUDA 12.1
BadNoise
BadNoiseOP•5mo ago
that's crazy, always 0% 😩
(image attached)
digigoblin
digigoblin•5mo ago
It's using the GPU if the GPU memory is showing as used. That telemetry is not real-time and not reliable
BadNoise
BadNoiseOP•5mo ago
but it's strange that even when I run a stress test on it for over 1 minute, it's never used 😅
digigoblin
digigoblin•5mo ago
check nvidia-smi
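Run in a loop on the worker, it shows utilization in real time while the stress test runs, e.g.:
nvidia-smi -l 1  # re-query every second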
yhlong00000
yhlong00000•5mo ago
I added some logs in the code and it is using the GPU.
(image attached)
yhlong00000
yhlong00000•5mo ago
(image attached)
digigoblin
digigoblin•5mo ago
Yep, the GPU utilization telemetry always confuses people because it's not real-time
yhlong00000
yhlong00000•5mo ago
this one is interesting, lol 😂
(image attached)
nerdylive
nerdylive•5mo ago
too confused hahah