RunPod · 5d ago
ErezL

I am trying to deploy the "meta-llama/Llama-3.1-8B-Instruct" model on Serverless vLLM.

I do this with the maximum possible memory. After setup, I try to run the "hello world" sample, but the request is stuck in the queue and I get "[error] worker exited with exit code 1" with no other error or message in the log. Is it even possible to run this model? What is the problem, and can it be resolved? (For the record, I did manage to run a much smaller model using the same procedure.)
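For context, a "hello world" request against a RunPod serverless vLLM endpoint can be sketched as below. This is an illustration, not the exact request from the thread: the endpoint ID and API key are placeholders, and the payload shape assumes the vLLM worker's `input`/`sampling_params` format.

```python
import json
import urllib.request

# Placeholders - replace with your own endpoint ID and RunPod API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

def build_runsync_request(prompt: str, max_tokens: int = 16):
    """Build url, headers, and JSON body for a synchronous /runsync call."""
    url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": {
        "prompt": prompt,
        "sampling_params": {"max_tokens": max_tokens},
    }})
    return url, headers, body

url, headers, body = build_runsync_request("Hello World")
# Actual network call left commented out, since the placeholders are not real:
# req = urllib.request.Request(url, data=body.encode(), headers=headers)
# print(urllib.request.urlopen(req).read())
```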
18 Replies
Jason · 5d ago
What GPU did you use? Try enabling "allow remote code".
ErezL (OP) · 5d ago
I tried all of them, I think; the strongest possible for sure. And I did enable it.
Jason · 5d ago
Can I see your request example?
ErezL (OP) · 5d ago
Is this choice OK?
(screenshot attached)
Jason · 5d ago
Did you use the vLLM template from RunPod, maybe? Your inputs are OK.
ErezL (OP) · 5d ago
It's the default (I deleted the instance by now), yes. How can I choose a GPU? (There is no choice available in the setup process.) Should I be using a different template?
Jason · 5d ago
OK, let me try. I'm running that model with RunPod's template now; the one you screenshotted works.
{
"delayTime": 70122,
"executionTime": 896,
"id": "f83c48a3-3bcd-41e2-8a48-1d2126bbb7b1-e1",
"output": [
{
"choices": [
{
"tokens": [
"! Welcome to my blog about London: the Great City!\nIn this blog you"
]
}
],
"usage": {
"input": 3,
"output": 16
}
}
],
"status": "COMPLETED",
"workerId": "wl0u8r3xp794gx"
}
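The response above is nested a few levels deep; here is a minimal sketch of pulling the generated text out of it. The field names are taken verbatim from the JSON shown, and nothing else is assumed.

```python
import json

# The same response shape Jason posted above, abbreviated.
sample = json.loads("""
{
  "status": "COMPLETED",
  "output": [
    {
      "choices": [
        {"tokens": ["! Welcome to my blog about London: the Great City!"]}
      ],
      "usage": {"input": 3, "output": 16}
    }
  ]
}
""")

def first_completion(resp: dict) -> str:
    """Return the first generated string from a completed serverless job."""
    if resp.get("status") != "COMPLETED":
        raise RuntimeError(f"job not finished: {resp.get('status')}")
    return resp["output"][0]["choices"][0]["tokens"][0]

text = first_completion(sample)
```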
My run just succeeded with an H100. Did you have your Hugging Face token in the endpoint?
ErezL (OP) · 5d ago
No.
Jason · 5d ago
Well, that's a problem: you need it.
ErezL (OP) · 5d ago
How do I choose a GPU? Where do I even see which GPU I got?
Jason · 5d ago
This here. If you select one, that's the only GPU it will be; if you select two, you can see the workers in the other tab.
ErezL (OP) · 5d ago
Is a read-only token OK?
Jason · 5d ago
Yeah, for downloading.
(screenshot attached)
Jason · 5d ago
There's a screenshot above showing how to see your workers.
ErezL (OP) · 4d ago
I get an error that I need to request access to the model on Hugging Face. I applied and am waiting for approval... Thanks for your time.
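A gated repo like meta-llama/Llama-3.1-8B-Instruct rejects downloads until access is granted. As a sketch (assuming the public Hugging Face Hub REST API; the token value is a placeholder), you can probe whether your read-only token has access before deploying:

```python
import urllib.request

def access_check_request(repo_id: str, token: str) -> urllib.request.Request:
    """Build a GET request that succeeds (200) only if the token has been
    granted access to the repo; gated repos answer 401/403 until then."""
    url = f"https://huggingface.co/api/models/{repo_id}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )

req = access_check_request("meta-llama/Llama-3.1-8B-Instruct", "hf_xxx")
# urllib.request.urlopen(req)  # raises HTTPError 401/403 until access is granted
```

A read-only token is enough here, since the worker only needs to download the weights.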
Jason · 4d ago
Oh, I see. OK, you're welcome.
ErezL (OP) · 4d ago
It works now. Thanks.
Jason · 4d ago
yay
