I am trying to deploy a "meta-llama/Llama-3.1-8B-Instruct" model on Serverless vLLM
I do this with the maximum possible memory.
After setup, I try to run the "hello world" sample, but the request is stuck in the queue and I get "[error]worker exited with exit code 1" with no other error or message in the log.
Is it even possible to run this model?
What is the problem? Can this be resolved?
(for the record, I did manage to run a much smaller model using the same procedure as above)
what gpu did you use?
try setting the allow remote code
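("Allow remote code" corresponds to vLLM's `trust_remote_code` engine argument; on RunPod's vLLM worker it is typically set through an environment variable on the endpoint. A sketch of what the setting maps to, with the variable name assumed from the template docs:)

```python
# Illustration only: "allow remote code" maps to vLLM's trust_remote_code
# engine argument, which lets transformers execute custom modeling code
# shipped inside the Hugging Face repo.
engine_args = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "trust_remote_code": True,  # the setting the endpoint toggle controls
}
print(engine_args["trust_remote_code"])
```

(Llama-3.1 uses a standard architecture, so this flag is usually not the root cause for that model, but it is harmless to enable.)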
I tried all of them, I think. The strongest possible, for sure.
I did
Can I see your request example?
Is this choice OK?

maybe
you use the vllm template from runpod?
your inputs
ok
it's the default (I deleted the instance by now)
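For reference, the default "hello world" request on RunPod's serverless vLLM quick-start looks roughly like this (the field names inside `input` are assumptions based on the worker-vllm template, so double-check against the endpoint docs):

```python
import json

# Sketch of a typical RunPod serverless request body. The {"input": {...}}
# wrapper is RunPod's serverless convention; the inner fields follow
# vLLM-style sampling parameters (names assumed).
payload = {
    "input": {
        "prompt": "Hello World",
        "sampling_params": {
            "max_tokens": 100,
            "temperature": 0.7,
        },
    }
}

# This is the JSON you would POST to the endpoint's /run or /runsync URL.
print(json.dumps(payload, indent=2))
```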
yes
How can I choose a GPU? (there is no choice available in the setup process)
should I be using a different template?
ok let me try
I'm trying to run that model with RunPod's template now
the one in your screenshot works
my run just succeeded with an H100
did you have your Hugging Face token in the endpoint?
no
well, that's a problem
you need it
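On the RunPod side that means adding the token as an environment variable on the endpoint. A minimal sketch, assuming the vLLM worker reads it from a variable named `HF_TOKEN` (verify the exact name in the template):

```python
import os

# Normally set in the endpoint's environment-variable settings, not in code;
# shown here only to illustrate what the worker expects to find.
# The value below is a placeholder, not a real token.
os.environ["HF_TOKEN"] = "<your-read-only-hf-token>"

# Note: a gated model like meta-llama/Llama-3.1-8B-Instruct also requires
# that your Hugging Face account has been granted access to the repo;
# the token alone is not enough.
print("HF_TOKEN" in os.environ)
```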
How do I choose a GPU? Where do I even see which GPU I got?
This here
if you select one, that's the only GPU it will use
if you select two, you can see the workers in the other tab
Is a read-only token OK?
yeah, for downloading

there's a screenshot showing how to see your workers
I get an error that I need to ask for access to the model in huggingface
I applied and waiting for approval...
Thanks for your time.
Oh i see
okok
you're welcome
It works now. Thanks.
yay