JJonahJ
RunPod
•Created by Gregg Casey on 4/26/2024 in #⚡|serverless
Hosted Serverless
Uhhh… I have it on a 48GB GPU
11 replies
RunPod
•Created by Gregg Casey on 4/26/2024 in #⚡|serverless
Hosted Serverless
Mixtral can run on serverless. Takes a minute and a half to spin up though…
11 replies
RunPod
•Created by avif on 4/5/2024 in #⚡|serverless
Having problems working with the `Llama-2-7b-chat-hf`
7 replies
RunPod
•Created by avif on 4/5/2024 in #⚡|serverless
Having problems working with the `Llama-2-7b-chat-hf`
It’s because your output token limit is set to 16. You should send a bunch of sampling parameters too, not just the prompt.
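Something like this, for example (a sketch assuming worker-vllm’s input format — the sampling_params names follow the vLLM docs, and the endpoint ID and API key here are placeholders):

import requests

payload = {
    "input": {
        "prompt": "Why is the sky blue?",
        "sampling_params": {
            "max_tokens": 500,   # the default of 16 is what's truncating your answers
            "temperature": 0.7,
            "top_p": 0.9,
        },
    }
}
resp = requests.post(
    "https://api.runpod.ai/v2/<endpoint_id>/runsync",  # placeholder endpoint ID
    headers={"Authorization": "Bearer <api_key>"},     # placeholder API key
    json=payload,
)
print(resp.json())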
7 replies
RunPod
•Created by Casper. on 2/5/2024 in #⚡|serverless
SGLang worker (similar to worker-vllm)
Sorry to hijack, but if it’s not too much trouble, it’d be nice to have the option to use locally stored models when baking them into the Docker image. For times when Hugging Face is down, for example…
11 replies
RunPod
•Created by bigslam on 3/24/2024 in #⚡|serverless
How to run OLLAMA on Runpod Serverless?
vLLM supports AWQ-quantized models, but yeah, it would be nice to have other options for text inference. Like, I keep seeing this ‘grammars’ thing mentioned around the place, but afaik vLLM doesn’t support that either…
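For what it’s worth, AWQ in vLLM is just a flag (a minimal sketch — the model repo here is only an example):

from vllm import LLM, SamplingParams

# quantization="awq" tells vLLM to load AWQ-quantized weights
llm = LLM(model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ", quantization="awq")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)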
32 replies
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
And yeah, I guess I tried a few times before hitting it lucky with that.
48 replies
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
I just looked; all I have set is the model name and quantization.
48 replies
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
I only got a Mixtral working by putting the context a lot lower than I’d hoped to…
Edit: actually, looking at my template, I didn’t set that environment variable. 🤷‍♂️
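(For reference, the relevant worker-vllm template variables look something like this — names per the repo README, worth double-checking:)

MODEL_NAME=mistralai/Mixtral-8x7B-Instruct-v0.1   # example model
QUANTIZATION=awq
MAX_MODEL_LEN=4096   # lowering the context window is what saves the VRAM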
48 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
I just used the ready-made vLLM endpoint. 🤷‍♂️ I’m not really the one to ask. 👀
283 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
…unless the problem really is that all you’re getting back is 'COMPLETED' and no tokens at all anywhere. In which case, forget all I said 😅
283 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
...so if I'm reading yours right, you'll want something like
I think, lol
283 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
elif status == "COMPLETED":
    tokens = json_response['output'][0]['choices'][0]['tokens']
    return tokens
Here's the relevant part of mine. If the status is COMPLETED, the output you want is in 'tokens'. Hope this helps!
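And for context, here's roughly the loop it lives in (a sketch assuming the standard RunPod run/status flow — the endpoint ID and API key are placeholders):

import time
import requests

ENDPOINT = "https://api.runpod.ai/v2/<endpoint_id>"  # placeholder endpoint ID
HEADERS = {"Authorization": "Bearer <api_key>"}      # placeholder API key

def run_and_wait(prompt):
    # submit the job, then poll the status endpoint until it finishes
    job = requests.post(f"{ENDPOINT}/run", headers=HEADERS,
                        json={"input": {"prompt": prompt}}).json()
    while True:
        json_response = requests.get(f"{ENDPOINT}/status/{job['id']}",
                                     headers=HEADERS).json()
        status = json_response["status"]
        if status in ("FAILED", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(json_response)
        elif status == "COMPLETED":
            # the generated text is under 'tokens' in the worker-vllm output
            tokens = json_response['output'][0]['choices'][0]['tokens']
            return tokens
        time.sleep(1)  # still IN_QUEUE or IN_PROGRESS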
283 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
I can share my code, but as far as I can see from what you’ve posted, your output should be in the ’tokens’ part of the JSON that you get back. Try just printing everything you get back. If it’s completed, it should be there…
283 replies
RunPod
•Created by Superintendent on 2/16/2024 in #⚡|serverless
Deepseek coder on serverless
It’s like: open a folder, Git Bash clone the repo, open the command line, put in that one line.
Windows doesn’t need sudo. The model name is copied using the Hugging Face copy button. Username/image:tag needs to be your username and chosen image name/tag (I’m sure you know this already) and to be in all lower-case, and RunPod requires a tag (I’ve mostly just been using 0.1 so far).
It’s been working.
Edit: I put the name in for DeepSeek Coder AWQ quantized. I have not tried this one personally. Note that GGUF quants won’t work with vLLM afaik.
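The ‘one line’ is roughly this shape (going from option 2 of the worker-vllm README — the build-arg names are from memory, so double-check them there, and the model repo is just the AWQ example I mentioned):

docker build -t yourusername/worker-vllm:0.1 \
  --build-arg MODEL_NAME="TheBloke/deepseek-coder-33B-instruct-AWQ" \
  --build-arg QUANTIZATION="awq" \
  .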
17 replies
RunPod
•Created by Superintendent on 2/16/2024 in #⚡|serverless
Deepseek coder on serverless
I’ve been following the instructions for ‘option 2’ on this page: https://github.com/runpod-workers/worker-vllm
17 replies
RunPod
•Created by JJonahJ on 2/13/2024 in #⚡|serverless
max workers set to 2 but endpoint page shows ‘5 idle’
Okay, thank you!
5 replies