JJonahJ
RunPod
•Created by Gregg Casey on 4/26/2024 in #⚡|serverless
Hosted Serverless
Uhhh… I have it on a 48GB GPU
11 replies
RunPod
•Created by Gregg Casey on 4/26/2024 in #⚡|serverless
Hosted Serverless
Mixtral can run on serverless. Takes a minute and a half to spin up though…
11 replies
RunPod
•Created by avif on 4/5/2024 in #⚡|serverless
Having problems working with the `Llama-2-7b-chat-hf`
7 replies
RunPod
•Created by avif on 4/5/2024 in #⚡|serverless
Having problems working with the `Llama-2-7b-chat-hf`
It’s because your output token limit is set to 16. You should send a bunch of sampling parameters too, not just the prompt.
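Something like this, for example (a sketch assuming worker-vllm’s input format — the sampling_params names follow the vLLM docs, and the endpoint ID and API key here are placeholders):

import requests

payload = {
    "input": {
        "prompt": "Why is the sky blue?",
        "sampling_params": {
            "max_tokens": 500,   # the default of 16 is what's truncating your answers
            "temperature": 0.7,
            "top_p": 0.9,
        },
    }
}
resp = requests.post(
    "https://api.runpod.ai/v2/<endpoint_id>/runsync",  # placeholder endpoint ID
    headers={"Authorization": "Bearer <api_key>"},     # placeholder API key
    json=payload,
)
print(resp.json())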
7 replies
RunPod
•Created by Casper. on 2/5/2024 in #⚡|serverless
SGLang worker (similar to worker-vllm)
Sorry to hijack, but if it’s not too much trouble, it’d be nice to have the option to use locally stored models when baking them into the Docker image. For times when Hugging Face is down, for example…
11 replies
RunPod
•Created by bigslam on 3/24/2024 in #⚡|serverless
How to run OLLAMA on Runpod Serverless?
vLLM supports AWQ-quantized models, but yeah, it would be nice to have other options for text inference. Like, I keep seeing this ‘grammars’ thing mentioned around the place, but afaik vLLM doesn’t support that either…
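For what it’s worth, AWQ in vLLM is just a flag (a minimal sketch — the model repo here is only an example):

from vllm import LLM, SamplingParams

# quantization="awq" tells vLLM to load AWQ-quantized weights
llm = LLM(model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ", quantization="awq")
outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)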
32 replies
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
And yeah, I guess I tried a few times before hitting it lucky with that.
48 replies
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
I just looked; all I have set is the model name and quantization.
48 replies
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
I only got a Mixtral working by putting the context a lot lower than I’d hoped to…
Edit: actually, looking at my template, I didn’t set that environment variable. 🤷‍♂️
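(For reference, the relevant worker-vllm template variables look something like this — names per the repo README, worth double-checking:)

MODEL_NAME=mistralai/Mixtral-8x7B-Instruct-v0.1   # example model
QUANTIZATION=awq
MAX_MODEL_LEN=4096   # lowering the context window is what saves the VRAM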
48 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
I just used the ready-made vLLM endpoint. 🤷‍♂️ I’m not really the one to ask. 👀
283 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
…unless the problem really is that all you’re getting back is 'COMPLETED' and no tokens at all anywhere. In which case, forget all I said 😅
283 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
...so if I'm reading yours right, you'll want something like
I think, lol
283 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
elif status == "COMPLETED":
    tokens = json_response['output'][0]['choices'][0]['tokens']
    return tokens
Here's the relevant part of mine. If the status is COMPLETED, the output you want is in 'tokens'. Hope this helps!
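And for context, here's roughly the loop it lives in (a sketch assuming the standard RunPod run/status flow — the endpoint ID and API key are placeholders):

import time
import requests

ENDPOINT = "https://api.runpod.ai/v2/<endpoint_id>"  # placeholder endpoint ID
HEADERS = {"Authorization": "Bearer <api_key>"}      # placeholder API key

def run_and_wait(prompt):
    # submit the job, then poll the status endpoint until it finishes
    job = requests.post(f"{ENDPOINT}/run", headers=HEADERS,
                        json={"input": {"prompt": prompt}}).json()
    while True:
        json_response = requests.get(f"{ENDPOINT}/status/{job['id']}",
                                     headers=HEADERS).json()
        status = json_response["status"]
        if status in ("FAILED", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(json_response)
        elif status == "COMPLETED":
            # the generated text is under 'tokens' in the worker-vllm output
            tokens = json_response['output'][0]['choices'][0]['tokens']
            return tokens
        time.sleep(1)  # still IN_QUEUE or IN_PROGRESS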
283 replies
RunPod
•Created by kingclimax7569 on 2/16/2024 in #⚡|serverless
Status endpoint only returns "COMPLETED" but no answer to the question
I can share my code, but as far as I can see from what you’ve posted, your output should be in the ’tokens’ part of the JSON that you get back. Try just printing everything you get back. If it’s completed, it should be there…
283 replies
RunPod
•Created by Superintendent on 2/16/2024 in #⚡|serverless
Deepseek coder on serverless
It’s like: open a folder, Git Bash clone the repo, open the command line, put in that one line.
Windows doesn’t need sudo. The model name is copied using the Hugging Face copy button. Username/image:tag needs to be your username and chosen image name/tag (I’m sure you know this already) and to be in all lower-case, and RunPod requires a tag (I’ve mostly just been using 0.1 so far).
It’s been working.
Edit: I put the name in for DeepSeek Coder AWQ quantized. I have not tried this one personally. Note that GGUF quants won’t work with vLLM afaik.
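The ‘one line’ is roughly this shape (going from option 2 of the worker-vllm README — the build-arg names are from memory, so double-check them there, and the model repo is just the AWQ example I mentioned):

docker build -t yourusername/worker-vllm:0.1 \
  --build-arg MODEL_NAME="TheBloke/deepseek-coder-33B-instruct-AWQ" \
  --build-arg QUANTIZATION="awq" \
  .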
17 replies
RunPod
•Created by Superintendent on 2/16/2024 in #⚡|serverless
Deepseek coder on serverless
I’ve been following the instructions for ‘option 2’ on this page: https://github.com/runpod-workers/worker-vllm
17 replies
RunPod
•Created by JJonahJ on 2/13/2024 in #⚡|serverless
max workers set to 2 but endpoint page shows ‘5 idle’
Okay, thank you!
5 replies