wizardjoe
RunPod
•Created by Builderman on 2/19/2024 in #⚡|serverless
Mixtral Possible?
don't select anything under that
21 replies
RunPod
•Created by Builderman on 2/19/2024 in #⚡|serverless
Mixtral Possible?
I have min workers set to at least 1 so that it doesn't spend time booting, which is where the majority of the latency comes from
21 replies
RunPod
•Created by Builderman on 2/19/2024 in #⚡|serverless
Mixtral Possible?
I'm currently running this with decent speeds, but you'll need to set your min and max workers accordingly depending on the load you expect
21 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
This solved it. Thanks!
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
Notice that the stream is empty, so I've missed the last chunk.
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
It missed it when I tested this in Postman. For example, I called it once and got this chunk:
{
  "status": "IN_PROGRESS",
  "stream": [
    {
      "output": {
        "finished": true,
        "tokens": [
          "\nThe word you are looking for is buy, for example, \"Instead"
        ],
        "usage": {
          "input": 9,
          "output": 16
        }
      }
    }
  ]
}
Notice that it's in progress and the response in "tokens" is a partial one. Then I ran it again after waiting for 1 second and got this:
{
  "status": "COMPLETED",
  "stream": []
}
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
Got it, thanks! This aligns with what I was thinking, although there is still the possibility that the final call might miss the last chunk, and you'll have to call the status API or something to get the full response.
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
Would you happen to have example code that I can look at to see how to handle streaming properly in Python?
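A minimal sketch of what that could look like, assuming a recent runpod-python in which the job returned by Endpoint.run() exposes a stream() generator; the API key, endpoint ID, and input shape are placeholders, not values from this thread:

import os
import runpod

# Placeholders: supply your own API key and serverless endpoint ID.
runpod.api_key = os.environ["RUNPOD_API_KEY"]
endpoint = runpod.Endpoint("ENDPOINT_ID")

# Submit the job asynchronously, then consume chunks as they are produced.
job = endpoint.run({"input": {"prompt": "Hello, world"}})

for chunk in job.stream():
    # Each chunk mirrors the "output" objects shown in the raw responses
    # elsewhere in this thread.
    print(chunk)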
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
So now, I have to call the status API to get the full message, and subtract the chunk that I got before, in order to get the final chunk... seems pretty cumbersome, no?
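A sketch of that workaround as it might look in Python with requests; the /status output shape here is an assumption mirrored from the streamed chunk shown below, and JOB_ID, ENDPOINT_ID, and the API key are placeholders:

import requests

API_KEY = "YOUR_API_KEY"                        # placeholder
BASE = "https://api.runpod.ai/v2/ENDPOINT_ID"   # placeholder endpoint ID
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Text accumulated from /stream before it returned COMPLETED with an empty stream.
streamed_text = "\nThe word you are looking for is buy, for example, \"Instead"

# Fetch the finished job from /status and reconstruct the full generation.
# NOTE: the output shape is assumed to mirror the streamed chunks in this thread.
status = requests.get(f"{BASE}/status/JOB_ID", headers=HEADERS).json()
full_text = "".join(status["output"]["tokens"])

# The final chunk is whatever /stream never delivered.
final_chunk = full_text[len(streamed_text):]
print(final_chunk)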
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
For example, the first time I call the stream API endpoint, it returns:
{
  "status": "IN_QUEUE",
  "stream": []
}
Every 10 seconds or so, I call it again, and it returns the same thing. Finally, after some time, when I call it, it returns:
{
  "status": "IN_PROGRESS",
  "stream": [
    {
      "output": {
        "finished": true,
        "tokens": [
          "\nThe word you are looking for is buy, for example, \"Instead"
        ],
        "usage": {
          "input": 9,
          "output": 16
        }
      }
    }
  ]
}
This seems to be a chunk. But when I call it again after this, it seems like I miss the ending chunk, and just get this:
{
  "status": "COMPLETED",
  "stream": []
}
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
What's a typical way to structure the requests? I tried using Postman to call the stream API endpoint and it still times out after 10 seconds, returning status "IN_QUEUE". Am I supposed to repeatedly call it every 10 seconds until I get a response?
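For what it's worth, the usual raw-HTTP pattern is exactly that: submit once with /run, then poll /stream/{job_id} in a loop until the status is terminal. A minimal sketch with requests; the API key and endpoint ID are placeholders, and the chunk shape is taken from the responses shown above:

import time
import requests

API_KEY = "YOUR_API_KEY"        # placeholder
ENDPOINT_ID = "ENDPOINT_ID"     # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job asynchronously via /run, which returns a job ID immediately.
job = requests.post(f"{BASE}/run", headers=HEADERS,
                    json={"input": {"prompt": "Hello, world"}}).json()
job_id = job["id"]

# Poll /stream until the job reaches a terminal status, printing any
# chunks that arrived since the previous call.
while True:
    resp = requests.get(f"{BASE}/stream/{job_id}", headers=HEADERS).json()
    for chunk in resp.get("stream", []):
        print("".join(chunk["output"]["tokens"]), end="", flush=True)
    if resp["status"] in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)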
34 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
Are the "dolphin" images mainly for coding or are they good at other things?
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
If you're making your own then I guess you can do anything really
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
Sorry, I meant the "worker-vllm" image that runpod has only supports awq or squeezellm, at least according to these docs: https://github.com/runpod-workers/worker-vllm
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
Aw man, I think runpod serverless only supports either awq or squeezellm quantization
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
Do you know how big the models were in terms of total disk space used?
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
@ashleyk Is there one in particular that you recommend?
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
@Alpay Ariyak, any thoughts? If I try to bake in the mixtral-8x7b model, it results in a huge image that I'm having trouble pushing up to Docker Hub, so I want to figure out how to set it up with volumes.
20 replies
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
@Herai_Studios it spends some time downloading the model safetensors, and then after that, it exports the layers and then writes the image. I haven't tested the endpoint yet, will let you know more tomorrow
69 replies