wizardjoe
RunPod
•Created by Builderman on 2/19/2024 in #⚡|serverless
Mixtral Possible?
don't select anything under that
21 replies
RunPod
•Created by Builderman on 2/19/2024 in #⚡|serverless
Mixtral Possible?
I have min workers set to at least 1 so that it doesn't spend time booting, which is where the majority of the latency comes from
21 replies
RunPod
•Created by Builderman on 2/19/2024 in #⚡|serverless
Mixtral Possible?
I'm currently running this with decent speeds, but you'll need to set your min and max workers accordingly depending on the load you expect
21 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
This solved it. Thanks!
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
Notice that the stream is empty, so I've missed the last chunk.
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
It missed it when I tested this in Postman. For example, I called it once and got this chunk:
{
  "status": "IN_PROGRESS",
  "stream": [
    {
      "output": {
        "finished": true,
        "tokens": [
          "\nThe word you are looking for is buy, for example, \"Instead"
        ],
        "usage": {
          "input": 9,
          "output": 16
        }
      }
    }
  ]
}
Notice that it's in progress and the response in "tokens" is a partial one. Then I ran it again after waiting for 1 second and got this:
{
  "status": "COMPLETED",
  "stream": []
}
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
Got it, thanks! This aligns with what I was thinking, although there is still the possibility that the final call might miss the last chunk, and you'll have to call the status API or something to get the full response.
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
Would you happen to have example code that I can look at to see how to handle streaming properly in Python?
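A minimal sketch of what that could look like, assuming a recent runpod-python in which the job returned by Endpoint.run() exposes a stream() generator; the API key, endpoint ID, and input shape are placeholders, not values from this thread:

import os
import runpod

# Placeholders: supply your own API key and serverless endpoint ID.
runpod.api_key = os.environ["RUNPOD_API_KEY"]
endpoint = runpod.Endpoint("ENDPOINT_ID")

# Submit the job asynchronously, then consume chunks as they are produced.
job = endpoint.run({"input": {"prompt": "Hello, world"}})

for chunk in job.stream():
    # Each chunk mirrors the "output" objects shown in the raw responses
    # elsewhere in this thread.
    print(chunk)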
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
So now, I have to call the status API to get the full message, and subtract the chunk that I got before, in order to get the final chunk... seems pretty cumbersome, no?
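A sketch of that workaround as it might look in Python with requests; the /status output shape here is an assumption mirrored from the streamed chunk shown below, and JOB_ID, ENDPOINT_ID, and the API key are placeholders:

import requests

API_KEY = "YOUR_API_KEY"                        # placeholder
BASE = "https://api.runpod.ai/v2/ENDPOINT_ID"   # placeholder endpoint ID
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Text accumulated from /stream before it returned COMPLETED with an empty stream.
streamed_text = "\nThe word you are looking for is buy, for example, \"Instead"

# Fetch the finished job from /status and reconstruct the full generation.
# NOTE: the output shape is assumed to mirror the streamed chunks in this thread.
status = requests.get(f"{BASE}/status/JOB_ID", headers=HEADERS).json()
full_text = "".join(status["output"]["tokens"])

# The final chunk is whatever /stream never delivered.
final_chunk = full_text[len(streamed_text):]
print(final_chunk)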
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
For example, the first time I call the stream API endpoint, it returns:
{
  "status": "IN_QUEUE",
  "stream": []
}
Every 10 seconds or so, I call it again, and it returns the same thing. Finally, after some time, when I call it, it returns:
{
  "status": "IN_PROGRESS",
  "stream": [
    {
      "output": {
        "finished": true,
        "tokens": [
          "\nThe word you are looking for is buy, for example, \"Instead"
        ],
        "usage": {
          "input": 9,
          "output": 16
        }
      }
    }
  ]
}
This seems to be a chunk. But when I call it again after this, it seems like I miss the ending chunk, and just get this:
{
  "status": "COMPLETED",
  "stream": []
}
34 replies
RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
What's a typical way to structure the requests? I tried using Postman to call the stream API endpoint and it still times out after 10 seconds, returning status "IN_QUEUE". Am I supposed to repeatedly call it every 10 seconds until I get a response?
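For what it's worth, the usual raw-HTTP pattern is exactly that: submit once with /run, then poll /stream/{job_id} in a loop until the status is terminal. A minimal sketch with requests; the API key and endpoint ID are placeholders, and the chunk shape is taken from the responses shown above:

import time
import requests

API_KEY = "YOUR_API_KEY"        # placeholder
ENDPOINT_ID = "ENDPOINT_ID"     # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job asynchronously via /run, which returns a job ID immediately.
job = requests.post(f"{BASE}/run", headers=HEADERS,
                    json={"input": {"prompt": "Hello, world"}}).json()
job_id = job["id"]

# Poll /stream until the job reaches a terminal status, printing any
# chunks that arrived since the previous call.
while True:
    resp = requests.get(f"{BASE}/stream/{job_id}", headers=HEADERS).json()
    for chunk in resp.get("stream", []):
        print("".join(chunk["output"]["tokens"]), end="", flush=True)
    if resp["status"] in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)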
34 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
Are the "dolphin" images mainly for coding or are they good at other things?
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
If you're making your own then I guess you can do anything really
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
Sorry, I meant the "worker-vllm" image that runpod has only supports awq or squeezellm, at least according to these docs: https://github.com/runpod-workers/worker-vllm
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
Aw man, I think runpod serverless only supports either awq or squeezellm quantization
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
Do you know how big the models were in terms of total disk space used?
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
@ashleyk Is there one in particular that you recommend?
20 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
@Alpay Ariyak, any thoughts? If I try to bake in the mixtral-8x7b model, it results in a huge image that I'm having trouble pushing up to Docker Hub, so I want to figure out how to set it up with volumes.
20 replies
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
@Herai_Studios it spends some time downloading the model safetensors, and then after that, it exports the layers and then writes the image. I haven't tested the endpoint yet, will let you know more tomorrow
69 replies