RunPod · 9mo ago
houmie

When using vLLM on OpenAI endpoint, what is the point of runsync/run?

I just managed to create a flexible worker on serverless. It works great, and I can do text completions via the openai/v1/completions endpoint. What I don't understand is the purpose of runsync and run. It's not like I'm queuing jobs somewhere to pick up the results later, right? The OpenAI endpoint returns the results straight away. And if too many users were trying to use openai/v1/completions, additional workers would spin up to serve them. So what's the point of the other endpoints? Would someone be so kind as to explain that to me? Maybe I'm missing something. Thank you.
6 Replies
agentpietrucha · 9mo ago
runsync is a synchronous way of hitting your endpoint. The HTTP request waits (as long as it doesn't time out first) until your worker returns a response. E.g., if you call the worker from your API, you can just await the response: `const response = await fetch('runpod/worker/url' /* runsync */)` followed by `const result = await response.json()`. When you use the /run endpoint, RunPod returns a job status and ID instead. With the ID, you'd have to set up some kind of worker on your side to periodically check for the results of your job. The two endpoints serve different purposes. Hope this very high-level overview helps a little.
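To make the /run flow above concrete, here is a minimal sketch of "submit, then poll". It assumes the URL shapes `https://api.runpod.ai/v2/<endpoint id>/run` and `/status/<job id>`, a Bearer API key, and the status strings shown; none of these come from this thread, so check them against the RunPod docs. `submitAndPoll` and its options are hypothetical names, and `fetchImpl` is injectable only so the logic can be tried without a network.

```javascript
// Sketch only: URL shapes, status strings, and payload layout are assumptions.
const BASE_URL = "https://api.runpod.ai/v2";

async function submitAndPoll(endpointId, apiKey, input, opts = {}) {
  const { fetchImpl = globalThis.fetch, intervalMs = 1000, maxAttempts = 60 } = opts;
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  };

  // /run is expected to return right away with a job id, not the result.
  const runRes = await fetchImpl(`${BASE_URL}/${endpointId}/run`, {
    method: "POST",
    headers,
    body: JSON.stringify({ input }),
  });
  const { id } = await runRes.json();

  // Poll the status route until the job reaches a terminal state.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const statusRes = await fetchImpl(`${BASE_URL}/${endpointId}/status/${id}`, { headers });
    const job = await statusRes.json();
    if (job.status === "COMPLETED") return job.output;
    if (["FAILED", "CANCELLED", "TIMED_OUT"].includes(job.status)) {
      throw new Error(`Job ${id} ended with status ${job.status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${id} did not finish after ${maxAttempts} polls`);
}
```

With runsync, the loop disappears: a single request is held open until the worker answers, which is why it suits short jobs and /run suits long ones.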
houmie (OP) · 9mo ago
Not sure. I can just use the openai endpoint, which seems to be synchronous. How does the openai endpoint relate to the two other endpoints, please?
agentpietrucha · 9mo ago
I haven’t used the OpenAI endpoint yet, unfortunately :(. Let’s wait for somebody else to jump in here.
nerdylive · 9mo ago
The OpenAI endpoints are made for the vLLM templates, and the other two are the "default" for RunPod workers. They are for retrieving results too, but the OpenAI endpoints exist so you can use the openai package on the client: it's easier for compatibility, and there's no need to write your own OpenAI handler on top of runsync. runsync and run can be used with other templates.
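As a rough illustration of that difference, the sketch below builds the same prompt as two requests against one endpoint: one in the OpenAI completions shape, one in the generic runsync shape. The base URL, the `/openai/v1/completions` path under the endpoint ID, and the `{ input: { prompt } }` payload are assumptions about a vLLM worker, not something confirmed in this thread; `buildRequests` is a hypothetical helper.

```javascript
// Sketch only: paths and payload shapes are assumptions to verify against the docs.
const RUNPOD_BASE = "https://api.runpod.ai/v2";

function buildRequests(endpointId, apiKey, model, prompt) {
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  };
  return {
    // OpenAI-compatible route: the body follows the OpenAI completions format,
    // so an existing OpenAI client only needs a different base URL.
    openai: {
      url: `${RUNPOD_BASE}/${endpointId}/openai/v1/completions`,
      method: "POST",
      headers,
      body: JSON.stringify({ model, prompt }),
    },
    // Generic route: every worker template takes its payload wrapped in "input",
    // and the caller has to unpack the worker's own output format.
    runsync: {
      url: `${RUNPOD_BASE}/${endpointId}/runsync`,
      method: "POST",
      headers,
      body: JSON.stringify({ input: { prompt } }),
    },
  };
}
```

Either descriptor can be passed to `fetch(req.url, req)`. The point is only that the OpenAI route saves you from translating between the two body formats yourself.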
houmie (OP) · 9mo ago
Ahh, perfect. That makes perfect sense now. Thank you. How about scalability? Let's assume max workers is set to 3 and 5 clients connect explicitly via the OpenAI endpoint to the serverless instance. The first three are naturally served, while the other two wait in the queue for a worker to free up, correct? Won't they time out while waiting?
nerdylive · 9mo ago
Yep, there's no timeout while they're in the queue unless you remove them (the jobs) or set a timeout on the endpoint that applies before a new worker picks them up (apart from the max running workers setting).
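The 3-workers, 5-clients scenario can be pictured with a toy model. The sketch below is not RunPod's scheduler; it assumes all requests arrive at time 0, each worker handles one job at a time, and queued jobs go to whichever worker frees up first. `simulateQueue` is a hypothetical helper, and the 10-second job duration is made up for illustration.

```javascript
// Toy model only: first-come, first-served jobs on a fixed pool of workers.
function simulateQueue(durations, maxWorkers) {
  // freeAt[i] is the time at which worker i becomes available again.
  const freeAt = new Array(maxWorkers).fill(0);
  return durations.map((duration) => {
    // Pick the worker that frees up earliest (lowest index wins ties).
    let w = 0;
    for (let i = 1; i < freeAt.length; i++) {
      if (freeAt[i] < freeAt[w]) w = i;
    }
    const start = freeAt[w];
    freeAt[w] = start + duration;
    // "waited" is how long the job sat in the queue before starting.
    return { worker: w, waited: start, finish: start + duration };
  });
}

// Five clients, three workers, each job taking 10 seconds.
const timeline = simulateQueue([10, 10, 10, 10, 10], 3);
console.log(timeline.map((job) => job.waited)); // [ 0, 0, 0, 10, 10 ]
```

In this model the fourth and fifth requests simply start 10 seconds late. Whether a given client tolerates that delay depends on its own HTTP timeout, which is separate from any queue-side setting.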
