kdcd
RunPod
•Created by kdcd on 2/15/2024 in #⚡|serverless
Directing requests from the same user to the same worker
Guys, thank you for your work. We are enjoying your platform.
I have the following workflow. On the first request from a user, the worker does some heavy work for about 15-20s and caches the result; all subsequent requests are then very fast (~150ms). But if a subsequent request lands on another worker, that worker has to repeat the heavy work (15-20s) all over again. Is there any way to direct all subsequent calls from the same user to the same worker?
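A minimal sketch of the caching pattern being described, assuming the runpod Python SDK and a hypothetical user_id input field (the names and the 15s placeholder are illustrative, not from the thread); the cache is a module-level dict, which is exactly why it only helps when the same worker handles the follow-up requests:

    import time
    import runpod

    # Module-level cache: survives across requests on the same worker,
    # but is empty on any other worker, hence the repeated 15-20s warm-up.
    _user_cache = {}

    def _build_heavy_state(user_id):
        # Stand-in for the expensive 15-20s preparation step.
        time.sleep(15)
        return {"user_id": user_id, "prepared_at": time.time()}

    def handler(job):
        user_id = job["input"]["user_id"]  # hypothetical input field
        if user_id not in _user_cache:
            # Cold path: only taken the first time this worker sees this user.
            _user_cache[user_id] = _build_heavy_state(user_id)
        state = _user_cache[user_id]
        # Warm path: ~150ms once the per-user state is cached on this worker.
        return {"prepared_at": state["prepared_at"]}

    runpod.serverless.start({"handler": handler})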
46 replies
RunPod
•Created by kdcd on 1/31/2024 in #⚡|serverless
Pause on the yield in async handler
I have written an async handler. The messages are really small, only a few kilobytes:
async for msg in search.run_search_generator(request):
    start_time = time.perf_counter()
    yield msg
    print("elapsed_time", (time.perf_counter() - start_time) * 1000)
I measured how long each yield in the job takes, and it's about 160 ms. That's quite a lot for my use case and roughly doubles the total job execution time. What are my options?
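For reference, a minimal sketch of how such a generator handler is typically registered, assuming the runpod Python SDK; the search module and run_search_generator come from the post and are treated as given, and the return_aggregate_stream flag is an assumption about the desired setup:

    import time
    import runpod
    import search  # module from the post, assumed to be available on the worker

    async def handler(job):
        request = job["input"]
        async for msg in search.run_search_generator(request):
            start_time = time.perf_counter()
            yield msg  # each yielded message becomes a stream update for the job
            # Time spent suspended at the yield before the runtime resumes the generator.
            print("elapsed_time", (time.perf_counter() - start_time) * 1000)

    runpod.serverless.start({
        "handler": handler,
        "return_aggregate_stream": True,  # also aggregate the chunks into the /status result
    })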
18 replies
RunPod
•Created by kdcd on 1/21/2024 in #⚡|serverless
Proper way to listen to a stream
If I understood correctly, the only way to get stream updates is to poll the stream endpoint, as shown in the docs here: https://docs.runpod.io/reference/llama2-13b-chat.
for i in range(10):
    time.sleep(1)
    get_status = requests.get(status_url, headers=headers)
    print(get_status.text)
Is there any other way to get hold of the updates from an async generator handler?
It would be nice to have something like server-sent events, https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events, as is done in the OpenAI API: https://platform.openai.com/docs/api-reference/streaming
Or maybe WebSockets would do too.
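A minimal sketch of the polling approach the docs describe, assuming an endpoint ID and job ID are already in hand; the /stream URL pattern and the status/stream fields follow the standard RunPod serverless REST API, but treat the exact field names as an assumption:

    import time
    import requests

    API_KEY = "..."       # your RunPod API key
    ENDPOINT_ID = "..."   # your serverless endpoint ID
    JOB_ID = "..."        # returned by the initial /run request

    headers = {"Authorization": f"Bearer {API_KEY}"}
    stream_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/stream/{JOB_ID}"

    # Poll the stream endpoint until the job finishes; each response contains
    # the chunks yielded by the generator handler since the previous poll.
    while True:
        resp = requests.get(stream_url, headers=headers).json()
        for chunk in resp.get("stream", []):
            print(chunk.get("output"))
        if resp.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)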
5 replies