Created by kdcd on 2/15/2024 in #⚡|serverless
Directing requests from the same user to the same worker
Guys, thank you for your work. We are enjoying your platform. I have the following workflow: on the first request from a user, the worker does some heavy work (about 15-20s) and caches the result, and all subsequent requests are very fast (~150ms). But if one of the subsequent requests goes to another worker, it has to repeat that heavy work again (15-20s). Is there any way to direct all subsequent calls from the same user to the same worker?
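For reference, a minimal sketch of that workflow on the worker side, using the RunPod Python SDK handler pattern; build_heavy_state and the input shape are made up here to stand in for the real 15-20s step. The cache lives in the worker process, which is exactly why a request that lands on a different worker has to rebuild it:

    import time
    import runpod

    _user_cache = {}  # user_id -> heavy state, lives as long as this worker does

    def build_heavy_state(user_id):
        # Stand-in for the 15-20s step (model load, index build, etc.).
        time.sleep(15)
        return {"user": user_id, "ready": True}

    def handler(job):
        user_id = job["input"]["user_id"]
        if user_id not in _user_cache:  # first request from this user on this worker
            _user_cache[user_id] = build_heavy_state(user_id)
        state = _user_cache[user_id]
        # Requests that land on this same worker reuse `state` and stay fast (~150ms).
        return {"user": state["user"], "result": "fast path"}

    runpod.serverless.start({"handler": handler})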
46 replies
Created by kdcd on 1/31/2024 in #⚡|serverless
Pause on the yield in async handler
I have written an async handler. The messages are really small, a few kilobytes:

    async for msg in search.run_search_generator(request):
        start_time = time.perf_counter()
        yield msg
        print("elapsed_time", (time.perf_counter() - start_time) * 1000)

I measured how much time every yield from the job takes, and it's about 160 ms. That's quite a lot for my use case and doubles the total job execution time. What are my options?
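For a self-contained picture of the measurement, here is a rough local harness; inner_generator stands in for search.run_search_generator, which isn't shown above. The timed span is how long the consumer holds control after each yield, so running the same handler shape outside the worker helps separate the generator's own cost from the per-chunk overhead of the serverless runtime:

    import asyncio
    import time

    async def inner_generator(request):
        # Stand-in for search.run_search_generator: small messages, produced quickly.
        for i in range(5):
            yield {"chunk": i}

    async def handler(job):
        # Same shape as the handler above: time how long each `yield` suspends us,
        # i.e. how long the consumer holds control before asking for the next message.
        async for msg in inner_generator(job["input"]):
            start = time.perf_counter()
            yield msg
            print("yield suspended for", (time.perf_counter() - start) * 1000, "ms")

    async def consume_locally():
        # Consuming the handler directly: each yield resumes almost immediately,
        # so a large number seen inside the worker points at the runtime's
        # handling of each streamed chunk rather than at the generator itself.
        async for _ in handler({"input": {"q": "test"}}):
            pass

    asyncio.run(consume_locally())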
18 replies
Created by kdcd on 1/21/2024 in #⚡|serverless
Proper way to listen stream
If I understood correctly, the only way to get stream updates is to make requests to the stream endpoint, as shown in the docs here: https://docs.runpod.io/reference/llama2-13b-chat

    for i in range(10):
        time.sleep(1)
        get_status = requests.get(status_url, headers=headers)
        print(get_status.text)

Is there any other way to get hold of the updates from an async generator handler? It would be nice to have something like server-sent events (https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events), like it's done in the OpenAI API (https://platform.openai.com/docs/api-reference/streaming). Or maybe websockets would do too.
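For context, a rough polling sketch along the lines of the snippet above, assuming the /run and /stream endpoint shapes from the linked docs; ENDPOINT_ID, API_KEY, and the input payload are placeholders. It keeps polling the stream endpoint until the job finishes instead of a fixed range(10), but it is still polling rather than server push:

    import time
    import requests

    ENDPOINT_ID = "your-endpoint-id"   # placeholder
    API_KEY = "your-api-key"           # placeholder
    headers = {"Authorization": f"Bearer {API_KEY}"}

    # Submit the job asynchronously and remember its id.
    run = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
        headers=headers,
        json={"input": {"prompt": "hello"}},
    )
    job_id = run.json()["id"]

    stream_url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/stream/{job_id}"
    while True:
        resp = requests.get(stream_url, headers=headers).json()
        for chunk in resp.get("stream", []):   # partial outputs yielded so far
            print(chunk)
        if resp.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)                          # still polling, not server push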
5 replies