Directing requests from the same user to the same worker
Guys, thank you for your work. We are enjoying your platform.
I have the following workflow. On the first request from a user, the worker does some heavy work for about 15-20s, caches the result, and all subsequent requests are very fast (~150ms). But if one of the subsequent requests goes to another worker, that worker has to repeat the heavy work (15-20s). Is there any way to direct all subsequent calls from the same user to the same worker?
You only really benefit from FlashBoot if you have a constant flow of requests. Otherwise you can either set an Active worker or increase the idle timeout.
@kdcd you can use request count scaling and do something like: for the first 100 requests you only need 1 worker, etc.
It seems I introduced a bit of confusion with my explanation of the workflow, so I'll expand on it. My model works on rendered construction-drawing PDFs. When a user makes a request, the PDF is downloaded from S3 and rendered to a high-quality image, which can take ~5s-30s depending on the PDF. Each user has their own PDF. On a subsequent request, if it arrives at the same worker, the hard work (downloading, rendering) is already done and only the model evaluation runs, which is fast (~150 ms). But if the request arrives at another worker, it has to download and render everything again. If we scale our workers to 10-20, which is what we are planning to do, it quite ruins the experience for the user, because every PDF will get 10-20 very slow requests.
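For context, a minimal sketch of the per-worker caching pattern being described here, assuming a Python handler; `download_pdf_from_s3`, `render_pages`, and `run_model` are hypothetical stand-ins, not real functions from this workflow:

```python
# Minimal sketch of the per-worker cache described above (illustrative names only).
# The cache lives in this worker's process memory, which is why a request that lands
# on a different worker has to redo the download + render.
_render_cache = {}  # pdf_key -> pre-rendered pages, kept only in this worker's RAM

def handle_request(pdf_key, query):
    if pdf_key not in _render_cache:
        # cold path: ~5-30s to download the PDF from S3 and render high-quality images
        pdf_bytes = download_pdf_from_s3(pdf_key)          # hypothetical helper
        _render_cache[pdf_key] = render_pages(pdf_bytes)   # hypothetical helper
    # warm path: ~150 ms, only the model evaluation runs
    return run_model(_render_cache[pdf_key], query)        # hypothetical helper
```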
You sound like you want a caching mechanism; your best bet is network storage
All serverless workers can attach to a network volume, which lets you persist data between workers / runs
And then that way all workers have the same backing storage
https://docs.runpod.io/serverless/references/endpoint-configurations#select-network-volume
Essentially your workflow should then look like this (rough sketch after the list):
1) Worker gets a job
2) Check network storage for client id > if exists pull existing resources > if not create a new folder
3) Continue with the job from whatever xyz point.
4) Write results if needed to network storage for other workers
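A rough sketch of those steps, assuming a Python handler and that the network volume is mounted at /runpod-volume (the usual serverless mount point; check your endpoint config). `download_and_render` and `run_model` are hypothetical stand-ins for the heavy and fast parts:

```python
import os
import runpod

CACHE_ROOT = "/runpod-volume/render-cache"  # network volume path, assumed mount point

def handler(job):
    inp = job["input"]
    client_dir = os.path.join(CACHE_ROOT, inp["client_id"])

    # 2) check the network volume for this client; create + populate it if missing
    if not os.path.isdir(client_dir):
        os.makedirs(client_dir, exist_ok=True)
        download_and_render(inp["pdf_key"], client_dir)  # hypothetical: S3 download + render

    # 3) continue the job from the cached, pre-rendered resources
    result = run_model(client_dir, inp["query"])         # hypothetical fast model call

    # 4) anything written under client_dir is visible to every other worker on the volume
    return result

runpod.serverless.start({"handler": handler})
```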
yep, that's nice, thanks a lot. The only thing is it will limit workers to one datacenter
I think that's just the cost you need to eat, or you can write to Firebase file storage, which is what I do, and download it from there
B/c of what you said, I actually prefer to use my own storage mechanism, especially since it sounds like your final files and initial resources aren't insanely big
https://github.com/justinwlin/FirebaseStorageWrapperPython
(my personal wrapper lol)
🙂 Much appreciated. But would Firebase be faster than just uploading files to S3?
Never heard about it
Firebase is backed by Google Cloud Storage buckets / it's run by Google as a Google service
it's just an easier wrapper around Google Cloud Storage buckets, so I like Firebase
S3 is also fine
I just hate AWS
xD
Honestly, I'd avoid AWS / Google buckets if I could 😆, but there are no better file storage / object storage providers out there
🙂 Who loves them ?
But yeah, I also just think it's easier for me to have an easy wrapper around Google Firebase file storage + they've got a nice UI + I get a ton of file storage for free before I need to pay for it
So it's great for me for developing, because I don't need to keep paying AWS ingress/egress costs
nice, nice
We just already have a lot of infra around s3 😦
Haha, then go with S3
But yeah, not too bad. Since I work with really long audio / video on RunPod: if your files can be optimized before sending / downloading (compressing, converting file formats, stripping unnecessary data, etc.), it can also help get things moving faster. But honestly your files sound small enough that it might not be necessary
idk how big ur files are tho
It depends. A lot of the PDFs are quite small, ~30 MB, but render time can still be quite long. Some of them are about 500 MB.
I see. For the bigger ones, what I do with S3: they support range downloads,
so you can upload / download files in parallel.
So for large files, that is probably what you want to look into; that is what I did for my larger files
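There is existing tooling for this; for example, boto3's transfer manager already does ranged, multi-threaded transfers. A minimal sketch (bucket and key names are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Split anything over 8 MB into 16 MB parts and move them with 10 threads in parallel.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,
    multipart_chunksize=16 * 1024 * 1024,
    max_concurrency=10,
    use_threads=True,
)

# Parallel ranged download of a large PDF
s3.download_file("my-bucket", "drawings/big-plan.pdf", "/tmp/big-plan.pdf", Config=config)

# Parallel multipart upload works the same way
s3.upload_file("/tmp/rendered.png", "my-bucket", "renders/big-plan.png", Config=config)
```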
Thanks for the help
Yup no worries, and if you REALLY want to xD:
https://discord.com/channels/912829806415085598/1200525738449846342
You can optimize even further with a concurrent worker hahaha. idk how much GPU you're eating up tho
But in my mind a PDF renderer might not be eating up GPU resources all the way - I could be wrong
but it's something I've been playing with
lol, but my video / audio transcriber does eat up a lot of resources, so I could only get maybe 2 concurrent things going at once
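For reference, a very rough sketch of that concurrent-worker idea, assuming the async handler + concurrency_modifier pattern from the RunPod serverless SDK; `process` and the limit of 2 are illustrative guesses:

```python
import runpod

MAX_CONCURRENCY = 2  # illustrative: how many jobs one worker handles at once

async def handler(job):
    # heavy path on a cache miss, fast path (~150 ms) once resources are cached
    return await process(job["input"])  # hypothetical async processing function

def adjust_concurrency(current_concurrency):
    # keep it simple: always allow up to MAX_CONCURRENCY jobs in flight on this worker
    return MAX_CONCURRENCY

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": adjust_concurrency,
})
```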
Anyways gl 🙂
Good luck for you too 🙂
Solution
Just a summary so I can mark this solution:
1) You can use network storage to persist data between runs
2) Use an outside file storage / object storage provider
3) If using Google Cloud / an S3 bucket, for large files you can use parallel downloads / uploads; there should be existing tooling out there, or you can obviously make your own