Automatic A1111 WebUI Serverless on Network Volume

As posted before here on the channel, I am using the Automatic A1111 WebUI Serverless on Network Volume by ashleyk. The issue is it's unusable, taking between 1 and 3 minutes to generate a single image. I spent months integrating this API into our app, and now I need to launch to production. Moving to another API is not an option now because of all the dev time spent. I am considering moving the same API to a Docker image. How can I deploy the Automatic A1111 WebUI (Serverless Network Volume by ashleyk) into a Docker image? What other solutions does RunPod recommend?
7 Replies
justin
justin•10mo ago
Isn't everything in RunPod already in a Docker image? Just confused. But also, what settings does your template have and which GPUs do you have it set to? Do you have flash boot on? Also, have you stress tested it with 5 requests back to back? What does that look like? Can you share those speeds?

If you have, let's say, 30 requests incoming to RunPod, your idle time is set to 60 seconds, and you have active workers, then I imagine you could also optimize it, because your workers would stay active and wouldn't need to spin up and down for more image generation.

Also, how are you loading your code? Can you share it? If you used ashleyk's template too, I'm also curious what dev time was spent? Did you make further modifications to it?

If you get a really beefy GPU and your issue is that you are now leaving GPU power on the table, you can look into modifying the handler to take concurrent requests: https://discord.com/channels/912829806415085598/1200525738449846342 https://phoenixnap.com/kb/how-to-commit-changes-to-docker-image

You could also look into moving away from network volumes in ashleyk's template and making the modification so that the models are stored on the image itself rather than on the network volume.

But tldr: I'm not sure, if you spent months integrating, why this is a surprise now. If you can make a request in the RunPod web UI and show the results, that would be good validation, plus just share more information about your testing process.
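The handler-concurrency idea above can be sketched as plain Python. Everything here is illustrative: the thresholds are made up, and the `request_rate` parameter is a hypothetical input (in the RunPod Python SDK, a scaling function is passed as the `concurrency_modifier` option to `runpod.serverless.start` and is typically written to take only the current concurrency, closing over any rate tracking you do yourself):

```python
# Sketch of a concurrency-scaling policy for a serverless worker.
# All thresholds are hypothetical; tune against your GPU's VRAM and
# what A1111 can actually batch. In a real RunPod worker this logic
# would live in a function passed as the "concurrency_modifier" option
# to runpod.serverless.start, e.g.:
#   runpod.serverless.start({"handler": handler,
#                            "concurrency_modifier": adjust_concurrency})

MAX_CONCURRENCY = 4   # assumption: GPU can hold ~4 A1111 jobs at once
MIN_CONCURRENCY = 1

def adjust_concurrency(current_concurrency: int, request_rate: float) -> int:
    """Scale concurrency up while request volume is high, down when it is low.

    request_rate is a hypothetical measure (e.g. jobs/minute seen recently);
    you would track this yourself inside the worker.
    """
    if request_rate > 20 and current_concurrency < MAX_CONCURRENCY:
        return current_concurrency + 1   # demand is high: take one more job
    if request_rate <= 5 and current_concurrency > MIN_CONCURRENCY:
        return current_concurrency - 1   # demand dropped: shed concurrency
    return current_concurrency           # steady state: no change
```

The point of a hook like this is that one beefy worker can serve several queued requests at once instead of leaving GPU power on the table, which is what justin is suggesting above.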
ashleyk
ashleyk•10mo ago
Works fine for me on 24GB PRO (4090). I had 122 jobs today and the max execution time was 12 seconds. And I use network volumes too; works perfectly fine. And yeah, you should really test things before committing to months of development time, rather than complaining about the dev time spent only to find it doesn't meet your expectations. You can try setting active workers, but basically if you have a constant flow of requests, it's a non-issue. Cold start times are a huge problem with A1111 if you don't have a constant flow of requests. I have at least 3 or 4 clients who are using this in production without any issues.
justin
justin•10mo ago
You can set minimum workers to 1 and keep one active, and it's around the range of 24GB PRO it seems from ashleyk's experience, so hopefully it should be solid and you won't have cold starts. Tldr, your best shot: a minimum active worker plus a beefier GPU is probably going to be your best bet for now, until you get something more lightweight, like maybe looking into the diffusers library / repo.
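The trade-off behind the "minimum active worker" advice is simply paying for idle GPU time in exchange for zero cold starts. A back-of-envelope sketch (the price used here is a made-up placeholder; check current RunPod pricing for your GPU class):

```python
# Back-of-envelope: what does keeping one worker always warm cost?
# The hourly rate below is hypothetical; look up real RunPod pricing.

def monthly_active_worker_cost(price_per_hour: float, days: int = 30) -> float:
    """Cost of one worker kept warm 24/7 for the given number of days."""
    return price_per_hour * 24 * days

# e.g. a 24GB-class GPU at a hypothetical $0.44/hr:
cost = monthly_active_worker_cost(0.44)  # 0.44 * 24 * 30 = 316.8
```

If that fixed cost is lower than the revenue lost to 1-3 minute cold-start generations, the active worker pays for itself; otherwise relying on a constant request flow, as ashleyk describes, is the cheaper path.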
ashleyk
ashleyk•10mo ago
I don't even use any active workers; I just usually have a constant flow of requests, so it's a non-issue.
justin
justin•10mo ago
Edit: You could use any other API out there, like Replicate, DALL-E, etc., as a different backing for your image generation, and then set up something more proper later, lol. I'm 100% sure there is one if you are under time constraints. Again, I'm surprised, if months of development did happen, that this is such a surprise. Edit: I saw that you actually have been around for a while, but yeah, I'm surprised it wasn't an issue back then but is now.
ashleyk
ashleyk•10mo ago
I can't react to messages for some reason 🤔 weird now its working
briefPeach
briefPeach•8mo ago
Hi, I wonder, have you used serverless + network volume at scale? I'm trying to serve a ComfyUI serverless endpoint. If, say, 5 serverless workers are running in parallel using the same network volume, will it have slow file-reading I/O speed, because different workers are trying to read models from the same file system? Thanks for your amazing serverless + network volume template! It's really easy to get started.
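One common way to soften the shared-volume I/O concern raised here is to copy a model from the network volume onto the worker's local disk once at startup, so repeated reads hit local storage instead of the shared mount. A minimal sketch, assuming hypothetical paths (the function name and directory layout are illustrative, not part of ashleyk's template):

```python
# Sketch: cache a model file from the shared network volume onto the
# worker's local disk, so only the first read pays the shared-I/O cost.
# Paths are hypothetical; adapt to your volume mount and model layout.
import os
import shutil

def cache_model_locally(volume_path: str, local_dir: str) -> str:
    """Copy a model file to local storage once; return the local path.

    Subsequent calls are no-ops, so workers reading the model many
    times never contend on the network volume after the first copy.
    """
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, os.path.basename(volume_path))
    if not os.path.exists(local_path):
        shutil.copy2(volume_path, local_path)  # one slow read from the volume
    return local_path
```

The cost is container disk space and a slower first request per worker; the benefit is that five parallel workers only touch the shared file system once each rather than on every model load.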