IN-QUEUE Indefinitely
I am attempting to deploy a model from HF Spaces on RunPod Serverless, using the ByteDance/SDXL-Lightning Docker image. I started by selecting 'Run with Docker' for the ByteDance/SDXL-Lightning Space on HF and copied the Docker image tag: registry.hf.space/bytedance-sdxl-lightning:latest.
Next, in RunPod, I set up a serverless template by entering the Docker image tag into the 'Container Image' field and inputting 'bash -c "python app.py"' as the container start command. I allocated 50 GB of disk space to the container and finalized the template. Subsequently, I used this template to create an API endpoint in the 'Serverless' section. However, whenever I try to run the model, my requests remain indefinitely in the 'in-queue' state. Could you help identify what I might be doing wrong?
Do you have access to any of the Worker logs to see what's going on?
can't open worker logs
What happens when you click this, then Logs?
When I click on it, it tries launching logs for worker but then fails to do so. Basically shows nothing
@singhtanmay345 are you able to see any workers running? Should look like that green box in the top left of Patrick's screenshot above. You may be getting unlucky and trying to send requests when all of our GPUs are already in use
No, that isn't the issue
Is my guess
Because his workers look idle
I think the bigger issue is the expectation that you can just run an HF model on RunPod directly
It doesn't sound like you have a proper handler.py file set up
And a Hugging Face Docker container probably doesn't have the runpod package installed
You can read this for a basic understanding of how to write a Python file for RunPod serverless
RunPod Blog
Serverless | Create a Custom Basic API
RunPod's Serverless platform allows for the creation of API endpoints that automatically scale to meet demand. The tutorial guides you through creating a basic worker and turning it into an API endpoint on the RunPod serverless platform. For this tutorial, we will create an API endpoint that helps us accomplish
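To give a rough idea of what that article covers, a minimal RunPod handler looks something like this (a sketch, not the article's exact code; the "prompt" field is just an illustrative input name):

```python
import runpod  # the RunPod SDK the image needs installed (pip install runpod)

def handler(job):
    # Serverless requests arrive as a dict; the request payload sits under "input".
    job_input = job["input"]
    prompt = job_input.get("prompt", "a photo of a cat")
    # ... run your model here and return something JSON-serializable ...
    return {"echoed_prompt": prompt}

# Start the worker loop that pulls jobs from the endpoint's queue.
runpod.serverless.start({"handler": handler})
```

Without something like this running in the container, nothing ever picks the jobs up, which is why they sit in the queue.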
@Merrell is it possible to update this blog to say the image must be built for amd64? It's actually a huge issue people run into
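(For anyone who hits that: on an arm64 machine you would typically build the image explicitly for amd64, something like the command below; the image name is a placeholder.)

```bash
# Build for linux/amd64 even on an Apple Silicon / arm64 host, then push.
docker buildx build --platform linux/amd64 -t <your-user>/sdxl-lightning-worker:dev --push .
```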
Facing the same issue
What issue? Are all your workers throttled? What do the worker logs say etc? You need to provide more detail.
You say the same issue, but the issue I pointed out was that you can't just point to any random Docker image lol. You've got to prepare it to work the way RunPod expects, which is why I linked the article
Not sure if you're saying you made the same mistake? Then read the article I linked
@singhtanmay345 were you able to run the model successfully? I also want to do the same; apparently you need to set up a worker on RunPod somehow, which I'm not sure how to do.
I reread this thread, and yes Justin is correct.
You can't just grab any random docker image, it needs to pass through the RunPod Handler to send in the correct requests.
@inventionsbyhamid We don't have SDXL Lightning as a dedicated worker, but we do have a tutorial on running SDXL Turbo here: https://docs.runpod.io/tutorials/serverless/gpu/generate-sdxl-turbo#deploy-a-serverless-endpoint
You don't need to build your Docker image for this, as we have prebuilt templates for SDXL turbo.
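Once an endpoint is live, you can sanity-check it from Python. This is a sketch against RunPod's /runsync endpoint, with the endpoint ID, API key, and input fields as placeholders (the exact input shape depends on the worker you deploy):

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # from the Serverless endpoint page
API_KEY = "your-runpod-api-key"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "a cinematic photo of a lighthouse at dusk"}},
    timeout=120,
)
print(resp.json())  # status stays IN_QUEUE if no worker ever picks the job up
```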
Managed to do it with help from a friend. I wanted to deploy SDXL Lightning (ByteDance); it's the most-run image-gen model on Replicate. You guys should add it.
Do you have a repo I can check out? I would be interested in seeing what we can add to help support this.
Let me ask internally if we can open source it
Maybe this is because of improper handler code: no worker took the job, so it stays in the queue...
I think it's easy enough to write your own serverless code if it's only for SDXL Lightning. Try it out.
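If anyone wants a starting point, here is a rough, untested sketch that wires the SDXL-Lightning loading code from the ByteDance model card into a RunPod handler (model and checkpoint names follow the model card; the base64 response format is just one reasonable choice):

```python
import base64, io

import runpod
import torch
from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

BASE = "stabilityai/stable-diffusion-xl-base-1.0"
REPO = "ByteDance/SDXL-Lightning"
CKPT = "sdxl_lightning_4step_unet.safetensors"  # 4-step UNet checkpoint

# Load the pipeline once at container start so every job reuses it.
unet = UNet2DConditionModel.from_config(BASE, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(REPO, CKPT), device="cuda"))
pipe = StableDiffusionXLPipeline.from_pretrained(
    BASE, unet=unet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
# SDXL-Lightning expects trailing timestep spacing.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")


def handler(job):
    prompt = job["input"].get("prompt", "a photo of a cat")
    # Lightning checkpoints run with very few steps and no CFG.
    image = pipe(prompt, num_inference_steps=4, guidance_scale=0).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return {"image_base64": base64.b64encode(buf.getvalue()).decode()}


runpod.serverless.start({"handler": handler})
```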