RunPod9mo ago
fanbing

Serverless endpoint: 1 job always stuck in queue, even with 3 workers running

After 600 s there is still 1 job in the queue, and the logs show nothing. How do I see what is running? This morning when I was using a GPU pod, I was told that ip_adapter was not found, but now I can't see any output at all. My local project does have ip_adapter.
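To see what a queued job is doing, you can poll the serverless status route with the job ID. A minimal sketch of building that URL; the endpoint ID and API key in the usage comment are placeholders you must supply yourself:

```python
API_BASE = "https://api.runpod.ai/v2"

def status_url(endpoint_id: str, job_id: str) -> str:
    """Build the RunPod serverless status URL for a submitted job."""
    return f"{API_BASE}/{endpoint_id}/status/{job_id}"

# Example usage (requires the `requests` package and real credentials):
#   import os, requests
#   url = status_url(os.environ["RUNPOD_ENDPOINT_ID"],
#                    "e7ea07cf-7c78-4bab-bc59-0c22e3e26cc9-e1")
#   r = requests.get(url, headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"})
#   print(r.json())  # "status" stays IN_QUEUE until a worker picks the job up
```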
18 Replies
fanbing
fanbingOP9mo ago
"id": "e7ea07cf-7c78-4bab-bc59-0c22e3e26cc9-e1",
ashleyk
ashleyk9mo ago
Click on one of your workers to see what they are doing. It's probably a problem with your code. You shouldn't move to serverless if it's not even working in GPU cloud; it is much easier to debug things in GPU cloud than in serverless.
fanbing
fanbingOP9mo ago
2024-03-10T08:54:43.082245784Z Traceback (most recent call last):
2024-03-10T08:54:43.082314323Z   File "/src/handler.py", line 16, in <module>
2024-03-10T08:54:43.082319933Z     from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps
2024-03-10T08:54:43.082325027Z   File "/src/pipeline_stable_diffusion_xl_instantid.py", line 42, in <module>
2024-03-10T08:54:43.082329870Z     from ip_adapter.resampler import Resampler
2024-03-10T08:54:43.082333984Z ModuleNotFoundError: No module named 'ip_adapter'
ashleyk
ashleyk9mo ago
Yeah, you have to fix that. I would scale workers down to zero; your code is broken, and serverless can't fix it for you.
fanbing
fanbingOP9mo ago
But in my local project, ip_adapter is in checkpoints. I tested it and it generates an image OK. I will check my code again.
ashleyk
ashleyk9mo ago
It looks like the ip_adapter module is not installed.
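One way to confirm whether a module is actually importable inside the container, and to pull in a vendored copy when it only exists as a directory in the image, is a check like this sketch (the vendor path in the usage comment is hypothetical; adjust it to wherever ip_adapter lives in your image):

```python
import importlib.util
import sys
from pathlib import Path

def ensure_module(name: str, vendor_dir: Path) -> bool:
    """Return True if `name` is importable, adding vendor_dir to sys.path
    when the module only exists as a package directory inside it."""
    if importlib.util.find_spec(name) is not None:
        return True
    if (vendor_dir / name).is_dir():
        sys.path.insert(0, str(vendor_dir))
        return importlib.util.find_spec(name) is not None
    return False

# e.g. ensure_module("ip_adapter", Path("/src"))  # "/src" is a hypothetical vendor dir
```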
ashleyk
ashleyk9mo ago
It looks like you're using InstantID, I have this code that works: https://github.com/ashleykleynhans/runpod-worker-instantid
GitHub
GitHub - ashleykleynhans/runpod-worker-instantid: InstantID : Zero-...
InstantID : Zero-shot Identity-Preserving Generation in Seconds | RunPod Serverless Worker - ashleykleynhans/runpod-worker-instantid
fanbing
fanbingOP9mo ago
I have downloaded ip_adapter in my InstantID project and use Docker to build the image. Thanks, I will look at the InstantID link.
fanbing
fanbingOP9mo ago
Thank you very much, I used your worker-instantid to generate an image successfully. But I found a little problem: the model in the request in api/generate.md is wrong. "wangqixun/YamerMIX_v8" is correct, not "lwangqixun/YamerMIX_v8".
ashleyk
ashleyk9mo ago
Thanks, the typo has been fixed.
fanbing
fanbingOP9mo ago
I am studying your runpod-worker-instantid, and InstantID can now use multi-ControlNet models. Should I use download_checkpoints.py to download those models, just like get_instantid_pipeline('wangqixun/YamerMIX_v8'), or is it OK to do it in rp_handler.py? Which is better?
ashleyk
ashleyk9mo ago
Better to download the models into the image, otherwise they will probably get downloaded inside your worker at request time.
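The reason this matters: each serverless worker is a long-lived process, so anything not already present gets fetched on the worker's first request (a cold start). A common pattern, similar in spirit to the repo's get_instantid_pipeline, is to cache the loaded pipeline at module level so later requests reuse it. A minimal sketch with a generic loader callback (the loader is an assumption standing in for the real model-loading code):

```python
from typing import Any, Callable, Dict

# Module-level cache: survives across requests within one worker process.
_PIPELINES: Dict[str, Any] = {}

def get_pipeline(model_name: str, loader: Callable[[str], Any]) -> Any:
    """Load a pipeline once per worker; subsequent calls return the cached object."""
    if model_name not in _PIPELINES:
        _PIPELINES[model_name] = loader(model_name)
    return _PIPELINES[model_name]
```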
fanbing
fanbingOP9mo ago
Why can 'wangqixun/YamerMIX_v8' be downloaded to a network volume with download_checkpoints.py, but multi-ControlNet models like the pose and depth models should go into the image? What is the difference? Or am I misunderstanding something? I'm new to RunPod, so please forgive my basic questions. When using serverless endpoints, which is more efficient for the multi-ControlNet models, which are about 25 GB: baking them into the Docker image, or putting them on a network volume?
billchen
billchen8mo ago
@ashleyk RunPod can't run docker commands and can't install Docker.
flash-singh
flash-singh8mo ago
The container image has better throughput than network storage, since it's on local NVMe disk.
billchen
billchen8mo ago
"output":
Traceback (most recent call last):
  File "/workspace/runpod-worker-instantid/src/rp_handler.py", line 313, in handler
    images = generate_image(
  File "/workspace/runpod-worker-instantid/src/rp_handler.py", line 282, in generate_image
    PIPELINE = get_instantid_pipeline(model)
  File "/workspace/runpod-worker-instantid/src/rp_handler.py", line 129, in get_instantid_pipeline
    pipe.load_ip_adapter_instantid(face_adapter)
  File "/workspace/runpod-worker-instantid/src/pipeline_stable_diffusion_xl_instantid.py", line 159, in load_ip_adapter_instantid
    self.set_image_proj_model(model_ckpt, image_emb_dim, num_tokens)
  File "/workspace/runpod-worker-instantid/src/pipeline_stable_diffusion_xl_instantid.py", line 181, in set_image_proj_model
    self.image_proj_model.load_state_dict(state_dict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Resampler:
    size mismatch for proj_out.weight: copying a param with shape torch.Size([2048, 1280]) from checkpoint, the shape in current model is torch.Size([768, 1280]).
    size mismatch for proj_out.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for norm_out.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for norm_out.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]).
billchen
billchen8mo ago
Error(s) in loading state_dict for Resampler:
    size mismatch for proj_out.weight: copying a param with shape torch.Size([2048, 1280]) from checkpoint, the shape in current model is torch.Size([768, 1280]).
    size mismatch for proj_out.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for norm_out.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for norm_out.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]).
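A size mismatch like this (2048 vs 768 on proj_out/norm_out) usually means the face-adapter checkpoint was produced for a Resampler of a different dimension than the one the pipeline constructed, i.e. a mismatched checkpoint variant is being loaded. A torch-free sketch of the comparison that makes load_state_dict raise, using plain name-to-shape dicts standing in for state_dicts:

```python
def find_shape_mismatches(model_shapes: dict, checkpoint_shapes: dict) -> list:
    """List (param, model_shape, checkpoint_shape) triples where shapes differ;
    this is the condition behind torch's size-mismatch RuntimeError."""
    return [
        (name, model_shapes[name], shape)
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != shape
    ]

# Mirrors the error above: the model expects 768 where the checkpoint carries 2048.
# find_shape_mismatches({"proj_out.weight": (768, 1280)},
#                       {"proj_out.weight": (2048, 1280)})
```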