Hello
RunPod
•Created by tolga on 9/9/2024 in #⚡|serverless
Getting slow workers randomly
I too seem to get this issue on 4090s. Is anything being done to narrow down the issue?
7 replies
RunPod
•Created by Hello on 9/16/2024 in #⚡|serverless
worker exited with exit code 137
I am logging memory usage at the start of each request and just before the return, and it seems like memory isn't getting cleared after requests are finished. Is this expected?
Scenario: 2 requests processed by a single worker
Request 1:
- [start] Memory usage: 10203.82 MB
- [end] Memory usage: 27672.66 MB
Request 2:
- [start] Memory usage: 27672.66 MB
- [end] Memory usage: 41805.31 MB
4 replies
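For context, exit code 137 is 128 + 9, meaning the process received SIGKILL, which is commonly what happens when a container exceeds its memory limit. The kind of logging described above could be sketched as a decorator around the handler. This is only an illustration: `rss_mb`, `log_memory`, and the handler body are hypothetical names, the RSS reading assumes a Linux worker with `/proc` available, and the `gc.collect()` call is one thing to try, not a confirmed fix for the growth reported here.

```python
import functools
import gc
import os


def rss_mb() -> float:
    """Current resident set size in MB (Linux only: reads /proc/self/statm)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def log_memory(fn):
    """Log memory at the start and end of each request handled by fn."""
    @functools.wraps(fn)
    def wrapper(job):
        print(f"[start] Memory usage: {rss_mb():.2f} MB")
        try:
            return fn(job)
        finally:
            # Force a collection so unreachable objects from this request
            # are freed before the next request starts.
            gc.collect()
            print(f"[end] Memory usage: {rss_mb():.2f} MB")
    return wrapper


@log_memory
def handler(job):
    # Hypothetical workload standing in for the real inference code.
    scratch = [0] * 1_000_000
    return {"n": len(scratch)}
```

If the end-of-request figure keeps climbing even after a forced collection, something is likely still holding references between requests (for example a module-level list or cache), which is worth checking before assuming the platform is at fault.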
RunPod
•Created by Hello on 9/15/2024 in #⚡|serverless
Speeding up loading of model weights
Thanks for such a quick response though!
7 replies
RunPod
•Created by Hello on 9/15/2024 in #⚡|serverless
Speeding up loading of model weights
1. The weights are built into the image
2. It is defined in the global scope, outside of the main handler function that runpod.serverless.start calls
Since I require multiple models, I'm not sure what other optimizations / good practices there are, so I'm asking here haha.
7 replies
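The setup described in this post (weights loaded once in the global scope, several models needed) might look roughly like the sketch below. The model names, paths, and `load_model` body are placeholders, not taken from the thread, and loading concurrently is only an idea to try: whether it helps depends on whether loading is bound by disk I/O or by something else.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model names and paths; substitute the ones baked into the image.
MODEL_PATHS = {
    "model_a": "/models/a.safetensors",
    "model_b": "/models/b.safetensors",
}


def load_model(path: str) -> dict:
    """Placeholder loader; a real one would read the weights from `path`."""
    time.sleep(0.1)  # stands in for disk I/O
    return {"path": path}


def load_all(paths: dict) -> dict:
    """Load every model concurrently and return them keyed by name."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        futures = {name: pool.submit(load_model, p) for name, p in paths.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Global scope: runs once when the worker starts, not once per request.
MODELS = load_all(MODEL_PATHS)


def handler(job):
    """Look up an already-loaded model instead of loading it per request."""
    name = job["input"]["model"]
    return {"loaded_from": MODELS[name]["path"]}

# In a real worker this would be followed by:
# runpod.serverless.start({"handler": handler})
```

Keeping the load at module level means a warm worker pays the cost only once; the per-request path is just a dictionary lookup.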
RunPod
•Created by Hello on 9/4/2024 in #⚡|serverless
Stuck on "loading container image from cache"
yup, I disabled EU workers and all my workers are pulling as expected. Thanks a lot guys!
24 replies
RunPod
•Created by Hello on 9/4/2024 in #⚡|serverless
Stuck on "loading container image from cache"
Thanks!
24 replies
RunPod
•Created by Hello on 9/4/2024 in #⚡|serverless
Stuck on "loading container image from cache"
oh gosh, looks like you are right. The workers that are stuck on loading from cache are on EU.
24 replies
RunPod
•Created by Hello on 9/4/2024 in #⚡|serverless
Stuck on "loading container image from cache"
Thanks for such a quick response! I am using a brand new tag, which is why I had to increment the release version accordingly. Some of the workers are pulling the image as expected but some are just stuck on "loading container image from cache"... :/
24 replies
RunPod
•Created by Hello on 2/16/2024 in #⚡|serverless
Safetensor safeopen OS Error device not found
I am trying to open /runpod-volume/example.safetensor. Note that I didn't rename runpod-volume to something else since I assume runpod-volume is just a folder with or without a network disk mounted.
10 replies
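One way to narrow down an error like this before handing the path to the safetensors loader is to check what the worker actually sees at that location. The sketch below uses only the standard library; `describe_path` is a hypothetical helper name, and it does not diagnose the "device not found" error itself, it just reports whether the directory is present, whether it is a mount point, and whether the file is readable.

```python
import os


def describe_path(path: str) -> dict:
    """Report basic facts about a file and its parent directory."""
    parent = os.path.dirname(path) or "."
    is_file = os.path.isfile(path)
    return {
        "parent_exists": os.path.isdir(parent),
        # True when the parent is a mount point, e.g. an attached network disk.
        "parent_is_mount": os.path.ismount(parent),
        "file_exists": is_file,
        "readable": os.access(path, os.R_OK),
        "size_bytes": os.path.getsize(path) if is_file else None,
    }


# Path taken from the post above.
print(describe_path("/runpod-volume/example.safetensor"))
```

If `parent_is_mount` is false while a network disk is supposedly attached, the volume may not be mounted in that worker, which would be worth confirming before looking at the file itself.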
RunPod
•Created by Hello on 2/16/2024 in #⚡|serverless
Safetensor safeopen OS Error device not found
Just tried, same error haha.
10 replies
RunPod
•Created by Hello on 2/16/2024 in #⚡|serverless
Safetensor safeopen OS Error device not found
I see. To be honest, I have restarted it multiple times and even managed to print out the contents of the network-mounted disk. I guess I could try it without a network disk and update this post accordingly.
10 replies
RunPod
•Created by Hello on 2/16/2024 in #⚡|serverless
Safetensor safeopen OS Error device not found
Yes I am! I have attached it to the runpod serverless instance under the advanced settings as well.
10 replies