Samuel
RunPod
Created by Satpal on 5/8/2024 in #⚡|serverless
Serverless Error Kept Pod Active
Pretty cool technology, however it's not really usable if your whole balance gets drained overnight because a worker keeps restarting over and over again when, for example, an OOM error occurs, and that error keeps occurring since the GPU memory is cached across invocations.
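A minimal sketch of what I mean, assuming a PyTorch-based model behind the standard RunPod Python serverless entrypoint; run_inference is a hypothetical placeholder for the actual model call, not anything RunPod provides:

```python
import runpod
import torch

def run_inference(payload):
    # Placeholder for the real model call (e.g. a vLLM generate());
    # here it just simulates the failure mode being discussed.
    raise torch.cuda.OutOfMemoryError("simulated OOM for illustration")

def handler(job):
    # Catch CUDA OOM so a warm worker doesn't keep failing on the
    # GPU memory it cached during a previous invocation.
    try:
        return {"output": run_inference(job["input"])}
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # drop cached allocations before the next job
        return {"error": "CUDA out of memory; cache cleared for next request"}

runpod.serverless.start({"handler": handler})
```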
51 replies
RunPod
Created by Satpal on 5/8/2024 in #⚡|serverless
Serverless Error Kept Pod Active
The container image has been pulled and assigned to the worker once it stopped initializing. What FlashBoot does is some sort of hibernation of the container if the endpoint gets called frequently. I assume they persist the GPU memory to disk (or other storage) so that it can be loaded from there instead of having to fully load the model onto the GPU again on subsequent requests. That reduces cold start time from about 90-120 seconds for Mixtral 8x7B on an A6000 to less than two seconds for most of my requests.
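For context, a rough sketch of the usual handler layout that makes cold starts expensive; the model name and handler body are illustrative, not the actual worker-vLLM code:

```python
import runpod
from vllm import LLM, SamplingParams

# The model is loaded once per worker, when the container starts.
# On a cold start this load (weights -> GPU) is what costs 90+ seconds;
# FlashBoot presumably lets hibernated workers skip repeating it.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1")

def handler(job):
    params = SamplingParams(max_tokens=job["input"].get("max_tokens", 256))
    outputs = llm.generate([job["input"]["prompt"]], params)
    return {"text": outputs[0].outputs[0].text}

runpod.serverless.start({"handler": handler})
```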
51 replies
RunPod
Created by Satpal on 5/8/2024 in #⚡|serverless
Serverless Error Kept Pod Active
Thank you very much! Is there a place to subscribe to updates on this? I already have notifications toggled on for the GitHub issue in the worker-vLLM repo. However, any option to stay in the feedback and notification loop on this would be greatly appreciated!
51 replies
RunPod
Created by Satpal on 5/8/2024 in #⚡|serverless
Serverless Error Kept Pod Active
This is a great suggestion by @houmie; error handling for this is crucial. I am considering moving away from RunPod because of this issue. Out-of-memory errors caused by the KV cache filling up when FlashBoot is enabled are an especially big problem. I think this "delete worker and create a new one" behaviour should be the default option (it would handle OOM errors as well as some other runtime errors), with additional, preferably configurable, options for retry delay, exponential backoff on/off and max retries. For now, however, the initial delete-and-recreate would be a quick and easy win, as it would solve the biggest issues that I (and probably many others) have with RunPod serverless right now.
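A rough sketch of the retry semantics I have in mind, written as a plain Python helper rather than anything RunPod ships; all function names and defaults here are illustrative:

```python
import time
import random

def run_with_retries(submit_job, recreate_worker,
                     max_retries=3, base_delay=5.0, exponential=True):
    """Retry a failed job with optional exponential backoff, recreating the
    worker before each retry instead of reusing the one whose GPU memory
    is already exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return submit_job()
        except RuntimeError:                   # e.g. an OOM surfaced by the worker
            if attempt == max_retries:
                raise                          # give up after the configured retries
            recreate_worker()                  # "delete worker and create a new one"
            delay = base_delay * (2 ** attempt if exponential else 1)
            time.sleep(delay + random.uniform(0, 1))  # jitter to avoid retry bursts
```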
51 replies
RunPod
Created by Samuel on 3/12/2024 in #⚡|serverless
Failed Serverless Jobs drain Complete Balance
@Alpay Ariyak, maybe you could clarify this? As far as I can see, the issue is still open.
5 replies