How to keep worker memory after completing request?
Hi! I'm running a serverless endpoint for a GAN model. I want to preload the model into memory on the first request and reuse it on subsequent requests without loading it again (as long as the container/pod is still alive). But when I sent a second request, the idle worker hit "clean up worker" and loaded the model again.
How can I prevent "clean up worker" and keep the model in memory (as long as the container hasn't been removed)?
5 Replies
Load the model before you call
runpod.serverless.start()
and enable FlashBoot on your endpoint.

I changed my code as you suggested and the model is preloaded now. But sometimes the worker was still cleaned up when I sent a second request. I checked the logs and got an error from serverless/work_loop:
runpod.serverless.start({"handler": handler})
File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.7/site-packages/runpod/serverless/__init__.py", line 24, in start
    asyncio.run(work_loop.start_worker(config))
File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.7/asyncio/runners.py", line 43, in run
File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.7/site-packages/runpod/serverless/work_loop.py", line 36, in start_worker
    if job["input"] is None:
KeyError: 'input'
Isn't this a bug in the runpod-python package? My request does have an "input" field.
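For reference, the preload pattern from the first reply looks roughly like this. The model loader is a stand-in, and the key point is that anything defined at module level is initialized once per container, not once per request. Note the .get guard in the handler is only defensive style; the KeyError in the traceback above is raised inside the SDK's own work loop, so handler code alone can't fix it.

```python
# Stand-in for an expensive GAN model load (e.g. torch.load on checkpoint weights).
def load_gan_model():
    return {"loaded": True}

# Module-level: runs once when the worker container starts,
# so warm requests reuse the same object instead of reloading.
MODEL = load_gan_model()

def handler(job):
    # Guard against a job missing the "input" field instead of raising KeyError.
    job_input = job.get("input")
    if job_input is None:
        return {"error": "no input provided"}
    return {"output": f"generated from {job_input}"}

# With the RunPod SDK installed, you would then start the worker loop:
# import runpod
# runpod.serverless.start({"handler": handler})
```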
Which version of the SDK are you using? I haven't had any issues like this.
The worker will be cleaned up before the 2nd and subsequent requests if you aren't sending a constant flow of requests. FlashBoot is only beneficial if you send a constant flow of requests to your endpoint.
I'm using SDK 0.9.9, because my project requires Python 3.7.
Why does your project require such an ancient version of Python? You're going to run into a world of pain using an SDK that old.