nielsrolf
RunPod
Created by nielsrolf on 11/12/2024 in #⚡|serverless
Incredibly long startup time when running 70b models via vllm
The other thing it frequently gets stuck on is:
warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
INFO 11-12 14:35:03 weight_utils.py:243] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=229) INFO 11-12 14:35:02 weight_utils.py:243] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=229) INFO 11-12 14:35:02 model_runner.py:1060] Starting to load model cognitivecomputations/dolphin-2.9.1-llama-3-70b...
INFO 11-12 14:35:02 model_runner.py:1060] Starting to load model cognitivecomputations/dolphin-2.9.1-llama-3-70b...
Yesterday I was told that this might be due to issues with the model itself, but it has now happened with different models, and sometimes those same models later worked.
Ok, this is what I was told when I opened a support ticket yesterday, but then I will remove that again.
Yes, it would indeed be better if that weren't necessary, but this is how the vllm-worker appears to be implemented. I could live with a long startup time because I mostly want to do batch requests, but if you know how to deploy the vLLM template with a preloaded model, I'd gladly use that (see the sketch below).
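One way to avoid the download at cold start would be to pre-fetch the weights onto persistent storage (a baked image layer or a network volume) so vLLM loads them from local disk. A minimal sketch, assuming huggingface_hub is available and a volume mounted at /runpod-volume; the paths and env vars here are assumptions, not the official vllm-worker setup:

# Hypothetical pre-download step, run once at image build time or on a pod
# with the network volume attached. The worker would then find the weights
# in the local HF cache instead of downloading them on every cold start.
import os
from huggingface_hub import snapshot_download

MODEL_NAME = "cognitivecomputations/dolphin-2.9.1-llama-3-70b"

# Point the Hugging Face cache at a persistent location (assumed mount path).
os.environ.setdefault("HF_HOME", "/runpod-volume/huggingface")

snapshot_download(
    repo_id=MODEL_NAME,
    allow_patterns=["*.safetensors", "*.json", "*.txt"],  # weights + config/tokenizer files
)
print("Model cached under", os.environ["HF_HOME"])

The same HF_HOME value would then need to be set on the serverless worker so it resolves the cached copy instead of re-downloading.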
Thanks, it now says "Ticket created".