nielsrolf
RunPod
Created by nielsrolf on 11/12/2024 in #⚡|serverless
Incredibly long startup time when running 70b models via vllm
The other thing it frequently gets stuck on is:
warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
INFO 11-12 14:35:03 weight_utils.py:243] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=229) INFO 11-12 14:35:02 weight_utils.py:243] Using model weights format ['*.safetensors']
(VllmWorkerProcess pid=229) INFO 11-12 14:35:02 model_runner.py:1060] Starting to load model cognitivecomputations/dolphin-2.9.1-llama-3-70b...
INFO 11-12 14:35:02 model_runner.py:1060] Starting to load model cognitivecomputations/dolphin-2.9.1-llama-3-70b...
Yesterday I was told that this might be due to issues with the model itself, but it has now happened with different models, and sometimes those same models later worked.
Ok, this is what I was told when I opened a support ticket yesterday, but then I will remove that again.
Yes, it would indeed be better if that weren't necessary, but this is how the vllm-worker appears to be implemented. I could live with a long startup time because I mostly want to do batch requests, but if you know how to deploy the vLLM template with a preloaded model, I'd gladly use that (see the sketch below).
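One way to avoid the download at cold start would be to pre-fetch the weights onto persistent storage (a baked image layer or a network volume) so vLLM loads them from local disk. A minimal sketch, assuming huggingface_hub is available and a volume mounted at /runpod-volume; the paths and env vars here are assumptions, not the official vllm-worker setup:

# Hypothetical pre-download step, run once at image build time or on a pod
# with the network volume attached. The worker would then find the weights
# in the local HF cache instead of downloading them on every cold start.
import os
from huggingface_hub import snapshot_download

MODEL_NAME = "cognitivecomputations/dolphin-2.9.1-llama-3-70b"

# Point the Hugging Face cache at a persistent location (assumed mount path).
os.environ.setdefault("HF_HOME", "/runpod-volume/huggingface")

snapshot_download(
    repo_id=MODEL_NAME,
    allow_patterns=["*.safetensors", "*.json", "*.txt"],  # weights + config/tokenizer files
)
print("Model cached under", os.environ["HF_HOME"])

The same HF_HOME value would then need to be set on the serverless worker so it resolves the cached copy instead of re-downloading.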
Thanks, it now says "Ticket created".