RunPod
•Created by blabbercrab on 7/5/2024 in #⚡|serverless
Serverless is timing out before full load
This way, when any user requests a specific LoRA, it only takes extra time once.
39 replies
I'm loading the LoRA once on request and then not unloading it for new requests. It always checks whether the LoRA is already loaded.
Anyway, I came up with a different solution to my problem, so it's all good now.
And that keeps continuing
What happens is that before it loads all 30 LoRAs, some sort of timeout restarts the worker, which then retries loading all of them again.
I don't mind it loading for however long it takes, but I'd like it to fully load.
It dies before it can load everything into RAM.
@Charixfox
The files are already in the Docker container.
RunPod
•Created by blabbercrab on 7/7/2024 in #⚡|serverless
Trying to load a huge model into serverless
16 replies
I wasn't able to load it using one 80 GB GPU. Isn't 2 × 80 GB excessive for the model size?
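A rough sizing check (my numbers, not from the thread): a 72B-parameter model in bfloat16 needs about 2 bytes per parameter for the weights alone, before any KV cache, so it cannot fit on a single 80 GB GPU, and splitting across two 80 GB GPUs is not excessive:

```python
params = 72e9                  # ~72B parameters
bytes_per_param = 2            # bfloat16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9  # weights only, in GB

print(round(weights_gb))       # 144 -- GB of weights alone
print(weights_gb > 80)         # True -- too big for one 80 GB GPU
print(weights_gb < 2 * 80)     # True -- fits across two, before KV cache
```

This also explains the `tensor_parallel_size=2` in the engine config below: the weights are sharded across both GPUs, and the remaining ~16 GB of headroom goes to KV cache and activations.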
2024-07-07T10:13:51.888246699Z (RayWorkerWrapper pid=14238) INFO 07-07 10:13:50 pynccl_utils.py:43] vLLM is using nccl==2.17.1
2024-07-07T10:13:51.888281517Z INFO 07-07 10:13:51 utils.py:118] generating GPU P2P access cache for in /root/.config/vllm/gpu_p2p_access_cache_for_0,1.json
2024-07-07T10:13:51.889113795Z INFO 07-07 10:13:51 utils.py:132] reading GPU P2P access cache from /root/.config/vllm/gpu_p2p_access_cache_for_0,1.json
2024-07-07T10:13:51.889199350Z WARNING 07-07 10:13:51 custom_all_reduce.py:74] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
2024-07-07T10:13:52.655130972Z (RayWorkerWrapper pid=14238) INFO 07-07 10:13:51 utils.py:132] reading GPU P2P access cache from /root/.config/vllm/gpu_p2p_access_cache_for_0,1.json
2024-07-07T10:13:52.655172182Z (RayWorkerWrapper pid=14238) WARNING 07-07 10:13:51 custom_all_reduce.py:74] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
2024-07-07T10:13:52.655176579Z INFO 07-07 10:13:52 weight_utils.py:200] Using model weights format ['*.safetensors']
2024-07-07T10:13:37.060080427Z INFO 07-07 10:13:37 ray_utils.py:96] Total CPUs: 252
2024-07-07T10:13:37.060112418Z INFO 07-07 10:13:37 ray_utils.py:97] Using 252 CPUs
2024-07-07T10:13:39.223150657Z 2024-07-07 10:13:39,222 INFO worker.py:1753 -- Started a local Ray instance.
2024-07-07T10:13:42.909013372Z INFO 07-07 10:13:42 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='cognitivecomputations/dolphin-2.9.2-qwen2-72b', speculative_config=None, tokenizer='cognitivecomputations/dolphin-2.9.2-qwen2-72b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir='/runpod-volume/huggingface-cache/hub', load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=cognitivecomputations/dolphin-2.9.2-qwen2-72b)
2024-07-07T10:13:43.234774592Z Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-07T10:13:48.090819086Z INFO 07-07 10:13:48 utils.py:628] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
2024-07-07T10:13:49.634162208Z (RayWorkerWrapper pid=14238) INFO 07-07 10:13:48 utils.py:628] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
2024-07-07T10:13:49.634349607Z INFO 07-07 10:13:49 selector.py:27] Using FlashAttention-2 backend.
2024-07-07T10:13:50.971622090Z (RayWorkerWrapper pid=14238) INFO 07-07 10:13:49 selector.py:27] Using FlashAttention-2 backend.
2024-07-07T10:13:50.971661235Z INFO 07-07 10:13:50 pynccl_utils.py:43] vLLM is using nccl==2.17.1
RunPod
•Created by AMooMoo on 7/6/2024 in #⚡|serverless
Question about Network Volumes
any idea what might be the issue at https://discord.com/channels/912829806415085598/1258893524175159388
12 replies
@nerdylive
What about if it's 24 GB total?
RunPod
•Created by blabbercrab on 7/5/2024 in #⚡|serverless
Serverless is timing out before full load
I don't use all those LoRAs at once. Rather, I load them all, then use set_adapter to activate only the ones I need; this way I don't have to load and unload every LoRA on every request.
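A sketch of that activate-only-what-you-need pattern, modeled loosely on the set_adapter idea; the class and names here are illustrative, not the actual library API:

```python
class LoraPool:
    """Holds every LoRA in memory; activates only the requested subset."""

    def __init__(self, lora_names):
        # Load everything once at startup (placeholder objects here).
        self.loaded = {name: f"<weights:{name}>" for name in lora_names}
        self.active = []

    def set_adapters(self, names):
        # No loading or unloading per request -- just flip which are active.
        missing = [n for n in names if n not in self.loaded]
        if missing:
            raise KeyError(f"LoRAs not preloaded: {missing}")
        self.active = list(names)

pool = LoraPool(["style_a", "style_b", "style_c"])
pool.set_adapters(["style_b"])   # per-request activation, no reload
```

The trade-off is that all LoRAs occupy memory at all times, which is exactly what makes the slow startup (and the timeout during it) matter.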
It's for our image generation service
I can't extend this time period?
If anyone has a clue how to fix what's happening, please let me know.