RunPod
•Created by blabbercrab on 7/5/2024 in #⚡|serverless
Serverless is timing out before full load
This way, when any user requests a specific LoRA, it only takes extra time once.
39 replies
I'm loading the LoRA once on request and then not unloading it for new requests. It always checks whether the LoRA is already loaded.
Anyway, I came up with a different solution to my problem, so it's all good now.
And that keeps continuing
What happens is that before it loads all 30 LoRAs, some sort of timeout restarts the worker, which then retries loading all of them again.
I don't mind it loading for however long it takes, but I'd like it to fully load.
It dies before it can load everything into RAM.
@Charixfox
The files are already in the Docker container.
RunPod
•Created by blabbercrab on 7/7/2024 in #⚡|serverless
Trying to load a huge model into serverless
16 replies
I wasn't able to load it using one 80 GB GPU. Isn't 2 × 80 GB excessive for the model size?
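A rough sizing check (my numbers, not from the thread): a 72B-parameter model in bfloat16 needs about 2 bytes per parameter for the weights alone, before any KV cache, so it cannot fit on a single 80 GB GPU, and splitting across two 80 GB GPUs is not excessive:

```python
params = 72e9                  # ~72B parameters
bytes_per_param = 2            # bfloat16 = 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9  # weights only, in GB

print(round(weights_gb))       # 144 -- GB of weights alone
print(weights_gb > 80)         # True -- too big for one 80 GB GPU
print(weights_gb < 2 * 80)     # True -- fits across two, before KV cache
```

This also explains the `tensor_parallel_size=2` in the engine config below: the weights are sharded across both GPUs, and the remaining ~16 GB of headroom goes to KV cache and activations.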
2024-07-07T10:13:51.888246699Z (RayWorkerWrapper pid=14238) INFO 07-07 10:13:50 pynccl_utils.py:43] vLLM is using nccl==2.17.1
2024-07-07T10:13:51.888281517Z INFO 07-07 10:13:51 utils.py:118] generating GPU P2P access cache for in /root/.config/vllm/gpu_p2p_access_cache_for_0,1.json
2024-07-07T10:13:51.889113795Z INFO 07-07 10:13:51 utils.py:132] reading GPU P2P access cache from /root/.config/vllm/gpu_p2p_access_cache_for_0,1.json
2024-07-07T10:13:51.889199350Z WARNING 07-07 10:13:51 custom_all_reduce.py:74] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
2024-07-07T10:13:52.655130972Z (RayWorkerWrapper pid=14238) INFO 07-07 10:13:51 utils.py:132] reading GPU P2P access cache from /root/.config/vllm/gpu_p2p_access_cache_for_0,1.json
2024-07-07T10:13:52.655172182Z (RayWorkerWrapper pid=14238) WARNING 07-07 10:13:51 custom_all_reduce.py:74] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
2024-07-07T10:13:52.655176579Z INFO 07-07 10:13:52 weight_utils.py:200] Using model weights format ['*.safetensors']
2024-07-07T10:13:37.060080427Z INFO 07-07 10:13:37 ray_utils.py:96] Total CPUs: 252
2024-07-07T10:13:37.060112418Z INFO 07-07 10:13:37 ray_utils.py:97] Using 252 CPUs
2024-07-07T10:13:39.223150657Z 2024-07-07 10:13:39,222 INFO worker.py:1753 -- Started a local Ray instance.
2024-07-07T10:13:42.909013372Z INFO 07-07 10:13:42 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='cognitivecomputations/dolphin-2.9.2-qwen2-72b', speculative_config=None, tokenizer='cognitivecomputations/dolphin-2.9.2-qwen2-72b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir='/runpod-volume/huggingface-cache/hub', load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=cognitivecomputations/dolphin-2.9.2-qwen2-72b)
2024-07-07T10:13:43.234774592Z Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-07-07T10:13:48.090819086Z INFO 07-07 10:13:48 utils.py:628] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
2024-07-07T10:13:49.634162208Z (RayWorkerWrapper pid=14238) INFO 07-07 10:13:48 utils.py:628] Found nccl from environment variable VLLM_NCCL_SO_PATH=/usr/lib/x86_64-linux-gnu/libnccl.so.2
2024-07-07T10:13:49.634349607Z INFO 07-07 10:13:49 selector.py:27] Using FlashAttention-2 backend.
2024-07-07T10:13:50.971622090Z (RayWorkerWrapper pid=14238) INFO 07-07 10:13:49 selector.py:27] Using FlashAttention-2 backend.
2024-07-07T10:13:50.971661235Z INFO 07-07 10:13:50 pynccl_utils.py:43] vLLM is using nccl==2.17.1
RunPod
•Created by AMooMoo on 7/6/2024 in #⚡|serverless
Question about Network Volumes
any idea what might be the issue at https://discord.com/channels/912829806415085598/1258893524175159388
12 replies
@nerdylive
What about if it's 24 GB total?
RunPod
•Created by blabbercrab on 7/5/2024 in #⚡|serverless
Serverless is timing out before full load
I don't use all those LoRAs at once. Rather, I load them all, then use set_adapter to activate only the ones I need; this way I don't have to load and unload every LoRA on every request.
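A sketch of that activate-only-what-you-need pattern, modeled loosely on the set_adapter idea; the class and names here are illustrative, not the actual library API:

```python
class LoraPool:
    """Holds every LoRA in memory; activates only the requested subset."""

    def __init__(self, lora_names):
        # Load everything once at startup (placeholder objects here).
        self.loaded = {name: f"<weights:{name}>" for name in lora_names}
        self.active = []

    def set_adapters(self, names):
        # No loading or unloading per request -- just flip which are active.
        missing = [n for n in names if n not in self.loaded]
        if missing:
            raise KeyError(f"LoRAs not preloaded: {missing}")
        self.active = list(names)

pool = LoraPool(["style_a", "style_b", "style_c"])
pool.set_adapters(["style_b"])   # per-request activation, no reload
```

The trade-off is that all LoRAs occupy memory at all times, which is exactly what makes the slow startup (and the timeout during it) matter.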
It's for our image generation service
I can't extend this time period?
If anyone has a clue how to fix what's happening, please let me know.