RunPod
Created by Thibaud on 8/20/2024 in #⚡|serverless
SGLang
SGLang works very well in a pod but is impossible to run in serverless: the API route stays at error 404. I use the exact same config (Docker image, command line, port) in the pod and in serverless.
120 replies
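A likely cause for the thread above: RunPod serverless workers don't expose container ports the way pods do. Requests arrive through the queue-based endpoint API and are passed to a Python handler, so calling the SGLang HTTP route directly returns 404 even though the server is running. Below is a minimal sketch of a handler that proxies queued jobs to a local SGLang server; the port (SGLang's default 30000) and the /generate payload shape are assumptions, not confirmed from the thread.

```python
# Sketch: RunPod serverless handler proxying job input to a local
# SGLang server assumed to be listening on its default port 30000
# inside the same container.
import requests
import runpod

SGLANG_URL = "http://127.0.0.1:30000"  # assumption: SGLang default port

def handler(job):
    # Forward the job's input payload to SGLang's /generate endpoint
    # and return its JSON response as the job output.
    resp = requests.post(f"{SGLANG_URL}/generate", json=job["input"], timeout=300)
    resp.raise_for_status()
    return resp.json()

runpod.serverless.start({"handler": handler})
```

With a handler like this, the request goes to the endpoint's /run or /runsync URL rather than to the SGLang port, which is why the pod config can't be reused unchanged.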
RunPod
Created by Thibaud on 8/13/2024 in #⛅|pods
vllm seems not to use GPU
No description
17 replies
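The thread has no body, but the usual first check for "vLLM seems not to use the GPU" is whether the container sees a CUDA device at all. A minimal diagnostic sketch (plain PyTorch, nothing vLLM-specific assumed):

```python
# Quick check: confirm the container actually sees a CUDA device
# before blaming vLLM. If this prints False, the problem is the
# pod/driver setup rather than vLLM itself.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```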
RunPod
Created by Thibaud on 8/8/2024 in #⚡|serverless
can't run 70b
Any tips to run a 70B model, for example mlabonne/Llama-3.1-70B-Instruct-lorablated? I tried this:

config:
80GB GPU, 2 GPUs / worker
container disk: 500 GB
env vars: MAX_MODEL_LEN=15000, MODEL_NAME=mlabonne/Llama-3.1-70B-Instruct-lorablated

but it doesn't work. Without MAX_MODEL_LEN=15000, I got:

"The model's max seq len (131072) is larger than the maximum number of tokens that can be stored in KV cache (18368). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine."

Logs:
2024-08-08T12:44:26Z 4f4fb700ef54 Extracting [==================================================>] 32B/32B
2024-08-08T12:44:26Z 4f4fb700ef54 Pull complete
2024-08-08T12:44:26Z Digest: sha256:44f3a3d209d0df623295065203da969e69f57fe0b8b73520e9bef47fb9d33c70
2024-08-08T12:44:26Z Status: Downloaded newer image for runpod/worker-v1-vllm:stable-cuda12.1.0
2024-08-08T12:44:26Z worker is ready
2024-08-08T12:44:38Z create pod network
2024-08-08T12:44:38Z create container runpod/worker-v1-vllm:stable-cuda12.1.0
2024-08-08T12:44:38Z stable-cuda12.1.0 Pulling from runpod/worker-v1-vllm
2024-08-08T12:44:38Z Digest: sha256:44f3a3d209d0df623295065203da969e69f57fe0b8b73520e9bef47fb9d33c70
2024-08-08T12:44:38Z Status: Image is up to date for runpod/worker-v1-vllm:stable-cuda12.1.0
2024-08-08T12:44:38Z worker is ready
2024-08-08T12:44:39Z start container
2024-08-08T12:48:14Z start container

and nothing after.
75 replies
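The error in the thread above is a straightforward capacity check: the model's default context length (131,072 tokens) needs more KV-cache memory than the roughly 18,368 tokens' worth left over after the 70B weights are loaded across 2×80 GB, so vLLM refuses to start the engine. Capping max_model_len (as the thread's env var does) or raising gpu_memory_utilization satisfies the check. A minimal sketch with vLLM's offline LLM API, mirroring the thread's config; the 0.95 value is an assumption (vLLM's default is 0.90):

```python
# Sketch of the engine settings the error message points at.
from vllm import LLM

llm = LLM(
    model="mlabonne/Llama-3.1-70B-Instruct-lorablated",
    tensor_parallel_size=2,       # 2 x 80 GB GPUs per worker
    max_model_len=15000,          # cap context so the KV cache fits
    gpu_memory_utilization=0.95,  # or leave more VRAM for the KV cache (assumption)
)
```

On the RunPod vLLM worker these map to the MAX_MODEL_LEN and GPU_MEMORY_UTILIZATION environment variables rather than constructor arguments; the repeated "start container" lines with no output afterwards suggest the worker is dying during engine startup, which is where this check runs.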