wizardjoe
RunPod
Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
Currently, I'm doing the following:
-------
import runpod

runpod.api_key = 'yyyy'
endpoint = runpod.Endpoint('xxxx')

message = 'What is a synonym for purchase?'

run_request = endpoint.run({
    "input": {
        "prompt": message,
        "sampling_params": {
            "max_tokens": 5000,
            "max_new_tokens": 2000,
            "temperature": 0.7,
            "repetition_penalty": 1.15,
            "length_penalty": 10.0
        }
    }
})

for output in run_request.stream():
    print(output)
-------
However, stream() times out after 10 seconds and I don't see a way to increase the timeout. Also, once it does work, it seems like it sends back everything at once instead of a chunk at a time, unless I'm doing something wrong?
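For reference, here is a minimal sketch of the workaround I have in mind: submitting the job over the REST API and polling the /stream endpoint myself. The placeholder credentials, the request method for /stream, and the response shape ({"status": ..., "stream": [{"output": ...}]}) are my assumptions from the serverless docs, not something I've confirmed:
-------
# Sketch only: poll the serverless /stream endpoint directly instead of
# relying on run_request.stream(). The use of GET for /stream and the
# response shape are assumptions on my part.
import time
import requests

API_KEY = 'yyyy'        # placeholder
ENDPOINT_ID = 'xxxx'    # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job via the /run endpoint.
run_resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers=HEADERS,
    json={"input": {"prompt": "What is a synonym for purchase?"}},
)
job_id = run_resp.json()["id"]

# Poll /stream until the job reaches a terminal status, printing each
# partial chunk as it arrives.
while True:
    chunk_resp = requests.get(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/stream/{job_id}",
        headers=HEADERS,
    ).json()
    for chunk in chunk_resp.get("stream", []):
        print(chunk.get("output"))
    if chunk_resp.get("status") in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(1)
-------
Is that roughly what the SDK's stream() does under the hood, and if so, is there a supported way to raise its timeout instead of dropping down to raw HTTP like this?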
34 replies
RunPod
Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
I'm a little confused about this parameter in setting up worker-vllm. It seems to default to /runpod-volume, which to me implies a network volume, instead of getting baked into the image, but I'm not sure. A few questions:
1) If set to "/runpod-volume", does this mean that the model will be downloaded to that path automatically, and therefore won't be a part of the image (resulting in a much smaller image)?
2) Will I therefore need to set up a network volume when creating the endpoint?
3) Does the model get downloaded every time workers are created from a cold start? If not, then will I need to "run" a worker for a given amount of time at first to download the model?
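To illustrate what I'm asking in question 3, here is a hypothetical sketch of the cold-start behavior I'd expect, where the weights are only downloaded if they aren't already under MODEL_BASE_PATH. This is my mental model, not worker-vllm's actual code; the directory layout and the use of huggingface_hub are assumptions:
-------
# Hypothetical cold-start logic (NOT taken from worker-vllm): only hit the
# Hub when the weights aren't already cached under MODEL_BASE_PATH.
import os
from huggingface_hub import snapshot_download

MODEL_NAME = "mistralai/Mixtral-8x7B-Instruct-v0.1"
MODEL_BASE_PATH = os.environ.get("MODEL_BASE_PATH", "/runpod-volume")

# Assumed layout: one subdirectory per model under the base path.
local_dir = os.path.join(MODEL_BASE_PATH, MODEL_NAME.replace("/", "--"))

if not os.path.isdir(local_dir):
    # First cold start against an empty network volume: download once.
    snapshot_download(repo_id=MODEL_NAME, local_dir=local_dir)
else:
    # Later cold starts: weights are already on the volume, nothing to fetch.
    print(f"Reusing cached weights at {local_dir}")
-------
Is that roughly how it works, or does each fresh worker pull the model again?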
20 replies
RunPod
Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
I'm running the following command to build and tag a docker worker image based off of worker-vllm:
docker build -t lesterhnh/mixtral-8x7b-instruct-v0.1-runpod-serverless:1.0 --build-arg MODEL_NAME="mistralai/Mixtral-8x7B-Instruct-v0.1" --build-arg MODEL_BASE_PATH="/models" .
I'm getting the following error:
------
Dockerfile:23
--------------------
  22 |     # Install torch and vllm based on CUDA version
  23 | >>> RUN if [[ "${WORKER_CUDA_VERSION}" == 11.8* ]]; then \
  24 | >>>         python3.11 -m pip install -U --force-reinstall torch==2.1.2 xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118; \
  25 | >>>         python3.11 -m pip install -e git+https://github.com/runpod/vllm-fork-for-sls-worker.git@cuda-11.8#egg=vllm; \
  26 | >>>     else \
  27 | >>>         python3.11 -m pip install -e git+https://github.com/runpod/vllm-fork-for-sls-worker.git#egg=vllm; \
  28 | >>>     fi && \
  29 | >>>     rm -rf /root/.cache/pip
  30 |
--------------------
ERROR: failed to solve: process "/bin/bash -o pipefail -c if [[ "${WORKER_CUDA_VERSION}" == 11.8* ]]; then python3.11 -m pip install -U --force-reinstall torch==2.1.2 xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118; python3.11 -m pip install -e git+https://github.com/runpod/vllm-fork-for-sls-worker.git@cuda-11.8#egg=vllm; else python3.11 -m pip install -e git+https://github.com/runpod/vllm-fork-for-sls-worker.git#egg=vllm; fi && rm -rf /root/.cache/pip" did not complete successfully: exit code: 1
------
69 replies