RunPod
•Created by wizardjoe on 2/16/2024 in #⚡|serverless
How do I correctly stream results using runpod-python?
Currently, I'm doing the following:
-------
import runpod

runpod.api_key = 'yyyy'
endpoint = runpod.Endpoint('xxxx')

message = 'What is a synonym for purchase?'

run_request = endpoint.run({
    "input": {
        "prompt": message,
        "sampling_params": {
            "max_tokens": 5000,
            "max_new_tokens": 2000,
            "temperature": 0.7,
            "repetition_penalty": 1.15,
            "length_penalty": 10.0
        }
    }
})

for output in run_request.stream():
    print(output)
-------
However, stream() times out after 10 seconds, and I don't see a way to increase that timeout. Also, once it does work, it seems to send everything back at once rather than a chunk at a time, unless I'm doing something wrong?
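For what it's worth, here's the workaround I've been sketching: poll status() until the job actually leaves the queue, then start streaming. I'm only assuming this sidesteps the 10-second window; the status string below is one I've seen the SDK return, but the rest is guesswork on my part.
-------
import time
import runpod

runpod.api_key = 'yyyy'
endpoint = runpod.Endpoint('xxxx')

run_request = endpoint.run({
    "input": {
        "prompt": "What is a synonym for purchase?",
        "sampling_params": {"max_tokens": 2000, "temperature": 0.7}
    }
})

# Don't call stream() until a worker has actually picked the job up,
# so queue time doesn't count against whatever window stream() has.
while run_request.status() == "IN_QUEUE":
    time.sleep(1)

# Print each chunk as it arrives rather than waiting for the full output
for output in run_request.stream():
    print(output, end="", flush=True)
-------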
34 replies
RunPod
•Created by wizardjoe on 1/9/2024 in #⚡|serverless
Setting up MODEL_BASE_PATH when building worker-vllm image
I'm a little confused about this parameter when setting up worker-vllm. It seems to default to /runpod-volume, which suggests the model lives on a network volume rather than being baked into the image, but I'm not sure. A few questions:
1) If set to "/runpod-volume", does this mean that the model will be downloaded to that path automatically, and therefore won't be a part of the image (resulting in a much smaller image)?
2) Will I therefore need to set up a network volume when creating the endpoint?
3) Does the model get downloaded every time workers are created from a cold start? If not, will I need to "run" a worker for a given amount of time at first to download the model? (The sketch below is roughly what I have in mind.)
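To make question 3 concrete, this is the kind of check I imagine happening on a cold start. It's purely a guess on my part, not worker-vllm's actual logic, and it assumes MODEL_BASE_PATH is /runpod-volume with a network volume attached:
-------
import os

# Hypothetical cold-start check (my guess, not worker-vllm's real code):
# only fetch the weights if they aren't already sitting on the volume.
MODEL_BASE_PATH = "/runpod-volume"
MODEL_NAME = "mistralai/Mixtral-8x7B-Instruct-v0.1"

model_dir = os.path.join(MODEL_BASE_PATH, MODEL_NAME.split("/")[-1])

if os.path.isdir(model_dir):
    # Weights already on the attached network volume: later cold starts skip the download
    print(f"found {model_dir}, no download needed")
else:
    # First cold start against an empty volume: this is where the download would happen
    print(f"{model_dir} missing, model would be downloaded here")
-------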
20 replies
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
I'm running the following command to build and tag a Docker worker image based on worker-vllm:
docker build -t lesterhnh/mixtral-8x7b-instruct-v0.1-runpod-serverless:1.0 --build-arg MODEL_NAME="mistralai/Mixtral-8x7B-Instruct-v0.1" --build-arg MODEL_BASE_PATH="/models" .
I'm getting the following error:
------
Dockerfile:23
--------------------
22 | # Install torch and vllm based on CUDA version
23 | >>> RUN if [[ "${WORKER_CUDA_VERSION}" == 11.8* ]]; then \
24 | >>> python3.11 -m pip install -U --force-reinstall torch==2.1.2 xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118; \
25 | >>> python3.11 -m pip install -e git+https://github.com/runpod/[email protected]#egg=vllm; \
26 | >>> else \
27 | >>> python3.11 -m pip install -e git+https://github.com/runpod/vllm-fork-for-sls-worker.git#egg=vllm; \
28 | >>> fi && \
29 | >>> rm -rf /root/.cache/pip
30 |
--------------------
ERROR: failed to solve: process "/bin/bash -o pipefail -c if [[ "${WORKER_CUDA_VERSION}" == 11.8* ]]; then python3.11 -m pip install -U --force-reinstall torch==2.1.2 xformers==0.0.23.post1 --index-url https://download.pytorch.org/whl/cu118; python3.11 -m pip install -e git+https://github.com/runpod/[email protected]#egg=vllm; else python3.11 -m pip install -e git+https://github.com/runpod/vllm-fork-for-sls-worker.git#egg=vllm; fi && rm -rf /root/.cache/pip" did not complete successfully: exit code: 1
------
69 replies