Trying to deploy LLaVA-Mistral using a simple Docker image, receiving both success & error messages
I am using a simple Docker script to deploy LLaVA-Mistral. In the system logs, it creates the container successfully. In the container logs, I get the following:
Script:
The system logs spam me with "start container" as well.
I made sure to use absolute paths so that everything points at the right spot. I also tested this in Docker Desktop and it worked flawlessly. My question is: what am I doing wrong here? Why am I unable to get a connection to the endpoint?
I'd also like to know what a typical request to an exposed port on the HTTPS /run endpoint would look like. Reverse proxies typically don't use ports, so I'd like to know what the norm is for that.
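For reference, a typical request to a serverless endpoint goes through RunPod's API gateway rather than a raw exposed port, so no port number appears in the URL. A minimal sketch, where the endpoint ID, API key, and input schema are placeholders:

```python
# Sketch of a typical call to a RunPod serverless /run endpoint.
# ENDPOINT_ID, API_KEY, and the "input" payload are placeholders.
import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# /run queues the job asynchronously and returns a job ID.
job = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers=HEADERS,
    json={"input": {"prompt": "Describe this image."}},
    timeout=30,
).json()

# Poll /status/<id> for the result; /runsync instead blocks and
# returns the output in a single call.
status = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job['id']}",
    headers=HEADERS,
    timeout=30,
).json()
print(status)
```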
You need to use
sleep infinity
to keep your container alive. I also have a LLaVA template that you can use that is working.
yes that'd be awesome. any tips on getting mistral in particular working?
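Concretely, that means making `sleep infinity` the container's command so PID 1 doesn't exit when the start script finishes. A minimal sketch, with an illustrative base image:

```dockerfile
# Sketch: keep the container alive by making "sleep infinity" the
# main process. The base image below is only an example.
FROM ubuntu:22.04
CMD ["sleep", "infinity"]
```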
was i on the right track with my container or?
Mistral 7B is the default model in my green template
so where is the template
regular or instruct
"LLaVA 1.6" under the "Communtiy" section of "Explore".
im trying to implement this on serverless though
this is the serverless support section
I don't see a RunPod handler in your Dockerfile
i'm new to runpod
can you show me what i need to change?
Here are some resources for getting started with RunPod serverless:
https://blog.runpod.io/serverless-create-a-basic-api/
https://www.youtube.com/@generativelabs/videos
https://trapdoor.cloud/getting-started-with-runpod-serverless/
well you just told me what was missing, so how about just telling me directly so i don't have to sift through all that
It is not my job to hold your hand and do everything for you. I told you your handler was missing, use your brain and follow the resources I sent you otherwise I will gladly help you for $100 per hour of my time.
i asked for a courtesy, you respond with sass?
you said in the article yourself you aren't an expert with implementing llava. your time is not worth $100 per hour
Then struggle with it yourself
you are a childish man
Nope, I told you what to do but you are too lazy and expect everyone to do everything for you. That is not how life works. I offered to help for my hourly rate, then you insult me, when I am one of the most experienced people on RunPod. YOU are childish and a complete fucking idiot.
i don't know a single noteworthy person who yells their credentials when somebody upsets them
imagine going into a help section and calling someone the r slur
@ashleyk Let's chill and let me handle this. No need to start another argument 🙂
@B1llstar have you tried to put
as the Docker command? Also, do you use network/volume storage?
I would also change the way you store models in the image:
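For example, one common pattern is to bake the weights into the image at build time so workers don't re-download them on every cold start. A sketch, where the model ID and target directory are assumptions:

```dockerfile
# Sketch: download model weights during the build so they ship inside
# the image. The model ID and /models path below are assumptions.
FROM python:3.10-slim
RUN pip install --no-cache-dir huggingface_hub
RUN python -c "from huggingface_hub import snapshot_download; \
    snapshot_download('liuhaotian/llava-v1.6-mistral-7b', local_dir='/models/llava')"
```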
Though note that will work on Pods; for serverless you need a handler file that will process job requests.
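A minimal sketch of such a handler, where the "prompt" field and the echoed response are placeholders for real LLaVA-Mistral inference:

```python
# handler.py -- minimal RunPod serverless handler (sketch).
# The "prompt" input field and the echoed output are placeholders.
import runpod

def handler(job):
    job_input = job["input"]  # the JSON body sent to /run or /runsync
    prompt = job_input.get("prompt", "")
    # ... load the model once at import time and run inference here ...
    return {"output": f"received: {prompt}"}

# Starts the worker loop that pulls jobs from the endpoint's queue.
runpod.serverless.start({"handler": handler})
```

The Dockerfile's command then just runs this file, e.g. `python -u handler.py`.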
nice, i will look at this today. thank you for the level-headed response
Though like ashleyk said, have a look at the links he sent; they are good examples of how to start with serverless
i don't know if that guy represents you but it's probably not a good idea to have someone yelling obscenities like that
ashleyk is a person who creates many templates and he is always willing to help. Though don't expect that we are ChatGPT and will give you a working solution just because you want one.
i didn't quite understand that second sentence, but i think i understand what you're getting at?
i honestly mainly asked for direct help because i figured the fix was a single line of code or something that i was missing in the docker file lol
i'll be looking into the handler today though. i did glance at the articles and they were well-written. i can "separate the artist from their work"