Created by B1llstar on 2/5/2024 in #⚡|serverless
Trying to deploy Llava-Mistral using a simple Docker image, receive both success & error msgs
I am using a simple Dockerfile to deploy LLaVA-Mistral. The system logs show the container being created successfully, but in the container logs I get the following:
2024-02-05T01:52:10.452447184Z [FATAL tini (7)] exec docker failed: No such file or directory
Dockerfile:
# Use an official Ubuntu as a base image
FROM nvidia/cuda:11.8.0-base-ubuntu20.04

# Set noninteractive environment variable to avoid prompts during package installations
ENV DEBIAN_FRONTEND=noninteractive

# Update and install git-lfs, cmake, and other required packages
RUN apt-get update && \
apt-get install -y git-lfs python3 python3-pip cmake g++ gcc

# Install additional dependencies for server mode
RUN CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python[server]

# Create a directory for the llava files
WORKDIR /llava

# Download specific files from the repository
ADD https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/llava-v1.6-mistral-7b.Q4_K_M.gguf /llava/
ADD https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/mmproj-model-f16.gguf /llava/

# Run the server with specified parameters
CMD python3 -m llama_cpp.server --model /llava/llava-v1.6-mistral-7b.Q4_K_M.gguf --clip_model_path /llava/mmproj-model-f16.gguf --port 8081 --host 0.0.0.0 --n_gpu_layers -1 --use_mlock false
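
For reference, this is roughly how I build and run the image in Docker Desktop (the image tag here is just a placeholder I'm using for illustration):

docker build -t llava-mistral .
docker run --gpus all -p 8081:8081 llava-mistral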
The system logs also spam "start container" over and over. I made sure to use absolute paths so everything points at the right spot, and I tested the same image in Docker Desktop, where it worked flawlessly. So what am I doing wrong here? Why can't I get a connection to the endpoint? I'd also like to know what a typical request to an exposed port looks like through the HTTPS /run endpoint. Reverse proxies typically don't expose ports directly, so I'd like to know what the norm is there.
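My best guess at the /run call, going off the serverless docs, is something along these lines; the endpoint ID and payload shape are placeholders, and whether the container port even matters here is exactly what I'm unsure about:

curl -X POST https://api.runpod.ai/v2/<ENDPOINT_ID>/run \
  -H "Authorization: Bearer <RUNPOD_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "Describe this image."}}'

What I don't see is where port 8081 from my CMD would fit into that URL, which is why I'm asking what the norm is.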
35 replies