ngc tritonserver container image not usable?
I tried to create a pod on a server with CUDA >= 12.2 using this image: nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3
It loads up correctly, but the resulting server is not usable: I cannot connect over SSH (the terminal window closes immediately after I type the passphrase).
The same image works fine on servers from vast.ai, so what's the issue?
Modify the Docker image so that you install your own nginx / OpenSSH stuff on it.
Here's an example of me starting up the OpenSSH server from my own template; I tend to build off the RunPod base templates:
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/start.sh
Dependencies here:
OpenSSH / nginx are probably what you need at minimum so that you can SSH in and also port-forward:
https://github.com/runpod/containers/blob/af63c609f5ac84495fb0b8bc4779b17c1d4b21e0/README.md?plain=1#L23
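For illustration, a minimal sketch of the kind of Dockerfile change being described: layering openssh-server on top of the NGC image and handing control to a start script. The script name and layout here are assumptions, not copied from the linked templates.

FROM nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3

# Install the OpenSSH server on top of the (Ubuntu-based) NGC image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends openssh-server && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir -p /run/sshd

# Hand control to a start script that configures sshd and keeps
# the container alive (see the sleep-infinity advice further down).
COPY start.sh /start.sh
RUN chmod +x /start.sh
CMD ["/start.sh"]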
I see, I will need to try that later
Why is this necessary when the image works fine as-is on Vast, though?
Dunno, Vast.ai and RunPod could just be set up differently. Never used vast.ai. Theoretically, if you SSH to it and you have your own public-key or password-based authentication going on there, it should be fine.
https://discord.com/channels/912829806415085598/1202555381126008852
Madiator has a pip package, for example, where he sets up password-based SSH. I've set it up on my own containers / Dockerfiles. But I don't know your container well enough to comment on it, because I usually build my own.
Also make sure your Docker image is actually keeping the container alive. It sounds like it is constantly restarting because it's not being kept alive. You can add
bash -c 'sleep infinity'
to your Docker command and see if that resolves it.
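As a concrete sketch of that advice: if the container's main process exits (for example, sshd forks into the background and nothing else runs), PID 1 dies and the pod restarts, so the keep-alive goes at the end of whatever CMD the container runs. The service name here assumes openssh-server on Ubuntu.

# Start sshd in the background, then block forever so PID 1 never exits.
CMD ["bash", "-c", "service ssh start && sleep infinity"]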
You don't need nginx etc.
Thanks for the advice. I cobbled together these with help from ChatGPT, and it seems to have produced a container that works on RunPod, with the original environment intact so the Triton executables can be launched from SSH:
start.sh
Dockerfile
now to continue with what I actually wanted to test..
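The attachments above aren't reproduced here, but a minimal sketch of what such a start.sh could look like, assuming RunPod injects the user's key through a PUBLIC_KEY environment variable (a convention taken from the runpod/containers scripts linked earlier):

#!/bin/bash
# start.sh -- hypothetical sketch, not the actual attachment above.

# Install the caller's public key, if RunPod provided one via $PUBLIC_KEY.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
if [[ -n "$PUBLIC_KEY" ]]; then
    echo "$PUBLIC_KEY" >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
fi

# Re-export the container's environment for login shells, so the NGC
# image's PATH etc. survive into SSH sessions and tritonserver can be
# launched directly. (Simplistic: breaks on values containing quotes.)
printenv | while IFS='=' read -r name value; do
    echo "export ${name}=\"${value}\""
done > /etc/profile.d/container_env.sh

service ssh start

# Keep PID 1 alive so the pod doesn't restart.
sleep infinity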
Nice, you don't need to expose port 22 in your Dockerfile for RunPod though
I see, does it do that automatically? I like being able to use the direct connection for SSH so I can use tunnels.
When using the RunPod proxy it didn't seem to work properly.
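For context, this is the kind of tunnel a direct connection enables: forwarding Triton's default HTTP port (8000) over SSH. The host, port, and key path below are placeholders, not values from this thread.

# Forward local port 8000 to the Triton HTTP endpoint inside the pod;
# <pod-ip> and <ssh-port> come from the pod's direct TCP connection details.
ssh -i ~/.ssh/id_ed25519 -p <ssh-port> -L 8000:localhost:8000 root@<pod-ip>

# Then, from the local machine:
curl http://localhost:8000/v2/health/ready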
You can do it through the RunPod UI: https://docs.runpod.io/pods/configuration/expose-ports
hi - is anybody here running Llama 2 with TensorRT-LLM and the Triton Inference Server backend?