TensorRT-LLM setup
Has anyone been able to successfully install tensorrt_llm?
I'm trying with pip, but I'm running into MPI-related errors:
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
I've tried a few templates (runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04; nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3) on A100 and on a 4090.
CUDA 12.2
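For context, the install route is roughly NVIDIA's documented pip one (the exact flags here are an assumption, not quoted from the thread):
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com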
tried:
apt-get install libopenmpi-dev openmpi-bin
Doesn't work unfortunately
Tried uninstalling and reinstalling as well, but that doesn't help
Yeah, tried them too. I've narrowed the problem down to building mpi4py, which gets built as a dependency of tensorrt_llm
are you running it in a venv or normally?
Normally
Let me try in venv
mpicc --version
do you get output?
Same error:
root@afabf97a0d57:/workspace# mpicc --version
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
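That /build-result/... path is the HPC-X build prefix baked into the wrappers at compile time; it doesn't exist inside the container, so mpicc can't find its wrapper-data file. Open MPI honors OPAL_PREFIX for relocated installs; a sketch of a possible fix, assuming HPC-X is actually unpacked under /opt/hpcx (verify the path in your image first):
find / -name mpicc-wrapper-data.txt 2>/dev/null   # locate where Open MPI really lives
export OPAL_PREFIX=/opt/hpcx/ompi                 # assumption: adjust to the directory found above
export PATH=$OPAL_PREFIX/bin:$PATH
mpicc --version                                   # should now print the compiler banner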
try with venv
Same error
(bottom part of the attached screenshot ->)
you will probably need to ask on their repo
Okay, thank you
@Papa Madiator, are we doing anything MPI-related when spawning the container on RunPod?
https://github.com/mpi4py/mpi4py/issues/483
Per this, on a clean container from the image I shared, the MPI issue isn't there
@Dhruv Mullick I mean RunPod does not change files in the Docker container
https://discord.com/channels/912829806415085598/948767517332107274/1225899896532504596
With reference to the new error here (reached this point thanks to @aikitoria)
Can we increase the limit? I don't have permissions to do so...
you should be able to stop openmpi from trying to increase it
idk why the variable I posted doesn't work for you
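(The exact variable isn't quoted in the thread. One commonly used workaround for locked-memory complaints in unprivileged containers — an assumption that this is the knob meant — is to disable the InfiniBand transport that wants the raised limit:)
export OMPI_MCA_btl=^openib   # tell Open MPI to skip the openib BTL entirely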
It's not possible, as the containers are not privileged
@aikitoria, did you do an
apt install libopenmpi-dev
as well, if you remember? I'm not sure we should be doing that, based on the GitHub link I shared above
But if I don't, then I get a different set of errors like:
/usr/bin/ld: cannot find -lvt.mpi: No such file or directory
/usr/bin/ld: cannot find -lvt-hyb: No such file or directory
/usr/bin/ld: cannot find -lvt.ompi: No such file or directory
_configtest.c:2:10: fatal error: mpi.h: No such file or directory
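Those linker errors and the missing mpi.h mean mpi4py's build can't find any MPI headers or libraries; a quick way to check which toolchain it would pick up (standard Open MPI wrapper commands):
which mpicc
mpicc -showme            # Open MPI wrapper flag: prints the real compile/link line
dpkg -l | grep openmpi   # confirm libopenmpi-dev (which provides mpi.h) is installed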
https://www.reddit.com/r/LocalLLaMA/comments/1b4iy16/comment/kt2nuee/
I ended up not having any time to mess more with tensorrt-llm
my original goal was to run tritonserver
Worked!
so I made a container off the nvidia one that runpod can launch, here https://discord.com/channels/912829806415085598/1211077936338178129/1211673633727057920
Thanks a lot!! I think the apt-get command, together with the exports you shared, worked for me
I'm on the runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 template. Will have to see if it works with others too
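For anyone retracing this, the rough sequence that worked (the actual exports are only in the linked Discord message; the OPAL_PREFIX line is an assumption based on the sketch earlier in the thread):
apt-get update && apt-get install -y libopenmpi-dev openmpi-bin
export OPAL_PREFIX=/opt/hpcx/ompi   # assumption: wherever HPC-X actually lives in the image
export PATH=$OPAL_PREFIX/bin:$PATH
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com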
if you don't want to run triton that should work just fine
Well. Triton is the goal
Will go through your post
then you should run it in the nvidia container image like I did there yeah
but you have to install trtllm the same way to get the tools to build the engine locally
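A sketch of the engine-build step those tools provide (the model paths are illustrative, and the convert script lives under the per-model examples/ directory in the TensorRT-LLM repo, so the exact invocation varies):
python convert_checkpoint.py --model_dir ./model-hf --output_dir ./ckpt --dtype float16
trtllm-build --checkpoint_dir ./ckpt --output_dir ./engine --gemm_plugin float16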
I didn't get to the step of actually running triton
realized it would be more work than I have time for rn
I definitely want min-p sampling for example
but my feature request died it seems https://github.com/NVIDIA/TensorRT-LLM/issues/1154
it's probably not that hard to add it
except that if I build trtllm myself, the built executable doesn't work
world's least stable software
Does seem that way!
Thanks for helping out here
hi guys - is anyone using Torch-TensorRT?
What are the requirements? It might take time, but if I get them I can try to build one.