TensorRT-LLM setup
Has anyone been able to successfully install tensorrt_llm?
I'm trying with pip, but I'm running into MPI-related errors:
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
I've tried a few templates (runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04; nvcr.io/nvidia/tritonserver:24.03-trtllm-python-py3) on A100 and on a 4090.
CUDA 12.2
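For context, the install route is roughly NVIDIA's documented pip one (the exact flags here are an assumption, not quoted from the thread):
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com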
tried:
apt-get install libopenmpi-dev openmpi-bin
Doesn't work unfortunately
Tried uninstalling and reinstalling as well, but that doesn't help
Yeah, tried them too. I've narrowed the problem down to building mpi4py, which gets built as a dependency of tensorrt_llm
are you running it in a venv or normally?
Normally
Let me try in venv
mpicc --version
do you get output?
Same error:
root@afabf97a0d57:/workspace# mpicc --version
Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt
Error parsing data file mpicc: Not found
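That /build-result/... path is the HPC-X build prefix baked into the wrappers at compile time; it doesn't exist inside the container, so mpicc can't find its wrapper-data file. Open MPI honors OPAL_PREFIX for relocated installs; a sketch of a possible fix, assuming HPC-X is actually unpacked under /opt/hpcx (verify the path in your image first):
find / -name mpicc-wrapper-data.txt 2>/dev/null   # locate where Open MPI really lives
export OPAL_PREFIX=/opt/hpcx/ompi                 # assumption: adjust to the directory found above
export PATH=$OPAL_PREFIX/bin:$PATH
mpicc --version                                   # should now print the compiler banner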
try with venv
Same error
(bottom part of the attached screenshot ->)
you will probably need to ask on their repo
Okay, thank you
@Papa Madiator, are we doing anything MPI-related when spawning the container on RunPod?
https://github.com/mpi4py/mpi4py/issues/483
Per this, on a clean container from the image I shared, the MPI issue isn't there
@Dhruv Mullick I mean RunPod does not change files in the Docker container
https://discord.com/channels/912829806415085598/948767517332107274/1225899896532504596
With reference to the new error here (reached this point thanks to @aikitoria)
Can we increase the limit? I don't have permissions to do so...
you should be able to stop openmpi from trying to increase it
idk why the variable I posted doesn't work for you
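(The exact variable isn't quoted in the thread. One commonly used workaround for locked-memory complaints in unprivileged containers — an assumption that this is the knob meant — is to disable the InfiniBand transport that wants the raised limit:)
export OMPI_MCA_btl=^openib   # tell Open MPI to skip the openib BTL entirely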
It's not possible, as the containers are not privileged
@aikitoria, did you do an
apt install libopenmpi-dev
as well, if you remember? I'm not sure we should be doing that, based on the GitHub link I shared above
But if I don't, then I get a different set of errors like:
/usr/bin/ld: cannot find -lvt.mpi: No such file or directory
/usr/bin/ld: cannot find -lvt-hyb: No such file or directory
/usr/bin/ld: cannot find -lvt.ompi: No such file or directory
_configtest.c:2:10: fatal error: mpi.h: No such file or directory
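Those linker errors and the missing mpi.h mean mpi4py's build can't find any MPI headers or libraries; a quick way to check which toolchain it would pick up (standard Open MPI wrapper commands):
which mpicc
mpicc -showme            # Open MPI wrapper flag: prints the real compile/link line
dpkg -l | grep openmpi   # confirm libopenmpi-dev (which provides mpi.h) is installed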
https://www.reddit.com/r/LocalLLaMA/comments/1b4iy16/comment/kt2nuee/
I ended up not having any time to mess more with tensorrt-llm
my original goal was to run tritonserver
Worked!
so I made a container off the nvidia one that runpod can launch, here https://discord.com/channels/912829806415085598/1211077936338178129/1211673633727057920
Thanks a lot!! I think the apt-get command, together with the exports you shared, worked for me
I'm on the runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 template. Will have to see if it works with others too
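For anyone retracing this, the rough sequence that worked (the actual exports are only in the linked Discord message; the OPAL_PREFIX line is an assumption based on the sketch earlier in the thread):
apt-get update && apt-get install -y libopenmpi-dev openmpi-bin
export OPAL_PREFIX=/opt/hpcx/ompi   # assumption: wherever HPC-X actually lives in the image
export PATH=$OPAL_PREFIX/bin:$PATH
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com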
if you don't want to run triton that should work just fine
Well. Triton is the goal
Will go through your post
then you should run it in the nvidia container image like I did there yeah
but you have to install trtllm the same way to get the tools to build the engine locally
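A sketch of the engine-build step those tools provide (the model paths are illustrative, and the convert script lives under the per-model examples/ directory in the TensorRT-LLM repo, so the exact invocation varies):
python convert_checkpoint.py --model_dir ./model-hf --output_dir ./ckpt --dtype float16
trtllm-build --checkpoint_dir ./ckpt --output_dir ./engine --gemm_plugin float16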
I didn't get to the step of actually running triton
realized it would be more work than I have time for rn
I definitely want min-p sampling for example
but my feature request died it seems https://github.com/NVIDIA/TensorRT-LLM/issues/1154
it's probably not that hard to add it
except that if I build trtllm myself, the built executable doesn't work
world's least stable software
Does seem that way!
Thanks for helping out here
hi guys - is anyone using Torch-TensorRT?
What are the requirements? It might take time, but if I get them I can try to build one.