aikitoria
RRunPod
Created by Volko on 4/17/2024 in #⛅|pods
is AWQ faster than GGUF ?
you use aphrodite-engine or TensorRT-LLM (good luck!) for maximum speed on multiple GPUs
9 replies
you use EXL2 for maximum speed on a single GPU
you use GGUF if you want to run on a very small GPU and have to keep part of the model on the CPU; it's for hybrid CPU/GPU inference
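For illustration, a minimal llama.cpp invocation for that hybrid split (the model path and layer count are placeholders; tune `-ngl` to whatever fits in VRAM):

```shell
# Offload 20 of the model's layers to the GPU via -ngl / --n-gpu-layers;
# the remaining layers run on the CPU, so even a small GPU still helps.
./llama-cli -m ./model.gguf -ngl 20 -p "Hello"
```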
I've not used AWQ or GPTQ directly, those are older formats
RRunPod
Created by Dhruv Mullick on 4/5/2024 in #⛅|pods
TensorRT-LLM setup
world's least stable software
53 replies
except when I build trtllm myself, the built executable doesn't work
it's probably not that hard to add it
but my feature request died it seems https://github.com/NVIDIA/TensorRT-LLM/issues/1154
I definitely want min-p sampling for example
realized it would be more work than I have time for rn
I didn't get to the step of actually running triton
but you have to install trtllm the same way to get the tools to build the engine locally
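As a rough sketch of that local engine build (paths are placeholders; the checkpoint-conversion script lives under the model family's folder in the TensorRT-LLM repo's examples/ tree, and the locally installed tensorrt_llm version has to match the runtime):

```shell
# 1. Convert the Hugging Face checkpoint into TensorRT-LLM's format.
python convert_checkpoint.py --model_dir ./llama-hf \
    --output_dir ./trt_ckpt --dtype float16

# 2. Build the engine with trtllm-build, which ships with the
#    tensorrt_llm Python package -- this is the tool a local
#    install provides.
trtllm-build --checkpoint_dir ./trt_ckpt --output_dir ./engine
```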
then you should run it in the nvidia container image like I did there yeah
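A sketch of launching that container (the tag is illustrative; pick the NVIDIA Triton image whose bundled tensorrt_llm matches the version the engine was built with):

```shell
# Mount the prebuilt engine into NVIDIA's Triton + TRT-LLM image.
docker run --gpus all --rm -it \
    -v "$(pwd)/engine:/engine" \
    nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
```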
if you don't want to run triton that should work just fine
so I made a container off the nvidia one that runpod can launch, here https://discord.com/channels/912829806415085598/1211077936338178129/1211673633727057920
my original goal was to run tritonserver
I ended up not having any time to mess more with tensorrt-llm
idk why the variable I posted doesn't work for you
you should be able to stop openmpi from trying to increase it
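One way to do that, assuming the limit is being raised by OpenMPI's opal_set_max_sys_limits behavior (an assumption; check which MCA parameter applies to your build):

```shell
# Tell OpenMPI not to try raising system limits itself; MCA parameters
# can be set through OMPI_MCA_* environment variables before launch.
export OMPI_MCA_opal_set_max_sys_limits=0
```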