aikitoria
RRunPod
Created by Volko on 4/17/2024 in #⛅|pods
is AWQ faster than GGUF ?
you use aphrodite-engine or TensorRT-LLM (good luck!) for maximum speed on multiple GPUs
9 replies
you use EXL2 for maximum speed on a single GPU
you use GGUF if you want to run on a very small GPU and have to keep part of the model on the CPU; it's for hybrid CPU/GPU inference
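For illustration, a minimal llama.cpp invocation for that hybrid split (the model path and layer count are placeholders; tune `-ngl` to whatever fits in VRAM):

```shell
# Offload 20 of the model's layers to the GPU via -ngl / --n-gpu-layers;
# the remaining layers run on the CPU, so even a small GPU still helps.
./llama-cli -m ./model.gguf -ngl 20 -p "Hello"
```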
I've not used AWQ or GPTQ directly, those are older formats
RRunPod
Created by Dhruv Mullick on 4/5/2024 in #⛅|pods
TensorRT-LLM setup
world's least stable software
53 replies
except when I build trtllm myself, the built executable doesn't work
it's probably not that hard to add it
but my feature request died it seems https://github.com/NVIDIA/TensorRT-LLM/issues/1154
I definitely want min-p sampling for example
realized it would be more work than I have time for rn
I didn't get to the step of actually running triton
but you have to install trtllm the same way to get the tools to build the engine locally
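As a rough sketch of that local engine build (paths are placeholders; the checkpoint-conversion script lives under the model family's folder in the TensorRT-LLM repo's examples/ tree, and the locally installed tensorrt_llm version has to match the runtime):

```shell
# 1. Convert the Hugging Face checkpoint into TensorRT-LLM's format.
python convert_checkpoint.py --model_dir ./llama-hf \
    --output_dir ./trt_ckpt --dtype float16

# 2. Build the engine with trtllm-build, which ships with the
#    tensorrt_llm Python package -- this is the tool a local
#    install provides.
trtllm-build --checkpoint_dir ./trt_ckpt --output_dir ./engine
```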
then you should run it in the nvidia container image like I did there yeah
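A sketch of launching that container (the tag is illustrative; pick the NVIDIA Triton image whose bundled tensorrt_llm matches the version the engine was built with):

```shell
# Mount the prebuilt engine into NVIDIA's Triton + TRT-LLM image.
docker run --gpus all --rm -it \
    -v "$(pwd)/engine:/engine" \
    nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3
```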
if you don't want to run triton that should work just fine
so I made a container off the nvidia one that runpod can launch, here https://discord.com/channels/912829806415085598/1211077936338178129/1211673633727057920
my original goal was to run tritonserver
I ended up not having any time to mess more with tensorrt-llm
idk why the variable I posted doesn't work for you
you should be able to stop openmpi from trying to increase it
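One way to do that, assuming the limit is being raised by OpenMPI's opal_set_max_sys_limits behavior (an assumption; check which MCA parameter applies to your build):

```shell
# Tell OpenMPI not to try raising system limits itself; MCA parameters
# can be set through OMPI_MCA_* environment variables before launch.
export OMPI_MCA_opal_set_max_sys_limits=0
```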