GPU for 13B language model
Just wanted to get your recommendations on GPU choice for running a 13B language model quantized with AWQ or GPTQ. The workload would be around 200-300 requests per hour. I tried a 48 GB A6000 with pretty good results, but I was wondering whether a 24 GB GPU could be up to the task?
8 Replies
Haven't tried that yet, feel free to deploy it too
24 GB should be fine
Best to try it and see
Well, I tried and failed with a CUDA out-of-memory exception. I guess I'll stick with the 48 GB GPU for now.
Which model was it? 13B models with AWQ or GPTQ quantization usually aren't very large.
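Back-of-envelope (a rough sketch; the 4-bit weights and the ~1.5 GB runtime overhead figure are my own assumptions, not measured):

```python
# Back-of-envelope VRAM math for a 4-bit quantized 13B model (rough numbers only).
params = 13e9                      # 13B parameters
weight_gb = params * 4 / 8 / 1e9   # 4-bit weights ~= 6.5 GB
overhead_gb = 1.5                  # rough guess: CUDA context, activations, quant scales
vram_gb = 24                       # the card in question
kv_cache_gb = vram_gb - weight_gb - overhead_gb
print(f"weights ~{weight_gb:.1f} GB, leaving ~{kv_cache_gb:.1f} GB for KV cache")
```

So the weights alone should fit comfortably; it's usually the KV cache allocation that triggers the OOM.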
You might need to set an environment variable for these LLMs to keep them from eating up too much memory. I remember having a similar experience, and there are usually configs to help with that if you want it to use less memory.
Yeah, for GPTQ I had to set GPU_MEMORY_UTILIZATION to 0.80 instead of the default of 0.95.
Thanks, will give it a try!
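For anyone landing here later: if the deployment is vLLM-backed, that uppercase env var presumably maps to vLLM's gpu_memory_utilization engine argument. A minimal sketch of setting it directly in Python (the checkpoint name is just an example, not necessarily the model from this thread):

```python
from vllm import LLM, SamplingParams

# Cap vLLM's GPU memory use at 80% so the KV cache allocation
# doesn't push a 24 GB card over the edge at load time.
llm = LLM(
    model="TheBloke/Llama-2-13B-GPTQ",  # example GPTQ checkpoint
    quantization="gptq",
    gpu_memory_utilization=0.80,        # lower than the 0.95 mentioned above
)

params = SamplingParams(max_tokens=128)
outputs = llm.generate(["Hello, world"], params)
print(outputs[0].outputs[0].text)
```

Lowering gpu_memory_utilization shrinks the pre-allocated KV cache rather than the model itself, so it trades maximum concurrent context for headroom against OOM.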