jules.dix
RunPod
Created by jules.dix on 10/24/2024 in #⚡|serverless
vLLM error: flash-attn
I get this error — how do I fix it and use vllm-flash-attn, which is faster? The message is: "Current Qwen2-VL implementation has a bug with vllm-flash-attn inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend."
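If it helps, this is roughly how I'm launching the model — a minimal sketch, assuming flash-attn has already been installed with `pip install flash-attn` as the warning suggests, that my vLLM version honors the `VLLM_ATTENTION_BACKEND` environment variable, and using `Qwen/Qwen2-VL-7B-Instruct` purely as an example model name:

```python
# Sketch of my setup (assumptions: `pip install flash-attn` already run,
# VLLM_ATTENTION_BACKEND respected by this vLLM version, example model name).
import os

# Must be set before vLLM chooses its attention backend.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")
out = llm.generate(["Describe the image."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```

Even with this, the warning about the vision module falling back to xformers still appears.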
1 reply