vLLM error: flash-attn

I get this warning; how can I fix it and use vllm-flash-attn, which is faster?

> Current Qwen2-VL implementation has a bug with vllm-flash-attn inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
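For context, here is a minimal sketch of the workaround the warning suggests: install the standalone `flash-attn` package and ask vLLM for the flash-attention backend via the `VLLM_ATTENTION_BACKEND` environment variable. The model name `Qwen/Qwen2-VL-7B-Instruct` and the assumption that your vLLM build honors this variable for the vision module are mine, not from the warning; treat this as an illustration, not a confirmed fix.

```python
# Assumes you have already run:  pip install flash-attn --no-build-isolation
import os

# Must be set before vLLM picks its attention backend (i.e. before the LLM is built).
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM, SamplingParams

# Hypothetical example model; substitute the Qwen2-VL checkpoint you are serving.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=64)

out = llm.generate(["Describe flash attention in one sentence."], params)
print(out[0].outputs[0].text)
```

If the vision module still logs the xformers fallback after installing `flash-attn`, the bug mentioned in the warning is likely still present in your vLLM version, and xformers remains the intended fallback.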