Yeah definitely.
But it should still multithread via time sharing. E.g. even with 1 vCPU I should be able to get 10 threads. But here it seems that I can get at most 2 threads per vCPU.
I would be happy to run, say, 64 in parallel. At some point I will hit RAM and VRAM limits, but that is OK.
I don't understand why I am hitting multithreading limits when there is still RAM and VRAM available.
Thanks. Yeah, the problem is that I hit it with just 20 vLLM running in parallel.
What do you mean by thread swarming? Should I just spin off a number of threads to see what the limit is?
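Something like this is what I have in mind for probing the limit: a rough sketch only, assuming plain Python threads are a fair stand-in for what the workers do. `probe_thread_limit` and the cap of 64 are my own placeholders, not anything from vLLM. It starts idle threads until the OS refuses one or the cap is reached, then reports how many started.

```python
import threading


def probe_thread_limit(cap=64):
    """Start up to `cap` idle threads and return how many actually started."""
    stop = threading.Event()
    threads = []
    try:
        for _ in range(cap):
            # Each thread just blocks on the event, so it uses almost no CPU.
            t = threading.Thread(target=stop.wait, daemon=True)
            t.start()  # raises RuntimeError if a new thread cannot be started
            threads.append(t)
    except RuntimeError:
        # Thread creation was refused; stop probing here.
        pass
    started = len(threads)
    stop.set()  # release all idle threads
    for t in threads:
        t.join()
    return started


if __name__ == "__main__":
    print("threads started:", probe_thread_limit(64))
```

If this reaches 64 on the same machine, the cap I am seeing is probably not a raw OS thread limit, and I should look elsewhere.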