RunPod
Created by Abdelrhman Nile on 4/16/2025 in #⚡|serverless
Serverless VLLM concurrency issue
81 replies

Abdelrhman Nile
will test that
same model, same benchmark, but on a GCP A100 40 GB VRAM machine
also, the script sent 1000 requests
only 857 were successful
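For reference, a minimal version of that kind of load test looks like the sketch below. It assumes the OpenAI-compatible route the RunPod vLLM worker exposes (https://api.runpod.ai/v2/<endpoint_id>/openai/v1); the endpoint ID, API key, model name, prompt, and concurrency level are placeholders to adjust.

```python
# Sketch of a concurrency test against a RunPod serverless vLLM endpoint,
# assuming the worker's OpenAI-compatible route. Placeholders throughout.
import asyncio

from openai import AsyncOpenAI

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
MODEL = "your-model-name"          # should match the model served by the worker

client = AsyncOpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=API_KEY,
)

async def one_request(sem: asyncio.Semaphore) -> bool:
    """Send one chat completion; return True on success, False on any error."""
    async with sem:
        try:
            await client.chat.completions.create(
                model=MODEL,
                messages=[{"role": "user", "content": "Say hello."}],
                max_tokens=64,
                timeout=120,
            )
            return True
        except Exception:
            return False

async def main(total: int = 1000, concurrency: int = 50) -> None:
    sem = asyncio.Semaphore(concurrency)
    results = await asyncio.gather(*(one_request(sem) for _ in range(total)))
    print(f"{sum(results)}/{total} requests succeeded")

if __name__ == "__main__":
    asyncio.run(main())
```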
configuration was max workers = 3
and i was NOT setting the default batch size; it was left on the default, which i believe is 50
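For context, the knobs involved here are roughly these (names as in the worker-vllm README as I recall it; exact names and defaults are version-dependent):
- DEFAULT_BATCH_SIZE (env var on the worker): batch size for token streaming, documented default 50
- MAX_CONCURRENCY (env var on the worker): how many jobs a single worker will take concurrently, reportedly defaulting to 300
- Max Workers (endpoint setting, not an env var): how many workers the endpoint can scale out to; here it was 3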
i kinda did that with the vLLM benchmark serving script, let me share the results with you
but let me try it one more time to confirm
are you sure? i tried it with 5, 10, 50, 256 and i got the same behaviour
but both requests' statuses appear as IN_PROGRESS
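To make that check concrete, this is the kind of probe involved (sketch; it uses the documented /run and /status routes, and the input payload shape is an assumption that should be matched to whatever the worker actually expects):

```python
# Sketch: submit two jobs via /run, then poll /status for both to see whether
# they are IN_PROGRESS at the same time. Endpoint ID, API key, and the input
# payload shape are placeholders/assumptions.
import time
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
PAYLOAD = {"input": {"prompt": "Say hello.", "sampling_params": {"max_tokens": 64}}}

def submit() -> str:
    """POST /run and return the job id."""
    r = requests.post(f"{BASE}/run", json=PAYLOAD, headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()["id"]

job_ids = [submit(), submit()]

for _ in range(60):
    statuses = [
        requests.get(f"{BASE}/status/{jid}", headers=HEADERS, timeout=30).json()["status"]
        for jid in job_ids
    ]
    print(statuses)  # e.g. ['IN_PROGRESS', 'IN_QUEUE'] vs ['IN_PROGRESS', 'IN_PROGRESS']
    if all(s in ("COMPLETED", "FAILED", "CANCELLED") for s in statuses):
        break
    time.sleep(2)
```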
same behavior of not handling multiple requests concurrently, with the default batch size set to 50 and to 256
i tried it with 50 and 256
i am setting the default batch size to 1 because i noticed streaming used to send very big chunks of tokens
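As I understand it, that batch setting controls how many tokens the worker groups into each stream update, so a batch size of 1 gives near per-token updates instead of large chunks. A rough way to observe it is to poll the /stream route and count how many parts arrive per poll (sketch; endpoint ID, API key, and the input payload shape, including the "stream" field, are placeholders/assumptions):

```python
# Sketch: poll the /stream route and print how many stream parts arrive per
# poll. With a larger default batch size each part carries many tokens; with
# a batch size of 1 the updates are much more granular.
import time
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# "stream": True is an assumption about the worker's input schema.
job = requests.post(
    f"{BASE}/run",
    json={"input": {"prompt": "Write a short poem.", "stream": True}},
    headers=HEADERS,
    timeout=30,
).json()

while True:
    resp = requests.get(f"{BASE}/stream/{job['id']}", headers=HEADERS, timeout=60).json()
    parts = resp.get("stream", [])
    print(f"status={resp['status']} parts_this_poll={len(parts)}")
    if resp["status"] in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(0.5)
```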
right now it is configured to only have one worker
logs when sending 2 requests
no i mean i was configuring the endpoint to scale up to multiple workers if needed
tried both
2 and 3
i'll do some benchmarks and provide you with the numbers
yes
the first request starts streaming, but the second request, from another client, always starts only after the first one finishes
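A minimal reproduction of that symptom, assuming the worker's OpenAI-compatible route, is to start two streaming requests at the same time and log when each one's first token arrives; if requests are being serialized, the second client's first token only shows up after the first stream completes. Endpoint ID, API key, and model name below are placeholders.

```python
# Sketch: two clients stream from the endpoint concurrently and log when their
# first token arrives and when they finish, to show whether the second request
# is held until the first one completes.
import asyncio
import time

from openai import AsyncOpenAI

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
MODEL = "your-model-name"          # placeholder

client = AsyncOpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=API_KEY,
)

async def stream_one(tag: str, start: float) -> None:
    stream = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Count slowly to twenty."}],
        max_tokens=256,
        stream=True,
    )
    first = True
    async for _chunk in stream:
        if first:
            print(f"[{tag}] first token after {time.monotonic() - start:.1f}s")
            first = False
    print(f"[{tag}] finished after {time.monotonic() - start:.1f}s")

async def main() -> None:
    start = time.monotonic()
    await asyncio.gather(stream_one("client-1", start), stream_one("client-2", start))

asyncio.run(main())
```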