Abdelrhman Nile
RRunPod
Created by Abdelrhman Nile on 4/16/2025 in #⚡|serverless
Serverless VLLM concurrency issue
Will test that.
81 replies
Same model, same benchmark, but on a GCP A100 40 GB VRAM machine:
============ Serving Benchmark Result ============
Successful requests: 1000
Benchmark duration (s): 346.74
Total input tokens: 1024000
Total generated tokens: 70328
Request throughput (req/s): 2.88
Output token throughput (tok/s): 202.83
Total Token throughput (tok/s): 3156.09
---------------Time to First Token----------------
Mean TTFT (ms): 172033.53
Median TTFT (ms): 178518.65
P99 TTFT (ms): 326714.81
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 357.45
Median TPOT (ms): 271.08
P99 TPOT (ms): 1728.97
---------------Inter-token Latency----------------
Mean ITL (ms): 263.52
Median ITL (ms): 151.98
P99 ITL (ms): 1228.35
==================================================
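As a sanity check (an illustrative sketch, not part of the original benchmark output), the headline throughput figures follow directly from the request count, token totals, and duration reported above:

```python
# Recompute the throughput numbers from the raw totals in the
# benchmark report above.
successful_requests = 1000
duration_s = 346.74
total_input_tokens = 1_024_000
total_generated_tokens = 70_328

request_throughput = successful_requests / duration_s          # req/s
output_token_throughput = total_generated_tokens / duration_s  # tok/s
total_token_throughput = (
    total_input_tokens + total_generated_tokens
) / duration_s

print(f"{request_throughput:.2f} req/s")       # ≈ 2.88
print(f"{output_token_throughput:.2f} tok/s")  # ≈ 202.83
# ≈ 3156; tiny differences from the reported value come from the
# duration being rounded to two decimals in the report.
print(f"{total_token_throughput:.2f} tok/s")
```

Also note that 1,024,000 input tokens over 1000 requests means each request carried a 1024-token prompt.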
Also, the script sent 1000 requests and only 857 were successful.
============ Serving Benchmark Result ============
Successful requests: 857
Benchmark duration (s): 95.82
Total input tokens: 877568
Total generated tokens: 68965
Request throughput (req/s): 8.94
Output token throughput (tok/s): 719.70
Total Token throughput (tok/s): 9877.74
---------------Time to First Token----------------
Mean TTFT (ms): 42451.61
Median TTFT (ms): 42317.61
P99 TTFT (ms): 77811.55
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 472.19
Median TPOT (ms): 190.87
P99 TPOT (ms): 3881.05
---------------Inter-token Latency----------------
Mean ITL (ms): 182.12
Median ITL (ms): 0.01
P99 ITL (ms): 4703.27
==================================================
The configuration was max workers = 3, and I was NOT setting the default batch size; it was left at the default, which I believe is 50.
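For reference, a sketch of the endpoint configuration described above, assuming RunPod's vLLM worker template exposes these as environment variables (variable names are per that template; treat them as assumptions if your template differs):

```shell
# Endpoint environment variables (RunPod vLLM worker template).
# DEFAULT_BATCH_SIZE controls how many tokens are flushed per
# streamed chunk; it defaults to 50 when unset, as in this run.
DEFAULT_BATCH_SIZE=50

# Max workers is set in the endpoint's scaling settings in the
# console, not via an env var: max workers = 3 for this benchmark.
```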
I kind of did that with the vLLM benchmark serving script; let me share the results with you.
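For anyone reproducing this, an invocation of vLLM's `benchmark_serving.py` consistent with the numbers above (1000 prompts of 1024 input tokens each) would look roughly like the following. This is a sketch: exact flag names vary by vLLM version, and the base URL and model name are placeholders, not values from the thread:

```shell
# Sketch of a vLLM serving benchmark run against an OpenAI-compatible
# endpoint; --base-url and --model are placeholders.
python benchmarks/benchmark_serving.py \
    --backend openai \
    --base-url http://localhost:8000 \
    --model my-model \
    --dataset-name random \
    --random-input-len 1024 \
    --num-prompts 1000
```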
But let me try it one more time to confirm.
Are you sure? I tried it with 5, 10, 50, and 256, and I got the same behaviour.
But both requests' status appears as IN PROGRESS.
Same behavior of not handling multiple requests with the default batch size set to 50 and to 256.
I tried it with 50 and 256.
I am setting the default batch size to 1 because I noticed that streaming used to send very big chunks of tokens.
Right now it is configured to only have one worker.
Logs when sending 2 requests:
No, I mean I was configuring the endpoint to scale up to multiple workers if needed.
Tried both.
2 and 3
I'll do some benchmarks and provide you with the numbers.
Yes: the first request starts streaming, and a second request from another client always starts after the first one finishes.
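A quick way to distinguish true concurrency from serialized handling is to time two overlapping requests from separate clients: if the wall-clock time for both is close to one request's latency, they ran in parallel; if it is close to the sum, they were serialized. A minimal sketch of that timing logic, where `send_request` is a hypothetical stand-in simulated with `asyncio.sleep` (replace it with a real call to the endpoint):

```python
import asyncio
import time

async def send_request(delay: float = 0.5) -> None:
    # Stand-in for a real streaming request to the endpoint;
    # swap the sleep for an actual HTTP call.
    await asyncio.sleep(delay)

async def probe() -> float:
    start = time.monotonic()
    # Fire two "requests" at the same time, as two clients would.
    await asyncio.gather(send_request(), send_request())
    return time.monotonic() - start

elapsed = asyncio.run(probe())
# ~0.5 s means the two ran concurrently; ~1.0 s would mean they
# were serialized, which is the behavior described above.
print(f"elapsed: {elapsed:.2f}s")
```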