There's inconsistency in performance (Pod)

Hello. I rent and operate 20 RTX 4090 GPUs all day long, but their inference speeds differ significantly. Each line in the table in the attached image represents a pod with 2 RTX 4090 GPUs. One pod processes 150 images in 3 minutes, while the rest only manage 50-80. On my own 2-way RTX 4090 server that I purchased directly, the throughput is 180 images in 3 minutes. I haven't been able to figure out why these speed differences occur. The inference task is generating a single image.
streamize
streamize3w ago
I used Community Cloud. There are significant performance variations each time an instance is created.
nerdylive
nerdylive3w ago
Do you have the pod IDs?
streamize
streamize3w ago
I'm currently looking for a suitable GPU provider, but in RunPod's case the performance variance is too severe. I've also tested Vast.ai, and such performance instability hardly ever occurs there. I need to be ready for a situation where I'll have to rent 100-200 RTX 4090 GPUs in the future, so this problem needs to be resolved. The poorly performing pod IDs are qqqvn244j95e71, eeqpk5j05wyz6j, and kts3emj3q087ee. Pod ifjn32k8a0hru1 performs well.
Poddy
Poddy3w ago
@streamize
Escalated To Zendesk
The thread has been escalated to Zendesk!
nerdylive
nerdylive3w ago
Hey, does the ticket look right?
streamize
streamize3w ago
Yes, but I didn't use my RunPod account email. Is that a problem? I created the ticket with a different email.
nerdylive
nerdylive3w ago
Ooh, alright. I'm wondering, are those pods the same specification?
streamize
streamize3w ago
Yes, right. I used the same Docker image with the same spec.
nerdylive
nerdylive3w ago
The CPU model and RAM amount? I'm guessing those are the factors, or the SSD, or maybe the network connection (depending on how your app works). But most likely it's the specs of the server.
streamize
streamize3w ago
The CPU and RAM, as you know, aren't directly specified by me and always come out differently. When I compared against the pods with good performance, some of them actually had lower VRAM and weaker CPUs, so I've held off on blaming VRAM or the CPU.

We use a system where tasks are stacked in a queue for processing. In our server source code, pending states where the next task can't start because of network delays are handled separately. A task in the queue counts as completed based on the inference alone (delivery to the user isn't included), so I'm not currently considering this a network connection problem. If it were a network issue, I'd expect all pods to show a uniform drop in processing capacity, yet some pods are working fine. So I haven't been able to identify the cause yet, haha... This problem didn't occur on Vast.ai, so now I have a headache...

If we assume the SSD is the problem, what exactly would the issue be? Is it something I can control? When creating pods I only enter the storage capacity, and I entered the same value for all of them, so I don't understand what could differ. There is roughly a 2x difference in generation speed between the low-performance instances and the rest, which is a very significant gap.
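For context, the completion criterion works roughly like this (a minimal, self-contained sketch, not the actual server code; run_inference and the queue layout are placeholders):

```python
import queue
import threading
import time

task_queue: queue.Queue = queue.Queue()

def run_inference(prompt: str) -> bytes:
    # Placeholder for the real image-generation call (hypothetical).
    time.sleep(0.5)
    return b"fake-image-bytes"

def worker() -> None:
    while True:
        task = task_queue.get()
        t0 = time.perf_counter()
        image = run_inference(task["prompt"])
        # A task counts as "done" the moment inference finishes;
        # delivery to the user is handled and timed separately.
        task["infer_seconds"] = time.perf_counter() - t0
        print(f"task {task['id']}: inference {task['infer_seconds']:.2f}s")
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
for i in range(3):
    task_queue.put({"id": i, "prompt": "one image"})
task_queue.join()
```

Because only the inference interval is measured, network or delivery slowness on a pod shouldn't show up as lower throughput in these numbers.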
nerdylive
nerdylive3w ago
Not really something you control. SSDs wear out, right? An older SSD can yield worse performance and lower throughput, just like GPUs can. Yeah, I agree.
streamize
streamize3w ago
Is that true even though inference runs entirely on the GPU, with the model already loaded in VRAM?
nerdylive
nerdylive3w ago
I'm not sure what the real issue is here. Oh, you're keeping the same model in VRAM? Then it's most likely the GPU or CPU. Have you also checked nvidia-smi? Maybe the GPUs are power limited. Just guessing, but I'm sure staff can check more about the machine. How does your app receive input for its inference? What about Secure Cloud, have you ever tried that too?
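One quick way to compare pods is to dump each GPU's enforced power limit and SM clocks and see whether the slow pods are capped lower. A sketch using the pynvml bindings (install as nvidia-ml-py); the exact fields to compare are just a suggestion:

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    h = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(h)
    if isinstance(name, bytes):  # older pynvml versions return bytes
        name = name.decode()
    limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(h) / 1000          # mW -> W
    default_w = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(h) / 1000
    sm_mhz = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_SM)
    max_sm_mhz = pynvml.nvmlDeviceGetMaxClockInfo(h, pynvml.NVML_CLOCK_SM)
    print(f"GPU {i} {name}: power {limit_w:.0f}W (default {default_w:.0f}W), "
          f"SM clock {sm_mhz}MHz (max {max_sm_mhz}MHz)")
pynvml.nvmlShutdown()
```

If the enforced limit is well below the default, or the SM clock sits far under its maximum during load, the host is likely power- or thermally-limited.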
streamize
streamize3w ago
The same model is kept in VRAM, and the actions performed by each instance are 100% identical: they use the same Docker image and receive the same requests from clients.

What I'm curious about is the quality of the machines in Community Cloud. As far as I know, those GPUs come from an unspecified number of hosts. Does RunPod have internal criteria for determining suitability for hosting? As you mentioned, if it's a power issue, GPU performance might drop. Are there no management standards for this?

The Docker image hosts a Socket.IO server. It receives messages from users, generates images, and when generation is complete it sends the generated image back as a base64 string. As you suggested, I've now set up hosting to test Secure Cloud; because of the price difference, I had been using Community Cloud.
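The serving flow is roughly this shape (a minimal sketch with python-socketio and aiohttp; the event names and generate_image call are hypothetical, not the actual Docker image's code):

```python
import base64
import time

import socketio
from aiohttp import web

sio = socketio.AsyncServer(async_mode="aiohttp", cors_allowed_origins="*")
app = web.Application()
sio.attach(app)

def generate_image(prompt: str) -> bytes:
    # Placeholder for the real GPU inference call (hypothetical); in practice
    # this would likely run in an executor so it doesn't block the event loop.
    time.sleep(1.0)
    return b"\x89PNG\r\n\x1a\n"

@sio.event
async def generate(sid, data):
    t0 = time.perf_counter()
    png = generate_image(data.get("prompt", ""))
    infer_s = time.perf_counter() - t0
    # The image goes back as a base64 string; inference time is reported
    # separately so slow delivery never looks like slow inference.
    await sio.emit(
        "result",
        {"image": base64.b64encode(png).decode(), "infer_seconds": infer_s},
        to=sid,
    )

if __name__ == "__main__":
    web.run_app(app, port=8000)
```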
nerdylive
nerdylive3w ago
There is, but I don't know whether they run any periodic checks for that. Oh okay, nice. If you don't mind, update me here; I'm curious what could be causing what you're experiencing.