There's inconsistency in performance (Pod)

Hello. I rent and operate 20 RTX 4090 GPUs all day long, but their inference speeds differ significantly. Each line in the table in the attached image represents a pod with 2 RTX 4090 GPUs. One pod processes 150 images in 3 minutes, while the rest only manage 50-80. On my own 2-way RTX 4090 server that I purchased directly, the throughput is 180 images in 3 minutes. I haven't been able to figure out why these speed differences occur. The inference task is generating a single image.
streamize
streamize3w ago
I used Community Cloud. There are significant performance variations each time an instance is created.
nerdylive
nerdylive3w ago
Do you have the pod IDs?
streamize
streamize3w ago
I'm currently looking for a suitable GPU provider, but in RunPod's case the performance variance is too severe. I've also tested Vast.ai, and such performance instability hardly ever occurs there. I need to be ready for a situation where I'll have to rent 100-200 RTX 4090 GPUs in the future, so this problem needs to be resolved. The poorly performing pod IDs are qqqvn244j95e71, eeqpk5j05wyz6j, and kts3emj3q087ee. Pod ifjn32k8a0hru1 performs well.
Poddy
Poddy3w ago
@streamize
Escalated To Zendesk
The thread has been escalated to Zendesk!
nerdylive
nerdylive3w ago
Hey, does the ticket look right?
streamize
streamize3w ago
Yes, but I didn't use my RunPod account email. Is that a problem? I created the ticket with a different email.
nerdylive
nerdylive3w ago
Ooh, alright. I'm wondering, are those pods the same specification?
streamize
streamize3w ago
Yes, right. I used the same Docker image with the same spec.
nerdylive
nerdylive3w ago
The CPU model and RAM amount? I'm guessing those are the factors, or the SSD, or maybe the network connection (depending on how your app works). But most likely it's the specs of the server.
streamize
streamize3w ago
The CPU and RAM, as you know, aren't directly specified by me and always come out differently. When I compared against the pods with good performance, some of them actually had lower VRAM and weaker CPUs, so I've held off on blaming VRAM or the CPU.

We use a system where tasks are stacked in a queue for processing. In our server source code, pending states where the next task can't start because of network delays are handled separately. A task in the queue counts as completed based on the inference alone (delivery to the user isn't included), so I'm not currently considering this a network connection problem. If it were a network issue, I'd expect all pods to show a uniform drop in processing capacity, yet some pods are working fine. So I haven't been able to identify the cause yet, haha... This problem didn't occur on Vast.ai, so now I have a headache...

If we assume the SSD is the problem, what exactly would the issue be? Is it something I can control? When creating pods I only enter the storage capacity, and I entered the same value for all of them, so I don't understand what could differ. There is roughly a 2x difference in generation speed between the low-performance instances and the rest, which is a very significant gap.
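For context, the completion criterion works roughly like this (a minimal, self-contained sketch, not the actual server code; run_inference and the queue layout are placeholders):

```python
import queue
import threading
import time

task_queue: queue.Queue = queue.Queue()

def run_inference(prompt: str) -> bytes:
    # Placeholder for the real image-generation call (hypothetical).
    time.sleep(0.5)
    return b"fake-image-bytes"

def worker() -> None:
    while True:
        task = task_queue.get()
        t0 = time.perf_counter()
        image = run_inference(task["prompt"])
        # A task counts as "done" the moment inference finishes;
        # delivery to the user is handled and timed separately.
        task["infer_seconds"] = time.perf_counter() - t0
        print(f"task {task['id']}: inference {task['infer_seconds']:.2f}s")
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
for i in range(3):
    task_queue.put({"id": i, "prompt": "one image"})
task_queue.join()
```

Because only the inference interval is measured, network or delivery slowness on a pod shouldn't show up as lower throughput in these numbers.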
nerdylive
nerdylive3w ago
Not really something you control. SSDs wear out, right? An older SSD can yield worse performance and lower throughput, just like GPUs can. Yeah, I agree.
streamize
streamize3w ago
Is that true even though inference runs entirely on the GPU, with the model already loaded in VRAM?
nerdylive
nerdylive3w ago
I'm not sure what the real issue is here. Oh, you're keeping the same model in VRAM? Then it's most likely the GPU or CPU. Have you also checked nvidia-smi? Maybe the GPUs are power limited. Just guessing, but I'm sure staff can check more about the machine. How does your app receive input for its inference? What about Secure Cloud, have you ever tried that too?
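One quick way to compare pods is to dump each GPU's enforced power limit and SM clocks and see whether the slow pods are capped lower. A sketch using the pynvml bindings (install as nvidia-ml-py); the exact fields to compare are just a suggestion:

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    h = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(h)
    if isinstance(name, bytes):  # older pynvml versions return bytes
        name = name.decode()
    limit_w = pynvml.nvmlDeviceGetEnforcedPowerLimit(h) / 1000          # mW -> W
    default_w = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(h) / 1000
    sm_mhz = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_SM)
    max_sm_mhz = pynvml.nvmlDeviceGetMaxClockInfo(h, pynvml.NVML_CLOCK_SM)
    print(f"GPU {i} {name}: power {limit_w:.0f}W (default {default_w:.0f}W), "
          f"SM clock {sm_mhz}MHz (max {max_sm_mhz}MHz)")
pynvml.nvmlShutdown()
```

If the enforced limit is well below the default, or the SM clock sits far under its maximum during load, the host is likely power- or thermally-limited.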
streamize
streamize3w ago
The same model is kept in VRAM, and the actions performed by each instance are 100% identical: they use the same Docker image and receive the same requests from clients.

What I'm curious about is the quality of the machines in Community Cloud. As far as I know, those GPUs come from an unspecified number of hosts. Does RunPod have internal criteria for determining suitability for hosting? As you mentioned, if it's a power issue, GPU performance might drop. Are there no management standards for this?

The Docker image hosts a Socket.IO server. It receives messages from users, generates images, and when generation is complete it sends the generated image back as a base64 string. As you suggested, I've now set up hosting to test Secure Cloud; because of the price difference, I had been using Community Cloud.
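The serving flow is roughly this shape (a minimal sketch with python-socketio and aiohttp; the event names and generate_image call are hypothetical, not the actual Docker image's code):

```python
import base64
import time

import socketio
from aiohttp import web

sio = socketio.AsyncServer(async_mode="aiohttp", cors_allowed_origins="*")
app = web.Application()
sio.attach(app)

def generate_image(prompt: str) -> bytes:
    # Placeholder for the real GPU inference call (hypothetical); in practice
    # this would likely run in an executor so it doesn't block the event loop.
    time.sleep(1.0)
    return b"\x89PNG\r\n\x1a\n"

@sio.event
async def generate(sid, data):
    t0 = time.perf_counter()
    png = generate_image(data.get("prompt", ""))
    infer_s = time.perf_counter() - t0
    # The image goes back as a base64 string; inference time is reported
    # separately so slow delivery never looks like slow inference.
    await sio.emit(
        "result",
        {"image": base64.b64encode(png).decode(), "infer_seconds": infer_s},
        to=sid,
    )

if __name__ == "__main__":
    web.run_app(app, port=8000)
```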
nerdylive
nerdylive3w ago
There is, but I don't know whether they run any periodic checks for that. Oh okay, nice. If you don't mind, update me here; I'm curious what could be causing what you're experiencing.