streamize
RunPod
•Created by streamize on 1/17/2025 in #⚡|serverless
EU-RO-1 region serverless H100 GPU not available ....
You mean that there are no H100 GPUs in EU-RO-1, right?
4 replies
There's inconsistency in performance ( POD )
The Docker image hosts a Socket.IO server. It receives messages from users, generates images, and when generation is complete, it sends the generated image back to the user as a base64 string.
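For reference, a minimal sketch of that flow, assuming python-socketio with hypothetical event names ("generate" / "generated") and a placeholder in place of the actual model call:

```python
import base64
import socketio   # pip install python-socketio
import eventlet   # pip install eventlet

sio = socketio.Server(cors_allowed_origins="*")
app = socketio.WSGIApp(sio)

def generate_image(prompt: str) -> bytes:
    # Placeholder for the actual model inference; returns raw image bytes.
    return ("fake image for: " + prompt).encode()

@sio.on("generate")                      # hypothetical event name
def on_generate(sid, data):
    image_bytes = generate_image(data.get("prompt", ""))
    # When generation is complete, send the image back to the requester as base64.
    sio.emit("generated", {"image": base64.b64encode(image_bytes).decode()}, to=sid)

if __name__ == "__main__":
    eventlet.wsgi.server(eventlet.listen(("0.0.0.0", 8000)), app)
```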
As you mentioned, I've now set up hosting to test the Secure Cloud. Until now, because of the price difference, I had been using the Community Cloud.
38 replies
There's inconsistency in performance ( POD )
The same model is loaded into VRAM, and the actions performed by each instance are 100% identical: they use the same Docker image and receive the same requests from clients. What I'm curious about is the performance of the machines in the Community Cloud. As far as I know, these GPUs come from an unspecified number of hosts. Does RunPod have internal criteria for deciding whether a machine is suitable for hosting? As you mentioned, if it's a power issue, GPU performance might drop. Are there no management standards for this?
38 replies
There's inconsistency in performance ( POD )
If we assume the SSD is the problem, what exactly would the issue be? Is it something I can control? When creating pods I only specify the capacity, and I entered the same value for all of them, so I don't really understand what it could be.
38 replies
There's inconsistency in performance ( POD )
As you know, the CPU and RAM are not directly specified by me and always come out differently. Among the pods that perform well on this workload, there were cases with even less VRAM and weaker CPUs, so I've put aside the idea that it's a VRAM or CPU problem.
We use a system where tasks are stacked in a queue for processing. In our server code we also separate out the pending states where the next task can't be processed because of network delays. The criterion for a task being completed in the queue is the inference alone (it doesn't include delivery to the user), so I'm not currently treating this as a network connection problem. (There's a minimal sketch of this queue setup after this message.)
If it were a network connection issue, I'd expect all pods to show a uniform drop in processing capacity, yet some pods are working fine.
So I haven't been able to identify the cause yet, haha...
This problem didn't occur on vast.ai, so now I have a headache....
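A minimal sketch of that queue setup: the inference and delivery functions are placeholders, and the only point is that a task is marked done at the end of inference, before delivery to the user.

```python
import queue
import threading
import time

tasks = queue.Queue()              # incoming generation requests are stacked here
pending_delivery = queue.Queue()   # finished results waiting on the network

def run_inference(task):
    time.sleep(0.1)                # placeholder for the actual GPU inference
    return f"image-for-{task}"

def deliver(result):
    time.sleep(0.05)               # placeholder for the Socket.IO emit / network send

def worker():
    while True:
        task = tasks.get()
        result = run_inference(task)
        # A task counts as "completed" here, at the end of inference,
        # before delivery to the user -- the completion criterion described above.
        pending_delivery.put(result)
        tasks.task_done()

def sender():
    while True:
        deliver(pending_delivery.get())
        pending_delivery.task_done()

threading.Thread(target=worker, daemon=True).start()
threading.Thread(target=sender, daemon=True).start()

for i in range(5):
    tasks.put(i)
tasks.join()    # waits only for inference to finish, not for delivery
```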
38 replies
There's inconsistency in performance ( POD )
I'm currently looking for a suitable GPU provider, but with RunPod the performance variance is too severe. I've also tested Vast.ai, where this kind of instability hardly ever occurs. I need to be prepared for a situation where I'll have to rent 100-200 RTX 4090 GPUs in the future, so this problem needs to be resolved.
38 replies
RunPod
•Created by streamize on 7/14/2024 in #⚡|serverless
retrieving queue position for a specific task in RunPod serverless API
We have over 50 inference requests coming in per second, so even with 30 workers it is not enough.
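As far as I can tell, the serverless API's /status route only reports states like IN_QUEUE or IN_PROGRESS rather than a position, so getting a per-task position means tracking it client-side. A rough sketch, assuming the standard /run, /status, and /health routes and your own endpoint ID and API key (both hypothetical placeholders here):

```python
import os
import requests   # pip install requests

# Placeholder credentials; set these for your own endpoint.
API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

submitted_ids: list[str] = []   # job IDs in the order we submitted them

def submit(payload: dict) -> str:
    r = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS, timeout=30)
    r.raise_for_status()
    job_id = r.json()["id"]
    submitted_ids.append(job_id)
    return job_id

def status(job_id: str) -> str:
    r = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()["status"]

def approx_queue_position(job_id: str) -> int:
    # Position among *our own* jobs: count earlier submissions still queued or running.
    ahead = 0
    for other in submitted_ids:
        if other == job_id:
            break
        if status(other) in ("IN_QUEUE", "IN_PROGRESS"):
            ahead += 1
    return ahead

def endpoint_queue_depth() -> int:
    # Total jobs queued on the endpoint (all clients), via the health route.
    r = requests.get(f"{BASE}/health", headers=HEADERS, timeout=30)
    r.raise_for_status()
    return r.json()["jobs"]["inQueue"]
```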
7 replies
RunPod
•Created by streamize on 5/19/2024 in #⚡|serverless
runpod serverless start.sh issue
This is a screenshot of the logs from a failed case (the container runs, but start.sh is never executed).
13 replies
RunPod
•Created by streamize on 5/19/2024 in #⚡|serverless
runpod serverless start.sh issue
The attached image is a screenshot of the logs for a case where everything is processed correctly (the Docker image is initialized for the first time and start.sh runs successfully).
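For context, a minimal sketch of the kind of Python handler that start.sh typically hands off to in a RunPod serverless image; the handler body below is a hypothetical placeholder, and the actual start.sh contents aren't shown in this thread.

```python
# handler.py -- what start.sh would usually launch as its last step.
import runpod   # pip install runpod

def handler(job):
    prompt = job["input"].get("prompt", "")
    # Placeholder for the real image-generation call.
    return {"echo": prompt}

# This call blocks and starts polling the endpoint's job queue. If start.sh
# never runs (the failed case above), this line is never reached and the
# worker sits idle without picking up jobs.
runpod.serverless.start({"handler": handler})
```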
13 replies