streamize
There's inconsistency in performance ( POD )
The Docker image hosts a Socket.IO server. It receives messages from users, generates images, and when generation is complete, it sends the generated image as a base64 string.
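For reference, the flow is roughly this (a minimal sketch only; the event names and generate_image() are placeholders, not our actual server code):

```python
# Minimal sketch of the flow described above: receive a request over Socket.IO,
# generate an image, and send it back to the requesting client as base64.
import base64
import socketio

sio = socketio.Server(cors_allowed_origins="*")
app = socketio.WSGIApp(sio)

def generate_image(prompt: str) -> bytes:
    # Placeholder for the actual image-generation pipeline call.
    raise NotImplementedError

@sio.on("generate")
def on_generate(sid, data):
    png_bytes = generate_image(data["prompt"])
    # Deliver the finished image to the client that asked for it.
    sio.emit("result", {"image": base64.b64encode(png_bytes).decode()}, to=sid)

if __name__ == "__main__":
    import eventlet
    import eventlet.wsgi
    eventlet.wsgi.server(eventlet.listen(("0.0.0.0", 8000)), app)
```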
As you suggested, I've now set up hosting to test Secure Cloud as well. Due to the price difference, I had been using Community Cloud.
38 replies
There's inconsistency in performance ( POD )
The same model is loaded into VRAM on every pod, and the work each instance performs is 100% identical: they run the same Docker image and receive the same requests from clients. What I'm curious about is the performance of the machines in the Community Cloud. As far as I know, these GPUs are provided by an unspecified number of individual hosts. Does RunPod have internal criteria for deciding whether a machine is suitable for hosting? As you mentioned, if it's a power issue, GPU performance could drop. Is there really no management standard for this?
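For comparison, something like this could be run on each pod to see whether the power/clock limits or raw throughput actually differ (just a sketch; it queries nvidia-smi and times a fixed PyTorch matmul, nothing RunPod-specific):

```python
# Rough per-pod comparison: GPU power/clock limits plus a fixed matmul timing.
import subprocess
import time
import torch

print(subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,power.limit,clocks.max.sm,temperature.gpu",
     "--format=csv"],
    capture_output=True, text=True).stdout)

x = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
torch.cuda.synchronize()
start = time.time()
for _ in range(50):
    x @ x                      # discard the result; we only care about timing
torch.cuda.synchronize()
print(f"50 fp16 matmuls: {time.time() - start:.2f}s")
```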
38 replies
There's inconsistency in performance ( POD )
If we assume the SSD is the problem, what exactly would the issue be? Is it something I can control? When creating pods I only enter the disk capacity, and I entered the same value for all of them, so I don't really understand what the difference could be.
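If it helps, a crude way to rule the disk in or out would be to time a large sequential write/read on each pod, roughly like this (not a proper benchmark; the read number will mostly reflect the page cache unless the file is much larger than RAM):

```python
# Crude sequential write/read timing to compare pod disks.
import os
import time

path = "/tmp/disk_test.bin"    # or a path on the pod's volume
size = 2 * 1024**3             # 2 GiB total
chunk = os.urandom(64 * 1024**2)

start = time.time()
with open(path, "wb") as f:
    for _ in range(size // len(chunk)):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())       # make sure the data actually hits the disk
print(f"write: {size / (time.time() - start) / 1e6:.0f} MB/s")

start = time.time()
with open(path, "rb") as f:
    while f.read(64 * 1024**2):
        pass
print(f"read:  {size / (time.time() - start) / 1e6:.0f} MB/s")
os.remove(path)
```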
38 replies
There's inconsistency in performance ( POD )
As you know, the CPU and RAM are not something I specify directly, and they always come out differently. But among the pods that performed well on this issue, some actually had less VRAM and weaker CPUs, so I've held off on concluding that it's a VRAM or CPU problem.
We use a system where tasks are stacked in a queue and processed in order. In our server code, pending states caused by network delays are handled separately so they can't block the next task, and the criterion for a task being "completed" in the queue is simply the inference itself; it doesn't include delivery to the user (see the sketch below). So I'm not currently treating this as a network connection problem.
If it were a network issue, I'd expect all pods to show a uniform drop in throughput, yet some pods are working perfectly well.
So, I haven't been able to identify the cause yet, haha...
This problem didn't occur on vast.ai. So now I have a headache....
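To make the "completed" criterion above concrete, the queue is roughly shaped like this (a simplified sketch, not our real code; run_inference() and send_to_client() are placeholders):

```python
# Simplified sketch of the queue: a task counts as "done" once inference finishes;
# sending the result to the user happens on a separate path and can't block the queue.
import queue
import threading

task_queue: "queue.Queue[dict]" = queue.Queue()
delivery_queue: "queue.Queue[dict]" = queue.Queue()

def run_inference(prompt):             # placeholder for the model call
    ...

def send_to_client(sid, image_b64):    # placeholder (e.g. a Socket.IO emit)
    ...

def inference_worker():
    while True:
        task = task_queue.get()
        image_b64 = run_inference(task["prompt"])
        task_queue.task_done()         # "completed" = inference finished
        delivery_queue.put({"sid": task["sid"], "image": image_b64})

def delivery_worker():
    while True:
        result = delivery_queue.get()
        send_to_client(result["sid"], result["image"])   # network delays only affect this thread
        delivery_queue.task_done()

threading.Thread(target=inference_worker, daemon=True).start()
threading.Thread(target=delivery_worker, daemon=True).start()
```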
38 replies
There's inconsistency in performance ( POD )
I'm currently looking for a suitable GPU provider, but in the case of RunPod, the performance variance is too severe. I've also tested Vast.ai, and such performance instability issues hardly occur. I need to be prepared for a situation where I'll have to rent 100-200 RTX 4090 GPUs in the future, so this problem needs to be resolved.
38 replies
RunPod
•Created by streamize on 7/14/2024 in #⚡|serverless
retrieving queue position for a specific task in RunPod serverless API
We have over 50 inference requests coming in per second, so even 30 workers are not enough.
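For context, what I can already do is roughly this (a sketch assuming the standard /status/{job_id} and /health endpoints; it tells me a job is still IN_QUEUE and how many jobs are queued overall, but not the position of a specific job, which is what I'm after):

```python
# Sketch: poll a job's status plus the endpoint-wide queue depth.
# ENDPOINT_ID / API_KEY / "some-job-id" are placeholders.
import os
import requests

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]
API_KEY = os.environ["RUNPOD_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

def job_status(job_id: str) -> dict:
    # Returns IN_QUEUE / IN_PROGRESS / COMPLETED / FAILED, but no per-job queue position.
    return requests.get(f"{BASE}/status/{job_id}", headers=HEADERS, timeout=10).json()

def endpoint_health() -> dict:
    # Aggregate counts only, e.g. jobs in queue / in progress and worker counts.
    return requests.get(f"{BASE}/health", headers=HEADERS, timeout=10).json()

print(job_status("some-job-id"))
print(endpoint_health())
```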
7 replies
RunPod
•Created by streamize on 5/19/2024 in #⚡|serverless
runpod serverless start.sh issue
This image captures the logs of a failed case (the job runs without start.sh ever being executed).
13 replies
RunPod
•Created by streamize on 5/19/2024 in #⚡|serverless
runpod serverless start.sh issue
The attached image is a screenshot of the logs for a case where everything is processed correctly (the Docker image is initialized for the first time and start.sh runs successfully).
13 replies
RunPod
•Created by streamize on 5/19/2024 in #⚡|serverless
runpod serverless start.sh issue
Upon further investigation, I found that the error occurs on line 244 of the rp_handler.py file I attached. Line 244 is the code that saves the image generated via the ComfyUI API.

The first RunPod serverless call, where the Docker image is initialized and start.sh is executed, works as expected. From the second call onwards, however, once the Docker image has already been initialized, ComfyUI no longer seems to generate the images. I'm not sure why, but starting from the second call, ComfyUI does not create any images in the temp directory. (It appears the image generation task is not being performed at all, even though the ComfyUI server is running.)

Lastly, there's a part I don't understand: when all RunPod serverless instances are terminated and a new instance is started, start.sh is not executed, yet the ComfyUI server port is still active. It doesn't seem to be starting up properly.
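In case it helps anyone hitting the same thing, one way to narrow this down would be a defensive check around that save step, roughly like this (a sketch, not the actual rp_handler.py; it queues the workflow, then polls ComfyUI's /history for that prompt and raises explicitly if no outputs ever appear, instead of failing later on the file save):

```python
# Sketch of a defensive wrapper around the ComfyUI call (not the real rp_handler.py).
# Queue the workflow, then poll /history/<prompt_id> and fail loudly if no outputs appear.
import time
import requests

COMFY_URL = "http://127.0.0.1:8188"   # assumed local ComfyUI port

def run_workflow(workflow: dict, timeout: float = 300.0) -> dict:
    resp = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow}, timeout=10)
    resp.raise_for_status()
    prompt_id = resp.json()["prompt_id"]

    deadline = time.time() + timeout
    while time.time() < deadline:
        history = requests.get(f"{COMFY_URL}/history/{prompt_id}", timeout=10).json()
        entry = history.get(prompt_id)
        if entry and entry.get("outputs"):
            return entry["outputs"]        # only now is it safe to save the images
        time.sleep(1)

    raise RuntimeError(
        f"ComfyUI produced no outputs for prompt {prompt_id} within {timeout}s "
        "-- server is up but not generating?"
    )
```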
13 replies