Extremely poor performance on PODs with the RTX 4090
Hi. I'm building a deepfake with DeepFaceLab. Today I have already run 3 PODs with an RTX 4090, and they all give different, very poor performance.
A couple of weeks ago I did the same work I'm doing now. That POD also had an RTX 4090 and ran at 0.250ms per iteration. CPU utilization was 20-30% and GPU utilization was always over 90%.
Today I ran the same process on three PODs with an RTX 4090 and they behave very strangely. One ran at 0.850ms per iteration, the other two at about 1.100ms per iteration! All three PODs have their CPUs loaded at 100%. I monitored the GPUs with nvidia-smi for a long time and got strange results: the GPUs are idle most of the time, with only occasional spikes to 5-30% utilization.
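To put numbers on the GPU idling described above instead of watching nvidia-smi by eye, here is a minimal sketch. It assumes nvidia-smi is on the PATH and prints plain integer percentages for `utilization.gpu`. The helper names and the 10% "idle" threshold are my own choices for illustration, not part of DeepFaceLab or nvidia-smi.

```python
import subprocess
from statistics import mean


def parse_utilization(csv_text):
    """Parse one integer percentage per line, as printed by
    nvidia-smi with --format=csv,noheader,nounits."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def summarize(samples, idle_below=10):
    """Return (mean utilization, fraction of samples under idle_below percent).

    idle_below is an arbitrary illustrative threshold."""
    if not samples:
        raise ValueError("no samples")
    idle = sum(1 for s in samples if s < idle_below)
    return mean(samples), idle / len(samples)


def read_gpu_utilization():
    """Take one reading per GPU by calling nvidia-smi.

    Needs an NVIDIA driver, so it only works on a GPU machine."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)
```

Calling `read_gpu_utilization()` once a second during training and passing the collected values to `summarize` gives an average and an idle fraction that can be compared between PODs. A high idle fraction together with 100% CPU load suggests the GPU is waiting on the CPU.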
As a control, I ran the same build with the same DeepFaceLab settings on my own PC with an RTX 2080, and everything works fine there: CPU utilization is 20-30% and GPU utilization is 100%.
What could be the issue?
1 Reply
Found the problem. The weak point is the CPU. Each POD was allocated a rather weak CPU, and that was the bottleneck. When I first tried this a couple of weeks ago, I was apparently allocated a good CPU. I tried to choose a CPU in the new POD settings, but the only thing I can set is the number of vCPUs. There is no CPU name and no option to book a whole CPU rather than part of one. How can this be done?
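This does not solve the booking question, but as a stopgap the CPU can at least be checked from inside a POD right after it starts, before committing to a long training run. A minimal sketch, assuming a Linux container where /proc/cpuinfo is readable; the function names are hypothetical. Note that /proc/cpuinfo describes the host CPU, and a cgroup CPU quota on the container is not reflected in these numbers.

```python
import os


def cpu_model_names(cpuinfo_text):
    """Return the distinct 'model name' values from /proc/cpuinfo-style text."""
    names = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            name = value.strip()
            if name not in names:
                names.append(name)
    return names


def usable_cpu_count():
    """Number of CPUs this process may run on.

    Uses the Linux affinity mask when available, else os.cpu_count().
    A cgroup CPU quota can still limit the container below this number."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count()


def describe_cpu(path="/proc/cpuinfo"):
    """Read the CPU model name(s) and usable CPU count on a Linux machine."""
    with open(path) as f:
        return cpu_model_names(f.read()), usable_cpu_count()
```

Running `print(describe_cpu())` in a fresh POD shows the CPU model and how many cores the process can use, so a POD with a weak CPU can be stopped and replaced early.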