Pods not even starting due to low memory
I'm in US-OR-1 and trying to start pods with 0 GPU to do some config work. They don't start normally: the embedded console in the web UI goes back and forth between "Waiting for logs" and acting like the pod is healthy. When I try to connect, the Jupyter server is offline, ComfyUI is offline (which I don't care about, since I started with 0 GPU), and the "Start Web Terminal" button doesn't do anything; the "Connect to Web Terminal" button never becomes enabled.
Container log:
2024-12-03T21:46:51Z create container valyriantech/comfyui-with-flux:latest
2024-12-03T21:46:52Z latest Pulling from valyriantech/comfyui-with-flux
2024-12-03T21:46:52Z Digest: sha256:ba3957ab9ef3eda8912e89d2dad13477c10f6bbe214c342622a1cb65d2b0a128
2024-12-03T21:46:52Z Status: Image is up to date for valyriantech/comfyui-with-flux:latest
2024-12-03T21:46:52Z start container for valyriantech/comfyui-with-flux:latest: begin
2024-12-03T21:47:10Z WARN: very high memory utilization: 486.7MiB / 488.3MiB (99 %)
2024-12-03T21:47:40Z WARN: very high memory utilization: 487.1MiB / 488.3MiB (99 %)
2024-12-03T21:47:56Z WARN: container is unhealthy: triggered memory limits (OOM)
2024-12-03T21:48:08Z WARN: container is unhealthy: triggered memory limits (OOM)
2024-12-03T21:48:10Z WARN: very high memory utilization: 487.4MiB / 488.3MiB (99 %)
2024-12-03T21:48:12Z WARN: container is unhealthy: triggered memory limits (OOM)
2024-12-03T21:48:29Z WARN: container is unhealthy: triggered memory limits (OOM)
7 Replies
When you start a pod with 0 GPU, memory is limited to 512MB, so you can't do anything too heavy.
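For what it's worth, the 488.3MiB limit in the log lines up with that 512MB figure if the 512 is counted in decimal megabytes (that reading is an assumption; the log itself only shows MiB). A quick sketch of the arithmetic, with `mb_to_mib` and `mem_pct` as made-up helper names:

```shell
#!/usr/bin/env bash
# Convert decimal megabytes (10^6 bytes) to mebibytes (2^20 bytes).
mb_to_mib() {
  awk -v mb="$1" 'BEGIN { printf "%.1f\n", mb * 1000000 / 1048576 }'
}

# Percentage of the limit in use, truncated to a whole number.
mem_pct() {
  awk -v used="$1" -v limit="$2" 'BEGIN { printf "%d\n", used * 100 / limit }'
}

mb_to_mib 512          # prints 488.3
mem_pct 486.7 488.3    # prints 99
```

So at the first warning the container had roughly 1.6MiB of headroom left, before anything interactive had even been opened.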
I'm just trying to start the pod, open the web terminal, and move some files. All stuff that I've done in the past with no problem.
Try using the cloud sync feature to copy to S3 or other cloud storage.
I think that's the better option.
bash -c 'sleep infinity'
Then you can SSH in and transfer the data.
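The idea, presumably, is to use that command as the container start command so the image's usual startup (Jupyter, ComfyUI) never runs. A rough local check, assuming a Linux machine with `/proc` and a `sleep` that accepts `infinity` (GNU coreutils does), shows how little memory the idle command needs on its own; `idle_pid` and `idle_rss_kib` are just illustrative variable names:

```shell
#!/usr/bin/env bash
# Run the same idle command in the background.
bash -c 'sleep infinity' &
idle_pid=$!
sleep 1  # give the process a moment to start

# Resident memory of the idle process in KiB, read from /proc (Linux only).
idle_rss_kib=$(awk '/^VmRSS:/ { print $2 }' "/proc/${idle_pid}/status")
echo "idle process RSS: ${idle_rss_kib} KiB"

# Stop the background process and reap it quietly.
kill "$idle_pid"
wait "$idle_pid" 2>/dev/null || true
```

This only measures the idle process locally. It doesn't say what else the pod runtime counts against the 488.3MiB limit.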
I acknowledge and appreciate the options, but are we saying that 0 GPU mode just doesn't work anymore?
A 0 GPU pod can run if you’re performing lightweight tasks like copying files, but if your pod starts running web servers or other heavy workflows, it might exceed the limited memory allocated, leading to the error above.
I'm saying that the pod would not even get to a usable state; those log messages were from pod startup. I'm just wondering what changed, because it used to work.