Services don't start
This morning I tried starting a pod using the https://github.com/ashleykleynhans/stable-diffusion-docker template, and no matter how long I leave it after getting the "Container is READY!" confirmation, the services never start.
The RunPod Application Manager (port 8000) and Jupyter Labs (port 8888) connect without issue, but a1111 (port 3000), kohya_ss (port 3010), comfyUI (port 3020) and tensorboard (port 6006) all show the "connection is not up yet" page.
I also exposed external port 44082, which made no difference, and stopping/starting the apps in the RunPod Application Manager doesn't help either.
I usually run this from a network volume with everything already preconfigured, and I'm currently running it in the CZ1 region. Neither running it from the network volume nor running it from scratch makes a difference; the services can't connect either way.
Current pod ID: rupmemk46gtepk
What are you seeing for the container logs?
I can restart the pod to grab the exact logs, but everything in them looked standard all the way through "Container is READY!". The system and container both started up without issue in 1-2 minutes.
Hmm. I just started it on another GPU config and it's going through a longer cycle building the container, so maybe it didn't build properly on the other config?
I was using one of the RTX 4090 configs, which I had used several times before, most recently yesterday.
Let me know if the second one works and then feel free to post the pod id here for the one that wasn't working
Same issue on the new pod, ID: yzbsztfwcaf1ga
Here are the container logs:
(as a file since I don't have Nitro)
I did notice this template didn't show up in a search or by default like it usually does; I had to launch it using the link from the GitHub page. Not sure if that's relevant.
2023-12-29T16:04:35.791740132Z Starting Nginx service...
2023-12-29T16:04:35.811540626Z * Starting nginx nginx
2023-12-29T16:04:35.824502734Z ...done.
2023-12-29T16:04:35.824513804Z Running pre-start script...
2023-12-29T16:04:35.824517553Z Container is running
2023-12-29T16:04:35.824521583Z Syncing venv to workspace, please wait...
2023-12-29T16:05:17.502348603Z Syncing Stable Diffusion Web UI to workspace, please wait...
2023-12-29T16:05:19.924204304Z Syncing Kohya_ss to workspace, please wait...
2023-12-29T16:05:53.743971926Z Syncing ComfyUI to workspace, please wait...
2023-12-29T16:05:58.184933424Z Syncing Application Manager to workspace, please wait...
2023-12-29T16:05:58.654091435Z Fixing Stable Diffusion Web UI venv...
2023-12-29T16:05:58.657608222Z Fixing Kohya_ss venv...
2023-12-29T16:05:58.661171919Z Fixing ComfyUI venv...
2023-12-29T16:05:58.665633451Z Configuring accelerate...
2023-12-29T16:05:58.688464516Z Starting Stable Diffusion Web UI
2023-12-29T16:05:58.689192512Z Stable Diffusion Web UI started
2023-12-29T16:05:58.689210491Z Log file: /workspace/logs/webui.log
2023-12-29T16:05:58.691251259Z Starting Kohya_ss Web UI
2023-12-29T16:05:58.692034254Z Kohya_ss started
2023-12-29T16:05:58.692047304Z Log file: /workspace/logs/kohya_ss.log
2023-12-29T16:05:58.696341637Z Starting ComfyUI
2023-12-29T16:05:58.706286843Z ComfyUI started
2023-12-29T16:05:58.706305543Z Log file: /workspace/logs/comfyui.log
2023-12-29T16:05:58.709811431Z Starting Tensorboard
2023-12-29T16:05:58.717965869Z ln: failed to create symbolic link '/workspace/logs/dreambooth/dreambooth': File exists
2023-12-29T16:05:58.720013886Z ln: failed to create symbolic link '/workspace/logs/ti/textual_inversion': File exists
2023-12-29T16:05:58.722665349Z Tensorboard Started
2023-12-29T16:05:58.723045417Z All services have been started
2023-12-29T16:05:58.723575894Z Pod Started
2023-12-29T16:05:58.723615143Z Starting Jupyter Lab...
2023-12-29T16:05:58.723913951Z Jupyter Lab started
2023-12-29T16:05:58.723957801Z Exporting environment variables...
2023-12-29T16:05:58.731740011Z Container is READY!
I think everything looks normal, except maybe the symlinks for dreambooth and textual inversion.
Update: I'm assuming it's something with the template, since the standalone Kohya_ss template worked fine.
It's not an issue with the template; many people are using it, so it must be some other issue.
Hey Ashley! Thanks for the quick reply on GitHub. So far I've tried running it on an RTX 6000 Ada and an RTX 4090, with the server set to Any, but I can try a specific server to isolate the problem.
I did notice that RunPod is no longer giving me the choice between Secure Cloud (which I normally use) and GPU Cloud; currently my only option is GPU Cloud. I wasn't sure whether something changed on my end or it was just a change in RunPod's UI, but it seemed worth mentioning.
You can select Community Cloud or Secure Cloud in the Cloud Type filter at the top of the page.
It defaults to Secure Cloud but you can change it to Community Cloud.
Everything is working for me in Secure Cloud as well as Community Cloud. In Community Cloud I used an A5000 and an A4000 in the SK region, and in Secure Cloud I used a 4090 in the IS region; it works perfectly every time.
So weird. Thanks for checking! I'll keep testing