AI/LLM/Image gen. not working on RX 6900 XT since bazzite:stable-41.20250409
Hello, I know this is likely not the usual use case around these parts, but I have been running podman containers with Ollama, Fooocus, ComfyUI, and Open WebUI for many months without issues. However, something changed in the system between these two Bazzite images:
>> stable-41.20250331 (<--- working)
>> stable-41.20250409 (<--- not working, nor any later images, including 42)
and it has completely broken all of my "AI" workloads.
My card is AMD Radeon RX 6900 XT.
My Fooocus and ComfyUI podman images are custom-made. My Ollama and Open WebUI OCI images are taken from official sources (+rocm channel for Ollama).
For example, in Ollama, loading a model used to end with something like 14 GiB of memory in use (system + Gemma 3). Now it gets stuck at 10.7 GiB used, with the CPU constantly at 100%, and it stays stuck like this.
Ollama debug logs show:
[character limit hit, continuing below]
[GIN] 2025/04/16 - 18:28:17 | 200 | 33.89µs | 192.168.192.100 | GET "/api/version"
time=2025-04-16T18:28:17.553Z level=INFO source=ggml.go:289 msg="model weights" buffer=ROCm0 size="7.6 GiB"
time=2025-04-16T18:28:17.553Z level=INFO source=ggml.go:289 msg="model weights" buffer=CPU size="787.5 MiB"
time=2025-04-16T18:28:17.748Z level=DEBUG source=server.go:625 msg="model load progress 0.00"
time=2025-04-16T18:28:24.516Z level=DEBUG source=server.go:625 msg="model load progress 0.00"
time=2025-04-16T18:28:24.767Z level=DEBUG source=server.go:625 msg="model load progress 0.00"
time=2025-04-16T18:28:25.519Z level=DEBUG source=server.go:625 msg="model load progress 0.00"
time=2025-04-16T18:28:40.062Z level=DEBUG source=server.go:625 msg="model load progress 0.00"
time=2025-04-16T18:28:42.819Z level=DEBUG source=server.go:625 msg="model load progress 0.01"
time=2025-04-16T18:29:51.299Z level=DEBUG source=server.go:625 msg="model load progress 0.02"
time=2025-04-16T18:29:51.551Z level=DEBUG source=server.go:625 msg="model load progress 0.03"
time=2025-04-16T18:29:51.803Z level=DEBUG source=server.go:625 msg="model load progress 0.03"
So it seems it's doing something, but it's going at a glacial pace compared to before the system image update, where the model would be loaded within a few seconds and fully operational.
I have been trying to solve this somehow, but I get no other useful errors. It also doesn't matter if my podman images use PyTorch + ROCm 6.2.4 or 6.3 (nightly). I would appreciate any help or tips, thank you.
I don't know how to compare which packages changed in the system images between the two snapshots. I know rpm-ostree has shown it during the output, but I would really prefer not to downgrade from 42 now. If my memory serves me right, I might have seen mesa drivers being updated at that time. Could that be a possible cause?
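To put a number on "glacial", here is a minimal Python sketch that pulls the "model load progress" entries out of the debug log and averages how many seconds each 1% takes. The regex assumes the exact entry layout shown in the log above (time=, level=, source=, msg=), and the function names are made up for illustration. The sample string is just the first and last progress entries from my log.

```python
import re
from datetime import datetime

# Assumption: entries look like the ones in the log above:
#   time=<ISO timestamp>Z level=<LEVEL> source=<file:line> msg="model load progress <fraction>"
PROGRESS_RE = re.compile(
    r'time=(\S+) level=\w+ source=\S+ msg="model load progress ([0-9.]+)"'
)


def load_progress(log_text):
    """Return (timestamp, progress fraction) pairs for every load-progress entry."""
    points = []
    for stamp, progress in PROGRESS_RE.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
        points.append((when, float(progress)))
    return points


def seconds_per_percent(points):
    """Average seconds per 1% of progress between the first and last entry.

    Returns None when there are fewer than two entries or no progress was made.
    """
    if len(points) < 2:
        return None
    (t0, p0), (t1, p1) = points[0], points[-1]
    gained = (p1 - p0) * 100
    if gained <= 0:
        return None
    return (t1 - t0).total_seconds() / gained


# First and last progress entries copied from the log above.
sample = (
    'time=2025-04-16T18:28:17.748Z level=DEBUG source=server.go:625 msg="model load progress 0.00" '
    'time=2025-04-16T18:29:51.803Z level=DEBUG source=server.go:625 msg="model load progress 0.03"'
)
rate = seconds_per_percent(load_progress(sample))
print("no measurable progress" if rate is None else f"{rate:.1f} s per 1% loaded")
```

Running the same thing against a log captured on the working image would give a like-for-like comparison, assuming the log format is the same there.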
Might try this workload in distrobox
See if the problem still occurs
If it does you know it's an upstream issue to report
All the dependencies including rocm can be installed in there
I will try Distrobox, although I'm not sure how successful that will be since the backend is Podman as well. But it is certainly strange that a small change between snapshots caused this, as it worked in Podman before. I've checked by downgrading and it was indeed that.
Distrobox has the exact same behavior. The CPU backend runs fine, but the GPU/ROCm backend gets stuck in the same state.
Can I somehow see the RPM diff between
>> stable-41.20250331
>> stable-41.20250409
?
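One rough way to do this by hand, as a sketch only: save the output of `rpm -qa` on each snapshot to a text file and diff the two lists. The Python below assumes the default `name-version-release.arch` line format, and it keeps only one entry per package name, so packages that can be installed in several versions side by side (kernels, for example) will not be compared properly. The function names and the sample package names are made up for illustration.

```python
def parse_packages(text):
    """Map package name -> "version-release.arch" from `rpm -qa`-style lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Names may contain hyphens, so split version and release off from the right.
        parts = line.rsplit("-", 2)
        if len(parts) != 3:
            continue  # not in name-version-release form, skip it
        name, version, release = parts
        packages[name] = f"{version}-{release}"
    return packages


def diff_packages(old, new):
    """Return (added names, removed names, [(name, old version, new version)])."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(
        (name, old[name], new[name])
        for name in set(old) & set(new)
        if old[name] != new[name]
    )
    return added, removed, changed


# Made-up package lists, only to show the shape of the input and output.
old_list = "foo-lib-1.0-1.fc41.x86_64\nbar-2.0-3.fc41.noarch\n"
new_list = "foo-lib-1.1-1.fc41.x86_64\nbaz-0.5-1.fc41.noarch\n"

added, removed, changed = diff_packages(parse_packages(old_list), parse_packages(new_list))
for name, before, after in changed:
    print(f"{name}: {before} -> {after}")
print("added:", added)
print("removed:", removed)
```

With real files, the two strings would be replaced by something like `open("rpm-20250331.txt").read()` and `open("rpm-20250409.txt").read()` (hypothetical file names).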
That means the issue is in your libraries
Can report that upstream
Ok, but which? As I'm saying, it only happens on and after stable-41.20250409; it works fine on stable-41.20250331, so I want to know what exactly changed between those Bazzite snapshots so I can narrow it down.
Neither, the distro you used in distrobox
You just proved this isn't an us problem
This broke upstream
Hold on, hold on.
The same podman image, two bazzite snapshots.
Distrobox uses no host libraries
If you have the issue in distrobox, you have the issue in distrobox
Nothing to do with us
Snapshot 1 works, snapshot 2 doesn't.
Podman image never changed. Not talking distrobox here.
Did you test distrobox like I asked? Was implied above
If so, it's not an us problem. Report to the ROCm project and let them help you figure out what broke
You're not using any part of our image other than the Kernel in that scenario.
Yes, I tried Distrobox with Ubuntu 24.04 on Bazzite image stable-42. It doesn't work.
But my ollama/comfyui/fooocus podman images are not distrobox, and they are each based on Ubuntu 20.04-22.04. Those are unchanged and always reinitialize. They do work on stable-41.20250331, but the exact same images do not work on stable-41.20250409. If the images are unchanged and the Bazzite base is the only variable that changes, and that alone decides whether it works or not, why would this not be a Bazzite image issue? Or at least an issue with whichever packages changed.
Only change in either scenario is the Kernel
Contact rocm