System spontaneously rebooting at random, can't find a cause

I posted this on the Universal Blue forums but haven't gotten any replies, so I'm posting it here as well: https://universal-blue.discourse.group/t/having-my-bazzite-system-slow-down-then-spontaneously-reboot-at-random/7081

My Bazzite install regularly slows to a crawl and then reboots itself. It has happened in many different situations (idling on the desktop, running desktop apps, and even in the middle of a game), and I'm at my wit's end trying to work out why.

Running "journalctl | grep Error" shows these lines around the time of the restarts:

Feb 25 18:12:43 bazzite kernel: iwlwifi 0000:04:00.0: Start IWL Error Log Dump:
Feb 25 10:13:29 bazzite kernel: mce: [Hardware Error]: Machine check events logged
Feb 25 10:13:29 bazzite kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000001000108
Feb 25 10:13:29 bazzite kernel: mce: [Hardware Error]: TSC 0 ADDR ffffffc34b72a8 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Feb 25 10:13:29 bazzite kernel: mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1740507205 SOCKET 0 APIC 6 microcode a201210

System specs:
CPU: AMD Ryzen 7 5800X3D
Motherboard: MSI MPG B550 Gaming Plus
RAM: 32 GB DDR4 3200
GPU: Nvidia GeForce RTX 4070 Ti Super
Storage: T-Force TM8FPZ004T (main drive), WD WDS400T2B0A-00SM50 (secondary)

A quick search turned up Linux users having stability issues with Ryzen 5000 CPUs related to undervolting. I was running an undervolt (via the AMD Curve Optimizer) when I first installed Bazzite on this system, but turning it off has not stopped the restarts. I've also tinkered with the power plan in the KDE Settings menu, but that doesn't seem to do anything either.

The issue has persisted across every Bazzite build I've upgraded to since I installed almost a month ago, so it has happened on a range of different kernels and Nvidia driver versions. Can anybody help me figure out what's going on here?
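For readers trying to interpret the mce lines above, here is a minimal Python sketch that decodes the top flag bits of the status value after "Bank 5:" and converts the TIME field to a UTC timestamp. It assumes the status word follows the standard x86 machine-check layout for bits 63 to 57 and that TIME is seconds since the Unix epoch; the function names are made up for illustration, and vendor-specific bits and error-code fields are deliberately left undecoded.

```python
from datetime import datetime, timezone

# Architectural flag bits of a machine-check status register (bits 63..57).
# Assumption: lower bits are vendor-specific and are not decoded here.
STATUS_FLAGS = [
    (63, "VAL", "entry is valid"),
    (62, "OVER", "an earlier error was overwritten"),
    (61, "UC", "uncorrected error"),
    (60, "EN", "error reporting enabled"),
    (59, "MISCV", "MISC register holds valid data"),
    (58, "ADDRV", "ADDR register holds valid data"),
    (57, "PCC", "processor context corrupt"),
]


def decode_status(status_hex: str) -> list[tuple[str, str]]:
    """Return (flag, meaning) pairs for every flag bit set in the status word."""
    status = int(status_hex, 16)
    return [(name, meaning) for bit, name, meaning in STATUS_FLAGS
            if (status >> bit) & 1]


def mce_time_utc(epoch_seconds: int) -> str:
    """Convert the TIME field (assumed Unix epoch seconds) to ISO 8601 UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Values copied from the journal lines quoted in the post.
    for name, meaning in decode_status("bea0000001000108"):
        print(f"{name}: {meaning}")
    print(mce_time_utc(1740507205))
```

For the value in the post, the UC and PCC flags are set, meaning the CPU reported an uncorrected error with corrupted processor context. That fits a hardware-level fault followed by a reset, but on its own it does not say which component is at fault.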
22 Replies
NeoChaos (OP), 4w ago
It just happened again. Here is the ujust get-logs output for that session: https://paste.centos.org/view/fa02af88
LJMPRO, 4w ago
Are there any signs that something is wrong beforehand? Does it just go black and all the fans stop?
NeoChaos (OP), 4w ago
The system starts stuttering and slowing down before it restarts. Mouse movement gets choppy, and keys and clicks are slow to register. Eventually everything stops responding, then the screen goes blank and the system restarts.
LJMPRO, 4w ago
Interesting, I have a friend whose system does something similar. But he doesn't see the stuttering or slowdown. It just goes black.
NeoChaos (OP), 2w ago
Just happened again. get-logs output: https://paste.centos.org/view/e5b71128
After a week of stability, it happened again: https://paste.centos.org/view/6eba9390
Happened again, this time while logging in after a rebase to an older build: https://paste.centos.org/view/545d2987
Pandabrain, 2w ago
Have you tried ostree-fsck?
NeoChaos (OP), 2w ago
I get "command not found" when I try to run that command.
Anyway, it happened again. Log for the previous session: https://paste.centos.org/view/b22ed48e
Also, the log for the current session, which includes the error info I listed in the OP (starting at line 803): https://paste.centos.org/view/243d61b7
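Finding the relevant lines in a long get-logs paste can be scripted. The sketch below is an illustration only: the helper name and the match patterns are assumptions, chosen to pick up the kernel's "[Hardware Error]" lines while skipping unrelated matches for "Error" such as the iwlwifi log dump. It reports 1-based line numbers so they can be quoted in a reply.

```python
import re

# Assumed patterns of interest; this is not an exhaustive list of error markers.
HARDWARE_ERROR = re.compile(r"\[Hardware Error\]|Machine check", re.IGNORECASE)


def find_hardware_errors(log_text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) pairs for lines that match."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if HARDWARE_ERROR.search(line)
    ]


if __name__ == "__main__":
    # Sample lines copied from the original post.
    sample = "\n".join([
        "Feb 25 18:12:43 bazzite kernel: iwlwifi 0000:04:00.0: Start IWL Error Log Dump:",
        "Feb 25 10:13:29 bazzite kernel: mce: [Hardware Error]: Machine check events logged",
        "Feb 25 10:13:29 bazzite kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000001000108",
    ])
    for number, line in find_hardware_errors(sample):
        print(number, line)
```

To use it on a real log, save the paste to a local text file and pass the file's contents to find_hardware_errors.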
Pandabrain, 2w ago
My bad, it's ostree fsck without the hyphen, and you should run it with sudo:
sudo ostree fsck
Default_Defect
Keeping an eye on this, mine is doing this too.
NeoChaos (OP), 7d ago
OK, it just happened again, and I ran ostree fsck after logging back in. It reports no errors.
And the ujust get-logs output:
Previous (crashed) session: https://paste.centos.org/view/39e2717b
Current logged-in session: https://paste.centos.org/view/ea811076
NeoChaos (OP), 7d ago
Pic of error messages I saw on reboot
NeoChaos (OP), 7d ago
What are your PC specs? Maybe it's a common hardware issue?
Default_Defect
5800X3D, 4080 Super, 32 GB RAM
NeoChaos (OP), 7d ago
Huh, same CPU at least
Dziban, 7d ago
I had this issue the other day after I cleaned my PC. It turned out I had left the CPU PCIe cable not fully plugged into the motherboard. Have you changed any PC parts recently?
NeoChaos (OP), 7d ago
I changed my GPU a couple of months ago, but that was before I installed Bazzite.
Dziban, 7d ago
🤔 Definitely not it then.
Pandabrain, 6d ago
1. You could try to reproduce the error by stress testing or running memtest86.
2. Have you tried resetting the BIOS since this started happening?
3. I found this, which looks like the error you mentioned. The author says a core is defective and deactivates it: https://forums.servethehome.com/index.php?resources/machine-check-exception-mce-workaround.55/ You could try that (with the numbers adjusted to match your error message). I guess it would make sense to stress test first, though, so you can verify whether it helped.
Never mind that last part, I think your error is different.
NeoChaos (OP), 3d ago
I'm not sure I should do 2, as I have Secure Boot enabled. I'll give 1 a try when I get the chance.
Anyway, with 20250325 it's been happening more frequently: twice just since this morning (PDT). Logs for the most recent incident:
Previous boot: https://paste.centos.org/view/c0a260cb
Current boot: https://paste.centos.org/view/5273db5c
Another instance. I'll memtest my system soon.
Previous boot: https://paste.centos.org/view/879c1d4e
Current boot: https://paste.centos.org/view/1a324607
Second time in an hour and a half!
Previous boot: https://paste.centos.org/view/985347f3
Current boot: https://paste.centos.org/view/b43776f3
Just ran memtest; the system passed with flying colors.
amel, 2d ago
Just a heads-up: I had a failing RAM module that memtest never caught. Replacing it made my issues disappear immediately (I was on EndeavourOS at the time), so you might still want to take RAM/hardware issues into consideration.
It might also be a good idea to run a live USB with a different distro for several hours to see whether anything changes.
NeoChaos (OP), 2d ago
So far it's been stable since the memtest run, knock on wood. I'll give a live environment a try if it happens again soon, though.
