Immich (2y ago)
Eazy E

Typesense Keeps Restarting (CrashLoopBackOff)

I'm running the Immich stack in Kubernetes. The immich-typesense container keeps failing and restarting (see image for logs). The notable lines I can see are:

raft_server.h:62] Peer refresh failed, error: Doing another configuration change
node.cpp:811] [default_group:10.42.1.104:8107:8108 ] Refusing concurrent configuration changing.

This is a brand new install. I recently used the CLI tool to import my images from backup and just connected the mobile app. What could cause this container to hit these failures and keep restarting?

Kubernetes Manifest: https://github.com/ModestTG/heliod-cluster/blob/main/kubernetes/apps/media/immich/app/typesense/helmrelease.yaml

On a separate but related note, the data used by the typesense container keeps growing. It is currently at 25 GB for ~80 GB of pics/videos. I'm not sure whether that is normal, so I wanted to check whether it is also a problem.

Any help would be appreciated. My cluster is running Ubuntu 22.04 on Intel 7th gen.
10 Replies
bo0tzz (2y ago)
Try deleting the typesense container and its volume, then bringing typesense back up and finally restarting the server and microservices containers
Eazy E (OP, 2y ago)
Thanks. I think I found the issue. I got the HelmRelease I'm using from someone else, and I think its liveness and readiness probes were a little too aggressive, so the pod would restart whenever the probes did not complete in time. Bumping up the probe timings seems to have fixed the issue. I'm not sure whether I should reduce them again now that the big backlog has been cleared, but I'll experiment with that in the future and see how it behaves. Thanks for your response! Do you know if it's common for the Typesense container to use a lot of storage space? I just want to understand if that's normal or not.
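For readers hitting the same restart loop, relaxed probes could look roughly like the fragment below. This is a minimal sketch, not the configuration from the thread: every timing value is an illustrative assumption, and the exact nesting depends on the Helm chart in use. It assumes Typesense's `/health` endpoint on API port 8108, the second port in the log line quoted in the original post.

```yaml
# Container-level probe settings (plain Kubernetes pod spec fields).
# All numbers are assumptions chosen to tolerate a busy Typesense
# instance during a large initial import; tune them for your cluster.
livenessProbe:
  httpGet:
    path: /health          # Typesense health endpoint
    port: 8108             # Typesense API port
  initialDelaySeconds: 30  # wait before the first check
  periodSeconds: 15        # time between checks
  timeoutSeconds: 5        # how long a single check may take
  failureThreshold: 10     # consecutive failures before a restart
readinessProbe:
  httpGet:
    path: /health
    port: 8108
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 10
```

A higher `failureThreshold` and `timeoutSeconds` give the container more slack before the kubelet restarts it, at the cost of slower detection of a genuinely hung process.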
bo0tzz (2y ago)
I don't think that's normal. I saw in the kah discord that it went back down after doing a trim, right? I can imagine it doing a lot of I/O and that causing the growth, but it shouldn't be using a lot of disk space at any one point.
Eazy E (OP, 2y ago)
It went down, but only by a few GB. Right now Longhorn reports it at 28 GB. But a df -ah from inside the immich-typesense container shows:
/dev/longhorn/pvc-572a358c-2989-4161-83cf-a0af47d72ff5 50G 605M 49G 2% /config #Columns go Size, Used, Free, Use%
So it seems like a potential Longhorn reporting problem? I'm not sure why that is.
Eazy E (OP, 2y ago)
Third row is my PV for Immich-typesense.
bo0tzz (2y ago)
No idea what's going on there tbqh. I would just shrink the PVC down to like 2 Gi and call it a day :P
Alex Tran (2y ago)
Can you try Typesense version 0.24.0 instead of 0.24.1 to see if it fixes your issue?
Eazy E (OP, 2y ago)
I'll give that a try. To be clear I don't have any issues anymore. It seems that Longhorn is potentially not correctly showing the used space for a given PVC. Immich appears to be functioning properly and everything seems to be working as intended.
FancyGUI (2y ago)
FYI, Longhorn is definitely showing the right usage for me. It's getting into 75Gi right now and climbing. I'll try to delete and restart with the older version as well, but that's odd. My setup is pretty much the same as yours @Eazy E
minch (2y ago)
Hi Alex, I have the same issue with both versions of Typesense.
OK, I think I have found something. I deleted the typesense container and the tsdata volume before creating it again (using version 0.24.0). Then, in Immich, I ran the "Tag objects" and "Encode Clip" jobs again.
Now things seem to be working properly: no high I/O, and the tsdata volume size stays contained.
