machine learning keeps crashing

Hello! I'm running Immich in Kubernetes. Today the machine learning pod keeps crash-looping, and the log doesn't say much either:
Defaulted container "immich-machine-learning" out of: immich-machine-learning, provide-timezone (init)
INFO: Started server process [1]
INFO: Waiting for application startup.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Downloading pytorch_model.bin: 31%|███ | 189M/605M [02:41<06:00, 1.16MB/s]
The pod restarted around this point, so I don't know what to look for. The memory limit on the pod is 4Gi and usage peaks at 850Mi before the crash, so I don't think it's being OOMKilled.
7 Replies
budimanjojo (OP), 2y ago
I managed to fix this by extending the startupProbe to allow a very long startup time (10 minutes).
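A minimal sketch of what such a probe change could look like in the container spec. The `/ping` path and port `3003` are assumptions about the machine learning container, not something stated in this thread, so check them against your own manifest. The two numbers just multiply out to the 10 minutes mentioned above (60 attempts x 10 seconds).

```yaml
# Fragment of the pod spec (not a full manifest).
containers:
  - name: immich-machine-learning
    startupProbe:
      httpGet:
        path: /ping          # assumed health endpoint
        port: 3003           # assumed container port
      periodSeconds: 10      # probe every 10 s
      failureThreshold: 60   # 60 x 10 s = 600 s allowed before the container is restarted
```

Liveness and readiness probes are held off until the startup probe succeeds, so a generous budget here only affects startup and not how quickly a hung pod is detected later.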
bo0tzz, 2y ago
This is the way indeed: at startup it downloads the ML models, which can take a while. Do you have a cache volume configured for it?
budimanjojo (OP), 2y ago
no, maybe I should? where is the cache mounted?
bo0tzz, 2y ago
Whatever path TRANSFORMERS_CACHE is set to, which by default is /cache
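A rough sketch of mounting a persistent cache at that path, so the models survive pod restarts instead of being downloaded again each time. The volume name `model-cache` and the claim name `immich-ml-cache` are hypothetical placeholders; the PersistentVolumeClaim is assumed to exist already.

```yaml
# Fragment of the pod spec (not a full manifest).
containers:
  - name: immich-machine-learning
    env:
      - name: TRANSFORMERS_CACHE
        value: /cache                # the default mentioned above, set explicitly for clarity
    volumeMounts:
      - name: model-cache            # hypothetical volume name
        mountPath: /cache            # must match TRANSFORMERS_CACHE
volumes:
  - name: model-cache
    persistentVolumeClaim:
      claimName: immich-ml-cache     # hypothetical PVC, created separately
```

With the cache persisted, only the first start needs the long download, though keeping a generous startup probe is still a reasonable safety net.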
budimanjojo (OP), 2y ago
thanks!
badmannen, 10mo ago
how did you do that?
RubenHensen, 5mo ago
I am wondering this too
