immich-machine-learning keeps restarting
No logs from that container.
I used the Install Script [Experimental] https://immich.app/docs/install/script. (I didn't change any files)
I'm trying to fix Internal server error (500 - Internal Server Error) when using the search bar.
I have run the Encode CLIP job twice.
logs from typesense/typesense:0.24.0:
Error response from daemon: configured logging driver does not support reading
From what you posted earlier, your machine-learning container is failing to start
What could be the cause?
Is the typesense container related to this issue?
probably not
Hard to say without any logs
docker logs --follow 979b7e5f5a48
doesn't show anything, just exits
So I'm not exactly sure how to see the logs
Oh, got this
Try
docker run -it --rm ghcr.io/immich-app/immich-machine-learning:release
Still no logs

Very strange, that docker run command really should output something
I'm looking up things on google but I don't really know what I should be looking for
Is there any way to reinstall everything without losing the images?
I used the Install Script [Experimental] and I want to reinstall with Docker Compose [Recommended]
does docker logs immich_machine_learning --follow have anything?
Also, what hardware are you using by the way?
No, same thing, just exits.
RPi 4, 4GB, with raspberry pi os 64bit
You may want to stop the container, remove it, delete the image, and then try to run the image manually again.
I feel like I've seen some other people with this problem and specifically for rasp pi
So:
docker stop immich_machine_learning
docker rm immich_machine_learning
Fails:
root@raspberrypi:/mnt/nas/Immich/immich-app# docker rmi immich_machine_learning
Error: No such image: immich_machine_learning
I don't know how to run the image again
docker image rm ghcr.io/immich-app/immich-machine-learning
you can do docker image ls and delete by id too I think
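(Putting those steps together — assuming the container name from earlier in the thread:)
docker stop immich_machine_learning
docker rm immich_machine_learning
docker image ls                                               # or delete by IMAGE ID from this list
docker image rm ghcr.io/immich-app/immich-machine-learning    # a tag such as :release may be needed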
I had to add :release. How do I run it again?
Try this one now:

I guess it shouldn't have exited with -it, but it did.
Weird
Just as a sanity check can you try
docker run -it --rm hello-world
?
Can you copy/paste the sha256 for the machine learning one you just ran? a6ce?
Now i got logs :)

sha256:a6ce45adf8b8ea3637dd456bc70e9b7421a904c56801f54a6e3fb7247bc6d8e5
This is from another container, not the machine learning one.
If you run docker ps, it presumably would show that it's still crashing.
ah. true
shouldn't I be seeing it here?

docker ps -a

same
Oh right, we killed and removed it. You'd have to run docker-compose up -d again, but I don't think that's worthwhile to do until you can get it to run on its own.

This is the one you should be using. Maybe for some reason you're picking up a different image.
do I repeat these steps to stop, and delete?
Exit code 132
You see Restarting (132), which means it exited with a code of 132. That is the code for SIGILL (illegal instruction).
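(Quick way to see that for yourself — the container name is the one from earlier, or whatever docker ps -a shows:)
kill -l 4                                                       # signal 4 is ILL, and 128 + 4 = 132
docker inspect --format '{{.State.ExitCode}}' immich_machine_learning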
I would probably try removing the image again and re-pulling the one I just sent with the sha256 hash in it.
okay
This may not be related to pulling the right/wrong image. It may just be a problem with machine learning not working on rpi for some reason.
I think you're actually already pulling the right one.

does docker ps -a still show exit 132?
yes

Sorry and thanks for helping btw!
Should I re-download the docker-compose.yml and .env?
No
All that stuff is configured and working as expected, the machine learning container just doesn't work because of an illegal instruction, which is quite strange since it's supposed to have been built for arm. You're not the first person to have this issue though.
https://discord.com/channels/979116623879368755/1094203863734693888
is this container also in charge of the search api? I can't search anything, I get a 500 error
Machine learning being down would cause that, yes.
You can set an env MACHINE_LEARNING_URL=false then remove the machine learning section from the docker-compose file and bring it back up again.
Might want to just run it without machine learning for now.
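(Concretely, something like this — the exact service name in your docker-compose.yml may differ:)
# in .env
MACHINE_LEARNING_URL=false
# in docker-compose.yml: delete the machine-learning service block entirely,
# then bring the stack back up with docker compose up -d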
Added to the .env and deleted this part:

yup
how do I restart everything?
docker-compose down
docker compose up -d
actually
you can probably just do docker-compose up -d again
I wonder if you could try to get a core dump and send it over
If you tell me how, sure
I still get the 500 error
I'll wait a bit just in case

When you run docker-compose up, don't put in the -d so you can monitor the logs of all the containers in the foreground
Now I don't have the ml in the docker-compose.yml, do I re-add it and run docker compose up?
Can you run this:
Then, once inside the container run

Can you run ls now?
Where can I find the dump?
I hope it is in that directory

core is a core dump, I believe.
Can you cat this file /proc/sys/kernel/core_pattern
?
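(For reference, the two generic knobs involved here — nothing Immich-specific:)
cat /proc/sys/kernel/core_pattern   # tells you where/how the kernel writes core dumps
ulimit -c unlimited                 # core files are only written if the shell's limit allows it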
Cool.
If you run ls -l it should show the size of the core file
ls in /proc/... or in usr/src/app?
no no in the current directory (usr/src/app) just to see the size of the core dump file

175MB? lol
Coredumps are chunky :)
If I can get it to my pc, I can send you it
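(One way to get it onto the Pi itself, while that container is still running — the path is the one from above:)
docker ps                                          # note the container ID of the ML session
docker cp <container-id>:/usr/src/app/core ./core  # copy the dump out to the host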
I think this
curl -T core https://put.icu/upload/
should upload it and return a link that you can share here
(if they don't have a size limit)
Hm, that is if that image has curl of course
curl not installed
apk add curl

You could also try
apt install gdb
Sweet, nice
I wish you the best of luck reading that file hahaha
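(For the record, reading it is roughly this — assuming the dump came from the image's python3 interpreter:)
gdb "$(which python3)" core   # load the interpreter together with the core file
# then 'bt' at the (gdb) prompt prints the backtrace of the crashing thread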
Anything else I can do for you or test out?
Can you run the command python src/main.py? Or same problem?
literally just exited docker. how do I get back in?
Oh sorry lol
just re-run the docker run command again
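(i.e. something like this, overriding the entrypoint so you land in a shell rather than the crashing app — the exact shell/entrypoint in the image is an assumption:)
docker run -it --rm --entrypoint /bin/bash ghcr.io/immich-app/immich-machine-learning:release   # or /bin/sh
cd /usr/src/app && python src/main.py   # then run the app manually from inside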

Hi @jrasm91 that's my exact case. ML pod keeps restarting on raspi4 8 gb. Is there a fix for my case? I keep on searching but I've just read your message and it seems you already know the raspi case.
from transformers import pipeline
seems to be the issue


Updating it to 4.28.1 to see if it fixes it
Now it'll start up?
pip install --upgrade transformers
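(or pin the exact version mentioned above, rather than whatever happens to be latest:)
pip install "transformers==4.28.1"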
I'm now installing
pip install torchvision
same

maybe downgrading
Trying version 4.12.2, installing rust
This is what should be pre-installed btw:
Where do I write this?
You don't need to do anything with it
Just letting you know those are the dependencies that are pre-installed in the image.
I've been trying with different transformers versions without any luck
So if you type
python
then from transformers import pipeline
it crashes?
i'll try that, one sec
yes, crash
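(same check as a one-liner, handy for retesting quickly after swapping versions:)
python -c "from transformers import pipeline"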

Yeah, idk. The debug thing didn't seem to help much. I couldn't make sense of it
ChatGPT. I'm now checking CPU architecture requirements for transformers
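(A couple of generic ways to see what the CPU itself reports, in case that helps:)
uname -m            # should print aarch64 on a 64-bit OS
cat /proc/cpuinfo   # lists the CPU implementer/part (the Pi 4 is a Cortex-A72)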

@mecs can you try
python3 -c "import torch;print(torch.__version__)"
?
@jrasm91 I think it could be this https://github.com/pytorch/pytorch/issues/97226
(I'm yet to read the full issue)
same with python instead of python3
@mecs next up:
docker run -it --rm ghcr.io/immich-app/immich-machine-learning:main

it's located in usr/src/main.py
without /app/
nope, not true
/usr/src/app/src/main.py
and I still get illegal instruction
Tricky tricky. Possibly some weird issue with arm v8 + torch?
Turns out :main was built longer ago than I expected
One sec
Alright, final one: docker run -it --rm ghcr.io/immich-app/immich-machine-learning:pr-2373
I'm pretty sure this should work now
Dangit
Is there anything else I could do to help?
I think the bug was fixed upstream but hasn't made it to us yet. Try the :main tag occasionally and see if it works, I guess?
Oh, it got fixed? That's nice!
What's the usual release cycle?
^ this one
ML builds against pytorch-nightly, but I don't know exactly what goes in there
So technically, if I run this, I should have ml?
If it doesn't crash, yeah

still crashes
Rip :/
Does this happen on all Raspberry Pis?
I imagine so, but we haven't seen many reports of it I believe
Is it still happening on 1.55.0?
Can confirm it happening on my RPI 4 8gb (https://canary.discord.com/channels/979116623879368755/1105179099477061826/1105179099477061826)
Even on the newest v1.55.1
Are you using 32bit or 64bit OS on your Pi4?
64bit ubuntu 22.04
was the v1.55 ml compiled with pytorch 2.0.1? I see it just got published yesterday
I am not sure we were using the nightly build because of the arm64 support
let's see if it makes it to a stable version

let me see if i can reproduce the error using the steps in https://github.com/pytorch/pytorch/issues/97226
If it works, that means the problem was fixed upstream, and the problem lies in the ml image
seems to work on my machine
but
The ml image is probably just using a channel that doesn't have the fix yet
I am experimenting with pinning a version now
just checked, and installing pytorch using the command in https://github.com/immich-app/immich/blob/main/machine-learning/Dockerfile seems to work, so probably just no build with a working one yet?
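(A standalone check along those lines — the version pin and index URL here are illustrative, the authoritative install line is the one in that Dockerfile:)
pip3 install "torch==2.0.1" --extra-index-url https://download.pytorch.org/whl/cpu
python3 -c "import torch; print(torch.__version__)"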
i can try to build ml image right now and check that
When all the jobs in this PR complete https://github.com/immich-app/immich/pull/2416 can you try to use
pr-2416
as the image tag to pull and test?
sure!
seems like it doesn't build correctly though
https://github.com/immich-app/immich/actions/runs/4930289928/jobs/8811012368?pr=2416
hmm
Not available for arm64 yet
we just have to wait and see then
You mean the library or the immich image?
Library
Hi! I've been following this thread, I had the same problem on my RPi4 8gb. Today I tried to build the machine learning container from source (without using PR#2416) and it works flawlessly. So when the next release goes out it should work again for everybody
If building from source worked then using
ghcr.io/immich-app/immich-machine-learning:main
should work too - can you try that to be sure?
And it doesn't work ... Still error 132
Can you try
pr-2416
as the tag?
Is it possible that layer is cached?
It worked with the PR
I'll dump the cache and have it build again, good suggestion
Oh huh, it's not actually possible to delete a tag
It does cache it indeed

So all we should need (I think) is to bust that cache, but I'm not sure how
Alex's PR changes that layer itself which would prevent the cache from being used.
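(Locally the equivalent is just skipping the layer cache entirely when building from source — the tag name here is illustrative:)
git clone https://github.com/immich-app/immich.git
cd immich/machine-learning
docker build --no-cache -t immich-machine-learning:local .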
Can confirm: pr-2416 doesn't crash anymore!
By now main should work as well
trying rn
It does!
Awesome!
\o/ sweet
Can confirm, doesn't crash anymore :)
