immich-machine-learning keeps restarting
No logs from that container.
I used the Install Script [Experimental] https://immich.app/docs/install/script. (I didn't change any files)
I'm trying to fix Internal server error (500 - Internal Server Error) when using the search bar.
I have run the Encode CLIP job twice.
logs from typesense/typesense:0.24.0:
Error response from daemon: configured logging driver does not support reading
From what you posted earlier, your machine-learning container is failing to start
What could be the cause?
Is the typesense container related to this issue?
probably not
Hard to say without any logs
docker logs --follow 979b7e5f5a48
doesn't show anything, just exits
So I'm not exactly sure how to see the logs
Oh, got this
Try
docker run -it --rm ghcr.io/immich-app/immich-machine-learning:release
Still no logs

Very strange, that docker run command really should output something
I'm looking up things on google but I don't really know what I should be looking for
Is there any way to reinstall everything without losing the images?
I used the Install Script [Experimental] and I want to reinstall with Docker Compose [Recommended]
does docker logs immich_machine_learning --follow have anything?
Also, what hardware are you using by the way?
No, same thing, just exits.
RPi 4, 4GB, with raspberry pi os 64bit
You may want to stop the container, remove it, delete the image, and then try to run the image manually again.
I feel like I've seen some other people with this problem and specifically for rasp pi
So:
docker stop immich_machine_learning
docker rm immich_machine_learning
Fails:
root@raspberrypi:/mnt/nas/Immich/immich-app# docker rmi immich_machine_learning
Error: No such image: immich_machine_learning
I don't know how to run the image again
docker image rm ghcr.io/immich-app/immich-machine-learning
you can do docker image ls and delete by id too I think
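(Putting those steps together — assuming the container name from earlier in the thread:)
docker stop immich_machine_learning
docker rm immich_machine_learning
docker image ls                                               # or delete by IMAGE ID from this list
docker image rm ghcr.io/immich-app/immich-machine-learning    # a tag such as :release may be needed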
I had to add :release. How do I run it again?
Try this one now:

I guess it shouldn't have exited with -it, but it did.
Weird
Just as a sanity check can you try
docker run -it --rm hello-world
?
Can you copy/paste the sha256 for the machine learning one you just ran? a6ce?
Now i got logs :)

sha256:a6ce45adf8b8ea3637dd456bc70e9b7421a904c56801f54a6e3fb7247bc6d8e5
This is from another container, not the machine learning one.
If you run docker ps, it presumably would show that it's still crashing.
ah. true
shouldn't I be seeing it here?

docker ps -a

same
Oh right, we killed and removed it. You'd have to run docker-compose up -d again, but I don't think that's worthwhile to do until you can get it to run on its own.

This is the one you should be using. Maybe for some reason you're picking up a different image.
do I repeat these steps to stop, and delete?
Exit code 132
You see Restarting (132), which means it exited with a code of 132. That is the code for SIGILL (illegal instruction).
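(Quick way to see that for yourself — the container name is the one from earlier, or whatever docker ps -a shows:)
kill -l 4                                                       # signal 4 is ILL, and 128 + 4 = 132
docker inspect --format '{{.State.ExitCode}}' immich_machine_learning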
I would probably try removing the image again and re-pulling the one I just sent with the sha256 hash in it.
okay
This may not be related to pulling the right/wrong image. It may just be a problem with machine learning not working on rpi for some reason.
I think you're actually already pulling the right one.

does docker ps -a still show exit 132?
yes

Sorry and thanks for helping btw!
Should I re-download the docker-compose.yml and .env?
No
All that stuff is configured and working as expected, the machine learning container just doesn't work because of an illegal instruction, which is quite strange since it's supposed to have been built for arm. You're not the first person to have this issue though.
https://discord.com/channels/979116623879368755/1094203863734693888
is this container also in charge of the search api? I can't search anything, I get a 500 error
Machine learning being down would cause that, yes.
You can set an env MACHINE_LEARNING_URL=false then remove the machine learning section from the docker-compose file and bring it back up again.
Might want to just run it without machine learning for now.
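(Concretely, something like this — the exact service name in your docker-compose.yml may differ:)
# in .env
MACHINE_LEARNING_URL=false
# in docker-compose.yml: delete the machine-learning service block entirely,
# then bring the stack back up with docker compose up -d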
Added to the .env and deleted this part:

yup
how do I restart everything?
docker-compose down
docker compose up -d
actually
you can probably just do docker-compose up -d again
I wonder if you could try to get a core dump and send it over
If you tell me how, sure
I still get the 500 error
I'll wait a bit just in case

When you run docker-compose up, don't put in the -d so you can monitor the logs of all the containers in the foreground
Now I don't have the ml in the docker-compose.yml, do I re-add it and run docker compose up?
Can you run this:
Then, once inside the container run

Can you run ls now?
Where can I find the dump?
I hope it is in that directory

core is a core dump, I believe.
Can you cat this file /proc/sys/kernel/core_pattern
?
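(For reference, the two generic knobs involved here — nothing Immich-specific:)
cat /proc/sys/kernel/core_pattern   # tells you where/how the kernel writes core dumps
ulimit -c unlimited                 # core files are only written if the shell's limit allows it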
Cool.
If you run ls -l it should show the size of the core file
ls in /proc/... or in usr/src/app?
no no in the current directory (usr/src/app) just to see the size of the core dump file

175MB? lol
Coredumps are chunky :)
If I can get it to my pc, I can send you it
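(One way to get it onto the Pi itself, while that container is still running — the path is the one from above:)
docker ps                                          # note the container ID of the ML session
docker cp <container-id>:/usr/src/app/core ./core  # copy the dump out to the host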
I think this
curl -T core https://put.icu/upload/
should upload it and return a link that you can share here
(if they don't have a size limit)
Hm, that is if that image has curl of course
curl not installed
apk add curl

You could also try
apt install gdb
Sweet, nice
I wish you the best of luck reading that file hahaha
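(For the record, reading it is roughly this — assuming the dump came from the image's python3 interpreter:)
gdb "$(which python3)" core   # load the interpreter together with the core file
# then 'bt' at the (gdb) prompt prints the backtrace of the crashing thread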
Anything else I can do for you or test out?
Can you run the command python src/main.py? Or same problem?
literally just exited docker. how do I get back in?
Oh sorry lol
just re-run the docker run command again
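(i.e. something like this, overriding the entrypoint so you land in a shell rather than the crashing app — the exact shell/entrypoint in the image is an assumption:)
docker run -it --rm --entrypoint /bin/bash ghcr.io/immich-app/immich-machine-learning:release   # or /bin/sh
cd /usr/src/app && python src/main.py   # then run the app manually from inside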

Hi @jrasm91 that's my exact case. ML pod keeps restarting on raspi4 8 gb. Is there a fix for my case? I keep on searching but I've just read your message and it seems you already know the raspi case.
from transformers import pipeline
seems to be the issue


Updating it to 4.28.1 to see if it fixes it
Now it'll start up?
pip install --upgrade transformers
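(or pin the exact version mentioned above, rather than whatever happens to be latest:)
pip install "transformers==4.28.1"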
I'm now installing
pip install torchvision
same

maybe downgrading
Trying version 4.12.2, installing rust
This is what should be pre-installed btw:
Where do I write this?
You don't need to do anything with it
Just letting you know those are the dependencies that are pre-installed in the image.
I've been trying with different transformers versions without any luck
So if you type
python
then from transformers import pipeline
it crashes?
i'll try that, one sec
yes, crash
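(same check as a one-liner, handy for retesting quickly after swapping versions:)
python -c "from transformers import pipeline"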

Yeah, idk. The debug thing didn't seem to help much. I couldn't make sense of it
ChatGPT. I'm now checking CPU architecture requirements for transformers
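(A couple of generic ways to see what the CPU itself reports, in case that helps:)
uname -m            # should print aarch64 on a 64-bit OS
cat /proc/cpuinfo   # lists the CPU implementer/part (the Pi 4 is a Cortex-A72)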

@mecs can you try
python3 -c "import torch;print(torch.__version__)"
?
@jrasm91 I think it could be this https://github.com/pytorch/pytorch/issues/97226
(I'm yet to read the full issue)
same with python instead of python3
@mecs next up:
docker run -it --rm ghcr.io/immich-app/immich-machine-learning:main

it's located in usr/src/main.py
without /app/
nope, not true
/usr/src/app/src/main.py
and I still get illegal instruction
Tricky tricky. Possibly some weird issue with arm v8 + torch?
Turns out :main was built longer ago than I expected
One sec
Alright, final one: docker run -it --rm ghcr.io/immich-app/immich-machine-learning:pr-2373
I'm pretty sure this should work now
Dangit
Is there anything else I could do to help?
I think the bug was fixed upstream but hasn't made it to us yet. Try the :main tag occasionally and see if it works, I guess?
Oh, it got fixed? That's nice!
What's the usual release cycle?
^ this one
ML builds against pytorch-nightly, but I don't know exactly what goes in there
So technically, if I run this, I should have ml?
If it doesn't crash, yeah

still crashes
Rip :/
Does this happen on all Raspberry Pis?
I imagine so, but we haven't seen many reports of it I believe
Is it still happening on 1.55.0?
Can confirm it happening on my RPI 4 8gb (https://canary.discord.com/channels/979116623879368755/1105179099477061826/1105179099477061826)
Even on the newest v1.55.1
Are you using 32bit or 64bit OS on your Pi4?
64bit ubuntu 22.04
was the v1.55 ml compiled with pytorch 2.0.1? I see it just got published yesterday
I am not sure we were using the nightly build because of the arm64 support
let's see if it makes it to a stable version

let me see if i can reproduce the error using the steps in https://github.com/pytorch/pytorch/issues/97226
If it works, that means the problem was fixed upstream, and the problem lies in the ml image
seems to work on my machine
but
The ml image is probably just using a channel that doesn't have the fix yet
I am experimenting with pinning a version now
just checked, and installing pytorch using the command in https://github.com/immich-app/immich/blob/main/machine-learning/Dockerfile seems to work, so probably just no build with a working one yet?
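(A standalone check along those lines — the version pin and index URL here are illustrative, the authoritative install line is the one in that Dockerfile:)
pip3 install "torch==2.0.1" --extra-index-url https://download.pytorch.org/whl/cpu
python3 -c "import torch; print(torch.__version__)"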
i can try to build ml image right now and check that
When all the jobs in this PR complete https://github.com/immich-app/immich/pull/2416 can you try to use
pr-2416
as the image tag to pull and test?
sure!
seems like it doesn't build correctly though
https://github.com/immich-app/immich/actions/runs/4930289928/jobs/8811012368?pr=2416
hmm
Not available for arm64 yet
we just have to wait and see then
You mean the library or the immich image?
Library
Hi! I've been following this thread, I had the same problem on my RPi4 8gb. Today I tried to build the machine learning container from source (without using PR#2416) and it works flawlessly. So when the next release goes out it should work again for everybody
If building from source worked then using
ghcr.io/immich-app/immich-machine-learning:main
should work too - can you try that to be sure?
And it doesn't work ... Still error 132
Can you try
pr-2416
as the tag?
Is it possible that layer is cached?
It worked with the PR
I'll dump the cache and have it build again, good suggestion
Oh huh, it's not actually possible to delete a tag
It does cache it indeed

So all we should need (I think) is to bust that cache, but I'm not sure how
Alex's PR changes that layer itself which would prevent the cache from being used.
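(Locally the equivalent is just skipping the layer cache entirely when building from source — the tag name here is illustrative:)
git clone https://github.com/immich-app/immich.git
cd immich/machine-learning
docker build --no-cache -t immich-machine-learning:local .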
Can confirm: pr-2416 doesn't crash anymore!
By now main should work as well
trying rn
It does!
Awesome!
\o/ sweet
Can confirm, doesn't crash anymore :)
