New uploads not getting tagged
Faces are being found, but they are not being tagged. This applies to all new uploads. I've tried deleting the various images and rebuilding the stack (on unRAID), but the problem persists. The only errors I see are in the Typesense container, and they look like the ones below. Thanks for any advice. For what it's worth, objects are being found - and I can search for them. It's just that faces are not being tagged.
I've re-run face tagging - and it goes through 100,000+ faces - but they are not being found and saved.
E20230917 16:01:41.338583 375 raft_server.h:62] Peer refresh failed, error: Doing another configuration change
W20230917 16:01:51.339373 357 node.cpp:811] [default_group:172.20.0.6:8107:8108 ] Refusing concurrent configuration changing
Try deleting the Typesense data volume and then restarting the stack.
Just deleting the container doesn't seem to delete the volume. Any thoughts as to where the volume data are stored? I can try googling around, but when I wanted to delete the database, it took a long time to figure out exactly where it was stored. It persists even across stack restarts. Even deleting all the images and pulling everything again didn't help until I found out exactly where the database was stored.
It's stored in a Docker volume.
I deleted the /data/db folder in the Typesense container and am now re-running face recognition. I'll report back if it works. Just deleting the Docker image doesn't work - the data persist. Thanks.
OK - I nuked everything in the Typesense container, deleted it, and restarted the stack. It recreated the Typesense container, but the problem persists. New pictures are not face tagged, and when I re-run "tag missing faces", it runs over 100,000 faces and doesn't tag them. I can manually tag them, and it will merge them with the existing photo tags, but they are not being recognized. I've manually tagged about 100 people, and new photos with very clear pictures are not being recognized. I appreciate any advice or troubleshooting steps I can try. Thanks.
I'm getting a lot of these errors:
immich_microservices | {
immich_microservices | "id": "73f04018-dc3a-40d5-863b-475236dd180b"
immich_microservices | }
immich_microservices |
immich_microservices | [Nest] 7 - 09/27/2023, 4:31:51 AM ERROR [JobService] Unable to run job handler (recognizeFaces/recognize-faces): TypeError: fetch failed
immich_microservices | [Nest] 7 - 09/27/2023, 4:31:51 AM ERROR [JobService] TypeError: fetch failed
immich_microservices | at Object.fetch (node:internal/deps/undici/undici:11576:11)
immich_microservices | at async MachineLearningRepository.post (/usr/src/app/dist/infra/repositories/machine-learning.repository.js:27:21)
immich_microservices | at async FacialRecognitionService.handleRecognizeFaces (/usr/src/app/dist/domain/facial-recognition/facial-recognition.services.js:105:23)
immich_microservices | at async /usr/src/app/dist/domain/job/job.service.js:107:37
immich_microservices | at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:346:28)
immich_microservices | at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:531:24)
immich_microservices | [Nest] 7 - 09/27/2023, 4:31:51 AM ERROR [JobService] Object:
What do your ML logs say?
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
/opt/venv/lib/python3.11/site-packages/transformers/models/convnext/feature_extraction_convnext.py:28: FutureWarning: The class ConvNextFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ConvNextImageProcessor instead.
warnings.warn(
[09/27/23 04:42:51] INFO Starting gunicorn 21.2.0
[09/27/23 04:42:51] INFO Listening at: http://0.0.0.0:3003 (9)
[09/27/23 04:42:51] INFO Using worker: uvicorn.workers.UvicornWorker
[09/27/23 04:42:51] INFO Booting worker with pid: 10
[09/27/23 04:43:04] INFO Created in-memory cache with unloading disabled.
[09/27/23 04:43:04] INFO Initialized request thread pool with 12 threads.
[09/27/23 04:53:41] INFO Loading clip model 'ViT-B-32::openai'
[09/27/23 04:53:44] INFO Loading image classification model
'microsoft/resnet-50'
[09/27/23 04:53:44] INFO Loading facial recognition model 'buffalo_l'
[09/27/23 04:42:51] INFO Listening at: http://0.0.0.0:3003 (9)
[09/27/23 04:42:51] INFO Using worker: uvicorn.workers.UvicornWorker
[09/27/23 04:42:51] INFO Booting worker with pid: 10
[09/27/23 04:43:04] INFO Created in-memory cache with unloading disabled.
[09/27/23 04:43:04] INFO Initialized request thread pool with 12 threads.
[09/27/23 04:53:41] INFO Loading clip model 'ViT-B-32::openai'
[09/27/23 04:53:44] INFO Loading image classification model
'microsoft/resnet-50'
[09/27/23 04:53:44] INFO Loading facial recognition model 'buffalo_l'
Is it restarting over and over?
How much RAM do you have?
32 GB
It runs "recognize faces" and then stops. Right now the Docker image utilization is 100%, filled up with model cache, Typesense data, and Postgres data.
So, it's running very slowly.
If your Docker image is full, that's pretty much certain to cause problems.
Right, but that's not the source of the errors. It just slows things down.
How so? If it's full, I would expect, e.g., downloads of ML models to fail, the database to have trouble, and more.
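One quick way to check whether the filesystem backing the containers really is full is to look at its free space directly. The Python sketch below uses only the standard library. The path /var/lib/docker is an assumption about where unRAID mounts docker.img, so adjust it for your setup; the 90% warning level is an arbitrary choice.

```python
import shutil


def usage_report(path: str = "/", warn_at: float = 0.9) -> dict:
    """Report how full the filesystem containing `path` is."""
    total, used, free = shutil.disk_usage(path)
    fraction = used / total if total else 0.0
    return {
        "path": path,
        "total_gb": total / 1e9,
        "free_gb": free / 1e9,
        "used_fraction": fraction,
        # True once usage reaches the (arbitrary) warning level.
        "nearly_full": fraction >= warn_at,
    }


if __name__ == "__main__":
    # Assumption: on unRAID the Docker image is mounted at /var/lib/docker.
    for p in ("/var/lib/docker", "/"):
        try:
            r = usage_report(p)
        except OSError as exc:
            print(f"{p}: cannot check ({exc})")
            continue
        print(f"{p}: {r['used_fraction']:.0%} used, {r['free_gb']:.1f} GB free"
              + ("  <-- nearly full" if r["nearly_full"] else ""))
```

If the Docker image path reports close to 100% used, fixing that first removes one variable before chasing the face-matching issue.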
Right, but I don't want this to be a red herring. I've had the face recognition problem for a while. I imported 200,000 pictures with no problem, and they were labeled. Since then (the last few weeks) I have had to manually tag everyone. The Docker image only fills up when I merge faces (a known problem), but faces have not been labeled for weeks, and I can't seem to chase it down. I suspect there are communication problems between the containers, though that may just reflect my own lack of understanding of how Docker containers communicate with each other.
What I'm seeing here is:
1. Microservices is having trouble communicating with ML
2. ML is seemingly crashing/restarting
3. You mentioned your Docker image is full
Unless there are other errors in the logs, or you're running out of RAM or such, to me the logical explanation is that your docker image is full and that's why ML is having problems.
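The "TypeError: fetch failed" from undici in the microservices log usually means the HTTP request failed at the network level (for example a DNS lookup failure, a refused connection, or a reset connection) rather than ML returning an error response. A minimal reachability sketch in Python follows. It assumes the ML service is reachable as immich-machine-learning on port 3003: the port matches the "Listening at" line in the ML log, while the hostname is an assumption based on the default compose service name. It only checks that a TCP connection opens, not that the ML API answers correctly.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False
```

Run from a container attached to the same Docker network as microservices, e.g. can_reach("immich-machine-learning", 3003). A False result while ML is up points at networking or name resolution; True while ML keeps restarting points back at the ML container itself.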
Okay, so I need to set up some shares outside the Docker image and appdata folder, and then redirect those files to somewhere else on the server.
The faces are getting recognized correctly. The index face that comes up is correct. It just doesn't have a name associated with it.
Oh, I may have misunderstood the problem then?
Faces won't have a name until you assign one.
Yes, I've done that - for about 100 people. And the new pictures find the faces - and the index face it shows is correct. But it is not assigning the name. And when I give it the name, it asks if I want to merge it with the existing person - which I can do. But it shouldn't have to ask - these are all good pictures.

I have 1000 pictures of my wife tagged already.
If I go ahead and name that picture, it will merge the two - but I don't know why it can't match it.
That's just due to the recognition distance threshold in the machine learning settings. You can tweak that if you want to.
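For context on what that threshold does: face recognition generally turns each detected face into an embedding vector and attaches a new face to an existing person only when it is close enough to that person's known faces. The sketch below assumes cosine distance and nearest-neighbour matching. It illustrates the general technique, not Immich's actual implementation, and the function names and threshold values are hypothetical.

```python
import math
from typing import Mapping, Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / (na * nb)


def match_person(face: Sequence[float],
                 known: Mapping[str, Sequence[Sequence[float]]],
                 max_distance: float) -> Optional[str]:
    """Return the name of the nearest known face, or None if it is too far away."""
    best_name, best_dist = None, math.inf
    for name, embeddings in known.items():
        for emb in embeddings:
            d = cosine_distance(face, emb)
            if d < best_dist:
                best_name, best_dist = name, d
    # The threshold check: a correctly detected face stays unnamed if even its
    # closest match is farther than max_distance.
    return best_name if best_dist <= max_distance else None
```

With a strict threshold, a clear photo can still land just outside every named person and stay unnamed; raising the threshold makes matching more permissive at the cost of more wrong merges.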
Any thoughts about these typesense errors?
W20230927 19:16:28.209331 358 raft_server.cpp:570] Single-node with no leader. Resetting peers.
W20230927 19:16:28.209357 358 node.cpp:866] node default_group:172.24.0.2:8107:8108 is in state ERROR, can't reset_peer
W20230927 19:16:38.210311 358 raft_server.cpp:570] Single-node with no leader. Resetting peers.
W20230927 19:16:38.210326 358 node.cpp:866] node default_group:172.24.0.2:8107:8108 is in state ERROR, can't reset_peer
There are thousands of them.
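With thousands of near-identical lines, it can help to collapse them into counts per source location and message so that the few distinct errors stand out. The Typesense lines above follow the glog prefix format (severity letter, date, time, thread id, file:line]). The sketch below assumes that format and ignores any line that doesn't match it; the script and log file names in the usage note are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# glog prefix: <severity><yyyymmdd> <hh:mm:ss.micros> <thread id> <file:line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<loc>[^\]]+)\] (?P<msg>.*)$"
)


def summarize(lines: Iterable[str]) -> Counter:
    """Count glog lines by (severity, source location, message), dropping timestamps."""
    counts: Counter = Counter()
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if m:
            counts[(m["sev"], m["loc"], m["msg"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for (sev, loc, msg), n in summarize(fh).most_common(10):
            print(f"{n:7d}  {sev} {loc}  {msg}")
```

Usage: save the container output with docker logs <typesense-container> > typesense.log 2>&1, then run python3 summarize_glog.py typesense.log.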