Immich · 2y ago
volcs0

New uploads not getting tagged

Faces are getting found, but they are not getting tagged. This applies to all new uploads. I've tried deleting the various images and rebuilding the stack (on unRAID), but the problem persists. For what it's worth, objects are being found and I can search for them; it's just that faces are not being tagged. I've re-run face tagging, and it goes through 100,000+ faces, but the matches are not being found and saved. The only errors I see are in the Typesense container, and they look like this. Thanks for any advice.

E20230917 16:01:41.338583 375 raft_server.h:62] Peer refresh failed, error: Doing another configuration change
W20230917 16:01:51.339373 357 node.cpp:811] [default_group:172.20.0.6:8107:8108 ] Refusing concurrent configuration changing
20 Replies
bo0tzz · 2y ago
Try deleting the typesense data volume and then restarting the stack
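Concretely, that might look like the sketch below, assuming the stack is managed with docker compose and the Typesense volume is named something like immich_tsdata (the name is an assumption; list the volumes first to find the real one).

```sh
# Stop the stack, remove the Typesense volume, and bring everything back up.
# The volume name "immich_tsdata" is an assumption; check docker volume ls.
docker compose down
docker volume ls | grep -i ts     # find the actual Typesense volume name
docker volume rm immich_tsdata    # deletes the Typesense index data
docker compose up -d              # the index gets rebuilt on the next jobs run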
volcs0 (OP) · 2y ago
Just deleting the container doesn't seem to delete the volume. Any thoughts as to where the volume data are stored? I can try googling around, but when I wanted to delete the database it took a long time to figure out exactly where it was stored. The data persist even between stack restarts; even deleting all the images and pulling the entire thing again didn't help until I found out exactly where the database was stored.
bo0tzz · 2y ago
It's stored in a docker volume
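As a sketch, the on-disk location of a named volume can be found with docker volume inspect (the volume name here is again an assumption):

```sh
docker volume ls                      # list all volumes on the host
docker volume inspect immich_tsdata   # the "Mountpoint" field is the host path,
                                      # usually under /var/lib/docker/volumes/
```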
volcs0 (OP) · 2y ago
I deleted the /data/db folder in the Typesense container and am now re-running face recognition; I'll report back if it works. Just deleting the Docker image doesn't work, the data persist. Thanks.

OK, I nuked everything in the Typesense container, deleted it, and restarted the stack. It recreated the Typesense container, but the problem persists. New pictures are not face-tagged, and when I re-run "tag missing faces" it runs over 100,000 faces and doesn't tag them. I can manually tag them, and it will merge them with the existing photo tags, but they are not being recognized. I've manually tagged about 100 people, and new photos with very clear pictures are not being recognized. I appreciate any advice or troubleshooting steps. Thanks.

I'm getting a lot of these errors:

immich_microservices | {
immich_microservices |   "id": "73f04018-dc3a-40d5-863b-475236dd180b"
immich_microservices | }
immich_microservices |
immich_microservices | [Nest] 7 - 09/27/2023, 4:31:51 AM ERROR [JobService] Unable to run job handler (recognizeFaces/recognize-faces): TypeError: fetch failed
immich_microservices | [Nest] 7 - 09/27/2023, 4:31:51 AM ERROR [JobService] TypeError: fetch failed
immich_microservices |     at Object.fetch (node:internal/deps/undici/undici:11576:11)
immich_microservices |     at async MachineLearningRepository.post (/usr/src/app/dist/infra/repositories/machine-learning.repository.js:27:21)
immich_microservices |     at async FacialRecognitionService.handleRecognizeFaces (/usr/src/app/dist/domain/facial-recognition/facial-recognition.services.js:105:23)
immich_microservices |     at async /usr/src/app/dist/domain/job/job.service.js:107:37
immich_microservices |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:346:28)
immich_microservices |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:531:24)
immich_microservices | [Nest] 7 - 09/27/2023, 4:31:51 AM ERROR [JobService] Object:
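The "TypeError: fetch failed" entries mean the microservices container could not reach the machine-learning container over HTTP at all. A hedged way to test that path directly, assuming the container/service names from the stock compose file and a /ping health endpoint on the ML service (both are assumptions for this particular stack):

```sh
# From the host: probe the ML service from inside the microservices container.
# Names and port follow the default compose file and may differ here.
docker exec -it immich_microservices \
  curl -v http://immich-machine-learning:3003/ping
# busybox alternative if curl is not present in the image:
docker exec -it immich_microservices \
  wget -qO- http://immich-machine-learning:3003/ping
```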
bo0tzz · 2y ago
What do your ML logs say?
volcs0 (OP) · 2y ago
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
/opt/venv/lib/python3.11/site-packages/transformers/models/convnext/feature_extraction_convnext.py:28: FutureWarning: The class ConvNextFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ConvNextImageProcessor instead.
  warnings.warn(
[09/27/23 04:42:51] INFO Starting gunicorn 21.2.0
[09/27/23 04:42:51] INFO Listening at: http://0.0.0.0:3003 (9)
[09/27/23 04:42:51] INFO Using worker: uvicorn.workers.UvicornWorker
[09/27/23 04:42:51] INFO Booting worker with pid: 10
[09/27/23 04:43:04] INFO Created in-memory cache with unloading disabled.
[09/27/23 04:43:04] INFO Initialized request thread pool with 12 threads.
[09/27/23 04:53:41] INFO Loading clip model 'ViT-B-32::openai'
[09/27/23 04:53:44] INFO Loading image classification model 'microsoft/resnet-50'
[09/27/23 04:53:44] INFO Loading facial recognition model 'buffalo_l'
bo0tzz · 2y ago
Is it restarting over and over? How much RAM do you have?
volcs0 (OP) · 2y ago
32 GB. It runs "recognize faces" and then stops. Right now the Docker image utilization is 100%, filled up with the model cache, Typesense data, and Postgres data, so it's running very slowly.
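A quick way to see what is actually eating the space, as a hedged sketch (the path is the typical Docker default; on unRAID the Docker data lives inside a docker.img loopback):

```sh
docker system df -v      # per-image, per-container, and per-volume disk usage
df -h /var/lib/docker    # free space on the Docker data filesystem itself
```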
bo0tzz · 2y ago
If your Docker image is full, that's pretty much certain to cause problems.
volcs0 (OP) · 2y ago
Right, but that's not the source of the errors. It just slows things down.
bo0tzz · 2y ago
How so? If it's full, then I would expect, e.g., downloads of ML models to fail, the database to have trouble, and more.
volcs0 (OP) · 2y ago
Right, but I don't want this to be a red herring. I've had the face recognition problem for a while. I imported 200,000 pictures with no problem, and they were labeled. Since then (the last few weeks) I have had to manually tag everyone. The Docker image only fills up when I merge faces (a known problem), but the non-labeling of faces has been a problem for weeks, and I can't seem to chase it down. I suspect there are communication problems between the containers, and that suspicion reflects my own lack of understanding of how Docker containers communicate with each other.
bo0tzz · 2y ago
What I'm seeing here is:
1. Microservices is having trouble communicating with ML
2. ML is seemingly crashing/restarting
3. You mention your Docker image is full
Unless there are other errors in the logs, or you're running out of RAM or such, the logical explanation to me is that your Docker image is full and that's why ML is having problems.
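If the image really is full, a hedged cleanup sketch; review what each command removes before running it, since pruning is destructive:

```sh
docker image prune -a    # remove images not used by any container
docker builder prune     # remove build cache
docker volume prune      # remove volumes not referenced by any container
```

On unRAID, the docker.img vDisk size can also be increased under Settings > Docker (with the Docker service stopped).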
volcs0 (OP) · 2y ago
Okay, so I need to set up some shares that are not part of the Docker image's appdata folder and redirect those files somewhere else on the server, as in the sketch below. On the original problem: the faces are getting recognized correctly, and the index face that comes up is correct. It just doesn't have a name associated with it.
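One way to do that is to bind-mount host shares into the containers instead of using named volumes inside docker.img. A hypothetical sketch; the unRAID share paths, service names, and container paths are assumptions, not the stock compose file:

```sh
# Create host-side shares for the heavy data (paths are unRAID-style assumptions):
mkdir -p /mnt/user/appdata/immich/model-cache /mnt/user/appdata/immich/tsdata
# Then point the compose services at them, e.g. in docker-compose.yml:
#   immich-machine-learning:
#     volumes:
#       - /mnt/user/appdata/immich/model-cache:/cache
#   typesense:
#     volumes:
#       - /mnt/user/appdata/immich/tsdata:/data
docker compose up -d    # recreate the containers with the new bind mounts
```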
bo0tzz · 2y ago
Oh, I may have misunderstood the problem then? Faces won't have a name until you assign one
volcs0 (OP) · 2y ago
Yes, I've done that, for about 100 people. New pictures do find the faces, and the index face it shows is correct, but it is not assigning the name. When I give it the name, it asks if I want to merge it with the existing person, which I can do. But it shouldn't have to ask; these are all good pictures.
volcs0 (OP) · 2y ago
(image attached; no description)
volcs0 (OP) · 2y ago
I have 1,000 pictures of my wife tagged already. If I go ahead and name that picture, it will merge the two, but I don't know why it can't match it.
bo0tzz · 2y ago
That's just due to the recognition distance threshold in the machine learning settings. You can tweak that if you want to
volcs0 (OP) · 2y ago
Any thoughts about these Typesense errors? There are thousands of them:

W20230927 19:16:28.209331 358 raft_server.cpp:570] Single-node with no leader. Resetting peers.
W20230927 19:16:28.209357 358 node.cpp:866] node default_group:172.24.0.2:8107:8108 is in state ERROR, can't reset_peer
W20230927 19:16:38.210311 358 raft_server.cpp:570] Single-node with no leader. Resetting peers.
W20230927 19:16:38.210326 358 node.cpp:866] node default_group:172.24.0.2:8107:8108 is in state ERROR, can't reset_peer
