C#
Created by bootzin on 4/4/2023 in #help
✅ Help optimizing finding keywords in text
I currently have a huge document database and want to go through each document to find matching keywords (which are user-defined). Documents are usually 5k~10k words long. Right now I call string.Contains once per keyword, but that performs very poorly.

My initial attempt was to build a trie for each document, which worked almost perfectly, but with one major problem: some keywords are compound (e.g. "bag of words"). Word context and ordering are important enough that simply checking for all the words independently ("bag" AND "of" AND "words") didn't work.

Any ideas?
18 replies
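One possible direction for the compound-keyword case, offered as a sketch and not as the approach the thread settled on: build a single trie over the keywords (instead of one per document) and key it on whole words, so that "bag of words" becomes a three-node path. The class name `PhraseTrie`, its methods, and the simple punctuation-based tokenizer below are all hypothetical and would need adapting to the real data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not from the original post: a trie keyed on word tokens
// instead of characters, so a compound keyword is a path of several words.
public sealed class PhraseTrie
{
    private sealed class Node
    {
        public readonly Dictionary<string, Node> Children =
            new Dictionary<string, Node>(StringComparer.OrdinalIgnoreCase);
        public string Keyword; // non-null when a keyword ends at this node
    }

    private readonly Node _root = new Node();

    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '(', ')', '"' };

    // Assumed tokenization: split on whitespace and basic punctuation.
    public static string[] Tokenize(string text) =>
        text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // Inserts a keyword as a sequence of words; single-word keywords work too.
    public void Add(string keyword)
    {
        Node node = _root;
        foreach (string word in Tokenize(keyword))
        {
            if (!node.Children.TryGetValue(word, out Node next))
            {
                next = new Node();
                node.Children[word] = next;
            }
            node = next;
        }
        node.Keyword = keyword;
    }

    // Returns each distinct keyword that occurs in the document
    // as a contiguous, ordered run of words.
    public HashSet<string> FindAll(string document)
    {
        string[] words = Tokenize(document);
        var found = new HashSet<string>();
        for (int start = 0; start < words.Length; start++)
        {
            Node node = _root;
            for (int i = start; i < words.Length; i++)
            {
                // Stop as soon as no keyword continues with this word.
                if (!node.Children.TryGetValue(words[i], out node))
                    break;
                if (node.Keyword != null)
                    found.Add(node.Keyword);
            }
        }
        return found;
    }
}
```

With keywords "bag of words" and "trie" added, calling `FindAll("A bag of words model; not a bag, of course.")` should report only "bag of words": the second "bag" is followed by "of course", so the walk stops before reaching a keyword node. Each document is tokenized and scanned once regardless of how many keywords exist, and the inner loop runs at most as many steps as the longest keyword has words.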