bootzin
✅ Help optimizing finding keywords in text
I currently have a huge document database, and I want to go through each document and find which keywords it matches (the keywords are user-defined).
Documents are usually 5k to 10k words long. I am currently doing a string.contains for each keyword, but that performs very poorly.
My initial attempt was to build a trie for each document, which worked almost perfectly, but with one major problem: some keywords are compound (e.g. "bag of words").
Word context/ordering is important enough that simply checking for all of the words independently didn't work ("bag" AND "of" AND "words").
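To make the problem concrete, here is a minimal Python sketch (not my actual code; the function names and the sample sentence are made up for illustration). It assumes simple lowercase word tokenization and compares the independent-word check against a check that requires the words to appear next to each other in order:

```python
import re

def tokenize(text):
    # Assumption: lowercase, split on runs of word characters.
    return re.findall(r"\w+", text.lower())

def matches_independently(doc_tokens, keyword):
    # "bag" AND "of" AND "words": every word appears somewhere in the document.
    words = set(doc_tokens)
    return all(w in words for w in tokenize(keyword))

def matches_as_phrase(doc_tokens, keyword):
    # The keyword's words must appear contiguously and in order.
    kw = tokenize(keyword)
    n = len(kw)
    return any(doc_tokens[i:i + n] == kw
               for i in range(len(doc_tokens) - n + 1))

# Hypothetical sample document.
doc = tokenize("She packed a bag full of clothes, at a loss for words.")

print(matches_independently(doc, "bag of words"))  # True (false positive)
print(matches_as_phrase(doc, "bag of words"))      # False
```

The phrase check gives the right answer, but it is a plain scan per keyword, so it has the same performance problem as string.contains.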
Any ideas?