C#•10mo ago

Dictionary translation

So I need to build an app that transfers a whole PDF dictionary into a database, and I've run into some problems. The hard part is reading and parsing the data. I came up with using regex for that, but it's complex to tune: sometimes I get good results for a few words, and a lot of the time it just breaks. What needs to be transferred, and how, is shown in the example photo. This is an English-to-Serbian dictionary. Does anyone have an idea how I can do this with minimal errors?

5 Replies

Joschi•10mo ago

You could start by assuming bold text is English and non-bold text is the translation. But this will break on inflected forms like kneel, knelt and so on, and also on plurals like knavery. You could then try to do more passes, detecting and handling as many edge cases as possible. But somehow I don't think you will get around having to clean up your data manually in the end.
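A minimal sketch of that first pass, assuming you can already get the text as runs tagged bold or not bold. `TextRun`, `Entry` and `BoldSplitter` are made-up names for illustration, not part of any PDF library, and the sample strings are placeholders rather than real dictionary data:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input: one run of text plus whether its font was bold.
public record TextRun(string Text, bool IsBold);

// Hypothetical output: bold headword text and the non-bold text that follows it.
public record Entry(string Headword, string Translation);

public static class BoldSplitter
{
    public static List<Entry> Split(IEnumerable<TextRun> runs)
    {
        var entries = new List<Entry>();
        var head = new List<string>();
        var body = new List<string>();

        foreach (var run in runs)
        {
            var text = run.Text.Trim();
            if (text.Length == 0) continue;

            if (run.IsBold)
            {
                // Bold text after some non-bold text starts a new entry.
                if (body.Count > 0)
                {
                    entries.Add(new Entry(string.Join(" ", head), string.Join(" ", body)));
                    head.Clear();
                    body.Clear();
                }
                // Consecutive bold runs are merged into one headword.
                head.Add(text);
            }
            else if (head.Count > 0)
            {
                body.Add(text);
            }
            // Non-bold text before the first headword is skipped (assumed to be a page header).
        }

        if (head.Count > 0)
            entries.Add(new Entry(string.Join(" ", head), string.Join(" ", body)));

        return entries;
    }
}
```

Calling `BoldSplitter.Split` with the runs `("kneel", bold)`, `("translation A", not bold)`, `("knife", bold)`, `("translation B", not bold)` gives two entries. It is only the first pass: a bold inflected form in the middle of an entry would be split off as its own headword, which is the kind of case the later passes or manual cleanup would have to catch.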

Ꜳåąɐȁặⱥᴀᴬ•10mo ago

Can you share an example of the data you are reading from the PDF? What is the structure of the table?

neSHaOP•10mo ago

Original word, pronunciation, translation. That's what I need to extract from this.

Ꜳåąɐȁặⱥᴀᴬ•10mo ago

Are you parsing raw text from the PDF or something more refined?

SleepWellPupper•10mo ago

Yeah, if you're reading raw PDF data, you might be able to extract the font weight. If you're using OCR, that might be more difficult. It seems to me that regular grammars are just barely able to express your format; maybe a context-free grammar would be better suited. I'd imagine using a tool like ANTLR for this task would be feasible.
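Before reaching for a parser generator, a small hand-written parser can show whether the format is regular enough. This is a rough sketch, not ANTLR, and it assumes a layout the thread does not confirm: the headword, then the pronunciation in square brackets, then translations separated by commas or semicolons. `DictionaryEntry` and `EntryParser` are hypothetical names:

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public record DictionaryEntry(string Headword, string Pronunciation, List<string> Translations);

public static class EntryParser
{
    // Assumed layout (not confirmed by the thread):
    //   headword [pronunciation] translation, translation; translation
    // Returns null when the line does not fit, so it can be queued for manual review.
    public static DictionaryEntry? Parse(string line)
    {
        int open = line.IndexOf('[');
        int close = open < 0 ? -1 : line.IndexOf(']', open + 1);
        if (open <= 0 || close < 0) return null;

        string headword = line.Substring(0, open).Trim();
        string pronunciation = line.Substring(open + 1, close - open - 1).Trim();

        var translations = new List<string>();
        var parts = line.Substring(close + 1)
                        .Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var part in parts)
        {
            var t = part.Trim();
            if (t.Length > 0) translations.Add(t);
        }

        if (headword.Length == 0 || translations.Count == 0) return null;
        return new DictionaryEntry(headword, pronunciation, translations);
    }
}
```

For an invented line such as `kneel [ni:l] translation A, translation B`, `EntryParser.Parse` returns an entry with two translations; a line without brackets returns `null` instead of a half-filled record. Counting how many lines come back `null` is a cheap way to judge whether nested structure (sub-entries, multiple parts of speech) really calls for a context-free grammar.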
