Modular•5mo ago
Andrey

Interacting with Desktop Files such as .pdf documents

Hello all, first-time poster here. I've been learning Mojo for about a month, and I'm having an issue trying to open a .pdf file using just Mojo, no Python (likely not supported at this time?), following this guide: https://docs.modular.com/mojo/stdlib/builtin/file/ Am I missing something here, or is reading content from a .pdf file not possible at this time?
4 Replies
carlcaulkett•5mo ago
Hello Andrey. I would think it is possible. I've had a Bitwig preset parser up and running for a few months. Have a look at https://github.com/carlca/ca_mojo.git, and in particular ./bitwig/preset_parser, for some ideas. Essentially, you just open a file using var f = open(file_name, "r") and then use methods like var data: List[UInt8] = f.read_bytes(size) to read the data. Or did you mean more specialised code geared towards PDF files in particular? I fear that, for the moment, it's a case of working from first principles with a handy reference to the file format in question.
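The open-then-read_bytes steps described above can be sketched as a small program. This is only a minimal sketch: count_bytes and the path "example.pdf" are hypothetical names chosen for illustration, and it assumes the open and read_bytes calls behave as described in this thread. The Mojo standard library has changed between releases, so the exact signatures may differ in your version.

```mojo
# Hypothetical helper (not from the thread): returns the size of a file
# by reading all of its bytes.
fn count_bytes(path: String) raises -> Int:
    # Open the file for reading, as in the reply above.
    var f = open(path, "r")
    # Assumption: called without a size, read_bytes() reads to the end of the file.
    var data = f.read_bytes()
    f.close()
    return len(data)


def main():
    # "example.pdf" is a placeholder path; open() raises if the file is missing.
    print(count_bytes("example.pdf"))
```

Reading the bytes works the same way for any file type. Nothing here is PDF-specific.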
AndreyOP•5mo ago
Thank you so much for the response, I really appreciate it! I've been scratching my head over this one and couldn't manage to get it going. I'll give this a shot once I'm home!

Just got home and gave your code a shot. Just to clarify, I'm looking to extract text from .pdf files. The parsing and formatting aren't my issue; it's the actual text extraction. The output that I'm getting is a list of SIMD[DType.uint8, 1] values. I'm not sure how to proceed from here, as I don't know how to convert the bytes back to a string.
carlcaulkett•5mo ago
Hi Andrey! In my parser code, where I needed to convert from the bytes to a string, I used this method...
    @staticmethod
    fn vec_to_string(data: List[UInt8]) raises -> String:
        var result = String()
        for i in range(0, len(data)):
            # Stop at the first NUL byte (treated as the end of the string).
            if data[i] == 0x00:
                break
            # Append the character whose code point equals the byte value.
            result += chr(data[i].__int__())
        return result
Something similar should do the trick 😉 I see that @maxim has a new string handling package out https://discord.com/channels/1087530497313357884/1151418092052815884/1273544370578133046 I am certain that his code will be much more efficient than mine. I haven't looked at it yet, though. In my case it's a case of "if it ain't broke, don't fix it" 😉
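As a hedged illustration of the byte-to-string conversion above, here is a standalone variant applied to the five bytes that begin every PDF file ("%PDF-"). The name bytes_to_string is hypothetical, the NUL check is dropped because PDF data can legitimately contain zero bytes, and the calls used (chr, List.append) are assumed to exist as in the Mojo release this thread was written against.

```mojo
# Hypothetical standalone variant of vec_to_string, without the NUL-byte stop.
fn bytes_to_string(data: List[UInt8]) -> String:
    var result = String()
    for i in range(len(data)):
        # chr() turns a code point into a one-character String.
        result += chr(data[i].__int__())
    return result


def main():
    # The signature that begins every PDF file.
    var header = List[UInt8]()
    header.append(0x25)  # %
    header.append(0x50)  # P
    header.append(0x44)  # D
    header.append(0x46)  # F
    header.append(0x2D)  # -
    print(bytes_to_string(header))
```

Applied to a whole PDF, this exposes the file's plain-text structure, but page text is often stored in compressed streams (for example FlateDecode), so the conversion alone is unlikely to yield readable text. Bytes of 0x80 and above are treated here as individual code points, which may not match the file's actual encoding.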
AndreyOP•5mo ago
You guys are amazing. I'll give this a shot!
