Scrape many pages with Plasmo
Hello, I'm new to the framework and feel a little overwhelmed, so I decided to ask the Plasmo people directly.
My need is to scrape a few pages as the browser's owner, extract a JSON payload, and offer it as a download. Since the website sits behind authentication and my use case relies on an authenticated user to extract the data, I've opted to build a web extension.
I've been able to use `PlasmoCSConfig` and `PlasmoGetInlineAnchor` to add a button to the website, which is great. I've also been able to change the current URL on a click event, and I've read and explored a little around `world` script injection.
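Here's roughly what I have so far; the match pattern, anchor selector, and target URL are placeholders:

```tsx
// contents/scrape-button.tsx — what I have today, more or less
import type { PlasmoCSConfig, PlasmoGetInlineAnchor } from "plasmo"

export const config: PlasmoCSConfig = {
  matches: ["https://example.com/*"] // placeholder site
}

// Mount the button next to an existing element on the page
export const getInlineAnchor: PlasmoGetInlineAnchor = async () =>
  document.querySelector("#page-toolbar") // placeholder selector

const ScrapeButton = () => (
  <button
    onClick={() => {
      // For now this only navigates; the actual multi-page scraping is what I'm asking about
      window.location.href = "https://example.com/first-page-to-scrape"
    }}>
    Export data
  </button>
)

export default ScrapeButton
```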
But how would you design an extension that visits a page, determines a list of 6 to 20 pages to explore, visits each of them as the current user, extracts some info from each page's DOM, and finally builds a JSON payload?
Do I need to use a background script? If so, how can the background act on the current tab? And if the scraping takes too long, is there a risk that the background gets killed?
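To make the question concrete, this is roughly what I imagine the background doing, using plain MV3 APIs since I don't know if Plasmo has a nicer wrapper for it. The message name and the extraction function are placeholders, and it assumes the `tabs` and `scripting` permissions plus host permissions for the site:

```ts
// background.ts — rough idea, not sure this is the intended pattern

// Resolve once the tab has finished loading
const waitForLoad = (tabId: number) =>
  new Promise<void>((resolve) => {
    const listener = (id: number, info: chrome.tabs.TabChangeInfo) => {
      if (id === tabId && info.status === "complete") {
        chrome.tabs.onUpdated.removeListener(listener)
        resolve()
      }
    }
    chrome.tabs.onUpdated.addListener(listener)
  })

// Placeholder extraction; runs inside the page, so it sees the DOM as the logged-in user
const extractFromPage = () => ({
  title: document.title,
  url: location.href
})

chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
  if (message?.name !== "scrape-pages") {
    return
  }

  const scrape = async () => {
    const results = []
    for (const url of message.urls as string[]) {
      // Open each page in a background tab, wait for it, extract, then close it
      const tab = await chrome.tabs.create({ url, active: false })
      const tabId = tab.id
      if (tabId == null) continue
      await waitForLoad(tabId)
      const [injection] = await chrome.scripting.executeScript({
        target: { tabId },
        func: extractFromPage
      })
      results.push(injection.result)
      await chrome.tabs.remove(tabId)
    }
    return results
  }

  scrape().then((payload) => sendResponse({ payload }))
  return true // keep the channel open for the async response
})
```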
Alternatively, should I embed the website in an iframe to keep a shared context between the pages visited? Or can this be achieved with a side panel or the project's popup?
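For completeness, the final download step I was planning to handle like this (the filename is a placeholder), unless there's a more extension-idiomatic way:

```ts
// Hand the collected JSON to the user as a file download
// (callable from a content script, popup, or side panel)
const downloadJson = (data: unknown) => {
  const blob = new Blob([JSON.stringify(data, null, 2)], {
    type: "application/json"
  })
  const url = URL.createObjectURL(blob)
  const link = document.createElement("a")
  link.href = url
  link.download = "scrape-result.json" // placeholder filename
  link.click()
  URL.revokeObjectURL(url)
}
```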