How do I get element innerHTML using HTMLRewriter?
Here's my code:
How do I get the element's innerHTML?
1 Reply
HTMLRewriter is streamed, so there's no guarantee that you'll have the element's full contents the first time a handler fires. You would have to run the rewriter on the * selector, check tagName, mark the point at which you start watching for new elements, and then keep track of your own tree of nodes as they come in. By closing each node with onEndTag, you could probably build a pretty accurate representation of the contents.
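To make the bookkeeping concrete, here is a rough TypeScript sketch of that approach. InnerHtmlCollector is a hypothetical helper, not part of HTMLRewriter: it only tracks depth and serializes what it is told about, and it assumes the target is found by id, that tag names arrive in lowercase, and that void elements such as br never report an end tag. The wiring to HTMLRewriter is shown in comments and is untested.

```typescript
// Hypothetical helper: rebuilds an approximate innerHTML string from
// start-tag, text, and end-tag events reported by the caller.
const VOID_TAGS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"]);

class InnerHtmlCollector {
  private targetId: string;
  private depth = 0; // 0 = outside the target; 1 = directly inside it
  private parts: string[] = [];

  constructor(targetId: string) {
    this.targetId = targetId;
  }

  // Call for every start tag. Returns true when the caller should also
  // report this element's end tag through close().
  open(tagName: string, attrs: Iterable<[string, string]>): boolean {
    const list = Array.from(attrs);
    if (this.depth === 0) {
      const isTarget = list.some(([k, v]) => k === "id" && v === this.targetId);
      if (!isTarget) return false;
      this.depth = 1; // start watching; the target's own tag is not innerHTML
      return true;
    }
    const attrText = list
      .map(([k, v]) => ` ${k}="${v.replace(/"/g, "&quot;")}"`)
      .join("");
    this.parts.push(`<${tagName}${attrText}>`);
    if (VOID_TAGS.has(tagName)) return false; // assumed: no end tag will follow
    this.depth++;
    return true;
  }

  // Call for every text chunk; chunks outside the target are ignored.
  text(chunk: string): void {
    if (this.depth > 0) this.parts.push(chunk);
  }

  // Call for every end tag that open() asked for.
  close(tagName: string): void {
    if (this.depth === 0) return;
    this.depth--;
    if (this.depth > 0) this.parts.push(`</${tagName}>`);
  }

  get html(): string {
    return this.parts.join("");
  }
}

// Untested wiring sketch for a Worker ("content" is a hypothetical id):
//   const collector = new InnerHtmlCollector("content");
//   const rewritten = new HTMLRewriter()
//     .on("*", {
//       element(el) {
//         if (collector.open(el.tagName, el.attributes)) {
//           el.onEndTag((end) => collector.close(end.name));
//         }
//       },
//       text(chunk) { collector.text(chunk.text); },
//     })
//     .transform(response);
//   await rewritten.text(); // consume the stream before reading collector.html
```

Because the result is only complete once the whole stream has been read, collector.html should be read after the transformed body has been consumed. Markup with omitted end tags would throw off the depth count, so treat the output as an approximation.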
If you just want text inside an element, you can do something like (pseudo):
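A minimal sketch of that idea in TypeScript, assuming the text handler receives chunks with a text string and a lastInTextNode flag. TextCollector and the "h1" selector are hypothetical names chosen for illustration, and the HTMLRewriter wiring in the comments is untested.

```typescript
// Minimal shape of the chunks passed to a `text` handler (assumed fields).
interface TextChunk {
  text: string;
  lastInTextNode: boolean;
}

// Hypothetical handler object: buffers chunks until a text node is complete.
class TextCollector {
  private buffer = "";
  readonly nodes: string[] = [];

  // Called once per chunk; one text node may arrive split over several chunks.
  text(chunk: TextChunk): void {
    this.buffer += chunk.text;
    if (chunk.lastInTextNode) {
      this.nodes.push(this.buffer);
      this.buffer = "";
    }
  }
}

// Untested wiring sketch for a Worker:
//   const collector = new TextCollector();
//   const rewritten = new HTMLRewriter().on("h1", collector).transform(response);
//   await rewritten.text(); // consume the stream first
//   const heading = collector.nodes.join("");
```

The collected text is only available after the transformed response body has been fully read, since the handlers run as the stream passes through.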
But if you're looking to parse or scrape HTML from specific elements, HTMLRewriter probably isn't the best tool for the job. A more traditional parser like cheerio (etc.) will work better once the document has been loaded into memory.