✅ Parsing a Link from an HTML file with HTMLAgilityPack
Hi, I'm having a bit of trouble parsing an HTML file to extract a link.
I'm using HtmlAgilityPack for this, as it seemed simple enough for what I wanted.
In the latest variable I use SelectNodes and pass it the XPath to the link, which I found using Inspect Element.
However, the selection returns null and the console shows an error when writing.
Any tips?
[Screenshot: console error]
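A note on the null itself: in HtmlAgilityPack, SelectNodes returns null (not an empty collection) when the XPath matches nothing, so looping over the result or reading a property on it then throws a NullReferenceException. That is most likely what the console error is. Below is a minimal sketch of a null-safe wrapper. It assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (NodeSelection, SelectOrEmpty) are hypothetical.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class NodeSelection
{
    // Hypothetical helper: runs an XPath query but never returns null.
    // HAP's SelectNodes returns null when nothing matches, so an empty
    // list is substituted to make "no match" safe to loop over.
    public static IList<HtmlNode> SelectOrEmpty(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection matches = doc.DocumentNode.SelectNodes(xpath);
        return (IList<HtmlNode>)matches ?? new List<HtmlNode>();
    }
}
```

An empty result is then a clear signal that the XPath does not match the HTML that was actually downloaded, rather than a crash further down.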
I opened that link, viewed the source, and searched (Ctrl+F) for "block-views-block-topic-releases-listing-topic-latest-release-block", and there are no hits
I did the same in the developer tools too (which include HTML generated by JS), and there are no hits there either
Oh I am very regarded....
And remember that HAP (and AngleSharp too) doesn't actually run any JavaScript
Let me check if that was the issue, thanks
so if that data is loaded via JS, it won't work
That's fine, I think. All I need is the link to the next page, and then to do the same again from there. I need to parse through a couple of HTML pages and get a download link at the end
https://www.abs.gov.au/statistics/labour/employment-and-unemployment/labour-force-australia this is the correct link
(Link preview: Australian Bureau of Statistics, "Labour Force, Australia")
OK, so this finds the node, but it prints the text inside the link instead of the link itself. How can I extract the link?
the link itself is inside the href attribute of the tag, no?
InnerText is the stuff between the opening and closing tags, i.e. <a href="meep">InnerText</a>
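To make the InnerText vs. href distinction concrete, here is a small sketch that parses exactly that example anchor and reads both parts. It assumes the HtmlAgilityPack NuGet package; AnchorParts and Read are hypothetical names. GetAttributeValue(name, default) is an HtmlNode method that returns the default when the attribute is missing.

```csharp
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class AnchorParts
{
    // Hypothetical helper: returns the text between <a> and </a>,
    // plus the value of its href attribute.
    public static (string Text, string Href) Read(string anchorHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(anchorHtml);

        HtmlNode a = doc.DocumentNode.SelectSingleNode("//a");
        if (a == null)
        {
            return (string.Empty, string.Empty); // no <a> in the input
        }

        // InnerText: the content between the tags (what was being printed).
        // GetAttributeValue: the attribute value, or "" if there is no href.
        return (a.InnerText, a.GetAttributeValue("href", string.Empty));
    }
}
```

For <a href="meep">InnerText</a>, Text is "InnerText" and Href is "meep".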
Ah ok, so how do I extract the href?
iirc there is a way to access attributes on the tag
check what props/methods are available on i
The documentation's a bit shit, isn't it? I'd just F12 on i and see what's available
ye, exactly that. Or just type i. and let IntelliSense autocomplete
I'm running on Vim 🙃 I've got a couple of properties I'm gonna try printing
no LSP?
I'm sure I've seen IntelliSense in Vim before
also, unrelated, but HAP has not aged super well
most people prefer AngleSharp these days
Yeah, I'm using LSP, but tbh OmniSharp is not fantastic on Linux
If vim can't show you all of the properties/methods on a type, you really need to be using something else (or configure it better)
You can 100% do this with either lib though, so it's not really an issue
just thought I should throw that out there
iirc AS is quite a bit faster too
OK, OK, this is for a job interview exercise and I was getting a bit stuck. I just need something that works by tonight; tomorrow, if I can, I'll spend some time improving my solution.
Thanks for the tip 🙂
not saying you should change, just wanted to add my 2 cents
Yeah, getting an attribute of an HTML element is one of the very very basic things any HTML library will let you do
I can. I do need to configure my LSP a lil better, true. Just haven't gotten around to it yet 😆
var href = link.Attributes["href"].Value;
says Google
What's that link?
link is your i
it's the HTML tag/element
Ah, amazing! I'll try that 🙂
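Two things are worth guarding against once this works. First, link.Attributes["href"] is null when a tag has no href, so calling .Value on it directly would throw. Second, since the plan is to follow links from page to page and then download a file, each href probably needs to become an absolute URL. That is an assumption: hrefs on the ABS pages may well be site-relative paths. Below is a minimal sketch using the .NET Uri(Uri, string) constructor to resolve each href against the page URL. It assumes the HtmlAgilityPack NuGet package; LinkResolver and AbsoluteLinks are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

public static class LinkResolver
{
    // Hypothetical helper: returns an absolute Uri for every node matched
    // by the XPath query that carries a non-empty href attribute.
    public static List<Uri> AbsoluteLinks(string html, Uri pageUrl, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<Uri>();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(xpath);
        if (links == null) return result; // SelectNodes returns null on no match

        foreach (HtmlNode link in links)
        {
            // Attributes["href"] is null when the tag has no href attribute,
            // so check it instead of calling .Value unconditionally.
            HtmlAttribute href = link.Attributes["href"];
            if (href == null || string.IsNullOrWhiteSpace(href.Value)) continue;

            // Relative paths are resolved against the page they came from;
            // absolute URLs pass through unchanged.
            result.Add(new Uri(pageUrl, href.Value));
        }
        return result;
    }
}
```

Each resulting Uri can then be fetched and parsed in the same way to reach the next page, and finally the download link.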
@Pobiega @canton7 ❤️ got it!!!! thanks so much!
Cool, glad to hear!