Web scraping: could a newer framework be disrupting HtmlAgilityPack?
I am currently building a WPF app and I am having issues crawling/grabbing information from news articles. I am using HtmlAgilityPack, and I have even tried Selenium (it was a bit too much for the small tasks I need to do).
I looked at my application's framework: I was using .NET 7.0, and I have now switched down to .NET Framework 4.5. I will try this framework and see whether I just need a more HTML-heavy news site to scrape from. Could the HtmlAgilityPack package be conflicting with .NET 7.0?
Does anyone have the experience or time to help me with my scraping project? Most times I just need a "get gud", you know haha.
HtmlAgilityPack is compiled against .NET Standard 2.0, so that should not be the problem here.
Would trying to scrape from the Google News API with just HtmlAgilityPack be too naive? Should I choose smaller websites that are less dependent on JavaScript?
Scraping from an API? I'd assume you just make requests to an API.
Ah. I searched around a little. That API has been deprecated for ages it seems...
In that case, AgilityPack should be sufficient. I haven't had problems with it so far.
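For reference, here is a minimal sketch of pulling headlines out of a page with HtmlAgilityPack once you already have its HTML as a string. It assumes the headlines sit in `<h2>` elements inside `<article>` tags; the `HapHeadlines` class and that XPath are hypothetical and would need adjusting for the real site. Keep in mind that HAP only sees the HTML the server sends, so anything a site renders with JavaScript will not be in the document.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HapHeadlines
{
    // Hypothetical helper: returns the trimmed text of every node matching the XPath.
    // Assumes headlines live in <h2> tags inside <article> elements; adjust per site.
    public static List<string> Extract(string html, string xpath = "//article//h2")
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // HAP tolerates malformed, real-world HTML

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .ToList();
    }
}
```

Usage would be something like `var headlines = HapHeadlines.Extract(html);`, where `html` is the page source you downloaded.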
Is this not still relevant?
https://www.nuget.org/packages/HtmlAgilityPack/#supportedframeworks-body-tab
HtmlAgilityPack 1.11.48
This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar...
You know what, that's what I was looking for, thanks!
... Why did you go to .NET Framework 4.5?
In general, .NET 7 is about 30% faster than 4.5 at more or less everything.
Because I keep getting an HttpRequestException + IOException: "response ended prematurely".
I could totally be parsing incorrectly, though, for all I know.
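On the "response ended prematurely" error: in .NET this surfaces as an HttpRequestException with an IOException as its inner exception, which means the server closed the connection before the response finished. It is thrown while the page is downloading, before any HTML parsing happens. Below is one hedged sketch of a download helper with a single shared HttpClient, a browser-like User-Agent and explicit error handling. Sending a User-Agent is an assumption (some sites cut off clients that don't send one), not a confirmed fix for this case; the `NewsDownloader` name and the 30-second timeout are hypothetical choices.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class NewsDownloader
{
    // Reuse one HttpClient for the app's lifetime instead of creating one per request.
    private static readonly HttpClient Client = CreateClient();

    private static HttpClient CreateClient()
    {
        var client = new HttpClient { Timeout = TimeSpan.FromSeconds(30) }; // hypothetical timeout
        // Assumption: some news sites reject or cut off requests without a browser-like User-Agent.
        client.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        return client;
    }

    // Returns the page HTML, or null if the download failed.
    public static async Task<string?> TryDownloadAsync(string url)
    {
        try
        {
            return await Client.GetStringAsync(url);
        }
        catch (HttpRequestException ex)
        {
            // "The response ended prematurely" lands here, with the IOException as InnerException.
            Console.WriteLine($"Request to {url} failed: {ex.Message} / {ex.InnerException?.Message}");
            return null;
        }
        catch (TaskCanceledException)
        {
            // HttpClient reports a timeout as a TaskCanceledException.
            Console.WriteLine($"Request to {url} timed out.");
            return null;
        }
    }
}
```

In a WPF app you would `await` this from an async event handler so the UI thread is not blocked, and only hand the result to the parser when it is not null.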
And your initial gut reaction is to downgrade to .NET 4.5? O_O
I highly doubt there is any problem with HAP, but you could try AngleSharp.
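If you do try AngleSharp, a minimal sketch of the same kind of extraction uses CSS selectors rather than XPath. It parses an HTML string you have already downloaded; the `AngleSharpHeadlines` name and the `article h2` selector are hypothetical. Like HAP, AngleSharp's core parser does not run the page's JavaScript.

```csharp
using System.Linq;
using AngleSharp.Html.Parser;

public static class AngleSharpHeadlines
{
    // Hypothetical helper: returns the trimmed text of every element matching the CSS selector.
    public static string[] Extract(string html, string cssSelector = "article h2")
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html); // builds a DOM following the HTML5 parsing rules

        // QuerySelectorAll returns an empty collection (never null) when nothing matches.
        return document.QuerySelectorAll(cssSelector)
                       .Select(element => element.TextContent.Trim())
                       .ToArray();
    }
}
```

The null check that HAP's `SelectNodes` needs is not required here, which is one small ergonomic difference between the two libraries.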
Hmm, I'll try it.
Well, coming from Visual Basic, I've had to downgrade in the past to get access to certain features lol
O_O
I haven't done VB since VB6, but still.
Yeah, the print form tools, or certain packages, are only available or only work in older frameworks for some reason.
Was this issue resolved? If so, run /close; otherwise I will mark this as stale and this post will be archived until there is new activity.