C# · 15mo ago
Polarnik

✅ Unsuccessfully trying to parse web pages.

I want to parse last.fm for artists, but my code only works on the first iteration. In every iteration where the page is > 1, var nodeArtist = document.DocumentNode.QuerySelectorAll("td.chartlist-name"); returns an empty list, which is strange because the pages are identical in structure.
using System.Text;
using HtmlAgilityPack;

namespace Parser
{
    public class Program
    {
        public static async Task Main()
        {
            const int pageNum = 6;
            string url = "https://www.last.fm/user/Polarnichek/library/artists?page=0";
            var web = new HtmlWeb();
            var sb = new StringBuilder(url.Length + 1);
            sb.Append(url);
            var artists = new List<string>(500);

            for (int page = 4; page <= pageNum; page++)
            {
                sb.Clear();
                sb.Append(url);
                sb.Replace("page=0", $"page={page}");

                var document = await web.LoadFromWebAsync(sb.ToString());

                var nodeArtist = document.DocumentNode.QuerySelectorAll("td.chartlist-name");
                foreach (var artist in nodeArtist)
                {
                    var artistName = artist.QuerySelector("a").Attributes["title"].Value;
                    artists.Add(artistName);
                }
            }
            Console.WriteLine(artists.Count);
        }
    }
}
12 Replies
Stan · 15mo ago
OK, so scraping is hard, and even harder in C#. Check if they have a public API you can consume. If not, check if they have an internal API you can consume; look for it in the Network tab of Chrome DevTools. If they used some unga bunga ancient technology, you probably wouldn't have the error, but then, good luck and godspeed, I cannot help you further.
Polarnik (OP) · 15mo ago
They have a public API. I've just never worked with APIs, so I decided to parse the page; it seemed easier to me (ノ_<。)
Stan · 15mo ago
Good time to learn it. Trust me, the API will be way easier. Do you know how JSON works? Actually, check if there's a NuGet package first. If it's a public API for a popular service, there's a good chance someone made one.
Polarnik (OP) · 15mo ago
There is, I will try to use it.
Stan · 15mo ago
Pog, gl bro, lemme know if you need help.
Polarnik (OP) · 15mo ago
OK, big thanks.
Unknown User · 15mo ago
[Message not public]
Stan · 15mo ago
They have a public API, surely with restrictions for whatever isn't allowed by the ToS. My man just wants to programmatically know what music he listens to.
Unknown User · 15mo ago
[Message not public]
Stan · 15mo ago
Nothin' wrong with a little bit of educational web scraping tho 😉 But you're like a mod or something, so I'll shush xD
Unknown User · 15mo ago
[Message not public]
Pobiega · 15mo ago
Using an API is easy, even very easy. Significantly easier than scraping, at least.
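
For readers landing on this thread, here is a minimal sketch of the API route suggested above, using only HttpClient and System.Text.Json rather than a NuGet wrapper. It is not the code the OP ended up with. It assumes the Last.fm library.getArtists method at https://ws.audioscrobbler.com/2.0/ and a response shaped like {"artists":{"artist":[{"name":"..."}]}}; check the official API docs before relying on either. The class name LastFmSketch, the helper names, and the LASTFM_API_KEY environment variable are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

public static class LastFmSketch
{
    // Assumption: root URL of Last.fm's public API.
    private const string ApiRoot = "https://ws.audioscrobbler.com/2.0/";

    // Builds the request URL for one page of a user's library artists.
    public static string BuildUrl(string user, string apiKey, int page) =>
        $"{ApiRoot}?method=library.getartists" +
        $"&user={Uri.EscapeDataString(user)}" +
        $"&api_key={Uri.EscapeDataString(apiKey)}" +
        $"&page={page}&format=json";

    // Pulls artist names out of the JSON body.
    // Assumed shape: {"artists":{"artist":[{"name":"..."}, ...]}}
    public static List<string> ParseArtistNames(string json)
    {
        var names = new List<string>();
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("artists", out var artists) ||
            artists.ValueKind != JsonValueKind.Object ||
            !artists.TryGetProperty("artist", out var list) ||
            list.ValueKind != JsonValueKind.Array)
        {
            return names; // Unexpected shape: return an empty list.
        }

        foreach (var item in list.EnumerateArray())
        {
            if (item.ValueKind == JsonValueKind.Object &&
                item.TryGetProperty("name", out var name) &&
                name.ValueKind == JsonValueKind.String)
            {
                names.Add(name.GetString() ?? "");
            }
        }
        return names;
    }

    public static async Task Main()
    {
        // Hypothetical: the API key is read from an environment variable.
        var apiKey = Environment.GetEnvironmentVariable("LASTFM_API_KEY");
        if (string.IsNullOrEmpty(apiKey))
        {
            Console.WriteLine("Set LASTFM_API_KEY first.");
            return;
        }

        using var http = new HttpClient();
        var artists = new List<string>();

        // Same six pages the original scraper was aiming for.
        for (int page = 1; page <= 6; page++)
        {
            var json = await http.GetStringAsync(BuildUrl("Polarnichek", apiKey, page));
            artists.AddRange(ParseArtistNames(json));
        }
        Console.WriteLine(artists.Count);
    }
}
```

Keeping BuildUrl and ParseArtistNames separate from the HTTP call means the parsing can be checked against a saved response without hitting the network. A maintained NuGet client, as suggested in the thread, would replace most of this.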
