Simbaclaws
Theo's Typesafe Cult
•Created by Simbaclaws on 1/3/2025 in #questions
How do I parse this using python and langchain's WebBaseLoader?
I went ahead and turned it into:
42 replies
I can then just build the Document object with langchain myself instead
I gave up on using SoupStrainer inside WebBaseLoader; I went with fetching the HTML with a plain request and parsing it in Beautiful Soup instead.
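A minimal sketch of that approach, assuming the article body sits in an element with the entry-content class. The helper names (fetch_html, extract_entry_text, build_document) and the sample HTML are made up for illustration, and the langchain import is deferred so the parsing part runs without it:

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML once; raises on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def extract_entry_text(html: str) -> str:
    """Return the text of every element carrying the entry-content class."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS class selectors match even when the element has several classes.
    blocks = soup.select(".entry-content")
    return "\n\n".join(block.get_text(" ", strip=True) for block in blocks)


def build_document(url: str):
    """Fetch, parse, and wrap the result in a langchain Document."""
    # Imported here so the parsing helpers work without langchain installed.
    from langchain_core.documents import Document

    text = extract_entry_text(fetch_html(url))
    return Document(page_content=text, metadata={"source": url})


# Offline check of the parsing step with a made-up snippet.
sample = '<div class="post entry-content"><p>Hello</p></div><footer>Bye</footer>'
print(extract_entry_text(sample))
```

Calling build_document(url) does one request per page, so the rate limit is only hit once per URL.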
I think I'll try a different method of turning this into a langchain Document: fetching the page with requests and parsing it with a separate HTML parsing library.
I have the first part figured out:
This makes it search for a div element with the class entry-content. A plain class string in SoupStrainer didn't match for me when the div has multiple classes, so the lambda checks whether entry-content appears anywhere among the element's classes.
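A sketch of what such a filter could look like, written as a named function instead of a lambda. The sample HTML is invented, and the assumption is that the filter may receive either None or the whole class attribute as one string while parsing:

```python
from bs4 import BeautifulSoup, SoupStrainer


def has_entry_content(class_value) -> bool:
    """True when entry-content is one of the space-separated classes."""
    # The value can be None (no class attribute) or a raw string such as
    # "post entry-content", so split it before comparing.
    return bool(class_value) and "entry-content" in class_value.split()


only_entry = SoupStrainer("div", class_=has_entry_content)

html = """
<html><body>
  <div class="post entry-content"><p>Article body</p></div>
  <div class="sidebar"><p>Unrelated links</p></div>
</body></html>
"""

# parse_only keeps the matching div and its children and drops the rest.
soup = BeautifulSoup(html, "html.parser", parse_only=only_entry)
text = soup.get_text(" ", strip=True)
print(text)
```

The same strainer object should be usable with WebBaseLoader through bs_kwargs={"parse_only": only_entry}.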
This works for my first criterion, and I'm looking through Stack Overflow, which says the following:
https://stackoverflow.com/questions/27713802/can-soupstrainer-have-two-arguments
You can apparently give a list of different criteria. However, doing the following:
doesn't work.
can I "unmark" it?
oops, that seems to just scrape all of the content, I might've implemented that incorrectly
that worked I think 😄
perhaps I could try this:
since it accepts Any?
maybe I can supply a lambda that goes over the parsers?
because there is a rate limit applied to the site that I'm scraping, so if I run it twice, it means double the timeouts
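One way around the double request, sketched under the assumption that the HTML can be downloaded once and kept in a string: the same string can be parsed several times with different strainers without touching the site again. The strain_text helper, the ids, and the sample HTML are hypothetical:

```python
from bs4 import BeautifulSoup, SoupStrainer


def strain_text(html: str, strainer: SoupStrainer) -> str:
    """Parse an already-downloaded page, keeping only what the strainer matches."""
    soup = BeautifulSoup(html, "html.parser", parse_only=strainer)
    return soup.get_text(" ", strip=True)


# Stand-in for the body of a single HTTP response.
html = (
    "<html><body><h1 id='title'>Post title</h1>"
    "<div id='content'><p>Post body</p></div>"
    "<div id='comments'>Noise</div></body></html>"
)

# Two parsing passes, zero extra requests.
title = strain_text(html, SoupStrainer("h1"))
body = strain_text(html, SoupStrainer("div", id="content"))
print(title, "|", body)
```

Parsing twice costs some CPU time, but only the download counts against the rate limit.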
I think that's exactly what it does: it uses the parser to parse the text that WebBaseLoader provides
I could create a class that extends from SoupStrainer perhaps
sorry I've only been using python for a couple of months so far
which is weird, because passing in a single SoupStrainer with either of the values works, so a single strainer does seem to have a .text value. I assume the SoupStrainer class has a .text property that gets used, but passing in 2 of them means it receives a list of 2 objects that each have .text, while the list itself doesn't. I need to figure out a way to use both somehow
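The error suggests that parse_only is treated as one SoupStrainer object, so a list of them has no .text to offer. A possible way to use both criteria, sketched here with a hypothetical second class name (only entry-content comes from this thread), is a single strainer whose class filter accepts either one:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical class names; only entry-content is taken from the thread.
WANTED_CLASSES = {"entry-title", "entry-content"}


def has_wanted_class(class_value) -> bool:
    """True when the class attribute shares a name with WANTED_CLASSES."""
    # class_value may be None or a raw "a b c" string, so split before comparing.
    return bool(class_value) and not WANTED_CLASSES.isdisjoint(class_value.split())


# One SoupStrainer object (not a list) that accepts either kind of element.
either = SoupStrainer(class_=has_wanted_class)

html = (
    '<h1 class="entry-title big">Title</h1>'
    '<div class="post entry-content"><p>Body</p></div>'
    '<div class="ads">Ad</div>'
)

soup = BeautifulSoup(html, "html.parser", parse_only=either)
text = soup.get_text(" ", strip=True)
print(text)
```

Because this is still a single SoupStrainer, it can go wherever one strainer was accepted before.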
kind of confused what it means
this fails with the following message: 'list' object has no attribute 'text'
thank you
I'll give that a try, perhaps you can input a list with multiple strainers