Railway • 2mo ago
ThallesComH

is scraping allowed?

I'm planning on scraping some data from a few sites and Railway has always been vague on this matter. Is it allowed? I really hope Railway allows it, I don't want to spin up an EC2 😭
Solution:
data on twitter is public but that would be a gigantic no, football sites are fine, go for it, as long as you have respectable request rates and abide by robots.txt if applicable.
10 Replies
Percy • 2mo ago
Project ID: N/A
ThallesComH • 2mo ago
N/A
Brody • 2mo ago
what sites are you scraping and did they give you permission?
ThallesComH • 2mo ago
football sites and no but their data is public
Solution
Brody • 2mo ago
data on twitter is public but that would be a gigantic no, football sites are fine, go for it, as long as you have respectable request rates and abide by robots.txt if applicable.
ThallesComH • 2mo ago
ok! thanks
ThallesComH • 2mo ago
just to be sure, if their robots.txt doesn't allow it then Railway is against it?
User-agent: Mediapartners-Google
Disallow:

User-agent: Googlebot
Disallow:

User-agent: AdsBot-Google
Disallow:

User-agent: Googlebot-Image
Disallow:

User-agent: *
Disallow: /*?*
cc @Brody (sorry for ping, idk if the thread pops again when closed)
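A note on reading the pasted robots.txt: the groups for the named Google crawlers have an empty `Disallow:`, which blocks nothing for them, while any other crawler falls under `User-agent: *` with the single rule `Disallow: /*?*`. Under the common wildcard convention (`*` matches any run of characters, a trailing `$` anchors the end), that rule covers every URL that carries a query string. Below is a minimal Python sketch of that matching, assuming that convention. `rule_matches` and `allowed_for_generic_bot` are hypothetical helper names, not part of any library, and the example paths are made up.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (with query).

    Assumes the common wildcard convention: '*' matches any run of
    characters and a trailing '$' anchors the end of the URL.
    """
    if not pattern:
        # An empty Disallow value blocks nothing.
        return False
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with "match anything".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # Patterns are matched from the start of the path.
    return re.match(regex, path) is not None


def allowed_for_generic_bot(path: str) -> bool:
    """Apply the only rule in the pasted 'User-agent: *' group."""
    return not rule_matches("/*?*", path)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for candidate in ["/teams/arsenal", "/matches?page=2"]:
        print(candidate, allowed_for_generic_bot(candidate))
```

Read this way, plain paths stay open to a generic scraper, while anything with `?` in it (pagination, filters, search) is off limits.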
Brody • 2mo ago
you gotta respect the robots.txt like any good robot would, we don't want to have to deal with takedown requests, though we will comply.
and unfortunately "it's unlikely for you to be sent a takedown request" is not an excuse.
oh and you should also have an email in your UA so that web admins can email you to get put on a no crawl list
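The two habits mentioned in this thread, a contact email in the User-Agent (UA) header and a respectable request rate, could look roughly like the Python sketch below. It uses only the standard library. The bot name, the email address, and the 5 second delay are placeholders: the thread gives no concrete numbers, so treat them as assumptions to tune per site.

```python
import time
import urllib.request

# Hypothetical values: use your own bot name and a mailbox you actually read.
USER_AGENT = "football-stats-bot/0.1 (+mailto:you@example.com)"
# Assumed delay; the thread does not say what counts as "respectable".
MIN_DELAY_SECONDS = 5.0


class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url):
    """Attach the identifying User-Agent so site admins can reach you."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, throttle):
    """Fetch one page, waiting first if the previous request was too recent."""
    throttle.wait()
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return response.read()
```

A scraper would create one `Throttle(MIN_DELAY_SECONDS)` per target site and pass it to every `fetch` call, after first checking the URL against that site's robots.txt.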
ThallesComH • 2mo ago
yeah I'm a bad robot so I'm guessing I should spin this up elsewhere. Could I at least host the database or the API on Railway?
Brody • 2mo ago
yeah I don't see any issue with that