I have a problem deploying my app on Railway using selenium-webdriver and chromedriver.
Here is the code implementing it. What it does is not important, because it works on my localhost, and the files are there too; the chromedriver.exe is working and it is the right version. Does anyone know how I can solve this? Or maybe I should change libraries, because when Chrome changes version this stops working until the new chromedriver.exe is released. Does anyone know a better solution?
I am assuming that you are trying to deploy some sort of web crawler / scraper?
Yes
I'm not 100% sure, but I believe that any type of non-accidental crawling or scraping is against Railway's TOS
so I would imagine that there are numerous guards in place trying to stop / hinder this
So your recommendation is finding another host that allows it?
And try it there?
Railway generally doesn't allow crawling / scraping (it is in their TOS after all)
there are exceptions, can you share your use case?
article 6 of the TOS, I just checked
I work at a solar panel company in Mexico. In Mexico there is only one company in charge of all the electricity, called CFE, and that company does not share their data. But to do my reports and all that stuff I need their info, so my bot goes to their page and saves their tariffs, for example, in my database, so I can use them in my platform
does this company allow you to scrape their site?
Well, it is public information, you don't have to pass any type of security or reCAPTCHA, they just don't have an API to get that data. They used to have one, but they said that a lot of companies like mine used it so much that their servers crashed a lot
And it is kind of impossible for a person to collect that data manually
It is a lot haha
they shut the API down; with that info, it doesn't sound like they'd be too keen on you scraping the data instead
The reason they shut down the API is that their servers can't handle the amount of requests the companies were sending, not because they don't want to share their info
how often do you run a scrape task
This task runs once a month, and I have another, so it will be twice a month
oh, then that's no big deal
it's only in the TOS so that Railway can take action against bad actors, it's not a hard no
you have chromedriver.exe, a Windows binary; you need a way to install the Linux version of chromedriver when on Railway
Let me try installing the Linux version to see if it works
putting a binary in the folder is not how it's done, it may work for Windows, but that's not how you should be doing it
remove chromedriver.exe from the project, install chromedriver onto your computer and get it working that way before we move to deploying on Railway
Okay, but I don't have Linux on my computer
what's that got to do with anything lol
Ah sorry, I got confused, so I use the same version that I have, just on my computer?
yeah, just install it on your computer so any project can use it, you don't want to be putting binaries into your projects
Okay, so after trying a lot, it seems I just had to update selenium-webdriver and I don't need the chromedriver.exe in my project (apparently newer versions of selenium-webdriver can fetch the matching driver themselves)
Now it works without it
does it work on railway though
I haven't tried it
Let me upload the changes
Apparently not
Maybe it is what you said, something about Linux?
try adding a nixpacks.toml file to your project with this in it
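something along these lines (these nix package names are my assumption, adjust to whatever nixpkgs actually calls them):

[phases.setup]
nixPkgs = ["...", "chromium", "chromedriver"]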
Literally like this?
yep
It says the same
nixpacks L
I'll write a dockerfile for you later
or you can give it a shot and see where you end up
I haven't used Docker, I saw on Google that some people talk about that
I don't know literally any of that
I can give it a try but, try what? haha
there's no better way to learn: write a dockerfile that will run this app of yours, read some guides, watch some YouTube videos, etc
all you need to do for Railway to use the dockerfile you write is put the dockerfile in your project (the filename should have a capital D)
Okay, let me try it and I'll tell you what happened
sounds good
Is it something like this?
I wouldn't use node 14, that's long past end of life
Well, I changed node:14 to FROM node:18.13.0
use the LTS version of 18
you should change
npm install
to npm ci
and assuming you have those selenium and chromedriver npm packages in your package.json, installing them again is pointless
So it will be like this
^
but other than that, yeah that looks great
see, dockerfiles are easy, it's literally just the steps to run your app but you start from scratch
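the bare shape is something like this (dist/app.js here is a placeholder, use your real entry point):

FROM node:18.12.0

WORKDIR /app

# install exactly what's in the lockfile, reproducibly
COPY package*.json ./
RUN npm ci

# then bring in the rest of the source
COPY . .

CMD ["node", "dist/app.js"]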
How can I change the version to LTS?
What's the command
just change the tag on the image
I don't know what the version number for the LTS release of node 18 is off the top of my head, but that's a simple Google search for you
It doesn't work
18.12.0 is the LTS version
Does it have something to do with the warning?
server.js only has this:
okay, now the slightly harder task: have your dockerfile install these apt packages
Do I also need the nixpacks.toml file?
Or just the Dockerfile
just the dockerfile, nixpacks is irrelevant
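in dockerfile form that's a RUN line like this (the package list here is just the usual chromium dependencies, not necessarily the exact list above):

RUN apt-get update && apt-get install -y \
    chromium chromium-driver \
    fonts-liberation libnss3 libgbm1 \
    && rm -rf /var/lib/apt/lists/*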
I have it like this
But it gives this
that's different
not that this would actually cause any problems, but the apt stuff should go before the workdir thing
can you give me the full deployment logs? https://bookmarklets.up.railway.app/log-downloader/
It is building with the change you told me, let me get the logs and I'll send them to you
question, how are you so good at this? are you asking ChatGPT or something?
The first Dockerfile example was from ChatGPT, and I sent it to an ex-coworker who I know has worked with Docker to ask him if it was correct, and he said yes, so yeah, I sent the code asking for the changes
I see, nice work
Those are the logs with this code
start your app locally and give me the version of chrome and chromedriver used
your app prints that stuff out
on Railway you are using
118.0.5993.70
On my local machine it is the same version
might be an issue with running as root, slap a
USER 1000
in before CMD
Like this?
nope, literally do exactly what I said
yep
it's a shot in the dark, but hey why not try it
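i.e. the end of the dockerfile looks like this (entry point is a placeholder again):

# run as a non-root user instead of root
USER 1000
CMD ["node", "dist/app.js"]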
try using this list as the apt packages to install
question, do you even need selenium? could you just request the raw HTML of the page and extract the data with cheerio?
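roughly like this (URL and selectors are placeholders, and you'd have to add cheerio to your dependencies):

import axios from "axios";
import * as cheerio from "cheerio";

async function fetchTariffs() {
  // fetch the raw HTML without driving a real browser
  const { data: html } = await axios.get("https://example.com/tarifas");
  const $ = cheerio.load(html);
  // pull the text out of whatever elements hold the tariffs
  $("table tr").each((_, row) => {
    const cells = $(row).find("td").map((_, td) => $(td).text().trim()).get();
    console.log(cells);
  });
}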
I think that I do, but if you have another way I am open to listening
you said this page is public, without any captcha or auth?
Yes
send me the link
"https://app.cfe.mx/Aplicaciones/CCFE/Tarifas/TarifasCRENegocio/Tarifas/PequenaDemandaBT.aspx"
What the bot does is select the different combinations of options so it can gather all the tariffs
I guess that would be easier with selenium
try these apt packages
Like this?
looks good
It doesn't work
I was eating haha
same error?
I think it is the same
It looks similar
can you share your repo?
yeah but make it public lol
man, Brody is the Docker king
Sorry for the delay, it was the weekend haha
I already changed it to public
@Carlos Treviño here you are https://github.com/LuxunEnergy/CFE-tariff-bot/pull/1
looks to be working
Brody the goat
Hahaha
Thanks a lot
You were really really helpful
no problem, I recommend looking into streaming the JSON objects in JSON Lines format
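the idea is one JSON object per line, so you write and process records one at a time instead of holding everything in memory; a tiny sketch (file name is arbitrary):

import * as fs from "fs";
import * as readline from "readline";

// write: append each record as its own line the moment it's scraped
function appendRecord(record: object) {
  fs.appendFileSync("tariffs.jsonl", JSON.stringify(record) + "\n");
}

// read: stream the file back line by line without loading it all at once
async function readRecords() {
  const rl = readline.createInterface({ input: fs.createReadStream("tariffs.jsonl") });
  for await (const line of rl) {
    console.log(JSON.parse(line));
  }
}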
I haven't heard of that, do you recommend doing it when there is a lot of data?
Because in other areas of my app I am getting stuck with the problem of efficiency haha
yeah you are processing one thing at a time
Okay thanks, I will look up what that is and implement it
sounds good!
Hey, good morning Brody, are you still here? Because I'm facing an issue trying to set up the bot on the main server
The bot that I showed you was isolated in another service, but I want to implement it on the large server and it is not letting me
don't really understand the question
"it is not letting me" does not tell me anything about the problem you are facing
I have this in the server
I copied the file and changed it to "dist/app.js" in CMD
And those are the logs
Maybe it is having problems with the Python-related parts there?
send your package.json please
{
"name": "11-ts-restserver",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo "Error: no test specified" && exit 1",
"start": "node dist/app.js"
},
"keywords": [],
"author": "",
"license": "ISC",
"devDependencies": {
"@types/bcryptjs": "^2.4.2",
"@types/cors": "^2.8.13",
"@types/express": "^4.17.15",
"@types/fs-extra": "^11.0.1",
"@types/jsonwebtoken": "^9.0.1",
"@types/node-cron": "^3.0.7",
"@types/nodemailer": "^6.4.8",
"@types/pdf-parse": "^1.1.1",
"@types/puppeteer": "^7.0.4",
"@types/selenium-webdriver": "^4.1.15",
"@typescript-eslint/eslint-plugin": "^5.56.0",
"@typescript-eslint/parser": "^5.56.0",
"eslint": "^8.36.0",
"tslint": "^6.1.3",
"typescript": "^4.9.4"
},
"dependencies": {
"@aws-sdk/client-s3": "^3.378.0",
"aws-sdk": "^2.1423.0",
"axios": "^0.21.1",
"bcryptjs": "^2.4.3",
"canvas": "^2.11.2",
"chart.js": "^3.9.1",
"chartjs-node-canvas": "^4.1.6",
"chartjs-plugin-datalabels": "^2.2.0",
"chromedriver": "^114.0.2",
"cors": "^2.8.5",
"dotenv": "^16.0.3",
"dropbox": "^10.34.0",
"excel4node": "^1.8.0",
"express": "^4.18.2",
"express-validator": "^6.14.2",
"fs-extra": "^11.1.0",
"googleapis": "^118.0.0",
"handlebars": "^4.7.7",
"isomorphic-fetch": "^3.0.0",
"jsonwebtoken": "^9.0.0",
"jszip": "^3.10.1",
"moment": "^2.29.4",
"mysql2": "^2.3.3",
"node-cron": "^3.0.2",
"nodemailer": "^6.9.3",
"officeparser": "^3.3.0",
"opn": "^6.0.0",
"pdf-parse": "^1.1.1",
"pdf2json": "^3.0.4",
"pdfkit": "^0.13.0",
"pg": "^8.8.0",
"pm2": "^5.3.0",
"puppeteer": "^19.7.2",
"redis": "^4.6.6",
"selenium-webdriver": "^4.10.0",
"sequelize": "^6.28.0",
"simple-statistics": "^7.8.3",
"stream": "^0.0.2",
"tempmail.js": "^0.3.1"
}
}
try adding
python3
to the end of line 3
Okay
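i.e., assuming your line 3 is the apt-get install line, it ends up like:

RUN apt-get update && apt-get install -y <your existing packages> python3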
Reading the logs, it says that everything was downloaded correctly, no?
Maybe the path in the CMD is wrong
the build failed
but those logs aren't complete
Those are all the logs
Or what do you mean? I didn't understand
try another build and send the build logs again
There is less info in these logs lol
In the package.json it says "main": "index.js", don't I have to put that in the CMD? Like CMD ["node", "dist/index.js"], because that file doesn't exist, or is that not the problem
it's failing at the ci stage, it's not even getting to the run phase
run the build until you get better build logs
Okay
It doesn't produce better logs
I already tried several times and it is just this
you will have to run the dockerfile build locally then
How can I do that?
you would probably wanna watch some YouTube videos on how to build dockerfiles locally
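the short version (image name is arbitrary):

# from the project root, next to the Dockerfile
docker build -t cfe-bot .
# then run it to reproduce the failure locally
docker run --rm cfe-bot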
Do these logs help?
Okay
Here are better logs
you reached the log line limit of what the bookmarklet can download, you will have to copy the rest of the logs manually into a txt file
um, you need to copy everything, not just 5 lines lol
Hahaha
I am thinking, what if I make it like the other one, the bot in another service, and from my main server I just call its endpoint
So that way I don't need to have it on my main server
don't know what you mean by main server
Nothing, ignore that
I can't copy all the logs
I think there is a limit for that too
there isn't
Here are the complete logs
I added --verbose in the Dockerfile, I think that is what shows the logs
does this dockerfile work locally?
No, let me do it locally and I will tell you
I already did it
This works
There was a dependency that was missing some things, so yeah
I am having another problem, but idk if it is your area to help me here
awesome, glad you solved that
I mean, can't hurt to ask
The task I want is to download files; on my local server it works without the headless flag
But when I add it
It doesn't work
It downloads the first file but not the rest
Because there are 6
And I tried uploading it to Railway without headless mode to see if it works there, but no
session not created: Chrome failed to start: exited normally.
(session not created: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/chromium-browser is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
you do need to run it in headless mode since this is a docker environment
so what errors do you get when running in headless mode
I am trying something like that
The problem is that I need to download many files, a maximum of 12; without headless it does it without a problem, but in headless the only thing it does is download the first one without waiting for the rest
where are you downloading files from
It is the page I told you about, the one that shut down their API because they don't have good servers
They used to have an API to download those PDFs
how many requests do you make per second?
This is going to run once a month
but while it's running, how many requests does it make per second
Request to who?
To them?
None
of course you make requests, don't know why you would tell me you make no requests
I mean, like to their API, none, because they don't have one
to their webpage
I know they don't have an API
Maybe 1 or 2 per second, for like 3 seconds, and then the process of downloading the PDFs begins; there I didn't have any time restriction between clicks. Working without the headless mode, the code is like this
To my understanding, it clicks the download button for all the PDFs without any time in between, and then I told it to wait for the expected number of PDFs in the downloads folder before continuing, but that does not work in headless mode
locally, run the browser in headless mode and configure your app to work properly when chrome runs in headless mode
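the two usual pieces are pointing chrome's downloads at a known folder and polling that folder for unfinished .crdownload files; a rough sketch with selenium-webdriver (paths, counts, and flags are assumptions, adjust to your setup):

import * as fs from "fs";
import { Builder } from "selenium-webdriver";
import * as chrome from "selenium-webdriver/chrome";

const DOWNLOAD_DIR = "/app/downloads"; // placeholder path

const options = new chrome.Options()
  .addArguments("--headless=new", "--no-sandbox", "--disable-dev-shm-usage")
  // send downloads to a folder we can watch
  .setUserPreferences({ "download.default_directory": DOWNLOAD_DIR });

// wait until the expected number of PDFs exist and no .crdownload temp files remain
async function waitForDownloads(expected: number, timeoutMs = 120_000) {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const files = fs.readdirSync(DOWNLOAD_DIR);
    const pending = files.some((f) => f.endsWith(".crdownload"));
    const pdfs = files.filter((f) => f.endsWith(".pdf")).length;
    if (!pending && pdfs >= expected) return;
    await new Promise((r) => setTimeout(r, 500));
  }
  throw new Error("timed out waiting for downloads");
}

async function run() {
  const driver = await new Builder().forBrowser("chrome").setChromeOptions(options).build();
  // ...the clicks that trigger the downloads go here...
  await waitForDownloads(6);
  await driver.quit();
}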
I have already tried it, but I'm not getting it. By any chance, do you know any tricks for how to wait for the first download to be ready before clicking the next one?
sorry, I don't; personally I wouldn't bother web scraping anything