Crazy Egress Charges
Hi,
I just received a strange usage alert for excessive network egress of 133 GB.
These numbers just don't make any sense. The website is a simple application, one table, with <10 users (I think no more than 2). It handles 2500 rows in a database with some text in them - nothing to justify such high network usage.
I have two other similar apps that don't show these numbers, all built using the same tools and same packages.
It seems like something is wrong on your end - or that I need your help tracking down what's causing this issue - as it is not intended by me at all.
I've tracked the network usage of the Node.js service, and it shows crazy numbers at a particular time - 10000X higher than ever in its history - with nothing in the observability logs to indicate any cause for this.
Also - you have a bug in your UI where you show $ in the minutes accumulation 🙂
The account is a personal account in my name - Noam Honig, and I use GitHub sign in with the email.
Please advise
Project ID:
b932a645-8fdb-4c05-b12f-dd4d8c72e164
More charts 🙂
The normal usage should be more like 200kb
okay thats super bizarre, but im curious, what does the view cost by service table say?
has egress usage subsided?
It spiked twice:
Once at 12:05 and once earlier at 8:36 - both to 3606 GB inbound
Looking 7 days back it's all calm:
during these network spikes, are there any correlation with increased cpu / mem?
In the last 24 hours I see a few weird spikes - all in specific minutes, not ongoing, and all crazy high numbers
None:
Also the logs don't say anything:
And really it's a website that serves one user
this has to be a glitch with how railway is counting usage, can you also provide the service id?
Service ID: 2b0728eb-85ac-4d81-9fb5-28938ac54339
and just to be thorough, are you communicating with the database via the private network? not that it could cause a 3.5tb+ network spike, given what you have told me about your app
I'm using the DATABASE_URL environment variable
can you reference DATABASE_PRIVATE_URL instead?
Sure - but that's not it - here are the stats for the db:
The entire database 2500 rows in one table
oh i know, it has to be a glitch, but while we're here, might as well use the private url
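For reference, the switch suggested above could look something like the sketch below. It assumes both DATABASE_PRIVATE_URL and DATABASE_URL are exposed to the service as environment variables, as mentioned in the thread; the helper name pickDatabaseUrl is made up for illustration and is not part of any Railway or Postgres client API.

```typescript
// Hypothetical helper (not a Railway API): prefer the private-network
// connection string and fall back to the public one if it is missing.
type Env = Record<string, string | undefined>;

function pickDatabaseUrl(env: Env): string {
  // `||` rather than `??` so that an empty string also triggers the fallback.
  const url = env.DATABASE_PRIVATE_URL || env.DATABASE_URL;
  if (!url) {
    throw new Error("Neither DATABASE_PRIVATE_URL nor DATABASE_URL is set");
  }
  return url;
}

// In a Node.js service this would typically be called as:
//   const connectionString = pickDatabaseUrl(process.env);
```

Keeping the fallback means the app still starts in environments where only DATABASE_URL is defined, such as a local machine.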
Thread has been flagged to Railway team by @Brody.
what would you say is the most data you send to postgres at a time?
Next to nothing - the entire postgres service, including postgres is 217mb
Really - only two tables, one with short articles ~2500 rows, and another with 2 rows representing users.
gotcha, we shall wait for the team to respond, in the mean time, if you see any more spikes let us know
Will do - thanks
@Brody I think it's best if I don't change anything until the team looks at it - what do you say?
yeah thats probably a good idea
This is odd
Do you think it's possible you were DDOSed?
The lack of CPU makes that seem unlikely
yeah a ddos resulting in 3.6tb of network with zero bump in mem or cpu doesn’t really seem possible
I doubt it, as it’s a site I built for my cousin a week ago, it’s not published anywhere
Thanks for the context, I will raise this to the rest of the team.
I bet nginx could do it haha
So you have crazy high network ingress as well
This doesn't seem to be us, I am not sure how this happened on your side, but publicly facing URLs are publicly facing.
Hi @thomas I see these numbers, but they don't make any sense in the context of this crazy simple app - also as you can see from the cpu/memory/anything stats there is nothing to justify that network usage - also the communication between the db and the app - even if it is public, doesn't justify these numbers.
Also - check out the graphs, they indicate two separate one-minute events with 3606 GB in that minute - without any memory / cpu rise on my app - I'm not sure even your bandwidth supports that much throughput in one minute - something is strange
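To put the throughput point in numbers, here is a small back-of-the-envelope sketch. It assumes the chart value is a total of 3606 GB transferred within a single 60-second window and that GB means decimal gigabytes; the function name requiredGbps is illustrative only.

```typescript
// Sustained link rate (in Gbit/s) needed to move `gigabytes` in `seconds`.
// Assumes decimal units: 1 GB = 8 Gbit.
function requiredGbps(gigabytes: number, seconds: number): number {
  if (seconds <= 0) {
    throw new Error("seconds must be positive");
  }
  return (gigabytes * 8) / seconds;
}

// 3606 GB in one minute is about 60.1 GB/s, i.e. about 480.8 Gbit/s sustained.
const spikeGbps = requiredGbps(3606, 60);
```

Under those assumptions a single service would have had to sustain roughly 480 Gbit/s for a full minute, which is the scale being questioned here.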
Thomas, I agree with noam, this is far too strange to be a ddos, two network spikes that spike to nearly the exact same usage, if Noam's app can do that without spikes in the cpu or memory then they deserve some kind of programming award
fr how an application can respond 217gb of data without a spike of cpu/memory 💀
Caching? If you serve a static website, then you can reach efficiency levels like this.
So some more members of the team chimed in overnight.
- if you add a spending limit to the project we will credit you for this event.
We still can't find any indication it was not a DDOS of some kind.
I know it's weird and we will track it, but so far our egress/ingress tracking has stood up to testing.
I really do see what you guys mean here, but someone checked it against the machine logs and the traffic over its interfaces lines up perfectly with the spikes. That machine did serve that traffic, and something was sent, and more importantly, many things were received.
We should be clear that while we are refunding you for this time, we won't be in the future. If you add a spending limit, if this happens again, you are covered. Many users use Cloudflare in front of Railway to protect themselves from stuff like this.
Thanks for looking into this, and for refunding me.
I still find it hard to believe that on two separate occasions there was incoming traffic at precisely 3606-3607 GB - it’s too consistent to be chance.
Are there any logs that you want me to add to this app so that we can isolate it if it happens again?
I haven't refunded you yet, please add a limit to your account first
This will prevent this from happening again. As for logs, if you could show us logs that show that traffic does not increase during spikes like this then I would have something to go back to the team with.
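One way to produce app-side logs of the kind asked for above is to total the bytes the app itself sees per minute and compare those lines with the platform chart. The sketch below is only that: the class name MinuteTrafficCounter and its methods are hypothetical, and where the byte counts come from (for example, request and response sizes measured in the HTTP handler) is left as an assumption.

```typescript
// Hypothetical per-minute byte counter; illustrative only, not a Railway API.
class MinuteTrafficCounter {
  private buckets = new Map<string, { inbound: number; outbound: number }>();

  // Add one observation to the UTC minute that `at` falls into.
  record(at: Date, inboundBytes: number, outboundBytes: number): void {
    const minute = at.toISOString().slice(0, 16); // "YYYY-MM-DDTHH:mm"
    const bucket = this.buckets.get(minute) ?? { inbound: 0, outbound: 0 };
    bucket.inbound += inboundBytes;
    bucket.outbound += outboundBytes;
    this.buckets.set(minute, bucket);
  }

  // One log line per minute, oldest first.
  lines(): string[] {
    return Array.from(this.buckets.entries())
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([minute, b]) => `${minute}Z in=${b.inbound}B out=${b.outbound}B`);
  }
}

// Assumed usage: call record(new Date(), requestBytes, responseBytes) once per
// handled request, and print lines() on a timer so each minute ends up in the logs.
```

If a platform-side spike lands on a minute where these totals stay at their usual level, that is evidence the traffic did not pass through the app's own request handling.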
@thomas check out these logs - they show absolutely nothing at these times
Hi @thomas I've set limits here - is that the correct place?
yes perfect
I credited your account, the credits will go towards your next bill