Reading ~30GB sequentially from the volume makes the RAM usage go to ~30GB
ofc that doesn't happen locally. I'm test-scaling a new compute-heavy feature of my app before shipping and this is a blocker for me. This is a problem because I don't want to be billed for 30GB for something I discard a few seconds later. I'm just looking into whether this might be a bug and/or misuse before I start looking at an alternative host (for this particular workload; I'm still happy for the rest).
Project ID:
8bccf693-4059-4ef3-9dd0-55493979fdb7
I should note that it does get stuck there and will not go down again
are you loading the file into memory?
and, reading a 30gb file from the disk to where?
my Rust backend needs to read all 180k files I have on the volume attached to the service. They're about 170KB each
locally it goes fine and doesn't fill up my ram
right but you are reading these files off of disk? where are they going beside into ram?
I'm not sure I understand the question
I'm literally reading something like /cache/something/somefile.mspack, deserializing it, doing some compute, and then dropping it
/cache is where my volume is mounted at
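For context, a minimal sketch of that loop (with hypothetical deserialize / do_compute stand-ins for the real msgpack decoding and the compute step) would be something like:

```rust
use std::fs;
use std::path::Path;

// Hypothetical stand-ins for the real decode and compute steps.
fn deserialize(bytes: &[u8]) -> Vec<u8> { bytes.to_vec() }
fn do_compute(_value: &[u8]) {}

fn process_cache(dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().and_then(|e| e.to_str()) != Some("mspack") {
            continue;
        }
        let bytes = fs::read(&path)?;    // ~170KB per file
        let value = deserialize(&bytes); // decode
        do_compute(&value);              // use it
        drop(value); // both buffers are freed at the end of the iteration anyway;
        drop(bytes); // the explicit drops just make the intent obvious
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    process_cache(Path::new("/cache"))
}
```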
so these files are loaded into memory then, I'm not sure why you are surprised that memory has increased?
once I have read and made use of the information, how can I discard it and let the unused RAM go back?
locally it is discarded right after each read, so my RAM never goes up. I can run this program with 1GB of RAM or less
that wouldn't be a platform specific question
Well if that doesn't sound like a bug/misuse to you, then I got my answer I guess 😄
yes unfortunately this would be a code issue
Yep, you will need to deallocate that in Rust
sounds like you aren't resolving your lifetimes afaik
I tried all I could think of really 😕 even with an explicit
drop(variable_with_deserialized_data)
at the end of each loop. Even after my job is done and cleaned up, it doesn't drop. I should also point out that locally the same job never goes above a few MB of RAM, on the same dataset.
nixpacks or dockerfile?
not sure that matters tho, also I do my builds on GitHub Actions because they're not trivial, and I do a final
railway up -e {env} -s {service_id}
to upload
that's going to cause railway to run another build
are you sure you're not mixing 2 problems? 😄 I was on something unrelated to deployment, I don't have a deployment problem
i know, im getting side tracked, just wanted to point it out
but when in doubt write a dockerfile that uses alpine instead of nixpacks
have seen alpine based dockerfiles help with strange memory issues before plenty of times
I could give a try
cant hurt
unfortunately I did almost everything, but for alpine I need some special target compilation and I can't get it to work easily with my rust code. got too many deps
I could try a docker image on ubuntu or something, but at that point we're back to what nixpacks does
ah yes the joys of rust, compiling
so I did a small experiment, the last peak is me running that heavy task (limited to 20k items so it doesn't go to 30GB again). I have changed the start command to
while true; do free -h; sleep 5; done & ./api
and here are the results. We can clearly see that the used column stays at 251GB, before and after the task (why it shows me the host machine's specs instead of my container, no clue). But the buff/cache column is billed to me and indeed grew by a few GB.
buffer / cache is indeed included in the results of docker stats
I had some hope for a moment with O_DIRECT (https://linux.die.net/man/2/open), it works locally and doesn't bump the buff/cache, but on railway I get an error, I guess the filesystem doesn't accept this custom flag :sossa:
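For reference, that O_DIRECT attempt looks roughly like this in Rust (assuming Linux and the libc crate; the length handling is simplified). O_DIRECT bypasses the page cache, but it needs the buffer, offset, and length aligned to the block size, which is also a common source of EINVAL:

```rust
use std::fs::OpenOptions;
use std::io::Read;
use std::os::unix::fs::OpenOptionsExt;

const ALIGN: usize = 4096; // typical logical block size; may be 512 on some disks

fn read_direct(path: &str, len: usize) -> std::io::Result<Vec<u8>> {
    let mut file = OpenOptions::new()
        .read(true)
        .custom_flags(libc::O_DIRECT) // bypass the kernel page cache
        .open(path)?;

    // Round the request up to a multiple of ALIGN and make sure the buffer
    // start is ALIGN-aligned by over-allocating and slicing at the offset.
    let padded = (len + ALIGN - 1) / ALIGN * ALIGN;
    let mut backing = vec![0u8; padded + ALIGN];
    let offset = backing.as_ptr().align_offset(ALIGN);

    // Single read for brevity; real code would loop until EOF.
    let n = file.read(&mut backing[offset..offset + padded])?;
    Ok(backing[offset..offset + n].to_vec())
}

fn main() -> std::io::Result<()> {
    let bytes = read_direct("/cache/something/somefile.mspack", 170 * 1024)?;
    println!("read {} bytes", bytes.len());
    Ok(())
}
```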
this is pain
I wonder if the v2 runtime would also gather metrics that include the buffer / cache, but you would need to find a way to run your tests without a volume since any service with a volume defaults back to the legacy runtime despite the selector saying v2
Have you tried copying the volume’s contents into the container’s disk on startup?
In that case you may be able to do local file system file streaming in rust and it shouldn’t increase the memory
Local file system is a little different than docker mounted volumes i think
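A rough sketch of that idea (paths are made up): copy everything off the volume onto the container's own disk once at startup, then stream each file in fixed-size chunks so only one chunk sits in memory at a time:

```rust
use std::fs::{self, File};
use std::io::{BufReader, Read};
use std::path::Path;

// Copy a flat directory of files from the mounted volume to local disk.
fn copy_volume_to_local(src: &Path, dst: &Path) -> std::io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        fs::copy(entry.path(), dst.join(entry.file_name()))?;
    }
    Ok(())
}

// Read a file in 64KiB chunks instead of pulling it into memory whole.
fn stream_file(path: &Path) -> std::io::Result<()> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut chunk = [0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        // ...feed chunk[..n] into an incremental decoder here
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let local = Path::new("/tmp/cache-copy"); // illustrative local path
    copy_volume_to_local(Path::new("/cache"), local)?;
    for entry in fs::read_dir(local)? {
        stream_file(&entry?.path())?;
    }
    Ok(())
}
```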
volumes are ext4 mounted zvols
I forgot railway doesn’t actually use docker
legacy runtime does
Man I don’t even use railway
🗣️
v2 uses podman
Man portable air defense?
Oh wait that’s manpad
Since I could not find a solution with railway for this particular thing, I bought a server somewhere else, although the disk IO isn't as good as railway's and this adds some complexity for deployment and monitoring :( But yeah, I can't afford adding $200+ to my monthly billing just to read a few files sometimes. I still use railway for a lot of other stuff and I'm happy with that. I just figured out that I should use railway where it is helping me instead of trying to fight it. No hate, I can understand why buff/cache is counted. Just wanted to give an update for future people searching this thread.
Solution
i wrote a benchmark test (in Go) to write 30000 1MiB files (for a total of 30GiB) to disk at 250 files concurrently (a rough Rust equivalent is sketched after this message).
running this locally of course there was no wild increase in memory.
running this program on railway with the legacy runtime had my memory reach ~23GB.
i then switched to the v2 runtime and ran the program again and the memory never increased above ~45MB, it also wrote the data a bit faster.
i then re-ran the test on the v2 runtime but this time configured it to write 50GiB worth of files and still saw no wild increase in memory.
and just to be sure one last time, i switched back to the legacy runtime and ran the test to write the 50GiB worth of files (same concurrency), and the memory this time peaked at 32GB.
tl;dr this issue is fixed in the v2 runtime but only the legacy runtime has support for volumes (even if you select the v2 runtime and you have a volume it will run with the legacy runtime)
here's a memory graph that backs up these statements
yes I know I'm writing files instead of reading them, but the same issue is being surfaced
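The original benchmark was a Go program, but a rough equivalent of what it does (30,000 files of 1MiB each, 250 writers at a time; the target directory is made up) would look something like this in Rust:

```rust
use std::fs;
use std::io::Write;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

const TOTAL_FILES: usize = 30_000;
const FILE_SIZE: usize = 1024 * 1024; // 1MiB per file, ~30GiB total
const WORKERS: usize = 250;           // files written concurrently

fn main() -> std::io::Result<()> {
    let dir = "/cache/bench"; // illustrative path on the mounted volume
    fs::create_dir_all(dir)?;

    let next = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::with_capacity(WORKERS);

    for _ in 0..WORKERS {
        let next = Arc::clone(&next);
        let dir = dir.to_string();
        handles.push(thread::spawn(move || -> std::io::Result<()> {
            let payload = vec![0u8; FILE_SIZE]; // one reusable 1MiB buffer per worker
            loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= TOTAL_FILES {
                    return Ok(());
                }
                let mut f = fs::File::create(format!("{dir}/file-{i:05}.bin"))?;
                f.write_all(&payload)?;
            }
        }));
    }

    for h in handles {
        h.join().expect("worker panicked")?;
    }
    Ok(())
}
```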
interesting
I need a sticker "V2 fixes it"
v2 builder / runtime / proxy has just simply fixed a lot of issues thus far
when is v2 going ga ?
asap once the bugs are fixed
nice
from the sounds of it the legacy runtime will not be running on bare metal
it sounds like bare metal will be v2 runtime exclusive
oh nice
We are taking our time btw
We aren’t going to V2 cutover until everything is polished
we aren't going to remove the legacy option*
Ofc we are moving fast- but migrating running workloads over will take time
It's likely that we'll have a one-way migration and will remove the option, like we did for BuildPacks to NixPacks
thankfully I wasn't around for that 😆
It was very painful
But we did it
And we’ll accomplish this as well
I just wanna know if we will see the V2 runtime support volumes before bare metal, it's what's stopping alaanor from running this project on railway
Ofc, this will be added before metal
Completely stateless workloads aren’t very useful
char said otherwise, unless I was misunderstanding him
Metal workloads will mostly be Trial and experimental workloads until we flight Railway Metal that requires volumes
If the servers are plugged in, why not serve from them
But we won’t stop the metal rollout until we have vol. support
right but people could benefit from volume support on the v2 runtime with the current gcp hosts, like OP, or everyone running uptime kuma that are getting EHOSTUNREACH
Yea, heard, we are speedrunning fixes for all the shortcomings when we can
Never enough hands on boards
But it's not an OR, it's an AND
Metal is happening on a different timeframe than V2 cutover
It just so happens that Metal will be V2 only, no sense in extending the lifetime of Legacy
ay at least i was right in that regard
either way, would you say it would be safe to mark this thread as solved, and would I be correct in saying this won't be getting fixed on the legacy runtime?
Yep
This is really cool, appreciate the finding a lot. Thanks 👍 I'll be checking the railway changelog for v2 with volume frequently and hopefully one day I can be fully back on railway :)
hopefully!
@Brody @Angelo I saw v2 volumes are now a thing, so I spent the day setting things up to try on railway again, but unfortunately it's still stuck the same way. Not to complain or anything, just wanted to share that it did not magically fix that, as we thought it perhaps would.
you might not be on the v2 runtime
at least the UI said I was on it. I remember you said that it might be a lie from the UI because it would not work with volumes, but now we got volumes on v2 so I thought I could maybe trust that UI
not all of railway's hosts support volumes on the v2 runtime, a surefire way to be sure you are on the v2 runtime would be to check for container event logs like "starting container"
Just wanted to say that I have finally moved this particular service back to railway and it's working great now 👍 thanks again for all the help
that's awesome!