45 Replies
This platform is not usable anymore with these low download speeds
Is this secure cloud or community cloud? And what is the pod id?
yeah i have seen this all day today and yesterday
same here. I always need to restart my downloads and uploads, as the speed always drops very fast after a few minutes to 600-800 KB/s
in community cloud, e.g. ut51facl5q9ned
I've noticed this a lot recently as well. I have fiber internet, speedtest-cli will show a decent speed of a few hundred Mb or so on the pod, I'll get runpodctl up and running and... it's a crawl of maybe 8 Mb/s if I'm lucky. More insulting, often the first chunk I upload of my data is pretty quick, and then it dips. But for 300 GB of data, I can't take the cost of leaving the instance running and doing nothing like I tried last week, so no pod id to share unfortunately. I considered using a network volume but there are never, ever GPUs available in Secure Cloud so 🤷♂️
@xxeviouxx / @bennyx0x0x / @H4RV3YD3NT / @arhanovich :
https://discord.com/channels/912829806415085598/1207175605255147571
If you do happen to see it again: I'm not staff, but I was also curious about this issue, and after doing some research I wrote this little script to help do speed-test sanity checking and also dump relevant pod information into a text file you can share 🙂 If it happens again, feel free to make a post and share the text file, as I think that would help the runpod staff figure out whether it's runpod's end or maybe your downstream source etc.
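For anyone curious, the kind of report dump described above can be sketched roughly like this (a minimal sketch, not the actual script; it assumes a RunPod pod, where the `RUNPOD_POD_ID` environment variable is set, and that `speedtest-cli` may or may not be installed):

```shell
# Collect pod id, a quick ping summary, and a speed test into a text
# file you can attach to a support post. RUNPOD_POD_ID is the env var
# RunPod sets inside pods (falls back to "unknown" elsewhere).
OUT=pod-report.txt
{
  echo "pod id: ${RUNPOD_POD_ID:-unknown}"
  echo "date:   $(date -u)"
  # Last two lines of ping output hold the loss/latency summary.
  ping -c 5 google.com | tail -n 2
  speedtest-cli --simple 2>/dev/null || echo "speedtest-cli not installed"
} > "$OUT"
cat "$OUT"
```

The point of writing everything to one file is that it can be shared as-is in a post, instead of screenshots of separate terminals.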
issue still continues
I will cancel my subscription
If no staff is interested
@arhanovich I did not see a response to this earlier.
Secure Cloud
And the pod id?
I have deleted 3 of them and opened a new one with ID: e4r5seqn1xadhn
(Just wondering) can you share more info about where you are downloading from too? 300 KB/s is also crazy slow, but it'd be helpful to know what it's downloading from. Like the command etc.
70B model from huggingface
I tried 4 pods today
Cool~ hopefully the staff can help u check that out.
But also just wondering did u run the speed test i posted? Got any results to share?
the max is 25 MB/s
which takes many hours to download that model
Try using this package as well https://github.com/huggingface/hf_transfer
Just FYI - if you end up running this speed test, ull prob also get a better indication from the text file that gets generated of whether the pod is slow or whether that is hugging face's limitation. Could be on hugging face's end; if you went through 4 pods, i imagine their side is just slow.
https://huggingface.co/docs/huggingface_hub/en/guides/download
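For reference, a minimal sketch of what using `hf_transfer` looks like (the model id and target directory below are placeholders, not from this thread; the env var is the documented opt-in switch in `huggingface_hub`):

```shell
# Assumption: pip is available on the pod.
pip install -U huggingface_hub hf_transfer

# hf_transfer is a Rust-based download backend for huggingface_hub;
# this env var enables it for the current shell session.
export HF_HUB_ENABLE_HF_TRANSFER=1

# For scale: a ~140 GB fp16 70B checkpoint at 25 MB/s takes
# 140000 MB / 25 MB/s = 5600 s, i.e. roughly 1.5 hours.

# Pull only the safetensors shards to a persistent path (placeholder
# model id) so a later GPU pod can reuse them.
huggingface-cli download some-org/some-70b-model \
  --include "*.safetensors" \
  --local-dir /workspace/model
```

Pointing `--local-dir` at a network volume pairs well with the CPU-pod workflow mentioned later in the thread.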
There are no longer any instances available with enough disk space. (I only want 300GB)
performance of runpod woefully decreased a lot
You are about to lose a lot of customers until you fix it
I remember the good old days 😦
u could prob launch a network storage with 300 gb, which is prob what u would like anyways is my guess if ur downloading a model so big / prob want to persist it
and ive seen this before, but i think it usually becomes available soon
But if u launch an instance with network storage attached, u wouldn't need to launch a single pod with just that much storage + prob be more flexible is my guess
Any runpod good alternatives
It sucks
@arhanovich what is the issue you are having? RunPod is awesome in my experience.
Man its download speed is very very low
and you do not get storage
Easily
What do you think I will run If I do not get enough storage?
What kind of storage are you referring to? Regular persistent storage or network volume storage?
Runpod used to be very very good
But it sucks now
Its still as good as ever for me
What do you run on runpod?
I have wasted 10 dollars today just to find a single A100 with a download speed of 30 MB/s
Did u run the speed test? Honestly this seems more like a hugging face problem
Unless u run the speed test showing that it is a pod problem itself
What that means is that no matter what computer u use, u could be getting bottlenecked by HF, unless the speed test is showing u that ur download speeds just suck across the board
But yea vast.ai / tons of other alternatives > but so far I think runpod is still the easiest
U could always go to AWS / GCP if u want a big cloud provider / figure that ecosystem out
vast.ai sucks, I loaded $10 and never used it because I hate it
Also sorry just FYI side note, if you are just downloading a model, they released CPU Pods, so that you can just download the models directly to a Network storage
and then mount the more expensive H100 GPUs
CPU Pods are a new thing, so that u arent just burning money downloading - i guess this would limit u in terms of region tho
CPU pods are only available in RO and there are no H100's available in RO
oh no D:
okok
dang
I guess any lower GPUs in the region of h100s then would also suffice, cause a model so big would take a while and burning that on an h100 for network request seems rough
Yeah I always use the cheapest available GPU to copy stuff onto my network storage
couple things to always try when you get low speeds
- run speedtest
- run ping test
ping google.com
packet loss can be a big issue
otherwise we are here to help but reporting low speeds downloading from huggingface is not enough info to go on
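Those checks can be run in one pass, roughly like this (a sketch; the loss-parsing pattern assumes the usual iputils-style ping summary line):

```shell
# Throughput check (assumes pip is available to install speedtest-cli).
pip install -q speedtest-cli 2>/dev/null
speedtest-cli --simple || echo "speedtest failed"

# Ping test: the last two lines of ping output summarize loss/latency.
summary=$(ping -c 20 google.com | tail -n 2)
echo "$summary"

# Pull the packet-loss percentage out of the summary line.
loss=$(echo "$summary" | grep -oE '[0-9.]+% packet loss' | cut -d% -f1)
echo "packet loss: ${loss:-unknown}%"
```

Anything above ~1% loss is worth including in a report, since it can tank sustained transfers even when raw throughput looks fine.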
@nathaniel another good use case to add to runpodctl
for easy speedtest runs
I am thinking of using that option from the huggingface service
Does the 13 USD per hour mean you only pay for the hour of inference, or does it keep charging until you end the deployment, like RunPod?
Does anyone have any idea?
its probably till u end it
I havent heard of a service that just kills u after an hour
They want ur money probably haha
Yeah AWS also charges while its running until you stop it
Very bad
I agree that runpod has cost advantage
But problematic when dealing with large models
Yeah, but I havent found a better alternative yet haha. If you do let me know xD
It seems just inherently a hard problem
which is why a lot of competitors are around in the space trying to solve this right now
Just the network bandwidth alone to move ML models is on avg stupidly large, and then the storage + training and deployment of them is really just new. Talked about this with my friends actually, but the reality is: where do I move to if not runpod? Everyone else is harder to use / screws me over in pricing so far
I guess I am on a luck pod lol
252 MB/s
Dropped to 8 MB/s
WOW
PRETTY FAIR
I have started thinking that this is intentionally done by Runpod to make you pay for more usage
coz no way
Run the speed test? are u using the HF CLI tool?
Again, if you run the speed test, and it is consistently an issue, then its a problem for runpod
if not, it means that hugging face themselves is bottle necking u
Which means to use their CLI tool or something
Why dont u just share the command ur running or tell us more info? Or try to run the speed test and see if its actually the pod
Like u complain its runpod, but never try any of the suggestions so far
Speed tests do not consider large files
I test against a 5GB range out of a 200GB file on S3; if u want u can change that to 20GB, 50GB etc. I have a max timeout on it, so that we can get the avg speed and not just sit there waiting for the whole file to download. It also auto cleans up and just sends the data to /dev/null so it isnt taking up ur disk space.
Just change the S3 download range.
But if for ex. the S3 is fast, it means Hugging face is limiting u
which then means follow using their CLI, which are u? etc.
my speed test runs against a civitai file download (2GB), a HF LLM model (5GB), and an S3 file download (a 5GB range of a 200GB file, which u can change if u want)
so it isnt just a normal speed test + i am testing larger files
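The ranged-download part of that test can be sketched with plain curl (the URL is a placeholder, not the actual test file; 5368709119 is 5 GiB minus one byte, since HTTP byte ranges are inclusive; note that if the timeout fires, curl is killed before printing its summary line):

```shell
# Download only the first 5 GiB of a big object, discard the bytes,
# and have curl report the average download speed. Placeholder URL.
URL="https://example-bucket.s3.amazonaws.com/some-200gb-object"

# -r 0-5368709119 requests bytes 0..(5 GiB - 1); -o /dev/null keeps
# the test from consuming disk; timeout caps the check at 2 minutes.
timeout 120 curl -sL -r 0-5368709119 -o /dev/null \
  -w 'avg speed: %{speed_download} bytes/s\n' "$URL" \
  || echo "range download failed (placeholder URL)"
```

If this S3-style ranged pull is fast while the HF download is slow, the bottleneck is on Hugging Face's side rather than the pod's.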
Thats why im telling u to run the speed test I gave u: if its an issue across the board, its runpod's fault, but if not, it means ur limited by hugging face, and ull hit this no matter what
and also are u using the HF CLI tool? can u share ur command
The speed test I gave u is to 1) run pings 2) run checks against normal speed tests 3) to sanity check across multiple sources of files for downloading large files.
Honestly, u keep saying its runpod without even trying to prove it when i literally gave u the script to prove it
And it seems to be a Huggingface issue more than a RunPod issue
I've seen a lot of people complaining about speed since Huggingface was down the other day, not sure whether its coincidence or not.