DATA LOSS IN EU-RO-1 - URGENT
Need someone to communicate ASAP, this is literally sev0. We have all of our data and work backed up on that network drive and out of the blue the files just disappeared.
Could you submit a ticket on the website with all the info?
4667
Ticket number, we already opened one @Papa Madiator
@Papa Madiator I’m waiting??
Who's joining the ticket, y'all?
We have models, datasets, code that are on that drive we need someone ASAP and I cannot stress this enough
Hmm same for me haha
My ticket ID is 4609, still no response yet, trying to be patient
@nerdylive when did this happen to you?
I’m not sure if I’m calmer or more anxious by the fact that it's not just us lol
Just panicking at the moment tbh
Hmm like a few days ago
And no response yet??
This is sick
There are
3 days ago it happened
Still being investigated they said
I deleted it and made a new one, it was all gone like a new fresh one
Ok now I’m officially freaking out
calm down, maybe they can recover it, because they replicate network drives (if I'm not wrong)
I sure hope so.. there’s no way for us to recover from this if the driver is gone
Wait what driver
Network driver
Drive
Sorry
Oh yeah
Let's just wait for now
Is this happening in a specific region or all regions?
I'm not sure, what's your region @Omer?
Eu-Ro-1
Oh same
So seems some issue in RO region then 🙈
Let me check mine and see if my files are disappearing too.
Yeah sure let us know too
My A1111 volume is still fine, it didn't need to sync anything from Hugging Face
I'll test ComfyUI as well.
Did you do any heavy I/O when it happened, like downloading or uploading large files or creating lots of files?
Hmm I don't know
When exactly it happened, but when I checked it was gone
I can confirm that I've lost files from my ComfyUI network volume in RO region as well 😱
Wait really?
🤔
Yep, lost about 320MB of data in RO region for my ComfyUI storage
I resynced it from my NO storage, but I've removed RO from the list of my RabbitMQ consumers for now anyway until the issue is resolved.
Now I need to back up my data to external cloud storage because RunPod is unreliable 😱
Make a ticket too
They replicate network volumes, right? How can we be experiencing this?
Supposedly, so there is probably something wrong with the replication 🤷‍♂️
My ticket number is 4669
Forwarded
@Omer @digigoblin @nerdylive could you also provide info on which datacenter you had issues in, which templates you used, and whether there are any auto-syncing functions in the templates? That would help a lot.
All RO
yep
No autosyncing, used the PyTorch template to check and the data was gone
yes, but my template works well for the autosync
i've used it just today and yesterday with a new network volume and it works just fine.. didn't delete other data
We're kinda trying to get some intel on what might be going on
Yeah, I haven't had issues either, only checked when @Omer and @nerdylive said they lost data and I saw I lost data too.
Somebody uses my template and said it was fine too btw
btw was the NS attached to some pod or just lying around untouched when the data loss happened?
nothing happened on the host side?
Mine is attached to serverless not pods.
not sure when it happened actually but it's attached to serverless, both CPU and GPU
I attached it to a pod just to check if data was lost, and discovered it was.
also used pods to download before
Other regions seem to be ok, I use NO, SE, CA and RO, and only RO seems to be affected.
Ours was attached to a pod
@Papa Madiator the most important thing - is there any backup to the storage?
for secure cloud it's possible (I do not have high level access though)
we kinda need to figure out why the data is gone in the first place
even when i deleted the empty ns?
There are no network drives on community pods
And the loss happened in a network drive
yeap all secure cloud
What do you mean by deleting empty ns?
Why would you want to delete it if it's already empty?
We had a terabyte of storage in the drive, time is of the essence. Is there anyone available for a bridge?
How much did you lose? Everything?
Every single bit
well it still charges my account right?
And it's still gone when you mount it to a new pod, @Omer?
Yes
i don't see a reason to use it anymore when it doesn't have the data and it seems to be broken lol
Oh, I get what you mean now, sorry was confused
yeah np
any updates?
Not yet
@xcxooxl did you log a ticket for it if you're also experiencing data loss?
@xcxooxl @digigoblin @Omer what templates were used when this happened?
digigoblin used pytorch template to check, i think
Yep