RunPod 4w ago
Omer

DATA LOSS IN EU-RO-1 - URGENT

We need someone to communicate ASAP, this is literally a sev0. All of our data and work is backed up on that network drive, and out of the blue the files just disappeared.
64 Replies
Madiator2011
Madiator20114w ago
Could you submit a ticket on the website with all the info?
Omer
Omer4w ago
Ticket number 4667, we already opened one @Papa Madiator. @Papa Madiator I’m waiting?? Who’s joining the ticket, y’all? We have models, datasets, and code on that drive. We need someone ASAP and I cannot stress this enough.
nerdylive
nerdylive4w ago
Hmm, same for me haha. My ticket ID is 4609, still no response yet. Trying to be patient.
Omer
Omer4w ago
@nerdylive when did this happen to you? I’m not sure if I’m calmer or more anxious knowing that it’s not just us lol. Just panicking at the moment tbh.
nerdylive
nerdylive4w ago
Hmm like a few days ago
Omer
Omer4w ago
And no response yet?? This is sick
nerdylive
nerdylive4w ago
It happened 3 days ago. Still being investigated, they said. I deleted it and made a new one; it was all gone, like a fresh new one.
Omer
Omer4w ago
Ok now I’m officially freaking out
nerdylive
nerdylive4w ago
Calm down, maybe they can recover it, because they replicate network drives (if I'm not wrong).
Omer
Omer4w ago
I sure hope so.. there’s no way for us to recover from this if the driver is gone
nerdylive
nerdylive4w ago
Wait what driver
Omer
Omer4w ago
Network driver. Drive, sorry.
nerdylive
nerdylive4w ago
Oh yeah. Let's just wait for now.
digigoblin
digigoblin4w ago
Is this happening in a specific region or all regions?
nerdylive
nerdylive4w ago
I'm not sure. What's your region, @Omer?
Omer
Omer4w ago
EU-RO-1
nerdylive
nerdylive4w ago
Oh same
digigoblin
digigoblin4w ago
So it seems there's some issue in the RO region then 🙈 Let me check mine and see if my files are disappearing too.
nerdylive
nerdylive4w ago
Yeah sure let us know too
digigoblin
digigoblin4w ago
My A1111 volume is still fine, it didn't need to sync anything from Hugging Face
digigoblin
digigoblin4w ago
I'll test ComfyUI as well.
xcxooxl
xcxooxl4w ago
Were you doing any heavy I/O when it happened, like downloading or uploading large files or creating lots of files?
nerdylive
nerdylive4w ago
Hmm, I don't know exactly when it happened, but when I checked it was gone.
digigoblin
digigoblin4w ago
I can confirm that I've lost files from my ComfyUI network volume in RO region as well 😱
nerdylive
nerdylive4w ago
Wait really? 🤔
digigoblin
digigoblin4w ago
Yep, lost about 320MB of data in the RO region for my ComfyUI storage. I resynced it from my NO storage, but I've removed RO from the list of my RabbitMQ consumers for now anyway until the issue is resolved. Now I need to back up my data to external cloud storage because RunPod is unreliable 😱
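A minimal sketch of the kind of check that could catch silent loss like this before a resync, assuming two mounted copies of the same volume (for example one per region). The function names and the paths in the usage note are hypothetical and are not RunPod tooling; the sketch only uses the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = sha256_of(path)
    return manifest


def diff_manifests(reference, candidate):
    """Return (missing, changed): paths absent from, or different in, the candidate."""
    missing = sorted(p for p in reference if p not in candidate)
    changed = sorted(
        p for p in reference if p in candidate and reference[p] != candidate[p]
    )
    return missing, changed
```

With a known-good copy mounted at a hypothetical `/mnt/no-volume` and the suspect one at `/mnt/ro-volume`, calling `diff_manifests(build_manifest("/mnt/no-volume"), build_manifest("/mnt/ro-volume"))` would list which files are missing or altered, which is useful to attach to a support ticket.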
nerdylive
nerdylive4w ago
Make a ticket too. They replicate network volumes, right? How can we be experiencing this?
digigoblin
digigoblin4w ago
Supposedly. There is probably something wrong with the replication 🤷‍♂️ My ticket number is 4669.
Madiator2011
Madiator20114w ago
Forwarded. @Omer @digigoblin @nerdylive could you also provide info on which datacenter you had issues in, what templates you used, and whether there are any auto-syncing functions in the templates? That would help a lot.
digigoblin
digigoblin4w ago
All RO
nerdylive
nerdylive4w ago
yep
digigoblin
digigoblin4w ago
No autosyncing. I used the PyTorch template to check, and the data was gone.
nerdylive
nerdylive4w ago
Yes, but my template works well with the autosync. I've used it just today and yesterday with a new network volume and it works just fine.. it didn't delete other data.
Madiator2011
Madiator20114w ago
We're kinda trying to get some intel on what might be going on.
digigoblin
digigoblin4w ago
Yeah, I haven't had issues either, only checked when @Omer and @nerdylive said they lost data and I saw I lost data too.
nerdylive
nerdylive4w ago
Somebody uses my template and said it was fine too btw
Madiator2011
Madiator20114w ago
Btw, was the NS attached to some pod or just lying around untouched when the data loss happened?
nerdylive
nerdylive4w ago
nothing happened on the host side?
digigoblin
digigoblin4w ago
Mine is attached to serverless not pods.
nerdylive
nerdylive4w ago
Not sure when it happened actually, but it's attached to serverless, both CPU and GPU.
digigoblin
digigoblin4w ago
I attached it to a pod just to check if data was lost, and discovered it was.
nerdylive
nerdylive4w ago
Same here, I also used pods to download before.
digigoblin
digigoblin4w ago
Other regions seem to be ok, I use NO, SE, CA and RO, and only RO seems to be affected.
Omer
Omer4w ago
Ours was attached to a pod. @Papa Madiator the most important thing: is there any backup of the storage?
Madiator2011
Madiator20114w ago
For secure cloud it's possible (I do not have high-level access though). We kinda need to figure out why the data is gone in the first place.
nerdylive
nerdylive4w ago
even when i deleted the empty ns?
Omer
Omer4w ago
There are no network drives on community pods. And the loss happened in a network drive.
nerdylive
nerdylive4w ago
yeap all secure cloud
digigoblin
digigoblin4w ago
What do you mean by deleting the empty NS? Why would you want to delete it if it's already empty?
Omer
Omer4w ago
We had a terabyte of storage in the drive. Time is of the essence: is there anyone available for a bridge?
digigoblin
digigoblin4w ago
How much did you lose? Everything?
Omer
Omer4w ago
Every single bit
nerdylive
nerdylive4w ago
well it still charges my account right?
digigoblin
digigoblin4w ago
And it's still gone when you mount it to a new pod, @Omer?
Omer
Omer4w ago
Yes
nerdylive
nerdylive4w ago
I don't see a reason to use it anymore when it doesn't have the data and it seems to be broken lol
digigoblin
digigoblin4w ago
Oh, I get what you mean now, sorry was confused
nerdylive
nerdylive4w ago
yeah np
xcxooxl
xcxooxl4w ago
Any updates?
nerdylive
nerdylive4w ago
Not yet
digigoblin
digigoblin4w ago
@xcxooxl did you log a ticket for it if you're also experiencing data loss?
haris
haris4w ago
@xcxooxl @digigoblin @Omer what templates were used when this happened?
nerdylive
nerdylive4w ago
digigoblin used the PyTorch template to check, I think
digigoblin
digigoblin4w ago
Yep