RunPod•4w ago
fireice

Why are there no GPUs in the Canada data center today?

My network volume is in ca-mtl-1, and there are no GPUs available there right now.
20 Replies
digigoblin•4w ago
Read #🚨|incidents, it's scheduled for maintenance, that's why
moez4921•4w ago
@haris @Finley it's been more than 4 hours since the outage started. aren't you going to declare an incident and give some updates? looking at the green status on https://uptime.runpod.io, I suspect that your monitoring has not caught this issue.
nerdylive•4w ago
@Madiator2011 (Work) any idea about this?
digigoblin•4w ago
@nerdylive It's because RunPod disables the DC before maintenance is about to begin, probably because people don't read and then they log unnecessary support tickets.
nerdylive•4w ago
Oh, long before? Read where?
digigoblin•4w ago
I already mentioned this elsewhere, but @fireice being an idiot and giving me a thumbs down already proves my point.
nerdylive•4w ago
Oof, where's that info from btw
digigoblin•4w ago
I know this from previous experience, Zeen or someone like that mentioned it.
nerdylive•4w ago
Ooh like days before?
digigoblin•4w ago
Yes
nerdylive•4w ago
ic ic yeah that's probably it
digigoblin•4w ago
No point in allowing someone to create a pod and have training that runs for days and gets interrupted
nerdylive•4w ago
yeah correct hahah, but it should be on #🚨|incidents too next time when it's gonna be disabled
digigoblin•4w ago
Yeah agreed, RunPod communication is ALWAYS appalling, it's about 1% better but still has a LONG way to go
moez4921•4w ago
just got an email response from them confirming what @digigoblin says. they are disabling new machine creations. the moral: as there is no way to clone network volumes (correct me if I'm wrong), you'd better continuously make backups using https://syncthing.net/ or something like that.
digigoblin•4w ago
Yep, I guess this is the point about the lack of communication: people need to know when a DC is going to be taken offline for maintenance a few days in advance so that they can start migrating their data to a different DC. When #🚨|incidents says it's only going to be offline for maintenance on Monday, but no new pods can be created 4 days ahead of time, then it's a problem because people can't access their data to make alternate arrangements. @haris
Madiator2011 (Work)•4w ago
I mean you should always make a backup when you upload data, as the cloud is basically someone else's computer
Solution
haris•4w ago
Hey y'all, we disable the creation of new pods four days before maintenance to prevent further issues (I wasn't personally aware of this until now, otherwise it would have been posted in #🚨|incidents). However, I talked with the team and you should be able to create new pods again. Let me know if you run into any issues.
nerdylive•4w ago
But will the maintenance still be carried out on the same schedule?
haris•4w ago
Yep, as far as I know, but I will double check