error creating container

error creating container: nvidia-smi: parsing output of line 0: failed to parse (pcie.link.gen.max) into int: strconv.Atoi: parsing "": invalid syntax
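For context, the error text shows Go's strconv.Atoi failing on an empty string while the host agent parses nvidia-smi's per-GPU query output for pcie.link.gen.max. Below is a minimal sketch of an equivalent check you could run on the host or inside a pod to see whether the GPU reports that field at all; the nvidia-smi query flags are standard, but the surrounding script is only illustrative.

```python
import subprocess

# Query the same field the error complains about. On a healthy GPU this
# prints an integer (e.g. "4"); if the driver/hardware returns an empty
# string, the integer conversion fails much like strconv.Atoi in the error.
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=pcie.link.gen.max", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout.strip()

try:
    gen = int(out)  # mirrors the host agent's integer parse
    print(f"pcie.link.gen.max = {gen}")
except ValueError:
    print(f"GPU returned an unparsable value: {out!r} (likely a host/driver problem)")
```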
59 Replies
riverfog7 (2w ago)
Hi, what GPU are you using?
Anmol Sharma (OP, 2w ago)
3090
riverfog7 (2w ago)
Which image are you using?
Anmol Sharma (OP, 2w ago)
(attachment, no description)
Anmol Sharma (OP, 2w ago)
I also got the same error with 2.4.0.
riverfog7 (2w ago)
Community Cloud or Secure Cloud?
Anmol Sharma (OP, 2w ago)
On-demand.
riverfog7 (2w ago)
Mine works fine.
Anmol Sharma (OP, 2w ago)
I changed the container disk volume to 200 GB.
riverfog7 (2w ago)
Any changes to the CMD?
Anmol Sharma (OP, 2w ago)
So it changes from spot to on-demand, no?
riverfog7 (2w ago)
Can you try recreating the pod?
Anmol Sharma (OP, 2w ago)
Can you come on VC and help me out?
riverfog7 (2w ago)
Mine works with a 3090 and that template. I can't speak right now.
Anmol Sharma (OP, 2w ago)
On the lounge VC?
riverfog7 (2w ago)
I'm in a library, so...
Anmol Sharma (OP, 2w ago)
You can just stay muted, watch, and type if you see anything wrong, please.
riverfog7 (2w ago)
Okay, but why? Mine starts fine.
Anmol Sharma (OP, 2w ago)
I tried a different template as well. I want to train a CV model, that's why I'm using it.
riverfog7 (2w ago)
Other GPUs too? Try an A40; maybe that specific 3090 has a problem.
Anmol Sharma (OP, 2w ago)
Yeah. So which is the best GPU?
riverfog7 (2w ago)
Depends on the model.
Anmol Sharma (OP, 2w ago)
...that I can use for training my CV model.
riverfog7 (2w ago)
What's your model like, big or small? If it's small and you want to be cost-effective, go with something like an A40 / 3090 / 4090. If it's large and doesn't fit on those GPUs, or you want to run large batch sizes, go with an A100 / H100 / H200 / B100 (that last one is overkill, though).
@Anmol Sharma did you solve it?
@Dj sorry for the ping, but I think that "nvidia-smi: failed to parse pcie.link.gen.max" is a hardware error. Can you check that?
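As a rough way to put numbers on the "big or small" question, here is a minimal sketch of a training-memory estimate, assuming mixed-precision training with Adam; activation memory depends on batch size and architecture and is not counted, and the 25M-parameter example model is purely illustrative.

```python
# Rough lower bound on training VRAM, assuming mixed precision (fp16
# weights/gradients) with Adam (fp32 m, v, and master weights).
# Activation memory is batch-size dependent and NOT included here.
def training_vram_gb(n_params: float) -> float:
    weights = n_params * 2         # fp16 weights
    grads = n_params * 2           # fp16 gradients
    adam_states = n_params * 8     # fp32 m and v
    master_weights = n_params * 4  # fp32 copy kept by the optimizer
    return (weights + grads + adam_states + master_weights) / 1024**3

# Example: a ResNet-50-sized CV model (~25M parameters, illustrative only)
print(f"~{training_vram_gb(25e6):.2f} GB before activations")  # fits easily in a 24 GB 3090
```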
Anmol Sharma (OP, 2w ago)
Yeah, sorry, something came up and I had to go out of my room. I'll use an A40 then.
Poddy (2w ago)
@Anmol Sharma
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #16220
Jason (2w ago)
Can you open a ticket for that?
Anmol Sharma (OP, 2w ago)
Okay. I have one more problem: when I upload my dataset, it crashes.
riverfog7 (2w ago)
Let's just talk here.
Anmol Sharma (OP, 2w ago)
OK.
riverfog7 (2w ago)
How large is the dataset, in gigabytes?
Anmol Sharma (OP, 2w ago)
Yes, I want to use 3 datasets.
riverfog7 (2w ago)
And what are you uploading with?
Anmol Sharma (OP, 2w ago)
One is 32 GB, one is 14 GB, and one is 5 GB.
riverfog7 (2w ago)
Hmm, did you provision enough storage? Where are you uploading to? /workspace?
Anmol Sharma (OP, 2w ago)
(attachment, no description)
Anmol Sharma (OP, 2w ago)
Have a look at this.
riverfog7 (2w ago)
How many images are in your dataset?
Dj (2w ago)
@Anmol Sharma Can you share the pod ID of the pod that gave you the error at the top of this thread?
Anmol Sharma (OP, 2w ago)
Umm, I don't have a note of that.
riverfog7 (2w ago)
You can check the audit logs.
Anmol Sharma (OP, 2w ago)
Give me a sec. Sorry for the typo.
Dj (2w ago)
I can grab it too, it's just easier if you already know the ID :p If you can't track it down, let me know and I'll go find it in your account history.
Anmol Sharma (OP, 2w ago)
(attachment, no description)
Anmol Sharma (OP, 2w ago)
Yeah, please help me out.
riverfog7 (2w ago)
Where are you uploading?
Anmol Sharma (OP, 2w ago)
In a folder I'm creating.
riverfog7 (2w ago)
And maybe use a network volume, since it's persistent and not subject to data loss. It needs to be in /workspace.
Anmol Sharma (OP, 2w ago)
Yeah, yeah.
riverfog7 (2w ago)
And make the volume disk 200 GB and the container disk 20 GB.
Anmol Sharma (OP, 2w ago)
In that, I'm creating a subfolder.
riverfog7 (2w ago)
Volume disk -> /workspace; container disk -> everything else.
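Following the advice above, a quick way to confirm that the space is actually mounted where you expect before starting a multi-gigabyte upload; a minimal sketch, assuming the datasets go under /workspace.

```python
import shutil

# Check free space on the volume mount (/workspace) and on the container
# disk (/) before uploading ~50 GB of datasets. Paths follow the advice
# above: the volume disk backs /workspace, the container disk backs the rest.
for mount in ("/workspace", "/"):
    usage = shutil.disk_usage(mount)
    print(f"{mount}: {usage.free / 1024**3:.1f} GB free "
          f"of {usage.total / 1024**3:.1f} GB")
```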
Anmol Sharma (OP, 7d ago)
Let me try that and I'll get back to you. I'm getting such low speed that a turtle would be faster: the pod has been running for the last hour and the 5 GB dataset is still uploading; only 517.00 MB have been uploaded so far.
Anmol Sharma (OP, 7d ago)
(attachment, no description)
Anmol Sharma (OP, 7d ago)
And it exited.
riverfog7 (7d ago)
It's a spot pod; use on-demand if you want it to never exit.
Anmol Sharma (OP, 6d ago)
Let me try it.
KaSuTeRaMAX (2d ago)
I am also running into this issue. The image is a custom image based on nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04. What I know so far:
- It occurs on serverless
- RTX 3090
- With the same image, the problem sometimes occurs and sometimes does not (when it does occur, it goes into an infinite loop, so I end it with "Terminate")
- It seems to have started occurring recently in EU-CZ-1
- I think it has been occurring frequently in the US for some time now
yhlong00000 (2d ago)
I have a feeling this might relate to the CUDA version; you want to filter for machines with CUDA 12.6+.
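A minimal sketch of a startup check along those lines, assuming the container can shell out to nvidia-smi and read the "CUDA Version" field from its header; the exact filter available in the pod/serverless configuration UI may differ.

```python
import re
import subprocess

# Fail fast at container start if the host driver reports a CUDA version
# below 12.6, instead of hitting a confusing failure later.
REQUIRED = (12, 6)

out = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", out)
if match is None:
    raise RuntimeError("Could not read the CUDA version from nvidia-smi output")

version = (int(match.group(1)), int(match.group(2)))
if version < REQUIRED:
    raise RuntimeError(f"Host reports CUDA {version[0]}.{version[1]}, need {REQUIRED[0]}.{REQUIRED[1]}+")
print(f"CUDA {version[0]}.{version[1]} OK")
```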
