error creating container
error creating container: nvidia-smi: parsing output of line 0: failed to parse (pcie.link.gen.max) into int: strconv.Atoi: parsing "": invalid syntax
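For context, that message comes from a Go component converting nvidia-smi fields with strconv.Atoi. A minimal sketch of the kind of query presumably involved (the exact agent code is an assumption, not confirmed): it reads pcie.link.gen.max via nvidia-smi, and an empty field makes Atoi fail with exactly this error.

```go
// Minimal sketch (assumption: the host agent does something along these lines).
// It queries pcie.link.gen.max via nvidia-smi and converts it with strconv.Atoi;
// if the GPU/driver returns an empty field, Atoi fails just like the error above.
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

func main() {
	out, err := exec.Command("nvidia-smi",
		"--query-gpu=pcie.link.gen.max", "--format=csv,noheader").Output()
	if err != nil {
		fmt.Println("nvidia-smi failed:", err)
		return
	}
	for i, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		gen, err := strconv.Atoi(strings.TrimSpace(line))
		if err != nil {
			// An empty line reproduces: failed to parse (pcie.link.gen.max) into int
			fmt.Printf("parsing output of line %d: %v\n", i, err)
			continue
		}
		fmt.Printf("GPU %d max PCIe gen: %d\n", i, gen)
	}
}
```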

59 Replies
hi
what gpu r u using
3090
what image..?

i also got same error for 2.4.0
community cloud or secure cloud?
on-demand
mine works fine
i changed Container Disk volume to 200
any changes in CMD?
so it changes from spot to on-demand
no
can you try recreating the thing
can you come on vc and help me out
mine works with a 3090 and that template
i cant speak here
on lounge vc?
im in a library
so
you can just stay muted and watch
and type if anything's wrong
please
okay
why tho
mine starts fine
i tried a different template also
i want to train a cv model
so thats why i am using it
other gpus too?
try A40
maybe that specific 3090 has a problem
yeah
so which is the best gpu
depends on the model
i can use for training my cv model
what's your model
like big or small
if it's small and you wanna be cost effective
A40 / 3090 / 4090
something like this
if it's large and doesn't fit in those gpus
or you wanna do large batch sizes
then go with A100 / H100 / H200 / B100 (this is overkill tho)
@Anmol Sharma did u solve it
@Dj sorry for the ping but I think that
nvidia-smi: failed to parse pcie.link.gen.max
is a HW error. can you check that?
ya sorry something came up, had to go out of my room
ya i will use a40 then
@Anmol Sharma
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #16220
can you open a ticket for that
okay
i have one more problem
so when i upload my dataset it is crashing
lets just talk here
ok
how large is the dataset?
in gigabytes?
yes
i want to use 3 datasets
and what are you uploading with??
one is 32gb, one is 14gb and one is 5gb
hmm..
did you provision enough storage?
where are you uploading to? /workspace?

have a look at this
how many images are in your dataset?
@Anmol Sharma Can you share the pod id of the pod that gave you the error at the top of this thread?
umm i dont have a note of that
you can check at audit logs
give me a sec
sorry for typo
I can grab it too, it's just easier if you already know the id :p
If you can't track it down let me know I'll go find it in your account history.

yeah please help me out
where r u uploading?
in a folder i am creating
and maybe use a network volume, as it is persistent and not subject to data loss
it needs to be in workspace
ya ya
and make volume disk 200gigs
and container disk 20gigs
in that i am creating a subfolder
volume disk -> /workspace
container disk -> everything else
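if you want to double-check that the 200-gig volume actually landed on /workspace (and that the dataset isn't filling the small container disk instead), a quick sketch like this works, assuming a Linux pod:

```go
// Quick sketch (assumes a Linux pod): print total/free space for the container
// root and the /workspace volume so you can confirm the 200 GB volume disk
// is mounted where the dataset will go.
package main

import (
	"fmt"
	"syscall"
)

func report(path string) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		fmt.Printf("%s: %v\n", path, err)
		return
	}
	total := st.Blocks * uint64(st.Bsize)
	free := st.Bavail * uint64(st.Bsize)
	fmt.Printf("%-12s total %6.1f GiB, free %6.1f GiB\n",
		path, float64(total)/(1<<30), float64(free)/(1<<30))
}

func main() {
	report("/")          // container disk
	report("/workspace") // volume disk (persistent)
}
```

plain `df -h /workspace` in the pod terminal tells you the same thing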
let me try and i will get back to you
i am getting such slow speed that a turtle would be faster. the pod has been running for the last 1 hour and the 5 gb dataset is still uploading, only 517.00MB have been uploaded so far

and it exited
its a spot pod
use on demand if you want it to never exit
let me try it
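one thing worth ruling out for the slow upload: if the 5 gb is tens of thousands of small image files, per-file overhead usually dominates the transfer time. a rough sketch (folder and archive names are hypothetical) that packs everything into a single tar.gz before uploading:

```go
// Sketch only: uploading many small image files one by one is usually what makes
// transfers crawl. Packing the dataset into one tar.gz first (paths below are
// hypothetical) and uploading that single archive is typically much faster.
package main

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"log"
	"os"
	"path/filepath"
)

func main() {
	src := "./my_dataset" // hypothetical dataset folder
	out, err := os.Create("my_dataset.tar.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	gz := gzip.NewWriter(out)
	defer gz.Close()
	tw := tar.NewWriter(gz)
	defer tw.Close()

	err = filepath.Walk(src, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		hdr.Name = rel
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(tw, f)
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
}
```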
I am also seeing this issue.
The image is a self-made image based on nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04.
The information I have is
- It occurs in serverless
- RTX3090
- With the same image, this problem sometimes occurs and sometimes does not (if it does occur, it goes into an infinite loop, so I end it with "Terminate")
- It seems to have started occurring recently in EU-CZ-1
- I think it has been occurring frequently in the US for some time now
I have a feeling this might relate to the cuda version, you want to filter for machines with 12.6+
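if the cuda-version theory holds, besides the CUDA filter in the endpoint settings you could also add a guard at container start. a sketch, assuming you control the entrypoint and using the 12.6 threshold mentioned above, that parses the "CUDA Version:" line from the plain nvidia-smi banner:

```go
// Sketch of a start-up guard (assumption: you control the container entrypoint).
// It reads the "CUDA Version: X.Y" line from the plain nvidia-smi banner and
// exits early if the host driver reports anything below 12.6.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"regexp"
	"strconv"
)

func main() {
	out, err := exec.Command("nvidia-smi").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "nvidia-smi failed:", err)
		os.Exit(1)
	}
	m := regexp.MustCompile(`CUDA Version:\s*(\d+)\.(\d+)`).FindSubmatch(out)
	if m == nil {
		fmt.Fprintln(os.Stderr, "could not find CUDA version in nvidia-smi output")
		os.Exit(1)
	}
	major, _ := strconv.Atoi(string(m[1]))
	minor, _ := strconv.Atoi(string(m[2]))
	if major < 12 || (major == 12 && minor < 6) {
		fmt.Fprintf(os.Stderr, "host CUDA %d.%d is below 12.6, refusing to start\n", major, minor)
		os.Exit(1)
	}
	fmt.Printf("host CUDA %d.%d is OK\n", major, minor)
}
```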