RunPod•7mo ago
Encyrption

CPU Instances on 64 / 128 vCPUs FAIL

I can deploy my app on all instance types except the 64 and 128 vCPU ones. Both of these run on an AMD EPYC 9754 128-Core Processor. When the worker tries to run, it gets stuck in QUEUE with the error pasted below, and then it just loops between "start container" and "failed to create shim task: the file python was not found: unknown". Any ideas what is causing this and how to resolve it? A similar issue has been reported in the Pods section, but I am using Serverless and getting the same problem.

ERROR from instance: error creating container: container: create: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.43/containers/03f5da1a67e9f72498f779b9923cb7927a703cc84d173fa038041e72a7caac9b/start": context deadline exceeded
14 Replies
nerdylive•7mo ago
I think there's an ongoing bug for this.
EncyrptionOP•7mo ago
I know RunPod's focus is on GPU instances, but these must be their most profitable CPU instances. I've not experienced their support yet 🤞
nerdylive•7mo ago
Yeah, probably. I think RunPod's still working on it. The max is 32 vCPUs right now, yeah?
EncyrptionOP•7mo ago
Yeah, it seems 32 vCPU/128GB is the biggest CPU instance until this issue is resolved. Too bad for my thread- and RAM-heavy app; 128 vCPU/256GB would be a much better fit. It limits the payloads I can process 😦 Oh well.
nerdylive•7mo ago
Yeah, sorry, that's the only one available for now. By the way, if I may ask, what kind of app are you running? Just curious.
EncyrptionOP•7mo ago
It is a video conversion tool, ArtisanASCII. It takes video as input and converts the frames into ASCII characters, which form an ASCII art video. Everything depends on the scale factor relative to the original: the closer it gets to 1:1 (one character per pixel), the more the RAM usage goes to the moon. For every payload I use 100% of the available threads to make it faster. I've only been able to test on machines with 128GB of RAM, and I cannot finish a 1-minute video at 1:1 scale without running out of RAM. I was hoping to see what 256GB could do. I know with 128 cores it would have been VERY fast!
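For readers unfamiliar with the technique, the pixel-to-character mapping described above can be sketched in a few lines. This is a minimal illustration, not ArtisanASCII's actual code: it assumes each frame is already grayscale (a list of rows of 0-255 brightness values), uses a hypothetical ten-character brightness ramp `RAMP`, and treats `scale` as the number of pixels per character cell in each direction, so `scale=1` is the 1:1 case.

```python
# Hypothetical brightness ramp, darkest to lightest (an assumption, not ArtisanASCII's).
RAMP = "@%#*+=-:. "


def frame_to_ascii(gray, scale=1):
    """Convert a grayscale frame (list of rows of 0-255 ints) to ASCII lines.

    Each scale x scale block of pixels becomes one character, chosen by the
    block's average brightness. scale=1 maps one pixel to one character.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    lines = []
    for y in range(0, height, scale):
        chars = []
        for x in range(0, width, scale):
            # Pixels in this block (blocks at the right/bottom edge may be smaller).
            block = [gray[yy][xx]
                     for yy in range(y, min(y + scale, height))
                     for xx in range(x, min(x + scale, width))]
            avg = sum(block) / len(block)
            # Map brightness 0..255 onto ramp indices 0..len(RAMP)-1.
            idx = int(avg * (len(RAMP) - 1) / 255)
            chars.append(RAMP[idx])
        lines.append("".join(chars))
    return lines


frame = [[0, 255], [128, 64]]
print(frame_to_ascii(frame, scale=1))  # ['@ ', '+#']
print(frame_to_ascii(frame, scale=2))  # ['*']
```

At `scale=1` the output has one character per pixel, so a 1920x1080 frame becomes about 2.07 million characters, and keeping every frame of a video in memory multiplies that by the frame count (about 1,800 frames for one minute at an assumed 30 fps).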
nerdylive•7mo ago
Ooh, wow, that's cool haha, I see.
EncyrptionOP•7mo ago
I could use disk instead of RAM, but it would take at least 2 to 4 forevers to complete. LOL
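There is a middle ground between holding everything in RAM and working entirely off disk, sketched here under assumptions: if frames can be converted independently, they can be converted in parallel while writing each finished frame out in order, so only a bounded number of frames is in memory at once. `convert_frame`, `stream_convert`, and the text output format are hypothetical stand-ins. The sketch uses `ThreadPoolExecutor` to stay simple; for CPU-bound pure-Python conversion, `ProcessPoolExecutor` (same `submit`/`result` interface) would be needed for real parallelism.

```python
import io
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def convert_frame(frame):
    # Hypothetical stand-in for the real per-frame conversion.
    return frame.upper()


def stream_convert(frames, out, workers=4, max_in_flight=8):
    """Convert frames in parallel and write results to `out` in input order.

    At most max_in_flight frames are pending or held in memory at a time,
    instead of the whole video.
    """
    pending = deque()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for frame in frames:  # `frames` can be a lazy generator
            pending.append(pool.submit(convert_frame, frame))
            if len(pending) >= max_in_flight:
                # Wait for the oldest frame so output stays in order.
                out.write(pending.popleft().result() + "\n")
        # Drain whatever is still in flight.
        while pending:
            out.write(pending.popleft().result() + "\n")


buf = io.StringIO()
stream_convert((f"frame {i}" for i in range(3)), buf)
print(buf.getvalue(), end="")
```

This caps memory by the number of frames in flight rather than the video length; it does not reduce the cost of a single frame at 1:1 scale.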
nerdylive•7mo ago
LOL, yeah, I guess. For now, all we can do is wait for the bigger instances to become available...
digigoblin•7mo ago
128 vCPUs are not the same as 128 cores.
EncyrptionOP•7mo ago
For me it means the same thing: 128 threads. Well, not really; I'm more after the 256GB of RAM than the 128 threads, but I would use them all... you know, if the 128 vCPU systems actually worked.
digigoblin•7mo ago
It's definitely NOT the same thing; you can have more than 128 threads with 128 cores.
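On Linux, the difference between logical CPUs (hardware threads, which is what a vCPU usually corresponds to) and physical cores can be checked from `/proc/cpuinfo`: every logical CPU has its own `processor` entry, while a physical core is a distinct `(physical id, core id)` pair. With SMT enabled, the AMD EPYC 9754 exposes 256 threads on its 128 cores; how a provider maps vCPUs onto those is up to the provider. The sketch below assumes the host exposes those fields (some VMs and containers do not), and `count_cpus` is a hypothetical helper name. It describes the host, not how many CPUs a container is allowed to use.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo text."""
    logical = 0
    cores = set()
    physical_id = core_id = None
    # Append a blank line so the last processor block is flushed too.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor's block.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return logical, len(cores)


# Synthetic example: 4 logical CPUs on 2 cores (2 threads per core).
SAMPLE = "\n\n".join(
    f"processor : {cpu}\nphysical id : 0\ncore id : {cpu // 2}" for cpu in range(4)
)
print(count_cpus(SAMPLE))  # (4, 2)
```

On a real Linux machine, the same function can be fed the contents of `/proc/cpuinfo`.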
EncyrptionOP•7mo ago
Please excuse my ignorance; it seems like you have a lot more knowledge on the subject than I do. On my home server I have 20 physical cores. In my code I spin up 20 threads, and top shows near 100% usage of all cores. How can I get more than that out of the hardware? I would appreciate your thoughts on this.
nerdylive•7mo ago
Well, spin up more threads. Simply doing that might gain you some performance, but not much: if the cores are already maxed out, the extra threads just share the same CPUs, so each thread gets fewer cycles when all cores are working at nearly 100% (if I'm not wrong).
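Whether extra threads help also depends on how many CPUs the process can actually use, which inside a container can be less than the host total. A minimal sketch, assuming Linux with cgroup v2, where a container's CPU quota appears in `/sys/fs/cgroup/cpu.max` as `"<quota> <period>"` or `"max <period>"`: it takes the smaller of the CPUs the process may be scheduled on (`os.sched_getaffinity`) and the quota (rounded up), and uses that as a worker count. `parse_cpu_max` and `usable_cpus` are hypothetical helper names. For CPU-bound work, going far beyond this number mostly adds context switching rather than throughput.

```python
import math
import os


def parse_cpu_max(text):
    """Parse cgroup v2 cpu.max ("<quota> <period>" or "max <period>").

    Returns the CPU limit as a float, or None if there is no quota.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def usable_cpus(cpu_max_path="/sys/fs/cgroup/cpu.max"):
    """Best-effort count of CPUs this process can use (Linux-oriented)."""
    try:
        count = len(os.sched_getaffinity(0))  # CPUs we may be scheduled on
    except AttributeError:  # sched_getaffinity is not available on every platform
        count = os.cpu_count() or 1
    try:
        with open(cpu_max_path) as f:
            limit = parse_cpu_max(f.read())
    except (OSError, ValueError):
        limit = None  # no cgroup v2 quota file, or an unexpected format
    if limit is not None:
        # Round a fractional quota (e.g. 1.5 CPUs) up to whole workers.
        count = min(count, max(1, math.ceil(limit)))
    return count


print(parse_cpu_max("200000 100000"))  # 2.0
print(parse_cpu_max("max 100000"))     # None
print(usable_cpus())
```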
