CPU Instances on 64 / 128 vCPUs FAIL
I can deploy my app on all instances except for 64 & 128 vCPU. Both of these run on the AMD EPYC 9754 128-Core Processor. When it tries to run, it gets stuck in QUEUE with the error pasted below. When this happens, it just loops between "start container" and "failed to create shim task: the file python was not found: unknown". Any ideas what is causing this and how to resolve it? There is a similar issue reported in the pods section here, but I am using serverless and getting the same problem.
ERROR from instance: error creating container: container: create: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.43/containers/03f5da1a67e9f72498f779b9923cb7927a703cc84d173fa038041e72a7caac9b/start": context deadline exceeded
14 Replies
i think theres some ongoing bug for this
I know RunPod's focus is on GPU instances, but these must be their most profitable CPU instances. I've not experienced their support yet 🤞
yeah probably, i think runpod's still on it
max is 32 vcpus right now yeah?
Yeah, seems 32 vCPU/128GB is the biggest CPU instance until this issue is resolved. Too bad for my thread/RAM heavy app. 128 vCPU/256GB would be a much better fit. It limits the payloads I can process 😦 oh well
Yeah sorry, that's the only one available for now
Btw if i may know, just curious what kind of app is it you're running
It is a video conversion tool, ArtisanASCII. It takes video as input and converts the frames into ASCII characters, which form an ASCII art video. It all depends on the scale factor from the original. The closer the scale factor gets to 1 to 1 (pixel to character), the more the RAM usage goes to the moon. With all payloads I use 100% of all threads available to make it quicker. I've only been able to test on machines with 128GB RAM and cannot finish a 1 to 1 scaled 1 minute video without running out of RAM. Was hoping to see what 256GB could do. I know with 128 cores it would have been VERY fast!
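To put rough numbers on why the scale factor matters so much, here is a minimal back-of-the-envelope sketch. It is not how ArtisanASCII works internally (that isn't described here); it only assumes every converted frame is held in memory at once, one cell per scaled pixel. The function name and the `bytes_per_cell` parameter are hypothetical, and the real per-character cost is likely far higher than 1 byte once colors, rendered bitmaps, or language object overhead are counted.

```python
def estimate_frame_buffer_bytes(width, height, fps, seconds,
                                scale=1.0, bytes_per_cell=1):
    """Rough lower bound on RAM if all converted frames stay in memory.

    scale          -- characters per pixel along each axis (1.0 = 1 to 1)
    bytes_per_cell -- ASSUMED storage cost of one character cell
    """
    cols = max(1, int(width * scale))    # characters per row
    rows = max(1, int(height * scale))   # rows per frame
    frames = int(fps * seconds)          # total frames in the clip
    return cols * rows * bytes_per_cell * frames


if __name__ == "__main__":
    # Example input: a 1080p, 30 fps, 60 second clip (assumed, not from the thread)
    for scale in (0.25, 0.5, 1.0):
        size = estimate_frame_buffer_bytes(1920, 1080, 30, 60, scale=scale)
        print(f"scale {scale}: about {size / 1e9:.2f} GB at 1 byte per cell")
```

The point is just that memory grows with the square of the scale factor, so going from 0.5 to 1.0 roughly quadruples it, whatever the true per-cell cost turns out to be.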
Ooh wow thats cool hahah
i c
Could use disk instead of RAM but it would take at least 2 - 4 forevers to complete. LOL
lol
yeah i guess, what we can do rn is wait for the bigger instances to be available ...
128 vcpu is not the same as 128 cores
For me it means the same thing, 128 threads. Well, not really, I am more after the 256GB of RAM than the 128 threads, but I would use them all... you know, if the 128 vCPU systems actually worked.
It's definitely NOT the same thing, you can have more than 128 threads with 128 cores.
Please excuse my ignorance, it seems like you have a lot more knowledge on the subject than I do. On my home server I have 20 physical cores. In my code I spin up 20 threads and top shows near 100% usage of all cores. How can I get more than that out of the hardware? I would appreciate your thoughts on this.
Well, spin up more threads. Simply by doing that you might gain some performance, but not as much, because once the cores are maxed out the CPU time gets shared between threads, so each thread runs slower if they are all working at nearly 100% on all cores (if I'm not wrong)
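For anyone wanting to try this, here is a minimal sketch of sizing a worker pool from what the OS reports rather than hard-coding 20. It assumes Python and uses only the standard library. `usable_cpus`, `convert_frame`, and `run` are hypothetical names, and `convert_frame` is just a stand-in for the real per-frame work. Note that `os.cpu_count()` reports logical CPUs (hardware threads), which can be more than the physical core count when SMT is enabled, and inside a container the process may be allowed fewer than that.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def usable_cpus():
    """Logical CPUs this process is allowed to run on."""
    try:
        # Linux only: respects CPU affinity masks set on the process
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total logical CPU count
        return os.cpu_count() or 1


def convert_frame(frame_id):
    # Hypothetical stand-in for the real per-frame conversion
    return frame_id * 2


def run(frame_ids, workers=None):
    """Convert frames with a pool sized to the usable logical CPUs."""
    workers = workers or usable_cpus()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the inputs
        return list(pool.map(convert_frame, frame_ids))


if __name__ == "__main__":
    print("logical CPUs reported:", os.cpu_count())
    print("usable by this process:", usable_cpus())
    print(run(range(5)))
```

If the per-frame work is pure Python and CPU-bound, threads in standard CPython will not run it in parallel because of the GIL, so `ProcessPoolExecutor` would be the usual swap. Either way, timing a few different worker counts is the only real way to see whether going past the physical core count helps.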