RunPod•2mo ago
Aurelia

cpu instances don't work

2024-06-05T20:19:37Z create container runpod/base:0.5.1-cpu
2024-06-05T20:19:38Z 0.5.1-cpu Pulling from runpod/base
2024-06-05T20:19:38Z Digest: sha256:7530e77d6014bd6f3e1939b8d9003d8f7d2bd35a98395c4d297ac3b7a6d05b85
2024-06-05T20:19:38Z Status: Image is up to date for runpod/base:0.5.1-cpu
2024-06-05T20:20:38Z error creating container: container: create: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.43/containers/2fc1150401eeace7c2f58423e071f9686d6faaa89c28c7f50cf249b8b3f5ada4/start": context deadline exceeded
2024-06-05T20:20:58Z create container runpod/base:0.5.1-cpu
2024-06-05T20:20:59Z 0.5.1-cpu Pulling from runpod/base
2024-06-05T20:20:59Z Digest: sha256:7530e77d6014bd6f3e1939b8d9003d8f7d2bd35a98395c4d297ac3b7a6d05b85
2024-06-05T20:20:59Z Status: Image is up to date for runpod/base:0.5.1-cpu
2024-06-05T20:21:48Z start container
2024-06-05T20:21:49Z error starting container: Error response from daemon: failed to create task for container: failed to create shim task: ENOENT: No such file or directory: unknown
2024-06-05T20:22:05Z start container
2024-06-05T20:22:06Z error starting container: Error response from daemon: failed to create task for container: failed to create shim task: the file /start.sh was not found: unknown
6 Replies
Aurelia•2mo ago
Note: I tested this in the RO region and it failed on both the 128 CPU model and the 64 CPU one. It does work on the 2 CPU instance. Testing on 32 CPUs now.
2024-06-05T20:34:07Z create container runpod/base:0.5.1-cpu
2024-06-05T20:34:08Z 0.5.1-cpu Pulling from runpod/base
2024-06-05T20:34:08Z Digest: sha256:7530e77d6014bd6f3e1939b8d9003d8f7d2bd35a98395c4d297ac3b7a6d05b85
2024-06-05T20:34:08Z Status: Image is up to date for runpod/base:0.5.1-cpu
2024-06-05T20:34:09Z error creating container: container: create: Error response from daemon: driver failed programming external connectivity on endpoint opfoebvlassaxv-0 (c00b1b5fad13eb28f13c4d2ef431a59774c3b36d1bec10b373bee6a2a310437f): Error starting userland proxy: listen tcp4 100.65.19.241:60648: bind: address already in use
2024-06-05T20:34:25Z start container
2024-06-05T20:34:25Z error starting container: Error response from daemon: driver failed programming external connectivity on endpoint opfoebvlassaxv-0 (aef02d6c2ed3691d8ba60ca4642f05f05388f7cb7ed726496d025b6b32658a45): Error starting userland proxy: listen tcp4 100.65.19.241:60650: bind: address already in use
32 CPUs gives a different error! Trying 16 CPUs next. 16 CPUs sets up just fine. From my informal testing, it seems to fail for everything bigger than 16 CPUs.
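When pod logs get pasted as one long run of text, it can be hard to spot which entries are the actual failures. Here is a rough Python sketch that splits such a paste back into timestamped entries and keeps only the error lines. The helper names (`split_entries`, `error_entries`) are made up for this example and are not part of any RunPod tooling, and it assumes every entry starts with a UTC timestamp like `2024-06-05T20:34:07Z`, as in the logs above.

```python
import re

# Zero-width lookahead: split right before each "YYYY-MM-DDTHH:MM:SSZ " stamp.
# Assumes the timestamp format seen in the pasted logs (requires Python 3.7+).
TIMESTAMP_BOUNDARY = re.compile(r"(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z )")


def split_entries(blob: str) -> list[tuple[str, str]]:
    """Split a flattened log paste into (timestamp, message) pairs."""
    entries = []
    for chunk in TIMESTAMP_BOUNDARY.split(blob):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece before the first timestamp
        timestamp, _, message = chunk.partition(" ")
        entries.append((timestamp, message))
    return entries


def error_entries(blob: str) -> list[tuple[str, str]]:
    """Keep only entries whose message starts with 'error '."""
    return [(ts, msg) for ts, msg in split_entries(blob) if msg.startswith("error ")]
```

Calling `error_entries(pasted_text)` on a paste like the ones in this thread should leave only the `error creating container` and `error starting container` entries, which makes it easier to compare what fails at each CPU count.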
haris•2mo ago
cc: @flash-singh
flash-singh•2mo ago
32 vCPU works, try that again. 64 cores or higher does fail for me, I'll check those.
nerdylive•2mo ago
I've experienced that before. If I'm not wrong, it's because the TCP ports are already occupied.
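On a machine you control, one quick way to test the "port already occupied" theory is to try binding the address yourself. This is a minimal sketch, assuming Python is available on that host. `port_is_free` is a hypothetical helper, not a RunPod or Docker command, and it cannot be run against the RunPod host from inside a pod that never started.

```python
import errno
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can bind host:port, False if it is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            # Same condition Docker reports as "bind: address already in use".
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # anything else (e.g. permission denied) is a different problem
        return True
```

For example, `port_is_free("127.0.0.1", 60648)` returns `False` while some other process holds that port on the loopback address. The probe socket is closed as soon as the check finishes, so it does not keep the port reserved.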
Encyrption•5w ago
I am having a similar issue. I can deploy my app on all instances except the 64 and 128 vCPU ones. Both of these run on the AMD EPYC 9754 128-Core Processor. When it tries to run, it gets stuck in QUEUE with the error pasted below. After that it just loops between "start container" and "failed to create shim task: the file python was not found: unknown". This seems to be the same issue described above. I hope they can resolve this issue quickly!🤞
ERROR from instance:
error creating container: container: create: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.43/containers/03f5da1a67e9f72498f779b9923cb7927a703cc84d173fa038041e72a7caac9b/start": context deadline exceeded
flash-singh•5w ago
This is a known issue. We'll keep you updated as we get closer to fixing it.