Unable to start pod with MI300X
Observing a "hang" when starting a pod with 8xMI300X (screenshot attached). Any ideas on how to fix this?
3 Replies
I am able to run with 8xMI300X using the official templates, so I am wondering if this is related to your image.
What does your Dockerfile look like?
How do you invoke the CMD / ENTRYPOINT?
Gotcha -- this was the image that was used: https://hub.docker.com/r/eliovp/rocm6.1.2_py3.10_torch2.5_vllm0.5_bkc
Using the official ROCm PyTorch images from RunPod seems to work. Thanks!
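
For anyone comparing a custom image against the official one, here is a minimal sketch of a check that could be run inside a container that does start, to see whether the ROCm device nodes are visible. It assumes the usual Linux layout, where ROCm exposes `/dev/kfd` plus render nodes under `/dev/dri`. The function name `rocm_device_report` and its `dev_root` parameter are made up for illustration and are not part of any RunPod or ROCm tooling. It does not diagnose the hang itself.

```python
import os


def rocm_device_report(dev_root="/dev"):
    """Report which ROCm-related device nodes are visible under dev_root.

    Hypothetical helper: dev_root is a parameter only so the check can be
    pointed at a different directory (for example, in a test).
    """
    # /dev/kfd is the compute interface used by the ROCm runtime.
    kfd_present = os.path.exists(os.path.join(dev_root, "kfd"))

    # GPUs usually show up as renderD* nodes under /dev/dri.
    dri_dir = os.path.join(dev_root, "dri")
    render_nodes = []
    if os.path.isdir(dri_dir):
        render_nodes = sorted(
            name for name in os.listdir(dri_dir) if name.startswith("renderD")
        )

    return {"kfd": kfd_present, "render_nodes": render_nodes}


if __name__ == "__main__":
    report = rocm_device_report()
    print("/dev/kfd present:", report["kfd"])
    print("render nodes found:", len(report["render_nodes"]))
    for node in report["render_nodes"]:
        print("  ", node)
```

If `/dev/kfd` is missing or no render nodes are listed, the container was probably started without the GPU devices passed through. If they are all present, the problem is more likely in the image's software stack or its startup command.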