sbhavani
RunPod
•Created by sbhavani on 4/1/2024 in #⚡|serverless
NGC containers
@Geri Take a look at the versions used in https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html. That should give you an idea of compatibility across torch and NVIDIA packages.
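To compare a running container against the support matrix, something like the sketch below could print the relevant versions. The helper name `collect_versions` is made up for illustration, and it assumes `torch` may or may not be importable in the environment where it runs.

```python
import platform


def collect_versions():
    """Return version strings to compare against the NGC support matrix.

    `collect_versions` is a hypothetical helper, not part of any library.
    """
    versions = {"python": platform.python_version(), "torch": None}
    try:
        import torch  # may be missing outside the NGC container
    except ImportError:
        return versions

    versions["torch"] = torch.__version__
    # CUDA version torch was built against; None on CPU-only builds.
    versions["cuda"] = torch.version.cuda
    # cuDNN version as an integer; None if cuDNN is unavailable.
    versions["cudnn"] = torch.backends.cudnn.version()
    return versions


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Running it inside the container and matching the output against the row for that container release should show whether the torch and CUDA versions line up.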
66 replies
thanks! I'll test it out on friday!
then sounds like it works! if you publish to the community I'll test it out as well
I can clean up this repo and add a HF Llama-2/3 example comparing BF16 and FP8 throughput: https://github.com/sbhavani/h100-performance-tests
hmm actually that code is more functional testing, I don't have anything readily available to test perf/speed up
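For a rough perf comparison, a generic timing harness along these lines could be a starting point. This is only a sketch: `measure_throughput` and its parameters are hypothetical names, and the BF16 or FP8 forward pass would have to be supplied as the `step` callable. On a GPU, passing `torch.cuda.synchronize` as `sync` should keep asynchronous kernels from skewing the timing.

```python
import time


def measure_throughput(step, items_per_step, n_warmup=3, n_iters=10, sync=None):
    """Return items processed per second for `step`, a zero-argument callable.

    `sync` is an optional callable (e.g. torch.cuda.synchronize) invoked
    before starting and before stopping the timer.
    """
    # Warm-up iterations are run but not timed.
    for _ in range(n_warmup):
        step()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()
    # Guard against a zero interval for trivially fast steps.
    elapsed = max(time.perf_counter() - start, 1e-12)

    return items_per_step * n_iters / elapsed


if __name__ == "__main__":
    # Stand-in CPU workload; replace with a BF16 or FP8 model step.
    def dummy_step():
        sum(i * i for i in range(10_000))

    rate = measure_throughput(dummy_step, items_per_step=1)
    print(f"{rate:.1f} steps/s")
```

Calling it once with the BF16 step and once with the FP8 step, using the same batch size for `items_per_step`, would give two numbers to compare.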
https://github.com/NVIDIA/TransformerEngine?tab=readme-ov-file#pytorch - sample code here!
yes, a template for pods; I guess it depends on the driver version of the host too
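One way to check the driver version from inside a pod is to ask `nvidia-smi`, which reports the host driver. A minimal sketch, assuming `nvidia-smi` is on the PATH in the container (the function name `host_driver_version` is made up here):

```python
import shutil
import subprocess


def host_driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None.

    Hypothetical helper; returns None when nvidia-smi is missing or fails.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    # One line per GPU; every GPU on a host shares the same driver.
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return lines[0] if lines else None


if __name__ == "__main__":
    print(host_driver_version())
```

The result can then be compared with the minimum driver listed for the container release in the support matrix.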
latest container from a few days ago:
nvcr.io/nvidia/pytorch:24.04-py3
a pod doesn't give you access to the host machine
this is for pods, a pod still runs a container
anyways I think I can create a template to fix it with my remaining few dollars of credits 😅
but how would I SSH into the container, or is the SSH command for the host machine with Docker access?
So I'm not sure what you can do after deploying a "RunPod PyTorch NGC" template
how do you use a container and "Connect" if there's no SSH access? There's also no option to SSH into the host and use the container interactively
yeah I got that, I guess I was just too lazy to add the required ssh libs and create that template
I also didn't understand why RunPod PyTorch NGC containers are available in the dropdown selection if the limitations are known. Maybe I'm just not using it correctly?
Any update on this?
CC @mmoy
Are there any docs or a quick example on how to use it?