RunPod•4mo ago
MokshMalik

Training jobs using script

Hey, can anyone tell me whether RunPod lets me create a training script that can be run from anywhere, which I can use to create a GPU instance and to load and save my data in external cloud storage, just like AWS SageMaker's training script mode? I need to train multiple models with different architectures this way to see which one performs best.
20 Replies
nerdylive•4mo ago
Yes, you can upload files to S3 storage. You can do that from Python too.
yhlong00000•4mo ago
Overview | RunPod Documentation
Unlock serverless functionality with RunPod SDKs, enabling developers to create custom logic, simplify deployments, and programmatically manage infrastructure, including Pods, Templates, and Endpoints.
Overview | RunPod Documentation
RunPod CLI (runpodctl) is a command-line interface tool designed to automate and manage GPU pods on RunPod.
Export data | RunPod Documentation
Export RunPod data to various cloud providers, including Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, Backblaze B2 Cloud Storage, and Dropbox, with secure key and access token management.
MokshMalikOP•4mo ago
I'm fairly new to RunPod. Can you please point me to a tutorial where a remote training job is run on a pod, the model weights are stored on S3, and the pod automatically kills itself once the training is complete?
yhlong00000•4mo ago
You will probably have to write some code to pull your data from S3, and after training you can terminate the pod using our CLI. Btw, ChatGPT is really good at writing code 😀 https://docs.runpod.io/cli/overview
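A minimal sketch of the "pull data from S3" step, assuming boto3 is installed in the pod and AWS credentials are available through environment variables. The function name `download_prefix` and its parameters are hypothetical, not part of any RunPod API.

```python
import os


def download_prefix(bucket, prefix, dest_dir, client=None):
    """Download every object under s3://bucket/prefix into dest_dir.

    `client` is an S3 client; when omitted, a boto3 client is created
    (assumption: boto3 is installed and credentials are configured).
    """
    if client is None:
        import boto3  # imported lazily so the helper can be exercised with a stub
        client = boto3.client("s3")

    downloaded = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip "folder" placeholder objects
                continue
            # Path relative to the prefix; fall back to the file name
            # when the prefix is the full key.
            rel = key[len(prefix):].lstrip("/") or os.path.basename(key)
            local_path = os.path.join(dest_dir, rel)
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            client.download_file(bucket, key, local_path)
            downloaded.append(local_path)
    return downloaded
```

Called at the start of the training script, for example `download_prefix("my-bucket", "datasets/segmentation/", "/workspace/data")`, where the bucket name and paths are placeholders.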
nerdylive•4mo ago
Yeah, upload the data to S3 after training.
MokshMalikOP•4mo ago
Sorry, it is still unclear. Does RunPod have a tutorial on training a custom model on a GPU instance? I have tried searching for one, but I have not found any.
nerdylive•4mo ago
I think there is, but what kind of model are you trying to train?
nerdylive•4mo ago
RunPod Blog
Using RunPod's DreamBooth Endpoint to Make Custom Generated Images
DreamBooth provides a great way to take a Stable Diffusion model and train it to include a specific new concept (maybe your dog or a friend) making it capable of generating AI images featuring that concept. In a previous post we walked through using RunPod's template to set things and
Marcus•4mo ago
Probably not working anymore, since the DreamBooth endpoint used TheLastBen's code. I recommend using Kohya_ss, EveryDream2Trainer, or OneTrainer.
Marcus•4mo ago
This guy has some videos for training image models: https://www.youtube.com/@SECourses/videos
YouTube
SECourses
Welcome to Software Engineering Courses (SECourses) – the ultimate destination for skillfully curated insights into state-of-the-art technologies and programming paradigms. We demystify the realms of Artificial Intelligence, Stable Diffusion, DreamBooth, LoRA, ControlNet, Textual Inversion, Software Engineering, Programming, C#, .NET, ASP .NET, ...
Marcus•4mo ago
What kind of model are you training?
MokshMalikOP•4mo ago
Well, I'm training different kinds of segmentation models for my tasks, ranging from a simple U-Net to Attention U-Net, and I might also try transformer-based segmentation models. I'd like to run an instance for each model, so I can compare their performance in as little time as possible.
nerdylive•4mo ago
Maybe refer to your model library or the model repo, then use a custom script that runs Python and uses boto3 to upload to S3.
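A minimal sketch of the boto3 upload mentioned here, assuming boto3 is installed and AWS credentials are set in the pod's environment. `upload_weights` and the per-architecture key prefix are hypothetical names chosen for illustration.

```python
import os


def upload_weights(local_path, bucket, key_prefix, client=None):
    """Upload one weights file to s3://bucket/key_prefix/<file name>.

    Returns the S3 URI of the uploaded object.
    """
    if not os.path.isfile(local_path):
        raise FileNotFoundError(local_path)
    if client is None:
        import boto3  # assumption: boto3 is installed in the pod image
        client = boto3.client("s3")

    key = f"{key_prefix.strip('/')}/{os.path.basename(local_path)}"
    # boto3 argument order: local file name, bucket, object key
    client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

Using a different prefix per architecture, such as `segmentation/unet` and `segmentation/attention-unet`, keeps the weights from parallel pods apart so the runs can be compared afterwards.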
MokshMalikOP•4mo ago
A big problem is auto-killing the pod once training is complete, and saving the model weights before that.
nerdylive•4mo ago
That's the high-level idea.
MokshMalikOP•4mo ago
Can you please shed some light on how to auto-kill the instance?
Marcus•4mo ago
runpodctl remove pod $RUNPOD_POD_ID
nerdylive•4mo ago
Yeah, you can run the script to upload the models and then run runpodctl remove pod like that.
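A rough sketch of that order of operations (train, upload, then remove the pod), assuming `runpodctl` is available and authorized inside the pod and that `RUNPOD_POD_ID` is set in the script's environment. The `train` and `upload` callables are hypothetical placeholders for your own code.

```python
import os
import subprocess


def remove_pod_command(env=None):
    """Build the `runpodctl remove pod <id>` command from RUNPOD_POD_ID."""
    env = os.environ if env is None else env
    pod_id = env.get("RUNPOD_POD_ID")
    if not pod_id:
        raise RuntimeError("RUNPOD_POD_ID is not set in this environment")
    return ["runpodctl", "remove", "pod", pod_id]


def train_then_remove(train, upload, run=subprocess.run, env=None):
    """Run training, upload the weights, and only then remove the pod."""
    # Resolve the pod ID first so a missing variable fails before training starts.
    cmd = remove_pod_command(env)
    weights_path = train()   # hypothetical: returns the path of the saved weights
    upload(weights_path)     # hypothetical: e.g. a boto3 upload to S3
    # Reached only if training and upload both succeeded.
    run(cmd, check=True)
```

In this sketch a failed training run or upload leaves the pod running, so the weights on disk are not lost; the trade-off is that the pod keeps billing until it is removed by hand.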
MokshMalikOP•4mo ago
Okay, thanks! If I just stop my pod and do not remove it, will I still be billed? And once I'm inside the pod, can I stop it from there? Will the command runpodctl remove pod $RUNPOD_POD_ID work from inside the pod?
nerdylive•4mo ago
Yes, you will still be billed for the storage. You can probably run the remove pod command, but you have to re-export the RunPod pod ID from your start script (CMD / ENTRYPOINT).