RunPod•9mo ago
Monster

Why don't I have a stop option? Only the terminate option is available.

Solution:
I would use rclone rather than cloud sync. Cloud sync is built on top of rclone anyway.
52 Replies
digigoblin
digigoblin•9mo ago
You don't get a stop button when you use a network volume because it's redundant. You can simply create a new pod with the network volume attached.
Monster
MonsterOP•9mo ago
Do you mean create a new pod with a local disk volume attached?
Madiator2011 (Work)
Madiator2011 (Work)•9mo ago
I mean, if you use network storage there is no point in stopping the pod, as you can reuse the storage on another machine.
Monster
MonsterOP•9mo ago
But I am trying to stop the pod itself, not the volume.
nerdylive
nerdylive•9mo ago
Just stop the pod. Terminate it, and remake it later with the network storage attached.
Monster
MonsterOP•9mo ago
ok
nerdylive
nerdylive•9mo ago
There's no way to do it for now in your case. It's only possible if you use the storage from the pod itself.
Monster
MonsterOP•9mo ago
Then I think I need to make a pod with a local disk volume, which I can stop at any time and restart to access the data on the local disk volume. Does that help me transfer 200GB of data faster from an AWS instance to RunPod?
nerdylive
nerdylive•9mo ago
Yes, but it's better to use a network volume.
Monster
MonsterOP•9mo ago
Because I saw you answer another post saying network volumes are slow.
nerdylive
nerdylive•9mo ago
You can use other machines if the whole machine is rented (your local disk volume). Yes, maybe a bit slower. Well, you have to back it up to cloud storage if you want to be safe. That's another alternative.
Madiator2011 (Work)
Madiator2011 (Work)•9mo ago
I mean, if you deploy volume storage you are limited to a single host machine, and you can often end up with a 0 GPU error 🙂
Monster
MonsterOP•9mo ago
Good one. I will create a local disk volume and create a GPU pod on top of the disk volume.
nerdylive
nerdylive•9mo ago
Hmm I'm confused
Monster
MonsterOP•9mo ago
why
nerdylive
nerdylive•9mo ago
Nvm, go ahead. I don't need to understand your words when you're not asking, haha. "Creating a GPU upon a disk volume" is just not specific enough.
Monster
MonsterOP•9mo ago
Sorry, I might have typed too fast. My requirement is typical: I need a pod running Ubuntu with a volume of around 300GB (local disk or network volume).
nerdylive
nerdylive•9mo ago
All good, was just curious. You'll have to experience the latency yourself and decide which is fine for you, btw.
Monster
MonsterOP•9mo ago
Previously I used a network volume, but when I scp'd 200GB of data from AWS to RunPod, it was too slow. So I am thinking the bottleneck might be the network volume. That is why I am considering creating a new pod with a local disk volume.
nerdylive
nerdylive•9mo ago
Ohh. What speeds were you getting?
Monster
MonsterOP•9mo ago
Only 5MB/s, which means it takes about 10 hours to copy the data from AWS to RunPod.
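The arithmetic behind that estimate can be checked with a quick sketch. It assumes decimal units (1 GB = 1000 MB) and a steady transfer rate; the function name is made up for illustration.

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Return the hours needed to move size_gb at a steady rate_mb_per_s."""
    size_mb = size_gb * 1000  # assumption: decimal units, 1 GB = 1000 MB
    seconds = size_mb / rate_mb_per_s
    return seconds / 3600


# 200 GB at 5 MB/s, the figures quoted in the thread
print(f"{transfer_hours(200, 5):.1f} hours")  # prints "11.1 hours"
```

At a sustained 5 MB/s the copy takes roughly 11 hours, which is in line with the rough figure quoted above.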
digigoblin
digigoblin•9mo ago
200GB data from AWS to RunPod must be a nightmare in terms of egress costs
nerdylive
nerdylive•9mo ago
Well, they want data in, not out. Makes sense.
Monster
MonsterOP•9mo ago
Ohhh, I thought egress was free.
nerdylive
nerdylive•9mo ago
Free 1GB 😂 idk how much the free tiers are.
digigoblin
digigoblin•9mo ago
AWS egress is very expensive. RunPod does not charge for data transfer.
Monster
MonsterOP•9mo ago
Fk aws
nerdylive
nerdylive•9mo ago
"Love aws"
Monster
MonsterOP•9mo ago
OK, then I will just use the same network volume, git clone all the source code, and re-download the data on RunPod.
nerdylive
nerdylive•9mo ago
Re-download data? What for?
Monster
MonsterOP•9mo ago
So the best practice on RunPod is to create a pod with a network volume each time you use it, and terminate it when you stop working, right?
digigoblin
digigoblin•9mo ago
If you want to back up your data somewhere, I recommend using something like Hugging Face Hub, which is completely free.
nerdylive
nerdylive•9mo ago
Yep, or just leave it on if it's running something and you want it to keep running.
digigoblin
digigoblin•9mo ago
I back up all my models etc. to Hugging Face Hub and then sync them to my pods.
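A minimal sketch of that back-up-and-sync workflow, assuming the `huggingface_hub` Python package is installed and a Hugging Face token is already configured. The helper names, the file suffixes, and the choice of a private model repo are illustrative assumptions, not something stated in the thread.

```python
from pathlib import Path


def patterns_for(suffixes):
    """Turn file suffixes such as '.safetensors' into glob patterns."""
    return [f"*{s}" for s in suffixes]


def backup_models(folder, repo_id, suffixes=(".safetensors", ".ckpt"), private=True):
    """Upload matching files from folder to a model repo on the Hub."""
    # Imported lazily so patterns_for() works without the library installed.
    from huggingface_hub import HfApi

    api = HfApi()  # assumption: a token is already configured on this machine
    api.create_repo(repo_id, repo_type="model", private=private, exist_ok=True)
    api.upload_folder(
        folder_path=str(Path(folder)),
        repo_id=repo_id,
        repo_type="model",
        allow_patterns=patterns_for(suffixes),
    )


def restore_models(repo_id, target_dir):
    """Download the repo contents into target_dir, e.g. a folder under /workspace."""
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="model", local_dir=target_dir)
```

Under these assumptions, `backup_models` would be run once on the source machine and `restore_models` on each new pod, with a repo id of the form `your-username/your-repo` and a target directory on persistent storage.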
nerdylive
nerdylive•9mo ago
Yeah. What's the data limit on HF, btw?
digigoblin
digigoblin•9mo ago
Probably only a limit for private repos. I don't think public repos have a limit, TheBloke has a massive amount of data.
Monster
MonsterOP•9mo ago
I use ComfyUI, which needs a lot of checkpoint models to run. I have downloaded them to the AWS instance, which is why I'd like to copy the whole repo to RunPod. Now it seems I have to git clone the bare ComfyUI repo to the new pod and re-download all the needed models to the pod and the attached network volume.
digigoblin
digigoblin•9mo ago
I would use network storage for this
Madiator2011 (Work)
Madiator2011 (Work)•9mo ago
Options:
- use network storage
- bake models into docker image
- use volume storage
Monster
MonsterOP•9mo ago
Sounds better. How do you do that, please? Like, how do you bring the AWS instance data to Hugging Face Hub, and how do you sync from Hugging Face Hub to RunPod? Thank you so much.

Still, I find it strange to use a network volume. If I terminate the pod every time, then all the Ubuntu configuration is gone. I need to set it up again each time I start work, which sounds counter-intuitive.
digigoblin
digigoblin•9mo ago
Create a custom docker image and template so that you don't need to reinstall Ubuntu packages every time.
Monster
MonsterOP•9mo ago
It's still not so easy to use, because the config is always changing and I need to update the docker image from time to time. I will create a pod with a local disk volume then. At least the next time I start the pod, all my data and the Ubuntu config will still be there.
nerdylive
nerdylive•9mo ago
Env variables, if conditions. Those can be easier to change.
Monster
MonsterOP•9mo ago
Thank you for the answer, but it still doesn't work in my case. When you install a lot of Python dependencies, or PyTorch and CUDA stuff, some of them go under /home/etc/... which is not /workspace. If I stop the pod, all of them get lost, and I need to reinstall and redo the env setup each time I stop the pod and start it again.

No idea why the font changed to red.

I am sure this is a basic and typical requirement. It must be me, using RunPod in the wrong way.
digigoblin
digigoblin•9mo ago
You are using runpod in the wrong way. Don't put things that you want to persist on container disk. Container disk is temporary storage. If you want your data to persist, then put it on the persistent storage in /workspace.
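One way to tell which side of that line a file or environment falls on is to check whether its path sits under the persistent mount. This is only a sketch: it assumes the persistent storage is mounted at /workspace, as described above, and the helper names are hypothetical.

```python
import sys
from pathlib import Path


def is_persistent(path, mount="/workspace"):
    """Return True if path resolves to a location under the persistent mount."""
    resolved = Path(path).resolve()
    root = Path(mount).resolve()
    return resolved == root or root in resolved.parents


def interpreter_is_persistent(mount="/workspace"):
    """Check whether the running Python environment lives under the mount."""
    return is_persistent(sys.prefix, mount)
```

If `interpreter_is_persistent()` returns False, packages installed with that interpreter land on the container disk rather than under /workspace.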
Monster
MonsterOP•9mo ago
Yes, I understand. But if you install PyTorch, it goes to system folders like ~/etc/ or others, not /workspace.
digigoblin
digigoblin•9mo ago
Don't install PyTorch etc. into the OS. Create a venv on /workspace, activate the venv, and install the stuff there.
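As a rough sketch of that advice, the standard library's `venv` module can create the environment directly under /workspace. The function name and the `venv` directory name are illustrative choices, not something the thread specifies.

```python
import venv
from pathlib import Path


def create_workspace_venv(base="/workspace", name="venv", with_pip=True):
    """Create a virtual environment under base and return its path."""
    env_dir = Path(base) / name
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir
```

After calling `create_workspace_venv()`, running `source /workspace/venv/bin/activate` in the shell activates the environment, and anything then installed with pip is stored under /workspace rather than on the container disk.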
Monster
MonsterOP•9mo ago
OK, so install everything in a virtual env and don't use system-wide installation, right? Got your idea. Thank you for the help.
digigoblin
digigoblin•9mo ago
Grand
Monster
MonsterOP•9mo ago
Last thing, back to the 200GB data transfer. I will put the AWS data in S3 and use RunPod cloud sync. Does that sound reasonable to you or not? I have to accept the egress cost, since all the downloaded stuff is on the AWS instance and I need to bring it out anyway if I want to migrate to RunPod.
Solution
digigoblin
digigoblin•9mo ago
I would use rclone rather than cloud sync. Cloud sync is built on top of rclone anyway.
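A hedged sketch of what such an rclone transfer might look like when driven from Python. It assumes rclone is installed on the pod and that an S3 remote has already been set up with `rclone config`; the remote, bucket, and destination names in the usage note are hypothetical.

```python
import shlex
import subprocess


def rclone_copy_cmd(source, dest, transfers=8, dry_run=False):
    """Build an `rclone copy` command as an argument list."""
    cmd = ["rclone", "copy", source, dest, "--transfers", str(transfers), "--progress"]
    if dry_run:
        # --dry-run reports what would be copied without transferring anything.
        cmd.append("--dry-run")
    return cmd


def run_copy(source, dest, **kwargs):
    """Print the command, then run it; raises if rclone exits with an error."""
    cmd = rclone_copy_cmd(source, dest, **kwargs)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run_copy("s3remote:my-bucket/comfyui", "/workspace/comfyui", dry_run=True)` would preview the copy first. Raising `transfers` runs more file transfers in parallel, which may help when a single stream is slow.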
Monster
MonsterOP•9mo ago
OK, thanks. Great help. I will mark this as resolved. The warm support really keeps me with RunPod and rules the other options out. Thank you.