RunPod•7mo ago
Monster

Why don't I have a stop option? Only the terminate option is available.

Solution:
I would use rclone rather than cloud sync. Cloud sync is built on top of rclone anyway.
52 Replies
digigoblin
digigoblin•7mo ago
You don't get a stop button when you use a network volume because it's redundant. You can simply create a new pod with the network volume attached.
Monster
MonsterOP•7mo ago
Do you mean create a new pod with a local disk volume attached?
Madiator2011 (Work)
Madiator2011 (Work)•7mo ago
I mean, if you use network storage there is no point in stopping the pod, as you can reuse the storage on another machine.
Monster
MonsterOP•7mo ago
But I am trying to stop the pod itself, not the volume.
nerdylive
nerdylive•7mo ago
Just terminate the pod and remake it later with the network storage attached.
Monster
MonsterOP•7mo ago
ok
nerdylive
nerdylive•7mo ago
There's no way to do that for now in your case. It's only possible if you use the storage from the pod itself.
Monster
MonsterOP•7mo ago
Then I think I need to make a pod with a local disk volume, which I can stop at any time and restart to access the data on the local disk volume. Would that help me transfer 200GB of data faster from an AWS instance to RunPod?
nerdylive
nerdylive•7mo ago
Yes, but it's better to use a network volume.
Monster
MonsterOP•7mo ago
Because I saw you answer another post saying network volumes are slow.
nerdylive
nerdylive•7mo ago
You can use other machines only with a network volume; with a local disk volume you're tied to the whole machine you rented. Yes, it's maybe a bit slower. Well, you'd have to back it up to cloud storage if you want to be safe. That's another alternative.
Madiator2011 (Work)
Madiator2011 (Work)•7mo ago
I mean, if you deploy volume storage you are limited to a single host machine, and you can often end up with a 0 GPU error 🙂
Monster
MonsterOP•7mo ago
Good one. I will create a local disk volume and create a GPU pod on top of that disk volume.
nerdylive
nerdylive•7mo ago
Hmm I'm confused
Monster
MonsterOP•7mo ago
why
nerdylive
nerdylive•7mo ago
Nvm, go ahead. I don't need to understand your words when you're not asking haha. "Creating a GPU pod upon a disk volume" is just not specific enough.
Monster
MonsterOP•7mo ago
Sorry, I might be typing too fast. My requirement is typical: I need a pod running Ubuntu with a volume of around 300GB (local disk / network volume).
nerdylive
nerdylive•7mo ago
All good, was just curious. You have to experience the latency yourself and decide which one works for you, btw.
Monster
MonsterOP•7mo ago
Previously I used a network volume, but when I scp'd 200GB of data from AWS to RunPod, it was too slow. So I am thinking the bottleneck might be the network volume. That is why I am considering creating a new pod with a local disk volume.
nerdylive
nerdylive•7mo ago
Ohh. How much speed were you getting?
Monster
MonsterOP•7mo ago
Only 5MB/s, which means it takes about 10 hours to copy the data from AWS to RunPod.
digigoblin
digigoblin•7mo ago
200GB data from AWS to RunPod must be a nightmare in terms of egress costs
nerdylive
nerdylive•7mo ago
Well, they want data in, not out, so it makes sense.
Monster
MonsterOP•7mo ago
Ohhh, I thought egress was free.
nerdylive
nerdylive•7mo ago
Free 1GB 😂 idk how much the free tiers are.
digigoblin
digigoblin•7mo ago
AWS egress is very expensive. RunPod does not charge data transfer.
Monster
MonsterOP•7mo ago
Fk aws
nerdylive
nerdylive•7mo ago
"Love aws"
Monster
MonsterOP•7mo ago
OK, then I will just use the same network volume, git clone all the source code, and re-download the data from within RunPod.
nerdylive
nerdylive•7mo ago
Re-download the data? What for?
Monster
MonsterOP•7mo ago
So, the best practice on RunPod is to create a pod with a network volume each time you use it, and terminate it when you stop working, right?
digigoblin
digigoblin•7mo ago
If you want to back up your data somewhere, I recommend using something like Hugging Face Hub, which is completely free.
nerdylive
nerdylive•7mo ago
Yep, or just leave it on if it's running something and you want it to keep running.
digigoblin
digigoblin•7mo ago
I back up all my models etc. to Hugging Face Hub and then sync them to my pods.
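Not the exact setup from this thread, but a rough sketch of that push/pull flow using the huggingface_hub Python client; the repo id, folder paths, and private flag are placeholders, and it assumes you are logged in (e.g. via the HF_TOKEN env var):
```python
from huggingface_hub import HfApi, snapshot_download

api = HfApi()  # picks up the token from HF_TOKEN or a previous `huggingface-cli login`

# On the AWS instance: push the downloaded checkpoints to a model repo.
repo_id = "your-username/comfyui-checkpoints"  # placeholder repo id
api.create_repo(repo_id=repo_id, repo_type="model", private=True, exist_ok=True)
api.upload_folder(
    folder_path="/data/comfyui/models",  # placeholder path on the AWS instance
    repo_id=repo_id,
    repo_type="model",
)

# On the RunPod pod: pull everything straight onto the attached volume.
snapshot_download(
    repo_id=repo_id,
    repo_type="model",
    local_dir="/workspace/ComfyUI/models",  # placeholder path on the pod
)
```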
nerdylive
nerdylive•7mo ago
Yeah. How much is the data limit on HF, btw?
digigoblin
digigoblin•7mo ago
Probably only a limit for private repos. I don't think public repos have a limit; TheBloke has a massive amount of data.
Monster
MonsterOP•7mo ago
I use ComfyUI, which needs a lot of checkpoint models to run. I have downloaded them to an AWS instance; that is why I'd like to copy the whole repo to RunPod. Now it seems I have to git clone the bare ComfyUI repo to the new pod and re-download all the needed models to the pod and the attached network volume.
digigoblin
digigoblin•7mo ago
I would use network storage for this
Madiator2011 (Work)
Madiator2011 (Work)•7mo ago
Options:
- use network storage
- bake models into the Docker image
- use volume storage
Monster
MonsterOP•7mo ago
Sounds better. How do you do that, please? Like, how do you bring the AWS instance data to Hugging Face Hub, and how do you sync from Hugging Face Hub to RunPod? Thank you so much. Still, I find it strange to use a network volume: if I terminate the pod every time, then all the Ubuntu configuration is gone and I need to set it up again each time I start work. It sounds counter-intuitive.
digigoblin
digigoblin•7mo ago
Create a custom Docker image and template so that you don't need to reinstall Ubuntu packages every time.
Monster
MonsterOP•7mo ago
It's still not so easy to use, because the config is always changing, so I'd need to update the Docker image from time to time. I will create a pod with a local disk volume then; at least next time when I start the pod, all my data and the Ubuntu config will still be there.
nerdylive
nerdylive•7mo ago
Env variables and if conditions. Those can be easier to change.
Monster
MonsterOP•7mo ago
Thank you for the answer, but it still doesn't work in my case. When you install a lot of Python dependencies, or PyTorch / CUDA stuff, some of them are under /home/etc/... which is not /workspace. If I stop the pod, all of them get lost, and I need to reinstall and redo the env setup each time I stop the pod and start it again. No idea why the font changed to red. I am sure this is a basic and typical requirement; it must be me, using RunPod in the wrong way.
digigoblin
digigoblin•7mo ago
You are using runpod in the wrong way. Don't put things that you want to persist on container disk. Container disk is temporary storage. If you want your data to persist, then put it on the persistent storage in /workspace.
Monster
MonsterOP•7mo ago
Yes, I understand. But if you install PyTorch, it goes to system folders like ~/etc/ or others, not /workspace.
digigoblin
digigoblin•7mo ago
Don't install PyTorch etc. into the OS; create a venv on /workspace, activate the venv, and install the stuff there.
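A minimal sketch of that layout, assuming Python is already on the pod; the package names are only examples, and the equivalent shell commands are noted in the comments:
```python
import subprocess
import venv

# Create the virtual environment on the persistent volume,
# equivalent to: python -m venv /workspace/venv
venv.create("/workspace/venv", with_pip=True)

# Install packages with the venv's own pip so they land under /workspace,
# equivalent to: source /workspace/venv/bin/activate && pip install torch torchvision
subprocess.run(
    ["/workspace/venv/bin/pip", "install", "torch", "torchvision"],  # example packages
    check=True,
)
```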
Monster
MonsterOP•7mo ago
OK, so install everything in a virtual env, not using a system-wide installation, right? Got your idea. Thank you for the help.
digigoblin
digigoblin•7mo ago
Grand
Monster
MonsterOP•7mo ago
Last thing, back to the 200GB data transfer. I will put the AWS data into S3 and use RunPod cloud sync; does that sound reasonable to you or not? I have to accept the egress cost incurred, as all the downloaded stuff is there in the AWS instance and needs to be brought out if I want to migrate to RunPod anyway.
Solution
digigoblin
digigoblin•7mo ago
I would use rclone rather than cloud sync. Cloud sync is built on top of rclone anyway.
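A rough sketch of what that transfer could look like, assuming an rclone remote for the S3 bucket has already been set up with `rclone config`; the remote name, bucket, and paths below are placeholders:
```python
import subprocess

# Copy the bucket contents straight onto the pod's persistent volume.
# Assumes an rclone remote named "s3" was configured beforehand with `rclone config`.
subprocess.run(
    [
        "rclone", "copy",
        "s3:my-bucket/comfyui-models",  # placeholder S3 bucket/prefix
        "/workspace/models",            # destination on the RunPod volume
        "--progress",                   # show transfer progress
        "--transfers", "8",             # parallel transfers can improve throughput
    ],
    check=True,
)
```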
Monster
MonsterOP•7mo ago
OK, thanks, great help. I will mark this as resolved. The warm support really keeps me with RunPod and keeps other options out. Thank you.