arken
RunPod
Created by arken on 3/7/2024 in #⛅|pods
Pod Outage
Pulling the Docker image is currently taking about 100x longer than usual. When the pod eventually builds, inference through the API server running inside the container is so slow that our production requests hit the API timeout. Is there a current problem with the servers that I should know about?
3 replies
RunPod
Created by arken on 2/18/2024 in #⛅|pods
Transfer/Duplicate Network Volume
As per title, can I duplicate my network storage or temporarily move it to a different region so that I can have access to different GPUs more frequently? EU-RO-1 hasn't had an A100 available in a couple of days.
4 replies
RunPod
Created by arken on 2/6/2024 in #⛅|pods
RunPod Library + API
I am attempting to create an API that either starts/stops an existing pod, or creates a pod and then starts/stops it. I currently have something somewhat working:
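import runpod
from flask import Flask, jsonify

app = Flask(__name__)
# runpod.api_key and pod_id are assumed to be set elsewhere.

@app.route("/start_model", methods=["POST"])
def start_model():
    # Resume the stopped pod with a single GPU.
    resume = runpod.resume_pod(pod_id=pod_id, gpu_count=1)
    return jsonify({"message": resume}), 200

@app.route("/stop_model", methods=["POST"])
def stop_model():
    # Stop the running pod.
    stop = runpod.stop_pod(pod_id)
    return jsonify({"message": stop}), 200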
However, this only worked until availability for the required pod (an RTX 4090 that is also compatible with our network volume hosted in EU-RO) hit zero. The pod then disappeared from our listed pods and could no longer be stopped or resumed using the RunPod library.

My question is: is there a way around this that you know of? I know that I can also spin up a pod using this library, but it breaks once I try to specify the particular network volume and Docker image (I assume because it doesn't know that only some GPUs in the Secure Cloud are compatible with the network volume?). If the RunPod Python library isn't officially supported, are there any other ways to achieve this through the official RunPod API?

As a side question, can I also run a command once the Docker image has finished building and the pod is ready, to start the Python API inside the pod that I use to talk to my model for inference? That would remove the manual step of entering the container and starting the API myself after each start.

Any help is super appreciated! If you need any extra information, feel free to ask. (Just in case I don't get any notifications, please @ me. I'd love to reply to you swiftly.)
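One way to make the endpoints above tolerate a pod that can no longer be resumed is to wrap the resume call in a small retry-then-recreate helper. The following is only a minimal sketch: `resume_or_recreate` and its parameters are hypothetical names, the two callables stand in for `runpod.resume_pod(...)` and whatever pod-creation call is used, and it assumes those calls raise an exception on failure.

```python
import time
from typing import Callable, Optional


def resume_or_recreate(
    resume: Callable[[], dict],
    create: Callable[[], dict],
    attempts: int = 3,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try to resume a pod a few times, then fall back to creating a new one."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return {"action": "resumed", "result": resume()}
        except Exception as exc:  # no specific library exception type is assumed
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_s)  # wait before the next resume attempt
    try:
        # Every resume attempt failed, so fall back to creating a fresh pod.
        return {"action": "created", "result": create()}
    except Exception as exc:
        raise RuntimeError(
            f"resume failed ({last_error!r}) and create also failed"
        ) from exc
```

Inside `start_model`, this could be called as `resume_or_recreate(lambda: runpod.resume_pod(pod_id=pod_id, gpu_count=1), create_pod_with_volume)`, where `create_pod_with_volume` is a hypothetical function that creates a pod attached to the network volume. Passing the calls in as callables keeps the helper independent of the exact library signatures.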
3 replies
RunPod
Created by arken on 1/11/2024 in #⚡|serverless
SCP
scp -P 22 -i ~/.ssh/id_ed25519 root@213.173.102.159:/mmdetection/data/results/vis/results.tar.gz ./
root@213.173.102.159's password:
I am attempting to retrieve some files from my pod via SCP, but I'm being prompted for a password. Why is this happening?
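A password prompt usually means the server did not accept any key the client offered, so a first step is to rerun the same transfer with verbose output and see which keys are tried. The sketch below only builds and prints such a command. `build_scp_debug_command` is a hypothetical helper, while `-v` and `-o IdentitiesOnly=yes` are standard OpenSSH options.

```python
import os
import shlex
from typing import List


def build_scp_debug_command(key_path: str, remote: str, local: str, port: int = 22) -> List[str]:
    """Build an scp command that logs the authentication steps."""
    return [
        "scp",
        "-v",                        # verbose: shows which keys are offered and rejected
        "-o", "IdentitiesOnly=yes",  # offer only the key passed with -i
        "-P", str(port),
        "-i", os.path.expanduser(key_path),
        remote,
        local,
    ]


cmd = build_scp_debug_command(
    "~/.ssh/id_ed25519",
    "root@213.173.102.159:/mmdetection/data/results/vis/results.tar.gz",
    "./",
)
print(shlex.join(cmd))  # copy the printed command into a terminal to run it
```

In the verbose output, lines mentioning the offered public key and the authentication methods that can continue indicate whether the key was sent at all and whether the server rejected it, for example because the matching public key is not registered on the pod.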
13 replies