RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⛅|pods

Terminal does not work in Jupyter notebook.

Hey guys, for some reason the terminal in Jupyter notebooks is not working anymore. When I open the terminal, I just get an empty window in which I can't type anything. I need to use the web terminal for any script executions.

Increase spending limit

I keep hitting my $40/hour limit and need this increased. How can I do this?

Hi,

I am trying to send a file from my local system to my pod volume using this command: rsync -e "ssh -p 10234 -i /home/dell/ssh_keys/ssh_key_dell_Latitude_A4213.txt" -avP /home/dell/exp10/conda_env.zip [email protected]:/workspace/testing/ but when I run it I get this error: ash: line 1: rsync: command not found rsync: connection unexpectedly closed (0 bytes received so far) [sender]...
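
One possible reading of the error above: rsync has to be installed on both ends of the transfer, and the "command not found" line is printed by the shell on the pod, not by the local machine. A minimal Python sketch for checking that over the same SSH connection is below. The helper name `remote_rsync_check` and the `USER@POD_HOST` placeholder are hypothetical (the real address is obscured in the post); the port and key path are the ones quoted in the post.

```python
import shlex


def remote_rsync_check(user_host, port, key_path):
    """Build an ssh argv that asks the remote shell whether rsync is on its PATH.

    Hypothetical helper for illustration; it only builds the command and does
    not run it.
    """
    return [
        "ssh", "-p", str(port), "-i", key_path, user_host,
        # `command -v` prints the path of rsync if the remote shell can find it.
        "command -v rsync || echo 'rsync not found on remote'",
    ]


if __name__ == "__main__":
    # USER@POD_HOST is a placeholder; substitute the pod's real SSH address.
    cmd = remote_rsync_check(
        "USER@POD_HOST",
        10234,
        "/home/dell/ssh_keys/ssh_key_dell_Latitude_A4213.txt",
    )
    print(shlex.join(cmd))
```

If the check reports that rsync is missing, installing it inside the pod with whatever package manager the container image provides should let the original command run; which package manager that is depends on the image.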

Jupyter notebook - does it keep on running?

I am using a Jupyter notebook on my pod. If I close the tab, will it keep running?

Open-WebUI 404 Error

When using the Better Ollama CUDA 12 template, and following the instructions found here: blog.runpod.io/run-llama-3-1-405b-with-ollama-a-step-by-step-guide, getting an error when posting a query using open-webui: Ollama: 404, message='Not Found', url='https://<snip>-11434.proxy.runpod.net/api/chat' Interestingly enough, replacing the open-webui localhost URL with the above URL works well with cURL using network diagnostics. Wanted to replicate the issue on a less expensive server, but can no longer find the template....

Why is upload speed so slow?

A week back when I downloaded a 6GB checkpoint, it took 1-2 hours. Now it's telling me it'll take 12 hours. Is there a reason for this?
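
To put numbers on the slowdown described above, here is a rough Python sketch of the implied average throughput. It assumes the checkpoint is 6 GB in decimal units (1 GB = 1000 MB) and that the quoted durations are wall-clock transfer times; the function name is hypothetical.

```python
def average_throughput_mb_per_s(size_gb, hours):
    """Average transfer rate in MB/s for a file of size_gb moved in the given hours.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (size_gb * 1000) / (hours * 3600)


if __name__ == "__main__":
    for hours in (1, 2, 12):
        rate = average_throughput_mb_per_s(6, hours)
        print(f"6 GB in {hours} h is about {rate:.2f} MB/s")
```

Under those assumptions the earlier transfer ran at roughly 0.8 to 1.7 MB/s and the new estimate corresponds to about 0.14 MB/s, a 6x to 12x drop.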

GPU errored, machine dead

Search 0 matches 2024-09-04T11:12:09Z stop container 2024-09-04T11:12:44Z remove container...

Slow Container Image download

Two EU datacenters, EU-SE-1 and EU-RO-1, are experiencing extreme slowdown during docker container image download, to the point where our scaler can't keep up with load spikes because it takes > 30 minutes to start up a pod. This needs to be resolved as it's directly costing us money: we can't properly scale, which causes our queue to keep spiking and building. We are also being forced to use on-demand instead of spot because of the slow download speed....

Can I specify CUDA version for a pod?

nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.4, please update your driver to a newer version, or use an earlier cuda container: unknown vLLM based container image fail to start...
Solution:
In deploy click Filters and you can specify Cuda version there.
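
As a small illustration of what the error in this thread is asking for, the Python sketch below pulls the minimum CUDA version out of the message text and compares it with the version a host supports. The function names are hypothetical, and the host version would have to come from the machine itself (for example from the deploy filter mentioned in the solution); this is not part of any RunPod or NVIDIA API.

```python
import re


def required_cuda(error_text):
    """Return the (major, minor) CUDA version demanded by the error, or None."""
    match = re.search(r"cuda\s*>=\s*(\d+)\.(\d+)", error_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_satisfies(host_cuda, error_text):
    """True if host_cuda, a (major, minor) tuple, meets the stated requirement."""
    needed = required_cuda(error_text)
    return needed is None or tuple(host_cuda) >= needed


if __name__ == "__main__":
    message = (
        "nvidia-container-cli: requirement error: "
        "unsatisfied condition: cuda>=12.4"
    )
    print(required_cuda(message))
    print(host_satisfies((12, 2), message))
```

Tuples compare element by element, so a host reporting 12.2 fails a `cuda>=12.4` requirement while 12.4 or newer passes.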

Pods won't start

Looks like auth to hugging face failed, cannot launch any pods - tried with multiple configs, same result. Clicking on start web terminal does nothing, sometimes connect to jupyter button appears but does not do anything. Pod ID: 5d15c6q1grfm6p ``` .254316737Z ...done....

create POD with full Intel Sapphire Rapids CPU chip for Parallel Algorithm scalability test.

Hi, I usually create PODs for GPU tasks, accessing through ssh, so I am very familiar in that sense. But now we need to rent a POD with just a modern Intel CPU fully available for us. In particular, we need one with Intel Sapphire Rapids architecture, so that it supports AMX matrix instructions. This is for a parallel CPU algorithm for which we need to obtain performance and energy consumption results (plots). I went through the RunPod menus but I could not find options on the CPU side, nor exact info on the CPU model of the pod. Am I missing something too obvious? Thanks in advance...

My pod had been stuck during initialization

ogw47gdxzk3a26 - stuck during image pulling. Could you check what happened and handle the issue? Our infra is not ready to handle this kind of error on your side.

Creating instances with a bunch of open ports

I'm using several GPU pods and I'm running into a lack of open ports. AFAIK, the number of ports is restricted while creating instances, with at most 10 ports supported. How can I get 20 or 30 ports while creating an instance?...

creating instance from an image file

I want to make an image from an image file (faster than using a registry). Any idea how to do it? I prefer to use the RunPod storage, because it is faster that way.

Creating pods with different GPU types.

Hello, can I create pods with different GPU types? Say I want to create a pod with 2 A40s and 1 RTX A5000. I ask because there is a gpuTypeIdList property in the RunPod GraphQL specs. Also, it would be amazing to have that feature. Thanks!

Slowish Downloads

I'm trying to set up a pod running ComfyUI for Flux at the moment, and it's going to take 30-40 mins just to download the models with the speed it's running at. ```Downloading 1 model(s) to /workspace//storage/stable_diffusion/models/unet... Downloading: https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors 0K .......... .......... .......... .......... .......... 0% 10.9M 34m23s...

can't cloud sync with Backblaze B2

I need help, I can't do cloud sync with Backblaze B2. I put in the key ID, the application key and the bucket root path, but it says "Something went wrong!"...

How do I deploy a Worker with a Pod?

I have deployed a worker with a Serverless deployment. Now I expected to be able to deploy the exact same image to a Pod and have an endpoint URL to make a similar Worker request, but I'm not having success. I am currently using the following as the initial entrypoint for handler.py...
runpod.serverless.start({"handler": handler})
Is there any doc that discusses how to get a Serverless Worker deployed to a Pod? thx....
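
For context, a minimal sketch of how a handler.py built around that entrypoint line might be laid out so the handler can also be called directly, outside the serverless runtime. The handler body, the `prompt` field, and the `start_worker` wrapper are hypothetical; only the `runpod.serverless.start({"handler": handler})` call comes from the post. This does not by itself answer how to expose the handler from a Pod.

```python
def handler(job):
    """Hypothetical handler: echoes the 'prompt' field of the job input."""
    job_input = job.get("input", {})
    prompt = job_input.get("prompt", "")
    return {"echo": prompt}


def start_worker():
    """Start the serverless worker loop using the line quoted in the post.

    Imported lazily so that handler() can be exercised without the runpod
    package installed.
    """
    import runpod

    runpod.serverless.start({"handler": handler})
```

Calling `handler({"input": {"prompt": "hi"}})` directly returns the echoed payload, which makes it possible to test the logic on any machine before wiring it to a worker via `start_worker()`.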

Funds not appearing in account balance

Hi - I deposited 300 dollars in my account. I got emailed the receipt. But the funds haven't been deposited as credit - could you look into this please?

Very inconsistent performance

I recently started using RunPod and am a fan of the setup simplicity and pricing. However, I have noticed a huge amount of inconsistency in performance, with identical training runs taking up to 3x longer to finish. I am on the secure cloud. Do you know why this may be?