Issues with changing file permission to 400
I have an SSH key that I'm trying to set to permission 400 by running the following command:
chmod 400 id_rsa_git
Upon running ls -l, I'm seeing the permission as 444.
Why can't FileBrowser be opened?
Why does an "HTTP Error 404" appear when I click on HTTP Service [Port 4040]? The log output says "No module named 'ip_adapter'", so I want to check whether the 'ip_adapter' file has been copied from the Docker Hub image to the 'runpod-volume/my project' directory. I do have this file in my local project.
Are there very few GPUs that support CUDA 11.8?
When I create a GPU Pod on Secure Cloud, if I select the CUDA 11.8 version, there are very few GPUs available. However, when I choose 'any', there are many more GPUs available for deployment. My project currently requires the use of CUDA 11.8.
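For reference, NVIDIA drivers are generally backward compatible with older CUDA toolkits, so a pod whose driver reports a newer CUDA version (e.g. 12.x) can usually still run software built against CUDA 11.8. A minimal sketch of the version check you might run at pod startup (the version strings here are illustrative, not taken from any particular pod):

```python
def cuda_at_least(available: str, required: str) -> bool:
    """True if the pod's reported CUDA version satisfies the project's requirement."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(available) >= parse(required)

# A driver reporting CUDA 12.2 can typically still run CUDA 11.8 builds
# (driver backward compatibility), while an 11.6 driver cannot.
print(cuda_at_least("12.2", "11.8"))  # True
print(cuda_at_least("11.6", "11.8"))  # False
```

If this holds for your workload, selecting "any" and verifying the driver's CUDA version at startup widens the pool of available GPUs.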
GPU speed getting slower and slower
Yesterday I was using a 3090 and it was writing at 3.5 it/s, which was great, but today I'm using a 3090 Ti and it started at 2.5 it/s and is now slowing to 1.4 it/s and still going down... wasting my money.
How do I run Docker in a RunPod environment?
I want to run Docker inside the GPU pod, but the pod itself may be a Docker container. How can I run Docker in it?
[ONNXRuntimeError] when running ComfyUI
I'm a total noob when it comes to these things. I was able to run a vid2vid workflow for about a week with no issues, but since yesterday I've been running into this issue and have no clue what to do. Would anyone be able to help?
Running sshuttle in my pod
I am trying to connect my pod to my k8s cluster, and I need to work with sshuttle -- I need the iptables DNAT and REDIRECT modules installed.
Is there a way to enable this on my instance? Alternatively, I could also use nftables or TPROXY...
How to stop a Pod?
The model has not been fully uploaded yet, and I would like to continue the upload tomorrow. If I don't stop the pod, it will continue to incur costs.
Network issues with 3090 pods
Pods tfc6texf3xrkip and 33laj8z8yzm0du both have borked networking. The download speeds are very slow, and I get issues like these:
```
Collecting fairseq@ git+https://github.com/pzelasko/fairseq@ba2f4bae68107c9d8a838f19611f951e718577b4 (from -r requirements.txt (line 60))
Cloning https://github.com/pzelasko/fairseq (to revision ba2f4bae68107c9d8a838f19611f951e718577b4) to /tmp/pip-install-wt38ex46/fairseq_32d4b5f22eec428196d0a086873b7d52...
```
Runpod error starting container
```
2024-03-07T14:40:19Z error starting container: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 534: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
nvidia-container-cli: detection error: driver rpc error: failed to process request: unknown
```
I restarted the pod, but it still errors...
Runpod SD ComfyUI Template missing??
Where did the "Runpod SD ComfyUI" template go? Can anyone help? I've been using it extensively for a month now, and suddenly it's gone?
Pod Outage
It's currently taking 100x longer to pull the Docker image, and when the container eventually starts, the API server running inside it takes an absurdly long time for inference, which is breaking production (API timeouts). Is there a current problem with the servers I should know about?
CUDA out-of-memory error when the 2nd GPU is not utilized
I have a pod with 2 x 80 GB PCIe and I am trying to load and run Smaug-72B-v0.1 LLM.
The problem is, I can download it, but when I try to load it, it gives me a CUDA out-of-memory exception while the 2nd GPU's memory is empty. I was expecting that when I choose 2x GPU I can use their combined capacity. If you check the screenshot, the 2nd GPU's memory is not used at all when the exception fires. Also, there are no GPU instances with that much VRAM, so I have to choose 2x or 3x. How can I fix it? Thanks
The exception is:
```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacty of 79.11 GiB of which 168.50 MiB is free. Process 3311833 has 78.93 GiB memory in use. Of the allocated memory 78.31 GiB is allocated by PyTorch, and 189.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
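The traceback shows everything being placed on GPU 0; multi-GPU capacity is not pooled automatically, so the loader has to shard the model across devices itself (for example, Hugging Face transformers' device_map="auto", if that is the loading path in use). The arithmetic below is a quick sanity check, assuming fp16 weights, of why a 72B model cannot fit on one 80 GB card:

```python
def fp16_weights_gib(n_params: float) -> float:
    """Approximate memory for model weights alone in fp16 (2 bytes per parameter)."""
    return n_params * 2 / 2**30

# ~134 GiB for 72B parameters, before activations or KV cache are counted.
needed = fp16_weights_gib(72e9)
print(round(needed, 1))  # 134.1
print(needed > 80)       # True: must be sharded across at least two 80 GB GPUs
```

So even with perfect loading, the sum of both GPUs (160 GB) is needed; a single-device load will always OOM on GPU 0.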
Backdrop Build V3 Credits missing
Hi team,
I hope this message finds you well. I am writing to follow up on the recent offer I received to sign up for RunPod and connect it with my Build account. As instructed, I have successfully signed up for RunPod, ensured that my RunPod account is connected to the same email as my Build account, and marked myself as “interested in” or “building with” on the partner page.
However, it has been over 48 hours since I completed these steps, and I have yet to see the promised credits applied to my RunPod account. I am reaching out to inquire about the status of this offer and to kindly request assistance in ensuring the credits are credited to my account as promised....
When on 4000 ADA, it's RANDOMLY NOT DETECTING GPU!
When on the 4000 Ada, it's RANDOMLY NOT DETECTING the GPU! Yesterday I set it up and it was okay. Today I set it up and it's not detecting the GPU, even though nvidia-smi says it's there.
Why is that???
It isn't convenient if I have to install CUDA-enabled torch every time; it's a waste of time and money...
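When nvidia-smi reports a GPU but application code doesn't see one, the gap is often between the driver and the framework build (e.g. a CPU-only torch wheel), rather than the pod hardware. A small diagnostic sketch that separates the two checks, assuming only standard tools:

```python
import shutil
import subprocess

def driver_sees_gpu() -> bool:
    """True if nvidia-smi exists and exits cleanly, i.e. the driver can see a GPU."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    return subprocess.run([smi], capture_output=True).returncode == 0

def torch_sees_gpu() -> bool:
    """True if an installed PyTorch build can use CUDA; False if torch is CPU-only or absent."""
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False

# If the first is True and the second is False, installing a torch wheel built
# for the pod's CUDA version is usually the fix rather than changing pods.
print(driver_sees_gpu(), torch_sees_gpu())
```

Running this right after pod startup tells you which layer lost the GPU before any time or money is spent on a full setup.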
Can't get my pod to work right
Hi, I'm new to RunPod. I'm trying to add models and LoRAs to my pod, as well as trying to install runpodctl, but I can't figure it out.
When I try to follow the tutorial for runpodctl, I keep getting errors.
Help would be greatly appreciated. Thank you in advance...
Can I still access the data of my GPU pod once my account runs out of funds?
I have a telegram bot running in a GPU pod.
It has a Postgres database container, and it stores all its data in the Postgres database.
I earlier had it set up in a CPU pod, but when I ran out of funds, the pod was deleted along with all of my database's data. Now I've switched to a GPU pod so it has data persistence, but I was wondering what happens if I run out of funds. Will I still be able to SSH into my machine and get the data from my database, or not?
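Whatever the billing behavior turns out to be, the safe move is to dump the database to a file and copy it off the pod before the balance hits zero. A minimal sketch that builds the pg_dump invocation (the database name and output path here are hypothetical placeholders):

```python
def pg_dump_cmd(database: str, outfile: str) -> list[str]:
    """Build a pg_dump invocation producing a single restorable archive file."""
    return ["pg_dump", "--format=custom", f"--file={outfile}", database]

# Hypothetical database name and output path; run this inside the pod, then
# copy the dump file off the pod (scp, runpodctl, or cloud storage).
print(" ".join(pg_dump_cmd("botdb", "/workspace/backup.dump")))
```

The custom format (--format=custom) produces one compressed archive that pg_restore can rebuild the database from, which is easier to move around than a raw SQL dump.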