Password required when connecting to a pod over SSH
When I create a pod and try to connect to it over SSH, following the tutorial in the site docs, it asks me for a password. I even created an Ubuntu server pod to test, and got the same result: it asked for an SSH password.
Can anyone help me with this problem?
My OS is Ubuntu 22.04...
Solution:
RunPod official templates have start.sh scripts that inject SSH keys into the pod. Community templates are different: some of them add the key, some do not. I made a Python package that sets up true SSH with a password: pip install OhMyRunPod
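A password prompt usually means the SSH server in the pod found no matching public key and fell back to password authentication. Here is a minimal sketch for checking this from inside the pod (for example through the web terminal). It assumes the standard OpenSSH location ~/.ssh/authorized_keys. The helper names key_is_authorized and read_authorized_keys are hypothetical and are not part of RunPod or OhMyRunPod.

```python
from pathlib import Path
from typing import Optional


def key_is_authorized(public_key: str, authorized_keys_text: str) -> bool:
    """Return True if the key material of public_key appears in authorized_keys_text.

    Only the key type and the base64 blob are compared; the trailing comment is ignored.
    """
    wanted = public_key.split()[:2]
    if len(wanted) < 2:
        return False
    for line in authorized_keys_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Options may precede the key type, so look for the pair anywhere in the line.
        for i in range(len(fields) - 1):
            if fields[i:i + 2] == wanted:
                return True
    return False


def read_authorized_keys(path: Optional[Path] = None) -> str:
    """Read authorized_keys (default: ~/.ssh/authorized_keys); empty string if missing."""
    if path is None:
        path = Path.home() / ".ssh" / "authorized_keys"
    try:
        return path.read_text()
    except FileNotFoundError:
        return ""
```

Calling key_is_authorized(my_public_key_line, read_authorized_keys()) inside the pod shows whether the template's start script actually injected the key.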
Jupyter notebook
Will connecting to the port of the GPU instance show the same progress and Jupyter notebook I am running, or will it just start another instance with a new Jupyter notebook environment?
Super slow network speeds on some pods.
Some pods have really slow network speeds and take an absolute age to install requirements initially, and then to upload or download files. Transferring 5 to 20 MB takes around 10 minutes. How does one determine which of these pods have slow networks? They are a waste of time and money. I have to try one pod at a time to find one that is fast. This is not great.
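One way to judge a pod quickly is to time a small download right after it starts, before installing anything. The sketch below uses only the standard library. The function names are hypothetical, the test URL is whatever large file you trust, and reading the figure in the question as 20 megabytes in 10 minutes is an assumption about the units.

```python
import time
import urllib.request


def throughput_mbit_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a transfer of num_bytes in the given seconds to megabits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def measure_download(url: str, max_bytes: int = 20_000_000, chunk_size: int = 65536) -> float:
    """Download up to max_bytes from url and return the observed rate in Mbit/s."""
    received = 0
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as response:
        while received < max_bytes:
            chunk = response.read(min(chunk_size, max_bytes - received))
            if not chunk:
                break
            received += len(chunk)
    elapsed = time.perf_counter() - start
    return throughput_mbit_per_s(received, elapsed)
```

For scale, throughput_mbit_per_s(20_000_000, 600) is about 0.27 Mbit/s, so a pod that measures in that range on a short test is probably not worth keeping.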
How to change from root user?
Sorry for the dumb question! I'm trying to start a project by installing packages within my venv, but I get a warning that I'm still the root user. I tried 'su - [username]' but couldn't work out what to use as my username (I tried a couple of obvious things).
Solution:
You can ignore the warning about running as root.
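Inside a container the default user is normally root, so there may be no other account to switch to. If the warning comes from pip (an assumption, since the original message is not quoted), the two things worth confirming are which user is running and whether the venv is really active. A small sketch with hypothetical helper names:

```python
import os
import sys


def running_as_root() -> bool:
    """True when the effective user ID is 0 (POSIX only; False where geteuid is missing)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


def in_virtualenv() -> bool:
    """True when the interpreter runs inside an environment created by the venv module."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)
```

If in_virtualenv() returns True, packages are installed into the venv rather than the system Python, which is the situation the root warning is mainly concerned with.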
http service [port 7860] Not Ready
I broke RunPod. I'm new to LLMs. I use the RunPod web UI/terminal and Hugging Face. I receive this error regardless of GPU. I am using the straightforward TheBloke one-click UI. It worked for weeks, then recently stopped. So this is EsinError/operator error. That is the error I get when I try to start the terminal.
I see this in Logs: AttributeError: module 'gradio.layouts' has no attribute 'all'...
runpodctl: start spot instance?
Is there a flag that can be used to create a spot instance with runpodctl? Or does it only create on-demand instances?
Solution:
No GPU in community pods
I very often get this problem when creating a GPU pod through Community Cloud.
502 Bad Gateway Error
Greetings. I am attempting to use TheBloke LLMs One-Click UI. Whenever I try to connect to HTTP Service [Port 7860] I get a 502 Bad Gateway error. I can't figure out how to fix it. Please help.
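A 502 from a proxy generally means nothing answered on the target port behind it. A quick check, run inside the pod, is whether anything is listening on 7860 at all. This is a sketch under the assumption that the UI is expected on localhost:7860, and the helper name is hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If port_is_open("127.0.0.1", 7860) is False, the web UI process has probably not started or has crashed, and the pod logs are the next place to look.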
Community Cloud - Med Speed network - Slow outbound connections < 1Mbps
Uploading trained checkpoints to Hugging Face, or just downloading from the pod to my machine, is very slow.
Any ideas on how to transfer a file out without keeping the pod running? I just need a 2 GB file saved, but the speed is around 50 Kbps...
Solution:
Well, try another pod in a different region. And yes, you can't transfer files without a pod running.
If some CPU pods are available, you can use one to transfer your files into network storage in a region, then use that storage in a GPU pod...
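To see why switching pods is usually the better option, it helps to put a number on the wait. This is a rough estimate only, with a hypothetical helper name, and whether "50Kbps" in the question means kilobits or kilobytes per second is an assumption either way.

```python
def transfer_time_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Estimate how many hours it takes to move size_bytes at rate_bits_per_s."""
    if rate_bits_per_s <= 0:
        raise ValueError("rate_bits_per_s must be positive")
    return size_bytes * 8 / rate_bits_per_s / 3600


two_gb = 2_000_000_000
hours_at_50_kbit = transfer_time_hours(two_gb, 50_000)       # about 88.9 hours
hours_at_50_kbyte = transfer_time_hours(two_gb, 50_000 * 8)  # about 11.1 hours
```

Under either reading the pod has to stay up for many hours, so moving the file to network storage and fetching it from a faster pod is likely to cost less.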
NFS mount is not allowed in pod?
Hello, I'm trying to mount my NAS server over NFS, but when I tried to mount it, I got a mount.nfs "Operation not permitted" error.
Is there no way to mount my server via NFS or sshfs?...
Solution:
It won't work, as it would require FUSE, and FUSE requires privileged containers.
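On Linux the mount system call requires the CAP_SYS_ADMIN capability, which unprivileged containers normally do not have. Whether a given pod has it can be read from /proc/self/status. The sketch below uses hypothetical helper names; the bit index 21 for CAP_SYS_ADMIN comes from the Linux capability header.

```python
from typing import Optional

CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in the capability bitmask


def has_capability(cap_eff_hex: str, bit: int) -> bool:
    """Return True if the given bit is set in a CapEff hex string."""
    return bool((int(cap_eff_hex, 16) >> bit) & 1)


def effective_caps(status_path: str = "/proc/self/status") -> Optional[str]:
    """Return the CapEff hex string of the current process, or None if unavailable."""
    try:
        with open(status_path) as status:
            for line in status:
                if line.startswith("CapEff:"):
                    return line.split()[1]
    except OSError:
        pass
    return None
```

For example, has_capability("00000000a80425fb", CAP_SYS_ADMIN) is False; that mask is the set commonly seen in default Docker containers, which is consistent with the "Operation not permitted" error.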
Skypilot & expose-ports
Hi,
I'm using SkyPilot to create and deploy vLLM on a pod.
If I'm correct, the template runpod/base:0.0.2 is currently used when a pod is created through SkyPilot. Ports 8266 and 6380 are exposed by this template for Ray (I guess)...
Issue with deploying GPU pod in CA-MTL-3 region
1) In region CA-MTL-3, when I try to deploy a big server with more resources and 4 TB of container disk storage, it shows a warning that there is no available instance with this storage. Is there any way to increase the storage quota for our account?
Note: I am not talking about a network drive; I am talking about container disk volume and persistent storage.
2) There is no network storage available for the above region. Is there any way to make it available as well?
For reference, I have attached a screenshot as well...
Solution:
No, it's not your quota; it was the availability of hosts.
Network issue with runpod
Hey folks, my pod (ID sb3ogh2mqvkuy6) has become unavailable.
The error message I'm getting:
"This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime."
Is there any guidance on how long it would take to restore this pod?...
How to deploy Llama3 on Aphrodite Engine (RunPod)
I have set up the following settings for a pod with 48 GB RAM.
1) I'm not sure how to enable the Q4 cache; without it the 5.0bpw model won't fit. Any advice, please? (See attached)
2) I get an error that config.json can't be found. It seems the REVISION variable has not been taken into account. The docs say:
REVISION: The HuggingFace branch name, it defaults to the main branch....
Solution:
Sure, I just made a PR. Please have a look:
https://github.com/PygmalionAI/aphrodite-engine/pull/455
Do you think you could cherry pick this fix for RunPod?...
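A missing config.json is what one would expect if the loader looks at the main branch while the quantized weights live on another branch. One way to confirm this independently of the engine is to compare the file URL for both revisions, using the resolve URL pattern of the Hugging Face Hub. In this sketch the repository name and the branch name "5.0bpw" are placeholders, and the helper name is hypothetical.

```python
from urllib.parse import quote


def config_url(repo_id: str, revision: str = "main") -> str:
    """Build the Hub URL of config.json for repo_id at the given branch, tag, or commit."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/config.json"
    )


main_url = config_url("some-org/some-model")              # placeholder repository
branch_url = config_url("some-org/some-model", "5.0bpw")  # placeholder branch
```

If the branch URL opens in a browser and the main URL returns "not found", the REVISION value is not reaching the downloader.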
Max number of Pods
How many pods can I run concurrently in Secure Cloud on a High Availability machine: 10, 100, 1000?
Solution:
You can run at least 1,000 pods if you're an "enterprise" client. What's your use case for this?