nvidia-glx-desktop - how to make it work
need SU password for the RunPod Desktop template 'runpod/kasm-docker:cuda11'
Custom template creation with AWS ECS
When trying to git pull Comfy nodes into my RunPod, I'm met with a divergent branch error?
Running 2x H100 80 GB. Does this mean my cap is now 160 GB of VRAM?
GPU cloud template to manage network volume
Cache a Docker image to reuse
RTX 3090 is available on the selection page but my stopped pod still has 0 GPUs
After scheduled maintenance on my pod today, I can no longer connect to the TCP port I set up with venv
Issue installing Fooocus on RunPod
sh: 1: accelerate: not found
Thanks sh: 1: accelerate: not found ...
A way to connect to an AWS VPC
8x H100 SXM5, Error 802
# /usr/bin/nv-fabricmanager -c ~/nvswitch/fabricmanager.cfg
request to query NVSwitch device information from NVSwitch driver failed with error: Failed to load the requested module [NV_ERR_MODULE_LOAD_FAILED]...
Attaching a Network Volume fails when using GraphQL
`runpod-python`'s method `create_pod`, GraphQL endpoint returns the following error: There are no longer any instances available with the requested specifications. Please refresh and try again. (I have tried multiple times)
Here is the minimal code:
```
pod = runpod.create_pod(...
```
Container logs disappear after stopping the container
(`src/test.sh`):
```
echo "Working ..."
sleep 10
runpodctl stop pod "$RUNPOD_POD_ID"
```
...
CUDA 12.3 support
Is there a way to get pod logs programmatically?
GPUs look available via `runpod.api.ctl_commands.get_gpu()` which aren't available.
`runpod.api.ctl_commands.get_gpu()` function which calls the GraphQL API, but the information it returns seems inconsistent with what's available. For example, right now, I can run...
Serverless endpoint long waits in "Initializing" state
`/run` have an "Initializing" status in the dashboard for up to 15 minutes. Is this a normal queue time for an endpoint with no other requests?