flash-singh
RunPod
Created by Sergio Santos on 11/19/2024 in #⚡|serverless
GPU Availability Issue on RunPod – Need Assistance
@Sergio Santos if you really need network storage, I would urge you to find a long-term solution using 2 endpoints; that way you always have failover, or you can split your users between them for higher availability than a single region
4 replies
RunPod
Created by rougsig on 11/19/2024 in #⚡|serverless
Why can it be stuck IN_PROGRESS?
i can see the jobs being taken from the queue but not being reported back as soon as the job is taken
15 replies
RunPod
Created by rougsig on 11/19/2024 in #⚡|serverless
Why can it be stuck IN_PROGRESS?
the endpoint is bad, or it's using a bad sdk; can you make sure it's updated?
15 replies
RunPod
Created by rougsig on 11/19/2024 in #⚡|serverless
Why can it be stuck IN_PROGRESS?
ping me the endpoint id
15 replies
RunPod
Created by rougsig on 11/19/2024 in #⚡|serverless
Why can it be stuck IN_PROGRESS?
all the jobs get stuck or just that one?
15 replies
RunPod
Created by ART01 on 1/10/2024 in #⛅|pods
Multi-node training with multiple pods sharing same region.
it's close to using wireguard; the tunnels are private
26 replies
RunPod
Created by ART01 on 1/10/2024 in #⛅|pods
Multi-node training with multiple pods sharing same region.
speed is the same as the internet if across regions, but if the two pods are within the same region it will try to use local networks, and the max speed you will get is around 500Mbps
26 replies
RunPod
Created by ART01 on 1/10/2024 in #⛅|pods
Multi-node training with multiple pods sharing same region.
pods can talk to each other over an encrypted private connection without location limitations; location will impact throughput but won't hinder communication
26 replies
RunPod
Created by ART01 on 1/10/2024 in #⛅|pods
Multi-node training with multiple pods sharing same region.
global networking is planned to launch sometime in early dec; multi-node networking is likely Q1
26 replies
RunPod
Created by ToonyGen on 10/23/2024 in #⚡|serverless
How to Minimize I/O Waiting Time?
yes, base64 will be faster; the back-and-forth over network storage between 2 endpoints will add more latency
5 replies
RunPod
Created by zfmoodydub on 10/23/2024 in #⚡|serverless
Multiple endpoints within one handler
also, you can override the start command: create 2 templates using the same container image and override the start command in each to call a different .py script
7 replies
RunPod
Created by ToonyGen on 10/23/2024 in #⚡|serverless
How to Minimize I/O Waiting Time?
another way is to use network volumes + a cpu endpoint + a gpu endpoint: have the cpu endpoint download and upload while the gpu workers handle the actual work, and use network volumes to share data between the cpu and gpu endpoints
5 replies
RunPod
Created by ToonyGen on 10/23/2024 in #⚡|serverless
How to Minimize I/O Waiting Time?
pass base64 as the input of the job and return base64 as the output of the job, to avoid downloading and uploading within the worker
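A handler sketch of that pattern; the data_b64 key name is an assumption, and the byte-reversal is a stand-in for the real work:

```python
import base64

def handler(job):
    """Decode base64 input, process in memory, return base64 output."""
    raw = base64.b64decode(job["input"]["data_b64"])
    processed = raw[::-1]  # stand-in for the actual GPU work
    return {"data_b64": base64.b64encode(processed).decode()}

# On the worker:
# runpod.serverless.start({"handler": handler})
```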
5 replies
RunPod
Created by zfmoodydub on 10/23/2024 in #⚡|serverless
Multiple endpoints within one handler
e.g.
if x == "A":
    runpod.serverless.start({"handler": handlerA})
elif x == "B":
    runpod.serverless.start({"handler": handlerB})
7 replies
RunPod
Created by zfmoodydub on 10/23/2024 in #⚡|serverless
Multiple endpoints within one handler
you can use an environment variable to select which handler is passed to the runpod.serverless.start({"handler": handler})
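A sketch of the env-var approach; HANDLER_NAME and the two handlers are illustrative names set per template, not a documented convention:

```python
import os

def handlerA(job):
    return {"from": "A", "echo": job["input"]}

def handlerB(job):
    return {"from": "B", "echo": job["input"]}

HANDLERS = {"A": handlerA, "B": handlerB}

def select_handler():
    """Pick the handler from the HANDLER_NAME env var set on the template."""
    return HANDLERS[os.environ.get("HANDLER_NAME", "A")]

# On the worker this becomes:
# import runpod
# runpod.serverless.start({"handler": select_handler()})
```

Each of the 2 templates then sets a different HANDLER_NAME on the same container image.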
7 replies
RunPod
Created by SyedAliii on 9/20/2024 in #⚡|serverless
Issue with Multiple instances of ComfyUI running simultaneously on Serverless
not yet
80 replies
RunPod
Created by Keffisor21 on 10/3/2024 in #⚡|serverless
Job timeout constantly (bug?)
we found the bug, fix in progress
23 replies
RunPod
Created by Keffisor21 on 10/3/2024 in #⚡|serverless
Job timeout constantly (bug?)
that means the job is getting lost; the worker picks up the job but then stops reporting status on the job it's working on. can you make sure you're using the latest sdk?
23 replies
RunPod
Created by yasyf on 9/26/2024 in #⚡|serverless
524 Timeouts when waiting for new serverless messages
you can reduce the concurrency; what's the value set for that?
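If concurrency is set through the SDK, it can be capped with a concurrency_modifier callable; this is a sketch assuming a recent runpod Python SDK, and the cap value is illustrative:

```python
MAX_CONCURRENCY = 4  # illustrative cap; tune per workload

def concurrency_modifier(current_concurrency):
    """Called by the SDK to decide how many jobs a worker runs at once."""
    return min(current_concurrency + 1, MAX_CONCURRENCY)

async def handler(job):
    return {"ok": True}

# On the worker:
# runpod.serverless.start({
#     "handler": handler,
#     "concurrency_modifier": concurrency_modifier,
# })
```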
11 replies
RunPod
Created by yasyf on 9/26/2024 in #⚡|serverless
524 Timeouts when waiting for new serverless messages
are you using llms? we have new sdk releases planned that reduce the amount of traffic from workers and cut down on 524s from cloudflare
11 replies