flash-singh
RunPod
•Created by zaid on 3/18/2025 in #⚡|serverless
Is it possible to respond with Transfer-Encoding: Chunked?
PM me the endpoint ID; here is a vLLM example you can look at: https://github.com/pandyamarut/vllm-load-balancer
5 replies
RunPod
•Created by zaid on 3/18/2025 in #⚡|serverless
Is it possible to respond with Transfer-Encoding: Chunked?
We have a new type of serverless coming out in a month that will let you bring any HTTP server, so yes, then you can stream HTTP responses however you like from an HTTP server running in your workers. Let me know if you want to give this a try; it's currently being tested in production.
5 replies
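The replies above are about streaming a chunked HTTP response from a plain HTTP server running inside a worker. A minimal sketch of such a server, using only the standard library (nothing here is RunPod-specific; the port and payload are arbitrary):

```python
# Minimal chunked-streaming HTTP server sketch; not RunPod-specific.
from http.server import BaseHTTPRequestHandler, HTTPServer

def encode_chunk(piece: bytes) -> bytes:
    """Frame one piece as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(piece), piece)

class ChunkedHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer coding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for piece in (b"hello ", b"streaming ", b"world\n"):
            self.wfile.write(encode_chunk(piece))
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk terminates the stream

# To serve: HTTPServer(("0.0.0.0", 8080), ChunkedHandler).serve_forever()
```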
RunPod
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
We are going live with priority FlashBoot later this month, but it's meant to be a hidden feature we enable for endpoints with high volume; we'll learn from that and eventually figure out how we can price it.
63 replies
RunPod
•Created by Xqua on 3/11/2025 in #⚡|serverless
Do you cache docker layers to avoid repulling?
- all data centers support local layer caching
- some data centers with network storage also offer caching in network storage for serverless
- the local cache is always used if the changes you make to an endpoint don't alter the GPU, CUDA version, or other attributes that require moving to a different server; otherwise we keep the worker and update it in place
9 replies
RunPod
•Created by koop7450 on 3/9/2025 in #⚡|serverless
Use SDK to create Network Storage Volumes for Serverless Endpoints
You can use the new API: https://rest.runpod.io
8 replies
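A hedged sketch of calling that REST API to create a network volume. The exact route (`/v1/networkvolumes`) and body fields (`name`, `size`, `dataCenterId`) are assumptions here; confirm them against the API reference before use.

```python
# Sketch: build a network-volume creation request for the REST API at
# https://rest.runpod.io. Route and field names are assumptions -- verify
# against the API docs.
import json
import urllib.request

API_BASE = "https://rest.runpod.io/v1"

def make_volume_request(api_key: str, name: str, size_gb: int, data_center: str):
    payload = {"name": name, "size": size_gb, "dataCenterId": data_center}
    return urllib.request.Request(
        f"{API_BASE}/networkvolumes",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually create the volume (requires a valid API key):
# with urllib.request.urlopen(make_volume_request(KEY, "models", 50, "EU-RO-1")) as resp:
#     print(json.load(resp))
```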
RunPod
•Created by hakankaan on 3/5/2025 in #⚡|serverless
Can't get Warm/Cold status
No, it's similar, but the scale needed for request count is easily determined, so we can scale up faster if it falls behind, since it's simple math with the count.
11 replies
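The "simple math with the count" mentioned above might look like the following sketch; the function and its parameters are illustrative, not RunPod's actual scaler.

```python
# Illustrative only: the kind of "simple math" request-count scaling implies.
# Queue-delay scaling reacts to how long requests wait; this reacts to counts.
import math

def desired_workers(queued: int, requests_per_worker: int, max_workers: int) -> int:
    """Target worker count scales directly with the number of queued requests."""
    return min(max_workers, math.ceil(queued / requests_per_worker))
```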
RunPod
•Created by jim on 1/15/2025 in #⚡|serverless
Serverless H200?
We recently lowered them; I'll get that fixed.
11 replies
RunPod
•Created by jim on 1/15/2025 in #⚡|serverless
Serverless H200?
$0.00155/s or $5.58/hr
11 replies
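The quoted rate checks out: the per-second price times 3600 seconds gives the hourly price.

```python
# Sanity-check the quoted H200 pricing: $/s * 3600 s/hr = $/hr.
def hourly_rate(per_second: float) -> float:
    return round(per_second * 3600, 2)
```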
RunPod
•Created by hakankaan on 3/5/2025 in #⚡|serverless
Can't get Warm/Cold status
Yes, prioritized but not guaranteed. Are you using queue delay scaling?
11 replies
RunPod
•Created by Lattus on 1/22/2025 in #⚡|serverless
Serverless deepseek-ai/DeepSeek-R1 setup?
You can stream from serverless without worrying about request times; look into the streaming section. Also, the serverless max timeout is 5 minutes; the proxy's is about 90s.
108 replies
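A hedged sketch of the streaming approach suggested above: poll the serverless job's `/stream` route for partial outputs instead of holding one long synchronous request open past the proxy's ~90s limit. The URL shape (`api.runpod.ai/v2/{endpoint}/stream/{job}`) follows the public serverless API docs; verify it for your endpoint.

```python
# Sketch: poll a serverless job's /stream route and yield partial outputs.
# URL shape and status names are taken from the public docs; confirm before
# relying on them.
import json
import time
import urllib.request

TERMINAL = ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT")

def parse_stream_response(body: dict):
    """Split one poll response into (partial outputs, finished?)."""
    chunks = [c.get("output") for c in body.get("stream", [])]
    return chunks, body.get("status") in TERMINAL

def stream_job(endpoint_id: str, job_id: str, api_key: str):
    url = f"https://api.runpod.ai/v2/{endpoint_id}/stream/{job_id}"
    while True:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            chunks, done = parse_stream_response(json.load(resp))
        yield from chunks
        if done:
            return
        time.sleep(1)  # back off between polls
```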
RunPod
•Created by hakankaan on 3/5/2025 in #⚡|serverless
Can't get Warm/Cold status
As for this request, we have it in the backlog to let you get the warm state along with running, idle, etc.
11 replies
RunPod
•Created by hakankaan on 3/5/2025 in #⚡|serverless
Can't get Warm/Cold status
We do prioritize warm/FlashBooted workers over cold ones.
11 replies
RunPod
•Created by drycoco on 3/4/2025 in #⚡|serverless
Serverless git integration rollback
@drycoco The #1 reason we implemented GitHub integration was to target production use cases.
For a production use case I would organize the GitHub repo with dev and prod branches, then make an endpoint for each. I would test the dev branch and its releases, and at some point merge dev into prod once the dev endpoint has been validated. Currently this should get you closer to a dev → production pipeline.
In the future we are planning to introduce the following:
- release rollback
- cancelling builds (if you can't already)
- test cases with builds (builds will fail if test cases fail)
10 replies
RunPod
•Created by sahir on 2/25/2025 in #⚡|serverless
queue delay times
That's about right; it depends on workload and capacity. For H100s that's really good if your p50 is hitting FlashBoot.
27 replies
RunPod
•Created by sahir on 2/25/2025 in #⚡|serverless
queue delay times
That's just FlashBoot; anything over 10s should be your ideal cold start. Is your model baked into the container image?
27 replies
RunPod
•Created by sahir on 2/25/2025 in #⚡|serverless
queue delay times
Your cold starts are high; are you loading the model from a network volume, or is the model just too big?
27 replies
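To illustrate the "baked into the image" question above: weights copied into the container at build time load from local disk on cold start, while a network volume adds network reads to every cold start. A hypothetical path-selection helper (both paths are made up for the example):

```python
import os

# Hypothetical paths: BAKED is written into the image at build time,
# NETWORK is a mounted network volume (slower to read on cold start).
BAKED = "/models/my-model"
NETWORK = "/runpod-volume/my-model"

def resolve_model_path(baked: str = BAKED, network: str = NETWORK,
                       exists=os.path.isdir) -> str:
    """Prefer weights baked into the container image over a network volume."""
    return baked if exists(baked) else network
```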