flash-singh
RunPod
•Created by zaid on 3/18/2025 in #⚡|serverless
Is it possible to respond with Transfer-Encoding: Chunked?
PM me the endpoint ID; here is a vLLM example you can look at: https://github.com/pandyamarut/vllm-load-balancer
5 replies
RunPod
•Created by zaid on 3/18/2025 in #⚡|serverless
Is it possible to respond with Transfer-Encoding: Chunked?
We have a new type of serverless coming out in a month that will let you bring any HTTP server, so yes, then you can stream HTTP responses however you like from an HTTP server running in your workers. Let me know if you want to give this a try; it's currently being tested in production.
5 replies
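The replies above are about streaming a chunked HTTP response from a plain HTTP server running inside a worker. A minimal sketch of such a server, using only the standard library (nothing here is RunPod-specific; the port and payload are arbitrary):

```python
# Minimal chunked-streaming HTTP server sketch; not RunPod-specific.
from http.server import BaseHTTPRequestHandler, HTTPServer

def encode_chunk(piece: bytes) -> bytes:
    """Frame one piece as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(piece), piece)

class ChunkedHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer coding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for piece in (b"hello ", b"streaming ", b"world\n"):
            self.wfile.write(encode_chunk(piece))
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk terminates the stream

# To serve: HTTPServer(("0.0.0.0", 8080), ChunkedHandler).serve_forever()
```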
RunPod
•Created by testymctestface on 12/27/2024 in #⚡|serverless
Running worker automatically once docker image has been pulled
We are going live with priority FlashBoot later this month, but it's meant to be a hidden feature we enable for endpoints with high volume; we'll learn from that and eventually figure out how we can price it.
63 replies
RunPod
•Created by Xqua on 3/11/2025 in #⚡|serverless
Do you cache docker layers to avoid repulling?
- all data centers support local layer caching
- some data centers with network storage also offer caching in network storage for serverless
- the local cache is always used if the changes you make to an endpoint don't alter the GPU, CUDA version, or other attributes that require moving to a different server; otherwise we keep the worker and update it in place
9 replies
RunPod
•Created by koop7450 on 3/9/2025 in #⚡|serverless
Use SDK to create Network Storage Volumes for Serverless Endpoints
You can use the new API: https://rest.runpod.io
8 replies
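A hedged sketch of calling that REST API to create a network volume. The exact route (`/v1/networkvolumes`) and body fields (`name`, `size`, `dataCenterId`) are assumptions here; confirm them against the API reference before use.

```python
# Sketch: build a network-volume creation request for the REST API at
# https://rest.runpod.io. Route and field names are assumptions -- verify
# against the API docs.
import json
import urllib.request

API_BASE = "https://rest.runpod.io/v1"

def make_volume_request(api_key: str, name: str, size_gb: int, data_center: str):
    payload = {"name": name, "size": size_gb, "dataCenterId": data_center}
    return urllib.request.Request(
        f"{API_BASE}/networkvolumes",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually create the volume (requires a valid API key):
# with urllib.request.urlopen(make_volume_request(KEY, "models", 50, "EU-RO-1")) as resp:
#     print(json.load(resp))
```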
RunPod
•Created by hakankaan on 3/5/2025 in #⚡|serverless
Can't get Warm/Cold status
No, it's similar, but the scale needed for request count is easily determined, so we can scale up faster if it falls behind, since it's simple math with the count.
11 replies
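The "simple math with the count" mentioned above might look like the following sketch; the function and its parameters are illustrative, not RunPod's actual scaler.

```python
# Illustrative only: the kind of "simple math" request-count scaling implies.
# Queue-delay scaling reacts to how long requests wait; this reacts to counts.
import math

def desired_workers(queued: int, requests_per_worker: int, max_workers: int) -> int:
    """Target worker count scales directly with the number of queued requests."""
    return min(max_workers, math.ceil(queued / requests_per_worker))
```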
RunPod
•Created by jim on 1/15/2025 in #⚡|serverless
Serverless H200?
We recently lowered them; I'll get that fixed.
11 replies
RunPod
•Created by jim on 1/15/2025 in #⚡|serverless
Serverless H200?
$0.00155/s or $5.58/hr
11 replies
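The quoted rate checks out: the per-second price times 3600 seconds gives the hourly price.

```python
# Sanity-check the quoted H200 pricing: $/s * 3600 s/hr = $/hr.
def hourly_rate(per_second: float) -> float:
    return round(per_second * 3600, 2)
```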
RunPod
•Created by hakankaan on 3/5/2025 in #⚡|serverless
Can't get Warm/Cold status
Yes, prioritized but not guaranteed. Are you using queue delay scaling?
11 replies
RunPod
•Created by Lattus on 1/22/2025 in #⚡|serverless
Serverless deepseek-ai/DeepSeek-R1 setup?
You can stream from serverless without worrying about request times; look into the streaming section. Also, the serverless max timeout is 5 minutes; the proxy's is about 90s.
108 replies
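A hedged sketch of the streaming approach suggested above: poll the serverless job's `/stream` route for partial outputs instead of holding one long synchronous request open past the proxy's ~90s limit. The URL shape (`api.runpod.ai/v2/{endpoint}/stream/{job}`) follows the public serverless API docs; verify it for your endpoint.

```python
# Sketch: poll a serverless job's /stream route and yield partial outputs.
# URL shape and status names are taken from the public docs; confirm before
# relying on them.
import json
import time
import urllib.request

TERMINAL = ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT")

def parse_stream_response(body: dict):
    """Split one poll response into (partial outputs, finished?)."""
    chunks = [c.get("output") for c in body.get("stream", [])]
    return chunks, body.get("status") in TERMINAL

def stream_job(endpoint_id: str, job_id: str, api_key: str):
    url = f"https://api.runpod.ai/v2/{endpoint_id}/stream/{job_id}"
    while True:
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        with urllib.request.urlopen(req) as resp:
            chunks, done = parse_stream_response(json.load(resp))
        yield from chunks
        if done:
            return
        time.sleep(1)  # back off between polls
```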
RunPod
•Created by hakankaan on 3/5/2025 in #⚡|serverless
Can't get Warm/Cold status
As for this request, we have it in the backlog to let you get the warm state along with running, idle, etc.
11 replies
RunPod
•Created by hakankaan on 3/5/2025 in #⚡|serverless
Can't get Warm/Cold status
We do prioritize warm/FlashBooted workers over cold ones.
11 replies
RunPod
•Created by drycoco on 3/4/2025 in #⚡|serverless
Serverless git integration rollback
@drycoco The #1 reason we implemented GitHub integration was to target production use cases.
For a production use case I would organize the GitHub repo with dev and prod branches, then make an endpoint for each. I would test the dev branch and its releases, and at some point merge dev into prod once the dev endpoint has been validated. Currently this should get you closer to a dev → production pipeline.
In the future we are planning to introduce the following:
- release rollback
- cancelling builds (if you can't already)
- test cases with builds (builds will fail if test cases fail)
10 replies
RunPod
•Created by sahir on 2/25/2025 in #⚡|serverless
queue delay times
That's about right; it depends on workload and capacity. For H100s that's really good if your p50 is hitting FlashBoot.
27 replies
RunPod
•Created by sahir on 2/25/2025 in #⚡|serverless
queue delay times
That's just FlashBoot; anything over 10s should be your ideal cold start. Is your model baked into the container image?
27 replies
RunPod
•Created by sahir on 2/25/2025 in #⚡|serverless
queue delay times
Your cold starts are high; are you loading the model from a network volume, or is the model just too big?
27 replies
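To illustrate the "baked into the image" question above: weights copied into the container at build time load from local disk on cold start, while a network volume adds network reads to every cold start. A hypothetical path-selection helper (both paths are made up for the example):

```python
import os

# Hypothetical paths: BAKED is written into the image at build time,
# NETWORK is a mounted network volume (slower to read on cold start).
BAKED = "/models/my-model"
NETWORK = "/runpod-volume/my-model"

def resolve_model_path(baked: str = BAKED, network: str = NETWORK,
                       exists=os.path.isdir) -> str:
    """Prefer weights baked into the container image over a network volume."""
    return baked if exists(baked) else network
```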