AC_pill
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
Interesting, I'll post that, but leave this open so other users can see it. I'm already seeing a lot of complaints about the same issue, so it's getting hard to push to production.
Yes, a software cap on the Docker host.
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
And it's clear now that it's a hardware issue.
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
Yeah, the problem is on both Serverless and PODs; I'm stress testing.
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
This is extremely important to share on a board, so we can see whether the problem repeats.
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
Is there no support here?
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
I was using Global before and the problem was worse; now GPUs in the same region are showing discrepancies as well. There is no uniformity in inference power.
Maybe a cap?
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
Do we have any answers here?
16 replies
RunPod
•Created by AC_pill on 4/30/2024 in #⚡|serverless
Idle timeout not working
Ha, perfect, back to normal, thanks @flash-singh
9 replies
RunPod
•Created by AC_pill on 4/30/2024 in #⚡|serverless
Idle timeout not working
OK, so if I add a queue delay, will that make the endpoint stay up for the whole period?
9 replies
RunPod
•Created by AC_pill on 4/30/2024 in #⚡|serverless
Idle timeout not working
The task takes 40 seconds; instead of staying up for 180 seconds, the worker goes down after 40s (see the sketch after this entry).
9 replies
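For context on this thread: idle timeout and queue delay are scale settings on the serverless endpoint itself, not something the worker code controls. A minimal worker sketch, assuming the `runpod` Python SDK; the 40-second sleep is a stand-in for the task discussed above, not code from the thread:

```python
# Minimal RunPod serverless worker (assumes the `runpod` Python SDK).
# Idle timeout / queue delay are endpoint-level scale settings; nothing in
# this handler keeps the worker warm after it returns.
import time

import runpod


def handler(job):
    # job["input"] holds whatever the client submitted; simulate a ~40 s task
    # like the one discussed above, then echo the payload back.
    payload = job["input"]
    time.sleep(40)  # stand-in for real inference work
    return {"echo": payload}


# Blocks and pulls jobs from the endpoint's queue; RunPod decides when to
# spin the worker down based on the endpoint's idle timeout setting.
runpod.serverless.start({"handler": handler})
```

How long the worker stays up after `handler` returns is governed entirely by that endpoint idle timeout, which is what this thread is debugging.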
RunPod
•Created by AC_pill on 2/23/2024 in #⚡|serverless
Idle time: High Idle time on server but not getting tasks from queue
And yes, a lot of throttled workers pushes the cold-start servers into an unreliable state.
13 replies
RunPod
•Created by AC_pill on 2/23/2024 in #⚡|serverless
Idle time: High Idle time on server but not getting tasks from queue
Scaling back to 0 and max solved it, thanks @justin [Not Staff]
13 replies
RunPod
•Created by AC_pill on 2/23/2024 in #⚡|serverless
Idle time: High Idle time on server but not getting tasks from queue
yeah, I'll do that, thanks
13 replies
RunPod
•Created by AC_pill on 2/23/2024 in #⚡|serverless
Idle time: High Idle time on server but not getting tasks from queue
My delay time is 60 seconds, but as you can see, on each execution the request execution time is 120 seconds even though the task itself only takes 15s, and the server is still hanging on the same task.
13 replies
RunPod
•Created by AC_pill on 2/22/2024 in #⚡|serverless
Is there a programmatic way to activate servers on high demand / peak hours load?
@justin [Not Staff] Yep, not yet, memory leaks using ONNX models with concurrency (see the sketch after this entry):
[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'Conv_455' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=c67b8afabaf8 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);
And memory is only 70% full with 3 instances.
In case you are using it too.
42 replies
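The "CUDA failure 2: out of memory" above is cudaMalloc failing while several concurrent ONNX Runtime sessions share one GPU. A hedged mitigation sketch, assuming onnxruntime-gpu; the model path and the 8 GiB per-session cap are placeholders, not values from the thread:

```python
# Sketch: bound each ONNX Runtime session's CUDA arena so N concurrent
# instances can't collectively exhaust the GPU (one possible mitigation for
# the OOM above; not necessarily the fix used in this thread).
import onnxruntime as ort

MODEL_PATH = "model.onnx"        # placeholder path
GPU_MEM_LIMIT = 8 * 1024 ** 3    # placeholder: 8 GiB per session

cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": GPU_MEM_LIMIT,               # cap the CUDA memory arena
    "arena_extend_strategy": "kSameAsRequested",  # grow by what's requested, not 2x
}

# Create the session once per worker and reuse it across requests; creating a
# new session per request is a common source of apparent leaks under concurrency.
session = ort.InferenceSession(
    MODEL_PATH,
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```

A large cudaMalloc can fail even when overall GPU memory looks only ~70% used, e.g. due to fragmentation or arena growth, which is why capping and reusing sessions tends to help in this situation.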
RunPod
•Created by AC_pill on 2/22/2024 in #⚡|serverless
Is there a programmatic way to activate servers on high demand / peak hours load?
If it works, it could be a good case for RunPod.
42 replies
RunPod
•Created by AC_pill on 2/22/2024 in #⚡|serverless
Is there a programmatic way to activate servers on high demand / peak hours load?
I'll post news on how it's moving along (see the sketch after this entry).
42 replies
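On the thread's question itself, worker counts can be changed over RunPod's GraphQL API, so a cron job could raise them ahead of known peak hours. A rough sketch under stated assumptions: the saveEndpoint mutation, the EndpointInput type, and the workersMin/workersMax field names are recalled from the public API and should be verified against current docs; the endpoint ID and API key are placeholders:

```python
# Sketch: raise min/max workers before a known peak (e.g. from a cron job).
# Assumes RunPod's GraphQL API with a `saveEndpoint` mutation taking
# `workersMin`/`workersMax` -- verify the schema against current docs.
import os

import requests

API_URL = "https://api.runpod.io/graphql"
API_KEY = os.environ["RUNPOD_API_KEY"]   # your RunPod API key
ENDPOINT_ID = "YOUR_ENDPOINT_ID"         # placeholder endpoint ID

MUTATION = """
mutation($input: EndpointInput!) {
  saveEndpoint(input: $input) { id workersMin workersMax }
}
"""


def scale(workers_min: int, workers_max: int) -> None:
    # Send the mutation; the api_key query parameter is how RunPod's GraphQL
    # examples authenticate.
    resp = requests.post(
        API_URL,
        params={"api_key": API_KEY},
        json={"query": MUTATION, "variables": {"input": {
            "id": ENDPOINT_ID,
            "workersMin": workers_min,
            "workersMax": workers_max,
        }}},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())


# Example: keep 2 warm workers during peak hours, allow bursting to 10.
scale(workers_min=2, workers_max=10)
```

Setting workersMin above 0 keeps that many workers always on (and billed while idle), which is the trade-off behind "activating servers" for peak hours.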