AC_pill
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
Interesting, I'll post that, but leave this open so other users can see it. I'm already seeing a lot of complaints about the same issue, so it's getting hard to push to production.
Yes, a software cap on the Docker host.
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
And it's clear now that it's a hardware issue.
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
Yeah, the problem is on both Serverless and PODs; I'm stress testing.
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
This is extremely important to share on a board, so we can see whether the problem repeats.
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
Is there no support here?
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
I was using Global before and the problem was worse; now GPUs in the same region are showing discrepancies as well. There is no uniformity in inference power.
Maybe a cap?
16 replies
RunPod
•Created by AC_pill on 5/15/2024 in #⚡|serverless
Model load time affected if PODs are running on the same server
Do we have any answers here?
16 replies
RunPod
•Created by AC_pill on 4/30/2024 in #⚡|serverless
Idle timeout not working
Ha, perfect, back to normal, thanks @flash-singh
9 replies
RunPod
•Created by AC_pill on 4/30/2024 in #⚡|serverless
Idle timeout not working
OK, so if I add a queue delay, will that make the endpoint stay up for the whole period?
9 replies
RunPod
•Created by AC_pill on 4/30/2024 in #⚡|serverless
Idle timeout not working
The task takes 40 seconds; instead of staying up for 180 seconds, the worker goes down after 40s (see the sketch after this entry).
9 replies
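For context on this thread: idle timeout and queue delay are scale settings on the serverless endpoint itself, not something the worker code controls. A minimal worker sketch, assuming the `runpod` Python SDK; the 40-second sleep is a stand-in for the task discussed above, not code from the thread:

```python
# Minimal RunPod serverless worker (assumes the `runpod` Python SDK).
# Idle timeout / queue delay are endpoint-level scale settings; nothing in
# this handler keeps the worker warm after it returns.
import time

import runpod


def handler(job):
    # job["input"] holds whatever the client submitted; simulate a ~40 s task
    # like the one discussed above, then echo the payload back.
    payload = job["input"]
    time.sleep(40)  # stand-in for real inference work
    return {"echo": payload}


# Blocks and pulls jobs from the endpoint's queue; RunPod decides when to
# spin the worker down based on the endpoint's idle timeout setting.
runpod.serverless.start({"handler": handler})
```

How long the worker stays up after `handler` returns is governed entirely by that endpoint idle timeout, which is what this thread is debugging.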
RunPod
•Created by AC_pill on 2/23/2024 in #⚡|serverless
Idle time: High Idle time on server but not getting tasks from queue
And yes, a lot of throttled workers pushes the cold-start servers into an unreliable state.
13 replies
RunPod
•Created by AC_pill on 2/23/2024 in #⚡|serverless
Idle time: High Idle time on server but not getting tasks from queue
Scaling back to 0 and max solved it, thanks @justin [Not Staff]
13 replies
RunPod
•Created by AC_pill on 2/23/2024 in #⚡|serverless
Idle time: High Idle time on server but not getting tasks from queue
yeah, I'll do that, thanks
13 replies
RunPod
•Created by AC_pill on 2/23/2024 in #⚡|serverless
Idle time: High Idle time on server but not getting tasks from queue
My delay time is 60 seconds, but as you can see, on each execution the request execution time is 120 seconds even though the task itself only takes 15s, and the server is still hanging on the same task.
13 replies
RunPod
•Created by AC_pill on 2/22/2024 in #⚡|serverless
Is there a programmatic way to activate servers on high demand / peak hours load?
@justin [Not Staff] Yep, not yet, memory leaks using ONNX models with concurrency (see the sketch after this entry):
[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'Conv_455' Status Message: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:121 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:114 std::conditional_t<THRW, void, onnxruntime::common::Status> onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*, const char*, int) [with ERRTYPE = cudaError; bool THRW = true; std::conditional_t<THRW, void, onnxruntime::common::Status> = void] CUDA failure 2: out of memory ; GPU=0 ; hostname=c67b8afabaf8 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_allocator.cc ; line=47 ; expr=cudaMalloc((void**)&p, size);
And memory is only 70% full with 3 instances.
In case you are using it too.
42 replies
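The "CUDA failure 2: out of memory" above is cudaMalloc failing while several concurrent ONNX Runtime sessions share one GPU. A hedged mitigation sketch, assuming onnxruntime-gpu; the model path and the 8 GiB per-session cap are placeholders, not values from the thread:

```python
# Sketch: bound each ONNX Runtime session's CUDA arena so N concurrent
# instances can't collectively exhaust the GPU (one possible mitigation for
# the OOM above; not necessarily the fix used in this thread).
import onnxruntime as ort

MODEL_PATH = "model.onnx"        # placeholder path
GPU_MEM_LIMIT = 8 * 1024 ** 3    # placeholder: 8 GiB per session

cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": GPU_MEM_LIMIT,               # cap the CUDA memory arena
    "arena_extend_strategy": "kSameAsRequested",  # grow by what's requested, not 2x
}

# Create the session once per worker and reuse it across requests; creating a
# new session per request is a common source of apparent leaks under concurrency.
session = ort.InferenceSession(
    MODEL_PATH,
    providers=[("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"],
)
```

A large cudaMalloc can fail even when overall GPU memory looks only ~70% used, e.g. due to fragmentation or arena growth, which is why capping and reusing sessions tends to help in this situation.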
RunPod
•Created by AC_pill on 2/22/2024 in #⚡|serverless
Is there a programmatic way to activate servers on high demand / peak hours load?
If it works, it could be a good case for RunPod.
42 replies
RunPod
•Created by AC_pill on 2/22/2024 in #⚡|serverless
Is there a programmatic way to activate servers on high demand / peak hours load?
I'll post news on how it's moving along (see the sketch after this entry).
42 replies
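On the thread's question itself, worker counts can be changed over RunPod's GraphQL API, so a cron job could raise them ahead of known peak hours. A rough sketch under stated assumptions: the saveEndpoint mutation, the EndpointInput type, and the workersMin/workersMax field names are recalled from the public API and should be verified against current docs; the endpoint ID and API key are placeholders:

```python
# Sketch: raise min/max workers before a known peak (e.g. from a cron job).
# Assumes RunPod's GraphQL API with a `saveEndpoint` mutation taking
# `workersMin`/`workersMax` -- verify the schema against current docs.
import os

import requests

API_URL = "https://api.runpod.io/graphql"
API_KEY = os.environ["RUNPOD_API_KEY"]   # your RunPod API key
ENDPOINT_ID = "YOUR_ENDPOINT_ID"         # placeholder endpoint ID

MUTATION = """
mutation($input: EndpointInput!) {
  saveEndpoint(input: $input) { id workersMin workersMax }
}
"""


def scale(workers_min: int, workers_max: int) -> None:
    # Send the mutation; the api_key query parameter is how RunPod's GraphQL
    # examples authenticate.
    resp = requests.post(
        API_URL,
        params={"api_key": API_KEY},
        json={"query": MUTATION, "variables": {"input": {
            "id": ENDPOINT_ID,
            "workersMin": workers_min,
            "workersMax": workers_max,
        }}},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())


# Example: keep 2 warm workers during peak hours, allow bursting to 10.
scale(workers_min=2, workers_max=10)
```

Setting workersMin above 0 keeps that many workers always on (and billed while idle), which is the trade-off behind "activating servers" for peak hours.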