!HeartPattern
RunPod
Created by !HeartPattern on 4/4/2024 in #⛅|pods
Connections abort unexpectedly
We are running a gRPC server inside RunPod pods, and 1-2% of requests abort unexpectedly. Our API's logs complain that the downstream disconnected, and I suspect the RunPod NAT aborts connections in certain situations. Is there a connection timeout or some other policy for TCP connections, or is this just instability in the RunPod infrastructure?
2 replies
RunPod
Created by !HeartPattern on 2/19/2024 in #⛅|pods
Cannot create pods even though there are available GPUs
When I run the following query,
query {
  gpuTypes(input: {id: "NVIDIA L4"}) {
    id
    lowestPrice(input: {
      gpuCount: 1
      secureCloud: true
    }) {
      gpuName
      rentedCount
      totalCount
      rentalPercentage
    }
  }
}
it returns the following result, which suggests that some L4 GPUs are available.
{
  "data": {
    "gpuTypes": [
      {
        "id": "NVIDIA L4",
        "lowestPrice": {
          "gpuName": "L4",
          "rentedCount": 105,
          "totalCount": 136,
          "rentalPercentage": 0.7721
        }
      }
    ]
  }
}
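As a quick check on the numbers in that response, here is a minimal Python sketch that parses the JSON above and derives how many L4s are not currently rented. The function name `summarize_availability` is made up for illustration. It is also only an assumption that `totalCount - rentedCount` means "free to rent"; nothing in this thread confirms how RunPod aggregates these counts or whether any single machine also satisfies the disk, vCPU and memory constraints of a deploy request.

```python
import json

# Response body copied from the post above.
RESPONSE = """
{
  "data": {
    "gpuTypes": [
      {
        "id": "NVIDIA L4",
        "lowestPrice": {
          "gpuName": "L4",
          "rentedCount": 105,
          "totalCount": 136,
          "rentalPercentage": 0.7721
        }
      }
    ]
  }
}
"""


def summarize_availability(payload):
    """Return (gpu id, unrented count, rented fraction) for each GPU type.

    Assumption: unrented = totalCount - rentedCount. The thread does not
    confirm that this equals the number of GPUs that can be deployed.
    """
    rows = []
    for gpu in payload["data"]["gpuTypes"]:
        stats = gpu["lowestPrice"]
        unrented = stats["totalCount"] - stats["rentedCount"]
        rented_fraction = stats["rentedCount"] / stats["totalCount"]
        rows.append((gpu["id"], unrented, round(rented_fraction, 4)))
    return rows


summary = summarize_availability(json.loads(RESPONSE))
print(summary)
```

With the response above this reports 31 unrented L4s, and the recomputed fraction matches the `rentalPercentage` of 0.7721 returned by the API.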
However, if I try to create an L4 pod with the following mutation,
mutation {
  podFindAndDeployOnDemand(
    input: {
      cloudType: SECURE
      gpuCount: 1
      volumeInGb: 40
      containerDiskInGb: 40
      minVcpuCount: 2
      minMemoryInGb: 5
      gpuTypeId: "NVIDIA L4"
      name: "Test"
      imageName: "runpod/tensorflow"
      dockerArgs: ""
      ports: "8888/http"
      volumeMountPath: "/workspace"
    }
  ) {
    id
  }
}
it returns the following error:
{
  "errors": [
    {
      "message": "There are no longer any instances available with the requested specifications. Please refresh and try again.",
      "path": [
        "podFindAndDeployOnDemand"
      ],
      "extensions": {
        "code": "RUNPOD"
      }
    }
  ],
  "data": {
    "podFindAndDeployOnDemand": null
  }
}
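For anyone automating this, one way to handle the error is to detect it in the response and retry after a pause. The sketch below is only an illustration under stated assumptions: `is_no_capacity_error` and `deploy_with_retry` are hypothetical helper names, `deploy` stands for whatever function of yours sends the mutation and returns the parsed JSON response, and the match on the message text relies on the wording shown above staying the same. Retrying may not help if no machine can ever satisfy the requested specifications.

```python
import time

# Start of the error message shown in the response above.
NO_CAPACITY_PREFIX = "There are no longer any instances available"


def is_no_capacity_error(response):
    """True if the GraphQL response carries the 'no instances available' error."""
    for err in response.get("errors") or []:
        path = err.get("path") or []
        message = err.get("message", "")
        if "podFindAndDeployOnDemand" in path and message.startswith(NO_CAPACITY_PREFIX):
            return True
    return False


def deploy_with_retry(deploy, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call deploy() until it stops reporting no capacity or attempts run out.

    `deploy` is a hypothetical caller-supplied function that sends the
    mutation and returns the decoded JSON response as a dict.
    Returns the last response received, whether or not it succeeded.
    """
    response = None
    for attempt in range(attempts):
        response = deploy()
        if not is_no_capacity_error(response):
            return response
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before asking again
    return response
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` is used.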
1 reply
RunPod
Created by !HeartPattern on 2/1/2024 in #⛅|pods
Managing savings plans using the GraphQL API
Hi RunPod team! I'm currently using multiple GPU pods in Secure Cloud and planning to use a savings plan. We automate pod provisioning through the GraphQL API and are trying to integrate savings plans into this system. We frequently update the model server running on RunPod, and our system handles each upgrade by deleting the pod and creating a new one with new Docker parameters. So my question is: if a pod with a savings plan is deleted, is that plan applied to another pod? For example, if I have pod A with a savings plan, then delete A and create a new pod B, does the savings plan automatically attach to B, or is there a GraphQL API call that attaches an existing savings plan to a new or existing pod? Also, if I have two pods, A with a savings plan and B without one, and I delete pod A, does the savings plan automatically transfer to pod B?
8 replies