Maximum number of A40s that can run at one time
I'm looking to run as many A40s as possible to finish a large-scale inference/LLM generation job. How many could I run at one time? 40, 80, 100?
In practice, many setups use between 2 and 8 GPUs, but some high-performance computing environments may use even more, depending on the specific needs and configuration of the system.
We split our inference jobs into batches
So we can run on any number of GPUs we like, but we just need someone from RunPod to confirm this is allowed.
it sounds interesting
I am familiar with RunPod
I wanna know how large your inference job is
200 million rows
can run maybe... 120,000 rows per hour
per instance
i think so
but i need to check
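rough math with those numbers (both are estimates, so treat this as a sketch of how GPU count maps to wall-clock time):
```python
# Back-of-envelope from the figures quoted above; both numbers are rough estimates.
total_rows = 200_000_000
rows_per_hour_per_gpu = 120_000

for gpus in (40, 80, 100):
    hours = total_rows / (rows_per_hour_per_gpu * gpus)
    print(f"{gpus} GPUs -> ~{hours:.1f} hours")
# 100 GPUs -> ~16.7 hours, so the GPU count directly sets the wall-clock time.
```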
could u gimme a runpod account?
?
i mean a paid account
Just to be clear, you don't work for runpod
and all I need is an answer from runpod that I can run 100x A40s
so I'm not gonna give you access to my runpod account
lol
i can understand u
maybe go to https://contact.runpod.io/hc/en-us/requests/new
And ask your question there 🙂 they would be able to give you official info about that
yup
We have a good amount of A40 GPUs available. In the ticket, let us know when you usually need them, how long you plan to run them, and whether this is just for your current project or a longer-term, ongoing need. This information will help us better plan our capacity.😄
Ah, thanks! We just spun up 100, seems like we may have consumed all of them, lol.
I got paged by the alert policy we set up internally for A40 availability, as our product currently relies on that.
Of course it is your right to spin up as many as you want, but can you kindly let me know if this is going to be a one-off thingie or something you will be running long-term? We've been happily enjoying the high availability of A40s, but there are now only ~27 GPUs left lol
Our job should finish in an hour.
Sorry about that!
Support told me it was okay haha
I’m happy to ping y’all ahead of time
lol yeah for sure it won't be an issue of course, not your fault no need to say sorry!
it's just our thing that we've only added A40s to the autoscaling pool for now, cuz it seemed like there were plenty of A40s a couple days/weeks back.
I think we need to add more GPU types to the pool anyway to handle cases like this.
and 🤞 for your batch job 😉
There’s a bug though!
Back to figuring it out and spinning up tomorrow
I’m sorry, I don’t have specific details about future plans, but I know that we’re continuously working with suppliers to add more based on demand. The more you use a particular one, the more likely we are to expand it.😀
We're just splitting our database into chunks; each worker downloads a chunk, processes it, then uploads when complete
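it's basically this pattern (the helper names here are just placeholders for our own download/model/upload code, nothing RunPod-specific):
```python
# Minimal sketch of the chunked batch pattern described above.
# fetch_chunk / run_inference / upload_results are hypothetical stand-ins.
from typing import List

def fetch_chunk(chunk_id: int) -> List[str]:
    # Placeholder: download one slice of the database (e.g. from object storage).
    return [f"row-{chunk_id}-{i}" for i in range(3)]

def run_inference(rows: List[str]) -> List[str]:
    # Placeholder: run the model over the slice.
    return [row.upper() for row in rows]

def upload_results(chunk_id: int, outputs: List[str]) -> None:
    # Placeholder: upload finished outputs; once uploaded, the chunk is done.
    print(f"chunk {chunk_id}: {outputs}")

def worker(worker_index: int, num_workers: int, total_chunks: int) -> None:
    # Each pod claims every num_workers-th chunk, so any number of GPUs can
    # share the job without coordinating with each other.
    for chunk_id in range(worker_index, total_chunks, num_workers):
        upload_results(chunk_id, run_inference(fetch_chunk(chunk_id)))

if __name__ == "__main__":
    worker(worker_index=0, num_workers=100, total_chunks=1000)
```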
BTW, are you in the same region? It might be worth checking availability in other regions as well.😆
I just spun down our job so released 100 GPUs back
@yhlong00000 is there a way to see the quantity of GPUs, rather than just high/low?
For customers, it’s not available. Let me check if there’s a specific reason why we don’t display it. Will get back to you later.
We're pooling from CA-MTL-1 and EU-SE-1, as they are the only datacenters with both A40s and network volume support.
There is a way to do that if you use GraphQL. The docs state that there are totalCount and rentedCount fields. If you run the query with the datacenter id as a variable, you will be able to see the rented count and total count.
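something along these lines — a sketch only: totalCount/rentedCount are from the docs, but the exact query shape and where the dc filter goes are my guesses, so check the current schema:
```python
# Sketch: ask RunPod's GraphQL API for A40 counts in one datacenter.
# totalCount/rentedCount come from the docs; the nesting under lowestPrice
# and the dataCenterId filter are assumptions about the schema.
import os
import requests

QUERY = """
query GpuAvailability($gpuTypeId: String!, $dataCenterId: String!) {
  gpuTypes(input: { id: $gpuTypeId }) {
    id
    lowestPrice(input: { gpuCount: 1, dataCenterId: $dataCenterId }) {
      totalCount
      rentedCount
    }
  }
}
"""

resp = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": os.environ["RUNPOD_API_KEY"]},  # API key from account settings
    json={
        "query": QUERY,
        "variables": {"gpuTypeId": "NVIDIA A40", "dataCenterId": "EU-SE-1"},
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```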
but seems like the rented count and total count are not strictly from that specific datacenter, but aggregated tho..
Yeah, cool. I can work off that
😂 ok, you guys are smarter than me
haha soulmind has an idea of how many GPUs are in a dc, damn
👍 the only thing is, it seems like the GraphQL API is responding with the combined # of GPUs, not the # of GPUs in the specific dc...
ah that shouldn't be right if you're filtering with a dc id
if it does that, please report it as a bug in #🧐|feedback
yeah I will, cuz I've been monitoring the values for a while, and it seems like the totalCount and rentedCount for the same GPU but different dcs show the same value
Starting that batch job again
We might take all the A40 capacity
or the remainder of it
how long u guys gonna be running for?
3h
~3h
okies
is this gonna be an ongoing thing?
yes
ok
but mostly 1-2 times per week
all good dude, i specifically set my workflow up in eu because when i get to use it no one is using them
lol
are you using spot?
I feel like it's releasing spot instances right now lol
@sluzorz I see you're using up all the A40s! Do you know if there's a way to transfer all my data from one pod to another? I'm happy using another gpu, but I have a lot of stuff downloaded to my current pod, which is on an A40
Cloud sync and rclone
I think their cloud sync is just rclone
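yeah, roughly like this (the remote name and bucket path are placeholders — set up your own remote with rclone config first):
```python
# Sketch: move a pod's /workspace to another pod via object storage with rclone.
# "s3remote:my-bucket/pod-backup" is a placeholder remote/path, not a real default.
import subprocess

# On the old pod: push everything up to the remote.
subprocess.run(
    ["rclone", "copy", "/workspace", "s3remote:my-bucket/pod-backup", "--progress"],
    check=True,
)

# On the new pod: pull everything back down.
subprocess.run(
    ["rclone", "copy", "s3remote:my-bucket/pod-backup", "/workspace", "--progress"],
    check=True,
)
```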
wew bro what are you doing with the a40s
training something?
hey also sluzorz, if this happens regularly, try to tell runpod about it so they can accommodate other users' needs on a40
Large batch inference jobs with BART
Oooh
Yeah, we’ve let runpod know.
@sluzorz will you be finished about now?
Some of our batches are finishing now
But we still have about 30 remaining since we couldn't spin up 100 A40s
Oooh
okay please let me know when you've finished
We're mostly done
But we will probably consume more in a few hours for embedding
The A40s on runpod are just too good of an offering