ktabrizi
RunPod
Created by ktabrizi on 8/30/2024 in #⛅|pods
Two pods disappeared from my account
For anyone investigating something similar, it turns out RunPod has a stale volume deletion policy:
Stale volumes are deleted if they have been inactive for 30 days, or if you run out of funds.
Pre-deletion warning notifications were sent to our team admin (I just never saw them, whoops).
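As a rough illustration of the rule quoted above, here is a minimal C++ sketch. It is not RunPod's implementation. The names (`isStaleVolume`, `kStaleAfter`), treating exactly 30 days of inactivity as stale, and treating a balance of zero or less as "out of funds" are all assumptions.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::system_clock;

// Hypothetical threshold: 30 days of inactivity (assumed to be inclusive).
constexpr std::chrono::hours kStaleAfter{24 * 30};

// Hypothetical model of the stale-volume rule: a volume is eligible for
// deletion if it has been inactive for 30 days OR the account is out of funds.
bool isStaleVolume(Clock::time_point lastActive,
                   Clock::time_point now,
                   double accountBalance) {
    assert(now >= lastActive);  // last activity cannot be in the future
    const bool inactiveTooLong = (now - lastActive) >= kStaleAfter;
    const bool outOfFunds = accountBalance <= 0.0;  // assumption: <= 0 means "no funds"
    return inactiveTooLong || outOfFunds;
}
```

Note that either condition alone is enough, so an account running out of funds can put a recently used volume at risk too.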
7 replies
Created by ktabrizi on 8/30/2024 in #⛅|pods
Two pods disappeared from my account
No, they were "On-Demand - Secure Cloud" pods (edited my post to include this info).
Created by ktabrizi on 7/9/2024 in #⛅|pods
AMD pods don't properly support GPU memory allocation
Sounds good, thanks for the update. If there's any way to be notified if/when this is supported, please let me know!
10 replies
Created by ktabrizi on 7/9/2024 in #⛅|pods
AMD pods don't properly support GPU memory allocation
Definitely fair, though I imagine there's a slightly more permissive security profile that would allow these pinned memory allocations without dropping seccomp altogether.
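To confirm whether a pod runs under a seccomp filter at all (for example before and after trying a different profile), one option is the `Seccomp:` field Linux reports in `/proc/self/status`: 0 means disabled, 1 strict, 2 filter. Below is a minimal C++17 sketch; `parseSeccompMode` and `currentSeccompMode` are hypothetical helper names, and this only reports the mode, not which syscalls the profile allows.

```cpp
#include <fstream>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Parses the seccomp mode from a /proc/<pid>/status-style stream.
// Returns 0 (disabled), 1 (strict) or 2 (filter), or std::nullopt if the
// "Seccomp:" line is missing or malformed.
std::optional<int> parseSeccompMode(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        // Match "Seccomp:" exactly; "Seccomp_filters:" is a separate field.
        if (line.rfind("Seccomp:", 0) == 0) {
            std::istringstream fields(line.substr(8));  // skip "Seccomp:"
            int mode = -1;
            if (fields >> mode) return mode;
            return std::nullopt;
        }
    }
    return std::nullopt;
}

// Convenience wrapper for the current process (Linux only).
std::optional<int> currentSeccompMode() {
    std::ifstream status("/proc/self/status");
    if (!status) return std::nullopt;
    return parseSeccompMode(status);
}
```

For example, `currentSeccompMode()` returning 2 inside a pod would indicate that some filter profile is active, while 0 would mean seccomp is off entirely.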
Created by ktabrizi on 7/9/2024 in #⛅|pods
AMD pods don't properly support GPU memory allocation
We do – our application is compute-intensive and uses PyTorch, but isn't an LLM or diffusion model. I think as soon as the software involved does anything custom with ROCm/HIP, someone will hit these kinds of issues. It'd be great to be able to run on RunPod's AMD pods as more and more applications are built to take advantage of the MI300Xs.
Created by ktabrizi on 7/9/2024 in #⛅|pods
AMD pods don't properly support GPU memory allocation
Here's my script for quickly testing this, in case anyone wants to reproduce it:
#include <hip/hip_runtime.h>
#include <iostream>
#include <sstream>
#include <stdexcept>

// Throws std::runtime_error with HIP's error string if `result` is not hipSuccess.
// The do { ... } while (0) wrapper makes the macro behave like a single statement.
#define CHECK_RESULT(result, errorMessage) \
    do { \
        hipError_t status_ = (result); \
        if (status_ != hipSuccess) { \
            std::stringstream m; \
            m << errorMessage << ": " << hipGetErrorString(status_) \
              << " (" << static_cast<int>(status_) << ")"; \
            throw std::runtime_error(m.str()); \
        } \
    } while (0)

int main() {
    unsigned int* pinnedCountBuffer = nullptr;
    hipError_t result;

    try {
        // Attempt to allocate pinned (page-locked) host memory; the
        // hipHostMallocNumaUser flag makes it follow the user's NUMA policy
        result = hipHostMalloc((void**)&pinnedCountBuffer, 2 * sizeof(unsigned int), hipHostMallocNumaUser);
        CHECK_RESULT(result, "Failed to allocate pinned memory");

        std::cout << "Successfully allocated pinned memory." << std::endl;

        // Use the allocated memory from the host
        pinnedCountBuffer[0] = 42;
        pinnedCountBuffer[1] = 84;

        std::cout << "Values stored: " << pinnedCountBuffer[0] << ", " << pinnedCountBuffer[1] << std::endl;

        // Free the allocated memory
        result = hipHostFree(pinnedCountBuffer);
        CHECK_RESULT(result, "Failed to free pinned memory");

        std::cout << "Successfully freed pinned memory." << std::endl;
    }
    catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << std::endl;
        return 1;
    }

    return 0;
}
Save it as test_hip_malloc.cpp, then compile and run it with:
hipcc -o test_hip_malloc test_hip_malloc.cpp && ./test_hip_malloc
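A companion probe that doesn't need ROCm: assuming (this thread doesn't confirm it) that the `hipHostMallocNumaUser` failure comes from NUMA memory-policy syscalls being filtered, this plain C++ sketch calls `get_mempolicy` directly and reports whether the kernel let it through. For reference, Docker's default seccomp profile only allows `get_mempolicy`, `mbind` and `set_mempolicy` when the container has `CAP_SYS_NICE`; whether RunPod's pods use that profile is not known here. The helper names `describeProbe` and `probeGetMempolicy` are hypothetical, and the code is Linux-only.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Turns the result of a raw syscall into a short, human-readable verdict.
std::string describeProbe(long rc, int err) {
    if (rc == 0) return "allowed";
    switch (err) {
        case EPERM:  return "blocked (EPERM)";           // what a seccomp filter typically returns
        case ENOSYS: return "not implemented (ENOSYS)";  // e.g. kernel built without NUMA support
        default:     return std::string("failed: ") + std::strerror(err);
    }
}

// Queries the calling thread's NUMA policy mode via get_mempolicy(2).
// A null nodemask with maxnode = 0 and flags = 0 only reads the mode,
// so the call has no side effects.
std::string probeGetMempolicy() {
    int mode = 0;
    errno = 0;
    long rc = syscall(SYS_get_mempolicy, &mode,
                      static_cast<unsigned long*>(nullptr), 0UL,
                      static_cast<void*>(nullptr), 0UL);
    return describeProbe(rc, errno);
}
```

Printing `probeGetMempolicy()` from a small `main` (built with `g++ -std=c++17`) and seeing "blocked (EPERM)" inside a pod but "allowed" on an unrestricted host would point toward the security profile rather than the GPU driver, though it would not prove it.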