RunPod · 3mo ago
Acrack

Real Performance Comparison: H100 vs RTX 6000 Ada

Hi, I'm experiencing some confusion, or perhaps a misunderstanding, regarding the performance of the H100 and RTX 6000 Ada GPUs during model training. Lately I've been working with both GPUs to train a model using 9 GB of training data and 8 GB of testing data. The model has 2.6M parameters. On the RTX 6000 Ada, I'm observing an average speed of around 200 ms/step in my current tests:

Epoch 2/15
601/1427 ━━━━━━━━━━━━━━━━━━━━ 2:41 195ms/step - binary_accuracy: 0.8878 - loss: 0.2556

Yesterday, using the H100, I was getting more than 300 ms/step, sometimes 400 ms/step and rarely 200 ms/step, with the same script, same data, same everything. Are the H100 and RTX 6000 Ada the same thing? Regards.
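For anyone reading along: the progress-bar format (601/1427 ... 195ms/step) looks like Keras output, and ms/step is time per batch, not per sample, so it only compares cleanly across GPUs if both runs use the same batch size and pipeline. A minimal sketch, assuming TensorFlow/Keras, to confirm a pod actually exposes the GPU and to turn ms/step into a rough epoch-time estimate:

```python
# Minimal sketch, assuming the model is trained with TensorFlow/Keras
# (the "601/1427 ... 195ms/step" progress bar looks like Keras output).
import tensorflow as tf

# Should list the H100 / RTX 6000 Ada; an empty list means the run is on CPU.
print(tf.config.list_physical_devices("GPU"))

# Logs which device each op is placed on, useful for spotting CPU fallbacks.
tf.debugging.set_log_device_placement(True)

# Back-of-the-envelope epoch time from the numbers in the post:
steps_per_epoch = 1427
ms_per_step = 195
print(f"~{steps_per_epoch * ms_per_step / 1000 / 60:.1f} min per epoch")  # ~4.6 min
```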
6 Replies
nerdylive · 3mo ago
No, they're not the same. I'd guess the H100 is faster than the RTX 6000 in most respects.
Acrack (OP) · 3mo ago
OK, running again on the RTX 6000:

498/1427 ━━━━━━━━━━━━━━━━━━━━ 2:58 192ms/step - binary_accuracy: 0.8825 - loss: 0.2695

I'll make another try on the H100 today and post the result here.
nerdylive · 3mo ago
I'm not sure either what affects that ms/step.
Acrack (OP) · 3mo ago
I'd guess the dataset and the number of parameters the model handles, but in both runs everything is the same.
nerdylive · 3mo ago
Ooh yeah, those strongly affect it: the batch size used for training and the model params.
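The batch-size point is worth spelling out: the RTX 6000 log above shows 1427 steps per epoch while the H100 log later in the thread shows 2005, which suggests the effective batch size or the dataset split was not identical between the runs, in which case ms/step alone can't settle the comparison. A small sketch of the arithmetic, with placeholder values rather than numbers from the thread:

```python
# Sketch of the arithmetic behind the progress bars. ms/step measures time per
# batch, so it depends on batch size as much as on the GPU.
import math

num_samples = 100_000   # hypothetical training-set size, not from the thread
batch_size = 64         # hypothetical batch size, not from the thread

steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 1563 for these placeholder values

# Epoch time = steps_per_epoch * seconds_per_step. A GPU that is 2x faster per
# step at the *same* batch size halves the epoch time; a different batch size
# changes both factors and makes raw ms/step incomparable across runs.
print(f"{steps_per_epoch * 0.195 / 60:.1f} min per epoch at 195 ms/step")
```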
Acrack (OP) · 3mo ago
Yep, I don't understand those numbers yet... That's on an H100; maybe my script isn't optimized:

2005/2005 ━━━━━━━━━━━━━━━━━━━━ 688s 341ms/step - binary_accuracy: 0.8848 - loss: 0.2727 - val_binary_accuracy: 0.8486 - val_loss: 0.4044 - learning_rate: 0.0010
Epoch 3/15
1391/2005 ━━━━━━━━━━━━━━━━━━━━ 2:57 290ms/step - binary_accuracy: 0.9031 - loss: 0.2306
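If the H100 run really is slower per step, the usual suspect is the script rather than the card: a faster accelerator just spends more of each step waiting on the input pipeline or on plain float32 math. A minimal sketch, assuming TensorFlow/Keras, of two common tweaks (mixed precision and tf.data prefetching); the dataset and model below are dummy placeholders, not the ones from this thread:

```python
import numpy as np
import tensorflow as tf

# Mixed precision lets recent GPUs (H100, RTX 6000 Ada) run most math in
# float16 on tensor cores; set the policy before building the model.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Dummy data standing in for the real 9 GB dataset.
x = np.random.rand(4096, 32).astype("float32")
y = np.random.randint(0, 2, size=(4096, 1))

# tf.data pipeline with prefetching so the GPU isn't waiting on the host
# between batches; an unbuffered feed often caps ms/step on fast GPUs.
train_ds = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    # Keep the output layer in float32 for a numerically stable sigmoid/loss.
    tf.keras.layers.Dense(1, activation="sigmoid", dtype="float32"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])
model.fit(train_ds, epochs=2)
```

Profiling one epoch with and without these changes on each GPU would show whether the bottleneck is compute or data loading.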
