Real Performance Comparison: H100 vs RTX 6000 Ada
Hi,
I'm confused about the relative performance of the H100 and RTX 6000 Ada GPUs during model training.
Lately, I’ve been working with both GPUs to train a model using 9 GB of training data and 8 GB of testing data. The model has 2.6M parameters.
On the RTX 6000 Ada, I'm observing an average speed of around 200 ms/step in my current tests:
Epoch 2/15
601/1427 ━━━━━━━━━━━━━━━━━━━━ 2:41 195ms/step - binary_accuracy: 0.8878 - loss: 0.2556
Yesterday, using the H100, I was getting more than 300 ms/step, sometimes 400 ms/step, and only rarely 200 ms/step.
With the same script, same data, same everything.
Are the H100 and RTX 6000 Ada the same thing?
Regards.
6 Replies
No, they're not the same; I'd expect the H100 to be faster than the RTX 6000 Ada in most respects.
OK, running again on the RTX 6000 Ada: 498/1427 ━━━━━━━━━━━━━━━━━━━━ 2:58 192ms/step - binary_accuracy: 0.8825 - loss: 0.2695
I'll make another run on the H100 today and post the result here.
I'm also not sure what affects that ms/step figure.
I'd guess it's the dataset and the number of parameters the model handles,
but everything is the same in both runs.
Oh yes, those strongly affect it: the training batch size and the model parameters.
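If it helps, here's a rough sketch of how those numbers fit together (the sample count and batch size below are made up, not taken from your script). Keras runs one optimizer step per batch, so the step counter is just the number of batches per epoch, and ms/step is the wall-clock time of one batch including data loading:

```python
import math

# Illustrative numbers only -- the real sample count and batch size
# from the original script are not shown in this thread.
num_train_samples = 91_328   # assumed
batch_size = 64              # assumed

# One optimizer step per batch, so the "601/1427"-style counter is
# just ceil(samples / batch_size).
steps_per_epoch = math.ceil(num_train_samples / batch_size)
print(steps_per_epoch)       # -> 1427 with these assumed numbers

# ms/step is the wall-clock time of one batch (data loading + forward
# + backward pass), so throughput in samples/sec is:
ms_per_step = 195
samples_per_sec = batch_size / (ms_per_step / 1000)
print(f"{samples_per_sec:.0f} samples/sec")   # -> ~328 samples/sec
```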
Yep, I don't understand those numbers yet...
That's on an H100; maybe my script isn't optimized:
2005/2005 ━━━━━━━━━━━━━━━━━━━━ 688s 341ms/step - binary_accuracy: 0.8848 - loss: 0.2727 - val_binary_accuracy: 0.8486 - val_loss: 0.4044 - learning_rate: 0.0010
Epoch 3/15
1391/2005 ━━━━━━━━━━━━━━━━━━━━ 2:57 290ms/step - binary_accuracy: 0.9031 - loss: 0.2306
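One thing I notice: the RTX 6000 Ada runs above show 1427 steps per epoch while the H100 run shows 2005, so the batch size or dataset may not actually be identical on the two machines.

If it's the script rather than the card, the two things that most often hide a faster GPU in Keras are an input pipeline that doesn't prefetch and running everything in float32. A minimal sketch, with made-up shapes and layer sizes (not your model):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Mixed precision lets the H100 (and the RTX 6000 Ada) use their Tensor
# Cores; in plain float32 the gap between the two cards can be much smaller.
mixed_precision.set_global_policy("mixed_float16")

# Stand-in data -- the shapes and sizes here are placeholders, not the
# 9 GB dataset from the original post.
x = np.random.rand(10_000, 128).astype("float32")
y = np.random.randint(0, 2, size=(10_000, 1)).astype("float32")

# cache() + prefetch() overlap data preparation with GPU compute, so the
# GPU is not left idle between steps waiting on the input pipeline.
ds = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .cache()
    .shuffle(10_000)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)
)

model = tf.keras.Sequential([
    layers.Dense(256, activation="relu", input_shape=(128,)),
    # Keep the final layer in float32 for numerical stability under
    # mixed precision.
    layers.Dense(1, activation="sigmoid", dtype="float32"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["binary_accuracy"])
model.fit(ds, epochs=2)
```

If ms/step is dominated by data loading rather than GPU compute, a faster card won't change the step time at all, which would explain similar (or even worse) numbers on the H100.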