Marvee Amasi
DevHeads IoT Integration Server
Created by Marvee Amasi on 7/8/2024 in #middleware-and-os
Debugging Persistent Segmentation Fault in Multi-threaded C++ Program on AMD Barcelona CPUs
I have been wrestling with a persistent segmentation fault in a multi-threaded C++ program running on a cluster of AMD Barcelona CPUs (Linux/x86_64). The code causing the crashes is a heavily used function. Under load, running 1000 instances of the same optimized binary produces 1 to 2 crashes per hour.
Here's the interesting part: the crashes happen on different machines within the cluster (the machines themselves are almost identical), and every crash has the same characteristics: the same crash address and the same call stack.
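For anyone who wants to collect the same kind of data (fault address plus call stack) from each instance, here is a minimal sketch of a SIGSEGV logger. It is not the code from my program: `format_hex`, `emit`, `on_segv`, and `install_segv_logger` are hypothetical names made up for illustration, and it assumes glibc on Linux, since it relies on `backtrace` and `backtrace_symbols_fd` from `<execinfo.h>`. Note that `si_addr` is the memory address whose access faulted, not the instruction pointer.

```cpp
#include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc-specific)
#include <signal.h>     // sigaction, siginfo_t
#include <unistd.h>     // write, _exit
#include <cstddef>
#include <cstdint>

// Format value as lowercase hex into buf without printf, so it is safe to
// call from a signal handler. Returns the number of characters written.
std::size_t format_hex(std::uintptr_t value, char* buf, std::size_t cap) {
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(value)];
    std::size_t n = 0;
    do {
        tmp[n++] = digits[value & 0xf];   // least significant digit first
        value >>= 4;
    } while (value != 0 && n < sizeof(tmp));
    std::size_t out = 0;
    while (n > 0 && out < cap) buf[out++] = tmp[--n];  // reverse into buf
    return out;
}

// Thin wrapper around write(2) that ignores the result on purpose.
static void emit(const char* data, std::size_t len) {
    ssize_t ignored = write(STDERR_FILENO, data, len);
    (void)ignored;
}

// Handler: print the faulting memory address and a raw backtrace to stderr.
static void on_segv(int sig, siginfo_t* info, void* /*context*/) {
    static const char msg[] = "SIGSEGV, faulting address 0x";
    emit(msg, sizeof(msg) - 1);

    char hex[2 * sizeof(std::uintptr_t)];
    std::size_t n = format_hex(
        reinterpret_cast<std::uintptr_t>(info->si_addr), hex, sizeof(hex));
    emit(hex, n);
    emit("\n", 1);

    void* frames[64];
    int depth = backtrace(frames, 64);
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);  // does not call malloc
    _exit(128 + sig);
}

// Install the handler for the whole process. Returns true on success.
bool install_segv_logger() {
    // The first backtrace() call may load libgcc dynamically, so do it here
    // rather than inside the signal handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;  // one shot, then default action
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Calling `install_segv_logger()` once at the start of `main`, before any threads are spawned, is enough, since signal dispositions are process-wide and SIGSEGV is delivered to the faulting thread. This sketch does not set up an alternate stack with `sigaltstack`, so it will not help if the fault is a stack overflow.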