Given that the number of cores in a processor continues to grow (e.g., the new six-core processor from AMD), single memory domains (motherboards) may have anywhere between 12 and 32 cores in the near future. Here is an interesting scenario. Let’s assume that 12- to 32-core systems become commonplace. If this is enough computing power for your tasks, then how will you approach HPC programming? Will you use MPI because you may want to scale the program to a cluster, or will you use something like OpenMP or a new type of multi-core programming tool because it is easier or works better? Could a gulf in HPC programming develop? Perhaps MPI will still be used for “big cluster HPC” and other methods will be used for “small motherboard HPC”. Of course MPI can always be used on small core counts, but will some point-and-click, thread-based tool attract more users because “MPI is too hard to program”?
This was something I faced on my last project. I was trained with MPI, but chose OpenMP for a project that I could never see running on a large cluster, and besides, it was easy to modify from the serial version. The biggest issue I see with the “ease of use” of OpenMP is that it’s too easy to make a data race mistake. I found that out once I increased my data set to test how well I had actually parallelized the code, and found a data race that had not shown up with a small data set. I used Sun Studio, but I thought back to the webinars I’ve been watching for Intel’s newest Parallel Studio, about using smaller datasets etc…
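To make the data race point concrete, here is a minimal sketch in C of the kind of mistake I mean. It is not the code from my project; the function names `sum_racy` and `sum_reduced` are made up for illustration. The first version lets every thread update a shared variable, and the second uses OpenMP's `reduction` clause to fix it.

```c
#include <assert.h>
#include <stddef.h>

/* RACY (illustrative only): every thread updates the shared variable
 * `sum` with no synchronization. With OpenMP enabled the result may be
 * wrong, and a small input can easily hide the problem. */
double sum_racy(const double *a, long n)
{
    double sum = 0.0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        sum += a[i];   /* unsynchronized read-modify-write on shared data */
    }
    return sum;
}

/* FIXED: the reduction clause gives each thread a private partial sum
 * and combines the partial sums when the loop ends. */
double sum_reduced(const double *a, long n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

Both versions differ by a single clause, which is exactly why the mistake is so easy to make. Without the compiler's OpenMP flag the pragmas are ignored and both functions simply run serially.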
But what they didn’t say is that this process works well only if your program is linear. If your program goes off on tangents to reduce the problem size and then goes back and calculates again, then functions that would not normally show up as “benefiting from parallelization” grind to a halt. It was familiarity with the program that helped me realize that. What concerns me is that this “ease of use” could very well make anyone out there consider themselves an HPC programmer because they open up VS + IPS and voila! Insert #pragma here… and before you know it, you’ve got more threads started than you can count.
Another project, before I arrived here, involved a program where the scientist couldn’t understand why his MPI program kept crashing. He was passing some huge multi-dimensional array that went well beyond the memory passing limit! Well, one has to either shrink the array, split it, or prioritize what’s being passed… but if you’re trying to pass some 2GB+ message back and forth, MPI isn’t going to help your program much, so what’s the point?
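For the "split it" option, something like the following sketch in C is what I have in mind. It assumes the limit in question is a per-message size cap (in the classic MPI C bindings the `count` argument of `MPI_Send` is an `int`). To keep the sketch free of an MPI dependency, the actual send is a hypothetical callback, `send_fn`; in a real program it would wrap a call such as `MPI_Send`. The names `send_in_chunks` and `count_only_send` are also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: sends `count` doubles starting at `buf`.
 * In a real MPI program this would wrap something like
 * MPI_Send(buf, count, MPI_DOUBLE, dest, tag, comm).
 * Returns 0 on success. */
typedef int (*send_fn)(const double *buf, int count, void *ctx);

/* Send `total` doubles in pieces of at most `max_chunk` elements.
 * Returns the number of chunks sent, or -1 if a send fails. */
long send_in_chunks(const double *buf, size_t total, int max_chunk,
                    send_fn send, void *ctx)
{
    assert(max_chunk > 0);
    long chunks = 0;
    size_t offset = 0;
    while (offset < total) {
        size_t remaining = total - offset;
        int count = remaining > (size_t)max_chunk ? max_chunk
                                                  : (int)remaining;
        if (send(buf + offset, count, ctx) != 0) {
            return -1;
        }
        offset += (size_t)count;
        chunks++;
    }
    return chunks;
}

/* Stand-in sender for trying the sketch without MPI: it only adds the
 * element count of each chunk to the size_t that `ctx` points to. */
int count_only_send(const double *buf, int count, void *ctx)
{
    (void)buf;
    *(size_t *)ctx += (size_t)count;
    return 0;
}
```

The receiving side would need a matching loop that posts one receive per chunk, and both sides have to agree on the chunk size. Of course, chunking only gets the message through; it does nothing about the cost of moving that much data back and forth.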
Finally, I don’t really see an either/or choice here. They are both good for something; the choice depends on the goal. It would be nice to get easy scalability with OpenMP, but MPI forces people to think about memory, think about algorithms, think about the problem at hand. GPGPUs and FPGAs also force you to think about memory management, thread spawning, etc. OpenMP leaves a lot “open” for common mistakes.
The only problem I really see here is that there aren’t enough schools out there teaching HPC. Computational Science/Scientific Programming departments are just beginning to sprout, and in the meantime we’re left to learn these things on our own, through trial and error and, for those of us who are young, a lot of painful, annoying mistakes. Everyone knows parallelizing hurts, especially for us young n00bs.

