Postprint version. Published in Journal of Parallel and Distributed Computing, Volume 65, Issue 4, January 1, 2005, pages 414-423.
NOTE: At the time of publication, the author David Marshall was affiliated with Cal Poly.
This paper presents several algorithmic innovations and a hybrid programming style that lead to highly scalable shared-memory performance for a new computational fluid dynamics flow solver. This hybrid model is then converted to a strict message-passing implementation, and performance results for the two are compared. Results show that, using this hybrid approach, our OpenMP implementation is actually marginally faster than the MPI version, with parallel speedups of up to 599 out of 640 using OpenMP and 486 with MPI.
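As a rough reading of the speedup figures, they can be expressed as parallel efficiencies. This sketch assumes that 640 is the processor count and that both speedups were measured on that same count; the symbols $S$, $p$, and $E$ are introduced here for illustration and are not taken from the paper.

```latex
% Parallel efficiency: speedup S divided by processor count p (assumed p = 640)
\[
  E = \frac{S}{p}, \qquad
  E_{\mathrm{OpenMP}} = \frac{599}{640} \approx 0.936, \qquad
  E_{\mathrm{MPI}} = \frac{486}{640} \approx 0.759
\]
```

Under these assumptions, the OpenMP version runs at about 94% efficiency and the MPI version at about 76%.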