TY - JOUR
T1 - A GPU application for high-order compact finite difference scheme
AU - Tutkun, Bulent
AU - Edis, Firat Oguz
PY - 2012/2/15
Y1 - 2012/2/15
N2 - In this study, a high-order compact finite difference scheme for the solution of fluid flow problems is implemented to run on a Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). Besides the compact scheme, a high-order low-pass filter is also employed. For time integration, the classical fourth-order Runge-Kutta method is used. Two basic flows, the advection of a vortical disturbance and a temporal mixing layer, are chosen for the application of this numerical method on a Tesla C1060, one of NVIDIA's scientific computing GPUs. The results are compared with those obtained on a single-core CPU (AMD Phenom 2.5 GHz) in terms of calculation time. The CPU code uses the LAPACK/BLAS libraries to solve the cyclic tridiagonal systems generated by the compact solution and filtering schemes, whereas the GPU code uses the inverse of the coefficient matrix, via the CUBLAS library, to solve the same linear systems. Moreover, the shared memory of the GPU is used to ease coalescing issues in some parts of the GPU code. Speedups of 9x to 16.5x over the CPU computations are achieved for different mesh sizes.
AB - In this study, a high-order compact finite difference scheme for the solution of fluid flow problems is implemented to run on a Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). Besides the compact scheme, a high-order low-pass filter is also employed. For time integration, the classical fourth-order Runge-Kutta method is used. Two basic flows, the advection of a vortical disturbance and a temporal mixing layer, are chosen for the application of this numerical method on a Tesla C1060, one of NVIDIA's scientific computing GPUs. The results are compared with those obtained on a single-core CPU (AMD Phenom 2.5 GHz) in terms of calculation time. The CPU code uses the LAPACK/BLAS libraries to solve the cyclic tridiagonal systems generated by the compact solution and filtering schemes, whereas the GPU code uses the inverse of the coefficient matrix, via the CUBLAS library, to solve the same linear systems. Moreover, the shared memory of the GPU is used to ease coalescing issues in some parts of the GPU code. Speedups of 9x to 16.5x over the CPU computations are achieved for different mesh sizes.
KW - Computational fluid dynamics
KW - GPU computing
KW - High-order compact scheme
UR - http://www.scopus.com/inward/record.url?scp=84655161423&partnerID=8YFLogxK
U2 - 10.1016/j.compfluid.2011.10.016
DO - 10.1016/j.compfluid.2011.10.016
M3 - Article
AN - SCOPUS:84655161423
SN - 0045-7930
VL - 55
SP - 29
EP - 35
JO - Computers and Fluids
JF - Computers and Fluids
ER -