TY - GEN
T1 - A comparison of SuperLU solvers on the Intel MIC architecture
AU - Tuncel, Mehmet
AU - Duran, Ahmet
AU - Celebi, M. Serdar
AU - Akaydin, Bora
AU - Topkaya, Figen O.
PY - 2016/10/20
Y1 - 2016/10/20
AB - In many science and engineering applications, problems may result in solving a sparse linear system AX=B. For example, SuperLU-MCDT, a linear solver, was used for the large penta-diagonal matrices arising in 2D problems and the hepta-diagonal matrices arising in 3D problems, coming from incompressible blood flow simulation (see [1]). It is important to test the status and potential improvements of state-of-the-art solvers on new technologies. In this work, the sequential, multithreaded, and distributed versions of the SuperLU solvers (see [2]) are examined on Intel Xeon Phi coprocessors using the offload programming model on the EURORA cluster of CINECA in Italy. We consider a portfolio of test matrices containing patterned matrices from UFMM ([3]) and randomly located matrices. This architecture can benefit from high parallelism and large vectors. We find that the sequential SuperLU benefited from offload programming by up to 45% in performance, depending on the sparse matrix type and the size of the transferred and processed data.
UR - http://www.scopus.com/inward/record.url?scp=84995480091&partnerID=8YFLogxK
U2 - 10.1063/1.4965394
DO - 10.1063/1.4965394
M3 - Conference contribution
AN - SCOPUS:84995480091
T3 - AIP Conference Proceedings
BT - Numerical Computations
A2 - Sergeyev, Yaroslav D.
A2 - Mukhametzhanov, Marat S.
A2 - Dell'Accio, Francesco
A2 - Kvasov, Dmitri E.
PB - American Institute of Physics Inc.
T2 - 2nd International Conference on Numerical Computations: Theory and Algorithms, NUMTA 2016
Y2 - 19 June 2016 through 25 June 2016
ER -