Kawai, Hiroshi

Subdomain local FE solver implementation using iterative solver in domain decomposition method

Abstract eng:
As the hardware architecture of modern supercomputers grows more complicated, it becomes increasingly difficult to achieve high performance in real production-level application codes. To obtain not only inter-node parallel performance using MPI but also higher intra-node thread-parallel performance, more efficient use of processor cache memory and SIMD vectorization must be considered. In the domain decomposition method (DDM), one obvious performance bottleneck is the subdomain-wise local FE solver. This presentation reports performance benchmarks of subdomain local FE solvers implemented with preconditioned conjugate gradient solvers. The strength of this implementation is that the working set of the local solver fits in the processor cache memory. The solvers are tested on modern multi-core, scalar-based supercomputers such as the RIKEN K Computer, Fujitsu PRIMEHPC FX100, Intel Haswell, and Knights Corner.

Publisher:

International Union of Theoretical and Applied Mechanics, 2016

Proceedings Title:

24th International Congress of Theoretical and Applied Mechanics - Book of Papers

Conference Title:

24th International Congress of Theoretical and Applied Mechanics

Conference Venue:

Montreal (CA)

Conference Dates:

2016-08-21 / 2016-08-26

Rights:

The text is protected under Copyright Act No. 121/2000 Coll.