Title: Refactoring and Optimizing WRF Model on Sunway TaihuLight
Authors: Xu, Kai; Song, Zhenya; Chan, Yuandong; Wang, Shida; Meng, Xiangxu; Liu, Weiguo; Xue, Wei
Corresponding author: Xue, W
Author affiliations: [Xu, Kai; Chan, Yuandong; Wang, Shida; Liu, Weiguo] Shandong Univ, Jinan, Shandong, Peoples R China.; [Xu, Kai; Wang, Shida; Liu, Weiguo; Xue, Wei] …
Conference: 48th International Conference on Parallel Processing (ICPP)
Conference dates: AUG 05-08, 2019
Source: PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019)
Publication year: 2019
DOI: 10.1145/3337821.3337923
Keywords: Atmospheric modeling; Domain specific language; Sunway TaihuLight; Heterogeneous computing
Abstract: The Weather Research and Forecasting (WRF) Model is one of the most widely used mesoscale numerical weather prediction systems and is designed for both atmospheric research and operational forecasting applications. However, it is an extremely time-consuming application: running a single simulation takes researchers days to weeks as the simulation size scales up and computing demands grow. In this paper, we port the whole WRF model to the Sunway TaihuLight supercomputer and optimize it at a large scale. For the dynamic core in WRF, we present a domain-specific tool, SWSLL, a directive-based compiler tool for the Sunway many-core architecture that converts stencil computations into optimized parallel code. We also apply a decomposition strategy in SWSLL to improve memory locality and reduce the number of off-chip memory accesses. For the physical parameterizations, we explore thread-level parallelization using OpenACC directives, reorganizing data layouts and loops to achieve high performance. We present the algorithms and implementations and demonstrate the optimization of a complex real-world atmospheric model on the Sunway TaihuLight supercomputer. Evaluation results show that, for the widely used benchmark with a horizontal resolution of 2.5 km, the proposed algorithms and optimization strategies achieve a speedup of 4.7 for the whole WRF model. In terms of strong scalability, our implementation scales well to hundreds of thousands of heterogeneous cores on Sunway TaihuLight.
Indexed in: CPCI-S
Resource type: Conference paper