Analysis of High-Performance Computing Characteristics of the GRAPES_MESO and WRF Models on the Kunpeng Platform

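Funding:

Jointly supported by the National Natural Science Foundation of China (42475038, 42030610), the "Pioneer and Leading Goose + X" R&D Program of the Zhejiang Province Science and Technology Plan (2024C03256), the Zhejiang Provincial Natural Science Foundation (LY21D050001, LGF21D010001), and the Key Project of the Zhejiang Meteorological Science and Technology Plan (2022ZD14).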


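    Summary:

    This study analyses the computational characteristics of the GRAPES_MESO (Global/Regional Assimilation PrEdiction System-Mesoscale version) model and the WRF (Weather Research and Forecasting Model) model on the domestic Kunpeng (KUNPENG) platform, compares them with the Intel (X86) platform, and explores the room for improvement of numerical models on the KUNPENG platform in terms of resource usage, computational bottlenecks, and hotspot functions. The results show that, after adaptation, both models produce results on the domestic KUNPENG platform that are consistent with those on the Intel X86 platform and exhibit good parallel scalability. Both models have high CPU utilisation, their computational bottlenecks are mainly back-end CPU bottlenecks, and their overall node memory usage is appropriate, so subsequent optimisation should focus on code efficiency, algorithms, and memory access. On the KUNPENG platform, the MPI communication efficiency of the GRAPES_MESO model could be improved by optimising the Collective Sync, Allreduce, and Wait algorithms of collective communication. The GRAPES_MESO model could be further optimised by targeting hotspot functions such as the GCR algorithm, the collective communication hotspots represented by uct and ucg, the mathematical functions represented by expf and powf, and malloc memory operations.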
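    The pairing of GCR and Allreduce among the optimisation targets can be illustrated with a small sketch. It assumes that GCR denotes the generalized conjugate residual iteration, and it is not the GRAPES_MESO implementation: the matrix, the helper names (`dot`, `matvec`, `gcr`), and the reduction counter are hypothetical. In a domain-decomposed solver, every inner product counted below would typically require a global reduction such as MPI_Allreduce, which is one plausible reason the two show up together in a profile.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector, counter: List[int]) -> float:
    """Inner product; each call stands in for one global reduction."""
    counter[0] += 1
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (local work in a distributed code)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gcr(A: Matrix, b: Vector, tol: float = 1e-10,
        max_iter: int = 50) -> Tuple[Vector, int]:
    """Unrestarted GCR for A x = b, starting from x = 0.

    Returns the approximate solution and the number of inner products,
    i.e. the number of reductions a distributed version would issue.
    """
    reductions = [0]
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p, ap = list(r), matvec(A, r)    # search direction and A * direction
    ps, aps, ap_norms = [], [], []   # history of previous directions

    for _ in range(max_iter):
        ap_ap = dot(ap, ap, reductions)
        if ap_ap == 0.0:
            break
        alpha = dot(r, ap, reductions) / ap_ap
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r, reductions) ** 0.5 < tol:
            break

        ps.append(p)
        aps.append(ap)
        ap_norms.append(ap_ap)

        # Make the new direction A^T A-orthogonal to all previous ones.
        ar = matvec(A, r)
        p_new, ap_new = list(r), list(ar)
        for pj, apj, nj in zip(ps, aps, ap_norms):
            beta = -dot(ar, apj, reductions) / nj
            p_new = [u + beta * v for u, v in zip(p_new, pj)]
            ap_new = [u + beta * v for u, v in zip(ap_new, apj)]
        p, ap = p_new, ap_new

    return x, reductions[0]


if __name__ == "__main__":
    # Hypothetical 2x2 nonsymmetric system, chosen only for illustration.
    solution, n_reductions = gcr([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
    print(solution, n_reductions)
```

    The number of inner products per iteration grows with the stored history, so the reduction count, rather than the local arithmetic, tends to dominate at high process counts in this kind of solver.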

    Abstract:

    This study analyses the computational characteristics of the GRAPES_MESO and WRF models on the KUNPENG platform and compares them with the Intel (X86) platform, in order to identify room for improvement in resource utilisation, computational bottlenecks, and hotspot functions on the KUNPENG platform. The results indicate the following. (1) After adaptation, both models produce results on the domestic KUNPENG platform that are consistent with those on the X86 platform. (2) Both models exhibit good parallel scalability on both the X86 and KUNPENG platforms. With the same number of processes, the computing efficiency of the KUNPENG platform is 65% to 90% of that of the X86 platform; with the same number of nodes, it exceeds that of the X86 platform by 22% to 45%. (3) In terms of hardware resource utilisation, both models spend the most time on computation, followed by communication and then I/O. CPU usage is high and node memory usage is appropriate, so subsequent optimisation should focus mainly on code efficiency, algorithms, and memory access. (4) In terms of MPI communication, the communication efficiency of the GRAPES model on the KUNPENG platform could be improved by optimising the Collective Sync, Allreduce, and Wait algorithms of collective communication. (5) Top-down analysis shows that the computing bottlenecks of both models on both platforms are mainly concentrated in the back-end CPU bottleneck and the back-end memory subsystem bottleneck. Thanks to the optimisation of multiple memory channels and the Bisheng compiler, the memory access efficiency, branch prediction rate, and cache hit rate of the GRAPES model are higher on the KUNPENG platform than on the X86 platform. The memory subsystem bottleneck information shows that TLB misses and L1/L2 misses are generally low and memory access efficiency is high, so the room for memory access optimisation is limited. The instruction distribution shows a relatively high proportion of memory-read and integer instructions, together with a certain share of floating-point instructions, which reflects the high memory bandwidth advantage of the KUNPENG architecture. The proportion of vectorised instructions is low, so vectorisation optimisation is worth considering. (6) Hotspot analysis suggests that the GRAPES model on the KUNPENG platform could be optimised by targeting the GCR algorithm, the collective communication hotspots represented by uct and ucg, the mathematical functions represented by expf and powf, and hot functions such as malloc memory operations.
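    The per-process and per-node comparisons in result (2) point in opposite directions, which is possible when one platform has more cores per node. The sketch below works through that arithmetic under stated assumptions: efficiency is taken as inversely proportional to wall-clock time, scaling within a node is treated as ideal, and the core counts and per-core ratio are hypothetical values, not figures from the study.

```python
def same_node_ratio(per_core_ratio: float,
                    kunpeng_cores_per_node: int,
                    x86_cores_per_node: int) -> float:
    """KUNPENG/X86 efficiency ratio when both use the same number of nodes.

    per_core_ratio is the KUNPENG/X86 ratio at equal process counts.
    Assumes one process per core and ideal scaling within a node.
    """
    return per_core_ratio * kunpeng_cores_per_node / x86_cores_per_node


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    per_core = 0.80          # KUNPENG reaches 80% of X86 at equal processes
    kunpeng_cores = 128      # assumed cores per KUNPENG node
    x86_cores = 80           # assumed cores per X86 node

    ratio = same_node_ratio(per_core, kunpeng_cores, x86_cores)
    print(f"same processes: {per_core:.2f}x, same nodes: {ratio:.2f}x")
```

    With these assumed inputs the per-node ratio is above 1 even though the per-process ratio is below 1; real measurements would also include communication and memory-bandwidth effects that this model ignores.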

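Cite this article

Chen Feng, He Mingyang, Chen Yefeng, Wu Bingcheng, Xu Cheng. Analysis of High-Performance Computing Characteristics of GRAPES_MESO and WRF Models on Kunpeng Platform[J]. Meteorological Science and Technology, 2025, 53(3): 347-361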

History
  • Received: 2024-04-11
  • Last revised: 2025-01-07
  • Published online: 2025-06-27