Design and Practice of Long-Term Sequential Grid Data Management Platform

doi:10.19517/j.1671-6345.20230430

2024, Vol. 52, No. 6, pp. 797-806

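Abstract (translated from the Chinese original):

As the spatial and temporal resolution of numerical models improves, data volumes grow sharply, and long-series data can hardly be served to users directly through file copying or network transfer. To address this, the authors designed and implemented a distributed management platform. Based on user-defined data requirements, and using constraints such as forecast element, spatial extent, and time scale, the platform extracts the specified meteorological elements, or clips them according to regional parameters, and generates reduced datasets for delivery to users. The platform integrates a search engine, grid data decoding, in-memory database technology, and a distributed framework. It provides a unified calling interface across operating systems and fast data retrieval, which addresses the difficulty users face in accessing long time series of historical data. Experimental tests show that the platform performs well in both the scale of grid data it manages and access efficiency. In particular, it played an important role in the meteorological support services for the Beijing 2022 Olympic and Paralympic Winter Games, demonstrating its practical value and potential.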

Abstract:

With the rapid development of numerical weather prediction services, the resolution and forecast lead time of meteorological models have improved significantly, leading to exponential growth in the volume of forecast data output. As a national meteorological model research and operational centre, the CMA Earth System Modeling and Prediction Center (CEMC) currently produces daily gridded data outputs of 0.76 TB, with an annual output reaching 155.12 TB. Given these data volumes, researchers' preferences for data access are evolving. Wagemann predicts that future scientific users will increasingly prefer cloud platforms or other interfaces for data access rather than relying solely on downloads. To address these issues, this paper proposes a lightweight distributed parallel processing framework for gridded data management, aiming to streamline data management and speed up data access. The core design uses search engine technology for rapid metadata retrieval and gridded data decoding techniques for efficient data acquisition. To avoid the performance penalty of repeated decoding, the framework decodes each gridded data file once and then supports multiple retrievals and extractions, which significantly accelerates data access. It also supports cross-platform data access, making data acquisition easier for researchers. The framework adopts a three-tier architecture: the data layer stores data, the algorithm layer implements the core search and cataloguing algorithms, and the business layer interfaces directly with user needs. The framework implements the key functions of gridded data cataloguing, extraction, and clipping. During cataloguing, users invoke the cataloguing interface with input parameters (e.g., original data file paths, index names, index types), and the system automatically parses the file metadata and generates indexes.
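The cataloguing step described above can be illustrated with a minimal sketch. It assumes a toy length-prefixed binary record layout (not the platform's actual file format), and the names `write_record`, `catalogue`, `search`, and `fetch` are hypothetical. The file is scanned once to record each payload's byte offset and to build an inverted index from metadata values to record ids; later retrievals intersect the id sets and seek straight to the payload without re-reading the rest of the file.

```python
import io
import struct
from collections import defaultdict

# Hypothetical toy header (an assumption, not a real meteorological format):
# 8-byte ASCII element name, uint16 level, uint16 forecast hour,
# uint32 payload length, followed by the payload bytes.
HEADER = struct.Struct(">8sHHI")


def write_record(stream, element, level, hour, payload):
    """Append one toy record to a binary stream."""
    name = element.encode("ascii").ljust(8)
    stream.write(HEADER.pack(name, level, hour, len(payload)))
    stream.write(payload)


def catalogue(stream):
    """Scan the stream once and return (records, inverted).

    records[i] is (payload_offset, payload_length) for record id i.
    inverted maps (field, value) to the set of record ids carrying it.
    """
    records = []
    inverted = defaultdict(set)
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of stream
        name, level, hour, length = HEADER.unpack(header)
        rec_id = len(records)
        records.append((stream.tell(), length))
        inverted[("element", name.decode("ascii").strip())].add(rec_id)
        inverted[("level", level)].add(rec_id)
        inverted[("hour", hour)].add(rec_id)
        stream.seek(length, io.SEEK_CUR)  # skip the payload, do not decode it
    return records, inverted


def search(inverted, **criteria):
    """Return sorted record ids matching every field=value criterion."""
    postings = [inverted.get(item, set()) for item in criteria.items()]
    return sorted(set.intersection(*postings)) if postings else []


def fetch(stream, records, rec_id):
    """Read one payload directly via its catalogued byte offset."""
    offset, length = records[rec_id]
    stream.seek(offset)
    return stream.read(length)


# Usage: build a small in-memory "file", catalogue it once, then query it.
buf = io.BytesIO()
write_record(buf, "t2m", 0, 6, b"AAAA")
write_record(buf, "t2m", 0, 12, b"BBBBBB")
write_record(buf, "u10", 0, 6, b"CC")
buf.seek(0)
records, inverted = catalogue(buf)
ids = search(inverted, element="t2m", hour=12)  # -> [1]
payload = fetch(buf, records, ids[0])           # -> b"BBBBBB"
```

In this sketch the cost of scanning the file is paid once at cataloguing time, and each later retrieval is a set intersection plus one seek.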
For data extraction, users call the retrieval interface with specific parameters to obtain the designated data. The framework also supports precise extraction of a specified latitude and longitude range by configuring clipping parameters. It reduces decoding time by building indexes based on the binary storage characteristics of the files, uses an inverted-index value-id model to locate data quickly, improves processing performance through GlusterFS shared storage and the Celery distributed task queue, and uses gRPC for client/server (C/S) communication to keep data transmission efficient and stable. Practical tests and applications demonstrate the framework's strong performance in handling massive meteorological data. Notably, it successfully processed petabyte-scale gridded data during the meteorological support services for the Beijing Winter Olympics, significantly improving data access efficiency. The framework can also be extended and upgraded to handle various file formats and meet diverse user needs. By integrating search engine technology, gridded data decoding methods, and a distributed cluster framework, the platform enables rapid data retrieval and efficient access, and it meets researchers' demand for cross-platform data access. As meteorological data continues to grow, the platform has the potential to provide more robust data support for weather forecasting, scientific research, and operational applications.
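The latitude and longitude clipping mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the grid is regular, stored as a list of rows ordered by increasing latitude, and the function name `clip_region` and its parameters are hypothetical.

```python
import math


def clip_region(grid, lat0, lon0, dlat, dlon, south, north, west, east):
    """Clip a bounding box out of a regular latitude/longitude grid.

    Assumes row i lies at latitude lat0 + i * dlat and column j at
    longitude lon0 + j * dlon, with dlat > 0 and dlon > 0.
    Returns (subgrid, (lat, lon) of its first cell), or ([], None)
    when the box does not overlap the grid.
    """
    # Rounding guards against floating-point noise in the index arithmetic.
    i_min = max(0, math.ceil(round((south - lat0) / dlat, 9)))
    i_max = min(len(grid) - 1, math.floor(round((north - lat0) / dlat, 9)))
    j_min = max(0, math.ceil(round((west - lon0) / dlon, 9)))
    j_max = min(len(grid[0]) - 1, math.floor(round((east - lon0) / dlon, 9)))
    if i_min > i_max or j_min > j_max:
        return [], None
    subgrid = [row[j_min:j_max + 1] for row in grid[i_min:i_max + 1]]
    return subgrid, (lat0 + i_min * dlat, lon0 + j_min * dlon)


# Usage: a 4 x 5 grid starting at 30N, 100E with 1-degree spacing.
grid = [[10 * i + j for j in range(5)] for i in range(4)]
subgrid, origin = clip_region(grid, 30.0, 100.0, 1.0, 1.0,
                              south=31.0, north=32.0, west=101.5, east=103.2)
# subgrid -> [[12, 13], [22, 23]], origin -> (31.0, 102.0)
```

Only grid points that fall inside the requested box are returned, which is what keeps the delivered dataset small compared with the full field.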

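Cite this article

Jia Xiaozhen, Hu Jiangkai, Wang Dapeng, Liang Chen. Design and Practice of Long-Term Sequential Grid Data Management Platform[J]. Meteorological Science and Technology, 2024, 52(6): 797-806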

History

Received: 2023-12-12
Last revised: 2024-10-09
Published online: 2024-12-25
