Downloading the ERA5 Reanalysis Dataset

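This article answers the following questions:

How can downloads be automated with a script?
How can the horizontal resolution of ERA5 reanalysis data be customized?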

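ERA5 is one of the most commonly used reanalysis datasets in atmospheric science. As I recall, three to five years ago ERA5 was not yet widely available, and only the reanalysis for years in the new century (2000 onward) could be obtained.
Today the latest ERA5 dataset reaches back to before 1970, which makes it a very useful dataset both for numerical modelling and for research on decadal climate change.
Compared with the earlier ERA reanalysis datasets, ERA5 comes with a faster and more efficient data distribution system, the Climate Data Store (CDS).
Download speeds now reach the MB/s range; anyone who remembers the pain of downloading ERA-Interim/20C will appreciate the difference.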

Preparation:

Visit https://cds.climate.copernicus.eu/cdsapp#!/search?text=ERA5, register an account, and log in.

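Downloading ERA5 data through the browser

Taking hourly ERA5 surface parameters as an example:

Open the dataset page ERA5-Land hourly data from 1950 to present; you will see a large number of checkboxes.
Tick the surface parameters you need. They are organized into groups such as Temperature, Lakes, Snow, Soil Water, and Radiation and Heat; here we tick 2m dewpoint temperature as an example.
Select the year, month, day, and time (UTC); here we tick 2023, Jan, 01, 01:00.
Set the region and the data format; here NETCDF3(experimental) is selected.
Click Submit Form to submit the request. You are redirected to the My requests page, where you can follow the request status; once the status changes to download, click it to download the file.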

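This approach is simple, but it has obvious drawbacks:

Submitting and downloading by hand is labor-intensive. That is acceptable for small amounts of data, but downloading many years of data becomes tedious: CDS limits the size of a single request, so the job must be split into sufficiently small pieces before it can be submitted.
As the steps above show, the manual route offers no option for choosing the horizontal resolution. The defaults are 0.25° for ERA5 and 0.1° for ERA5-Land.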
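The size limit mentioned above means that a multi-year job usually has to be split into many small requests. The following is a minimal sketch of one way to do that, with one request per (year, month). The helper name `monthly_requests` and the base request are hypothetical, the code only builds dictionaries without contacting CDS, and the right chunk size depends on the current CDS limits, which are not stated here.

```python
from itertools import product


def monthly_requests(base_request, years, months):
    """Yield one request dict per (year, month), copied from base_request.

    Hypothetical helper for illustration: it does not talk to CDS.
    """
    for year, month in product(years, months):
        request = dict(base_request)  # shallow copy, the template stays untouched
        request["year"] = str(year)
        request["month"] = f"{int(month):02d}"  # zero-padded, e.g. '01'
        yield request


base = {"variable": "2m_dewpoint_temperature", "format": "netcdf"}
chunks = list(monthly_requests(base, range(2019, 2023), range(1, 13)))
print(len(chunks))  # 4 years x 12 months = 48 requests
```

Each dictionary could then be completed with days, times, and area before being submitted on its own.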

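Automated ERA5 downloads with the CDS API

CDS provides a dedicated data download API that can be called from Python scripts and similar tools. This route offers more parameters and more freedom in choosing them.
This article describes a workflow on Windows based on VS Code + Python + cdsapi + the Jupyter extension.
Note: the method described here uses Python 3. If you already have a Python interpreter, you can run the script directly from the command line without installing VS Code; skip ahead to step 4 of the environment setup below (logging in to CDS and copying the API credentials):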

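Environment setup

Download the VS Code editor and install it with the default settings. To avoid trouble later, tick options such as "Add to PATH" during installation.
Install the Python interpreter with the default settings, again ticking options such as "Add to PATH".
Install the cdsapi package: open a new terminal in VS Code and run pip3 install cdsapi.
Log in to CDS. Go to https://cds.climate.copernicus.eu/api-how-to and copy the two lines shown in the black command-line box: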

plaintext

url: https://cds.climate.copernicus.eu/api/v2
key: 2100??:5cb??b6-????-4??6-98?f-e9e???????da

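These two lines are the credentials the script uses to access the API. They are generated automatically for each account, so you need to copy your own from that page.
On Windows, create a file named .cdsapirc in your user home directory (here C:\Users\jiheng\), paste in the two lines you copied, and save it.
On Linux, create the same file with the same contents in your home directory (here /home/jiheng/).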
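Creating a dot-file by hand can be awkward on Windows, so the same step can also be done from Python. This is only a sketch: `write_cdsapirc` is a hypothetical helper, the key shown is a placeholder rather than a real credential, and the demo writes into a temporary folder so that nothing in your real home directory is overwritten.

```python
import tempfile
from pathlib import Path


def write_cdsapirc(url, key, home=None):
    """Write the two credential lines to <home>/.cdsapirc and return the path.

    If home is None, the current user's home directory is used.
    """
    home = Path(home) if home is not None else Path.home()
    rc_path = home / ".cdsapirc"
    rc_path.write_text(f"url: {url}\nkey: {key}\n", encoding="utf-8")
    return rc_path


# Demo in a throwaway directory; "UID:API-KEY" is a placeholder, not a real key.
with tempfile.TemporaryDirectory() as tmp:
    path = write_cdsapirc("https://cds.climate.copernicus.eu/api/v2", "UID:API-KEY", home=tmp)
    print(path.read_text(encoding="utf-8"))
```

Calling `write_cdsapirc(url, key)` without `home` would write to the real home directory on either Windows or Linux.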

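Downloading with a script

Getting the script:
Open the dataset page ERA5-Land hourly data from 1950 to present, tick the variables and times you need, and fill in the region and data format.
Click Show API request at the very bottom of the page; a Python script appears, which you can copy to your machine.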

python

import cdsapi
c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-land',
    {
        'variable': '2m_dewpoint_temperature',
        'year': '2023',
        'month': '01',
        'day': '01',
        'time': '00:00',
        'area': [
            90, -180, -90,
            180,
        ],
        'format': 'netcdf',
    },
    'download.nc')

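Create a folder for the downloaded data, for example G:\ERA5-Land, and save the Python script you copied there as era5-land-download-example.py.
Open that folder in VS Code so that it becomes the working directory, then open and run the script. This matters: if you open the .py file directly, the data is saved to your home directory by default instead of the download folder.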
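The working-directory pitfall arises because a bare file name such as 'download.nc' is resolved relative to wherever the script happens to run. One possible way around it, sketched here under the assumption that you are willing to hard-code the download folder, is to build an absolute output path. `target_path` is a hypothetical helper, not part of cdsapi.

```python
import tempfile
from pathlib import Path


def target_path(download_dir, filename):
    """Return an absolute path inside download_dir, creating the folder if needed."""
    folder = Path(download_dir).expanduser().resolve()
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename


# Demo in a temporary folder. In the real script you would pass your own
# download folder, e.g. target_path(r"G:\ERA5-Land", "download.nc"), and give
# str(...) of the result to c.retrieve as the output file name.
with tempfile.TemporaryDirectory() as tmp:
    path = target_path(Path(tmp) / "ERA5-Land", "download.nc")
    print(path.is_absolute(), path.parent.is_dir())  # True True
```

With an absolute path the file lands in the same place no matter how the script is launched.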

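How to download more elegantly?

This is still not very elegant: we also want to customize the resolution and adapt the script to larger jobs. The script below is one possible setup, offered for reference: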

python

#%%
import os
import cdsapi
c = cdsapi.Client()

### specify the index of selected region 
ia =3
## Skip certain areas or years (e.g. ones already downloaded)
downloadedAreas=['',]
downloadedYears=['',]

## index:  0      1      2     3     4     5     6     7
Regions=['ASIA','EUAF','AFR','NAM','SAM','MLY','AUS','NZD',]#'EBor','WBor','antarctic','arctic']

Areas=  [
        [60, 70, 0, 145,],   ### ASIA
        [60,-20,0,70,],       ## EUAF
        [0, 7, -36, 52,],  ## AFR
        [60,-140,15,-50,],  ## NAM
        [15,-95,-56,-34,],  ## SAM
        [0, 97, -11, 163,],   ### MLY
        [-11, 113, -44, 155,], ### AUS
        [-34, 166, -48, 179,], ### NZD
        # [75, 0, 60, 180,], ### EBor
        # [75, -180, 60, -15,], ### WBor
        # [-56, -180, -90, 180,], ### Antarctic
        # [75, -180, 90, 180,], ### Arctic
        ]

''' general setups '''
Years=['2019','2020','2021','2022',]
Months=['01','02','03','04','05','06','07','08','09','10','11','12']
Days=['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20',
             '21','22','23','24','25','26','27','28','29','30','31']
utc_times= ['00:00', '01:00', '02:00','03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00','09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00','15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00','21:00', '22:00', '23:00',]
leapyears=['2012','2016','2020',]

# 29 Layers
plevels=['50', '70', '100','125', '150', '175','200', '225', '250','300', '350', '400', '450', '500', '550',
        '600', '650', '700', '750', '775', '800', '825', '850', '875','900', '925', '950','975', '1000',]

params=['geopotential', 'relative_humidity', 'temperature',]

resolution= ['0.25', '0.25']        

''''''
## loops for downloading
area = Areas[ia]
# for ia, area in enumerate(Areas):
if not os.path.exists(Regions[ia]):
    os.mkdir(Regions[ia])
UTC=utc_times
for iy, year in enumerate(Years):  
    if (Regions[ia] in downloadedAreas) or year in downloadedYears:
        print(Regions[ia]+ " no need  for "+year)
        continue  
    for im, month in enumerate(Months):          
        for id, day in enumerate(Days):
            if month in ['04','06','09','11'] and day>='31':
                continue
            if year in leapyears and month=='02' and day>='30':
                continue
            if (year not in leapyears) and month=='02' and day>='29':
                continue
            ncFileName = Regions[ia]+'/'+year+'/ERA5-PL-29L-0P25-'+Regions[ia]+'-'+year+month+day+'.nc'
            if os.path.exists(ncFileName):
                continue
            if not os.path.exists(Regions[ia]+'/'+year):
                os.mkdir(Regions[ia]+'/'+year)   
           # ncFileName=Regions[ia]+'/'+year+'/ERA5-PL-29L-0P25-'+Regions[ia]+'-'+year+month+day+'.nc'
            c.retrieve('reanalysis-era5-pressure-levels',
                {   'product_type': 'reanalysis',
                    'variable': params,
                    'pressure_level': plevels,
                    'year': year,
                    'month': month,
                    'day':   day,
                    'time': UTC,
                    'format': 'netcdf',
                    'area': area,
                    'grid': resolution,   ### custom horizontal resolution
                },
                ncFileName)
# %%
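
The script above skips invalid dates with string comparisons and a hand-maintained `leapyears` list, which has to be extended whenever `Years` gains another leap year (2024, for instance). A possible alternative, sketched here with the standard library's `calendar.monthrange`, derives the valid days directly. `days_in_month` is a hypothetical helper and not part of the original script.

```python
import calendar


def days_in_month(year, month):
    """Return zero-padded day strings ('01', '02', ...) valid for this month."""
    n_days = calendar.monthrange(int(year), int(month))[1]
    return [f"{day:02d}" for day in range(1, n_days + 1)]


print(len(days_in_month("2020", "02")))  # 29 (2020 is a leap year)
print(len(days_in_month("2021", "02")))  # 28
print(days_in_month("2021", "04")[-1])   # 30
```

Looping over `days_in_month(year, month)` instead of `Days` would make the three `continue` checks unnecessary.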

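In addition, if you use the VS Code Jupyter extension, you can click Run Cell to open an interactive window and run the download there. By creating several scripts you can run multiple downloads side by side.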
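One simple way to divide the work between side-by-side scripts is to give each script its own subset of `Years`. The sketch below deals the years out in turn; `split_round_robin` is a hypothetical helper. Note that CDS may queue or limit concurrent requests per account, so more scripts do not necessarily mean faster downloads.

```python
def split_round_robin(items, n_parts):
    """Split items into n_parts lists, dealing them out in turn."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    return [list(items[i::n_parts]) for i in range(n_parts)]


years = ['2019', '2020', '2021', '2022']
for index, part in enumerate(split_round_robin(years, 2)):
    print(index, part)
# 0 ['2019', '2021']
# 1 ['2020', '2022']
```

Each resulting list can be pasted into the `Years` variable of one copy of the script.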

Code for other products

ERA5 atmospheric profiles (pressure levels):

python

#%%
import os
import cdsapi
c = cdsapi.Client()

### specify the index of selected region 

ia =0

## Skip certain areas or years (e.g. ones already downloaded)
downloadedAreas=['',]
downloadedYears=['',]

## index:  0      1      2     3     4     5     6     7
Regions=['ASIA','EUAF','AFR','NAM','SAM','MLY','AUS','NZD',]#'EBor','WBor','antarctic','arctic']

Areas=  [
        [60, 70, 0, 145,],   ### ASIA
        [60,-20,0,70,],       ## EUAF
        [0, 7, -36, 52,],  ## AFR
        [60,-140,15,-50,],  ## NAM
        [15,-95,-56,-34,],  ## SAM
        [0, 97, -11, 163,],   ### MLY
        [-11, 113, -44, 155,], ### AUS
        [-34, 166, -48, 179,], ### NZD
        # [75, 0, 60, 180,], ### EBor
        # [75, -180, 60, -15,], ### WBor
        # [-56, -180, -90, 180,], ### Antarctic
        # [75, -180, 90, 180,], ### Arctic
        ]

''' general setups '''
Years=['2019','2020','2021','2022']
Months=['01','02','03','04','05','06','07','08','09','10','11','12']
Days=['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20',
             '21','22','23','24','25','26','27','28','29','30','31']
utc_times= ['00:00', '01:00', '02:00','03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00','09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00','15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00','21:00', '22:00', '23:00',]
leapyears=['2012','2016','2020',]

# 29 Layers
plevels=['50', '70', '100','125', '150', '175','200', '225', '250','300', '350', '400', '450', '500', '550',
        '600', '650', '700', '750', '775', '800', '825', '850', '875','900', '925', '950','975', '1000',]

params=['geopotential', 'relative_humidity', 'temperature',]

resolution= ['0.25', '0.25']        

''''''
## loops for downloading
area = Areas[ia]
# for ia, area in enumerate(Areas):
if not os.path.exists(Regions[ia]):
    os.mkdir(Regions[ia])
UTC=utc_times
for iy, year in enumerate(Years):  
    if (Regions[ia] in downloadedAreas) or year in downloadedYears:
        print(Regions[ia]+ " no need  for "+year)
        continue  
    for im, month in enumerate(Months):          
        for id, day in enumerate(Days):
            if month in ['04','06','09','11'] and day>='31':
                continue
            if year in leapyears and month=='02' and day>='30':
                continue
            if (year not in leapyears) and month=='02' and day>='29':
                continue
            ncFileName = Regions[ia]+'/'+year+'/ERA5-PL-29L-0P25-'+Regions[ia]+'-'+year+month+day+'.nc'
            if os.path.exists(ncFileName):
                continue
            if not os.path.exists(Regions[ia]+'/'+year):
                os.mkdir(Regions[ia]+'/'+year)   
            c.retrieve('reanalysis-era5-pressure-levels',
                {   'product_type': 'reanalysis',
                    'variable': params,
                    'pressure_level': plevels,
                    'year': year,
                    'month': month,
                    'day':   day,
                    'time': UTC,
                    'format': 'netcdf',
                    'area': area,
                    'grid': resolution,
                },
                ncFileName)
# %%

ERA5 single level:

python

#%%
import os
import cdsapi
c = cdsapi.Client()

### specify the index of selected region 

ia =0

## Skip certain areas or years (e.g. ones already downloaded)
downloadedAreas=['',]
downloadedYears=['',]

## index:  0      1      2     3     4     5     6     7
Regions=['ASIA','EUAF','AFR','NAM','SAM','MLY','AUS','NZD',]#'EBor','WBor','antarctic','arctic']

Areas=  [
        [60, 70, 0, 145,],   ### ASIA
        [60,-20,0,70,],       ## EUAF
        [0, 7, -36, 52,],  ## AFR
        [60,-140,15,-50,],  ## NAM
        [15,-95,-56,-34,],  ## SAM
        [0, 97, -11, 163,],   ### MLY
        [-11, 113, -44, 155,], ### AUS
        [-34, 166, -48, 179,], ### NZD
        # [75, 0, 60, 180,], ### EBor
        # [75, -180, 60, -15,], ### WBor
        # [-56, -180, -90, 180,], ### Antarctic
        # [75, -180, 90, 180,], ### Arctic
        ]

''' general setups '''
Years=['2011','2012','2013','2014','2015','2016','2017','2018','2019','2020','2021','2022',]
Months=['01','02','03','04','05','06','07','08','09','10','11','12']
Days=['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31']
utc_times= ['00:00', '01:00', '02:00','03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00','09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00','15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00','21:00', '22:00', '23:00',]
leapyears=['2012','2016','2020',]

params= ['cloud_base_height', 'high_vegetation_cover', 'leaf_area_index_high_vegetation',
            'leaf_area_index_low_vegetation', 'low_vegetation_cover', 'soil_type',
            'total_cloud_cover', 'total_column_cloud_ice_water', 'total_column_cloud_liquid_water',
            'total_column_rain_water', 'total_column_snow_water', 'type_of_high_vegetation',
            'type_of_low_vegetation',]

resolution= ['0.25', '0.25']        

''''''
## loops for downloading
area = Areas[ia]
# for ia, area in enumerate(Areas):
if not os.path.exists(Regions[ia]):
    os.mkdir(Regions[ia])
UTC=utc_times
for iy, year in enumerate(Years):  
    if (Regions[ia] in downloadedAreas) or year in downloadedYears:
        print(Regions[ia]+ " no need  for "+year)
        continue  
    for im, month in enumerate(Months):          
        for id, day in enumerate(Days):
            if month in ['04','06','09','11'] and day>='31':
                continue
            if year in leapyears and month=='02' and day>='30':
                continue
            if (year not in leapyears) and month=='02' and day>='29':
                continue
            if not os.path.exists(Regions[ia]+'/'+year):
                os.mkdir(Regions[ia]+'/'+year)   
            ncFileName=Regions[ia]+'/'+year+'/ERA5-SingleL-0P25-'+Regions[ia]+'-'+year+month+day+'.nc'
            c.retrieve('reanalysis-era5-single-levels',
                        {   'variable':params,
                            'year': year,
                            'month': month,
                            'day': day,
                            'time': UTC,
                            'area': area,
                            'product_type': 'reanalysis',
                            'format': 'netcdf',
                            'grid': resolution,
                        },
                        ncFileName)
# %%
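
The scripts above all pass an `area` and a `grid` to CDS, and together these determine how large each file is. As a rough aid for choosing a resolution, the sketch below estimates the number of grid points in a region. `grid_shape` is a hypothetical helper, it assumes a regular grid with both edges included and a `grid` order of [longitude step, latitude step], and the file CDS actually returns may differ slightly.

```python
def grid_shape(area, grid):
    """Estimate (n_lat, n_lon) for area = [north, west, south, east] in degrees.

    grid is assumed to be [longitude step, latitude step], given as numbers or
    strings. This is an estimate only, not a value reported by CDS.
    """
    north, west, south, east = area
    d_lon, d_lat = (float(step) for step in grid)
    n_lat = round((north - south) / d_lat) + 1
    n_lon = round((east - west) / d_lon) + 1
    return n_lat, n_lon


asia = [60, 70, 0, 145]  # the ASIA region from the scripts above
print(grid_shape(asia, ['0.25', '0.25']))  # (241, 301)
print(grid_shape(asia, ['1.0', '1.0']))    # (61, 76)
```

Multiplying the two numbers by the count of variables, levels, and time steps gives a feel for how a coarser `grid` shrinks a request.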