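This post answers two questions:

  1. How can downloads be automated with a script?
  2. How can the horizontal resolution of ERA5 reanalysis data be customized?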

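ERA5 is one of the most commonly used reanalysis datasets in atmospheric science. As I recall, three to five years ago ERA5 was not yet widely available, and only the reanalysis data covering the years since 2000 could be obtained.
Today the latest ERA5 dataset extends back to before 1970, which makes it a very handy dataset for both numerical modelling and studies of decadal climate change.
Compared with earlier ERA reanalysis datasets, ERA5 comes with a faster and more efficient data distribution system, the Climate Data Store (CDS).
Download speeds now reach the MB/s range; remembering how painful it used to be to download ERA-Interim/20C, one cannot help but sigh.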

Preparation:

Visit https://cds.climate.copernicus.eu/cdsapp#!/search?text=ERA5, register an account, and log in.

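Downloading ERA5 data through the browser

Take the hourly ERA5 land-surface parameters as an example:

  1. Open the dataset page "ERA5-Land hourly data from 1950 to present"; you will see a large number of checkboxes.
  2. Tick the surface parameters you need. They are organized into groups such as Temperature, Lakes, Snow, Soil Water, Radiation and Heat…; here we tick 2m dewpoint temperature as an example.
  3. Tick the year, month, day and time (UTC); here we tick 2023, Jan, 01, 01:00.
  4. Set the region and the data format; here NETCDF3 (experimental) is selected.
  5. Click Submit Form to submit the task. The page jumps to My requests, where you can follow the task status; once the status changes to Download, click it to download the file.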

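This approach is simple, but it has obvious drawbacks:

  1. Submitting and downloading by hand is labor-intensive. That is tolerable for a small amount of data, but downloading many years of data becomes tedious: CDS limits the size of a single file, so a job must be split into sufficiently small pieces before it can be submitted successfully.
  2. As the steps above show, the manual form offers no option for customizing the horizontal resolution. The defaults are 0.25° for ERA5 and 0.1° for ERA5-Land.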
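
Once requests are scripted (as in the sections that follow), the splitting described in the first point can be automated. The sketch below is only an illustration: the helper name `monthly_requests` is hypothetical, and it assumes that one calendar month per request is small enough for CDS, which depends on the variables and region you select. It builds one request dictionary per month and lists only the days that actually exist.

```python
import calendar


def monthly_requests(variable, years, months=range(1, 13)):
    """Yield (label, request) pairs, one per calendar month, with valid days only."""
    hours = [f"{h:02d}:00" for h in range(24)]
    for year in years:
        for month in months:
            # monthrange returns (weekday of the 1st, number of days in the month)
            n_days = calendar.monthrange(year, month)[1]
            request = {
                "variable": variable,
                "year": str(year),
                "month": f"{month:02d}",
                "day": [f"{d:02d}" for d in range(1, n_days + 1)],
                "time": hours,
            }
            yield f"{year}{month:02d}", request


# Small demonstration: two months of one variable
for label, request in monthly_requests("2m_dewpoint_temperature", [2023], months=[1, 2]):
    print(label, len(request["day"]), "days,", len(request["time"]), "hours per day")
```

Each dictionary can then be completed with keys such as the area and file format and passed to the API call shown later in this post.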

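Automated download of ERA5 datasets with the CDS API

CDS provides a dedicated data download API that can be accessed from Python scripts and similar tools, and it offers more, and more flexible, parameter choices.
This post describes a download workflow on Windows based on VS Code + Python + cdsapi + the Jupyter extension.
Note: the method described here is based on Python 3. If you already have a Python interpreter, you can run the Python script directly from the command line without installing VS Code; skip straight to step 4 below (install cdsapi with pip first, as in step 3, if you have not already).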

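Environment setup

  1. Download the VS Code editor and install it with the default options. To avoid trouble later, tick options such as "Add to PATH" during installation.
  2. Install the Python interpreter with the default options, and tick "Add to PATH" during installation.
  3. Install the cdsapi package: open a new terminal in VS Code and run pip3 install cdsapi.
  4. Log in to CDS, open https://cds.climate.copernicus.eu/api-how-to, and copy the two lines shown in the black command-line box: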
```plaintext
url: https://cds.climate.copernicus.eu/api/v2
key: 2100??:5cb??b6-????-4??6-98?f-e9e???????da
```

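These two lines are the credentials the script uses to access the API. They are generated automatically for each account, so you need to copy them from that page yourself.
On Windows, create a file named .cdsapirc in your user home directory (C:\Users\jiheng\ in my case), paste the two copied lines into it, and save.
On Linux, create the same file with the same content in your home directory (/home/jiheng/ in my case).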
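
If you would rather create the file from Python than by hand, something like the following should work on both Windows and Linux. It is a minimal sketch: `write_cdsapirc` is a hypothetical helper, not part of cdsapi, and the key shown is a placeholder that you must replace with your own. The demonstration writes to a temporary folder so that it cannot overwrite a real configuration file.

```python
import tempfile
from pathlib import Path


def write_cdsapirc(url, key, home=None):
    """Write the two-line .cdsapirc file into `home` (default: the user's home directory)."""
    home_dir = Path(home) if home is not None else Path.home()
    rc_path = home_dir / ".cdsapirc"
    rc_path.write_text(f"url: {url}\nkey: {key}\n", encoding="utf-8")
    return rc_path


# Demonstration in a temporary folder; "<UID>:<API-KEY>" is a placeholder, not a real key.
with tempfile.TemporaryDirectory() as tmp:
    rc = write_cdsapirc("https://cds.climate.copernicus.eu/api/v2", "<UID>:<API-KEY>", home=tmp)
    print(rc.read_text(encoding="utf-8"))
```

Calling `write_cdsapirc(url, key)` without the `home` argument writes to your actual home directory, which is where the steps above place the file.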

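Downloading with a script

  1. Get the script:
    Open the dataset page "ERA5-Land hourly data from 1950 to present", tick the variables and times you need, and fill in the region and data format.
    Click Show API request at the very bottom; a Python script appears, which you copy to your machine.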

    ```python
    import cdsapi

    c = cdsapi.Client()
    c.retrieve(
        'reanalysis-era5-land',
        {
            'variable': '2m_dewpoint_temperature',
            'year': '2023',
            'month': '01',
            'day': '01',
            'time': '00:00',
            'area': [
                90, -180, -90,
                180,
            ],
            'format': 'netcdf',
        },
        'download.nc')
    ```
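  2. Create a download folder, for example G:\ERA5-Land, and save the copied Python script there as era5-land-download-example.py.

  3. Open that folder in VS Code as the working directory, then open and run the script. (Opening the folder matters: if you open the .py file directly, the data is saved to your user home directory by default rather than to the download folder.)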
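
Another way to avoid the working-directory pitfall in step 3 is to pass an absolute output path to the download call, so that the result no longer depends on where the script was started. The helper below is a sketch; `target_path` is a hypothetical name, and the demonstration uses a temporary folder in place of a real one such as G:\ERA5-Land.

```python
import tempfile
from pathlib import Path


def target_path(download_dir, filename):
    """Return an absolute path for `filename` inside `download_dir`, creating the folder if needed."""
    folder = Path(download_dir).expanduser().resolve()
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename


# Demonstration with a temporary folder standing in for the real download folder
with tempfile.TemporaryDirectory() as tmp:
    out = target_path(Path(tmp) / "ERA5-Land", "download.nc")
    print(out.name, out.is_absolute(), out.parent.is_dir())
```

In the script from step 1, the last argument `'download.nc'` could then be replaced with `str(target_path(r'G:\ERA5-Land', 'download.nc'))`.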

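A more elegant way to download

This is still not elegant enough: we also need to customize the resolution and adapt the script. Below is one possible download job for reference.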

```python
#%%
import os
import cdsapi

c = cdsapi.Client()

### specify the index of the selected region
ia = 3
## skip certain areas or years (e.g. those already downloaded)
downloadedAreas = ['',]
downloadedYears = ['',]

## index:  0      1      2     3     4     5     6     7
Regions = ['ASIA','EUAF','AFR','NAM','SAM','MLY','AUS','NZD',]#'EBor','WBor','antarctic','arctic']

## each area is [North, West, South, East] in degrees
Areas = [
    [60, 70, 0, 145,],       ### ASIA
    [60, -20, 0, 70,],       ## EUAF
    [0, 7, -36, 52,],        ## AFR
    [60, -140, 15, -50,],    ## NAM
    [15, -95, -56, -34,],    ## SAM
    [0, 97, -11, 163,],      ### MLY
    [-11, 113, -44, 155,],   ### AUS
    [-34, 166, -48, 179,],   ### NZD
    # [75, 0, 60, 180,],     ### EBor
    # [75, -180, 60, -15,],  ### WBor
    # [-56, -180, -90, 180,], ### Antarctic
    # [75, -180, 90, 180,],  ### Arctic
]

## general setup
Years = ['2019','2020','2021','2022',]
Months = ['01','02','03','04','05','06','07','08','09','10','11','12']
Days = ['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20',
        '21','22','23','24','25','26','27','28','29','30','31']
utc_times = ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
             '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
             '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
             '18:00', '19:00', '20:00', '21:00', '22:00', '23:00',]
# leap years; extend this list if Years goes beyond the period it covers
leapyears = ['2012','2016','2020',]

# 29 pressure levels (hPa)
plevels = ['50', '70', '100', '125', '150', '175', '200', '225', '250', '300', '350', '400', '450', '500', '550',
           '600', '650', '700', '750', '775', '800', '825', '850', '875', '900', '925', '950', '975', '1000',]

params = ['geopotential', 'relative_humidity', 'temperature',]

resolution = ['0.25', '0.25']

## loops for downloading: one request and one file per day
area = Areas[ia]
# for ia, area in enumerate(Areas):
if not os.path.exists(Regions[ia]):
    os.mkdir(Regions[ia])
UTC = utc_times
for iy, year in enumerate(Years):
    if (Regions[ia] in downloadedAreas) or year in downloadedYears:
        print(Regions[ia] + " no need for " + year)
        continue
    for im, month in enumerate(Months):
        for id, day in enumerate(Days):
            # skip dates that do not exist (zero-padded strings compare correctly)
            if month in ['04','06','09','11'] and day >= '31':
                continue
            if year in leapyears and month == '02' and day >= '30':
                continue
            if (year not in leapyears) and month == '02' and day >= '29':
                continue
            ncFileName = Regions[ia]+'/'+year+'/ERA5-PL-29L-0P25-'+Regions[ia]+'-'+year+month+day+'.nc'
            # skip files that were already downloaded
            if os.path.exists(ncFileName):
                continue
            if not os.path.exists(Regions[ia]+'/'+year):
                os.mkdir(Regions[ia]+'/'+year)
            c.retrieve('reanalysis-era5-pressure-levels',
                       {'product_type': 'reanalysis',
                        'variable': params,
                        'pressure_level': plevels,
                        'year': year,
                        'month': month,
                        'day': day,
                        'time': UTC,
                        'format': 'netcdf',
                        'area': area,
                        'grid': resolution,  ### custom horizontal resolution
                        },
                       ncFileName)
# %%
```
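
To get a feeling for how large one daily request from the script above is, the following back-of-the-envelope sketch counts grid points and fields. It rests on assumptions: both edges of the area are taken to be included in the output grid, the same step is used in latitude and longitude, and the helper names `grid_shape` and `fields_per_request` are made up for this illustration. The actual limits are set by CDS and are not reproduced here.

```python
def grid_shape(area, step):
    """Return (n_lat, n_lon) for area = [north, west, south, east] on a regular grid.

    Assumes both edges of the area are included in the output.
    """
    north, west, south, east = area
    n_lat = round((north - south) / step) + 1
    n_lon = round((east - west) / step) + 1
    return n_lat, n_lon


def fields_per_request(n_variables, n_levels, n_times, n_days=1):
    """Number of 2-D fields in one request (variables x levels x times x days)."""
    return n_variables * n_levels * n_times * n_days


area = [60, -140, 15, -50]              # the NAM region (ia = 3) from the script above
n_lat, n_lon = grid_shape(area, 0.25)   # step taken from resolution = ['0.25', '0.25']
n_fields = fields_per_request(3, 29, 24)  # 3 variables, 29 levels, 24 hours, 1 day
print(n_lat, n_lon, n_fields, n_lat * n_lon * n_fields)
```

Halving the grid step roughly quadruples the number of grid points, which is worth keeping in mind when choosing `resolution` and the size of each request.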

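In addition, with the VS Code Jupyter extension you can click Run Cell to open an interactive window for the download; by creating several scripts you can run downloads side by side.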
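
As an alternative to opening several scripts by hand, side-by-side downloads can also be started from a single script with a small thread pool. This is a sketch of a different technique from the one described above, not the author's method: `run_parallel` and `fake_download` are hypothetical names, and `fake_download` merely stands in for a function that would create its own `cdsapi.Client()` and call `retrieve()`. CDS may queue or limit simultaneous requests per account, so more workers do not necessarily mean faster downloads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(download_one, tasks, max_workers=2):
    """Call download_one(task) for every task on a thread pool; results keep the task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download_one, tasks))


def fake_download(region):
    # Stand-in for the real work: create a cdsapi.Client() here and call retrieve().
    return f"{region}: done"


print(run_parallel(fake_download, ["ASIA", "EUAF", "AFR"]))
```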

Code for other products

ERA5 atmospheric profiles (pressure levels):

```python
#%%
import os
import cdsapi

c = cdsapi.Client()

### specify the index of the selected region
ia = 0

## skip certain areas or years (e.g. those already downloaded)
downloadedAreas = ['',]
downloadedYears = ['',]

## index:  0      1      2     3     4     5     6     7
Regions = ['ASIA','EUAF','AFR','NAM','SAM','MLY','AUS','NZD',]#'EBor','WBor','antarctic','arctic']

## each area is [North, West, South, East] in degrees
Areas = [
    [60, 70, 0, 145,],       ### ASIA
    [60, -20, 0, 70,],       ## EUAF
    [0, 7, -36, 52,],        ## AFR
    [60, -140, 15, -50,],    ## NAM
    [15, -95, -56, -34,],    ## SAM
    [0, 97, -11, 163,],      ### MLY
    [-11, 113, -44, 155,],   ### AUS
    [-34, 166, -48, 179,],   ### NZD
    # [75, 0, 60, 180,],     ### EBor
    # [75, -180, 60, -15,],  ### WBor
    # [-56, -180, -90, 180,], ### Antarctic
    # [75, -180, 90, 180,],  ### Arctic
]

## general setup
Years = ['2019','2020','2021','2022']
Months = ['01','02','03','04','05','06','07','08','09','10','11','12']
Days = ['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20',
        '21','22','23','24','25','26','27','28','29','30','31']
utc_times = ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
             '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
             '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
             '18:00', '19:00', '20:00', '21:00', '22:00', '23:00',]
# leap years; extend this list if Years goes beyond the period it covers
leapyears = ['2012','2016','2020',]

# 29 pressure levels (hPa)
plevels = ['50', '70', '100', '125', '150', '175', '200', '225', '250', '300', '350', '400', '450', '500', '550',
           '600', '650', '700', '750', '775', '800', '825', '850', '875', '900', '925', '950', '975', '1000',]

params = ['geopotential', 'relative_humidity', 'temperature',]

resolution = ['0.25', '0.25']

## loops for downloading: one request and one file per day
area = Areas[ia]
# for ia, area in enumerate(Areas):
if not os.path.exists(Regions[ia]):
    os.mkdir(Regions[ia])
UTC = utc_times
for iy, year in enumerate(Years):
    if (Regions[ia] in downloadedAreas) or year in downloadedYears:
        print(Regions[ia] + " no need for " + year)
        continue
    for im, month in enumerate(Months):
        for id, day in enumerate(Days):
            # skip dates that do not exist (zero-padded strings compare correctly)
            if month in ['04','06','09','11'] and day >= '31':
                continue
            if year in leapyears and month == '02' and day >= '30':
                continue
            if (year not in leapyears) and month == '02' and day >= '29':
                continue
            ncFileName = Regions[ia]+'/'+year+'/ERA5-PL-29L-0P25-'+Regions[ia]+'-'+year+month+day+'.nc'
            # skip files that were already downloaded
            if os.path.exists(ncFileName):
                continue
            if not os.path.exists(Regions[ia]+'/'+year):
                os.mkdir(Regions[ia]+'/'+year)
            c.retrieve('reanalysis-era5-pressure-levels',
                       {'product_type': 'reanalysis',
                        'variable': params,
                        'pressure_level': plevels,
                        'year': year,
                        'month': month,
                        'day': day,
                        'time': UTC,
                        'format': 'netcdf',
                        'area': area,
                        'grid': resolution,
                        },
                       ncFileName)
# %%
```

ERA5 single level:

```python
#%%
import os
import cdsapi

c = cdsapi.Client()

### specify the index of the selected region
ia = 0

## skip certain areas or years (e.g. those already downloaded)
downloadedAreas = ['',]
downloadedYears = ['',]

## index:  0      1      2     3     4     5     6     7
Regions = ['ASIA','EUAF','AFR','NAM','SAM','MLY','AUS','NZD',]#'EBor','WBor','antarctic','arctic']

## each area is [North, West, South, East] in degrees
Areas = [
    [60, 70, 0, 145,],       ### ASIA
    [60, -20, 0, 70,],       ## EUAF
    [0, 7, -36, 52,],        ## AFR
    [60, -140, 15, -50,],    ## NAM
    [15, -95, -56, -34,],    ## SAM
    [0, 97, -11, 163,],      ### MLY
    [-11, 113, -44, 155,],   ### AUS
    [-34, 166, -48, 179,],   ### NZD
    # [75, 0, 60, 180,],     ### EBor
    # [75, -180, 60, -15,],  ### WBor
    # [-56, -180, -90, 180,], ### Antarctic
    # [75, -180, 90, 180,],  ### Arctic
]

## general setup
Years = ['2011','2012','2013','2014','2015','2016','2017','2018','2019','2020','2021','2022',]
Months = ['01','02','03','04','05','06','07','08','09','10','11','12']
Days = ['01','02','03','04','05','06','07','08','09','10','11','12','13','14','15','16','17','18','19','20',
        '21','22','23','24','25','26','27','28','29','30','31']
utc_times = ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
             '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
             '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
             '18:00', '19:00', '20:00', '21:00', '22:00', '23:00',]
# leap years; extend this list if Years goes beyond the period it covers
leapyears = ['2012','2016','2020',]

params = ['cloud_base_height', 'high_vegetation_cover', 'leaf_area_index_high_vegetation',
          'leaf_area_index_low_vegetation', 'low_vegetation_cover', 'soil_type',
          'total_cloud_cover', 'total_column_cloud_ice_water', 'total_column_cloud_liquid_water',
          'total_column_rain_water', 'total_column_snow_water', 'type_of_high_vegetation',
          'type_of_low_vegetation',]

resolution = ['0.25', '0.25']

## loops for downloading: one request and one file per day
area = Areas[ia]
# for ia, area in enumerate(Areas):
if not os.path.exists(Regions[ia]):
    os.mkdir(Regions[ia])
UTC = utc_times
for iy, year in enumerate(Years):
    if (Regions[ia] in downloadedAreas) or year in downloadedYears:
        print(Regions[ia] + " no need for " + year)
        continue
    for im, month in enumerate(Months):
        for id, day in enumerate(Days):
            # skip dates that do not exist (zero-padded strings compare correctly)
            if month in ['04','06','09','11'] and day >= '31':
                continue
            if year in leapyears and month == '02' and day >= '30':
                continue
            if (year not in leapyears) and month == '02' and day >= '29':
                continue
            if not os.path.exists(Regions[ia]+'/'+year):
                os.mkdir(Regions[ia]+'/'+year)
            ncFileName = Regions[ia]+'/'+year+'/ERA5-SingleL-0P25-'+Regions[ia]+'-'+year+month+day+'.nc'
            c.retrieve('reanalysis-era5-single-levels',
                       {'variable': params,
                        'year': year,
                        'month': month,
                        'day': day,
                        'time': UTC,
                        'area': area,
                        'product_type': 'reanalysis',
                        'format': 'netcdf',
                        'grid': resolution,
                        },
                       ncFileName)
# %%
```

Reference: https://cds.climate.copernicus.eu/api-how-to