Access Reanalysis Data From MERRA-2
Requirements
- Earthdata login (EDL) credentials.
- Concept Collection ID or DOI for the relevant data product.
- Python >= 3.11.
- Mamba or conda (for example via Miniforge, which uses the conda-forge channel) installed on the machine.
- Familiarity with Jupyter notebooks and Jupyter Lab.
Optional:
- Store all EDL credentials in a .netrc file.
- Basic knowledge of conda environment installation.
Objectives
Download 2 years of fall data over a region of the Atlantic Ocean. The spatial and temporal ranges are defined by the following parameters:
- Time range: 9/21 to 12/22, for years 2024 and 2025.
- Spatial range: -75 < longitude < 15, and 15 < latitude < 65.
To accomplish this goal, the tutorial demonstrates how to:
- Authenticate (via earthaccess).
- Search for all available OPeNDAP URLs of a specific NASA collection, filtered by time range.
- Subset with OPeNDAP by variable name and spatial range.
Install required Python dependencies
In a terminal shell, use mamba (or conda) with the conda-forge channel to install all dependencies required by this tutorial, then activate the environment and launch JupyterLab to run an interactive notebook in a browser.
$ mamba create -n opendap_env -c conda-forge python=3.12 ipython pydap jupyterlab earthaccess netCDF4 xarray numpy
$ mamba activate opendap_env
$ jupyter lab
Once in JupyterLab, import in the first cell all the modules and functions that will be used to stream remote data into local files:
import xarray as xr
import datetime as dt
import earthaccess
import numpy as np
# import pydap-specific tools
from pydap.client import get_cmr_urls, open_url
from pydap.client import to_netcdf as dap_to_netcdf
Finding OPeNDAP URLs with PyDAP
We are interested in MERRA-2 3-hourly data. MERRA-2 has several collections, organized by data product and temporal frequency. In this case, we use 3-hourly instantaneous data from the following collections:
| Short Name | Collection Concept ID | DOI | Variables of Interest |
|---|---|---|---|
| M2I3NPASM | C1276812879-GES_DISC | 10.5067/QBZ6MG944HW0 | U, V, T, EPV, PS |
| M2I3NVCHM | C1276812901-GES_DISC | 10.5067/HO9OVZWF3KW2 | CO, O3 |
MERRA-2 is a Level 4 (model-derived, gridded) data product, so all remote files share the same global longitude and latitude coordinate arrays. Filtering the search by a bounding box would therefore not reduce the list of relevant URLs; the subset by coordinate values is done later by OPeNDAP.
Below are the required parameters to search for all OPeNDAP URLs using PyDAP's get_cmr_urls:
MERRA2_M2I3NPASM_ccid = "C1276812879-GES_DISC" # short name M2I3NPASM
MERRA2_M2I3NVCHM_ccid = "C1276812901-GES_DISC" # short name M2I3NVCHM
# Fall data from 2 years
time_ranges = [[dt.datetime(year, 9, 21), dt.datetime(year, 12, 22)] for year in range(2024, 2026)]
cmr_urls1 = [urls for time in time_ranges for urls in get_cmr_urls(ccid=MERRA2_M2I3NPASM_ccid, time_range=time, limit=1000)]
cmr_urls2 = [urls for time in time_ranges for urls in get_cmr_urls(ccid=MERRA2_M2I3NVCHM_ccid, time_range=time, limit=1000)]
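A quick sanity check on the search results is to compare the number of URLs with the number of days in the requested windows. The sketch below assumes one granule per day (MERRA-2 files are daily, each holding eight 3-hourly time steps) and counts both endpoints of each window; the helper name expected_daily_granules is hypothetical.

```python
import datetime as dt


def expected_daily_granules(time_ranges):
    """Hypothetical helper: count calendar days in each [start, end] window, endpoints included.

    Assumes one granule (file) per day.
    """
    return sum((end.date() - start.date()).days + 1 for start, end in time_ranges)


# Same fall windows as in the tutorial: 9/21 to 12/22 of 2024 and 2025.
time_ranges = [[dt.datetime(year, 9, 21), dt.datetime(year, 12, 22)] for year in range(2024, 2026)]

print(expected_daily_granules(time_ranges))  # 186
```

In the notebook, this number can be compared with len(cmr_urls1) and len(cmr_urls2). A noticeably different count may point at a missing day or at CMR treating the window edges differently than assumed here; the limit=1000 used above is well above the per-window count, so it should not truncate the results.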
EDL Authentication with earthaccess and OPeNDAP
There are various ways to authenticate with NASA, and here we will use earthaccess to retrieve a session object containing all required credentials to access data.
When using earthaccess to "login", you need to define a strategy and you have two options:
- If you already have a .netrc file with your EDL credentials stored on your machine, set strategy="netrc".
- If you DO NOT have a .netrc file with your EDL credentials, or you are not sure, use strategy="interactive" instead.
Either way, the following block tries option 1 first and falls back to option 2 if no .netrc credentials are available.
from earthaccess.exceptions import LoginStrategyUnavailable
try:
    # Use the EDL credentials already stored in your .netrc file
    auth = earthaccess.login(strategy="netrc", persist=True)
except LoginStrategyUnavailable:
    # No usable .netrc: you will be prompted for your EDL username and password
    auth = earthaccess.login(strategy="interactive", persist=True)
# Pass Token Authorization to a new Session
my_session = auth.get_session()
The object my_session carries your EDL authorization, and it will be used to retrieve data from OPeNDAP. Moreover, passing persist=True to earthaccess.login creates a .netrc file that stores your EDL credentials on the machine for later reuse.
Use OPeNDAP to subset data by coordinate values and variable names
The goal is to run the following code block for each collection:
dap_to_netcdf(
    urls,
    session=my_session,
    output_path=output_path,
    dim_slices=dim_slices,
    keep_variables=keep_vars,
)
where:
- output_path: a user-defined directory path where the files will be stored. If not specified, PyDAP streams data into the current directory.
- dim_slices: a dictionary mapping each dimension name to the index range to keep along it (here, the spatial subset).
- keep_variables: a list of the fully qualified names of all variables in the remote file that will be downloaded.
The function dap_to_netcdf above is an alias for pydap.client.to_netcdf (see the import). It works exclusively with the DAP4 protocol.
Below we show how to define dim_slices and keep_variables from OPeNDAP metadata, downloading only the minimal data needed to identify the dimension slices that correspond to the desired coordinate values.
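PyDAP uses the DAP4 protocol when a URL carries the dap4:// scheme in place of https://, which is why the next subsection rewrites the first URL of each collection with str.replace before opening it with Xarray. A plain str.replace("https", "dap4") changes every occurrence of "https" in the string, though. A minimal sketch that switches only the scheme, using urllib.parse from the standard library, is below; the helper name to_dap4 and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_dap4(url: str) -> str:
    """Hypothetical helper: return `url` with its scheme switched to dap4.

    Only the scheme changes; host, path, query string and fragment are kept.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https", "dap4"):
        raise ValueError(f"unexpected URL scheme: {parts.scheme!r}")
    return urlunsplit(("dap4", parts.netloc, parts.path, parts.query, parts.fragment))


# Hypothetical URL whose path happens to contain "https".
url = "https://opendap.example.gov/data/https_mirror/file.nc4"
print(to_dap4(url))                  # dap4://opendap.example.gov/data/https_mirror/file.nc4
print(url.replace("https", "dap4"))  # dap4://opendap.example.gov/data/dap4_mirror/file.nc4
```

For the NASA URLs returned by get_cmr_urls, "https" normally appears only in the scheme, so both approaches give the same result; the helper simply makes that assumption unnecessary.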
Subset by Coordinate Values
In DAP4, the remote server currently does not subset by coordinate value, only by index ranges along each dimension (dimension slices). The MERRA-2 products used here are Level 4 products whose horizontal dimensions are lat and lon, and these dimension arrays also serve as the coordinates. Xarray automatically downloads dimension arrays when it creates a Dataset from OPeNDAP metadata, so the coordinate values needed to compute the slices are available right away (this is not the general case for every data product).
# Create an Xarray Dataset object. It eagerly downloads all dimension data, which in this case
# facilitates our workflow since `lat` and `lon` are dimension data.
# Replace only the first "https" (the URL scheme) so that PyDAP uses the DAP4 protocol.
ds1 = xr.open_dataset(cmr_urls1[0].replace("https", "dap4", 1), engine="pydap", session=my_session)
ds2 = xr.open_dataset(cmr_urls2[0].replace("https", "dap4", 1), engine="pydap", session=my_session)
# Min/max of lon values & Min/Max of lat values
lat_min, lat_max = 15, 65
lon_min, lon_max = -75, 15
Lon1, Lat1 = ds1['lon'], ds1['lat']
Lon2, Lat2 = ds2['lon'], ds2['lat']
iLon1 = np.where((Lon1>lon_min)&(Lon1 < lon_max))[0]
iLon2 = np.where((Lon2>lon_min)&(Lon2 < lon_max))[0]
iLat1 = np.where((Lat1>lat_min)&(Lat1 < lat_max))[0]
iLat2 = np.where((Lat2>lat_min)&(Lat2 < lat_max))[0]
# Check that both collections yield the same index arrays (same grid)
assert np.array_equal(iLon1, iLon2) and np.array_equal(iLat1, iLat2)
# Both collections share the grid, so use a single set of indices
iLon = iLon1
iLat = iLat1
# ======================================================
# Create input argument for Streaming a subset of data
# ======================================================
dim_slices = {
'lon': (iLon[0], iLon[-1]),
'lat': (iLat[0], iLat[-1]),
}
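The index search above can be tried without network access on a synthetic grid. The sketch below assumes the standard MERRA-2 horizontal grid (576 longitudes starting at -180 in 0.625° steps, 361 latitudes from -90 to 90 in 0.5° steps); in the notebook the real arrays come from ds1['lon'] and ds1['lat']. It uses the same strict inequalities as the tutorial, so grid points lying exactly on a boundary (lon = -75 or 15, lat = 15 or 65) are excluded.

```python
import numpy as np

# Assumed MERRA-2 horizontal grid (illustrative; use ds1['lon'], ds1['lat'] in practice).
lon = np.arange(-180, 180, 0.625)  # 576 points
lat = np.linspace(-90, 90, 361)    # 361 points, 0.5 degree spacing

lat_min, lat_max = 15, 65
lon_min, lon_max = -75, 15

# Indices strictly inside the box, as in the tutorial.
iLon = np.where((lon > lon_min) & (lon < lon_max))[0]
iLat = np.where((lat > lat_min) & (lat < lat_max))[0]

# First and last index along each dimension, converted to plain Python ints.
dim_slices = {
    "lon": (int(iLon[0]), int(iLon[-1])),
    "lat": (int(iLat[0]), int(iLat[-1])),
}
print(dim_slices)  # {'lon': (169, 311), 'lat': (211, 309)}
print(lon[iLon[[0, -1]]], lat[iLat[[0, -1]]])
```

If the boundary points should be kept, >= and <= would include them. How PyDAP interprets the second value of each tuple (inclusive or exclusive stop) is defined by PyDAP itself, so it is worth checking its documentation before changing the slices.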
Subset by Variable Names
Having access to the Xarray datasets for each collection, we can now inspect the (OPeNDAP-created) metadata in these Dataset objects and identify the variables of interest.
# Variables from M2I3NPASM
Variables1 = ["/U", "/V", "/T", "/EPV", "/PS"]
dims1 = list(set(["/"+dim for var in Variables1 for dim in ds1[var[1:]].dims]))
# Variables from M2I3NVCHM
Variables2 = ["/CO", "/O3"]
dims2 = list(set(["/"+dim for var in Variables2 for dim in ds2[var[1:]].dims]))
# Add dimensions to Variable lists from each collection
Variables1 += dims1
Variables2 += dims2
Both Variables1 and Variables2 identify the variables to download from each collection. NOTE: All names must be fully qualified names, meaning they identify the variable's full path inside the hierarchical data file (e.g., "/U" is the variable U in the root group). This is particularly useful when the remote file has groups (which behave like folders on a local machine), and it is necessary for the DAP4 protocol, where full paths avoid variable name collisions.
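The list-building step above can be sketched offline with a plain dictionary standing in for the Dataset. The variable-to-dimension mapping below is illustrative (an assumption modeled on a pressure-level product, not read from the server), and the helper name fqn_keep_list is hypothetical. Using sorted() instead of list(set(...)) produces the same names in a stable order, which makes the request easier to compare across runs.

```python
def fqn_keep_list(var_dims: dict[str, tuple[str, ...]], variables: list[str]) -> list[str]:
    """Hypothetical helper: fully qualified names of `variables` plus every dimension they use."""
    names = ["/" + v for v in variables]
    # Collect each dimension once, in a deterministic (sorted) order.
    dims = sorted({"/" + d for v in variables for d in var_dims[v]})
    return names + [d for d in dims if d not in names]


# Illustrative dims (assumed), mimicking ds1[var].dims for a pressure-level product.
var_dims = {
    "U": ("time", "lev", "lat", "lon"),
    "T": ("time", "lev", "lat", "lon"),
    "PS": ("time", "lat", "lon"),
}
print(fqn_keep_list(var_dims, ["U", "T", "PS"]))
# ['/U', '/T', '/PS', '/lat', '/lev', '/lon', '/time']
```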
Stream data into a local directory
# For this example, we will place the data in the data directory
output_path = "./data"
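Whether to_netcdf creates a missing output directory on its own is not covered here, so a cautious option is to create it before streaming. A minimal sketch using pathlib from the standard library:

```python
from pathlib import Path

output_path = "./data"
# Create ./data (and any missing parent directories); do nothing if it already exists.
Path(output_path).mkdir(parents=True, exist_ok=True)
```

Because of exist_ok=True, the cell can be re-run safely.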
We now stream only the data of interest into local files, using PyDAP and the remote OPeNDAP Hyrax data server (all via the DAP4 protocol).
# Stream data from collection M2I3NPASM
dap_to_netcdf(cmr_urls1,
session=my_session,
keep_variables=Variables1,
dim_slices=dim_slices,
output_path=output_path)
# Stream data from collection M2I3NVCHM
dap_to_netcdf(cmr_urls2,
session=my_session,
keep_variables=Variables2,
dim_slices=dim_slices,
output_path=output_path)
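To get a feel for how much the subset saves, a back-of-the-envelope estimate for one M2I3NPASM file helps. The numbers below are assumptions, not values read from the server: uncompressed float32 values, 8 time steps per daily file, 42 pressure levels for U, V, T and EPV, and the 143 × 99 (lon × lat) box that the strict inequalities select on the standard MERRA-2 grid of 576 × 361 points. Compression in the written netCDF files will change the on-disk size.

```python
BYTES_PER_VALUE = 4  # float32 (assumed)
N_TIME = 8           # 3-hourly steps in a daily file (assumed)
N_LEV = 42           # pressure levels (assumed)


def file_bytes(n_lon: int, n_lat: int) -> int:
    """Uncompressed bytes for U, V, T, EPV (3D) plus PS (2D) in one daily file."""
    per_level = n_lon * n_lat
    three_d = 4 * N_TIME * N_LEV * per_level  # U, V, T, EPV
    two_d = 1 * N_TIME * per_level            # PS
    return (three_d + two_d) * BYTES_PER_VALUE


full = file_bytes(576, 361)   # whole globe
subset = file_bytes(143, 99)  # Atlantic box
print(f"full: {full / 1e6:.0f} MB, subset: {subset / 1e6:.1f} MB, ratio: {subset / full:.1%}")
# full: 1125 MB, subset: 76.6 MB, ratio: 6.8%
```

Under the same assumptions, the 186 daily files of two fall seasons amount to roughly 14 GB for the subset instead of about 209 GB for the full globe.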
References
Global Modeling And Assimilation Office, & Pawson, S. (2015). MERRA-2 inst3_3d_asm_Np: 3d,3-Hourly,Instantaneous,Pressure-Level,Assimilation,Assimilated Meteorological Fields V5.12.4. NASA Goddard Earth Sciences Data and Information Services Center. https://doi.org/10.5067/QBZ6MG944HW0
Global Modeling And Assimilation Office, & Pawson, S. (2015). MERRA-2 inst3_3d_chm_Nv: 3d,3-Hourly,Instantaneous,Model-Level,Assimilation,Carbon Monoxide and Ozone Mixing Ratio V5.12.4. NASA Goddard Earth Sciences Data and Information Services Center. https://doi.org/10.5067/HO9OVZWF3KW2
Cite this Tutorial
Jimenez-Urias, M. A. (2026). Access Reanalysis Data From MERRA-2 Via OPeNDAP. Zenodo. https://doi.org/10.5281/zenodo.19475636
@misc{jimenez_urias_2026_19475636,
author = {Jimenez-Urias, Miguel Angel},
title = {Access Reanalysis Data From MERRA-2 Via OPeNDAP},
month = apr,
year = 2026,
publisher = {Zenodo},
doi = {10.5281/zenodo.19475636},
url = {https://doi.org/10.5281/zenodo.19475636},
}
