Access Gridded Ocean Chlorophyll A data from NASA’s PACE mission
Requirements
- Earthdata login (EDL) credentials.
- Concept Collection ID or DOI for the relevant data product.
- Python >= 3.11.
- Mamba (or conda) installed on the machine, with access to the conda-forge channel.
- Familiarity with Jupyter notebooks and Jupyter Lab.
Optional:
- Store all EDL credentials in a .netrc file.
- Basic knowledge of conda environment installation.
Objectives
To download 6 months of Chlorophyll A data in a region of the Atlantic Ocean. The spatial and temporal range is defined by the following parameters:
- Time range: 01/01/2025 – 06/30/2025.
- Spatial range: -96 < longitude < 10, and 6 < latitude < 70.
To accomplish this goal, the tutorial demonstrates how to:
- Authenticate (via earthaccess).
- Search for all available NASA OPeNDAP URLs for a specific NASA collection. The search will further filter by time range.
- Subset with OPeNDAP, by variable name and spatial / temporal range.
Install required python dependencies
In a terminal shell, use mamba or conda to install from conda-forge all dependencies required to run this tutorial, activate the environment, and launch an interactive Jupyter notebook in a browser.
$ mamba create -n opendap_env -c conda-forge python=3.12 ipython pydap jupyterlab earthaccess netCDF4
$ mamba activate opendap_env
$ jupyter lab
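Before opening the notebook, it can help to confirm that the new environment exposes every package the tutorial imports. Note that xarray and numpy are imported later but are not named explicitly in the install command, so they are present only if pulled in as dependencies. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# Import names used in this tutorial's notebook cells
REQUIRED = ["xarray", "pydap", "earthaccess", "netCDF4", "numpy"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", missing if missing else "none")
```

If anything is reported missing, it can be added to the environment with the same mamba or conda tool used above.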
Once in the Jupyter notebook environment, import in the first cell everything needed to stream remote data into local files:
import xarray as xr
import datetime as dt
import earthaccess
import numpy as np
# import pydap-specific tools
from pydap.client import get_cmr_urls, open_url
from pydap.client import to_netcdf as dap_to_netcdf
Finding OPeNDAP URLs with PyDAP
The only parameter needed to search for all PACE chlorophyll a data available through OPeNDAP is the Concept Collection ID:
Concept Collection ID = C3620140256-OB_CLOUD (Version 3.1)
Chlorophyll data from the above collection is a level 3 data product, meaning all remote files have the same longitude and latitude coordinate arrays. In this case, it is not necessary to filter the search for all relevant data URLs by a bounding box. Any subset by coordinate values will be done by OPeNDAP.
Below are the required parameters to search for all OPeNDAP URLs using PyDAP's get_cmr_urls:
PACE_ccid = "C3620140256-OB_CLOUD" # version 3.1
time_range = [dt.datetime(2025, 1, 1), dt.datetime(2025, 6, 30)]
# search all granules
cmr_urls = get_cmr_urls(ccid=PACE_ccid, time_range=time_range, limit=1000)  # limit by default = 50
# Filter only those associated with 4km resolution
chlor_a_urls = [url for url in cmr_urls if "DAY.CHL.V3_1.chlor_a.4km" in url]
The last line above further filters the URLs returned by the CMR, keeping only those for the daily (DAY) product at 4 km resolution.
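As a quick sanity check on the search results, the number of filtered URLs can be compared with the number of days in the requested time range. The sketch below assumes one daily (DAY) granule per calendar day, which may not hold if some days are missing from the archive. The URLs in the example are made up for illustration and do not point to real files.

```python
import datetime as dt

PATTERN = "DAY.CHL.V3_1.chlor_a.4km"  # same substring used in the filter above


def days_in_range(start, end):
    """Number of calendar days between two dates, both ends included."""
    return (end - start).days + 1


def filter_urls(urls, pattern=PATTERN):
    """Keep only the URLs whose text contains `pattern`."""
    return [url for url in urls if pattern in url]


# Hypothetical URLs, for illustration only
example_urls = [
    "https://example.org/opendap/PACE_OCI.20250101.L3m.DAY.CHL.V3_1.chlor_a.4km.nc",
    "https://example.org/opendap/PACE_OCI.20250101.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.nc",
    "https://example.org/opendap/PACE_OCI.20250101_20250131.L3m.MO.CHL.V3_1.chlor_a.4km.nc",
]

expected = days_in_range(dt.date(2025, 1, 1), dt.date(2025, 6, 30))
selected = filter_urls(example_urls)
print(f"{len(selected)} of {len(example_urls)} example URLs match; "
      f"{expected} daily files expected for the full range")
```

In the notebook, `len(chlor_a_urls)` would be compared against `expected`; a smaller count suggests missing days or a search limit that is too low.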
EDL Authentication with earthaccess and OPeNDAP
There are various ways to authenticate with NASA, and here we will use earthaccess to retrieve a session object containing all required credentials to access data.
When using earthaccess to log in, you need to define a strategy. There are two options:
- If you already have a .netrc file with your EDL credentials stored on your machine, set strategy="netrc".
- If you DO NOT have a .netrc file with your EDL credentials, or you are not sure, set strategy="interactive" instead.
Regardless, the following block tries the first option and falls back to the second.
from earthaccess.exceptions import LoginStrategyUnavailable

try:
    # use the credentials stored in an existing .netrc file
    auth = earthaccess.login(strategy="netrc", persist=True)
except LoginStrategyUnavailable:
    # no usable .netrc file: you will be prompted to enter your EDL credentials
    auth = earthaccess.login(strategy="interactive", persist=True)
# pass token authorization to a new session
my_session = auth.get_session()
The object my_session contains your EDL credentials and will be used to retrieve data from OPeNDAP. Moreover, passing persist=True to earthaccess.login creates a .netrc file that stores your EDL credentials on the machine for later reuse.
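If you are unsure which login strategy applies to you, the standard-library `netrc` module can tell you whether a `.netrc` file already holds an entry for Earthdata Login. This is a minimal sketch: the helper name `has_edl_entry` is hypothetical, and it assumes the credentials are stored under the EDL host `urs.earthdata.nasa.gov`.

```python
import netrc
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"  # assumed host name of the EDL entry


def has_edl_entry(netrc_path=None, host=EDL_HOST):
    """Return True if the .netrc file contains credentials for `host`."""
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    try:
        entries = netrc.netrc(str(path))
    except (OSError, netrc.NetrcParseError):
        # file is missing, unreadable, or malformed
        return False
    return entries.authenticators(host) is not None


if __name__ == "__main__":
    print("EDL entry found in ~/.netrc:", has_edl_entry())
```

The check only reads the file; it does not validate the credentials against the EDL service.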
Use OPeNDAP to subset data by coordinate values and variable names
The goal is to run the following code block:
dap_to_netcdf(
    chlor_a_urls,
    session=my_session,
    output_path=output_path,
    dim_slices=dim_slices,
    keep_variables=keep_vars,
)
where:
- output_path: a user-defined directory path where the files will be stored. If not specified, PyDAP streams data into the current directory.
- dim_slices: a dictionary where spatial slices are defined.
- keep_variables: a list declaring all variables in the remote file that will be downloaded.
The API above, dap_to_netcdf, is an alias for pydap.client.to_netcdf (see the import!). It works exclusively with the DAP4 protocol.
Below we outline how to define dim_slices and keep_variables using OPeNDAP metadata, downloading only the minimal data needed to identify the dimension slices that correspond to the desired coordinate values.
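PyDAP selects the protocol from the URL scheme, so a `dap4://` URL requests DAP4. This is why the scheme of an HTTPS URL returned by the search is swapped before the dataset is opened in the coordinate subsetting step. A plain string replacement works when `https` appears only once in the URL. The sketch below is a slightly more defensive variant that touches only the scheme; the helper name `to_dap4_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_dap4_url(url):
    """Return `url` with its scheme replaced by `dap4`, leaving the rest intact."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https", "dap4"):
        raise ValueError(f"unexpected URL scheme: {parts.scheme!r}")
    return urlunsplit(("dap4", parts.netloc, parts.path, parts.query, parts.fragment))


# Hypothetical URL, for illustration only: the path also contains "https"
print(to_dap4_url("https://example.org/https-mirror/granule.nc"))
```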
Subset by variable names
Below we declare the variables to download. Note that when Xarray opens the remote dataset, it downloads the OPeNDAP DAP4 metadata and eagerly downloads ALL coordinate dimension data, which in this case means all latitude and longitude values.
keep_vars = ['/lon', '/lat', '/chlor_a']  # variables to download
NOTE: leading slashes (/) in variable names are required by the DAP4 protocol, since it supports hierarchical data structures such as Groups. Groups act as directories in the remote file. In this case, the file does not have any Groups, but the variables still need to be defined with a "full path", where / identifies the root.
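Since every variable name must be a full path, a small helper can add the leading slash when it is missing. This is a minimal sketch for files without Groups, such as the one used here; the helper name `as_dap4_path` is hypothetical and not part of PyDAP.

```python
def as_dap4_path(name):
    """Return `name` as a root-level DAP4 path with exactly one leading slash."""
    return "/" + name.lstrip("/")


# Names with and without the leading slash end up in the same form
example_vars = [as_dap4_path(name) for name in ["lon", "/lat", "chlor_a"]]
print(example_vars)
```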
Subset by coordinate values
In DAP4, the remote server currently does not subset by coordinate value, only by coordinate slices or dimension slices. This PACE data product is a Level 3 product whose dimensions are lat and lon, and Xarray downloads them automatically when it opens the dataset (this is not the general case). Once downloaded, these coordinate values can be used to construct the index-based subset that will be applied to ALL remote granules.
# create an Xarray Dataset object. It eagerly downloads all dimension data, which in this case
# facilitates our workflow since `lat` & `lon` are dimension data.
ds = xr.open_dataset(chlor_a_urls[0].replace("https", "dap4", 1), session=my_session, engine='pydap')
# Min/max of lon values
minLon, maxLon = -96, 10
# Min/Max of lat values
minLat, maxLat = 6, 70
lat, lon = ds['lat'].values, ds['lon'].values
# Identify ALL lat/lon data inside the bounding box
iLon = np.where((lon > minLon) & (lon < maxLon))[0]  # 1D array
iLat = np.where((lat > minLat) & (lat < maxLat))[0]  # 1D array
# define argument used by PyDAP to subset the remote data before download
dim_slices = {'/lat': (iLat[0], iLat[-1]), '/lon': (iLon[0], iLon[-1])}
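The index lookup above raises an `IndexError` if no grid point falls inside the bounding box, and it relies on the matching indices being contiguous. The sketch below shows the same logic in plain Python with an explicit check, applied to a synthetic 1-degree grid (an assumption made for illustration; the real 4 km grid is much finer). Latitude is deliberately stored from north to south to show that the lookup does not depend on the ordering. The helper name `index_bounds` and the `demo_` variables are hypothetical.

```python
def index_bounds(values, vmin, vmax):
    """Return (first, last) indices of the entries with vmin < value < vmax."""
    inside = [i for i, v in enumerate(values) if vmin < v < vmax]
    if not inside:
        raise ValueError(f"no coordinate values between {vmin} and {vmax}")
    return inside[0], inside[-1]


# Synthetic cell-center coordinates on a 1-degree grid (illustration only)
demo_lat = [89.5 - i for i in range(180)]    # north to south
demo_lon = [-179.5 + i for i in range(360)]  # west to east

demo_slices = {
    "/lat": index_bounds(demo_lat, 6, 70),
    "/lon": index_bounds(demo_lon, -96, 10),
}
print(demo_slices)
```

Raising a `ValueError` with the requested bounds makes an empty selection easier to diagnose than an out-of-range index.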
Stream data into a local directory
# For this example, we will place the data in the data directory
output_path = "./data"
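Whether the download step creates a missing output directory is not stated here, so it is harmless to create the directory beforehand. A minimal sketch with the standard library, where the helper name `ensure_directory` is hypothetical:

```python
from pathlib import Path


def ensure_directory(path):
    """Create `path` (and any missing parents) if needed and return it as a Path."""
    directory = Path(path)
    # exist_ok=True makes the call safe to repeat
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

In the notebook this would be called once as `ensure_directory(output_path)` before streaming the data.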
We now stream only the data of interest into local files, using PyDAP and the remote OPeNDAP Hyrax data server (all via the DAP4 protocol).
dap_to_netcdf(
    chlor_a_urls,
    session=my_session,
    keep_variables=keep_vars,
    dim_slices=dim_slices,
    output_path=output_path,
)
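To get a feel for how much the spatial subset reduces the download, the fraction of the global grid covered by the bounding box can be estimated. The sketch below assumes a regular latitude/longitude grid in which every cell spans the same number of degrees, so the fraction of grid cells equals the fraction of the 360 by 180 degree domain. This is an assumption about the grid, not something verified against the files.

```python
def grid_fraction(min_lon, max_lon, min_lat, max_lat):
    """Fraction of a global, regular lat/lon grid covered by a bounding box."""
    return ((max_lon - min_lon) / 360.0) * ((max_lat - min_lat) / 180.0)


# Bounding box from the tutorial objectives
fraction = grid_fraction(-96, 10, 6, 70)
print(f"The subset covers about {fraction:.1%} of the global grid")
```

Under that assumption, each subset file holds roughly one tenth of the chlor_a values of a full granule.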

References
NASA Ocean Biology Processing Group. (2025). PACE OCI Level-3 Global Mapped Chlorophyll (CHL) Data, version 3.1 [Data set]. NASA Ocean Biology Distributed Active Archive Center. https://doi.org/10.5067/PACE/OCI/L3M/CHL/3.1
Cite this Tutorial
Jimenez-Urias, M. A. (2026). Access Ocean Chlorophyll A Data From PACE Via OPeNDAP. Zenodo. https://doi.org/10.5281/zenodo.19476833
@misc{jimenez_urias_2026_19476833,
author = {Jimenez-Urias, Miguel Angel},
title = {Access Ocean Chlorophyll A Data From PACE Via OPeNDAP},
month = apr,
year = 2026,
publisher = {Zenodo},
doi = {10.5281/zenodo.19476833},
url = {https://doi.org/10.5281/zenodo.19476833},
}
