Access Gridded Ocean Chlorophyll A data from NASA’s PACE mission

Subset-driven, parallel access to ocean chlorophyll data from PACE via OPeNDAP and Pydap, enabling efficient remote workflows
About the Data
The dataset accessed in this tutorial is freely available and provides global, gridded composites of surface chlorophyll-a concentration (a proxy for phytoplankton biomass). Source: NASA Earthdata.

Requirements

  • Earthdata login (EDL) credentials.
  • Concept Collection ID or DOI for the relevant data product.
  • Python >= 3.11.
  • Mamba-forge (or conda-forge) installed on the machine.
  • Familiarity with Jupyter notebooks and Jupyter Lab.

Optional:

  • Store all EDL credentials in a .netrc file.
  • Basic knowledge of conda environment installation.
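
For readers who have never written a .netrc file, the sketch below shows the one-line entry format that the standard library's netrc parser understands. The host name urs.earthdata.nasa.gov is the Earthdata Login host; the helper name netrc_entry and the credentials are placeholders invented for this illustration, and the file is written to a temporary directory so that a real ~/.netrc is never touched.

```python
import netrc
import os
import tempfile

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host


def netrc_entry(username: str, password: str, host: str = EDL_HOST) -> str:
    """Return a single-line .netrc entry for the given host (hypothetical helper)."""
    return f"machine {host} login {username} password {password}\n"


# Write placeholder credentials to a temporary file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".netrc")
    with open(path, "w") as f:
        f.write(netrc_entry("my_username", "my_password"))
    os.chmod(path, 0o600)  # keep the file readable by its owner only
    login, _, password = netrc.netrc(path).authenticators(EDL_HOST)

print(login)
```

In practice, earthaccess can create this file for you, so writing it by hand is optional.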

Objectives

The goal is to download six months of chlorophyll-a data for a region of the Atlantic Ocean. The spatial and temporal ranges are defined by the following parameters:

  • Time range: 01/01/2025 – 06/30/2025.
  • Spatial range: -96 < longitude < 10, and 6 < latitude < 70.

To accomplish this goal, the tutorial demonstrates how to:

  • Authenticate (via earthaccess).
  • Search for all available OPeNDAP URLs in a specific NASA collection, filtering the search by time range.
  • Subset with OPeNDAP, by variable name and spatial / temporal range.

Install required Python dependencies

In a terminal, use mamba (or conda) with the conda-forge channel to create an environment containing all dependencies required by this tutorial, activate it, and launch JupyterLab in a browser.

Terminal
$ mamba create -n opendap_env -c conda-forge python=3.12 ipython pydap xarray jupyterlab earthaccess netCDF4
$ mamba activate opendap_env
$ jupyter lab

Once in the Jupyter notebook environment, import in the first cell everything needed to stream remote data into local files:

Python
import xarray as xr
import datetime as dt
import earthaccess
import numpy as np

# import pydap-specific tools
from pydap.client import get_cmr_urls, open_url
from pydap.client import to_netcdf as dap_to_netcdf

Finding OPeNDAP URLs with PyDAP

Searching for all PACE chlorophyll-a data available through OPeNDAP requires one parameter, the Concept Collection ID:

Concept Collection ID = C3620140256-OB_CLOUD (version 3.1)

Chlorophyll data from the above collection is a Level 3 data product, meaning all remote files share the same longitude and latitude coordinate arrays. It is therefore not necessary to filter the URL search by a bounding box; any subsetting by coordinate values is done by OPeNDAP.

Below are the required parameters to search for all OPeNDAP URLs using PyDAP's get_cmr_urls:

Python
PACE_ccid = "C3620140256-OB_CLOUD"  # version 3.1
time_range = [dt.datetime(2025, 1, 1), dt.datetime(2025, 6, 30)]

# search all granules
cmr_urls = get_cmr_urls(ccid=PACE_ccid, time_range=time_range, limit=1000)  # default limit is 50

# Filter only those associated with 4km resolution
chlor_a_urls = [url for url in cmr_urls if "DAY.CHL.V3_1.chlor_a.4km" in url]


The list comprehension on the last line of the block above further filters the URLs returned by the CMR, keeping only those for the daily 4 km resolution files.
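
To see what the substring filter does without contacting the CMR, the sketch below applies it to a few made-up URLs. The host and file names are hypothetical and only imitate the naming pattern used in the filter. It also computes the number of days in the requested time range, which gives a rough upper bound on the number of daily files to expect, assuming at most one daily 4 km file per day.

```python
import datetime as dt

# Hypothetical URLs, invented for illustration only.
sample_urls = [
    "https://example.org/opendap/PACE_OCI.20250101.L3m.DAY.CHL.V3_1.chlor_a.4km.nc",
    "https://example.org/opendap/PACE_OCI.20250101.L3m.DAY.CHL.V3_1.chlor_a.0p1deg.nc",
    "https://example.org/opendap/PACE_OCI.20250101_20250108.L3m.8D.CHL.V3_1.chlor_a.4km.nc",
]

# Same substring test as in the tutorial: daily composites at 4 km resolution.
pattern = "DAY.CHL.V3_1.chlor_a.4km"
selected = [url for url in sample_urls if pattern in url]

# Number of calendar days in the requested range, both end dates included.
start, end = dt.date(2025, 1, 1), dt.date(2025, 6, 30)
n_days = (end - start).days + 1

print(len(selected), n_days)
```

Comparing len(chlor_a_urls) against n_days is a quick way to notice a limit value that is too small or days with no data.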

EDL Authentication with earthaccess and OPeNDAP

There are various ways to authenticate with NASA, and here we will use earthaccess to retrieve a session object containing all required credentials to access data.

When logging in with earthaccess, you must choose a strategy. There are two options:

  1. If you already have a .netrc file with your EDL credentials stored on your machine, set strategy="netrc".
  2. If you DO NOT have a .netrc file with your EDL credentials, or you are not sure, set strategy="interactive" instead.

The following block tries option 1 first and falls back to option 2.

Python
from earthaccess.exceptions import LoginStrategyUnavailable

try:
    # use the EDL credentials stored in an existing .netrc file
    auth = earthaccess.login(strategy="netrc", persist=True)
except LoginStrategyUnavailable:
    # no usable .netrc file: you will be prompted to enter your EDL credentials
    auth = earthaccess.login(strategy="interactive", persist=True)

# pass token authorization to a new session
my_session = auth.get_session()

The object my_session carries your EDL credentials and will be used to retrieve data from OPeNDAP. In addition, passing persist=True to earthaccess.login creates a .netrc file that stores your EDL credentials on the machine for later reuse.

Use OPeNDAP to subset data by coordinate values and variable names

The goal is to run the following code block:

Python
dap_to_netcdf(
    chlor_a_urls,
    session=my_session, 
    output_path = output_path, 
    dim_slices=dim_slices, 
    keep_variables=keep_vars,
)

where:

  • output_path: a user-defined directory path where the files will be stored. If not specified, PyDAP streams data into the current directory.
  • dim_slices: a dictionary that maps each dimension name to the index range (slice) to extract along it.
  • keep_variables: a list of all variables in the remote file that will be downloaded.

The function dap_to_netcdf is an alias for pydap.client.to_netcdf (see the import statement above). It works exclusively with the DAP4 protocol.
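
PyDAP infers the protocol from the URL scheme, and a dap4:// scheme requests DAP4. The tutorial makes this substitution later when opening a file with Xarray. The sketch below shows one way to do the swap so that only the scheme is changed; the helper name to_dap4_url and the example URL are hypothetical.

```python
def to_dap4_url(url: str) -> str:
    """Return `url` with its http(s) scheme replaced by dap4 (hypothetical helper)."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"URL has no scheme: {url!r}")
    if scheme not in ("http", "https", "dap4"):
        raise ValueError(f"Unexpected scheme: {scheme!r}")
    # Only the scheme changes; host, path and query are kept verbatim.
    return "dap4://" + rest


# Made-up URL for illustration only.
example = to_dap4_url("https://example.org/opendap/https_archive/file.nc")
print(example)
```

Splitting on the first "://" avoids accidentally rewriting an "https" substring that appears later in the path.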

The sections below show how to define keep_variables and dim_slices from the OPeNDAP metadata, downloading only the minimal data needed to identify the dimension slices that correspond to the coordinate values of interest.

Subset by variable names

First, declare which variables to download from each remote file: the two coordinate variables and chlor_a. In the next step, Xarray reads the OPeNDAP DAP4 metadata and eagerly downloads ALL dimension coordinate data, which in this case means all latitude and longitude values.

Python
keep_vars = ["/lon", "/lat", "/chlor_a"]  # variables to download

NOTE: The leading slash (/) in each variable name is required by the DAP4 protocol, which supports hierarchical data structures such as Groups. A Group acts like a directory inside the remote file. This file does not contain any Groups, but each variable must still be given by its full path, where / identifies the root.
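
The full-path convention can be illustrated with a small helper that builds a path from a variable name and an optional Group. This is a sketch only: dap4_path is a hypothetical function, not part of PyDAP, and the Group name in the second call is invented.

```python
def dap4_path(name: str, group: str = "/") -> str:
    """Build a DAP4-style full path for a variable (hypothetical helper)."""
    prefix = "/" + group.strip("/")   # normalize the Group part
    if prefix != "/":
        prefix += "/"                 # separate a non-root Group from the name
    return prefix + name.lstrip("/")


# Variables at the root, as in the PACE files used here.
keep_vars_example = [dap4_path(v) for v in ("lon", "lat", "chlor_a")]

# A variable inside an invented Group, for comparison.
nested_example = dap4_path("sst", group="geophysical_data")

print(keep_vars_example, nested_example)
```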

Subset by coordinate values

In DAP4, the remote server currently does not subset by coordinate value, only by index ranges (slices) along each dimension. This PACE data product is a Level 3 product whose dimensions are lat and lon. When Xarray opens a remote file it automatically downloads the dimension coordinate data, so the lat and lon arrays become available locally after opening a single granule (this is not the general case for other products). Because every granule shares the same grid, those arrays can be used to construct index slices that apply to ALL remote granules.

Python

# Create an Xarray Dataset object. Xarray eagerly downloads all dimension data, which
# helps here because `lat` and `lon` are dimension coordinates.
ds = xr.open_dataset(chlor_a_urls[0].replace("https", "dap4", 1), session=my_session, engine='pydap')


# Min/max of lon values
minLon, maxLon = -96, 10
# Min/Max of lat values
minLat, maxLat = 6, 70

lat, lon = ds['lat'].values, ds['lon'].values

# Identify ALL lat/lon indices inside the bounding box
iLon = np.where((lon > minLon) & (lon < maxLon))[0]  # 1D array of indices
iLat = np.where((lat > minLat) & (lat < maxLat))[0]  # 1D array of indices

# define argument used by PyDAP to subset the remote data before download
dim_slices = {'/lat': (iLat[0], iLat[-1]), '/lon': (iLon[0], iLon[-1])}
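
The index lookup above can be tried offline on a synthetic grid. The sketch below uses a coarse 1-degree grid with latitude stored from north to south; both the grid spacing and the ordering are assumptions made for illustration and may differ from the real 4 km files. It also checks that the selected indices are contiguous, which is what allows a (first, last) pair to describe the subset.

```python
import numpy as np

# Synthetic 1-degree grid (assumed layout, not the real PACE grid).
lat = np.arange(89.5, -90.0, -1.0)    # 180 cell centers, north to south
lon = np.arange(-179.5, 180.0, 1.0)   # 360 cell centers, west to east

min_lon, max_lon = -96, 10
min_lat, max_lat = 6, 70

# Indices of all grid points strictly inside the bounding box.
i_lon = np.where((lon > min_lon) & (lon < max_lon))[0]
i_lat = np.where((lat > min_lat) & (lat < max_lat))[0]

if i_lon.size == 0 or i_lat.size == 0:
    raise ValueError("Bounding box does not overlap the grid")

# np.where returns ascending indices, so first/last bound the subset
# even when the coordinate values themselves are descending.
contiguous = bool(np.all(np.diff(i_lon) == 1) and np.all(np.diff(i_lat) == 1))

lat_bounds = (int(i_lat[0]), int(i_lat[-1]))
lon_bounds = (int(i_lon[0]), int(i_lon[-1]))
print(lat_bounds, lon_bounds, contiguous)
```

Whether the end index of each pair is treated as inclusive depends on how to_netcdf interprets dim_slices, which is not stated in this tutorial, so it is worth comparing the shape of one downloaded file against the expected subset size.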

Stream data into a local directory

Python
# For this example, we will place the data in the data directory
output_path = "./data"
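
This tutorial does not say whether PyDAP creates the output directory when it is missing, so creating it up front is a harmless precaution. The sketch below uses only the standard library; ensure_dir is a hypothetical helper, and a temporary directory stands in for the project folder.

```python
import tempfile
from pathlib import Path


def ensure_dir(path) -> Path:
    """Create `path` (and any missing parents) if needed and return it as a Path."""
    p = Path(path)
    p.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return p


# Demonstrate inside a temporary directory rather than the working directory.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = ensure_dir(Path(tmp) / "data")
    created = out_dir.is_dir()
    ensure_dir(out_dir)  # calling it again is safe

print(created)
```

In the notebook, calling ensure_dir(output_path) before streaming the data would serve the same purpose.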

We now stream only the data of interest into local files, using PyDAP and the remote OPeNDAP Hyrax data server (all via the DAP4 protocol).

Python

dap_to_netcdf(
    chlor_a_urls, 
    session=my_session, 
    keep_variables=keep_vars, 
    dim_slices=dim_slices, 
    output_path=output_path)

References

NASA Ocean Biology Processing Group. (2025). PACE OCI Level-3 Global Mapped Chlorophyll (CHL) Data, version 3.1 [Data set]. NASA Ocean Biology Distributed Active Archive Center. https://doi.org/10.5067/PACE/OCI/L3M/CHL/3.1

Cite this Tutorial

Citation
Jimenez-Urias, M. A. (2026). Access Ocean Chlorophyll A Data From PACE Via OPeNDAP. Zenodo. https://doi.org/10.5281/zenodo.19476833
BibTeX
@misc{jimenez_urias_2026_19476833,
  author       = {Jimenez-Urias, Miguel Angel},
  title        = {Access Ocean Chlorophyll A Data From PACE Via OPeNDAP},
  month        = apr,
  year         = 2026,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19476833},
  url          = {https://doi.org/10.5281/zenodo.19476833},
}