
Discovering NASA Data to access via OPeNDAP

Overview

NASA continuously produces massive amounts of scientific data, stores it in self-describing data files such as NetCDF and HDF5, and makes it available free of charge through various endpoint APIs. One such endpoint is OPeNDAP! This means scientists, educators, students, and even science enthusiasts can access NASA data from its many scientific Missions free of cost via the OPeNDAP protocol. Free access to data is the first step toward scientific reproducibility, but the preliminary step of finding data can often be a roadblock. Below, we outline minimal requirements and steps to discover data using tools such as:

  • Common Metadata Repository (CMR)
  • Earthdata Search
  • Earthdata Login
  • PyDAP

In addition, this very brief tutorial covers the basics of finding NASA Earthdata. The user will gain the knowledge to understand the following concepts:

  • What is a Granule and how are these identified?
  • What is a Collection, and how are these identified?
  • How can I know the Collection Concept ID, or the DOI of the data of interest?
  • How can I search and download data?

To begin, we will assume the data of interest is from the TEMPO Mission.

Example: TEMPO Mission

Figure 1. TEMPO is the North American component of a global constellation of satellites tracking air quality from geostationary orbits. Credit: Tim Marvel @ NASA.

TEMPO, a collaboration between the Smithsonian and NASA, is the first space-based instrument to measure atmospheric gases that impact air quality across the North American continent every daylight hour at high spatial resolution, at neighborhood scales. To learn more about the TEMPO Mission, the instruments, and general information about the Mission, you can head to the project site[1].

Consider that you are interested in learning more about air quality in your city or neighborhood somewhere in North America. What physical quantity can be used to estimate air quality? TEMPO measures NO2 (Nitrogen Dioxide), a gas with the highest concentrations in large urban areas[2] and, in particular, around roads where trucks, buses, and cars continuously produce it (albeit these are not the only sources of NO2). You can learn more about NO2 and other air pollutants from official online environmental resources such as the EPA[3].

Once you have identified what kind of data you are interested in, the first step to finding data is understanding how data is organized within the NASA ecosystem. That is, learning about data Collections, Granules, and how to discover these.

Data Organization

Within NASA, all publicly available geospatial data has traditionally been held in and accessed from Distributed Active Archive Centers, or DAACs, each covering distinct domains of expertise. For example, a data user interested in ocean data involving physical variables such as sea surface height anomalies could search the Physical Oceanography Distributed Active Archive Center (PO.DAAC) for data discovery. On the other hand, a data user interested in, say, chlorophyll-a data would query the Ocean Biology Distributed Active Archive Center (OB.DAAC). Data could then be accessed from different services run on premises. As NASA leads its effort to migrate data to the cloud[4], users can discover data directly from centralized resources such as Earthdata Search, and download it from cloud services that enable scalability.

But how can one discover the data? Data is organized in Collections and Granules. Broadly, a Granule is the smallest unit of independently managed data[5], and can be associated with an individual file such as a NetCDF or HDF5 file, while a Collection is a set of Granules plus project-level metadata that together describe a major release of a data product, possibly accompanied by minor version releases[5]. In our scenario of NO2 data, for example, there are, as of January 2026, 6 matching Collections (each with a distinct identifier). How can one distinguish between these?
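To make this concrete, the kind of Collection search that discovery tools perform can be sketched against the CMR's public REST API. The endpoint below is the CMR collection-search endpoint; the keyword string is just our example query, and here we only build the request URL rather than sending it (a sketch, not a full client):

```python
from urllib.parse import urlencode

# Public CMR collection-search endpoint (JSON results).
CMR_COLLECTIONS = "https://cmr.earthdata.nasa.gov/search/collections.json"

def build_collection_query(keyword: str, page_size: int = 10) -> str:
    """Build a CMR keyword-search URL for Collections (no request is sent here)."""
    params = {"keyword": keyword, "page_size": page_size}
    return f"{CMR_COLLECTIONS}?{urlencode(params)}"

query_url = build_collection_query("TEMPO NO2 tropospheric")
print(query_url)
```

Fetching that URL (e.g. with a library such as requests) returns JSON metadata for every matching Collection, including its unique identifier.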

TEMPO produces both gridded data (i.e. Level 3, data interpolated onto a regular latitude-longitude grid) and non-gridded data (i.e. Level 2, data at the native resolution of the satellite along a swath, with pixels covering roughly 10 km2). Level 2 data can be significantly trickier to analyze, but it provides unprecedented Near Real Time (NRT) high spatial resolution. In addition to the difference in processing levels of the publicly available data, data can be archived in different Versions, each Version covering distinct time periods. As a result, a user searching for NO2 data from TEMPO may find, as of January 2026, 6 distinct Collections matching the search for TEMPO NO2 tropospheric using the Earthdata Search tool, shown below in Figure 2.

Figure 2. Distinct Collections matching the search using the queries: TEMPO NO2 tropospheric, and selecting TEMPO as an added filter under Projects. Tool: Earthdata Search.

To identify the Collection of interest from the given 6 options, the user needs to determine:

  1. What level of processing is needed?
  2. What is the time range of interest?

And so, for example, if the user is interested in any NO2 data in the Sacramento area covering summertime from 2023 to 2025, then Near Real Time data (Level 2) may not be what is needed. That is, if any data will do, one can begin by analyzing Level 3 data first. This means the TEMPO NO2 V03 collection, which covers 2023 through August 2025, could be accessed. If instead wintertime data is needed as well, then both V03 and V04 TEMPO NO2 data could be used, since V04 is also Level 3 data but provides some winter data through 2026. Let's assume that only summertime data is needed, and thus V03 (Version 03 data) is what we want.

Once a collection has been identified, the user needs a unique identifier for that collection. Two such unique identifiers are:

  • The Collection Concept ID for that Collection.
  • The Digital Object Identifier (DOI) for that Collection.

There are many ways to discover both the Collection Concept ID and DOI for any collection; for example, these can be found on the respective Mission website, or at the DAAC associated with the data product. This information can also be found in the Earthdata Search platform, already used to search for the distinct data collections in Figure 2. Figure 3 below shows a 2-step process to extract the Collection Concept ID or DOI of a Collection by inspecting the additional information related to any Granule from said collection.

Figure 3. Left: View from Earthdata Search after selecting the V03 Collection, displaying many granules in that collection. The three vertical dots highlighted by a red ellipse enable the user to inspect additional information for any Granule. Clicking the vertical dots and selecting “View details” will display general information about the Granule (shown at right), including the Collection Concept ID and DOI of the collection the Granule belongs to.

The Collection Concept ID for TEMPO NO2 data V03 is C2930763263-LARC_CLOUD. Its DOI is 10.5067/IS-40e/TEMPO/NO2_L3.003 (see Figure 3). Both of these are unique identifiers that can be used to search for and discover OPeNDAP URLs, enabling data access. In fact, one can find the OPeNDAP URL for the Granule in Figure 3 by scrolling through all the information shown in the right panel. This may work for an individual Granule, but it rapidly becomes impractical for most purposes. Below we show a way to query the CMR for all OPeNDAP URLs that match a search, with a few lines of Python code.
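As a sketch of what such a query involves, a CMR granule record (JSON) carries a list of links for each Granule, and the OPeNDAP endpoint is one of them. The snippet below parses a minimal, hypothetical response of the same general shape to pull out service links; the sample dictionary and its URLs are illustrative only, not a real CMR payload:

```python
# Minimal, hypothetical CMR granules.json-style payload (illustrative only):
sample_response = {
    "feed": {
        "entry": [
            {
                "title": "TEMPO_NO2_L3_V03_example_granule",
                "links": [
                    {"rel": "http://esipfed.org/ns/fedsearch/1.1/data#",
                     "href": "https://data.example.gov/granule.nc"},
                    {"rel": "http://esipfed.org/ns/fedsearch/1.1/service#",
                     "href": "https://opendap.example.gov/granule.nc"},
                ],
            }
        ]
    }
}

def extract_service_urls(response: dict) -> list[str]:
    """Collect hrefs whose link relation marks them as service (e.g. OPeNDAP) endpoints."""
    urls = []
    for entry in response["feed"]["entry"]:
        for link in entry.get("links", []):
            if link.get("rel", "").endswith("/service#"):
                urls.append(link["href"])
    return urls

print(extract_service_urls(sample_response))
```

Looping such parsing over paginated CMR responses is essentially what higher-level clients automate for you.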

A more useful tool for discovering OPeNDAP URLs, once a DOI or Collection Concept ID has been identified, is to query the Common Metadata Repository (CMR)[6] for all information related to that unique identifier. The CMR enables users to programmatically query for any information related to publicly available data. The CMR has a well-described API, and many tools already exist that can query it and parse the information it returns, given known parameters (in fact, Earthdata Search queries the CMR to produce the results shown in Figures 2 and 3). One such tool is PyDAP[7], an open-source Python client commonly used to access and stream data from OPeNDAP servers. Below, we provide an example of using PyDAP to query for ALL possible OPeNDAP URLs given a Collection Concept ID and a time range of interest.

Finding OPeNDAP URLs with PyDAP

Requirements:

  • A local compute environment with internet connection.
  • Mambaforge installed.
  • Basic knowledge of conda environments.

As a prerequisite, we create an isolated conda environment with Python 3.12 and install the latest official release of PyDAP. Installing PyDAP with mamba as the package manager will automatically install all of its minimal required dependencies.


mamba create -n opendap_env -c conda-forge python=3.12 ipython pydap jupyterlab
mamba activate opendap_env


Having installed and “activated” the environment, we have access to all the correct binaries. We can now make use of an interactive compute environment such as Jupyter Notebook, JupyterLab, or even IPython. Below we use IPython for simplicity, but the code can also be executed from a JupyterLab environment.


from pydap.client import get_cmr_urls
import datetime as dt


Now define the filters of interest: the Collection Concept ID and the desired time range:


ccid = "C2930763263-LARC_CLOUD"
time_range = [dt.datetime(2023, 7, 1), dt.datetime(2025, 7, 31)]


Lastly, pass these arguments to PyDAP for it to return the first 500 OPeNDAP URLs:


urls = get_cmr_urls(ccid=ccid, time_range=time_range, limit=500)


NOTE: We restricted the limit of URLs to 500, but this is an arbitrary limit and can be set arbitrarily large.
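Since the earlier scenario only needed summertime data, the returned list can also be narrowed on the client side. TEMPO Level 3 filenames embed a timestamp; the exact naming pattern below is an assumption for illustration, and the example URL strings are made up rather than live results:

```python
import re

# Example URL strings in the shape a CMR query might return (illustrative names):
example_urls = [
    "https://opendap.example.gov/TEMPO_NO2_L3_V03_20230715T180000Z.nc",
    "https://opendap.example.gov/TEMPO_NO2_L3_V03_20240110T190000Z.nc",
    "https://opendap.example.gov/TEMPO_NO2_L3_V03_20240801T170000Z.nc",
]

def summertime_only(urls: list[str]) -> list[str]:
    """Keep URLs whose embedded YYYYMMDD timestamp falls in June through August."""
    keep = []
    for url in urls:
        match = re.search(r"_(\d{8})T", url)  # assumes a _YYYYMMDDT... timestamp
        if match and 6 <= int(match.group(1)[4:6]) <= 8:
            keep.append(url)
    return keep

print(summertime_only(example_urls))  # drops the January granule
```

The same post-filtering idea applies to any list of granule URLs whose names encode acquisition time.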

Finally, additional filters such as bounding_box can be added to further restrict the search so that it returns only the relevant OPeNDAP URLs. For more examples, including using a bounding_box to further filter results from Level 2 data, make sure to look at PyDAP’s official documentation on searching for OPeNDAP URLs[8]!