Access Near-Real-Time Air Quality Data from NASA’s TEMPO Mission

Subset-driven, parallel access to near-real-time air quality data from TEMPO via OPeNDAP and PyDAP, enabling efficient remote workflows. © NASA.
About the Data
The dataset accessed in this tutorial is freely available and contains information on tropospheric and stratospheric nitrogen dioxide (NO2) vertical columns over North America at ~10 km² spatial resolution. Source: NASA Earthdata.

Requirements

  • Earthdata login (EDL) credentials.
  • Concept Collection ID or DOI for the relevant data product.
  • Python >= 3.11.
  • Mamba-forge (or conda-forge) installed on the machine.
  • Familiarity with Jupyter notebooks and Jupyter Lab.

Optional:

  • Store all EDL credentials in a .netrc file.
  • Basic knowledge of conda environment installation.

Objectives

To download one month of Near Real Time Air Quality (NO2) data from NASA’s TEMPO mission. The spatial and temporal range is defined by the following parameters:

  • Time range: 10/01/2025 – 10/31/2025.
  • Spatial range: -124.6330 < longitude < -121, and 46.35 < latitude < 49.83.

To accomplish this goal, the tutorial demonstrates how to:

  • Authenticate (via earthaccess).
  • Search for all available NASA OPeNDAP URLs for a specific NASA collection, filtered by time range and bounding box.
  • Subset with OPeNDAP, by variable name and spatial / temporal range.

Install required Python dependencies

In a terminal shell, use mamba (or conda) with the conda-forge channel to install all dependencies required by this tutorial, activate the environment, and launch an interactive Jupyter notebook in a browser.

Terminal
$ mamba create -n opendap_env -c conda-forge python=3.12 ipython jupyterlab earthaccess netCDF4 xarray
$ mamba activate opendap_env
$ pip install git+https://github.com/pydap/pydap.git
$ jupyter lab

Once in the Jupyter notebook environment, use the first cell to import everything needed to stream remote data into a local file:

Python
import xarray as xr
import datetime as dt
import earthaccess
import numpy as np

# import pydap-specific tools
from pydap.client import get_cmr_urls, open_url
from pydap.client import to_netcdf as dap_to_netcdf

Finding OPeNDAP URLs with PyDAP

The needed parameter to search for all NO2 Near Real Time data from TEMPO available through OPeNDAP is

Concept Collection ID = C3685896872-LARC_CLOUD

Data from the above collection is a Level 2 product, that is, swath data, so each remote file spans a different range of longitudes and latitudes. The necessary first step is therefore to filter the search for relevant data URLs by a bounding box. A further subset by coordinate values is done later with OPeNDAP.

Below are the required parameters to search for all OPeNDAP URLs using PyDAP's get_cmr_urls:

Python
TEMPO_L2_NRTNO2_ccid = "C3685896872-LARC_CLOUD"  # Concept Collection ID
time_range = [dt.datetime(2025, 10, 1), dt.datetime(2025, 10, 31)]  # One month of data

bounding_box = [-124.63309, 46.35932, -121, 49.83307]  # WSEN order; area around Seattle (Pacific Northwest)

cmr_urls = get_cmr_urls(ccid=TEMPO_L2_NRTNO2_ccid, bounding_box=bounding_box, time_range=time_range, limit=1000)  # you can increase the limit of results
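
One detail worth checking in the time range: dt.datetime(2025, 10, 31) denotes midnight at the start of October 31. Assuming the search treats the end of time_range as an instant (this is not confirmed here for get_cmr_urls), granules acquired later that day would fall outside the range. The following is a minimal sketch of building a range that covers the whole month; month_range is a hypothetical helper, not part of PyDAP.

```python
import datetime as dt

def month_range(year, month):
    """Return [start, end] datetimes covering a whole calendar month."""
    start = dt.datetime(year, month, 1)
    # First instant of the following month (handles the December rollover)
    next_month = dt.datetime(year + (month == 12), month % 12 + 1, 1)
    # Step back one second so the end stays inside the requested month
    end = next_month - dt.timedelta(seconds=1)
    return [start, end]

full_october = month_range(2025, 10)
print(full_october[1])  # 2025-10-31 23:59:59
```

The resulting list has the same shape as time_range above, so it could be passed in its place if the full final day is needed.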

What does specifying a bounding box defining an area of interest to the CMR search do?

One may think that providing a bounding box in the CMR query makes the search return only the desired subset of data. That is unfortunately not the case. The CMR does filter using the bounding box, but it returns every OPeNDAP URL whose data intersects the bounding box, and each of those URLs still points to the full granule. To get ONLY the data inside the bounding box, some more work is needed, as described below.
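
The difference between "intersects" and "is contained in" can be illustrated with plain rectangles. This is a simplified sketch: real swath footprints are polygons rather than rectangles, the granule extent below is made up for illustration, and both helper functions are hypothetical (they are not part of PyDAP or the CMR API).

```python
def intersects(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and inner[2] <= outer[2]
            and outer[1] <= inner[1] and inner[3] <= outer[3])

bounding_box = [-124.63309, 46.35932, -121, 49.83307]
# Hypothetical granule footprint, far larger than the area of interest
granule_extent = [-135.0, 40.0, -110.0, 55.0]

print(intersects(granule_extent, bounding_box))  # True: a search by intersection matches it
print(contains(bounding_box, granule_extent))    # False: most of the granule lies outside
```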

How can I download only the data within the bounding box?

This can be done with OPeNDAP in a two-stage process.

  1. Download ONLY coordinate data from each granule, to identify the slices needed to download only the data within the bounding box.
  2. Use the identified slice from each OPeNDAP URL, to stream data into a local file for analysis. PyDAP enables this.

This workflow is demonstrated below. Before any data can be downloaded, however, you must authenticate via EDL.

EDL Authentication with earthaccess and OPeNDAP

There are various ways to authenticate with NASA, and here we will use earthaccess to retrieve a session object containing all required credentials to access data.

When using earthaccess to "login", you need to define a strategy and you have two options:

  1. If you already have a .netrc file with your EDL credentials stored in your machine, set strategy="netrc"
  2. If you DO NOT have a .netrc file with your EDL credentials, or you are not sure, do instead strategy="interactive"

Python
from earthaccess.exceptions import LoginStrategyUnavailable

try:
    # Use the EDL credentials already stored in a .netrc file
    auth = earthaccess.login(strategy="netrc", persist=True)
except LoginStrategyUnavailable:
    # You will be prompted to enter your EDL credentials
    auth = earthaccess.login(strategy="interactive", persist=True)

# Pass token authorization to a new session
my_session = auth.get_session()

The object my_session contains your EDL credentials, and it will be used to retrieve data from OPeNDAP. Moreover, passing persist=True to earthaccess.login creates a .netrc file that stores your EDL credentials on the machine for later reuse.

Use OPeNDAP to subset data by coordinate values and variable names

Subset by variable names

Below we use PyDAP to download the OPeNDAP DAP4 metadata. PyDAP creates a Python representation of the dataset, including all variable names and their dimensions, along with all metadata attributes associated with each variable. We will use this information to identify the variables of interest.

Python
pyds = open_url(cmr_urls[0], protocol="dap4", session=my_session)
pyds.tree()
HDF5 Tree
.TEMPO_NO2_L2_V04_20251001T141426Z_S004G08.nc
├──product
│  ├──main_data_quality_flag
│  ├──vertical_column_troposphere
│  ├──vertical_column_stratosphere
│  └──vertical_column_troposphere_uncertainty
├──geolocation
│  ├──time
│  ├──longitude
│  ├──latitude
│  ├──solar_azimuth_angle
│  ├──longitude_bounds
│  ├──solar_zenith_angle
│  ├──viewing_zenith_angle
│  ├──latitude_bounds
│  ├──relative_azimuth_angle
│  └──viewing_azimuth_angle
├──support_data
│  ├──surface_pressure
│  ├──wind_speed
│  ├──amf_cloud_pressure
│  ├──vertical_column_total_uncertainty
│  ├──vertical_column_total
│  ├──terrain_height
│  ├──fitted_slant_column_uncertainty
│  ├──amf_troposphere
│  ├──fitted_slant_column
│  ├──gas_profile
│  ├──fitted_slant_column_uncorrected
│  ├──amf_cloud_fraction
│  ├──snow_ice_fraction
│  ├──amf_diagnostic_flag
│  ├──destriping_correction
│  ├──albedo
│  ├──amf_total
│  ├──scattering_weights
│  ├──tropopause_pressure
│  ├──eff_cloud_fraction
│  ├──amf_stratosphere_clear_sky
│  ├──pbl_height
│  ├──amf_total_clear_sky
│  ├──temperature_profile
│  ├──amf_stratosphere
│  ├──amf_troposphere_clear_sky
│  ├──scattering_weights_clear_sky
│  └──ground_pixel_quality_flag
├──qa_statistics
│  ├──fit_rms_residual
│  └──fit_convergence_flag
├──xtrack
└──mirror_step

Stage 1

For swath data, coordinate arrays such as latitude and longitude are NOT the dimensions of the dataset. The first step before downloading is to identify the dimensions associated with the coordinate data. We can do that with the PyDAP dataset object already created above.

We want to download the following variables, identified by their fully qualified names:

  • /geolocation/time
  • /geolocation/longitude
  • /geolocation/latitude

Their dimensions are identified as follows:

Python
dims = list(set(pyds['geolocation/latitude'].dims + pyds['geolocation/longitude'].dims + pyds['geolocation/time'].dims))
print("\nnecessary dimensions to download:", dims, "\n")
Jupyter cell
  necessary dimensions to download: ['/mirror_step', '/xtrack'] 
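
Note that list(set(...)) removes duplicates but does not guarantee the order of the result, so the printed list may come out in a different order between runs. If a stable order matters, dict.fromkeys is one alternative. The sketch below uses illustrative tuples standing in for the .dims attributes; in particular, the dimensions assigned to time are an assumption.

```python
# Illustrative stand-ins for the .dims tuples of each coordinate variable
lat_dims = ("/mirror_step", "/xtrack")
lon_dims = ("/mirror_step", "/xtrack")
time_dims = ("/mirror_step",)  # assumption: time varies only along mirror_step

# dict.fromkeys drops duplicates while keeping first-seen order
dims = list(dict.fromkeys(lat_dims + lon_dims + time_dims))
print(dims)  # ['/mirror_step', '/xtrack']
```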

We can now download the coordinates and dimensions to identify the subset of interest. Since both latitude and longitude are 2D, we need to identify the slices for both /mirror_step and /xtrack. We demonstrate this below.

Python
# Download coordinate data into local directory
dap_to_netcdf(cmr_urls, session=my_session, 
              keep_variables= ["/xtrack", "/mirror_step",
                               "/geolocation/time",
                               "/geolocation/longitude",
                               "/geolocation/latitude", 
              ],
              output_path=output_path) # <--------- you need to define your own output_path


Spatial subset of data, specified by dimension slices for each granule

Python
# The bounding box is ordered [west, south, east, north]
minLon, maxLon = bounding_box[0], bounding_box[2]
minLat, maxLat = bounding_box[1], bounding_box[3]

slices = []
# Iterate over all downloaded files, using each URL to derive the local filename
for url in cmr_urls:
    # Granule name with its ".nc" suffix replaced by ".nc4"
    filename = output_path + f"{url.split('/')[-1][:-3]}.nc4"
    # Flatten the root group and the geolocation group into a single dataset
    ds = xr.merge([xr.open_dataset(filename), xr.open_dataset(filename, group='geolocation')])
    ds.load()

    # Identify the subset from the lon/lat data of this granule
    longitude = ds['longitude'].values
    latitude = ds['latitude'].values

    mask = (
        (longitude >= minLon) & (longitude <= maxLon) &
        (latitude >= minLat) & (latitude <= maxLat)
    )

    # Row and column indexes of the pixels inside the bounding box
    rows, cols = np.where(mask)
    y0, y1 = int(rows.min()), int(rows.max())
    x0, x1 = int(cols.min()), int(cols.max())
    slices.append({
        "mirror_step": (y0, y1),
        "xtrack": (x0, x1),
        })
print(slices[:4])
Jupyter Cell Output

[{'mirror_step': (51, 127), 'xtrack': (299, 449)},
 {'mirror_step': (51, 127), 'xtrack': (299, 448)},
 {'mirror_step': (52, 128), 'xtrack': (300, 450)},
 {'mirror_step': (52, 128), 'xtrack': (299, 449)}]

Figure 2. Longitude and Latitude for entire track from the last remote granule (cmr_urls[-1]). The dark rectangle is the area of interest, defined by the indexes x0,x1 and y0,y1 computed per remote URLs and stored in slices list.
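
One edge case in the index computation: if no pixel of a granule falls inside the bounding box, the mask is empty and calling .min() on the empty index arrays raises a ValueError. The sketch below shows one way to guard against that, using a small synthetic grid in place of real granule coordinates; bbox_index_range is a hypothetical helper, not part of PyDAP.

```python
import numpy as np

def bbox_index_range(longitude, latitude, bbox):
    """Return {'mirror_step': (y0, y1), 'xtrack': (x0, x1)}, or None if empty.

    bbox is [west, south, east, north]; longitude and latitude are 2D arrays.
    """
    west, south, east, north = bbox
    # NaN coordinates (if any) compare as False and are therefore excluded
    mask = ((longitude >= west) & (longitude <= east)
            & (latitude >= south) & (latitude <= north))
    if not mask.any():
        return None  # no pixel of this granule falls inside the box
    rows, cols = np.where(mask)
    return {"mirror_step": (int(rows.min()), int(rows.max())),
            "xtrack": (int(cols.min()), int(cols.max()))}

# Synthetic 4 x 5 grid standing in for one granule's coordinates
lon, lat = np.meshgrid(np.arange(-125.0, -120.0), np.arange(46.0, 50.0))
print(bbox_index_range(lon, lat, [-124.5, 46.5, -121.5, 49.5]))
# {'mirror_step': (1, 3), 'xtrack': (1, 3)}
```

If a granule returns None, its URL would also need to be dropped so that the list of slices stays aligned, one entry per URL, with the list of URLs.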

Finally, we remove the downloaded coordinate files to avoid filename collisions in the next stage (NOTE: replace the path below with your own output_path!).

Terminal
$ cd path_to_data_replace_here
$ rm TEMPO_NO2_L2*.nc4

Stage 2

Now we stream ONLY the data of interest from each remote granule, subsetting by variable name and, spatially, by the slices list we just calculated.

Python
Vars = dims + [
    "/product/main_data_quality_flag",
    "/product/vertical_column_troposphere",
    "/product/vertical_column_stratosphere",
    "/geolocation/time",
    "/geolocation/longitude",
    "/geolocation/latitude",
    "/support_data/wind_speed",
    "/support_data/terrain_height",
    "/support_data/gas_profile",
    "/support_data/pbl_height",
    "/support_data/temperature_profile",
]

# Now Stream data
dap_to_netcdf(cmr_urls, session=my_session, 
              keep_variables = Vars,
              dim_slices= slices,
              output_path=output_path)


References

Liu, X. (2025). TEMPO NO2 tropospheric, stratospheric, and total columns V04 [Data set]. NASA Langley Atmospheric Science Data Center Distributed Active Archive Center. https://doi.org/10.5067/IS-40E/TEMPO/NO2_L2.004

Cite this Tutorial

Citation
Jimenez-Urias, M. A. (2026). Access Near-Real-Time (NRT) Air Quality Data From TEMPO Via OPeNDAP. Zenodo. https://doi.org/10.5281/zenodo.19477163
BibTeX
@misc{jimenez_urias_2026_19477163,
  author       = {Jimenez-Urias, Miguel Angel},
  title        = {Access Near-Real-Time (NRT) Air Quality Data From TEMPO Via OPeNDAP},
  month        = apr,
  year         = 2026,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.19477163},
  url          = {https://doi.org/10.5281/zenodo.19477163},
}