Access Near Real Time Air Quality data from NASA’s TEMPO Mission
Requirements
- Earthdata login (EDL) credentials.
- Concept Collection ID or DOI for the relevant data product.
- Python >= 3.11.
- Mamba-forge (or conda-forge) installed on the machine.
- Familiarity with Jupyter notebooks and Jupyter Lab.
Optional:
- Store all EDL credentials in a .netrc file.
- Basic knowledge of conda environment installation.
Objectives
To download one month of Near Real Time Air Quality (NO2) data from NASA’s TEMPO mission. The spatial and temporal range is defined by the following parameters:
- Time range: 10/01/2025 – 10/31/2025.
- Spatial range: -124.6330 < longitude < -121, and 46.35 < latitude < 49.83.
To accomplish this goal, the tutorial demonstrates how to:
- Authenticate (via earthaccess).
- Search for all available NASA OPeNDAP URLs for a specific NASA collection. The search will further filter by time range.
- Subset with OPeNDAP, by variable name and spatial / temporal range.
Install required Python dependencies
In a terminal shell, use mamba or conda to install all required dependencies from conda-forge, activate the environment, and launch an interactive Jupyter notebook in a browser.
$ mamba create -n opendap_env -c conda-forge python=3.12 ipython jupyterlab earthaccess netCDF4
$ mamba activate opendap_env
$ pip install git+https://github.com/pydap/pydap.git
$ jupyter lab
Once in the Jupyter notebook environment, use the first cell to import everything needed to stream remote data into a local file:
import xarray as xr
import datetime as dt
import earthaccess
import numpy as np
# import pydap-specific tools
from pydap.client import get_cmr_urls, open_url
from pydap.client import to_netcdf as dap_to_netcdf
Finding OPeNDAP URLs with PyDAP
The parameter needed to search for all NO2 Near Real Time data from TEMPO available through OPeNDAP is:
Concept Collection ID = C3685896872-LARC_CLOUD
Data from the above collection is a Level 2 data product, meaning SWATH data, and all remote files span different longitudes and latitudes. In this case, the necessary first step is to filter the search for all relevant data URLs by a bounding box. Later on, a further subset by coordinate values will be done by OPeNDAP.
Below are the required parameters to search for all OPeNDAP URLs using PyDAP's get_cmr_urls:
TEMPO_L2_NRTNO2_ccid = "C3685896872-LARC_CLOUD"  # Concept Collection ID
time_range = [dt.datetime(2025, 10, 1), dt.datetime(2025, 10, 31)]  # One month of data
bounding_box = [-124.63309, 46.35932, -121, 49.83307]  # WSEN order; area around Seattle (Pacific Northwest)
cmr_urls = get_cmr_urls(ccid=TEMPO_L2_NRTNO2_ccid, bounding_box=bounding_box, time_range=time_range, limit=1000)  # you can increase the limit of results
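A quick sanity check on the search window can be useful at this point. The sketch below is an illustration only and is not part of PyDAP: month_range and granule_time are hypothetical helpers, the URL is a placeholder, and the timestamp pattern is inferred from the granule name shown later in this tutorial (TEMPO_NO2_L2_V04_20251001T141426Z_S004G08.nc). It assumes the end of a time range is treated as an instant, in which case an end bound of dt.datetime(2025, 10, 31) would stop at 00:00 on October 31, so the helper extends the range to the last second of the month.

```python
import calendar
import datetime as dt
import re

# Timestamp embedded in granule names such as
# TEMPO_NO2_L2_V04_20251001T141426Z_S004G08.nc (assumed naming convention).
_STAMP = re.compile(r"_(\d{8}T\d{6})Z_")


def month_range(year, month):
    """Return [start, end] covering the whole calendar month, to the last second."""
    last_day = calendar.monthrange(year, month)[1]
    start = dt.datetime(year, month, 1)
    end = dt.datetime(year, month, last_day, 23, 59, 59)
    return [start, end]


def granule_time(url):
    """Parse the timestamp embedded in a granule URL, or return None if absent."""
    match = _STAMP.search(url)
    if match is None:
        return None
    return dt.datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


# Placeholder URL, used only to exercise the helpers.
example_url = "https://example.invalid/TEMPO_NO2_L2_V04_20251001T141426Z_S004G08.nc"
full_month = month_range(2025, 10)
stamp = granule_time(example_url)
print(full_month[0] <= stamp <= full_month[1])
```

Applied to every entry of cmr_urls, the same comparison would show whether the returned granules cover the intended days.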
What does specifying a bounding box defining an area of interest to the CMR search do?
One may think that the CMR search for OPeNDAP URLs returned the desired subset of data when we provided a bounding box in the CMR query. That is unfortunately not the case. While the CMR did filter using the bounding box, it returns all the OPeNDAP URLs with data that intersects the bounding box. To get ONLY the data in the bounding box, we will have to do some more work as described below.
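The difference between "intersects" and "lies within" can be made concrete with a small sketch. This is a simplification under stated assumptions: real granule footprints are polygons rather than boxes, the footprint values below are invented for illustration, and intersects and contains are hypothetical helpers, not CMR or PyDAP functions.

```python
def intersects(a, b):
    """True if boxes a and b, given as (west, south, east, north), overlap at all."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if box `inner` lies entirely within box `outer`."""
    return (outer[0] <= inner[0] and inner[2] <= outer[2]
            and outer[1] <= inner[1] and inner[3] <= outer[3])


bounding_box = [-124.63309, 46.35932, -121, 49.83307]  # WSEN, from this tutorial
# Invented granule footprint, much larger than the area of interest.
granule_footprint = [-130.0, 40.0, -110.0, 55.0]

# The granule matches an intersection-based search ...
print(intersects(granule_footprint, bounding_box))
# ... but most of its pixels lie outside the area of interest.
print(contains(bounding_box, granule_footprint))
```

A granule like this one would be returned by the search, yet only a small part of it is wanted, which is why a second subsetting step is needed.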
How can I download only the data within the bounding box?
This can be done with OPeNDAP in a two-stage process:
- Download ONLY coordinate data from each granule, to identify the slices needed to download only the data within the bounding box.
- Use the identified slice from each OPeNDAP URL, to stream data into a local file for analysis. PyDAP enables this.
This workflow is demonstrated below. But before we can download any data, we must authenticate via EDL.
EDL Authentication with earthaccess and OPeNDAP
There are various ways to authenticate with NASA, and here we will use earthaccess to retrieve a session object containing all required credentials to access data.
When using earthaccess to "login", you need to define a strategy and you have two options:
- If you already have a .netrc file with your EDL credentials stored on your machine, set strategy="netrc".
- If you DO NOT have a .netrc file with your EDL credentials, or you are not sure, use strategy="interactive" instead.
from earthaccess.exceptions import LoginStrategyUnavailable

try:
    auth = earthaccess.login(strategy="netrc", persist=True)
except LoginStrategyUnavailable:
    # you will be prompted to enter your EDL credentials
    auth = earthaccess.login(strategy="interactive", persist=True)
# create a session that carries the EDL authorization
my_session = auth.get_session()
The object my_session contains your EDL credentials, and it will be used to retrieve data from OPeNDAP. Moreover, passing persist=True to earthaccess.login creates a .netrc file that stores your EDL credentials on the machine for later reuse.
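If you are not sure whether a usable .netrc file already exists, one option is to check with Python's standard netrc module before choosing a strategy. The sketch below is illustrative: has_edl_entry is a hypothetical helper, and it assumes the file lives at ~/.netrc unless another path is given (on Windows the file name may differ).

```python
import netrc
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host


def has_edl_entry(netrc_path=None):
    """Return True if the .netrc file has an entry for the Earthdata Login host."""
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.is_file():
        return False
    try:
        entry = netrc.netrc(str(path)).authenticators(EDL_HOST)
    except netrc.NetrcParseError:
        # A malformed file is treated the same as a missing entry.
        return False
    return entry is not None
```

Calling has_edl_entry() with no argument inspects the default location; a result of False suggests the interactive strategy is the one to use.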
Use OPeNDAP to subset data by coordinate values and variable names
Subset by variable names
Below we use PyDAP to download the OPeNDAP DAP4 metadata. PyDAP will create a Python representation of the dataset, including all variable names and their dimensions, along with all metadata attributes associated with each variable. We will use this information to identify the variables of interest.
pyds = open_url(cmr_urls[0], protocol="dap4", session=my_session)
pyds.tree()
.TEMPO_NO2_L2_V04_20251001T141426Z_S004G08.nc
├──product
│ ├──main_data_quality_flag
│ ├──vertical_column_troposphere
│ ├──vertical_column_stratosphere
│ └──vertical_column_troposphere_uncertainty
├──geolocation
│ ├──time
│ ├──longitude
│ ├──latitude
│ ├──solar_azimuth_angle
│ ├──longitude_bounds
│ ├──solar_zenith_angle
│ ├──viewing_zenith_angle
│ ├──latitude_bounds
│ ├──relative_azimuth_angle
│ └──viewing_azimuth_angle
├──support_data
│ ├──surface_pressure
│ ├──wind_speed
│ ├──amf_cloud_pressure
│ ├──vertical_column_total_uncertainty
│ ├──vertical_column_total
│ ├──terrain_height
│ ├──fitted_slant_column_uncertainty
│ ├──amf_troposphere
│ ├──fitted_slant_column
│ ├──gas_profile
│ ├──fitted_slant_column_uncorrected
│ ├──amf_cloud_fraction
│ ├──snow_ice_fraction
│ ├──amf_diagnostic_flag
│ ├──destriping_correction
│ ├──albedo
│ ├──amf_total
│ ├──scattering_weights
│ ├──tropopause_pressure
│ ├──eff_cloud_fraction
│ ├──amf_stratosphere_clear_sky
│ ├──pbl_height
│ ├──amf_total_clear_sky
│ ├──temperature_profile
│ ├──amf_stratosphere
│ ├──amf_troposphere_clear_sky
│ ├──scattering_weights_clear_sky
│ └──ground_pixel_quality_flag
├──qa_statistics
│ ├──fit_rms_residual
│ └──fit_convergence_flag
├──xtrack
└──mirror_step
Stage 1
For SWATH data, coordinate arrays such as latitude and longitude are NOT the dimensions of the dataset. The first step before downloading is identifying the dimensions associated with the coordinate data. We can do that with the PyDAP dataset object already created above.
We want to download the following variables, identified by their fully qualified names:
- /geolocation/time
- /geolocation/longitude
- /geolocation/latitude
Their dimensions are identified as follows:
dims = list(set(pyds['geolocation/latitude'].dims + pyds['geolocation/longitude'].dims + pyds['geolocation/time'].dims))
print("\nnecessary dimensions to download:", dims, "\n")
necessary dimensions to download: ['/mirror_step', '/xtrack']
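Note that list(set(...)) does not guarantee any particular order for the resulting names. If a reproducible order matters, an order-preserving union is one alternative. The sketch below uses a hypothetical helper, ordered_union, and the dimension tuples are illustrative stand-ins for what pyds[...].dims returns; they are assumptions, not values read from the remote dataset.

```python
def ordered_union(*dim_tuples):
    """Union of dimension names, keeping the order in which they first appear."""
    merged = []
    for dims in dim_tuples:
        merged.extend(dims)
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(merged))


# Illustrative values only; in the tutorial these come from pyds[...].dims.
lat_dims = ("/mirror_step", "/xtrack")
lon_dims = ("/mirror_step", "/xtrack")
time_dims = ("/mirror_step",)

dims = ordered_union(lat_dims, lon_dims, time_dims)
print(dims)
```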
We can now download the coordinates and dimensions to identify the subset of interest. Since both latitude and longitude are 2D, we need to identify the slices for both /mirror_step and /xtrack. We demonstrate this below.
# Download coordinate data into a local directory
dap_to_netcdf(
    cmr_urls,
    session=my_session,
    keep_variables=[
        "/xtrack",
        "/mirror_step",
        "/geolocation/time",
        "/geolocation/longitude",
        "/geolocation/latitude",
    ],
    output_path=output_path,  # <--------- you need to define your own output_path
)
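Since output_path is left for the reader to define, here is one possible way to set it up and to predict the local file name for each URL. This is a sketch under assumptions: the directory name is invented, local_name is a hypothetical helper, and the ".nc" to ".nc4" renaming mirrors the file names this tutorial reads back in the next step rather than any documented PyDAP behavior.

```python
from pathlib import Path


def local_name(url, output_dir):
    """Local path for a granule URL: '<name>.nc' becomes '<output_dir>/<name>.nc4'."""
    granule = url.rstrip("/").split("/")[-1]
    stem = granule[:-3] if granule.endswith(".nc") else granule
    return Path(output_dir) / f"{stem}.nc4"


# Invented directory and placeholder URL, for illustration only.
output_dir = Path("data") / "tempo_coords"
url = "https://example.invalid/TEMPO_NO2_L2_V04_20251001T141426Z_S004G08.nc"
print(local_name(url, output_dir).name)
```

Before downloading, the directory can be created with output_dir.mkdir(parents=True, exist_ok=True). The tutorial code builds file names by string concatenation, so if output_path is a plain string it needs a trailing path separator.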
Spatial subset of data, specified by dimension slices for each granule
# Use coordinate values from the bounding box
minLon, maxLon = bounding_box[0], bounding_box[2]
minLat, maxLat = bounding_box[1], bounding_box[3]
slices = []
# Iterate over all downloaded files; the URL is used to derive each filename
for url in cmr_urls:
    filename = output_path + f"{url.split('/')[-1][:-3]}.nc4"
    # Flatten: merge the root group with the geolocation group
    ds = xr.merge([xr.open_dataset(filename), xr.open_dataset(filename, group='geolocation')])
    ds.load()
    # Identify the subset from lon/lat data per granule
    longitude = ds['longitude'].values
    latitude = ds['latitude'].values
    mask = (
        (longitude >= minLon) & (longitude <= maxLon) &
        (latitude >= minLat) & (latitude <= maxLat)
    )
    rows, cols = np.where(mask)
    # First and last index along each dimension
    y0, y1 = rows.min(), rows.max()
    x0, x1 = cols.min(), cols.max()
    slices.append({
        "mirror_step": (y0, y1),
        "xtrack": (x0, x1),
    })
print(slices[:4])
[{'mirror_step': (51, 127), 'xtrack': (299, 449)},
{'mirror_step': (51, 127), 'xtrack': (299, 448)},
{'mirror_step': (52, 128), 'xtrack': (300, 450)},
{'mirror_step': (52, 128), 'xtrack': (299, 449)}]
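The mask-to-index-range step can be tried on synthetic data without downloading anything. The sketch below is a minimal illustration, not the tutorial's own code: bbox_index_range is a hypothetical helper and the coordinate grid is invented. It also guards against a granule with no pixel inside the bounding box, a case in which rows.min() on an empty array would raise a ValueError.

```python
import numpy as np


def bbox_index_range(longitude, latitude, bounding_box):
    """Return inclusive (first, last) index pairs per axis, or None if no pixel is inside."""
    min_lon, min_lat, max_lon, max_lat = bounding_box  # WSEN order
    mask = (
        (longitude >= min_lon) & (longitude <= max_lon)
        & (latitude >= min_lat) & (latitude <= max_lat)
    )
    if not mask.any():
        return None
    rows, cols = np.where(mask)
    return {
        "mirror_step": (int(rows.min()), int(rows.max())),
        "xtrack": (int(cols.min()), int(cols.max())),
    }


# Synthetic 2D coordinates on a 6 x 8 grid (not real TEMPO geolocation).
lat_1d = np.linspace(45.0, 50.0, 6)      # 45, 46, ..., 50
lon_1d = np.linspace(-126.0, -119.0, 8)  # -126, -125, ..., -119
longitude, latitude = np.meshgrid(lon_1d, lat_1d)

inside = bbox_index_range(longitude, latitude, [-124.6, 46.3, -121.0, 49.9])
outside = bbox_index_range(longitude, latitude, [0.0, 0.0, 1.0, 1.0])
print(inside, outside)
```

If a None result were used in the loop above, the corresponding URL would also have to be skipped so that the slices list stays aligned with the URLs. On a real swath the index rectangle can still include some pixels outside the bounding box, because lines of constant latitude and longitude do not follow the array axes.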

Figure: [...] cmr_urls[-1]). The dark rectangle is the area of interest, defined by the indexes x0, x1 and y0, y1 computed for each remote URL and stored in the slices list.
Finally, we remove the downloaded coordinate files to avoid filename collisions (NOTE: replace the path below with your own output_path!)
$ cd path_to_data_replace_here
$ rm TEMPO_NO2_L2*.nc4
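The same cleanup can be done from inside the notebook. The following is a sketch of a Python equivalent of the shell commands above; remove_coordinate_files is a hypothetical helper, and the glob pattern simply mirrors the one used with rm.

```python
from pathlib import Path


def remove_coordinate_files(output_dir, pattern="TEMPO_NO2_L2*.nc4"):
    """Delete files matching `pattern` in `output_dir`; return the removed file names."""
    removed = []
    for path in sorted(Path(output_dir).glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling remove_coordinate_files(output_path) returns the names of the files it deleted, which makes it easy to confirm that only the Stage 1 coordinate files were removed.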
Stage 2
Now we stream ONLY the data of interest, applying to each remote granule a subset by variable name and a spatial subset based on the slices variable we just calculated.
Vars = dims + [
    "/product/main_data_quality_flag",
    "/product/vertical_column_troposphere",
    "/product/vertical_column_stratosphere",
    "/geolocation/time",
    "/geolocation/longitude",
    "/geolocation/latitude",
    "/support_data/wind_speed",
    "/support_data/terrain_height",
    "/support_data/gas_profile",
    "/support_data/pbl_height",
    "/support_data/temperature_profile",
]
# Now stream the data
dap_to_netcdf(
    cmr_urls,
    session=my_session,
    keep_variables=Vars,
    dim_slices=slices,
    output_path=output_path,
)
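A small consistency check can be run before the streaming call above. The sketch below rests on two assumptions that the tutorial implies but does not state: that the slices are expected one per URL, in the same order as cmr_urls, and that each (first, last) pair is inclusive because it comes from rows.min() and rows.max(). How PyDAP interprets the end index is not covered here. Both subset_shape and check_slices are hypothetical helpers.

```python
def subset_shape(slice_):
    """Pixel counts per dimension, assuming inclusive (first, last) index pairs."""
    return {dim: last - first + 1 for dim, (first, last) in slice_.items()}


def check_slices(urls, slices):
    """Raise ValueError unless there is exactly one slice dictionary per URL."""
    if len(urls) != len(slices):
        raise ValueError(f"{len(urls)} URLs but {len(slices)} slice dictionaries")


# First entry of the slices list printed earlier in this tutorial.
first = {"mirror_step": (51, 127), "xtrack": (299, 449)}
shape = subset_shape(first)
print(shape, shape["mirror_step"] * shape["xtrack"])
```

Under the inclusive assumption, the first granule contributes 77 by 151 pixels per 2D variable.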
References
Liu, X. (2025). TEMPO NO2 tropospheric, stratospheric, and total columns V04 [Data set]. NASA Langley Atmospheric Science Data Center Distributed Active Archive Center. https://doi.org/10.5067/IS-40E/TEMPO/NO2_L2.004
Cite this Tutorial
Jimenez-Urias, M. A. (2026). Access Near-Real-Time (NRT) Air Quality Data From TEMPO Via OPeNDAP. Zenodo. https://doi.org/10.5281/zenodo.19477163
@misc{jimenez_urias_2026_19477163,
  author    = {Jimenez-Urias, Miguel Angel},
  title     = {Access Near-Real-Time (NRT) Air Quality Data From TEMPO Via OPeNDAP},
  month     = apr,
  year      = 2026,
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19477163},
  url       = {https://doi.org/10.5281/zenodo.19477163},
}
