Access Cloud and Aerosol Lidar data from NASA’s CALIPSO
Requirements
- Earthdata login (EDL) credentials.
- Concept Collection ID or DOI for the relevant data product.
- Python >= 3.11.
- Mamba-forge (or conda-forge) installed on the machine.
- Familiarity with Jupyter notebooks and Jupyter Lab.
Optional:
- Store all EDL credentials in a .netrc file.
- Basic knowledge of conda environment installation.
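A `.netrc` entry for Earthdata Login is keyed on the host `urs.earthdata.nasa.gov`. As a minimal sketch, the standard-library `netrc` module can check whether such an entry exists before a login strategy is chosen. The helper name `has_edl_entry` is hypothetical and is not part of earthaccess.

```python
import netrc
from pathlib import Path

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host


def has_edl_entry(netrc_path):
    """Return True if the given netrc file holds credentials for EDL_HOST."""
    path = Path(netrc_path)
    if not path.is_file():
        return False
    try:
        entries = netrc.netrc(str(path))
    except netrc.NetrcParseError:
        # a malformed file is treated the same as a missing entry
        return False
    return entries.authenticators(EDL_HOST) is not None
```

Calling `has_edl_entry(Path.home() / ".netrc")` would then indicate whether stored credentials are likely to be found on the machine.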
Objectives
To download ten years of springtime cloud and aerosol data from NASA’s CALIPSO Level 2 collection in the area around Isla de Guadalupe, Mexico. The spatial and temporal range is defined by the following parameters:
- Time range: 10 years of springtime data, 03/01 – 05/31 of each year (2013-2023).
- Spatial range: -121 < longitude < -115, and 26.5 < latitude < 31
- Daytime data only.
The Data Challenge: Data stored in HDF4
All of our data of interest has Level 2 processing, meaning that each file covers the geographical locations along the track of the satellite (i.e., swath data). This adds a layer of complication, since each remote file must be subset by its own slice. In addition, all the data in this collection is stored in HDF4, a pre-cloud file format that is not optimized for cloud access. With OPeNDAP, the access pattern remains unaltered, independent of the remote source file format. However, access to HDF4 data can be slower than access to the more modern HDF5, even though all the data is in the cloud, because such pre-cloud formats were optimized for random access and maximum write speed rather than for remote reads.
To accomplish the goals above, the tutorial will demonstrate how to:
- Authenticate (via earthaccess).
- Search for all available NASA OPeNDAP URLs for a specific NASA collection. The search will further filter by time range.
- Subset with OPeNDAP, by variable name and spatial / temporal range.
Install required python dependencies
In a terminal shell, use mamba (or conda) with the conda-forge channel to install all dependencies required to run this tutorial, activate the environment, and launch an interactive Jupyter notebook in a browser.
$ mamba create -n opendap_env -c conda-forge python=3.12 ipython jupyterlab earthaccess netCDF4 xarray
$ mamba activate opendap_env
$ pip install git+https://github.com/pydap/pydap.git
$ jupyter lab
Once in the jupyter notebook environment, import in the first cell all necessary methods that will be used to stream remote data into a local file:
import xarray as xr
import datetime as dt
import earthaccess
import numpy as np
# import pydap-specific tools
from pydap.client import get_cmr_urls, open_url
from pydap.client import to_netcdf as dap_to_netcdf
The parameter needed to search for all CALIPSO cloud and aerosol lidar data available through OPeNDAP is
Concept Collection ID = C3463063995-LARC_CLOUD
Data from the above collection is a Level 2 data product, meaning SWATH data, and all remote files span different longitudes and latitudes. In this case, the necessary first step is to filter the search for all relevant data URLs by a bounding box. Later on, a further subset by coordinate values will be done by OPeNDAP.
Below are the required parameters to search for all OPeNDAP URLs using PyDAP's get_cmr_urls:
# look up Concept Collection ID
Calipso_L2_ccid = "C3463063995-LARC_CLOUD"
bbox = [-121, 26.5, -115, 31]  # [west, south, east, north]
# Springtime data (March 1 to May 31) for each year, 2013 through 2023
time_ranges = [[dt.datetime(year, 3, 1), dt.datetime(year, 5, 31)] for year in range(2013, 2024)]
args = {
    "ccid": Calipso_L2_ccid,
    "bounding_box": bbox,
    "limit": 1000,
}
# one CMR query per spring season; collect all OPeNDAP URLs in a single flat list
cmr_urls = [url for time_range in time_ranges for url in get_cmr_urls(**args, time_range=time_range)]
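As a quick sanity check on the time ranges, the sketch below uses only the standard library. Two details are worth noting: `range(2013, 2024)` yields one range for every year from 2013 through 2023, and `dt.datetime(year, 5, 31)` is midnight at the start of May 31, so granules acquired later that day may fall outside the range. The names `spring_ranges` and `inclusive_end` are illustrative, and extending the end point is an assumption about what is wanted, not something the search requires.

```python
import datetime as dt

# Same construction as in the search cell above
spring_ranges = [
    [dt.datetime(year, 3, 1), dt.datetime(year, 5, 31)]
    for year in range(2013, 2024)
]

# Variant (assumption: the whole of May 31 should be included)
inclusive_end = [
    [start, end.replace(hour=23, minute=59, second=59)]
    for start, end in spring_ranges
]

print(len(spring_ranges))    # number of spring seasons searched
print(spring_ranges[0][1])   # end of the first range as written
print(inclusive_end[0][1])   # end of the first range, extended to the end of the day
```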
What does specifying a bounding box defining an area of interest to the CMR search do?
The CMR search filters using the bounding box, returning all the OPeNDAP URLs with data that intersects the bounding box. To ONLY get data within the bounding box, we will have to do some more work as described below.
How to download only the data within the bounding box?
With OPeNDAP, it is a two-stage process:
- Download coordinate data ONLY from each granule, to identify the slices needed to download only the data within the bounding box.
- Use the identified slice from each OPeNDAP URL, to stream data into a local file for analysis. PyDAP enables this.
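The two stages can be sketched offline with plain Python lists standing in for remote granules. This is only an illustration under stated assumptions: the granule names and values are synthetic, `find_slice` and `subset` are hypothetical helpers rather than PyDAP functions, and the index range is treated as inclusive at both ends.

```python
def find_slice(longitudes, min_lon, max_lon):
    """Stage 1: return (first, last) indices of records inside the bounds, or None."""
    inside = [i for i, lon in enumerate(longitudes) if min_lon <= lon <= max_lon]
    if not inside:
        return None
    return inside[0], inside[-1]


def subset(granule, index_range, variables):
    """Stage 2: keep only the requested variables and the records in the range."""
    first, last = index_range
    return {name: granule[name][first:last + 1] for name in variables}


# Two synthetic "granules" (hypothetical values), each a dict of equal-length lists
granules = {
    "granule_a": {"Longitude": [-125.0, -120.0, -118.0, -116.0, -110.0],
                  "Tropopause_Height": [10.0, 11.0, 12.0, 13.0, 14.0]},
    "granule_b": {"Longitude": [-119.0, -117.0, -114.0, -112.0, -111.0],
                  "Tropopause_Height": [15.0, 16.0, 17.0, 18.0, 19.0]},
}

min_lon, max_lon = -121, -115

# Stage 1: only the coordinate is inspected; every granule gets its own index range
slices = {name: find_slice(g["Longitude"], min_lon, max_lon) for name, g in granules.items()}

# Stage 2: the per-granule range is applied to every variable of interest
subsets = {name: subset(granules[name], rng, ["Longitude", "Tropopause_Height"])
           for name, rng in slices.items() if rng is not None}
```

In the real workflow, stage 1 transfers only the coordinate arrays and stage 2 applies the range on the server side, so the records outside the range are never downloaded.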
This workflow is demonstrated below. Before downloading any data, however, we must authenticate via EDL.
EDL Authentication with earthaccess and OPeNDAP
There are various ways to authenticate with NASA, and here we will use earthaccess to retrieve a session object containing all required credentials to access data.
When using earthaccess to "login", you need to define a strategy and you have two options:
- If you already have a .netrc file with your EDL credentials stored on your machine, set strategy="netrc".
- If you DO NOT have a .netrc file with your EDL credentials, or you are not sure, set strategy="interactive" instead.
from earthaccess.exceptions import LoginStrategyUnavailable
try:
    auth = earthaccess.login(strategy="netrc", persist=True)
except LoginStrategyUnavailable:
    # you will be prompted to add your EDL credentials
    auth = earthaccess.login(strategy="interactive", persist=True)
# pass Token Authorization to a new Session.
# This will be used to download serialized binary data from OPeNDAP
my_session = auth.get_session()
Use OPeNDAP to subset data by coordinate values and variable names
Subset by variable names
This requires access to ONLY the metadata of the remote file.
Below we use PyDAP to download only the OPeNDAP DAP4 metadata. PyDAP will create a Python representation of the dataset, including all variable names and their dimensions, along with all metadata attributes associated with each variable. We will use this information to identify the variables of interest.
pyds = open_url(cmr_urls[0], protocol="dap4", session=my_session)
pyds.tree()
.CAL_LID_L2_01kmCLay-Standard-V5-00.2013-03-01T20-43-09ZD.hdf
├──Lidar_Surface_Detection
│ ├──Surface_Top_Altitude_532
│ ├──Surface_Base_Altitude_532
│ ├──Surface_Integrated_Attenuated_Backscatter_532
│ ├──Surface_532_Integrated_Depolarization_Ratio
│ ├──Surface_532_Integrated_Attenuated_Color_Ratio
│ ├──Surface_Detection_Flags_532
│ ├──Surface_Overlying_Integrated_Attenuated_Backscatter_532
│ ├──Surface_Scaled_RMS_Background_532
│ ├──Surface_Peak_Signal_532
│ ├──Surface_Detections_333m_532
│ ├──Surface_Top_Altitude_1064
│ ├──Surface_Base_Altitude_1064
│ ├──Surface_Integrated_Attenuated_Backscatter_1064
│ ├──Surface_1064_Integrated_Depolarization_Ratio
│ ├──Surface_1064_Integrated_Attenuated_Color_Ratio
│ ├──Surface_Detection_Flags_1064
│ ├──Surface_Overlying_Integrated_Attenuated_Backscatter_1064
│ ├──Surface_Scaled_RMS_Background_1064
│ ├──Surface_Peak_Signal_1064
│ └──Surface_Detections_333m_1064
├──Ocean_Derived_Column_Optical_Depth
│ ├──ODCOD_Effective_Optical_Depth_532
│ ├──ODCOD_Effective_Optical_Depth_532_Uncertainty
│ ├──ODCOD_QC_Flag_532
│ ├──ODCOD_Surface_Wind_Speeds_10m
│ └──ODCOD_Surface_Wind_Speed_Correction
├──Lidar_Data_Altitudes
├──Profile_ID
├──Latitude
├──Longitude
├──Profile_Time
├──Profile_UTC_Time
├──Day_Night_Flag
├──Off_Nadir_Angle
├──Solar_Zenith_Angle
├──Solar_Azimuth_Angle
├──Scattering_Angle
├──Spacecraft_Position
├──Parallel_Column_Reflectance_532
├──Parallel_Column_Reflectance_Uncertainty_532
├──Perpendicular_Column_Reflectance_532
├──Perpendicular_Column_Reflectance_Uncertainty_532
├──Column_Integrated_Attenuated_Backscatter_532
├──Column_IAB_Cumulative_Probability
├──Column_Particulate_Optical_Depth_Above_Opaque_Water_Cloud_532
├──Column_Particulate_Optical_Depth_Above_Opaque_Water_Cloud_Uncertainty_532
├──Tropopause_Height
├──Tropopause_Temperature
├──IGBP_Surface_Type
├──Snow_Ice_Surface_Type
├──DEM_Surface_Elevation
├──Minimum_Laser_Energy_532
├──Low_Energy_Mitigation_Column_QC_Flag
├──Number_Layers_Found
├──Scene_Flag
├──Low_Energy_Mitigation_Feature_QC_Flag
├──Layer_Top_Altitude
├──Layer_Base_Altitude
├──Layer_Top_Pressure
├──Midlayer_Pressure
├──Layer_Base_Pressure
├──Layer_Top_Temperature
├──Layer_Centroid_Temperature
├──Midlayer_Temperature
├──Layer_Base_Temperature
├──Opacity_Flag
├──Attenuated_Scattering_Ratio_Statistics_532
├──Attenuated_Backscatter_Statistics_532
├──Integrated_Attenuated_Backscatter_532
├──Integrated_Attenuated_Backscatter_Uncertainty_532
├──Attenuated_Backscatter_Statistics_1064
├──Integrated_Attenuated_Backscatter_1064
├──Integrated_Attenuated_Backscatter_Uncertainty_1064
├──Volume_Depolarization_Ratio_Statistics
├──Integrated_Volume_Depolarization_Ratio
├──Integrated_Volume_Depolarization_Ratio_Uncertainty
├──Attenuated_Total_Color_Ratio_Statistics
├──Integrated_Attenuated_Total_Color_Ratio
├──Integrated_Attenuated_Total_Color_Ratio_Uncertainty
├──Overlying_Integrated_Attenuated_Backscatter_532
├──Layer_IAB_QA_Factor
├──Feature_Classification_Flags
├──CAD_Score
├──Initial_CAD_Score
├──metadata.Product_ID
├──metadata.Date_Time_at_Granule_Start
├──metadata.Date_Time_at_Granule_End
├──metadata.Date_Time_of_Production
├──metadata.Number_of_Good_Profiles
├──metadata.Number_of_Bad_Profiles
├──metadata.Initial_Subsatellite_Latitude
├──metadata.Initial_Subsatellite_Longitude
├──metadata.Final_Subsatellite_Latitude
├──metadata.Final_Subsatellite_Longitude
├──metadata.Orbit_Number_at_Granule_Start
├──metadata.Orbit_Number_at_Granule_End
├──metadata.Orbit_Number_Change_Time
├──metadata.Path_Number_at_Granule_Start
├──metadata.Path_Number_at_Granule_End
├──metadata.Path_Number_Change_Time
├──metadata.Lidar_L1_Production_Date_Time
├──metadata.Number_of_Single_Shot_Records_in_File
├──metadata.Number_of_Average_Records_in_File
├──metadata.Number_of_Features_Found
├──metadata.Number_of_Cloud_Features_Found
├──metadata.Number_of_Aerosol_Features_Found
├──metadata.Number_of_Indeterminate_Features_Found
├──metadata.Ocean_Fresnel_Reflection_Coefficient_532
├──metadata.MERRA2_Wind_Uncertainty
├──metadata.AMSR_Wind_Correction_Uncertainty
├──metadata.Lidar_Data_Altitudes
├──metadata.GEOS_Version
├──metadata.GMAO_Files_Used
├──metadata.Classifier_Coefficients_Version_Number
├──metadata.Classifier_Coefficients_Version_Date
└──metadata.Production_Script
Stage 1
For SWATH data, coordinate arrays such as Latitude and Longitude are NOT the dimensions of the dataset. The first step before downloading is to identify the dimensions associated with the coordinate data. We can do that with the PyDAP dataset object already created above.
We want to download the following variables, identified by their fully qualified names:
- /Latitude
- /Longitude
- /Day_Night_Flag
Before downloading, we need to identify any dimension that is also an array in the dataset. (Some dimensions are only named, meaning they exist in the metadata but have no data associated with them in the file.) The PyDAP dataset holds all the information needed to identify any such variable in the remote file.
DIMS = list(set(pyds['Latitude'].dims + pyds['Longitude'].dims + pyds['Day_Night_Flag'].dims))
dims = [dim for dim in DIMS if dim.split("/")[-1] in pyds[("/").join(DIMS[1].split('/')[:-1])].variables()]
print("Dimensions that are also arrays: ", dims)
Dimensions that are also arrays: []
With this confirmation, we do not need any extra variables related to any coordinate data.
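The idea behind the check can be illustrated with plain Python sets: collect every dimension used by the coordinate variables, then keep those whose name also appears among the arrays of the dataset. The sketch below is a simplified stand-in, not the PyDAP call itself. `dims_with_data` is a hypothetical helper, and `/Column` is a made-up dimension name used only for illustration.

```python
def dims_with_data(variable_dims, array_names):
    """Return the dimensions (fully qualified names) that also exist as arrays."""
    all_dims = {dim for dims in variable_dims.values() for dim in dims}
    return sorted(dim for dim in all_dims if dim.split("/")[-1] in array_names)


# Hypothetical dimension names, for illustration only
variable_dims = {
    "Latitude": ("/Record_Number", "/Column"),
    "Longitude": ("/Record_Number", "/Column"),
    "Day_Night_Flag": ("/Record_Number", "/Column"),
}
array_names = {"Latitude", "Longitude", "Day_Night_Flag", "Profile_ID"}

# Case 1: no dimension is stored as an array, so nothing extra is needed
named_only = dims_with_data(variable_dims, array_names)

# Case 2: if Record_Number were also an array, it would have to be requested too
with_array = dims_with_data(variable_dims, array_names | {"Record_Number"})
```

An empty result, as in the first case, corresponds to the situation found for this collection.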
# Download coordinate data into local directory
dap_to_netcdf(cmr_urls, session=my_session,
              keep_variables=["/Latitude", "/Longitude", "/Day_Night_Flag"],
              output_path=output_path)  # <--------- you need to define your own output_path
Spatial subset of data, specified by dimension slices for each granule
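# Get data from Bounding Box
minLon, maxLon = bbox[0], bbox[2]
slices = []
final_urls = []
for url in cmr_urls:
    # local coordinate file written in Stage 1 (output_path must end with a path separator)
    filename = output_path + f"{url.split('/')[-1][:-4]}.nc4"
    dt1 = xr.open_datatree(filename).load()
    # indices of the records whose longitude falls inside the bounding box
    longitude = dt1['/Longitude']
    mask = (longitude >= minLon) & (longitude <= maxLon)
    idx = np.nonzero(mask.values)[0]
    if idx.size == 0:
        continue  # no record of this granule lies within the longitude bounds
    # True where Day_Night_Flag == 1; keep the granule only if no record in the slice has that value
    flagged = dt1['Day_Night_Flag'].isel(Record_Number=slice(idx[0], idx[-1])) == 1
    if not flagged.values.any():
        final_urls.append(url)
        slices.append({"/Record_Number": (int(idx[0]), int(idx[-1]))})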
print(f"\nOnly {len(final_urls)} out of the {len(cmr_urls)} remote files satisfy our Daylight Criteria!\n")
print("Sample subsetting slices:")
slices[:4]
Only 268 out of the 550 remote files satisfy our Daylight Criteria!
Sample subsetting slices:
[{'/Record_Number': (12882, 15472)},
{'/Record_Number': (14230, 16373)},
{'/Record_Number': (11783, 14565)},
{'/Record_Number': (13218, 15585)}]
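One detail of the loop above is that `slice(idx[0], idx[-1])` follows Python's half-open convention, so the record at `idx[-1]` is not part of the daylight check. How the `(start, stop)` tuples stored in `slices` are interpreted downstream is not covered here, so the sketch below only illustrates the plain-Python behavior with synthetic longitudes.

```python
longitudes = [-125.0, -120.5, -118.0, -115.5, -110.0]  # synthetic values
min_lon, max_lon = -121, -115

# indices of the records inside the longitude bounds
idx = [i for i, lon in enumerate(longitudes) if min_lon <= lon <= max_lon]
first, last = idx[0], idx[-1]

half_open = longitudes[slice(first, last)]    # excludes the record at `last`
closed = longitudes[slice(first, last + 1)]   # includes the record at `last`

print(half_open)
print(closed)
```

If the final record matters for a given analysis, adding one to the stop index in the daylight check is one way to include it.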

Note that each granule has its own slice. Finally, we remove the downloaded coordinate files to avoid filename collisions (NOTE: replace output_path with your own!)
$ cd output_path
$ rm CAL_LID_L2*.nc4
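For readers who prefer to stay inside the notebook, the same cleanup can be sketched with the standard-library `pathlib` module. The helper name `remove_coordinate_files` is hypothetical, and the default pattern simply mirrors the shell command above.

```python
from pathlib import Path


def remove_coordinate_files(output_dir, pattern="CAL_LID_L2*.nc4"):
    """Delete files matching `pattern` in `output_dir`; return the removed names."""
    removed = []
    for path in sorted(Path(output_dir).glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling `remove_coordinate_files(output_path)` returns the names of the deleted files, which makes it easy to confirm that only the Stage 1 coordinate files were removed.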
Stage 2
Now we stream ONLY the data of interest, subsetting by variable name and by the spatial slices we just calculated, for each remote granule that meets our daylight criteria.
# Define variables to download
# Will download a total of 34 variables!
keep_variables = [
    "/Lidar_Surface_Detection",             # <-- ALL variables inside the group
    "/Ocean_Derived_Column_Optical_Depth",  # <-- ALL variables inside the group
    "/Lidar_Data_Altitudes", "/Profile_ID", "/Latitude", "/Longitude",
    "/Profile_Time", "/Profile_UTC_Time", "/Day_Night_Flag", "/Tropopause_Height",
    "/Tropopause_Temperature",
]
# Stream the data with PyDAP from the remote granules
# that meet the criteria, applying the subsetting slice to
# each of them, and to all variables BEFORE downloading
# (data-proximate subsetting!)
dap_to_netcdf(final_urls, session=my_session,
              keep_variables=keep_variables,
              dim_slices=slices,
              output_path=output_path)
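The total of 34 variables can be checked by hand against the dataset tree printed earlier: the two groups contribute 20 and 5 arrays, and the remaining nine names are individual arrays. The sketch below does this arithmetic under the assumption, taken from the comments in the cell above, that naming a group keeps all of its members. `count_variables` is a hypothetical helper, not a PyDAP function.

```python
# Number of arrays inside each group, counted from the dataset tree shown earlier
group_sizes = {
    "/Lidar_Surface_Detection": 20,
    "/Ocean_Derived_Column_Optical_Depth": 5,
}

keep_variables = [
    "/Lidar_Surface_Detection",
    "/Ocean_Derived_Column_Optical_Depth",
    "/Lidar_Data_Altitudes", "/Profile_ID", "/Latitude", "/Longitude",
    "/Profile_Time", "/Profile_UTC_Time", "/Day_Night_Flag", "/Tropopause_Height",
    "/Tropopause_Temperature",
]


def count_variables(names, group_sizes):
    """Count the selected arrays, expanding each group to its number of members."""
    return sum(group_sizes.get(name, 1) for name in names)


total = count_variables(keep_variables, group_sizes)
print(total)
```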

References
Getzewich, B. (2025). CALIPSO Lidar Level 2 1 km Cloud Layer, V5-00. NASA Langley Atmospheric Science Data Center Distributed Active Archive Center. https://doi.org/10.5067/CALIOP/CALIPSO/CAL_LID_L2_01KMCLAY-STANDARD-V5-00
Cite this Tutorial
Jimenez-Urias, M. A. (2026). Access Cloud and Aerosol Lidar (Swath) Data From CALIPSO Via OPeNDAP. Zenodo. https://doi.org/10.5281/zenodo.19477128
@misc{jimenez_urias_2026_19477128,
author = {Jimenez-Urias, Miguel Angel},
title = {Access Cloud and Aerosol Lidar (Swath) Data From CALIPSO Via OPeNDAP},
month = apr,
year = 2026,
publisher = {Zenodo},
doi = {10.5281/zenodo.19477128},
url = {https://doi.org/10.5281/zenodo.19477128},
}
