How to Serve Data with Groups with Hyrax

Scientific data can be stored in many formats, and one of the most widely used is HDF5. A feature of HDF5 that is increasingly popular with data producers is the use of Groups to create an internal hierarchy within a file, resembling a traditional POSIX filesystem. With Groups, one can navigate the contents of an HDF5 file as one would a local directory tree of folders and subfolders, and heterogeneous data can be stored within a single file.

HDF5 is not the only file format that incorporates Groups in its data model. NetCDF4 (aka “enhanced NetCDF”) also has Groups, but their representation and internal logic are much simpler than in HDF5, because in NetCDF4 a Group always has a unique parent Group. OPeNDAP incorporated a Group container type into the DAP4 protocol, which dates back to 2016 (see Fig. 1). OPeNDAP’s Group is based on the NetCDF4 data model: a Group has a single unique parent Group, and there is always a unique root Group denoted as /.
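To make the "single unique parent, unique root" rule concrete, here is a minimal Python sketch of such a Group tree. The class and the group and variable names (observations, ocean, sst) are hypothetical and only illustrate the data model described above; this is not code from Hyrax or from any DAP4 library.

```python
class Group:
    """Toy Group: every Group has exactly one parent, except the root."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # None only for the root Group
        self.children = {}        # child Groups, keyed by name
        self.variables = []       # names of variables held by this Group

    def add_group(self, name):
        """Create a child Group whose only parent is this Group."""
        child = Group(name, parent=self)
        self.children[name] = child
        return child

    def path(self):
        """Full path from the root, which is denoted as '/'."""
        if self.parent is None:
            return "/"
        parent_path = self.parent.path()
        if parent_path == "/":
            return parent_path + self.name
        return parent_path + "/" + self.name


# Hypothetical hierarchy: / -> observations -> ocean (holding a variable "sst")
root = Group("")
obs = root.add_group("observations")
ocean = obs.add_group("ocean")
ocean.variables.append("sst")
print(ocean.path())
```

Because each Group stores a single parent reference, every Group and variable has exactly one path from the root, which is what lets a client address it unambiguously.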

Figure 1. Comparison between DAP4 and its predecessor, the DAP2 (aka DODS) protocol. The Hyrax data server implements the DAP4 protocol and can also “serve” DAP2 data, since DAP4 is a superset of DAP2 (Grids are represented as arrays along with additional information about their Dimensions and Maps).

The OPeNDAP Hyrax data server implements the DAP4 protocol, enabling OPeNDAP users to “serve” and access data with Groups. While “serving” data that has Groups is straightforward, it is NOT the default configuration of a fresh Hyrax installation. The reason is that many popular client APIs have been slow to adopt Groups within their own data models, so by default Hyrax flattens the access pattern to an HDF5 file (and likewise for NetCDF4 files). Note that OPeNDAP does not modify the original file.
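As a rough illustration of what "flattening" means, the sketch below turns a nested hierarchy into a flat list of variable names. The naming rule used here (joining path components with underscores) and the example hierarchy are assumptions made for illustration only; they are not a description of the exact names Hyrax generates.

```python
def flatten(tree, prefix=""):
    """Map a nested dict (Groups are dicts, variables are None) to flat names."""
    flat = []
    for name, node in tree.items():
        full = f"{prefix}/{name}"
        if isinstance(node, dict):
            # A Group: recurse into its members, carrying the path so far
            flat.extend(flatten(node, full))
        else:
            # A variable: encode its full path in a single flat name
            flat.append(full.strip("/").replace("/", "_"))
    return flat


# Hypothetical file layout with two nested Groups and one root-level variable
hierarchy = {
    "time": None,
    "observations": {
        "ocean": {"sst": None},
        "land": {"lst": None},
    },
}
flat_names = flatten(hierarchy)
print(flat_names)
```

A client without Group support can work with the flat list, but the hierarchy is then only implied by the variable names, which is why a Group-aware client benefits from turning flattening off.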

The recent adoption of Groups by popular clients such as xarray’s Datatree and Panoply is very encouraging. Below we outline the steps to make datasets with Groups available via OPeNDAP’s Hyrax data server, which requires changing the server’s default configuration.

Goals

  • Install and run the latest version of OPeNDAP’s Hyrax (1.17.0) data server.
  • Change the default configuration of the BES via the site.conf file.
  • Test that data Groups can be accessed locally (e.g. via Pydap or other client).

Prerequisites

  • Data stored locally (~/tmp/DATA/).
    • If you have NetCDF data saved as filename.nc that uses features of the NetCDF4 data model, such as Groups, you should rename the file to filename.nc4. Renaming from .nc to .nc4 when a dataset follows the enhanced NetCDF data model should be a convention to distinguish between the classic and enhanced NetCDF data models (similar to the distinction between HDF4 and HDF5).
  • Docker daemon running in the background (e.g. via Docker Desktop).
  • OPTIONAL: Create a conda/mamba Python environment for an interactive testing workflow:

mamba create -n opendap_env -c conda-forge python=3.11 jupyterlab ipython netCDF4 matplotlib pydap
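The .nc to .nc4 renaming mentioned in the prerequisites can be scripted. The sketch below is one possible approach, not part of Hyrax: it relies on the fact that NetCDF4 files are HDF5 files and therefore begin with the 8-byte HDF5 signature, whereas classic NetCDF files begin with the bytes CDF. The function names are hypothetical. The check only looks at offset 0, and an HDF5-based file is not guaranteed to contain Groups, so treat the result as a hint rather than a proof.

```python
from pathlib import Path

# First 8 bytes of any HDF5 file (and hence of any NetCDF4 file)
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5_based(path):
    """Return True when the file starts with the HDF5 signature."""
    with open(path, "rb") as handle:
        return handle.read(8) == HDF5_SIGNATURE


def rename_to_nc4(path, dry_run=True):
    """Return the .nc4 name for an HDF5-based .nc file; rename unless dry_run."""
    path = Path(path)
    if path.suffix != ".nc" or not is_hdf5_based(path):
        return path  # leave classic NetCDF and other files untouched
    target = path.with_suffix(".nc4")
    if not dry_run:
        path.rename(target)
    return target
```

For example, looping over Path("~/tmp/DATA").expanduser().glob("*.nc") and calling rename_to_nc4(p, dry_run=True) first shows which files would be renamed before anything on disk changes.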

  

Run Hyrax

  1. Open a terminal window on your desktop computer and pull the latest snapshot of Hyrax:

docker pull opendap/hyrax:snapshot
  

  2. In a separate folder, create an empty file called site.conf:

mkdir -p ~/tmp/BESconfig
touch ~/tmp/BESconfig/site.conf
  

  3. Add overriding parameters to enable Group (hierarchy) access to HDF5 files. That is, add the following parameters to the site.conf file:

H5.EnableCF=false
H5.EnableCFDMR=true
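If you prefer to generate the file from a script (for example, inside the optional Python environment), a minimal sketch could look like the following. The helper name write_site_conf is hypothetical; the two parameters are exactly the ones listed above. Note that this sketch overwrites any existing file at the given path.

```python
from pathlib import Path

# The two overrides from the step above that turn off flattening
GROUP_SETTINGS = {"H5.EnableCF": "false", "H5.EnableCFDMR": "true"}


def write_site_conf(conf_path, settings):
    """Write key=value lines to conf_path, creating parent folders if needed."""
    conf_path = Path(conf_path).expanduser()
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    text = "".join(f"{key}={value}\n" for key, value in settings.items())
    conf_path.write_text(text)  # replaces any previous content
    return text
```

Calling write_site_conf("~/tmp/BESconfig/site.conf", GROUP_SETTINGS) would produce the same file as editing it by hand.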
  

  4. (Extra) If your data is in NetCDF4 (.nc4), assign the H5 handler to these files by adding the following lines to the site.conf file:

BES.Catalog.catalog.TypeMatch=
BES.Catalog.catalog.TypeMatch+=csv:.*\.csv(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=reader:.*\.(dds|dods|data_ddx|dmr|dap)$;
BES.Catalog.catalog.TypeMatch+=dmrpp:.*\.(dmrpp)(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=ff:.*\.dat(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=gdal:.*\.(tif|TIF)$|.*\.grb\.(bz2|gz|Z)?$|.*\.jp2$|.*/gdal/.*\.jpg$;
BES.Catalog.catalog.TypeMatch+=h4:.*\.(hdf|HDF|eos|HDFEOS)(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=ncml:.*\.ncml(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=h5:.*\.(HDF5|h5|he5|H5)(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=h5:.*\.nc4(\.bz2|\.gz|\.Z)?$;
  

The last line is the one that assigns any .nc4 dataset to Hyrax’s h5 handler. The preceding lines reset all handler assignments, which avoids name clashes (two handlers assigned to the same data type) that can be hard to debug.
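To check which handler a given filename would be routed to, the TypeMatch lines can be approximated in Python. This is a sketch under assumptions: it treats a plain = as a reset and += as an append (as described above), it uses only a subset of the lines from the list, and it uses Python's re module, which may differ in corner cases from the regular-expression engine the BES uses. The function names are hypothetical.

```python
import re

# Subset of the site.conf lines shown above, copied verbatim
SITE_CONF = r"""
BES.Catalog.catalog.TypeMatch=
BES.Catalog.catalog.TypeMatch+=h4:.*\.(hdf|HDF|eos|HDFEOS)(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=h5:.*\.(HDF5|h5|he5|H5)(\.bz2|\.gz|\.Z)?$;
BES.Catalog.catalog.TypeMatch+=h5:.*\.nc4(\.bz2|\.gz|\.Z)?$;
"""

KEY = "BES.Catalog.catalog.TypeMatch"


def _entries(value):
    """Split 'handler:regex;handler:regex;' into (handler, regex) pairs."""
    for entry in value.split(";"):
        if entry:
            handler, pattern = entry.split(":", 1)
            yield handler, pattern


def parse_type_match(conf_text):
    """Collect TypeMatch rules: '=' resets the list, '+=' appends to it."""
    rules = []
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith(KEY + "+="):
            rules.extend(_entries(line[len(KEY) + 2:]))
        elif line.startswith(KEY + "="):
            rules = list(_entries(line[len(KEY) + 1:]))
    return rules


def handlers_for(filename, rules):
    """Return every handler whose pattern matches the filename."""
    return [handler for handler, pattern in rules if re.match(pattern, filename)]


rules = parse_type_match(SITE_CONF)
print(handlers_for("granule.nc4", rules))
```

If handlers_for ever returns more than one handler for the same filename, that is the kind of clash the reset line is meant to prevent.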

  5. If you are running this tutorial in a Linux environment, run the following command to map the locations of the data and site.conf on your machine to the default locations that Hyrax searches within the Docker container:

docker run -d -h hyrax -p 8080:8080 \
--volume ~/tmp/DATA:/usr/share/hyrax \
--volume ~/tmp/BESconfig/site.conf:/etc/bes/site.conf \
--name=hyrax opendap/hyrax:snapshot
  

NOTE: If you are running this tutorial on macOS with an M-series (Apple silicon) chip, you will need to add the following line to the docker run command above:


--platform linux/amd64 \
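The choice between the two variants of the command can also be automated. The following sketch builds the same docker run argument list and adds the --platform option only on Apple silicon. The function name build_docker_run is hypothetical, and the detection relies on platform.machine() reporting arm64 on macOS, which assumes a natively running Python interpreter (not one running under Rosetta).

```python
import platform
from pathlib import Path


def build_docker_run(data_dir, site_conf, image="opendap/hyrax:snapshot",
                     force_amd64=None):
    """Return the docker run argument list used in this tutorial."""
    if force_amd64 is None:
        # Assumption: native Python on Apple silicon reports Darwin/arm64
        force_amd64 = (platform.system() == "Darwin"
                       and platform.machine() == "arm64")
    cmd = ["docker", "run", "-d", "-h", "hyrax", "-p", "8080:8080"]
    if force_amd64:
        cmd += ["--platform", "linux/amd64"]
    cmd += [
        # Map the local data folder and site.conf into the container
        "--volume", f"{Path(data_dir).expanduser()}:/usr/share/hyrax",
        "--volume", f"{Path(site_conf).expanduser()}:/etc/bes/site.conf",
        "--name=hyrax", image,
    ]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True), for example with build_docker_run("~/tmp/DATA", "~/tmp/BESconfig/site.conf").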
  

By default, Hyrax runs on localhost, so the landing page for Hyrax will be http://localhost:8080/opendap/
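Before moving on, it can help to confirm that the container is actually answering requests. The sketch below assumes the 8080:8080 port mapping from the docker run command, so that the landing page is reachable at http://localhost:8080/opendap/. The helper names are hypothetical and only the Python standard library is used.

```python
from urllib.error import URLError
from urllib.request import urlopen


def landing_url(host="localhost", port=8080):
    """URL of the Hyrax landing page for the given host and port."""
    return f"http://{host}:{port}/opendap/"


def hyrax_is_up(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, or an HTTP error status
        return False
```

Calling hyrax_is_up(landing_url()) a few seconds after starting the container gives the server time to initialize; a False result right after docker run does not necessarily indicate a misconfiguration.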

  6. Lastly, using a web browser or a client API such as PyDAP, access the dataset URL and verify that the dataset has a hierarchical representation of the metadata. For example, with PyDAP:

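from pydap.client import open_url

# Replace with the URL of a dataset served by your local Hyrax instance
url = 'http://localhost:...'
# Request the DAP4 protocol so that Groups are preserved
ds = open_url(url, protocol='dap4')
# Print the hierarchical (Group) structure of the dataset
ds.tree()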
  


Check out the tutorial video associated with this post