New paper on Web Accessible APIs published in CODATA Data Science Journal

The new article, titled “Web Accessible APIs in the Cloud Trade Study (Task 28)”, provides an account of metadata issues encountered in the development of OPeNDAP software, including the metadata taxonomy used in the project.

This study, conducted by OPeNDAP and The HDF Group, explored three candidate architectures for serving NASA Earth Science HDF5 data stored in S3 via Hyrax running on Amazon Web Services (AWS). We measured the cost and performance of each architecture against the same set of representative use cases, so that the differences between the technologies can be compared directly. We found that simple approaches can yield optimal performance under many circumstances.

Summary of the paper on Web Accessible APIs

The Earth Science Data and Information System (ESDIS) Project, which manages NASA’s Earth Observing System Data and Information System (EOSDIS), aims to improve data discoverability, accessibility, and usability by leveraging cloud computing. The study compares different architectures for providing Open-source Project for a Network Data Access Protocol (OPeNDAP) data services in the cloud using existing ESDIS datasets and representative use cases.

Three architectures are explored, focusing on serving NASA Earth Science HDF5 data via Hyrax on Amazon Web Services (AWS). The study evaluates cost, performance, and storage considerations for each architecture. Findings suggest a hybrid approach combining elements of Architecture 1 (A1) and Architecture 2 (A2) may be optimal, with Architecture 3 (A3) offering potential storage savings for specific granules.

Recommendations include refining the implementations, integrating support for S3 into the HDF5 Library, and exploring serverless architectures.

Performance comparisons reveal advantages and disadvantages of each architecture: A2 and A3 generally outperform A1 for requests that access one or two variables, but lag behind it for requests involving multiple variables or entire granules.

Cost comparisons highlight differences in processing costs, S3 storage costs, and data requests, with S3 requests representing a small portion of overall costs. In the performance-driven context, processing time affects costs, while in the performance-agnostic context, S3 requests may constitute a minor portion of costs compared to other factors such as data egress.
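The cost components named above can be illustrated with a minimal sketch. It is not the cost model used in the paper: the `CostInputs` fields, the `cost_breakdown` function, and every rate are hypothetical placeholders chosen for illustration, and real AWS pricing would need to be substituted.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    # All fields and rates are hypothetical placeholders, not values from the study.
    compute_hours: float   # time the server spends processing requests
    compute_rate: float    # assumed price per compute hour
    stored_gb: float       # data held in S3
    storage_rate: float    # assumed price per GB stored
    s3_requests: int       # number of S3 requests issued
    request_rate: float    # assumed price per 1,000 S3 requests
    egress_gb: float       # data transferred out of the cloud
    egress_rate: float     # assumed price per GB of egress


def cost_breakdown(c: CostInputs) -> dict:
    """Return each cost component plus the total, in one currency unit."""
    parts = {
        "processing": c.compute_hours * c.compute_rate,
        "storage": c.stored_gb * c.storage_rate,
        "s3_requests": c.s3_requests / 1000 * c.request_rate,
        "egress": c.egress_gb * c.egress_rate,
    }
    parts["total"] = sum(parts.values())
    return parts


def share(parts: dict, name: str) -> float:
    """Fraction of the total cost attributable to one component."""
    return parts[name] / parts["total"]
```

Calling `share(cost_breakdown(inputs), "s3_requests")` with measured inputs for each architecture would show how small a portion S3 requests represent relative to processing, storage, and egress.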


Citations for the documents: