
Comparing Price and Performance of Three Cloud-Based Data-Storage Architectures To Optimize Use of S3 in AWS

Submitted by jimg on Mon, 12/18/2017 - 15:55

Providing data services based on cloud computing technology that are equivalent to those developed for traditional computing and storage systems is critical for successful migration to cloud-based architectures for data production, scientific analysis, and storage.

OPeNDAP Web-service capabilities (comprising the Data Access Protocol (DAP) specification plus open-source software for realizing DAP in servers and clients) are among the most widely deployed means for achieving data-as-service functionality in the Earth sciences. OPeNDAP services are especially common in traditional data center environments where servers offer access to datasets stored in (very large) file systems, and a preponderance of the source data for these services is being stored in the Hierarchical Data Format Version 5 (HDF5).

Three candidate architectures for serving NASA satellite Earth Science HDF5 data via the Hyrax data server running on Amazon Web Services (AWS) were developed, and their performance was examined for a set of representative use cases. Performance was measured by both runtime and incurred cost. The three architectures differ in how HDF5 files are stored in the Amazon Simple Storage Service (S3) and in how the Hyrax server (running as an EC2 instance) retrieves their data. Results for both serial and parallel access to HDF5 data in S3 are presented.
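As a rough illustration of the difference between serial and parallel access, the Python sketch below reads several byte ranges from one object, first one at a time and then concurrently. It is not the Hyrax implementation and does not reproduce any of the three architectures. The in-memory `FAKE_OBJECT`, the `fetch_range` helper, and the example offsets are all hypothetical stand-ins for an object in S3 and for HTTP GET requests carrying a `Range` header.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical stand-in for a file stored as an object in S3 (1024 bytes).
FAKE_OBJECT = bytes(range(256)) * 4

ByteRange = Tuple[int, int]  # (start offset, length in bytes)


def range_header(start: int, length: int) -> str:
    """Format an HTTP Range header value; the end offset is inclusive."""
    return f"bytes={start}-{start + length - 1}"


def fetch_range(start: int, length: int) -> bytes:
    """Simulated ranged read. A real client would instead send an HTTP GET
    with the header 'Range: ' + range_header(start, length)."""
    return FAKE_OBJECT[start:start + length]


def read_serial(ranges: List[ByteRange],
                fetch: Callable[[int, int], bytes] = fetch_range) -> bytes:
    """Fetch each byte range in turn and concatenate the results."""
    return b"".join(fetch(start, length) for start, length in ranges)


def read_parallel(ranges: List[ByteRange],
                  fetch: Callable[[int, int], bytes] = fetch_range,
                  workers: int = 4) -> bytes:
    """Fetch byte ranges concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: fetch(r[0], r[1]), ranges)
        return b"".join(parts)


# Hypothetical offsets and lengths of three pieces of a dataset.
ranges = [(0, 100), (300, 50), (900, 124)]
serial_bytes = read_serial(ranges)
parallel_bytes = read_parallel(ranges)
```

Both functions return the same bytes. Against a real object store, the parallel version would mainly change how request latency overlaps, which is why runtime and per-request cost need to be considered together.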

While the study focused on HDF5 data, OPeNDAP, and the Hyrax data server, the architectures are generic and the analysis can be extrapolated to many different data formats, web APIs, and data servers.

James Gallagher, Aleksandar Jelenak, Nathan Potter, David W Fulker and Ted Habermann, A price and performance comparison of three different storage architectures for data in cloud-based systems (Presentation), IN22A-06 presented at 2017 Fall Meeting, AGU, New Orleans, LA, 11-15 Dec.