Prev Up Next Index
Go backward to 6.1.1 Data Models and APIs
Go up to 6.1 Data models
Go forward to 6.2 Data Access Protocol

6.1.2 Translating Data Models

The problem of data model translation is central to the implementation of OPeNDAP. With an effective data translator, an OPeNDAP program originally designed to read netCDF data can have some access to data sets that use an incompatible data model, such as JGOFS.

In general, it is not possible to define an algorithm that will translate data from any model to any other, without losing information defined by the position of data values or the relations between them. Some of these incompatibilities are obvious; a data model designed for time series data may not be able to accommodate multi-dimensional arrays. Others are more subtle. For example, a sequence looks very similar to a collection of lists in many respects. However, a sequence is an ordered collection of data types, whereas a list implies no order. However, there are many useful translations that can be done, and there are many others that are still useful despite their inherent information loss.

For example, consider a relational structure like the one in figure 6.1.2. This is similar to the examples in Section 6.1, rearranged to accommodate an entire cruise worth of temperature-depth measurements. This is the sort of data type that the JGOFS API is designed to use.

Dataset {
   Sequence {
      Int32 id;
      Float64 latitude;
      Float64 longitude;
      Sequence {
         Float64 depth;
         Float64 temperature;
      } xbt_drop;
   } station;
} cruise;
Example Data Description of XBT Cruise
 

Note that each entry in the cruise sequence is composed of a tuple of data values (one of which is itself a sequence). Were we to arrange these data values as a table, they might look like this:

id   lat   lon   depth  temp
1   10.8   60.8    0     70
                  10     46
                  20     34
2   11.2   61.0    0     71
                  10     45
                  20     34
3   11.6   61.2    0     69
                  10     47
                  20     34

This can be made into an array, although that introduces redundancy.

id   lat   lon   depth  temp
1   10.8   60.8    0     70
1   10.8   60.8   10     46
1   10.8   60.8   20     34
2   11.2   61.0    0     71
2   11.2   61.0   10     45
2   11.2   61.0   20     34
3   11.6   61.2    0     69
3   11.6   61.2   10     47
3   11.6   61.2   20     34

The data is now in a form that may be read by an API such as netCDF. But consider the analysis stage. Suppose a user wants to see graphs of station data. It is not obvious simply from the arrangement of the array where a station stops and the next one begins. Analyzing data in this format is not a function likely to be accommodated by a program that uses the netCDF API.


Tom Sgouros, August 25, 2004

Prev Up Next