
6.1 Data models

Any data set is made up of data and a data model. The data model defines the size and arrangement of data values, and may be thought of as an abstract representation of the relationship between one data value and another. Though it may seem paradoxical, it is precisely this relationship that defines the meaning of some number. Without the context provided by a data model, a number does not represent anything. For example, within some data set, it may be apparent that a number represents the value of temperature at some point in space and time. Without its neighboring temperature measurements, and without the latitude, longitude, depth, and time, the same number means nothing.

As the model only defines an abstract set of relationships, two data sets containing different data may share the same data model. For example, the data produced by two different measurements with the same instrument will use the same data model, though the values of the data are different. Sometimes two models may be equivalent. For example, an XBT (expendable bathythermograph) measures a time series of temperature, but its data are usually stored as a series of temperature and depth measurements, because the depth of the probe is computed from the time elapsed since it entered the water. The temperature vs. time model of the original data is equivalent to the temperature vs. depth model of the stored data.
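The equivalence of the two XBT models can be sketched in a few lines of Python. This is only an illustration: it assumes a quadratic fall-rate relation between elapsed time and depth, and the coefficient values, function names, and sample readings are hypothetical, not taken from any particular probe or data set.

```python
# Hypothetical fall-rate coefficients, for illustration only. Real values
# depend on the probe type and must come from the manufacturer or literature.
FALL_RATE_A = 6.5    # metres per second
FALL_RATE_B = 0.002  # metres per second squared


def depth_from_time(seconds):
    """Estimate probe depth in metres from seconds since launch,
    assuming a quadratic fall-rate model."""
    return FALL_RATE_A * seconds - FALL_RATE_B * seconds ** 2


def to_depth_model(time_series):
    """Convert (seconds, temperature) pairs into (depth, temperature) pairs."""
    return [(depth_from_time(t), temp) for t, temp in time_series]


# Made-up readings: (seconds since launch, temperature in degrees C).
time_series = [(0.0, 18.2), (1.0, 18.1), (2.0, 17.6)]
depth_series = to_depth_model(time_series)
```

Each temperature value is carried over unchanged; only the coordinate that gives it context is replaced, which is why the two models can be treated as equivalent.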

In a computational sense, a data model may be considered to be the data type or collection of data types used to represent that data. A temperature measurement might occur as half an entry in a sequence of temperature and depth pairs. However, the data model also includes the scalar latitude, longitude, and date that identify the time and place where the temperature measurements were taken. Thus the data set might be represented in a C-like syntax as follows (figure 6.1):

Dataset {
   Float64 lat;
   Float64 lon;
   Int32 minutes;
   Int32 day;
   Int32 year;
   Sequence {
      Float64 depth;
      Float64 temperature;
   } cast;
} xbt-station;
Example Data Description of XBT Station
 

In the above example, a data set is described that contains all the data from a single XBT. The data set is called xbt-station and contains floating-point representations of the latitude and longitude of the station, along with three integers that specify when the XBT was released. It also contains a single sequence (called cast) of measurements, each represented here as a pair of depth and temperature values.
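For readers more familiar with a general-purpose language, the same flat model might be approximated in Python as below. This is a rough sketch rather than an OPeNDAP API: the class name and the sample values are invented for illustration, Python's float stands in for Float64, and Python's int is not limited to 32 bits as Int32 is.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class XbtStation:
    """Flat model: five scalars plus one sequence of (depth, temperature)."""
    lat: float
    lon: float
    minutes: int
    day: int
    year: int
    # Each entry in the sequence is one (depth, temperature) measurement.
    cast: List[Tuple[float, float]] = field(default_factory=list)


# Made-up values, for illustration only.
station = XbtStation(
    lat=41.5, lon=-71.4, minutes=630, day=120, year=1998,
    cast=[(0.0, 18.2), (10.0, 17.9), (20.0, 16.4)],
)

# A temperature is always retrieved together with its depth.
depth, temperature = station.cast[1]
```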

A different data model representing the same data might look like this (figure 6.1):

Dataset {
   Structure {
      Float64 lat;
      Float64 lon;
   } location;
   Structure {
      Int32 minutes;
      Int32 day;
      Int32 year;
   } time;
   Sequence {
      Float64 depth;
      Float64 temperature;
   } cast;
} xbt-station;
Example Data Description of XBT Station Using Structures
 

In this example, several of the data have been grouped, implying a relation between them. The nature of the relationship is not defined, but it is clear that lat and lon are both components of location, and that each measurement in the cast sequence is made up of depth and temperature values.

In these two examples, meaning was added to the data set only by providing a more refined context for the data values. No other data was added, but still the second example can be said to contain more information than the first one.
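The point that grouping adds context without adding data can be checked with a small Python sketch. The class names, the helper functions group and flatten, and the sample values are all hypothetical; the sketch only shows that a flat record and a grouped record hold exactly the same values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Location:
    lat: float
    lon: float


@dataclass
class Time:
    minutes: int
    day: int
    year: int


@dataclass
class XbtStation:
    location: Location
    time: Time
    cast: List[Tuple[float, float]]  # (depth, temperature) pairs


def group(flat):
    """Build the grouped model from a flat dictionary of values."""
    return XbtStation(
        location=Location(flat["lat"], flat["lon"]),
        time=Time(flat["minutes"], flat["day"], flat["year"]),
        cast=list(flat["cast"]),
    )


def flatten(station):
    """Recover the flat dictionary from the grouped model."""
    return {
        "lat": station.location.lat, "lon": station.location.lon,
        "minutes": station.time.minutes, "day": station.time.day,
        "year": station.time.year, "cast": list(station.cast),
    }


# Made-up values, for illustration only.
flat = {"lat": 41.5, "lon": -71.4, "minutes": 630, "day": 120,
        "year": 1998, "cast": [(0.0, 18.2), (10.0, 17.9)]}
grouped = group(flat)
round_trip = flatten(grouped)
```

Because flatten(group(flat)) returns the original values, the grouped form carries no extra data. What it adds is the explicit statement that lat and lon belong together, as do minutes, day, and year.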

These two examples are refinements of the same basic arrangement of data. However, a completely different data model can be just as useful and just as accurate. For example, the depth and temperature data, instead of being represented by a sequence of pairs, as in the two preceding examples, could be represented by a pair of sequences or arrays, as in the following example:

Dataset {
   Structure {
      Float64 lat;
      Float64 lon;
   } location;
   Structure {
      Int32 minutes;
      Int32 day;
      Int32 year;
   } time;
   Float64 depth[500];
   Float64 temperature[500];
} xbt-station;
Example Data Description of XBT Station Using Arrays
 

The relationship between the depth and temperature variables is no longer explicit in the model, but, depending on what sort of processing is intended, this may not be an important loss.
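A short Python sketch may make the trade-off concrete. The helper names split_cast and join_cast and the sample values are hypothetical, and Python lists stand in for the fixed-size arrays of the data description. Once the pairs are split into two arrays, the link between a depth and its temperature survives only as a shared index.

```python
def split_cast(cast):
    """Turn a sequence of (depth, temperature) pairs into two parallel lists."""
    depth = [d for d, _ in cast]
    temperature = [t for _, t in cast]
    return depth, temperature


def join_cast(depth, temperature):
    """Rebuild the pairs, relying only on matching positions in the lists."""
    return list(zip(depth, temperature))


# Made-up values, for illustration only.
cast = [(0.0, 18.2), (10.0, 17.9), (20.0, 16.4)]
depth, temperature = split_cast(cast)

# The pairing is recoverable as long as both lists keep their order.
rebuilt = join_cast(depth, temperature)

# Reordering one list on its own silently pairs depths with the wrong values.
mismatched = join_cast(depth, sorted(temperature))
```

Here rebuilt equals the original cast, while mismatched does not. Nothing in the array model itself prevents the second case, so the user or the analysis software has to keep the two arrays aligned.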

The choice of a computational data model to contain some data set depends in many cases on the whims and preferences of the user, as well as on the data analysis software to be used. Several different data models may be equally useful for a given task. Of course, some data models will contain more information about the data than others, but this information can also be carried in a scientist's head.

Note that with a carefully chosen set of data type constructors, such as those we've used in the preceding examples, a user can implement an infinite number of data models. The examples above use the OPeNDAP Dataset Descriptor Structure (DDS) format, which will become important in later discussions of the details of the OPeNDAP Data Access Protocol. The precise details of the DDS syntax are described in Section 6.4.1.


Tom Sgouros, August 25, 2004
