
3.2.1 getxxx Function Modes

The getxxx function operates in three different modes. One mode returns URLs corresponding to a dataset, another returns estimates of the request size, and the third actually retrieves data. The following sections describe the three modes.

Catalog mode

In catalog mode (or cat mode), the browser requests information about the location in time and space of the indicated dataset. The function takes an indicated geo-spatial region (specified with the ranges matrix) and returns a list of the points where the queried dataset contains data.

The return arguments from such a request are:

[ x, y, z, t, n, URL ] = getxxx( ... )
x, y, z, t
Are one-dimensional vectors containing the locations and times corresponding to individually addressable URLs available to the browser user. (Latitude and longitude in decimal degrees, depth in meters from the surface, time in decimal years.) These vectors will be used to plot tick marks on the browser display, and the user will be able to select one or more of these tick marks to get data. All of the arguments must be returned, although those that are not appropriate for a given dataset may be returned empty.

For example, a dataset containing buoy data might return a series of (x,y) pairs, a vector of depths indicating instrument depths, and an empty time vector. The browser will display the buoy locations, and the depth levels, allowing the user to select specific buoys or specific depths. A dataset containing satellite images taken daily might return only a time vector, indicating the times of each image.

n
is the total number of URLs identified by the catalog function. The browser will call the getxxx function (in get mode) this many times to retrieve all the data referenced in the cat mode request.
URL
A URL directed at the dataset, or at a catalog server, if one was used to make the cat mode request. This is displayed in case you would like to make this request (or a related one) independent of the OPeNDAP Matlab GUI, say with loaddods.
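
As an illustration of these return values, here is a minimal sketch of what a cat mode call might hand back for a buoy dataset. The buoy positions, depths, and URL are invented for the example, and the choice of one URL per buoy is an assumption; a real getxxx function would derive these from the dataset or from a catalog server.

```matlab
% Hypothetical catalog-mode result for a dataset of three buoys.
% All values below are made up for illustration.
x = [-70.5; -68.2; -66.9];      % buoy longitudes, decimal degrees
y = [ 42.1;  43.0;  44.3];      % buoy latitudes, decimal degrees
z = [5; 20; 50];                % instrument depths, meters from the surface
t = [];                         % not appropriate for this dataset, so returned empty
n = length(x);                  % assumes one individually addressable URL per buoy
URL = 'http://example.org/dods/buoys';   % placeholder, not a real server
```

Every output is assigned, even though t is empty, which matches the requirement that all arguments be returned.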

Sometimes the only way to provide the data requested in a catalog request is to contact the dataset's OPeNDAP server for data, or contact a catalog server for information about the dataset (see Section 3.2.2). For some datasets, such as monthly climatologies, an internet request does not need to be made to satisfy the request -- instead a Matlab function can be written which examines the user-selected time and geographic ranges and determines how many of the climatologies fall within these ranges. For other kinds of datasets, such as satellite datasets stored along a groundtrack, which repeat in a predictable but irregular pattern in time and space, reference satellite groundtrack locations and data densities can be stored in a local .mat file.

During a catalog request, the dataset, vars and ranges values should be stored internally (i.e., in a global variable). This is so that for future get and datasize mode requests, the getxxx function will be able to determine whether or not the existing catalog is up-to-date. If a get or datasize request is made where the data parameters have changed from the last catalog request, the getxxx function should make a new catalog request internally, before satisfying the new request.

The getxxx function should also internally store an ordered list of URLs based on this catalog request, or enough information to recreate those URLs when requested by the browser.
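
The staleness check described above might be sketched as follows. The struct name cache, its fields, and the layout of the ranges matrix (one row each for longitude, latitude, depth, and time) are assumptions made for this example; a real getxxx function might hold the same information in a global variable.

```matlab
% Hypothetical record of the parameters used in the last cat mode request.
cache.dataset = 'buoys';
cache.vars    = 'Temperature';
cache.ranges  = [-72 -65; 41 45; 0 100; 1990 1995];  % assumed row layout

% True only when all three parameters match the stored ones.
is_current = @(c, dataset, vars, ranges) ...
    isequal(c.dataset, dataset) && isequal(c.vars, vars) && isequal(c.ranges, ranges);

% Same parameters: the stored URL list can be reused.
reuse = is_current(cache, 'buoys', 'Temperature', cache.ranges);

% Changed time range: a new cat mode request is needed first.
new_ranges = cache.ranges;
new_ranges(4,:) = [1996 1997];
stale = ~is_current(cache, 'buoys', 'Temperature', new_ranges);
```

When stale is true, the function would rerun its catalog logic and overwrite the stored values before answering the get or datasize request.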

Datasize Mode

The datasize mode provides the browser with an estimate of the size of a formulated data request as it will appear in the Matlab workspace (as double-precision floats). When a data request is issued, the browser checks whether the estimated datasize is below a threshold level (which defaults to 1Mb but can be set by the user). If the estimate is above that level, the browser requires the user to confirm the data transfer. This prevents users from unintentionally requesting very large data transfers. (See Section 2.2.4.)

The previous values (from the most recent catalog request) of the dataset, variable and ranges variables should first be compared with the new values, to see whether the catalog is current. If any of these values has changed, the getxxx function should first call itself in cat mode to update the number of data points, the number of URLs, or whatever other quantity determines the size of the data request.

The return arguments from a datasize request are:

[ datasize, num_urls ] = getxxx( ... )
datasize
The approximate size (in megabytes) of the data that will be returned by the specified data request.
num_urls
The number of URLs in the request. (This is in case a new catalog request was required during the datasize request. If no new catalog request was required, the input num_urls argument can simply be returned.)
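
A minimal sketch of how such an estimate might be computed is shown below. The grid dimensions and URL count are invented, and the conversion assumes 1 megabyte = 2^20 bytes, since the text does not say which convention the browser uses.

```matlab
% Hypothetical request: 12 grids of 180 x 360 points each.
num_urls        = 12;
points_per_url  = 180 * 360;
bytes_per_value = 8;                 % a Matlab double occupies 8 bytes

% Estimated size in the workspace, in megabytes (assumes 2^20 bytes per megabyte).
datasize = num_urls * points_per_url * bytes_per_value / 2^20;

% Compare against the default threshold mentioned in the text.
threshold = 1;
needs_confirmation = datasize > threshold;
```

With these numbers the estimate is a little under 6 megabytes, so the browser would ask the user to confirm the transfer.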

Get Mode

The get mode request (also "data request") queries the remote dataset for data. The browser makes a datasize request before each data request, and a cat request before each datasize request. This series is to be repeated if the user changes any of the search parameters, like the geographic ranges or times.

The data request therefore need only operate on the ordered list of URLs generated by the most recent cat request -- the browser simply increments the url_index_number from one to the total number of URLs and waits for each result. However, because the input information from the browser (such as the text list of selected variables) is passed in each time, the getxxx function does not actually have to construct a URL until this point.

Sometimes it is appropriate to return a count of one URL to the browser while actually using more than one internally. This can occur when, for example, the user requests all data from longitudes 40W to 40E from a gridded dataset with global coverage but the grids are stored with a split at 20E (that is, the longitude in the dataset ranges from 20E to 380E). The getxxx function in this case must either transfer the whole dataset, then subset and reformat it, or dereference two URLs, one from 40W to 20E and one from 20E to 40E, and combine the result. In other words, the number of dereferenced URLs returned by the getxxx function should match what the browser expects based on the result of the catalog mode request.
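
The split in the example above can be worked through numerically. This sketch uses illustrative variable names (seam, req, pieces) and only computes the longitude sub-ranges; building and dereferencing the actual URLs is left out.

```matlab
seam = 20;            % dataset longitudes run from seam to seam + 360 (20E to 380E)
req  = [-40 40];      % requested range: 40W to 40E

% Map the request into the dataset's longitude convention.
lon = mod(req - seam, 360) + seam;

if lon(1) <= lon(2)
    pieces = lon;                             % one contiguous range, one URL
else
    % The request straddles the seam, so it takes two sub-requests.
    pieces = [lon(1) seam+360; seam lon(2)];
end
```

Here pieces has two rows, [320 380] and [20 40], which correspond to 40W to 20E and 20E to 40E. The two results would be combined and reported to the browser as a single URL's worth of data.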

The data should be scaled according to the DataScale information in the archive.m file before being returned.

The return arguments from the get mode are as follows:

[ data, sizes, names, index, URL ] = getxxx( ... )
data
is an M×1 vector of data, where M is the total number of data measurements returned. That is, all the data are reshaped to a single column, with their original sizes recorded in the 'sizes' argument.
sizes
is an N×2 array of array dimensions, indicating the sizes of the returned arrays. This requirement must be met: sum(sizes(:,1).*sizes(:,2)) = M.
names
is a list of character strings with the same number of rows as sizes -- each variable as it is unpacked will be given the corresponding name. The current convention is to rename returning variables with the common names used in the browser. If 'Longitude' and 'Latitude' (exactly so) are not present, the browser will not allow plotting on the browse window. Some of the existing getxxx functions create longitude and latitude vectors for gridded datasets using the information in the archive M-files. This is an occasionally useful tactic for troublesome datasets.

NOTE: For the benefit of Matlab Version 4 users, variable names should be no longer than 19 characters.

index
The index number is the index into this list:
Time
Longitude
Latitude
Depth
SelectableVariables[1]
.
.
.
SelectableVariables[n]

The browser uses this index into the DataRange array in the archive.m file for mapping multiple images within the same dataset to a unified color scale, and into the DataScale array for scaling the returned data.

URL
This variable should contain the URL dereferenced to retrieve the data provided as a result of the get request.

This variable is to be a single string -- if several URLs have actually been dereferenced to satisfy what the browser has treated as a single URL, they should all be included, concatenated end-to-end and separated by single spaces.

If TimeName in the archive.m file is empty because the server doesn't actually return a time, you can insert a time made up from the catalog request. However this is not required; every getxxx function should be able to handle empty variables correctly.
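
To make the get mode return values concrete, the following sketch packs three small arrays into the data, sizes, and names arguments, checks the size requirement, and unpacks them again. The arrays and URLs are invented for the example, and the unpacking loop only illustrates the idea; it is not the browser's own code.

```matlab
% Three hypothetical returned variables of different shapes.
Longitude   = [10 20 30];                 % 1 x 3
Latitude    = [40 50];                    % 1 x 2
Temperature = [1 2 3; 4 5 6];             % 2 x 3 (latitude by longitude)

% Pack: reshape everything into one column and record the original sizes.
data  = [Longitude(:); Latitude(:); Temperature(:)];
sizes = [size(Longitude); size(Latitude); size(Temperature)];
names = char('Longitude', 'Latitude', 'Temperature');  % one row per row of sizes

% The consistency requirement from the text.
M  = length(data);
ok = (sum(sizes(:,1) .* sizes(:,2)) == M);

% Unpack: walk through data, restoring each array to its original shape.
offset   = 0;
unpacked = cell(size(sizes,1), 1);
for k = 1:size(sizes,1)
    count = sizes(k,1) * sizes(k,2);
    unpacked{k} = reshape(data(offset+1 : offset+count), sizes(k,1), sizes(k,2));
    offset = offset + count;
end

% Two dereferenced URLs reported as a single space-separated string.
url_a = 'http://example.org/part1';       % placeholder URLs
url_b = 'http://example.org/part2';
URL   = [url_a ' ' url_b];
```

Because Matlab reshapes in column-major order, flattening with (:) and restoring with reshape round-trips each array exactly.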

If any of the variable names resulting from dereferencing a URL is longer than 19 characters, the data contained in that variable must be requested separately for Matlab 4 users and the result mapped to the first 19 characters of that variable. That is, you must first do:

loaddods(URL)
then

[longname(1:19)] = loaddods(URL);

This is a workaround for a bug concerning long variable names interned into Matlab v4.


Tom Sgouros, December 21, 2004