Three operators are defined to compare one String with another.
The = operator returns TRUE if its two input character strings
are identical, and the != operator returns TRUE if the Strings
do not match. A third operator, ~=, returns TRUE if the String
to the left of the operator matches the regular expression in
the String on the right.
A regular expression is simply a character string containing wildcard characters that allow it to match patterns within a longer string. For example, the following constraint expression might return all the stations on the sample cruise at which a shark was sighted:
?station&station.comment~=".*shark.*"
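When a constraint like this is sent to a server as part of a URL, the double quotes usually need to be percent-encoded. The following is a minimal Python sketch of that step. The server address, dataset path, and the helper name build_query are placeholders invented for illustration, not part of any OPeNDAP library.

```python
from urllib.parse import quote

# Hypothetical dataset URL; substitute a real OPeNDAP server and dataset.
BASE_URL = "http://example.org/opendap/cruise.dods"

def build_query(base_url, projection, selection):
    """Append a constraint expression (projection&selection) to a dataset URL."""
    constraint = f"{projection}&{selection}"
    # Keep the characters that structure the constraint readable and
    # percent-encode the rest (notably the double quotes).
    return base_url + "?" + quote(constraint, safe="&=~.*!")

url = build_query(BASE_URL, "station", 'station.comment~=".*shark.*"')
print(url)
# http://example.org/opendap/cruise.dods?station&station.comment~=%22.*shark.*%22
```

Which characters a given server accepts unencoded may vary, so the safe list here is an assumption.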
Most characters in a regular expression match themselves. That is, an "f" in a regular expression matches an "f" in the target string. There are several special characters, however, that provide more sophisticated pattern-matching capabilities.
.
The period matches any single character except a newline.
* + ?
These are postfix operators that match the preceding regular
expression repeatedly (as many times as possible). Thus, o*
matches any number of o's. The operators differ in that o*
also matches zero o's, o+ matches only a series of one or
more o's, and o? matches only zero or one o.
[ ... ]
These define a "character set," which begins with [ and is
terminated by ]. In the simplest case, the characters between
the two brackets are what this set can match. The expression
[Ss] matches either an upper or lower case s. Brackets
can also contain character ranges, so [0-9] matches all the
numerals. If the first character within the brackets is a caret
(^), the expression will only match characters that do not
appear in the brackets. For example, [^0-9]* only matches
character strings that contain no numerals.
^ $
These are special characters that match the empty string at the beginning or end of a line.
\|
These two characters define a logical OR between the largest
possible expression on either side of the operator. So, for
example, the string Endeavor\|Oceanus matches
either Endeavor or Oceanus. The scope of the OR can be
contained with the grouping operators, \( and
\).
\( \)
These are used to group a series of characters into an expression,
or for the OR function. So, for example,
\(abc\)* matches zero or more
repetitions of the string abc.
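The period, the postfix operators, character sets, and the line anchors are written the same way in Python's re module, so they can be tried out locally before being placed in a constraint. The sketch below is only an approximation of server behavior: it assumes ~= must match the entire String (which the .* on either side of the sample pattern suggests), and the helper name matches is invented for this example.

```python
import re

def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# .  matches any single character except a newline
print(matches(".", "x"), matches(".", "\n"))                      # True False
# *  zero or more, +  one or more, ?  zero or one
print(matches("o*", ""), matches("o+", ""), matches("o?", "oo"))  # True False False
# [ ]  character sets, ranges, and negation with a leading caret
print(matches("[Ss]", "S"), matches("[0-9]", "7"))                # True True
print(matches("[^0-9]*", "no digits"), matches("[^0-9]*", "station 12"))  # True False
# ^ and $ anchor a match to the beginning or end of a line
print(bool(re.search("^shark", "shark seen")), bool(re.search("seen$", "shark seen")))  # True True
```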
There are several more special characters and several other features of the characters described here, but they are beyond the scope of this guide. The OPeNDAP regular expression syntax is the same as that used in the Emacs editor. See the documentation for Emacs [1] for a complete description of all the pattern-matching capabilities of regular expressions.
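Many regular expression libraries write grouping and alternation the other way around from the Emacs style described here: in Python's re module, for instance, bare ( ) | are the operators and the backslashed forms are literal characters. The following is a rough sketch of a converter for testing patterns locally. The function name emacs_to_python is invented for this example, and it is deliberately simplified: it handles only these three characters and does not treat the contents of character sets specially.

```python
import re

def emacs_to_python(pattern):
    """Swap the escaping of ( ) | between Emacs-style and Python syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( \) \| are operators in Emacs syntax; written bare in Python.
            out.append(nxt if nxt in "()|" else ch + nxt)
            i += 2
        elif ch in "()|":
            # Bare ( ) | are literal in Emacs syntax; escaped in Python.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(emacs_to_python(r"Endeavor\|Oceanus"))  # Endeavor|Oceanus
print(emacs_to_python(r"\(abc\)*"))           # (abc)*
print(re.fullmatch(emacs_to_python(r"\(abc\)*"), "abcabc") is not None)  # True
```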
In the above example, a user might wonder whether the shark comments had been spelled with upper or lower case letters. The following constraint expression will return any station that mentions a shark in upper or lower case.
?station&station.comment~=".*\(SHARK\|shark\).*"
Of course, this would miss Shark and sHark and so on. The
constraint could be written this way to catch all odd permutations of
upper and lower case:
?station&station.comment~=".*[Ss][Hh][Aa][Rr][Kk].*"
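The difference between the three shark patterns can be checked locally. The sketch below uses Python's re module, so the second pattern is written with bare ( ) | in place of the Emacs-style \( \) \|. The comment strings are invented for illustration and do not come from any real dataset, and whole-string matching is assumed.

```python
import re

# Python-syntax equivalents of the three constraint patterns above.
PATTERNS = {
    "lowercase only": r".*shark.*",
    "SHARK or shark": r".*(SHARK|shark).*",
    "any case": r".*[Ss][Hh][Aa][Rr][Kk].*",
}

# Invented comments, for illustration only.
COMMENTS = ["shark seen off the bow", "SHARK!", "Shark near station", "calm seas"]

def matching(pattern, comments):
    """Return the comments that the pattern matches in full."""
    return [c for c in comments if re.fullmatch(pattern, c)]

for label, pattern in PATTERNS.items():
    print(label, matching(pattern, COMMENTS))
```

Each successive pattern picks up one more of the sample comments, and none of them matches the comment with no shark in it.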