
4.1.5 Pattern Matching with Constraint Expressions

Three operators are defined for comparing one String to another. The = operator returns TRUE if its two input Strings are identical, and the != operator returns TRUE if the Strings do not match. A third operator, ~=, returns TRUE if the String to the left of the operator matches the regular expression in the String on the right.

A regular expression is simply a character string containing wildcard characters that allow it to match patterns within a longer string. For example, the following constraint expression might return all the stations on the sample cruise at which a shark was sighted:

?station&station.comment~=".*shark.*"
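As a rough illustration of what the three String operators select, the sketch below models them in Python over a few made-up station records. The record layout, the helper names (str_eq, str_ne, str_match) and the sample comments are assumptions for illustration and are not part of OPeNDAP. Whether ~= must match the whole String is not stated in this guide; the pattern used here begins and ends with .* so the question does not affect the result.

```python
import re

# Hypothetical station records; field names and values are illustrative only.
stations = [
    {"id": 1, "comment": "calm seas, one shark sighted at dusk"},
    {"id": 2, "comment": "heavy fog"},
    {"id": 3, "comment": "shark"},
]

def str_eq(left, right):
    """Model of the = operator: TRUE when the two strings are identical."""
    return left == right

def str_ne(left, right):
    """Model of the != operator: TRUE when the two strings differ."""
    return left != right

def str_match(left, right):
    """Model of the ~= operator: data on the left, regular expression on the right."""
    return re.fullmatch(right, left) is not None

sighted = [s["id"] for s in stations if str_match(s["comment"], ".*shark.*")]
exact = [s["id"] for s in stations if str_eq(s["comment"], "shark")]
others = [s["id"] for s in stations if str_ne(s["comment"], "shark")]

print(sighted, exact, others)
```

Only the pattern-matching operator finds the comment in which "shark" is embedded in a longer string; = requires the whole comment to be exactly "shark".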

Most characters in a regular expression match themselves. That is, an "f" in a regular expression matches an "f" in the target string. There are several special characters, however, that provide more sophisticated pattern-matching capabilities.  

.

The period matches any single character except a newline.

* + ?

These are postfix operators, which indicate to try to match the preceding regular expression repetitively (as many times as possible). Thus, o* matches any number of o's. The operators differ in that o* also matches zero o's, o+ matches only a series of one or more o's, and o? matches only zero or one o.
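The difference between the three postfix operators can be checked with a short sketch. It uses Python's re module, where *, + and ? behave as described above for these simple cases; the helper name matches is an assumption introduced for the example.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ("f", "fo", "fooo")

star = [matches("fo*", t) for t in samples]   # zero or more o's
plus = [matches("fo+", t) for t in samples]   # one or more o's
opt = [matches("fo?", t) for t in samples]    # zero or one o

# "As many times as possible": o* consumes every leading o it can.
greedy = re.match("o*", "ooob").group()

print(star, plus, opt, greedy)
```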

[ ... ]

These define a "character set," which begins with [ and is terminated by ]. In the simplest case, the characters between the two brackets are what this set can match. The expression [Ss] matches either an upper or lower case s. Brackets can also contain character ranges, so [0-9] matches any single numeral. If the first character within the brackets is a caret (^), the expression will only match characters that do not appear in the brackets. For example, [^0-9]* only matches character strings that contain no numerals.
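A minimal sketch of character sets, again using Python's re module, which treats simple bracket expressions such as these in the same way. The helper name matches and the sample strings are assumptions made for the example.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# [Ss] accepts either case of the first letter, and nothing else.
either_case = [matches("[Ss]hark", t) for t in ("Shark", "shark", "Xhark")]

# [0-9] is a range; with + it matches a run of numerals.
all_digits = matches("[0-9]+", "2004")

# A leading caret negates the set: [^0-9]* matches only digit-free strings.
no_digits = [matches("[^0-9]*", t) for t in ("no numerals", "station 7", "")]

print(either_case, all_digits, no_digits)
```

The empty string also matches [^0-9]*, since * allows zero repetitions.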

^ $

These are special characters that match the empty string at the beginning (^) or end ($) of a line.
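The two anchors, ^ for the beginning and $ for the end, can be tried out in Python as a sketch. One difference is worth flagging as an assumption of this example: Python's ^ and $ are string-oriented by default and only become line-oriented when the re.MULTILINE flag is given. The sample strings are invented.

```python
import re

single = "shark sighted"

at_start = re.search("^shark", single) is not None      # anchored at the beginning
at_end = re.search("sighted$", single) is not None      # anchored at the end
not_start = re.search("^sighted", single) is not None   # "sighted" is not at the beginning

# A two-line string: "shark" begins the second line, not the string.
log = "fog\nshark at noon"
default_mode = re.search("^shark", log) is not None
line_mode = re.search("^shark", log, re.MULTILINE) is not None

print(at_start, at_end, not_start, default_mode, line_mode)
```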

\|

These two characters define a logical OR between the largest possible expression on either side of the operator. So, for example, the string Endeavor\|Oceanus matches either Endeavor or Oceanus. The scope of the OR can be contained with the grouping operators, \( and \).

\( \)

These are used to group a series of characters into an expression, or for the OR function. So, for example, \(abc\)* matches zero or more repetitions of the string abc.
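Many other regular expression libraries write these operators the other way round: in Python's re module, for instance, |, ( and ) are special when bare and literal when preceded by a backslash. To test patterns from this section locally, a small translation step helps. The function below is a hypothetical helper, not part of OPeNDAP or Emacs, and it is only a sketch: it handles the \|, \( and \) operators and ignores bracket expressions and any other Emacs-specific constructs.

```python
import re

def emacs_to_python(pattern):
    # Hypothetical helper: swap the escaped and bare forms of | ( ).
    # Limitation: bracket expressions and other Emacs-only syntax are not handled.
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Escaped | ( ) are operators here, so emit them bare;
            # any other escaped character is passed through unchanged.
            out.append(nxt if nxt in "|()" else ch + nxt)
            i += 2
        elif ch in "|()":
            # Bare | ( ) are literal characters here, so escape them.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

ships = emacs_to_python(r"Endeavor\|Oceanus")
group = emacs_to_python(r"\(abc\)*")

print(ships, group)
print(bool(re.fullmatch(ships, "Oceanus")), bool(re.fullmatch(group, "abcabc")))
```

Because * applies to the whole group, the translated \(abc\)* accepts the empty string and abcabc but rejects abca.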

There are several more special characters and several other features of the characters described here, but they are beyond the scope of this guide. The OPeNDAP regular expression syntax is the same as that used in the Emacs editor. See the documentation for Emacs [1] for a complete description of all the pattern-matching capabilities of regular expressions.

Examples

In the above example, a user might wonder whether the shark comments had been spelled with upper or lower case letters. The following constraint expression will return any station that mentions a shark in upper or lower case.

?station&station.comment~=".*\(SHARK\|shark\).*"

Of course, this would miss Shark and sHark and so on. The constraint could be written this way to catch all odd permutations of upper and lower case:

?station&station.comment~=".*[Ss][Hh][Aa][Rr][Kk].*"
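The coverage of the three shark patterns in this section can be compared side by side. The sketch below rewrites them in Python's re syntax, which uses bare ( | ) where the constraint expressions use \( \| \); the sample comments are invented for the illustration.

```python
import re

# The three patterns from this section, in Python syntax.
patterns = {
    "lower only": ".*shark.*",
    "two spellings": ".*(SHARK|shark).*",
    "any case": ".*[Ss][Hh][Aa][Rr][Kk].*",
}

# Hypothetical station comments.
comments = [
    "shark off the bow",
    "SHARK!",
    "Shark sighted",
    "sHark?",
    "dolphins only",
]

# For each pattern, collect the comments it selects.
selected = {
    name: [c for c in comments if re.fullmatch(pat, c)]
    for name, pat in patterns.items()
}

for name, hits in selected.items():
    print(name, len(hits), hits)
```

Only the bracketed form picks up the mixed-case spellings, at the cost of a longer pattern.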

Tom Sgouros, August 25, 2004
