1 RXFIND

The syntax of RXFIND is:

    return = RXFIND(source-string, regular-expression)

RXFIND has the following two arguments:

source-string
    String to be searched for a pattern matching the regular expression.

regular-expression
    The regular expression used to search the source string.

RXFIND searches the source string for a series of characters that
match the regular expression. If the regular expression is not found, zero (0) is returned.
If the regular expression is found, the index within the source string of the first
character matched is returned, with the first index being 1.
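The behavior described above can be illustrated with a small Python sketch. Python's re module is not the Warehouse regular expression engine, so this is only an approximation that assumes the two agree on simple patterns like these; rxfind is a hypothetical helper name, not a Warehouse function. re.search finds the leftmost match, and because Python counts string positions from 0, the sketch adds 1 to produce RXFIND's 1-based result.

```python
import re


def rxfind(source: str, regex: str) -> int:
    """Hypothetical stand-in for RXFIND: 1-based start of the first match, or 0."""
    match = re.search(regex, source)  # leftmost match anywhere in the string
    return match.start() + 1 if match else 0


# The seven RXFIND examples from the table in this section: (regex, expected result).
examples = [
    ("abc", 4), ("xyz", 0), ("[a-z]", 4), ("az", 0),
    ("a*", 1), ("a+", 4), ("[0-9].a", 2),
]
for regex, expected in examples:
    result = rxfind("123abc", regex)
    print(f'RXFIND("123abc", "{regex}") -> {result} (table says {expected})')
```

For these examples the sketch agrees with the table, including 1 for "a*", which matches zero a's at the first position.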
Note: In a regular expression, a leading circumflex (^)
matches the beginning of the source string and a trailing dollar sign ($)
matches the end of the source string (see 3.1 below). Therefore, if the regular expression has both a leading circumflex and
a trailing dollar sign, RXFIND can only return 0 (no match) or 1 (match), so it
behaves like RXMATCH. Many regular expressions published on the internet have
a leading circumflex and a trailing dollar sign to get this RXMATCH-style,
whole-string behavior. For example, a Microsoft web page
gives "^\d{3}-\d{2}-\d{4}$" as the regular expression for checking a Social Security number.
You can either use this regular expression as is and call RXFIND,
or remove the circumflex and dollar sign and call RXMATCH
using "\d{3}-\d{2}-\d{4}".
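The effect of the anchors can be seen in a similar Python sketch. It again assumes that Python's re module behaves like the Warehouse engine for this pattern; rxfind and rxmatch are hypothetical helpers, with re.fullmatch standing in for RXMATCH. One known Python difference: $ also matches just before a newline at the very end of the string.

```python
import re


def rxfind(source: str, regex: str) -> int:
    """Hypothetical RXFIND stand-in: 1-based start of the first match, or 0."""
    m = re.search(regex, source)
    return m.start() + 1 if m else 0


def rxmatch(source: str, regex: str) -> bool:
    """Hypothetical RXMATCH stand-in: True only if the whole string matches."""
    return re.fullmatch(regex, source) is not None


ANCHORED = r"^\d{3}-\d{2}-\d{4}$"  # SSN pattern with ^ and $, as published
UNANCHORED = r"\d{3}-\d{2}-\d{4}"  # the same pattern with ^ and $ removed

for text in ["123-45-6789", "SSN 123-45-6789", "123-45-67890"]:
    print(
        f"{text!r:18} RXFIND anchored={rxfind(text, ANCHORED)} "
        f"RXFIND unanchored={rxfind(text, UNANCHORED)} "
        f"RXMATCH={rxmatch(text, UNANCHORED)}"
    )
```

The anchored RXFIND and the unanchored RXMATCH accept and reject the same strings, while the unanchored RXFIND also reports SSN-like text found inside a longer string.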
RXFIND examples within a Warehouse expression:

No.  Expression                   Result
1    RXFIND("123abc", "abc")      4
2    RXFIND("123abc", "xyz")      0
3    RXFIND("123abc", "[a-z]")    4
4    RXFIND("123abc", "az")       0
5    RXFIND("123abc", "a*")       1
6    RXFIND("123abc", "a+")       4
7    RXFIND("123abc", "[0-9].a")  2

Notes on examples:

1    The regular expression "abc" is found at position 4 of the source string.
2    The regular expression "xyz" is not found in the source string.
3    The first lower case letter (specified with [a-z]) is at position 4.
4    The regular expression "az" is not found in the source string.
5    "a*" matches zero or more a's, so zero a's (an empty match) are found at position 1.
6    One or more a's (specified with a+) are found at position 4.
7    A digit (specified with [0-9]), followed by any character (specified with .),
     then a, is found at position 2.

2 RXMATCH

The syntax of RXMATCH is:

    return = RXMATCH(source-string, regular-expression)

RXMATCH has the following two arguments:

source-string
    String to be matched against the regular expression.

regular-expression
    The regular expression that the whole source string must match.

RXMATCH matches the entire source string against the regular expression.
If the whole source string matches the regular expression, true ($TRUE) is returned.
If the source string does not match exactly, for example because only part of it matches, false ($FALSE) is returned.
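A minimal Python sketch of RXMATCH follows, assuming that Python's re.fullmatch, which succeeds only when the whole string matches, is a fair stand-in for the Warehouse function on simple patterns. rxmatch is a hypothetical name. re.match would not be enough, because it only anchors the match at the start of the string.

```python
import re


def rxmatch(source: str, regex: str) -> bool:
    """Hypothetical RXMATCH stand-in: the regex must match the entire source string."""
    return re.fullmatch(regex, source) is not None


# (source string, regex, expected) for the seven RXMATCH examples in this section.
examples = [
    ("abc", "abc", True),
    ("123abc", "abc", False),
    ("abc", "[a-z]", False),
    ("abc", "[a-z]+", True),
    ("123abc", "[0-9]+", False),
    ("123abc", "[0-9]+[a-z]+", True),
    ("123abc", "[0-9].+c", True),
]
for source, regex, expected in examples:
    print(f'RXMATCH("{source}", "{regex}") -> {rxmatch(source, regex)} (table says {expected})')
```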
RXMATCH examples within a Warehouse expression:

No.  Expression                          Result
1    RXMATCH("abc", "abc")               True
2    RXMATCH("123abc", "abc")            False
3    RXMATCH("abc", "[a-z]")             False
4    RXMATCH("abc", "[a-z]+")            True
5    RXMATCH("123abc", "[0-9]+")         False
6    RXMATCH("123abc", "[0-9]+[a-z]+")   True
7    RXMATCH("123abc", "[0-9].+c")       True

Notes on examples:

1    The regular expression "abc" matches the source string.
2    The regular expression "abc" matches only the last three characters, not the whole source string.
3    The source string has three characters, so it does not match the single-character regular expression "[a-z]".
4    The source string matches one or more lower case letters (specified with [a-z]+).
5    The source string does not match one or more digits (specified with [0-9]+), because it also contains letters.
6    The source string matches one or more digits ([0-9]+)
     followed by one or more lower case letters ([a-z]+).
7    The source string matches one digit ([0-9]),
     then one or more characters (.+), followed by c.

3 Regular Expression Reference

Regular expressions in the RXFIND and RXMATCH functions
are specified using regular expression syntax with features common to many programming languages.
Relatively simple regular expressions are standard and can be used without change in many programming environments.
In a regular expression, alphanumeric characters match themselves, and matching is case-sensitive. For example,
the regular expression "abc"
matches the string "abc", but not "Abc".
Unless they have a special meaning, special characters must also match exactly. For example,
the regular expression "a@b"
matches the string "a@b".
The characters with a special meaning are:

Character  Description and example (source string matches regular expression)
.          A period matches any one character: "A5c" matches "A.c".
\          A backslash is used to escape special characters: "A*B" matches "A\*B".
^          A circumflex (hat) matches the beginning of the string: "Abc" matches "^Abc".
$          A dollar sign matches the end of the string: "Abc" matches "Abc$".
|          A vertical bar is used to do a logical OR: "Ab" matches "Xy|Ab".
[ ]        Square brackets match any one character from the enclosed list: "Ab" matches "A[abc]".
( )        Parentheses are used to specify a group: "AdeF" matches "A(bc|de)F".
?          A question mark matches zero or one of the previous character or group: "Ac" matches "Ab?c".
+          A plus sign matches one or more of the previous character or group: "Abbbc" matches "Ab+c".
*          An asterisk matches zero or more of the previous character or group: "Ac" matches "Ab*c".
{m,n}      Curly braces match from m to n of the previous character or group: "Abbc" matches "Ab{2,3}c".
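The table can be checked with Python's re.fullmatch, which, like RXMATCH, requires the whole source string to match. This sketch assumes Python's re module treats these basic special characters the same way as the Warehouse implementation, which is typical for simple expressions but not guaranteed.

```python
import re

# (source string, regular expression) pairs from the special character table.
pairs = [
    ("A5c", r"A.c"),         # .      any one character
    ("A*B", r"A\*B"),        # \      escapes the asterisk
    ("Abc", r"^Abc"),        # ^      beginning of the string
    ("Abc", r"Abc$"),        # $      end of the string
    ("Ab", r"Xy|Ab"),        # |      logical OR
    ("Ab", r"A[abc]"),       # [ ]    one character from the list
    ("AdeF", r"A(bc|de)F"),  # ( )    group
    ("Ac", r"Ab?c"),         # ?      zero or one b
    ("Abbbc", r"Ab+c"),      # +      one or more b's
    ("Ac", r"Ab*c"),         # *      zero or more b's
    ("Abbc", r"Ab{2,3}c"),   # {m,n}  two or three b's
]
for source, regex in pairs:
    matched = re.fullmatch(regex, source) is not None
    print(f"{source:6} matches {regex:10} {matched}")

# Ordinary characters match themselves, case-sensitively.
print(re.fullmatch("abc", "Abc") is not None)  # False
print(re.fullmatch("a@b", "a@b") is not None)  # True
```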
-
Bracket expressions are used to match, or not match, a single character. A
list of matching characters is placed within square brackets. A range may be specified
using a hyphen (-) between the low and high values, as in [0-9].
If the first character is a circumflex (^),
the expression is negated (NOT): a match is made if the
source character is not in the list. Examples:

[.?!]       Matches a period (.), question mark (?) or exclamation point (!).
[1-9]       Matches any digit, except 0.
[0-9A-F]    Matches an upper case hexadecimal digit.
[^/:]       Matches any character, except a slash (/) or colon (:).
[^0-9.]     Matches any character, except a numeric digit or a period (.).
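Four of the bracket examples can be tried in Python, assuming Python's re module handles ranges and a leading ^ the same way as the Warehouse implementation. Each bracket expression is tested against single characters, because a bracket expression always matches exactly one character.

```python
import re

# Four of the bracket expressions above, each tested against single characters.
brackets = ["[.?!]", "[1-9]", "[^/:]", "[^0-9.]"]
samples = ["!", "0", "7", "/", ".", "x"]

for bracket in brackets:
    hits = [ch for ch in samples if re.fullmatch(bracket, ch)]
    print(f"{bracket:8} matches {hits}")
```

Inside square brackets, the period and question mark lose their special meaning, so "[.?!]" matches those characters literally.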
Certain classes of characters (such as numeric digits) have predefined representations
within regular expressions. Many classes can be written in more than one form, for
compatibility with regular expressions in other computer languages.

Class         Description                                            Bracket equivalent
[:alnum:]     Matches any alphanumeric character                     [0-9A-Za-z]
[:alpha:]     Matches any alphabetic character                       [A-Za-z]
[:ascii:]     Matches any ASCII character from 0 to 127              [\x00-\x7F]
[:blank:]     Matches a tab or space character                       [\t ]
[:cntrl:]     Matches any control character                          [\x00-\x1F]
[:digit:]     Matches any numeric digit                              [0-9]
[:graph:]     Matches any graphical (non-space printing) character   [!-~]
[:lower:]     Matches any lower case alphabetic character            [a-z]
[:print:]     Matches any printing character                         [ -~]
[:punct:]     Matches any punctuation character                      [!-/:-@\[-`{-~]
[:space:]     Matches any whitespace character                       [\t\n\v\f\r ]
[:upper:]     Matches any upper case alphabetic character            [A-Z]
[:word:]      Matches any word character                             [0-9A-Z_a-z]
[:xdigit:]    Matches a hexadecimal digit                            [0-9A-Fa-f]
\p{Alnum}     Matches any alphanumeric character                     [0-9A-Za-z]
\p{Alpha}     Matches any alphabetic character                       [A-Za-z]
\p{ASCII}     Matches any ASCII character from 0 to 127              [\x00-\x7F]
\p{Blank}     Matches a tab or space character                       [\t ]
\p{Cntrl}     Matches any control character                          [\x00-\x1F]
\p{Digit}     Matches any numeric digit                              [0-9]
\p{Graph}     Matches any graphical (non-space printing) character   [!-~]
\p{Lower}     Matches any lower case alphabetic character            [a-z]
\p{Print}     Matches any printing character                         [ -~]
\p{Punct}     Matches any punctuation character                      [!-/:-@\[-`{-~]
\p{Space}     Matches any whitespace character                       [\t\n\v\f\r ]
\p{Upper}     Matches any upper case alphabetic character            [A-Z]
\p{XDigit}    Matches a hexadecimal digit                            [0-9A-Fa-f]
\d            Matches any numeric digit                              [0-9]
\D            Matches any character, except a numeric digit          [^0-9]
\s            Matches any whitespace character                       [\t\n\v\f\r ]
\S            Matches any character, except a whitespace character   [^\t\n\v\f\r ]
\w            Matches any word character                             [0-9A-Z_a-z]
\W            Matches any character, except a word character         [^0-9A-Z_a-z]
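Python's standard re module does not accept the [:name:] or \p{Name} forms, so a Python sketch can only check the bracket equivalents in the table. The one below compares several of them with Python's string constants and with the \d, \s and \w shorthands, using the re.ASCII flag so that those shorthands are limited to ASCII, as in the table. chars is a hypothetical helper, and agreement in Python is only an indication of how the Warehouse classes are expected to behave.

```python
import re
import string

ASCII = [chr(i) for i in range(128)]  # every ASCII character, codes 0 to 127


def chars(pattern: str, flags: int = 0) -> set:
    """Hypothetical helper: the ASCII characters matched by a one-character pattern."""
    return {ch for ch in ASCII if re.fullmatch(pattern, ch, flags)}


# Bracket equivalents from the table, compared with Python's own definitions.
checks = {
    "[:alnum:]": chars(r"[0-9A-Za-z]") == set(string.ascii_letters + string.digits),
    "[:punct:]": chars(r"[!-/:-@\[-`{-~]") == set(string.punctuation),
    "[:xdigit:]": chars(r"[0-9A-Fa-f]") == set(string.hexdigits),
    r"\d": chars(r"\d", re.ASCII) == chars(r"[0-9]"),
    r"\s": chars(r"\s", re.ASCII) == chars(r"[\t\n\v\f\r ]"),
    r"\w": chars(r"\w", re.ASCII) == chars(r"[0-9A-Z_a-z]"),
}
for name, same in checks.items():
    print(f"{name:11} bracket equivalent agrees: {same}")
```

Without re.ASCII, Python's \s also matches a few other ASCII control characters, which is why the flag is used here.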
-
Escapes are elements of a regular expression that begin with a backslash (\).
Escapes are used to match characters that are hard to type, such as control characters, or to treat characters with a special meaning as ordinary characters. For example, to match
an asterisk, use \* in the regular expression. The supported escapes are:

\0nnn       Specifies a character value in octal using nnn.
\a          Matches an alert (bell) or ASCII 7.
\b          Matches a backspace or ASCII 8.
\cX         Matches the control character corresponding to X.
\d          Matches any numeric digit.
\D          Matches any character, except a numeric digit (see above).
\e          Matches an escape or ASCII 27.
\f          Matches a form feed or ASCII 12.
\n          Matches a line feed or ASCII 10.
\p{class}   Matches any character in the specified class (see above).
\r          Matches a carriage return or ASCII 13.
\s          Matches any whitespace character (see above).
\S          Matches any character, except a whitespace character (see above).
\t          Matches a horizontal tab or ASCII 9.
\uhhhh      Specifies a character value with exactly 4 hexadecimal digits hhhh.
\v          Matches a vertical tab or ASCII 11.
\w          Matches any word character (see above).
\W          Matches any character, except a word character (see above).
\xhhhh      Specifies a character value with 1 to 4 hexadecimal digits hhhh.
\\          Matches a backslash (\).
\special    Matches the special character literally; for example, \( matches (.
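Several of these escapes behave the same way in Python's re module, which the sketch below uses as a stand-in. The correspondence is only partial and is assumed for illustration: in Python, \x takes exactly two hexadecimal digits, \e, \cX and \p{...} are not supported, and \b means a word boundary except inside square brackets.

```python
import re

# (pattern as a Python raw string, text that it should match completely)
cases = [
    (r"a\tb", "a\tb"),          # \t      horizontal tab (ASCII 9)
    (r"\x41", "A"),             # \xhh    hex value; Python requires exactly 2 digits
    (r"\u0042", "B"),           # \uhhhh  exactly 4 hex digits
    (r"C:\\temp", "C:\\temp"),  # \\      a single literal backslash
    (r"f\(x\)", "f(x)"),        # \( \)   literal parentheses
    (r"2\*3", "2*3"),           # \*      literal asterisk
]
for pattern, text in cases:
    print(f"{pattern:10} matches {text!r}: {re.fullmatch(pattern, text) is not None}")
```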
Note: Backslashes in Warehouse scripts can be confusing, because the Warehouse script processor
converts backslashes first and the regular expression processor then processes them again. As a result, every backslash
in a regular expression written in a script must be doubled. For example, to check whether a field called src_field
matches "ab\cd(ef)", you would use this RXMATCH
expression:

    RXMATCH(src_field, "ab\\\\cd\\(ef\\)")
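Ordinary Python string literals cause the same two-step processing, which makes them a convenient way to see why the backslashes are doubled. This sketch assumes the Warehouse script processor treats a doubled backslash in a string the way a non-raw Python string literal does; Python's raw string form r"..." skips that first step.

```python
import re

# In an ordinary (non-raw) Python string literal, "\\" is one backslash, so the
# pattern is written with doubled backslashes, as in a Warehouse script.
doubled = "ab\\\\cd\\(ef\\)"  # what the regex engine receives: ab\\cd\(ef\)
raw = r"ab\\cd\(ef\)"          # the same pattern written as a raw string

target = "ab\\cd(ef)"          # the text ab\cd(ef): one backslash, real parentheses
print(doubled == raw)                              # True
print(re.fullmatch(doubled, target) is not None)   # True
```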
-
Regular expression quantifiers specify how many occurrences of
a character or group are needed to make a match. The quantifiers are:

*       Zero or more occurrences
+       One or more occurrences
?       Zero or one occurrence
{m}     Exactly m occurrences
{m,}    At least m occurrences, i.e. m occurrences or more
{m,n}   Between m and n occurrences, inclusive
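Each quantifier can be tried with Python's re.fullmatch on a few strings. This is a sketch that assumes Python's quantifier semantics match the Warehouse implementation, which is the usual behavior for these standard forms.

```python
import re

# (regex, candidate strings): each regex must match the whole candidate string.
tests = [
    ("ab*c", ["ac", "abc", "abbbc"]),        # *      zero or more b's
    ("ab+c", ["ac", "abc", "abbbc"]),        # +      one or more b's
    ("ab?c", ["ac", "abc", "abbc"]),         # ?      zero or one b
    ("ab{2}c", ["abc", "abbc", "abbbc"]),    # {m}    exactly two b's
    ("ab{2,}c", ["abc", "abbc", "abbbbc"]),  # {m,}   two or more b's
    ("ab{1,2}c", ["ac", "abc", "abbbc"]),    # {m,n}  one or two b's
]
for regex, candidates in tests:
    results = {s: re.fullmatch(regex, s) is not None for s in candidates}
    print(f"{regex:9} {results}")
```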
-
Parentheses ( ) are used to create groups within a regular expression.
A group is used either to apply a quantifier to more than one character, or to offer more than one possible match using a
vertical bar (|) as an OR operator.
Examples:

Source string  Regular expression  Matches  Comments
AbAbAb         (Ab)+               Yes      Matches Ab one or more times
Adef           A(bc|de)f           Yes      Starts with A, then either bc or de, followed by f
Acdf           A(bc|de)f           No       The cd does not match
Abcdef         A(bc|de)f           No       Only one bc or de is permitted between the A and f
Abcdef         A(bc|de)*f          Yes      Two occurrences of bc or de between the A and f
Af             A(bc|de)*f          Yes      Zero occurrences of bc or de between the A and f
abe            (ab|cd*)e           Yes      ab matches the first part of the OR, followed by e
cde            (ab|cd*)e           Yes      cd matches the second part of the OR, followed by e
ce             (ab|cd*)e           Yes      c matches, with d occurring zero times, followed by e
cabe           (ab|cd*)+e          Yes      c matches, then ab, then e
abEcdEf        ((ab|cd)E)+f        Yes      Groups may be nested to create complex expressions
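The rows of the table can be replayed with Python's re.fullmatch, assuming, as the table does, that the whole source string must match, and that Python's grouping and alternation behave like the Warehouse engine.

```python
import re

# (source string, regular expression, expected) rows from the group examples table.
rows = [
    ("AbAbAb", "(Ab)+", True),
    ("Adef", "A(bc|de)f", True),
    ("Acdf", "A(bc|de)f", False),
    ("Abcdef", "A(bc|de)f", False),
    ("Abcdef", "A(bc|de)*f", True),
    ("Af", "A(bc|de)*f", True),
    ("abe", "(ab|cd*)e", True),
    ("cde", "(ab|cd*)e", True),
    ("ce", "(ab|cd*)e", True),
    ("cabe", "(ab|cd*)+e", True),
    ("abEcdEf", "((ab|cd)E)+f", True),
]
for source, regex, expected in rows:
    actual = re.fullmatch(regex, source) is not None
    print(f"{source:8} {regex:13} {actual} (table: {expected})")
```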
-
Regular expressions may contain options, written as (? followed by the option letter, then ), for example (?i).
Only two options are supported: i for case-insensitive
matching, and s to allow a dot (.) to match a newline
character. Normally a dot (.) does not match a newline.
Examples:

Source string  Regular expression  Matches  Comments
Abc            (?i)ABC             Yes      Case-insensitive match
Abc            (?i)[A-Z]+          Yes      Case-insensitive match with one or more of A to Z
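Python's re module accepts the same (?i) and (?s) inline options at the start of a pattern, so it can illustrate both. This is an approximation of the Warehouse behavior, not a description of it.

```python
import re

# (?i): case-insensitive matching, as in the examples above.
print(re.fullmatch(r"(?i)ABC", "Abc") is not None)     # True
print(re.fullmatch(r"(?i)[A-Z]+", "Abc") is not None)  # True

# (?s): let '.' match a newline; without it, '.' does not match the newline.
text = "line1\nline2"
print(re.fullmatch(r"line1.line2", text) is not None)      # False
print(re.fullmatch(r"(?s)line1.line2", text) is not None)  # True
```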
-
Here are some things that can be matched using regular expressions:

Item            Regular expression
Email address   [a-zA-Z0-9_.+-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,5}
Roman numeral   M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})
Web address     https?://((.)+\.)+[A-Za-z]{2,5}(/.*)?
Phone number    [01]?[- .]?(\([2-9]\d{2}\)|[2-9]\d{2})[- .]?\d{3}[- .]?\d{4}
Real number     [+-]?\d+(\.\d*)?([eE][+-]?\d+)?
SSN             \d{3}-\d{2}-\d{4}
US Dollars      \$(\d{1,3}(\,\d{3})*|(\d+))(\.\d{2})?
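Some of these patterns can be exercised with Python's re.fullmatch. The sample strings are made up for illustration, and the sketch assumes Python's re module interprets the patterns the same way the Warehouse implementation does. In a Warehouse script, each backslash in these patterns would still need to be doubled.

```python
import re

# A few patterns from the table above, each applied as a whole-string match.
PATTERNS = {
    "SSN": r"\d{3}-\d{2}-\d{4}",
    "Real number": r"[+-]?\d+(\.\d*)?([eE][+-]?\d+)?",
    "US Dollars": r"\$(\d{1,3}(\,\d{3})*|(\d+))(\.\d{2})?",
    "Roman numeral": r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})",
}
samples = {
    "SSN": ["123-45-6789", "123456789"],
    "Real number": ["-1.5e10", "42", ".5"],
    "US Dollars": ["$1,234.56", "$1234", "$12,34"],
    "Roman numeral": ["MCMXCIV", "IIII"],
}
for item, pattern in PATTERNS.items():
    for s in samples[item]:
        print(f"{item:14} {s!r:12} {re.fullmatch(pattern, s) is not None}")
```

As written, the real number pattern requires at least one digit before the decimal point, so ".5" does not match, and "$12,34" is rejected because each group after a comma must have three digits.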
-
The following websites contain more information about regular expressions.
Keep in mind that each regular expression implementation is different and the information
in these websites is not necessarily applicable to the Warehouse implementation.