Essential Tools Module Reference Guide

RWCRegexp

Module:  Essential Tools Module   Group:  String Processing


Does not inherit


Synopsis

#include <rw/regexp.h>
RWCRegexp re(".*\\.doc");  // Matches filename with suffix ".doc"

Deprecation Notice

This alternative interface is now deprecated and may be eliminated from a later release. For more information on the support of deprecated classes, please contact your Rogue Wave account representative.

Use the RWTRegex<T> interface instead of this class. RWTRegex<T> provides enhanced functionality and increased performance.


NOTE -- If you need backreferencing, you must continue to use this class. Backreferencing is not provided in RWTRegex<T>.

Description

Class RWCRegexp represents a regular expression. The constructor "compiles" the expression into an internal form that can be searched more efficiently. The compiled expression can then be used for string searches with class RWCString.
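As a rough analogy for the compile-once, search-many model described above, the sketch below uses the standard C++ std::regex class (not RWCRegexp) with the pattern from the Synopsis. The helper countDocFiles is hypothetical. Note that std::regex_match requires the whole file name to match, which is stricter than a search.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: counts the names that end in ".doc".
// The pattern is compiled once, when the static std::regex object is
// constructed, and the compiled object is reused for every name.
int countDocFiles(const std::vector<std::string>& names) {
    static const std::regex docSuffix(".*\\.doc");  // compiled once
    int count = 0;
    for (const std::string& name : names) {
        if (std::regex_match(name, docSuffix)) {  // whole-name match
            ++count;
        }
    }
    return count;
}
```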

The regular expression (RE) is constructed as follows:

The following rules determine one-character REs that match a single character:

  1. Any character that is not a special character (defined below) matches itself.

  2. A backslash (\) followed by any special character matches that character literally; that is, the backslash "escapes" the special character.

     NOTE -- There is one exception to this rule: \^char is interpreted as a control character, so \^R is control-R. To match the circumflex ^ itself, use \x5e in ASCII environments.

  3. The "special characters" are:

     + * ? . [ ] ^ $

  4. The period (.) matches any character except the newline. For example, ".umpty" matches either "Humpty" or "Dumpty".

  5. A set of characters enclosed in brackets ([]) is a one-character RE that matches any of the characters in that set. For example, "[akm]" matches an "a", "k", or "m". A range of characters can be indicated with a dash; for example, "[a-z]" matches any lowercase letter. However, if the first character of the set is the caret (^), the RE matches any character except those in the set. Such a set does not match the empty string. For example, "[^akm]" matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.
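For simple patterns like these, the ECMAScript grammar used by the standard std::regex class follows the same one-character rules, so the sketch below checks the examples from the text against whole strings. std::regex stands in for RWCRegexp here, and matchesWhole is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches all of `text`.
// For ".", "[...]" and "[^...]" the ECMAScript grammar of std::regex
// behaves like the one-character RE rules described above.
bool matchesWhole(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```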

The following rules can be used to build a multicharacter RE.

  1. A one-character RE followed by an asterisk (*) matches zero or more occurrences of the RE. Hence, [a-z]* matches zero or more lowercase characters.

  2. A one-character RE followed by a plus (+) matches one or more occurrences of the RE. Hence, [a-z]+ matches one or more lowercase characters.

  3. A question mark (?) makes the preceding RE optional: it can occur zero times or once in the string, but no more. For example, xy?z matches either xyz or xz.

  4. The concatenation of REs is an RE that matches the corresponding concatenation of strings. For example, [A-Z][a-z]* matches any capitalized word.
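The closure and concatenation rules can be illustrated with a search that reports the leftmost match. This sketch uses std::regex as a stand-in for RWCRegexp, assuming that both report the leftmost match and treat * and + greedily for these simple patterns; firstMatch is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: searches `text` for the leftmost match of
// `pattern`. On success, stores the matched substring in *out and
// returns true; returns false if nothing matches.
bool firstMatch(const std::string& pattern, const std::string& text,
                std::string* out) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) {
        return false;
    }
    *out = m.str(0);
    return true;
}
```

Note that [a-z]* can succeed with an empty match, because zero occurrences satisfy it, while [a-z]+ needs at least one lowercase letter.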

Finally, the entire regular expression can be anchored to match only the beginning or end of a line:

  1. If the caret (^) is at the beginning of the RE, then the matched string must be at the beginning of a line.

  2. If the dollar sign ($) is at the end of the RE, then the matched string must be at the end of a line.
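The anchors are easiest to see with a search rather than a whole-string match, because then only ^ and $ tie the match to the start or end. The sketch uses std::regex in place of RWCRegexp and the hypothetical helper found. By default, std::regex treats ^ and $ as the start and end of the whole target string rather than of each line, so the examples use single-line text.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
// Because this is a search, an unanchored pattern may match in the
// middle; "^" and "$" restrict the match to the start or end.
bool found(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}
```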

The following escape codes can be used to match control characters:

  \b      backspace
  \e      ESC (escape)
  \f      formfeed
  \n      newline
  \r      carriage return
  \t      tab
  \xdd    the literal hexadecimal number 0xdd
  \ddd    the literal octal number ddd
  \^C     control code; for example, \^D is "control-D"
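To make the escape codes concrete, here is a hypothetical decoder that maps each listed sequence to its character code. It is not part of RWCRegexp. Two details are assumptions rather than documented behavior: that \x takes at most two hexadecimal digits (as in \x5e) and that \^C is formed by masking the upper-cased letter with 0x1f (so \^D gives 0x04).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: reads up to maxDigits digits in the given base
// (8 or 16) starting at esc[pos]. Returns -1 if no digit is found there.
static int parseDigits(const std::string& esc, std::size_t pos, int base,
                       int maxDigits) {
    int value = 0;
    int digits = 0;
    while (pos < esc.size() && digits < maxDigits) {
        int ch = std::tolower(static_cast<unsigned char>(esc[pos]));
        int d;
        if (ch >= '0' && ch <= '9') {
            d = ch - '0';
        } else if (ch >= 'a' && ch <= 'f') {
            d = ch - 'a' + 10;
        } else {
            break;
        }
        if (d >= base) break;  // e.g. '8' is not an octal digit
        value = value * base + d;
        ++pos;
        ++digits;
    }
    return digits == 0 ? -1 : value;
}

// Hypothetical helper: decodes one escape sequence, given as text that
// starts with the backslash, into the character code it denotes.
// Returns -1 for sequences it does not recognize.
int decodeEscape(const std::string& esc) {
    if (esc.size() < 2 || esc[0] != '\\') return -1;
    switch (esc[1]) {
        case 'b': return '\b';
        case 'e': return 0x1b;                        // ESC
        case 'f': return '\f';
        case 'n': return '\n';
        case 'r': return '\r';
        case 't': return '\t';
        case 'x': return parseDigits(esc, 2, 16, 2);  // \x5e -> 0x5e
        case '^':                                     // \^D -> 0x04
            if (esc.size() < 3) return -1;
            return std::toupper(static_cast<unsigned char>(esc[2])) & 0x1f;
        default:                                      // \101 -> 'A'
            return parseDigits(esc, 1, 8, 3);
    }
}
```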

The most common difficulty with this class is specifying a backslash character to be parsed. If you are parsing a regular expression that contains backslashes, be aware that the C++ compiler and the regular expression constructor both assume that any backslash they see is intended to escape the following character. Thus, to specify a regular expression that exactly matches "a\a", you must write four backslashes: the regular expression parser needs to see "a\\a", and for that to happen, the compiler must see "a\\\\a":

RWCRegexp re("a\\\\a");
               ^|^|
                1 2

The backslashes marked with ^ are escapes for the compiler, and those marked with | are therefore seen by the regular expression parser. At that point, the backslash marked 1 is an escape, and the one marked 2 is put into the regular expression as a literal backslash.

Similarly, if you need to escape a special character, such as ".", you must pass two backslashes to the compiler:

RWCRegexp re("\\.");
              ^|

Once again, the backslash marked ^ is an escape for the compiler, and the one marked | is seen by the regular expression constructor as an escape for the period that follows.
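The two levels of escaping can be checked with a short sketch. It uses std::regex, whose ECMAScript grammar also reads \\ as a literal backslash and \. as a literal period, as a stand-in for RWCRegexp. The constant names and the helper wholeMatch are hypothetical.

```cpp
#include <regex>
#include <string>

// The compiler consumes one level of backslashes in a string literal;
// the regular expression parser consumes the next.
// Literal "a\\\\a" -> parser sees a\\a -> the RE matches the text a\a.
const std::string kLiteralBackslashPattern = "a\\\\a";

// Literal "\\." -> parser sees \. -> the RE matches only a period.
const std::string kLiteralDotPattern = "\\.";

// Hypothetical helper: true if `pattern` matches all of `text`.
bool wholeMatch(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

With C++11 raw string literals, R"(a\\a)" yields the same four characters without the compiler-level escaping, which can make such patterns easier to read.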

Persistence

None

Example

Public Constructors

RWCRegexp(const char* pat);
RWCRegexp(const RWCRegexp& r);

Public Destructor

~RWCRegexp();

Assignment Operators

RWCRegexp&
operator=(const RWCRegexp&);
RWCRegexp&
operator=(const char* pat);

Public Member Functions

size_t
index(const RWCString& str, size_t* len, size_t start=0) const;
statVal
status();
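The member descriptions are not reproduced in this section. Assuming index() follows the usual Essential Tools convention (it returns the position of the first match at or after start, stores the match length through len, and returns RW_NPOS when nothing matches), its contract can be mimicked with std::regex_search. regexIndex and kNpos are hypothetical stand-ins for RWCRegexp::index() and RW_NPOS, not the library's implementation.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for RW_NPOS ("no position").
const std::size_t kNpos = static_cast<std::size_t>(-1);

// Hypothetical analogue of RWCRegexp::index(): searches `str` from
// position `start`, returns the index of the first match, and stores
// the match length in *len. Returns kNpos if there is no match.
std::size_t regexIndex(const std::regex& re, const std::string& str,
                       std::size_t* len, std::size_t start = 0) {
    if (start > str.size()) return kNpos;
    std::smatch m;
    if (!std::regex_search(str.begin() + start, str.end(), m, re)) {
        return kNpos;
    }
    *len = static_cast<std::size_t>(m.length(0));
    // position() is relative to the beginning of the searched range.
    return start + static_cast<std::size_t>(m.position(0));
}
```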



©2004 Copyright Quovadx, Inc. All Rights Reserved.
Rogue Wave and SourcePro are registered trademarks of Quovadx, Inc. in the United States and other countries. All other trademarks are the property of their respective owners.