RegExp Studio
TRegExpr v.0.952 - Delphi Regular Expressions

TRegExpr interface
Public methods and properties of TRegExpr class:

class function VersionMajor : integer;
class function VersionMinor : integer;
Return the major and minor version numbers. For example, for v. 0.944, VersionMajor = 0 and VersionMinor = 944.

property Expression : string
Regular expression.
For optimization, TRegExpr automatically compiles it into 'P-code' (you can inspect it with the Dump method) and stores it in internal structures. Actual [re]compilation occurs only when it is really needed: on a call to Exec[Next], Substitute, Dump, etc., and only if Expression or another property that affects the P-code has changed since the last [re]compilation.
If any error occurs during [re]compilation, the Error method is called (by default Error raises an exception - see below).

property ModifierStr : string
Sets/gets the default values of the r.e. modifiers. The format of the string is similar to (?ismx-ismx). For example, ModifierStr := 'i-x' will switch on modifier /i, switch off /x and leave the others unchanged.
If you try to set an unsupported modifier, Error will be called (by default Error raises the exception ERegExpr).

property ModifierI : boolean
Modifier /i - ("case-insensitive"), initialized with the RegExprModifierI value.

property ModifierR : boolean
Modifier /r - ("Russian syntax extensions"), initialized with the RegExprModifierR value.

property ModifierS : boolean
Modifier /s - '.' matches any char (otherwise it doesn't match LineSeparators and LinePairedSeparator), initialized with the RegExprModifierS value.

property ModifierG : boolean;
Modifier /g - switching off modifier /g switches all operators to non-greedy style, so if ModifierG = False, then every '*' works as '*?', every '+' as '+?' and so on. Initialized with the RegExprModifierG value.

property ModifierM : boolean;
Modifier /m - treat the string as multiple lines. That is, change '^' and '$' from matching only at the very start or end of the string to matching at the start or end of any line anywhere within the string. Initialized with the RegExprModifierM value.

property ModifierX : boolean;
Modifier /x - ("eXtended syntax"), initialized with the RegExprModifierX value.
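
The effect of modifier /g can be illustrated with a small console program. This is only a sketch: it assumes the RegExpr unit (regexpr.pas) is on the unit search path, and the helper name FirstRun is hypothetical, not part of TRegExpr.

```pascal
program ModifierDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  RegExpr;

// Hypothetical helper (not part of TRegExpr): returns the first match
// of 'a+' in AInput with modifier /g switched on or off.
function FirstRun(const AInput: string; AGreedy: Boolean): string;
var
  RE: TRegExpr;
begin
  Result := '';
  RE := TRegExpr.Create;
  try
    RE.Expression := 'a+';
    RE.ModifierG := AGreedy;   // False makes '+' behave like '+?'
    if RE.Exec(AInput) then
      Result := RE.Match[0];
  finally
    RE.Free;
  end;
end;

begin
  WriteLn(FirstRun('aaa', True));   // greedy: prints aaa
  WriteLn(FirstRun('aaa', False));  // non-greedy: prints a
end.
```

The same switch could also be made with ModifierStr := '-g', which leaves the other modifiers unchanged.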

function Exec (const AInputString : string) : boolean;
Matches the compiled program against the string AInputString.
!!! Exec stores AInputString in the InputString property.
For Delphi 5 and higher, overloaded versions are available:
function Exec : boolean;
without a parameter (uses the value already assigned to the InputString property)
function Exec (AOffset: integer) : boolean;
is the same as ExecPos

function ExecNext : boolean;
Finds the next match:
ExecNext;
works the same as
if MatchLen [0] = 0 then ExecPos (MatchPos [0] + 1)
else ExecPos (MatchPos [0] + MatchLen [0]);
but is simpler!
Raises an exception if used without a preceding successful call to
Exec* (Exec, ExecPos, ExecNext). So you must always use something like
if Exec (InputString) then repeat { process results } until not ExecNext;
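
The Exec / ExecNext pattern above might look like this in a complete program. It is a minimal sketch that assumes the RegExpr unit is available; ListNumbers is a hypothetical helper written for this illustration.

```pascal
program ExecNextDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  RegExpr;

// Hypothetical helper (not part of TRegExpr): collects every run of
// digits found in AInput into a comma-separated string.
function ListNumbers(const AInput: string): string;
var
  RE: TRegExpr;
begin
  Result := '';
  RE := TRegExpr.Create;
  try
    RE.Expression := '\d+';
    // ExecNext is only called after a successful Exec, as required.
    if RE.Exec(AInput) then
      repeat
        if Result <> '' then
          Result := Result + ',';
        Result := Result + RE.Match[0];
      until not RE.ExecNext;
  finally
    RE.Free;
  end;
end;

begin
  WriteLn(ListNumbers('a1 b22 c333'));  // prints 1,22,333
end.
```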

function ExecPos (AOffset: integer = 1) : boolean;
Finds a match in InputString, starting from position AOffset
(AOffset=1 - first char of InputString).

property InputString : string;
Returns the current input string (from the last Exec call or the last assignment to this property).
Any assignment to this property clears the Match* properties!

function Substitute (const ATemplate : string) : string;
Returns ATemplate with '$&' or '$0' replaced by the whole r.e. occurrence and '$n' replaced by the occurrence of subexpression #n.
Since v.0.929 '$' is used instead of '\' (for future extensions and for better Perl compatibility) and more than one digit is accepted.
If you want to place a raw '$' or '\' into the template, prefix it with '\'.
Example: '1\$ is $2\\rub\\' -> '1$ is <Match[2]>\rub\'
If you want to place a raw digit after '$n', you must delimit n with curly braces '{}'.
Example: 'a$12bc' -> 'a<Match[12]>bc', 'a${1}2bc' -> 'a<Match[1]>2bc'.
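
As an illustration of the template rules above, the following sketch fills a template from the last match. It assumes the RegExpr unit is available; SwapParts is a hypothetical helper, and the input string is made up for the example.

```pascal
program SubstituteDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  RegExpr;

// Hypothetical helper (not part of TRegExpr): matches 'name@host' and
// builds a new string from the two subexpressions.
function SwapParts(const AInput: string): string;
var
  RE: TRegExpr;
begin
  Result := '';
  RE := TRegExpr.Create;
  try
    RE.Expression := '(\w+)@(\w+)';
    if RE.Exec(AInput) then
      // '${1}1' means subexpression #1 followed by the raw digit 1;
      // without the braces '$11' would refer to subexpression #11.
      Result := RE.Substitute('$2/${1}1');
  finally
    RE.Free;
  end;
end;

begin
  WriteLn(SwapParts('anso@mail'));  // prints mail/anso1
end.
```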

procedure Split (AInputStr : string; APieces : TStrings);
Splits AInputStr into APieces at each r.e. occurrence.
Internally calls Exec[Next].
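
A short sketch of Split, assuming the RegExpr unit is available and using TStringList from the standard Classes unit as the TStrings implementation. SplitOnCommas is a hypothetical helper name.

```pascal
program SplitDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  Classes, RegExpr;

// Hypothetical helper (not part of TRegExpr): splits AInput on commas
// with optional surrounding whitespace and appends the pieces to APieces.
procedure SplitOnCommas(const AInput: string; APieces: TStrings);
var
  RE: TRegExpr;
begin
  RE := TRegExpr.Create;
  try
    RE.Expression := '\s*,\s*';
    RE.Split(AInput, APieces);
  finally
    RE.Free;
  end;
end;

var
  Pieces: TStringList;
  I: Integer;
begin
  Pieces := TStringList.Create;
  try
    SplitOnCommas('a, b ,c', Pieces);
    for I := 0 to Pieces.Count - 1 do
      WriteLn(Pieces[I]);   // prints a, b and c on separate lines
  finally
    Pieces.Free;
  end;
end.
```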

function Replace (AInputStr : RegExprString;
const AReplaceStr : RegExprString;
AUseSubstitution : boolean = False) : RegExprString;
function Replace (AInputStr : RegExprString;
AReplaceFunc : TRegExprReplaceFunction) : RegExprString;
function ReplaceEx (AInputStr : RegExprString;
AReplaceFunc : TRegExprReplaceFunction) : RegExprString;
Returns AInputStr with r.e. occurrences replaced by AReplaceStr.
If AUseSubstitution is True, then AReplaceStr is used
as a template for the Substitute method.
For example:
Expression := '((?i)block|var)\s*\(\s*([^ ]*)\s*\)\s*';
Replace ('BLOCK( test1)', 'def "$1" value "$2"', True);
will return: def "BLOCK" value "test1"
Replace ('BLOCK( test1)', 'def "$1" value "$2"', False)
will return: def "$1" value "$2"
Internally calls Exec[Next].
The overloaded version and ReplaceEx take a callback function,
so you can implement really complex functionality.
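
The callback form could be used roughly as follows. This sketch assumes that TRegExprReplaceFunction is a method pointer ("function ... of object") taking the TRegExpr instance and returning RegExprString, as in the regexpr.pas sources; check the declaration in your copy of the unit. TBracketer and BracketNumbers are hypothetical names.

```pascal
program ReplaceExDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  RegExpr;

type
  // Hypothetical class that only exists to host the callback method.
  TBracketer = class
    function Wrap(ARegExpr: TRegExpr): RegExprString;
  end;

// Called once per match; the result replaces the matched text.
function TBracketer.Wrap(ARegExpr: TRegExpr): RegExprString;
begin
  Result := '<' + ARegExpr.Match[0] + '>';
end;

// Hypothetical helper: wraps every run of digits in angle brackets.
function BracketNumbers(const AInput: string): string;
var
  RE: TRegExpr;
  B: TBracketer;
begin
  RE := TRegExpr.Create;
  B := TBracketer.Create;
  try
    RE.Expression := '\d+';
    Result := RE.ReplaceEx(AInput, B.Wrap);
  finally
    B.Free;
    RE.Free;
  end;
end;

begin
  WriteLn(BracketNumbers('a1b22'));  // prints a<1>b<22>
end.
```

ReplaceEx is used here rather than the overloaded Replace so that the call does not depend on overload resolution.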

property SubExprMatchCount : integer; // ReadOnly
Number of subexpressions found in the last Exec* call.
If there are no subexpressions but the whole expression was found (Exec* returned True), then SubExprMatchCount=0; if neither subexpressions nor the whole r.e. were found (Exec* returned False), then SubExprMatchCount=-1.
Note that some subexpressions may not be found; for such a subexpression MatchPos=MatchLen=-1 and Match=''.
For example: Expression := '(1)?2(3)?';
Exec ('123'): SubExprMatchCount=2, Match[0]='123', [1]='1', [2]='3'
Exec ('12'): SubExprMatchCount=1, Match[0]='12', [1]='1'
Exec ('23'): SubExprMatchCount=2, Match[0]='23', [1]='', [2]='3'
Exec ('2'): SubExprMatchCount=0, Match[0]='2'
Exec ('7') - return False: SubExprMatchCount=-1

property MatchPos [Idx : integer] : integer; // ReadOnly
Position of the match of subexpression #Idx in the string tested by the last Exec* call. The first subexpression has Idx=1, the last has Idx=SubExprMatchCount, and the whole r.e. has Idx=0.
Returns -1 if the r.e. has no such subexpression or this subexpression was not found in the input string.

property MatchLen [Idx : integer] : integer; // ReadOnly
Length of the match of subexpression #Idx in the string tested by the last Exec* call. The first subexpression has Idx=1, the last has Idx=SubExprMatchCount, and the whole r.e. has Idx=0.
Returns -1 if the r.e. has no such subexpression or this subexpression was not found in the input string.

property Match [Idx : integer] : string; // ReadOnly
== copy (InputString, MatchPos [Idx], MatchLen [Idx])
Returns '' if the r.e. has no such subexpression or this subexpression was not found in the input string.

function LastError : integer;
Returns the ID of the last error, or 0 if there were no errors (unusable if the Error method raises an exception), and resets the internal status to 0 (no errors).

function ErrorMsg (AErrorID : integer) : string; virtual;
Returns Error message for error with ID = AErrorID.

property CompilerErrorPos : integer; // ReadOnly
Returns the position in the r.e. where the compiler stopped.
Useful for error diagnostics.

property SpaceChars : RegExprString
Contains the chars treated as \s (initially filled with the RegExprSpaceChars global constant).

property WordChars : RegExprString;
Contains the chars treated as \w (initially filled with the RegExprWordChars global constant).

property LineSeparators : RegExprString
Line separators (like \n in Unix), initially filled with the RegExprLineSeparators global constant.
See also the notes on line separators below.

property LinePairedSeparator : RegExprString
Paired line separator (like \r\n in DOS and Windows).
Must contain exactly two chars or no chars at all. Initially filled with the RegExprLinePairedSeparator global constant.
See also the notes on line separators below.

For example, if you need Unix-style behaviour, assign LineSeparators := #$a (newline character) and LinePairedSeparator := '' (empty string). If you want to accept only \x0D\x0A as a line separator, but not \x0D or \x0A alone, assign LineSeparators := '' (empty string) and LinePairedSeparator := #$d#$a.

By default a 'mixed' mode is used (defined in the RegExprLine[Paired]Separator[s] global constants): LineSeparators := #$d#$a; LinePairedSeparator := #$d#$a. The behaviour of this mode is described in detail in the syntax section.

class function InvertCaseFunction (const Ch : REChar) : REChar;
Converts Ch to upper case if it is in lower case, or to lower case if it is in upper case (uses the current system locale settings).

property InvertCase : TRegExprInvertCaseFunction;
Set this property if you want to override the case-insensitive functionality.
Create sets it to RegExprInvertCaseFunction (InvertCaseFunction by default).

procedure Compile;
[Re]compiles the r.e. Useful, for example, for GUI r.e. editors (to check the validity of all properties).

function Dump : string;
Dumps the compiled regexp in a vaguely comprehensible form.


Global constants

EscChar = '\'; // 'Escape' char ('\' in common r.e.) used for escaping metachars (\w, \d etc).
// It may be useful to redefine it if you are using C++ Builder, to avoid ugly constructions
// like '\\w+\\\\\\w+\\.\\w+' - just define EscChar='/' and use '/w+\/w+/./w+'

Modifiers default values:
RegExprModifierI : boolean = False;   // TRegExpr.ModifierI
RegExprModifierR : boolean = True;   // TRegExpr.ModifierR
RegExprModifierS : boolean = True;   // TRegExpr.ModifierS
RegExprModifierG : boolean = True;   // TRegExpr.ModifierG
RegExprModifierM : boolean = False;   // TRegExpr.ModifierM
RegExprModifierX : boolean = False;   // TRegExpr.ModifierX

RegExprSpaceChars : RegExprString = ' '#$9#$A#$D#$C;
// default for SpaceChars property

RegExprWordChars : RegExprString =
'0123456789'
+ 'abcdefghijklmnopqrstuvwxyz'
+ 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_';
// default value for WordChars property

RegExprLineSeparators : RegExprString =
#$d#$a{$IFDEF UniCode}#$b#$c#$2028#$2029#$85{$ENDIF};
// default value for LineSeparators property
RegExprLinePairedSeparator : RegExprString =
#$d#$a;
// default value for LinePairedSeparator property

RegExprInvertCaseFunction : TRegExprInvertCaseFunction = TRegExpr.InvertCaseFunction;
// default for InvertCase property


Useful global functions


function ExecRegExpr (const ARegExpr, AInputStr : string) : boolean;
True if the string AInputStr matches the regular expression ARegExpr.
! Raises an exception if there are syntax errors in ARegExpr.

procedure SplitRegExpr (const ARegExpr, AInputStr : string; APieces : TStrings);
Splits AInputStr into APieces at each occurrence of the r.e. ARegExpr.

function ReplaceRegExpr (const ARegExpr, AInputStr, AReplaceStr : string;
AUseSubstitution : boolean = False) : string;
Returns AInputStr with r.e. occurrences replaced by AReplaceStr.
If AUseSubstitution is True, then AReplaceStr is used as a template for the Substitute method.
For example:
ReplaceRegExpr ('((?i)block|var)\s*\(\s*([^ ]*)\s*\)\s*',
'BLOCK( test1)', 'def "$1" value "$2"', True)
will return: def "BLOCK" value "test1"
ReplaceRegExpr ('((?i)block|var)\s*\(\s*([^ ]*)\s*\)\s*',
'BLOCK( test1)', 'def "$1" value "$2"')
will return: def "$1" value "$2"

function QuoteRegExprMetaChars (const AStr : string) : string;
Replaces all metachars with their safe representation; for example, 'abc$cd.(' is converted into 'abc\$cd\.\('
This function is useful for generating an r.e. from user input.
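
A small sketch of why quoting matters when the pattern comes from user input. It assumes the RegExpr unit is available; ContainsLiteral is a hypothetical helper built on the two global functions described above.

```pascal
program QuoteDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  RegExpr;

// Hypothetical helper (not part of the unit): True if AText contains
// ANeedle literally, with all metachars in ANeedle made harmless.
function ContainsLiteral(const ANeedle, AText: string): Boolean;
begin
  Result := ExecRegExpr(QuoteRegExprMetaChars(ANeedle), AText);
end;

begin
  // Unquoted, '+' is an operator: the r.e. '1+1=2' needs at least
  // two consecutive '1' chars before '=2', so it does not match here.
  WriteLn(ExecRegExpr('1+1=2', 'so 1+1=2'));      // prints FALSE
  // Quoted, the '+' is matched as an ordinary character.
  WriteLn(ContainsLiteral('1+1=2', 'so 1+1=2'));  // prints TRUE
end.
```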

function RegExprSubExpressions (const ARegExpr : string;
ASubExprs : TStrings; AExtendedSyntax : boolean = False) : integer;
Makes a list of the subexpressions found in the r.e. ARegExpr.
In ASubExprs every item represents a subexpression, from first to last, in the format:
String - subexpression text (without '()')
low word of Object - starting position in ARegExpr, including '(' if it exists! (first position is 1)
high word of Object - length, including the starting '(' and ending ')' if they exist!
AExtendedSyntax - must be True if modifier /x will be on while using the r.e.
Useful for GUI editors of r.e. etc. (you can find a usage example in the TestRExp.dpr project).

Result code   Meaning
0             Success. No unbalanced brackets were found.
-1            There are not enough closing brackets ')'.
-(n+1)        At position n an opening '[' was found without a corresponding closing ']'.
n             At position n a closing bracket ')' was found without a corresponding opening '('.

If Result <> 0, then ASubExprs can contain empty or illegal items.


Exception type


The default error handler of TRegExpr raises an exception:

ERegExpr = class (Exception)
public
ErrorCode : integer; // error code. Compilation error codes are below 1000.
CompilerErrorPos : integer; // position in the r.e. where the compilation error occurred
end;
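
Catching the default exception might look like the sketch below. It assumes the RegExpr unit is available and that the default Error handler has not been overridden; CheckRegExpr is a hypothetical helper. The actual error code and message depend on the TRegExpr version, so none are shown.

```pascal
program ErrorDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}
uses
  SysUtils, RegExpr;

// Hypothetical helper (not part of TRegExpr): compiles AExpr and
// reports whether it is a valid r.e., printing details on failure.
function CheckRegExpr(const AExpr: string): Boolean;
var
  RE: TRegExpr;
begin
  Result := True;
  RE := TRegExpr.Create;
  try
    try
      RE.Expression := AExpr;
      RE.Compile;   // force compilation now instead of at the first Exec
    except
      on E: ERegExpr do
      begin
        Result := False;
        WriteLn('Error ', E.ErrorCode, ' at position ',
          E.CompilerErrorPos, ': ', E.Message);
      end;
    end;
  finally
    RE.Free;
  end;
end;

begin
  WriteLn(CheckRegExpr('abc'));    // valid r.e.
  WriteLn(CheckRegExpr('(abc'));   // unbalanced bracket, reported as invalid
end.
```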


How to use Unicode


TRegExpr now supports Unicode, but it works very slowly :(
Who wants to optimize it? ;)
Use it only if you really need Unicode support!
Remove the '.' in {.$DEFINE UniCode} in regexpr.pas. After that, all strings will be treated as WideString.



© 2004 Andrey V. Sorokin, Saint Petersburg, Russia
anso@mail.ru
RegExpStudio.com
