YuPcre2 is an up-to-date regular expression library for Delphi with Perl syntax. It directly supports UnicodeString, AnsiString, and UCS4String, as well as UTF-8 and UTF-16.
This document describes the differences and similarities between the new YuPcre2 and the old DIRegEx to help convert existing projects. If you have never used DIRegEx or are starting a new project with YuPcre2, you can skip this document.
YuPcre2 is a new project, not just a drastic update to DIRegEx. A lot has changed, even though some units, classes, and functions carry familiar names. Unfortunately, it was not possible to keep identical identifiers because Delphi rejects them if both YuPcre2 and DIRegEx are installed into the IDE. Overall, DIRegEx names have changed to DIRegEx2 where possible, which should simplify the transition to YuPcre2.
Unit names had to be changed to allow YuPcre2 to be installed into the IDE in parallel with DIRegEx. Unit names start with the YuPcre2 prefix. The native PCRE2 API is in YuPcre2.pas. DIRegEx units with class wrappers and helper routines have been renamed to YuPcre2_RegEx2…:
DIRegEx | YuPcre2 |
---|---|
DIRegEx_Api.pas | YuPcre2.pas |
n/a | YuPcre2OptInfo.pas |
DIRegEx_Reg.pas | YuPcre2Reg.pas |
DIRegEx.pas | YuPcre2_RegEx2.pas |
DIRegEx_Consts.pas | YuPcre2_RegEx2_Consts.pas |
DIRegEx_MaskControls.pas | YuPcre2_RegEx2_MaskControls.pas |
DIRegEx_SearchStream.pas | YuPcre2_RegEx2_SearchStream.pas |
DIRegEx_Utils.pas | YuPcre2_RegEx2_Utils.pas |
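The renames in the table can be applied mechanically to a uses clause. The following is a minimal sketch, not part of YuPcre2: `UnitMap`, `MapUnitName`, and `RenameUnits` are hypothetical names introduced here for illustration. It replaces whole identifiers only, so class names such as TDIRegEx are left untouched. It does not skip comments or string literals, so it is best applied to the text of a uses clause rather than to a whole source file.

```pascal
program RenameDIRegExUnits;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Old and new unit names, copied from the table above (without ".pas").
  UnitMap: array[0..6, 0..1] of string = (
    ('DIRegEx_Api', 'YuPcre2'),
    ('DIRegEx_Reg', 'YuPcre2Reg'),
    ('DIRegEx', 'YuPcre2_RegEx2'),
    ('DIRegEx_Consts', 'YuPcre2_RegEx2_Consts'),
    ('DIRegEx_MaskControls', 'YuPcre2_RegEx2_MaskControls'),
    ('DIRegEx_SearchStream', 'YuPcre2_RegEx2_SearchStream'),
    ('DIRegEx_Utils', 'YuPcre2_RegEx2_Utils'));

function IsIdentChar(C: Char): Boolean;
begin
  Result := CharInSet(C, ['A'..'Z', 'a'..'z', '0'..'9', '_']);
end;

// Hypothetical helper: returns the YuPcre2 unit name for a DIRegEx unit
// name, or the input unchanged if the name is not in the table.
function MapUnitName(const Name: string): string;
var
  I: Integer;
begin
  for I := Low(UnitMap) to High(UnitMap) do
    if SameText(Name, UnitMap[I, 0]) then
      Exit(UnitMap[I, 1]);
  Result := Name;
end;

// Hypothetical helper: copies Source and maps every complete identifier
// through MapUnitName. All other characters are copied unchanged.
function RenameUnits(const Source: string): string;
var
  I, Start: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(Source) do
  begin
    if IsIdentChar(Source[I]) then
    begin
      Start := I;
      while (I <= Length(Source)) and IsIdentChar(Source[I]) do
        Inc(I);
      Result := Result + MapUnitName(Copy(Source, Start, I - Start));
    end
    else
    begin
      Result := Result + Source[I];
      Inc(I);
    end;
  end;
end;

begin
  Writeln(RenameUnits('uses SysUtils, DIRegEx, DIRegEx_Api, DIRegEx_Utils;'));
end.
```

For the sample input, the program prints `uses SysUtils, YuPcre2_RegEx2, YuPcre2, YuPcre2_RegEx2_Utils;`. YuPcre2OptInfo.pas has no DIRegEx counterpart and therefore does not appear in the map.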
Class names now contain “RegEx2”: the number 2 is appended to “RegEx”. Most members, helper routines, and identifier names are unchanged. Deprecation warnings are issued where appropriate.
TDIRegEx2Base.CompileOptions is empty by default. In DIRegEx, coCaseLess and coDotAll were set by default. YuPcre2 excludes them for compatibility with PCRE2. If matching relies on these options, set them like this:

{ Set YuPcre2 CompileOptions to DIRegEx default: }
RegEx.CompileOptions := [coCaseLess, coDotAll];
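To see what the changed default means for matching, the following sketch uses Delphi's own System.RegularExpressions unit as a stand-in, not YuPcre2. The assumption is that caseless and dot-all matching behave the same way in both, since both follow Perl-compatible semantics. The subject and pattern are made up for illustration.

```pascal
program DefaultOptionsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // Upper-case text, a line feed, then lower-case text.
  Subject = 'FOO'#10'bar';
  Pattern = 'foo.bar';

begin
  // No caseless or dot-all option, comparable to an empty CompileOptions
  // set: no match, because matching is case-sensitive and '.' does not
  // match the line feed.
  Writeln(TRegEx.IsMatch(Subject, Pattern));

  // Caseless plus dot-all, comparable to [coCaseLess, coDotAll].
  Writeln(TRegEx.IsMatch(Subject, Pattern, [roIgnoreCase, roSingleLine]));

  // The same two options set inside the pattern itself.
  Writeln(TRegEx.IsMatch(Subject, '(?is)' + Pattern));
end.
```

The program prints FALSE, TRUE, TRUE. Patterns written for DIRegEx that silently relied on the old defaults fall into the first case unless the options are set explicitly or written into the pattern.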
TDIRegEx2Base.BSR and TDIRegEx2Base.NewLine options are new properties of their own. In DIRegEx they were part of the CompileOptions and MatchOptions. As a consequence, BSR and NewLine options can no longer be passed to CompileMatchPatternStrOpt but must be set beforehand.
- pcre_exec has become pcre2_match. The PCRE_JAVASCRIPT_COMPAT option has been split into independent functional options PCRE2_ALT_BSUX, PCRE2_ALLOW_EMPTY_CLASS, and PCRE2_MATCH_UNSET_BACKREF.
- PCRE2_ZERO_TERMINATED for zero-terminated strings.
- PCRE2_SIZE elements instead of Integers. The special value PCRE2_UNSET is used for unset elements.
- pcre2_jit_compile after a successful return from pcre2_compile.
- The capture_last field of the pcre2_callout_block is now an unsigned integer, set to zero if there have been no captures.
- pcre2_substitute that performs “find and replace” operations.
- PCRE2_NO_DOTSTAR_ANCHOR, PCRE2_NEVER_BACKSLASH_C, and PCRE2_ALT_CIRCUMFLEX options.
- (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is matched by that pattern.
- (?(VERSION>=10)yes|no) against a string such as “yesno”.
- When ^(\x{23a})\1*(.) is matched caselessly (and in UTF-8 mode) against \x{23a}\x{2c65}\x{2c65}\x{2c65}, group 2 should capture the final character, which is the three bytes E2, B1, and A5 in UTF-8. Incorrect backtracking meant that group 2 captured only the last two bytes. This bug has been fixed; the new code is slower, but it is used only when the strings matched by the repetition are not all the same length.
- The pattern ()a was not setting the “first character must be 'a'” information. This applied to any pattern with a group that matched no characters, for example: (?:(?=.)|(?<!x))a.
- When (*ACCEPT) is triggered inside capturing parentheses, it arranges for those parentheses to be closed with whatever has been captured so far. However, it was failing to mark any other groups between the highest capture so far and the current group as “unset”. Thus, the ovector for those groups contained whatever was previously there. An example is the pattern (x)|((*ACCEPT)) when matched against “abcd”.
- (*NO_JIT) pattern feature.
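The switch from Integer offsets to PCRE2_SIZE elements affects code that tests for unset groups. The sketch below illustrates the idea with local stand-ins only: `TSize` and `SizeUnset` mirror the C definitions of PCRE2_SIZE (an unsigned, pointer-sized integer) and PCRE2_UNSET (all bits set). They are assumptions for illustration, not the declarations from YuPcre2.pas, and the offsets are written by hand rather than produced by a real match.

```pascal
program UnsetOffsetsDemo;

{$APPTYPE CONSOLE}

type
  // Local stand-in for PCRE2_SIZE: unsigned and pointer-sized.
  TSize = NativeUInt;

const
  // Local stand-in for PCRE2_UNSET: the largest value of the type.
  SizeUnset = High(TSize);

// Hypothetical helper: an unsigned offset is never negative, so an old
// "Offset < 0" test is always False. Compare with the unset marker instead.
function IsUnset(Offset: TSize): Boolean;
begin
  Result := Offset = SizeUnset;
end;

var
  // Hand-written offset pairs for the pattern (x)|(y) matched against "y":
  // the whole match and group 2 span 0..1, group 1 did not take part.
  OVector: array[0..5] of TSize = (0, 1, SizeUnset, SizeUnset, 0, 1);
  I: Integer;

begin
  for I := 0 to 2 do
    if IsUnset(OVector[2 * I]) then
      Writeln('Group ', I, ': unset')
    else
      Writeln('Group ', I, ': ', OVector[2 * I], '..', OVector[2 * I + 1]);
end.
```

The program reports groups 0 and 2 as 0..1 and group 1 as unset. Code ported from DIRegEx that checks offsets for negative values would need a comparable change when it works on the native offset vector.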