YuPcre2 is an up-to-date regular expression library for Delphi with Perl syntax. It directly supports UnicodeString, AnsiString, and UCS4String, as well as UTF-8 and UTF-16.
Update to PCRE2 v10.44.
(((?<=123?456456|ABC)))(?<=\2)
.pcre2_set_max_pattern_compiled_length
to limit the size of compiled patterns, and TDIRegEx2Base.MaxPatternCompiledLength
.\X
: A break should occur between two characters with the Extended Pictographic break property unless a zero-width joiner intervenes. PCRE2 was not insisting on the ZWJ, causing \X
to match more than it should.
Update to PCRE2 v10.43 final.
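One change in this release concerns quantifiers written without a lower bound, such as {,3}, which is now read as {0,3}. Python's `re` module happens to read {,3} the same way, so the sketch below uses it as a stand-in to show the semantics. It is not PCRE2 or YuPcre2, the helper name `accepted` is made up for illustration, and Python's handling of {,} differs from PCRE2's.

```python
import re

# Python's re module is used here only as a stand-in for PCRE2:
# it reads {,3} as {0,3}, which is the behaviour the changelog describes.
no_lower_bound = re.compile(r"a{,3}")
explicit_zero = re.compile(r"a{0,3}")

def accepted(regex, text):
    """Return True when the whole of text matches regex."""
    return regex.fullmatch(text) is not None

# Both spellings accept zero to three repetitions and reject a fourth.
for subject in ["", "a", "aaa", "aaaa"]:
    print(repr(subject), accepted(no_lower_bound, subject), accepted(explicit_zero, subject))
```

Both columns should agree for every subject, since the two spellings describe the same repetition range.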
{,3}
which was not previously treated as a quantifier. Now it is interpreted as {0,3}
and PCRE2 has changed to match. Note that {,}
is still not a quantifier.
{
or before }
in all items that use braces, and also before or after the comma in quantifiers. PCRE2 now does the same, except for \u{…}
, which is recognized only when PCRE2_EXTRA_ALT_BSUX
is set. This is an ECMAScript extension that is not Perl-compatible, so PCRE2 follows ECMAScript rather than Perl.
pcre2_match
was not fully resetting all captures that had been set within a (possibly recursive) subroutine call such as (?3)
.\w
(and its synonyms) in UCP mode to match Perl. It now matches characters whose general categories are L or N or whose particular categories are Mn (non-spacing mark) or Pc (connector punctuation). The latter includes underscore.
[:xdigit:]
in UCP mode to match Perl. It now also matches the “fullwidth” versions of the hex digits. As is done for [:digit:]
, PCRE2_EXTRA_ASCII_DIGIT
can be used to keep this class ASCII only without affecting other POSIX classes.pcre2_dfa_match
.\b
and \B
in UCP mode to match the changes to \w
because \b
and \B
are defined in terms of \w
.(?aT)
and (?-aT)
set and reset the PCRE2_EXTRA_ASCII_DIGIT
option, and (?aP)
also sets (?aT)
so that (?-aP)
disables all ASCII restrictions on POSIX classes.PCRE2_FIRSTLINE
was set on an anchored pattern, pcre2_match
and pcre2_dfa_match
misbehaved. PCRE2_FIRSTLINE
is now ignored for anchored patterns.\z
was misbehaving when matching fragments inside invalid UTF strings.\X
matching in 32 bit mode without UTF in JIT.PCRE2_MATCH_UNSET_BACKREF
is set in JIT.(?0)
in pcre2_match
so that its end is handled similarly to other recursions. This has altered the behaviour of |(?0).
with PCRE2_ENDANCHORED
which was previously not right.[\x{ffffffff}]
when PCRE2_CASELESS
and PCRE2_UCP
(but not PCRE2_UTF
) were set.a?(?=bc|)d
used to set all of a, b, and d as possible starting code units; now it sets only a and d.pcre2_jit_match
.PCRE2_DISABLE_RECURSELOOP_CHECK
for pcre2_match
to enable some apparently looping recursions to run to completion and therefore match the JIT behaviour. With this set, real loops will eventually be caught by match or heap limits, or will run out of resources.
pcre2_get_match_data_heapframes_size
allow for finer control of the heap used when pcre2_match
without JIT is used and the match_data might be reused.\d
:PCRE2_EXTRA_CASELESS_RESTRICT
to lock out mixing of ASCII and non-ASCII when matching caselessly.PCRE2_EXTRA_ASCII_{BSD,BSS,BSW,POSIX}
and corresponding (?aD) etc in patterns.PCRE2_EXTRA_ASCII_DIGIT
to allow [:digit:]
to be kept in sync with \d
even in UCP mode.pcre2_compile
to treat a nil
pattern with zero length as an empty string.pcre2_match
in the code for handling the vector of backtracking frames on the heap, which caused a heap overflow if *LIMIT_HEAP restricted an attempt to extend to less than the frame size.pcre2_match
always uses the heap for backtracking. The heap vector is remembered in the match data block and re-used if that block itself is re-used. It is freed with the match data block.pcre2_match_data_create
caused a crash because the field in the match data block is only 16 bits. A maximum of 65535 is now silently applied.\p
and \P
:\p{script:xxx}
and \p{script_extensions:xxx}
(synonyms sc and scx).\p{scriptname}
from being the same as \p{sc:scriptname}
to being the same as \p{scx:scriptname}
because this change happened in Perl at release 5.26.pcre2_match
, pcre2_dfa_match
, and pcre2_substitute
, and the replacement argument of the latter, if the pointer is nil
and the length is zero, treat as an empty string.nil
replacement to pcre2_substitute
.[Aa]
is optimized into a caseless single character match. When this was quantified (e.g. [Aa]{2}
) and was also the last literal item in a pattern, the optimizing “must be present for a match” character check was not being flagged as caseless, causing some matches that should have succeeded to fail.\K
in lookarounds, so PCRE2 now does the same by default. However, just in case anybody was relying on the old behaviour, there is an option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
that enables the old behaviour.(?|)
was present in the pattern, because (?|)
disables caching of group lengths.\214748364
gave an overflow error instead of being treated as the octal number \214
followed by literal digits.{65536
that has no terminating }
so is not a quantifier was nevertheless complaining that a quantifier number was too big.a\K.(?0)*
when matched against “abac” by the interpreter gave the answer “bac”, whereas Perl and JIT both yield “c”. This was because the effect of \K
was not propagating back from the full pattern recursion. Other recursions such as (a\K.(?1)*)
did not have this problem.TDIRegEx2_16
, TDIRegEx2_8
, and descendants.(?(VERSION=n.d
where n is any number but d is just a single digit, the code unit beyond d was being read (i.e. there was a read buffer overflow).PCRE2_MATCH_INVALID_UTF
in 8-bit mode when PCRE2_CASELESS
was set and PCRE2_NO_START_OPTIMIZE
was not set. The optimization for finding the start of a match was not resetting correctly after a failed match on the first valid fragment of the subject, possibly causing incorrect “no match” returns on subsequent fragments.
Overview:
pcre2_substitute
:PCRE2_SUBSTITUTE_LITERAL
: The replacement string is literal.PCRE2_SUBSTITUTE_MATCHED
: Use pre-existing match data for 1st match.PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
: Return only replacement string(s).PCRE2_UCP
is set without PCRE2_UTF
, Unicode character properties are used for upper/lower case computations on characters whose code points are greater than 127.
Details:
(?*
and (?<*
as synonyms for (*napla:
and (*naplb:
to match another regex engine. The Perl regex folks are aware of this usage and have made a note about it.*THEN
verbs in lookahead assertions in JIT.(?(DEFINE)…)
groups were not being handled correctly when checking for the fixed length of a lookbehind assertion. Such a group within a lookbehind should be skipped, as it does not contribute to the length of the group. Instead, the (DEFINE)
group was being processed, and if it was at the end of the lookbehind, that end was not correctly recognized. Errors such as “lookbehind assertion is not fixed length” and also “internal error: bad code value in parsed_skip()” could result.
PCRE2_MATCH_INVALID_UTF
was set and a match started immediately following the invalid high surrogate, such as aa
matching \x{d800}aa
.DEFINE
group immediately preceded a lookbehind assertion, the pattern could be mis-compiled and therefore not match correctly. This is the example that found this: (?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word
which failed to match “word” because the “move back” value was set to zero.PCRE2_CONFIG_TABLES_LENGTH
is added to pcre2_config
so that an application that wants to save tables in binary knows how long they are.pcre2_match
interpreter, and integrate with the existing JIT support via the new PCRE2_MATCH_INVALID_UTF
compile-time option.(*ACCEPT)
to be quantified, because an ungreedy quantifier with a zero minimum is potentially useful.PCRE2_NO_START_OPTIMIZE
is set, no minimum length is computed.(*ACCEPT)
inside a quantified group whose minimum repetition was zero, for example A(?:(*ACCEPT))?B
, which incorrectly computed a minimum of 2. The minimum length scan no longer happens for a pattern that contains (*ACCEPT)
.(*MARK)
value inside a successful condition was not being returned by the interpretive matcher (it was returned by JIT). This bug has been mended.{1}
was always being ignored, but this is incorrect when it is made possessive and applied to an item in parentheses, because a parenthesized item may contain multiple branches or other backtracking points, for example (a|ab){1}+c
or (a+){1}+a
.pcre2_dfa_match
) was not recognising a partial match if the end of the subject was encountered in a lookahead (conditional or otherwise), an atomic group, or a recursion.
(?<=(?=(?<=a)))b
was matched to “ab” it gave no match instead of matching “b”.pcre2_get_match_data_size
.c*+(?<=[bc])
with subject “ab”.(?![ab]).*
with subject “ab”. This case applies only to PCRE2_PARTIAL_HARD
.\z
and \Z
as it is documented that they shouldn't match.(*ACCEPT)
was not being recognized as one that could match an empty string.pcre2_set_character_tables
tables data type: was const C_unsigned_char_num_ptr
instead of const C_uint8_t_ptr
, as generated by pcre2_maketables
.pcre2_maketables_free
function.pcre2_match
and to pcre2_dfa_match
.[Aa]
which contain just the two cases of the same character, to be treated as a single caseless character. This causes the first and required code unit optimizations to kick in where relevant.[Ww]ord
and (word|WORD)
. However, this optimization doesn't happen if there is a “required” code unit of the same value (because the search for a “required” code unit starts at the match start for non-unique first code unit patterns, but after a unique first code unit, and patterns such as a*a need the former action).\X
or \R
has a greater than 1 fixed quantifier.pcre2_substitute
.PCRE2_EXTRA_ESCAPED_CR_IS_LF
.(*pla:…)
and (*atomic:…)
. These are characterized by a lower case letter following (*
.(*script_run:…)
and (*atomic_script_run:…)
aka (*sr:…)
and (*asr:…)
.PCRE2_COPY_MATCHED_SUBJECT
for pcre2_match
(including JIT via pcre2_match
) and pcre2_dfa_match
, but *not* the pcre2_jit_match
fast path. Also, when a match fails, set the subject field in the match data to nil for tidiness - none of the substring extractors should reference this after match failure.(?&xxx)*ABC(?<xxx>XYZ)
would (incorrectly) expect 'A' to be the first character of a match.pcre2_dfa_match
could suffer from overflow if the heap limit was set very large. This could cause incorrect “heap limit exceeded” errors.(*MARK)
, (*COMMIT)
, (*PRUNE)
, (*SKIP), or
(*THEN) followed by
^ it was not recognized as anchored.
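The anchoring entry above can be pictured with a deliberately simplified sketch: skip any backtracking verbs at the start of a pattern and then look for ^. This is an illustration only. PCRE2 performs this analysis on its compiled form, and the sketch ignores multiline mode, option settings, and groups. The function name and the verb list are assumptions made for the example.

```python
import re

# Hypothetical, simplified check; not PCRE2's actual implementation.
# Matches one leading backtracking verb, with an optional :name argument.
_VERB = re.compile(r"\(\*(?:MARK|COMMIT|PRUNE|SKIP|THEN)(?::[^)]*)?\)")

def starts_with_caret_after_verbs(pattern):
    """Skip any leading verbs, then report whether the next character is ^."""
    pos = 0
    while True:
        m = _VERB.match(pattern, pos)
        if m is None:
            break
        pos = m.end()
    # str.startswith accepts a start index as its second argument.
    return pattern.startswith("^", pos)

print(starts_with_caret_after_verbs("(*COMMIT)^abc"))
print(starts_with_caret_after_verbs("(*COMMIT)abc"))
```

The point of the sketch is that a verb consumes no subject text, so a ^ that follows it is still the first thing the pattern has to satisfy.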
PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
set, escape sequences such as \s
which are valid in character classes, but not as the end of ranges, were being treated as literals. An example is [_-\s]
(but not [\s-_]
because that gave an error at the start of a range). Now an “invalid range” error is given independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
.PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
was affecting known escape sequences such as \eX
when they appeared invalidly in a character class. Now the option applies only to unrecognized or malformed escape sequences.pcre2_dfa_match
function was incorrectly handling conditional version tests such as (?(VERSION>=0)…)
when the version test was true. Incorrect processing or a crash could result.PCRE2_UTF
is set, allow non-ASCII letters and decimal digits in group names, as Perl does.PCRE2_EXTRA_ALT_BSUX
to support ECMAScript 6's \u{hhh}
construct.\p{Any}
to be the same as .
in PCRE2_DOTALL
mode, so that it benefits from auto-anchoring if \p{Any}*
starts a pattern.DI.inc
include file. Directly link in DICompilers.inc
instead.TDIRegEx2_8.Replace
and TDIRegEx2_16.Replace
did not return the start of the string if StartOffset > 0.TDIRegEx2SearchStream_Enc
to DIConverters 1.18.0: Converter functions now use the native unsigned integer type for the length of a string and support strings longer than 2 GB. This change only affects projects using DIConverters 1.18.0.
(*UTF)\C[^\v]+\x80
against an 8-bit string containing multi-code-unit characters caused bad behaviour and possibly a crash.pcre2_pattern_convert
, ensure the error offset is set zero for early errors.pcre2_dfa_match
so that the internal recursive calls no longer use the stack for local workspace and local ovectors. Instead, an initial block of stack is reserved, but if this is insufficient, heap memory is used. The heap limit parameter now applies to pcre2_dfa_match
.pcre2_substitute
, with global matching, a pattern that matched an empty string, but never at the starting match offset, was not handled in a Perl-compatible way. The pattern (?<=\G.)
is an example of such a pattern. Because \G
is in a lookbehind assertion, there has to be a “bumpalong” before there can be a match. The automatic “advance by one character after an empty string match” rule is therefore inappropriate. A more complicated algorithm has now been implemented.(?>a(*:1))(?>b)(*SKIP:1)x|.*
matched against “abc”, where the *SKIP
shouldn't find a MARK (because it is in an atomic group), but it did.
(*ACCEPT:ARG)
, (*FAIL:ARG)
, and (*COMMIT:ARG)
are now supported.(*MARK)
name was not being passed back for positive assertions that were terminated by (*ACCEPT)
.\N{U+dddd}
, but only in Unicode mode.(?^)
for unsetting all imnsx
options.PCRE2_EXTENDED
(/x
) option only ever discarded space characters whose code point was less than 256. Now, when Unicode support is compiled, PCRE2_EXTENDED
also discards U+0085, U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by Unicode as “Pattern White Space”. This makes PCRE2 compatible with Perl.((?i)A)(?m)B
incorrectly matched “ab”. (The (?m)
setting lost the fact that (?i)
should be reset at the end of its group during the parse process, but without another setting such as (?m)
the compile phase got it right.)[^\x{100}-\x{ffff}]*[\x80-\xff]
which has a repeated negative class with no characters less than 0x100 followed by a positive class with only characters less than 0x100, the first class was incorrectly being auto-possessified, causing incorrect match failures.(?(1)^())b
or (?(?=^))b
.TDIRegEx2_16.MatchNext
which might not have properly advanced the start offset if the previous match was an empty string.
pcre2_config
options: PCRE2_CONFIG_NEVER_BACKSLASH_C
and PCRE2_CONFIG_COMPILED_WIDTHS
.pcre2_compile
error numbers.pcre2_jit_match
checks whether the pattern is compiled in a given mode, it was also expected that at least one mode is available. This is fixed and pcre2_jit_match
returns with PCRE2_ERROR_JIT_BADOPTION
when the pattern is not optimized by JIT at all.(?=(a))\1?b
, “b” was incorrectly set as the first character of a match.(?=(a))\1?b
caused this process to fail. This was an infelicity rather than an outright bug, because it did not affect the result of a match, just its speed. (In fact, in this case, the starting 'a' was subsequently picked up in the study.)pcre2_match
and set its never-changing fields once only. Do the same for pcre2_dfa_match
.PCRE2_INFO_EXTRAOPTIONS
to retrieve them.PCRE2_CALLOUT_STARTMATCH
and PCRE2_CALLOUT_BACKTRACK
bits to a new field callout_flags in callout blocks. The bits are set by pcre2_match
, but not by JIT or pcre2_dfa_match
. These bits are provided to help with tracking how a backtracking match is proceeding.PCRE2_FIRSTLINE
without PCRE2_NO_START_OPTIMIZE
was used in non-JIT matching (both pcre2_match
and pcre2_dfa_match
) and the matched string started with the first code unit of a newline sequence, matching failed because it was not tried at the newline.pcre2_match
and pcre2_dfa_match
. This was a missing optimization rather than a bug.pcre2_substring_list_get
. This could not actually cause a crash because it was always used in a memcpy() call with zero length.(a+)b
would auto-possessify the a+
) but this caused incorrect behaviour when the group was called recursively from elsewhere in the pattern where something different might follow. Iterators at the ends of capturing groups are no longer considered for auto-possessification if the pattern contains any recursions.PCRE2_ENDANCHORED
, coEndAnchored
, and moEndAnchored
.pcre2_match
, set by pcre2_set_heap_limit
, TDIPerlRegEx2_8.HeapLimit
, TDIDfaRegEx2_16.HeapLimit
, and the pattern start (*LIMIT_HEAP=xxx)
.(?(DEFINE)…)^A
and (…){0}^B
are now flagged as anchored.PCRE2_EXTENDED_MORE
and coExtendedMore
, and related /xx
and (?xx)
features.(?n:
for PCRE2_NO_AUTO_CAPTURE
and coNoAutoCapture
, because Perl now has this.PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
and coAllowSurrogateEscapes
;PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
and coBadEscapeIsLiteral
;PCRE2_EXTRA_MATCH_LINE
and coMatchLine
;PCRE2_EXTRA_MATCH_WORD
and coMatchWord
.PCRE2_NEWLINE_NUL
.pcre2_dfa_match
.pcre2_dfa_match
as there are patterns that can use up a lot of resources without necessarily recursing very deeply.PCRE2_LITERAL
and coLiteral
.pcre2_pattern_convert
and friends).[\d-X]
(and similar escapes), as is documented.
New features:
pcre2_match
, has been refactored into a new version that does not use recursive function calls (and therefore the stack) for remembering backtracking positions. The new implementation allows backtracking into recursive group calls in patterns, making it more compatible with Perl, and also fixes some other hard-to-do issues.pcre2_match
no longer uses recursive function calls (see above), the “match limit recursion” value seems misnamed. It still exists, and limits the depth of tree that is searched. To avoid future confusion, it has been renamed as “depth limit” in all relevant places (TDIRegEx2Base.MatchLimitDepth
, PCRE2_INFO_DEPTHLIMIT
, PCRE2_CONFIG_DEPTHLIMIT
, PCRE2_ERROR_DEPTHLIMIT
, pcre2_set_depth_limit
, etc.) but the old names are still available for backwards compatibility.PCRE2_CONFIG_STACKRECURSE
is no longer used and is deprecated.
PCRE2_INFO_FRAMESIZE
item to pcre2_pattern_info
and the InfoFrameSize
property to TDIRegEx2_8
as well as TDIRegEx2_16.InfoFrameSize
.
Bug fixes:
PCRE2_ANCHORED
option, undefined actions (often a segmentation fault) could occur, depending on what other options were set. An example assertion is (?<!\1(abc))
where the reference \1
precedes the group (abc)
.pcre2_serialize_decode
when the input is invalid.pcre2_callout_enumerate
if called with a nil pattern pointer.pcre2_dfa_match
misbehaved if it encountered a character class with a possessive repeat, for example [a-f]{3}+
.
New features:
pcre2_code_copy_with_tables
.\g{+<number>}
(e.g. \g{+2}
) is now supported. It is a “forward back reference” and can be useful in repetitions (compare \g{-<number>}
). Perl does not recognize this syntax.
Optimizations:
pcre2_dfa_match
function now takes note of the recursion limit for the internal recursive calls that are used for lookarounds and recursions within the pattern.
X?(R||){3335}
.
Bug fixes:
\D
in a positive class should cause all characters greater than 255 to match, whatever else is in the class. There was a bug that caused this not to happen if a Unicode property item was added to such a class, for example [\D\P{Nd}]
or [\W\pL]
.pcre2_compile
. Most syntax checking is now done in the pre-pass that identifies capturing groups. While doing this, some minor bugs and Perl incompatibilities were fixed, including:\Q\E
in the middle of a quantifier such as A+\Q\E+
is now ignored instead of giving an invalid quantifier error.{0}
can now be used after a group in a lookbehind assertion; previously this caused an “assertion is not fixed length” error.(?(DEFINE)
as a “define” group, even if a group with the name “DEFINE” exists. PCRE2 now does likewise.(?(R2)…)
must now refer to an existing subpattern.(?(R)…)
misbehaved if there was a group whose name began with “R”.[[:ascii:]-z]
) now generates an error. Perl does accept this as a literal, but gives a warning, so it seems best to fail it in PCRE.\Q\E
sequence may appear after a callout that precedes an assertion condition (it is, of course, ignored).(?|
has not been used), and, if the reference is by name, there is only one group of that name. The referenced group must, of course, be of fixed length.
PCRE2_NO_START_OPTIMIZE
was *not* set:(?=.*X)X$
was incorrectly optimized as if it needed both an initial 'X' and a following 'X'..*
were incorrectly optimized as having to match at the start of the subject or after a newline. There are cases where this is not true, for example, (?=.*[A-Z])(?=.{8,16})(?!.*[\s])
matches after the start in lines that start with spaces. Starting .*
in an assertion is no longer taken as an indication of matching at the start (or after a newline).PCRE2_DOTALL
(/s
) set but not PCRE2_NO_DOTSTAR_ANCHOR
, and which started with .*
inside a positive lookahead was incorrectly being compiled as implicitly anchored..
against an empty string when the newline type is CRLF.\p
, \P
, or \X
in a substitution string when PCRE2_SUBSTITUTE_EXTENDED
was set caused a segmentation fault (nil
dereference).pcre2_substitute
an out-of-bounds memory reference could occur.PCRE2_UCP
set without PCRE2_UTF
if a class required all wide characters to match (for example, [\s[:^ascii:]]
).PCRE2_CASELESS
when processing \h
, \H
, \v
, and \V
in classes as it just wastes time. In the UTF case it can also produce redundant entries in XCLASS lists caused by characters with multiple other cases and pairs of characters in the same “not-x” sublists.
New features:
pcre2_code_copy
to make a copy of a compiled pattern.PCRE2_NO_JIT
option for pcre2_match
and moNoJit
option for TDIRegEx2Base.MatchOptions
.pcre2_get_error_message
with error numbers that are never returned by PCRE2 functions were returning empty strings. Now the error code PCRE2_ERROR_BADDATA
is returned.\C
in lookbehinds and DFA matching in UTF-32 mode.
Bug fixes:
(?J)(?'a'))(?'a')
gave a message about invalid duplicate group names.(*ACCEPT)
in the middle of a sufficiently deeply nested set of parentheses of sufficient size caused an overflow of the compiling workspace (which was diagnosed, but of course is not desirable).
New features:
TDIRegEx2Base.MaxPatternLength
and pcre2_set_max_pattern_length
.TDIRegEx2Base.OffsetLimit
and pcre2_set_offset_limit
.pcre2_substitute
options PCRE2_SUBSTITUTE_EXTENDED
, PCRE2_SUBSTITUTE_UNSET_EMPTY
, PCRE2_SUBSTITUTE_UNKNOWN_UNSET
, and PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
.
Bug fixes:
[\W\p{Any}]
where both a negative-type escape (“not a word character”) and a property escape were present, the property escape was being ignored.[[:<:]]
and [[:>:]]
gave rise to incorrect compiling errors or other strange effects if compiled in UCP mode.[:^ascii:]
or [:^xdigit:]
are present in a non-negated class, all characters with code points greater than 255 are in the class. When a Unicode property was also in the class (if PCRE2_UCP
is set, escapes such as \w
are turned into Unicode properties), wide characters were not correctly handled, and could fail to match. Negated classes such as [^[:^ascii:]\d]
were also not working correctly in UCP mode.PCRE2_AUTO_CALLOUT
was set on a pattern that had a (?#
comment between an item and its quantifier (for example, A(?#comment)?B
) pcre2_compile
misbehaved.\E
was present between an item and its quantifier when PCRE2_AUTO_CALLOUT
was set, pcre2_compile
misbehaved.\Q\E
sequence between an item and its quantifier caused pcre2_compile
to misbehave when auto callouts were enabled.PCRE2_ALT_VERBNAMES
and PCRE2_EXTENDED
were set, and a (*MARK)
or other verb “name” ended with whitespace immediately before the closing parenthesis, pcre2_compile
misbehaved. Example: (*:abc )
, but only when both those options were set.pcre2_compile
was not handling nil
characters correctly.PCRE2_EXTENDED
started with white space or a #-type comment that was followed by (?-x)
, which turns off PCRE2_EXTENDED
, and there was no subsequent (?x)
to turn it on again, pcre2_compile
assumed that (?-x)
applied to the whole pattern and consequently mis-compiled it. The fix for this bug means that a setting of any of the (?imsxU)
options at the start of a pattern is no longer transferred to the options that are returned by PCRE2_INFO_ALLOPTIONS
. In fact, this was an anachronism that should have changed when the effects of those options were all moved to compile time.(*verb)
when PCRE2_ALT_VERBNAMES
was set caused pcre2_compile
to malfunction.pcre2_match
and pcre2_dfa_match
to look only at the part of the subject that is relevant when the starting offset is non-zero.pcre2_substitute
.PCRE2_ALT_VERBNAMES
and coAltVerbnames
.[:
caused pcre2_compile
to run for a very long time.(?R
was followed by -
or +
incorrect behaviour happened instead of a diagnostic.\Q
.(?|…)
group, the computation of the minimum matching length gave a wrong result, which could cause incorrect “no match” errors. For such patterns, a minimum matching length cannot at present be computed.(?(<digits>)
and (?(R<digits>)
.\p{Any}
inside an xclass did not read the current character.*THEN
control verbs.(?3)
are compiled has been re-written because the old way was the cause of many issues. Now, conversion of the group number into a pattern offset does not happen until the pattern has been completely compiled. This does mean that detection of all infinitely looping recursions is postponed till match time. In the past, some easy ones were detected at compile time.\987
. This caused incorrect code to be compiled.\g
and \k
were giving inaccurate offsets in the pattern.(*LIMIT_MATCH=)
now gives an error instead of setting the value to 0.
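The last entry, on (*LIMIT_MATCH=) with no number, can be sketched as a small parser for that pattern-start item. This is a hypothetical helper written for illustration, not PCRE2 or YuPcre2 code: the function name, the return shape, and the error text are all assumptions. It shows only the behaviour described, namely that an empty value is rejected rather than read as 0.

```python
import re

# Hypothetical helper, not PCRE2 source: recognises a leading (*LIMIT_MATCH=n) item.
_LIMIT_MATCH = re.compile(r"\(\*LIMIT_MATCH=(\d*)\)")

def split_limit_match(pattern):
    """Return (limit, rest_of_pattern); limit is None when no item is present."""
    m = _LIMIT_MATCH.match(pattern)
    if m is None:
        return None, pattern
    digits = m.group(1)
    if digits == "":
        # An empty value is an error; it is not silently read as 0.
        raise ValueError("(*LIMIT_MATCH=) requires a number")
    return int(digits), pattern[m.end():]

print(split_limit_match("(*LIMIT_MATCH=100)abc"))
```

Raising an error here, instead of returning 0, keeps a mistyped limit from turning into a limit that makes every match attempt fail immediately.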