DIRegEx is a library of components and procedures for Delphi (Embarcadero / CodeGear / Borland) that implements regular expression pattern matching with the same syntax and semantics as Perl.
Question: Apparently, specific patterns with specific subjects can drive DIRegEx into stack overflow errors. How can I avoid these?
Answer: DIRegEx uses a recursive matching algorithm which can run out of stack space with extremely demanding patterns. The recursive algorithm was still chosen over an iterative implementation for performance reasons: Extensive testing revealed that DIRegEx runs multiple times faster with recursion than without.
Even though stack overflows are a real problem, they happen so rarely with common regex patterns and subjects that most DIRegEx users will never notice. If you ever do, the following steps can help to avoid them. All 3 options combined yield the best results:
{$MAXSTACKSIZE $00400000}, preferably in your *.dpr file, should stop overflows even for extremely demanding patterns. {$MAXSTACKSIZE $00200000} enables most normally demanding patterns to run well and is a reasonable precaution setting.
In Windows, the stack size is defined on a per-thread basis when the thread is created. This means that the calling thread's stack size applies to DIRegEx even if it is compiled into an external *.bpl or *.dll link library. It defaults to the application's stack size when called from the main thread.
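As an illustration of where the stack size directive could go, here is a minimal sketch of a project file. The project name and the console setup are placeholders, not part of DIRegEx; only the {$MAXSTACKSIZE} values come from the text above.

```pascal
program MyRegExApp; // hypothetical project name

{$APPTYPE CONSOLE}

// Raise the maximum stack size of this executable to 4 MB.
// Use $00200000 (2 MB) for the more modest precaution setting.
{$MAXSTACKSIZE $00400000}

uses
  SysUtils;

begin
  // Application code that calls DIRegEx would go here.
  Writeln('Maximum stack size raised for this executable.');
end.
```

Because the setting belongs to the executable, placing the directive in a *.bpl or *.dll project would not help, which matches the per-thread behavior described above.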
If the calling application's stack size is too small and you cannot change it because you are writing a plugin or extension for a larger application, you might want to run DIRegEx in a newly created thread. The dwStackSize parameter of the Windows CreateThread() function lets you change the initially committed stack space, which you should choose according to your needs.
Unfortunately, it is not possible to predict the required stack size in advance. It is highly dependent on the number of potential matches in the subject text. A pattern which works well with a larger text can still fail with a shorter one if it encounters many failed matches which must be backtracked.
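The separate-thread approach could look roughly like the sketch below. It uses Delphi's BeginThread, which wraps CreateThread() and passes the stack size through. TMatchJob, MatchWorker and RunOnBigStack are hypothetical names made up for this example, and the worker uses a plain Pos() call as a stand-in for the real DIRegEx work, so the sketch has not been tested against DIRegEx itself.

```pascal
program BigStackThreadDemo; // hypothetical demo project

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

type
  // Hypothetical job record: carries input to the worker and the result back.
  PMatchJob = ^TMatchJob;
  TMatchJob = record
    Subject, Pattern: string;
    Matched: Boolean;
    Failed: Boolean;
  end;

// Runs on the new thread; the signature matches System.TThreadFunc.
function MatchWorker(Parameter: Pointer): Integer;
var
  Job: PMatchJob;
begin
  Job := PMatchJob(Parameter);
  try
    // Placeholder for the real work. In an actual project, call DIRegEx
    // here, for example the RegExMatch sample function from this page.
    Job^.Matched := Pos(Job^.Pattern, Job^.Subject) > 0;
    Result := 0;
  except
    // Do not let exceptions escape the thread function.
    Job^.Failed := True;
    Result := 1;
  end;
end;

// Runs the job on a thread created with the requested stack size (in bytes)
// and blocks until it finishes. Returns False if the thread could not start.
function RunOnBigStack(var Job: TMatchJob; StackSize: LongWord): Boolean;
var
  ThreadHandle: THandle;
  ThreadId: TThreadID; // declare as LongWord on Delphi versions without TThreadID
begin
  ThreadHandle := BeginThread(nil, StackSize, MatchWorker, @Job, 0, ThreadId);
  Result := ThreadHandle <> 0;
  if Result then
  begin
    WaitForSingleObject(ThreadHandle, INFINITE);
    CloseHandle(ThreadHandle);
  end;
end;

var
  Job: TMatchJob;
begin
  Job.Subject := 'Hello World';
  Job.Pattern := 'World';
  Job.Matched := False;
  Job.Failed := False;
  // 4 MB, mirroring the {$MAXSTACKSIZE $00400000} value suggested above.
  if RunOnBigStack(Job, $00400000) and not Job.Failed then
    Writeln('Matched: ', Job.Matched)
  else
    Writeln('Worker thread failed');
end.
```

The caller blocks until the worker finishes, so the job record can safely live on the caller's stack. Since the required stack size cannot be predicted, the 4 MB used here is only a starting point.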
This page contains an interesting e-mail conversation about the internals of TDIRegExSearchStream and its descendant classes (in German).
This is a sample function which shows whether a string contains a match for a given regex.
function RegExMatch(const Str, Re: string; ACaseSens: Boolean): Boolean;
var
  RegEx: TDIRegEx;
begin
  Result := False;
  if (Str = '') or (Re = '') then Exit;
  RegEx := TDIPerlRegEx.Create(nil);
  try
    //DIRegEx_Api.set_locale(LANG_RUSSIAN);
    //RegEx.Options := RegEx.Options + [poUserLocale];
    if ACaseSens then
      RegEx.CompileOptions := RegEx.CompileOptions - [coCaseLess]
    else
      RegEx.CompileOptions := RegEx.CompileOptions + [coCaseLess];
    RegEx.SetSubjectStr(Str);
    RegEx.MatchPattern := Re;
    Result := RegEx.Match(0) >= 0;
  finally
    RegEx.Free;
  end;
end;
This is a sample function which replaces matches of the regex ASearch in the string AValue with the format pattern AReplace.
function RegExReplace(
  const AValue: AnsiString;
  const ASearch: AnsiString;
  const AReplace: AnsiString;
  const AOptions: TDIRegexCompileOptions = [coCaseLess]): AnsiString;
var
  RE: TDIPerlRegEx;
begin
  RE := TDIPerlRegEx.Create(nil);
  try
    RE.SetSubjectStr(AValue);
    RE.CompileOptions := AOptions;
    RE.CompileMatchPatternStr(ASearch);
    RE.FormatPattern := AReplace;
    if RE.Replace2(Result) = 0 then
      Result := AValue;
  finally
    RE.Free;
  end;
end;
This function returns the original string with every occurrence of the regex overwritten by a fill character. For example, "\w{3,7}" can match "wwwww", which is then replaced with "....." when the fill character is ".".
function RegExReplaceToChar(
  const Str, Re: string;
  ch: Char;
  ACaseSens: Boolean): string;
var
  RegEx: TDIRegEx;
  N_prev, N, i: Integer;
begin
  Result := Str;
  if (Str = '') or (Re = '') then Exit;
  RegEx := TDIPerlRegEx.Create(nil);
  try
    if ACaseSens then
      RegEx.CompileOptions := RegEx.CompileOptions - [coCaseLess]
    else
      RegEx.CompileOptions := RegEx.CompileOptions + [coCaseLess];
    RegEx.MatchPattern := Re;
    N_prev := -1;
    repeat
      // Search the partially overwritten result again from the start.
      RegEx.SetSubjectStr(Result);
      if RegEx.Match(0) < 0 then Break;
      // MatchedStrFirstCharPos is zero-based, string indexes are one-based.
      N := RegEx.MatchedStrFirstCharPos + 1;
      // Stop if the same position matches again (for example when ch itself
      // matches the pattern); otherwise the loop would never end.
      if N = N_prev then Break;
      N_prev := N;
      for i := N to (N + RegEx.MatchedStrLength - 1) do
        Result[i] := ch;
    until False;
  finally
    RegEx.Free;
  end;
end;