This formula is great: "Limit the number of words in a response with a regular expression". It works great in English, but probably not so well in Hebrew.
Is there anything that should be changed?
I entered the formula, but when I try to fill in the form in Hebrew, a word-limit error message appears already at the first word.
In English it works great.
I hope I am answering in the right place; I'm having a little trouble understanding where to answer.
This is the screenshot: https://prnt.sc/fO3OeoPeMRXI
This is the error message: https://prnt.sc/gPMzM9eVY2Ej
I suggested before:
But `\b` (word boundary) does not work well with non-Latin scripts such as Hebrew in many regex engines, because by default it relies on the ASCII definition of a word character.

`\b` matches a position where one side is a word character (`\w`) and the other side is not. By default, `\w` matches ASCII letters, digits, and the underscore (`_`). That set does not include characters from scripts like Hebrew, Arabic, or Cyrillic, so `\b` does not recognize the boundaries of a word written in Hebrew: it does not see Hebrew characters as part of the `\w` category.

To work with non-Latin scripts (Hebrew, for instance), you need to define your own word boundaries, typically by specifying the script's character range directly (such as `[\u0590-\u05FF]` for Hebrew) and detecting the spaces or separators between words by other means. That is why custom solutions are necessary for regex operations on non-Latin scripts.

In the regex pattern `^(?:[\u0590-\u05FF]+(?:\s+|$)){0,250}$`, designed for Hebrew text, the part `(?:\s+|$)` handles the spaces or separators between words.

The screenshot suggests that the error comes from PCRE2 (Perl Compatible Regular Expressions, version 2), the regex engine PHP uses, which does not support `\u` or `\U` escapes inside a pattern.

To fix this, write the code points as `\x{0590}-\x{05FF}` and add the `u` modifier at the end of the pattern, which is necessary for the pattern and the input to be treated as UTF-8. If the pattern is written in PHP source code, `\u{0590}` also works, but only inside a double-quoted string (" "), because there PHP itself (7.0 and later) converts the escape to the actual character before the regex engine sees it; in a single-quoted string the escape is passed through unchanged and PCRE2 rejects it.

The error upon form submission could also be due to factors unrelated to the regex itself. Make sure the server-side environment is correctly configured to handle UTF-8 encoded data, and that the form processing script is using the corrected regex pattern. If the issue persists, you might need to check the documentation of the JetForm plugin or contact support for that plugin to resolve compatibility issues with Unicode patterns in PHP.
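To make the difference between the two approaches concrete, here is a minimal PHP sketch. The `\b`-based pattern is only an assumed form of a generic word-limit regex (not necessarily the one suggested earlier), the sample strings are made up, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample inputs, for illustration only.
$english = 'hello world';
$hebrew  = 'שלום עולם';

// Assumed generic word-limit pattern built on \b and \w.
// Without the u modifier, \w covers only ASCII letters, digits and underscore.
$asciiPattern = '/^(?:\b\w+\b\s*){0,250}$/';

// Hebrew-specific pattern: explicit code point range written as \x{...},
// plus the u modifier so pattern and input are treated as UTF-8.
$hebrewPattern = '/^(?:[\x{0590}-\x{05FF}]+(?:\s+|$)){0,250}$/u';

var_dump(preg_match($asciiPattern, $english));  // int(1)
var_dump(preg_match($asciiPattern, $hebrew));   // int(0): no \w character found
var_dump(preg_match($hebrewPattern, $hebrew));  // int(1)
var_dump(preg_match($hebrewPattern, $english)); // int(0): Latin letters are outside the range
```

Note that the Hebrew-only pattern rejects Latin letters, digits, and punctuation, so mixed input would need a wider character class.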
The regex `^\s?([\u0590-\u05fe]+\s?){1,5}$` from your picture is intended to match between 1 and 5 groups of Hebrew characters, where the first group may be preceded by a single whitespace character and each group may be followed by one. The regex is anchored at the beginning (`^`) and end (`$`) of the string so that it matches the whole input.

It might fail with a `Sanitize_Value_Exception` in a Jet form because of:

Incorrect Unicode syntax: In PHP PCRE regex, Unicode code points should be written as `\x{...}` together with the `u` modifier (`\u{...}` works only when PHP expands it inside a double-quoted string). The regex provided uses the `\uXXXX` form without curly braces `{}`, which the PHP regex engine rejects.

Character range: The range `\u0590-\u05fe` covers almost all the characters in the Hebrew block of Unicode, but that syntax is not valid in a PHP pattern.

Form field validation: The Jet form might expect a certain format or encoding for the input data, and input that does not strictly conform to these expectations could trigger a `Sanitize_Value_Exception`.

The corrected regex in PHP should look like this:
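A minimal sketch of such a pattern, assuming it is passed to PHP's `preg_match` with `/` delimiters; the variable names and sample inputs are illustrative, and the file is assumed to be saved as UTF-8.

```php
<?php
// 1 to 5 Hebrew words: code points written as \x{...}, u modifier for UTF-8.
// Single quotes keep the backslashes intact for the regex engine.
$pattern = '/^\s?([\x{0590}-\x{05FE}]+\s?){1,5}$/u';

$threeWords = preg_match($pattern, 'אחת שתיים שלוש'); // 1: within the limit
$sixWords   = preg_match($pattern, 'א ב ג ד ה ו');     // 0: more than five words

var_dump($threeWords, $sixWords);
```

In a form plugin field that takes a bare regex, the delimiters and the modifier may be added by the plugin itself; whether JetForm does so is something to check in its documentation.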
Or, if you are including it in a PHP string, it should be double-escaped:
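One way the double-escaped form could look, as a sketch rather than the plugin's actual code. It compares a single-quoted pattern, its double-quoted equivalent with escaped backslashes, and a variant that lets PHP (7.0 and later) expand `\u{...}` itself. Variable names are illustrative.

```php
<?php
// Single-quoted: backslashes reach the regex engine as written.
$single = '/^\s?([\x{0590}-\x{05FE}]+\s?){1,5}$/u';

// Double-quoted: each backslash is doubled, and $ is escaped to avoid
// any chance of variable interpolation.
$double = "/^\\s?([\\x{0590}-\\x{05FE}]+\\s?){1,5}\$/u";

// Double-quoted with \u{...}: PHP turns the escapes into the actual
// characters, so the regex engine only sees a literal character range.
$expanded = "/^\\s?([\u{0590}-\u{05FE}]+\\s?){1,5}\$/u";

var_dump($single === $double);                 // bool(true)
var_dump(preg_match($expanded, 'שלום עולם'));  // int(1)
```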
The input might be sanitized in a way that removes or alters characters expected by the regex, causing the match to fail.
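To check whether the submitted value contains characters the pattern does not expect, a small diagnostic along these lines can help. It is only a sketch: the sample input is hypothetical, and `mb_ord` requires the mbstring extension (PHP 7.2 or later).

```php
<?php
// Hypothetical submitted value: Hebrew text with ASCII punctuation.
$input = 'שלום, עולם!';

// Collect every character that is neither in the Hebrew block nor whitespace.
preg_match_all('/[^\x{0590}-\x{05FF}\s]/u', $input, $matches);

foreach ($matches[0] as $char) {
    // Print the code point next to the character itself.
    printf("U+%04X %s\n", mb_ord($char, 'UTF-8'), $char);
}
// Prints:
// U+002C ,
// U+0021 !
```

If this lists punctuation or other symbols, the character class in the validation regex would need to allow them, or the input would need to be cleaned first.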