In a paragraph question in Google Forms, the following settings are used to stop the input of emojis, em dashes (character \x97), and € (character \x80): regular expression matches ^[\x0A-\xFF]*$.
![Capture of Google Forms input: Regular expression Matches ^[\x0A-\xFF]*$](https://i.stack.imgur.com/BvYG6.png)
In a Chrome browser on a mobile device (not a desktop device) this regular expression restricts the input of:
- Double quotes (character \x22)
- Single quotes (character \x27)
- Left single quotation mark (character \x91)
- Right single quotation mark (character \x92)
- Left double quotation mark (character \x93)
- Right double quotation mark (character \x94)

although the expression ^[\x0A-\xFF]*$ includes character 10 (\x0A) to character 255 (\xFF).
How can I update the regular expression ^[\x0A-\xFF]*$ to enable the 6 items above?
I've tried other expressions in the regular expression field, such as ^([^\\\p{Emoji}]|\\[^p{Emoji}])*$, but this did not help; it made the situation worse.
TL;DR
You confused the Windows Latin-1 and Unicode character sets in your numeral representations of characters, which is why your regular expression did not return the expected results. I corrected this and removed some non-pertinent characters from the class to obtain this regular expression for use in Google Forms: `^[\x0A\x0D\x20-\x7E\xA0-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$`.

Your problem on mobile devices may result from virtual keyboards inputting unexpected quotation marks that are not targeted by your regular expression (please read below).
Detailed answer
In the following, I use `255` for the decimal notation and `\xFF` for the hexadecimal notation.

The problem is that you are designating characters by their numeral representation in the Windows Latin-1 (CP1252) character set, whereas the Google RE2 regular expression library used by Google Forms designates characters by their Unicode code points (probably like most, if not all, modern regular expression engines).
For most of the first 256 positions (`\x00` to `\xFF`), characters are identical in both sets. The exception is the range `\x80` to `\x9F`, where CP1252 places printable characters such as € and the curly quotation marks, while Unicode assigns control characters. Outside that range the confusion is harmless, and the RE2 expression `^[\x0A-\xFF]*$` matches the following characters:

! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

N.B.: the blanks above correspond to non-printable characters.
But to build RE2-compatible regular expressions with characters at positions higher than `\xFF`, you must use the Unicode values ("code points").

Let us compare the numeral representations of the characters considered in your question:

| Character | Windows Latin-1 character set | Unicode character set (used by the regular expression) |
|---|---|---|
| `"` | `34` or `\x22` | `34` or `\x22` |
| `'` | `39` or `\x27` | `39` or `\x27` |
| ‘ | `145` or `\x91` | `8216` or `\x2018` |
| ’ | `146` or `\x92` | `8217` or `\x2019` |
| “ | `147` or `\x93` | `8220` or `\x201C` |
| ” | `148` or `\x94` | `8221` or `\x201D` |
| — | `151` or `\x97` | `8212` or `\x2014` |
| € | `128` or `\x80` | `8364` or `\x20AC` |
| 😀 | | `128512` or `\x1F600` |
| ... | | `...` or `\x...` |

All the above clarifies that your regular expression `^[\x0A-\xFF]*$` will match lower-position characters, but not the left/right quotation marks that stand at high positions (well above `\xFF`) in Unicode. So you need to extend the character class with the representations of these specific marks, like this: `^[\x0A-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$`.

Curly brackets are required by RE2 for hexadecimal numbers made of three digits or more.
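
The mapping between the two character sets can be checked with a short Python sketch. This is only an illustration: Python's `re` module is not RE2, so `\x{2018}` is written as `\u2018` here, and the names `ORIGINAL`, `EXTENDED` and `accepts` are made up for this example.

```python
import re

# CP1252 byte values quoted in the question, decoded to the Unicode
# code points that a Unicode-aware regex engine actually compares against.
for byte in (0x91, 0x92, 0x93, 0x94, 0x97, 0x80):
    char = bytes([byte]).decode("cp1252")
    print(f"CP1252 0x{byte:02X} -> {char!r} -> U+{ord(char):04X}")

# Assumption: Python's \u2018 stands in for RE2's \x{2018}.
ORIGINAL = re.compile(r"[\x0A-\xFF]*")
EXTENDED = re.compile(r"[\x0A-\xFF\u2018\u2019\u201C\u201D]*")


def accepts(pattern, text):
    """Return True when every character of text is in the pattern's class."""
    return pattern.fullmatch(text) is not None


curly = "\u201Chello\u201D"      # text wrapped in curly double quotes
print(accepts(ORIGINAL, curly))  # curly quotes lie above \xFF in Unicode
print(accepts(EXTENDED, curly))  # accepted once the class lists them
```

Straight quotes (`\x22`, `\x27`) pass both patterns, which suggests that whatever the mobile keyboard inserted was not one of those two characters.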
Incidentally, it seems unnecessary to me to include all the control characters between positions `\x0A` and `\x1F` (only `\x0A` and `\x0D` seem pertinent to me). Also, positions `\x7F` to `\x9F` are assigned to control (thus non-printable) characters that are not to be input in your case. So a more pertinent, yet longer, expression would be `^[\x0A\x0D\x20-\x7E\xA0-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$`. You can test it there.

By the way, these expressions exclude the Euro sign, the em dash and emojis as desired.
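
As a rough check of the narrower class, the following Python sketch runs it against a few sample inputs. It assumes Python's `re` behaves like RE2 for this simple character class (with `\u2018` replacing `\x{2018}`); the sample strings and the names `REFINED`, `samples` and `is_allowed` are illustrative only.

```python
import re

# Assumption: Python's \u2018 stands in for RE2's \x{2018}.
REFINED = re.compile(r"[\x0A\x0D\x20-\x7E\xA0-\xFF\u2018\u2019\u201C\u201D]*")

samples = {
    "plain quotes": "It's \"fine\"",
    "curly quotes": "\u2018a\u2019 \u201Cb\u201D",
    "line break": "first\r\nsecond",
    "accented letter": "caf\u00E9",
    "euro sign": "\u20AC5",
    "em dash": "a\u2014b",
    "emoji": "\U0001F600",
    "tab": "a\tb",
}


def is_allowed(text):
    """Return True when the whole text is made of allowed characters."""
    return REFINED.fullmatch(text) is not None


for label, text in samples.items():
    verdict = "accepted" if is_allowed(text) else "rejected"
    print(f"{label}: {verdict}")
```

The first four samples should be accepted and the last four rejected; note that the tab character (`\x09`) was already outside the original `\x0A-\xFF` range.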
The mismatch with characters `\x22` and `\x27` on mobile devices may result from the virtual keyboard not inputting exactly the character targeted in the regular expression (quotation marks are numerous in Unicode, and their shapes are sometimes very similar depending on the font; you could include more quotation marks in your character class).

Also, be aware that the Google RE2 library does not support the `\p{Emoji}` character class.
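
To find out which quotation mark a given keyboard really produces, one option is to paste a rejected answer into a small diagnostic like the sketch below. It is not part of Google Forms; the helper name `rejected_characters` and the sample input are hypothetical, and Python's `\u2018` again replaces RE2's `\x{2018}`.

```python
import re
import unicodedata

# Same class as the refined expression, applied one character at a time.
ALLOWED_CHAR = re.compile(r"[\x0A\x0D\x20-\x7E\xA0-\xFF\u2018\u2019\u201C\u201D]")


def rejected_characters(text):
    """List (character, code point, Unicode name) for each disallowed character."""
    report = []
    for char in text:
        if ALLOWED_CHAR.fullmatch(char) is None:
            name = unicodedata.name(char, "<unnamed>")
            report.append((char, f"U+{ord(char):04X}", name))
    return report


# Hypothetical input: a low-9 opening quote, as some keyboard layouts produce.
for char, code_point, name in rejected_characters("\u201Ehello\u201C"):
    print(code_point, name)
```

Any code point reported this way can then be added to the character class as `\x{....}` if it should be allowed.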