How to enable single and double quotes and marks in a regular expression in a paragraph Question in Google Forms?

Question

56 Views Asked by Laura At 17 March 2024 at 20:35

In a paragraph question in Google Forms, the following settings are used to stop the input of emojis, em dashes (character \x97), and € (character \x80): regular expression matches ^[\x0A-\xFF]*$.

[Image: Capture of Google Forms input: Regular expression Matches ^[\x0A-\xFF]*$]

In a Chrome browser on a mobile device (not a desktop device) this regular expression restricts the input of:

Double quotes (character \x22)
Single quotes (character \x27)
Left single quotation mark (character \x91)
Right single quotation mark (character \x92)
Left double quotation mark (character \x93)
Right double quotation mark (character \x94)

although the expression ^[\x0A-\xFF]*$ includes character 10 (\x0A) to character 255 (\xFF).

How can I update the regular expression ^[\x0A-\xFF]*$ to enable the 6 items above?

I've tried inputting different formulas in the regular expression, such as ^([^\\\p{Emoji}]|\\[^p{Emoji}])*$, but this did not help; it made the situation worse.

**Éric** · Answer 1 · 2024-03-22T17:23:28.147000

TL;DR

You confused the Windows Latin-1 and Unicode character sets in your numeric representations of characters, which is why your regular expression did not return the expected results. I corrected this and removed some non-pertinent characters from the class to obtain this regular expression for use in Google Forms: ^[\x0A\x0D\x20-\x7E\xA0-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$.

Your problem on mobile devices may result from virtual keyboards inputting unexpected quotation marks that are not targeted by your regular expression (please read below).

Detailed answer

In the following, I used 255 for the decimal notation, and \xFF for the hexadecimal notation.

The problem is that you are designating characters with their numeric representation in the Windows Latin-1 (CP1252) character set, whereas the Google RE2 regular expression library implemented in Google Forms designates characters with their Unicode code points (probably like most – if not all – modern regular expression engines).
For positions \x00 to \x7F and \xA0 to \xFF, characters are identical in both sets, so the confusion goes unnoticed there. The two sets differ in positions \x80 to \x9F: Windows Latin-1 places printable characters there (including the Euro sign, the Em dash and the curly quotation marks), whereas Unicode assigns these positions to control characters. In RE2, the expression ^[\x0A-\xFF]*$ therefore matches the following Unicode characters:

! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

N.B.: the blanks above correspond to non-printable characters.

But for building RE2 compatible regular expressions with characters in positions higher than \xFF, you must use the Unicode values ("code points").

Let us compare the numeric representations of the characters considered in your question (Unicode positions are written in the RE2 curly-bracket form):

Character	Description	Position in Windows Latin-1 character set	Position in the Unicode character set	Must match the regular expression
`"`	quotation mark (or double quote)	`34` or `\x22`	`34` or `\x22`	yes
`'`	apostrophe (or single quote)	`39` or `\x27`	`39` or `\x27`	yes
`‘`	left single quotation mark	`145` or `\x91`	`8216` or `\x{2018}`	yes
`’`	right single quotation mark	`146` or `\x92`	`8217` or `\x{2019}`	yes
`“`	left double quotation mark	`147` or `\x93`	`8220` or `\x{201C}`	yes
`”`	right double quotation mark	`148` or `\x94`	`8221` or `\x{201D}`	yes
`—`	Em dash	`151` or `\x97`	`8212` or `\x{2014}`	no
`€`	Euro sign	`128` or `\x80`	`8364` or `\x{20AC}`	no
`😀`	grinning face	not included	`128512` or `\x{1F600}`	no
other emojis	other emojis	not included	`...` or `\x...`	no
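
The two position columns can be cross-checked with a short sketch. Python is used here only for illustration and is an assumption (Google Forms itself runs RE2, not Python); the helper name `cp1252_to_unicode` is hypothetical. It decodes a single Windows Latin-1 (CP1252) byte and reports the Unicode code point of the resulting character.

```python
def cp1252_to_unicode(byte_value: int) -> int:
    """Return the Unicode code point of the character stored at byte_value in CP1252."""
    return ord(bytes([byte_value]).decode("cp1252"))


# Positions quoted in the question, as CP1252 byte values.
for byte_value in (0x22, 0x27, 0x91, 0x92, 0x93, 0x94, 0x97, 0x80):
    code_point = cp1252_to_unicode(byte_value)
    print(f"\\x{byte_value:02X} in CP1252 -> U+{code_point:04X} ({chr(code_point)})")
```

The straight quotes keep the same position in both sets, while the curly quotes, the Em dash and the Euro sign land well above \xFF in Unicode.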

All the above clarifies that your regular expression ^[\x0A-\xFF]*$ will match lower-position characters, but not the left/right quotation marks that stand at high positions (well above \xFF) in Unicode. So you need to extend the character class with the representations of these specific marks, like this: ^[\x0A-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$.
Curly brackets are required by RE2 for hexadecimal numbers made of three digits or more.
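
As a rough check of the original and extended classes, here is a minimal sketch in Python. This is an assumption for illustration only: Python's `re` module is not RE2 and does not accept the `\x{2018}` form, so the sketch writes the same code points as `\u2018` and so on, and uses `fullmatch` in place of the `^...$` anchors. The names `ORIGINAL`, `EXTENDED` and `accepts` are hypothetical.

```python
import re

# Same character classes as in the answer, rewritten in Python's escape syntax.
ORIGINAL = re.compile(r"[\x0A-\xFF]*")
EXTENDED = re.compile(r"[\x0A-\xFF\u2018\u2019\u201C\u201D]*")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """True when every character of text belongs to the pattern's class."""
    return pattern.fullmatch(text) is not None


samples = {
    "straight quotes": "\"it's\"",
    "curly quotes": "\u201Cit\u2019s\u201D",
    "em dash": "a \u2014 b",
    "euro sign": "10 \u20AC",
    "emoji": "hi \U0001F600",
}
for label, text in samples.items():
    print(f"{label:16} original={accepts(ORIGINAL, text)} extended={accepts(EXTENDED, text)}")
```

Only the curly-quote sample should change from rejected to accepted between the two classes; the Em dash, the Euro sign and the emoji stay rejected by both.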

Incidentally, it seems unnecessary to me to include all the control characters between positions \x0A and \x1F (only \x0A and \x0D seem pertinent to me). Positions \x7F to \x9F are also assigned to control (thus non-printable) characters that are not meant to be input in your case. So a more pertinent, yet longer, expression would be ^[\x0A\x0D\x20-\x7E\xA0-\xFF\x{2018}\x{2019}\x{201C}\x{201D}]*$.
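
The effect of narrowing the class can be sketched as follows. Again this is Python rather than RE2 (an assumption for illustration), with `\uXXXX` escapes standing in for the RE2 `\x{XXXX}` form; `BROAD`, `TIGHT` and `is_accepted` are hypothetical names.

```python
import re

# Broad class: everything from \x0A to \xFF plus the four curly quotes.
BROAD = re.compile(r"[\x0A-\xFF\u2018\u2019\u201C\u201D]*")
# Tight class: line feed, carriage return, printable ASCII, \xA0-\xFF, curly quotes.
TIGHT = re.compile(r"[\x0A\x0D\x20-\x7E\xA0-\xFF\u2018\u2019\u201C\u201D]*")


def is_accepted(pattern: re.Pattern, text: str) -> bool:
    """True when the whole text is made of characters from the class."""
    return pattern.fullmatch(text) is not None


samples = [
    ("two lines", "first\r\nsecond"),
    ("accented text", "caf\u00E9"),
    ("vertical tab (\\x0B)", "a\x0Bb"),
    ("C1 control (\\x85)", "a\x85b"),
]
for label, text in samples:
    print(f"{label:22} broad={is_accepted(BROAD, text)} tight={is_accepted(TIGHT, text)}")
```

Ordinary multi-line text passes both classes, while the stray control characters pass only the broad one.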

By the way, these expressions exclude the Euro sign, the Em dash and emojis as desired.
The mismatch with characters \x22 and \x27 on mobile devices may result from the virtual keyboard not inputting exactly the character targeted in the regular expression (quotation marks are numerous in Unicode, and their shapes are sometimes very similar depending on the font; you could include more quotation marks in your character class).
Also, be aware that the Google RE2 library does not support the \p{Emoji} character class.
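
To find out which quotation marks a virtual keyboard actually produces, one option is to paste a rejected response into a small diagnostic like the sketch below. It is written in Python as an assumption for illustration (not RE2), and `ALLOWED` and `rejected_characters` are hypothetical names. It lists each distinct character that falls outside the class, with its code point and Unicode name, so the class can be extended deliberately.

```python
import re
import unicodedata

# One character from the tighter class, in Python's escape syntax.
ALLOWED = re.compile(r"[\x0A\x0D\x20-\x7E\xA0-\xFF\u2018\u2019\u201C\u201D]")


def rejected_characters(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each distinct rejected character."""
    report = []
    for char in dict.fromkeys(text):  # keeps first-seen order, drops duplicates
        if ALLOWED.fullmatch(char) is None:
            name = unicodedata.name(char, "<unnamed>")
            report.append((char, f"U+{ord(char):04X}", name))
    return report


# Example input with a low-9 opening quote, as some keyboard layouts produce.
for entry in rejected_characters("She said \u201Ehello\u201D"):
    print(entry)
```

Any code point reported this way can then be added to the Google Forms expression in the `\x{...}` form if it should be allowed.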
