Is there any way to pattern match with Unicode graphemes?
As a quick example, when I run this query:
CREATE TABLE test (
id SERIAL NOT NULL,
name VARCHAR NOT NULL,
PRIMARY KEY (id),
UNIQUE (name)
);
INSERT INTO test (name) VALUES (' One');
INSERT INTO test (name) VALUES (' Two');
SELECT * FROM public.test WHERE test.name LIKE '%';
I get both rows returned, rather than just ' Two'. Postgres seems to be just comparing code points, but I want it to compare full graphemes, so it should only match ' Two', because is a different grapheme.
Is this possible?
It's a very interesting question!
I am not quite sure if it is possible anyway:
The skinned emojis are, in fact, two joined characters (like ligatures). The first character is the yellow hand which is followed by an emoji skin modifier
This is how the light skinned hand is stored internally. So, for me, your result makes sense:
When you query any string, that begins with , it will return:
Two(trivial)_ One(ignore the underscore, I try to suppress the automated ligature with this)So, you can see, the light skinned emoji internally also starts with . That's why I believe, that your query doesn't work the way you like.
Workarounds/Solutions:
You can add a space to your query. This ensures, that there's no skin modifier after your character. Naturally, this only works in your case, where all data sets have a space after the hand:
You can simply extend the
WHEREclause like this:You can use regular expression pattern matching to exclude the skins:
see demo:db<>fiddle (note that the fiddle seems not to provide automated ligatures, so both characters are separated displayed there)