Character::isEmoji not working for Strings with numbers in them?

I have a Java 21 app where I want to determine whether a string contains an emoji. I am using the emoji methods added to Character in Java 21, but whenever the input String contains digits, such as "123", Character::isEmoji returns true. I have been using this as a resource: https://inside.java/2023/11/20/sip089/

This is the code I have been using:

  private boolean containsEmoji(String s) {
    return s.codePoints().anyMatch(Character::isEmoji);
  }

For example:

System.out.println(
        "123".codePoints().anyMatch( Character :: isEmoji )
);

true

And also:

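  private boolean containsEmoji(String s) {
    // Advance by the width of each code point so a surrogate pair is read only once.
    for (int i = 0; i < s.length(); ) {
      int codePoint = s.codePointAt(i);
      if (Character.isEmoji(codePoint)) {
        return true;
      }
      i += Character.charCount(codePoint);
    }
    return false;
  }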

Answer by Basil Bourque (best answer)

Those digits are emoji, technically

Yes, digits 0-9 in the Basic Latin (US-ASCII) block of Unicode are considered to be Emoji, for reasons that escape me.

Follow the trail of documentation:

  1. Javadoc for Character.isEmoji
  2. Unicode Emoji (Technical Standard #51)
  3. emoji-data
  4. emoji-data.txt (for Emoji Version 15.1)

The last of these, emoji-data.txt, lists:

0030..0039 ; Emoji # E0.0 [10] (0️..9️) digit zero..digit nine

Section 1.5.2 Versioning of the Unicode page explains comment E0.0 as:

This label is used for special characters, including:

• Most emoji component characters, regardless of when they were first encoded.

• Other non-emoji characters in the data files.

… which confounds me.

But it seems to me that Character.isEmoji reporting plain digits as being emoji is a feature, not a bug.
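
To see how Java 21 classifies a digit, here is a minimal sketch that prints three of the emoji-related properties for a code point. The class name DigitEmojiProperties and the helper describe are made up for illustration; the Character methods are the ones added in Java 21. One plausible reason for the Emoji property on digits is that they serve as the base of keycap sequences (a digit followed by U+FE0F and U+20E3), which would also explain why they count as emoji components.

```java
public class DigitEmojiProperties {

    // Hypothetical helper: summarizes the emoji properties of one code point.
    static String describe(int codePoint) {
        return String.format("U+%04X emoji=%b component=%b presentation=%b",
                codePoint,
                Character.isEmoji(codePoint),
                Character.isEmojiComponent(codePoint),
                Character.isEmojiPresentation(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe('1'));      // DIGIT ONE
        System.out.println(describe(0x1F600));  // GRINNING FACE
    }
}
```

On Java 21 the digit should report the Emoji property but not Emoji_Presentation, while the grinning face should report both.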

Use Character.isEmojiPresentation

To determine whether a character is what we more commonly think of as an emoji, use another method on the Character class: Character.isEmojiPresentation. That method returns false for the code points of the Basic Latin digits.
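
Applied to the method from the question, the change could look like the sketch below. The class name EmojiDetector is made up for illustration, and this is one possible approach rather than a complete emoji detector.

```java
public class EmojiDetector {

    // True if any code point in the string has default emoji presentation.
    static boolean containsEmoji(String s) {
        return s.codePoints().anyMatch(Character::isEmojiPresentation);
    }

    public static void main(String[] args) {
        // U+1F600 GRINNING FACE, built from its code point to avoid source-encoding issues.
        String grinning = Character.toString(0x1F600);

        System.out.println(containsEmoji("123"));
        System.out.println(containsEmoji("Hello " + grinning));
    }
}
```

One caveat: some characters that are usually drawn as emoji, such as U+2764 HEAVY BLACK HEART, have text presentation by default and only render as emoji when followed by the variation selector U+FE0F. This check does not report them, so whether it is sufficient depends on which inputs you need to catch.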