I have a program that reads a list of escaped Unicode strings (u/XXXX) and converts each one into its corresponding Unicode character, writing the result to both the terminal and a text file.
I'm using org.apache.commons.text.StringEscapeUtils.unescapeJava(String), from the Apache Commons Text library, to unescape the escaped Unicode code points.
I'm referring to these unicode entries to get my private-use characters: https://jrgraphix.net/r/Unicode/E000-F8FF
(I prepend u/ to the hex digits shown on that page.)
Here's an example of what the output should look like:
If you paste that into a Ctrl+F box on the website above, you'll see that it points to E022.
Now, here is my question, and by extension the problem I am having:
It's not working. For some reason, it doesn't output the character itself. Instead it outputs a generic question mark that does not represent the private-use character in question. If someone can help me with this, it'd be much appreciated.
So far, I have had no luck.
tl;dr
Use `\uXXXX`, not `u/XXXX`.
To get an officially sanctioned Red Heart: `"\u2764\uFE0F"`
Example code
You did not show your exact code. But your Question mentions `u/XXXX`, which is incorrect. The correct syntax in Java for a Unicode hexadecimal escape is `\uXXXX`. You can verify your hexadecimal literal by asking for its code point.
Here is some example code.
Dump to console.
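As a minimal sketch of that check (the class and method names are made up for illustration, since the asker's code was not shown), the following asks a string literal for its first code point and prints it in decimal and in the usual U+XXXX notation:

```java
public class CodePointCheck {

    // Returns the first code point of the text as a "U+XXXX" label.
    public static String label(String text) {
        return String.format("U+%04X", text.codePointAt(0));
    }

    public static void main(String[] args) {
        // Backslash-u escape followed by four hex digits, as Java expects.
        String pua = "\uE022";

        System.out.println("code point (decimal): " + pua.codePointAt(0));
        System.out.println("code point (hex):     " + label(pua));
        System.out.println("name per Java:        " + Character.getName(pua.codePointAt(0)));
        // Which glyph appears here depends entirely on the installed fonts.
        System.out.println("character:            " + pua);
    }
}
```

If the decimal value is 57378 (hexadecimal E022), the literal is correct and any remaining problem lies in how the character is displayed or encoded.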
When run:
Red heart emoji
If you really want a red heart, Unicode does define an emoji.
But accessing this emoji requires two code points. Unicode 1.1 in 1993 defined “Heavy Black Heart” at decimal 10,084 (U+2764). Emoji 1.0, in 2015, added a definition for Red Heart by combining `HEAVY BLACK HEART` with `VARIATION SELECTOR-16` at decimal 65,039 (U+FE0F).
See the `red heart` row of the Full Emoji List at the Unicode Consortium web site. However, that row appears to me to be incorrect in that it fails to mention the required `U+FE0F` code point.
Full example code:
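A minimal sketch of the two-code-point sequence described above; the class, constant, and method names are illustrative assumptions, not part of any library:

```java
import java.util.Arrays;

public class RedHeart {

    // HEAVY BLACK HEART followed by VARIATION SELECTOR-16.
    public static final String RED_HEART = "\u2764\uFE0F";

    // Lists the code points that make up the text.
    public static int[] codePointsOf(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        System.out.println(RED_HEART);
        System.out.println("code points: " + Arrays.toString(codePointsOf(RED_HEART)));
        // Print each code point with the name Java's Unicode data reports for it.
        for (int cp : codePointsOf(RED_HEART)) {
            System.out.printf("U+%04X  %s%n", cp, Character.getName(cp));
        }
    }
}
```

Whether the heart is actually drawn in red depends on the terminal or editor having an emoji-capable font.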
When run:
A PUA has no officially assigned characters
By definition, a Private Use Area (PUA) has no characters assigned by the Unicode Consortium. All the code point numbers in that range are promised by the Unicode Consortium to never be officially assigned any character.
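Java can report whether a given code point falls in a Private Use Area. The sketch below uses `Character.getType` and `Character.UnicodeBlock.of` from the standard library; the class name and the sample values are chosen only for illustration:

```java
public class PrivateUseCheck {

    // True when Java's Unicode data classifies the code point as private use
    // (general category Co), which covers all three Private Use Areas.
    public static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    public static void main(String[] args) {
        // One BMP private-use code point, one ordinary symbol,
        // and two code points from the supplementary private-use planes.
        int[] samples = {0xE022, 0x2764, 0xF0000, 0x10FFFD};
        for (int cp : samples) {
            System.out.printf("U+%04X  private use: %b  block: %s%n",
                    cp, isPrivateUse(cp), Character.UnicodeBlock.of(cp));
        }
    }
}
```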
This leaves all of us free to create fonts that assign any glyph we want to any of those code points.
You may want to create a font with a red heart cartoon at code point E022. Meanwhile, I may choose to make a font that has a drawing of a cockatiel at that same code point. And some guy named Bob creates his own font with a picture of a Microlino car at E022. All of us, you, me, and Bob, are happy knowing that our custom fonts will never be stomped on by a future officially sanctioned character at that code point.
If Alice likes your red heart, and wants to use it, she needs to obtain a copy of your font. She needs to install that font on her computer. And she needs to either:
If Alice has installed no fonts at all with a glyph at E022, then the operating system of her computer will fall back to displaying some kind of substitute glyph such as an empty box or question mark or nothing to indicate the lack of a glyph.
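A missing glyph is not the only way to end up with a question mark. As an assumption about the asker's setup (the original code was not shown), the character can also be replaced by a literal `?` if the terminal or the text file is written with a charset that cannot represent U+E022. A minimal sketch, with a made-up class name, that encodes the same string with US-ASCII and with UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Encodes the text with the given charset and renders the bytes as hex.
    public static String hexBytes(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String pua = "\uE022";
        // US-ASCII cannot represent the character, so Java substitutes '?' (0x3F).
        System.out.println("US-ASCII: " + hexBytes(pua, StandardCharsets.US_ASCII));
        // UTF-8 can represent every code point, so the character survives.
        System.out.println("UTF-8:    " + hexBytes(pua, StandardCharsets.UTF_8));
        // The charset Java uses when none is specified explicitly.
        System.out.println("default:  " + Charset.defaultCharset());
    }
}
```

If the file contains the UTF-8 bytes and the character still shows as a box or question mark, the remaining cause is the font, as described above.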
The three PUAs defined in Unicode have turned out to be rather popular. People use them to create fonts for characters that do not meet the requirements of the Unicode Consortium and so will never be considered for future inclusion in Unicode. Examples include fictional languages such as Klingon in Star Trek or elves’ language from novels.
This popularity has prompted volunteers outside the Unicode Consortium to devise a public registry of the PUA code points, in an attempt to avoid conflicts among various fonts over particular code points.