Imagine a string consisting of the single ASCII character i (U+0069). Turkish and related writing systems also have ı (U+0131). Can Unicode normalization split U+0069 (i) into U+0131 U+0307 (ı̇)? Is normalization locale-dependent, so that the result might vary with the environment?
Is ASCII-only Unicode string always normalized?
Asked by Netch
The normalization forms defined by Unicode are not locale-specific; they have no input other than the sequence of code points to be normalized.
The Unicode website has a user-friendly chart of all characters which differ between the standardized normalization forms.
Unfortunately, it is grouped by script, not by block, so we can't quickly check all the characters in the "Basic Latin" block (which matches the 128 characters of ASCII).
Searching for "0069" specifically, we see that it appears as the result of normalizing certain other code points, either as part of a "decomposition" in NFD, or as a compatibility replacement in forms NFKC and NFKD. However, it doesn't appear in the input column, because it doesn't change when converted to any of the normalization forms.
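To make that concrete, here is a small sketch using Python's standard `unicodedata` module. The two sample inputs, U+00ED (í) and U+2170 (ⅰ, small roman numeral one), are my own picks for illustration and are not taken from the chart discussion above; the results reflect the Unicode database bundled with whichever Python build runs it.

```python
import unicodedata

# U+00ED (i with acute) canonically decomposes to U+0069 U+0301 under NFD,
# so U+0069 shows up as an *output* of normalization.
decomposed = unicodedata.normalize("NFD", "\u00ed")

# U+2170 (small roman numeral one) has a compatibility mapping to U+0069,
# so U+0069 is also an output of NFKC/NFKD.
compat = unicodedata.normalize("NFKC", "\u2170")

# U+0069 as an *input* comes back unchanged from every form.
unchanged = {
    form: unicodedata.normalize(form, "i")
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

print([f"U+{ord(c):04X}" for c in decomposed])
print([f"U+{ord(c):04X}" for c in compat])
print(unchanged)
```

Note that `unicodedata.normalize` takes only the form name and the string; there is no locale argument to pass.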
I have not checked the other Basic Latin characters, but would be extremely surprised if any of them normalized to anything other than themselves. So to answer your original question: yes, I believe a string that only uses code points U+0000 to U+007F (the code points inherited from the 7-bit ASCII standard) will not change under any of the normalization forms defined by Unicode.
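The remaining Basic Latin characters can be checked mechanically rather than by searching the chart. The following is a minimal sketch, assuming Python's `unicodedata` module is a faithful implementation of the four forms; the helper name `changed_ascii_code_points` is hypothetical. It agrees with Unicode Standard Annex #15, which states that text consisting only of ASCII characters is left unaffected by all of the normalization forms.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def changed_ascii_code_points():
    """Return (form, code point) pairs where normalization alters an ASCII character."""
    return [
        (form, cp)
        for form in FORMS
        for cp in range(0x80)  # U+0000 through U+007F
        if unicodedata.normalize(form, chr(cp)) != chr(cp)
    ]


changed = changed_ascii_code_points()

# Also normalize all 128 characters as one string, to confirm that
# adjacent ASCII characters are not combined with each other.
all_ascii = "".join(chr(cp) for cp in range(0x80))
whole_string_stable = all(
    unicodedata.normalize(form, all_ascii) == all_ascii for form in FORMS
)

print(changed)              # an empty list means no ASCII character changed
print(whole_string_stable)
```

An empty list and `True` mean that, for the Unicode version shipped with that Python build (see `unicodedata.unidata_version`), an ASCII-only string is already in NFC, NFD, NFKC and NFKD.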