According to Wikipedia, in 2017 using an uppercase ẞ (Unicode U+1E9E) was officially adopted--at least as an option--for what may in fact be a subset of fully-capitalized words in German:
In June of that year, the Council for German Orthography officially adopted a rule that ⟨ẞ⟩ would be an option for capitalizing ⟨ß⟩ besides the previous capitalization as ⟨SS⟩ (i.e., variants STRASSE and STRAẞE would be accepted as equally valid).
It seems like this addition to the German language would greatly simplify case-insensitive comparisons between strings (so-called "case-folding" or "fold-case" comparisons). Note that I started this inquiry trying to understand the implementation in Raku (a.k.a. Perl 6), but the question seems to generalize to other programming languages. Here is Raku's default behavior, starting with 13 words from rfdr_Regeln_2017.pdf that have been lowercased (via Raku's .lc method):
~$ cat TO_ẞ_OR_NOT_TO_ẞ.txt
maß straße grieß spieß groß grüßen außen außer draußen strauß beißen fleiß heißen
~$ raku -ne '.words>>.match(/^ <:Ll>+ $/).say;' TO_ẞ_OR_NOT_TO_ẞ.txt
(「maß」 「straße」 「grieß」 「spieß」 「groß」 「grüßen」 「außen」 「außer」 「draußen」 「strauß」 「beißen」 「fleiß」 「heißen」)
~$ raku -ne '.uc.say;' TO_ẞ_OR_NOT_TO_ẞ.txt
MASS STRASSE GRIESS SPIESS GROSS GRÜSSEN AUSSEN AUSSER DRAUSSEN STRAUSS BEISSEN FLEISS HEISSEN
~$ raku -ne '.fc.say;' TO_ẞ_OR_NOT_TO_ẞ.txt
mass strasse griess spiess gross grüssen aussen ausser draussen strauss beissen fleiss heissen
I'm surprised that Raku's fc fold-case implementation essentially converts to lowercase ss. It's no surprise, then, that testing eq string equality between the upper/lower "round-tripped" words and the original lowercase words returns False every time:
~$ raku -ne 'for .words {print $_.uc.lc eq $_.lc }; "".put;' TO_ẞ_OR_NOT_TO_ẞ.txt
FalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalseFalse
Fold-cased (.fc) words match, but they do so on the basis of ss characters, not ß:
~$ raku -ne 'for .words {print $_.uc.lc eq $_.fc }; "".put;' TO_ẞ_OR_NOT_TO_ẞ.txt
TrueTrueTrueTrueTrueTrueTrueTrueTrueTrueTrueTrueTrue
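For comparison, the same two checks can be reproduced outside Raku. The sketch below uses Python 3, which is an assumption made for illustration (the question itself only shows Raku); its built-in str.upper, str.lower and str.casefold follow the Unicode default case mappings. The names words and round_trip are hypothetical.

```python
# The 13 lowercase words from TO_ẞ_OR_NOT_TO_ẞ.txt.
words = ("maß straße grieß spieß groß grüßen außen außer "
         "draußen strauß beißen fleiß heißen").split()


def round_trip(word: str) -> str:
    """Uppercase, then lowercase again (ß becomes SS, then ss)."""
    return word.upper().lower()


# Round-tripped words are compared with the plain lowercase form ...
round_trip_matches = [round_trip(w) == w.lower() for w in words]
# ... and with the case-folded form, which also spells ß as "ss".
fold_matches = [round_trip(w) == w.casefold() for w in words]

print(round_trip_matches)
print(fold_matches)
```

As in the Raku transcript, the first comparison fails for every word and the second succeeds for every word, because both the round trip and the fold end up at ss.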
Starting from a capital-ẞ, taking just one capitalized/uppercase word again demonstrates the dichotomy:
~$ echo "straße STRASSE STRAẞE" | raku -ne ' .put for .words;'
straße
STRASSE
STRAẞE
~$ echo "straße STRASSE STRAẞE" | raku -ne ' .lc.say for .words;'
straße
strasse
straße
~$ echo "straße STRASSE STRAẞE" | raku -ne ' for .words { say $_.lc eq "straße" };'
True
False
True
~$ echo "straße STRASSE STRAẞE" | raku -ne ' for .words { say $_.lc eq $_.fc };'
False
True
False
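The three spellings can be put side by side in another language to see whether the dichotomy is specific to Raku. This is a minimal sketch in Python 3 (an assumption for illustration, not part of the original inquiry); same_ignoring_case is a hypothetical helper, not a library function.

```python
variants = ["straße", "STRASSE", "STRAẞE"]

# Lowercasing: capital ẞ maps back to ß, but SS can only become ss.
lowered = [v.lower() for v in variants]
# Case folding: all three spellings collapse to the same "ss" form.
folded = [v.casefold() for v in variants]

print(lowered)
print(folded)


def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison via full case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


print(same_ignoring_case("STRASSE", "STRAẞE"))
```

Comparing folded strings treats all three variants as equal, at the cost of losing the distinction between ß and ss.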
Have any programming languages instituted a foldcase conversion between lowercase ß <--> uppercase ẞ, by default? What programming languages have added lowercase ß <--> uppercase ẞ conversion, as an option (or via a library)? Many Questions/Answers on StackOverflow pre-date the 2017 decision, so I'm looking for up-to-date answers.
[ADDENDUM: I note via this FAQ that the Unicode Consortium's rules appear to be at odds with the 2017 decision of the Council for German Orthography].
1. Lowercase/Uppercase:
In Raku, the default conversion from lowercase German ß is to uppercase SS, but this can be overcome (as shown below).

The Unicode Consortium has a special FAQ on these letters in the German language. However, if one wants to work around this first uc uppercasing issue using Raku, the "ß" => "ẞ" characters can be appropriately translated (with trans) prior to calling the bog-standard uc uppercase method/function.

This works to uppercase text with ẞ instead of SS, and in true Raku/Perl spirit, there's more than one way to do it (TMTOWTDI).

2. Foldcase:
The Unicode Consortium promulgates a rule that foldcase pairs should be stable (according to the Unicode Casefolding Stability Policy).
As for fc foldcase stability, I had hoped that prior conversion of "ß" => "ẞ" would provide a "30th uppercase character" that would act as a bicameral foldcase partner of lowercase ß (in a pair). That seems promising in that, starting with a small sample of mixed-case text, you can "round-trip" from uppercase to lowercase and still have output text matching lowercase.

However, fc foldcase shows that the present course of action is to take an uppercase ẞ and convert it to lowercase ss (not to lowercase ß). Essentially, .fc foldcase converts uppercase ẞ or SS to lowercase ss, regardless.

Changes anticipated? According to a 2017 StackOverflow post, "Just wait half a century."
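The uppercasing workaround and the fold-case behavior described in this answer can also be sketched in another language for comparison. The example below is Python 3, not Raku, and is only an illustration under that assumption; upper_with_capital_eszett and sample are hypothetical names.

```python
def upper_with_capital_eszett(text: str) -> str:
    """Uppercase text, mapping ß to ẞ (U+1E9E) instead of SS (hypothetical helper)."""
    # Replace ß first; the default uppercase mapping would expand it to "SS".
    return text.replace("ß", "ẞ").upper()


sample = "Straße Maß groß"
shouted = upper_with_capital_eszett(sample)

print(shouted)
# Round trip: lowercasing the ẞ form gives back the ß spelling.
print(shouted.lower() == sample.lower())
# Case folding still maps ẞ to "ss", not to ß.
print(shouted.casefold())
```

The pattern matches what the answer reports for Raku: translating before uppercasing preserves a one-to-one ß/ẞ pairing for upper/lower conversion, while case folding goes to ss regardless.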
https://raku.org