I'm searching for a safe method to escape all non-ASCII characters in a QString (and of course to un-escape them later) that will result in pure (printable) ASCII but yield the shortest possible string.
What I currently do is:
QByteArray excludes = QStringLiteral(" !\"#$&'()*+,-./:;<=>?@[]_{|}~").toUtf8();
auto escaped = QString::fromUtf8(someString.toUtf8().toPercentEncoding(excludes));
auto unEscaped = QString::fromUtf8(QByteArray::fromPercentEncoding(escaped.toUtf8()));
This is reliable and works perfectly in both directions. The problem is that the result is quite long. Every UTF-8 byte becomes three chars (%XX), and a non-ASCII character occupies at least two bytes, so an escaped character takes at least 6 chars:
E.g. ê is encoded as %C3%AA, and 😊 would become %F0%9F%98%8A.
I tried to find a way to make this shorter. E.g. the shortest (unpadded) Base64 representation of C3AA would be w6o, and for F09F988A it would be 8J+Yig. That is half the length of the percent-encoded version. But a decoder can't know how long an escaped sequence is, so I would need to add a start and an end character, like &w6o; or similar. Then, for a two-byte character like ê, only one single char would be saved.
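To make the length comparison concrete, here is a minimal sketch in plain C++ (no Qt) that works on the raw UTF-8 bytes, as one would get them from someString.toUtf8(). The helper names percentEncode, base64NoPad and delimitedBase64 are made up for illustration, and the & ... ; delimiters are just the ones from my idea above. It encodes a single character's bytes only; a complete escaper would also have to escape a literal & in the input, which is not shown here.

```cpp
#include <cstdint>
#include <string>

// Percent-encode every byte as %XX (uppercase hex), for comparison.
std::string percentEncode(const std::string& bytes) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : bytes) {
        out += '%';
        out += kHex[c >> 4];
        out += kHex[c & 0x0F];
    }
    return out;
}

// Base64 with the standard alphabet, but without '=' padding.
std::string base64NoPad(const std::string& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::uint32_t buffer = 0;  // bit accumulator
    int bits = 0;              // number of not-yet-emitted bits in buffer
    for (unsigned char c : bytes) {
        buffer = (buffer << 8) | c;
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out += kAlphabet[(buffer >> bits) & 0x3F];
        }
    }
    if (bits > 0) {
        // Left-align the remaining bits in a final 6-bit group.
        out += kAlphabet[(buffer << (6 - bits)) & 0x3F];
    }
    return out;
}

// Hypothetical escape form: '&' + unpadded Base64 of the UTF-8 bytes + ';'.
std::string delimitedBase64(const std::string& bytes) {
    return "&" + base64NoPad(bytes) + ";";
}
```

With these helpers, the bytes C3 AA (ê) give %C3%AA (6 chars) versus &w6o; (5 chars), and F0 9F 98 8A gives %F0%9F%98%8A (12 chars) versus &8J+Yig; (8 chars). So the delimiters eat most of the gain for two-byte characters, and the saving only becomes noticeable for longer sequences.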
So: Is there a better (meaning shorter) way than percent encoding to reliably escape and unescape all non-ASCII characters in a QString (using C++ obviously ;-)?