Get the printed length of a string in terminal


It seems like a fairly simple task, yet I can't find a fast and reliable solution to it.

I have strings in bash, and I want to know the number of character cells they will occupy when printed on the terminal. The reason I need this is to nicely align the strings in three columns of n characters each. For that, I need to add as many spaces as necessary to make sure the second and third columns always start at the same location in the terminal (as sketched below).
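
For illustration, this is roughly the padding I want to do, with display_width standing in for the hypothetical function I'm looking for:

col=20
pad_cell() {
  local s=$1 w
  w=$(display_width "$s")                               # hypothetical: printed width of $s
  printf '%s%s' "$s" "$(printf "%$((col - w))s" '')"    # pad with spaces up to $col cells
}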

Example of problematic string length:

v='féé'

echo "${#v}"
 > # 5 (should be 3)

printf '%s' "${v}" | wc -m
 > # 5 (should be 3)

printf '%s' "${v}" | awk '{print length}'
 > # 5 (should be 3)

The best I have found is this, which works most of the time.

echo "${v}" | python3 -c 'v=input();print(len(v))'
 > # 3 (yeah!)

But sometimes I have characters that are modified by combining sequences. I can't copy/paste them here, but this is what it looks like:

v="de\314\201tresse"
echo "${v}"
 > # détresse
echo "${v}" | python3 -c 'v=input();print(len(v))'
 > # 9 (should be 8)

I know it can get even more complicated with \r characters or ANSI sequences, but I only have to deal with "regular" strings of the kind commonly found in filenames, documents and other file content written by humans. Since the string IS printed in the terminal, I guess there must be some engine that knows, or can know, the printed length of the string.

I have also considered sending an ANSI sequence to get the position of the cursor in the terminal before and after printing the string, and using the difference to compute the length, but it looks like a rabbit hole I don't want to dig into. Plus it would be very slow.
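
For reference, this is roughly what that rabbit hole would look like (a minimal sketch, assuming a VT100-compatible terminal that answers the DSR query ESC[6n with ESC[row;colR; it needs a real TTY, breaks if the string wraps, and is slow):

printed_width() {
  local s=$1 start end
  printf '\e[6n' > /dev/tty
  IFS='[;' read -rsd R _ _ start < /dev/tty    # column before printing
  printf '%s' "$s" > /dev/tty
  printf '\e[6n' > /dev/tty
  IFS='[;' read -rsd R _ _ end < /dev/tty      # column after printing
  printf '\r\e[K' > /dev/tty                   # erase the test output
  echo $(( end - start ))
}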

4 Answers

Accepted answer, by Lucas Moura Gomes

How about

v='féé'
echo "${v}" | python3 -c 'import unicodedata as ud;v=input();print(len(ud.normalize("NFC",v)))'

Note that unicodedata is part of the Python standard library, so there is nothing to install. If you need a more up-to-date Unicode database than the one bundled with your Python, there is the unicodedata2 package:

pip install unicodedata2

Additional Notes

This normalizes the string to Unicode Normalization Form C (NFC), explained here. If you are working with Latin-script text, it should work fine. For text converted from pre-Unicode (ANSI) encodings of languages such as Arabic, Greek, Hebrew, Russian or Thai, NFC may keep the original compatibility characters; although NFC is generally the more advisable form, you could try NFKC in those cases. The reason for preferring NFC is to avoid normalizing symbols that are compatibility-equivalent but not canonically equivalent, for example the single ligature character ﬀ (U+FB00): normalized with NFC it stays length 1, but normalized with NFKC it becomes "ff", length 2. Depending on your application that can create issues, but if you just want readable text, then NFKC is fine.
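
A quick way to check that NFC vs NFKC difference (U+FB00 is the ﬀ ligature):

python3 -c '
import unicodedata as ud
s = "\ufb00"                         # the single ligature character ﬀ
print(len(ud.normalize("NFC", s)))   # 1: ligature kept
print(len(ud.normalize("NFKC", s)))  # 2: decomposed to "ff"
'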

Answer by Gilles Quénot

With Perl:

Without modules:

perl -CSAD -E 'say length($ARGV[0])' été
3

With utf8::all module:

perl -Mutf8::all -E 'say length($ARGV[0])' été
3
Answer by Paolo

Using grep and wc:

$ v="de\314\201tresse"
$ printf "%s" "$v" | grep -o '[a-z]' | wc -l
8

$ v='féé'
$ printf "%s" "$v" | grep -o '[a-z]' | wc -l
3
Answer by Andj

To get the number of terminal cells used by a string, you can use wcswidth. The wcwidth Python package provides implementations of both wcwidth and wcswidth.

With Python, install wcwidth:

pip install wcwidth

And the Python code would be:

from wcwidth import wcswidth
v = 'féé'
print(wcswidth(v))
# 3

It will also yield the correct result for the NFD (decomposed) form:

import unicodedata as ud
v = ud.normalize("NFD", v)
print(wcswidth(v))
# 3

Additionally it will correctly handle wide characters, i.e. characters that take up 2 terminal cells per character:

v='中文'
print(wcswidth(v))
# 4

And adapting Lucas's solution above for use from the shell:

v='féé'
echo "${v}" | python3 -c 'from wcwidth import wcswidth;v=input();print(wcswidth(v))'