Shell: how to pass Georgian (UTF-8) character keys to a specific field on a website with lynx

lynx -accept_all_cookies $URL -cmd_script=bar.txt

bar.txt:

key <tab> //get to first field
key <tab> //get to second field
key ძ     //UTF-8 input ძ
key ე     //UTF-8 input ე
key ბ     //UTF-8 input ბ
key ნ     //UTF-8 input ნ
key ა     //UTF-8 input ა
key <tab> //get to third field
key <tab> //get to fourth field
key <tab> //get to submit button
key ^J    //click submit and wait for load
key <tab> //get to hyperlink
key ^J    //click hyperlink and wait for load
key Q     //exit
key y     //confirm exit

The above attempt, adapted from this SO question, works fine for ASCII characters but not for Georgian input characters.

Any suggestions?

There are 2 best solutions below

1
rici

The characters you want to send are multibyte UTF-8 sequences, and the lynx command script key command only sends a single 8-bit byte. So you have to break the characters into individual bytes and send each byte separately.
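
To see the multibyte encoding for yourself, a quick check along these lines may help. It is only a sketch: it assumes the script file and terminal are UTF-8, and the variable name ch is made up for the illustration.

```sh
ch='ძ'                            # one Georgian letter (U+10EB)
printf '%s' "$ch" | wc -c         # counts bytes, not characters: 3
printf '%s' "$ch" | od -An -tx1   # the three bytes in hex: e1 83 ab
```

Each of those three bytes has to reach lynx as its own key line.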

That's a little annoying; the simplest way to do it might be to use the -cmd_log option to create a log file while you type the characters you want to send.

However, you can do this with bash:

$ georgian=ძებნა
$ (export LC_ALL=C; printf "key 0x%2x\n" $(sed 's/./"& /g' <<<"$georgian"); )
key 0xe1
key 0x83
key 0xab
key 0xe1
key 0x83
key 0x94
key 0xe1
key 0x83
key 0x91
key 0xe1
key 0x83
key 0x9c
key 0xe1
key 0x83
key 0x90

That bash command is probably a little obscure :-). Here's a quick breakdown:

(                         # Start a subshell to isolate environment change
 export LC_ALL=C;         # In the subshell, set the locale to plain ASCII (byte-oriented)
 printf "key 0x%2x\n"     # Apply this format to each argument
        $(                # Substitute the result of running the following
          sed             # Editor command
              's/./"& /g' # Change each byte to <"><byte><space>
              <<<"$georgian"  # Fabricate an input stream for sed
                              # using the value of shell variable $georgian
        );                # End of command substitution
)                         # End of subshell

Unlike C printf (and others), the shell printf repeats its format until all arguments have been handled. As another idiosyncrasy, if the conversion is numeric (such as %x, which formats its argument in hexadecimal) and the corresponding argument starts with a ", the number used for the argument is the character code of the second character in the argument (i.e. the one after the ").
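
Both printf behaviours can be tried in isolation. The sketch below uses only ASCII arguments, so the result should not depend on the locale; the sample arguments are made up for the demonstration.

```sh
# The format is reused until every argument has been consumed.
printf 'key %s\n' a b c
# key a
# key b
# key c

# A leading " (or ') makes a numeric conversion use the code of the next character.
printf '0x%x\n' '"A' "'z"
# 0x41
# 0x7a
```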

The sed command turns a byte sequence into a series of arguments with exactly this form; it's important not to quote the command substitution, so that the individual substitutions are treated as separate arguments. That means the command won't work if $georgian contains any shell metacharacter, but since all shell metacharacters are plain ASCII characters, there won't be any problem provided that every character in $georgian is, in fact, Georgian. (But don't put spaces in the string. They'll get silently dropped.)

0
Archil Elizbarashvili

Here is my alternative for breaking a multi-byte character into individual bytes:

$ georgian=ძებნა
$ echo -n "$georgian" | od -w1 -An -t xC | sed 's|^ |key 0x|'
key 0xe1
key 0x83
key 0xab
key 0xe1
key 0x83
key 0x94
key 0xe1
key 0x83
key 0x91
key 0xe1
key 0x83
key 0x9c
key 0xe1
key 0x83
key 0x90
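
Putting the pieces together, a whole command script could be generated rather than typed by hand. The following is a minimal sketch, not a tested recipe: keys_for is a hypothetical helper name, the sequence of tab keys is copied from the question's bar.txt, and it uses only the POSIX od options (-An -v -tx1) instead of the GNU-specific -w1. Because the text travels through a pipe as a single quoted argument, spaces and shell metacharacters in it are converted like any other byte.

```sh
#!/bin/sh
# keys_for is a hypothetical helper name, not part of lynx.
# It prints one "key 0xNN" line per byte of its first argument.
keys_for() {
    printf '%s' "$1" |
        od -An -v -tx1 |                  # hex dump, no offsets, one byte per field
        tr -s ' ' '\n' |                  # put each field on its own line
        sed -n 's/^\(..\)$/key 0x\1/p'    # keep only the two-digit fields
}

# Assemble a command script shaped like bar.txt from the question.
{
    printf 'key <tab>\nkey <tab>\n'                      # move to the second field
    keys_for 'ძებნა'                                      # type the text byte by byte
    printf 'key <tab>\nkey <tab>\nkey <tab>\nkey ^J\n'   # reach the submit button and press it
} > bar.txt
```

The resulting file can then be passed to lynx with -cmd_script=bar.txt as in the question.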