I have the following CSV file: Hèllo,"ab - привет" (echo 'SMOobGxvLCJhYiAtINC/0YDQuNCy0LXRgiINCg==' | base64 --decode > bug.csv)
When I open it in Excel on Windows via double-click, it shows the following: Hèllo ab - привет (but the values are correctly placed in two columns).
If I run iconv -f utf-8 -t utf-16le bug.csv > bug_utf16le.csv and double-click the result, it shows the following: Hèllo ab - ?@825B, i.e. it correctly decoded Hèllo, but not the rest (base64 of bug_utf16le.csv: SADoAGwAbABvACwAIgBhAGIAIAAtACAAPwRABDgEMgQ1BEIEIgANAAoA).
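The ?@825B part would be consistent with Excel treating the BOM-less file as a single-byte (ANSI) encoding rather than UTF-16LE; that is an assumption, not something Excel reports. Under that reading, the low byte of each Cyrillic code unit happens to be printable ASCII, while the 0x04 high bytes and the 0x00 bytes are invisible. It would also explain why Hèllo looks right: è is U+00E8, and the byte E8 is è in Windows-1252. A minimal sketch that reproduces the string (assuming the snippet itself is saved as UTF-8):

```shell
# UTF-16LE stores п (U+043F) as the bytes 3F 04, р (U+0440) as 40 04, and so on.
# Deleting the 0x00 and 0x04 bytes leaves only the low bytes, read here as ASCII.
low_bytes=$(printf '%s' 'привет' | iconv -f UTF-8 -t UTF-16LE | tr -d '\000\004')
printf '%s\n' "$low_bytes"
```

This should print ?@825B, matching what Excel displayed.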
If I run iconv -f utf-8 -t utf-16 bug.csv > bug_utf16.csv, Excel correctly shows Hèllo,"ab - привет", but it does not recognize that it should be two columns (base64 of bug_utf16.csv: //5IAOgAbABsAG8ALAAiAGEAYgAgAC0AIAA/BEAEOAQyBDUEQgQiAA0ACgA=). bug_utf16.csv is exactly the same as bug_utf16le.csv, except that it has a BOM (FF FE) as its first two bytes.
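The claim that the two files differ only by the BOM can be checked from the two base64 dumps above, without rerunning iconv. This is a small sketch; the scratch directory and the variable names bom and same_after_bom are made up for illustration.

```shell
# Rebuild both files from the base64 dumps in a scratch directory.
tmp=$(mktemp -d)
echo 'SADoAGwAbABvACwAIgBhAGIAIAAtACAAPwRABDgEMgQ1BEIEIgANAAoA' \
  | base64 --decode > "$tmp/bug_utf16le.csv"
echo '//5IAOgAbABsAG8ALAAiAGEAYgAgAC0AIAA/BEAEOAQyBDUEQgQiAA0ACgA=' \
  | base64 --decode > "$tmp/bug_utf16.csv"

# First two bytes of the UTF-16 file (the BOM), as lowercase hex.
bom=$(head -c 2 "$tmp/bug_utf16.csv" | od -An -tx1 | tr -d ' \n')
echo "BOM: $bom"

# Skip the two BOM bytes and compare the remainder with the BOM-less file.
if tail -c +3 "$tmp/bug_utf16.csv" | cmp -s - "$tmp/bug_utf16le.csv"; then
  same_after_bom=yes
else
  same_after_bom=no
fi
echo "identical after BOM: $same_after_bom"
```

The BOM value fffe is the little-endian byte order mark, so the only thing Excel gains from bug_utf16.csv is the encoding hint.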
Is there a way to transcode a UTF-8 CSV file so that Excel can open it, recognize the columns correctly (while keeping , as the separator), and show all the French / Cyrillic text correctly?
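One candidate that leaves the separator alone is to keep the file in UTF-8 and prepend a UTF-8 byte order mark (EF BB BF), which Excel is commonly reported to use as an encoding hint. This is only a sketch and has not been checked against the Excel setup described here; the output name bug_utf8_bom.csv is made up for illustration, and whether , is accepted as the column separator may still depend on the Windows regional list-separator setting.

```shell
# Recreate the sample file in a scratch directory.
tmp=$(mktemp -d)
echo 'SMOobGxvLCJhYiAtINC/0YDQuNCy0LXRgiINCg==' | base64 --decode > "$tmp/bug.csv"

# Write the three BOM bytes EF BB BF (octal 357 273 277), then the unchanged content.
{ printf '\357\273\277'; cat "$tmp/bug.csv"; } > "$tmp/bug_utf8_bom.csv"

# Sanity check: show the first three bytes of the new file as hex.
bom_hex=$(head -c 3 "$tmp/bug_utf8_bom.csv" | od -An -tx1 | tr -d ' \n')
echo "first bytes: $bom_hex"
```

Everything after the first three bytes is byte-for-byte the original file, so quoting and separators are untouched.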
I found a workaround: sed s@","@"\t"@ bug.csv | iconv -f utf-8 -t utf-16 > bug_utf16_tab.csv, but I'd much prefer not to mess with replacing the separator (it's brittle and may break on various forms of quote escaping).
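To illustrate the brittleness: because the sed expression is not wrapped in single quotes, the shell removes the inner double quotes before sed sees it, so the command effectively replaces the first comma on each line. The sketch below uses a made-up row, "a,b",c, whose first comma sits inside a quoted field; the \t-as-tab behaviour is that of GNU sed and may differ in other implementations.

```shell
# After shell quote removal the expression is s@,@\t@ (no double quotes left).
sed_expr=s@","@"\t"@
printf '%s\n' "$sed_expr"

# Hypothetical row: the first comma is data inside a quoted field, not a separator.
broken=$(printf '%s\n' '"a,b",c' | sed "$sed_expr")
printf '%s\n' "$broken"
```

The quoted field gets split and the real separator before c stays a comma, which is exactly the kind of breakage a separator rewrite risks.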
Thanks!
$ iconv --version
iconv (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Ulrich Drepper.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy