I am trying to unzip a huge zip file split into several parts. I am on a MacBook laptop and I am using:
>> unzip '*.zip' -d <unzip_path>
Everything works well, but during the unzipping process some of the files report:
illegal byte sequence
And they are not extracted.
I am well aware that this is caused by unusual characters, such as accented letters (á), in the names of some of the files inside some of the .zip parts.
I would like to know how to solve this and still be able to extract the problematic files.
Going through the different zip parts and renaming the files by hand is not an option, since there are so many files with illegal characters.
Without seeing the zip file (is it publicly available?) I'm guessing at the issue, but in your case I suspect the problem is as follows: the file names inside the zip file were stored in an encoding other than UTF-8.
To unzip the files and get the charset right, you need to convert the names from whatever encoding was used in the zip file to UTF-8.
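To make the encoding issue concrete, here is a small sketch using the standard iconv tool (present on both macOS and Linux). It assumes a name stored in ISO-8859-1, where the byte 0xE1 (octal 341) means "á". On its own that byte is not valid UTF-8, which is the kind of input the Mac rejects; converting from the legacy charset produces a valid UTF-8 name. The function name to_utf8 is hypothetical.

```shell
#!/bin/sh
# Sketch: the byte 0xE1 (octal 341) is "á" in ISO-8859-1,
# but it is not a valid UTF-8 sequence on its own.

# Hypothetical helper: re-encode a raw name from charset $1 to UTF-8.
# $1 = source charset (an assumption: you must know or guess it)
# $2 = raw file name bytes
to_utf8() {
  printf '%s' "$2" | iconv -f "$1" -t UTF-8
}

# "café.txt" as it would be stored by a tool using ISO-8859-1
raw_name=$(printf 'caf\341.txt')

# Reading the raw bytes as UTF-8 fails (iconv exits non-zero)
if ! printf '%s' "$raw_name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  echo "raw name is not valid UTF-8"
fi

# Converting from the legacy charset yields a valid UTF-8 name
to_utf8 ISO-8859-1 "$raw_name"
echo
```

The hard part in practice is knowing the source charset; ISO-8859-1 here is only an example.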
Some newer versions of unzip have a -I option that will do this for you. On my Ubuntu setup, the help text from unzip includes a line for -I CHARSET. If you do have this option available, you just pass it the encoding used in the zip file, for example -I ISO-8859-7 (replacing ISO-8859-7 with whatever encoding the zip file actually uses).
If your unzip is too old, an alternative is 7z: it has a command-line option, -scs, that allows you to specify the charset used in the filenames.
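Since the unzip bundled with macOS may not have -I, a script can check for it before relying on it. The sketch below assumes, as on the Ubuntu build described above, that a build supporting the option lists "-I CHARSET" in the output of unzip -h. The function names are hypothetical, and ISO-8859-7 is only a placeholder for the archive's real encoding.

```shell
#!/bin/sh
# Hypothetical helper: use unzip's -I option only if this build has it.
# Assumption: supporting builds list "-I CHARSET" in `unzip -h`.

# Succeeds if the help text read from stdin mentions -I CHARSET.
has_charset_option() {
  grep -q -- '-I CHARSET'
}

# $1 = charset of the names inside the archive, $2 = destination dir.
extract_with_charset() {
  if unzip -h 2>&1 | has_charset_option; then
    # Same invocation as in the question, plus the charset option
    unzip -I "$1" '*.zip' -d "$2"
  else
    echo "This unzip has no -I option; try a newer unzip or 7z." >&2
    return 1
  fi
}

# Example usage (placeholder charset and directory):
# extract_with_charset ISO-8859-7 ./unzipped
```

Parsing help text is fragile, so treat this check as a convenience rather than a reliable feature test.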