I am trying to decompress a WARC ZST file that I downloaded from here: https://archive.org/details/archiveteam_yahooanswers_20210422220546_c4fac540
I tried the command zstd -d yahooanswers_20210422220546_c4fac540.1619026173.megawarc.warc.zst but I got this error:
73.megawarc.warc.zst : 0 MB... 73.megawarc.warc.zst : Decoding error (36) : Dictionary mismatch
How can I find that dictionary, or is there an alternative way to decompress the file?
The dictionary can be found inside the first skippable frame of the warc.
OrIdow6 wrote this script to extract it: https://transfer.notkiska.pw/inline/TXlRo/xtract.py
You'll need python3, zstd, and the zstandard Python package.
python ./xtract.py /path/to/megawarc.warc.zst > dict
Then you can run:
zstd -d /path/to/megawarc.warc.zst -D dict
And you should be able to view the megawarc with your standard WARC viewing tools.
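In case the linked script is no longer reachable, here is a rough sketch of what the extraction step involves. It is not OrIdow6's script, just an illustration based on the zstd frame format: a skippable frame starts with a little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, followed by a 4-byte little-endian payload size. The function names are made up for this example, and the check for a zstd-compressed dictionary is an assumption about how these files are written, so treat it as a starting point rather than a tested tool.

```python
import struct

# Skippable frames use magic numbers 0x184D2A50..0x184D2A5F (little-endian),
# followed by a 4-byte little-endian payload size.
SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"  # magic bytes of a regular zstd frame


def read_first_skippable_frame(stream):
    """Return the payload of the skippable frame at the start of `stream`.

    `stream` is any binary file-like object positioned at offset 0.
    (Hypothetical helper name, chosen for this example.)
    """
    header = stream.read(8)
    if len(header) < 8:
        raise ValueError("file too short to contain a skippable frame header")
    magic, size = struct.unpack("<II", header)
    if not SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
        raise ValueError(f"no skippable frame at start (magic {magic:#010x})")
    payload = stream.read(size)
    if len(payload) != size:
        raise ValueError("truncated skippable frame")
    return payload


def extract_dictionary(path):
    """Read the dictionary bytes from the first skippable frame of `path`."""
    with open(path, "rb") as f:
        payload = read_first_skippable_frame(f)
    # Assumption: the stored dictionary may itself be zstd-compressed.
    # If it starts with the regular frame magic, decompress it first.
    if payload[:4] == ZSTD_FRAME_MAGIC:
        import zstandard  # third-party package, only needed in this case
        payload = zstandard.ZstdDecompressor().decompressobj().decompress(payload)
    return payload
```

Writing the bytes returned by extract_dictionary("/path/to/megawarc.warc.zst") to a file named dict should give you something you can pass to zstd with -D, as in the command above.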