How to merge zcat and bzcat in a single function


I would like to build a little helper function that can deal with fastq.gz and fastq.bz2 files.

I want to merge zcat and bzcat into one transparent function which can be used on both sorts of files:

zbzcat example.fastq.gz
zbzcat example.fastq.bz2


zbzcat() {
  file=`echo $1 | `
## Not working
  ext=${file##*/};
  
  if [ ext == "fastq.gz" ]; then
    exec gzip -cd "$@"  
  else
    exec bzip -cd "$@"  
  fi
}

The extension extraction is not working correctly. Are you aware of other solutions?


There are 3 best solutions below

Socowi On BEST ANSWER

There are quite a few problems:

  • file=`echo $1 | ` gives a syntax error because there is no command after |. But you don't need the command substitution anyways. Just use file=$1.
  • ext=${file##*/} is not extracting the extension, but the filename. To extract the extension use ext=${file##*.}.
  • In your check you didn't use the variable $ext but the literal string ext.
  • Usually, only the string after the last dot in a filename is considered to be the extension. If you have file.fastq.gz, then the extension is gz. So use the check $ext = gz. That the uncompressed files are fastq files is irrelevant to the function anyways.
  • exec replaces the shell process with the given command. So after executing your function, the shell would exit. Just execute the command.
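
The two parameter expansions from the list above can be compared side by side. This is only a minimal sketch; the path is a hypothetical example, not taken from the question.

```shell
file=/data/run1/example.fastq.gz   # hypothetical path, for illustration only

name=${file##*/}   # remove the longest prefix matching "*/" -> file name
ext=${file##*.}    # remove the longest prefix matching "*." -> last extension

echo "$name"   # example.fastq.gz
echo "$ext"    # gz

# Compare the value of the variable, not the literal word "ext"
if [ "$ext" = gz ]; then
  echo "gzip file"
fi
```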

By the way: you don't have to extract the extension at all if you use pattern matching:

zbzcat() {
  file="$1"
  case "$file" in
    *.gz) gzip -cd "$@";;
    # the decompressor binary is called bzip2, not bzip
    *.bz2) bzip2 -cd "$@";;
    *) echo "Unknown file format" >&2; return 1;;
  esac
}

Alternatively, use 7z x which supports a lot of formats. Most distributions name the package p7zip.

user1934428 On
ext=${1##*.}

Why are you throwing in an echo and trying to strip a /?

Also, the string ext (3 characters) will never be equal to the string fastq.gz (8 characters). If you want to check that the extension equals gz, just use

if [[ $ext == gz ]]

Having said this, relying on the extension to infer the content of a file is a bit brave. A more reliable way would be to use the file command to determine the most likely file type. Probably the safest approach is to just try a bzip2 extraction first and, if it fails, do the gzip extraction.
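
The try-first idea could be sketched roughly as follows. The function name zbzcat_try is made up for this example, and the sketch assumes that bzip2 rejects a non-bzip2 file before writing anything to stdout, so that the gzip fallback starts with clean output.

```shell
# zbzcat_try is a hypothetical name used only for this sketch.
zbzcat_try() {
  # Try bzip2 first and hide its error message; if it exits non-zero
  # (for example because the input is not a bzip2 file), fall back to gzip.
  bzip2 -cd -- "$1" 2>/dev/null || gzip -cd -- "$1"
}
```

One caveat: a truncated or corrupt .bz2 file may produce partial output before bzip2 fails, after which gzip will fail as well, so check the exit status when the result matters.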

Gyula Kokas On

I think it would be better to use the MIME type.

File extensions are not always correct.

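decomp() {
  # Quote "$1" so that paths containing spaces are not split into words
  case $(file -b --mime-type "$1") in
    # some versions of file report application/x-gzip instead
    application/gzip|application/x-gzip)
      gzip -cd "$@"
      ;;
    application/x-bzip2)
      bzcat "$@"
      ;;
    application/x-xz)
      xzcat "$@"
      ;;
    *)
      echo "Unknown file format" >&2
      return 1
      ;;
  esac
}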