bash one-liner/script to execute a command on files that do not have an associated file


The following executes do_x.sh on all files of type a_???.json recursively. However, I would like to do the same on a subset of these files that do not have a corresponding file with the same name and different extension.

 find $PWD -type f -name "a_???.json"  | xargs -I{} do_x.sh {}

How do I say, in the same one-liner, "do the same, but only on the a_???.json files that do not have a corresponding a_???.json.done"? The following is certainly not a solution:

find $PWD -type f -name "a_???.json" -exclude "a_???.json.done" | xargs -I{} do_x.sh {}

Example:
a_111.json
a_112.json
a_111.json.done

So, execute do_x.sh only on a_112.json

Philippe On BEST ANSWER

To keep the same structure as your script, try this:

find "$PWD" -type f -name "a_???.json" -execdir test '!' -f {}.done \; -print | xargs -I{} do_x.sh {}
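
To see how the filter behaves, here is a minimal sketch run against a throwaway directory. The scratch directory, the `sub` folder and the file a_113.json are assumptions added for illustration; only a_111.json, a_111.json.done and a_112.json come from the question. It also relies on find expanding {} inside a longer argument ({}.done), which GNU and BSD find do but POSIX does not require.

```shell
# Build a throwaway tree mirroring the example from the question.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/a_111.json" "$demo_dir/a_111.json.done" \
      "$demo_dir/a_112.json" "$demo_dir/sub/a_113.json"

# -execdir runs `test` from each file's own directory; -print is only
# reached when test succeeds, i.e. when no ".done" sibling exists.
pending=$(find "$demo_dir" -type f -name "a_???.json" \
    -execdir test '!' -f {}.done \; -print | sort)

# Lists a_112.json and sub/a_113.json, but not a_111.json.
printf '%s\n' "$pending"
rm -rf "$demo_dir"
```

Because -print comes after -execdir, it acts as a filter: find only prints a path when the preceding test returned true, and the xargs stage then sees just the files that still need processing.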
Barmar On

Execute a shell if statement from find's -exec.

find "$PWD" -type f -name "a_???.json" -exec bash -c 'if ! [ -f "$1.done" ]; then do_x.sh "$1"; fi' {} {} \;

There's no need to use xargs; find's own -exec action can execute commands.

Since -exec doesn't use the shell to execute the command, you have to execute bash -c explicitly.
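
As a rough illustration of how the arguments reach the inline script, the sketch below runs the same -exec form in a scratch directory. The do_x.sh created here is a hypothetical stand-in that only echoes the file name it receives, and the scratch directory is an assumption for the demo. The word right after the script text becomes $0 and the next one becomes $1; the sketch passes a literal `bash` as $0, which behaves the same as passing {} twice.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/a_111.json" "$demo_dir/a_111.json.done" "$demo_dir/a_112.json"

# Hypothetical stand-in for do_x.sh: it only reports the file it was given.
cat > "$demo_dir/do_x.sh" <<'EOF'
#!/bin/sh
echo "processed $(basename "$1")"
EOF
chmod +x "$demo_dir/do_x.sh"

# find starts bash directly; bash -c then interprets the if statement.
# "bash" fills $0, and {} (the found path) arrives as $1.
processed=$(PATH="$demo_dir:$PATH" find "$demo_dir" -type f -name "a_???.json" \
    -exec bash -c 'if ! [ -f "$1.done" ]; then do_x.sh "$1"; fi' bash {} \;)

echo "$processed"   # prints: processed a_112.json
rm -rf "$demo_dir"
```

Passing the path as an argument and reading it as "$1" keeps odd file names (spaces, quotes) out of the script text, which is safer than embedding {} inside the quoted command.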

For your more complex command, it's similar. Use xargs to get the parallel operation, then put your full command in the then clause of if.

find "$PWD" -type f -name "a_???.json" |
    xargs -I{} -P10 bash -c 'if ! [ -f "$1.done" ]; then srun -N1 -A goc -p slurm do_x.sh "$1"; fi' {} {}
John Bollinger On

You can use xargs to run shell code via the command bash -c. This allows you to process multiple find hits with the same command, which may provide a noticeable performance improvement if you have a lot of files:

find "$PWD" -type f -name "a_???.json" -print0 |
   xargs -0 -r bash -c 'for f; do [[ -e "${f}.done" ]] || do_x.sh "$f"; done' bash

I have split that across two lines for legibility, but it is a single pipeline.

Note that a for loop without an explicit list of items iterates over the positional parameters, which is how the shell receives the filenames from find.

Note also that the trailing bash is intentional and necessary, or at least something is necessary at that position; otherwise the first filename emitted by find will be consumed as the $0 of the shell in which the command runs.
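
The effect of that placeholder can be checked without any files at all. In this small sketch the words one, two and three are made-up stand-ins for the filenames find would emit, fed to xargs NUL-separated in the same way -print0 would.

```shell
# Without a placeholder, the first argument becomes $0 and the loop skips it.
without=$(printf '%s\0' one two three |
    xargs -0 bash -c 'for f; do printf "[%s]" "$f"; done')

# With a placeholder word after the script, every argument reaches the loop.
with=$(printf '%s\0' one two three |
    xargs -0 bash -c 'for f; do printf "[%s]" "$f"; done' bash)

echo "without placeholder: $without"   # prints: without placeholder: [two][three]
echo "with placeholder:    $with"      # prints: with placeholder:    [one][two][three]
```

Any word works in the placeholder position; bash is a common choice because $0 is what the shell uses as its name in error messages.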