I have a directory and need to get a list of the files whose MIME type is application/pdf, which I can loop over and process with my CompressPdf function. The remaining files only need to be copied over to the destination directory with cp, so I need a loopable list of those as well.
The obvious obstacle is handling arbitrary UNIX filenames correctly by using NUL as the separator. So far I've come up with this:
find "dir-to-search" -type f -print0 | xargs -0 file -0 --mime-type -F " " | grep -zZ "application/pdf"
But grep doesn't handle the result correctly, because file -0 inserts the NUL right after the file name and a \n after the MIME information. It returns something like this:
    0000000 . / f i l e 1 . p d f \0
    0000010 a p p l i c a t i o n / p d f \n
    0000020 . / f i l e 2 . p d f \0
    0000030 a p p l i c a t i o n / p d f \n
Another obstacle is that putting everything on one line limits my ability to run several lines of code in each iteration. Calling xargs -I{} sh -c '...' inline spawns a new process, which cannot call my CompressPdf shell function. I am using Dash, and export -f CompressPdf does not work there. Re-executing $0 recursively is my best bet.
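To illustrate what I mean by re-executing $0 (the function body and the compress-one keyword below are just placeholders), the script would call itself back through xargs:

    #!/bin/sh
    # Placeholder for the real function.
    CompressPdf() {
        printf 'compressing %s\n' "$1"
    }

    # When re-invoked by xargs below, handle a single file and exit.
    if [ "$1" = compress-one ]; then
        CompressPdf "$2"
        exit
    fi

    # Main path: "$0" is this script itself, so every xargs invocation can
    # reach CompressPdf without needing export -f.
    find "dir-to-search" -type f -print0 |
        xargs -0 -n 1 "$0" compress-one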
Currently my code runs well when it processes several PDF files concurrently while working through a single directory tree recursively; it does, however, prevent me from processing a large number of files at once.

Can someone help me with this? I'm trying to write this in Dash instead of Bash for a little more performance, despite the fact that arrays are not available. I can switch to Bash if there is no other way.
Try this:
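Something along these lines. It is a sketch that assumes GNU xargs and sed, and a file(1) new enough to accept a repeated --print0 and to print nothing but the NUL between the file name and the MIME type. Since I am not sure whether sed's N joins the two records with a newline or with a NUL when -z is in effect, the deletion is attempted with both:

    # The stream becomes: filename \0 mime-type \0 filename \0 mime-type \0 ...
    # sed -z reads NUL-delimited records: N pulls in the MIME-type record,
    # and if the pair ends in application/pdf, the type (and the joining
    # character) is deleted and the bare filename is printed, again
    # NUL-terminated, ready for another xargs -0.
    find "dir-to-search" -type f -print0 |
        xargs -0 file -00 --mime-type |
        sed -z -n -e 'N' \
            -e 's|\napplication/pdf$||p' \
            -e 's|\x00application/pdf$||p'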
So first, from man file, the description of -0/--print0 (the wording differs slightly between file versions):
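    -0, --print0
           Output a null character '\0' after the end of the filename.
           Nice to cut(1) the output. This does not affect the separator,
           which is still printed.

           If this option is repeated more than once, it also separates
           entries with \0 instead of a newline.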
So specify it twice (-00): then the MIME type is also terminated with \0 instead of a newline, and the whole stream is NUL-separated.
Then I use sed -z to read the zero-separated stream two lines at a time. -z is a GNU extension to sed. If two zero-separated lines end with application/pdf, then this matched string is removed and the filename is printed.

You can always work around zero-terminated strings with xxd:
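For example, a minimal sketch (assuming the xxd that ships with vim): with one hex byte per line, a NUL separator is exactly the line 00, so it can be handled by ordinary line-oriented tools and then decoded back into raw bytes:

    # Only for inspection: filenames that themselves contain newlines become
    # ambiguous again once the NULs are gone.
    find "dir-to-search" -type f -print0 |
        xxd -p -c 1 |            # one hex byte per line; NUL bytes are "00"
        sed 's/^00$/0a/' |       # rewrite NUL separators as newlines (0x0a)
        xxd -r -p                # decode back to bytes: a newline-separated list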