I have never written Makefiles before, but I suspect one would be helpful in my situation. I have a corpus of text files that I need to preprocess to extract features for machine learning. The directory structure looks like this:
/
+---Makefile
+---/corpus
| +-- a.txt
| +-- b.txt
| +-- ...
|
+---/wordcounts
| +-- a.wordcount
| +-- b.wordcount
| +-- ...
|
+---/lettercounts
| +-- a.lettercount
| +-- b.lettercount
| +-- ...
|
...
The files in /wordcounts and /lettercounts are generated from the files in /corpus. For just the file a.txt, I can write the make rules like this:
all: wordcounts/a.wordcount lettercounts/a.lettercount

wordcounts/a.wordcount: corpus/a.txt
	cat corpus/a.txt | wc -w > wordcounts/a.wordcount

lettercounts/a.lettercount: corpus/a.txt
	cat corpus/a.txt | wc -m > lettercounts/a.lettercount
However, with thousands of files in /corpus, this Makefile will become extremely long. I want to write a Makefile that adjusts to whatever files are in /corpus. The idea is that no matter how many files I put in /corpus, the Makefile will automatically build all the derived files. How can I do this? Is this what automake is for?
Background

Currently, I use a number of scripts to generate large CSV files, and running all of the scripts over the whole corpus takes a couple of hours. I need to restructure the project so that a change to one file does not necessitate reprocessing the whole corpus. I welcome any suggestions for how to set up the project more efficiently, if what I am suggesting is not ideal.
Here's one way to accomplish this. Run make with the -r flag to disable the built-in rules for maximum performance. And no, this is not what automake is for: automake generates portable Makefiles for distributing software packages; plain GNU Make is all you need here.
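Below is a minimal sketch, assuming GNU Make (for the wildcard and patsubst functions and for pattern rules) and assuming every corpus file is named *.txt. It lists /corpus at the time make runs, derives the corresponding target names, and uses one pattern rule per output type, with the automatic variables $< (the prerequisite) and $@ (the target):

# Collect whatever .txt files are currently in corpus/.
SOURCES := $(wildcard corpus/*.txt)

# Derive one target name per source file for each output type.
WORDCOUNTS   := $(patsubst corpus/%.txt,wordcounts/%.wordcount,$(SOURCES))
LETTERCOUNTS := $(patsubst corpus/%.txt,lettercounts/%.lettercount,$(SOURCES))

all: $(WORDCOUNTS) $(LETTERCOUNTS)

# Pattern rules: each output depends only on its own source file.
# NOTE: recipe lines must be indented with a TAB character.
wordcounts/%.wordcount: corpus/%.txt
	wc -w < $< > $@

lettercounts/%.lettercount: corpus/%.txt
	wc -m < $< > $@

.PHONY: all

Because each .wordcount and .lettercount file depends only on its own source, touching one file in /corpus and re-running make rebuilds exactly the derived files for that one source, which addresses the reprocessing concern. Since the targets are independent, you can also build them in parallel, e.g. make -r -j8, which should help considerably with a corpus of thousands of files.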