I have never written Makefiles before, but I suspect one would be helpful in my situation. I have a corpus of text files that I need to preprocess to extract features for machine learning. The directory structure looks like this:
/
+---Makefile
+---/corpus
| +-- a.txt
| +-- b.txt
| +-- ...
|
+---/wordcounts
| +-- a.wordcount
| +-- b.wordcount
| +-- ...
|
+---/lettercounts
| +-- a.lettercount
| +-- b.lettercount
| +-- ...
|
...
The files in /wordcounts and /lettercounts are generated from the files in /corpus. For just the file a.txt, I can write the make rules like this:
all: wordcounts/a.wordcount lettercounts/a.lettercount

wordcounts/a.wordcount: corpus/a.txt
	cat corpus/a.txt | wc -w > wordcounts/a.wordcount

lettercounts/a.lettercount: corpus/a.txt
	cat corpus/a.txt | wc -m > lettercounts/a.lettercount
However, with thousands of files in /corpus, this Makefile will become extremely long. I want to write a Makefile that adjusts to whatever files are in /corpus. The idea is that no matter how many files I put in /corpus, the Makefile will automatically build all the derived files. How can I do this? Is this what automake is for?
Background

Currently, I use a number of scripts to generate large CSV files, and running all of the scripts over the whole corpus takes a couple of hours. I need to restructure the project so that a change to one file does not necessitate reprocessing the whole corpus. I welcome any suggestions for how to set up the project more efficiently, if what I am suggesting is not ideal.
Here's one way to accomplish this. Run make with the -r flag to disable the built-in rules for maximum performance. And no, this is not what automake is for: automake generates portable Makefiles for distributing software packages; plain GNU Make is all you need here.
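Below is a minimal sketch, assuming GNU Make (for the wildcard and patsubst functions and for pattern rules) and assuming every corpus file is named *.txt. It lists /corpus at the time make runs, derives the corresponding target names, and uses one pattern rule per output type, with the automatic variables $< (the prerequisite) and $@ (the target):

# Collect whatever .txt files are currently in corpus/.
SOURCES := $(wildcard corpus/*.txt)

# Derive one target name per source file for each output type.
WORDCOUNTS   := $(patsubst corpus/%.txt,wordcounts/%.wordcount,$(SOURCES))
LETTERCOUNTS := $(patsubst corpus/%.txt,lettercounts/%.lettercount,$(SOURCES))

all: $(WORDCOUNTS) $(LETTERCOUNTS)

# Pattern rules: each output depends only on its own source file.
# NOTE: recipe lines must be indented with a TAB character.
wordcounts/%.wordcount: corpus/%.txt
	wc -w < $< > $@

lettercounts/%.lettercount: corpus/%.txt
	wc -m < $< > $@

.PHONY: all

Because each .wordcount and .lettercount file depends only on its own source, touching one file in /corpus and re-running make rebuilds exactly the derived files for that one source, which addresses the reprocessing concern. Since the targets are independent, you can also build them in parallel, e.g. make -r -j8, which should help considerably with a corpus of thousands of files.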