How to get the average number of words in files using the output of "wc -w"

Question

100 Views Asked by Dan Grec At 14 January 2024 at 04:47

I'm listing the number of words in a bunch of files and sorting it like this:

wc -w *.tex | sort -rn

which outputs a nice list of the files and word count for each file

   17423 total
    6481 panama-to-colombia.tex
    5516 the-salt-flats.tex
    5426 hiking-cordillera-huayhuash.tex

How can I also calculate and display the average number of words per file? i.e. a line at the bottom like:

5808 AVERAGE

Note: I'd like to find a solution that works for an arbitrary number of files in the list.

There are 5 best solutions below

Renaud Pacalet On 14 January 2024 at 07:02

wc -w --total=never *.tex | datamash -W mean 1
5807.6666666667

If you prefer a rounded result, e.g., to 2 decimal places:

wc -w --total=never *.tex | datamash -WR2 mean 1
5807.67

dawg On 14 January 2024 at 07:25

You can do it entirely in awk:

awk 'FNR==1{files[FILENAME]=0}
{for(i=1;i<=NF;i++) files[FILENAME]++}
END{for ( f in files ) {
    total+=files[f]
    print files[f], f }
    print total, "total"
    print total / (length(files)), "average"
}
' *.tex

Prints:

5426 hiking-cordillera-huayhuash.tex
5516 the-salt-flats.tex
6481 panama-to-colombia.tex
17423 total
5807.67 average

pmf On 14 January 2024 at 07:28

You could make it a function (or a script) that counts the words in the concatenation of its file arguments (cat -- "$@") and then divides that count by the number of file arguments ($#). Shell arithmetic is integer-only, so the result is truncated:

wc-wavg() { echo $(($(cat -- "$@" | wc -w) / $#)); }

wc-wavg *.tex
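
The function above truncates the average and divides by zero when called without arguments. A minimal sketch of a variant that rounds to the nearest integer and guards against that case follows; it assumes bash, and the name wc_wavg_rounded and the demo files are made up for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical variant of wc-wavg; the name wc_wavg_rounded is illustrative only.
wc_wavg_rounded() {
    # With no file arguments, cat would read stdin and $# would be 0 (division by zero).
    if [ "$#" -eq 0 ]; then
        echo "usage: wc_wavg_rounded FILE..." >&2
        return 1
    fi
    local total
    total=$(cat -- "$@" | wc -w)
    # Adding half the divisor before dividing rounds to nearest instead of truncating.
    echo $(( (total + $# / 2) / $# ))
}

# Demo on throwaway files: 3 words and 2 words, so the exact mean is 2.5.
demo_dir=$(mktemp -d)
printf 'one two three\n' > "$demo_dir/a.txt"
printf 'four five\n'     > "$demo_dir/b.txt"
wc_wavg_rounded "$demo_dir"/*.txt   # prints 3
rm -r "$demo_dir"
```

For the three .tex files in the question this should give 5808 rather than the truncated 5807.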

Ed Morton On 14 January 2024 at 12:59

You could do it all in awk. For example, given these input files, which include an empty file (fileempty) and a repeated file (file1), two of the rainy-day cases likely to make a potential solution fail:

$ wc -w file1 fileempty file1
 2 file1
 0 fileempty
 2 file1
 4 total

and using GNU awk for ARGIND:

$ awk '
    {
        numWords[ARGIND] += NF
        tot += NF
    }
    END {
        fmt=" %" length(tot) "s %s\n"
        for ( i=1; i<=ARGIND; i++ ) {
            printf fmt, numWords[i]+0, ARGV[i]
        }
        printf fmt, tot+0, "total"
        printf fmt, tot / (ARGIND ? ARGIND : 1), "AVERAGE"
    }
' file1 fileempty file1
 2 file1
 0 fileempty
 2 file1
 4 total
 1.33333 AVERAGE

and just to show how that behaves for the other rainy day cases that come to mind:

Just 1 input file:

$ awk '
    {numWords[ARGIND] += NF; tot += NF} END{fmt=" %" length(tot) "s %s\n"; for (i=1; i<=ARGIND; i++) { printf fmt, numWords[i]+0, ARGV[i] }; printf fmt, tot+0, "total"; printf fmt, tot / (ARGIND ? ARGIND : 1), "AVERAGE" }
' file1
 2 file1
 2 total
 2 AVERAGE

An empty file as the only input:

$ awk '
    {numWords[ARGIND] += NF; tot += NF} END{fmt=" %" length(tot) "s %s\n"; for (i=1; i<=ARGIND; i++) { printf fmt, numWords[i]+0, ARGV[i] }; printf fmt, tot+0, "total"; printf fmt, tot / (ARGIND ? ARGIND : 1), "AVERAGE" }
' fileempty
 0 fileempty
 0 total
 0 AVERAGE

No input file, just input from stdin (this may or may not be the desired output, idk):

$ awk '
    {numWords[ARGIND] += NF; tot += NF} END{fmt=" %" length(tot) "s %s\n"; for (i=1; i<=ARGIND; i++) { printf fmt, numWords[i]+0, ARGV[i] }; printf fmt, tot+0, "total"; printf fmt, tot / (ARGIND ? ARGIND : 1), "AVERAGE" }
' <<!
> foo
> bar
> !
 2 total
 2 AVERAGE
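
ARGIND is specific to GNU awk. If only the average is needed, a rough sketch for other awks is to divide by ARGC - 1 instead. This assumes every argument is a readable file whose name contains no "=" (awk would treat such an argument as a variable assignment); the helper name avg_words and the demo files are hypothetical.

```shell
#!/usr/bin/env bash
# avg_words: hypothetical helper. Assumes all arguments are plain file operands,
# so ARGC - 1 equals the number of files, including empty and repeated ones.
avg_words() {
    awk '
        { tot += NF }            # NF is the number of whitespace-separated words on the line
        END {
            n = ARGC - 1         # ARGV[0] is the awk command name, the rest are the files
            printf "%.2f AVERAGE\n", tot / (n ? n : 1)
        }
    ' "$@"
}

# Demo mirroring the rainy-day case above: a 2-word file, an empty file, the 2-word file again.
demo_dir=$(mktemp -d)
printf 'foo bar\n' > "$demo_dir/file1"
: > "$demo_dir/fileempty"
avg_words "$demo_dir/file1" "$demo_dir/fileempty" "$demo_dir/file1"   # prints 1.33 AVERAGE
rm -r "$demo_dir"
```

Because empty files still occupy a slot in ARGV, they are counted in the divisor even though awk never reads a record from them.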

Cyrus On 14 January 2024 at 04:57 (accepted answer)

I suggest appending this to your command:

| awk '{sum=sum+$1; print};END{print sum/2/(NR-1),"AVERAGE"}'

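sum=sum+$1 adds the number in the first column ($1) of each row to the variable sum, and print outputs the current row unchanged. The average is calculated in the END block, after the last line has been read. Note that the output of wc -w *.tex also contains the total line: it doubles sum and adds one to the row count NR, which is why the average is sum/2/(NR-1). When wc is given only one file it prints no total line, so the formula assumes at least two files.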
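
As a sketch of the same idea that also copes with a missing total line, the total row can be skipped instead of being compensated for. The function name avg_from_wc is hypothetical, and the code assumes no input file is literally named "total" and that file names contain no whitespace.

```shell
#!/usr/bin/env bash
# avg_from_wc: hypothetical filter. Reads `wc -w` output on stdin, echoes it
# unchanged, then appends a rounded AVERAGE line.
avg_from_wc() {
    awk '
        { print }                                        # pass every row through unchanged
        !(NF == 2 && $2 == "total") { sum += $1; n++ }   # count per-file rows only
        END { if (n) printf "%d AVERAGE\n", int(sum / n + 0.5) }
    '
}

# Demo using the counts from the question in place of: wc -w *.tex | sort -rn
printf '%s\n' \
    '   17423 total' \
    '    6481 panama-to-colombia.tex' \
    '    5516 the-salt-flats.tex' \
    '    5426 hiking-cordillera-huayhuash.tex' | avg_from_wc
# the last line printed is: 5808 AVERAGE
```

In real use it would be appended like the one-liner above: wc -w *.tex | sort -rn | avg_from_wc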

See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
