Bash commands piped to awk are sometimes buffered


System: Linux 4.13.0-43-generic #48~16.04.1-Ubuntu BASH_VERSION='4.3.48(1)-release'

The command:

while sleep 5
do
  date +%T
done | awk -F: '{print $3}'

This should print the third field (the seconds) of the date output, one line every 5 seconds. The problem: awk reads from the pipe and processes its input only once a full buffer has accumulated, i.e. after more than 4K of input has been generated.

When awk is replaced by cat, a line is printed every 5 seconds as expected.

This snippet is simplified from a shell script that worked fine on other systems, so the cause must be something about bash, awk, or their configuration on this system.

In short, is there a way to convince awk to behave like cat when reading from a pipe?

@Ed Morton: I did try adding fflush() after each print, but it does not help, which is what showed that the problem is with awk's input, not its output. I also tried adding calls to system("date"), which confirmed that awk receives all the input lines at once rather than as they are produced.
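The diagnostic described above could look roughly like the sketch below. The function name diagnose_input_buffering, its two parameters, and the "produced"/"consumed" labels are made up for illustration, and the loop is bounded so that it terminates.

```shell
# Hypothetical helper: $1 = number of samples, $2 = delay in seconds.
# The producer stamps each line when it is written; awk echoes the line,
# flushes its own output, then stamps the moment it handled the line.
diagnose_input_buffering() {
    n=$1
    delay=$2
    i=0
    while [ "$i" -lt "$n" ]; do
        date +"produced %T"
        sleep "$delay"
        i=$((i + 1))
    done | awk '{ print; fflush(); system("date +\"consumed %T\"") }'
}
```

Running `diagnose_input_buffering 3 5` on an affected system would be expected to show the "consumed" times bunched together at the end even though the "produced" times are 5 seconds apart. Because fflush() already rules out output buffering, that pattern points at the input side.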

For those who asked:

$ awk -W version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan

compiled limits:
max NF             32767
sprintf buffer      2040

Answer by Amos Shapir:

While trying to find out how to make awk print its version, I discovered that it is really mawk, and that it has the following flag:

 -W interactive -- sets unbuffered writes to stdout and line buffered reads from stdin.
                   Records from stdin are lines regardless of the value of RS.

This seems to solve the problem!

Thanks to everyone who replied.
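For reference, here is a sketch of the loop from the question with this flag applied. It only adds -W interactive when awk -W version reports mawk, on the assumption that other awk implementations do not need the flag and may not accept it. The AWK_CMD variable and the print_seconds function are names invented here, and the loop is bounded so that it ends.

```shell
# Use mawk's interactive mode only when the default awk really is mawk.
# stdin is redirected so an awk that ignores -W cannot block waiting for input.
if awk -W version </dev/null 2>&1 | grep -qi mawk; then
    AWK_CMD="awk -W interactive"
else
    AWK_CMD="awk"
fi

# Hypothetical helper: $1 = number of samples, $2 = delay in seconds.
print_seconds() {
    i=0
    while [ "$i" -lt "$1" ]; do
        date +%T
        sleep "$2"
        i=$((i + 1))
    done | $AWK_CMD -F: '{ print $3 }'   # unquoted on purpose: splits into command + flags
}
```

On the system from the question, the unbounded form would presumably be `while sleep 5; do date +%T; done | awk -W interactive -F: '{print $3}'`.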

Answer by kev:

stdbuf is a more general solution. From its man page:

stdbuf - Run COMMAND, with modified buffering operations for its standard streams.

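# buffered: awk block-buffers its output when stdout is a pipe
# (strftime() is not available in mawk 1.3.3; this assumes gawk)
while sleep 5; do date +%T; done | awk -F: '{print $0, strftime("%T")}' | ts %T

# line-buffered: stdbuf -oL makes awk flush its stdout after every line
while sleep 5; do date +%T; done | stdbuf -oL awk -F: '{print $0, strftime("%T")}' | ts %T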

ts is provided by the moreutils package, which may need to be installed first.