A confusion about ls, dir and tee


I know that tee reads from STDIN and writes what it reads to a new file (and to STDOUT). But when it is combined with ls, which process runs first?

For example:

➤ ls
12  123  1234
➤ ls | tee hello
12
123
1234
hello # ls caught hello
➤ ls | tee 000
12
123
1234
hello # ls didn't get 000
➤ ls | tee 0
000
12
123
1234
hello # ls didn't get 0
➤ ls | tee 00000
0
000
00000 # ls did get 00000
12
123
1234
hello
➤ 

But with dir:

➤ ls
12  123  1234
➤ dir | tee hello
12  123  1234  hello # get hello
➤ dir | tee 000
000  12  123  1234  hello
➤ dir | tee 0
0  000  12  123  1234  hello #get 0
➤ dir | tee 000000
0  000  12  123  1234  hello # didn't get 000000
➤ dir | tee 01
0  000  000000  01  12  123  1234  hello
➤ dir | tee 000000000000000000000000
0  000  000000  000000000000000000000000  01  12  123  1234  hello #get 00000000..000
➤ 

Why? Which happens first: tee creating the new file, or ls/dir producing its output?


g24l On BEST ANSWER

This is a race condition on the directory: the two processes run in parallel, and both touch the same directory.

Each command in a pipeline is executed as a separate process (i.e., in a sub-shell).

The idea of a pipeline is that the output of process A (running executable exec_A) is redirected to the input of process B (running executable exec_B):

exec_A | exec_B

How this is done is implementation dependent, but in practice the shell asks the operating system for a pipe, a kernel buffer that A writes into and B reads from. The pipe is created before either process starts.

So, conceptually, what happens is something like the following, with both processes running at the same time and buf behaving like a FIFO:

exec_A > buf & exec_B < buf
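To make the "buffer" idea concrete, here is a minimal sketch that spells the pipe out as a named FIFO. The directory and the name buf are only placeholders for illustration; a real shell uses an anonymous pipe instead.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A named FIFO stands in for the anonymous pipe the shell would create.
mkfifo buf

# Writer runs in the background; it blocks until a reader opens the FIFO.
printf 'one\ntwo\n' > buf &

# Reader runs at the same time and drains the FIFO.
received=$(cat < buf)
wait

printf 'reader got:\n%s\n' "$received"
rm -f buf
```

Neither side has to finish before the other starts, which is exactly why the order of side effects in a pipeline is not fixed.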

What each process does internally with the data it receives or writes depends on its implementation. In this case tee creates (or truncates) the file it is going to write to as soon as it starts, which makes sense because it needs an open file to write the incoming data to.
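One way to check this behaviour for yourself is to feed tee from a command that produces no output for a while and look for the file in the meantime. This is a rough sketch, assuming a typical tee such as the GNU coreutils one; the file name created_early and the sleep durations are arbitrary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The left side writes nothing and only exits after 2 seconds,
# so tee receives no input during that time.
sleep 2 | tee created_early > /dev/null &

# Give tee a moment to start, then look for the file while the pipe is still idle.
sleep 1
if [ -e created_early ]; then early=yes; else early=no; fi
echo "file exists before any input arrived: $early"

wait
```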

Given that, the result depends on whether process A (i.e. ls/dir) finishes reading the directory before process B has created the file. That in turn depends on scheduling and on which process gets to the parent directory first, since the kernel serializes the directory read and the file creation.

You can observe that ls will almost always list a file that is created like this:

ls * | tee subdir/0

because ls only reads subdir late, after it has dealt with its other arguments, by which time tee has usually created the file.
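If the listing must not depend on the race at all, one option is to let ls finish before tee is even started. The following is only a sketch of that idea, with file names taken from the question; command substitution is used here because it completes before the pipeline begins.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 12 123 1234

# ls runs to completion here, before tee exists, so it cannot see hello.
listing=$(ls)

# Only now is tee started; it creates hello and stores the earlier listing.
printf '%s\n' "$listing" | tee hello > /dev/null

saved=$(cat hello)
printf '%s\n' "$saved"
```

Going the other way, running touch hello before the pipeline would make the file show up every time.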

hek2mgl On

Both programs are running at the same time. While the process on the left side of the pipe is writing output to the pipe, the process on the right side of the pipe is reading from it.

tee creates the output file right after it starts, before reading any input. That's why you can sometimes(!) see the file in the output of ls, and in the output of dir. However, there is no guarantee of that. It depends on when each process gets scheduled on a CPU and for how long, how long tee has to wait to open the file, and so on.

On my test system the file almost always showed up, with both ls and dir. But occasionally it was missing from the listing, again with both ls and dir.
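To see how often each outcome occurs on your own machine, you could repeat the pipeline in a loop and count. This is a rough sketch; the 20 iterations and the file names are arbitrary, and the counts will differ between systems and runs.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 12 123 1234

hits=0
misses=0
i=0
while [ "$i" -lt 20 ]; do
    rm -f hello
    # tee stores the listing in hello, so the file itself records
    # whether ls saw it in time.
    ls | tee hello > /dev/null
    if grep -qx hello hello; then
        hits=$((hits + 1))
    else
        misses=$((misses + 1))
    fi
    i=$((i + 1))
done

echo "listed: $hits, not listed: $misses"
```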