Perl IPC::Run pipeline blocks with input file larger than 64KiB

166 Views Asked by At

A Perl program uses IPC::Run to pipe a file through a series of commands determined at runtime and into another file, like this small test excerpt demonstrates:

#!/usr/bin/perl
use IO::File;
use IPC::Run qw(run);

open (my $in, 'test.txt');
my $out = IO::File->new_tmpfile;

my @args = ( [ split / /, shift ], "<", $in); # this code
while ($#ARGV >= 0) {                         # extracted
    push @args, "|", [ split / /, shift ];    # verbatim
}                                             # from the
push @args, ">pipe", $out;                    # program

print "Running...";
run @args or die "command failed ($?)";
print "Done\n";

It builds the pipeline from commands given as arguments, the test file is hard-coded. The problem is that the pipeline hangs if the file is bigger than 64KiB. Here is a demonstration that uses cat in the pipeline to keep things simple. First a 64KiB (65536 bytes) file works as expected:

$ dd if=/dev/urandom of=test.txt bs=1 count=65536
65536 bytes (66 kB, 64 KiB) copied, 0.16437 s, 399 kB/s
$ ./test.pl cat
Running...Done

Next, one byte more. The call to run never returns...

$ dd if=/dev/urandom of=test.txt bs=1 count=65537
65537 bytes (66 kB, 64 KiB) copied, 0.151517 s, 433 kB/s
$ ./test.pl cat
Running...

With IPCRUNDEBUG enabled, plus a few more cats you can see it's the last child that doesn't end:

$ IPCRUNDEBUG=basic ./test.pl cat cat cat cat
Running...
...
IPC::Run 0000 [#1(3543608)]: kid 1 (3543609) exited
IPC::Run 0000 [#1(3543608)]: 3543609 returned 0
IPC::Run 0000 [#1(3543608)]: kid 2 (3543610) exited
IPC::Run 0000 [#1(3543608)]: 3543610 returned 0
IPC::Run 0000 [#1(3543608)]: kid 3 (3543611) exited
IPC::Run 0000 [#1(3543608)]: 3543611 returned 0

(with a file under 64KiB you see all four exit normally)

How can this be made to work for files of any size ?

(Perl 5, version 30, subversion 3 (v5.30.3) built for x86_64-linux-thread-multi, tried on Alpine Linux, the target platform, and Arch Linux to rule out Alpine as a cause)

2

There are 2 best solutions below

6
ikegami On BEST ANSWER

You have a deadlock:

Diagram of the deadlock

Consider using one of the following instead:

run [ 'cat' ], '<', $in_fh, '>', \my $captured;

# Do something with the captured output in $captured.

or

my $receiver = sub {
    # Do something with the chunk in $_[0].
};

run [ 'cat' ], '<', $in_fh, '>', $receiver;

For example, the following "receiver" processes each line as they come in:

my $buffer = '';
my $receiver = sub {
    $buffer .= $_[0];
    while ($buffer =~ s/^(.*)\n//) {
       process_line("$1");
    }
};

run [ 'cat' ], '<', $in_fh, '>', $receiver;

die("Received partial line") if length($buffer);
0
Håkon Hægland On

Here is an example that does not deadlock but still uses the >pipe output handle. I would not recommend using this complicated approach for your use case, instead consider the approach suggested by @ikegami.

The problem is that the >pipe handle is never read from. cat tries to write to the >pipe handle but it gets filled up (since no one reads from it) and the cat process blocks when the pipe content reaches 64 KiB which is the capacity of a pipe on Linux. Now the IPC::Run::finish() process is waiting for the child cat process to exit, but at the same time the cat process is waiting for the parent to read from its pipe so we have a deadlock situation.

To avoid this situation, we can use IPC::Run::start() instead of IPC::Run::run():

use feature qw(say);
use strict;
use warnings;
use constant READ_BUF_SIZE => 8192;

use Errno qw( EAGAIN );
use IO::Select;
use IPC::Run qw();
use Symbol 'gensym';

my $outfile = 'out.txt';
open (my $out, '>', $outfile) or die "Could not open file '$outfile': $!";
my $h = IPC::Run::start ['cat'], '<', 'test.txt', '>pipe', my $pipeout = gensym;
my $select = IO::Select->new( $pipeout );
my $data = '';
my $read_offset = 0;
while (1) {
    my @ready = $select->can_read;
    last if !@ready;
    for my $fh (@ready) {
        my $bytes_read = sysread $fh, $data, READ_BUF_SIZE, $read_offset;
        say "Read $bytes_read bytes..";
        if ( !defined $bytes_read ) {
            die "sysread failed: $!" if $! != EAGAIN;
            $bytes_read = 0;
        }
        elsif ( $bytes_read == 0 ) {
            say "Removing pipe handle from select loop";
            $select->remove( $fh );
            close $fh;
        }
        $read_offset += $bytes_read;
    }
}
say "Saving data to file..";
print $out $data;  #Save data to file
close $out;
say "Finishing harness..";
IPC::Run::finish $h or die "cat returned $?";
say "Done.";