parallel: When hitting EOF: Close file without reading to EOF.

Ole Tange 2014-08-07 20:20:37 +02:00
parent d02f31b05c
commit fcf1e64438
3 changed files with 31 additions and 11 deletions

@@ -223,15 +223,19 @@ GNU Parallel 20140822 ('Argentina/Gaza') has been released. It is available for
 Haiku of the month:
->>>Are you tired of
-inflexible replacements?
-Use Perl expressions.
--- Ole Tange
+code fork headache blues?
+option P is your new friend
+`man parallel` now!
+-- Malcolm Cook
 New in this release:
+* GNU Parallel now uses the same shell it was started from as the command shell for local jobs. So if GNU Parallel is started from tcsh it will use tcsh as its shell even if the login $SHELL is different. For remote jobs the login $SHELL will be used.
 * GNU Parallel was cited in: A Web Service for Scholarly Big Data Information Extraction http://patshih.ist.psu.edu/publications/Williams-CiteSeerExtractor-ICWS14.pdf
+* --plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...} {/..} {/...}. The idea being that '+foo' matches the opposite of 'foo' and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
 * GNU Parallel was covered in the webcast 2014-08-20: Data Science at the Command Line http://www.oreilly.com/pub/e/3115
 * Сборка GNU parallel для CentOS/RHEL http://www.stableit.ru/2014/07/gnu-parallel-centosrhel.html
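
The identities listed for --plus can be checked by hand. What follows is a minimal sketch in plain shell that imitates the replacement strings with parameter expansion. It does not call GNU Parallel and is not how GNU Parallel implements them. The path `dir/sub/file.tar.gz` and all variable names are made up for illustration, and the sketch assumes the argument has a directory part and at least two extensions, with no dots in the directory names.

```shell
#!/bin/sh
# Hypothetical input argument, playing the role of {}.
p=dir/sub/file.tar.gz

dir=${p%/*}                 # imitates {+/}  : dir/sub
base=${p##*/}               # imitates {/}   : file.tar.gz
noext=${p%.*}               # imitates {.}   : dir/sub/file.tar
ext=${p##*.}                # imitates {+.}  : gz
noext2=${p%.*.*}            # imitates {..}  : dir/sub/file
ext2=${p#"$noext2".}        # imitates {+..} : tar.gz
base_noext2=${base%.*.*}    # imitates {/..} : file

# Each recombination should give back the original argument:
# {+/}/{/}, {.}.{+.}, {..}.{+..} and {+/}/{/..}.{+..}
printf '%s\n' \
    "${dir}/${base}" \
    "${noext}.${ext}" \
    "${noext2}.${ext2}" \
    "${dir}/${base_noext2}.${ext2}"
```

Every printed line is `dir/sub/file.tar.gz`, which is the sense in which '{+foo}' is the opposite of '{foo}': the two parts together reassemble {}.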

@@ -647,8 +647,8 @@ sub options_hash {
 "arg-file|a=s" => \@opt::a,
 "no-run-if-empty|r" => \$opt::r,
 "replace|i:s" => \$opt::i,
-"E=s" => \$opt::E,
-"eof|e:s" => \$opt::E,
+"E=s" => \$opt::eof,
+"eof|e:s" => \$opt::eof,
 "max-args|n=i" => \$opt::max_args,
 "max-replace-args|N=i" => \$opt::max_replace_args,
 "colsep|col-sep|C=s" => \$opt::colsep,
@@ -818,7 +818,7 @@ sub parse_options {
 my ($shorthand,$long) = split/ /,$_,2;
 $Global::rpl{$shorthand} = $long;
 }
-if(defined $opt::E) { $Global::end_of_file_string = $opt::E; }
+if(defined $opt::eof) { $Global::end_of_file_string = $opt::eof; }
 if(defined $opt::max_args) { $Global::max_number_of_args = $opt::max_args; }
 if(defined $opt::timeout) { $Global::timeoutq = TimeoutQueue->new($opt::timeout); }
 if(defined $opt::tmpdir) { $ENV{'TMPDIR'} = $opt::tmpdir; }
@@ -6902,7 +6902,7 @@ sub empty {
 for my $fh (@{$self->{'fhs'}}) {
 $empty &&= eof($fh);
 }
-::debug("run", "MultifileQueue->empty $empty");
+::debug("run", "MultifileQueue->empty $empty ");
 return $empty;
 }
@@ -7036,8 +7036,8 @@ sub read_arg_from_fh {
 if($Global::end_of_file_string and
 $arg eq $Global::end_of_file_string) {
 # Ignore the rest of input file
-while (<$fh>) {}
-# ::debug("run", "EOF-string $arg\n");
+close $fh;
+::debug("run", "EOF-string ($arg) met\n");
 if(defined $prepend) {
 return Arg->new($prepend);
 } else {
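
The change in read_arg_from_fh swaps draining the input for closing it. A rough shell analogy may make the difference easier to see. It is not the Perl code: both function names are hypothetical, and `yes` stands in for an input that never reaches end of file, such as the output of `tail -f`.

```shell
#!/bin/sh
# Hypothetical helper, analogous to the new behaviour:
# print arguments until the EOF string is seen, then stop reading at once.
read_args_until() {
    eof_string=$1
    while IFS= read -r arg; do
        if [ "$arg" = "$eof_string" ]; then
            return 0            # leave the rest of the input unread
        fi
        printf '%s\n' "$arg"
    done
}

# Hypothetical helper, analogous to the old behaviour:
# after the EOF string, keep reading and discard until real end of file.
drain_args_until() {
    eof_string=$1
    while IFS= read -r arg; do
        [ "$arg" = "$eof_string" ] && break
        printf '%s\n' "$arg"
    done
    while IFS= read -r rest; do :; done
}

# On finite input both variants print the same arguments.
printf 'a\nb\nEXIT\nc\n' | read_args_until EXIT
printf 'a\nb\nEXIT\nc\n' | drain_args_until EXIT

# On endless input only the first variant returns.
{ printf 'x\nEXIT\n'; yes more; } | read_args_until EXIT
```

Running `drain_args_until` on the last pipeline would never finish, because the discard loop waits for an end of file that does not come. In the last pipeline `yes` stops only because its next write hits a pipe with no reader. A producer that writes rarely, like `tail -f` on an idle file, does not notice until new data arrives, which is why the documentation example appends dummy data to force B<tail> to exit.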

@@ -1012,6 +1012,14 @@ control on the command line (used by GNU B<parallel> internally when
 called with B<--sshlogin>).
+=item B<--plus> (alpha testing)
+Activate additional replacement strings: {+/} {+.} {+..} {+...} {..}
+{...} {/..} {/...}. The idea being that '{+foo}' matches the opposite of
+'{foo}' and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
+{+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
 =item B<--progress>
 Show progress of computations. List the computers involved in the task
@@ -2862,7 +2870,7 @@ The idea is to put the jobs into a file and have GNU B<parallel> read
 from that continuously. As GNU B<parallel> will stop at end of file we
 use B<tail> to continue reading:
-B<true >>B<jobqueue>; B<tail -f jobqueue | parallel>
+B<true >>B<jobqueue>; B<tail -n+0 -f jobqueue | parallel>
 To submit your jobs to the queue:
@@ -2884,6 +2892,14 @@ E.g. if you have 10 jobslots then the output from the first completed
 job will only be printed when job 11 has started, and the output of
 second completed job will only be printed when job 12 has started.
+To use B<--eof> to make GNU B<parallel> exit, B<tail> also needs to be
+forced to exit:
+tail -n+0 -f command-list.txt |
+(parallel --eof=EXIT {}; echo Parallel is now done;
+(seq 1000 >> command-list.txt &);
+echo Done appending dummy data forcing tail to exit)
 =head1 EXAMPLE: GNU Parallel as dir processor