2010-12-06 23:30:08 +00:00
|
|
|
#!/usr/bin/perl -w
|
|
|
|
|
|
|
|
=head1 NAME
|
|
|
|
|
|
|
|
parallel - build and execute shell command lines from standard input in parallel
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 SYNOPSIS
|
|
|
|
|
|
|
|
B<parallel> [options] [I<command> [arguments]] < list_of_arguments
|
|
|
|
|
2011-05-05 16:26:29 +00:00
|
|
|
B<parallel> [options] [I<command> [arguments]] ( B<:::> arguments |
|
|
|
|
B<::::> argfile(s) ) ...
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
B<parallel> --semaphore [options] I<command>
|
|
|
|
|
|
|
|
B<#!/usr/bin/parallel> --shebang [options] [I<command> [arguments]]
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 DESCRIPTION
|
|
|
|
|
2011-12-21 23:17:49 +00:00
|
|
|
GNU B<parallel> is a shell tool for executing jobs in parallel using
|
2012-03-12 22:38:38 +00:00
|
|
|
one or more computers. A job can be a single command or a small
|
2011-12-21 23:17:49 +00:00
|
|
|
script that has to be run for each of the lines in the input. The
|
|
|
|
typical input is a list of files, a list of hosts, a list of users, a
|
|
|
|
list of URLs, or a list of tables. A job can also be a command that
|
|
|
|
reads from a pipe. GNU B<parallel> can then split the input into
|
|
|
|
blocks and pipe a block into each command in parallel.
|
2011-03-20 21:40:12 +00:00
|
|
|
|
|
|
|
If you use xargs and tee today you will find GNU B<parallel> very easy to
|
|
|
|
use as GNU B<parallel> is written to have the same options as xargs. If
|
|
|
|
you write loops in shell, you will find GNU B<parallel> may be able to
|
|
|
|
replace most of the loops and make them run faster by running several
|
|
|
|
jobs in parallel.
|
|
|
|
|
|
|
|
GNU B<parallel> makes sure output from the commands is the same output as
|
|
|
|
you would get had you run the commands sequentially. This makes it
|
|
|
|
possible to use output from GNU B<parallel> as input for other programs.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
For each line of input GNU B<parallel> will execute I<command> with
|
|
|
|
the line as arguments. If no I<command> is given, the line of input is
|
|
|
|
executed. Several lines will be run in parallel. GNU B<parallel> can
|
|
|
|
often be used as a substitute for B<xargs> or B<cat | bash>.
|
|
|
|
|
2011-07-25 22:12:46 +00:00
|
|
|
=head2 Reader's guide
|
|
|
|
|
2014-01-04 07:11:02 +00:00
|
|
|
Start by watching the intro videos for a quick introduction:
|
2011-11-22 21:52:52 +00:00
|
|
|
http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2014-01-04 07:11:02 +00:00
|
|
|
Then look at the B<EXAMPLE>s after the list of B<OPTIONS>. That will
|
|
|
|
give you an idea of what GNU B<parallel> is capable of.
|
|
|
|
|
|
|
|
Then spend an hour walking through the tutorial (B<man
|
|
|
|
parallel_tutorial>). Your command line will love you for it.
|
|
|
|
|
|
|
|
Finally you may want to look at the rest of this manual if you have
|
|
|
|
special needs not already covered.
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 OPTIONS
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item I<command>
|
|
|
|
|
|
|
|
Command to execute. If I<command> or the following arguments contain
|
2011-07-18 16:29:37 +00:00
|
|
|
replacement strings (such as B<{}>) every instance will be substituted
|
|
|
|
with the input.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2012-03-15 20:23:53 +00:00
|
|
|
If I<command> is given, GNU B<parallel> solves the same tasks as
B<xargs>. If I<command> is not given GNU B<parallel> will behave
similarly to B<cat | sh>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2012-02-20 00:48:28 +00:00
|
|
|
The I<command> must be an executable, a script, a composed command, or
|
2014-03-31 19:29:47 +00:00
|
|
|
a function.
|
|
|
|
|
|
|
|
If it is a Bash function you need to B<export -f> the
|
2013-06-30 16:11:36 +00:00
|
|
|
function first. An alias will, however, not work (see why
|
2011-04-23 12:01:22 +00:00
|
|
|
http://www.perlmonks.org/index.pl?node_id=484296).
|
|
|
|
|
2014-03-31 19:29:47 +00:00
|
|
|
If it is a zsh function you will need to use this helper function
|
|
|
|
B<exportf> to export and to set $SHELL to bash:
|
|
|
|
|
|
|
|
  function exportf (){
    export $(echo $1)="`whence -f $1 | sed -e "s/$1 //" `"
  }

  function my_func(){
    echo $1;
    echo "hello";
  }

  exportf my_func
  SHELL=/bin/bash parallel "my_func {}" ::: 1 2
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
The command cannot contain the character \257 (¯).
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<{}> (alpha testing)
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
Input line. This replacement string will be replaced by a full line
|
2011-07-16 23:46:02 +00:00
|
|
|
read from the input source. The input source is normally stdin
|
|
|
|
(standard input), but can also be given with B<-a>, B<:::>, or
|
|
|
|
B<::::>.
|
2011-07-10 14:33:33 +00:00
|
|
|
|
|
|
|
The replacement string B<{}> can be changed with B<-I>.
|
|
|
|
|
|
|
|
If the command line contains no replacement strings then B<{}> will be
|
|
|
|
appended to the command line.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<{.}> (alpha testing)
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
Input line without extension. This replacement string will be replaced
|
|
|
|
by the input with the extension removed. If the input line contains
|
|
|
|
B<.> after the last B</> the last B<.> till the end of the string will
|
|
|
|
be removed and B<{.}> will be replaced with the
|
|
|
|
remainder. E.g. I<foo.jpg> becomes I<foo>, I<subdir/foo.jpg> becomes
|
|
|
|
I<subdir/foo>, I<sub.dir/foo.jpg> becomes I<sub.dir/foo>,
|
|
|
|
I<sub.dir/bar> remains I<sub.dir/bar>. If the input line does not
|
|
|
|
contain B<.> it will remain unchanged.
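The rule can be modelled in plain shell. This is only a sketch of the documented behaviour, not GNU B<parallel>'s own implementation, and the function name B<strip_ext> is hypothetical:

```shell
# Hypothetical helper modelling {.}: drop the last ".ext" only when the
# dot comes after the last "/".
strip_ext() {
  path=$1
  base=${path##*/}                       # part after the last "/"
  case $base in
    *.*) printf '%s\n' "${path%.*}" ;;   # remove from the last "." to the end
    *)   printf '%s\n' "$path" ;;        # no "." in the basename: unchanged
  esac
}

strip_ext sub.dir/foo.jpg
strip_ext sub.dir/bar
```

The B<case> on the basename is what keeps I<sub.dir/bar> intact: the dot in the directory name is never considered.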
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-10-10 20:14:55 +00:00
|
|
|
The replacement string B<{.}> can be changed with B<--er>.
|
2011-07-10 14:33:33 +00:00
|
|
|
|
|
|
|
To understand replacement strings see B<{}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<{/}> (alpha testing)
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
Basename of input line. This replacement string will be replaced by
|
|
|
|
the input with the directory part removed.
|
|
|
|
|
|
|
|
The replacement string B<{/}> can be changed with
|
|
|
|
B<--basenamereplace>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
To understand replacement strings see B<{}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<{//}> (alpha testing)
|
2011-04-27 15:12:35 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
Dirname of input line. This replacement string will be replaced by the
|
|
|
|
dir of the input line. See B<dirname>(1).
|
2011-04-27 15:12:35 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
The replacement string B<{//}> can be changed with
|
|
|
|
B<--dirnamereplace>.
|
|
|
|
|
|
|
|
To understand replacement strings see B<{}>.
|
2011-04-27 15:12:35 +00:00
|
|
|
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<{/.}> (alpha testing)
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
Basename of input line without extension. This replacement string will
|
|
|
|
be replaced by the input with the directory and extension part
|
|
|
|
removed. It is a combination of B<{/}> and B<{.}>.
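As a rough illustration, the three path-based replacement strings can be approximated with B<basename>(1) and B<dirname>(1). The helper name B<path_parts> is hypothetical, and the sketch assumes the standard utilities behave the same way as the replacement strings for ordinary paths:

```shell
# Hypothetical helper: print the {/}, {//} and {/.} equivalents of a path.
path_parts() {
  path=$1
  base=$(basename "$path")      # {/}  : directory part removed
  dir=$(dirname "$path")        # {//} : directory part only
  case $base in
    *.*) noext=${base%.*} ;;    # {/.} : basename without its last extension
    *)   noext=$base ;;
  esac
  printf '%s\n' "$base" "$dir" "$noext"
}

path_parts dir/sub/file.tar.gz
```

Only the last extension is removed, so I<file.tar.gz> yields I<file.tar>.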
|
|
|
|
|
|
|
|
The replacement string B<{/.}> can be changed with
|
|
|
|
B<--basenameextensionreplace>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
To understand replacement strings see B<{}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<{#}> (alpha testing)
|
2011-04-08 19:57:19 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
Sequence number of the job to run. This replacement string will be
|
|
|
|
replaced by the sequence number of the job being run. It contains the
|
|
|
|
same number as $PARALLEL_SEQ.
|
2011-04-08 19:57:19 +00:00
|
|
|
|
|
|
|
The replacement string B<{#}> can be changed with B<--seqreplace>.
|
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
To understand replacement strings see B<{}>.
|
|
|
|
|
2011-04-08 19:57:19 +00:00
|
|
|
|
2014-06-04 00:22:39 +00:00
|
|
|
=item B<{%}> (alpha testing)
|
2014-05-22 12:53:33 +00:00
|
|
|
|
|
|
|
Job slot number. This replacement string will be replaced by the job's
|
2014-05-31 06:42:56 +00:00
|
|
|
slot number between 1 and number of jobs to run in parallel. There
|
|
|
|
will never be 2 jobs running at the same time with the same job slot
|
|
|
|
number.
|
2014-05-22 12:53:33 +00:00
|
|
|
|
|
|
|
The replacement string B<{%}> can be changed with B<--slotreplace>.
|
|
|
|
|
|
|
|
To understand replacement strings see B<{}>.
|
|
|
|
|
|
|
|
|
2013-06-22 12:50:48 +00:00
|
|
|
=item B<{>I<n>B<}>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
Argument from input source I<n> or the I<n>'th argument. This
|
|
|
|
positional replacement string will be replaced by the input from input
|
|
|
|
source I<n> (when used with B<-a> or B<::::>) or with the I<n>'th
|
2013-02-21 22:36:14 +00:00
|
|
|
argument (when used with B<-N>). If I<n> is negative it refers to the
|
|
|
|
I<n>'th last argument.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
To understand replacement strings see B<{}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2013-06-22 12:50:48 +00:00
|
|
|
=item B<{>I<n>.B<}>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-13 12:02:03 +00:00
|
|
|
Argument from input source I<n> or the I<n>'th argument without
|
2010-12-06 23:30:08 +00:00
|
|
|
extension. It is a combination of B<{>I<n>B<}> and B<{.}>.
|
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
This positional replacement string will be replaced by the input from
|
|
|
|
input source I<n> (when used with B<-a> or B<::::>) or with the
|
|
|
|
I<n>'th argument (when used with B<-N>). The input will have the
|
|
|
|
extension removed.
|
|
|
|
|
|
|
|
To understand positional replacement strings see B<{>I<n>B<}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2013-06-22 12:50:48 +00:00
|
|
|
=item B<{>I<n>/B<}>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-13 12:02:03 +00:00
|
|
|
Basename of argument from input source I<n> or the I<n>'th argument.
|
2011-07-10 14:33:33 +00:00
|
|
|
It is a combination of B<{>I<n>B<}> and B<{/}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
This positional replacement string will be replaced by the input from
|
|
|
|
input source I<n> (when used with B<-a> or B<::::>) or with the
|
|
|
|
I<n>'th argument (when used with B<-N>). The input will have the
|
|
|
|
directory (if any) removed.
|
|
|
|
|
|
|
|
To understand positional replacement strings see B<{>I<n>B<}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2013-06-22 12:50:48 +00:00
|
|
|
=item B<{>I<n>//B<}>
|
2012-03-15 20:23:53 +00:00
|
|
|
|
|
|
|
Dirname of argument from input source I<n> or the I<n>'th argument.
|
|
|
|
It is a combination of B<{>I<n>B<}> and B<{//}>.
|
|
|
|
|
|
|
|
This positional replacement string will be replaced by the dir of the
|
|
|
|
input from input source I<n> (when used with B<-a> or B<::::>) or with
|
|
|
|
the I<n>'th argument (when used with B<-N>). See B<dirname>(1).
|
|
|
|
|
|
|
|
To understand positional replacement strings see B<{>I<n>B<}>.
|
|
|
|
|
|
|
|
|
2013-06-22 12:50:48 +00:00
|
|
|
=item B<{>I<n>/.B<}>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-13 12:02:03 +00:00
|
|
|
Basename of argument from input source I<n> or the I<n>'th argument
|
2010-12-06 23:30:08 +00:00
|
|
|
without extension. It is a combination of B<{>I<n>B<}>, B<{/}>, and
|
2011-07-10 14:33:33 +00:00
|
|
|
B<{.}>.
|
|
|
|
|
|
|
|
This positional replacement string will be replaced by the input from
|
|
|
|
input source I<n> (when used with B<-a> or B<::::>) or with the
|
|
|
|
I<n>'th argument (when used with B<-N>). The input will have the
|
|
|
|
directory (if any) and extension removed.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-10 14:33:33 +00:00
|
|
|
To understand positional replacement strings see B<{>I<n>B<}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<{=>I<perl expression>B<=}>
|
|
|
|
|
|
|
|
Replace with calculated I<perl expression>. B<$_> will contain the
|
|
|
|
same as B<{}>. After evaluating I<perl expression> B<$_> will be used
|
|
|
|
as the value. It is recommended to only change $_ but you have full
|
|
|
|
access to all of GNU B<parallel>'s internal functions and data
|
|
|
|
structures.
|
|
|
|
|
|
|
|
The B<{=>I<perl expression>B<=}> must be given as a single string.
|
|
|
|
|
|
|
|
See also: B<--rpl> B<--parens>
|
|
|
|
|
|
|
|
|
|
|
|
=item B<{=>I<n> I<perl expression>B<=}>
|
|
|
|
|
|
|
|
Positional equivalent to B<{= perl expression =}>. To understand
|
|
|
|
positional replacement strings see B<{>I<n>B<}>.
|
|
|
|
|
|
|
|
See also: B<{= perl expression =}> B<{>I<n>B<}>.
|
|
|
|
|
|
|
|
|
2011-08-21 23:01:57 +00:00
|
|
|
=item B<:::> I<arguments>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-16 23:46:02 +00:00
|
|
|
Use arguments from the command line as input source instead of stdin
|
|
|
|
(standard input). Unlike other options for GNU B<parallel> B<:::> is
|
|
|
|
placed after the I<command> and before the arguments.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
The following are equivalent:
|
|
|
|
|
|
|
|
  (echo file1; echo file2) | parallel gzip
  parallel gzip ::: file1 file2
  parallel gzip {} ::: file1 file2
  parallel --arg-sep ,, gzip {} ,, file1 file2
  parallel --arg-sep ,, gzip ,, file1 file2
  parallel ::: "gzip file1" "gzip file2"
|
|
|
|
|
|
|
|
To avoid treating B<:::> as special use B<--arg-sep> to set the
|
|
|
|
argument separator to something else. See also B<--arg-sep>.
|
|
|
|
|
|
|
|
stdin (standard input) will be passed to the first process run.
|
|
|
|
|
2011-05-13 12:02:03 +00:00
|
|
|
If multiple B<:::> are given, each group will be treated as an input
|
|
|
|
source, and all combinations of input sources will be
|
|
|
|
generated. E.g. ::: 1 2 ::: a b c will result in the combinations
|
|
|
|
(1,a) (1,b) (1,c) (2,a) (2,b) (2,c). This is useful for replacing
|
|
|
|
nested for-loops.
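For comparison, this is the kind of nested loop that two B<:::> groups stand in for. The sketch runs sequentially in plain shell (the function name B<combos> is hypothetical), whereas GNU B<parallel> would run the six jobs concurrently, so they may finish in a different order:

```shell
# The nested loop that "::: 1 2 ::: a b c" replaces (sequential here).
combos() {
  for x in 1 2; do
    for y in a b c; do
      printf '%s %s\n' "$x" "$y"
    done
  done
}

combos
```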
|
2011-05-05 16:26:29 +00:00
|
|
|
|
|
|
|
B<:::> and B<::::> can be mixed. So these are equivalent:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-05 21:36:12 +00:00
|
|
|
  parallel echo {1} {2} {3} ::: 6 7 ::: 4 5 ::: 1 2 3
  parallel echo {1} {2} {3} :::: <(seq 6 7) <(seq 4 5) :::: <(seq 1 3)
  parallel -a <(seq 6 7) echo {1} {2} {3} :::: <(seq 4 5) :::: <(seq 1 3)
  parallel -a <(seq 6 7) -a <(seq 4 5) echo {1} {2} {3} ::: 1 2 3
  seq 6 7 | parallel -a - -a <(seq 4 5) echo {1} {2} {3} ::: 1 2 3
  seq 4 5 | parallel echo {1} {2} {3} :::: <(seq 6 7) - ::: 1 2 3
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-05 21:36:12 +00:00
|
|
|
|
2011-08-21 23:01:57 +00:00
|
|
|
=item B<::::> I<argfiles>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Another way to write B<-a> I<argfile1> B<-a> I<argfile2> ...
|
|
|
|
|
2011-05-13 12:02:03 +00:00
|
|
|
B<:::> and B<::::> can be mixed.
|
|
|
|
|
2012-01-13 23:53:19 +00:00
|
|
|
See B<-a>, B<:::> and B<--xapply>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=item B<--null>
|
|
|
|
|
|
|
|
=item B<-0>
|
|
|
|
|
|
|
|
Use NUL as delimiter. Normally input lines will end in \n
|
|
|
|
(newline). If they end in \0 (NUL), then use this option. It is useful
|
|
|
|
for processing arguments that may contain \n (newline).
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--arg-file> I<input-file>
|
|
|
|
|
|
|
|
=item B<-a> I<input-file>
|
|
|
|
|
2011-05-26 21:19:58 +00:00
|
|
|
Use I<input-file> as input source. If you use this option, stdin
|
|
|
|
(standard input) is given to the first process run. Otherwise, stdin
|
|
|
|
(standard input) is redirected from /dev/null.
|
2011-05-04 18:55:01 +00:00
|
|
|
|
2011-05-13 12:02:03 +00:00
|
|
|
If multiple B<-a> are given, each I<input-file> will be treated as an
|
|
|
|
input source, and all combinations of input sources will be
|
|
|
|
generated. E.g. The file B<foo> contains B<1 2>, the file B<bar>
|
|
|
|
contains B<a b c>. B<-a foo> B<-a bar> will result in the combinations
|
|
|
|
(1,a) (1,b) (1,c) (2,a) (2,b) (2,c). This is useful for replacing
|
|
|
|
nested for-loops.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-16 23:46:02 +00:00
|
|
|
See also B<--xapply> and B<{>I<n>B<}>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-13 12:02:03 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--arg-file-sep> I<sep-str>
|
|
|
|
|
|
|
|
Use I<sep-str> instead of B<::::> as separator string between command
|
|
|
|
and argument files. Useful if B<::::> is used for something else by the
|
|
|
|
command.
|
|
|
|
|
|
|
|
See also: B<::::>.
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--arg-sep> I<sep-str>
|
|
|
|
|
|
|
|
Use I<sep-str> instead of B<:::> as separator string. Useful if B<:::>
|
|
|
|
is used for something else by the command.
|
|
|
|
|
|
|
|
Also useful if your command uses B<:::> but you still want to read
|
|
|
|
arguments from stdin (standard input): Simply change B<--arg-sep> to a
|
|
|
|
string that is not in the command line.
|
|
|
|
|
|
|
|
See also: B<:::>.
|
|
|
|
|
|
|
|
|
2014-01-22 02:06:39 +00:00
|
|
|
=item B<--bar>
|
2013-11-22 22:31:46 +00:00
|
|
|
|
|
|
|
Show progress as a progress bar. The bar shows: % of jobs
|
|
|
|
completed, estimated seconds left, and number of jobs started.
|
|
|
|
|
|
|
|
It is compatible with B<zenity>:
|
|
|
|
|
|
|
|
seq 1000 | parallel -j30 --bar '(echo {};sleep 0.1)' 2> >(zenity --progress --auto-kill) | wc
|
|
|
|
|
|
|
|
|
2014-01-22 02:06:39 +00:00
|
|
|
=item B<--basefile> I<file>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2014-01-22 02:06:39 +00:00
|
|
|
=item B<--bf> I<file>
|
2011-10-10 20:14:55 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
I<file> will be transferred to each sshlogin before a job is
|
|
|
|
started. It will be removed if B<--cleanup> is active. The file may be
|
|
|
|
a script to run or some common base data needed for the jobs.
|
2012-01-07 01:24:50 +00:00
|
|
|
Multiple B<--bf> can be specified to transfer more basefiles. The
|
2010-12-06 23:30:08 +00:00
|
|
|
I<file> will be transferred the same way as B<--transfer>.
|
|
|
|
|
|
|
|
|
2011-01-24 19:06:30 +00:00
|
|
|
=item B<--basenamereplace> I<replace-str>
|
|
|
|
|
|
|
|
=item B<--bnr> I<replace-str>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-04-08 19:57:19 +00:00
|
|
|
Use the replacement string I<replace-str> instead of B<{/}> for
|
|
|
|
basename of input line.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2011-03-01 22:04:15 +00:00
|
|
|
=item B<--basenameextensionreplace> I<replace-str>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-07-16 23:46:02 +00:00
|
|
|
=item B<--bner> I<replace-str>
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
Use the replacement string I<replace-str> instead of B<{/.}> for basename of input line without extension.
|
|
|
|
|
|
|
|
|
2011-05-05 18:50:53 +00:00
|
|
|
=item B<--bg>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Run command in background thus GNU B<parallel> will not wait for
|
|
|
|
completion of the command before exiting. This is the default if
|
|
|
|
B<--semaphore> is set.
|
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
See also: B<--fg>, B<man sem>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Implies B<--semaphore>.
|
|
|
|
|
|
|
|
|
2014-01-22 02:06:39 +00:00
|
|
|
=item B<--bibtex>
|
2011-10-22 23:49:32 +00:00
|
|
|
|
2013-11-22 22:31:46 +00:00
|
|
|
Print the BibTeX entry for GNU B<parallel> and disable citation
|
|
|
|
notice.
|
2011-10-22 23:49:32 +00:00
|
|
|
|
|
|
|
|
2013-05-21 22:29:41 +00:00
|
|
|
=item B<--block> I<size>
|
2011-01-22 22:40:15 +00:00
|
|
|
|
2013-05-21 22:29:41 +00:00
|
|
|
=item B<--block-size> I<size>
|
2011-01-22 22:40:15 +00:00
|
|
|
|
2011-08-03 09:54:55 +00:00
|
|
|
Size of block in bytes. The size can be suffixed with K, M, G, T, P,
|
|
|
|
k, m, g, t, or p which would multiply the size by 1024, 1048576,
|
|
|
|
1073741824, 1099511627776, 1125899906842624, 1000, 1000000,
|
|
|
|
1000000000, 1000000000000, or 1000000000000000 respectively.
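The suffix table can be written out as a small shell function. This is a sketch for checking sizes by hand, not code taken from GNU B<parallel>; the name B<block_bytes> is hypothetical, and it assumes integer sizes and a shell with 64-bit arithmetic:

```shell
# Hypothetical helper: convert a --block style size such as "10k" or "1M"
# to a number of bytes, using the multipliers listed above.
block_bytes() {
  size=$1
  num=${size%[KMGTPkmgtp]}          # numeric part with any suffix removed
  case $size in
    *K) mult=1024 ;;
    *M) mult=1048576 ;;
    *G) mult=1073741824 ;;
    *T) mult=1099511627776 ;;
    *P) mult=1125899906842624 ;;
    *k) mult=1000 ;;
    *m) mult=1000000 ;;
    *g) mult=1000000000 ;;
    *t) mult=1000000000000 ;;
    *p) mult=1000000000000000 ;;
    *)  mult=1 ;;                   # no suffix: plain bytes
  esac
  echo $(( num * mult ))
}

block_bytes 1M
block_bytes 10k
```

Upper-case suffixes are powers of 1024 and lower-case suffixes are powers of 1000, so B<1M> and B<1m> differ by 48576 bytes.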
|
2011-01-22 22:40:15 +00:00
|
|
|
|
|
|
|
GNU B<parallel> tries to meet the block size but can be off by the
|
2012-09-23 21:21:13 +00:00
|
|
|
length of one record. For performance reasons I<size> should be bigger
|
|
|
|
than a single record.
|
2011-01-22 22:40:15 +00:00
|
|
|
|
|
|
|
I<size> defaults to 1M.
|
|
|
|
|
|
|
|
See B<--pipe> for use of this.
|
|
|
|
|
|
|
|
|
2014-06-23 01:35:59 +00:00
|
|
|
=item B<--cat>
|
2014-03-23 00:07:18 +00:00
|
|
|
|
|
|
|
Create a temporary file with content. Normally B<--pipe> will give
|
|
|
|
data to the program on stdin (standard input). With B<--cat> GNU
|
|
|
|
B<parallel> will create a temporary file with the name in {}, so you
|
|
|
|
can do: B<parallel --pipe --cat wc {}>.
|
|
|
|
|
|
|
|
See also B<--fifo>.
|
|
|
|
|
|
|
|
|
2014-01-22 02:06:39 +00:00
|
|
|
=item B<--cleanup>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Remove transferred files. B<--cleanup> will remove the transferred files
|
2010-12-21 17:08:16 +00:00
|
|
|
on the remote computer after processing is done.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
  find log -name '*gz' | parallel \
    --sshlogin server.example.com --transfer --return {.}.bz2 \
    --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
With B<--transfer> the file transferred to the remote computer will be
|
|
|
|
removed on the remote computer. Directories created will not be removed
|
2010-12-06 23:30:08 +00:00
|
|
|
- even if they are empty.
|
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
With B<--return> the file transferred from the remote computer will be
|
|
|
|
removed on the remote computer. Directories created will not be removed
|
2010-12-06 23:30:08 +00:00
|
|
|
- even if they are empty.
|
|
|
|
|
|
|
|
B<--cleanup> is ignored when not used with B<--transfer> or B<--return>.
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--colsep> I<regexp>
|
|
|
|
|
|
|
|
=item B<-C> I<regexp>
|
|
|
|
|
|
|
|
Column separator. The input will be treated as a table with I<regexp>
|
|
|
|
separating the columns. The n'th column can be accessed using
|
|
|
|
B<{>I<n>B<}> or B<{>I<n>.B<}>. E.g. B<{3}> is the 3rd column.
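To see what column splitting means, the following sketch picks column I<n> with B<awk>. It is an approximation only: the helper name B<nth_col> is hypothetical, B<awk> takes an extended regular expression rather than a Perl one, and no trimming is done:

```shell
# Hypothetical helper: print column n of each input line, split on a separator.
# Note: awk interprets the separator as an extended regular expression.
nth_col() {
  awk -F "$1" -v n="$2" '{ print $n }'
}

printf 'a,b,c\nd,e,f\n' | nth_col ',' 3
```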
|
|
|
|
|
|
|
|
B<--colsep> implies B<--trim rl>.
|
|
|
|
|
|
|
|
I<regexp> is a Perl Regular Expression:
|
|
|
|
http://perldoc.perl.org/perlre.html
|
|
|
|
|
|
|
|
|
2014-04-22 10:03:41 +00:00
|
|
|
=item B<--compress>
|
2013-11-02 22:17:20 +00:00
|
|
|
|
|
|
|
Compress temporary files. If the output is big and very compressible
|
|
|
|
this will take up less disk space in $TMPDIR and possibly be faster due to less
|
|
|
|
disk I/O.
|
|
|
|
|
|
|
|
GNU B<parallel> will try B<lzop>, B<pigz>, B<gzip>, B<pbzip2>,
|
|
|
|
B<plzip>, B<bzip2>, B<lzma>, B<lzip>, B<xz> in that order, and use the
|
|
|
|
first available.
|
|
|
|
|
|
|
|
|
2014-04-22 10:03:41 +00:00
|
|
|
=item B<--compress-program> I<prg>
|
2013-11-02 22:17:20 +00:00
|
|
|
|
2014-04-22 10:03:41 +00:00
|
|
|
=item B<--decompress-program> I<prg>
|
2014-02-22 10:32:42 +00:00
|
|
|
|
|
|
|
Use I<prg> for (de)compressing temporary files. It is assumed that I<prg
|
2013-11-02 22:17:20 +00:00
|
|
|
-dc> will decompress stdin (standard input) to stdout (standard
|
2014-02-22 10:32:42 +00:00
|
|
|
output) unless B<--decompress-program> is given.
|
2013-11-02 22:17:20 +00:00
|
|
|
|
|
|
|
|
2013-07-20 15:33:26 +00:00
|
|
|
=item B<--ctrlc>
|
2013-04-21 21:33:02 +00:00
|
|
|
|
|
|
|
Sends SIGINT to tasks running on remote computers thus killing them.
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--delimiter> I<delim>
|
|
|
|
|
|
|
|
=item B<-d> I<delim>
|
|
|
|
|
|
|
|
Input items are terminated by the specified character. Quotes and
|
|
|
|
backslash are not special; every character in the input is taken
|
|
|
|
literally. Disables the end-of-file string, which is treated like any
|
|
|
|
other argument. This can be used when the input consists of simply
|
|
|
|
newline-separated items, although it is almost always better to design
|
|
|
|
your program to use --null where this is possible. The specified
|
|
|
|
delimiter may be a single character, a C-style character escape such
|
|
|
|
as \n, or an octal or hexadecimal escape code. Octal and
|
|
|
|
hexadecimal escape codes are understood as for the printf command.
|
|
|
|
Multibyte characters are not supported.
|
|
|
|
|
2011-08-21 23:01:57 +00:00
|
|
|
=item B<--dirnamereplace> I<replace-str>
|
2011-04-27 15:12:35 +00:00
|
|
|
|
2011-08-21 23:01:57 +00:00
|
|
|
=item B<--dnr> I<replace-str>
|
2011-04-27 15:12:35 +00:00
|
|
|
|
|
|
|
Use the replacement string I<replace-str> instead of B<{//}> for
|
|
|
|
dirname of input line.
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<-E> I<eof-str>
|
|
|
|
|
|
|
|
Set the end of file string to I<eof-str>. If the end of file string
|
|
|
|
occurs as a line of input, the rest of the input is ignored. If
|
|
|
|
neither B<-E> nor B<-e> is used, no end of file string is used.
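The effect of an end of file string can be sketched in plain shell: input lines are passed on until a line equal to the string is seen, and everything after it is dropped. The function name B<until_eof> is hypothetical and this is not how GNU B<parallel> implements it:

```shell
# Hypothetical helper: copy stdin to stdout until a line equals the
# end of file string given as $1; the rest of the input is ignored.
until_eof() {
  eof=$1
  while IFS= read -r line; do
    [ "$line" = "$eof" ] && break
    printf '%s\n' "$line"
  done
}

printf 'a\nb\nEND\nc\n' | until_eof END
```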
|
|
|
|
|
|
|
|
|
2013-10-21 20:31:52 +00:00
|
|
|
=item B<--delay> I<secs>
|
2012-11-26 21:57:07 +00:00
|
|
|
|
|
|
|
Delay starting next job I<secs> seconds. GNU B<parallel> will pause
|
|
|
|
I<secs> seconds after starting each job. I<secs> can be less than 1
|
|
|
|
second.
|
|
|
|
|
|
|
|
|
2010-12-15 23:12:02 +00:00
|
|
|
=item B<--dry-run>
|
|
|
|
|
2011-05-26 21:19:58 +00:00
|
|
|
Print the job to run on stdout (standard output), but do not run the
|
|
|
|
job. Use B<-v -v> to include the ssh/rsync wrapping if the job would
|
|
|
|
be run on a remote computer. Do not count on this literally, though, as
|
|
|
|
the job may be scheduled on another computer or the local computer if
|
|
|
|
: is in the list.
|
2010-12-15 23:12:02 +00:00
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--eof>[=I<eof-str>]
|
|
|
|
|
|
|
|
=item B<-e>[I<eof-str>]
|
|
|
|
|
|
|
|
This option is a synonym for the B<-E> option. Use B<-E> instead,
|
|
|
|
because it is POSIX compliant for B<xargs> while this option is not.
|
|
|
|
If I<eof-str> is omitted, there is no end of file string. If neither
|
|
|
|
B<-E> nor B<-e> is used, no end of file string is used.
|
|
|
|
|
|
|
|
|
2013-10-21 20:31:52 +00:00
|
|
|
=item B<--env> I<var>
|
2012-10-15 13:47:10 +00:00
|
|
|
|
|
|
|
Copy environment variable I<var>. This will copy I<var> to the
|
|
|
|
environment that the command is run in. This is especially useful for
|
2013-07-20 15:33:26 +00:00
|
|
|
remote execution.
|
2012-10-15 13:47:10 +00:00
|
|
|
|
2013-07-20 15:33:26 +00:00
|
|
|
In Bash I<var> can also be a Bash function - just remember to B<export
|
2014-03-31 19:29:47 +00:00
|
|
|
-f> the function, see B<command>.
|
2012-10-15 13:47:10 +00:00
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
The variable '_' is special. It will copy all environment variables
|
2013-08-15 17:38:39 +00:00
|
|
|
except for the ones mentioned in ~/.parallel/ignored_vars.
|
2013-08-14 18:11:00 +00:00
|
|
|
|
|
|
|
See also: B<--record-env>.
|
|
|
|
|
2012-10-15 13:47:10 +00:00
|
|
|
|
2011-12-21 23:17:49 +00:00
|
|
|
=item B<--eta>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Show the estimated number of seconds before finishing. This forces GNU
|
|
|
|
B<parallel> to read all jobs before starting to find the number of
|
|
|
|
jobs. GNU B<parallel> normally only reads the next job to run.
|
|
|
|
Implies B<--progress>.
|
|
|
|
|
|
|
|
|
2011-05-05 18:50:53 +00:00
|
|
|
=item B<--fg>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Run command in foreground thus GNU B<parallel> will wait for
|
|
|
|
completion of the command before exiting.
|
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
See also B<--bg>, B<man sem>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Implies B<--semaphore>.
|
|
|
|
|
|
|
|
|
2014-06-23 01:35:59 +00:00
|
|
|
=item B<--fifo>
|
2014-03-23 00:07:18 +00:00
|
|
|
|
|
|
|
Create a temporary fifo with content. Normally B<--pipe> will give
|
|
|
|
data to the program on stdin (standard input). With B<--fifo> GNU
|
|
|
|
B<parallel> will create a temporary fifo with the name in {}, so you
|
|
|
|
can do: B<parallel --pipe --fifo wc {}>.
|
|
|
|
|
|
|
|
Beware: If data is not read from the fifo, the job will block forever.
|
|
|
|
|
|
|
|
See also B<--cat>.
|
|
|
|
|
|
|
|
|
2014-02-22 10:32:42 +00:00
|
|
|
=item B<--filter-hosts>
|
2012-06-23 05:34:35 +00:00
|
|
|
|
|
|
|
Remove down hosts. For each remote host: check that login through ssh
|
|
|
|
works. If not: do not use this host.
|
|
|
|
|
2013-02-10 12:32:50 +00:00
|
|
|
Currently you can I<not> put B<--filter-hosts> in a profile,
|
2012-06-23 05:34:35 +00:00
|
|
|
$PARALLEL, /etc/parallel/config or similar. This is because GNU
|
|
|
|
B<parallel> uses GNU B<parallel> to compute this, so you will get an
|
|
|
|
infinite loop. This will likely be fixed in a later release.
|
|
|
|
|
|
|
|
|
2014-03-21 21:39:54 +00:00
|
|
|
=item B<--gnu>
|
2011-02-18 14:23:00 +00:00
|
|
|
|
|
|
|
Behave like GNU B<parallel>. If B<--tollef> and B<--gnu> are both set,
|
2014-03-21 21:39:54 +00:00
|
|
|
B<--gnu> takes precedence. B<--tollef> is retired, but B<--gnu> is
|
|
|
|
kept for compatibility.
|
2011-02-18 14:23:00 +00:00
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--group>
|
|
|
|
|
2011-08-09 20:00:31 +00:00
|
|
|
Group output. Output from each job is grouped together and is only
|
|
|
|
printed when the command is finished. stderr (standard error) first
|
|
|
|
followed by stdout (standard output). This takes some CPU time. In
|
2011-10-14 22:21:23 +00:00
|
|
|
rare situations GNU B<parallel> takes up lots of CPU time and if it is
|
2011-12-09 22:25:20 +00:00
|
|
|
acceptable that the outputs from different commands are mixed
|
|
|
|
together, then disabling grouping with B<-u> can speed up GNU
|
|
|
|
B<parallel> by a factor of 10.
|
2011-08-09 20:00:31 +00:00
|
|
|
|
2011-10-10 20:14:55 +00:00
|
|
|
B<--group> is the default. Can be reversed with B<-u>.
|
2011-08-09 20:00:31 +00:00
|
|
|
|
2014-05-22 12:53:33 +00:00
|
|
|
See also: B<--line-buffer> B<--ungroup>
|
|
|
|
|
2011-02-18 14:23:00 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--help>
|
|
|
|
|
|
|
|
=item B<-h>
|
|
|
|
|
|
|
|
Print a summary of the options to GNU B<parallel> and exit.
|
|
|
|
|
|
|
|
|
2014-07-20 17:55:53 +00:00
|
|
|
=item B<--halt-on-error> I<val> (alpha testing)
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2014-07-20 17:55:53 +00:00
|
|
|
=item B<--halt> I<val> (alpha testing)
|
2011-10-10 20:14:55 +00:00
|
|
|
|
2014-07-20 17:55:53 +00:00
|
|
|
How should GNU B<parallel> terminate if one or more jobs fail?
|
|
|
|
|
|
|
|
=over 7
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2013-07-26 12:58:42 +00:00
|
|
|
=item Z<>0
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Do not halt if a job fails. Exit status will be the number of jobs
|
|
|
|
failed. This is the default.
|
|
|
|
|
2013-07-26 12:58:42 +00:00
|
|
|
=item Z<>1
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Do not start new jobs if a job fails, but complete the running jobs
|
|
|
|
including cleanup. The exit status will be the exit status from the
|
|
|
|
last failing job.
|
|
|
|
|
2013-07-26 12:58:42 +00:00
|
|
|
=item Z<>2
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Kill off all jobs immediately and exit without cleanup. The exit
|
|
|
|
status will be the exit status from the failing job.
|
|
|
|
|
2014-07-20 17:55:53 +00:00
|
|
|
=item Z<>1-99%
|
|
|
|
|
|
|
|
If I<val>% of the jobs fail, and at least 3 jobs have failed: do not start new jobs, but
|
|
|
|
complete the running jobs including cleanup. The exit status will be
|
|
|
|
the exit status from the last failing job.
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=back
|
|
|
|
|
|
|
|
|
2013-10-21 20:31:52 +00:00
|
|
|
=item B<--header> I<regexp>

Use I<regexp> as header. For normal usage the matched header (typically
the first line: B<--header '.*\n'>) will be split using B<--colsep>
(which will default to '\t') and column names can be used as
replacement variables: B<{column name}>.

For B<--pipe> the matched header will be prepended to each output.

B<--header :> is an alias for B<--header '.*\n'>.

If I<regexp> is a number, it will match that many lines.


=item B<-I> I<replace-str>

Use the replacement string I<replace-str> instead of {}.


=item B<--replace>[=I<replace-str>]

=item B<-i>[I<replace-str>]

This option is a synonym for B<-I>I<replace-str> if I<replace-str> is
specified, and for B<-I>{} otherwise. This option is deprecated;
use B<-I> instead.


=item B<--joblog> I<logfile>

Logfile for executed jobs. Save a list of the executed jobs to
I<logfile> in the following TAB separated format: sequence number,
sshlogin, start time as seconds since epoch, run time in seconds,
bytes in files transferred, bytes in files returned, exit status,
signal, and command run.

To convert the times into ISO-8601 strict do:

B<perl -a -F"\t" -ne 'chomp($F[2]=`date -d \@$F[2] +%FT%T`); print join("\t",@F)'>

See also B<--resume>.


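As a sketch of how the TAB separated log can be queried, the following
lists the commands of jobs whose exit status (column 7) is non-zero. The
log lines are made-up sample data written to a temporary file, not output
from a real run, and the function name B<failed_commands> is hypothetical.

```shell
# List the commands of jobs with a non-zero exit status in a joblog.
# Columns: seq, sshlogin, start, runtime, sent, received, exit status,
# signal, command.
failed_commands() {
    # Skip any line whose first column is not a sequence number.
    awk -F '\t' '$1 ~ /^[0-9]+$/ && $7 != 0 { print $9 }' "$1"
}

# Made-up sample log (hypothetical values) to try the function on.
log=$(mktemp)
printf '1\t:\t1406000000.000\t0.010\t0\t0\t0\t0\ttrue\n'  >  "$log"
printf '2\t:\t1406000000.100\t0.020\t0\t0\t1\t0\tfalse\n' >> "$log"
printf '3\t:\t1406000000.200\t0.015\t0\t0\t2\t0\tls /nonexistent\n' >> "$log"

failed_commands "$log"
rm -f "$log"
```

Because the fields are split on TAB only, a command containing spaces is
printed whole; a command that itself contains a TAB would be cut short.
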
=item B<--jobs> I<N>

=item B<-j> I<N>

=item B<--max-procs> I<N>

=item B<-P> I<N>

Number of jobslots. Run up to N jobs in parallel. 0 means as many as
possible. Default is 100% which will run one job per CPU core.

If B<--semaphore> is set, the default is 1, thus making a mutex.


=item B<--jobs> I<+N>

=item B<-j> I<+N>

=item B<--max-procs> I<+N>

=item B<-P> I<+N>

Add N to the number of CPU cores. Run this many jobs in parallel.
See also B<--use-cpus-instead-of-cores>.


=item B<--jobs> I<-N>

=item B<-j> I<-N>

=item B<--max-procs> I<-N>

=item B<-P> I<-N>

Subtract N from the number of CPU cores. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. See also
B<--use-cpus-instead-of-cores>.


=item B<--jobs> I<N>%

=item B<-j> I<N>%

=item B<--max-procs> I<N>%

=item B<-P> I<N>%

Multiply N% with the number of CPU cores. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. See also
B<--use-cpus-instead-of-cores>.


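As a rough sketch of the arithmetic behind the I<N>, I<+N>, I<-N>, and
I<N>% forms described above, the shell function below computes a jobslot
count from such a value and a given number of CPU cores. The function name
B<jobslots> is made up for this illustration and is not part of GNU
B<parallel>. Rounding I<N>% down to a whole number is an assumption, as
the text above does not say how fractions are handled.

```shell
# Compute the number of jobslots from a -j style value and a core count.
# Usage: jobslots SPEC CORES   (SPEC is N, +N, -N or N%)
jobslots() {
    spec=$1
    cores=$2
    case $spec in
        +*) n=$(( cores + ${spec#+} )) ;;
        -*) n=$(( cores - ${spec#-} )) ;;
        *%) n=$(( cores * ${spec%\%} / 100 )) ;;  # assumption: rounds down
        *)  n=$spec ;;                             # plain N is used as is
    esac
    # The relative forms never evaluate to less than 1. A plain 0 is left
    # alone because it means "as many as possible".
    case $spec in
        [+-]*|*%)
            if [ "$n" -lt 1 ]; then n=1; fi
            ;;
    esac
    echo "$n"
}
```

For example, B<jobslots +2 8> prints 10, B<jobslots 50% 8> prints 4, and
B<jobslots -10 8> prints 1.
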
=item B<--jobs> I<procfile>

=item B<-j> I<procfile>

=item B<--max-procs> I<procfile>

=item B<-P> I<procfile>

Read parameter from file. Use the content of I<procfile> as parameter
for B<-j>. E.g. I<procfile> could contain the string 100% or +2 or
10. If I<procfile> is changed when a job completes, I<procfile> is
read again and the new number of jobs is computed. If the number is
lower than before, running jobs will be allowed to finish but new jobs
will not be started until the wanted number of jobs has been reached.
This makes it possible to change the number of simultaneously running
jobs while GNU B<parallel> is running.


=item B<--keep-order>

=item B<-k>

Keep sequence of output same as the order of input. Normally the
output of a job will be printed as soon as the job completes. Try this
to see the difference:

  parallel -j4 sleep {}\; echo {} ::: 2 1 4 3
  parallel -j4 -k sleep {}\; echo {} ::: 2 1 4 3

If used with B<--onall> or B<--nonall> the output will be grouped by
sshlogin in sorted order.


=item B<-L> I<max-lines>

When used with B<--pipe>: Read records of I<max-lines>.

When used otherwise: Use at most I<max-lines> nonblank input lines per
command line. Trailing blanks cause an input line to be logically
continued on the next input line.

B<-L 0> means read one line, but insert 0 arguments on the command
line.

Implies B<-X> unless B<-m>, B<--xargs>, or B<--pipe> is set.


=item B<--max-lines>[=I<max-lines>]

=item B<-l>[I<max-lines>]

When used with B<--pipe>: Read records of I<max-lines>.

When used otherwise: Synonym for the B<-L> option. Unlike B<-L>, the
I<max-lines> argument is optional. If I<max-lines> is not specified,
it defaults to one. The B<-l> option is deprecated since the POSIX
standard specifies B<-L> instead.

B<-l 0> is an alias for B<-l 1>.

Implies B<-X> unless B<-m>, B<--xargs>, or B<--pipe> is set.


=item B<--line-buffer>

Buffer output on a per-line basis. B<--group> will keep the output
together for a whole job. B<--ungroup> allows output to mix up with
half a line coming from one job and half a line coming from another
job. B<--line-buffer> fits between these two: GNU B<parallel> will
print a full line, but will allow for mixing lines of different jobs.

B<--line-buffer> takes more CPU power than both B<--group> and
B<--ungroup>, but can be faster than B<--group> if the CPU is not the
limiting factor.

See also: B<--group> B<--ungroup>


=item B<--load> I<max-load>

Do not start new jobs on a given computer unless the number of running
processes on the computer is less than I<max-load>. I<max-load> uses
the same syntax as B<--jobs>, so I<100%> for one per CPU is a valid
setting. The only difference is 0, which is interpreted as 0.01.


=item B<--controlmaster>

=item B<-M>

Use ssh's ControlMaster to make ssh connections faster. Useful if jobs
run remotely and are very fast to run. This is disabled for sshlogins
that specify their own ssh command.


=item B<--xargs>

Multiple arguments. Insert as many arguments as the command line
length permits.

If B<{}> is not used the arguments will be appended to the
line. If B<{}> is used multiple times each B<{}> will be replaced
with all the arguments.

Support for B<--xargs> with B<--sshlogin> is limited and may fail.

See also B<-X> for context replace. If in doubt use B<-X> as that will
most likely do what is needed.


=item B<-m>

Multiple arguments. Insert as many arguments as the command line
length permits. If multiple jobs are being run in parallel: distribute
the arguments evenly among the jobs. Use B<-j1> to avoid this.

If B<{}> is not used the arguments will be appended to the
line. If B<{}> is used multiple times each B<{}> will be replaced
with all the arguments.

Support for B<-m> with B<--sshlogin> is limited and may fail.

See also B<-X> for context replace. If in doubt use B<-X> as that will
most likely do what is needed.


=item B<--minversion> I<version>

Print the version of GNU B<parallel> and exit. If the current version of
GNU B<parallel> is less than I<version> the exit code is
255. Otherwise it is 0.

This is useful for scripts that depend on features only available from
a certain version of GNU B<parallel>.


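A script could guard itself along the following lines. This is only a
sketch: the function name B<require_parallel> is made up, and the version
number 20140722 is a placeholder that assumes versions are written as
release dates; substitute the version your script actually needs.

```shell
# Return 0 if a parallel command is installed and reports at least
# version $1 through --minversion; return non-zero otherwise.
require_parallel() {
    command -v parallel >/dev/null 2>&1 || return 127
    parallel --minversion "$1" >/dev/null 2>&1
}

if require_parallel 20140722; then
    echo "GNU parallel is new enough"
else
    echo "GNU parallel is missing or older than 20140722" >&2
fi
```

The output of B<--minversion> is discarded here because only the exit
code is of interest.
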
=item B<--nonall>

B<--onall> with no arguments. Run the command on all computers given
with B<--sshlogin> but take no arguments. GNU B<parallel> will log
into B<--jobs> number of computers in parallel and run the job on the
computer. B<-j> adjusts how many computers to log into in parallel.

This is useful for running the same command (e.g. uptime) on a list of
servers.


=item B<--onall>

Run all the jobs on all computers given with B<--sshlogin>. GNU
B<parallel> will log into B<--jobs> number of computers in parallel
and run one job at a time on the computer. The order of the jobs will
not be changed, but some computers may finish before others. B<-j>
adjusts how many computers to log into in parallel.

When using B<--group> the output will be grouped by each server, so
all the output from one server will be grouped together.


=item B<--output-as-files>

=item B<--outputasfiles>

=item B<--files>

Instead of printing the output to stdout (standard output) the output
of each job is saved in a file and the filename is then printed.


=item B<--pipe>

=item B<--spreadstdin>

Spread input to jobs on stdin (standard input). Read a block of data
from stdin (standard input) and give one block of data as input to one
job.

The block size is determined by B<--block>. The strings B<--recstart>
and B<--recend> tell GNU B<parallel> how a record starts and/or
ends. The block read will have the final partial record removed before
the block is passed on to the job. The partial record will be
prepended to the next block.

If B<--recstart> is given this will be used to split at record start.

If B<--recend> is given this will be used to split at record end.

If both B<--recstart> and B<--recend> are given both will have to
match to find a split position.

If neither B<--recstart> nor B<--recend> is given B<--recend>
defaults to '\n'. To have no record separator use B<--recend "">.

B<--files> is often used with B<--pipe>.

See also: B<--recstart>, B<--recend>, B<--fifo>, B<--cat>, B<--pipepart>.


=item B<--pipepart> (beta testing)

Pipe parts of a physical file. B<--pipepart> works similarly to
B<--pipe>, but is much faster. It has a few limitations:

=over 3

=item Z<>*

The file must be a physical (seekable) file and must be given using B<-a> or B<::::>.

=item Z<>*

Record counting (B<-N>) and line counting (B<-L>/B<-l>) do not work.

=back


=item B<--plain>

Ignore any B<--profile>, $PARALLEL, and ~/.parallel/config to get full
control on the command line (used by GNU B<parallel> internally when
called with B<--sshlogin>).


=item B<--progress>

Show progress of computations. List the computers involved in the task
with number of CPU cores detected and the max number of jobs to
run. After that show progress for each computer: number of running
jobs, number of completed jobs, and percentage of all jobs done by
this computer. The percentage will only be available after all jobs
have been scheduled as GNU B<parallel> only reads the next job when
ready to schedule it - this is to avoid wasting time and memory by
reading everything at startup.

By sending GNU B<parallel> SIGUSR2 you can toggle turning on/off
B<--progress> on a running GNU B<parallel> process.

See also B<--eta>.


=item B<--max-args>=I<max-args>

=item B<-n> I<max-args>

Use at most I<max-args> arguments per command line. Fewer than
I<max-args> arguments will be used if the size (see the B<-s> option)
is exceeded, unless the B<-x> option is given, in which case
GNU B<parallel> will exit.

B<-n 0> means read one argument, but insert 0 arguments on the command
line.

Implies B<-X> unless B<-m> is set.


=item B<--max-replace-args>=I<max-args>

=item B<-N> I<max-args>

Use at most I<max-args> arguments per command line. Like B<-n> but
also makes replacement strings B<{1}> .. B<{>I<max-args>B<}> that
represent argument 1 .. I<max-args>. If there are too few arguments
the B<{>I<n>B<}> will be empty.

B<-N 0> means read one argument, but insert 0 arguments on the command
line.

This will set the owner of the homedir to the user:

B<tr ':' '\n' < /etc/passwd | parallel -N7 chown {1} {6}>

Implies B<-X> unless B<-m> or B<--pipe> is set.

When used with B<--pipe> B<-N> is the number of records to read. This
is somewhat slower than B<--block>.


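A small sketch of B<-N> together with B<--pipe>, using only options
described in this document: ten input lines are read as records of one
line each and handed out five records at a time, so each job counts the
lines of its own block. The check for GNU B<parallel> and the variable
name B<counts> are additions for this illustration.

```shell
# Only run if GNU parallel (not another tool named "parallel") is available.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # 10 input lines, 5 records per block; -k keeps the output in input order.
    counts=$(seq 1 10 | parallel -k --pipe -N 5 wc -l | tr -d ' ' | tr '\n' ' ')
    echo "lines per job: $counts"
else
    echo "GNU parallel not found; skipping" >&2
fi
```

Each block ends at a record boundary because B<--recend> defaults to
'\n' when neither B<--recstart> nor B<--recend> is given.
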
=item B<--max-line-length-allowed>

Print the maximal number of characters allowed on the command line and
exit (used by GNU B<parallel> itself to determine the line length
on remote computers).


=item B<--number-of-cpus>

Print the number of physical CPUs and exit (used by GNU B<parallel>
itself to determine the number of physical CPUs on remote computers).


=item B<--number-of-cores>

Print the number of CPU cores and exit (used by GNU B<parallel> itself
to determine the number of CPU cores on remote computers).


=item B<--no-notice>

Do not display citation notice. A citation notice is printed on stderr
(standard error) only if stderr (standard error) is a terminal, the
user has not specified B<--no-notice>, and the user has not run
B<--bibtex> once.


=item B<--nice> I<niceness>

Run the command at this niceness. For simple commands you can just add
B<nice> in front of the command. But if the command consists of several
sub commands (like: ls|wc) then prepending B<nice> will not always
work. B<--nice> will make sure all sub commands are niced.


=item B<--interactive>

=item B<-p>

Prompt the user about whether to run each command line and read a line
from the terminal. Only run the command line if the response starts
with 'y' or 'Y'. Implies B<-t>.


=item B<--parens> I<parensstring> (alpha testing)

Use to define start and end parentheses for B<{= perl expression =}>. The
left and the right parentheses can be multiple characters and are
assumed to be the same length. The default is B<{==}> giving
B<{=> as the start parenthesis and B<=}> as the end parenthesis.

Another useful setting is B<,,,,> which would make both parentheses
B<,,>:

  parallel --parens ,,,, echo foo is ,,s/I/O/g,, ::: FII

See also: B<--rpl> B<{= perl expression =}>


=item B<--profile> I<profilename>

=item B<-J> I<profilename>

Use profile I<profilename> for options. This is useful if you want to
have multiple profiles. You could have one profile for running jobs in
parallel on the local computer and a different profile for running jobs
on remote computers. See the section PROFILE FILES for examples.

I<profilename> corresponds to the file ~/.parallel/I<profilename>.

You can give multiple profiles by repeating B<--profile>. If parts of
the profiles conflict, the later ones will be used.

Default: config


=item B<--quote>

=item B<-q>

Quote I<command>. This will quote the command line so special
characters are not interpreted by the shell. See the section
QUOTING. Most people will never need this. Quoting is disabled by
default.


=item B<--no-run-if-empty>

=item B<-r>

If the stdin (standard input) only contains whitespace, do not run the command.

If used with B<--pipe> this is slow.


=item B<--noswap>

Do not start new jobs on a given computer if there is both swap-in and
swap-out activity.

The swap activity is only sampled every 10 seconds as the sampling
takes 1 second to do.

Swap activity is computed as (swap-in)*(swap-out) which in practice is
a good value: swapping out is not a problem, swapping in is not a
problem, but both swapping in and out usually indicates a problem.


=item B<--record-env>

Record current environment variables in ~/.parallel/ignored_vars. This
is useful before using B<--env _>.

See also B<--env>.


=item B<--recstart> I<startstring>

=item B<--recend> I<endstring>

If B<--recstart> is given I<startstring> will be used to split at record start.

If B<--recend> is given I<endstring> will be used to split at record end.

If both B<--recstart> and B<--recend> are given the combined string
I<endstring>I<startstring> will have to match to find a split
position. This is useful if either I<startstring> or I<endstring>
match in the middle of a record.

If neither B<--recstart> nor B<--recend> is given then B<--recend>
defaults to '\n'. To have no record separator use B<--recend "">.

B<--recstart> and B<--recend> are used with B<--pipe>.

Use B<--regexp> to interpret B<--recstart> and B<--recend> as regular
expressions. This is slow, however.


=item B<--regexp>

Use B<--regexp> to interpret B<--recstart> and B<--recend> as regular
expressions. This is slow, however.


=item B<--remove-rec-sep>

=item B<--removerecsep>

=item B<--rrs>

Remove the text matched by B<--recstart> and B<--recend> before piping
it to the command.

Only used with B<--pipe>.


=item B<--results> I<prefix>

=item B<--res> I<prefix>

Save the output into files. The files will be stored in a directory tree
rooted at I<prefix>. Within this directory tree, each command will result
in two files: I<prefix>/<ARGS>/stdout and I<prefix>/<ARGS>/stderr, where
<ARGS> is a sequence of directories representing the header of the input
source (if using B<--header :>) or the number of the input source and
corresponding values.

E.g:

  parallel --header : --results foo echo {a} {b} ::: a I II ::: b III IIII

will generate the files:

  foo/a/I/b/III/stderr
  foo/a/I/b/III/stdout
  foo/a/I/b/IIII/stderr
  foo/a/I/b/IIII/stdout
  foo/a/II/b/III/stderr
  foo/a/II/b/III/stdout
  foo/a/II/b/IIII/stderr
  foo/a/II/b/IIII/stdout

and

  parallel --results foo echo {1} {2} ::: I II ::: III IIII

will generate the files:

  foo/1/I/2/III/stderr
  foo/1/I/2/III/stdout
  foo/1/I/2/IIII/stderr
  foo/1/I/2/IIII/stdout
  foo/1/II/2/III/stderr
  foo/1/II/2/III/stdout
  foo/1/II/2/IIII/stderr
  foo/1/II/2/IIII/stdout

See also B<--files>, B<--header>, B<--joblog>.


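Once such a tree exists, the saved output can be collected with ordinary
tools. The sketch below assumes the layout listed above. To stay
self-contained it first builds a stand-in tree by hand with hypothetical
content rather than running any jobs; the function name B<collect_stdout>
is made up for the illustration.

```shell
# Stand-in for a tree written by --results (hypothetical content),
# so the function below can be tried without running any jobs.
prefix=$(mktemp -d)
for a in I II; do
    for b in III IIII; do
        mkdir -p "$prefix/a/$a/b/$b"
        echo "$a $b" > "$prefix/a/$a/b/$b/stdout"
        : > "$prefix/a/$a/b/$b/stderr"
    done
done

# Print each job's argument path followed by its saved stdout.
collect_stdout() {
    find "$1" -type f -name stdout | sort | while IFS= read -r f; do
        dir=${f%/stdout}
        printf '%s: %s\n' "${dir#"$1"/}" "$(cat "$f")"
    done
}

collect_stdout "$prefix"
rm -rf "$prefix"
```

The argument path printed in front of each result is simply the <ARGS>
part of the file name, so the same loop works with or without
B<--header :>.
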
=item B<--resume>

Resumes from the last unfinished job. By reading B<--joblog> or the
B<--results> dir GNU B<parallel> will figure out the last unfinished
job and continue from there. Because GNU B<parallel> only looks at the
sequence numbers in B<--joblog>, the input, the command, and
B<--joblog> all have to remain unchanged; otherwise GNU B<parallel>
may run wrong commands.

See also B<--joblog>, B<--results>, B<--resume-failed>.


=item B<--resume-failed>

Retry all failed and resume from the last unfinished job. By reading
B<--joblog> GNU B<parallel> will figure out the failed jobs and run
those again. After that it will resume from the last unfinished job and
continue from there. Because GNU B<parallel> only looks at the sequence
numbers in B<--joblog>, the input, the command, and B<--joblog>
all have to remain unchanged; otherwise GNU B<parallel> may run wrong
commands.

See also B<--joblog>, B<--resume>.


=item B<--retries> I<n>

If a job fails, retry it on another computer on which it has not
failed. Do this I<n> times. If there are fewer than I<n> computers in
B<--sshlogin> GNU B<parallel> will re-use all the computers. This is
useful if some jobs fail for no apparent reason (such as network
failure).


=item B<--return> I<filename>

Transfer files from remote computers. B<--return> is used with
B<--sshlogin> when the arguments are files on the remote computers. When
processing is done the file I<filename> will be transferred
from the remote computer using B<rsync> and will be put relative to
the default login dir. E.g.

  echo foo/bar.txt | parallel \
    --sshlogin server.example.com --return {.}.out touch {.}.out

This will transfer the file I<$HOME/foo/bar.out> from the computer
I<server.example.com> to the file I<foo/bar.out> after running
B<touch foo/bar.out> on I<server.example.com>.

  echo /tmp/foo/bar.txt | parallel \
    --sshlogin server.example.com --return {.}.out touch {.}.out

This will transfer the file I</tmp/foo/bar.out> from the computer
I<server.example.com> to the file I</tmp/foo/bar.out> after running
B<touch /tmp/foo/bar.out> on I<server.example.com>.

Multiple files can be transferred by repeating the options multiple
times:

  echo /tmp/foo/bar.txt | \
    parallel --sshlogin server.example.com \
    --return {.}.out --return {.}.out2 touch {.}.out {.}.out2

B<--return> is often used with B<--transfer> and B<--cleanup>.

B<--return> is ignored when used with B<--sshlogin :> or when not used
with B<--sshlogin>.


=item B<--round-robin>

=item B<--round>

Normally B<--pipe> will give a single block to each instance of the
command. With B<--round-robin> all blocks will be written at random to
commands already running. This is useful if the command takes a long
time to initialize.

B<--keep-order> will not work with B<--round-robin> as it is
impossible to track which input block corresponds to which output.


=item B<--rpl> 'I<tag> I<perl expression>' (alpha testing)

Use I<tag> as a replacement string for I<perl expression>. This makes
it possible to define your own replacement strings. GNU B<parallel>'s
7 replacement strings are implemented as:

  --rpl '{} '
  --rpl '{#} $_=$job->seq()'
  --rpl '{%} $_=$job->slot()'
  --rpl '{/} s:.*/::'
  --rpl '{//} $Global::use{"File::Basename"} ||= eval "use File::Basename; 1;"; $_ = dirname($_);'
  --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
  --rpl '{.} s:\.[^/.]+$::'

If the user defined replacement string starts with '{' it can also be
used as a positional replacement string (like B<{2.}>).

It is recommended to only change $_ but you have full access to all
of GNU B<parallel>'s internal functions and data structures.

Here are a few examples:

  Remove 2 extensions (e.g. .tar.gz)
  --rpl '{..} s:\.[^/.]+$::;s:\.[^/.]+$::;'
  Keep only the extension
  --rpl '{ext} s:.*\.::'
  Is the job sequence even or odd?
  --rpl '{odd} $_=$job->seq()%2?"odd":"even"'

See also: B<{= perl expression =}> B<--parens>


=item B<--max-chars>=I<max-chars>

=item B<-s> I<max-chars>

Use at most I<max-chars> characters per command line, including the
command and initial-arguments and the terminating nulls at the ends of
the argument strings. The largest allowed value is system-dependent,
and is calculated as the argument length limit for exec, less the size
of your environment. The default value is the maximum.

Implies B<-X> unless B<-m> is set.


=item B<--show-limits>
|
|
|
|
|
|
|
|
Display the limits on the command-line length which are imposed by the
|
|
|
|
operating system and the B<-s> option. Pipe the input from /dev/null
|
|
|
|
(and perhaps specify --no-run-if-empty) if you don't want GNU B<parallel>
|
|
|
|
to do anything.
|
|
|
|
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<--semaphore>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Work as a counting semaphore. B<--semaphore> will cause GNU
|
|
|
|
B<parallel> to start I<command> in the background. When the number of
|
|
|
|
simultaneous jobs is reached, GNU B<parallel> will wait for one of
|
|
|
|
these to complete before starting another command.
|
|
|
|
|
|
|
|
B<--semaphore> implies B<--bg> unless B<--fg> is specified.
|
|
|
|
|
|
|
|
B<--semaphore> implies B<--semaphorename `tty`> unless
|
|
|
|
B<--semaphorename> is specified.
|
|
|
|
|
|
|
|
Used with B<--fg>, B<--wait>, and B<--semaphorename>.
|
|
|
|
|
|
|
|
The command B<sem> is an alias for B<parallel --semaphore>.
|
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
See also B<man sem>.
|
2011-06-25 07:22:05 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
=item B<--semaphorename> I<name>
|
|
|
|
|
|
|
|
=item B<--id> I<name>
|
|
|
|
|
2011-06-25 07:22:05 +00:00
|
|
|
Use I<name> as the name of the semaphore. Default is the name of the
|
|
|
|
controlling tty (output from B<tty>).
|
|
|
|
|
|
|
|
The default normally works as expected when used interactively, but
|
|
|
|
when used in a script I<name> should be set. I<$$> or I<my_task_name>
|
|
|
|
are often good values.
|
|
|
|
|
|
|
|
The semaphore is stored in ~/.parallel/semaphores/
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Implies B<--semaphore>.
|
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
See also B<man sem>.
|
2011-06-25 07:22:05 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<--semaphoretimeout> I<secs>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
If the semaphore is not released within I<secs> seconds, take it anyway.
|
|
|
|
|
|
|
|
Implies B<--semaphore>.
|
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
See also B<man sem>.
|
2011-06-25 07:22:05 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-04-08 19:57:19 +00:00
|
|
|
=item B<--seqreplace> I<replace-str>
|
|
|
|
|
|
|
|
Use the replacement string I<replace-str> instead of B<{#}> for
|
|
|
|
job sequence number.
|
|
|
|
|
|
|
|
|
2014-06-23 01:35:59 +00:00
|
|
|
=item B<--shebang> (alpha testing)
|
2012-11-25 20:24:27 +00:00
|
|
|
|
2014-06-23 01:35:59 +00:00
|
|
|
=item B<--hashbang> (alpha testing)
|
2012-11-25 20:24:27 +00:00
|
|
|
|
|
|
|
GNU B<parallel> can be called as a shebang (#!) command as the first
|
|
|
|
line of a script. The content of the file will be treated as
|
|
|
|
the input source.
|
|
|
|
|
|
|
|
Like this:
|
|
|
|
|
|
|
|
#!/usr/bin/parallel --shebang -r traceroute
|
|
|
|
|
|
|
|
foss.org.my
|
|
|
|
debian.org
|
|
|
|
freenetproject.org
|
|
|
|
|
|
|
|
B<--shebang> must be set as the first option.
|
|
|
|
|
2014-06-23 00:10:53 +00:00
|
|
|
On FreeBSD B<env> is needed:
|
|
|
|
|
|
|
|
#!/usr/bin/env -S parallel --shebang -r traceroute
|
|
|
|
|
|
|
|
foss.org.my
|
|
|
|
debian.org
|
|
|
|
freenetproject.org
|
|
|
|
|
2012-11-25 20:24:27 +00:00
|
|
|
|
2014-06-23 01:35:59 +00:00
|
|
|
=item B<--shebang-wrap> (alpha testing)
|
2012-11-25 20:24:27 +00:00
|
|
|
|
|
|
|
GNU B<parallel> can parallelize scripts by wrapping the shebang
|
|
|
|
line. If the program can be run like this:
|
|
|
|
|
|
|
|
cat arguments | parallel the_program
|
|
|
|
|
|
|
|
then the script can be changed to:
|
|
|
|
|
|
|
|
#!/usr/bin/parallel --shebang-wrap /the/original/parser --with-options
|
|
|
|
|
|
|
|
E.g.
|
|
|
|
|
|
|
|
#!/usr/bin/parallel --shebang-wrap /usr/bin/python
|
|
|
|
|
|
|
|
If the program can be run like this:
|
|
|
|
|
|
|
|
cat data | parallel --pipe the_program
|
|
|
|
|
|
|
|
then the script can be changed to:
|
|
|
|
|
|
|
|
#!/usr/bin/parallel --shebang-wrap --pipe /the/original/parser --with-options
|
|
|
|
|
|
|
|
E.g.
|
|
|
|
|
|
|
|
#!/usr/bin/parallel --shebang-wrap --pipe /usr/bin/perl -w
|
|
|
|
|
|
|
|
B<--shebang-wrap> must be set as the first option.
|
|
|
|
|
|
|
|
|
2011-09-09 19:15:00 +00:00
|
|
|
=item B<--shellquote>
|
|
|
|
|
2011-09-16 13:26:17 +00:00
|
|
|
Does not run the command but quotes it. Useful for making quoted
|
2011-09-09 19:15:00 +00:00
|
|
|
composed commands for GNU B<parallel>.
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--skip-first-line>
|
|
|
|
|
|
|
|
Do not use the first line of input (used by GNU B<parallel> itself
|
|
|
|
when called with B<--shebang>).
|
|
|
|
|
|
|
|
|
2013-06-22 12:50:48 +00:00
|
|
|
=item B<--sshdelay> I<secs>
|
2013-01-21 22:09:47 +00:00
|
|
|
|
|
|
|
Delay starting next ssh by I<secs> seconds. GNU B<parallel> will pause
|
|
|
|
I<secs> seconds after starting each ssh. I<secs> can be less than 1
|
|
|
|
second.
|
|
|
|
|
|
|
|
|
2011-08-21 23:01:57 +00:00
|
|
|
=item B<-S> I<[ncpu/]sshlogin[,[ncpu/]sshlogin[,...]]>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-08-21 23:01:57 +00:00
|
|
|
=item B<--sshlogin> I<[ncpu/]sshlogin[,[ncpu/]sshlogin[,...]]>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
Distribute jobs to remote computers. The jobs will be run on a list of
|
|
|
|
remote computers. GNU B<parallel> will determine the number of CPU
|
|
|
|
cores on the remote computers and run the number of jobs as specified by
|
2010-12-06 23:30:08 +00:00
|
|
|
B<-j>. If the number I<ncpu> is given GNU B<parallel> will use this
|
|
|
|
number as the number of CPU cores on the host. Normally I<ncpu> will not
|
|
|
|
be needed.
|
|
|
|
|
|
|
|
An I<sshlogin> is of the form:
|
|
|
|
|
2013-10-21 20:31:52 +00:00
|
|
|
[sshcommand [options]] [username@]hostname
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
The sshlogin must not require a password.
|
|
|
|
|
|
|
|
The sshlogin ':' is special: it means 'no ssh' and will therefore run
|
|
|
|
on the local computer.
|
|
|
|
|
|
|
|
The sshlogin '..' is special: it reads sshlogins from ~/.parallel/sshloginfile
|
|
|
|
|
2011-06-03 12:53:14 +00:00
|
|
|
The sshlogin '-' is special, too: it reads sshlogins from stdin
|
|
|
|
(standard input).
|
2011-05-26 21:19:58 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
To specify more sshlogins separate the sshlogins by comma or repeat
|
|
|
|
the options multiple times.
|
|
|
|
|
|
|
|
For examples: see B<--sshloginfile>.
|
|
|
|
|
|
|
|
The remote host must have GNU B<parallel> installed.
|
|
|
|
|
|
|
|
B<--sshlogin> is known to cause problems with B<-m> and B<-X>.
|
|
|
|
|
2011-01-22 22:40:15 +00:00
|
|
|
B<--sshlogin> is often used with B<--transfer>, B<--return>,
|
|
|
|
B<--cleanup>, and B<--trc>.
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2013-10-21 20:31:52 +00:00
|
|
|
=item B<--sshloginfile> I<filename>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2013-10-21 20:31:52 +00:00
|
|
|
=item B<--slf> I<filename>
|
2011-10-17 01:10:32 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
File with sshlogins. The file consists of sshlogins on separate
|
|
|
|
lines. Empty lines and lines starting with '#' are ignored. Example:
|
|
|
|
|
|
|
|
server.example.com
|
|
|
|
username@server2.example.com
|
|
|
|
8/my-8-core-server.example.com
|
|
|
|
2/my_other_username@my-dualcore.example.net
|
|
|
|
# This server has SSH running on port 2222
|
|
|
|
ssh -p 2222 server.example.net
|
|
|
|
4/ssh -p 2222 quadserver.example.net
|
|
|
|
# Use a different ssh program
|
|
|
|
myssh -p 2222 -l myusername hexacpu.example.net
|
|
|
|
# Use a different ssh program with default number of cores
|
|
|
|
//usr/local/bin/myssh -p 2222 -l myusername hexacpu.example.net
|
|
|
|
# Use a different ssh program with 6 cores
|
|
|
|
6//usr/local/bin/myssh -p 2222 -l myusername hexacpu.example.net
|
|
|
|
# Assume 16 cores on the local computer
|
|
|
|
16/:
|
|
|
|
|
|
|
|
When using a different ssh program the last argument must be the hostname.
|
|
|
|
|
2011-10-17 01:10:32 +00:00
|
|
|
Multiple B<--sshloginfile> are allowed.
|
|
|
|
|
2013-08-22 15:24:36 +00:00
|
|
|
GNU B<parallel> will first look for the file in the current dir; if that
|
|
|
|
fails it looks for the file in ~/.parallel.
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
The sshloginfile '..' is special: it reads sshlogins from
|
|
|
|
~/.parallel/sshloginfile
|
|
|
|
|
2012-01-13 23:57:48 +00:00
|
|
|
The sshloginfile '.' is special: it reads sshlogins from
|
|
|
|
/etc/parallel/sshloginfile
|
|
|
|
|
2011-06-03 12:53:14 +00:00
|
|
|
The sshloginfile '-' is special, too: it reads sshlogins from stdin
|
|
|
|
(standard input).
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-08-21 23:01:57 +00:00
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=item B<--slotreplace> I<replace-str>
|
2011-05-26 12:55:15 +00:00
|
|
|
|
2014-05-22 12:53:33 +00:00
|
|
|
Use the replacement string I<replace-str> instead of B<{%}> for
|
|
|
|
job slot number.
|
2011-05-26 12:55:15 +00:00
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--silent>
|
|
|
|
|
|
|
|
Silent. The job to be run will not be printed. This is the default.
|
|
|
|
Can be reversed with B<-v>.
|
|
|
|
|
|
|
|
|
2011-05-05 18:50:53 +00:00
|
|
|
=item B<--tty>
|
2010-12-21 17:08:16 +00:00
|
|
|
|
|
|
|
Open terminal tty. If GNU B<parallel> is used for starting an
|
|
|
|
interactive program then this option may be needed. It will start only
|
|
|
|
one job at a time (i.e. B<-j1>), not buffer the output (i.e. B<-u>),
|
|
|
|
and it will open a tty for the job. When the job is done, the next job
|
|
|
|
will get the tty.
|
|
|
|
|
|
|
|
|
2011-12-21 23:17:49 +00:00
|
|
|
=item B<--tag>
|
2011-09-24 00:16:06 +00:00
|
|
|
|
|
|
|
Tag lines with arguments. Each output line will be prepended with the
|
2011-10-10 20:14:55 +00:00
|
|
|
arguments and TAB (\t). When combined with B<--onall> or B<--nonall>
|
|
|
|
the lines will be prepended with the sshlogin instead.
|
|
|
|
|
|
|
|
B<--tag> is ignored when using B<-u>.
|
2011-09-24 00:16:06 +00:00
|
|
|
|
|
|
|
|
2012-05-06 10:43:40 +00:00
|
|
|
=item B<--tagstring> I<str>
|
|
|
|
|
|
|
|
Tag lines with a string. Each output line will be prepended with
|
|
|
|
I<str> and TAB (\t). I<str> can contain replacement strings such as
|
|
|
|
{}.
|
|
|
|
|
|
|
|
B<--tagstring> is ignored when using B<-u>, B<--onall>, and B<--nonall>.
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--tmpdir> I<dirname>
|
|
|
|
|
|
|
|
Directory for temporary files. GNU B<parallel> normally buffers output
|
|
|
|
into temporary files in /tmp. By setting B<--tmpdir> you can use a
|
|
|
|
different dir for the files. Setting B<--tmpdir> is equivalent to
|
|
|
|
setting $TMPDIR.
|
|
|
|
|
|
|
|
|
2014-07-16 09:07:41 +00:00
|
|
|
=item B<--tmux> (alpha testing)
|
|
|
|
|
|
|
|
Use B<tmux> for output. Start a B<tmux> session and run each job in a
|
|
|
|
window in that session. No other output will be produced.
|
|
|
|
|
|
|
|
|
2013-10-21 20:31:52 +00:00
|
|
|
=item B<--timeout> I<val>
|
2011-08-21 23:01:57 +00:00
|
|
|
|
2013-04-28 22:43:43 +00:00
|
|
|
Time out for command. If the command runs for longer than I<val>
|
2011-08-21 23:01:57 +00:00
|
|
|
seconds it will get killed with SIGTERM, followed by SIGTERM 200 ms
|
|
|
|
later, followed by SIGKILL 200 ms later.
|
|
|
|
|
2013-04-28 22:43:43 +00:00
|
|
|
If I<val> is followed by a % then the timeout will dynamically be
|
2013-06-26 14:07:14 +00:00
|
|
|
computed as a percentage of the median runtime. Only values
|
2013-04-28 22:43:43 +00:00
|
|
|
> 100% will make sense.
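As a worked example of the percentage form: if the median runtime of the jobs seen so far were 12 seconds, a I<val> of 200% would give a limit of 24 seconds. The sketch below only illustrates that arithmetic. The runtimes are assumed sample numbers, the helper B<median_runtime> is hypothetical, and how GNU B<parallel> collects runtimes internally is not shown here.

```shell
# Assumed sample runtimes in seconds (not measured data).
# For an even count this simple helper picks the lower middle value.
median_runtime() {
    printf '%s\n' "$@" | sort -n |
        awk '{ v[NR] = $1 } END { print v[int((NR + 1) / 2)] }'
}

median=$(median_runtime 10 12 50)    # middle value of 10 12 50
limit=$(( median * 200 / 100 ))      # 200% of the median
echo "median=${median}s limit=${limit}s"
```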
|
|
|
|
|
2011-08-21 23:01:57 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=item B<--verbose>
|
|
|
|
|
|
|
|
=item B<-t>
|
|
|
|
|
2011-05-26 21:19:58 +00:00
|
|
|
Print the job to be run on stderr (standard error).
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
See also B<-v>, B<-p>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2014-01-22 02:06:39 +00:00
|
|
|
=item B<--transfer>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
Transfer files to remote computers. B<--transfer> is used with
|
2013-11-22 22:31:46 +00:00
|
|
|
B<--sshlogin> when the arguments are files and should be transferred
|
|
|
|
to the remote computers. The files will be transferred using B<rsync>
|
|
|
|
and will be put relative to the default work dir. If the path contains
|
|
|
|
/./ the remaining path will be relative to the work dir. E.g.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
echo foo/bar.txt | parallel \
|
|
|
|
--sshlogin server.example.com --transfer wc
|
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
This will transfer the file I<foo/bar.txt> to the computer
|
2010-12-06 23:30:08 +00:00
|
|
|
I<server.example.com> to the file I<$HOME/foo/bar.txt> before running
|
|
|
|
B<wc foo/bar.txt> on I<server.example.com>.
|
|
|
|
|
|
|
|
echo /tmp/foo/bar.txt | parallel \
|
|
|
|
--sshlogin server.example.com --transfer wc
|
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
This will transfer the file I</tmp/foo/bar.txt> to the computer
|
2010-12-06 23:30:08 +00:00
|
|
|
I<server.example.com> to the file I</tmp/foo/bar.txt> before running
|
|
|
|
B<wc /tmp/foo/bar.txt> on I<server.example.com>.
|
|
|
|
|
|
|
|
B<--transfer> is often used with B<--return> and B<--cleanup>.
|
|
|
|
|
|
|
|
B<--transfer> is ignored when used with B<--sshlogin :> or when not used with B<--sshlogin>.
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--trc> I<filename>
|
|
|
|
|
|
|
|
Transfer, Return, Cleanup. Shorthand for:
|
|
|
|
|
|
|
|
B<--transfer> B<--return> I<filename> B<--cleanup>
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--trim> <n|l|r|lr|rl>
|
|
|
|
|
|
|
|
Trim white space in input.
|
|
|
|
|
|
|
|
=over 4
|
|
|
|
|
|
|
|
=item n
|
|
|
|
|
|
|
|
No trim. Input is not modified. This is the default.
|
|
|
|
|
|
|
|
=item l
|
|
|
|
|
|
|
|
Left trim. Remove white space from start of input. E.g. " a bc " -> "a bc ".
|
|
|
|
|
|
|
|
=item r
|
|
|
|
|
|
|
|
Right trim. Remove white space from end of input. E.g. " a bc " -> " a bc".
|
|
|
|
|
|
|
|
=item lr
|
|
|
|
|
|
|
|
=item rl
|
|
|
|
|
|
|
|
Both trim. Remove white space from both start and end of input. E.g. "
|
|
|
|
a bc " -> "a bc". This is the default if B<--colsep> is used.
|
|
|
|
|
|
|
|
=back
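The effect of the trim modes can be reproduced with B<sed> for comparison. This is a sketch of the resulting strings only, not of how GNU B<parallel> implements trimming; the function names are made up for the illustration.

```shell
# Hypothetical helpers mimicking the effect of --trim l, r and lr.
trim_l()  { printf '%s\n' "$1" | sed 's/^[[:space:]]*//'; }
trim_r()  { printf '%s\n' "$1" | sed 's/[[:space:]]*$//'; }
trim_lr() { printf '%s\n' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'; }

# Brackets make the remaining white space visible.
printf '[%s]\n' "$(trim_l ' a bc ')" "$(trim_r ' a bc ')" "$(trim_lr ' a bc ')"
```

White space inside the value (between "a" and "bc") is kept in all three modes.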
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--ungroup>
|
|
|
|
|
|
|
|
=item B<-u>
|
|
|
|
|
2012-05-06 10:43:40 +00:00
|
|
|
Ungroup output. Output is printed as soon as possible and bypasses
|
2012-05-06 10:54:51 +00:00
|
|
|
GNU B<parallel> internal processing. This may cause output from
|
|
|
|
different commands to be mixed, so it should only be used if you do not
|
|
|
|
care about the output. Compare these:
|
2012-05-06 10:43:40 +00:00
|
|
|
|
|
|
|
B<parallel -j0 'sleep {};echo -n start{};sleep {};echo {}end' ::: 1 2 3 4>
|
|
|
|
|
|
|
|
B<parallel -u -j0 'sleep {};echo -n start{};sleep {};echo {}end' ::: 1 2 3 4>
|
|
|
|
|
|
|
|
It also disables B<--tag>. GNU B<parallel> runs faster with B<-u>. Can
|
|
|
|
be reversed with B<--group>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2014-05-22 12:53:33 +00:00
|
|
|
See also: B<--line-buffer> B<--group>
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
=item B<--extensionreplace> I<replace-str>
|
|
|
|
|
2011-10-10 20:14:55 +00:00
|
|
|
=item B<--er> I<replace-str>
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
Use the replacement string I<replace-str> instead of {.} for input line without extension.
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--use-cpus-instead-of-cores>
|
|
|
|
|
|
|
|
Count the number of physical CPUs instead of CPU cores. When computing
|
|
|
|
how many jobs to run simultaneously relative to the number of CPU cores
|
|
|
|
you can ask GNU B<parallel> to instead look at the number of physical
|
|
|
|
CPUs. This will make sense for computers that have hyperthreading as
|
|
|
|
two jobs running on one CPU with hyperthreading will run slower than
|
|
|
|
two jobs running on two physical CPUs. Some multi-core CPUs can run
|
|
|
|
faster if only one thread is running per physical CPU. Most users will
|
|
|
|
not need this option.
|
|
|
|
|
|
|
|
|
|
|
|
=item B<-v>
|
|
|
|
|
2011-05-26 21:19:58 +00:00
|
|
|
Verbose. Print the job to be run on stdout (standard output). Can be reversed
|
2010-12-15 23:12:02 +00:00
|
|
|
with B<--silent>. See also B<-t>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Use B<-v> B<-v> to print the wrapping ssh command when running remotely.
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--version>
|
|
|
|
|
|
|
|
=item B<-V>
|
|
|
|
|
|
|
|
Print the version of GNU B<parallel> and exit.
|
|
|
|
|
|
|
|
|
2014-02-22 10:32:42 +00:00
|
|
|
=item B<--workdir> I<mydir>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2014-02-22 10:32:42 +00:00
|
|
|
=item B<--wd> I<mydir>
|
2011-10-10 20:14:55 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
Files transferred using B<--transfer> and B<--return> will be relative
|
2010-12-21 17:08:16 +00:00
|
|
|
to I<mydir> on remote computers, and the command will be executed in
|
2012-02-20 00:48:28 +00:00
|
|
|
the dir I<mydir>.
|
|
|
|
|
|
|
|
The special I<mydir> value B<...> will create working dirs under
|
|
|
|
B<~/.parallel/tmp/> on the remote computers. If B<--cleanup> is given
|
|
|
|
these dirs will be removed.
|
|
|
|
|
|
|
|
The special I<mydir> value B<.> uses the current working dir. If the
|
|
|
|
current working dir is beneath your home dir, the value B<.> is
|
|
|
|
treated as the relative path to your home dir. This means that if your
|
|
|
|
home dir is different on remote computers (e.g. if your login is
|
|
|
|
different) the relative path will still be relative to your home dir.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2013-02-22 14:01:46 +00:00
|
|
|
To see the difference try:
|
|
|
|
|
|
|
|
B<parallel -S server pwd ::: "">
|
|
|
|
|
|
|
|
B<parallel --wd . -S server pwd ::: "">
|
|
|
|
|
|
|
|
B<parallel --wd ... -S server pwd ::: "">
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-05 18:50:53 +00:00
|
|
|
=item B<--wait>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Wait for all commands to complete.
|
|
|
|
|
|
|
|
Implies B<--semaphore>.
|
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
See also B<man sem>.
|
2011-06-25 07:22:05 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
=item B<-X>
|
|
|
|
|
|
|
|
Multiple arguments with context replace. Insert as many arguments as
|
2011-06-27 21:21:26 +00:00
|
|
|
the command line length permits. If multiple jobs are being run in
|
|
|
|
parallel: distribute the arguments evenly among the jobs. Use B<-j1>
|
|
|
|
to avoid this.
|
|
|
|
|
|
|
|
If B<{}> is not used the arguments will be appended to the line. If
|
|
|
|
B<{}> is used as part of a word (like I<pic{}.jpg>) then the whole
|
|
|
|
word will be repeated. If B<{}> is used multiple times each B<{}> will
|
|
|
|
be replaced with the arguments.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Normally B<-X> will do the right thing, whereas B<-m> can give
|
|
|
|
unexpected results if B<{}> is used as part of a word.
|
|
|
|
|
|
|
|
Support for B<-X> with B<--sshlogin> is limited and may fail.
|
|
|
|
|
|
|
|
See also B<-m>.
|
|
|
|
|
|
|
|
|
|
|
|
=item B<--exit>
|
|
|
|
|
|
|
|
=item B<-x>
|
|
|
|
|
|
|
|
Exit if the size (see the B<-s> option) is exceeded.
|
|
|
|
|
|
|
|
|
2013-08-22 15:24:36 +00:00
|
|
|
=item B<--xapply>
|
2011-05-04 18:55:01 +00:00
|
|
|
|
2012-01-13 23:53:19 +00:00
|
|
|
Read multiple input sources like B<xapply>. If multiple input sources
|
|
|
|
are given, one argument will be read from each of the input
|
|
|
|
sources. The arguments can be accessed in the command as B<{1}>
|
|
|
|
.. B<{>I<n>B<}>, so B<{1}> will be a line from the first input source, and
|
|
|
|
B<{6}> will refer to the line with the same line number from the 6th
|
|
|
|
input source.
|
2011-05-04 18:55:01 +00:00
|
|
|
|
2011-05-05 21:36:12 +00:00
|
|
|
Compare these two:
|
|
|
|
|
|
|
|
parallel echo {1} {2} ::: 1 2 3 ::: a b c
|
|
|
|
parallel --xapply echo {1} {2} ::: 1 2 3 ::: a b c
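Without B<--xapply> every combination of the input sources is generated; with it the sources are read in lock step. The plain-shell sketch below mimics both behaviours for the two lists above. The function names are illustrative only, and the order of the output from GNU B<parallel> itself may differ unless B<-k> is used.

```shell
# All combinations, as in: parallel echo {1} {2} ::: 1 2 3 ::: a b c
all_combinations() {
    for n in 1 2 3; do
        for l in a b c; do
            echo "$n $l"
        done
    done
}

# One argument from each source per job, as with --xapply
lock_step() {
    nums=$(mktemp) && lets=$(mktemp) || return 1
    printf '%s\n' 1 2 3 >"$nums"
    printf '%s\n' a b c >"$lets"
    paste -d ' ' "$nums" "$lets"
    rm -f "$nums" "$lets"
}

all_combinations   # 9 lines, from "1 a" to "3 c"
lock_step          # 3 lines: "1 a", "2 b", "3 c"
```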
|
|
|
|
|
2013-06-22 12:50:48 +00:00
|
|
|
Arguments will be recycled if one input source has more arguments than the others:
|
|
|
|
|
|
|
|
parallel --xapply echo {1} {2} {3} ::: 1 2 ::: I II III ::: a b c d e f g
|
|
|
|
|
2012-01-22 03:42:05 +00:00
|
|
|
See also B<--header>.
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=back
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Working as xargs -n1. Argument appending
|
|
|
|
|
|
|
|
GNU B<parallel> can work similarly to B<xargs -n1>.
|
|
|
|
|
|
|
|
To compress all html files using B<gzip> run:
|
|
|
|
|
2014-03-03 18:26:19 +00:00
|
|
|
B<find . -name '*.html' | parallel gzip --best>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
If the file names may contain a newline use B<-0>. Substitute FOO BAR with
|
|
|
|
FUBAR in all files in this dir and subdirs:
|
|
|
|
|
|
|
|
B<find . -type f -print0 | parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'>
|
|
|
|
|
|
|
|
Note B<-q> is needed because of the space in 'FOO BAR'.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE: Reading arguments from command line
|
|
|
|
|
|
|
|
GNU B<parallel> can take the arguments from command line instead of
|
|
|
|
stdin (standard input). To compress all html files in the current dir
|
|
|
|
using B<gzip> run:
|
|
|
|
|
2014-03-03 18:26:19 +00:00
|
|
|
B<parallel gzip --best ::: *.html>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
To convert *.wav to *.mp3 using LAME running one process per CPU core
|
|
|
|
run:
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
B<parallel lame {} -o {.}.mp3 ::: *.wav>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE: Inserting multiple arguments
|
|
|
|
|
2011-08-02 21:18:28 +00:00
|
|
|
When moving a lot of files like this: B<mv *.log destdir> you will
|
2010-12-06 23:30:08 +00:00
|
|
|
sometimes get the error:
|
|
|
|
|
|
|
|
B<bash: /bin/mv: Argument list too long>
|
|
|
|
|
|
|
|
because there are too many files. You can instead do:
|
|
|
|
|
2011-08-02 21:18:28 +00:00
|
|
|
B<ls | grep -E '\.log$' | parallel mv {} destdir>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
This will run B<mv> for each file. It can be done faster if B<mv> gets
|
|
|
|
as many arguments as will fit on the line:
|
|
|
|
|
2011-08-02 21:18:28 +00:00
|
|
|
B<ls | grep -E '\.log$' | parallel -m mv {} destdir>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE: Context replace
|
|
|
|
|
|
|
|
To remove the files I<pict0000.jpg> .. I<pict9999.jpg> you could do:
|
|
|
|
|
|
|
|
B<seq -w 0 9999 | parallel rm pict{}.jpg>
|
|
|
|
|
|
|
|
You could also do:
|
|
|
|
|
|
|
|
B<seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm>
|
|
|
|
|
|
|
|
The first will run B<rm> 10000 times, while the last will only run
|
|
|
|
B<rm> as many times as needed to keep the command line length short
|
|
|
|
enough to avoid B<Argument list too long> (it typically runs 1-2 times).
|
|
|
|
|
|
|
|
You could also run:
|
|
|
|
|
|
|
|
B<seq -w 0 9999 | parallel -X rm pict{}.jpg>
|
|
|
|
|
|
|
|
This will also only run B<rm> as many times as needed to keep the command
|
|
|
|
line length short enough.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE: Compute intensive jobs and substitution
|
|
|
|
|
|
|
|
If ImageMagick is installed this will generate a thumbnail of a jpg
|
|
|
|
file:
|
|
|
|
|
|
|
|
B<convert -geometry 120 foo.jpg thumb_foo.jpg>
|
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
This will run with number-of-cpu-cores jobs in parallel for all jpg
|
|
|
|
files in a directory:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
B<ls *.jpg | parallel convert -geometry 120 {} thumb_{}>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
To do it recursively use B<find>:
|
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
B<find . -name '*.jpg' | parallel convert -geometry 120 {} {}_thumb.jpg>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Notice how the argument has to start with B<{}> as B<{}> will include the path
|
|
|
|
(e.g. running B<convert -geometry 120 ./foo/bar.jpg
|
|
|
|
thumb_./foo/bar.jpg> would clearly be wrong). The command will
|
|
|
|
generate files like ./foo/bar.jpg_thumb.jpg.
|
|
|
|
|
|
|
|
Use B<{.}> to avoid the extra .jpg in the file name. This command will
|
|
|
|
make files like ./foo/bar_thumb.jpg:
|
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
B<find . -name '*.jpg' | parallel convert -geometry 120 {} {.}_thumb.jpg>
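The three output names discussed in this example can be checked with plain shell parameter expansion. The sketch assumes the input ./foo/bar.jpg; the expansion ${f%.*} is used here as a stand-in for B<{.}> and is only claimed to match it for simple names like this one.

```shell
f=./foo/bar.jpg

wrong="thumb_$f"            # thumb_{}       -> thumb_./foo/bar.jpg
long="${f}_thumb.jpg"       # {}_thumb.jpg   -> ./foo/bar.jpg_thumb.jpg
short="${f%.*}_thumb.jpg"   # {.}_thumb.jpg  -> ./foo/bar_thumb.jpg

printf '%s\n' "$wrong" "$long" "$short"
```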
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE: Substitution and redirection
|
|
|
|
|
|
|
|
This will generate an uncompressed version of .gz-files next to the .gz-file:
|
|
|
|
|
|
|
|
B<parallel zcat {} ">>B<"{.} ::: *.gz>
|
|
|
|
|
|
|
|
Quoting of > is necessary to postpone the redirection. Another
|
|
|
|
solution is to quote the whole command:
|
|
|
|
|
|
|
|
B<parallel "zcat {} >>B<{.}" ::: *.gz>
|
|
|
|
|
2012-05-27 16:14:25 +00:00
|
|
|
Other special shell characters (such as * ; $ > < | >> <<) also need
|
2010-12-06 23:30:08 +00:00
|
|
|
to be put in quotes, as they may otherwise be interpreted by the shell
|
|
|
|
and not given to GNU B<parallel>.
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Composed commands
|
|
|
|
|
|
|
|
A job can consist of several commands. This will print the number of
|
|
|
|
files in each directory:
|
|
|
|
|
|
|
|
B<ls | parallel 'echo -n {}" "; ls {}|wc -l'>
|
|
|
|
|
|
|
|
To put the output in a file called <name>.dir:
|
|
|
|
|
|
|
|
B<ls | parallel '(echo -n {}" "; ls {}|wc -l) >> B<{}.dir'>
|
|
|
|
|
|
|
|
Even small shell scripts can be run by GNU B<parallel>:
|
|
|
|
|
|
|
|
B<find . | parallel 'a={}; name=${a##*/}; upper=$(echo "$name" | tr "[:lower:]" "[:upper:]"); echo "$name - $upper"'>
|
|
|
|
|
2011-01-11 12:42:14 +00:00
|
|
|
B<ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'>
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
Given a list of URLs, list all URLs that fail to download. Print the
|
|
|
|
line number and the URL.
|
|
|
|
|
|
|
|
B<cat urlfile | parallel "wget {} 2>>B</dev/null || grep -n {} urlfile">
|
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
Create a mirror directory with the same filenames except all files and
|
|
|
|
symlinks are empty files.
|
|
|
|
|
|
|
|
B<cp -rs /the/source/dir mirror_dir; find mirror_dir -type l | parallel -m rm {} '&&' touch {}>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2012-05-27 16:14:25 +00:00
|
|
|
Find the files in a list that do not exist
|
2011-12-18 16:10:26 +00:00
|
|
|
|
|
|
|
B<cat file_list | parallel 'if [ ! -e {} ] ; then echo {}; fi'>
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2013-06-30 16:11:36 +00:00
|
|
|
|
|
|
|
=head1 EXAMPLE: Calling Bash functions
|
|
|
|
|
|
|
|
If the composed command is longer than a line, it becomes hard to
|
|
|
|
read. In Bash you can use functions. Just remember to B<export -f> the
|
|
|
|
function.
|
|
|
|
|
|
|
|
doit() {
|
|
|
|
echo Doing it for $1
|
|
|
|
sleep 2
|
|
|
|
echo Done with $1
|
|
|
|
}
|
|
|
|
export -f doit
|
|
|
|
parallel doit ::: 1 2 3
|
|
|
|
|
|
|
|
doubleit() {
|
|
|
|
echo Doing it for $1 $2
|
|
|
|
sleep 2
|
|
|
|
echo Done with $1 $2
|
|
|
|
}
|
|
|
|
export -f doubleit
|
|
|
|
parallel doubleit ::: 1 2 3 ::: a b
|
|
|
|
|
2013-12-02 20:28:48 +00:00
|
|
|
To do this on remote servers you need to transfer the function using
|
|
|
|
B<--env>:
|
|
|
|
|
|
|
|
parallel --env doit -S server doit ::: 1 2 3
|
|
|
|
parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
|
2013-06-30 16:11:36 +00:00
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Removing file extension when processing files
|
|
|
|
|
|
|
|
When processing files removing the file extension using B<{.}> is
|
|
|
|
often useful.
|
|
|
|
|
|
|
|
Create a directory for each zip-file and unzip it in that dir:
|
|
|
|
|
|
|
|
B<parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip>
|
|
|
|
|
|
|
|
Recompress all .gz files in current directory using B<bzip2> running 1
|
|
|
|
job per CPU core in parallel:
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
B<parallel "zcat {} | bzip2 >>B<{.}.bz2 && rm {}" ::: *.gz>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Convert all WAV files to MP3 using LAME:
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
B<find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Put all converted files in the same directory:
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
B<find sounddir -type f -name '*.wav' | parallel lame {} -o mydir/{/.}.mp3>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Removing two file extensions when processing files and
|
|
|
|
calling GNU Parallel from itself
|
|
|
|
|
|
|
|
If you have a directory with tar.gz files and want these extracted in
|
|
|
|
the corresponding dir (e.g. foo.tar.gz will be extracted in the dir
|
|
|
|
foo) you can do:
|
|
|
|
|
2011-10-10 20:14:55 +00:00
|
|
|
B<ls *.tar.gz| parallel --er {tar} 'echo {tar}|parallel "mkdir -p {.} ; tar -C {.} -xf {.}.tar.gz"'>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Download 10 images for each of the past 30 days
|
|
|
|
|
|
|
|
Let us assume a website stores images like:
|
|
|
|
|
2013-06-30 16:11:36 +00:00
|
|
|
http://www.example.com/path/to/YYYYMMDD_##.jpg
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-26 11:14:03 +00:00
|
|
|
where YYYYMMDD is the date and ## is the number 01-10. This will
|
|
|
|
download images for the past 30 days:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-26 11:14:03 +00:00
|
|
|
B<parallel wget http://www.example.com/path/to/'$(date -d "today -{1} days" +%Y%m%d)_{2}.jpg' ::: $(seq 30) ::: $(seq -w 10)>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-08-20 19:13:25 +00:00
|
|
|
B<$(date -d "today -{1} days" +%Y%m%d)> will give the dates in
|
2011-05-26 21:19:58 +00:00
|
|
|
YYYYMMDD with {1} days subtracted.
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
=head1 EXAMPLE: Digital clock with "blinking" :
|
|
|
|
|
|
|
|
The : in a digital clock blinks. To make every other line have a ':'
|
|
|
|
and the rest a ' ' a perl expression is used to look at the 3rd input
|
|
|
|
source. If the value modulo 2 is 1: Use ":" otherwise use " ":
|
|
|
|
|
|
|
|
B<parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} ::: {0..12} ::: {0..5} ::: {0..9}>
|
|
|
|
|
|
|
|
|
2011-10-10 20:14:55 +00:00
|
|
|
=head1 EXAMPLE: Breadth first parallel web crawler/mirrorer
|
2011-07-27 17:34:30 +00:00
|
|
|
|
2011-10-10 20:14:55 +00:00
|
|
|
The script below will crawl and mirror a URL in parallel. It
|
|
|
|
downloads first pages that are 1 click down, then 2 clicks down, then
|
|
|
|
3; instead of the normal depth first, where the first link on
|
|
|
|
each page is fetched first.
|
|
|
|
|
|
|
|
Run like this:
|
2011-07-27 17:34:30 +00:00
|
|
|
|
2011-07-28 18:12:02 +00:00
|
|
|
B<PARALLEL=-j100 ./parallel-crawl http://gatt.org.yeslab.org/>
|
|
|
|
|
|
|
|
Remove the B<wget> part if you only want a web crawler.
|
|
|
|
|
|
|
|
It works by fetching a page from a list of URLs and looking for links
|
|
|
|
in that page that are within the same starting URL and that have not
|
|
|
|
already been seen. These links are added to a new queue. When all the
|
|
|
|
pages from the list are done, the new queue is moved to the list of
|
|
|
|
URLs and the process is started over until no unseen links are found.
|
2011-07-27 17:34:30 +00:00
|
|
|
|
|
|
|
#!/bin/bash
|
|
|
|
|
2011-07-28 18:12:02 +00:00
|
|
|
# E.g. http://gatt.org.yeslab.org/
|
2011-07-27 17:34:30 +00:00
|
|
|
URL=$1
|
2011-07-28 18:12:02 +00:00
|
|
|
# Stay inside the start dir
|
|
|
|
BASEURL=$(echo $URL | perl -pe 's:#.*::; s:(//.*/)[^/]*:$1:')
|
2011-07-27 17:34:30 +00:00
|
|
|
URLLIST=$(mktemp urllist.XXXX)
|
|
|
|
URLLIST2=$(mktemp urllist.XXXX)
|
|
|
|
SEEN=$(mktemp seen.XXXX)
|
|
|
|
|
|
|
|
# Spider to get the URLs
|
|
|
|
echo $URL >$URLLIST
|
|
|
|
cp $URLLIST $SEEN
|
|
|
|
|
|
|
|
while [ -s $URLLIST ] ; do
|
|
|
|
cat $URLLIST |
|
2011-07-28 18:12:02 +00:00
|
|
|
parallel lynx -listonly -image_links -dump {} \; wget -qm -l1 -Q1 {} \; echo Spidered: {} \>\&2 |
|
2011-07-27 17:34:30 +00:00
|
|
|
perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and do { $seen{$1}++ or print }' |
|
2011-07-28 18:12:02 +00:00
|
|
|
grep -F $BASEURL |
|
2011-07-27 17:34:30 +00:00
|
|
|
grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
|
|
|
|
mv $URLLIST2 $URLLIST
|
|
|
|
done
|
|
|
|
|
|
|
|
rm -f $URLLIST $URLLIST2 $SEEN
|
|
|
|
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
=head1 EXAMPLE: Process files from a tar file while unpacking
|
|
|
|
|
|
|
|
If the files to be processed are in a tar file then unpacking one file
|
|
|
|
and processing it immediately may be faster than first unpacking all
|
|
|
|
files.
|
|
|
|
|
|
|
|
B<tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' |
|
|
|
|
parallel echo>
|
|
|
|
|
|
|
|
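The Perl one-liner delays each filename by one line, so a file is only
passed to GNU B<parallel> after B<tar> has moved on to the next
file. This avoids a race condition in which a file is processed before
it is fully unpacked.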
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Rewriting a for-loop and a while-read-loop
|
|
|
|
|
|
|
|
for-loops like this:
|
|
|
|
|
|
|
|
(for x in `cat list` ; do
|
|
|
|
do_something $x
|
|
|
|
done) | process_output
|
|
|
|
|
|
|
|
and while-read-loops like this:
|
|
|
|
|
|
|
|
cat list | (while read x ; do
|
|
|
|
do_something $x
|
|
|
|
done) | process_output
|
|
|
|
|
|
|
|
can be written like this:
|
|
|
|
|
|
|
|
B<cat list | parallel do_something | process_output>
|
|
|
|
|
2012-12-10 18:12:35 +00:00
|
|
|
For example: Find which host name in a list has IP address 1.2.3.4:
|
|
|
|
|
|
|
|
B<cat hosts.txt | parallel -P 100 host | grep 1.2.3.4>
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
If the processing requires more steps the for-loop like this:
|
|
|
|
|
|
|
|
(for x in `cat list` ; do
|
|
|
|
no_extension=${x%.*};
|
|
|
|
do_something $x scale $no_extension.jpg
|
|
|
|
do_step2 <$x $no_extension
|
|
|
|
done) | process_output
|
|
|
|
|
|
|
|
and while-loops like this:
|
|
|
|
|
|
|
|
cat list | (while read x ; do
|
|
|
|
no_extension=${x%.*};
|
|
|
|
do_something $x scale $no_extension.jpg
|
|
|
|
do_step2 <$x $no_extension
|
|
|
|
done) | process_output
|
|
|
|
|
|
|
|
can be written like this:
|
|
|
|
|
|
|
|
B<cat list | parallel "do_something {} scale {.}.jpg ; do_step2 <{} {.}" | process_output>
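
In the rewritten command the replacement string {.} plays the role of the shell expansion ${x%.*} from the loops. The sketch below shows only the shell side; the function name B<strip_extension> and the filenames are made up for illustration, and it assumes ordinary names where the last dot belongs to the extension.

```shell
#!/bin/sh
# strip_extension: remove the shortest trailing ".something" from a name,
# which is what ${x%.*} does in the loops above.
strip_extension() {
  printf '%s\n' "${1%.*}"
}

strip_extension photo.png        # prints "photo"
strip_extension archive.tar.gz   # prints "archive.tar"
strip_extension dir/scan.tiff    # prints "dir/scan"
```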
|
|
|
|
|
|
|
|
|
2011-05-03 22:30:56 +00:00
|
|
|
=head1 EXAMPLE: Rewriting nested for-loops
|
|
|
|
|
|
|
|
Nested for-loops like this:
|
|
|
|
|
|
|
|
(for x in `cat xlist` ; do
|
|
|
|
for y in `cat ylist` ; do
|
|
|
|
do_something $x $y
|
|
|
|
done
|
|
|
|
done) | process_output
|
|
|
|
|
|
|
|
can be written like this:
|
|
|
|
|
2011-06-04 20:26:26 +00:00
|
|
|
B<parallel do_something {1} {2} :::: xlist ylist | process_output>
|
2011-05-03 22:30:56 +00:00
|
|
|
|
2011-08-09 20:00:31 +00:00
|
|
|
Nested for-loops like this:
|
|
|
|
|
|
|
|
(for gender in M F ; do
|
|
|
|
for size in S M L XL XXL ; do
|
|
|
|
echo $gender $size
|
|
|
|
done
|
|
|
|
done) | sort
|
|
|
|
|
|
|
|
can be written like this:
|
|
|
|
|
|
|
|
B<parallel echo {1} {2} ::: M F ::: S M L XL XXL | sort>
|
|
|
|
|
2011-05-03 22:30:56 +00:00
|
|
|
|
2013-08-27 09:04:14 +00:00
|
|
|
=head1 EXAMPLE: Finding the lowest difference between files
|
|
|
|
|
|
|
|
B<diff> is good for finding differences in text files. B<diff | wc -l>
|
|
|
|
gives an indication of the size of the difference. To find the
|
|
|
|
differences between all files in the current dir do:
|
|
|
|
|
|
|
|
B<parallel --tag 'diff {1} {2} | wc -l' ::: * ::: * | sort -nk3>
|
|
|
|
|
|
|
|
This way it is possible to see if some files are closer to other
|
|
|
|
files.
|
|
|
|
|
|
|
|
|
2012-01-22 03:42:05 +00:00
|
|
|
=head1 EXAMPLE: for-loops with column names
|
|
|
|
|
|
|
|
When doing multiple nested for-loops it can be easier to keep track of
|
|
|
|
the loop variable if it is named instead of just having a number. Use
B<--header :> to let the first argument be a named alias for the
|
|
|
|
positional replacement string:
|
|
|
|
|
|
|
|
parallel --header : echo {gender} {size} ::: gender M F ::: size S M L XL XXL
|
|
|
|
|
|
|
|
This also works if the input file is a file with columns:
|
|
|
|
|
|
|
|
cat addressbook.tsv | parallel --colsep '\t' --header : echo {Name} {E-mail address}
|
|
|
|
|
|
|
|
|
2012-10-15 23:24:35 +00:00
|
|
|
=head1 EXAMPLE: Count the differences between all files in a dir
|
|
|
|
|
|
|
|
Using B<--results> the results are saved in /tmp/diffcount*.
|
|
|
|
|
|
|
|
parallel --results /tmp/diffcount "diff -U 0 {1} {2} |tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *
|
|
|
|
|
|
|
|
To see the difference between file A and file B look at the file
|
|
|
|
'/tmp/diffcount 1 A 2 B' where spaces are TABs (\t).
|
|
|
|
|
|
|
|
|
2012-09-27 09:41:40 +00:00
|
|
|
=head1 EXAMPLE: Speeding up fast jobs
|
|
|
|
|
|
|
|
Starting a job on the local machine takes around 3 ms. This can be a
|
|
|
|
big overhead if the job takes very few ms to run. Often you can group
|
|
|
|
small jobs together using B<-X> which will make the overhead less
|
|
|
|
significant. Compare the speed of these:
|
|
|
|
|
2012-10-17 00:09:03 +00:00
|
|
|
seq -w 0 9999 | parallel touch pict{}.jpg
|
2012-09-27 09:41:40 +00:00
|
|
|
|
2012-10-17 00:09:03 +00:00
|
|
|
seq -w 0 9999 | parallel -X touch pict{}.jpg
|
2012-09-27 09:41:40 +00:00
|
|
|
|
|
|
|
If your program cannot take multiple arguments, then you can use GNU
|
|
|
|
B<parallel> to spawn multiple GNU B<parallel>s:
|
|
|
|
|
2012-10-17 00:09:03 +00:00
|
|
|
seq -w 0 999999 | parallel -j10 --pipe parallel -j0 touch pict{}.jpg
|
2012-09-27 09:41:40 +00:00
|
|
|
|
|
|
|
If B<-j0> normally spawns 506 jobs, then the above will try to spawn
|
|
|
|
5060 jobs. This way you are likely to hit the limit on the number
|
|
|
|
of processes and/or filehandles. Look at 'ulimit -n' and 'ulimit -u'
|
|
|
|
to raise these limits.
|
|
|
|
|
|
|
|
|
2012-01-22 03:42:05 +00:00
|
|
|
=head1 EXAMPLE: Using shell variables
|
|
|
|
|
|
|
|
When using shell variables you need to quote them correctly as they
|
|
|
|
may otherwise be split on spaces.
|
|
|
|
|
|
|
|
Notice the difference between:
|
|
|
|
|
|
|
|
V=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
|
|
|
|
parallel echo ::: ${V[@]} # This is probably not what you want
|
|
|
|
|
|
|
|
and:
|
|
|
|
|
|
|
|
V=("My brother's 12\" records are worth <\$\$\$>"'!' Foo Bar)
|
|
|
|
parallel echo ::: "${V[@]}"
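
The difference comes from the shell, not from GNU B<parallel>. The following sketch counts the arguments a command receives in each case. It uses the positional parameters as a portable stand-in for the bash array above, and the function name B<count_args> is made up for illustration.

```shell
#!/bin/sh
# count_args: report how many separate arguments it received.
count_args() {
  echo "$#"
}

# Three elements; the first one contains spaces.
set -- "My brother's 12\" records" Foo Bar

count_args $@     # unquoted: split on spaces, prints 6
count_args "$@"   # quoted: each element stays whole, prints 3
```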
|
|
|
|
|
|
|
|
When using variables in the actual command that contain special
|
|
|
|
characters (e.g. space) you can quote them using B<'"$VAR"'> or using
|
|
|
|
"'s and B<-q>:
|
|
|
|
|
|
|
|
V="Here are two "
|
|
|
|
parallel echo "'$V'" ::: spaces
|
|
|
|
parallel -q echo "$V" ::: spaces
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Group output lines
|
|
|
|
|
|
|
|
When running jobs that output data, you often do not want the output
|
2014-06-23 00:10:53 +00:00
|
|
|
of multiple jobs to run together. GNU B<parallel> defaults to grouping
|
|
|
|
the output of each job, so the output is printed when the job
|
|
|
|
finishes. If you want full lines to be printed while the job is
|
|
|
|
running you can use B<--line-buffer>. If you want output to be
|
|
|
|
printed as soon as possible you can use B<-u>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Compare the output of:
|
|
|
|
|
|
|
|
B<parallel traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
|
|
|
|
to the output of:
|
|
|
|
|
2014-06-23 00:10:53 +00:00
|
|
|
B<parallel --line-buffer traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
|
|
|
|
and:
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
B<parallel -u traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
|
|
|
|
|
2011-10-10 20:14:55 +00:00
|
|
|
=head1 EXAMPLE: Tag output lines
|
|
|
|
|
|
|
|
GNU B<parallel> groups the output lines, but it can be hard to see
|
|
|
|
where the different jobs begin. B<--tag> prepends the argument to make
|
|
|
|
that more visible:
|
|
|
|
|
|
|
|
B<parallel --tag traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
|
2014-06-23 00:10:53 +00:00
|
|
|
B<--tag> works with B<--line-buffer> but not with B<-u>:
|
|
|
|
|
|
|
|
B<parallel --tag --line-buffer traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
|
2011-10-10 20:20:18 +00:00
|
|
|
Check the uptime of the servers in I<~/.parallel/sshloginfile>:
|
|
|
|
|
|
|
|
B<parallel --tag -S .. --nonall uptime>
|
|
|
|
|
2011-10-10 20:14:55 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Keep order of output same as order of input
|
|
|
|
|
|
|
|
Normally the output of a job will be printed as soon as it
|
|
|
|
completes. Sometimes you want the order of the output to remain the
|
|
|
|
same as the order of the input. This is often important, if the output
|
|
|
|
is used as input for another system. B<-k> will make sure the order of
|
|
|
|
output will be in the same order as input even if later jobs end
|
|
|
|
before earlier jobs.
|
|
|
|
|
|
|
|
Append a string to every line in a text file:
|
|
|
|
|
|
|
|
B<cat textfile | parallel -k echo {} append_string>
|
|
|
|
|
|
|
|
If you remove B<-k> some of the lines may come out in the wrong order.
|
|
|
|
|
|
|
|
Another example is B<traceroute>:
|
|
|
|
|
|
|
|
B<parallel traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
|
|
|
|
will give traceroute of foss.org.my, debian.org and
|
|
|
|
freenetproject.org, but it will be sorted according to which job
|
|
|
|
completed first.
|
|
|
|
|
|
|
|
To keep the order the same as input run:
|
|
|
|
|
|
|
|
B<parallel -k traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
|
|
|
|
This will make sure the traceroute to foss.org.my will be printed
|
|
|
|
first.
|
|
|
|
|
|
|
|
A bit more complex example is downloading a huge file in chunks in
|
|
|
|
parallel: Some internet connections will deliver more data if you
|
|
|
|
download files in parallel. For downloading files in parallel see:
|
|
|
|
"EXAMPLE: Download 10 images for each of the past 30 days". But if you
|
|
|
|
are downloading a big file you can download the file in chunks in
|
|
|
|
parallel.
|
|
|
|
|
|
|
|
To download byte 10000000-19999999 you can use B<curl>:
|
|
|
|
|
|
|
|
B<curl -r 10000000-19999999 http://example.com/the/big/file> > B<file.part>
|
|
|
|
|
|
|
|
To download a 1 GB file we need 100 10MB chunks downloaded and
|
|
|
|
combined in the correct order.
|
|
|
|
|
|
|
|
B<seq 0 99 | parallel -k curl -r \
|
|
|
|
{}0000000-{}9999999 http://example.com/the/big/file> > B<file>
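
To see which byte ranges the replacement strings produce, the expansion can be imitated in plain shell. This is only a sketch of the string substitution; the function name B<chunk_range> is made up for illustration.

```shell
#!/bin/sh
# chunk_range: the byte range that "{}0000000-{}9999999" expands to
# when {} is replaced by the chunk number given as $1.
chunk_range() {
  printf '%s0000000-%s9999999\n' "$1" "$1"
}

chunk_range 0    # prints 00000000-09999999
chunk_range 1    # prints 10000000-19999999
chunk_range 99   # prints 990000000-999999999
```

Each range covers 10000000 bytes, and the last range ends at byte 999999999, so the 100 chunks together cover 1000000000 bytes.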
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Parallel grep
|
|
|
|
|
|
|
|
B<grep -r> greps recursively through directories. On multicore CPUs
|
|
|
|
GNU B<parallel> can often speed this up.
|
|
|
|
|
|
|
|
B<find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}>
|
|
|
|
|
|
|
|
This will run 1.5 jobs per core, and give 1000 arguments to B<grep>.
|
|
|
|
|
|
|
|
|
2014-07-15 00:40:38 +00:00
|
|
|
=head1 EXAMPLE: Grepping n lines for m regular expressions.
|
|
|
|
|
|
|
|
The simplest solution to grep a big file for a lot of regexps is:
|
|
|
|
|
|
|
|
grep -f regexps.txt bigfile
|
|
|
|
|
|
|
|
Or if the regexps are fixed strings:
|
|
|
|
|
|
|
|
grep -F -f regexps.txt bigfile
|
|
|
|
|
|
|
|
There are 2 limiting factors: CPU and disk I/O. CPU is easy to
|
|
|
|
measure: If the grep takes >90% CPU (e.g. when running top), then the
|
|
|
|
CPU is a limiting factor, and parallelization will speed this up. If
|
|
|
|
not, then disk I/O is the limiting factor, and depending on the disk
|
|
|
|
system it may be faster or slower to parallelize. The only way to know
|
|
|
|
for certain is to measure.
|
|
|
|
|
|
|
|
If the CPU is the limiting factor parallelization should be done on the regexps:
|
|
|
|
|
|
|
|
cat regexp.txt | parallel --pipe -L1000 --round-robin grep -f - bigfile
|
|
|
|
|
|
|
|
This will start one grep per CPU and read bigfile one time per CPU,
|
|
|
|
but as that is done in parallel, all reads except the first will be
|
|
|
|
cached in RAM. Depending on the size of regexp.txt it may be faster to
|
|
|
|
use --block 10m instead of -L1000. If regexp.txt is too big to fit in
|
|
|
|
RAM, remove --round-robin and adjust -L1000. This will cause bigfile
|
|
|
|
to be read more times.
|
|
|
|
|
|
|
|
Some storage systems perform better when reading multiple chunks in
|
|
|
|
parallel. This is true for some RAID systems and for some network file
|
|
|
|
systems. To parallelize the reading of bigfile:
|
|
|
|
|
|
|
|
parallel --pipepart --block 100M -a bigfile grep -f regexp.txt
|
|
|
|
|
|
|
|
This will split bigfile into 100MB chunks and run grep on each of
|
|
|
|
these chunks. To parallelize both reading of bigfile and regexp.txt
|
|
|
|
combine the two using --fifo:
|
|
|
|
|
|
|
|
parallel --pipepart --block 100M -a bigfile --fifo cat regexp.txt \
|
|
|
|
\| parallel --pipe -L1000 --round-robin grep -f - {}
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Using remote computers
|
|
|
|
|
|
|
|
To run commands on a remote computer SSH needs to be set up and you
|
2011-05-03 22:30:56 +00:00
|
|
|
must be able to log in without entering a password (The commands
|
|
|
|
B<ssh-copy-id> and B<ssh-agent> may help you do that).
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2013-10-21 20:31:52 +00:00
|
|
|
If you need to log in to a whole cluster, you typically do not want to
|
|
|
|
accept the host key for every host. You want to accept them the first
|
|
|
|
time and be warned if they are ever changed. To do that:
|
|
|
|
|
|
|
|
# Add the servers to the sshloginfile
|
|
|
|
(echo servera; echo serverb) > .parallel/my_cluster
|
|
|
|
# Make sure .ssh/config exists
|
|
|
|
touch .ssh/config
|
|
|
|
cp .ssh/config .ssh/config.backup
|
|
|
|
# Disable StrictHostKeyChecking temporarily
|
|
|
|
(echo 'Host *'; echo StrictHostKeyChecking no) >> .ssh/config
|
|
|
|
parallel --slf my_cluster --nonall true
|
|
|
|
# Remove the disabling of StrictHostKeyChecking
|
|
|
|
mv .ssh/config.backup .ssh/config
|
|
|
|
|
|
|
|
The servers in B<.parallel/my_cluster> are now added in B<.ssh/known_hosts>.
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
To run B<echo> on B<server.example.com>:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
seq 10 | parallel --sshlogin server.example.com echo
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
To run commands on more than one remote computer run:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
seq 10 | parallel --sshlogin server.example.com,server2.example.net echo
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Or:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
seq 10 | parallel --sshlogin server.example.com \
|
2010-12-06 23:30:08 +00:00
|
|
|
--sshlogin server2.example.net echo
|
|
|
|
|
|
|
|
If the login username is I<foo> on I<server2.example.net> use:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
seq 10 | parallel --sshlogin server.example.com \
|
2010-12-06 23:30:08 +00:00
|
|
|
--sshlogin foo@server2.example.net echo
|
|
|
|
|
2014-06-13 12:30:14 +00:00
|
|
|
If your list of hosts is I<server1-88.example.net> with login I<foo>:
|
|
|
|
|
|
|
|
seq 10 | parallel -Sfoo@server{1..88}.example.net echo
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
To distribute the commands to a list of computers, make a file
|
|
|
|
I<mycomputers> with all the computers:
|
|
|
|
|
|
|
|
server.example.com
|
|
|
|
foo@server2.example.com
|
|
|
|
server3.example.com
|
|
|
|
|
|
|
|
Then run:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
seq 10 | parallel --sshloginfile mycomputers echo
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
To include the local computer add the special sshlogin ':' to the list:
|
|
|
|
|
|
|
|
server.example.com
|
|
|
|
foo@server2.example.com
|
|
|
|
server3.example.com
|
|
|
|
:
|
|
|
|
|
|
|
|
GNU B<parallel> will try to determine the number of CPU cores on each
|
2011-02-23 15:22:08 +00:00
|
|
|
of the remote computers, and run one job per CPU core - even if the
|
|
|
|
remote computers do not have the same number of CPU cores.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
If the number of CPU cores on the remote computers is not identified
|
2010-12-06 23:30:08 +00:00
|
|
|
correctly the number of CPU cores can be added in front. Here the
|
2010-12-21 17:08:16 +00:00
|
|
|
computer has 8 CPU cores.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
seq 10 | parallel --sshlogin 8/server.example.com echo
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE: Transferring of files
|
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
To recompress gzipped files with B<bzip2> using a remote computer run:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
find logs/ -name '*.gz' | \
|
|
|
|
parallel --sshlogin server.example.com \
|
|
|
|
--transfer "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
|
|
|
This will list the .gz-files in the I<logs> directory and all
|
|
|
|
directories below. Then it will transfer the files to
|
|
|
|
I<server.example.com> to the corresponding directory in
|
|
|
|
I<$HOME/logs>. On I<server.example.com> the file will be recompressed
|
|
|
|
using B<zcat> and B<bzip2> resulting in the corresponding file with
|
|
|
|
I<.gz> replaced with I<.bz2>.
|
|
|
|
|
|
|
|
If you want the resulting bz2-file to be transferred back to the local
|
|
|
|
computer add I<--return {.}.bz2>:
|
|
|
|
|
|
|
|
find logs/ -name '*.gz' | \
|
|
|
|
parallel --sshlogin server.example.com \
|
|
|
|
--transfer --return {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
|
|
|
After the recompressing is done the I<.bz2>-file is transferred back to
|
|
|
|
the local computer and put next to the original I<.gz>-file.
|
|
|
|
|
|
|
|
If you want to delete the transferred files on the remote computer add
|
|
|
|
I<--cleanup>. This will remove both the file transferred to the remote
|
|
|
|
computer and the files transferred from the remote computer:
|
|
|
|
|
|
|
|
find logs/ -name '*.gz' | \
|
|
|
|
parallel --sshlogin server.example.com \
|
|
|
|
--transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
2010-12-21 17:08:16 +00:00
|
|
|
If you want to run on several computers add the computers to I<--sshlogin>
|
2010-12-06 23:30:08 +00:00
|
|
|
either using ',' or multiple I<--sshlogin>:
|
|
|
|
|
|
|
|
find logs/ -name '*.gz' | \
|
|
|
|
parallel --sshlogin server.example.com,server2.example.com \
|
|
|
|
--sshlogin server3.example.com \
|
|
|
|
--transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
|
|
|
You can add the local computer using I<--sshlogin :>. This will disable the
|
|
|
|
removing and transferring for the local computer only:
|
|
|
|
|
|
|
|
find logs/ -name '*.gz' | \
|
|
|
|
parallel --sshlogin server.example.com,server2.example.com \
|
|
|
|
--sshlogin server3.example.com \
|
|
|
|
--sshlogin : \
|
|
|
|
--transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
|
|
|
Often I<--transfer>, I<--return> and I<--cleanup> are used together. They can be
|
|
|
|
shortened to I<--trc>:
|
|
|
|
|
|
|
|
find logs/ -name '*.gz' | \
|
|
|
|
parallel --sshlogin server.example.com,server2.example.com \
|
|
|
|
--sshlogin server3.example.com \
|
|
|
|
--sshlogin : \
|
|
|
|
--trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
|
|
|
With the file I<mycomputers> containing the list of computers it becomes:
|
|
|
|
|
|
|
|
find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
|
|
|
|
--trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
|
|
|
If the file I<~/.parallel/sshloginfile> contains the list of computers
|
|
|
|
the special shorthand I<-S ..> can be used:
|
|
|
|
|
|
|
|
find logs/ -name '*.gz' | parallel -S .. \
|
|
|
|
--trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Distributing work to local and remote computers
|
|
|
|
|
|
|
|
Convert *.mp3 to *.ogg running one process per CPU core on local computer and server2:
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
parallel --trc {.}.ogg -S server2,: \
|
2010-12-06 23:30:08 +00:00
|
|
|
'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2011-06-07 20:57:50 +00:00
|
|
|
=head1 EXAMPLE: Running the same command on remote computers
|
2011-05-26 21:19:58 +00:00
|
|
|
|
|
|
|
To run the command B<uptime> on remote computers you can do:
|
|
|
|
|
2011-10-13 21:58:02 +00:00
|
|
|
B<parallel --tag --nonall -S server1,server2 uptime>
|
2011-05-26 21:19:58 +00:00
|
|
|
|
|
|
|
B<--nonall> reads no arguments. If you have a list of jobs you want to
|
|
|
|
run on each computer you can do:
|
|
|
|
|
2011-10-13 21:58:02 +00:00
|
|
|
B<parallel --tag --onall -S server1,server2 echo ::: 1 2 3>
|
|
|
|
|
|
|
|
Remove B<--tag> if you do not want the sshlogin added before the
|
|
|
|
output.
|
2011-05-26 21:19:58 +00:00
|
|
|
|
2011-06-07 20:57:50 +00:00
|
|
|
If you have a lot of hosts use '-j0' to access more hosts in parallel.
|
|
|
|
|
2011-05-26 21:19:58 +00:00
|
|
|
|
2011-08-02 21:18:28 +00:00
|
|
|
=head1 EXAMPLE: Parallelizing rsync
|
|
|
|
|
|
|
|
B<rsync> is a great tool, but sometimes it will not fill up the
|
|
|
|
available bandwidth. This is often a problem when copying several big
|
|
|
|
files over high speed connections.
|
|
|
|
|
|
|
|
The following will start one B<rsync> per big file in I<src-dir> to
|
|
|
|
I<dest-dir> on the server I<fooserver>:
|
|
|
|
|
2013-03-12 09:17:55 +00:00
|
|
|
B<cd src-dir; find . -type f -size +100000 | parallel -v ssh fooserver
|
2011-08-02 21:18:28 +00:00
|
|
|
mkdir -p /dest-dir/{//}\;rsync -Havessh {} fooserver:/dest-dir/{}>
|
|
|
|
|
|
|
|
The dirs created may end up with wrong permissions and smaller files
|
|
|
|
are not being transferred. To fix those run B<rsync> a final time:
|
|
|
|
|
|
|
|
B<rsync -Havessh src-dir/ fooserver:/dest-dir/>
|
|
|
|
|
2013-03-12 09:17:55 +00:00
|
|
|
If you are unable to push data, but need to pull them and the files
|
|
|
|
are called digits.png (e.g. 000000.png) you might be able to do:
|
|
|
|
|
2013-08-14 18:11:00 +00:00
|
|
|
B<seq -w 0 99 | parallel rsync -Havessh fooserver:src-path/*{}.png destdir/>
|
2013-03-12 09:17:55 +00:00
|
|
|
|
2011-08-02 21:18:28 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Use multiple inputs in one command
|
|
|
|
|
|
|
|
Copy files like foo.es.ext to foo.ext:
|
|
|
|
|
|
|
|
B<ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}>
|
|
|
|
|
|
|
|
The perl command spits out 2 lines for each input. GNU B<parallel>
|
|
|
|
takes 2 inputs (using B<-N2>) and replaces {1} and {2} with the inputs.
|
|
|
|
|
2011-05-26 11:14:03 +00:00
|
|
|
Count in binary:
|
|
|
|
|
|
|
|
B<parallel -k echo ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1 ::: 0 1>
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
Print the number on the opposing sides of a six sided die:
|
|
|
|
|
2011-05-26 11:14:03 +00:00
|
|
|
B<parallel --xapply -a <(seq 6) -a <(seq 6 -1 1) echo>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-26 11:14:03 +00:00
|
|
|
B<parallel --xapply echo :::: <(seq 6) <(seq 6 -1 1)>
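
For comparison, the pairing that B<--xapply> makes from the two input sources can be written as a sequential shell loop. This is only a sketch of the expected pairs; the function name B<opposite_sides> is made up, and GNU B<parallel> may print the pairs in a different order unless B<-k> is used.

```shell
#!/bin/sh
# opposite_sides: pair line N of "seq 6" with line N of "seq 6 -1 1",
# i.e. each face with the face on the opposite side of the die.
opposite_sides() {
  i=1
  while [ "$i" -le 6 ]; do
    echo "$i $((7 - i))"
    i=$((i + 1))
  done
}

opposite_sides
```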
|
2011-05-03 22:30:56 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
Convert files from all subdirs to PNG-files with consecutive numbers
|
|
|
|
(useful for making input PNGs for B<ffmpeg>):
|
|
|
|
|
2011-05-26 11:14:03 +00:00
|
|
|
B<parallel --xapply -a <(find . -type f | sort) -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Alternative version:
|
|
|
|
|
2011-05-26 11:14:03 +00:00
|
|
|
B<find . -type f | sort | parallel convert {} {#}.png>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE: Use a table as input
|
|
|
|
|
|
|
|
Content of table_file.tsv:
|
|
|
|
|
|
|
|
foo<TAB>bar
|
|
|
|
baz <TAB> quux
|
|
|
|
|
|
|
|
To run:
|
|
|
|
|
|
|
|
cmd -o bar -i foo
|
|
|
|
cmd -o quux -i baz
|
|
|
|
|
|
|
|
you can run:
|
|
|
|
|
|
|
|
B<parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}>
|
|
|
|
|
|
|
|
Note: The default for GNU B<parallel> is to remove the spaces around the columns. To keep the spaces:
|
|
|
|
|
|
|
|
B<parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}>
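
For comparison, a sequential shell loop that builds the same command lines from a TAB separated table could look like the sketch below. It is not how GNU B<parallel> works internally; the function name B<run_table> is made up, and the loop only prints the commands instead of running them. Because only TAB is used as separator, spaces around the columns are kept, as with B<--trim n>.

```shell
#!/bin/sh
# run_table: read two TAB separated columns per line and print the
# command line that would be run for that row.
run_table() {
  tab=$(printf '\t')
  while IFS="$tab" read -r input output; do
    echo "cmd -o $output -i $input"
  done
}

printf 'foo\tbar\nbaz\tquux\n' | run_table
```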
|
|
|
|
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
=head1 EXAMPLE: Run the same command 10 times
|
|
|
|
|
|
|
|
If you want to run the same command with the same arguments 10 times
|
|
|
|
in parallel you can do:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
B<seq 10 | parallel -n0 my_command my_args>
|
2011-02-23 15:22:08 +00:00
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Working as cat | sh. Resource inexpensive jobs and evaluation
|
|
|
|
|
|
|
|
GNU B<parallel> can work similarly to B<cat | sh>.
|
|
|
|
|
|
|
|
A resource inexpensive job is a job that takes very little CPU, disk
|
|
|
|
I/O and network I/O. Ping is an example of a resource inexpensive
|
|
|
|
job. wget is too - if the webpages are small.
|
|
|
|
|
|
|
|
The content of the file jobs_to_run:
|
|
|
|
|
|
|
|
ping -c 1 10.0.0.1
|
2011-05-03 22:30:56 +00:00
|
|
|
wget http://example.com/status.cgi?ip=10.0.0.1
|
2010-12-06 23:30:08 +00:00
|
|
|
ping -c 1 10.0.0.2
|
2011-05-03 22:30:56 +00:00
|
|
|
wget http://example.com/status.cgi?ip=10.0.0.2
|
2010-12-06 23:30:08 +00:00
|
|
|
...
|
|
|
|
ping -c 1 10.0.0.255
|
2011-05-03 22:30:56 +00:00
|
|
|
wget http://example.com/status.cgi?ip=10.0.0.255
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
To run 100 processes simultaneously do:
|
|
|
|
|
|
|
|
B<parallel -j 100 < jobs_to_run>
|
|
|
|
|
2010-12-19 11:58:36 +00:00
|
|
|
As there is no I<command>, the jobs will be evaluated by the shell.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
2011-02-02 15:36:29 +00:00
|
|
|
=head1 EXAMPLE: Processing a big file using more cores
|
|
|
|
|
|
|
|
To process a big file or some output you can use B<--pipe> to split up
|
|
|
|
the data into blocks and pipe the blocks into the processing program.
|
|
|
|
|
|
|
|
If the program is B<gzip -9> you can do:
|
|
|
|
|
|
|
|
B<cat bigfile | parallel --pipe --recend '' -k gzip -9 >>B<bigfile.gz>
|
|
|
|
|
|
|
|
This will split B<bigfile> into blocks of 1 MB and pass that to B<gzip
|
|
|
|
-9> in parallel. One B<gzip> will be run per CPU core. The output of
|
|
|
|
B<gzip -9> will be kept in order and saved to B<bigfile.gz>.
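
This approach relies on B<gzip> accepting several compressed pieces written one after another as a single file. A small sketch to check that with two hand-made blocks; the function name B<roundtrip> and the block contents are made up for illustration.

```shell
#!/bin/sh
# roundtrip: compress two pieces separately, concatenate the compressed
# output, and decompress the concatenation in one go.
roundtrip() {
  {
    printf 'first block\n'  | gzip -9
    printf 'second block\n' | gzip -9
  } | gzip -dc
}

roundtrip
```

The two blocks come out again in the order they were written, which is why B<-k> is needed in the command above.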
|
|
|
|
|
|
|
|
B<gzip> works fine if the output is appended, but some processing does
|
|
|
|
not work like that - for example sorting. For this GNU B<parallel> can
|
|
|
|
put the output of each command into a file. This will sort a big file
|
|
|
|
in parallel:
|
|
|
|
|
|
|
|
B<cat bigfile | parallel --pipe --files sort | parallel -Xj1 sort -m {} ';' rm {} >>B<bigfile.sort>
|
|
|
|
|
|
|
|
Here B<bigfile> is split into blocks of around 1MB, each block ending
|
|
|
|
in '\n' (which is the default for B<--recend>). Each block is passed
|
|
|
|
to B<sort> and the output from B<sort> is saved into files. These
|
|
|
|
files are passed to the second B<parallel> that runs B<sort -m> on the
|
|
|
|
files before it removes the files. The output is saved to
|
|
|
|
B<bigfile.sort>.
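
The merge step can be tried by hand on two small chunks. This sketch leaves out GNU B<parallel> and only shows what B<sort -m> does with files that are already sorted; the function name B<merge_sorted_chunks> and the sample words are made up for illustration.

```shell
#!/bin/sh
# merge_sorted_chunks: sort two chunks separately, then merge the sorted
# files with "sort -m", which interleaves them without sorting again.
merge_sorted_chunks() {
  tmp=$(mktemp -d) || return 1
  printf 'pear\napple\n'  | sort > "$tmp/chunk1"
  printf 'plum\nbanana\n' | sort > "$tmp/chunk2"
  sort -m "$tmp/chunk1" "$tmp/chunk2"
  rm -r "$tmp"
}

merge_sorted_chunks
```

The result is one sorted list (apple, banana, pear, plum), the same as sorting the concatenated input in a single B<sort>.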
|
|
|
|
|
2014-05-28 21:45:13 +00:00
|
|
|
GNU B<parallel>'s B<--pipe> maxes out at around 100 MB/s because every
|
|
|
|
byte has to be copied through GNU B<parallel>. But if B<bigfile> is a
|
|
|
|
real (seekable) file GNU B<parallel> can by-pass the copying and send
|
|
|
|
the parts directly to the program:
|
|
|
|
|
|
|
|
B<parallel --pipepart --block 100m -a bigfile --files sort | parallel -Xj1 sort -m {} ';' rm {} >>B<bigfile.sort>
|
|
|
|
|
2011-02-02 15:36:29 +00:00
|
|
|
|
2013-11-28 14:24:34 +00:00
|
|
|
=head1 EXAMPLE: Running more than 500 jobs workaround
|
|
|
|
|
|
|
|
If you need to run a massive number of jobs in parallel, then you will
|
|
|
|
likely hit the filehandle limit which is often around 500 jobs. If you
|
|
|
|
are super user you can raise the limit in /etc/security/limits.conf
|
|
|
|
but you can also use this workaround. The filehandle limit is per
|
|
|
|
process. That means that if you just spawn more GNU B<parallel>s then
|
|
|
|
each of them can run 500 jobs. This will spawn up to 2500 jobs:
|
|
|
|
|
|
|
|
B<cat myinput | parallel --pipe -N 50 --round-robin -j50 parallel -j50 your_prg>
|
|
|
|
|
|
|
|
This will spawn up to 250000 jobs (use with caution - you need 250 GB RAM to do this):
|
|
|
|
|
|
|
|
B<cat myinput | parallel --pipe -N 500 --round-robin -j500 parallel -j500 your_prg>
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: Working as mutex and counting semaphore
|
|
|
|
|
|
|
|
The command B<sem> is an alias for B<parallel --semaphore>.
|
|
|
|
|
|
|
|
A counting semaphore will allow a given number of jobs to be started
|
|
|
|
in the background. When that number of jobs is running in the
|
|
|
|
background, GNU B<sem> will wait for one of these to complete before
|
|
|
|
starting another command. B<sem --wait> will wait for all jobs to
|
|
|
|
complete.
|
|
|
|
|
|
|
|
Run 10 jobs concurrently in the background:
|
|
|
|
|
2012-06-03 20:33:00 +00:00
|
|
|
for i in *.log ; do
|
2010-12-06 23:30:08 +00:00
|
|
|
echo $i
|
|
|
|
sem -j10 gzip $i ";" echo done
|
|
|
|
done
|
|
|
|
sem --wait
|
|
|
|
|
|
|
|
A mutex is a counting semaphore allowing only one job to run. This
|
|
|
|
will edit the file I<myfile> and prepend lines with the numbers 1 to 3
to it.
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
seq 3 | parallel sem sed -i -e 'i{}' myfile
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
As I<myfile> can be very big it is important that only one process
edits the file at a time.
|
|
|
|
|
|
|
|
Name the semaphore to have multiple different semaphores active at the
|
|
|
|
same time:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
seq 3 | parallel sem --id mymutex sed -i -e 'i{}' myfile
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head1 EXAMPLE: Start editor with filenames from stdin (standard input)
|
|
|
|
|
2011-10-14 22:21:23 +00:00
|
|
|
You can use GNU B<parallel> to start interactive programs like emacs or vi:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2012-01-07 01:24:50 +00:00
|
|
|
B<cat filelist | parallel --tty -X emacs>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2012-01-07 01:24:50 +00:00
|
|
|
B<cat filelist | parallel --tty -X vi>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
If there are more files than will fit on a single command line, the
|
|
|
|
editor will be started again with the remaining files.
|
|
|
|
|
|
|
|
|
2011-05-13 23:03:09 +00:00
|
|
|
=head1 EXAMPLE: Running sudo
|
|
|
|
|
|
|
|
B<sudo> requires a password to run a command as root. It caches the
|
|
|
|
access, so you only need to enter the password again if you have not
|
|
|
|
used B<sudo> for a while.
|
|
|
|
|
|
|
|
The command:
|
|
|
|
|
|
|
|
parallel sudo echo ::: This is a bad idea
|
|
|
|
|
|
|
|
is no good, as you would be prompted for the sudo password for each of
|
|
|
|
the jobs. You can either do:
|
|
|
|
|
|
|
|
sudo echo This
|
|
|
|
parallel sudo echo ::: is a good idea
|
|
|
|
|
|
|
|
or:
|
|
|
|
|
|
|
|
sudo parallel echo ::: This is a good idea
|
|
|
|
|
|
|
|
This way you only have to enter the sudo password once.
|
|
|
|
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: GNU Parallel as queue system/batch manager
|
|
|
|
|
2011-01-02 00:01:21 +00:00
|
|
|
GNU B<parallel> can work as a simple job queue system or batch manager.
|
|
|
|
The idea is to put the jobs into a file and have GNU B<parallel> read
|
|
|
|
from that continuously. As GNU B<parallel> will stop at end of file we
|
|
|
|
use B<tail> to continue reading:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2012-09-03 21:09:00 +00:00
|
|
|
B<true >>B<jobqueue>; B<tail -f jobqueue | parallel>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
To submit your jobs to the queue:
|
|
|
|
|
|
|
|
B<echo my_command my_arg >>>B< jobqueue>
|
|
|
|
|
|
|
|
You can of course use B<-S> to distribute the jobs to remote
|
|
|
|
computers:
|
|
|
|
|
2014-07-14 16:25:45 +00:00
|
|
|
B<true >>B<jobqueue>; B<tail -f jobqueue | parallel -S ..>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2013-11-02 22:17:20 +00:00
|
|
|
There is a a small issue when using GNU B<parallel> as queue
|
|
|
|
system/batch manager: You have to submit JobSlot number of jobs before
|
|
|
|
they will start, and after that you can submit one at a time, and a job
|
|
|
|
will start immediately if free slots are available. Output from the
|
|
|
|
running or completed jobs is held back and will only be printed when
|
|
|
|
JobSlots more jobs have been started (unless you use --ungroup or -u,
|
|
|
|
in which case the output from the jobs is printed immediately).
|
|
|
|
E.g. if you have 10 jobslots then the output from the first completed
|
|
|
|
job will only be printed when job 11 has started, and the output of
|
|
|
|
the second completed job will only be printed when job 12 has started.
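
The queue setup above can be wrapped in two small helpers. This is only a sketch: the names B<submit_job> and B<start_queue> and the variable B<JOBQUEUE> are made up for illustration, and B<tail -n +1 -f> is used instead of plain B<tail -f> (an assumption of this sketch) so that lines already in the file are read from the start.

```shell
# Hypothetical helpers; JOBQUEUE is an assumed variable naming the queue file.
JOBQUEUE="${JOBQUEUE:-./jobqueue}"

# Append one command line to the queue file.
submit_job() {
    printf '%s\n' "$*" >> "$JOBQUEUE"
}

# Read the queue from its first line and keep following it.
# Requires GNU parallel; runs until interrupted.
start_queue() {
    touch "$JOBQUEUE"
    tail -n +1 -f "$JOBQUEUE" | parallel
}
```

Run B<start_queue> in one terminal and B<submit_job my_command my_arg> in another. The limitation on held-back output described above still applies.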
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 EXAMPLE: GNU Parallel as dir processor
|
|
|
|
|
|
|
|
If you have a dir in which users drop files that need to be processed
|
|
|
|
you can do this on GNU/Linux (if you know what B<inotifywait> is
|
|
|
|
called on other platforms, file a bug report):
|
|
|
|
|
2012-06-03 20:33:00 +00:00
|
|
|
B<inotifywait -q -m -r -e MOVED_TO -e CLOSE_WRITE --format %w%f my_dir | parallel
|
2010-12-06 23:30:08 +00:00
|
|
|
-u echo>
|
|
|
|
|
|
|
|
This will run the command B<echo> on each file put into B<my_dir> or
|
|
|
|
subdirs of B<my_dir>.
|
|
|
|
|
|
|
|
You can of course use B<-S> to distribute the jobs to remote
|
|
|
|
computers:
|
|
|
|
|
2012-06-03 20:33:00 +00:00
|
|
|
B<inotifywait -q -m -r -e MOVED_TO -e CLOSE_WRITE --format %w%f my_dir
|
|
|
|
| parallel -S .. -u echo>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
If the files to be processed are in a tar file then unpacking one file
|
|
|
|
and processing it immediately may be faster than first unpacking all
|
|
|
|
files. Set up the dir processor as above and unpack into the dir.
|
|
|
|
|
2013-01-14 23:20:59 +00:00
|
|
|
Using GNU Parallel as dir processor has the same limitations as using
|
|
|
|
GNU Parallel as queue system/batch manager.
|
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
=head1 QUOTING
|
|
|
|
|
2010-12-15 23:25:41 +00:00
|
|
|
GNU B<parallel> is very liberal in quoting. You only need to quote
|
|
|
|
characters that have special meaning in shell:
|
|
|
|
|
|
|
|
( ) $ ` ' " < > ; | \
|
|
|
|
|
|
|
|
and depending on context these need to be quoted, too:
|
|
|
|
|
2012-10-17 00:09:03 +00:00
|
|
|
~ & # ! ? space * {
|
2010-12-15 23:25:41 +00:00
|
|
|
|
2010-12-19 00:38:36 +00:00
|
|
|
Therefore most people will never need more quoting than putting '\'
|
|
|
|
in front of the special characters.
|
|
|
|
|
2012-09-27 09:41:40 +00:00
|
|
|
Often you can simply put \' around every ':
|
|
|
|
|
2012-10-17 00:09:03 +00:00
|
|
|
perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file
|
2012-09-27 09:41:40 +00:00
|
|
|
|
|
|
|
can be quoted:
|
|
|
|
|
2012-10-17 00:09:03 +00:00
|
|
|
parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\' ::: file
|
2012-09-27 09:41:40 +00:00
|
|
|
|
2010-12-19 00:38:36 +00:00
|
|
|
However, when you want to use a shell variable you need to quote the
|
2010-12-15 23:25:41 +00:00
|
|
|
$-sign. Here is an example using $PARALLEL_SEQ. This variable is set
|
|
|
|
by GNU B<parallel> itself, so the evaluation of the $ must be done by
|
|
|
|
the sub shell started by GNU B<parallel>:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
B<seq 10 | parallel -N2 echo seq:\$PARALLEL_SEQ arg1:{1} arg2:{2}>
|
2010-12-15 23:25:41 +00:00
|
|
|
|
|
|
|
If the variable is set before GNU B<parallel> starts you can do this:
|
|
|
|
|
|
|
|
B<VAR=this_is_set_before_starting>
|
|
|
|
|
|
|
|
B<echo test | parallel echo {} $VAR>
|
|
|
|
|
|
|
|
Prints: B<test this_is_set_before_starting>
|
|
|
|
|
2010-12-19 00:38:36 +00:00
|
|
|
It is a little more tricky if the variable contains more than one space in a row:
|
|
|
|
|
|
|
|
B<VAR="two  spaces  between  each  word">
|
|
|
|
|
|
|
|
B<echo test | parallel echo {} \'"$VAR"\'>
|
|
|
|
|
|
|
|
Prints: B<test two  spaces  between  each  word>
|
|
|
|
|
2010-12-15 23:25:41 +00:00
|
|
|
If the variable should not be evaluated by the shell starting GNU
|
|
|
|
B<parallel> but be evaluated by the sub shell started by GNU
|
|
|
|
B<parallel>, then you need to quote it:
|
|
|
|
|
|
|
|
B<echo test | parallel VAR=this_is_set_after_starting \; echo {} \$VAR>
|
|
|
|
|
|
|
|
Prints: B<test this_is_set_after_starting>
|
|
|
|
|
2010-12-19 00:38:36 +00:00
|
|
|
It is a little more tricky if the variable contains spaces:
|
|
|
|
|
|
|
|
B<echo test | parallel VAR='"two  spaces  between  each  word"' \; echo {} \"\$VAR\">
|
|
|
|
|
|
|
|
Prints: B<test two  spaces  between  each  word>
|
|
|
|
|
2010-12-15 23:25:41 +00:00
|
|
|
$$ is the shell variable containing the process id of the shell. This
|
|
|
|
will print the process id of the shell running GNU B<parallel>:
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
B<seq 10 | parallel echo $$>
|
2010-12-15 23:25:41 +00:00
|
|
|
|
|
|
|
And this will print the process ids of the sub shells started by GNU
|
|
|
|
B<parallel>.
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
B<seq 10 | parallel echo \$\$>
|
2010-12-15 23:25:41 +00:00
|
|
|
|
|
|
|
If the special characters should not be evaluated by the sub shell
|
|
|
|
then you need to protect it against evaluation from both the shell
|
|
|
|
starting GNU B<parallel> and the sub shell:
|
|
|
|
|
|
|
|
B<echo test | parallel echo {} \\\$VAR>
|
|
|
|
|
|
|
|
Prints: B<test $VAR>
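
The three levels of quoting shown so far can be tried without GNU B<parallel>. The sketch below assumes that joining the arguments with spaces and handing them to B<sh -c> is a fair stand-in for what GNU B<parallel> does without B<-q>; the helper name B<run_in_sub_shell> is made up for illustration and does not handle replacement strings such as {}.

```shell
# Stand-in for GNU parallel without -q (hypothetical helper): join the
# arguments with spaces and let a sub shell evaluate the resulting line.
run_in_sub_shell() {
    sh -c "$*"
}

VAR=set_by_calling_shell

# Unquoted: the calling shell expands $VAR before the sub shell starts.
run_in_sub_shell echo test $VAR

# \$ : the sub shell receives the text $VAR and expands it itself.
run_in_sub_shell VAR=set_by_sub_shell \; echo test \$VAR

# \\\$ : the sub shell receives \$VAR and prints a literal $VAR.
run_in_sub_shell echo test \\\$VAR
```

The three calls print B<test set_by_calling_shell>, B<test set_by_sub_shell>, and B<test $VAR>.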
|
|
|
|
|
|
|
|
GNU B<parallel> can protect against evaluation by the sub shell by
|
|
|
|
using -q:
|
|
|
|
|
|
|
|
B<echo test | parallel -q echo {} \$VAR>
|
|
|
|
|
|
|
|
Prints: B<test $VAR>
|
|
|
|
|
|
|
|
This is particularly useful if you have lots of quoting. If you want to run a perl script like this:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
B<perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file>
|
|
|
|
|
2010-12-15 23:25:41 +00:00
|
|
|
It needs to be quoted like this:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2013-07-03 22:50:08 +00:00
|
|
|
B<ls | parallel perl -ne '/^\\S+\\s+\\S+\$/\ and\ print\ \$ARGV,\"\\n\"'>
|
|
|
|
B<ls | parallel perl -ne \''/^\S+\s+\S+$/ and print $ARGV,"\n"'\'>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2010-12-15 23:25:41 +00:00
|
|
|
Notice how spaces, \'s, "'s, and $'s need to be quoted. GNU B<parallel>
|
|
|
|
can do the quoting by using option -q:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
B<ls | parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"'>
|
|
|
|
|
2010-12-15 23:25:41 +00:00
|
|
|
However, this means you cannot make the sub shell interpret special
|
2010-12-19 00:38:36 +00:00
|
|
|
characters. For example because of B<-q> this WILL NOT WORK:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
B<ls *.gz | parallel -q "zcat {} >>B<{.}">
|
|
|
|
|
|
|
|
B<ls *.gz | parallel -q "zcat {} | bzip2 >>B<{.}.bz2">
|
|
|
|
|
2010-12-15 23:25:41 +00:00
|
|
|
because > and | need to be interpreted by the sub shell.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
If you get errors like:
|
|
|
|
|
|
|
|
sh: -c: line 0: syntax error near unexpected token
|
|
|
|
sh: Syntax error: Unterminated quoted string
|
|
|
|
sh: -c: line 0: unexpected EOF while looking for matching `''
|
|
|
|
sh: -c: line 1: syntax error: unexpected end of file
|
|
|
|
|
|
|
|
then you might try using B<-q>.
|
|
|
|
|
|
|
|
If you are using B<bash> process substitution like B<<(cat foo)> then
|
|
|
|
you may try B<-q> and prepending I<command> with B<bash -c>:
|
|
|
|
|
|
|
|
B<ls | parallel -q bash -c 'wc -c <(echo {})'>
|
|
|
|
|
|
|
|
Or for substituting output:
|
|
|
|
|
|
|
|
B<ls | parallel -q bash -c 'tar c {} | tee >>B<(gzip >>B<{}.tar.gz) | bzip2 >>B<{}.tar.bz2'>
|
|
|
|
|
|
|
|
B<Conclusion>: To avoid dealing with the quoting problems it may be
|
2013-07-03 22:50:08 +00:00
|
|
|
easier just to write a small script or a function (remember to
|
|
|
|
B<export -f> the function) and have GNU B<parallel> call that.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head1 LIST RUNNING JOBS
|
|
|
|
|
|
|
|
If you want a list of the jobs currently running you can run:
|
|
|
|
|
|
|
|
B<killall -USR1 parallel>
|
|
|
|
|
2011-05-26 21:19:58 +00:00
|
|
|
GNU B<parallel> will then print the currently running jobs on stderr
|
|
|
|
(standard error).
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head1 COMPLETE RUNNING JOBS BUT DO NOT START NEW JOBS
|
|
|
|
|
|
|
|
If you regret starting a lot of jobs you can simply break GNU B<parallel>,
|
2012-05-27 16:14:25 +00:00
|
|
|
but if you want to make sure you do not have half-completed jobs you
|
2010-12-06 23:30:08 +00:00
|
|
|
should send the signal B<SIGTERM> to GNU B<parallel>:
|
|
|
|
|
|
|
|
B<killall -TERM parallel>
|
|
|
|
|
|
|
|
This will tell GNU B<parallel> to not start any new jobs, but wait until
|
|
|
|
the currently running jobs are finished before exiting.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 ENVIRONMENT VARIABLES
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item $PARALLEL_PID
|
|
|
|
|
|
|
|
The environment variable $PARALLEL_PID is set by GNU B<parallel> and
|
|
|
|
is visible to the jobs started from GNU B<parallel>. This makes it
|
|
|
|
possible for the jobs to communicate directly to GNU B<parallel>.
|
|
|
|
Remember to quote the $, so it gets evaluated by the correct
|
|
|
|
shell.
|
|
|
|
|
|
|
|
B<Example:> If each of the jobs tests a solution and one of the jobs finds
|
|
|
|
the solution the job can tell GNU B<parallel> not to start more jobs
|
|
|
|
by: B<kill -TERM $PARALLEL_PID>. This only works on the local
|
|
|
|
computer.
|
|
|
|
|
|
|
|
|
|
|
|
=item $PARALLEL_SEQ
|
|
|
|
|
|
|
|
$PARALLEL_SEQ will be set to the sequence number of the job
|
|
|
|
running. Remember to quote the $, so it gets evaluated by the correct
|
|
|
|
shell.
|
|
|
|
|
|
|
|
B<Example:>
|
|
|
|
|
2011-03-09 15:23:53 +00:00
|
|
|
B<seq 10 | parallel -N2 echo seq:'$'PARALLEL_SEQ arg1:{1} arg2:{2}>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=item $TMPDIR
|
|
|
|
|
|
|
|
Directory for temporary files. See: B<--tmpdir>.
|
|
|
|
|
|
|
|
|
|
|
|
=item $PARALLEL
|
|
|
|
|
|
|
|
The environment variable $PARALLEL will be used as default options for
|
|
|
|
GNU B<parallel>. If the variable contains special shell characters
|
|
|
|
(e.g. $, *, or space) then these need to be escaped with \.
|
|
|
|
|
|
|
|
B<Example:>
|
|
|
|
|
|
|
|
B<cat list | parallel -j1 -k -v ls>
|
|
|
|
|
|
|
|
can be written as:
|
|
|
|
|
|
|
|
B<cat list | PARALLEL="-kvj1" parallel ls>
|
|
|
|
|
|
|
|
B<cat list | parallel -j1 -k -v -S"myssh user@server" ls>
|
|
|
|
|
|
|
|
can be written as:
|
|
|
|
|
|
|
|
B<cat list | PARALLEL='-kvj1 -S myssh\ user@server' parallel ls>
|
|
|
|
|
|
|
|
Notice the \ in the middle is needed because 'myssh' and 'user@server'
|
|
|
|
must be one argument.
|
|
|
|
|
|
|
|
=back
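
A minimal sketch of the $PARALLEL_PID example: a job that tells GNU B<parallel> not to start more jobs once it finds the solution. The function name B<check_candidate> and the variable B<TARGET> are assumptions made for illustration. Because $PARALLEL_PID appears inside the function body, it is evaluated by the shell running the job, which is the correct shell.

```shell
# Hypothetical job function: a candidate is the "solution" when it equals
# $TARGET (an assumed variable, for illustration only).
check_candidate() {
    if [ "$1" = "$TARGET" ]; then
        echo "solution: $1"
        # PARALLEL_PID is set by GNU parallel; outside of it, skip the signal.
        if [ -n "${PARALLEL_PID:-}" ]; then
            kill -TERM "$PARALLEL_PID"
        fi
    fi
}

# Possible use from bash on the local computer:
#   export TARGET=42; export -f check_candidate
#   seq 1000 | parallel check_candidate
```

Jobs that are already running when the signal arrives are allowed to finish.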
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 DEFAULT PROFILE (CONFIG FILE)
|
|
|
|
|
|
|
|
The file ~/.parallel/config (formerly known as .parallelrc) will be
|
|
|
|
read if it exists. Lines starting with '#' will be ignored. It can be
|
|
|
|
formatted like the environment variable $PARALLEL, but it is often
|
|
|
|
easier to simply put each option on its own line.
|
|
|
|
|
|
|
|
Options on the command line take precedence over the environment
|
|
|
|
variable $PARALLEL which takes precedence over the file
|
|
|
|
~/.parallel/config.
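
A sketch of a config file with each option on its own line. The helper B<write_profile> and the variable B<PROFILE_DIR> are made up for illustration; B<PROFILE_DIR> exists only so the sketch can be tried without touching the real ~/.parallel.

```shell
# Assumed override so the sketch can be tried without touching ~/.parallel.
PROFILE_DIR="${PROFILE_DIR:-$HOME/.parallel}"

# Hypothetical helper: write_profile NAME OPTION...
# Writes a comment line followed by one option per line.
write_profile() {
    profile_name=$1
    shift
    mkdir -p "$PROFILE_DIR"
    {
        echo "# written by write_profile"
        printf '%s\n' "$@"
    } > "$PROFILE_DIR/$profile_name"
}

# Example (not run here): write_profile config -j1 -k -v
```

The resulting file holds the comment line, which is ignored, followed by B<-j1>, B<-k>, and B<-v> on separate lines.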
|
|
|
|
|
2011-04-07 19:54:02 +00:00
|
|
|
|
2010-12-06 23:30:08 +00:00
|
|
|
=head1 PROFILE FILES
|
|
|
|
|
2011-02-02 15:36:29 +00:00
|
|
|
If B<--profile> is set, GNU B<parallel> will read the profile from that file instead of
|
2011-10-13 21:58:02 +00:00
|
|
|
~/.parallel/config. You can have multiple B<--profiles>.
|
|
|
|
|
|
|
|
Example: Profile for running a command on every sshlogin in
|
|
|
|
~/.parallel/sshloginfile and prepending the output with the sshlogin:
|
|
|
|
|
|
|
|
echo --tag -S .. --nonall > ~/.parallel/n
|
|
|
|
parallel -Jn uptime
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
Example: Profile for running every command with B<-j-1> and B<nice>
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
echo -j-1 nice > ~/.parallel/nice_profile
|
2010-12-06 23:30:08 +00:00
|
|
|
parallel -J nice_profile bzip2 -9 ::: *
|
|
|
|
|
|
|
|
Example: Profile for running a perl script before every command:
|
|
|
|
|
|
|
|
echo "perl -e '\$a=\$\$; print \$a,\" \",'\$PARALLEL_SEQ',\" \";';" > ~/.parallel/pre_perl
|
|
|
|
parallel -J pre_perl echo ::: *
|
|
|
|
|
|
|
|
Note how the $ and " need to be quoted using \.
|
|
|
|
|
|
|
|
Example: Profile for running distributed jobs with B<nice> on the
|
2010-12-21 17:08:16 +00:00
|
|
|
remote computers:
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
echo -S .. nice > ~/.parallel/dist
|
|
|
|
parallel -J dist --trc {.}.bz2 bzip2 -9 ::: *
|
|
|
|
|
|
|
|
|
|
|
|
=head1 EXIT STATUS
|
|
|
|
|
|
|
|
If B<--halt-on-error> 0 or not specified:
|
|
|
|
|
|
|
|
=over 6
|
|
|
|
|
2013-07-26 12:58:42 +00:00
|
|
|
=item Z<>0
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
All jobs ran without error.
|
|
|
|
|
2013-07-26 12:58:42 +00:00
|
|
|
=item Z<>1-253
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Some of the jobs failed. The exit status gives the number of failed jobs.
|
|
|
|
|
2013-07-26 12:58:42 +00:00
|
|
|
=item Z<>254
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
More than 253 jobs failed.
|
|
|
|
|
2013-07-26 12:58:42 +00:00
|
|
|
=item Z<>255
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Other error.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
If B<--halt-on-error> 1 or 2: Exit status of the failing job.
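
The table above can be turned into a small helper for scripts that need to report what happened. This is a sketch; the function name B<describe_parallel_status> is made up, and the mapping only applies when B<--halt-on-error> is 0 or not specified.

```shell
# Hypothetical helper: describe_parallel_status STATUS
# Only meaningful when --halt-on-error is 0 or not specified.
describe_parallel_status() {
    case "$1" in
        0)   echo "all jobs ran without error" ;;
        254) echo "more than 253 jobs failed" ;;
        255) echo "other error" ;;
        [1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-3])
             echo "$1 job(s) failed" ;;
        *)   echo "not an exit status of GNU parallel: $1" ;;
    esac
}
```

A typical use would be to call it with $? right after GNU B<parallel> returns.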
|
|
|
|
|
|
|
|
|
|
|
|
=head1 DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES
|
|
|
|
|
|
|
|
There are a lot of programs with some of the functionality of GNU
|
|
|
|
B<parallel>. GNU B<parallel> strives to include the best of the
|
2012-05-27 16:14:25 +00:00
|
|
|
functionality without sacrificing ease of use.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
=head2 SUMMARY TABLE
|
|
|
|
|
|
|
|
The following features are in some of the comparable tools:
|
|
|
|
|
|
|
|
Inputs
|
|
|
|
I1. Arguments can be read from stdin
|
|
|
|
I2. Arguments can be read from a file
|
|
|
|
I3. Arguments can be read from multiple files
|
|
|
|
I4. Arguments can be read from command line
|
|
|
|
I5. Arguments can be read from a table
|
|
|
|
I6. Arguments can be read from the same file using #! (shebang)
|
|
|
|
I7. Line oriented input as default (Quoting of special chars not needed)
|
|
|
|
|
|
|
|
Manipulation of input
|
|
|
|
M1. Composed command
|
|
|
|
M2. Multiple arguments can fill up an execution line
|
|
|
|
M3. Arguments can be put anywhere in the execution line
|
|
|
|
M4. Multiple arguments can be put anywhere in the execution line
|
|
|
|
M5. Arguments can be replaced with context
|
|
|
|
M6. Input can be treated as complete execution line
|
|
|
|
|
|
|
|
Outputs
|
|
|
|
O1. Grouping output so output from different jobs does not mix
|
2011-05-26 21:19:58 +00:00
|
|
|
O2. Send stderr (standard error) to stderr (standard error)
|
|
|
|
O3. Send stdout (standard output) to stdout (standard output)
|
2010-12-06 23:30:08 +00:00
|
|
|
O4. Order of output can be same as order of input
|
2011-05-26 21:19:58 +00:00
|
|
|
O5. Stdout only contains stdout (standard output) from the command
|
|
|
|
O6. Stderr only contains stderr (standard error) from the command
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Execution
|
|
|
|
E1. Running jobs in parallel
|
|
|
|
E2. List running jobs
|
|
|
|
E3. Finish running jobs, but do not start new jobs
|
|
|
|
E4. Number of running jobs can depend on number of cpus
|
|
|
|
E5. Finish running jobs, but do not start new jobs after first failure
|
|
|
|
E6. Number of running jobs can be adjusted while running
|
|
|
|
|
|
|
|
Remote execution
|
|
|
|
R1. Jobs can be run on remote computers
|
|
|
|
R2. Basefiles can be transferred
|
|
|
|
R3. Argument files can be transferred
|
|
|
|
R4. Result files can be transferred
|
|
|
|
R5. Cleanup of transferred files
|
|
|
|
R6. No config files needed
|
2014-03-22 20:41:14 +00:00
|
|
|
R7. Do not run more than SSHD's MaxStartups can handle
|
2010-12-06 23:30:08 +00:00
|
|
|
R8. Configurable SSH command
|
2012-05-27 16:14:25 +00:00
|
|
|
R9. Retry if connection breaks occasionally
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Semaphore
|
|
|
|
S1. Possibility to work as a mutex
|
|
|
|
S2. Possibility to work as a counting semaphore
|
|
|
|
|
|
|
|
Legend
|
|
|
|
- = no
|
|
|
|
x = not applicable
|
|
|
|
ID = yes
|
|
|
|
|
|
|
|
As not every new version of the programs is tested, the table may be
|
|
|
|
outdated. Please file a bug-report if you find errors (See REPORTING
|
|
|
|
BUGS).
|
|
|
|
|
|
|
|
parallel:
|
|
|
|
I1 I2 I3 I4 I5 I6 I7
|
|
|
|
M1 M2 M3 M4 M5 M6
|
|
|
|
O1 O2 O3 O4 O5 O6
|
|
|
|
E1 E2 E3 E4 E5 E6
|
|
|
|
R1 R2 R3 R4 R5 R6 R7 R8 R9
|
|
|
|
S1 S2
|
|
|
|
|
|
|
|
xargs:
|
|
|
|
I1 I2 - - - - -
|
|
|
|
- M2 M3 - - -
|
|
|
|
- O2 O3 - O5 O6
|
|
|
|
E1 - - - - -
|
|
|
|
- - - - - x - - -
|
|
|
|
- -
|
|
|
|
|
|
|
|
find -exec:
|
|
|
|
- - - x - x -
|
|
|
|
- M2 M3 - - -
|
|
|
|
- O2 O3 O4 O5 O6
|
|
|
|
- - - - - -
|
|
|
|
- - - - - - - - -
|
|
|
|
x x
|
|
|
|
|
|
|
|
make -j:
|
|
|
|
- - - - - - -
|
|
|
|
- - - - - -
|
|
|
|
O1 O2 O3 - x O6
|
|
|
|
E1 - - - E5 -
|
|
|
|
- - - - - - - - -
|
|
|
|
- -
|
|
|
|
|
|
|
|
ppss:
|
|
|
|
I1 I2 - - - - I7
|
|
|
|
M1 - M3 - - M6
|
|
|
|
O1 - - x - -
|
|
|
|
E1 E2 ?E3 E4 - -
|
|
|
|
R1 R2 R3 R4 - - ?R7 ? ?
|
|
|
|
- -
|
|
|
|
|
|
|
|
pexec:
|
|
|
|
I1 I2 - I4 I5 - -
|
|
|
|
M1 - M3 - - M6
|
|
|
|
O1 O2 O3 - O5 O6
|
|
|
|
E1 - - E4 - E6
|
|
|
|
R1 - - - - R6 - - -
|
|
|
|
S1 -
|
|
|
|
|
|
|
|
xjobs: TODO - Please file a bug-report if you know what features xjobs
|
|
|
|
supports (See REPORTING BUGS).
|
|
|
|
|
|
|
|
prll: TODO - Please file a bug-report if you know what features prll
|
|
|
|
supports (See REPORTING BUGS).
|
|
|
|
|
|
|
|
dxargs: TODO - Please file a bug-report if you know what features dxargs
|
|
|
|
supports (See REPORTING BUGS).
|
|
|
|
|
|
|
|
mdm/middleman: TODO - Please file a bug-report if you know what
|
|
|
|
features mdm/middleman supports (See REPORTING BUGS).
|
|
|
|
|
|
|
|
xapply: TODO - Please file a bug-report if you know what features xapply
|
|
|
|
supports (See REPORTING BUGS).
|
|
|
|
|
|
|
|
paexec: TODO - Please file a bug-report if you know what features paexec
|
|
|
|
supports (See REPORTING BUGS).
|
|
|
|
|
|
|
|
ClusterSSH: TODO - Please file a bug-report if you know what features ClusterSSH
|
|
|
|
supports (See REPORTING BUGS).
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN xargs AND GNU Parallel
|
|
|
|
|
2012-05-27 16:14:25 +00:00
|
|
|
B<xargs> offers some of the same possibilities as GNU B<parallel>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
B<xargs> deals badly with special characters (such as space, ' and
|
|
|
|
"). To see the problem try this:
|
|
|
|
|
|
|
|
touch important_file
|
|
|
|
touch 'not important_file'
|
|
|
|
ls not* | xargs rm
|
2011-01-11 12:42:14 +00:00
|
|
|
mkdir -p "My brother's 12\" records"
|
2010-12-06 23:30:08 +00:00
|
|
|
ls | xargs rmdir
|
|
|
|
|
|
|
|
You can specify B<-0> or B<-d "\n">, but many input generators are not
|
|
|
|
optimized for using B<NUL> as separator but are optimized for
|
|
|
|
B<newline> as separator. E.g. B<head>, B<tail>, B<awk>, B<ls>, B<echo>,
|
|
|
|
B<sed>, B<tar -v>, B<perl> (B<-0> and \0 instead of \n), B<locate>
|
|
|
|
(requires using B<-0>), B<find> (requires using B<-print0>), B<grep>
|
|
|
|
(requires user to use B<-z> or B<-Z>), B<sort> (requires using B<-z>).
|
|
|
|
|
|
|
|
So GNU B<parallel>'s newline separation can be emulated with:
|
|
|
|
|
|
|
|
B<cat | xargs -d "\n" -n1 I<command>>
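
The word splitting described above can be observed without deleting anything. The sketch below counts how many arguments B<xargs> makes of a single input line; the function names are made up for illustration, and B<-0> is assumed to be supported by the installed B<xargs> (it is in GNU and BSD versions).

```shell
# Count how many separate arguments xargs makes of one input line.
# Default mode: splits on blanks as well as newlines.
count_args_default() {
    printf '%s\n' "$1" | xargs -n1 echo | wc -l | tr -d ' '
}

# Newlines turned into NUL and read with -0: one line is one argument.
count_args_per_line() {
    printf '%s\n' "$1" | tr '\n' '\0' | xargs -0 -n1 echo | wc -l | tr -d ' '
}

count_args_default 'not important_file'     # prints 2
count_args_per_line 'not important_file'    # prints 1
```

The first count is the reason B<ls not* | xargs rm> removes B<important_file> in the example above.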
|
|
|
|
|
|
|
|
B<xargs> can run a given number of jobs in parallel, but has no
|
|
|
|
support for running number-of-cpu-cores jobs in parallel.
|
|
|
|
|
|
|
|
B<xargs> has no support for grouping the output, therefore output may
|
|
|
|
run together, e.g. the first half of a line is from one process and
|
|
|
|
the last half of the line is from another process. The example
|
|
|
|
B<Parallel grep> cannot be done reliably with B<xargs> because of
|
|
|
|
this. To see this in action try:
|
|
|
|
|
|
|
|
parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} ::: a b c d e f
|
|
|
|
ls -l a b c d e f
|
|
|
|
parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
|
|
|
|
echo a b c d e f | xargs -P4 -n1 grep 1 > out.xargs-unbuf
|
|
|
|
echo a b c d e f | xargs -P4 -n1 grep --line-buffered 1 > out.xargs-linebuf
|
2014-03-21 21:39:54 +00:00
|
|
|
echo a b c d e f | xargs -n1 grep 1 > out.xargs-serial
|
2010-12-06 23:30:08 +00:00
|
|
|
ls -l out*
|
|
|
|
md5sum out*
|
|
|
|
|
|
|
|
B<xargs> has no support for keeping the order of the output, therefore
|
|
|
|
if running jobs in parallel using B<xargs> the output of the second
|
|
|
|
job cannot be postponed till the first job is done.
|
|
|
|
|
|
|
|
B<xargs> has no support for running jobs on remote computers.
|
|
|
|
|
|
|
|
B<xargs> has no support for context replace, so you will have to create the
|
|
|
|
arguments.
|
|
|
|
|
|
|
|
If you use a replace string in B<xargs> (B<-I>) you cannot force
|
|
|
|
B<xargs> to use more than one argument.
|
|
|
|
|
|
|
|
Quoting in B<xargs> works like B<-q> in GNU B<parallel>. This means
|
|
|
|
composed commands and redirection require using B<bash -c>.
|
|
|
|
|
|
|
|
B<ls | parallel "wc {} >> B<{}.wc">
|
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
becomes (assuming you have 8 cores)
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
B<ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >>B< {}.wc">
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
and
|
|
|
|
|
|
|
|
B<ls | parallel "echo {}; ls {}|wc">
|
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
becomes (assuming you have 8 cores)
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-03-29 14:22:56 +00:00
|
|
|
B<ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc">
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN find -exec AND GNU Parallel
|
|
|
|
|
2012-05-27 16:14:25 +00:00
|
|
|
B<find -exec> offers some of the same possibilities as GNU B<parallel>.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
B<find -exec> only works on files. So processing other input (such as
|
|
|
|
hosts or URLs) will require creating these inputs as files. B<find
|
|
|
|
-exec> has no support for running commands in parallel.
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN make -j AND GNU Parallel
|
|
|
|
|
|
|
|
B<make -j> can run jobs in parallel, but requires a crafted Makefile
|
|
|
|
to do this. That results in extra quoting to get filenames containing
|
|
|
|
newlines to work correctly.
|
|
|
|
|
|
|
|
B<make -j> has no support for grouping the output, therefore output
|
|
|
|
may run together, e.g. the first half of a line is from one process
|
|
|
|
and the last half of the line is from another process. The example
|
|
|
|
B<Parallel grep> cannot be done reliably with B<make -j> because of
|
|
|
|
this.
|
|
|
|
|
|
|
|
(Very early versions of GNU B<parallel> were coincidentally implemented
|
|
|
|
using B<make -j>).
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN ppss AND GNU Parallel
|
|
|
|
|
|
|
|
B<ppss> is also a tool for running jobs in parallel.
|
|
|
|
|
|
|
|
The output of B<ppss> is status information and thus not useful for
|
|
|
|
using as input for another command. The output from the jobs is put
|
|
|
|
into files.
|
|
|
|
|
|
|
|
The argument replace string ($ITEM) cannot be changed. Arguments must
|
|
|
|
be quoted - thus arguments containing special characters (space '"&!*)
|
|
|
|
may cause problems. More than one argument is not supported. File
|
|
|
|
names containing newlines are not processed correctly. When reading
|
2011-04-07 19:54:02 +00:00
|
|
|
input from a file null cannot be used as a terminator. B<ppss> needs
|
|
|
|
to read the whole input file before starting any jobs.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Output and status information is stored in ppss_dir and thus requires
|
|
|
|
cleanup when completed. If the dir is not removed before running
|
|
|
|
B<ppss> again it may cause nothing to happen as B<ppss> thinks the
|
|
|
|
task is already done. GNU B<parallel> will normally not need cleaning
|
|
|
|
up if running locally and will only need cleaning up if stopped
|
|
|
|
abnormally and running remote (B<--cleanup> may not complete if
|
|
|
|
stopped abnormally). The example B<Parallel grep> would require extra
|
|
|
|
postprocessing if written using B<ppss>.
|
|
|
|
|
|
|
|
For remote systems PPSS requires 3 steps: config, deploy, and
|
|
|
|
start. GNU B<parallel> only requires one step.
|
|
|
|
|
|
|
|
=head3 EXAMPLES FROM ppss MANUAL
|
|
|
|
|
|
|
|
Here are the examples from B<ppss>'s manual page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
B<1> ./ppss.sh standalone -d /path/to/files -c 'gzip '
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
B<1> find /path/to/files -type f | parallel gzip
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
B<2> ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
B<2> find /path/to/files -type f | parallel cp {} /destination/dir
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
B<3> ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '
|
|
|
|
|
|
|
|
B<3> parallel -a list-of-urls.txt wget -q
|
|
|
|
|
|
|
|
B<4> ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'
|
|
|
|
|
|
|
|
B<4> parallel -a list-of-urls.txt wget -q {}
|
|
|
|
|
|
|
|
B<5> ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir -m
|
|
|
|
192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh -n nodes.txt -o
|
|
|
|
/some/output/dir --upload --download ; ./ppss deploy -C config.cfg ;
|
|
|
|
./ppss start -C config
|
|
|
|
|
|
|
|
B<5> # parallel does not use configs. If you want a different username put it in nodes.txt: user@hostname
|
|
|
|
|
|
|
|
B<5> find source/dir -type f | parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} -o {.}.mp3 --preset standard --quiet
|
|
|
|
|
|
|
|
B<6> ./ppss stop -C config.cfg
|
|
|
|
|
|
|
|
B<6> killall -TERM parallel
|
|
|
|
|
|
|
|
B<7> ./ppss pause -C config.cfg
|
|
|
|
|
|
|
|
B<7> Press: CTRL-Z or killall -SIGTSTP parallel
|
|
|
|
|
|
|
|
B<8> ./ppss continue -C config.cfg
|
|
|
|
|
|
|
|
B<8> Enter: fg or killall -SIGCONT parallel
|
|
|
|
|
|
|
|
B<9> ./ppss.sh status -C config.cfg
|
|
|
|
|
|
|
|
B<9> killall -SIGUSR2 parallel
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN pexec AND GNU Parallel
|
|
|
|
|
|
|
|
B<pexec> is also a tool for running jobs in parallel.
|
|
|
|
|
|
|
|
Here are the examples from B<pexec>'s info page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
B<1> pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
|
|
|
|
'echo "scale=10000;sqrt($NUM)" | bc'
|
|
|
|
|
|
|
|
B<1> seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | bc > sqrt-{}.dat'
|
|
|
|
|
|
|
|
B<2> pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort
|
|
|
|
|
|
|
|
B<2> ls myfiles*.ext | parallel sort {} ">{}.sort"
|
|
|
|
|
|
|
|
B<3> pexec -f image.list -n auto -e B -u star.log -c -- \
|
|
|
|
'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'
|
|
|
|
|
2011-02-23 15:22:08 +00:00
|
|
|
B<3> parallel -a image.list \
|
2010-12-06 23:30:08 +00:00
|
|
|
'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log
|
|
|
|
|
|
|
|
B<4> pexec -r *.png -e IMG -c -o - -- \
|
|
|
|
'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'
|
|
|
|
|
|
|
|
B<4> ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'
|
|
|
|
|
|
|
|
B<5> pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
|
|
|
|
B<5> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'
|
|
|
|
|
|
|
|
B<6> for p in *.png ; do echo ${p%.png} ; done | \
|
|
|
|
pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
|
|
|
|
B<6> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
|
|
|
|
|
|
|
|
B<7> LIST=$(for p in *.png ; do echo ${p%.png} ; done)
|
|
|
|
pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
|
|
|
|
B<7> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
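
Examples B<6> and B<7> both depend on removing the extension: the B<pexec> versions do it in the shell with ${p%.png}, the GNU B<parallel> versions with {.}. For simple file names the effect of {.} can be approximated in plain shell. This is a sketch with a made-up function name, and it assumes the directory part of the name contains no dots.

```shell
# Hypothetical helper approximating {.} for simple file names:
# remove the last dot and everything after it.
strip_extension() {
    printf '%s\n' "${1%.*}"
}

strip_extension image.png        # prints image
strip_extension archive.tar.gz   # prints archive.tar
```

Unlike ${p%.png}, this form does not need to know the extension in advance.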
|
|
|
|
|
|
|
|
B<8> pexec -n 8 -r *.jpg -y unix -e IMG -c \
|
|
|
|
'pexec -j -m blockread -d $IMG | \
|
|
|
|
jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
|
|
|
|
pexec -j -m blockwrite -s th_$IMG'
|
|
|
|
|
|
|
|
B<8> Combining GNU B<parallel> and GNU B<sem>.
|
|
|
|
|
|
|
|
B<8> ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
|
|
|
|
'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'
|
|
|
|
|
|
|
|
B<8> If reading and writing is done to the same disk, this may be
|
|
|
|
faster as only one process will be either reading or writing:
|
|
|
|
|
|
|
|
B<8> ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
|
|
|
|
'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN xjobs AND GNU Parallel
|
|
|
|
|
|
|
|
B<xjobs> is also a tool for running jobs in parallel. It only supports
|
|
|
|
running jobs on your local computer.
|
|
|
|
|
|
|
|
B<xjobs> deals badly with special characters just like B<xargs>. See
|
|
|
|
the section B<DIFFERENCES BETWEEN xargs AND GNU Parallel>.
|
|
|
|
|
|
|
|
Here are the examples from B<xjobs>'s man page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
B<1> ls -1 *.zip | xjobs unzip
|
|
|
|
|
|
|
|
B<1> ls *.zip | parallel unzip
|
|
|
|
|
|
|
|
B<2> ls -1 *.zip | xjobs -n unzip
|
|
|
|
|
|
|
|
B<2> ls *.zip | parallel unzip >/dev/null
|
|
|
|
|
|
|
|
B<3> find . -name '*.bak' | xjobs gzip
|
|
|
|
|
|
|
|
B<3> find . -name '*.bak' | parallel gzip
|
|
|
|
|
|
|
|
B<4> ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf
|
|
|
|
|
|
|
|
B<4> ls *.jar | parallel jar tf {} '>' {}.idx
|
|
|
|
|
|
|
|
B<5> xjobs -s script
|
|
|
|
|
|
|
|
B<5> cat script | parallel
|
|
|
|
|
|
|
|
B<6> mkfifo /var/run/my_named_pipe;
|
|
|
|
xjobs -s /var/run/my_named_pipe &
|
|
|
|
echo unzip 1.zip >> /var/run/my_named_pipe;
|
|
|
|
echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
|
|
|
|
|
|
|
|
B<6> mkfifo /var/run/my_named_pipe;
|
|
|
|
cat /var/run/my_named_pipe | parallel &
|
|
|
|
echo unzip 1.zip >> /var/run/my_named_pipe;
|
|
|
|
echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN prll AND GNU Parallel
|
|
|
|
|
|
|
|
B<prll> is also a tool for running jobs in parallel. It does not
|
|
|
|
support running jobs on remote computers.
|
|
|
|
|
|
|
|
B<prll> encourages using BASH aliases and BASH functions instead of
|
2012-03-21 20:42:26 +00:00
|
|
|
scripts. GNU B<parallel> will never support running aliases (see why
|
|
|
|
http://www.perlmonks.org/index.pl?node_id=484296). However, scripts,
|
|
|
|
composed commands, or functions exported with B<export -f> work just
|
|
|
|
fine.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
2011-05-26 21:19:58 +00:00
|
|
|
B<prll> generates a lot of status information on stderr (standard
|
|
|
|
error) which makes it harder to use the stderr (standard error) output
|
|
|
|
of the job directly as input for another program.
|
2010-12-06 23:30:08 +00:00
|
|
|
|
|
|
|
Here is the example from B<prll>'s man page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
prll -s 'mogrify -flip $1' *.jpg
|
|
|
|
|
|
|
|
parallel mogrify -flip ::: *.jpg
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN dxargs AND GNU Parallel
|
|
|
|
|
|
|
|
B<dxargs> is also a tool for running jobs in parallel.
|
|
|
|
|
|
|
|
B<dxargs> does not deal well with more simultaneous jobs than SSHD's
|
2014-03-22 20:41:14 +00:00
|
|
|
MaxStartups. B<dxargs> is only built for remote run jobs, but does not
|
2010-12-06 23:30:08 +00:00
|
|
|
support transferring of files.
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
|
|
|
|
|
|
|
|
middleman(mdm) is also a tool for running jobs in parallel.
|
|
|
|
|
|
|
|
Here are the shellscripts of http://mdm.berlios.de/usage.html ported
|
|
|
|
to GNU B<parallel>:
|
|
|
|
|
|
|
|
B<seq 19 | parallel buffon -o - | sort -n >>B< result>
|
|
|
|
|
|
|
|
B<cat files | parallel cmd>
|
|
|
|
|
|
|
|
B<find dir -execdir sem cmd {} \;>
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN xapply AND GNU Parallel
|
|
|
|
|
|
|
|
B<xapply> can run jobs in parallel on the local computer.
|
|
|
|
|
|
|
|
Here are the examples from B<xapply>'s man page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
B<1> xapply '(cd %1 && make all)' */
|
|
|
|
|
|
|
|
B<1> parallel 'cd {} && make all' ::: */
|
|
|
|
|
|
|
|
B<2> xapply -f 'diff %1 ../version5/%1' manifest | more
|
|
|
|
|
|
|
|
B<2> parallel diff {} ../version5/{} < manifest | more
|
|
|
|
|
|
|
|
B<3> xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1
|
|
|
|
|
|
|
|
B<3> parallel --xapply diff {1} {2} :::: manifest1 checklist1
|
|
|
|
|
|
|
|
B<4> xapply 'indent' *.c
|
|
|
|
|
|
|
|
B<4> parallel indent ::: *.c
|
|
|
|
|
|
|
|
B<5> find ~ksb/bin -type f ! -perm -111 -print | xapply -f -v 'chmod a+x' -
|
|
|
|
|
|
|
|
B<5> find ~ksb/bin -type f ! -perm -111 -print | parallel -v chmod a+x
|
|
|
|
|
|
|
|
B<6> find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -
|
|
|
|
|
|
|
|
B<6> sh <(find */ -... | parallel -s 1024 echo vi)
|
|
|
|
|
|
|
|
B<6> find */ -... | parallel -s 1024 -Xuj1 vi
|
|
|
|
|
|
|
|
B<7> find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -
|
|
|
|
|
|
|
|
B<7> sh <(find ... |parallel -n5 echo vi)
|
|
|
|
|
|
|
|
B<7> find ... |parallel -n5 -uj1 vi
|
|
|
|
|
|
|
|
B<8> xapply -fn "" /etc/passwd
|
|
|
|
|
|
|
|
B<8> parallel -k echo < /etc/passwd
|
|
|
|
|
|
|
|
B<9> tr ':' '\012' < /etc/passwd | xapply -7 -nf 'chown %1 %6' - - - - - - -
|
|
|
|
|
|
|
|
B<9> tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}
|
|
|
|
|
|
|
|
B<10> xapply '[ -d %1/RCS ] || echo %1' */
|
|
|
|
|
|
|
|
B<10> parallel '[ -d {}/RCS ] || echo {}' ::: */
|
|
|
|
|
|
|
|
B<11> xapply -f '[ -f %1 ] && echo %1' List | ...
|
|
|
|
|
|
|
|
B<11> parallel '[ -f {} ] && echo {}' < List | ...
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN paexec AND GNU Parallel
|
|
|
|
|
|
|
|
B<paexec> can run jobs in parallel on both the local and remote computers.
|
|
|
|
|
|
|
|
B<paexec> requires commands to print a blank line as the last
|
|
|
|
output. This means you will have to write a wrapper for most programs.
|
|
|
|
|
|
|
|
B<paexec> has a job dependency facility so a job can depend on another
|
|
|
|
job to be executed successfully. Sort of a poor-man's B<make>.
|
|
|
|
|
|
|
|
Here are the examples from B<paexec>'s example catalog with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
=over 1
|
|
|
|
|
|
|
|
=item 1_div_X_run:
|
|
|
|
|
|
|
|
../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]
|
|
|
|
parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]
|
|
|
|
|
|
|
|
=item all_substr_run:
|
|
|
|
|
|
|
|
../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]
|
|
|
|
parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]
|
|
|
|
|
|
|
|
=item cc_wrapper_run:
|
|
|
|
|
|
|
|
../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
|
|
|
|
-n 'host1 host2' \
|
|
|
|
-t '/usr/bin/ssh -x' <<EOF [...]
|
|
|
|
parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
|
|
|
|
-S host1,host2 <<EOF [...]
|
|
|
|
# This is not exactly the same, but avoids the wrapper
|
|
|
|
parallel gcc -O2 -c -o {.}.o {} \
|
|
|
|
-S host1,host2 <<EOF [...]
|
|
|
|
|
|
|
|
=item toupper_run:
|
|
|
|
|
|
|
|
../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]
|
|
|
|
parallel echo {} '|' ./toupper_cmd <<EOF [...]
|
|
|
|
# Without the wrapper:
|
|
|
|
parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN map AND GNU Parallel
|
|
|
|
|
|
|
|
B<map> sees it as a feature to have fewer features and in doing so it
|
|
|
|
also handles corner cases incorrectly. A lot of GNU B<parallel>'s code
|
|
|
|
is to handle corner cases correctly on every platform, so you will not
|
|
|
|
get a nasty surprise if a user for example saves a file called: I<My
|
|
|
|
brother's 12" records.txt>
|
|
|
|
|
|
|
|
B<map>'s example showing how to deal with special characters fails on
|
|
|
|
special characters:
|
|
|
|
|
|
|
|
echo "The Cure" > My\ brother\'s\ 12\"\ records
|
|
|
|
|
|
|
|
ls | map 'echo -n `gzip < "%" | wc -c`; echo -n '*100/'; wc -c < "%"' | bc
|
|
|
|
|
|
|
|
It works with GNU B<parallel>:
|
|
|
|
|
|
|
|
ls | parallel 'echo -n `gzip < {} | wc -c`; echo -n '*100/'; wc -c < {}' | bc
|
|
|
|
|
|
|
|
And you can even get the file name prepended:
|
|
|
|
|
|
|
|
ls | parallel --tag '(echo -n `gzip < {} | wc -c`'*100/'; wc -c < {}) | bc'
|
|
|
|
|
|
|
|
B<map> has no support for grouping. So this gives the wrong results
|
|
|
|
without any warnings:
|
|
|
|
|
|
|
|
parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} ::: a b c d e f
|
|
|
|
ls -l a b c d e f
|
|
|
|
parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
|
|
|
|
map -p 4 'grep 1' a b c d e f > out.map-unbuf
|
|
|
|
map -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
|
|
|
|
map -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
|
|
|
|
ls -l out*
|
|
|
|
md5sum out*
|
|
|
|
|
|
|
|
The documentation shows a workaround, but not only does that mix
stdout (standard output) with stderr (standard error), it also fails
completely for certain jobs (and may even be considered less readable):
|
|
|
|
|
|
|
|
parallel echo -n {} ::: 1 2 3
|
|
|
|
|
|
|
|
map -p 4 'echo -n % 2>&1 | sed -e "s/^/$$:/"' 1 2 3 | sort | cut -f2- -d:
|
|
|
|
|
|
|
|
B<map> cannot handle bundled options: B<map -vp 0 echo this fails>
|
|
|
|
|
|
|
|
B<map> does not have an argument separator on the command line, but
|
|
|
|
uses the first argument as command. This makes quoting harder which again
|
|
|
|
may affect readability. Compare:
|
|
|
|
|
|
|
|
map -p 2 perl\\\ -ne\\\ \\\'/^\\\\S+\\\\s+\\\\S+\\\$/\\\ and\\\ print\\\ \\\$ARGV,\\\"\\\\n\\\"\\\' *
|
|
|
|
|
|
|
|
parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *
|
|
|
|
|
|
|
|
B<map> can do multiple arguments with context replace, but not without
|
|
|
|
context replace:
|
|
|
|
|
|
|
|
parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3
|
|
|
|
|
|
|
|
B<map> does not set exit value according to whether one of the jobs
|
|
|
|
failed:
|
|
|
|
|
|
|
|
parallel false ::: 1 || echo Job failed
|
|
|
|
|
|
|
|
map false 1 || echo Never run
|
|
|
|
|
|
|
|
B<map> requires Perl v5.10.0 making it harder to use on old systems.
|
|
|
|
|
|
|
|
B<map> has no way of using % in the command (GNU Parallel has -I to
|
|
|
|
specify another replacement string than {}).
|
|
|
|
|
|
|
|
By design B<map> is option incompatible with B<xargs>. It does not
have remote job execution, a structured way of saving results,
multiple input sources, a progress indicator, a configurable record
delimiter (only a field delimiter), logging of jobs run with the
possibility to resume, keeping the output in the same order as input,
--pipe processing, or dynamic timeouts.
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel
|
|
|
|
|
|
|
|
ClusterSSH solves a different problem than GNU B<parallel>.
|
|
|
|
|
|
|
|
ClusterSSH opens a terminal window for each computer and using a
|
|
|
|
master window you can run the same command on all the computers. This
|
|
|
|
is typically used for administering several computers that are almost
|
|
|
|
identical.
|
|
|
|
|
|
|
|
GNU B<parallel> runs the same (or different) commands with different
arguments in parallel, possibly using remote computers to help
computing. If more than one computer is listed in B<-S>, GNU B<parallel> may
use only one of these (e.g. if there are 8 jobs to be run and one
computer has 8 cores).
|
|
|
|
|
|
|
|
GNU B<parallel> can be used as a poor-man's version of ClusterSSH:
|
|
|
|
|
|
|
|
B<parallel --nonall -S server-a,server-b do_stuff foo bar>
|
|
|
|
|
|
|
|
|
|
|
|
=head1 BUGS
|
|
|
|
|
|
|
|
=head2 Quoting of newline
|
|
|
|
|
|
|
|
Because of the way newline is quoted this will not work:
|
|
|
|
|
|
|
|
echo 1,2,3 | parallel -vkd, "echo 'a{}b'"
|
|
|
|
|
|
|
|
However, these will all work:
|
|
|
|
|
|
|
|
echo 1,2,3 | parallel -vkd, echo a{}b
|
|
|
|
|
|
|
|
echo 1,2,3 | parallel -vkd, "echo 'a'{}'b'"
|
|
|
|
|
|
|
|
echo 1,2,3 | parallel -vkd, "echo 'a'"{}"'b'"
|
|
|
|
|
|
|
|
|
|
|
|
=head2 Speed
|
|
|
|
|
|
|
|
=head3 Startup
|
|
|
|
|
|
|
|
GNU B<parallel> is slow at starting up - around 250 ms the first time
|
|
|
|
and 150 ms after that.
|
|
|
|
|
|
|
|
=head3 Job startup
|
|
|
|
|
|
|
|
Starting a job on the local machine takes around 3 ms. This can be a
big overhead if the job takes very few ms to run. Often you can group
small jobs together using B<-X> which will make the overhead less
significant. Or you can run multiple GNU B<parallel>s as described in
B<EXAMPLE: Speeding up fast jobs>.
|
|
|
|
|
|
|
|
Using B<--ungroup> the 3 ms can be lowered to around 2 ms.
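To get a feel for when the startup cost matters, here is a rough
back-of-the-envelope sketch in shell arithmetic. Only the 3 ms per job
start comes from the text above; the job count and the batch size are
made-up figures. B<-X> really fits as many arguments as the command
line length allows, so the fixed batch size is a simplification.

```shell
#!/usr/bin/env bash
# Assumed figures, for illustration only.
job_count=100000      # hypothetical number of tiny jobs
startup_ms=3          # approximate cost of starting one job (from the text)
args_per_batch=1000   # hypothetical number of arguments grouped per command

# One job per argument: every argument pays the startup cost.
ungrouped_ms=$(( job_count * startup_ms ))

# Grouped: one startup per batch of arguments (rounded up).
batches=$(( (job_count + args_per_batch - 1) / args_per_batch ))
grouped_ms=$(( batches * startup_ms ))

echo "ungrouped startup overhead: ${ungrouped_ms} ms"
echo "grouped startup overhead:   ${grouped_ms} ms"
```

With these assumed numbers the startup overhead drops from minutes to
well under a second, which is why grouping helps most when each job
runs for only a few ms.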
|
|
|
|
|
|
|
|
=head3 SSH
|
|
|
|
|
|
|
|
When using multiple computers GNU B<parallel> opens B<ssh> connections
to them to figure out how many connections can be used reliably
simultaneously (namely SSHD's MaxStartups). This test is done for each
host in serial, so if your B<--sshloginfile> contains many hosts it may
be slow.
|
|
|
|
|
|
|
|
If your jobs are short you may see that there are fewer jobs running
|
|
|
|
on the remote systems than expected. This is due to time spent logging
|
|
|
|
in and out. B<-M> may help here.
|
|
|
|
|
|
|
|
=head3 Disk access
|
|
|
|
|
|
|
|
A single disk can normally read data faster if it reads one file at a
|
|
|
|
time instead of reading a lot of files in parallel, as this will avoid
|
|
|
|
disk seeks. However, newer disk systems with multiple drives can read
|
|
|
|
faster if reading from multiple files in parallel.
|
|
|
|
|
|
|
|
If the jobs are of the form read-all-compute-all-write-all, so
|
|
|
|
everything is read before anything is written, it may be faster to
|
|
|
|
force only one disk access at a time:
|
|
|
|
|
|
|
|
sem --id diskio cat file | compute | sem --id diskio cat > file
|
|
|
|
|
|
|
|
If the jobs are of the form read-compute-write, so writing starts
|
|
|
|
before all reading is done, it may be faster to force only one reader
|
|
|
|
and one writer at a time:
|
|
|
|
|
|
|
|
sem --id read cat file | compute | sem --id write cat > file
|
|
|
|
|
|
|
|
If the jobs are of the form read-compute-read-compute, it may be
|
|
|
|
faster to run more jobs in parallel than the system has CPUs, as some
|
|
|
|
of the jobs will be stuck waiting for disk access.
|
|
|
|
|
|
|
|
=head2 --nice limits command length
|
|
|
|
|
|
|
|
The current implementation of B<--nice> is too pessimistic in the max
allowed command length. It only uses a little more than half of what
it could. This affects B<-X> and B<-m>. If this becomes a real problem for
you, file a bug report.
|
|
|
|
|
|
|
|
=head2 Aliases and functions do not work
|
|
|
|
|
|
|
|
If you get:
|
|
|
|
|
|
|
|
B<Can't exec "I<command>": No such file or directory>
|
|
|
|
|
|
|
|
or:
|
|
|
|
|
|
|
|
B<open3: exec of by I<command> failed>
|
|
|
|
|
|
|
|
it may be because I<command> is not known, but it could also be
because I<command> is an alias or a function. If it is a function you
need to B<export -f> the function first. An alias will, however, not
work (see why http://www.perlmonks.org/index.pl?node_id=484296), so
change your alias to a script.
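A minimal sketch of the B<export -f> route, assuming the shell is
bash. The function name my_upper is hypothetical and only serves as an
illustration. The child shell started with B<bash -c> stands in for
the shell that runs the job.

```shell
#!/usr/bin/env bash
# Hypothetical function used only for illustration.
my_upper() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Export the function so that child bash processes can see it.
export -f my_upper

# A child shell can now call the function. An alias defined in the
# parent shell would not be visible here.
child_output=$(bash -c 'my_upper hello')
echo "$child_output"

# With GNU parallel running its jobs under bash, the exported function
# can be used as the command, for example:
#   parallel my_upper ::: a b c
```

If the function is not exported first, the child shell reports that
the command is not found, which matches the errors described above.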
|
|
|
|
|
|
|
|
|
|
|
|
=head1 REPORTING BUGS
|
|
|
|
|
|
|
|
Report bugs to <bug-parallel@gnu.org> or
|
|
|
|
https://savannah.gnu.org/bugs/?func=additem&group=parallel
|
|
|
|
|
|
|
|
Your bug report should always include:
|
|
|
|
|
|
|
|
=over 2
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
The error message you get (if any).
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
The complete output of B<parallel --version>. If you are not running
|
|
|
|
the latest released version you should specify why you believe the
|
|
|
|
problem is not fixed in that version.
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
A complete example that others can run that shows the problem. This
should preferably be small and simple. A combination of B<yes>,
B<seq>, B<cat>, B<echo>, and B<sleep> can reproduce most errors. If
your example requires large files, see if you can make them by
something like B<seq 1000000> > B<file> or B<yes | head -n 10000000> >
B<file>. If your example requires remote execution, see if you can
use B<localhost> - maybe using another login.
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
The output of your example. If your problem is not easily reproduced
|
|
|
|
by others, the output might help them figure out the problem.
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
Whether you have watched the intro videos
|
|
|
|
(http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1), walked
|
|
|
|
through the tutorial (man parallel_tutorial), and read the EXAMPLE
|
|
|
|
section in the man page (man parallel - search for EXAMPLE:).
|
|
|
|
|
|
|
|
=back
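The advice in the list above about generating large test files can be
sketched as follows. The temporary directory and the file names
numbers.txt and yes.txt are assumptions made for this example; the
B<seq> and B<yes | head> commands are the ones suggested in the list.

```shell
#!/usr/bin/env bash
# Work in a throw-away directory so the example leaves nothing behind.
workdir=$(mktemp -d)

# One million numbered lines: a large, reproducible input file.
seq 1000000 > "$workdir/numbers.txt"

# Ten million identical lines; head stops reading once enough are written.
yes | head -n 10000000 > "$workdir/yes.txt"

# Count the lines to confirm the files have the expected size.
numbers_lines=$(wc -l < "$workdir/numbers.txt")
yes_lines=$(wc -l < "$workdir/yes.txt")
echo "numbers.txt: $numbers_lines lines, yes.txt: $yes_lines lines"

# Clean up the generated files.
rm -rf "$workdir"
```

Because the files are generated by commands, the bug report only needs
to contain those commands and not the files themselves.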
|
|
|
|
|
|
|
|
If you suspect the error is dependent on your environment or
|
|
|
|
distribution, please see if you can reproduce the error on one of
|
|
|
|
these VirtualBox images:
|
|
|
|
http://sourceforge.net/projects/virtualboximage/files/
|
|
|
|
|
|
|
|
Specifying the name of your distribution is not enough as you may have
|
|
|
|
installed software that is not in the VirtualBox images.
|
|
|
|
|
|
|
|
If you cannot reproduce the error on any of the VirtualBox images
above, you should assume the debugging will be done through you. That
will put more burden on you and it is extra important you give any
information that helps. In general the problem will be fixed faster and
with less work for you if you can reproduce the error on a VirtualBox
image.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 AUTHOR
|
|
|
|
|
|
|
|
When using GNU B<parallel> for a publication please cite:
|
|
|
|
|
|
|
|
O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
|
|
|
|
The USENIX Magazine, February 2011:42-47.
|
|
|
|
|
|
|
|
Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
|
|
|
|
|
|
|
|
Copyright (C) 2008,2009,2010 Ole Tange, http://ole.tange.dk
|
|
|
|
|
|
|
|
Copyright (C) 2010,2011,2012,2013,2014 Ole Tange, http://ole.tange.dk
|
|
|
|
and Free Software Foundation, Inc.
|
|
|
|
|
|
|
|
Parts of the manual concerning B<xargs> compatibility are inspired by
|
|
|
|
the manual of B<xargs> from GNU findutils 4.4.2.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 LICENSE
|
|
|
|
|
|
|
|
Copyright (C) 2007,2008,2009,2010,2011,2012,2013 Free Software Foundation,
Inc.
|
|
|
|
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
|
|
it under the terms of the GNU General Public License as published by
|
|
|
|
the Free Software Foundation; either version 3 of the License, or
|
|
|
|
at your option any later version.
|
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
GNU General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
|
|
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
|
|
|
|
=head2 Documentation license I
|
|
|
|
|
|
|
|
Permission is granted to copy, distribute and/or modify this documentation
|
|
|
|
under the terms of the GNU Free Documentation License, Version 1.3 or
|
|
|
|
any later version published by the Free Software Foundation; with no
|
|
|
|
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
|
|
|
|
Texts. A copy of the license is included in the file fdl.txt.
|
|
|
|
|
|
|
|
=head2 Documentation license II
|
|
|
|
|
|
|
|
You are free:
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item B<to Share>
|
|
|
|
|
|
|
|
to copy, distribute and transmit the work
|
|
|
|
|
|
|
|
=item B<to Remix>
|
|
|
|
|
|
|
|
to adapt the work
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
Under the following conditions:
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item B<Attribution>
|
|
|
|
|
|
|
|
You must attribute the work in the manner specified by the author or
|
|
|
|
licensor (but not in any way that suggests that they endorse you or
|
|
|
|
your use of the work).
|
|
|
|
|
|
|
|
=item B<Share Alike>
|
|
|
|
|
|
|
|
If you alter, transform, or build upon this work, you may distribute
|
|
|
|
the resulting work only under the same, similar or a compatible
|
|
|
|
license.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
With the understanding that:
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item B<Waiver>
|
|
|
|
|
|
|
|
Any of the above conditions can be waived if you get permission from
|
|
|
|
the copyright holder.
|
|
|
|
|
|
|
|
=item B<Public Domain>
|
|
|
|
|
|
|
|
Where the work or any of its elements is in the public domain under
|
|
|
|
applicable law, that status is in no way affected by the license.
|
|
|
|
|
|
|
|
=item B<Other Rights>
|
|
|
|
|
|
|
|
In no way are any of the following rights affected by the license:
|
|
|
|
|
|
|
|
=over 2
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
Your fair dealing or fair use rights, or other applicable
|
|
|
|
copyright exceptions and limitations;
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
The author's moral rights;
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
Rights other persons may have either in the work itself or in
|
|
|
|
how the work is used, such as publicity or privacy rights.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item B<Notice>
|
|
|
|
|
|
|
|
For any reuse or distribution, you must make clear to others the
|
|
|
|
license terms of this work.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
A copy of the full license is included in the file cc-by-sa.txt.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 DEPENDENCIES
|
|
|
|
|
|
|
|
GNU B<parallel> uses Perl, and the Perl modules Getopt::Long,
IPC::Open3, Symbol, IO::File, POSIX, and File::Temp. For remote usage
it also uses rsync with ssh.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 SEE ALSO
|
|
|
|
|
|
|
|
B<ssh>(1), B<rsync>(1), B<find>(1), B<xargs>(1), B<dirname>(1),
B<make>(1), B<pexec>(1), B<ppss>(1), B<xjobs>(1), B<prll>(1),
B<dxargs>(1), B<mdm>(1)
|
|
|
|
|
|
|
|
=cut
|
|
|
|
|