mirror of
https://git.savannah.gnu.org/git/parallel.git
synced 2024-11-22 05:57:54 +00:00
3167 lines
73 KiB
Plaintext
#!/usr/bin/perl -w
|
|
|
|
=head1 GNU Parallel Tutorial
|
|
|
|
This tutorial shows off much of GNU B<parallel>'s functionality. The
tutorial is meant to teach the options and syntax of GNU
B<parallel>. It is B<not> meant to show realistic examples from the
real world.
|
|
|
|
=head2 Reader's guide
|
|
|
|
If you prefer reading a book buy B<GNU Parallel 2018> at
|
|
http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
|
|
or download it at: https://doi.org/10.5281/zenodo.1146014
|
|
|
|
Otherwise start by watching the intro videos for a quick introduction:
|
|
http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
|
|
|
|
Then browse through the B<EXAMPLE>s after the list of B<OPTIONS> in
|
|
B<man parallel> (Use B<LESS=+/EXAMPLE: man parallel>). That will give
|
|
you an idea of what GNU B<parallel> is capable of.
|
|
|
|
If you want to dive even deeper: spend a couple of hours walking
|
|
through the tutorial (B<man parallel_tutorial>). Your command line
|
|
will love you for it.
|
|
|
|
Finally you may want to look at the rest of the manual (B<man
|
|
parallel>) if you have special needs not already covered.
|
|
|
|
If you want to know the design decisions behind GNU B<parallel>, try:
|
|
B<man parallel_design>. This is also a good intro if you intend to
|
|
change GNU B<parallel>.
|
|
|
|
|
|
|
|
=head1 Prerequisites
|
|
|
|
To run this tutorial you must have the following:
|
|
|
|
=over 9
|
|
|
|
=item parallel >= version 20160822
|
|
|
|
Install the newest version using your package manager (recommended for
|
|
security reasons), the way described in README, or with this command:
|
|
|
|
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
|
|
fetch -o - http://pi.dk/3 ) > install.sh
|
|
$ sha1sum install.sh
|
|
12345678 3374ec53 bacb199b 245af2dd a86df6c9
|
|
$ md5sum install.sh
|
|
029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
|
|
$ sha512sum install.sh
|
|
40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
|
|
60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
|
|
$ bash install.sh
|
|
|
|
This will also install the newest version of the tutorial which you
|
|
can see by running this:
|
|
|
|
man parallel_tutorial
|
|
|
|
Most of the tutorial will work on older versions, too.
|
|
|
|
|
|
=item abc-file:
|
|
|
|
The file can be generated by this command:
|
|
|
|
parallel -k echo ::: A B C > abc-file
|
|
|
|
=item def-file:
|
|
|
|
The file can be generated by this command:
|
|
|
|
parallel -k echo ::: D E F > def-file
|
|
|
|
=item abc0-file:
|
|
|
|
The file can be generated by this command:
|
|
|
|
perl -e 'printf "A\0B\0C\0"' > abc0-file
|
|
|
|
=item abc_-file:
|
|
|
|
The file can be generated by this command:
|
|
|
|
perl -e 'printf "A_B_C_"' > abc_-file
|
|
|
|
=item tsv-file.tsv
|
|
|
|
The file can be generated by this command:
|
|
|
|
perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv
|
|
|
|
=item num8
|
|
|
|
The file can be generated by this command:
|
|
|
|
perl -e 'for(1..8){print "$_\n"}' > num8
|
|
|
|
=item num128
|
|
|
|
The file can be generated by this command:
|
|
|
|
perl -e 'for(1..128){print "$_\n"}' > num128
|
|
|
|
=item num30000
|
|
|
|
The file can be generated by this command:
|
|
|
|
perl -e 'for(1..30000){print "$_\n"}' > num30000
|
|
|
|
=item num1000000
|
|
|
|
The file can be generated by this command:
|
|
|
|
perl -e 'for(1..1000000){print "$_\n"}' > num1000000
|
|
|
|
=item num_%header
|
|
|
|
The file can be generated by this command:
|
|
|
|
(echo %head1; echo %head2; \
|
|
perl -e 'for(1..10){print "$_\n"}') > num_%header
|
|
|
|
=item fixedlen
|
|
|
|
The file can be generated by this command:
|
|
|
|
perl -e 'print "HHHHAAABBBCCC"' > fixedlen
|
|
|
|
=item For remote running: ssh login on 2 servers with no password in
|
|
$SERVER1 and $SERVER2 must work.
|
|
|
|
SERVER1=server.example.com
|
|
SERVER2=server2.example.net
|
|
|
|
So you must be able to do this without entering a password:
|
|
|
|
ssh $SERVER1 echo works
|
|
ssh $SERVER2 echo works
|
|
|
|
It can be set up by running 'ssh-keygen -t dsa; ssh-copy-id $SERVER1'
and using an empty passphrase, or you can use B<ssh-agent>.
|
|
|
|
=back
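The small input files listed above can also be created in one go. The following is a minimal sketch, assuming a POSIX shell with B<printf> and B<seq>. The function name B<make_tutorial_files> and the target directory are made up for this illustration, and B<printf> stands in for the B<parallel> and B<perl> commands shown above. The larger files (num30000, num1000000) are left out and can be generated with the B<perl> commands above.

```shell
# Hypothetical helper: write the small tutorial input files into directory $1.
make_tutorial_files() {
  dir=$1
  mkdir -p "$dir" || return 1
  printf '%s\n' A B C > "$dir/abc-file"      # one argument per line
  printf '%s\n' D E F > "$dir/def-file"
  printf 'A\0B\0C\0' > "$dir/abc0-file"      # NUL-terminated arguments
  printf 'A_B_C_' > "$dir/abc_-file"         # '_' used as delimiter
  printf 'f1\tf2\nA\tB\nC\tD\n' > "$dir/tsv-file.tsv"
  seq 1 8 > "$dir/num8"
  seq 1 128 > "$dir/num128"
  # Two header lines followed by the numbers 1..10
  { echo %head1; echo %head2; seq 1 10; } > "$dir/num_%header"
  printf 'HHHHAAABBBCCC' > "$dir/fixedlen"   # no trailing newline
}
```

Running B<make_tutorial_files .> would write the files into the current directory.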
|
|
|
|
|
|
=head1 Input sources
|
|
|
|
GNU B<parallel> reads input from input sources. These can be files, the
|
|
command line, and stdin (standard input or a pipe).
|
|
|
|
=head2 A single input source
|
|
|
|
Input can be read from the command line:
|
|
|
|
parallel echo ::: A B C
|
|
|
|
Output (the order may be different because the jobs are run in
|
|
parallel):
|
|
|
|
A
|
|
B
|
|
C
|
|
|
|
The input source can be a file:
|
|
|
|
parallel -a abc-file echo
|
|
|
|
Output: Same as above.
|
|
|
|
STDIN (standard input) can be the input source:
|
|
|
|
cat abc-file | parallel echo
|
|
|
|
Output: Same as above.
|
|
|
|
|
|
=head2 Multiple input sources
|
|
|
|
GNU B<parallel> can take multiple input sources given on the command
|
|
line. GNU B<parallel> then generates all combinations of the input
|
|
sources:
|
|
|
|
parallel echo ::: A B C ::: D E F
|
|
|
|
Output (the order may be different):
|
|
|
|
A D
|
|
A E
|
|
A F
|
|
B D
|
|
B E
|
|
B F
|
|
C D
|
|
C E
|
|
C F
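For comparison, the same combinations can be written as nested loops in plain shell. This is only a sketch to show what "all combinations" means: it runs the commands one at a time, so unlike GNU B<parallel> the order is fixed. The function name B<all_combinations> is made up for this illustration.

```shell
# Serial equivalent of: parallel echo ::: A B C ::: D E F
all_combinations() {
  for first in A B C; do
    for second in D E F; do
      echo "$first $second"
    done
  done
}
all_combinations
```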
|
|
|
|
The input sources can be files:
|
|
|
|
parallel -a abc-file -a def-file echo
|
|
|
|
Output: Same as above.
|
|
|
|
STDIN (standard input) can be one of the input sources using B<->:
|
|
|
|
cat abc-file | parallel -a - -a def-file echo
|
|
|
|
Output: Same as above.
|
|
|
|
Instead of B<-a> files can be given after B<::::>:
|
|
|
|
cat abc-file | parallel echo :::: - def-file
|
|
|
|
Output: Same as above.
|
|
|
|
::: and :::: can be mixed:
|
|
|
|
parallel echo ::: A B C :::: def-file
|
|
|
|
Output: Same as above.
|
|
|
|
=head3 Linking arguments from input sources
|
|
|
|
With B<--link> you can link the input sources and get one argument
|
|
from each input source:
|
|
|
|
parallel --link echo ::: A B C ::: D E F
|
|
|
|
Output (the order may be different):
|
|
|
|
A D
|
|
B E
|
|
C F
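Linking pairs the n-th value of one input source with the n-th value of the other, much like B<paste> does for the lines of two files. The following serial sketch uses only standard tools; B<link_sketch> is a hypothetical name, and unlike B<--link>, B<paste> does not wrap around when one source is shorter than the other.

```shell
# Serial illustration of pairing values by position.
link_sketch() {
  tmpdir=$(mktemp -d) || return 1
  printf '%s\n' A B C > "$tmpdir/abc"
  printf '%s\n' D E F > "$tmpdir/def"
  # -d ' ' joins the corresponding lines with a single space
  paste -d ' ' "$tmpdir/abc" "$tmpdir/def"
  rm -rf "$tmpdir"
}
link_sketch
```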
|
|
|
|
If one of the input sources is too short, its values will wrap:
|
|
|
|
parallel --link echo ::: A B C D E ::: F G
|
|
|
|
Output (the order may be different):
|
|
|
|
A F
|
|
B G
|
|
C F
|
|
D G
|
|
E F
|
|
|
|
For more flexible linking you can use B<:::+> and B<::::+>. They work
|
|
like B<:::> and B<::::> except they link the previous input source to
|
|
this input source.
|
|
|
|
This will link ABC to GHI:
|
|
|
|
parallel echo :::: abc-file :::+ G H I :::: def-file
|
|
|
|
Output (the order may be different):
|
|
|
|
A G D
|
|
A G E
|
|
A G F
|
|
B H D
|
|
B H E
|
|
B H F
|
|
C I D
|
|
C I E
|
|
C I F
|
|
|
|
This will link GHI to DEF:
|
|
|
|
parallel echo :::: abc-file ::: G H I ::::+ def-file
|
|
|
|
Output (the order may be different):
|
|
|
|
A G D
|
|
A H E
|
|
A I F
|
|
B G D
|
|
B H E
|
|
B I F
|
|
C G D
|
|
C H E
|
|
C I F
|
|
|
|
If one of the input sources is too short when using B<:::+> or
|
|
B<::::+>, the rest will be ignored:
|
|
|
|
parallel echo ::: A B C D E :::+ F G
|
|
|
|
Output (the order may be different):
|
|
|
|
A F
|
|
B G
|
|
|
|
|
|
=head2 Changing the argument separator
|
|
|
|
GNU B<parallel> can use separators other than B<:::> or B<::::>. This is
typically useful if B<:::> or B<::::> is used in the command to run:
|
|
|
|
parallel --arg-sep ,, echo ,, A B C :::: def-file
|
|
|
|
Output (the order may be different):
|
|
|
|
A D
|
|
A E
|
|
A F
|
|
B D
|
|
B E
|
|
B F
|
|
C D
|
|
C E
|
|
C F
|
|
|
|
Changing the argument file separator:
|
|
|
|
parallel --arg-file-sep // echo ::: A B C // def-file
|
|
|
|
Output: Same as above.
|
|
|
|
|
|
=head2 Changing the argument delimiter
|
|
|
|
GNU B<parallel> will normally treat a full line as a single argument: It
|
|
uses B<\n> as argument delimiter. This can be changed with B<-d>:
|
|
|
|
parallel -d _ echo :::: abc_-file
|
|
|
|
Output (the order may be different):
|
|
|
|
A
|
|
B
|
|
C
|
|
|
|
NUL can be given as B<\0>:
|
|
|
|
parallel -d '\0' echo :::: abc0-file
|
|
|
|
Output: Same as above.
|
|
|
|
A shorthand for B<-d '\0'> is B<-0> (this will often be used to read files
|
|
from B<find ... -print0>):
|
|
|
|
parallel -0 echo :::: abc0-file
|
|
|
|
Output: Same as above.
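A typical use of B<-0> is reading file names from B<find ... -print0>, which stays safe even when names contain spaces. The following is a minimal sketch, assuming GNU B<parallel> and a B<find> that supports B<-print0>. The function name B<list_files_safely> is made up for this illustration, and B<-k> is only added to keep the output in the order B<find> produced it.

```shell
# Hypothetical helper: print every regular file below directory $1,
# passing the names NUL-separated so spaces in names are preserved.
list_files_safely() {
  find "$1" -type f -print0 | parallel -0 -k echo
}
```

Calling B<list_files_safely .> should print one file name per job; replace B<echo> with the command you actually want to run on each file.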
|
|
|
|
=head2 End-of-file value for input source
|
|
|
|
GNU B<parallel> can stop reading when it encounters a certain value:
|
|
|
|
parallel -E stop echo ::: A B stop C D
|
|
|
|
Output:
|
|
|
|
A
|
|
B
|
|
|
|
=head2 Skipping empty lines
|
|
|
|
Using B<--no-run-if-empty> GNU B<parallel> will skip empty lines.
|
|
|
|
(echo 1; echo; echo 2) | parallel --no-run-if-empty echo
|
|
|
|
Output:
|
|
|
|
1
|
|
2
|
|
|
|
|
|
=head1 Building the command line
|
|
|
|
=head2 No command means arguments are commands
|
|
|
|
If no command is given after parallel the arguments themselves are
|
|
treated as commands:
|
|
|
|
parallel ::: ls 'echo foo' pwd
|
|
|
|
Output (the order may be different):
|
|
|
|
[list of files in current dir]
|
|
foo
|
|
[/path/to/current/working/dir]
|
|
|
|
The command can be a script, a binary or a Bash function if the function is
|
|
exported using B<export -f>:
|
|
|
|
# Only works in Bash
|
|
my_func() {
|
|
echo in my_func $1
|
|
}
|
|
export -f my_func
|
|
parallel my_func ::: 1 2 3
|
|
|
|
Output (the order may be different):
|
|
|
|
in my_func 1
|
|
in my_func 2
|
|
in my_func 3
|
|
|
|
=head2 Replacement strings
|
|
|
|
=head3 The 7 predefined replacement strings
|
|
|
|
GNU B<parallel> has several replacement strings. If no replacement
|
|
strings are used the default is to append B<{}>:
|
|
|
|
parallel echo ::: A/B.C
|
|
|
|
Output:
|
|
|
|
A/B.C
|
|
|
|
The default replacement string is B<{}>:
|
|
|
|
parallel echo {} ::: A/B.C
|
|
|
|
Output:
|
|
|
|
A/B.C
|
|
|
|
The replacement string B<{.}> removes the extension:
|
|
|
|
parallel echo {.} ::: A/B.C
|
|
|
|
Output:
|
|
|
|
A/B
|
|
|
|
The replacement string B<{/}> removes the path:
|
|
|
|
parallel echo {/} ::: A/B.C
|
|
|
|
Output:
|
|
|
|
B.C
|
|
|
|
The replacement string B<{//}> keeps only the path:
|
|
|
|
parallel echo {//} ::: A/B.C
|
|
|
|
Output:
|
|
|
|
A
|
|
|
|
The replacement string B<{/.}> removes the path and the extension:
|
|
|
|
parallel echo {/.} ::: A/B.C
|
|
|
|
Output:
|
|
|
|
B
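If you know shell parameter expansion, the four file name replacement strings above correspond roughly to the expansions below. This is only an approximation, assuming the argument has both a directory part and an extension; GNU B<parallel> applies its own rules, which can differ in corner cases such as a dot in a directory name. The variable names are made up for this illustration.

```shell
path=A/B.C
noext=${path%.*}        # like {.}  : remove the extension         -> A/B
base=${path##*/}        # like {/}  : remove the path              -> B.C
dir=${path%/*}          # like {//} : keep only the path           -> A
basenoext=${base%.*}    # like {/.} : remove path and extension    -> B
echo "$noext $base $dir $basenoext"
```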
|
|
|
|
The replacement string B<{#}> gives the job number:
|
|
|
|
parallel echo {#} ::: A B C
|
|
|
|
Output (the order may be different):
|
|
|
|
1
|
|
2
|
|
3
|
|
|
|
The replacement string B<{%}> gives the job slot number (between 1 and
|
|
number of jobs to run in parallel):
|
|
|
|
parallel -j 2 echo {%} ::: A B C
|
|
|
|
Output (the order may be different and 1 and 2 may be swapped):
|
|
|
|
1
|
|
2
|
|
1
|
|
|
|
=head3 Changing the replacement strings
|
|
|
|
The replacement string B<{}> can be changed with B<-I>:
|
|
|
|
parallel -I ,, echo ,, ::: A/B.C
|
|
|
|
Output:
|
|
|
|
A/B.C
|
|
|
|
The replacement string B<{.}> can be changed with B<--extensionreplace>:
|
|
|
|
parallel --extensionreplace ,, echo ,, ::: A/B.C
|
|
|
|
Output:
|
|
|
|
A/B
|
|
|
|
The replacement string B<{/}> can be changed with B<--basenamereplace>:
|
|
|
|
parallel --basenamereplace ,, echo ,, ::: A/B.C
|
|
|
|
Output:
|
|
|
|
B.C
|
|
|
|
The replacement string B<{//}> can be changed with B<--dirnamereplace>:
|
|
|
|
parallel --dirnamereplace ,, echo ,, ::: A/B.C
|
|
|
|
Output:
|
|
|
|
A
|
|
|
|
The replacement string B<{/.}> can be changed with B<--basenameextensionreplace>:
|
|
|
|
parallel --basenameextensionreplace ,, echo ,, ::: A/B.C
|
|
|
|
Output:
|
|
|
|
B
|
|
|
|
The replacement string B<{#}> can be changed with B<--seqreplace>:
|
|
|
|
parallel --seqreplace ,, echo ,, ::: A B C
|
|
|
|
Output (the order may be different):
|
|
|
|
1
|
|
2
|
|
3
|
|
|
|
The replacement string B<{%}> can be changed with B<--slotreplace>:
|
|
|
|
parallel -j2 --slotreplace ,, echo ,, ::: A B C
|
|
|
|
Output (the order may be different and 1 and 2 may be swapped):
|
|
|
|
1
|
|
2
|
|
1
|
|
|
|
=head3 Perl expression replacement string
|
|
|
|
When predefined replacement strings are not flexible enough a perl
|
|
expression can be used instead. One example is to remove two
|
|
extensions: foo.tar.gz becomes foo
|
|
|
|
parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz
|
|
|
|
Output:
|
|
|
|
foo
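The expression between B<{=> and B<=}> is ordinary Perl that modifies B<$_>, which holds the argument. The same two substitutions can therefore be tried directly in B<perl> before using them in GNU B<parallel>. The following sketch assumes only that B<perl> is installed; the function name B<strip_two_extensions> is made up for this illustration.

```shell
# Hypothetical helper: apply the expression from the example above to $1.
strip_two_extensions() {
  # $_ is set to the argument, then each substitution removes one extension
  perl -e '$_ = shift; s:\.[^.]+$::; s:\.[^.]+$::; print "$_\n"' "$1"
}
```

Calling B<strip_two_extensions foo.tar.gz> prints B<foo>, matching the output above.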
|
|
|
|
In B<{= =}> you can access all of GNU B<parallel>'s internal functions
|
|
and variables. A few are worth mentioning.
|
|
|
|
B<total_jobs()> returns the total number of jobs:
|
|
|
|
parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}
|
|
|
|
Output:
|
|
|
|
Job 1 of 5
|
|
Job 2 of 5
|
|
Job 3 of 5
|
|
Job 4 of 5
|
|
Job 5 of 5
|
|
|
|
B<Q(...)> shell quotes the string:
|
|
|
|
parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'
|
|
|
|
Output:
|
|
|
|
*/!#$ shell quoted is \*/\!\#\$
|
|
|
|
B<skip()> skips the job:
|
|
|
|
parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}
|
|
|
|
Output:
|
|
|
|
1
|
|
2
|
|
4
|
|
5
|
|
|
|
B<@arg> contains the input source variables:
|
|
|
|
parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
|
|
::: {1..3} ::: {1..3}
|
|
|
|
Output:
|
|
|
|
1 2
|
|
1 3
|
|
2 1
|
|
2 3
|
|
3 1
|
|
3 2
|
|
|
|
If the strings B<{=> and B<=}> cause problems they can be replaced with B<--parens>:
|
|
|
|
parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
|
|
::: foo.tar.gz
|
|
|
|
Output:
|
|
|
|
foo
|
|
|
|
To define a shorthand replacement string use B<--rpl>:
|
|
|
|
parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
|
|
::: foo.tar.gz
|
|
|
|
Output: Same as above.
|
|
|
|
If the shorthand starts with B<{> it can be used as a positional
|
|
replacement string, too:
|
|
|
|
parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' \
|
|
::: foo.tar.gz
|
|
|
|
Output: Same as above.
|
|
|
|
If the shorthand contains matching parentheses the replacement string
becomes a dynamic replacement string and the string in the parentheses
can be accessed as $$1. If there are multiple matching parentheses,
the matched strings can be accessed using $$2, $$3 and so on.
|
|
|
|
You can think of this as giving arguments to the replacement
|
|
string. Here we give the argument B<.tar.gz> to the replacement string
|
|
B<{%I<string>}> which removes I<string>:
|
|
|
|
parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz
|
|
|
|
Output:
|
|
|
|
foo.zip
|
|
|
|
Here we give the two arguments B<tar.gz> and B<zip> to the replacement
|
|
string B<{/I<string1>/I<string2>}> which replaces I<string1> with
|
|
I<string2>:
|
|
|
|
parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
|
|
::: foo.tar.gz
|
|
|
|
Output:
|
|
|
|
foo.zip
|
|
|
|
|
|
GNU B<parallel>'s 7 replacement strings are implemented like this:
|
|
|
|
--rpl '{} '
|
|
--rpl '{#} $_=$job->seq()'
|
|
--rpl '{%} $_=$job->slot()'
|
|
--rpl '{/} s:.*/::'
|
|
--rpl '{//} $Global::use{"File::Basename"} ||=
|
|
eval "use File::Basename; 1;"; $_ = dirname($_);'
|
|
--rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
|
|
--rpl '{.} s:\.[^/.]+$::'
|
|
|
|
=head3 Positional replacement strings
|
|
|
|
With multiple input sources the argument from the individual input
|
|
sources can be accessed with S<< B<{>numberB<}> >>:
|
|
|
|
parallel echo {1} and {2} ::: A B ::: C D
|
|
|
|
Output (the order may be different):
|
|
|
|
A and C
|
|
A and D
|
|
B and C
|
|
B and D
|
|
|
|
The positional replacement strings can also be modified using B</>, B<//>, B</.>, and B<.>:
|
|
|
|
parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F
|
|
|
|
Output (the order may be different):
|
|
|
|
/=B.C //=A /.=B .=A/B
|
|
/=E.F //=D /.=E .=D/E
|
|
|
|
If a position is negative, it will refer to the input source counted
|
|
from behind:
|
|
|
|
parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
|
|
::: A B ::: C D ::: E F
|
|
|
|
Output (the order may be different):
|
|
|
|
1=A 2=C 3=E -1=E -2=C -3=A
|
|
1=A 2=C 3=F -1=F -2=C -3=A
|
|
1=A 2=D 3=E -1=E -2=D -3=A
|
|
1=A 2=D 3=F -1=F -2=D -3=A
|
|
1=B 2=C 3=E -1=E -2=C -3=B
|
|
1=B 2=C 3=F -1=F -2=C -3=B
|
|
1=B 2=D 3=E -1=E -2=D -3=B
|
|
1=B 2=D 3=F -1=F -2=D -3=B
|
|
|
|
|
|
=head3 Positional perl expression replacement string
|
|
|
|
To use a perl expression as a positional replacement string simply
prefix the perl expression with the number and a space:
|
|
|
|
parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
|
|
::: bar ::: foo.tar.gz
|
|
|
|
Output:
|
|
|
|
foo bar
|
|
|
|
If a shorthand defined using B<--rpl> starts with B<{> it can be used as
|
|
a positional replacement string, too:
|
|
|
|
parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
|
|
::: bar ::: foo.tar.gz
|
|
|
|
Output: Same as above.
|
|
|
|
|
|
=head3 Input from columns
|
|
|
|
The columns in a file can be bound to positional replacement strings
|
|
using B<--colsep>. Here the columns are separated by TAB (\t):
|
|
|
|
parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv
|
|
|
|
Output (the order may be different):
|
|
|
|
1=f1 2=f2
|
|
1=A 2=B
|
|
1=C 2=D
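To see what binding columns to positions means, the same splitting can be done serially with the shell's B<read>. This is only a sketch under the assumption of exactly two TAB separated columns; the function name B<columns_sketch> and the variable names are made up for this illustration.

```shell
# Serial illustration of splitting each line into columns at TAB.
columns_sketch() {
  tab=$(printf '\t')
  printf 'f1\tf2\nA\tB\nC\tD\n' |
    while IFS=$tab read -r col1 col2; do
      echo "1=$col1 2=$col2"
    done
}
columns_sketch
```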
|
|
|
|
=head3 Header defined replacement strings
|
|
|
|
With B<--header> GNU B<parallel> will use the first value of the input
|
|
source as the name of the replacement string. Only the non-modified
|
|
version B<{}> is supported:
|
|
|
|
parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D
|
|
|
|
Output (the order may be different):
|
|
|
|
f1=A f2=C
|
|
f1=A f2=D
|
|
f1=B f2=C
|
|
f1=B f2=D
|
|
|
|
It is useful with B<--colsep> for processing files with TAB separated values:
|
|
|
|
parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
|
|
:::: tsv-file.tsv
|
|
|
|
Output (the order may be different):
|
|
|
|
f1=A f2=B
|
|
f1=C f2=D
|
|
|
|
=head3 More pre-defined replacement strings with --plus
|
|
|
|
B<--plus> adds the replacement strings B<{+/} {+.} {+..} {+...} {..} {...}
|
|
{/..} {/...} {##}>. The idea being that B<{+foo}> matches the opposite of B<{foo}>
|
|
and B<{}> = B<{+/}>/B<{/}> = B<{.}>.B<{+.}> = B<{+/}>/B<{/.}>.B<{+.}> = B<{..}>.B<{+..}> =
|
|
B<{+/}>/B<{/..}>.B<{+..}> = B<{...}>.B<{+...}> = B<{+/}>/B<{/...}>.B<{+...}>.
|
|
|
|
parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
|
|
parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
|
|
parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
|
|
parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
|
|
parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
|
|
parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
|
|
parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
|
|
parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
|
|
|
|
Output:
|
|
|
|
dir/sub/file.ex1.ex2.ex3
|
|
|
|
B<{##}> is simply the number of jobs:
|
|
|
|
parallel --plus echo Job {#} of {##} ::: {1..5}
|
|
|
|
Output:
|
|
|
|
Job 1 of 5
|
|
Job 2 of 5
|
|
Job 3 of 5
|
|
Job 4 of 5
|
|
Job 5 of 5
|
|
|
|
=head3 Dynamic replacement strings with --plus
|
|
|
|
B<--plus> also defines these dynamic replacement strings:
|
|
|
|
=over 19
|
|
|
|
=item B<{:-I<string>}>
|
|
|
|
Default value is I<string> if the argument is empty.
|
|
|
|
=item B<{:I<number>}>
|
|
|
|
Substring from I<number> till end of string.
|
|
|
|
=item B<{:I<number1>:I<number2>}>
|
|
|
|
Substring starting at offset I<number1> with length I<number2>.
|
|
|
|
=item B<{#I<string>}>
|
|
|
|
If the argument starts with I<string>, remove it.
|
|
|
|
=item B<{%I<string>}>
|
|
|
|
If the argument ends with I<string>, remove it.
|
|
|
|
=item B<{/I<string1>/I<string2>}>
|
|
|
|
Replace I<string1> with I<string2>.
|
|
|
|
=item B<{^I<string>}>
|
|
|
|
If the argument starts with I<string>, upper case it. I<string> must
|
|
be a single letter.
|
|
|
|
=item B<{^^I<string>}>
|
|
|
|
If the argument contains I<string>, upper case it. I<string> must be a
|
|
single letter.
|
|
|
|
=item B<{,I<string>}>
|
|
|
|
If the argument starts with I<string>, lower case it. I<string> must
|
|
be a single letter.
|
|
|
|
=item B<{,,I<string>}>
|
|
|
|
If the argument contains I<string>, lower case it. I<string> must be a
|
|
single letter.
|
|
|
|
=back
|
|
|
|
They are inspired by B<Bash>:
|
|
|
|
unset myvar
|
|
echo ${myvar:-myval}
|
|
parallel --plus echo {:-myval} ::: "$myvar"
|
|
|
|
myvar=abcAaAdef
|
|
echo ${myvar:2}
|
|
parallel --plus echo {:2} ::: "$myvar"
|
|
|
|
echo ${myvar:2:3}
|
|
parallel --plus echo {:2:3} ::: "$myvar"
|
|
|
|
echo ${myvar#bc}
|
|
parallel --plus echo {#bc} ::: "$myvar"
|
|
echo ${myvar#abc}
|
|
parallel --plus echo {#abc} ::: "$myvar"
|
|
|
|
echo ${myvar%de}
|
|
parallel --plus echo {%de} ::: "$myvar"
|
|
echo ${myvar%def}
|
|
parallel --plus echo {%def} ::: "$myvar"
|
|
|
|
echo ${myvar/def/ghi}
|
|
parallel --plus echo {/def/ghi} ::: "$myvar"
|
|
|
|
echo ${myvar^a}
|
|
parallel --plus echo {^a} ::: "$myvar"
|
|
echo ${myvar^^a}
|
|
parallel --plus echo {^^a} ::: "$myvar"
|
|
|
|
myvar=AbcAaAdef
|
|
echo ${myvar,A}
|
|
parallel --plus echo '{,A}' ::: "$myvar"
|
|
echo ${myvar,,A}
|
|
parallel --plus echo '{,,A}' ::: "$myvar"
|
|
|
|
Output:
|
|
|
|
myval
|
|
myval
|
|
cAaAdef
|
|
cAaAdef
|
|
cAa
|
|
cAa
|
|
abcAaAdef
|
|
abcAaAdef
|
|
AaAdef
|
|
AaAdef
|
|
abcAaAdef
|
|
abcAaAdef
|
|
abcAaA
|
|
abcAaA
|
|
abcAaAghi
|
|
abcAaAghi
|
|
AbcAaAdef
|
|
AbcAaAdef
|
|
AbcAAAdef
|
|
AbcAAAdef
|
|
abcAaAdef
|
|
abcAaAdef
|
|
abcaaadef
|
|
abcaaadef
|
|
|
|
|
|
=head2 More than one argument
|
|
|
|
With B<--xargs> GNU B<parallel> will fit as many arguments as possible on a
|
|
single line:
|
|
|
|
cat num30000 | parallel --xargs echo | wc -l
|
|
|
|
Output (if you run this under Bash on GNU/Linux):
|
|
|
|
2
|
|
|
|
The 30000 arguments fitted on 2 lines.
|
|
|
|
The maximal length of a single line can be set with B<-s>. With a maximal
|
|
line length of 10000 chars 17 commands will be run:
|
|
|
|
cat num30000 | parallel --xargs -s 10000 echo | wc -l
|
|
|
|
Output:
|
|
|
|
17
|
|
|
|
For better parallelism GNU B<parallel> can distribute the arguments
|
|
between all the parallel jobs when end of file is met.
|
|
|
|
Below GNU B<parallel> reads the last argument when generating the second
|
|
job. When GNU B<parallel> reads the last argument, it spreads all the
|
|
arguments for the second job over 4 jobs instead, as 4 parallel jobs
|
|
are requested.
|
|
|
|
The first job will be the same as the B<--xargs> example above, but the
|
|
second job will be split into 4 evenly sized jobs, resulting in a
|
|
total of 5 jobs:
|
|
|
|
cat num30000 | parallel --jobs 4 -m echo | wc -l
|
|
|
|
Output (if you run this under Bash on GNU/Linux):
|
|
|
|
5
|
|
|
|
This is even more visible when running 4 jobs with 10 arguments. The
|
|
10 arguments are being spread over 4 jobs:
|
|
|
|
parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10
|
|
|
|
Output:
|
|
|
|
1 2 3
|
|
4 5 6
|
|
7 8 9
|
|
10
|
|
|
|
A replacement string can be part of a word. B<-m> will not repeat the context:
|
|
|
|
parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G
|
|
|
|
Output (the order may be different):
|
|
|
|
pre-A B-post
|
|
pre-C D-post
|
|
pre-E F-post
|
|
pre-G-post
|
|
|
|
To repeat the context use B<-X> which otherwise works like B<-m>:
|
|
|
|
parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G
|
|
|
|
Output (the order may be different):
|
|
|
|
pre-A-post pre-B-post
|
|
pre-C-post pre-D-post
|
|
pre-E-post pre-F-post
|
|
pre-G-post
|
|
|
|
To limit the number of arguments use B<-N>:
|
|
|
|
parallel -N3 echo ::: A B C D E F G H
|
|
|
|
Output (the order may be different):
|
|
|
|
A B C
|
|
D E F
|
|
G H
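The grouping done by B<-N3> resembles what B<xargs -n 3> does, except that B<xargs> runs the commands one after another. The following serial comparison uses only standard tools; the function name B<group_by_three> is made up for this illustration.

```shell
# Serial illustration: at most 3 arguments per echo invocation.
group_by_three() {
  printf '%s\n' A B C D E F G H | xargs -n 3 echo
}
group_by_three
```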
|
|
|
|
B<-N> also sets the positional replacement strings:
|
|
|
|
parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H
|
|
|
|
Output (the order may be different):
|
|
|
|
1=A 2=B 3=C
|
|
1=D 2=E 3=F
|
|
1=G 2=H 3=
|
|
|
|
B<-N0> reads 1 argument but inserts none:
|
|
|
|
parallel -N0 echo foo ::: 1 2 3
|
|
|
|
Output:
|
|
|
|
foo
|
|
foo
|
|
foo
|
|
|
|
=head2 Quoting
|
|
|
|
Command lines that contain special characters may need to be protected from the shell.
|
|
|
|
The B<perl> program B<print "@ARGV\n"> basically works like B<echo>.
|
|
|
|
perl -e 'print "@ARGV\n"' A
|
|
|
|
Output:
|
|
|
|
A
|
|
|
|
To run that in parallel the command needs to be quoted:
|
|
|
|
parallel perl -e 'print "@ARGV\n"' ::: This wont work
|
|
|
|
Output:
|
|
|
|
[Nothing]
|
|
|
|
To quote the command use B<-q>:
|
|
|
|
parallel -q perl -e 'print "@ARGV\n"' ::: This works
|
|
|
|
Output (the order may be different):
|
|
|
|
This
|
|
works
|
|
|
|
Or you can quote the critical part using B<\'>:
|
|
|
|
parallel perl -e \''print "@ARGV\n"'\' ::: This works, too
|
|
|
|
Output (the order may be different):
|
|
|
|
This
|
|
works,
|
|
too
|
|
|
|
GNU B<parallel> can also \-quote full lines. Simply run this:
|
|
|
|
parallel --shellquote
|
|
Warning: Input is read from the terminal. You either know what you
|
|
Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
|
|
Warning: ::: or :::: or to pipe data into parallel. If so
|
|
Warning: consider going through the tutorial: man parallel_tutorial
|
|
Warning: Press CTRL-D to exit.
|
|
perl -e 'print "@ARGV\n"'
|
|
[CTRL-D]
|
|
|
|
Output:
|
|
|
|
perl\ -e\ \'print\ \"@ARGV\\n\"\'
|
|
|
|
This can then be used as the command:
|
|
|
|
parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works
|
|
|
|
Output (the order may be different):
|
|
|
|
This
|
|
also
|
|
works
|
|
|
|
|
|
=head2 Trimming space
|
|
|
|
Space can be trimmed on the arguments using B<--trim>:
|
|
|
|
parallel --trim r echo pre-{}-post ::: ' A '
|
|
|
|
Output:
|
|
|
|
pre- A-post
|
|
|
|
To trim on the left side:
|
|
|
|
parallel --trim l echo pre-{}-post ::: ' A '
|
|
|
|
Output:
|
|
|
|
pre-A -post
|
|
|
|
To trim on both sides:
|
|
|
|
parallel --trim lr echo pre-{}-post ::: ' A '
|
|
|
|
Output:
|
|
|
|
pre-A-post
|
|
|
|
|
|
=head2 Respecting the shell
|
|
|
|
This tutorial uses Bash as the shell. GNU B<parallel> respects which
|
|
shell you are using, so in B<zsh> you can do:
|
|
|
|
parallel echo \={} ::: zsh bash ls
|
|
|
|
Output:
|
|
|
|
/usr/bin/zsh
|
|
/bin/bash
|
|
/bin/ls
|
|
|
|
In B<csh> you can do:
|
|
|
|
parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *
|
|
|
|
Output:
|
|
|
|
[somedir] is a dir
|
|
|
|
This also becomes useful if you use GNU B<parallel> in a shell script:
|
|
GNU B<parallel> will use the same shell as the shell script.
|
|
|
|
|
|
=head1 Controlling the output
|
|
|
|
The output can be prefixed with the argument:
|
|
|
|
parallel --tag echo foo-{} ::: A B C
|
|
|
|
Output (the order may be different):
|
|
|
|
A foo-A
|
|
B foo-B
|
|
C foo-C
|
|
|
|
To prefix it with another string use B<--tagstring>:
|
|
|
|
parallel --tagstring {}-bar echo foo-{} ::: A B C
|
|
|
|
Output (the order may be different):
|
|
|
|
A-bar foo-A
|
|
B-bar foo-B
|
|
C-bar foo-C
|
|
|
|
To see what commands will be run without running them use B<--dryrun>:
|
|
|
|
parallel --dryrun echo {} ::: A B C
|
|
|
|
Output (the order may be different):
|
|
|
|
echo A
|
|
echo B
|
|
echo C
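B<--dryrun> is handy for checking a command that changes files before running it for real. The following is a minimal sketch, assuming GNU B<parallel> is installed. The function name B<preview_backup_renames> and the B<.bak> suffix are made up for this illustration, and B<-k> is only added so the commands are listed in argument order.

```shell
# Hypothetical helper: show the mv commands that would rename each
# argument to the same name with its extension replaced by .bak.
preview_backup_renames() {
  # Because of --dryrun the commands are only printed, not executed.
  parallel -k --dryrun mv {} {.}.bak ::: "$@"
}
```

Calling B<preview_backup_renames a.txt b.txt> should print B<mv a.txt a.bak> and B<mv b.txt b.bak> without touching any files. Remove B<--dryrun> once the printed commands look right.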
|
|
|
|
To print the commands before running them use B<--verbose>:
|
|
|
|
parallel --verbose echo {} ::: A B C
|
|
|
|
Output (the order may be different):
|
|
|
|
echo A
|
|
echo B
|
|
A
|
|
echo C
|
|
B
|
|
C
|
|
|
|
GNU B<parallel> will postpone the output until the command completes:
|
|
|
|
parallel -j2 'printf "%s-start\n%s" {} {};
|
|
sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
|
|
|
|
Output:
|
|
|
|
2-start
|
|
2-middle
|
|
2-end
|
|
1-start
|
|
1-middle
|
|
1-end
|
|
4-start
|
|
4-middle
|
|
4-end
|
|
|
|
To get the output immediately use B<--ungroup>:
|
|
|
|
parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
|
|
sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
|
|
|
|
Output:
|
|
|
|
4-start
|
|
42-start
|
|
2-middle
|
|
2-end
|
|
1-start
|
|
1-middle
|
|
1-end
|
|
-middle
|
|
4-end
|
|
|
|
B<--ungroup> is fast, but can cause half a line from one job to be mixed
|
|
with half a line of another job. That has happened in the second line,
|
|
where the line '4-middle' is mixed with '2-start'.
|
|
|
|
To avoid this use B<--linebuffer>:
|
|
|
|
parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
|
|
sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
|
|
|
|
Output:
|
|
|
|
4-start
|
|
2-start
|
|
2-middle
|
|
2-end
|
|
1-start
|
|
1-middle
|
|
1-end
|
|
4-middle
|
|
4-end
|
|
|
|
To force the output in the same order as the arguments use B<--keep-order>/B<-k>:
|
|
|
|
parallel -j2 -k 'printf "%s-start\n%s" {} {};
|
|
sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1
|
|
|
|
Output:
|
|
|
|
4-start
|
|
4-middle
|
|
4-end
|
|
2-start
|
|
2-middle
|
|
2-end
|
|
1-start
|
|
1-middle
|
|
1-end
|
|
|
|
|
|
=head2 Saving output into files
|
|
|
|
GNU B<parallel> can save the output of each job into files:
|
|
|
|
parallel --files echo ::: A B C
|
|
|
|
Output will be similar to this:
|
|
|
|
/tmp/pAh6uWuQCg.par
|
|
/tmp/opjhZCzAX4.par
|
|
/tmp/W0AT_Rph2o.par
|
|
|
|
By default GNU B<parallel> will cache the output in files in B</tmp>. This
|
|
can be changed by setting B<$TMPDIR> or B<--tmpdir>:
|
|
|
|
parallel --tmpdir /var/tmp --files echo ::: A B C
|
|
|
|
Output will be similar to this:
|
|
|
|
/var/tmp/N_vk7phQRc.par
|
|
/var/tmp/7zA4Ccf3wZ.par
|
|
/var/tmp/LIuKgF_2LP.par
|
|
|
|
Or:
|
|
|
|
TMPDIR=/var/tmp parallel --files echo ::: A B C
|
|
|
|
Output: Same as above.
|
|
|
|
The output files can be saved in a structured way using B<--results>:
|
|
|
|
parallel --results outdir echo ::: A B C
|
|
|
|
Output:
|
|
|
|
A
|
|
B
|
|
C
|
|
|
|
These files were also generated containing the standard output
|
|
(stdout), standard error (stderr), and the sequence number (seq):
|
|
|
|
outdir/1/A/seq
|
|
outdir/1/A/stderr
|
|
outdir/1/A/stdout
|
|
outdir/1/B/seq
|
|
outdir/1/B/stderr
|
|
outdir/1/B/stdout
|
|
outdir/1/C/seq
|
|
outdir/1/C/stderr
|
|
outdir/1/C/stdout
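The saved files can be read back like any other file. The following is a minimal sketch, assuming GNU B<parallel> is installed and that your version uses the directory layout listed above. The function name B<show_saved_stdout> and the scratch directory are made up for this illustration.

```shell
# Hypothetical helper: run the example in a scratch directory and
# print the saved standard output of the job that got argument A.
show_saved_stdout() {
  scratch=$(mktemp -d) || return 1
  parallel --results "$scratch/outdir" echo ::: A B C > /dev/null
  # Layout assumed from the listing above: outdir/1/<argument>/stdout
  cat "$scratch/outdir/1/A/stdout"
  rm -rf "$scratch"
}
```

With the layout above, B<show_saved_stdout> should print B<A>.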
|
|
|
|
B<--header :> will take the first value as name and use that in the
|
|
directory structure. This is useful if you are using multiple input
|
|
sources:
|
|
|
|
parallel --header : --results outdir echo ::: f1 A B ::: f2 C D
|
|
|
|
Generated files:
|
|
|
|
outdir/f1/A/f2/C/seq
|
|
outdir/f1/A/f2/C/stderr
|
|
outdir/f1/A/f2/C/stdout
|
|
outdir/f1/A/f2/D/seq
|
|
outdir/f1/A/f2/D/stderr
|
|
outdir/f1/A/f2/D/stdout
|
|
outdir/f1/B/f2/C/seq
|
|
outdir/f1/B/f2/C/stderr
|
|
outdir/f1/B/f2/C/stdout
|
|
outdir/f1/B/f2/D/seq
|
|
outdir/f1/B/f2/D/stderr
|
|
outdir/f1/B/f2/D/stdout
|
|
|
|
The directories are named after the variables and their values.
|
|
|
|
=head1 Controlling the execution
|
|
|
|
=head2 Number of simultaneous jobs
|
|
|
|
The number of concurrent jobs is given with B<--jobs>/B<-j>:
|
|
|
|
/usr/bin/time parallel -N0 -j64 sleep 1 :::: num128
|
|
|
|
With 64 jobs in parallel the 128 B<sleep>s will take 2-8 seconds to run -
|
|
depending on how fast your machine is.
|
|
|
|
By default B<--jobs> is the same as the number of CPU cores. So this:
|
|
|
|
/usr/bin/time parallel -N0 sleep 1 :::: num128
|
|
|
|
should take twice the time of running 2 jobs per CPU core:
|
|
|
|
/usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128
|
|
|
|
B<--jobs 0> will run as many jobs in parallel as possible:
|
|
|
|
/usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128
|
|
|
|
which should take 1-7 seconds depending on how fast your machine is.
|
|
|
|
B<--jobs> can read from a file which is re-read when a job finishes:
|
|
|
|
echo 50% > my_jobs
|
|
/usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
|
|
sleep 1
|
|
echo 0 > my_jobs
|
|
wait
|
|
|
|
The first second only 50% of the CPU cores will run a job. Then B<0> is
|
|
put into B<my_jobs> and then the rest of the jobs will be started in
|
|
parallel.
|
|
|
|
Instead of basing the percentage on the number of CPU cores
|
|
GNU B<parallel> can base it on the number of CPUs:
|
|
|
|
parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8
|
|
|
|
=head2 Shuffle job order
|
|
|
|
If you have many jobs (e.g. by multiple combinations of input
|
|
sources), it can be handy to shuffle the jobs, so you get different
|
|
values run. Use B<--shuf> for that:
|
|
|
|
parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C
|
|
|
|
Output:
|
|
|
|
All combinations but different order for each run.
|
|
|
|
=head2 Interactivity
|
|
|
|
GNU B<parallel> can ask the user if a command should be run using B<--interactive>:
|
|
|
|
parallel --interactive echo ::: 1 2 3
|
|
|
|
Output:
|
|
|
|
echo 1 ?...y
|
|
echo 2 ?...n
|
|
1
|
|
echo 3 ?...y
|
|
3
|
|
|
|
GNU B<parallel> can be used to put arguments on the command line for an
|
|
interactive command such as B<emacs> to edit one file at a time:
|
|
|
|
parallel --tty emacs ::: 1 2 3
|
|
|
|
Or give multiple arguments in one go to open multiple files:
|
|
|
|
parallel -X --tty vi ::: 1 2 3
|
|
|
|
=head2 A terminal for every job
|
|
|
|
Using B<--tmux> GNU B<parallel> can start a terminal for every job run:
|
|
|
|
seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'
|
|
|
|
This will tell you to run something similar to:
|
|
|
|
tmux -S /tmp/tmsrPrO0 attach
|
|
|
|
Using normal B<tmux> keystrokes (CTRL-b n or CTRL-b p) you can cycle
|
|
between windows of the running jobs. When a job is finished it will
|
|
pause for 10 seconds before closing the window.
|
|
|
|
=head2 Timing
|
|
|
|
Some jobs do heavy I/O when they start. To avoid a thundering herd GNU
|
|
B<parallel> can delay starting new jobs. B<--delay> I<X> will make
|
|
sure there is at least I<X> seconds between each start:
|
|
|
|
parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3
|
|
|
|
Output:
|
|
|
|
Starting 1
|
|
Thu Aug 15 16:24:33 CEST 2013
|
|
Starting 2
|
|
Thu Aug 15 16:24:35 CEST 2013
|
|
Starting 3
|
|
Thu Aug 15 16:24:38 CEST 2013
|
|
|
|
|
|
If jobs taking more than a certain amount of time are known to fail,
|
|
they can be stopped with B<--timeout>. The accuracy of B<--timeout> is
|
|
2 seconds:
|
|
|
|
parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8
|
|
|
|
Output:
|
|
|
|
2
|
|
4
|
|
|
|
GNU B<parallel> can compute the median runtime for jobs and kill those
|
|
that take more than 200% of the median runtime:
|
|
|
|
parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3
|
|
|
|
Output:
|
|
|
|
2.1
|
|
2.2
|
|
3
|
|
2.3
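The limit used in this example can be checked by hand. A minimal sketch in plain shell, assuming the median is taken over all five runtimes at once (GNU B<parallel> only knows the runtimes of the jobs completed so far, so the real limit depends on timing):

```shell
# Runtimes in seconds, taken from the example above.
runtimes="2.1 2.2 3 7 2.3"

# Median: sort numerically and take the middle value
# (or the mean of the two middle values for an even count).
median=$(printf '%s\n' $runtimes | sort -n | awk '
  { v[NR] = $1 }
  END { if (NR % 2) print v[(NR + 1) / 2]
        else print (v[NR / 2] + v[NR / 2 + 1]) / 2 }')

# Runtimes that exceed 200% of the median.
over_limit=$(printf '%s\n' $runtimes | awk -v m="$median" '$1 > 2 * m')

echo "median=$median over_limit=$over_limit"
```

Only the job sleeping 7 seconds is above twice the median, which matches the missing B<7> in the output.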
|
|
|
|
=head2 Progress information
|
|
|
|
Based on the runtime of completed jobs GNU B<parallel> can estimate the
|
|
total runtime:
|
|
|
|
parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1
|
|
|
|
Output:
|
|
|
|
Computers / CPU cores / Max jobs to run
|
|
1:local / 2 / 2
|
|
|
|
Computer:jobs running/jobs completed/%of started jobs/
|
|
Average seconds to complete
|
|
ETA: 2s 0left 1.11avg local:0/9/100%/1.1s
|
|
|
|
GNU B<parallel> can give progress information with B<--progress>:
|
|
|
|
parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1
|
|
|
|
Output:
|
|
|
|
Computers / CPU cores / Max jobs to run
|
|
1:local / 2 / 2
|
|
|
|
Computer:jobs running/jobs completed/%of started jobs/
|
|
Average seconds to complete
|
|
local:0/9/100%/1.1s
|
|
|
|
A progress bar can be shown with B<--bar>:
|
|
|
|
parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1
|
|
|
|
And a graphic bar can be shown with B<--bar> and B<zenity>:
|
|
|
|
seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
|
|
2> >(zenity --progress --auto-kill --auto-close)
|
|
|
|
A logfile of the jobs completed so far can be generated with B<--joblog>:
|
|
|
|
parallel --joblog /tmp/log exit ::: 1 2 3 0
|
|
cat /tmp/log
|
|
|
|
Output:
|
|
|
|
Seq Host Starttime Runtime Send Receive Exitval Signal Command
|
|
1 : 1376577364.974 0.008 0 0 1 0 exit 1
|
|
2 : 1376577364.982 0.013 0 0 2 0 exit 2
|
|
3 : 1376577364.990 0.013 0 0 3 0 exit 3
|
|
4 : 1376577365.003 0.003 0 0 0 0 exit 0
|
|
|
|
The log contains the job sequence, which host the job was run on, the
|
|
start time and run time, how much data was transferred, the exit
|
|
value, the signal that killed the job, and finally the command being
|
|
run.
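Because the joblog is a tab-separated file, standard tools can summarize it. A minimal sketch using B<awk>, which recreates the log shown above in a temporary file so that it runs without GNU B<parallel> (the summary format is made up for this illustration):

```shell
# Recreate the joblog shown above as a tab-separated temporary file.
log=$(mktemp)
{
  printf 'Seq\tHost\tStarttime\tRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
  printf '1\t:\t1376577364.974\t0.008\t0\t0\t1\t0\texit 1\n'
  printf '2\t:\t1376577364.982\t0.013\t0\t0\t2\t0\texit 2\n'
  printf '3\t:\t1376577364.990\t0.013\t0\t0\t3\t0\texit 3\n'
  printf '4\t:\t1376577365.003\t0.003\t0\t0\t0\t0\texit 0\n'
} > "$log"

# Skip the header, then count jobs, count non-zero exit values (column 7)
# and add up the runtimes (column 4).
summary=$(awk -F '\t' 'NR > 1 { jobs++; if ($7 != 0) failed++; runtime += $4 }
  END { printf "jobs=%d failed=%d runtime=%.3f", jobs, failed, runtime }' "$log")
echo "$summary"
rm -f "$log"
```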
|
|
|
|
With a joblog GNU B<parallel> can be stopped and later pick up where it
|
|
left off. It is important that the input of the completed jobs is
|
|
unchanged.
|
|
|
|
parallel --joblog /tmp/log exit ::: 1 2 3 0
|
|
cat /tmp/log
|
|
parallel --resume --joblog /tmp/log exit ::: 1 2 3 0 0 0
|
|
cat /tmp/log
|
|
|
|
Output:
|
|
|
|
Seq Host Starttime Runtime Send Receive Exitval Signal Command
|
|
1 : 1376580069.544 0.008 0 0 1 0 exit 1
|
|
2 : 1376580069.552 0.009 0 0 2 0 exit 2
|
|
3 : 1376580069.560 0.012 0 0 3 0 exit 3
|
|
4 : 1376580069.571 0.005 0 0 0 0 exit 0
|
|
|
|
Seq Host Starttime Runtime Send Receive Exitval Signal Command
|
|
1 : 1376580069.544 0.008 0 0 1 0 exit 1
|
|
2 : 1376580069.552 0.009 0 0 2 0 exit 2
|
|
3 : 1376580069.560 0.012 0 0 3 0 exit 3
|
|
4 : 1376580069.571 0.005 0 0 0 0 exit 0
|
|
5 : 1376580070.028 0.009 0 0 0 0 exit 0
|
|
6 : 1376580070.038 0.007 0 0 0 0 exit 0
|
|
|
|
Note how the start times of the last 2 jobs show that they were started in the second run.
|
|
|
|
With B<--resume-failed> GNU B<parallel> will re-run the jobs that failed:
|
|
|
|
parallel --resume-failed --joblog /tmp/log exit ::: 1 2 3 0 0 0
|
|
cat /tmp/log
|
|
|
|
Output:
|
|
|
|
Seq Host Starttime Runtime Send Receive Exitval Signal Command
|
|
1 : 1376580069.544 0.008 0 0 1 0 exit 1
|
|
2 : 1376580069.552 0.009 0 0 2 0 exit 2
|
|
3 : 1376580069.560 0.012 0 0 3 0 exit 3
|
|
4 : 1376580069.571 0.005 0 0 0 0 exit 0
|
|
5 : 1376580070.028 0.009 0 0 0 0 exit 0
|
|
6 : 1376580070.038 0.007 0 0 0 0 exit 0
|
|
1 : 1376580154.433 0.010 0 0 1 0 exit 1
|
|
2 : 1376580154.444 0.022 0 0 2 0 exit 2
|
|
3 : 1376580154.466 0.005 0 0 3 0 exit 3
|
|
|
|
Note how seq 1 2 3 have been repeated because they had exit value
|
|
different from 0.
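Since a sequence number can appear several times in the joblog, it can be handy to see which jobs the log still records as failed. A minimal sketch using B<awk> on a copy of the log shown above; B<row> is a hypothetical helper that only exists to build the sample file, and the sketch makes no claim about how GNU B<parallel> itself reads the log:

```shell
# row SEQ STARTTIME RUNTIME EXITVAL: print one tab-separated joblog line.
row() { printf '%s\t:\t%s\t%s\t0\t0\t%s\t0\texit %s\n' "$1" "$2" "$3" "$4" "$4"; }

log=$(mktemp)
{
  printf 'Seq\tHost\tStarttime\tRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
  row 1 1376580069.544 0.008 1
  row 2 1376580069.552 0.009 2
  row 3 1376580069.560 0.012 3
  row 4 1376580069.571 0.005 0
  row 5 1376580070.028 0.009 0
  row 6 1376580070.038 0.007 0
  row 1 1376580154.433 0.010 1
  row 2 1376580154.444 0.022 2
  row 3 1376580154.466 0.005 3
} > "$log"

# Remember the last exit value logged for each Seq, then list the Seqs
# whose most recent run still has a non-zero exit value.
still_failing=$(awk -F '\t' 'NR > 1 { last[$1] = $7 }
  END { for (s in last) if (last[s] != 0) print s }' "$log" |
  sort -n | paste -s -d ' ' -)
echo "still failing: $still_failing"
rm -f "$log"
```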
|
|
|
|
B<--retry-failed> does almost the same as B<--resume-failed>. Where
|
|
B<--resume-failed> reads the commands from the command line (and
|
|
ignores the commands in the joblog), B<--retry-failed> ignores the
|
|
command line and reruns the commands mentioned in the joblog.
|
|
|
|
parallel --retry-failed --joblog /tmp/log
|
|
cat /tmp/log
|
|
|
|
Output:
|
|
|
|
Seq Host Starttime Runtime Send Receive Exitval Signal Command
|
|
1 : 1376580069.544 0.008 0 0 1 0 exit 1
|
|
2 : 1376580069.552 0.009 0 0 2 0 exit 2
|
|
3 : 1376580069.560 0.012 0 0 3 0 exit 3
|
|
4 : 1376580069.571 0.005 0 0 0 0 exit 0
|
|
5 : 1376580070.028 0.009 0 0 0 0 exit 0
|
|
6 : 1376580070.038 0.007 0 0 0 0 exit 0
|
|
1 : 1376580154.433 0.010 0 0 1 0 exit 1
|
|
2 : 1376580154.444 0.022 0 0 2 0 exit 2
|
|
3 : 1376580154.466 0.005 0 0 3 0 exit 3
|
|
1 : 1376580164.633 0.010 0 0 1 0 exit 1
|
|
2 : 1376580164.644 0.022 0 0 2 0 exit 2
|
|
3 : 1376580164.666 0.005 0 0 3 0 exit 3
|
|
|
|
|
|
=head2 Termination
|
|
|
|
=head3 Unconditional termination
|
|
|
|
By default GNU B<parallel> will wait for all jobs to finish before exiting.
|
|
|
|
If you send GNU B<parallel> the B<TERM> signal, GNU B<parallel> will
|
|
stop spawning new jobs and wait for the remaining jobs to finish. If
|
|
you send GNU B<parallel> the B<TERM> signal again, GNU B<parallel>
|
|
will kill all running jobs and exit.
|
|
|
|
=head3 Termination dependent on job status
|
|
|
|
For certain jobs there is no need to continue if one of the jobs fails
|
|
and has an exit code different from 0. GNU B<parallel> will stop spawning new jobs
|
|
with B<--halt soon,fail=1>:
|
|
|
|
parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
|
|
|
|
Output:
|
|
|
|
0
|
|
0
|
|
1
|
|
parallel: This job failed:
|
|
echo 1; exit 1
|
|
parallel: Starting no more jobs. Waiting for 1 jobs to finish.
|
|
2
|
|
|
|
With B<--halt now,fail=1> the running jobs will be killed immediately:
|
|
|
|
parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3
|
|
|
|
Output:
|
|
|
|
0
|
|
0
|
|
1
|
|
parallel: This job failed:
|
|
echo 1; exit 1
|
|
|
|
If B<--halt> is given a percentage this percentage of the jobs must fail
|
|
before GNU B<parallel> stops spawning more jobs:
|
|
|
|
parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
|
|
::: 0 1 2 3 4 5 6 7 8 9
|
|
|
|
Output:
|
|
|
|
0
|
|
1
|
|
parallel: This job failed:
|
|
echo 1; exit 1
|
|
2
|
|
parallel: This job failed:
|
|
echo 2; exit 2
|
|
parallel: Starting no more jobs. Waiting for 1 jobs to finish.
|
|
3
|
|
parallel: This job failed:
|
|
echo 3; exit 3
|
|
|
|
If you are looking for success instead of failures, you can use
|
|
B<success>. This will finish as soon as the first job succeeds:
|
|
|
|
parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6
|
|
|
|
Output:
|
|
|
|
1
|
|
2
|
|
3
|
|
0
|
|
parallel: This job succeeded:
|
|
echo 0; exit 0
|
|
|
|
GNU B<parallel> can retry the command with B<--retries>. This is useful if a
|
|
command fails for unknown reasons now and then.
|
|
|
|
parallel -k --retries 3 \
|
|
'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
|
|
cat /tmp/runs
|
|
|
|
Output:
|
|
|
|
completed 1
|
|
completed 2
|
|
completed 0
|
|
|
|
tried 1
|
|
tried 2
|
|
tried 1
|
|
tried 2
|
|
tried 1
|
|
tried 2
|
|
tried 0
|
|
|
|
Note how job 1 and 2 were tried 3 times, but 0 was not retried because it had exit code 0.
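The retry behaviour seen above can be sketched as a plain shell loop. B<retry> is a hypothetical helper written for this illustration; it runs one command at a time and does not reproduce anything else GNU B<parallel> does:

```shell
# retry MAX CMD...: run CMD until it exits 0, but at most MAX times.
# Prints the number of attempts and returns the last exit status.
retry() {
  max=$1; shift
  attempt=0
  status=1
  while [ "$attempt" -lt "$max" ]; do
    attempt=$((attempt + 1))
    "$@" && status=0 || status=$?
    [ "$status" -eq 0 ] && break
  done
  echo "attempts=$attempt status=$status"
  return "$status"
}

retry 3 sh -c 'exit 1' || true   # fails every time, so it is tried 3 times
retry 3 sh -c 'exit 0'           # succeeds at once, so it is tried only once
```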
|
|
|
|
=head3 Termination signals (advanced)
|
|
|
|
Using B<--termseq> you can control which signals are sent when killing
|
|
children. Normally children will be killed by sending them B<SIGTERM>,
|
|
waiting 200 ms, then another B<SIGTERM>, waiting 100 ms, then another
|
|
B<SIGTERM>, waiting 50 ms, then a B<SIGKILL>, finally waiting 25 ms
|
|
before giving up. It looks like this:
|
|
|
|
show_signals() {
|
|
perl -e 'for(keys %SIG) {
|
|
$SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
|
|
}
|
|
while(1){sleep 1}'
|
|
}
|
|
export -f show_signals
|
|
echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
|
|
-u --timeout 1 show_signals
|
|
|
|
Output:
|
|
|
|
Got TERM
|
|
Got TERM
|
|
Got TERM
|
|
|
|
Or just:
|
|
|
|
echo | parallel -u --timeout 1 show_signals
|
|
|
|
Output: Same as above.
|
|
|
|
You can change this to B<SIGINT>, B<SIGTERM>, B<SIGKILL>:
|
|
|
|
echo | parallel --termseq INT,200,TERM,100,KILL,25 \
|
|
-u --timeout 1 show_signals
|
|
|
|
Output:
|
|
|
|
Got INT
|
|
Got TERM
|
|
|
|
The B<SIGKILL> does not show because it cannot be caught, and thus the
|
|
child dies.
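The default sequence can be imitated by hand for a single process. A minimal sketch in plain shell; B<termseq_kill> is a hypothetical helper, it signals one PID only, and it assumes a B<sleep> that accepts fractions of a second (as GNU B<sleep> does):

```shell
# termseq_kill PID: send TERM, wait 200 ms, TERM, wait 100 ms, TERM,
# wait 50 ms, KILL, wait 25 ms. Stops early once the process is gone.
termseq_kill() {
  pid=$1
  for step in TERM:0.2 TERM:0.1 TERM:0.05 KILL:0.025; do
    sig=${step%%:*}      # signal name before the colon
    pause=${step#*:}     # pause in seconds after the colon
    kill -s "$sig" "$pid" 2>/dev/null || return 0   # process already gone
    sleep "$pause"
  done
}

sleep 30 &
termseq_kill $!
```

A process that ignores B<TERM> survives the first three steps and is then removed by the B<KILL>.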
|
|
|
|
|
|
=head2 Limiting the resources
|
|
|
|
To avoid overloading systems GNU B<parallel> can look at the system load
|
|
before starting another job:
|
|
|
|
parallel --load 100% echo load is less than {} job per cpu ::: 1
|
|
|
|
Output:
|
|
|
|
[when the load is less than the number of cpu cores]
|
|
load is less than 1 job per cpu
|
|
|
|
GNU B<parallel> can also check if the system is swapping.
|
|
|
|
parallel --noswap echo the system is not swapping ::: now
|
|
|
|
Output:
|
|
|
|
[when the system is not swapping]
|
|
the system is not swapping now
|
|
|
|
Some jobs need a lot of memory, and should only be started when there
|
|
is enough memory free. Using B<--memfree> GNU B<parallel> can check if
|
|
there is enough memory free. Additionally, GNU B<parallel> will kill
|
|
off the youngest job if the memory free falls below 50% of the
|
|
size. The killed job will be put back on the queue and retried later.
|
|
|
|
parallel --memfree 1G echo will run if more than 1 GB is ::: free
|
|
|
|
GNU B<parallel> can run the jobs with a nice value. This will work both
|
|
locally and remotely.
|
|
|
|
parallel --nice 17 echo this is being run with nice -n ::: 17
|
|
|
|
Output:
|
|
|
|
this is being run with nice -n 17
|
|
|
|
=head1 Remote execution
|
|
|
|
GNU B<parallel> can run jobs on remote servers. It uses B<ssh> to
|
|
communicate with the remote machines.
|
|
|
|
=head2 Sshlogin
|
|
|
|
The most basic sshlogin is B<-S> I<host>:
|
|
|
|
parallel -S $SERVER1 echo running on ::: $SERVER1
|
|
|
|
Output:
|
|
|
|
running on [$SERVER1]
|
|
|
|
To use a different username prepend the server with I<username@>:
|
|
|
|
parallel -S username@$SERVER1 echo running on ::: username@$SERVER1
|
|
|
|
Output:
|
|
|
|
running on [username@$SERVER1]
|
|
|
|
The special sshlogin B<:> is the local machine:
|
|
|
|
parallel -S : echo running on ::: the_local_machine
|
|
|
|
Output:
|
|
|
|
running on the_local_machine
|
|
|
|
If B<ssh> is not in $PATH it can be prepended to $SERVER1:
|
|
|
|
parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh
|
|
|
|
Output:
|
|
|
|
custom ssh
|
|
|
|
The B<ssh> command can also be given using B<--ssh>:
|
|
|
|
parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh
|
|
|
|
or by setting B<$PARALLEL_SSH>:
|
|
|
|
export PARALLEL_SSH=/usr/bin/ssh
|
|
parallel -S $SERVER1 echo custom ::: ssh
|
|
|
|
Several servers can be given using multiple B<-S>:
|
|
|
|
parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts
|
|
|
|
Output (the order may be different):
|
|
|
|
running
|
|
on
|
|
more
|
|
hosts
|
|
|
|
Or they can be separated by B<,>:
|
|
|
|
parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts
|
|
|
|
Output: Same as above.
|
|
|
|
Or newline:
|
|
|
|
# This gives a \n between $SERVER1 and $SERVER2
|
|
SERVERS="`echo $SERVER1; echo $SERVER2`"
|
|
parallel -S "$SERVERS" echo ::: running on more hosts
|
|
|
|
They can also be read from a file (replace I<user@> with the user on B<$SERVER2>):
|
|
|
|
echo $SERVER1 > nodefile
|
|
# Force 4 cores, special ssh-command, username
|
|
echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
|
|
parallel --sshloginfile nodefile echo ::: running on more hosts
|
|
|
|
Output: Same as above.
|
|
|
|
Every time a job finishes, the B<--sshloginfile> will be re-read, so
|
|
it is possible to both add and remove hosts while running.
|
|
|
|
The special B<--sshloginfile ..> reads from B<~/.parallel/sshloginfile>.
|
|
|
|
To force GNU B<parallel> to treat a server as having a given number of CPU
|
|
cores prepend the number of cores followed by B</> to the sshlogin:
|
|
|
|
parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4
|
|
|
|
Output:
|
|
|
|
force 4 cpus on server
|
|
|
|
Servers can be put into groups by prepending I<@groupname> to the
|
|
server and the group can then be selected by appending I<@groupname> to
|
|
the argument if using B<--hostgroup>:
|
|
|
|
parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
|
|
::: run_on_grp1@grp1 run_on_grp2@grp2
|
|
|
|
Output:
|
|
|
|
run_on_grp1
|
|
run_on_grp2
|
|
|
|
A host can be in multiple groups by separating the groups with B<+>, and
|
|
you can force GNU B<parallel> to limit the groups on which the command
|
|
can be run with B<-S> I<@groupname>:
|
|
|
|
parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
|
|
::: run_on_grp1 also_grp1
|
|
|
|
Output:
|
|
|
|
run_on_grp1
|
|
also_grp1
|
|
|
|
=head2 Transferring files
|
|
|
|
GNU B<parallel> can transfer the files to be processed to the remote
|
|
host. It does that using rsync.
|
|
|
|
echo This is input_file > input_file
|
|
parallel -S $SERVER1 --transferfile {} cat ::: input_file
|
|
|
|
Output:
|
|
|
|
This is input_file
|
|
|
|
If the files are processed into another file, the resulting file can be
|
|
transferred back:
|
|
|
|
echo This is input_file > input_file
|
|
parallel -S $SERVER1 --transferfile {} --return {}.out \
|
|
cat {} ">"{}.out ::: input_file
|
|
cat input_file.out
|
|
|
|
Output: Same as above.
|
|
|
|
To remove the input and output file on the remote server use B<--cleanup>:
|
|
|
|
echo This is input_file > input_file
|
|
parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
|
|
cat {} ">"{}.out ::: input_file
|
|
cat input_file.out
|
|
|
|
Output: Same as above.
|
|
|
|
There is a shorthand for B<--transferfile {} --return --cleanup> called B<--trc>:
|
|
|
|
echo This is input_file > input_file
|
|
parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
|
|
cat input_file.out
|
|
|
|
Output: Same as above.
|
|
|
|
Some jobs need a common database for all jobs. GNU B<parallel> can
|
|
transfer that using B<--basefile> which will transfer the file before the
|
|
first job:
|
|
|
|
echo common data > common_file
|
|
parallel --basefile common_file -S $SERVER1 \
|
|
cat common_file\; echo {} ::: foo
|
|
|
|
Output:
|
|
|
|
common data
|
|
foo
|
|
|
|
To remove it from the remote host after the last job use B<--cleanup>.
|
|
|
|
|
|
=head2 Working dir
|
|
|
|
The default working dir on the remote machines is the login dir. This
|
|
can be changed with B<--workdir> I<mydir>.
|
|
|
|
Files transferred using B<--transferfile> and B<--return> will be relative
|
|
to I<mydir> on remote computers, and the command will be executed in
|
|
the dir I<mydir>.
|
|
|
|
The special I<mydir> value B<...> will create working dirs under
|
|
B<~/.parallel/tmp> on the remote computers. If B<--cleanup> is given
|
|
these dirs will be removed.
|
|
|
|
The special I<mydir> value B<.> uses the current working dir. If the
|
|
current working dir is beneath your home dir, the value B<.> is
|
|
treated as the relative path to your home dir. This means that if your
|
|
home dir is different on remote computers (e.g. if your login is
|
|
different) the relative path will still be relative to your home dir.
|
|
|
|
parallel -S $SERVER1 pwd ::: ""
|
|
parallel --workdir . -S $SERVER1 pwd ::: ""
|
|
parallel --workdir ... -S $SERVER1 pwd ::: ""
|
|
|
|
Output:
|
|
|
|
[the login dir on $SERVER1]
|
|
[current dir relative on $SERVER1]
|
|
[a dir in ~/.parallel/tmp/...]
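The rule for B<--workdir .> can be illustrated with a small shell function. B<relative_to_home> and the paths under B</home/alice> are hypothetical; the sketch only shows how a dir beneath the home dir turns into a path relative to it:

```shell
# relative_to_home DIR HOME_DIR: print DIR relative to HOME_DIR if DIR lies
# beneath it, otherwise print DIR unchanged.
relative_to_home() {
  case $1 in
    "$2"/*) printf '%s\n' "${1#"$2"/}" ;;
    *)      printf '%s\n' "$1" ;;
  esac
}

relative_to_home /home/alice/project/data /home/alice   # beneath the home dir
relative_to_home /srv/data /home/alice                  # outside the home dir
```

The relative form is what makes the same dir resolvable on a remote computer where the home dir has a different absolute path.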
|
|
|
|
|
|
=head2 Avoid overloading sshd
|
|
|
|
If many jobs are started on the same server, B<sshd> can be
|
|
overloaded. GNU B<parallel> can insert a delay between each job run on
|
|
the same server:
|
|
|
|
parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3
|
|
|
|
Output (the order may be different):
|
|
|
|
1
|
|
2
|
|
3
|
|
|
|
B<sshd> will be less overloaded if using B<--controlmaster>, which will
|
|
multiplex ssh connections:
|
|
|
|
parallel --controlmaster -S $SERVER1 echo ::: 1 2 3
|
|
|
|
Output: Same as above.
|
|
|
|
=head2 Ignore hosts that are down
|
|
|
|
In clusters with many hosts a few of them are often down. GNU B<parallel>
|
|
can ignore those hosts. In this case the host 173.194.32.46 is down:
|
|
|
|
parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar
|
|
|
|
Output:
|
|
|
|
bar
|
|
|
|
=head2 Running the same commands on all hosts
|
|
|
|
GNU B<parallel> can run the same command on all the hosts:
|
|
|
|
parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar
|
|
|
|
Output (the order may be different):
|
|
|
|
foo
|
|
bar
|
|
foo
|
|
bar
|
|
|
|
Often you will just want to run a single command on all hosts without
|
|
arguments. B<--nonall> is B<--onall> with no arguments:
|
|
|
|
parallel --nonall -S $SERVER1,$SERVER2 echo foo bar
|
|
|
|
Output:
|
|
|
|
foo bar
|
|
foo bar
|
|
|
|
When B<--tag> is used with B<--nonall> and B<--onall> the B<--tagstring> is the host:
|
|
|
|
parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar
|
|
|
|
Output (the order may be different):
|
|
|
|
$SERVER1 foo bar
|
|
$SERVER2 foo bar
|
|
|
|
B<--jobs> sets the number of servers to log in to in parallel.
|
|
|
|
=head2 Transferring environment variables and functions
|
|
|
|
B<env_parallel> is a shell function that transfers all aliases,
|
|
functions, variables, and arrays. You activate it by running:
|
|
|
|
source `which env_parallel.bash`
|
|
|
|
Replace B<bash> with the shell you use.
|
|
|
|
Now you can use B<env_parallel> instead of B<parallel> and still have
|
|
your environment:
|
|
|
|
alias myecho=echo
|
|
myvar="Joe's var is"
|
|
env_parallel -S $SERVER1 'myecho $myvar' ::: green
|
|
|
|
Output:
|
|
|
|
Joe's var is green
|
|
|
|
The disadvantage is that if your environment is huge B<env_parallel>
|
|
will fail.
|
|
|
|
When B<env_parallel> fails, you can still use B<--env> to tell GNU
|
|
B<parallel> to transfer an environment variable to the remote system.
|
|
|
|
MYVAR='foo bar'
|
|
export MYVAR
|
|
parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz
|
|
|
|
Output:
|
|
|
|
foo bar baz
|
|
|
|
This works for functions, too, if your shell is Bash:
|
|
|
|
# This only works in Bash
|
|
my_func() {
|
|
echo in my_func $1
|
|
}
|
|
export -f my_func
|
|
parallel --env my_func -S $SERVER1 my_func ::: baz
|
|
|
|
Output:
|
|
|
|
in my_func baz
|
|
|
|
GNU B<parallel> can copy all user defined variables and functions to
|
|
the remote system. It just needs to record which ones to ignore in
|
|
B<~/.parallel/ignored_vars>. Do that by running this once:
|
|
|
|
parallel --record-env
|
|
cat ~/.parallel/ignored_vars
|
|
|
|
Output:
|
|
|
|
[list of variables to ignore - including $PATH and $HOME]
|
|
|
|
Now all other variables and functions defined will be copied when
|
|
using B<--env _>.
|
|
|
|
# The function is only copied if using Bash
|
|
my_func2() {
|
|
echo in my_func2 $VAR $1
|
|
}
|
|
export -f my_func2
|
|
VAR=foo
|
|
export VAR
|
|
|
|
parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar
|
|
|
|
Output:
|
|
|
|
foo
|
|
in my_func2 foo bar
|
|
|
|
If you use B<env_parallel> the variables, functions, and aliases do
|
|
not even need to be exported to be copied:
|
|
|
|
NOT='not exported var'
|
|
alias myecho=echo
|
|
not_ex() {
|
|
myecho in not_exported_func $NOT $1
|
|
}
|
|
env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar
|
|
|
|
Output:
|
|
|
|
not exported var
|
|
in not_exported_func not exported var bar
|
|
|
|
|
|
=head2 Showing what is actually run
|
|
|
|
B<--verbose> will show the command that would be run on the local
|
|
machine.
|
|
|
|
When using B<--cat>, B<--pipepart>, or when a job is run on a remote
|
|
machine, the command is wrapped with helper scripts. B<-vv> shows all
|
|
of this.
|
|
|
|
parallel -vv --pipepart --block 1M wc :::: num30000
|
|
|
|
Output:
|
|
|
|
<num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
|
|
$left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
|
|
? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
|
|
0 0 0 168894 | (wc)
|
|
30000 30000 168894
|
|
|
|
When the command gets more complex, the output is so hard to read
|
|
that it is only useful for debugging:
|
|
|
|
my_func3() {
|
|
echo in my_func $1 > $1.out
|
|
}
|
|
export -f my_func3
|
|
parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
|
|
-S $SERVER1 my_func3 {} ::: abc-file
|
|
|
|
Output will be similar to:
|
|
|
|
|
|
( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
|
|
--protocol 30 -rlDzR -essh ./abc-file
|
|
server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
|
|
\''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
|
|
eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
|
|
c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
|
|
TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
|
|
BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
|
|
iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
|
|
IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
|
|
0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
|
|
ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
|
|
TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
|
|
YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
|
|
Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
|
|
RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
|
|
MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
|
|
KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
|
|
fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
|
|
W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
|
|
JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
|
|
mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
|
|
dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
|
|
_EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
|
|
./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
|
|
server:./abc-file.out ./.;ssh server -- \(rm\ -f\
|
|
./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
|
|
./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
|
|
2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
|
|
server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
|
|
sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
|
|
./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
|
|
./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
|
|
.parallel/tmp/aspire-1928520-1; exit $_EXIT_status;
|
|
|
|
=head1 Saving output to shell variables (advanced)
|
|
|
|
GNU B<parset> will set shell variables to the output of GNU
|
|
B<parallel>. GNU B<parset> has one important limitation: It cannot be
|
|
part of a pipe. In particular this means it cannot read anything from
|
|
standard input (stdin) or pipe output to another program.
|
|
|
|
To use GNU B<parset> prepend command with destination variables:
|
|
|
|
parset myvar1,myvar2 echo ::: a b
|
|
echo $myvar1
|
|
echo $myvar2
|
|
|
|
Output:
|
|
|
|
a
|
|
b
|
|
|
|
If you only give a single variable, it will be treated as an array:
|
|
|
|
parset myarray seq {} 5 ::: 1 2 3
|
|
echo "${myarray[1]}"
|
|
|
|
Output:
|
|
|
|
2
|
|
3
|
|
4
|
|
5
|
|
|
|
The commands to run can be an array:
|
|
|
|
cmd=("echo '<<joe \"double space\" cartoon>>'" "pwd")
|
|
parset data ::: "${cmd[@]}"
|
|
echo "${data[0]}"
|
|
echo "${data[1]}"
|
|
|
|
Output:
|
|
|
|
<<joe "double space" cartoon>>
|
|
[current dir]
|
|
|
|
|
|
=head1 Saving to an SQL base (advanced)
|
|
|
|
GNU B<parallel> can save into an SQL base. Point GNU B<parallel> to a
|
|
table and it will put the joblog there together with the variables and
|
|
the output each in their own column.
|
|
|
|
=head2 CSV as SQL base
|
|
|
|
The simplest is to use a CSV file as the storage table:
|
|
|
|
parallel --sqlandworker csv:///%2Ftmp/log.csv \
|
|
seq ::: 10 ::: 12 13 14
|
|
cat /tmp/log.csv
|
|
|
|
Note how '/' in the path must be written as %2F.
|
|
|
|
Output will be similar to:
|
|
|
|
Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
|
|
Command,V1,V2,Stdout,Stderr
|
|
1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
|
|
11
|
|
12
|
|
",
|
|
2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
|
|
11
|
|
12
|
|
13
|
|
",
|
|
3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
|
|
11
|
|
12
|
|
13
|
|
14
|
|
",
|
|
|
|
A proper CSV reader (like LibreOffice or R's read.csv) will read this
|
|
format correctly - even with fields containing newlines as above.
|
|
|
|
If the output is big you may want to put it into files using B<--results>:
|
|
|
|
parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \
|
|
seq ::: 10 ::: 12 13 14
|
|
cat /tmp/log2.csv
|
|
|
|
Output will be similar to:
|
|
|
|
Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
|
|
Command,V1,V2,Stdout,Stderr
|
|
1,:,1458824738.287,0.029,0,9,0,0,
|
|
"seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
|
|
2,:,1458824738.298,0.025,0,12,0,0,
|
|
"seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
|
|
3,:,1458824738.309,0.026,0,15,0,0,
|
|
"seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr
|
|
|
|
|
|
=head2 DBURL as table
|
|
|
|
The CSV file is an example of a DBURL.
|
|
|
|
GNU B<parallel> uses a DBURL to address the table. A DBURL has this format:
|
|
|
|
vendor://[[user][:password]@][host][:port]/[database[/table]]
|
|
|
|
Example:
|
|
|
|
mysql://scott:tiger@my.example.com/mydatabase/mytable
|
|
postgresql://scott:tiger@pg.example.com/mydatabase/mytable
|
|
sqlite3:///%2Ftmp%2Fmydatabase/mytable
|
|
csv:///%2Ftmp/log.csv
|
|
|
|
To refer to B</tmp/mydatabase> with B<sqlite> or B<csv> you need to
|
|
encode the B</> as B<%2F>.
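The encoding can be done with B<sed> instead of by hand. A minimal sketch; B<encode_slashes> is a hypothetical helper and it only handles B</>, not other characters that may need encoding in a URL:

```shell
# encode_slashes PATH: replace every "/" in PATH with "%2F".
encode_slashes() {
  printf '%s\n' "$1" | sed 's|/|%2F|g'
}

# Build the DBURL for the sqlite3 database /tmp/mydatabase and the table mytable.
DBURL="sqlite3:///$(encode_slashes /tmp/mydatabase)"
DBURLTABLE="$DBURL/mytable"
echo "$DBURLTABLE"
```

The B</> that separates the database from the table is added after the encoding and therefore stays unencoded.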
|
|
|
|
Run a job using B<sqlite> on B<mytable> in B</tmp/mydatabase>:
|
|
|
|
DBURL=sqlite3:///%2Ftmp%2Fmydatabase
|
|
DBURLTABLE=$DBURL/mytable
|
|
parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz
|
|
|
|
To see the result:
|
|
|
|
sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'
|
|
|
|
Output will be similar to:
|
|
|
|
Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
|
|
Command|V1|V2|Stdout|Stderr
|
|
1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
|
|
|
|
|
2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
|
|
|
|
|
3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
|
|
|
|
|
4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
|
|
|
|
|
|
|
The first columns are well known from B<--joblog>. B<V1> and B<V2> are
|
|
data from the input sources. B<Stdout> and B<Stderr> are standard
|
|
output and standard error, respectively.
|
|
|
|
=head2 Using multiple workers
|
|
|
|
Using an SQL base as storage costs overhead on the order of 1 second
|
|
per job.
|
|
|
|
One of the situations where it makes sense is if you have multiple
|
|
workers.
|
|
|
|
You can then have a single master machine that submits jobs to the SQL
|
|
base (but does not do any of the work):
|
|
|
|
parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz
|
|
|
|
On the worker machines you run exactly the same command except you
|
|
replace B<--sqlmaster> with B<--sqlworker>.
|
|
|
|
parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz
|
|
|
|
To run a master and a worker on the same machine use B<--sqlandworker>
|
|
as shown earlier.
|
|
|
|
|
|
=head1 --pipe
|
|
|
|
The B<--pipe> functionality puts GNU B<parallel> in a different mode:
|
|
Instead of treating the data on stdin (standard input) as arguments
|
|
for a command to run, the data will be sent to stdin (standard input)
|
|
of the command.
|
|
|
|
The typical situation is:
|
|
|
|
command_A | command_B | command_C
|
|
|
|
where command_B is slow, and you want to speed up command_B.
|
|
|
|
=head2 Chunk size
|
|
|
|
By default GNU B<parallel> will start an instance of command_B, read a
|
|
chunk of 1 MB, and pass that to the instance. Then start another
|
|
instance, read another chunk, and pass that to the second instance.
|
|
|
|
cat num1000000 | parallel --pipe wc
|
|
|
|
Output (the order may be different):
|
|
|
|
165668 165668 1048571
|
|
149797 149797 1048579
|
|
149796 149796 1048572
|
|
149797 149797 1048579
|
|
149797 149797 1048579
|
|
149796 149796 1048572
|
|
85349 85349 597444
|
|
|
|
The size of the chunk is not exactly 1 MB because GNU B<parallel> only
|
|
passes full lines - never half a line, thus the blocksize is only
|
|
1 MB on average. You can change the block size to 2 MB with B<--block>:
|
|
|
|
cat num1000000 | parallel --pipe --block 2M wc
|
|
|
|
Output (the order may be different):
|
|
|
|
315465 315465 2097150
|
|
299593 299593 2097151
|
|
299593 299593 2097151
|
|
85349 85349 597444
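Because only full lines are passed on, the chunks add up exactly to the input. A minimal check with B<awk> on the B<wc> output shown above, assuming B<num1000000> was made with B<seq 1000000> (1000000 lines, 6888896 bytes):

```shell
# Sum the line and byte columns of the wc output from the --block 2M example.
totals=$(awk '{ lines += $1; bytes += $3 } END { print lines, bytes }' <<'EOF'
315465 315465 2097150
299593 299593 2097151
299593 299593 2097151
85349 85349 597444
EOF
)
echo "$totals"
```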
|
|
|
|
GNU B<parallel> treats each line as a record. If the order of records
|
|
is unimportant (e.g. you need all lines processed, but you do not care
|
|
which is processed first), then you can use B<--roundrobin>. Without
|
|
B<--roundrobin> GNU B<parallel> will start a command per block; with
|
|
B<--roundrobin> only the requested number of jobs will be started
|
|
(B<--jobs>). The records will then be distributed between the running
|
|
jobs:
|
|
|
|
cat num1000000 | parallel --pipe -j4 --roundrobin wc
|
|
|
|
Output will be similar to:
|
|
|
|
149797 149797 1048579
|
|
299593 299593 2097151
|
|
315465 315465 2097150
|
|
235145 235145 1646016
|
|
|
|
One of the 4 instances got a single block, 2 instances got 2 full
|
|
blocks each, and one instance got 1 full and 1 partial block.
|
|
|
|
=head2 Records

GNU B<parallel> sees the input as records. The default record is a single
line.

Using B<-N140000> GNU B<parallel> will read 140000 records at a time:

  cat num1000000 | parallel --pipe -N140000 wc

Output (the order may be different):

  140000 140000 868895
  140000 140000 980000
  140000 140000 980000
  140000 140000 980000
  140000 140000 980000
  140000 140000 980000
  140000 140000 980000
  20000 20000 140001

Note that the last job could not get the full 140000 lines, but
only 20000 lines.

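The same partitioning can be reproduced with B<split>, which may help
to see what B<-N> does. This is only an emulation under the assumption
that the input is the numbers 1 to 1000000: unlike B<--pipe> it writes
temporary files and processes them one after another. The function
name B<records_per_piece> is hypothetical:

```shell
# Emulate the partitioning of -N140000 with split(1): pieces of N lines.
# Prints the line count and byte count of every piece.
records_per_piece() {
  workdir=$(mktemp -d)
  seq 1000000 > "$workdir/num1000000"
  split -l "$1" "$workdir/num1000000" "$workdir/piece."
  for piece in "$workdir"/piece.*; do
    lines=$(wc -l < "$piece")
    bytes=$(wc -c < "$piece")
    # $((...)) strips the padding some wc implementations add
    echo "$((lines)) $((bytes))"
  done
  rm -rf "$workdir"
}

records_per_piece 140000
```

The 8 lines printed have the same line and byte counts as the B<wc>
output above, with the last piece holding the remaining 20000 lines.
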
If a record is 75 lines B<-L> can be used:

  cat num1000000 | parallel --pipe -L75 wc

Output (the order may be different):

  165600 165600 1048095
  149850 149850 1048950
  149775 149775 1048425
  149775 149775 1048425
  149850 149850 1048950
  149775 149775 1048425
  85350 85350 597450
  25 25 176

Note how GNU B<parallel> still reads a block of around 1 MB; but
instead of passing full lines to B<wc> it passes complete 75-line
records. This of course does not hold for the last job (which in this
case got 25 lines).

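The line counts are consistent with each block being cut at a whole
number of 75-line records. A worked check of that arithmetic follows;
the function name B<whole_records> is hypothetical, and this is an
observation about the output above rather than a description of the
implementation:

```shell
# Round a line count down to a whole number of records.
whole_records() {
  lines=$1
  per_record=$2
  echo $(( lines / per_record * per_record ))
}

whole_records 165668 75     # complete lines that fit in the first 1 MB
echo $(( 1000000 % 75 ))    # lines left over for the very last job
```

This prints 165600 and 25: the first job gets 2208 complete records,
and 25 lines remain after the input has been cut into 75-line records.
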
=head2 Fixed length records

Fixed length records can be processed by setting B<--recend ''> and
B<--block I<recordsize>>. A header of size I<n> can be processed with
B<--header .{I<n>}>.

Here is how to process a file with a 4-byte header and a 3-byte record
size:

  cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
    'echo start; cat; echo'

Output (the order may be different):

  start
  HHHHAAA
  start
  HHHHCCC
  start
  HHHHBBB

It may be more efficient to increase B<--block> to a multiple of the
record size.

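To see what is being split, the same result can be produced
sequentially with standard tools. This is a sketch under assumptions:
the content of B<fixedlen> is taken to be B<HHHHAAABBBCCC> (inferred
from the output above), the records are printable text without
whitespace, and the function name B<fixed_records> is made up:

```shell
# Emulate the splitting: a fixed-size header followed by fixed-size records.
fixed_records() {
  file=$1 headlen=$2 reclen=$3
  header=$(head -c "$headlen" "$file")
  # Skip the header, then cut the rest into lines of reclen bytes each.
  tail -c +"$((headlen + 1))" "$file" | fold -b -w "$reclen" |
    while read -r record || [ -n "$record" ]; do
      printf 'start\n%s%s\n' "$header" "$record"
    done
}

tmp=$(mktemp)
printf 'HHHHAAABBBCCC' > "$tmp"   # assumed content of fixedlen
fixed_records "$tmp" 4 3
rm -f "$tmp"
```

Every record is prefixed with the header, as in the B<--pipe> example,
but here the records always come out in input order.
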
=head2 Record separators

GNU B<parallel> uses separators to determine where two records split.

B<--recstart> gives the string that starts a record; B<--recend> gives the
string that ends a record. The default is B<--recend '\n'> (newline).

If both B<--recend> and B<--recstart> are given, then the record will only
split if the recend string is immediately followed by the recstart
string.

Here the B<--recend> is set to B<', '>:

  echo /foo, bar/, /baz, qux/, | \
    parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END

Output:

  JOB1
  /foo, END
  JOB2
  bar/, END
  JOB3
  /baz, END
  JOB4
  qux/,
  END

Here the B<--recstart> is set to B</>:

  echo /foo, bar/, /baz, qux/, | \
    parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END

Output:

  JOB1
  /foo, barEND
  JOB2
  /, END
  JOB3
  /baz, quxEND
  JOB4
  /,
  END

Here both B<--recend> and B<--recstart> are set:

  echo /foo, bar/, /baz, qux/, | \
    parallel -kN1 --recend ', ' --recstart / --pipe \
    echo JOB{#}\;cat\;echo END

Output:

  JOB1
  /foo, bar/, END
  JOB2
  /baz, qux/,
  END

Note the difference between setting one string and setting both strings.

With B<--regexp> the B<--recend> and B<--recstart> will be treated as
a regular expression:

  echo foo,bar,_baz,__qux, | \
    parallel -kN1 --regexp --recend ,_+ --pipe \
    echo JOB{#}\;cat\;echo END

Output:

  JOB1
  foo,bar,_END
  JOB2
  baz,__END
  JOB3
  qux,
  END

GNU B<parallel> can remove the record separators with
B<--remove-rec-sep>/B<--rrs>:

  echo foo,bar,_baz,__qux, | \
    parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
    echo JOB{#}\;cat\;echo END

Output:

  JOB1
  foo,barEND
  JOB2
  bazEND
  JOB3
  qux,
  END

=head2 Header

If the input data has a header, the header can be repeated for each
job by matching the header with B<--header>. If headers start with
B<%> you can do this:

  cat num_%header | \
    parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat

Output (the order may be different):

  JOB1
  %head1
  %head2
  1
  2
  3
  JOB2
  %head1
  %head2
  4
  5
  6
  JOB3
  %head1
  %head2
  7
  8
  9
  JOB4
  %head1
  %head2
  10

If the header is 2 lines, B<--header 2> will work:

  cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat

Output: Same as above.

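What B<--header 2 --pipe -N3> does to the data can be imitated with
B<head>, B<tail> and B<split>. This is a sequential emulation, not the
way GNU B<parallel> does it. The function name B<with_header> is
hypothetical, and the input is assumed to look like B<num_%header>
(two lines starting with B<%>, then the numbers 1 to 10):

```shell
# Repeat the first $2 lines of file $1 in front of every group of $3
# body lines, numbering the groups like JOB{#}.
with_header() {
  file=$1 headlines=$2 n=$3
  workdir=$(mktemp -d)
  tail -n +"$((headlines + 1))" "$file" | split -l "$n" - "$workdir/body."
  job=0
  for body in "$workdir"/body.*; do
    job=$((job + 1))
    echo "JOB$job"
    head -n "$headlines" "$file"
    cat "$body"
  done
  rm -rf "$workdir"
}

tmp=$(mktemp)
{ echo %head1; echo %head2; seq 10; } > "$tmp"   # assumed input layout
with_header "$tmp" 2 3
rm -f "$tmp"
```

The output is the same as in the example above, except that the jobs
always appear in order.
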
=head2 --pipepart

B<--pipe> is not very efficient. It maxes out at around 500
MB/s. B<--pipepart> can easily deliver 5 GB/s. But there are a few
limitations. The input has to be a normal file (not a pipe) given by
B<-a> or B<::::> and B<-L>/B<-l>/B<-N> do not work. B<--recend> and
B<--recstart>, however, I<do> work, and records can often be split on
that alone.

  parallel --pipepart -a num1000000 --block 3m wc

Output (the order may be different):

  444443 444444 3000002
  428572 428572 3000004
  126985 126984 888890


=head1 Shebang

=head2 Input data and parallel command in the same file

GNU B<parallel> is often called like this:

  cat input_file | parallel command

With B<--shebang> the I<input_file> and B<parallel> can be combined into the same script.

UNIX shell scripts start with a shebang line like this:

  #!/bin/bash

GNU B<parallel> can do that, too. With B<--shebang> the arguments can be
listed in the file. The B<parallel> command is the first line of the
script:

  #!/usr/bin/parallel --shebang -r echo

  foo
  bar
  baz

Output (the order may be different):

  foo
  bar
  baz

=head2 Parallelizing existing scripts

GNU B<parallel> is often called like this:

  cat input_file | parallel command
  parallel command ::: foo bar

If B<command> is a script, B<parallel> and the script can be combined
into a single file, so that this will run the script in parallel:

  cat input_file | command
  command foo bar

This B<perl> script B<perl_echo> works like B<echo>:

  #!/usr/bin/perl

  print "@ARGV\n"

It can be called like this:

  parallel perl_echo ::: foo bar

By changing the B<#!>-line it can be run in parallel:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/perl

  print "@ARGV\n"

Thus this will work:

  perl_echo foo bar

Output (the order may be different):

  foo
  bar

This technique can be used for:

=over 9

=item Perl:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/perl

  print "Arguments @ARGV\n";


=item Python:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/python3

  import sys
  print('Arguments', str(sys.argv))


=item Bash/sh/zsh/Korn shell:

  #!/usr/bin/parallel --shebang-wrap /bin/bash

  echo Arguments "$@"


=item csh:

  #!/usr/bin/parallel --shebang-wrap /bin/csh

  echo Arguments "$argv"


=item Tcl:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh

  puts "Arguments $argv"


=item R:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave

  args <- commandArgs(trailingOnly = TRUE)
  print(paste("Arguments ",args))


=item GNUplot:

  #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot

  print "Arguments ", system('echo $ARG')


=item Ruby:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby

  print "Arguments "
  puts ARGV


=item Octave:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/octave

  printf ("Arguments");
  arg_list = argv ();
  for i = 1:nargin
    printf (" %s", arg_list{i});
  endfor
  printf ("\n");

=item Common LISP:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp

  (format t "~&~S~&" 'Arguments)
  (format t "~&~S~&" *args*)

=item PHP:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/php
  <?php
  echo "Arguments";
  foreach(array_slice($argv,1) as $v)
  {
    echo " $v";
  }
  echo "\n";
  ?>

=item Node.js:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/node

  var myArgs = process.argv.slice(2);
  console.log('Arguments ', myArgs);

=item LUA:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/lua

  io.write "Arguments"
  for a = 1, #arg do
    io.write(" ")
    io.write(arg[a])
  end
  print("")

=item C#:

  #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp

  var argv = Environment.GetEnvironmentVariable("ARGV");
  print("Arguments "+argv);

=back

=head1 Semaphore

GNU B<parallel> can work as a counting semaphore. This is slower and less
efficient than its normal mode.

A counting semaphore is like a row of toilets. People needing a toilet
can use any toilet, but if there are more people than toilets, they
will have to wait for one of the toilets to become available.

An alias for B<parallel --semaphore> is B<sem>.

B<sem> will follow a person to the toilets, wait until a toilet is
available, leave the person in the toilet and exit.

B<sem --fg> will follow a person to the toilets, wait until a toilet is
available, stay with the person in the toilet and exit when the person
exits.

B<sem --wait> will wait for all persons to leave the toilets.

B<sem> does not have a queue discipline, so the next person is chosen
randomly.

B<-j> sets the number of toilets.

=head2 Mutex

The default is to have only one toilet (this is called a mutex). The
program is started in the background and B<sem> exits immediately. Use
B<--wait> to wait for all B<sem>s to finish:

  sem 'sleep 1; echo The first finished' &&
    echo The first is now running in the background &&
    sem 'sleep 1; echo The second finished' &&
    echo The second is now running in the background
  sem --wait

Output:

  The first is now running in the background
  The first finished
  The second is now running in the background
  The second finished

The command can be run in the foreground with B<--fg>, which will only
exit when the command completes:

  sem --fg 'sleep 1; echo The first finished' &&
    echo The first finished running in the foreground &&
    sem --fg 'sleep 1; echo The second finished' &&
    echo The second finished running in the foreground
  sem --wait

The difference between this and just running the command, is that a
mutex is set, so if other B<sem>s were running in the background only one
would run at a time.

To control which semaphore is used, use
B<--semaphorename>/B<--id>. Run this in one terminal:

  sem --id my_id -u 'echo First started; sleep 10; echo First done'

and simultaneously this in another terminal:

  sem --id my_id -u 'echo Second started; sleep 10; echo Second done'

Note how the second will only be started when the first has finished.

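For readers new to the concept, the idea of a mutex can be sketched in
plain shell. This is I<not> how B<sem> is implemented; it only
illustrates the "one toilet" rule. It relies on B<mkdir> either
creating a directory or failing, atomically. The function name
B<with_mutex> and the lock path are made up for this sketch:

```shell
# Run a command while holding a lock directory; wait if it is taken.
with_mutex() {
  lockdir=$1
  shift
  until mkdir "$lockdir" 2>/dev/null; do
    sleep 1                 # somebody else holds the lock; try again
  done
  "$@"
  status=$?
  rmdir "$lockdir"          # release the lock
  return "$status"
}

lock="${TMPDIR:-/tmp}/my_id.lock.$$"
with_mutex "$lock" sh -c 'echo First started; sleep 1; echo First done' &
with_mutex "$lock" sh -c 'echo Second started; sleep 1; echo Second done' &
wait
```

Whichever command grabs the lock first runs to completion before the
other one starts, so the "started" and "done" lines of the two
commands are never interleaved. Unlike B<sem>, this sketch leaves the
lock behind if the holder is killed.
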
=head2 Counting semaphore

A mutex is like having a single toilet: When it is in use everyone
else will have to wait. A counting semaphore is like having multiple
toilets: Several people can use the toilets, but when they all are in
use, everyone else will have to wait.

B<sem> can emulate a counting semaphore. Use B<--jobs> to set the
number of toilets like this:

  sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
  sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
  sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
  sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
  sem --wait --id my_id

Output:

  Start 1
  Start 2
  Start 3
  1 done
  Start 4
  2 done
  3 done
  4 done

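The counting behaviour can also be sketched in plain shell with a FIFO
that holds one token per free toilet. Again this is an illustration of
the concept, not the mechanism B<sem> uses. It assumes a system where a
FIFO can be opened read-write without blocking (Linux does this), and
the function name B<run_with_slots> is hypothetical:

```shell
# Run the given command strings with at most $1 of them active at once.
run_with_slots() {
  slots=$1
  shift
  dir=$(mktemp -d)
  mkfifo "$dir/tokens"
  exec 3<>"$dir/tokens"      # read-write, so opening does not block
  i=0
  while [ "$i" -lt "$slots" ]; do
    echo >&3                 # put in one token per slot
    i=$((i + 1))
  done
  for job in "$@"; do
    read -r _token <&3       # take a token; blocks while all slots are busy
    ( sh -c "$job"; echo >&3 ) &   # run the job, then hand the token back
  done
  wait
  exec 3>&-
  rm -rf "$dir"
}

run_with_slots 3 \
  'echo Start 1; sleep 1; echo 1 done' \
  'echo Start 2; sleep 2; echo 2 done' \
  'echo Start 3; sleep 3; echo 3 done' \
  'echo Start 4; sleep 3; echo 4 done'
```

The first three jobs start at once; the fourth has to wait until the
first one finishes and returns its token. With 1 slot the function
behaves like a mutex.
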
=head2 Timeout

With B<--semaphoretimeout> you can force running the command anyway after
a period (positive number) or give up (negative number):

  sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
  sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
  sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
  sem --id foo --wait

Output:

  Slow started
  parallel: Warning: Semaphore timed out. Stealing the semaphore.
  Forced running after 1 sec
  parallel: Warning: Semaphore timed out. Exiting.
  Slow ended

Note how the 'Give up' was not run.

=head1 Informational

GNU B<parallel> has some options to give short information about the
configuration.

B<--help> will print a summary of the most important options:

  parallel --help

Output:

  Usage:

  parallel [options] [command [arguments]] < list_of_arguments
  parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
  cat ... | parallel --pipe [options] [command [arguments]]

  -j n            Run n jobs in parallel
  -k              Keep same order
  -X              Multiple arguments with context replace
  --colsep regexp Split input on regexp for positional replacements
  {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
  {3} {3.} {3/} {3/.} {=3 perl code =}    Positional replacement strings
  With --plus:    {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
                  {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}

  -S sshlogin     Example: foo@server.example.com
  --slf ..        Use ~/.parallel/sshloginfile as the list of sshlogins
  --trc {}.bar    Shorthand for --transfer --return {}.bar --cleanup
  --onall         Run the given command with argument on all sshlogins
  --nonall        Run the given command with no arguments on all sshlogins

  --pipe          Split stdin (standard input) to multiple jobs.
  --recend str    Record end separator for --pipe.
  --recstart str  Record start separator for --pipe.

  See 'man parallel' for details

  Academic tradition requires you to cite works you base your article on.
  When using programs that use GNU Parallel to process data for publication
  please cite:

    O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
    ;login: The USENIX Magazine, February 2011:42-47.

  This helps funding further development; AND IT WON'T COST YOU A CENT.
  If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

When asking for help, always report the full output of this:

  parallel --version

Output:

  GNU parallel 20200122
  Copyright (C) 2007-2020 Ole Tange, http://ole.tange.dk and Free Software
  Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  GNU parallel comes with no warranty.

  Web site: http://www.gnu.org/software/parallel

  When using programs that use GNU Parallel to process data for publication
  please cite as described in 'parallel --citation'.

In scripts B<--minversion> can be used to ensure the user has at least
this version:

  parallel --minversion 20130722 && \
    echo Your version is at least 20130722.

Output (the first line is the installed version, so it may differ):

  20160322
  Your version is at least 20130722.

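In a script the check is typically wrapped in a guard, so that the
script stops with a readable message instead of failing later. A
minimal sketch; the function name B<require_parallel> and the wording
of the messages are made up, and the only option used is
B<--minversion> as described above:

```shell
# Succeed only if a 'parallel' command exists and accepts the
# requested minimum version.
require_parallel() {
  command -v parallel >/dev/null 2>&1 || return 1
  parallel --minversion "$1" >/dev/null 2>&1
}

if require_parallel 20130722; then
  echo "GNU parallel is new enough"
else
  echo "GNU parallel 20130722 or later is required" >&2
fi
```

The redirection hides the version number that B<--minversion> prints;
only the exit status is used. In a real script the B<else> branch
would normally end with B<exit 1>.
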
If you are using GNU B<parallel> for research the BibTeX citation can be
generated using B<--citation>:

  parallel --citation

Output:

  Academic tradition requires you to cite works you base your article on.
  When using programs that use GNU Parallel to process data for publication
  please cite:

  @article{Tange2011a,
    title = {GNU Parallel - The Command-Line Power Tool},
    author = {O. Tange},
    address = {Frederiksberg, Denmark},
    journal = {;login: The USENIX Magazine},
    month = {Feb},
    number = {1},
    volume = {36},
    url = {http://www.gnu.org/s/parallel},
    year = {2011},
    pages = {42-47},
    doi = {10.5281/zenodo.16303}
  }

  (Feel free to use \nocite{Tange2011a})

  This helps funding further development; AND IT WON'T COST YOU A CENT.
  If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

  If you send a copy of your published article to tange@gnu.org, it will be
  mentioned in the release notes of next version of GNU Parallel.

With B<--max-line-length-allowed> GNU B<parallel> will report the maximal
size of the command line:

  parallel --max-line-length-allowed

Output (may vary on different systems):

  131071

B<--number-of-cpus> and B<--number-of-cores> run system specific code to
determine the number of CPUs and CPU cores on the system. On
unsupported platforms they will return 1:

  parallel --number-of-cpus
  parallel --number-of-cores

Output (may vary on different systems):

  4
  64

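A script that wants to use the core count elsewhere can capture it and
fall back to 1 itself if GNU B<parallel> is missing or prints something
unexpected. This is only a sketch; the function name B<core_count> is
hypothetical:

```shell
# Print the number of cores reported by GNU parallel, or 1 if that
# fails or the output is not a plain number.
core_count() {
  n=$(parallel --number-of-cores 2>/dev/null) || n=1
  case $n in
    ''|*[!0-9]*) n=1 ;;
  esac
  echo "$n"
}

echo "Cores: $(core_count)"
```

The B<case> statement rejects empty output and anything containing a
non-digit, so the caller can always rely on getting a number.
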
=head1 Profiles

The defaults for GNU B<parallel> can be changed systemwide by putting the
command line options in B</etc/parallel/config>. They can be changed for
a user by putting them in B<~/.parallel/config>.

Profiles work the same way, but have to be referred to with B<--profile>:

  echo '--nice 17' > ~/.parallel/nicetimeout
  echo '--timeout 300%' >> ~/.parallel/nicetimeout
  parallel --profile nicetimeout echo ::: A B C

Output:

  A
  B
  C

Profiles can be combined:

  echo '-vv --dry-run' > ~/.parallel/dryverbose
  parallel --profile dryverbose --profile nicetimeout echo ::: A B C

Output:

  echo A
  echo B
  echo C


=head1 Spread the word

I hope you have learned something from this tutorial.

If you like GNU B<parallel>:

=over 2

=item *

(Re-)walk through the tutorial if you have not done so in the past year
(http://www.gnu.org/software/parallel/parallel_tutorial.html)

=item *

Give a demo at your local user group/your team/your colleagues

=item *

Post the intro videos and the tutorial on Reddit, Mastodon, Diaspora*,
forums, blogs, Identi.ca, Google+, Twitter, Facebook, Linkedin, and
mailing lists

=item *

Request or write a review for your favourite blog or magazine
(especially if you do something cool with GNU B<parallel>)

=item *

Invite me for your next conference

=back

If you use GNU B<parallel> for research:

=over 2

=item *

Please cite GNU B<parallel> in your publications (use B<--citation>)

=back

If GNU B<parallel> saves you money:

=over 2

=item *

(Have your company) donate to FSF or become a member
https://my.fsf.org/donate/

=back

(C) 2013-2020 Ole Tange, FDLv1.3 (See fdl.txt)


=cut