mirror of
https://git.savannah.gnu.org/git/parallel.git
synced 2024-11-22 22:17:54 +00:00
2904 lines
82 KiB
Plaintext
#!/usr/bin/perl -w

=head1 NAME

parallel - build and execute shell command lines from standard input in parallel


=head1 SYNOPSIS

B<parallel> [options] [I<command> [arguments]] < list_of_arguments

B<parallel> [options] [I<command> [arguments]] B<:::> arguments

B<parallel> [options] [I<command> [arguments]] B<::::> argfile(s)

B<parallel> --semaphore [options] I<command>

B<#!/usr/bin/parallel> --shebang [options] [I<command> [arguments]]


=head1 DESCRIPTION

GNU B<parallel> is a shell tool for executing jobs in parallel using one
or more computers. A job can be a single command or a small script
that has to be run for each of the lines in the input. The typical
input is a list of files, a list of hosts, a list of users, a list of
URLs, or a list of tables. A job can also be a command that reads from
a pipe. GNU B<parallel> can then split the input and pipe it into
commands in parallel.

If you use xargs and tee today you will find GNU B<parallel> very easy to
use as GNU B<parallel> is written to have the same options as xargs. If
you write loops in shell, you will find GNU B<parallel> may be able to
replace most of the loops and make them run faster by running several
jobs in parallel.

GNU B<parallel> makes sure output from the commands is the same output as
you would get had you run the commands sequentially. This makes it
possible to use output from GNU B<parallel> as input for other programs.

For each line of input GNU B<parallel> will execute I<command> with
the line as arguments. If no I<command> is given, the line of input is
executed. Several lines will be run in parallel. GNU B<parallel> can
often be used as a substitute for B<xargs> or B<cat | bash>.

Before looking at the options you may want to check out the B<EXAMPLE>s
after the list of options. That will give you an idea of what GNU
B<parallel> is capable of.

You can also watch the intro video for a quick introduction:
http://tinyogg.com/watch/TORaR/ http://tinyogg.com/watch/hfxKj/ and
http://tinyogg.com/watch/YQuXd/ or
http://www.youtube.com/watch?v=OpaiGYxkSuQ and
http://www.youtube.com/watch?v=1ntxT-47VPA


=head1 OPTIONS

=over 9

=item I<command>

Command to execute. If I<command> or the following arguments contain
{} every instance will be substituted with the input line.

If I<command> is given, GNU B<parallel> will behave similarly to B<xargs>. If
I<command> is not given GNU B<parallel> will behave similarly to B<cat | sh>.

The I<command> must be an executable, a script or a composed command: an
alias or a function will not work (see why
http://www.perlmonks.org/index.pl?node_id=484296).


=item B<{}>

Input line. This is the default replacement string and will normally
be used for putting the argument in the command line. It can be
changed with B<-I>.


=item B<{.}>

Input line without extension. This is a specialized replacement string
with the extension removed. If the input line contains B<.> after the
last B</> the last B<.> till the end of the string will be removed and
B<{.}> will be replaced with the remainder. E.g. I<foo.jpg> becomes
I<foo>, I<subdir/foo.jpg> becomes I<subdir/foo>, I<sub.dir/foo.jpg>
becomes I<sub.dir/foo>, I<sub.dir/bar> remains I<sub.dir/bar>. If the
input line does not contain B<.> it will remain unchanged.

B<{.}> can be used in the same places as B<{}>. The replacement string
B<{.}> can be changed with B<-U>.


=item B<{/}>

Basename of input line. This is a specialized replacement string
with the directory part removed.

B<{/}> can be used in the same places as B<{}>. The replacement string
B<{/}> can be changed with B<--basenamereplace>.


=item B<{//}>

Dirname of input line. This is a specialized replacement string
containing the dir of the input. See B<dirname>(1).

B<{//}> can be used in the same places as B<{}>. The replacement string
B<{//}> can be changed with B<--dirnamereplace>.


=item B<{/.}>

Basename of input line without extension. This is a specialized
replacement string with the directory and extension part removed. It
is a combination of B<{/}> and B<{.}>.

B<{/.}> can be used in the same places as B<{}>. The replacement string
B<{/.}> can be changed with B<--basenameextensionreplace>.

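The path-based replacement strings B<{.}>, B<{/}>, B<{//}> and B<{/.}>
can be approximated with POSIX shell parameter expansion. The sketch
below only illustrates the documented behaviour; it is not how GNU
B<parallel> implements the replacement, and the function names
(B<repl_noext>, B<repl_base>, B<repl_dir>, B<repl_base_noext>) are
hypothetical.

```shell
# Hypothetical helpers that mimic the documented replacement strings.
# They are illustrations only; GNU parallel does not provide them.

# {.}  : input line without extension (only a "." after the last "/" counts)
repl_noext() {
  case "${1##*/}" in
    *.*) printf '%s\n' "${1%.*}" ;;   # drop the last "." and what follows
    *)   printf '%s\n' "$1" ;;        # no extension: unchanged
  esac
}

# {/}  : basename of the input line
repl_base() { printf '%s\n' "${1##*/}"; }

# {//} : dirname of the input line, delegated to dirname(1)
repl_dir() { dirname "$1"; }

# {/.} : basename without extension, i.e. {/} combined with {.}
repl_base_noext() { repl_noext "${1##*/}"; }

repl_noext sub.dir/foo.jpg        # prints sub.dir/foo
repl_noext sub.dir/bar            # prints sub.dir/bar
repl_base subdir/foo.jpg          # prints foo.jpg
repl_dir subdir/foo.jpg           # prints subdir
repl_base_noext subdir/foo.jpg    # prints foo
```

The helpers take the path as their first argument, so the mapping can
be checked against the examples given for B<{.}> without running any
jobs.
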

=item B<{#}> (alpha testing)

Sequence number of the job to run. The same as $PARALLEL_SEQ.

The replacement string B<{#}> can be changed with B<--seqreplace>.


=item B<{>I<n>B<}>

Argument from argument file I<n> or the I<n>'th argument. See B<-a>
and B<-N>.

B<{>I<n>B<}> can be used in the same places as B<{}>.


=item B<{>I<n>.B<}>

Argument from argument file I<n> or the I<n>'th argument without
extension. It is a combination of B<{>I<n>B<}> and B<{.}>.

B<{>I<n>.B<}> can be used in the same places as B<{>I<n>B<}>.


=item B<{>I<n>/B<}>

Basename of argument from argument file I<n> or the I<n>'th argument.
It is a combination of B<{>I<n>B<}> and B<{/}>. See B<-a> and B<-N>.

B<{>I<n>/B<}> can be used in the same places as B<{>I<n>B<}>.


=item B<{>I<n>/.B<}>

Basename of argument from argument file I<n> or the I<n>'th argument
without extension. It is a combination of B<{>I<n>B<}>, B<{/}>, and
B<{.}>. See B<-a> and B<-N>.

B<{>I<n>/.B<}> can be used in the same places as B<{>I<n>B<}>.


=item B<:::> I<arguments>

Use arguments from the command line as input instead of from stdin
(standard input). Unlike other options for GNU B<parallel> B<:::> is
placed after the I<command> and before the arguments.

The following are equivalent:

  (echo file1; echo file2) | parallel gzip
  parallel gzip ::: file1 file2
  parallel gzip {} ::: file1 file2
  parallel --arg-sep ,, gzip {} ,, file1 file2
  parallel --arg-sep ,, gzip ,, file1 file2
  parallel ::: "gzip file1" "gzip file2"

To avoid treating B<:::> as special use B<--arg-sep> to set the
argument separator to something else. See also B<--arg-sep>.

stdin (standard input) will be passed to the first process run.

If B<--arg-file> is set arguments from that file will be appended.


=item B<::::> I<argfiles>

Another way to write B<-a> I<argfile1> B<-a> I<argfile2> ...

See B<-a>.


=item B<--null>

=item B<-0>

Use NUL as delimiter. Normally input lines will end in \n
(newline). If they end in \0 (NUL), then use this option. It is useful
for processing arguments that may contain \n (newline).


=item B<--arg-file> I<input-file>

=item B<-a> I<input-file>

Read items from the file I<input-file> instead of stdin (standard input). If
you use this option, stdin is given to the first process run.
Otherwise, stdin is redirected from /dev/null.

If multiple B<-a> are given, one line will be read from each of the
files. The arguments can be accessed in the command as B<{1}>
.. B<{>I<n>B<}>, so B<{1}> will be a line from the first file, and
B<{6}> will refer to the line with the same line number from the 6th
file.


=item B<--arg-file-sep> I<sep-str>

Use I<sep-str> instead of B<::::> as separator string between command
and argument files. Useful if B<::::> is used for something else by the
command.

See also: B<::::>.


=item B<--arg-sep> I<sep-str>

Use I<sep-str> instead of B<:::> as separator string. Useful if B<:::>
is used for something else by the command.

Also useful if your command uses B<:::> but you still want to read
arguments from stdin (standard input): Simply change B<--arg-sep> to a
string that is not in the command line.

See also: B<:::>.


=item B<--basefile> I<file>

=item B<-B> I<file>

I<file> will be transferred to each sshlogin before a job is
started. It will be removed if B<--cleanup> is active. The file may be
a script to run or some common base data needed for the jobs.
Multiple B<-B> can be specified to transfer more basefiles. The
I<file> will be transferred the same way as B<--transfer>.


=item B<--basenamereplace> I<replace-str>

=item B<--bnr> I<replace-str>

Use the replacement string I<replace-str> instead of B<{/}> for
basename of input line.


=item B<--basenameextensionreplace> I<replace-str>

Use the replacement string I<replace-str> instead of B<{/.}> for basename of input line without extension.


=item B<--bg> (beta testing)

Run command in background, thus GNU B<parallel> will not wait for
completion of the command before exiting. This is the default if
B<--semaphore> is set.

See also: B<--fg>

Implies B<--semaphore>.


=item B<--block> I<size> (beta testing)

=item B<--block-size> I<size> (beta testing)

Size of block in bytes. The size can be postfixed with K, M, G, or T
which would multiply the size by 1024, 1048576, 1073741824, or
1099511627776 respectively.

GNU B<parallel> tries to meet the block size but can be off by the
length of one record.

I<size> defaults to 1M.

See B<--pipe> for use of this.

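The suffix arithmetic for B<--block> can be sketched in shell as
follows. The helper B<block_bytes> is hypothetical and only mirrors the
multipliers listed above; it assumes an integer optionally followed by
exactly one of K, M, G, or T.

```shell
# Hypothetical helper: convert a --block style size such as 64K or 1M to bytes.
# Sketch only; assumes an integer optionally followed by one of K, M, G, T.
block_bytes() (
  num=${1%[KMGT]}                      # numeric part without the suffix
  case "$1" in
    *K) echo $(( $num * 1024 )) ;;
    *M) echo $(( $num * 1048576 )) ;;
    *G) echo $(( $num * 1073741824 )) ;;
    *T) echo $(( $num * 1099511627776 )) ;;
    *)  echo "$num" ;;                 # no suffix: already in bytes
  esac
)

block_bytes 1M    # prints 1048576 (the documented default block size)
block_bytes 64K   # prints 65536
```
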

=item B<--cleanup>

Remove transferred files. B<--cleanup> will remove the transferred files
on the remote computer after processing is done.

  find log -name '*gz' | parallel \
    --sshlogin server.example.com --transfer --return {.}.bz2 \
    --cleanup "zcat {} | bzip2 -9 >{.}.bz2"

With B<--transfer> the file transferred to the remote computer will be
removed on the remote computer. Directories created will not be removed
- even if they are empty.

With B<--return> the file transferred from the remote computer will be
removed on the remote computer. Directories created will not be removed
- even if they are empty.

B<--cleanup> is ignored when not used with B<--transfer> or B<--return>.


=item B<--colsep> I<regexp>

=item B<-C> I<regexp>

Column separator. The input will be treated as a table with I<regexp>
separating the columns. The n'th column can be accessed using
B<{>I<n>B<}> or B<{>I<n>.B<}>. E.g. B<{3}> is the 3rd column.

B<--colsep> implies B<--trim rl>.

I<regexp> is a Perl Regular Expression:
http://perldoc.perl.org/perlre.html


=item B<--delimiter> I<delim>

=item B<-d> I<delim>

Input items are terminated by the specified character. Quotes and
backslash are not special; every character in the input is taken
literally. Disables the end-of-file string, which is treated like any
other argument. This can be used when the input consists of simply
newline-separated items, although it is almost always better to design
your program to use B<--null> where this is possible. The specified
delimiter may be a single character, a C-style character escape such
as \n, or an octal or hexadecimal escape code. Octal and
hexadecimal escape codes are understood as for the printf command.
Multibyte characters are not supported.


=item B<--dirnamereplace> I<replace-str> (alpha testing)

=item B<--dnr> I<replace-str> (alpha testing)

Use the replacement string I<replace-str> instead of B<{//}> for
dirname of input line.


=item B<-E> I<eof-str>

Set the end of file string to I<eof-str>. If the end of file string
occurs as a line of input, the rest of the input is ignored. If
neither B<-E> nor B<-e> is used, no end of file string is used.

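The effect of an end of file string on the input can be pictured with
a small filter. This is only a sketch of the behaviour described for
B<-E>; the function name B<until_eof> and the marker I<STOP> are made
up for the illustration.

```shell
# Hypothetical filter: pass input lines through until a line equals the
# end of file string given as $1; that line and everything after it are dropped.
until_eof() {
  # Appending "" forces a string comparison even for numeric-looking lines.
  awk -v eof="$1" '$0 "" == eof "" { exit } { print }'
}

printf '%s\n' a.txt b.txt STOP c.txt | until_eof STOP   # prints a.txt and b.txt
```

Only a line that is exactly equal to the end of file string stops the
input; a line that merely contains it is passed through.
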

=item B<--dry-run>

Print the job to run on standard output, but do not run the job. Use
B<-v -v> to include the ssh/rsync wrapping if the job would be run on
a remote computer. Do not count on this literally, though, as the job
may be scheduled on another computer or the local computer if : is in
the list.


=item B<--eof>[=I<eof-str>]

=item B<-e>[I<eof-str>]

This option is a synonym for the B<-E> option. Use B<-E> instead,
because it is POSIX compliant for B<xargs> while this option is not.
If I<eof-str> is omitted, there is no end of file string. If neither
B<-E> nor B<-e> is used, no end of file string is used.


=item B<--eta> (alpha testing)

Show the estimated number of seconds before finishing. This forces GNU
B<parallel> to read all jobs before starting to find the number of
jobs. GNU B<parallel> normally only reads the next job to run.
Implies B<--progress>.


=item B<--fg> (beta testing)

Run command in foreground, thus GNU B<parallel> will wait for
completion of the command before exiting.

See also: B<--bg>

Implies B<--semaphore>.


=item B<--gnu>

Behave like GNU B<parallel>. If B<--tollef> and B<--gnu> are both set,
B<--gnu> takes precedence.


=item B<--group>

=item B<-g>

Group output. Output from each job is grouped together and is only
printed when the command is finished. STDERR first followed by STDOUT.
B<-g> is the default. Can be reversed with B<-u>.


=item B<--help>

=item B<-h>

Print a summary of the options to GNU B<parallel> and exit.


=item B<--halt-on-error> <0|1|2>

=item B<-H> <0|1|2>

=over 3

=item 0

Do not halt if a job fails. Exit status will be the number of jobs
failed. This is the default.

=item 1

Do not start new jobs if a job fails, but complete the running jobs
including cleanup. The exit status will be the exit status from the
last failing job.

=item 2

Kill off all jobs immediately and exit without cleanup. The exit
status will be the exit status from the failing job.

=back


=item B<-I> I<replace-str>

Use the replacement string I<replace-str> instead of B<{}>.


=item B<--replace>[=I<replace-str>]

=item B<-i>[I<replace-str>]

This option is a synonym for B<-I>I<replace-str> if I<replace-str> is
specified, and for B<-I>{} otherwise. This option is deprecated;
use B<-I> instead.


=item B<--joblog> I<logfile> (beta testing)

Logfile for executed jobs. Save a list of the executed jobs to
I<logfile> in the following TAB separated format: sequence number,
sshlogin, start time as seconds since epoch, run time in seconds,
bytes in files transferred, bytes in files returned, exit status,
and command run.

To convert the times into ISO-8601 strict do:

B<perl -a -F"\t" -ne 'chomp($F[2]=`date -d \@$F[2] +%FT%T`); print join("\t",@F)'>

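Because the B<--joblog> format is TAB separated, it can be processed
with standard tools. The sketch below lists the jobs whose exit status
is non-zero. It assumes exactly the eight fields described above; the
function name B<failed_jobs> and the two sample log lines are made up
for the illustration.

```shell
# Hypothetical helper: list sequence number and command of failed jobs.
# Assumes the 8 TAB separated fields described above, read from stdin:
# seq, sshlogin, start, runtime, bytes sent, bytes returned, exit status, command.
failed_jobs() {
  # Only lines with a numeric, non-zero exit status in field 7 are printed.
  awk -F '\t' '$7 ~ /^[0-9]+$/ && $7 != 0 { print $1 "\t" $8 }'
}

# Two made-up log lines: job 1 succeeded, job 2 exited with status 1.
printf '1\t:\t1300000000\t2\t0\t0\t0\tgzip file1\n2\t:\t1300000002\t1\t0\t0\t1\tgzip file2\n' |
  failed_jobs   # prints job 2 and its command
```

In practice the log would be read from the file given to B<--joblog>,
e.g. by redirecting that file into the function.
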

=item B<--jobs> I<N>

=item B<-j> I<N>

=item B<--max-procs> I<N>

=item B<-P> I<N>

Number of jobslots. Run up to N jobs in parallel. 0 means as many as
possible. Default is 100% which will run one job per CPU core.

If B<--semaphore> is set default is 1 thus making a mutex.


=item B<--jobs> I<+N>

=item B<-j> I<+N>

=item B<--max-procs> I<+N>

=item B<-P> I<+N>

Add N to the number of CPU cores. Run this many jobs in parallel. For
compute intensive jobs B<-j> +0 is useful as it will run
number-of-cpu-cores jobs simultaneously. See also
B<--use-cpus-instead-of-cores>.


=item B<--jobs> I<-N>

=item B<-j> I<-N>

=item B<--max-procs> I<-N>

=item B<-P> I<-N>

Subtract N from the number of CPU cores. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. See also
B<--use-cpus-instead-of-cores>.


=item B<--jobs> I<N>%

=item B<-j> I<N>%

=item B<--max-procs> I<N>%

=item B<-P> I<N>%

Multiply the number of CPU cores by N%. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. See also
B<--use-cpus-instead-of-cores>.

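The four forms of the B<--jobs> argument can be summarized as a small
calculation. The sketch below is an illustration, not GNU
B<parallel>'s own code: the helper B<jobslots> is hypothetical, the
core count is passed in by hand, and rounding N% down with integer
division is an assumption.

```shell
# Hypothetical helper: jobslots SPEC CORES
# Evaluates a --jobs style SPEC (N, +N, -N or N%) for a given core count.
# Integer division for N% is an assumption of this sketch.
jobslots() (
  spec=$1 cores=$2
  case "$spec" in
    +*) n=$(( $cores + ${spec#+} )) ;;
    -*) n=$(( $cores - ${spec#-} )) ;;
    *%) n=$(( $cores * ${spec%?} / 100 )) ;;
    *)  echo "$spec"; exit ;;          # plain N (0 means as many as possible)
  esac
  [ "$n" -lt 1 ] && n=1                # evaluated values below 1 become 1
  echo "$n"
)

jobslots +2 8     # prints 10
jobslots -10 8    # prints 1
jobslots 50% 8    # prints 4
```
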

=item B<--jobs> I<procfile>

=item B<-j> I<procfile>

=item B<--max-procs> I<procfile>

=item B<-P> I<procfile>

Read parameter from file. Use the content of I<procfile> as parameter
for I<-j>. E.g. I<procfile> could contain the string 100% or +2 or
10. If I<procfile> is changed when a job completes, I<procfile> is
read again and the new number of jobs is computed. If the number is
lower than before, running jobs will be allowed to finish but new jobs
will not be started until the wanted number of jobs has been reached.
This makes it possible to change the number of simultaneous running
jobs while GNU B<parallel> is running.


=item B<--keeporder>

=item B<-k>

Keep sequence of output same as the order of input. If jobs 1 2 3 4
end in the sequence 3 1 4 2 the output will still be 1 2 3 4.


=item B<-L> I<max-lines>

Use at most I<max-lines> nonblank input lines per command line.
Trailing blanks cause an input line to be logically continued on the
next input line.

B<-L 0> means read one line, but insert 0 arguments on the command
line.

Implies B<-X> unless B<-m> is set.


=item B<--max-lines>[=I<max-lines>]

=item B<-l>[I<max-lines>]

Synonym for the B<-L> option. Unlike B<-L>, the I<max-lines> argument
is optional. If I<max-lines> is not specified, it defaults to one.
The B<-l> option is deprecated since the POSIX standard specifies
B<-L> instead.

B<-l 0> is an alias for B<-l 1>.

Implies B<-X> unless B<-m> is set.


=item B<--load> I<max-load> (experimental)

Do not start new jobs on a given computer unless the load is less than
I<max-load>. I<max-load> uses the same syntax as B<--jobs>, so I<100%>
for one per CPU is a valid setting.

The load average is only sampled every 10 seconds to avoid stressing
small computers.


=item B<--controlmaster> (experimental)

=item B<-M> (experimental)

Use ssh's ControlMaster to make ssh connections faster. Useful if jobs
run remotely and are very fast to run. This is disabled for sshlogins
that specify their own ssh command.


=item B<--xargs>

=item B<-m>

Multiple. Insert as many arguments as the command line length
permits. If B<{}> is not used the arguments will be appended to the
line. If B<{}> is used multiple times each B<{}> will be replaced
with all the arguments.

Support for B<-m> with B<--sshlogin> is limited and may fail.

See also B<-X> for context replace. If in doubt use B<-X> as that will
most likely do what is needed.


=item B<--output-as-files> (beta testing)

=item B<--outputasfiles> (beta testing)

=item B<--files> (beta testing)

Instead of printing the output to stdout (standard output) the output
of each job is saved in a file and the filename is then printed.


=item B<--pipe> (beta testing)

=item B<--spreadstdin> (beta testing)

Spread input to jobs on stdin. Read a block of data from stdin
(standard input) and give one block of data as input to one job.

The block size is determined by B<--block>. The strings B<--recstart>
and B<--recend> tell GNU B<parallel> how a record starts and/or
ends. The block read will have the final partial record removed before
the block is passed on to the job. The partial record will be
prepended to next block.

If B<--recstart> is given this will be used to split at record start.

If B<--recend> is given this will be used to split at record end.

If both B<--recstart> and B<--recend> are given both will have to
match to find a split position.

If neither B<--recstart> nor B<--recend> are given B<--recend>
defaults to '\n'. To have no record separator use B<--recend "">.

B<--files> is often used with B<--pipe>.


=item B<--progress>

Show progress of computations. List the computers involved in the task
with number of CPU cores detected and the max number of jobs to
run. After that show progress for each computer: number of running
jobs, number of completed jobs, and percentage of all jobs done by
this computer. The percentage will only be available after all jobs
have been scheduled as GNU B<parallel> only reads the next job when
ready to schedule it - this is to avoid wasting time and memory by
reading everything at startup.

By sending GNU B<parallel> SIGUSR2 you can toggle turning on/off
B<--progress> on a running GNU B<parallel> process.


=item B<--max-args>=I<max-args>

=item B<-n> I<max-args>

Use at most I<max-args> arguments per command line. Fewer than
I<max-args> arguments will be used if the size (see the B<-s> option)
is exceeded, unless the B<-x> option is given, in which case
GNU B<parallel> will exit.

B<-n 0> means read one argument, but insert 0 arguments on the command
line.

Implies B<-X> unless B<-m> is set.


=item B<--max-replace-args>=I<max-args>

=item B<-N> I<max-args>

Use at most I<max-args> arguments per command line. Like B<-n> but
also makes replacement strings B<{1}> .. B<{>I<max-args>B<}> that
represent argument 1 .. I<max-args>. If there are too few arguments,
the remaining B<{>I<n>B<}> will be empty.

B<-N 0> means read one argument, but insert 0 arguments on the command
line.

This will set the owner of the homedir to the user:

B<tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}>

Implies B<-X> unless B<-m> or B<--pipe> is set.

When used with B<--pipe> B<-N> is the number of records to read. This
is much slower than B<--block> so avoid it if performance is
important.


=item B<--max-line-length-allowed>

Print the maximal number of characters allowed on the command line and
exit (used by GNU B<parallel> itself to determine the line length
on remote computers).


=item B<--number-of-cpus>

Print the number of physical CPUs and exit (used by GNU B<parallel>
itself to determine the number of physical CPUs on remote computers).


=item B<--number-of-cores>

Print the number of CPU cores and exit (used by GNU B<parallel> itself
to determine the number of CPU cores on remote computers).


=item B<--nice> I<niceness>

Run the command at this niceness. For simple commands you can just add
B<nice> in front of the command. But if the command consists of more
sub commands (Like: ls|wc) then prepending B<nice> will not always
work. B<--nice> will make sure all sub commands are niced.


=item B<--interactive>

=item B<-p>

Prompt the user about whether to run each command line and read a line
from the terminal. Only run the command line if the response starts
with 'y' or 'Y'. Implies B<-t>.


=item B<--profile> I<profilename>

=item B<-J> I<profilename>

Use profile I<profilename> for options. This is useful if you want to
have multiple profiles. You could have one profile for running jobs in
parallel on the local computer and a different profile for running jobs
on remote computers. See the section PROFILE FILES for examples.

I<profilename> corresponds to the file ~/.parallel/I<profilename>.

Default: config


=item B<--quote>

=item B<-q>

Quote I<command>. This will quote the command line so special
characters are not interpreted by the shell. See the section
QUOTING. Most people will never need this. Quoting is disabled by
default.


=item B<--no-run-if-empty>

=item B<-r>

If the stdin (standard input) only contains whitespace, do not run the command.


=item B<--recstart> I<startstring> (beta testing)

=item B<--recend> I<endstring> (beta testing)

If B<--recstart> is given I<startstring> will be used to split at record start.

If B<--recend> is given I<endstring> will be used to split at record end.

If both B<--recstart> and B<--recend> are given the string
I<startstring>I<endstring> will have to match to find a split
position. This is useful if either I<startstring> or I<endstring>
match in the middle of a record.

If neither B<--recstart> nor B<--recend> are given then B<--recend>
defaults to '\n'. To have no record separator use B<--recend "">.

B<--recstart> and B<--recend> are used with B<--pipe>.

Use B<--regexp> to interpret B<--recstart> and B<--recend> as regular
expressions. This is slow, however.


=item B<--regexp> (beta testing)

Use B<--regexp> to interpret B<--recstart> and B<--recend> as regular
expressions. This is slow, however.


=item B<--remove-rec-sep> (beta testing)

=item B<--removerecsep> (beta testing)

=item B<--rrs> (beta testing)

Remove the text matched by B<--recstart> and B<--recend> before piping
it to the command.

Only used with B<--pipe>.


=item B<--retries> I<n> (beta testing)

If a job fails, retry it on another computer. Do this I<n> times. If
there are fewer than I<n> computers in B<--sshlogin> GNU B<parallel> will
re-use the computers. This is useful if some jobs fail for no apparent
reason (such as network failure).


=item B<--return> I<filename>

Transfer files from remote computers. B<--return> is used with
B<--sshlogin> when the arguments are files on the remote computers. When
processing is done the file I<filename> will be transferred
from the remote computer using B<rsync> and will be put relative to
the default login dir. E.g.

  echo foo/bar.txt | parallel \
    --sshlogin server.example.com --return {.}.out touch {.}.out

This will transfer the file I<$HOME/foo/bar.out> from the computer
I<server.example.com> to the file I<foo/bar.out> after running
B<touch foo/bar.out> on I<server.example.com>.

  echo /tmp/foo/bar.txt | parallel \
    --sshlogin server.example.com --return {.}.out touch {.}.out

This will transfer the file I</tmp/foo/bar.out> from the computer
I<server.example.com> to the file I</tmp/foo/bar.out> after running
B<touch /tmp/foo/bar.out> on I<server.example.com>.

Multiple files can be transferred by repeating the options multiple
times:

  echo /tmp/foo/bar.txt | \
    parallel --sshlogin server.example.com \
    --return {.}.out --return {.}.out2 touch {.}.out {.}.out2

B<--return> is often used with B<--transfer> and B<--cleanup>.

B<--return> is ignored when used with B<--sshlogin :> or when not used
with B<--sshlogin>.


=item B<--max-chars>=I<max-chars>

=item B<-s> I<max-chars>

Use at most I<max-chars> characters per command line, including the
command and initial-arguments and the terminating nulls at the ends of
the argument strings. The largest allowed value is system-dependent,
and is calculated as the argument length limit for exec, less the size
of your environment. The default value is the maximum.

Implies B<-X> unless B<-m> is set.

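The calculation described for B<-s> (argument length limit for exec,
less the size of the environment) can be approximated from the shell.
This is a rough sketch and not the value GNU B<parallel> computes:
the function name B<max_cmdline_estimate> is hypothetical, and
measuring the environment as the byte count of B<env> output is an
assumption.

```shell
# Rough estimate of the largest usable command line, following the
# description above: exec's argument length limit minus the environment size.
# Counting `env` output bytes is an approximation chosen for this sketch.
max_cmdline_estimate() {
  limit=$(getconf ARG_MAX)      # argument length limit for exec
  envsize=$(env | wc -c)        # approximate size of the environment
  echo $(( $limit - $envsize ))
}

max_cmdline_estimate   # prints a system-dependent number
```
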

=item B<--show-limits>

Display the limits on the command-line length which are imposed by the
operating system and the B<-s> option. Pipe the input from /dev/null
(and perhaps specify B<--no-run-if-empty>) if you don't want GNU B<parallel>
to do anything.


=item B<--semaphore>

Work as a counting semaphore. B<--semaphore> will cause GNU
B<parallel> to start I<command> in the background. When the number of
simultaneous jobs is reached, GNU B<parallel> will wait for one of
these to complete before starting another command.

B<--semaphore> implies B<--bg> unless B<--fg> is specified.

B<--semaphore> implies B<--semaphorename `tty`> unless
B<--semaphorename> is specified.

Used with B<--fg>, B<--wait>, and B<--semaphorename>.

The command B<sem> is an alias for B<parallel --semaphore>.


=item B<--semaphorename> I<name>

=item B<--id> I<name>

The name of the semaphore to use. The semaphore can be shared between
multiple processes.

Implies B<--semaphore>.


=item B<--semaphoretimeout> I<secs> (not implemented)

If the semaphore is not released within I<secs> seconds, take it anyway.

Implies B<--semaphore>.


=item B<--seqreplace> I<replace-str>

Use the replacement string I<replace-str> instead of B<{#}> for
job sequence number.


=item B<--skip-first-line>

Do not use the first line of input (used by GNU B<parallel> itself
when called with B<--shebang>).


=item B<-S> I<[ncpu/]sshlogin[,[ncpu/]sshlogin[,...]]>

=item B<--sshlogin> I<[ncpu/]sshlogin[,[ncpu/]sshlogin[,...]]>

Distribute jobs to remote computers. The jobs will be run on a list of
remote computers. GNU B<parallel> will determine the number of CPU
cores on the remote computers and run the number of jobs as specified by
B<-j>. If the number I<ncpu> is given GNU B<parallel> will use this
number for number of CPU cores on the host. Normally I<ncpu> will not
be needed.

An I<sshlogin> is of the form:

  [sshcommand [options]][username@]hostname

The sshlogin must not require a password.

The sshlogin ':' is special: it means 'no ssh' and will therefore run
on the local computer.

The sshlogin '..' is special: it reads sshlogins from ~/.parallel/sshloginfile.

To specify more sshlogins separate the sshlogins by comma or repeat
the options multiple times.

For examples: see B<--sshloginfile>.

The remote host must have GNU B<parallel> installed.

B<--sshlogin> is known to cause problems with B<-m> and B<-X>.

B<--sshlogin> is often used with B<--transfer>, B<--return>,
B<--cleanup>, and B<--trc>.


=item B<--sshloginfile> I<filename>

File with sshlogins. The file consists of sshlogins on separate
lines. Empty lines and lines starting with '#' are ignored. Example:

  server.example.com
  username@server2.example.com
  8/my-8-core-server.example.com
  2/my_other_username@my-dualcore.example.net
  # This server has SSH running on port 2222
  ssh -p 2222 server.example.net
  4/ssh -p 2222 quadserver.example.net
  # Use a different ssh program
  myssh -p 2222 -l myusername hexacpu.example.net
  # Use a different ssh program with default number of cores
  //usr/local/bin/myssh -p 2222 -l myusername hexacpu.example.net
  # Use a different ssh program with 6 cores
  6//usr/local/bin/myssh -p 2222 -l myusername hexacpu.example.net
  # Assume 16 cores on the local computer
  16/:

When using a different ssh program the last argument must be the hostname.

The sshloginfile '..' is special: it reads sshlogins from
~/.parallel/sshloginfile.

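The sshloginfile format can be read with a few lines of awk. The
parser below is a sketch inferred from the example entries (optional
digits followed by '/' give the number of cores); it is not taken from
GNU B<parallel>'s source, and the function name B<parse_sshlogins> and
the output word I<auto> are made up for the illustration.

```shell
# Hypothetical parser for the sshloginfile format shown above.
# Output: core count (or "auto" when none is given), a TAB, then the sshlogin.
# The "digits followed by /" rule is inferred from the examples, not from
# GNU parallel's source.
parse_sshlogins() {
  awk '
    /^[ \t]*$/ || /^#/ { next }             # skip empty lines and comments
    match($0, /^[0-9]*\//) {                # optional ncpu, then "/"
      ncpu = substr($0, 1, RLENGTH - 1)
      if (ncpu == "") ncpu = "auto"         # "//path" means default core count
      print ncpu "\t" substr($0, RLENGTH + 1)
      next
    }
    { print "auto\t" $0 }                   # plain sshlogin without ncpu
  '
}

parse_sshlogins <<'EOF'
# comment lines and empty lines are skipped

8/my-8-core-server.example.com
//usr/local/bin/myssh -p 2222 -l myusername hexacpu.example.net
16/:
EOF
```

Each output line pairs a core count with the remaining sshlogin, which
makes it easy to see how an entry such as I<6//usr/local/bin/myssh>
splits into I<6> and the path to the ssh program.
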

=item B<--silent>

Silent. The job to be run will not be printed. This is the default.
Can be reversed with B<-v>.


=item B<--tty> (beta testing)

=item B<-T> (beta testing)

Open terminal tty. If GNU B<parallel> is used for starting an
interactive program then this option may be needed. It will start only
one job at a time (i.e. B<-j1>), not buffer the output (i.e. B<-u>),
and it will open a tty for the job. When the job is done, the next job
will get the tty.


=item B<--tmpdir> I<dirname>

Directory for temporary files. GNU B<parallel> normally buffers output
into temporary files in /tmp. By setting B<--tmpdir> you can use a
different dir for the files. Setting B<--tmpdir> is equivalent to
setting $TMPDIR.


=item B<--tollef>

Make GNU B<parallel> behave like Tollef's parallel command. To
override use B<--gnu>.


=item B<--verbose>

=item B<-t>

Print the job to be run on standard error.

See also B<-v> and B<-p>.

=item B<--transfer>
|
|
|
|
Transfer files to remote computers. B<--transfer> is used with
|
|
B<--sshlogin> when the arguments are files and should be transferred to
|
|
the remote computers. The files will be transferred using B<rsync> and
|
|
will be put relative to the default login dir. E.g.
|
|
|
|
echo foo/bar.txt | parallel \
|
|
--sshlogin server.example.com --transfer wc
|
|
|
|
This will transfer the file I<foo/bar.txt> to the computer
|
|
I<server.example.com> to the file I<$HOME/foo/bar.txt> before running
|
|
B<wc foo/bar.txt> on I<server.example.com>.
|
|
|
|
echo /tmp/foo/bar.txt | parallel \
|
|
--sshlogin server.example.com --transfer wc
|
|
|
|
This will transfer the file I</tmp/foo/bar.txt> to the computer
|
|
I<server.example.com> to the file I</tmp/foo/bar.txt> before running
|
|
B<wc /tmp/foo/bar.txt> on I<server.example.com>.
|
|
|
|
B<--transfer> is often used with B<--return> and B<--cleanup>.
|
|
|
|
B<--transfer> is ignored when used with B<--sshlogin :> or when not used with B<--sshlogin>.
|
|
|
|
|
|
=item B<--trc> I<filename>
|
|
|
|
Transfer, Return, Cleanup. Short hand for:
|
|
|
|
B<--transfer> B<--return> I<filename> B<--cleanup>
|
|
|
|
|
|
=item B<--trim> <n|l|r|lr|rl>
|
|
|
|
Trim white space in input.
|
|
|
|
=over 4
|
|
|
|
=item n
|
|
|
|
No trim. Input is not modified. This is the default.
|
|
|
|
=item l
|
|
|
|
Left trim. Remove white space from start of input. E.g. " a bc " -> "a bc ".
|
|
|
|
=item r
|
|
|
|
Right trim. Remove white space from end of input. E.g. " a bc " -> " a bc".
|
|
|
|
=item lr
|
|
|
|
=item rl
|
|
|
|
Both trim. Remove white space from both start and end of input. E.g. "
|
|
a bc " -> "a bc". This is the default if B<--colsep> is used.
|
|
|
|
=back
|
|
|
|
|
|
=item B<--ungroup>
|
|
|
|
=item B<-u>
|
|
|
|
Ungroup output. Output is printed as soon as possible. This may cause
|
|
output from different commands to be mixed. GNU B<parallel> runs
|
|
faster with B<-u>. Can be reversed with B<-g>.
|
|
|
|
|
|
=item B<--extensionreplace> I<replace-str>
|
|
|
|
=item B<-U> I<replace-str>
|
|
|
|
Use the replacement string I<replace-str> instead of {.} for input line without extension.
|
|
|
|
|
|
=item B<--use-cpus-instead-of-cores>
|
|
|
|
Count the number of physical CPUs instead of CPU cores. When computing
|
|
how many jobs to run simultaneously relative to the number of CPU cores
|
|
you can ask GNU B<parallel> to instead look at the number of physical
|
|
CPUs. This will make sense for computers that have hyperthreading as
|
|
two jobs running on one CPU with hyperthreading will run slower than
|
|
two jobs running on two physical CPUs. Some multi-core CPUs can run
|
|
faster if only one thread is running per physical CPU. Most users will
|
|
not need this option.
|
|
|
|
|
|
=item B<-v>
|
|
|
|
Verbose. Print the job to be run on standard output. Can be reversed
|
|
with B<--silent>. See also B<-t>.
|
|
|
|
Use B<-v> B<-v> to print the wrapping ssh command when running remotely.
|
|
|
|
|
|
=item B<--version>
|
|
|
|
=item B<-V>
|
|
|
|
Print the version of GNU B<parallel> and exit.
|
|
|
|
|
|
=item B<--workdir> I<mydir>
|
|
|
|
=item B<-W> I<mydir>
|
|
|
|
Files transferred using B<--transfer> and B<--return> will be relative
|
|
to I<mydir> on remote computers, and the command will be executed in
|
|
that dir. The special workdir B<...> will create a workdir in
|
|
B<~/.parallel/tmp/> on the remote computers and will be removed if
|
|
using B<--cleanup>.
|
|
|
|
|
|
=item B<--wait> (beta testing)
|
|
|
|
Wait for all commands to complete.
|
|
|
|
Implies B<--semaphore>.
|
|
|
|
|
|
=item B<-X>
|
|
|
|
Multiple arguments with context replace. Insert as many arguments as
|
|
the command line length permits. If B<{}> is not used the arguments
|
|
will be appended to the line. If B<{}> is used as part of a word
|
|
(like I<pic{}.jpg>) then the whole word will be repeated. If B<{}> is
|
|
used multiple times each B<{}> will be replaced with the arguments.
|
|
|
|
Normally B<-X> will do the right thing, whereas B<-m> can give
|
|
unexpected results if B<{}> is used as part of a word.
|
|
|
|
Support for B<-X> with B<--sshlogin> is limited and may fail.
|
|
|
|
See also B<-m>.
|
|
|
|
|
|
=item B<--exit>
|
|
|
|
=item B<-x>
|
|
|
|
Exit if the size (see the B<-s> option) is exceeded.
|
|
|
|
|
|
=item B<--shebang>
|
|
|
|
=item B<--hashbang>
|
|
|
|
=item B<-Y>
|
|
|
|
GNU B<parallel> can be called as a shebang (#!) command as the first line of a script. Like this:
|
|
|
|
#!/usr/bin/parallel -Yr traceroute
|
|
|
|
foss.org.my
|
|
debian.org
|
|
freenetproject.org
|
|
|
|
For this to work B<--shebang> or B<-Y> must be set as the first option.
|
|
|
|
|
|
=back
|
|
|
|
|
|
=head1 EXAMPLE: Working as xargs -n1. Argument appending
|
|
|
|
GNU B<parallel> can work similarly to B<xargs -n1>.
|
|
|
|
To compress all html files using B<gzip> run:
|
|
|
|
B<find . -name '*.html' | parallel gzip>
|
|
|
|
If the file names may contain a newline use B<-0>. Substitute FOO BAR with
|
|
FUBAR in all files in this dir and subdirs:
|
|
|
|
B<find . -type f -print0 | parallel -q0 perl -i -pe 's/FOO BAR/FUBAR/g'>
|
|
|
|
Note B<-q> is needed because of the space in 'FOO BAR'.
|
|
|
|
|
|
=head1 EXAMPLE: Reading arguments from command line
|
|
|
|
GNU B<parallel> can take the arguments from command line instead of
|
|
stdin (standard input). To compress all html files in the current dir
|
|
using B<gzip> run:
|
|
|
|
B<parallel gzip ::: *.html>
|
|
|
|
To convert *.wav to *.mp3 using LAME running one process per CPU core
|
|
run:
|
|
|
|
B<parallel lame {} -o {.}.mp3 ::: *.wav>
|
|
|
|
|
|
=head1 EXAMPLE: Inserting multiple arguments
|
|
|
|
When moving a lot of files like this: B<mv * destdir> you will
|
|
sometimes get the error:
|
|
|
|
B<bash: /bin/mv: Argument list too long>
|
|
|
|
because there are too many files. You can instead do:
|
|
|
|
B<ls | parallel mv {} destdir>
|
|
|
|
This will run B<mv> for each file. It can be done faster if B<mv> gets
|
|
as many arguments as will fit on the line:
|
|
|
|
B<ls | parallel -m mv {} destdir>
|
|
|
|
|
|
=head1 EXAMPLE: Context replace
|
|
|
|
To remove the files I<pict0000.jpg> .. I<pict9999.jpg> you could do:
|
|
|
|
B<seq -w 0 9999 | parallel rm pict{}.jpg>
|
|
|
|
You could also do:
|
|
|
|
B<seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm>
|
|
|
|
The first will run B<rm> 10000 times, while the last will only run
|
|
B<rm> as many times as needed to keep the command line length short
|
|
enough to avoid B<Argument list too long> (it typically runs 1-2 times).
|
|
|
|
You could also run:
|
|
|
|
B<seq -w 0 9999 | parallel -X rm pict{}.jpg>
|
|
|
|
This will also only run B<rm> as many times as needed to keep the command
|
|
line length short enough.
|
|
|
|
|
|
=head1 EXAMPLE: Compute intensive jobs and substitution
|
|
|
|
If ImageMagick is installed this will generate a thumbnail of a jpg
|
|
file:
|
|
|
|
B<convert -geometry 120 foo.jpg thumb_foo.jpg>
|
|
|
|
This will run with number-of-cpu-cores jobs in parallel for all jpg
|
|
files in a directory:
|
|
|
|
B<ls *.jpg | parallel convert -geometry 120 {} thumb_{}>
|
|
|
|
To do it recursively use B<find>:
|
|
|
|
B<find . -name '*.jpg' | parallel convert -geometry 120 {} {}_thumb.jpg>
|
|
|
|
Notice how the argument has to start with B<{}> as B<{}> will include the path
|
|
(e.g. running B<convert -geometry 120 ./foo/bar.jpg
|
|
thumb_./foo/bar.jpg> would clearly be wrong). The command will
|
|
generate files like ./foo/bar.jpg_thumb.jpg.
|
|
|
|
Use B<{.}> to avoid the extra .jpg in the file name. This command will
|
|
make files like ./foo/bar_thumb.jpg:
|
|
|
|
B<find . -name '*.jpg' | parallel convert -geometry 120 {} {.}_thumb.jpg>
|
|
|
|
|
|
=head1 EXAMPLE: Substitution and redirection
|
|
|
|
This will generate an uncompressed version of .gz-files next to the .gz-file:
|
|
|
|
B<parallel zcat {} ">>B<"{.} ::: *.gz>
|
|
|
|
Quoting of > is necessary to postpone the redirection. Another
|
|
solution is to quote the whole command:
|
|
|
|
B<parallel "zcat {} >>B<{.}" ::: *.gz>
|
|
|
|
Other special shell characters (such as * ; $ > < | >> <<) also need
|
|
to be put in quotes, as they may otherwise be interpreted by the shell
|
|
and not given to GNU B<parallel>.
|
|
|
|
|
|
=head1 EXAMPLE: Composed commands
|
|
|
|
A job can consist of several commands. This will print the number of
|
|
files in each directory:
|
|
|
|
B<ls | parallel 'echo -n {}" "; ls {}|wc -l'>
|
|
|
|
To put the output in a file called <name>.dir:
|
|
|
|
B<ls | parallel '(echo -n {}" "; ls {}|wc -l) >> B<{}.dir'>
|
|
|
|
Even small shell scripts can be run by GNU B<parallel>:
|
|
|
|
B<find . | parallel 'a={}; name=${a##*/}; upper=$(echo "$name" | tr "[:lower:]" "[:upper:]"); echo "$name - $upper"'>
|
|
|
|
B<ls | parallel 'mv {} "$(echo {} | tr "[:upper:]" "[:lower:]")"'>
|
|
|
|
Given a list of URLs, list all URLs that fail to download. Print the
|
|
line number and the URL.
|
|
|
|
B<cat urlfile | parallel "wget {} 2>>B</dev/null || grep -n {} urlfile">
|
|
|
|
Create a mirror directory with the same filenames except all files and
|
|
symlinks are empty files.
|
|
|
|
B<cp -rs /the/source/dir mirror_dir; find mirror_dir -type l | parallel -m rm {} '&&' touch {}>
|
|
|
|
|
|
=head1 EXAMPLE: Removing file extension when processing files
|
|
|
|
When processing files removing the file extension using B<{.}> is
|
|
often useful.
|
|
|
|
Create a directory for each zip-file and unzip it in that dir:
|
|
|
|
B<parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip>
|
|
|
|
Recompress all .gz files in current directory using B<bzip2> running 1
|
|
job per CPU core in parallel:
|
|
|
|
B<parallel "zcat {} | bzip2 >>B<{.}.bz2 && rm {}" ::: *.gz>
|
|
|
|
Convert all WAV files to MP3 using LAME:
|
|
|
|
B<find sounddir -type f -name '*.wav' | parallel lame {} -o {.}.mp3>
|
|
|
|
Put all converted files in the same directory:
|
|
|
|
B<find sounddir -type f -name '*.wav' | parallel lame {} -o mydir/{/.}.mp3>
|
|
|
|
|
|
=head1 EXAMPLE: Removing two file extensions when processing files and
|
|
calling GNU Parallel from itself
|
|
|
|
If you have a directory with tar.gz files and want these extracted in
|
|
the corresponding dir (e.g. foo.tar.gz will be extracted in the dir
|
|
foo) you can do:
|
|
|
|
B<ls *.tar.gz| parallel -U {tar} 'echo {tar}|parallel "mkdir -p {.} ; tar -C {.} -xf {.}.tar.gz"'>
|
|
|
|
|
|
=head1 EXAMPLE: Download 10 images for each of the past 30 days
|
|
|
|
Let us assume a website stores images like:
|
|
|
|
http://www.example.com/path/to/YYYYMMDD_##.jpg
|
|
|
|
where YYYYMMDD is the date and ## is the number 01-10. This will
|
|
generate the past 30 days as YYYYMMDD:
|
|
|
|
B<seq 30 | parallel date -d '"today -{} days"' +%Y%m%d>
|
|
|
|
Based on this we can let GNU B<parallel> generate 10 B<wget>s per day:
|
|
|
|
I<the above> B<| parallel -I {o} seq -w 10 "|" parallel wget
|
|
http://www.example.com/path/to/{o}_{}.jpg>
|
|
|
|
|
|
=head1 EXAMPLE: Process files from a tar file while unpacking
|
|
|
|
If the files to be processed are in a tar file then unpacking one file
|
|
and processing it immediately may be faster than first unpacking all
|
|
files.
|
|
|
|
B<tar xvf foo.tgz | perl -ne 'print $l;$l=$_;END{print $l}' |
|
|
parallel echo>
|
|
|
|
The Perl one-liner is needed to avoid a race condition.
|
|
|
|
|
|
=head1 EXAMPLE: Rewriting a for-loop and a while-read-loop
|
|
|
|
for-loops like this:
|
|
|
|
(for x in `cat list` ; do
|
|
do_something $x
|
|
done) | process_output
|
|
|
|
and while-read-loops like this:
|
|
|
|
cat list | (while read x ; do
|
|
do_something $x
|
|
done) | process_output
|
|
|
|
can be written like this:
|
|
|
|
B<cat list | parallel do_something | process_output>
|
|
|
|
If the processing requires more steps the for-loop like this:
|
|
|
|
(for x in `cat list` ; do
|
|
no_extension=${x%.*};
|
|
do_something $x scale $no_extension.jpg
|
|
do_step2 <$x $no_extension
|
|
done) | process_output
|
|
|
|
and while-loops like this:
|
|
|
|
cat list | (while read x ; do
|
|
no_extension=${x%.*};
|
|
do_something $x scale $no_extension.jpg
|
|
do_step2 <$x $no_extension
|
|
done) | process_output
|
|
|
|
can be written like this:
|
|
|
|
B<cat list | parallel "do_something {} scale {.}.jpg ; do_step2 <{} {.}" | process_output>
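
The replacement strings take over the work of the shell parameter
expansions in the loops. As a rough sketch (the file name is made up,
and the correspondence is only claimed for ordinary names with a
directory part and a single extension), B<{.}> gives the same result as
B<${x%.*}>, and B<{/.}> the same as the basename with its extension
removed:

```shell
x=sounddir/album/track01.wav   # hypothetical input line

no_extension=${x%.*}           # what {.} would insert
base=${x##*/}                  # strip the directory part
base_no_extension=${base%.*}   # what {/.} would insert

echo "$no_extension"
echo "$base_no_extension"
```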
|
|
|
|
|
|
=head1 EXAMPLE: Group output lines
|
|
|
|
When running jobs that output data, you often do not want the output
|
|
of multiple jobs to run together. GNU B<parallel> defaults to grouping the
|
|
output of each job, so the output is printed when the job finishes. If
|
|
you want the output to be printed while the job is running you can use
|
|
B<-u>.
|
|
|
|
Compare the output of:
|
|
|
|
B<parallel traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
to the output of:
|
|
|
|
B<parallel -u traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
|
|
=head1 EXAMPLE: Keep order of output same as order of input
|
|
|
|
Normally the output of a job will be printed as soon as it
|
|
completes. Sometimes you want the order of the output to remain the
|
|
same as the order of the input. This is often important, if the output
|
|
is used as input for another system. B<-k> will make sure the order of
|
|
output will be in the same order as input even if later jobs end
|
|
before earlier jobs.
|
|
|
|
Append a string to every line in a text file:
|
|
|
|
B<cat textfile | parallel -k echo {} append_string>
|
|
|
|
If you remove B<-k> some of the lines may come out in the wrong order.
|
|
|
|
Another example is B<traceroute>:
|
|
|
|
B<parallel traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
will give traceroute of foss.org.my, debian.org and
|
|
freenetproject.org, but it will be sorted according to which job
|
|
completed first.
|
|
|
|
To keep the order the same as input run:
|
|
|
|
B<parallel -k traceroute ::: foss.org.my debian.org freenetproject.org>
|
|
|
|
This will make sure the traceroute to foss.org.my will be printed
|
|
first.
|
|
|
|
A bit more complex example is downloading a huge file in chunks in
|
|
parallel: Some internet connections will deliver more data if you
|
|
download files in parallel. For downloading files in parallel see:
|
|
"EXAMPLE: Download 10 images for each of the past 30 days". But if you
|
|
are downloading a big file you can download the file in chunks in
|
|
parallel.
|
|
|
|
To download byte 10000000-19999999 you can use B<curl>:
|
|
|
|
B<curl -r 10000000-19999999 http://example.com/the/big/file> > B<file.part>
|
|
|
|
To download a 1 GB file we need 100 10MB chunks downloaded and
|
|
combined in the correct order.
|
|
|
|
B<seq 0 99 | parallel -k curl -r \
|
|
{}0000000-{}9999999 http://example.com/the/big/file> > B<file>
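
The byte ranges come from plain string concatenation: the sequence
number is glued in front of seven fixed digits. The sketch below only
prints the first three ranges and checks the total size; it does not
download anything, and it assumes B<curl> accepts the leading zeros of
the first range as ordinary decimal offsets:

```shell
# Build the ranges that {}0000000-{}9999999 expands to for jobs 0, 1 and 2.
ranges=$(seq 0 2 | while read -r i; do echo "${i}0000000-${i}9999999"; done)
echo "$ranges"

# 100 chunks of 10,000,000 bytes cover 1,000,000,000 bytes (1 GB).
total=$((100 * 10000000))
echo "$total"
```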
|
|
|
|
|
|
=head1 EXAMPLE: Parallel grep
|
|
|
|
B<grep -r> greps recursively through directories. On multicore CPUs
|
|
GNU B<parallel> can often speed this up.
|
|
|
|
B<find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}>
|
|
|
|
This will run 1.5 jobs per core, and give 1000 arguments to B<grep>.
|
|
|
|
To grep a big file in parallel use B<--pipe>:
|
|
|
|
B<cat bigfile | parallel --pipe grep foo>
|
|
|
|
Depending on your disks and CPUs it may be faster to read larger blocks:
|
|
|
|
B<cat bigfile | parallel --pipe --block 10M grep foo>
|
|
|
|
|
|
=head1 EXAMPLE: Using remote computers
|
|
|
|
To run commands on a remote computer SSH needs to be set up and you
|
|
must be able to login without entering a password (B<ssh-agent> may be
|
|
handy).
|
|
|
|
To run B<echo> on B<server.example.com>:
|
|
|
|
seq 10 | parallel --sshlogin server.example.com echo
|
|
|
|
To run commands on more than one remote computer run:
|
|
|
|
seq 10 | parallel --sshlogin server.example.com,server2.example.net echo
|
|
|
|
Or:
|
|
|
|
seq 10 | parallel --sshlogin server.example.com \
|
|
--sshlogin server2.example.net echo
|
|
|
|
If the login username is I<foo> on I<server2.example.net> use:
|
|
|
|
seq 10 | parallel --sshlogin server.example.com \
|
|
--sshlogin foo@server2.example.net echo
|
|
|
|
To distribute the commands to a list of computers, make a file
|
|
I<mycomputers> with all the computers:
|
|
|
|
server.example.com
|
|
foo@server2.example.com
|
|
server3.example.com
|
|
|
|
Then run:
|
|
|
|
seq 10 | parallel --sshloginfile mycomputers echo
|
|
|
|
To include the local computer add the special sshlogin ':' to the list:
|
|
|
|
server.example.com
|
|
foo@server2.example.com
|
|
server3.example.com
|
|
:
|
|
|
|
GNU B<parallel> will try to determine the number of CPU cores on each
|
|
of the remote computers, and run one job per CPU core - even if the
|
|
remote computers do not have the same number of CPU cores.
|
|
|
|
If the number of CPU cores on the remote computers is not identified
|
|
correctly the number of CPU cores can be added in front. Here the
|
|
computer has 8 CPU cores.
|
|
|
|
seq 10 | parallel --sshlogin 8/server.example.com echo
|
|
|
|
|
|
=head1 EXAMPLE: Transferring of files
|
|
|
|
To recompress gzipped files with B<bzip2> using a remote computer run:
|
|
|
|
find logs/ -name '*.gz' | \
|
|
parallel --sshlogin server.example.com \
|
|
--transfer "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
This will list the .gz-files in the I<logs> directory and all
|
|
directories below. Then it will transfer the files to
|
|
I<server.example.com> to the corresponding directory in
|
|
I<$HOME/logs>. On I<server.example.com> the file will be recompressed
|
|
using B<zcat> and B<bzip2> resulting in the corresponding file with
|
|
I<.gz> replaced with I<.bz2>.
|
|
|
|
If you want the resulting bz2-file to be transferred back to the local
|
|
computer add I<--return {.}.bz2>:
|
|
|
|
find logs/ -name '*.gz' | \
|
|
parallel --sshlogin server.example.com \
|
|
--transfer --return {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
After the recompressing is done the I<.bz2>-file is transferred back to
|
|
the local computer and put next to the original I<.gz>-file.
|
|
|
|
If you want to delete the transferred files on the remote computer add
|
|
I<--cleanup>. This will remove both the file transferred to the remote
|
|
computer and the files transferred from the remote computer:
|
|
|
|
find logs/ -name '*.gz' | \
|
|
parallel --sshlogin server.example.com \
|
|
--transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
If you want to run on several computers add the computers to I<--sshlogin>
|
|
either using ',' or multiple I<--sshlogin>:
|
|
|
|
find logs/ -name '*.gz' | \
|
|
parallel --sshlogin server.example.com,server2.example.com \
|
|
--sshlogin server3.example.com \
|
|
--transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
You can add the local computer using I<--sshlogin :>. This will disable the
|
|
removing and transferring for the local computer only:
|
|
|
|
find logs/ -name '*.gz' | \
|
|
parallel --sshlogin server.example.com,server2.example.com \
|
|
--sshlogin server3.example.com \
|
|
--sshlogin : \
|
|
--transfer --return {.}.bz2 --cleanup "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
Often I<--transfer>, I<--return> and I<--cleanup> are used together. They can be
|
|
shortened to I<--trc>:
|
|
|
|
find logs/ -name '*.gz' | \
|
|
parallel --sshlogin server.example.com,server2.example.com \
|
|
--sshlogin server3.example.com \
|
|
--sshlogin : \
|
|
--trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
With the file I<mycomputers> containing the list of computers it becomes:
|
|
|
|
find logs/ -name '*.gz' | parallel --sshloginfile mycomputers \
|
|
--trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
If the file I<~/.parallel/sshloginfile> contains the list of computers
|
|
the special short hand I<-S ..> can be used:
|
|
|
|
find logs/ -name '*.gz' | parallel -S .. \
|
|
--trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
|
|
|
|
|
=head1 EXAMPLE: Distributing work to local and remote computers
|
|
|
|
Convert *.mp3 to *.ogg running one process per CPU core on local computer and server2:
|
|
|
|
parallel --trc {.}.ogg -S server2,: \
|
|
'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg' ::: *.mp3
|
|
|
|
|
|
=head1 EXAMPLE: Use multiple inputs in one command
|
|
|
|
Copy files like foo.es.ext to foo.ext:
|
|
|
|
B<ls *.es.* | perl -pe 'print; s/\.es//' | parallel -N2 cp {1} {2}>
|
|
|
|
The perl command spits out 2 lines for each input. GNU B<parallel>
|
|
takes 2 inputs (using B<-N2>) and replaces {1} and {2} with the inputs.
|
|
|
|
Print the number on the opposing sides of a six sided die:
|
|
|
|
B<parallel -a <(seq 6) -a <(seq 6 -1 1) echo>
|
|
|
|
Convert files from all subdirs to PNG-files with consecutive numbers
|
|
(useful for making input PNG's for B<ffmpeg>):
|
|
|
|
B<parallel -a <(find . -type f | sort) -a <(seq $(find . -type f|wc -l)) convert {1} {2}.png>
|
|
|
|
Alternative version:
|
|
|
|
B<find . -type f | sort | parallel convert {} \$PARALLEL_SEQ.png>
|
|
|
|
|
|
=head1 EXAMPLE: Use a table as input
|
|
|
|
Content of table_file.tsv:
|
|
|
|
foo<TAB>bar
|
|
baz <TAB> quux
|
|
|
|
To run:
|
|
|
|
cmd -o bar -i foo
|
|
cmd -o quux -i baz
|
|
|
|
you can run:
|
|
|
|
B<parallel -a table_file.tsv --colsep '\t' cmd -o {2} -i {1}>
|
|
|
|
Note: The default for GNU B<parallel> is to remove the spaces around the columns. To keep the spaces:
|
|
|
|
B<parallel -a table_file.tsv --trim n --colsep '\t' cmd -o {2} -i {1}>
|
|
|
|
|
|
=head1 EXAMPLE: Run the same command 10 times
|
|
|
|
If you want to run the same command with the same arguments 10 times
|
|
in parallel you can do:
|
|
|
|
B<seq 10 | parallel -n0 my_command my_args>
|
|
|
|
|
|
=head1 EXAMPLE: Working as cat | sh. Resource inexpensive jobs and evaluation
|
|
|
|
GNU B<parallel> can work similarly to B<cat | sh>.
|
|
|
|
A resource inexpensive job is a job that takes very little CPU, disk
|
|
I/O and network I/O. Ping is an example of a resource inexpensive
|
|
job. wget is too - if the webpages are small.
|
|
|
|
The content of the file jobs_to_run:
|
|
|
|
ping -c 1 10.0.0.1
|
|
wget http://status-server/status.cgi?ip=10.0.0.1
|
|
ping -c 1 10.0.0.2
|
|
wget http://status-server/status.cgi?ip=10.0.0.2
|
|
...
|
|
ping -c 1 10.0.0.255
|
|
wget http://status-server/status.cgi?ip=10.0.0.255
|
|
|
|
To run 100 processes simultaneously do:
|
|
|
|
B<parallel -j 100 < jobs_to_run>
|
|
|
|
As there is no I<command> the jobs will be evaluated by the shell.
|
|
|
|
|
|
=head1 EXAMPLE: Processing a big file using more cores
|
|
|
|
To process a big file or some output you can use B<--pipe> to split up
|
|
the data into blocks and pipe the blocks into the processing program.
|
|
|
|
If the program is B<gzip -9> you can do:
|
|
|
|
B<cat bigfile | parallel --pipe --recend '' -k gzip -9 >>B<bigfile.gz>
|
|
|
|
This will split B<bigfile> into blocks of 1 MB and pass that to B<gzip
|
|
-9> in parallel. One B<gzip> will be run per CPU core. The output of
|
|
B<gzip -9> will be kept in order and saved to B<bigfile.gz>
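
Appending the compressed blocks works because a file made of several
gzip streams written back to back decompresses to the concatenation of
their contents. A small check in plain shell, without GNU B<parallel>
(the two sample strings are arbitrary):

```shell
# Compress two pieces separately, concatenate the results, decompress once.
joined=$( { printf 'first\n' | gzip -9; printf 'second\n' | gzip -9; } | gzip -dc )
echo "$joined"
```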
|
|
|
|
B<gzip> works fine if the output is appended, but some processing does
|
|
not work like that - for example sorting. For this GNU B<parallel> can
|
|
put the output of each command into a file. This will sort a big file
|
|
in parallel:
|
|
|
|
B<cat bigfile | parallel --pipe --files sort | parallel -Xj1 sort -m {} ';' rm {} >>B<bigfile.sort>
|
|
|
|
Here B<bigfile> is split into blocks of around 1MB, each block ending
|
|
in '\n' (which is the default for B<--recend>). Each block is passed
|
|
to B<sort> and the output from B<sort> is saved into files. These
|
|
files are passed to the second B<parallel> that runs B<sort -m> on the
|
|
files before it removes the files. The output is saved to
|
|
B<bigfile.sort>.
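
The merge step can be tried by hand. This sketch stands in for the two
stages with fixed sample data; the scratch directory and the block file
names are assumptions made for the illustration and are not what
B<--files> would produce:

```shell
tmp=$(mktemp -d)               # scratch dir standing in for the --files output

# Two blocks, each sorted on its own (the job of the first parallel).
printf 'pear\napple\n'  | sort > "$tmp/block1"
printf 'plum\nbanana\n' | sort > "$tmp/block2"

# sort -m merges already-sorted files without sorting them again.
merged=$(sort -m "$tmp/block1" "$tmp/block2")
rm -r "$tmp"
echo "$merged"
```

B<sort -m> only merges, so it relies on every block already being sorted.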
|
|
|
|
|
|
=head1 EXAMPLE: Working as mutex and counting semaphore
|
|
|
|
The command B<sem> is an alias for B<parallel --semaphore>.
|
|
|
|
A counting semaphore will allow a given number of jobs to be started
|
|
in the background. When that number of jobs is running in the
|
|
background, GNU B<sem> will wait for one of these to complete before
|
|
starting another command. B<sem --wait> will wait for all jobs to
|
|
complete.
|
|
|
|
Run 10 jobs concurrently in the background:
|
|
|
|
for i in `ls *.log` ; do
|
|
echo $i
|
|
sem -j10 gzip $i ";" echo done
|
|
done
|
|
sem --wait
|
|
|
|
A mutex is a counting semaphore allowing only one job to run. This
|
|
will edit the file I<myfile> and prepend to it lines with the
|
|
numbers 1 to 3.
|
|
|
|
seq 3 | parallel sem sed -i -e 'i{}' myfile
|
|
|
|
As I<myfile> can be very big it is important only one process edits
|
|
the file at the same time.
|
|
|
|
Name the semaphore to have multiple different semaphores active at the
|
|
same time:
|
|
|
|
seq 3 | parallel sem --id mymutex sed -i -e 'i{}' myfile
|
|
|
|
|
|
=head1 EXAMPLE: Start editor with filenames from stdin (standard input)
|
|
|
|
You can use GNU Parallel to start interactive programs like emacs or vi:
|
|
|
|
B<cat filelist | parallel -T -X emacs>
|
|
|
|
B<cat filelist | parallel -T -X vi>
|
|
|
|
If there are more files than will fit on a single command line, the
|
|
editor will be started again with the remaining files.
|
|
|
|
|
|
=head1 EXAMPLE: GNU Parallel as queue system/batch manager
|
|
|
|
GNU B<parallel> can work as a simple job queue system or batch manager.
|
|
The idea is to put the jobs into a file and have GNU B<parallel> read
|
|
from that continuously. As GNU B<parallel> will stop at end of file we
|
|
use B<tail> to continue reading:
|
|
|
|
B<echo >>B<jobqueue>; B<tail -f jobqueue | parallel>
|
|
|
|
To submit your jobs to the queue:
|
|
|
|
B<echo my_command my_arg >>>B< jobqueue>
|
|
|
|
You can of course use B<-S> to distribute the jobs to remote
|
|
computers:
|
|
|
|
B<echo >>B<jobqueue>; B<tail -f jobqueue | parallel -S ..>
|
|
|
|
There are two small issues when using GNU B<parallel> as queue
|
|
system/batch manager:
|
|
|
|
=over 2
|
|
|
|
=item *
|
|
|
|
You will get a warning if you do not submit JobSlots jobs within the
|
|
first second. E.g. if you have 8 cores and use B<-j+2> you have to submit
|
|
10 jobs. These can be dummy jobs (e.g. B<echo foo>). You can also simply
|
|
ignore the warning.
|
|
|
|
=item *
|
|
|
|
Jobs will be run immediately, but output from jobs will only be
|
|
printed when JobSlots more jobs have been started. E.g. if you have 10
|
|
jobslots then the output from the first completed job will only be
|
|
printed when job 11 is started.
|
|
|
|
=back
|
|
|
|
|
|
=head1 EXAMPLE: GNU Parallel as dir processor
|
|
|
|
If you have a dir in which users drop files that need to be processed
|
|
you can do this on GNU/Linux (If you know what B<inotifywait> is
|
|
called on other platforms file a bug report):
|
|
|
|
B<inotifywait -q -m -r -e CLOSE_WRITE --format %w%f my_dir | parallel
|
|
-u echo>
|
|
|
|
This will run the command B<echo> on each file put into B<my_dir> or
|
|
subdirs of B<my_dir>.
|
|
|
|
The B<-u> is needed because of a small bug in GNU B<parallel>. If that
|
|
proves to be a problem, file a bug report.
|
|
|
|
You can of course use B<-S> to distribute the jobs to remote
|
|
computers:
|
|
|
|
B<inotifywait -q -m -r -e CLOSE_WRITE --format %w%f my_dir | parallel -S ..
|
|
-u echo>
|
|
|
|
If the files to be processed are in a tar file then unpacking one file
|
|
and processing it immediately may be faster than first unpacking all
|
|
files. Set up the dir processor as above and unpack into the dir.
|
|
|
|
|
|
=head1 QUOTING
|
|
|
|
GNU B<parallel> is very liberal in quoting. You only need to quote
|
|
characters that have special meaning in shell:
|
|
|
|
( ) $ ` ' " < > ; | \
|
|
|
|
and depending on context these need to be quoted, too:
|
|
|
|
* ~ & # ! ? space * {
|
|
|
|
Therefore most people will never need more quoting than putting '\'
|
|
in front of the special characters.
|
|
|
|
However, when you want to use a shell variable you need to quote the
|
|
$-sign. Here is an example using $PARALLEL_SEQ. This variable is set
|
|
by GNU B<parallel> itself, so the evaluation of the $ must be done by
|
|
the sub shell started by GNU B<parallel>:
|
|
|
|
B<seq 10 | parallel -N2 echo seq:\$PARALLEL_SEQ arg1:{1} arg2:{2}>
|
|
|
|
If the variable is set before GNU B<parallel> starts you can do this:
|
|
|
|
B<VAR=this_is_set_before_starting>
|
|
|
|
B<echo test | parallel echo {} $VAR>
|
|
|
|
Prints: B<test this_is_set_before_starting>
|
|
|
|
It is a little more tricky if the variable contains more than one space in a row:
|
|
|
|
B<VAR="two spaces between each word">
|
|
|
|
B<echo test | parallel echo {} \'"$VAR"\'>
|
|
|
|
Prints: B<test two spaces between each word>
|
|
|
|
If the variable should not be evaluated by the shell starting GNU
|
|
B<parallel> but be evaluated by the sub shell started by GNU
|
|
B<parallel>, then you need to quote it:
|
|
|
|
B<echo test | parallel VAR=this_is_set_after_starting \; echo {} \$VAR>
|
|
|
|
Prints: B<test this_is_set_after_starting>
|
|
|
|
It is a little more tricky if the variable contains space:
|
|
|
|
B<echo test | parallel VAR='"two spaces between each word"' echo {} \'"$VAR"\'>
|
|
|
|
Prints: B<test two spaces between each word>
|
|
|
|
$$ is the shell variable containing the process id of the shell. This
|
|
will print the process id of the shell running GNU B<parallel>:
|
|
|
|
B<seq 10 | parallel echo $$>
|
|
|
|
And this will print the process ids of the sub shells started by GNU
|
|
B<parallel>.
|
|
|
|
B<seq 10 | parallel echo \$\$>
|
|
|
|
If the special characters should not be evaluated by the sub shell
|
|
then you need to protect it against evaluation from both the shell
|
|
starting GNU B<parallel> and the sub shell:
|
|
|
|
B<echo test | parallel echo {} \\\$VAR>
|
|
|
|
Prints: B<test $VAR>
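
The three levels of quoting can be tried without GNU B<parallel>. In
this sketch B<sh -c> is only a stand-in for the sub shell that GNU
B<parallel> starts, and the values B<outer> and B<inner> are made up
for the illustration:

```shell
VAR=outer

# Unquoted: the calling shell expands $VAR before the sub shell starts.
by_caller=$(sh -c "VAR=inner; echo $VAR")

# One level of quoting: the sub shell expands it.
by_subshell=$(sh -c "VAR=inner; echo \$VAR")

# Protected from both shells: the text $VAR is printed as-is.
by_nobody=$(sh -c "VAR=inner; echo \\\$VAR")

printf '%s\n' "$by_caller" "$by_subshell" "$by_nobody"
```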
|
|
|
|
GNU B<parallel> can protect against evaluation by the sub shell by
|
|
using -q:
|
|
|
|
B<echo test | parallel -q echo {} \$VAR>
|
|
|
|
Prints: B<test $VAR>
|
|
|
|
This is particularly useful if you have lots of quoting. If you want to run a perl script like this:
|
|
|
|
B<perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' file>
|
|
|
|
It needs to be quoted like this:
|
|
|
|
B<ls | parallel perl -ne '/^\\S+\\s+\\S+\$/\ and\ print\ \$ARGV,\"\\n\"'>
|
|
|
|
Notice how spaces, \'s, "'s, and $'s need to be quoted. GNU B<parallel>
|
|
can do the quoting by using option -q:
|
|
|
|
B<ls | parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"'>
|
|
|
|
However, this means you cannot make the sub shell interpret special
|
|
characters. For example because of B<-q> this WILL NOT WORK:
|
|
|
|
B<ls *.gz | parallel -q "zcat {} >>B<{.}">
|
|
|
|
B<ls *.gz | parallel -q "zcat {} | bzip2 >>B<{.}.bz2">
|
|
|
|
because > and | need to be interpreted by the sub shell.
|
|
|
|
If you get errors like:
|
|
|
|
sh: -c: line 0: syntax error near unexpected token
|
|
sh: Syntax error: Unterminated quoted string
|
|
sh: -c: line 0: unexpected EOF while looking for matching `''
|
|
sh: -c: line 1: syntax error: unexpected end of file
|
|
|
|
then you might try using B<-q>.
|
|
|
|
If you are using B<bash> process substitution like B<<(cat foo)> then
|
|
you may try B<-q> and prepending I<command> with B<bash -c>:
|
|
|
|
B<ls | parallel -q bash -c 'wc -c <(echo {})'>
|
|
|
|
Or for substituting output:
|
|
|
|
B<ls | parallel -q bash -c 'tar c {} | tee >>B<(gzip >>B<{}.tar.gz) | bzip2 >>B<{}.tar.bz2'>
|
|
|
|
B<Conclusion>: To avoid dealing with the quoting problems it may be
|
|
easier just to write a small script and have GNU B<parallel> call that
|
|
script.
|
|
|
|
|
|
=head1 LIST RUNNING JOBS
|
|
|
|
If you want a list of the jobs currently running you can run:
|
|
|
|
B<killall -USR1 parallel>
|
|
|
|
GNU B<parallel> will then print the currently running jobs on STDERR.
|
|
|
|
|
|
=head1 COMPLETE RUNNING JOBS BUT DO NOT START NEW JOBS
|
|
|
|
If you regret starting a lot of jobs you can simply break GNU B<parallel>,
|
|
but if you want to make sure you do not have half-completed jobs you
|
|
should send the signal B<SIGTERM> to GNU B<parallel>:
|
|
|
|
B<killall -TERM parallel>
|
|
|
|
This will tell GNU B<parallel> to not start any new jobs, but wait until
|
|
the currently running jobs are finished before exiting.
|
|
|
|
|
|
=head1 ENVIRONMENT VARIABLES
|
|
|
|
=over 9
|
|
|
|
=item $PARALLEL_PID
|
|
|
|
The environment variable $PARALLEL_PID is set by GNU B<parallel> and
|
|
is visible to the jobs started from GNU B<parallel>. This makes it
|
|
possible for the jobs to communicate directly to GNU B<parallel>.
|
|
Remember to quote the $, so it gets evaluated by the correct
|
|
shell.
|
|
|
|
B<Example:> If each of the jobs tests a solution and one of the jobs finds
|
|
the solution the job can tell GNU B<parallel> not to start more jobs
|
|
by: B<kill -TERM $PARALLEL_PID>. This only works on the local
|
|
computer.
|
|
|
|
|
|
=item $PARALLEL_SEQ
|
|
|
|
$PARALLEL_SEQ will be set to the sequence number of the job
|
|
running. Remember to quote the $, so it gets evaluated by the correct
|
|
shell.
|
|
|
|
B<Example:>
|
|
|
|
B<seq 10 | parallel -N2 echo seq:'$'PARALLEL_SEQ arg1:{1} arg2:{2}>
|
|
|
|
|
|
=item $TMPDIR
|
|
|
|
Directory for temporary files. See: B<--tmpdir>.
|
|
|
|
|
|
=item $PARALLEL
|
|
|
|
The environment variable $PARALLEL will be used as default options for
|
|
GNU B<parallel>. If the variable contains special shell characters
|
|
(e.g. $, *, or space) then these need to be escaped with \.
|
|
|
|
B<Example:>
|
|
|
|
B<cat list | parallel -j1 -k -v ls>
|
|
|
|
can be written as:
|
|
|
|
B<cat list | PARALLEL="-kvj1" parallel ls>
|
|
|
|
B<cat list | parallel -j1 -k -v -S"myssh user@server" ls>
|
|
|
|
can be written as:
|
|
|
|
B<cat list | PARALLEL='-kvj1 -S myssh\ user@server' parallel ls>
|
|
|
|
Notice the \ in the middle is needed because 'myssh' and 'user@server'
|
|
must be one argument.
|
|
|
|
=back
|
|
|
|
|
|
=head1 DEFAULT PROFILE (CONFIG FILE)
|
|
|
|
The file ~/.parallel/config (formerly known as .parallelrc) will be
|
|
read if it exists. Lines starting with '#' will be ignored. It can be
|
|
formatted like the environment variable $PARALLEL, but it is often
|
|
easier to simply put each option on its own line.
|
|
|
|
Options on the command line take precedence over the environment
|
|
variable $PARALLEL which takes precedence over the file
|
|
~/.parallel/config.
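
A minimal sketch of such a file with one option per line. It is written to a scratch directory rather than to the real ~/.parallel/config so that nothing is overwritten. The options B<-j1>, B<-k> and B<-v> are taken from the $PARALLEL example above; the variable name is hypothetical:

```shell
# Write the example profile to a scratch directory; copy it to
# ~/.parallel/config yourself if you want GNU parallel to read it.
profile_dir=$(mktemp -d)
cat > "$profile_dir/config" <<'EOF'
# Lines starting with '#' are ignored
-j1
-k
-v
EOF
cat "$profile_dir/config"
```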
|
|
|
|
|
|
=head1 PROFILE FILES
|
|
|
|
If B<--profile> is set, GNU B<parallel> will read the profile from that file instead of
|
|
~/.parallel/config.
|
|
|
|
Example: Profile for running every command with B<-j-1> and B<nice>
|
|
|
|
echo -j-1 nice > ~/.parallel/nice_profile
|
|
parallel -J nice_profile bzip2 -9 ::: *
|
|
|
|
Example: Profile for running a perl script before every command:
|
|
|
|
echo "perl -e '\$a=\$\$; print \$a,\" \",'\$PARALLEL_SEQ',\" \";';" > ~/.parallel/pre_perl
|
|
parallel -J pre_perl echo ::: *
|
|
|
|
Note how the $ and " need to be quoted using \.
|
|
|
|
Example: Profile for running distributed jobs with B<nice> on the
|
|
remote computers:
|
|
|
|
echo -S .. nice > ~/.parallel/dist
|
|
parallel -J dist --trc {.}.bz2 bzip2 -9 ::: *
|
|
|
|
|
|
=head1 EXIT STATUS
|
|
|
|
If B<--halt-on-error> 0 or not specified:
|
|
|
|
=over 6
|
|
|
|
=item 0
|
|
|
|
All jobs ran without error.
|
|
|
|
=item 1-253
|
|
|
|
Some of the jobs failed. The exit status gives the number of failed jobs.
|
|
|
|
=item 254
|
|
|
|
More than 253 jobs failed.
|
|
|
|
=item 255
|
|
|
|
Other error.
|
|
|
|
=back
|
|
|
|
If B<--halt-on-error> 1 or 2: Exit status of the failing job.
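
The mapping for B<--halt-on-error> 0 can be sketched as a small shell function. It only illustrates the rule described above; it is not taken from the source of GNU B<parallel>, and the function name B<expected_exit_status> is hypothetical:

```shell
# Print the exit status documented above for a given number of failed jobs.
expected_exit_status() {
    failed=$1
    if [ "$failed" -gt 253 ]; then
        echo 254          # more than 253 jobs failed
    else
        echo "$failed"    # 0 = no failures, 1-253 = number of failed jobs
    fi
}

expected_exit_status 0
expected_exit_status 17
expected_exit_status 1000
```

A calling script can therefore treat any status from 1 to 253 as a count of failed jobs, and 254 as "too many to count".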
|
|
|
|
|
|
=head1 DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES
|
|
|
|
There are a lot of programs with some of the functionality of GNU
|
|
B<parallel>. GNU B<parallel> strives to include the best of the
|
|
functionality without sacrificing ease of use.
|
|
|
|
=head2 SUMMARY TABLE
|
|
|
|
The following features are in some of the comparable tools:
|
|
|
|
Inputs
|
|
I1. Arguments can be read from stdin
|
|
I2. Arguments can be read from a file
|
|
I3. Arguments can be read from multiple files
|
|
I4. Arguments can be read from command line
|
|
I5. Arguments can be read from a table
|
|
I6. Arguments can be read from the same file using #! (shebang)
|
|
I7. Line oriented input as default (Quoting of special chars not needed)
|
|
|
|
Manipulation of input
|
|
M1. Composed command
|
|
M2. Multiple arguments can fill up an execution line
|
|
M3. Arguments can be put anywhere in the execution line
|
|
M4. Multiple arguments can be put anywhere in the execution line
|
|
M5. Arguments can be replaced with context
|
|
M6. Input can be treated as complete execution line
|
|
|
|
Outputs
|
|
O1. Grouping output so output from different jobs do not mix
|
|
O2. Send stderr to stderr
|
|
O3. Send stdout to stdout
|
|
O4. Order of output can be same as order of input
|
|
O5. Stdout only contains stdout from the command
|
|
O6. Stderr only contains stderr from the command
|
|
|
|
Execution
|
|
E1. Running jobs in parallel
|
|
E2. List running jobs
|
|
E3. Finish running jobs, but do not start new jobs
|
|
E4. Number of running jobs can depend on number of cpus
|
|
E5. Finish running jobs, but do not start new jobs after first failure
|
|
E6. Number of running jobs can be adjusted while running
|
|
|
|
Remote execution
|
|
R1. Jobs can be run on remote computers
|
|
R2. Basefiles can be transferred
|
|
R3. Argument files can be transferred
|
|
R4. Result files can be transferred
|
|
R5. Cleanup of transferred files
|
|
R6. No config files needed
|
|
R7. Do not run more than SSHD's MaxStartup can handle
|
|
R8. Configurable SSH command
|
|
R9. Retry if connection breaks occasionally
|
|
|
|
Semaphore
|
|
S1. Possibility to work as a mutex
|
|
S2. Possibility to work as a counting semaphore
|
|
|
|
Legend
|
|
- = no
|
|
x = not applicable
|
|
ID = yes
|
|
|
|
As not every new version of the programs is tested the table may be
|
|
outdated. Please file a bug-report if you find errors (See REPORTING
|
|
BUGS).
|
|
|
|
parallel:
|
|
I1 I2 I3 I4 I5 I6 I7
|
|
M1 M2 M3 M4 M5 M6
|
|
O1 O2 O3 O4 O5 O6
|
|
E1 E2 E3 E4 E5 E6
|
|
R1 R2 R3 R4 R5 R6 R7 R8 R9
|
|
S1 S2
|
|
|
|
xargs:
|
|
I1 I2 - - - - -
|
|
- M2 M3 - - -
|
|
- O2 O3 - O5 O6
|
|
E1 - - - - -
|
|
- - - - - x - - -
|
|
- -
|
|
|
|
find -exec:
|
|
- - - x - x -
|
|
- M2 M3 - - -
|
|
- O2 O3 O4 O5 O6
|
|
- - - - - -
|
|
- - - - - - - - -
|
|
x x
|
|
|
|
make -j:
|
|
- - - - - - -
|
|
- - - - - -
|
|
O1 O2 O3 - x O6
|
|
E1 - - - E5 -
|
|
- - - - - - - - -
|
|
- -
|
|
|
|
ppss:
|
|
I1 I2 - - - - I7
|
|
M1 - M3 - - M6
|
|
O1 - - x - -
|
|
E1 E2 ?E3 E4 - -
|
|
R1 R2 R3 R4 - - ?R7 ? ?
|
|
- -
|
|
|
|
pexec:
|
|
I1 I2 - I4 I5 - -
|
|
M1 - M3 - - M6
|
|
O1 O2 O3 - O5 O6
|
|
E1 - - E4 - E6
|
|
R1 - - - - R6 - - -
|
|
S1 -
|
|
|
|
xjobs: TODO - Please file a bug-report if you know what features xjobs
|
|
supports (See REPORTING BUGS).
|
|
|
|
prll: TODO - Please file a bug-report if you know what features prll
|
|
supports (See REPORTING BUGS).
|
|
|
|
dxargs: TODO - Please file a bug-report if you know what features dxargs
|
|
supports (See REPORTING BUGS).
|
|
|
|
mdm/middleman: TODO - Please file a bug-report if you know what
|
|
features mdm/middleman supports (See REPORTING BUGS).
|
|
|
|
xapply: TODO - Please file a bug-report if you know what features xapply
|
|
supports (See REPORTING BUGS).
|
|
|
|
paexec: TODO - Please file a bug-report if you know what features paexec
|
|
supports (See REPORTING BUGS).
|
|
|
|
ClusterSSH: TODO - Please file a bug-report if you know what features ClusterSSH
|
|
supports (See REPORTING BUGS).
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN xargs AND GNU Parallel
|
|
|
|
B<xargs> offers some of the same possibilities as GNU B<parallel>.
|
|
|
|
B<xargs> deals badly with special characters (such as space, ' and
|
|
"). To see the problem try this:
|
|
|
|
touch important_file
|
|
touch 'not important_file'
|
|
ls not* | xargs rm
|
|
mkdir -p "My brother's 12\" records"
|
|
ls | xargs rmdir
|
|
|
|
You can specify B<-0> or B<-d "\n">, but many input generators are not
|
|
optimized for using B<NUL> as separator but are optimized for
|
|
B<newline> as separator. E.g. B<head>, B<tail>, B<awk>, B<ls>, B<echo>,
|
|
B<sed>, B<tar -v>, B<perl> (B<-0> and \0 instead of \n), B<locate>
|
|
(requires using B<-0>), B<find> (requires using B<-print0>), B<grep>
|
|
(requires user to use B<-z> or B<-Z>), B<sort> (requires using B<-z>).
|
|
|
|
So GNU B<parallel>'s newline separation can be emulated with:
|
|
|
|
B<cat | xargs -d "\n" -n1 I<command>>
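
The effect of the separator can be checked without deleting anything by replacing B<rm> with B<echo>. This sketch uses B<-0> together with B<tr> and assumes an B<xargs> that supports B<-0>; the variable names are hypothetical:

```shell
# Default xargs splits on whitespace: the file name becomes two arguments.
split_count=$(printf '%s\n' 'not important_file' | xargs -n1 echo | wc -l)

# With NUL as separator the file name stays one argument.
whole_count=$(printf '%s\n' 'not important_file' | tr '\n' '\0' | xargs -0 -n1 echo | wc -l)

echo "default separator: $split_count arguments"
echo "NUL separator: $whole_count argument"
```

With B<rm> in place of B<echo> the first form would try to remove both B<not> and B<important_file>.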
|
|
|
|
B<xargs> can run a given number of jobs in parallel, but has no
|
|
support for running number-of-cpu-cores jobs in parallel.
|
|
|
|
B<xargs> has no support for grouping the output, therefore output may
|
|
run together, e.g. the first half of a line is from one process and
|
|
the last half of the line is from another process. The example
|
|
B<Parallel grep> cannot be done reliably with B<xargs> because of
|
|
this. To see this in action try:
|
|
|
|
parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} ::: a b c d e f
|
|
ls -l a b c d e f
|
|
parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
|
|
echo a b c d e f | xargs -P4 -n1 grep 1 > out.xargs-unbuf
|
|
echo a b c d e f | xargs -P4 -n1 grep --line-buffered 1 > out.xargs-linebuf
|
|
echo a b c d e f | xargs -n1 grep --line-buffered 1 > out.xargs-serial
|
|
ls -l out*
|
|
md5sum out*
|
|
|
|
B<xargs> has no support for keeping the order of the output, therefore
|
|
if running jobs in parallel using B<xargs> the output of the second
|
|
job cannot be postponed till the first job is done.
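
A small sketch of the ordering issue, assuming an B<xargs> that supports B<-P> (for example GNU B<xargs>): the job that sleeps for the shortest time prints first, regardless of the input order. The variable name is hypothetical:

```shell
# Two jobs run at the same time; each sleeps for its argument in seconds
# and then prints it. Output follows completion order, not input order.
completion_order=$(printf '2\n1\n' | xargs -P2 -I{} sh -c 'sleep {}; echo {}')
echo "$completion_order"
```

The input order is 2, 1 but the output order is 1, 2. With GNU B<parallel> the B<-k> option keeps the output in the same order as the input.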
|
|
|
|
B<xargs> has no support for running jobs on remote computers.
|
|
|
|
B<xargs> has no support for context replace, so you will have to create the
|
|
arguments.
|
|
|
|
If you use a replace string in B<xargs> (B<-I>) you cannot force
|
|
B<xargs> to use more than one argument.
|
|
|
|
Quoting in B<xargs> works like B<-q> in GNU B<parallel>. This means
|
|
composed commands and redirection require using B<bash -c>.
|
|
|
|
B<ls | parallel "wc {} >> B<{}.wc">
|
|
|
|
becomes (assuming you have 8 cores)
|
|
|
|
B<ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >>B< {}.wc">
|
|
|
|
and
|
|
|
|
B<ls | parallel "echo {}; ls {}|wc">
|
|
|
|
becomes (assuming you have 8 cores)
|
|
|
|
B<ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc">
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN find -exec AND GNU Parallel
|
|
|
|
B<find -exec> offers some of the same possibilities as GNU B<parallel>.
|
|
|
|
B<find -exec> only works on files. So processing other input (such as
|
|
hosts or URLs) will require creating these inputs as files. B<find
|
|
-exec> has no support for running commands in parallel.
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN make -j AND GNU Parallel
|
|
|
|
B<make -j> can run jobs in parallel, but requires a crafted Makefile
|
|
to do this. That results in extra quoting to get filenames containing
newlines to work correctly.
|
|
|
|
B<make -j> has no support for grouping the output, therefore output
|
|
may run together, e.g. the first half of a line is from one process
|
|
and the last half of the line is from another process. The example
|
|
B<Parallel grep> cannot be done reliably with B<make -j> because of
|
|
this.
|
|
|
|
(Very early versions of GNU B<parallel> were coincidentally implemented
|
|
using B<make -j>).
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN ppss AND GNU Parallel
|
|
|
|
B<ppss> is also a tool for running jobs in parallel.
|
|
|
|
The output of B<ppss> is status information and thus not useful for
|
|
using as input for another command. The output from the jobs is put
|
|
into files.
|
|
|
|
The argument replace string ($ITEM) cannot be changed. Arguments must
|
|
be quoted - thus arguments containing special characters (space '"&!*)
|
|
may cause problems. More than one argument is not supported. File
|
|
names containing newlines are not processed correctly. When reading
|
|
input from a file null cannot be used as a terminator. B<ppss> needs
|
|
to read the whole input file before starting any jobs.
|
|
|
|
Output and status information is stored in ppss_dir and thus requires
|
|
cleanup when completed. If the dir is not removed before running
|
|
B<ppss> again it may cause nothing to happen as B<ppss> thinks the
|
|
task is already done. GNU B<parallel> will normally not need cleaning
|
|
up if running locally and will only need cleaning up if stopped
|
|
abnormally and running remote (B<--cleanup> may not complete if
|
|
stopped abnormally). The example B<Parallel grep> would require extra
|
|
postprocessing if written using B<ppss>.
|
|
|
|
For remote systems PPSS requires 3 steps: config, deploy, and
|
|
start. GNU B<parallel> only requires one step.
|
|
|
|
=head3 EXAMPLES FROM ppss MANUAL
|
|
|
|
Here are the examples from B<ppss>'s manual page with the equivalent
|
|
using GNU B<parallel>:
|
|
|
|
B<1> ./ppss.sh standalone -d /path/to/files -c 'gzip '
|
|
|
|
B<1> find /path/to/files -type f | parallel gzip
|
|
|
|
B<2> ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '
|
|
|
|
B<2> find /path/to/files -type f | parallel cp {} /destination/dir
|
|
|
|
B<3> ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '
|
|
|
|
B<3> parallel -a list-of-urls.txt wget -q
|
|
|
|
B<4> ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'
|
|
|
|
B<4> parallel -a list-of-urls.txt wget -q {}
|
|
|
|
B<5> ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir -m
|
|
192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh -n nodes.txt -o
|
|
/some/output/dir --upload --download ; ./ppss deploy -C config.cfg ;
|
|
./ppss start -C config
|
|
|
|
B<5> # parallel does not use configs. If you want a different username put it in nodes.txt: user@hostname
|
|
|
|
B<5> find source/dir -type f | parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} -o {.}.mp3 --preset standard --quiet
|
|
|
|
B<6> ./ppss stop -C config.cfg
|
|
|
|
B<6> killall -TERM parallel
|
|
|
|
B<7> ./ppss pause -C config.cfg
|
|
|
|
B<7> Press: CTRL-Z or killall -SIGTSTP parallel
|
|
|
|
B<8> ./ppss continue -C config.cfg
|
|
|
|
B<8> Enter: fg or killall -SIGCONT parallel
|
|
|
|
B<9> ./ppss.sh status -C config.cfg
|
|
|
|
B<9> killall -SIGUSR2 parallel
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN pexec AND GNU Parallel
|
|
|
|
B<pexec> is also a tool for running jobs in parallel.
|
|
|
|
Here are the examples from B<pexec>'s info page with the equivalent
|
|
using GNU B<parallel>:
|
|
|
|
B<1> pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
|
|
'echo "scale=10000;sqrt($NUM)" | bc'
|
|
|
|
B<1> seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | bc > sqrt-{}.dat'
|
|
|
|
B<2> pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort
|
|
|
|
B<2> ls myfiles*.ext | parallel sort {} ">{}.sort"
|
|
|
|
B<3> pexec -f image.list -n auto -e B -u star.log -c -- \
|
|
'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'
|
|
|
|
B<3> parallel -a image.list \
|
|
'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log
|
|
|
|
B<4> pexec -r *.png -e IMG -c -o - -- \
|
|
'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'
|
|
|
|
B<4> ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'
|
|
|
|
B<5> pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
B<5> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'
|
|
|
|
B<6> for p in *.png ; do echo ${p%.png} ; done | \
|
|
pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
B<6> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
|
|
|
|
B<7> LIST=$(for p in *.png ; do echo ${p%.png} ; done)
|
|
pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
B<7> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
|
|
|
|
B<8> pexec -n 8 -r *.jpg -y unix -e IMG -c \
|
|
'pexec -j -m blockread -d $IMG | \
|
|
jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
|
|
pexec -j -m blockwrite -s th_$IMG'
|
|
|
|
B<8> Combining GNU B<parallel> and GNU B<sem>.
|
|
|
|
B<8> ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
|
|
'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'
|
|
|
|
B<8> If reading and writing is done to the same disk, this may be
|
|
faster as only one process will be either reading or writing:
|
|
|
|
B<8> ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
|
|
'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'
|
|
|
|
=head2 DIFFERENCES BETWEEN xjobs AND GNU Parallel
|
|
|
|
B<xjobs> is also a tool for running jobs in parallel. It only supports
|
|
running jobs on your local computer.
|
|
|
|
B<xjobs> deals badly with special characters just like B<xargs>. See
|
|
the section B<DIFFERENCES BETWEEN xargs AND GNU Parallel>.
|
|
|
|
Here are the examples from B<xjobs>'s man page with the equivalent
|
|
using GNU B<parallel>:
|
|
|
|
B<1> ls -1 *.zip | xjobs unzip
|
|
|
|
B<1> ls *.zip | parallel unzip
|
|
|
|
B<2> ls -1 *.zip | xjobs -n unzip
|
|
|
|
B<2> ls *.zip | parallel unzip >/dev/null
|
|
|
|
B<3> find . -name '*.bak' | xjobs gzip
|
|
|
|
B<3> find . -name '*.bak' | parallel gzip
|
|
|
|
B<4> ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf
|
|
|
|
B<4> ls *.jar | parallel jar tf {} '>' {}.idx
|
|
|
|
B<5> xjobs -s script
|
|
|
|
B<5> cat script | parallel
|
|
|
|
B<6> mkfifo /var/run/my_named_pipe;
|
|
xjobs -s /var/run/my_named_pipe &
|
|
echo unzip 1.zip >> /var/run/my_named_pipe;
|
|
echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
|
|
|
|
B<6> mkfifo /var/run/my_named_pipe;
|
|
cat /var/run/my_named_pipe | parallel &
|
|
echo unzip 1.zip >> /var/run/my_named_pipe;
|
|
echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN prll AND GNU Parallel
|
|
|
|
B<prll> is also a tool for running jobs in parallel. It does not
|
|
support running jobs on remote computers.
|
|
|
|
B<prll> encourages using BASH aliases and BASH functions instead of
|
|
scripts. GNU B<parallel> will never support running aliases and
|
|
functions (see why
|
|
http://www.perlmonks.org/index.pl?node_id=484296). However, scripts or
|
|
composed commands work just fine.
|
|
|
|
B<prll> generates a lot of status information on STDERR which makes it
|
|
harder to use the STDERR output of the job directly as input for
|
|
another program.
|
|
|
|
Here is the example from B<prll>'s man page with the equivalent
|
|
using GNU B<parallel>:
|
|
|
|
prll -s 'mogrify -flip $1' *.jpg
|
|
|
|
parallel mogrify -flip ::: *.jpg
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN dxargs AND GNU Parallel
|
|
|
|
B<dxargs> is also a tool for running jobs in parallel.
|
|
|
|
B<dxargs> does not deal well with more simultaneous jobs than SSHD's
|
|
MaxStartup. B<dxargs> is only built for remote run jobs, but does not
|
|
support transferring of files.
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
|
|
|
|
middleman (mdm) is also a tool for running jobs in parallel.
|
|
|
|
Here are the shell scripts of http://mdm.berlios.de/usage.html ported
|
|
to GNU B<parallel>:
|
|
|
|
B<seq 19 | parallel buffon -o - | sort -n >>B< result>
|
|
|
|
B<cat files | parallel cmd>
|
|
|
|
B<find dir -execdir sem cmd {} \;>
|
|
|
|
=head2 DIFFERENCES BETWEEN xapply AND GNU Parallel
|
|
|
|
B<xapply> can run jobs in parallel on the local computer.
|
|
|
|
Here are the examples from B<xapply>'s man page with the equivalent
|
|
using GNU B<parallel>:
|
|
|
|
B<1> xapply '(cd %1 && make all)' */
|
|
|
|
B<1> parallel 'cd {} && make all' ::: */
|
|
|
|
B<2> xapply -f 'diff %1 ../version5/%1' manifest | more
|
|
|
|
B<2> parallel diff {} ../version5/{} < manifest | more
|
|
|
|
B<3> xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1
|
|
|
|
B<3> parallel diff {1} {2} :::: manifest1 checklist1
|
|
|
|
B<4> xapply 'indent' *.c
|
|
|
|
B<4> parallel indent ::: *.c
|
|
|
|
B<5> find ~ksb/bin -type f ! -perm -111 -print | xapply -f -v 'chmod a+x' -
|
|
|
|
B<5> find ~ksb/bin -type f ! -perm -111 -print | parallel -v chmod a+x
|
|
|
|
B<6> find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -
|
|
|
|
B<6> sh <(find */ -... | parallel -s 1024 echo vi)
|
|
|
|
B<6> find */ -... | parallel -s 1024 -Xuj1 vi
|
|
|
|
B<7> find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -
|
|
|
|
B<7> sh <(find ... |parallel -n5 echo vi)
|
|
|
|
B<7> find ... |parallel -n5 -uj1 vi
|
|
|
|
B<8> xapply -fn "" /etc/passwd
|
|
|
|
B<8> parallel -k echo < /etc/passwd
|
|
|
|
B<9> tr ':' '\012' < /etc/passwd | xapply -7 -nf 'chown %1 %6' - - - - - - -
|
|
|
|
B<9> tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}
|
|
|
|
B<10> xapply '[ -d %1/RCS ] || echo %1' */
|
|
|
|
B<10> parallel '[ -d {}/RCS ] || echo {}' ::: */
|
|
|
|
B<11> xapply -f '[ -f %1 ] && echo %1' List | ...
|
|
|
|
B<11> parallel '[ -f {} ] && echo {}' < List | ...
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN paexec AND GNU Parallel
|
|
|
|
B<paexec> can run jobs in parallel on both the local and remote computers.
|
|
|
|
B<paexec> requires commands to print a blank line as the last
|
|
output. This means you will have to write a wrapper for most programs.
|
|
|
|
B<paexec> has a job dependency facility so a job can depend on another
|
|
job to be executed successfully. Sort of a poor-man's B<make>.
|
|
|
|
Here are the examples from B<paexec>'s example catalog with the equivalent
|
|
using GNU B<parallel>:
|
|
|
|
=over 1
|
|
|
|
=item 1_div_X_run:
|
|
|
|
../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]
|
|
parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]
|
|
|
|
=item all_substr_run:
|
|
|
|
../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]
|
|
parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]
|
|
|
|
=item cc_wrapper_run:
|
|
|
|
../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
|
|
-n 'host1 host2' \
|
|
-t '/usr/bin/ssh -x' <<EOF [...]
|
|
parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
|
|
-S host1,host2 <<EOF [...]
|
|
# This is not exactly the same, but avoids the wrapper
|
|
parallel gcc -O2 -c -o {.}.o {} \
|
|
-S host1,host2 <<EOF [...]
|
|
|
|
=item toupper_run:
|
|
|
|
../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]
|
|
parallel echo {} '|' ./toupper_cmd <<EOF [...]
|
|
# Without the wrapper:
|
|
parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]
|
|
|
|
=back
|
|
|
|
=head2 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel
|
|
|
|
ClusterSSH solves a different problem than GNU B<parallel>.
|
|
|
|
ClusterSSH runs the same command with the same arguments on a list of
|
|
computers - one per computer. This is typically used for administering
|
|
several computers that are almost identical.
|
|
|
|
GNU B<parallel> runs the same (or different) commands with different
|
|
arguments in parallel possibly using remote computers to help
|
|
computing. If more than one computer is listed in B<-S> GNU B<parallel> may
|
|
only use one of these (e.g. if there are 8 jobs to be run and one
|
|
computer has 8 cores).
|
|
|
|
GNU B<parallel> can be used as a poor-man's version of ClusterSSH:
|
|
|
|
B<cat hostlist | parallel ssh {} do_stuff>
|
|
|
|
|
|
=head1 BUGS
|
|
|
|
=head2 Quoting of newline
|
|
|
|
Because of the way newline is quoted this will not work:
|
|
|
|
echo 1,2,3 | parallel -vkd, "echo 'a{}b'"
|
|
|
|
However, these will all work:
|
|
|
|
echo 1,2,3 | parallel -vkd, echo a{}b
|
|
|
|
echo 1,2,3 | parallel -vkd, "echo 'a'{}'b'"
|
|
|
|
echo 1,2,3 | parallel -vkd, "echo 'a'"{}"'b'"
|
|
|
|
|
|
=head2 Startup speed
|
|
|
|
GNU B<parallel> is slow at starting up. Half of the startup time on
|
|
the local computer is spent finding the maximal length of a command
|
|
line. Setting B<-s> will remove this part of the startup time.
|
|
|
|
When using multiple computers GNU B<parallel> opens B<ssh> connections
|
|
to them to figure out how many connections can be used reliably
|
|
simultaneously (Namely SSHD's MaxStartup). This test is done for each
|
|
host in serial, so if your --sshloginfile contains many hosts it may
|
|
be slow.
|
|
|
|
=head2 --nice limits command length
|
|
|
|
The current implementation of B<--nice> is too pessimistic in the max
|
|
allowed command length. It only uses a little more than half of what
|
|
it could. This affects -X and -m. If this becomes a real problem for
|
|
you, file a bug-report.
|
|
|
|
=head2 Aliases and functions do not work
|
|
|
|
If you get:
|
|
|
|
B<Can't exec "I<command>": No such file or directory>
|
|
|
|
or:
|
|
|
|
B<open3: exec of I<command> failed>
|
|
|
|
it may be because I<command> is not known, but it could also be
|
|
because I<command> is an alias or a function. GNU B<parallel> will
|
|
never support running aliases and functions (see why
|
|
http://www.perlmonks.org/index.pl?node_id=484296), so change your
|
|
alias or function to a script.
|
|
|
|
|
|
=head1 REPORTING BUGS
|
|
|
|
Report bugs to <bug-parallel@gnu.org> or
|
|
https://savannah.gnu.org/bugs/?func=additem&group=parallel
|
|
|
|
Your bug report should always include:
|
|
|
|
=over 2
|
|
|
|
=item *
|
|
|
|
The output of B<parallel --version>. If you are not running the latest
|
|
released version you should specify why you believe the problem is not
|
|
fixed in that version.
|
|
|
|
=item *
|
|
|
|
A complete example that others can run that shows the problem. A
|
|
combination of B<seq>, B<echo>, and B<sleep> can reproduce most
|
|
errors.
|
|
|
|
=back
|
|
|
|
|
|
=head1 AUTHOR
|
|
|
|
When using GNU Parallel for a publication please cite:
|
|
|
|
O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
|
|
The USENIX Magazine, February 2011:42-47.
|
|
|
|
Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
|
|
|
|
Copyright (C) 2008,2009,2010 Ole Tange, http://ole.tange.dk
|
|
|
|
Copyright (C) 2010,2011 Ole Tange, http://ole.tange.dk and Free
|
|
Software Foundation, Inc.
|
|
|
|
Parts of the manual concerning B<xargs> compatibility are inspired by
|
|
the manual of B<xargs> from GNU findutils 4.4.2.
|
|
|
|
|
|
=head1 LICENSE
|
|
|
|
Copyright (C) 2007,2008,2009,2010,2011 Free Software Foundation, Inc.
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
it under the terms of the GNU General Public License as published by
|
|
the Free Software Foundation; either version 3 of the License, or
|
|
at your option any later version.
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
GNU General Public License for more details.
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
=head2 Documentation license I
|
|
|
|
Permission is granted to copy, distribute and/or modify this documentation
|
|
under the terms of the GNU Free Documentation License, Version 1.3 or
|
|
any later version published by the Free Software Foundation; with no
|
|
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
|
|
Texts. A copy of the license is included in the file fdl.txt.
|
|
|
|
=head2 Documentation license II
|
|
|
|
You are free:
|
|
|
|
=over 9
|
|
|
|
=item B<to Share>
|
|
|
|
to copy, distribute and transmit the work
|
|
|
|
=item B<to Remix>
|
|
|
|
to adapt the work
|
|
|
|
=back
|
|
|
|
Under the following conditions:
|
|
|
|
=over 9
|
|
|
|
=item B<Attribution>
|
|
|
|
You must attribute the work in the manner specified by the author or
|
|
licensor (but not in any way that suggests that they endorse you or
|
|
your use of the work).
|
|
|
|
=item B<Share Alike>
|
|
|
|
If you alter, transform, or build upon this work, you may distribute
|
|
the resulting work only under the same, similar or a compatible
|
|
license.
|
|
|
|
=back
|
|
|
|
With the understanding that:
|
|
|
|
=over 9
|
|
|
|
=item B<Waiver>
|
|
|
|
Any of the above conditions can be waived if you get permission from
|
|
the copyright holder.
|
|
|
|
=item B<Public Domain>
|
|
|
|
Where the work or any of its elements is in the public domain under
|
|
applicable law, that status is in no way affected by the license.
|
|
|
|
=item B<Other Rights>
|
|
|
|
In no way are any of the following rights affected by the license:
|
|
|
|
=over 2
|
|
|
|
=item *
|
|
|
|
Your fair dealing or fair use rights, or other applicable
|
|
copyright exceptions and limitations;
|
|
|
|
=item *
|
|
|
|
The author's moral rights;
|
|
|
|
=item *
|
|
|
|
Rights other persons may have either in the work itself or in
|
|
how the work is used, such as publicity or privacy rights.
|
|
|
|
=back
|
|
|
|
=back
|
|
|
|
=over 9
|
|
|
|
=item B<Notice>
|
|
|
|
For any reuse or distribution, you must make clear to others the
|
|
license terms of this work.
|
|
|
|
=back
|
|
|
|
A copy of the full license is included in the file cc-by-sa.txt.
|
|
|
|
|
|
=head1 DEPENDENCIES
|
|
|
|
GNU B<parallel> uses Perl, and the Perl modules Getopt::Long,
|
|
IPC::Open3, Symbol, IO::File, POSIX, and File::Temp. For remote usage
|
|
it also uses rsync with ssh.
|
|
|
|
|
|
=head1 SEE ALSO
|
|
|
|
B<ssh>(1), B<rsync>(1), B<find>(1), B<xargs>(1), B<dirname>(1),
|
|
B<make>(1), B<pexec>(1), B<ppss>(1), B<xjobs>(1), B<prll>(1),
|
|
B<dxargs>(1), B<mdm>(1)
|
|
|
|
=cut
|