mirror of
https://git.savannah.gnu.org/git/parallel.git
synced 2024-11-22 05:57:54 +00:00
src/parallel: Better examples
parent
d7be89d786
commit
65b073c7c4
230
src/parallel
|
@ -11,16 +11,17 @@ B<parallel> [-0cdEfghiIkmnpqrtuUvVX] [B<-I> str] [B<-j> num] [--silent]
|
||||||
|
|
||||||
=head1 DESCRIPTION
|
=head1 DESCRIPTION
|
||||||
|
|
||||||
GNU B<parallel> is a shell tool for executing jobs in parallel. A job is
|
GNU B<parallel> is a shell tool for executing jobs in parallel. A job
|
||||||
typically a single command or a small script that has to be run for
|
is typically a single command or a small script that has to be run for
|
||||||
each of the lines in the input. The typical input is a list of files,
|
each of the lines in the input. The typical input is a list of files,
|
||||||
a list of hosts, a list of users, or a list of tables.
|
a list of hosts, a list of users, a list of URLs, or a list of tables.
|
||||||
|
|
||||||
If you use B<xargs> today you will find GNU B<parallel> very easy to
|
If you use B<xargs> today you will find GNU B<parallel> very easy to
|
||||||
use. If you write loops in shell, you will find GNU B<parallel> may be
|
use as GNU B<parallel> is written to have the same options as
|
||||||
able to replace most of the loops and make them run faster by running
|
B<xargs>. If you write loops in shell, you will find GNU B<parallel>
|
||||||
jobs in parallel. If you use B<ppss> or B<pexec> you will find GNU
|
may be able to replace most of the loops and make them run faster by
|
||||||
B<parallel> will often make the command easier to read.
|
running several jobs in parallel. If you use B<ppss> or B<pexec> you will find
|
||||||
|
GNU B<parallel> will often make the command easier to read.
|
||||||
|
|
||||||
GNU B<parallel> makes sure output from the commands is the same output as
|
GNU B<parallel> makes sure output from the commands is the same output as
|
||||||
you would get had you run the commands sequentially. This makes it
|
you would get had you run the commands sequentially. This makes it
|
||||||
|
@ -168,9 +169,9 @@ B<-g> is the default. Can be reversed with B<-u>.
|
||||||
Print a summary of the options to GNU B<parallel> and exit.
|
Print a summary of the options to GNU B<parallel> and exit.
|
||||||
|
|
||||||
|
|
||||||
=item B<-I> I<string>
|
=item B<-I> I<replace-str>
|
||||||
|
|
||||||
Use the replacement string I<string> instead of {}.
|
Use the replacement string I<replace-str> instead of {}.
|
||||||
|
|
||||||
|
|
||||||
=item B<--replace>[=I<replace-str>]
|
=item B<--replace>[=I<replace-str>]
|
||||||
|
@ -439,11 +440,11 @@ Ungroup output. Output is printed as soon as possible. This may cause
|
||||||
output from different commands to be mixed. Can be reversed with B<-g>.
|
output from different commands to be mixed. Can be reversed with B<-g>.
|
||||||
|
|
||||||
|
|
||||||
=item B<--extensionreplace> I<string>
|
=item B<--extensionreplace> I<replace-str>
|
||||||
|
|
||||||
=item B<-U> I<string>
|
=item B<-U> I<replace-str>
|
||||||
|
|
||||||
Use the replacement string I<string> instead of {.} for input line without extension.
|
Use the replacement string I<replace-str> instead of {.} for input line without extension.
|
||||||
|
|
||||||
|
|
||||||
=item B<--use-cpus-instead-of-cores> (not implemented)
|
=item B<--use-cpus-instead-of-cores> (not implemented)
|
||||||
|
@ -453,7 +454,7 @@ jobs to run in parallel relative to the number of cores you can ask
|
||||||
GNU B<parallel> to instead look at the number of CPUs. This will make sense
|
GNU B<parallel> to instead look at the number of CPUs. This will make sense
|
||||||
for computers that have hyperthreading as two jobs running on one CPU
|
for computers that have hyperthreading as two jobs running on one CPU
|
||||||
with hyperthreading will run slower than two jobs running on two CPUs.
|
with hyperthreading will run slower than two jobs running on two CPUs.
|
||||||
Normal users will not need this option.
|
Most users will not need this option.
|
||||||
|
|
||||||
|
|
||||||
=item B<-v>
|
=item B<-v>
|
||||||
|
@ -473,56 +474,70 @@ Print the version GNU B<parallel> and exit.
|
||||||
|
|
||||||
=item B<-m>
|
=item B<-m>
|
||||||
|
|
||||||
Multiple. Insert as many arguments as the command line length permits. If
|
Multiple. Insert as many arguments as the command line length
|
||||||
{} is not used the arguments will be appended to the line. If {} is
|
permits. If {} is not used the arguments will be appended to the line.
|
||||||
used multiple times each {} will be replaced with all the arguments.
|
If {} is used multiple times each {} will be replaced with all the
|
||||||
|
arguments.
|
||||||
|
|
||||||
|
|
||||||
=item B<-X>
|
=item B<-X>
|
||||||
|
|
||||||
xargs with context replace. This works like B<-m> except if {} is part
|
xargs with context replace. This works like B<-m> except if {} is part
|
||||||
of a word (like I<pic{}.jpg>) then the whole word will be repeated.
|
of a word (like I<pic{}.jpg>) then the whole word will be
|
||||||
|
repeated. Normally B<-X> will do the right thing, whereas B<-m> can
|
||||||
|
give surprising results if {} is used as part of a word.
|
||||||
|
|
||||||
=back
|
=back
|
||||||
|
|
||||||
=head1 EXAMPLE 1: Working as cat | sh. Ressource inexpensive jobs and evaluation
|
=head1 EXAMPLE: Working as xargs -n1. Argument appending
|
||||||
|
|
||||||
GNU B<parallel> can work similar to B<cat | sh>.
|
GNU B<parallel> can work similar to B<xargs -n1>.
|
||||||
|
|
||||||
A ressource inexpensive job is a job that takes very little CPU, disk
|
To compress all html files using B<gzip> run:
|
||||||
I/O and network I/O. Ping is an example of a ressource inexpensive
|
|
||||||
job. wget is too - if the webpages are small.
|
|
||||||
|
|
||||||
The content of the file jobs_to_run:
|
B<find . -name '*.html' | parallel gzip>
|
||||||
|
|
||||||
ping -c 1 10.0.0.1
|
|
||||||
wget http://status-server/status.cgi?ip=10.0.0.1
|
|
||||||
ping -c 1 10.0.0.2
|
|
||||||
wget http://status-server/status.cgi?ip=10.0.0.2
|
|
||||||
...
|
|
||||||
ping -c 1 10.0.0.255
|
|
||||||
wget http://status-server/status.cgi?ip=10.0.0.255
|
|
||||||
|
|
||||||
To run 100 processes simultaneously do:
|
=head1 EXAMPLE: Inserting multiple arguments
|
||||||
|
|
||||||
B<parallel -j 100 < jobs_to_run>
|
When moving a lot of files like this: B<mv * destdir> you will
|
||||||
|
sometimes get the error:
|
||||||
|
|
||||||
As there is not a B<command> the option B<-c> is default because the
|
B<bash: /bin/mv: Argument list too long>
|
||||||
jobs needs to be evaluated by the shell.
|
|
||||||
|
|
||||||
=head1 EXAMPLE 2: Working as xargs -n1. Argument appending
|
because there are too many files. You can instead do:
|
||||||
|
|
||||||
GNU B<parallel> can work similar to B<xargs -n1>.
|
B<ls | parallel mv {} destdir>
|
||||||
|
|
||||||
To output all html files run:
|
This will run B<mv> for each file. It can be done faster if B<mv> gets
|
||||||
|
as many arguments as will fit on the line:
|
||||||
|
|
||||||
B<find . -name '*.html' | parallel cat>
|
B<ls | parallel -m mv {} destdir>
|
||||||
|
|
||||||
As there is a B<command> the option B<-f> is default because the
|
|
||||||
filenames needs to be protected from the shell in case a filename
|
|
||||||
contains special characters.
|
|
||||||
|
|
||||||
=head1 EXAMPLE 3: Compute intensive jobs and substitution
|
=head1 EXAMPLE: Context replace
|
||||||
|
|
||||||
|
To remove the files I<pict0000.jpg> .. I<pict9999.jpg> you could do:
|
||||||
|
|
||||||
|
B<seq -f %04g 0 9999 | parallel rm pict{}.jpg>
|
||||||
|
|
||||||
|
You could also do:
|
||||||
|
|
||||||
|
B<seq -f %04g 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm>
|
||||||
|
|
||||||
|
The first will run B<rm> 10000 times, while the last will only run
|
||||||
|
B<rm> as many times as needed to keep the command line length short
|
||||||
|
enough to avoid B<Argument list too long> (it typically runs 1-2 times).
|
||||||
|
|
||||||
|
You could also run:
|
||||||
|
|
||||||
|
B<seq -f %04g 0 9999 | parallel -X rm pict{}.jpg>
|
||||||
|
|
||||||
|
This will also only run B<rm> as many times as needed to keep the command
|
||||||
|
line length short enough.
|
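The difference between B<-m> and B<-X> when {} is part of a word can be sketched in plain bash. The helper names expand_m and expand_X are hypothetical: they only imitate the replacement rules described above, they are not how GNU B<parallel> is implemented, and they ignore the command line length limit.

```shell
#!/usr/bin/env bash
# Illustration only: hypothetical helpers imitating the documented
# replacement rules of -m and -X (no command line length limit here).

# -m style: each {} is replaced by all arguments, joined by spaces.
expand_m() {
  local template=$1
  shift
  local ph='{}'
  local all="$*"
  printf '%s\n' "${template//"$ph"/$all}"
}

# -X style: a word containing {} is repeated once per argument.
expand_X() {
  local template=$1
  shift
  local ph='{}'
  local words word arg
  local out=()
  read -r -a words <<< "$template"
  for word in "${words[@]}"; do
    if [[ $word == *"$ph"* ]]; then
      for arg in "$@"; do
        out+=("${word//"$ph"/$arg}")
      done
    else
      out+=("$word")
    fi
  done
  printf '%s\n' "${out[*]}"
}

expand_m 'rm pict{}.jpg' 0000 0001 0002   # rm pict0000 0001 0002.jpg
expand_X 'rm pict{}.jpg' 0000 0001 0002   # rm pict0000.jpg pict0001.jpg pict0002.jpg
```

The first line shows the surprising result with B<-m>: only the first and last file names are complete. The second line is what B<rm> needs.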
||||||
|
|
||||||
|
|
||||||
|
=head1 EXAMPLE: Compute intensive jobs and substitution
|
||||||
|
|
||||||
If ImageMagick is installed this will generate a thumbnail of a jpg
|
If ImageMagick is installed this will generate a thumbnail of a jpg
|
||||||
file:
|
file:
|
||||||
|
@ -541,27 +556,31 @@ B<find . -name '*.jpg' | parallel -j +0 convert -geometry 120 {} {}_thumb.jpg>
|
||||||
|
|
||||||
Notice how the argument has to start with {} as {} will include path
|
Notice how the argument has to start with {} as {} will include path
|
||||||
(e.g. running B<convert -geometry 120 ./foo/bar.jpg
|
(e.g. running B<convert -geometry 120 ./foo/bar.jpg
|
||||||
thumb_./foo/bar.jpg> would clearly be wrong). It will result in files
|
thumb_./foo/bar.jpg> would clearly be wrong). The command will
|
||||||
like ./foo/bar.jpg_thumb.jpg.
|
generate files like ./foo/bar.jpg_thumb.jpg.
|
||||||
|
|
||||||
This will make files like ./foo/bar_thumb.jpg:
|
Use B<{.}> to avoid the extra .jpg in the file name. This command will
|
||||||
|
make files like ./foo/bar_thumb.jpg:
|
||||||
|
|
||||||
B<find . -name '*.jpg' | parallel -j +0 convert -geometry 120 {} {.}_thumb.jpg>
|
B<find . -name '*.jpg' | parallel -j +0 convert -geometry 120 {} {.}_thumb.jpg>
|
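What {} and {.} contribute to the two B<convert> commands above can be sketched without running GNU B<parallel>. The helper strip_ext is hypothetical and only approximates {.} by dropping the last extension of the file name; the real replacement may differ in corner cases.

```shell
#!/usr/bin/env bash
# Illustration only: strip_ext is a hypothetical helper that approximates
# what {.} expands to by dropping the last extension of the file name.
strip_ext() {
  printf '%s\n' "$1" | sed -E 's:\.[^/.]+$::'
}

input=./foo/bar.jpg

# {}_thumb.jpg keeps the original extension in the middle of the name.
echo "convert -geometry 120 $input ${input}_thumb.jpg"

# {.}_thumb.jpg drops the extension first.
echo "convert -geometry 120 $input $(strip_ext "$input")_thumb.jpg"
```

The script only prints the command lines, so it can be run safely in any directory.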
||||||
|
|
||||||
=head1 EXAMPLE 4: Substitution and redirection
|
|
||||||
|
|
||||||
This will compare all files in the dir to the file foo and save the
|
=head1 EXAMPLE: Substitution and redirection
|
||||||
diffs in corresponding .diff files:
|
|
||||||
|
|
||||||
B<ls | parallel diff {} foo ">>B<"{}.diff>
|
This will generate an uncompressed version of .gz-files next to the .gz-file:
|
||||||
|
|
||||||
|
B<ls *.gz | parallel zcat {} ">>B<"{.}>
|
||||||
|
|
||||||
Quoting of > is necessary to postpone the redirection. Another
|
Quoting of > is necessary to postpone the redirection. Another
|
||||||
solution is to quote the whole command:
|
solution is to quote the whole command:
|
||||||
|
|
||||||
B<ls | parallel "diff {} foo >>B<{}.diff">
|
B<ls *.gz | parallel "zcat {} >>B<{.}">
|
||||||
|
|
||||||
|
Other special shell characters (such as * ; $ > < | >> <<) also need
|
||||||
|
to be put in quotes, as they may otherwise be interpreted by the shell
|
||||||
|
and not given to GNU B<parallel>.
|
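Why the quoting matters can be seen with any tool that starts jobs from input lines. The sketch below uses B<xargs -I{}> together with B<sh -c> as a stand-in for GNU B<parallel> (an assumption made so the script runs without it); the temporary directory and the file names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Illustration only: xargs plus sh -c stands in for a job runner.
tmpdir=$(mktemp -d)

# Unquoted: the calling shell handles > once, before the job runner starts.
# A single file literally named '{}.out' receives the output of all jobs.
(cd "$tmpdir" && printf 'a\nb\n' | xargs -I{} sh -c 'echo "$1"' _ {} > {}.out)

# Quoted: > is part of the job, so it is evaluated once per input line.
(cd "$tmpdir" && printf 'a\nb\n' | xargs -I{} sh -c 'echo "$1" > "$1.out"' _ {})

unquoted=$(cat "$tmpdir/{}.out")
quoted_a=$(cat "$tmpdir/a.out")
quoted_b=$(cat "$tmpdir/b.out")
echo "unquoted file holds: $unquoted"
echo "a.out holds: $quoted_a, b.out holds: $quoted_b"
rm -rf "$tmpdir"
```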
||||||
|
|
||||||
=head1 EXAMPLE 5: Composed commands
|
=head1 EXAMPLE: Composed commands
|
||||||
|
|
||||||
A job can consist of several commands. This will print the number of
|
A job can consist of several commands. This will print the number of
|
||||||
files in each directory:
|
files in each directory:
|
||||||
|
@ -573,28 +592,61 @@ To put the output in a file called <name>.dir:
|
||||||
B<ls | parallel '(echo -n {}" "; ls {}|wc -l) >> B<{}.dir'>
|
B<ls | parallel '(echo -n {}" "; ls {}|wc -l) >> B<{}.dir'>
|
||||||
|
|
||||||
|
|
||||||
=head1 EXAMPLE 6: Context replace
|
=head1 EXAMPLE: Removing file extension when processing files
|
||||||
|
|
||||||
To remove the files I<pict0000.jpg> .. I<pict9999.jpg> you could do:
|
When processing files, removing the file extension using {.} is often
|
||||||
|
useful.
|
||||||
|
|
||||||
B<seq -f %04g 0 9999 | parallel rm pict{}.jpg>
|
Create a directory for each zip-file and unzip it in that dir:
|
||||||
|
|
||||||
You could also do:
|
B<ls *zip | parallel 'mkdir {.}; cd {.}; unzip ../{}'>
|
||||||
|
|
||||||
B<seq -f %04g 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm>
|
Recompress all .gz files in current directory using B<bzip2> running 1
|
||||||
|
job per CPU in parallel:
|
||||||
|
|
||||||
The first will run B<rm> 10000 times, while the last will only run
|
B<ls *.gz | parallel -j+0 "zcat {} | bzip2 >>B<{.}.bz2 && rm {}">
|
||||||
B<rm> as many times needed to keep the command line length short
|
|
||||||
enough (typically 1-2 times).
|
|
||||||
|
|
||||||
You could also run:
|
|
||||||
|
|
||||||
B<seq -f %04g 0 9999 | parallel -X rm pict{}.jpg>
|
=head1 EXAMPLE: Rewriting a for-loop and a while-loop
|
||||||
|
|
||||||
This will also only run B<rm> as many times needed to keep the command
|
for-loops like this:
|
||||||
line length short enough.
|
|
||||||
|
|
||||||
=head1 EXAMPLE 7: Group output lines
|
B< (for x in `cat list` ; do
|
||||||
|
do_something $x
|
||||||
|
done) | process_output>
|
||||||
|
|
||||||
|
and while-loops like this:
|
||||||
|
|
||||||
|
B< cat list | (while read x ; do
|
||||||
|
do_something $x
|
||||||
|
done) | process_output>
|
||||||
|
|
||||||
|
can be written like this:
|
||||||
|
|
||||||
|
B<cat list | parallel do_something | process_output>
|
||||||
|
|
||||||
|
If the processing requires more steps the for-loop like this:
|
||||||
|
|
||||||
|
B< (for x in `cat list` ; do
|
||||||
|
no_extension=${x%.png};
|
||||||
|
do_something $x scale $no_extension.jpg
|
||||||
|
do_step2 <$x $no_extension
|
||||||
|
done) | process_output>
|
||||||
|
|
||||||
|
and while-loops like this:
|
||||||
|
|
||||||
|
B< cat list | (while read x ; do
|
||||||
|
no_extension=${x%.png};
|
||||||
|
do_something $x scale $no_extension.jpg
|
||||||
|
do_step2 <$x $no_extension
|
||||||
|
done) | process_output>
|
||||||
|
|
||||||
|
can be written like this:
|
||||||
|
|
||||||
|
B<cat list | parallel "do_something {} scale {.}.jpg ; do_step2 <{} {.}" | process_output>
|
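One practical difference between the two loop styles is how they split the input. The sketch below uses a made-up list file and a stand-in do_something function (both assumptions) to show that the for-loop splits on whitespace, while the while-loop handles one line at a time, which is the line-based input model described in the DESCRIPTION.

```shell
#!/usr/bin/env bash
# Hypothetical input: two file names, one of them containing a space.
list=$(mktemp)
printf '%s\n' 'holiday photo.png' 'logo.png' > "$list"

# Stand-in for the real work; it only reports its arguments.
do_something() {
  printf 'processing <%s> -> <%s.jpg>\n' "$1" "$2"
}

# for-loop: word splitting turns 'holiday photo.png' into two items.
for_count=0
for x in $(cat "$list"); do
  for_count=$((for_count + 1))
done

# while-loop: one item per input line.
while_count=0
while IFS= read -r x; do
  no_extension=${x%.png}
  do_something "$x" "$no_extension"
  while_count=$((while_count + 1))
done < "$list"

echo "for-loop items: $for_count, while-loop items: $while_count"
rm -f "$list"
```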
||||||
|
|
||||||
|
|
||||||
|
=head1 EXAMPLE: Group output lines
|
||||||
|
|
||||||
When running jobs that output data, you often do not want the output
|
When running jobs that output data, you often do not want the output
|
||||||
of multiple jobs to run together. GNU B<parallel> defaults to grouping the
|
of multiple jobs to run together. GNU B<parallel> defaults to grouping the
|
||||||
|
@ -611,14 +663,23 @@ to the output of:
|
||||||
B<(echo foss.org.my; echo debian.org; echo freenetproject.org) | parallel -u traceroute>
|
B<(echo foss.org.my; echo debian.org; echo freenetproject.org) | parallel -u traceroute>
|
||||||
|
|
||||||
|
|
||||||
=head1 EXAMPLE 8: Keep order of output same as order of input
|
=head1 EXAMPLE: Keep order of output same as order of input
|
||||||
|
|
||||||
Normally the output of a job will be printed as soon as it
|
Normally the output of a job will be printed as soon as it
|
||||||
completes. Sometimes you want the order of the output to remain the
|
completes. Sometimes you want the order of the output to remain the
|
||||||
same as the order of the input. B<-k> will make sure the order of
|
same as the order of the input. This is often important if the output
|
||||||
|
is used as input for another system. B<-k> will make sure the order of
|
||||||
output will be in the same order as input even if later jobs end
|
output will be in the same order as input even if later jobs end
|
||||||
before earlier jobs.
|
before earlier jobs.
|
||||||
|
|
||||||
|
Append a string to every line in a text file:
|
||||||
|
|
||||||
|
B<cat textfile | parallel -k echo {} append_string>
|
||||||
|
|
||||||
|
If you remove B<-k> some of the lines may come out in the wrong order.
|
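The idea behind keeping the order can be sketched in plain bash: give every job its own buffer and print the buffers in input order once the jobs are done. This is only an illustration of the principle with a made-up job function and temporary files; it is not a description of how B<-k> is implemented.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)

# Hypothetical job: later inputs sleep less, so they tend to finish first.
job() {
  sleep "0.$((4 - $1))"
  echo "result for input $1"
}

for i in 1 2 3; do
  # Each job gets its own buffer; the shared log records completion order.
  { job "$i" > "$tmpdir/$i.out"; echo "$i" >> "$tmpdir/finished"; } &
done
wait

echo "completion order: $(tr '\n' ' ' < "$tmpdir/finished")"

# Printing the buffers in input order restores the order of the input.
ordered=$(cat "$tmpdir/1.out" "$tmpdir/2.out" "$tmpdir/3.out")
echo "$ordered"
rm -rf "$tmpdir"
```

The completion order depends on timing and may vary between runs; the ordered output does not.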
||||||
|
|
||||||
|
Another example is B<traceroute>:
|
||||||
|
|
||||||
B<(echo foss.org.my; echo debian.org; echo freenetproject.org) | parallel traceroute>
|
B<(echo foss.org.my; echo debian.org; echo freenetproject.org) | parallel traceroute>
|
||||||
|
|
||||||
will give traceroute of foss.org.my, debian.org and
|
will give traceroute of foss.org.my, debian.org and
|
||||||
|
@ -632,7 +693,7 @@ B<(echo foss.org.my; echo debian.org; echo freenetproject.org) | parallel -k tra
|
||||||
This will make sure the traceroute to foss.org.my will be printed
|
This will make sure the traceroute to foss.org.my will be printed
|
||||||
first.
|
first.
|
||||||
|
|
||||||
=head1 EXAMPLE 9: Using remote computers (not implemented)
|
=head1 EXAMPLE: Using remote computers (not implemented)
|
||||||
|
|
||||||
To run commands on a remote computer SSH needs to be set up and you
|
To run commands on a remote computer SSH needs to be set up and you
|
||||||
must be able to login without entering a password (B<ssh-agent> may be
|
must be able to login without entering a password (B<ssh-agent> may be
|
||||||
|
@ -681,7 +742,7 @@ server has 8 CPU cores.
|
||||||
seq 1 10 | parallel --sshlogin 8/server.example.com echo
|
seq 1 10 | parallel --sshlogin 8/server.example.com echo
|
||||||
|
|
||||||
|
|
||||||
=head1 EXAMPLE 10: Transferring of files (not implemented)
|
=head1 EXAMPLE: Transferring of files (not implemented)
|
||||||
|
|
||||||
To recompress gzipped files with B<bzip2> using a remote server run:
|
To recompress gzipped files with B<bzip2> using a remote server run:
|
||||||
|
|
||||||
|
@ -745,6 +806,33 @@ With the file I<mymachines> containing the compute machines it becomes:
|
||||||
find logs/ -name '*.gz' | parallel --sshloginfile mymachines \
|
find logs/ -name '*.gz' | parallel --sshloginfile mymachines \
|
||||||
--trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
--trc {.}.bz2 "zcat {} | bzip2 -9 >{.}.bz2"
|
||||||
|
|
||||||
|
|
||||||
|
=head1 EXAMPLE: Working as cat | sh. Resource inexpensive jobs and evaluation
|
||||||
|
|
||||||
|
GNU B<parallel> can work similar to B<cat | sh>.
|
||||||
|
|
||||||
|
A resource inexpensive job is a job that takes very little CPU, disk
|
||||||
|
I/O and network I/O. Ping is an example of a resource inexpensive
|
||||||
|
job. wget is too - if the webpages are small.
|
||||||
|
|
||||||
|
The content of the file jobs_to_run:
|
||||||
|
|
||||||
|
ping -c 1 10.0.0.1
|
||||||
|
wget http://status-server/status.cgi?ip=10.0.0.1
|
||||||
|
ping -c 1 10.0.0.2
|
||||||
|
wget http://status-server/status.cgi?ip=10.0.0.2
|
||||||
|
...
|
||||||
|
ping -c 1 10.0.0.255
|
||||||
|
wget http://status-server/status.cgi?ip=10.0.0.255
|
||||||
|
|
||||||
|
To run 100 processes simultaneously do:
|
||||||
|
|
||||||
|
B<parallel -j 100 < jobs_to_run>
|
||||||
|
|
||||||
|
As there is no B<command>, the option B<-c> is the default because the
|
||||||
|
jobs need to be evaluated by the shell.
|
||||||
|
|
||||||
|
|
||||||
=head1 QUOTING
|
=head1 QUOTING
|
||||||
|
|
||||||
For more advanced use quoting may be an issue. The following will
|
For more advanced use quoting may be an issue. The following will
|
||||||
|
@ -764,9 +852,9 @@ B<ls | parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"'>
|
||||||
However, this means you cannot make the shell interpret special
|
However, this means you cannot make the shell interpret special
|
||||||
characters. For example this B<will not work>:
|
characters. For example this B<will not work>:
|
||||||
|
|
||||||
B<ls | parallel -q "diff {} foo >>B<{}.diff">
|
B<ls *.gz | parallel -q "zcat {} >>B<{.}">
|
||||||
|
|
||||||
B<ls | parallel -q "ls {} | wc -l">
|
B<ls *.gz | parallel -q "zcat {} | bzip2 >>B<{.}.bz2">
|
||||||
|
|
||||||
because > and | need to be interpreted by the shell.
|
because > and | need to be interpreted by the shell.
|
||||||
|
|
||||||
|
@ -808,7 +896,7 @@ should send the signal B<SIGTERM> to GNU B<parallel>:
|
||||||
B<killall -TERM parallel>
|
B<killall -TERM parallel>
|
||||||
|
|
||||||
This will tell GNU B<parallel> to not start any new jobs, but wait until
|
This will tell GNU B<parallel> to not start any new jobs, but wait until
|
||||||
the currently running jobs are finished.
|
the currently running jobs are finished before exiting.
|
||||||
|
|
||||||
|
|
||||||
=head1 DIFFERENCES BETWEEN xargs/find -exec AND parallel
|
=head1 DIFFERENCES BETWEEN xargs/find -exec AND parallel
|
||||||
|
|