parallel: Implemented --number-of-cpus and --number-of-cores

Prepared man-page for parallel remote execution
Ole Tange 2010-04-13 16:05:21 +02:00
parent 6db05dc16e
commit 8873273a92
4 changed files with 629 additions and 55 deletions

351
parallel

@ -21,14 +21,31 @@ Several lines will be run in parallel.
=item I<command> =item I<command>
Command to execute. If B<command> or the following arguments contain {} Command to execute. If B<command> or the following arguments contain
every instance will be substituted with the input line. Setting a {} every instance will be substituted with the input line. Setting a
command also invokes B<-f>. command also invokes B<-f>.
If B<command> is given, B<parallel> will behave similar to B<xargs>. If If B<command> is given, B<parallel> will behave similar to B<xargs>. If
B<command> is not given B<parallel> will behave similar to B<cat | sh>. B<command> is not given B<parallel> will behave similar to B<cat | sh>.
=item B<{}>
Input line. This is the default replacement string and will normally
be used for putting the argument in the command line. It can be
changed with B<-I>.
=item B<{.}> (not implemented)
Input line without extension. This is a specialized replacement string
with the extension removed. Everything from the last B<.> to the end of
the input line is removed, and {.} is replaced with what remains.
E.g. I<foo.jpg> becomes I<foo>. If the input line does not contain
B<.> it will remain unchanged.
{.} can be used in the same places as {}.
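Since {.} is marked as not implemented, the intended substitution can be approximated in plain shell. The following is a minimal sketch, assuming a POSIX shell; B<strip_ext> is a hypothetical helper name and is not part of B<parallel>.

```shell
# strip_ext LINE
# Hypothetical helper (not part of parallel): prints LINE with everything
# from the last '.' to the end removed. A line without '.' is printed
# unchanged, matching the description of {.} above.
strip_ext() {
    # ${1%.*} removes the shortest suffix that matches '.*'
    printf '%s\n' "${1%.*}"
}

strip_ext ./foo/bar.jpg
```

Only the last extension is removed, so an input such as I<a.tar.gz> becomes I<a.tar>.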
=item B<--null> =item B<--null>
=item B<-0> =item B<-0>
@ -80,11 +97,17 @@ Group output. Output from each jobs is grouped together and is only
printed when the command is finished. STDERR first followed by STDOUT. printed when the command is finished. STDERR first followed by STDOUT.
B<-g> is the default. Can be reversed with B<-u>. B<-g> is the default. Can be reversed with B<-u>.
=item B<-I> I<string> =item B<-I> I<string>
Use the replacement string I<string> instead of {}. Use the replacement string I<string> instead of {}.
=item B<-U> I<string> (not implemented)
Use the replacement string I<string> instead of {.} for input line without extension.
=item B<--jobs> I<N> =item B<--jobs> I<N>
=item B<-j> I<N> =item B<-j> I<N>
@ -104,9 +127,10 @@ Run up to N jobs in parallel. 0 means as many as possible. Default is 10.
=item B<-P> I<+N> =item B<-P> I<+N>
Add N to the number of CPUs. Run this many jobs in parallel. For Add N to the number of CPU cores. Run this many jobs in parallel. For
compute intensive jobs I<-j +0> is useful as it will run compute intensive jobs I<-j +0> is useful as it will run
number-of-cpus jobs in parallel. number-of-cpu-cores jobs in parallel. See also
--use-cpus-instead-of-cores.
=item B<--jobs> I<-N> =item B<--jobs> I<-N>
@ -117,8 +141,9 @@ number-of-cpus jobs in parallel.
=item B<-P> I<-N> =item B<-P> I<-N>
Subtract N from the number of CPUs. Run this many jobs in parallel. Subtract N from the number of CPU cores. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. If the evaluated number is less than 1 then 1 will be used. See also
--use-cpus-instead-of-cores.
=item B<--jobs> I<N>% =item B<--jobs> I<N>%
@ -129,8 +154,9 @@ If the evaluated number is less than 1 then 1 will be used.
=item B<-P> I<N>% =item B<-P> I<N>%
Multiply N% with the number of CPUs. Run this many jobs in parallel. Multiply N% with the number of CPU cores. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. If the evaluated number is less than 1 then 1 will be used. See also
--use-cpus-instead-of-cores.
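The three relative forms of B<-j> (I<+N>, I<-N> and I<N>%) can be illustrated with a little shell arithmetic. This is a sketch only: B<jobs_from_spec> is a hypothetical helper, the core count is passed in by the caller, and rounding I<N>% down to a whole number is an assumption, since the text above does not say how fractions are handled.

```shell
# jobs_from_spec SPEC CORES
# Hypothetical helper (not part of parallel): prints the number of jobs
# that SPEC would give on a machine with CORES CPU cores.
jobs_from_spec() {
    spec=$1
    cores=$2
    case "$spec" in
        +*) n=$((cores + ${spec#+})) ;;        # -j +N: add N to the cores
        -*) n=$((cores - ${spec#-})) ;;        # -j -N: subtract N
        *%) n=$((cores * ${spec%?} / 100)) ;;  # -j N%: assumed to round down
        *)  echo "$spec"; return 0 ;;          # plain N is used as given
    esac
    # "If the evaluated number is less than 1 then 1 will be used."
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

jobs_from_spec +0 4
jobs_from_spec -9 4
jobs_from_spec 50% 8
```

With the assumed rounding, the three calls print 4, 1 and 4.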
=item B<--keeporder> =item B<--keeporder>
@ -141,6 +167,18 @@ Keep sequence of output same as the order of input. If jobs 1 2 3 4
end in the sequence 3 1 4 2 the output will still be 1 2 3 4. end in the sequence 3 1 4 2 the output will still be 1 2 3 4.
=item B<--number-of-cpus>
Print the number of CPUs and exit (used by B<parallel> itself to
determine the number of CPUs on remote machines).
=item B<--number-of-cores>
Print the number of cores and exit (used by B<parallel> itself to determine the
number of cores on remote machines).
=item B<--quote> =item B<--quote>
=item B<-q> =item B<-q>
@ -151,12 +189,133 @@ QUOTING. Most people will never need this. Quoting is disabled by
default. default.
=item B<-S> I<[ncpu/]sshlogin[,[ncpu/]sshlogin]> (not implemented)
=item B<--sshlogin> I<[ncpu/]sshlogin[,[ncpu/]sshlogin]> (not implemented)
Distribute jobs to remote servers. The jobs will be run on a list of
remote servers. B<parallel> will determine the number of CPU cores on
the remote servers and run the number of jobs as specified by -j. If
the number I<ncpu> is given B<parallel> will use this number for
number of CPUs on the host. Normally I<ncpu> will not be needed.
An I<sshlogin> is the string you would normally pass to SSH to login,
e.g. I<server.example.com>, I<foo@server.example.com>, or I<"-l foo -p
2222 server.example.com">. The sshlogin must not require a password.
The sshlogin ':' is special, it means 'no ssh' and will therefore run
on the local machine.
To specify more sshlogins separate the sshlogins by comma or repeat
the options multiple times.
For examples: see B<--sshloginfile>.
The remote host must have B<parallel> installed.
=item B<--sshloginfile> I<filename> (not implemented)
File with sshlogins. The file consists of sshlogins on separate
lines. Empty lines and lines starting with '#' are ignored. Example:
server.example.com
username@server2.example.com
8/my-8-core-server.example.com
2/myusername@my-dualcore.example.net
# This server has SSH running on port 2222
-p 2222 server.example.net
4/-p 2222 quadserver.example.net
# Assume 16 cores on the local machine
16/:
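The file format above is simple enough to parse in a few lines of shell. The sketch below shows one possible reading of it and is not how B<parallel> itself will do it: B<parse_sshloginfile> is a hypothetical helper, the word "auto" is an invented marker for entries without I<ncpu>, and treating a leading I<ncpu/> as a prefix only when it consists purely of digits is an assumption.

```shell
# parse_sshloginfile FILE
# Hypothetical helper (not part of parallel): prints one line per entry
# in the form "NCPU<TAB>SSHLOGIN". NCPU is "auto" when the entry has no
# ncpu/ prefix.
parse_sshloginfile() {
    while IFS= read -r line || [ -n "$line" ]; do
        # Empty lines and lines starting with '#' are ignored.
        case "$line" in
            ''|'#'*) continue ;;
        esac
        ncpu=auto
        login=$line
        case "$line" in
            */*)
                prefix=${line%%/*}
                case "$prefix" in
                    ''|*[!0-9]*) ;;   # not a pure number: no ncpu prefix
                    *) ncpu=$prefix; login=${line#*/} ;;
                esac
                ;;
        esac
        printf '%s\t%s\n' "$ncpu" "$login"
    done < "$1"
}
```

For the example file above, the entry I<4/-p 2222 quadserver.example.net> would be printed as I<4> followed by I<-p 2222 quadserver.example.net>, and I<16/:> as I<16> followed by I<:>.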
=item B<--silent> =item B<--silent>
Silent. The job to be run will not be printed. This is the default. Silent. The job to be run will not be printed. This is the default.
Can be reversed with B<-v>. Can be reversed with B<-v>.
=item B<--transfer> (not implemented)
Transfer files to remote servers. B<--transfer> is used with
B<--sshlogin> when the arguments are files and should be transferred to
the remote servers. The files will be transferred using B<rsync> and
will be put relative to the default login dir. E.g.
echo foo/bar.txt | parallel \
--sshlogin server.example.com --transfer wc
This will transfer the file I<foo/bar.txt> to the server
I<server.example.com> to the file $HOME/foo/bar.txt before running
B<wc foo/bar.txt> on I<server.example.com>.
echo /tmp/foo/bar.txt | parallel \
--sshlogin server.example.com --transfer wc
This will transfer the file I</tmp/foo/bar.txt> to the server
I<server.example.com> to the file /tmp/foo/bar.txt before running
B<wc /tmp/foo/bar.txt> on I<server.example.com>.
B<--transfer> is often used with B<--return> and B<--cleanup>.
B<--transfer> is ignored when used with B<--sshlogin :> or when not used with B<--sshlogin>.
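The two examples above follow one rule: a relative argument ends up below the default login dir on the remote server, and an absolute argument keeps its path. A minimal sketch of that rule follows; B<remote_path> is a hypothetical helper, not part of B<parallel>, and the literal text $HOME merely stands in for the remote login dir.

```shell
# remote_path ARG
# Hypothetical helper (not part of parallel): prints where ARG would be
# placed on the remote server according to the description above.
remote_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;        # absolute path: kept as is
        *)  printf '$HOME/%s\n' "$1" ;;  # relative: below the login dir
    esac
}

remote_path foo/bar.txt
remote_path /tmp/foo/bar.txt
```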
=item B<--return> I<suffix> (not implemented)
Transfer files from remote servers. B<--return> is used with
B<--sshlogin> when the arguments are files on the remote servers. When
processing is done the file with I<suffix> appended will be transferred
from the remote server using B<rsync> and will be put relative to
the default login dir. E.g.
echo foo/bar.txt | parallel \
--sshlogin server.example.com --return .out touch {}.out
This will transfer the file I<$HOME/foo/bar.txt.out> from the server
I<server.example.com> to the file I<foo/bar.txt.out> after running
B<touch foo/bar.txt.out> on I<server.example.com>.
echo /tmp/foo/bar.txt | parallel \
--sshlogin server.example.com --return .out touch {}.out
This will transfer the file I</tmp/foo/bar.txt.out> from the server
I<server.example.com> to the file I</tmp/foo/bar.txt.out> after running
B<touch /tmp/foo/bar.txt.out> on I<server.example.com>.
Multiple files with different suffixes can be transferred by repeating
the options multiple times:
echo /tmp/foo/bar.txt | \
parallel --sshlogin server.example.com \
--return .out --return .out2 touch {}.out {}.out2
B<--return> is often used with B<--transfer> and B<--cleanup>.
B<--return> is ignored when used with B<--sshlogin :> or when not used with B<--sshlogin>.
=item B<--cleanup> (not implemented)
Remove transferred files. B<--cleanup> will remove the transferred files
on the remote server after processing is done.
find log -name '*gz' | parallel \
--sshlogin server.example.com --transfer --return .bz2 \
--cleanup "zcat {} | bzip2 -9 >{}.bz2"
With B<--transfer> the file transferred to the remote server will be
removed on the remote server. Directories created will not be removed
- even if they are empty.
With B<--return> the file transferred from the remote server will be
removed on the remote server. Directories created will not be removed
- even if they are empty.
B<--cleanup> is ignored when not used with B<--transfer> or B<--return>.
=item B<--ungroup> =item B<--ungroup>
=item B<-u> =item B<-u>
@ -165,6 +324,16 @@ Ungroup output. Output is printed as soon as possible. This may cause
output from different commands to be mixed. Can be reversed with B<-g>. output from different commands to be mixed. Can be reversed with B<-g>.
=item B<--use-cpus-instead-of-cores> (not implemented)
Count the number of CPUs instead of cores. When computing how many
jobs to run in parallel relative to the number of cores you can ask
parallel to instead look at the number of CPUs. This will make sense
for computers that have hyperthreading as two jobs running on one CPU
with hyperthreading will run slower than two jobs running on two CPUs.
Normal users will not need this option.
=item B<-v> =item B<-v>
Verbose. Print the job to be run on STDOUT. Can be reversed with Verbose. Print the job to be run on STDOUT. Can be reversed with
@ -231,9 +400,9 @@ file:
B<convert -geometry 120 foo.jpg thumb_foo.jpg> B<convert -geometry 120 foo.jpg thumb_foo.jpg>
If the system has more than 1 CPU it can be run with number-of-cpus If the system has more than 1 CPU core it can be run with
jobs in parallel (-j +0). This will do that for all jpg files in a number-of-cpu-cores jobs in parallel (-j +0). This will do that for
directory: all jpg files in a directory:
B<ls *.jpg | parallel -j +0 convert -geometry 120 {} thumb_{}> B<ls *.jpg | parallel -j +0 convert -geometry 120 {} thumb_{}>
@ -257,6 +426,9 @@ is a better solution:
find . -name '*.jpg' | parallel -j +0 convert -geometry 120 {} {}_thumb.jpg find . -name '*.jpg' | parallel -j +0 convert -geometry 120 {} {}_thumb.jpg
find . -name '*_thumb.jpg' | ren 's:/([^/]+)_thumb.jpg$:/thumb_$1:' find . -name '*_thumb.jpg' | ren 's:/([^/]+)_thumb.jpg$:/thumb_$1:'
(Not implemented) This will make files like ./foo/bar_thumb.jpg:
B<find . -name '*.jpg' | parallel -j +0 convert -geometry 120 {} {.}_thumb.jpg>
=head1 EXAMPLE 4: Substitution and redirection =head1 EXAMPLE 4: Substitution and redirection
@ -342,6 +514,104 @@ B<(echo foss.org.my; echo debian.org; echo freenetproject.org) | parallel -k tra
This will make sure the traceroute to foss.org.my will be printed This will make sure the traceroute to foss.org.my will be printed
first. first.
=head1 EXAMPLE 9: Using remote computers (not implemented)
To run commands on a remote computer SSH needs to be set up and you
must be able to login without entering a password (B<ssh-agent> may be
handy).
To run B<echo> on B<server.example.com>:
seq 1 10 | parallel --sshlogin server.example.com echo
To run commands on more than one remote computer run:
seq 1 10 | parallel --sshlogin server.example.com,server2.example.net echo
Or:
seq 1 10 | parallel --sshlogin server.example.com \
--sshlogin server2.example.net echo
If the login username is I<foo> on I<server2.example.net> use:
seq 1 10 | parallel --sshlogin server.example.com \
--sshlogin foo@server2.example.net echo
To distribute the commands to a list of machines, make a file
I<mymachines> with all the machines:
server.example.com
foo@server2.example.com
server3.example.com
Then run:
seq 1 10 | parallel --sshloginfile mymachines echo
To include the local machine add the special sshlogin ':' to the list:
server.example.com
foo@server2.example.com
server3.example.com
:
If the number of CPU cores on the remote servers is not identified
correctly the number of CPU cores can be added in front. Here the
server has 8 CPU cores.
seq 1 10 | parallel --sshlogin 8/server.example.com echo
=head1 EXAMPLE 10: Transferring of files (not implemented)
To recompress gzipped files with bzip2 using a remote server run:
find logs/ -name '*.gz' | \
parallel --sshlogin server.example.com \
--transfer "zcat {} | bzip2 -9 >{}.bz2"
This will list the .gz-files in the I<logs> directory and all
directories below. Then it will transfer the files to
I<server.example.com> to the corresponding directory in
I<$HOME/logs>. On I<server.example.com> the file will be recompressed
using B<zcat> and B<bzip2> resulting in the corresponding file with
the suffix I<.bz2>.
If you want the file to be transferred back to the local machine add
I<--return .bz2>:
find logs/ -name '*.gz' | \
parallel --sshlogin server.example.com \
--transfer --return .bz2 "zcat {} | bzip2 -9 >{}.bz2"
After the recompressing is done the I<.bz2>-file is transferred back to
the local machine.
If you want to delete the transferred files on the remote machine add
I<--cleanup>:
find logs/ -name '*.gz' | \
parallel --sshlogin server.example.com \
--transfer --return .bz2 --cleanup "zcat {} | bzip2 -9 >{}.bz2"
If you want to run on several servers add the servers to I<--sshlogin>
either using ',' or separate I<--sshlogin>:
find logs/ -name '*.gz' | \
parallel --sshlogin server.example.com,server2.example.com \
--sshlogin server3.example.com \
--transfer --return .bz2 --cleanup "zcat {} | bzip2 -9 >{}.bz2"
You can add the local machine using I<--sshlogin :>. This will disable the
transfer and cleanup for the local machine only:
find logs/ -name '*.gz' | \
parallel --sshlogin server.example.com,server2.example.com \
--sshlogin server3.example.com \
--sshlogin : \
--transfer --return .bz2 --cleanup "zcat {} | bzip2 -9 >{}.bz2"
=head1 QUOTING =head1 QUOTING
@ -439,7 +709,7 @@ So B<parallel>'s newline separation can be emulated with:
B<cat | xargs -d "\n" -n1 I<command>> B<cat | xargs -d "\n" -n1 I<command>>
B<xargs> can run a given number of jobs in parallel, but has no B<xargs> can run a given number of jobs in parallel, but has no
support for running no_of_cpus jobs in parallel. support for running number-of-cpu-cores jobs in parallel.
B<xargs> has no support for grouping the output, therefore output may B<xargs> has no support for grouping the output, therefore output may
run together, e.g. the first half of a line is from one process and run together, e.g. the first half of a line is from one process and
@ -512,25 +782,17 @@ hard, as all foreground processes get the INT from the shell.
If there are nomore jobs (STDIN is closed) then make sure to If there are nomore jobs (STDIN is closed) then make sure to
distribute the arguments evenly if running -X. distribute the arguments evenly if running -X.
Distibute jobs to computers with different speeds/no_of_cpu using ssh Distibute jobs to computers with different speeds/number-of-cpu-cores using ssh
ask the computers how many cpus they have and spawn appropriately ask the computers how many cpus they have and spawn appropriately
according to -j setting. Reuse ssh connection (-M and -S) according to -j setting. Reuse ssh connection (-M and -S)
http://www.semicomplete.com/blog/geekery/distributed-xargs.html?source=rss20 http://www.semicomplete.com/blog/geekery/distributed-xargs.html?source=rss20
http://code.google.com/p/ppss/wiki/Manual2 http://code.google.com/p/ppss/wiki/Manual2
=head2 -S http://www.gnu.org/software/pexec/
-S sshlogin[,sshlogin]
sshlogin is [user@]host or filename with list of sshlogin
What about copying data to remote host? Have an option that says the Where will '>' be run? Local or remote? Where ever is easier.
argument is a file that should be copied.
What about copying data from remote host? Have an option that says
the argument is a file that should be copied.
Where will '>' be run? Local or remote?
Parallelize so this can be done: Parallelize so this can be done:
@ -609,6 +871,8 @@ GetOptions("debug|D" => \$::opt_D,
"quote|q" => \$::opt_q, "quote|q" => \$::opt_q,
"I=s" => \$::opt_I, "I=s" => \$::opt_I,
"jobs|j=s" => \$::opt_P, "jobs|j=s" => \$::opt_P,
"number-of-cpus" => \$::opt_number_of_cpus,
"number-of-cores" => \$::opt_number_of_cores,
# xargs-compatibility - implemented, man, unittest # xargs-compatibility - implemented, man, unittest
"max-procs|P=s" => \$::opt_P, "max-procs|P=s" => \$::opt_P,
"delimiter|d=s" => \$::opt_d, "delimiter|d=s" => \$::opt_d,
@ -674,6 +938,8 @@ if(defined $::opt_i and $::opt_i) { $Global::replacestring = $::opt_i; }
if(defined $::opt_E and $::opt_E) { $Global::end_of_file_string = $::opt_E; } if(defined $::opt_E and $::opt_E) { $Global::end_of_file_string = $::opt_E; }
if(defined $::opt_n and $::opt_n) { $Global::max_number_of_args = $::opt_n; } if(defined $::opt_n and $::opt_n) { $Global::max_number_of_args = $::opt_n; }
if(defined $::opt_h) { die_usage(); } if(defined $::opt_h) { die_usage(); }
if(defined $::opt_number_of_cpus) { print no_of_cpus(),"\n"; exit(0); }
if(defined $::opt_number_of_cores) { print no_of_cores(),"\n"; exit(0); }
if(defined $::opt_version) { version(); exit(0); } if(defined $::opt_version) { version(); exit(0); }
if(defined $::opt_a) { if(defined $::opt_a) {
@ -735,7 +1001,15 @@ sub generate_command_line {
} }
my $number_of_args = 0; my $number_of_args = 0;
# max number of lines (-L) =
# number_of_read_lines = 0
while (defined($next_arg = get_next_arg())) { while (defined($next_arg = get_next_arg())) {
# if defined max_number_of_lines {
# number_of_read_lines++
# if $next_arg =~ /\w$/ number_of_read_lines-- (Trailing blanks cause an
# input line to be logically continued on the next input line.)
# if number_of_read_lines > max_number_of_lines
# last
push (@quoted_args, $next_arg); push (@quoted_args, $next_arg);
$number_of_args++; $number_of_args++;
if(not $Global::xargs and not $Global::Xargs) { if(not $Global::xargs and not $Global::Xargs) {
@ -1013,9 +1287,16 @@ sub user_requested_processes {
return $processes; return $processes;
} }
sub no_of_cores {
# TODO This should return number of cores and not the number of CPUs
return no_of_cpus(@_);
}
sub no_of_cpus { sub no_of_cpus {
if(not $Global::no_of_cpus) { if(not $Global::no_of_cpus) {
my $no_of_cpus = (no_of_cpus_gnu_linux() || no_of_cpus_solaris()); my $no_of_cpus = (no_of_cpus_darwin()
|| no_of_cpus_gnu_linux()
|| no_of_cpus_solaris());
if($no_of_cpus) { if($no_of_cpus) {
$Global::no_of_cpus = $no_of_cpus; $Global::no_of_cpus = $no_of_cpus;
} else { } else {
@ -1039,6 +1320,11 @@ sub no_of_cpus_gnu_linux {
return $no_of_cpus; return $no_of_cpus;
} }
sub no_of_cpus_darwin {
my $no_of_cpus = `sysctl -n hw.ncpu 2>/dev/null`;
# Remove the trailing newline from the sysctl output
chomp $no_of_cpus;
return $no_of_cpus;
}
sub no_of_cpus_solaris { sub no_of_cpus_solaris {
if(-x "/usr/sbin/psrinfo") { if(-x "/usr/sbin/psrinfo") {
my @psrinfo = `/usr/sbin/psrinfo`; my @psrinfo = `/usr/sbin/psrinfo`;
@ -1372,16 +1658,21 @@ sub my_memory_usage {
use strict; use strict;
use FileHandle; use FileHandle;
my $pid = $$; my $pid = $$;
my $fh = FileHandle->new("</proc/$pid/stat"); if(-e "/proc/$pid/stat") {
my $fh = FileHandle->new("</proc/$pid/stat");
my $data = <$fh>; my $data = <$fh>;
chomp $data; chomp $data;
$fh->close; $fh->close;
my @procinfo = split(/\s+/,$data); my @procinfo = split(/\s+/,$data);
return $procinfo[22]; return $procinfo[22];
} else {
return 0;
}
} }
sub my_size { sub my_size {


@ -124,7 +124,7 @@
.\" ======================================================================== .\" ========================================================================
.\" .\"
.IX Title "PARALLEL 1" .IX Title "PARALLEL 1"
.TH PARALLEL 1 "2010-03-06" "perl v5.10.1" "User Contributed Perl Documentation" .TH PARALLEL 1 "2010-04-13" "perl v5.10.1" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents. .\" way too many mistakes in technical documents.
.if n .ad l .if n .ad l
@ -144,12 +144,26 @@ or \fBcat | sh\fR.
Several lines will be run in parallel. Several lines will be run in parallel.
.IP "\fIcommand\fR" 9 .IP "\fIcommand\fR" 9
.IX Item "command" .IX Item "command"
Command to execute. If \fBcommand\fR or the following arguments contain {} Command to execute. If \fBcommand\fR or the following arguments contain
every instance will be substituted with the input line. Setting a {} every instance will be substituted with the input line. Setting a
command also invokes \fB\-f\fR. command also invokes \fB\-f\fR.
.Sp .Sp
If \fBcommand\fR is given, \fBparallel\fR will behave similar to \fBxargs\fR. If If \fBcommand\fR is given, \fBparallel\fR will behave similar to \fBxargs\fR. If
\&\fBcommand\fR is not given \fBparallel\fR will behave similar to \fBcat | sh\fR. \&\fBcommand\fR is not given \fBparallel\fR will behave similar to \fBcat | sh\fR.
.IP "\fB{}\fR" 9
.IX Item "{}"
Input line. This is the default replacement string and will normally
be used for putting the argument in the command line. It can be
changed with \fB\-I\fR.
.IP "\fB{.}\fR (not implemented)" 9
.IX Item "{.} (not implemented)"
Input line without extension. This is a specialized replacement string
with the extension removed. It will remove from the last \fB.\fR till the
end of line of each input line and replace {.} with the
remaining. E.g. \fIfoo.jpg\fR becomes \fIfoo\fR. If the input line does
not contain \fB.\fR it will remain unchanged.
.Sp
{.} can be used in the same places as {}.
.IP "\fB\-\-null\fR" 9 .IP "\fB\-\-null\fR" 9
.IX Item "--null" .IX Item "--null"
.PD 0 .PD 0
@ -205,6 +219,9 @@ printed when the command is finished. \s-1STDERR\s0 first followed by \s-1STDOUT
.IP "\fB\-I\fR \fIstring\fR" 9 .IP "\fB\-I\fR \fIstring\fR" 9
.IX Item "-I string" .IX Item "-I string"
Use the replacement string \fIstring\fR instead of {}. Use the replacement string \fIstring\fR instead of {}.
.IP "\fB\-U\fR \fIstring\fR (not implemented)" 9
.IX Item "-U string (not implemented)"
Use the replacement string \fIstring\fR instead of {.} for input line without extension.
.IP "\fB\-\-jobs\fR \fIN\fR" 9 .IP "\fB\-\-jobs\fR \fIN\fR" 9
.IX Item "--jobs N" .IX Item "--jobs N"
.PD 0 .PD 0
@ -226,9 +243,10 @@ Run up to N jobs in parallel. 0 means as many as possible. Default is 10.
.IP "\fB\-P\fR \fI+N\fR" 9 .IP "\fB\-P\fR \fI+N\fR" 9
.IX Item "-P +N" .IX Item "-P +N"
.PD .PD
Add N to the number of CPUs. Run this many jobs in parallel. For Add N to the number of \s-1CPU\s0 cores. Run this many jobs in parallel. For
compute intensive jobs \fI\-j +0\fR is useful as it will run compute intensive jobs \fI\-j +0\fR is useful as it will run
number-of-cpus jobs in parallel. number-of-cpu-cores jobs in parallel. See also
\&\-\-use\-cpus\-instead\-of\-cores.
.IP "\fB\-\-jobs\fR \fI\-N\fR" 9 .IP "\fB\-\-jobs\fR \fI\-N\fR" 9
.IX Item "--jobs -N" .IX Item "--jobs -N"
.PD 0 .PD 0
@ -239,8 +257,9 @@ number-of-cpus jobs in parallel.
.IP "\fB\-P\fR \fI\-N\fR" 9 .IP "\fB\-P\fR \fI\-N\fR" 9
.IX Item "-P -N" .IX Item "-P -N"
.PD .PD
Subtract N from the number of CPUs. Run this many jobs in parallel. Subtract N from the number of \s-1CPU\s0 cores. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. If the evaluated number is less than 1 then 1 will be used. See also
\&\-\-use\-cpus\-instead\-of\-cores.
.IP "\fB\-\-jobs\fR \fIN\fR%" 9 .IP "\fB\-\-jobs\fR \fIN\fR%" 9
.IX Item "--jobs N%" .IX Item "--jobs N%"
.PD 0 .PD 0
@ -251,8 +270,9 @@ If the evaluated number is less than 1 then 1 will be used.
.IP "\fB\-P\fR \fIN\fR%" 9 .IP "\fB\-P\fR \fIN\fR%" 9
.IX Item "-P N%" .IX Item "-P N%"
.PD .PD
Multiply N% with the number of CPUs. Run this many jobs in parallel. Multiply N% with the number of \s-1CPU\s0 cores. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. If the evaluated number is less than 1 then 1 will be used. See also
\&\-\-use\-cpus\-instead\-of\-cores.
.IP "\fB\-\-keeporder\fR" 9 .IP "\fB\-\-keeporder\fR" 9
.IX Item "--keeporder" .IX Item "--keeporder"
.PD 0 .PD 0
@ -261,6 +281,14 @@ If the evaluated number is less than 1 then 1 will be used.
.PD .PD
Keep sequence of output same as the order of input. If jobs 1 2 3 4 Keep sequence of output same as the order of input. If jobs 1 2 3 4
end in the sequence 3 1 4 2 the output will still be 1 2 3 4. end in the sequence 3 1 4 2 the output will still be 1 2 3 4.
.IP "\fB\-\-number\-of\-cpus\fR" 9
.IX Item "--number-of-cpus"
Print the number of CPUs and exit (used by \fBparallel\fR itself to
determine the number of CPUs on remote machines).
.IP "\fB\-\-number\-of\-cores\fR" 9
.IX Item "--number-of-cores"
Print the number of cores and exit (used by \fBparallel\fR itself to determine the
number of cores on remote machines).
.IP "\fB\-\-quote\fR" 9 .IP "\fB\-\-quote\fR" 9
.IX Item "--quote" .IX Item "--quote"
.PD 0 .PD 0
@ -271,10 +299,137 @@ Quote \fBcommand\fR. This will quote the command line so special
characters are not interpreted by the shell. See the section characters are not interpreted by the shell. See the section
\&\s-1QUOTING\s0. Most people will never need this. Quoting is disabled by \&\s-1QUOTING\s0. Most people will never need this. Quoting is disabled by
default. default.
.IP "\fB\-S\fR \fI[ncpu/]sshlogin[,[ncpu/]sshlogin]\fR (not implemented)" 9
.IX Item "-S [ncpu/]sshlogin[,[ncpu/]sshlogin] (not implemented)"
.PD 0
.IP "\fB\-\-sshlogin\fR \fI[ncpu/]sshlogin[,[ncpu/]sshlogin]\fR (not implemented)" 9
.IX Item "--sshlogin [ncpu/]sshlogin[,[ncpu/]sshlogin] (not implemented)"
.PD
Distribute jobs to remote servers. The jobs will be run on a list of
remote servers. \fBparallel\fR will determine the number of \s-1CPU\s0 cores on
the remote servers and run the number of jobs as specified by \-j. If
the number \fIncpu\fR is given \fBparallel\fR will use this number for
number of CPUs on the host. Normally \fIncpu\fR will not be needed.
.Sp
An \fIsshlogin\fR is the string you would normally pass to \s-1SSH\s0 to login,
e.g. \fIserver.example.com\fR, \fIfoo@server.example.com\fR, or \fI\*(L"\-l foo \-p
2222 server.example.com\*(R"\fR. The sshlogin must not require a password.
.Sp
The sshlogin ':' is special, it means 'no ssh' and will therefore run
on the local machine.
.Sp
To specify more sshlogins separate the sshlogins by comma or repeat
the options multiple times.
.Sp
For examples: see \fB\-\-sshloginfile\fR.
.Sp
The remote host must have \fBparallel\fR installed.
.IP "\fB\-\-sshloginfile\fR \fIfilename\fR (not implemented)" 9
.IX Item "--sshloginfile filename (not implemented)"
File with sshlogins. The file consists of sshlogins on separate
lines. Empty lines and lines starting with '#' are ignored. Example:
.Sp
.Vb 9
\& server.example.com
\& username@server2.example.com
\& 8/my\-8\-core\-server.example.com
\& 2/myusername@my\-dualcore.example.net
\& # This server has SSH running on port 2222
\& \-p 2222 server.example.net
\& 4/\-p 2222 quadserver.example.net
\& # Assume 16 cores on the local machine
\& 16/:
.Ve
.IP "\fB\-\-silent\fR" 9 .IP "\fB\-\-silent\fR" 9
.IX Item "--silent" .IX Item "--silent"
Silent. The job to be run will not be printed. This is the default. Silent. The job to be run will not be printed. This is the default.
Can be reversed with \fB\-v\fR. Can be reversed with \fB\-v\fR.
.IP "\fB\-\-transfer\fR (not implemented)" 9
.IX Item "--transfer (not implemented)"
Transfer files to remote servers. \fB\-\-transfer\fR is used with
\&\fB\-\-sshlogin\fR when the arguments are files and should be transferred to
the remote servers. The files will be transferred using \fBrsync\fR and
will be put relative to the default login dir. E.g.
.Sp
.Vb 2
\& echo foo/bar.txt | parallel \e
\& \-\-sshlogin server.example.com \-\-transfer wc
.Ve
.Sp
This will transfer the file \fIfoo/bar.txt\fR to the server
\&\fIserver.example.com\fR to the file \f(CW$HOME\fR/foo/bar.txt before running
\&\fBwc foo/bar.txt\fR on \fIserver.example.com\fR.
.Sp
.Vb 2
\& echo /tmp/foo/bar.txt | parallel \e
\& \-\-sshlogin server.example.com \-\-transfer wc
.Ve
.Sp
This will transfer the file \fI/tmp/foo/bar.txt\fR to the server
\&\fIserver.example.com\fR to the file /tmp/foo/bar.txt before running
\&\fBwc /tmp/foo/bar.txt\fR on \fIserver.example.com\fR.
.Sp
\&\fB\-\-transfer\fR is often used with \fB\-\-return\fR and \fB\-\-cleanup\fR.
.Sp
\&\fB\-\-transfer\fR is ignored when used with \fB\-\-sshlogin :\fR or when not used with \fB\-\-sshlogin\fR.
.IP "\fB\-\-return\fR \fIsuffix\fR (not implemented)" 9
.IX Item "--return suffix (not implemented)"
Transfer files from remote servers. \fB\-\-return\fR is used with
\&\fB\-\-sshlogin\fR when the arguments are files on the remote servers. When
processing is done the file with \fIsuffix\fR appended will be transferred
from the remote server using \fBrsync\fR and will be put relative to
the default login dir. E.g.
.Sp
.Vb 2
\& echo foo/bar.txt | parallel \e
\& \-\-sshlogin server.example.com \-\-return .out touch {}.out
.Ve
.Sp
This will transfer the file \fI\f(CI$HOME\fI/foo/bar.txt.out\fR from the server
\&\fIserver.example.com\fR to the file \fIfoo/bar.txt.out\fR after running
\&\fBtouch foo/bar.txt.out\fR on \fIserver.example.com\fR.
.Sp
.Vb 2
\& echo /tmp/foo/bar.txt | parallel \e
\& \-\-sshlogin server.example.com \-\-return .out touch {}.out
.Ve
.Sp
This will transfer the file \fI/tmp/foo/bar.txt.out\fR from the server
\&\fIserver.example.com\fR to the file \fI/tmp/foo/bar.txt.out\fR after running
\&\fBtouch /tmp/foo/bar.txt.out\fR on \fIserver.example.com\fR.
.Sp
Multiple files with different suffixes can be transferred by repeating
the options multiple times:
.Sp
.Vb 3
\& echo /tmp/foo/bar.txt | \e
\& parallel \-\-sshlogin server.example.com \e
\& \-\-return .out \-\-return .out2 touch {}.out {}.out2
.Ve
.Sp
\&\fB\-\-return\fR is often used with \fB\-\-transfer\fR and \fB\-\-cleanup\fR.
.Sp
\&\fB\-\-return\fR is ignored when used with \fB\-\-sshlogin :\fR or when not used with \fB\-\-sshlogin\fR.
.IP "\fB\-\-cleanup\fR (not implemented)" 9
.IX Item "--cleanup (not implemented)"
Remove transferred files. \fB\-\-cleanup\fR will remove the transferred files
on the remote server after processing is done.
.Sp
.Vb 3
\& find log \-name \*(Aq*gz\*(Aq | parallel \e
\& \-\-sshlogin server.example.com \-\-transfer \-\-return .bz2 \e
\& \-\-cleanup "zcat {} | bzip2 \-9 >{}.bz2"
.Ve
.Sp
With \fB\-\-transfer\fR the file transferred to the remote server will be
removed on the remote server. Directories created will not be removed
\&\- even if they are empty.
.Sp
With \fB\-\-return\fR the file transferred from the remote server will be
removed on the remote server. Directories created will not be removed
\&\- even if they are empty.
.Sp
\&\fB\-\-cleanup\fR is ignored when not used with \fB\-\-transfer\fR or \fB\-\-return\fR.
.IP "\fB\-\-ungroup\fR" 9 .IP "\fB\-\-ungroup\fR" 9
.IX Item "--ungroup" .IX Item "--ungroup"
.PD 0 .PD 0
@ -283,6 +438,14 @@ Can be reversed with \fB\-v\fR.
.PD .PD
Ungroup output. Output is printed as soon as possible. This may cause Ungroup output. Output is printed as soon as possible. This may cause
output from different commands to be mixed. Can be reversed with \fB\-g\fR. output from different commands to be mixed. Can be reversed with \fB\-g\fR.
.IP "\fB\-\-use\-cpus\-instead\-of\-cores\fR (not implemented)" 9
.IX Item "--use-cpus-instead-of-cores (not implemented)"
Count the number of CPUs instead of cores. When computing how many
jobs to run in parallel relative to the number of cores you can ask
parallel to instead look at the number of CPUs. This will make sense
for computers that have hyperthreading as two jobs running on one \s-1CPU\s0
with hyperthreading will run slower than two jobs running on two CPUs.
Normal users will not need this option.
.IP "\fB\-v\fR" 9 .IP "\fB\-v\fR" 9
.IX Item "-v" .IX Item "-v"
Verbose. Print the job to be run on \s-1STDOUT\s0. Can be reversed with Verbose. Print the job to be run on \s-1STDOUT\s0. Can be reversed with
@ -344,9 +507,9 @@ file:
.PP .PP
\&\fBconvert \-geometry 120 foo.jpg thumb_foo.jpg\fR \&\fBconvert \-geometry 120 foo.jpg thumb_foo.jpg\fR
.PP .PP
If the system has more than 1 \s-1CPU\s0 it can be run with number-of-cpus If the system has more than 1 \s-1CPU\s0 core it can be run with
jobs in parallel (\-j +0). This will do that for all jpg files in a number-of-cpu-cores jobs in parallel (\-j +0). This will do that for
directory: all jpg files in a directory:
.PP .PP
\&\fBls *.jpg | parallel \-j +0 convert \-geometry 120 {} thumb_{}\fR \&\fBls *.jpg | parallel \-j +0 convert \-geometry 120 {} thumb_{}\fR
.PP .PP
@ -373,6 +536,10 @@ is a better solution:
\& find . \-name \*(Aq*.jpg\*(Aq | parallel \-j +0 convert \-geometry 120 {} {}_thumb.jpg \& find . \-name \*(Aq*.jpg\*(Aq | parallel \-j +0 convert \-geometry 120 {} {}_thumb.jpg
\& find . \-name \*(Aq*_thumb.jpg\*(Aq | ren \*(Aqs:/([^/]+)_thumb.jpg$:/thumb_$1:\*(Aq \& find . \-name \*(Aq*_thumb.jpg\*(Aq | ren \*(Aqs:/([^/]+)_thumb.jpg$:/thumb_$1:\*(Aq
.Ve .Ve
.PP
(Not implemented) This will make files like ./foo/bar_thumb.jpg:
.PP
\&\fBfind . \-name '*.jpg' | parallel \-j +0 convert \-geometry 120 {} {.}_thumb.jpg\fR
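.PP
Since {.} is not implemented yet, the documented behaviour (remove everything from the last \fB.\fR to the end of the line) can be approximated with a shell parameter expansion. The following is only a sketch of that rule; \fIstrip_ext\fR is a hypothetical helper name and not part of \fBparallel\fR.

```shell
#!/bin/sh
# Hypothetical helper (not part of parallel): mimic the documented {.}
# rule by dropping everything from the last "." to the end of the line.
strip_ext() {
    # ${1%.*} removes the shortest suffix matching ".*", which starts at
    # the last dot. If the argument contains no ".", nothing matches and
    # the argument is printed unchanged.
    printf '%s\n' "${1%.*}"
}

strip_ext foo.jpg          # prints: foo
strip_ext dir/bar.tar.gz   # prints: dir/bar.tar
strip_ext README           # prints: README
```

Note that the rule looks at the whole input line, so a dot in a directory name counts as well when the file name itself has no extension.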
.SH "EXAMPLE 4: Substitution and redirection" .SH "EXAMPLE 4: Substitution and redirection"
.IX Header "EXAMPLE 4: Substitution and redirection" .IX Header "EXAMPLE 4: Substitution and redirection"
This will compare all files in the dir to the file foo and save the This will compare all files in the dir to the file foo and save the
@ -449,6 +616,127 @@ To keep the order the same as input run:
.PP .PP
This will make sure the traceroute to foss.org.my will be printed This will make sure the traceroute to foss.org.my will be printed
first. first.
.SH "EXAMPLE 9: Using remote computers (not implemented)"
.IX Header "EXAMPLE 9: Using remote computers (not implemented)"
To run commands on a remote computer \s-1SSH\s0 needs to be set up and you
must be able to login without entering a password (\fBssh-agent\fR may be
handy).
.PP
To run \fBecho\fR on \fBserver.example.com\fR:
.PP
.Vb 1
\& seq 1 10 | parallel \-\-sshlogin server.example.com echo
.Ve
.PP
To run commands on more than one remote computer run:
.PP
.Vb 1
\& seq 1 10 | parallel \-\-sshlogin server.example.com,server2.example.net echo
.Ve
.PP
Or:
.PP
.Vb 2
\& seq 1 10 | parallel \-\-sshlogin server.example.com \e
\& \-\-sshlogin server2.example.net echo
.Ve
.PP
If the login username is \fIfoo\fR on \fIserver2.example.net\fR use:
.PP
.Vb 2
\& seq 1 10 | parallel \-\-sshlogin server.example.com \e
\& \-\-sshlogin foo@server2.example.net echo
.Ve
.PP
To distribute the commands to a list of machines, make a file
\&\fImymachines\fR with all the machines:
.PP
.Vb 3
\& server.example.com
\& foo@server2.example.com
\& server3.example.com
.Ve
.PP
Then run:
.PP
.Vb 1
\& seq 1 10 | parallel \-\-sshloginfile mymachines echo
.Ve
.PP
To include the local machine add the special sshlogin ':' to the list:
.PP
.Vb 4
\& server.example.com
\& foo@server2.example.com
\& server3.example.com
\& :
.Ve
.PP
If the number of \s-1CPU\s0 cores on the remote servers is not identified
correctly the number of \s-1CPU\s0 cores can be added in front. Here the
server has 8 \s-1CPU\s0 cores.
.PP
.Vb 1
\& seq 1 10 | parallel \-\-sshlogin 8/server.example.com echo
.Ve
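.PP
The sshlogin forms used above (an optional core count in front of \fB/\fR and an optional user name in front of \fB@\fR) can be split apart with plain shell pattern matching. This is a minimal sketch, not the parser used by \fBparallel\fR; \fIparse_sshlogin\fR is a hypothetical name, and combining both prefixes in one sshlogin is an assumption that the examples above do not show.

```shell
#!/bin/sh
# Hypothetical parser for the sshlogin form [ncpu/][user@]host.
# Prints "ncpu user host"; "-" marks a part that was not given.
# Assumption: the core count and the user name may be combined.
parse_sshlogin() {
    login=$1
    ncpu=-
    user=-
    case $login in
        # Text before the first "/" is the number of CPU cores.
        */*) ncpu=${login%%/*}; login=${login#*/} ;;
    esac
    case $login in
        # Text before the first "@" is the login user name.
        *@*) user=${login%%@*}; login=${login#*@} ;;
    esac
    printf '%s %s %s\n' "$ncpu" "$user" "$login"
}

parse_sshlogin 8/server.example.com      # prints: 8 - server.example.com
parse_sshlogin foo@server2.example.net   # prints: - foo server2.example.net
```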
.SH "EXAMPLE 10: Transferring of files (not implemented)"
.IX Header "EXAMPLE 10: Transferring of files (not implemented)"
To recompress gzipped files with bzip2 using a remote server run:
.PP
.Vb 3
\& find logs/ \-name \*(Aq*.gz\*(Aq | \e
\& parallel \-\-sshlogin server.example.com \e
\& \-\-transfer "zcat {} | bzip2 \-9 >{}.bz2"
.Ve
.PP
This will list the .gz\-files in the \fIlogs\fR directory and all
directories below. Then it will transfer the files to
\&\fIserver.example.com\fR to the corresponding directory in
\&\fI\f(CI$HOME\fI/logs\fR. On \fIserver.example.com\fR the file will be recompressed
using \fBzcat\fR and \fBbzip2\fR resulting in the corresponding file with
the suffix \fI.bz2\fR.
.PP
If you want the file to be transferred back to the local machine add
\&\fI\-\-return .bz2\fR:
.PP
.Vb 3
\& find logs/ \-name \*(Aq*.gz\*(Aq | \e
\& parallel \-\-sshlogin server.example.com \e
\& \-\-transfer \-\-return .bz2 "zcat {} | bzip2 \-9 >{}.bz2"
.Ve
.PP
After the recompression is done the \fI.bz2\fR\-file is transferred back to
the local machine.
.PP
If you want to delete the transferred files on the remote machine add
\&\fI\-\-cleanup\fR:
.PP
.Vb 3
\& find logs/ \-name \*(Aq*.gz\*(Aq | \e
\& parallel \-\-sshlogin server.example.com \e
\& \-\-transfer \-\-return .bz2 \-\-cleanup "zcat {} | bzip2 \-9 >{}.bz2"
.Ve
.PP
If you want to run on several servers add the servers to \fI\-\-sshlogin\fR
either using ',' or a separate \fI\-\-sshlogin\fR:
.PP
.Vb 4
\& find logs/ \-name \*(Aq*.gz\*(Aq | \e
\& parallel \-\-sshlogin server.example.com,server2.example.com \e
\& \-\-sshlogin server3.example.com \e
\& \-\-transfer \-\-return .bz2 \-\-cleanup "zcat {} | bzip2 \-9 >{}.bz2"
.Ve
.PP
You can add the local machine using \fI\-\-sshlogin :\fR. This disables
transferring and removing of files for the local machine only:
.PP
.Vb 5
\& find logs/ \-name \*(Aq*.gz\*(Aq | \e
\& parallel \-\-sshlogin server.example.com,server2.example.com \e
\& \-\-sshlogin server3.example.com \e
\& \-\-sshlogin : \e
\& \-\-transfer \-\-return .bz2 \-\-cleanup "zcat {} | bzip2 \-9 >{}.bz2"
.Ve
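.PP
To make the effect of \fI\-\-transfer\fR, \fI\-\-return\fR and \fI\-\-cleanup\fR concrete, the sketch below prints the manual steps they describe for a single input file. It is a dry run that only prints commands, and the use of \fBrsync \-R\fR and \fBrm\fR is an assumption made for illustration; it does not claim to show how \fBparallel\fR performs these steps. \fImanual_steps\fR is a hypothetical name.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the manual steps that
# --transfer --return .bz2 --cleanup describe for one input file.
# Assumptions: rsync and ssh are the transport, and the file name
# contains no spaces or shell metacharacters.
manual_steps() {
    host=$1
    file=$2
    # --transfer: copy the file to the same relative path on the remote side.
    echo "rsync -R $file $host:"
    # Run the command on the remote server.
    echo "ssh $host 'zcat $file | bzip2 -9 >$file.bz2'"
    # --return .bz2: fetch the result back to the same relative path.
    echo "rsync -R $host:$file.bz2 ."
    # --cleanup: remove the transferred and returned files remotely.
    echo "ssh $host rm -f $file $file.bz2"
}

manual_steps server.example.com logs/a.gz
```

As the description of \fB\-\-cleanup\fR states, directories created on the remote server are left in place; the sketch likewise removes only the files.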
.SH "QUOTING" .SH "QUOTING"
.IX Header "QUOTING" .IX Header "QUOTING"
For more advanced use quoting may be an issue. The following will For more advanced use quoting may be an issue. The following will
@ -541,7 +829,7 @@ So \fBparallel\fR's newline separation can be emulated with:
\&\fBcat | xargs \-d \*(L"\en\*(R" \-n1 \f(BIcommand\fB\fR \&\fBcat | xargs \-d \*(L"\en\*(R" \-n1 \f(BIcommand\fB\fR
.PP .PP
\&\fBxargs\fR can run a given number of jobs in parallel, but has no \&\fBxargs\fR can run a given number of jobs in parallel, but has no
support for running no_of_cpus jobs in parallel. support for running number-of-cpu-cores jobs in parallel.
.PP .PP
\&\fBxargs\fR has no support for grouping the output, therefore output may \&\fBxargs\fR has no support for grouping the output, therefore output may
run together, e.g. the first half of a line is from one process and run together, e.g. the first half of a line is from one process and
@ -608,24 +896,15 @@ hard, as all foreground processes get the \s-1INT\s0 from the shell.
If there are nomore jobs (\s-1STDIN\s0 is closed) then make sure to If there are nomore jobs (\s-1STDIN\s0 is closed) then make sure to
distribute the arguments evenly if running \-X. distribute the arguments evenly if running \-X.
.PP .PP
Distibute jobs to computers with different speeds/no_of_cpu using ssh Distibute jobs to computers with different speeds/number\-of\-cpu\-cores using ssh
ask the computers how many cpus they have and spawn appropriately ask the computers how many cpus they have and spawn appropriately
according to \-j setting. Reuse ssh connection (\-M and \-S) according to \-j setting. Reuse ssh connection (\-M and \-S)
http://www.semicomplete.com/blog/geekery/distributed\-xargs.html?source=rss20 http://www.semicomplete.com/blog/geekery/distributed\-xargs.html?source=rss20
http://code.google.com/p/ppss/wiki/Manual2 http://code.google.com/p/ppss/wiki/Manual2
.SS "\-S"
.IX Subsection "-S"
\&\-S sshlogin[,sshlogin]
.PP .PP
sshlogin is [user@]host or filename with list of sshlogin http://www.gnu.org/software/pexec/
.PP .PP
What about copying data to remote host? Have an option that says the Where will '>' be run? Local or remote? Where ever is easier.
argument is a file that should be copied.
.PP
What about copying data from remote host? Have an option that says
the argument is a file that should be copied.
.PP
Where will '>' be run? Local or remote?
.PP .PP
Parallelize so this can be done: Parallelize so this can be done:
mdm.screen find dir \-execdir mdm-run cmd {} \e; mdm.screen find dir \-execdir mdm-run cmd {} \e;


@ -25,6 +25,8 @@ replace
replace replace
replace replace
replace replace
replace
replace
include this include this
include this include this
include this include this


@ -25,6 +25,8 @@ replace
replace replace
replace replace
replace replace
replace
replace
include this include this
include this include this
include this include this