parallel: small optimization for --pipe

Ole Tange 2011-05-04 00:30:56 +02:00
parent d173d70602
commit 48199a88fe
4 changed files with 62 additions and 15 deletions


@ -1,3 +1,17 @@
Postcard:
- Front: figure only
- Back:
- Logo with figure - possibly gnu.org/s/parallel
- Short grey text explaining what it is.
- One example: parallel gzip ::: *
- Link to video: http://nd.gd/0s
GNU parallel is a UNIX-tool for running commands in parallel.
To gzip all files running one job per CPU write:
parallel gzip ::: *
Watch the intro video to learn more: http://nd.gd/0s
Or read more about GNU parallel: www.gnu.org/s/parallel
job->start():
$jobslot = Global::jobslot->$sshlogin


@ -158,9 +158,9 @@ cc:Peter Simons <simons@cryp.to>, Sandro Cazzaniga <kharec@mandriva.org>,
Christian Faulhammer <fauli@gentoo.org>, Ryoichiro Suzuki <ryoichiro.suzuki@gmail.com>,
Jesse Alama <jesse.alama@gmail.com>
-Subject: GNU Parallel 2011XXXX ('?') released
+Subject: GNU Parallel 2011XX22 ('Pakistan') released
-GNU Parallel 2011XXXX ('?') has been released. It is
+GNU Parallel 2011XX22 ('Pakistan') has been released. It is
available for download at: http://ftp.gnu.org/gnu/parallel/
New in this release:
@ -173,6 +173,12 @@ New in this release:
* Review with idea for {..} and {...} in Japanese. Thanks to ichii386.
http://d.hatena.ne.jp/ichii386/20110426
* Upgrade GNU Parallel using Macports. Thanks to Phil Hollenback.
http://www.hollenback.net/index.php/MacportsParallel
* Robert from Echo One discusses using processes instead of threads:
http://rrees.wordpress.com/2011/04/25/many-cores-many-threads/
* Bug fixes and man page updates.


@ -108,12 +108,20 @@ sub spreadstdin {
$recerror = "parallel: Warning: --recend unmatched. Is --blocksize too small?";
}
+if($::opt_regexp) {
+# If $recstart/$recend contains '|' this should only apply to the regexp
+$recstart = "(?:".$recstart.")";
+$recend = "(?:".$recend.")";
+} else {
+# $recstart/$recend = printf strings (\n)
+$recstart =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;
+$recend =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;
+}
+my $recendrecstart = $recend.$recstart;
while(read(STDIN,substr($buf,length $buf,0),$::opt_blocksize)) {
# substr above = append to $buf
if($::opt_regexp) {
-# If $recstart/$recend contains '|' this should only apply to the regexp
-$recstart = "(?:".$recstart.")";
-$recend = "(?:".$recend.")";
if($Global::max_number_of_args) {
# -N => (start..*?end){n}
while($buf =~ s/((?:$recstart.*?$recend){$Global::max_number_of_args})($recstart.*)$/$2/os) {
@ -130,13 +138,10 @@ sub spreadstdin {
}
}
} else {
-# $recstart/$recend = printf strings (\n)
-$recstart =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;
-$recend =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;
if($Global::max_number_of_args) {
# -N => (start..*?end){n}
my $i = 0;
-while(($i = nindex(\$buf,$recend.$recstart,$Global::max_number_of_args)) != -1) {
+while(($i = nindex(\$buf,$recendrecstart,$Global::max_number_of_args)) != -1) {
$i += length $recend; # find the actual splitting location
my $record = substr($buf,0,$i);
substr($buf,0,$i) = "";
@ -145,7 +150,7 @@ sub spreadstdin {
}
} else {
# Find the last recend-recstart in $buf
-my $i = rindex($buf,$recend.$recstart);
+my $i = rindex($buf,$recendrecstart);
if($i != -1) {
$i += length $recend; # find the actual splitting location
my $record = substr($buf,0,$i);
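The optimization in this file is loop-invariant hoisting. Two things now happen once, before the `read` loop, instead of on every block:

- The printf-escape conversion (or, with --regexp, the `(?:...)` wrapping) of `$recstart`/`$recend`.
- The concatenation of `$recend.$recstart`, which is stored in `$recendrecstart` and reused by every `rindex`/`nindex` call.

Running the conversion once also matters for correctness. Inside the loop it was re-applied to the already converted string on each block. As a result, an escaped backslash followed by `n` would turn into a newline from the second block on.

The Perl sketch below is illustrative only and is not GNU parallel's code. It makes these assumptions:

- `$input` is hypothetical in-memory data.
- Tiny fixed-size blocks stand in for `read(STDIN,...)`.
- Splitting happens at the last separator, as the non -N branch does.

```perl
use strict;
use warnings;

# Illustrative sketch of the hoisting, not GNU parallel's actual code.
my $recstart = "";
my $recend   = "\\n";    # as given by the user: a backslash and an "n"

# Loop-invariant work, done once before any block is read:
# 1. turn printf-style escapes into real characters (same s///gee as in the diff)
$recend =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;
# 2. build the separator that every block is searched for
my $recendrecstart = $recend . $recstart;

# Hypothetical input, fed in tiny blocks to stand in for read(STDIN,...).
my $input     = "a\nb\nc\nd\npartial";
my $blocksize = 3;
my $buf       = "";
my @chunks;
for (my $off = 0; $off < length $input; $off += $blocksize) {
    $buf .= substr($input, $off, $blocksize);   # append the next block
    my $i = rindex($buf, $recendrecstart);      # last recend-recstart in $buf
    next if $i == -1;                           # no complete record yet
    $i += length $recend;                       # split right after the recend
    push @chunks, substr($buf, 0, $i);          # everything up to the split
    substr($buf, 0, $i) = "";                   # keep only the partial tail
}
push @chunks, $buf if length $buf;              # flush the last record at EOF

# Why "once" matters: a second pass re-reads an escaped backslash.
my $escaped = "\\\\n";                                     # backslash, backslash, n
(my $once  = $escaped) =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;  # backslash, n
(my $twice = $once)    =~ s/\\([rnt'"\\])/"qq|\\$1|"/gee;  # newline

printf "%d chunks\n", scalar @chunks;
```

In the old code the conversion sat inside the loop. The last two substitutions imitate what it did to the separator on the first and second block.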


@ -1415,6 +1415,26 @@ can be written like this:
B<cat list | parallel "do_something {} scale {.}.jpg ; do_step2 <{} {.}" | process_output>
+=head1 EXAMPLE: Rewriting nested for-loops
+Nested for-loops like this:
+(for x in `cat xlist` ; do
+for y in `cat ylist` ; do
+do_something $x $y
+done
+done) | process_output
+can be written like this:
+B<cat xlist | parallel cat ylist \| parallel -I {o} do_something {} {o} | process_output>
+The above will run N*N jobs in parallel if parallel normally runs N jobs. To
+ensure the output order is the same as the input and only run N jobs do:
+B<cat xlist | parallel -k cat ylist \| parallel -j1 -kI {o} do_something {} {o} | process_output>
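The N*N figure is simple arithmetic. The outer parallel keeps N jobs running, and each of those is an inner parallel that keeps up to N `do_something` jobs running, so up to N*N run at once. With -j1 on the inner parallel the bound drops to N.

The Perl sketch below uses hypothetical stand-ins for xlist, ylist and N. It lists the commands in the order the nested for-loop runs them, which is the order -k on both levels is meant to reproduce, and computes both bounds.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the contents of xlist and ylist.
my @xlist = qw(x1 x2);
my @ylist = qw(y1 y2 y3);
my $n     = 4;    # hypothetical N: jobs parallel normally runs at once

# Commands in the order the nested for-loop runs them.
my @commands;
for my $x (@xlist) {
    for my $y (@ylist) {
        push @commands, "do_something $x $y";
    }
}

# Upper bounds on do_something jobs running at the same time.
my $max_default = $n * $n;   # N outer jobs, each an inner parallel running N
my $max_with_j1 = $n * 1;    # inner parallel limited to one job by -j1

print "$_\n" for @commands;
```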
=head1 EXAMPLE: Group output lines
When running jobs that output data, you often do not want the output
@ -1501,8 +1521,8 @@ B<cat bigfile | parallel --pipe --block 10M grep foo>
=head1 EXAMPLE: Using remote computers
To run commands on a remote computer SSH needs to be set up and you
-must be able to login without entering a password (B<ssh-agent> may be
-handy).
+must be able to login without entering a password (The commands
+B<ssh-copy-id> and B<ssh-agent> may help you do that).
To run B<echo> on B<server.example.com>:
@ -1643,6 +1663,8 @@ Print the number on the opposing sides of a six sided die:
B<parallel -a <(seq 6) -a <(seq 6 -1 1) echo>
+B<parallel echo :::: <(seq 6) <(seq 6 -1 1)>
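The die example is meant to pair line i of `seq 6` with line i of `seq 6 -1 1`, so each job gets one face and the face opposite it. On a standard die, opposite faces sum to 7.

The minimal Perl check below models that intended pairing only. It does not model how parallel reads its input sources.

```perl
use strict;
use warnings;

my @first  = (1 .. 6);           # lines of seq 6
my @second = reverse(1 .. 6);    # lines of seq 6 -1 1

# Pair line i of one source with line i of the other.
my @pairs = map { my $i = $_; [ $first[$i], $second[$i] ] } 0 .. $#first;

# One line per job, as the echo jobs would print them (order aside).
print "$_->[0] $_->[1]\n" for @pairs;
```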
Convert files from all subdirs to PNG-files with consecutive numbers
(useful for making input PNG's for B<ffmpeg>):
@ -1693,12 +1715,12 @@ job. wget is too - if the webpages are small.
The content of the file jobs_to_run:
ping -c 1 10.0.0.1
-wget http://status-server/status.cgi?ip=10.0.0.1
+wget http://example.com/status.cgi?ip=10.0.0.1
ping -c 1 10.0.0.2
-wget http://status-server/status.cgi?ip=10.0.0.2
+wget http://example.com/status.cgi?ip=10.0.0.2
...
ping -c 1 10.0.0.255
-wget http://status-server/status.cgi?ip=10.0.0.255
+wget http://example.com/status.cgi?ip=10.0.0.255
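The `...` in the listing stands for the same two lines repeated for every address between 10.0.0.2 and 10.0.0.255. Below is a hypothetical Perl generator for the whole file, using the example.com URL. It writes to standard output so it can be redirected, e.g. `perl make_jobs.pl > jobs_to_run` (the script name is made up).

```perl
use strict;
use warnings;

# Hypothetical generator for jobs_to_run: one ping and one wget per address.
my @jobs;
for my $host (1 .. 255) {
    my $ip = "10.0.0.$host";
    push @jobs, "ping -c 1 $ip", "wget http://example.com/status.cgi?ip=$ip";
}
print "$_\n" for @jobs;
```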
To run 100 processes simultaneously do: