More examples

Ole Tange 2010-08-14 20:39:33 +02:00
parent f888da9fdb
commit a038ade0de
3 changed files with 72 additions and 32 deletions


@@ -1,31 +1,16 @@
Default sshloginfile ~/.parallel/sshloginfile
--sshloginfile .. or -S .. means use default sshloginfile
# Allow 7 to run. After then 7th is started, block untill one is dead
parallel --mutex uniqidentifier -j7 command
parallel --automutex -j7 command
mdm.screen find dir -execdir mdm-run cmd {} \;
find dir -execdir parallel --automutex cmd {} \;
getppid
# Gzip all files in parallel
parallel gzip ::: *
fex syntax for splitting fields
http://www.semicomplete.com/projects/fex/
sql :foo 'select * from bar' | parallel --fex '|{1,2}' do_stuff {2} {1}
# Convert *.wav to *.mp3 using LAME running one process per CPU core:
parallel -j+0 lame {} -o {.}.mp3 ::: *.wav
# Make an uncompressed version of all *.gz
parallel zcat {} ">"{.} ::: *.gz
# Recompress all .gz files using bzip2 running 1 job per CPU core:
find . -name '*.gz' | parallel -j+0 "zcat {} | bzip2 >{.}.bz2 && rm {}"
# Create a directory for each zip-file and unzip it in that dir
parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip
# Convert all *.mp3 in subdirs to *.ogg running
# one process per CPU core on local computer and server2
find . -name '*.mp3' | parallel --trc {.}.ogg -j+0 -S server2,: \
'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg'
# Run mycmd on column 1-3 of each row of TAB separated values
parallel -a table_file.tsv --colsep '\t' mycmd -o {2} {3} -i {1}
# Run traceroute in parallel, but keep the output order the same
parallel -k traceroute ::: foss.org.my debian.org freenetproject.org
Unittest: --colsep + multiple -a
@@ -346,3 +331,30 @@ do not start new jobs. Print out the number of jobs waiting to
complete on STDERR. Accept sig INT again to kill now. This seems to be
hard, as all foreground processes get the INT from the shell.
# Gzip all files in parallel
parallel gzip ::: *
# Convert *.wav to *.mp3 using LAME running one process per CPU core:
parallel -j+0 lame {} -o {.}.mp3 ::: *.wav
# Make an uncompressed version of all *.gz
parallel zcat {} ">"{.} ::: *.gz
# Recompress all .gz files using bzip2 running 1 job per CPU core:
find . -name '*.gz' | parallel -j+0 "zcat {} | bzip2 >{.}.bz2 && rm {}"
# Create a directory for each zip-file and unzip it in that dir
parallel 'mkdir {.}; cd {.}; unzip ../{}' ::: *.zip
# Convert all *.mp3 in subdirs to *.ogg running
# one process per CPU core on local computer and server2
find . -name '*.mp3' | parallel --trc {.}.ogg -j+0 -S server2,: \
'mpg321 -w - {} | oggenc -q0 - -o {.}.ogg'
# Run mycmd on column 1-3 of each row of TAB separated values
parallel -a table_file.tsv --colsep '\t' mycmd -o {2} {3} -i {1}
# Run traceroute in parallel, but keep the output order the same
parallel -k traceroute ::: foss.org.my debian.org freenetproject.org


@@ -78,7 +78,8 @@ Newsgroups: comp.unix.shell,comp.unix.admin
<<<<<
to:parallel@gnu.org, bug-parallel@gnu.org, info-gnu@gnu.org, bug-directory@gnu.org
-cc:Peter Simons <simons@cryp.to>, Sandro Cazzaniga <kharec@mandriva.org>
+cc:Peter Simons <simons@cryp.to>, Sandro Cazzaniga <kharec@mandriva.org>,
+ Tim Cuthbertson <tim3d.junk@gmail.com>
Subject: GNU Parallel 20100722 released
@@ -88,13 +89,23 @@ download at: http://ftp.gnu.org/gnu/parallel/
New in this release:
* With --colsep a table can be used as input. Example:
-cat table | parallel --colsep '\s+' echo col1 {1} col2 {2}
+cat tab_sep_table | parallel --colsep '\t' echo col1 {1} col2 {2}
* --trim can remove white space around arguments.
* Zero install package. Thanks to Tim Cuthbertson <tim3d dot junk at
gmail dot com>
+* OpenSUSE package. Thanks to Markus Ammer <mkmm at gmx-topmail dot
+de>
+* Web review http://oentend.blogspot.com/2010/08/gnu-parallel.html
+Thanks to Pavel Nuzhdin <pnzhdin at gmail dot com>
+* Web review http://psung.blogspot.com/2010/08/gnu-parallel.html
+Thanks to Phil Sung
= About GNU Parallel =
GNU Parallel is a shell tool for executing jobs in parallel using one


@@ -815,11 +815,11 @@ B<ls | parallel -m mv {} destdir>
To remove the files I<pict0000.jpg> .. I<pict9999.jpg> you could do:
-B<seq -f %04g 0 9999 | parallel rm pict{}.jpg>
+B<seq -w 0 9999 | parallel rm pict{}.jpg>
You could also do:
-B<seq -f %04g 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm>
+B<seq -w 0 9999 | perl -pe 's/(.*)/pict$1.jpg/' | parallel -m rm>
The first will run B<rm> 10000 times, while the last will only run
B<rm> as many times needed to keep the command line length short
@@ -827,7 +827,7 @@ enough to avoid B<Argument list too long> (it typically runs 1-2 times).
You could also run:
-B<seq -f %04g 0 9999 | parallel -X rm pict{}.jpg>
+B<seq -w 0 9999 | parallel -X rm pict{}.jpg>
This will also only run B<rm> as many times needed to keep the command
line length short enough.
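The two hunks above swap B<seq -f %04g> for B<seq -w>. One way to check that the swap leaves the generated file names unchanged is to compare both outputs without deleting anything. This is only a sketch: the helper names pict_names_w and pict_names_f are made up for illustration, and sed stands in for the pict{}.jpg replacement string so the check does not need GNU parallel.

```shell
#!/bin/sh
# Hypothetical helpers: they only list the names, nothing is removed.
pict_names_w() { seq -w 0 9999 | sed 's/.*/pict&.jpg/'; }
pict_names_f() { seq -f %04g 0 9999 | sed 's/.*/pict&.jpg/'; }

# Compare checksums of the two name lists instead of diffing 10000 lines.
if [ "$(pict_names_w | cksum)" = "$(pict_names_f | cksum)" ]; then
  echo "same names"
else
  echo "names differ"
fi
```

seq -w pads to the width of the largest number, so it only matches %04g here because the range ends at a four-digit value.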
@@ -925,6 +925,21 @@ foo) you can do:
B<ls *.tar.gz| parallel -U {tar} 'echo {tar}|parallel "mkdir -p {.} ; tar -C {.} -xf {.}.tar.gz"'>
+=head1 EXAMPLE: Download 10 images for each of the past 30 days
+Let us assume a website stores images like:
+http://www.website.com/path/to/YYYYMMDD_##.jpg
+where YYYYMMDD is the date and ## is the number 01-10. This will
+generate the past 30 days as YYYYMMDD:
+B<seq 1 30 | parallel date -d '"today -{} days"' +%Y%m%d>
+Based on this we can let GNU B<parallel> generate 10 B<wget>s per day:
+B<I<the above> | parallel -I {o} seq -w 1 10 "|" parallel wget
+http://www.website.com/path/to/{o}_{}.jpg>
=head1 EXAMPLE: Rewriting a for-loop and a while-loop
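For comparison with the "Download 10 images" example added in the hunk above, the same date and counter expansion can be written as plain nested loops. This is a sketch under stated assumptions: it needs GNU date (for -d), the function name list_image_urls is hypothetical, and it prints the URLs instead of calling wget, because www.website.com is only the placeholder host from the example.

```shell
#!/bin/sh
# Hypothetical helper: print one URL per image for the past 30 days.
list_image_urls() {
  for days_ago in $(seq 1 30); do
    # GNU date: format the day that lies $days_ago days back as YYYYMMDD
    day=$(date -d "today -$days_ago days" +%Y%m%d)
    # seq -w pads the counter to two digits: 01 .. 10
    for n in $(seq -w 1 10); do
      echo "http://www.website.com/path/to/${day}_${n}.jpg"
    done
  done
}

# Show the first three URLs; sed reads the whole stream, so no broken pipe.
list_image_urls | sed -n '1,3p'
```

The loops run sequentially. The version in the man page hands each line to GNU parallel instead, so the downloads can overlap.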
@@ -1278,7 +1293,7 @@ the currently running jobs are finished before exiting.
The environment variable $PARALLEL_PID is set by GNU B<parallel> and
is visible to the jobs started from GNU B<parallel>. This makes it
-possible for the jobs to communicate directly to GNU <parallel>.
+possible for the jobs to communicate directly to GNU B<parallel>.
B<Example:> If each of the jobs tests a solution and one of jobs finds
the solution the job can tell GNU B<parallel> not to start more jobs
@@ -1320,6 +1335,8 @@ The file ~/.parallelrc will be read if it exists. It should be
formatted like the environment variable $PARALLEL. Lines starting with
'#' will be ignored.
+Options on the command line takes precedence over the environment
+variable $PARALLEL which takes precedence over the file ~/.parallelrc.
=head1 EXIT STATUS
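The precedence rule added in the last hunk (command line over $PARALLEL over ~/.parallelrc) can be read as "the first source that sets an option wins". The following is a toy model for a single option, not GNU parallel's actual option parsing. The function pick_setting and its three arguments are hypothetical.

```shell
#!/bin/sh
# pick_setting CLI_VALUE ENV_VALUE RC_VALUE
# Prints the first non-empty value, checked in precedence order:
# command line, then $PARALLEL, then ~/.parallelrc.
pick_setting() {
  for value in "$1" "$2" "$3"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1  # no source set the option
}

# The command line is silent here, so the value from $PARALLEL is chosen.
pick_setting "" "-j4" "-j2"
```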