2017-01-01 11:42:52 +00:00
|
|
|
#!/usr/bin/perl -w
|
|
|
|
|
|
|
|
=encoding utf8
|
|
|
|
|
|
|
|
=head1 NAME
|
|
|
|
|
|
|
|
parallel_alternatives - Alternatives to GNU B<parallel>
|
|
|
|
|
|
|
|
|
|
|
|
=head1 DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES
|
|
|
|
|
|
|
|
There are a lot of programs with some of the functionality of GNU
|
|
|
|
B<parallel>. GNU B<parallel> strives to include the best of the
|
|
|
|
functionality without sacrificing ease of use.
|
|
|
|
|
|
|
|
B<parallel> has existed since 2002 and as GNU B<parallel> since
|
|
|
|
2010. A lot of the alternatives have not had the vitality to survive
|
|
|
|
that long, but have come and gone during that time.
|
|
|
|
|
|
|
|
GNU B<parallel> is actively maintained with a new release every month
|
|
|
|
since 2010. Most other alternatives are fleeting interests of the
|
|
|
|
developers with irregular releases and only maintained for a few
|
|
|
|
years.
|
|
|
|
|
|
|
|
|
|
|
|
=head2 SUMMARY TABLE
|
|
|
|
|
|
|
|
The following features are in some of the comparable tools:
|
|
|
|
|
|
|
|
Inputs
|
|
|
|
I1. Arguments can be read from stdin
|
|
|
|
I2. Arguments can be read from a file
|
|
|
|
I3. Arguments can be read from multiple files
|
|
|
|
I4. Arguments can be read from command line
|
|
|
|
I5. Arguments can be read from a table
|
|
|
|
I6. Arguments can be read from the same file using #! (shebang)
|
|
|
|
I7. Line oriented input as default (Quoting of special chars not needed)
|
|
|
|
|
|
|
|
Manipulation of input
|
|
|
|
M1. Composed command
|
|
|
|
M2. Multiple arguments can fill up an execution line
|
|
|
|
M3. Arguments can be put anywhere in the execution line
|
|
|
|
M4. Multiple arguments can be put anywhere in the execution line
|
|
|
|
M5. Arguments can be replaced with context
|
|
|
|
M6. Input can be treated as the complete command line
|
|
|
|
|
|
|
|
Outputs
|
|
|
|
O1. Grouping output so output from different jobs does not mix
|
|
|
|
O2. Send stderr (standard error) to stderr (standard error)
|
|
|
|
O3. Send stdout (standard output) to stdout (standard output)
|
|
|
|
O4. Order of output can be same as order of input
|
|
|
|
O5. Stdout only contains stdout (standard output) from the command
|
|
|
|
O6. Stderr only contains stderr (standard error) from the command
|
|
|
|
|
|
|
|
Execution
|
|
|
|
E1. Running jobs in parallel
|
|
|
|
E2. List running jobs
|
|
|
|
E3. Finish running jobs, but do not start new jobs
|
|
|
|
E4. Number of running jobs can depend on number of cpus
|
|
|
|
E5. Finish running jobs, but do not start new jobs after first failure
|
|
|
|
E6. Number of running jobs can be adjusted while running
|
|
|
|
|
|
|
|
Remote execution
|
|
|
|
R1. Jobs can be run on remote computers
|
|
|
|
R2. Basefiles can be transferred
|
|
|
|
R3. Argument files can be transferred
|
|
|
|
R4. Result files can be transferred
|
|
|
|
R5. Cleanup of transferred files
|
|
|
|
R6. No config files needed
|
|
|
|
R7. Do not run more than SSHD's MaxStartups can handle
|
|
|
|
R8. Configurable SSH command
|
|
|
|
R9. Retry if connection breaks occasionally
|
|
|
|
|
|
|
|
Semaphore
|
|
|
|
S1. Possibility to work as a mutex
|
|
|
|
S2. Possibility to work as a counting semaphore
|
|
|
|
|
|
|
|
Legend
|
|
|
|
- = no
|
|
|
|
x = not applicable
|
|
|
|
ID = yes
|
|
|
|
|
|
|
|
As not every new version of the programs is tested, the table may be
|
|
|
|
outdated. Please file a bug-report if you find errors (See REPORTING
|
|
|
|
BUGS).
|
|
|
|
|
|
|
|
parallel:
|
|
|
|
I1 I2 I3 I4 I5 I6 I7
|
|
|
|
M1 M2 M3 M4 M5 M6
|
|
|
|
O1 O2 O3 O4 O5 O6
|
|
|
|
E1 E2 E3 E4 E5 E6
|
|
|
|
R1 R2 R3 R4 R5 R6 R7 R8 R9
|
|
|
|
S1 S2
|
|
|
|
|
|
|
|
xargs:
|
|
|
|
I1 I2 - - - - -
|
|
|
|
- M2 M3 - - -
|
|
|
|
- O2 O3 - O5 O6
|
|
|
|
E1 - - - - -
|
|
|
|
- - - - - x - - -
|
|
|
|
- -
|
|
|
|
|
|
|
|
find -exec:
|
|
|
|
- - - x - x -
|
|
|
|
- M2 M3 - - - -
|
|
|
|
- O2 O3 O4 O5 O6
|
|
|
|
- - - - - - -
|
|
|
|
- - - - - - - - -
|
|
|
|
x x
|
|
|
|
|
|
|
|
make -j:
|
|
|
|
- - - - - - -
|
|
|
|
- - - - - -
|
|
|
|
O1 O2 O3 - x O6
|
|
|
|
E1 - - - E5 -
|
|
|
|
- - - - - - - - -
|
|
|
|
- -
|
|
|
|
|
|
|
|
ppss:
|
|
|
|
I1 I2 - - - - I7
|
|
|
|
M1 - M3 - - M6
|
|
|
|
O1 - - x - -
|
|
|
|
E1 E2 ?E3 E4 - -
|
|
|
|
R1 R2 R3 R4 - - ?R7 ? ?
|
|
|
|
- -
|
|
|
|
|
|
|
|
pexec:
|
|
|
|
I1 I2 - I4 I5 - -
|
|
|
|
M1 - M3 - - M6
|
|
|
|
O1 O2 O3 - O5 O6
|
|
|
|
E1 - - E4 - E6
|
|
|
|
R1 - - - - R6 - - -
|
|
|
|
S1 -
|
|
|
|
|
|
|
|
xjobs, prll, dxargs, mdm/middleman, xapply, paexec, ladon, jobflow,
|
|
|
|
ClusterSSH: TODO - Please file a bug-report if you know what features
|
|
|
|
they support (See REPORTING BUGS).
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN xargs AND GNU Parallel
|
|
|
|
|
|
|
|
B<xargs> offers some of the same possibilities as GNU B<parallel>.
|
|
|
|
|
|
|
|
B<xargs> deals badly with special characters (such as space, \, ' and
|
|
|
|
"). To see the problem try this:
|
|
|
|
|
|
|
|
touch important_file
|
|
|
|
touch 'not important_file'
|
|
|
|
ls not* | xargs rm
|
|
|
|
mkdir -p "My brother's 12\" records"
|
|
|
|
ls | xargs rmdir
|
|
|
|
touch 'c:\windows\system32\clfs.sys'
|
|
|
|
echo 'c:\windows\system32\clfs.sys' | xargs ls -l
|
|
|
|
|
|
|
|
You can specify B<-0>, but many input generators are not
|
|
|
|
optimized for using B<NUL> as separator but are optimized for
|
|
|
|
B<newline> as separator. E.g. B<head>, B<tail>, B<awk>, B<ls>, B<echo>,
|
|
|
|
B<sed>, B<tar -v>, B<perl> (B<-0> and \0 instead of \n), B<locate>
|
|
|
|
(requires using B<-0>), B<find> (requires using B<-print0>), B<grep>
|
|
|
|
(requires user to use B<-z> or B<-Z>), B<sort> (requires using B<-z>).
|
|
|
|
|
|
|
|
GNU B<parallel>'s newline separation can be emulated with:
|
|
|
|
|
|
|
|
B<cat | xargs -d "\n" -n1 I<command>>
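A minimal sketch of the difference, assuming GNU B<xargs> (for B<-d>) and a throwaway directory from B<mktemp>; the variable names are only illustrative:

```shell
# Throwaway directory so that nothing real is deleted.
tmp=$(mktemp -d)
touch "$tmp/important_file" "$tmp/not important_file"

# Default splitting: rm is handed "not" and "important_file" as two names.
(cd "$tmp" && ls not* | xargs rm 2>/dev/null) || true
after_default=$(ls "$tmp")

# Newline as the only separator (GNU xargs -d): rm gets the one real name.
touch "$tmp/important_file" "$tmp/not important_file"
(cd "$tmp" && ls not* | xargs -d '\n' rm)
after_newline=$(ls "$tmp")

echo "default splitting left: $after_default"
echo "newline splitting left: $after_newline"
rm -rf "$tmp"
```

With default splitting the wrong file is removed; with newline separation only the intended file is removed.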
|
|
|
|
|
|
|
|
B<xargs> can run a given number of jobs in parallel, but has no
|
|
|
|
support for running number-of-cpu-cores jobs in parallel.
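The core count can be supplied by hand. The sketch below assumes B<nproc> from GNU coreutils (with B<getconf> as a fallback) and GNU B<xargs>; it is a workaround, not a feature of B<xargs> itself:

```shell
# nproc is GNU coreutils; getconf is a fallback (an assumption about the platform).
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Run one echo per input line, with as many simultaneous jobs as there are cores.
# The output is sorted because parallel jobs may finish in any order.
result=$(printf '%s\n' a b c | xargs -d '\n' -n1 -P "$cores" echo | sort | tr '\n' ' ')
echo "ran with $cores job slots: $result"
```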
|
|
|
|
|
|
|
|
B<xargs> has no support for grouping the output, therefore output may
|
|
|
|
run together, e.g. the first half of a line is from one process and
|
|
|
|
the last half of the line is from another process. The example
|
|
|
|
B<Parallel grep> cannot be done reliably with B<xargs> because of
|
|
|
|
this. To see this in action try:
|
|
|
|
|
|
|
|
parallel perl -e '\$a=\"1\".\"{}\"x10000000\;print\ \$a,\"\\n\"' \
|
|
|
|
'>' {} ::: a b c d e f g h
|
|
|
|
# Serial = no mixing = the wanted result
|
|
|
|
# 'tr -s a-z' squeezes repeating letters into a single letter
|
|
|
|
echo a b c d e f g h | xargs -P1 -n1 grep 1 | tr -s a-z
|
|
|
|
# Compare to 8 jobs in parallel
|
|
|
|
parallel -kP8 -n1 grep 1 ::: a b c d e f g h | tr -s a-z
|
|
|
|
echo a b c d e f g h | xargs -P8 -n1 grep 1 | tr -s a-z
|
|
|
|
echo a b c d e f g h | xargs -P8 -n1 grep --line-buffered 1 | \
|
|
|
|
tr -s a-z
|
|
|
|
|
|
|
|
Or try this:
|
|
|
|
|
|
|
|
slow_seq() {
|
|
|
|
echo Count to "$@"
|
|
|
|
seq "$@" |
|
|
|
|
perl -ne '$|=1; for(split//){ print; select($a,$a,$a,0.100);}'
|
|
|
|
}
|
|
|
|
export -f slow_seq
|
|
|
|
# Serial = no mixing = the wanted result
|
|
|
|
seq 8 | xargs -n1 -P1 -I {} bash -c 'slow_seq {}'
|
|
|
|
# Compare to 8 jobs in parallel
|
|
|
|
seq 8 | parallel -P8 slow_seq {}
|
|
|
|
seq 8 | xargs -n1 -P8 -I {} bash -c 'slow_seq {}'
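One way to avoid mixing with B<xargs> is to give every job its own output file and concatenate the files afterwards. This is a sketch of that workaround, not of how GNU B<parallel> groups output internally; the B<out.N> file names and the temporary directory are assumptions:

```shell
tmp=$(mktemp -d)

# Each job writes to its own file, so no two jobs share an output stream.
seq 3 | xargs -P3 -I {} sh -c 'seq "$1" > "$2/out.$1"' _ {} "$tmp"

# xargs has waited for all jobs; concatenating in input order gives
# output that is both grouped and ordered.
grouped=$(cat "$tmp/out.1" "$tmp/out.2" "$tmp/out.3" | tr '\n' ' ')
echo "$grouped"
rm -rf "$tmp"
```

The cost is that nothing is printed until all jobs are done, and the temporary files must be cleaned up by hand.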
|
|
|
|
|
|
|
|
B<xargs> has no support for keeping the order of the output, therefore
|
|
|
|
if running jobs in parallel using B<xargs> the output of the second
|
|
|
|
job cannot be postponed till the first job is done.
|
|
|
|
|
|
|
|
B<xargs> has no support for running jobs on remote computers.
|
|
|
|
|
|
|
|
B<xargs> has no support for context replace, so you will have to create the
|
|
|
|
arguments.
|
|
|
|
|
|
|
|
If you use a replace string in B<xargs> (B<-I>) you cannot force
|
|
|
|
B<xargs> to use more than one argument.
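A common workaround is to drop B<-I> and let a small B<sh -c> wrapper place the arguments. A sketch, assuming B<-n2> for two arguments per job (the B<_> only fills $0):

```shell
# Two input lines per job; the wrapper decides where each argument goes.
pairs=$(printf '%s\n' a b c d | xargs -n2 sh -c 'echo "first=$1 second=$2"' _)
echo "$pairs"
```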
|
|
|
|
|
|
|
|
Quoting in B<xargs> works like B<-q> in GNU B<parallel>. This means
|
|
|
|
composed commands and redirection require using B<bash -c>.
|
|
|
|
|
|
|
|
ls | parallel "wc {} >{}.wc"
|
|
|
|
ls | parallel "echo {}; ls {}|wc"
|
|
|
|
|
|
|
|
becomes (assuming you have 8 cores and that none of the filenames
|
|
|
|
contain space, " or ').
|
|
|
|
|
|
|
|
ls | xargs -d "\n" -P8 -I {} bash -c "wc {} >{}.wc"
|
|
|
|
ls | xargs -d "\n" -P8 -I {} bash -c "echo {}; ls {}|wc"
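Substituting {} into the B<bash -c> string makes bash parse the filename as code, which is why the B<xargs> versions need the assumption about quotes. A hedged alternative is to pass the name as a positional parameter instead; the file name and temporary directory below are made up for the illustration:

```shell
tmp=$(mktemp -d)
printf 'one two\n' > "$tmp/it's a file"

# The name travels as "$1", so bash never re-parses it as shell code.
(cd "$tmp" && ls | xargs -d '\n' -P8 -I {} bash -c 'wc -w < "$1" > "$1.wc"' _ {})

# Strip any padding that wc may add before the number.
words=$(tr -d ' ' < "$tmp/it's a file.wc")
echo "words: $words"
rm -rf "$tmp"
```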
|
|
|
|
|
|
|
|
https://www.gnu.org/software/findutils/
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN find -exec AND GNU Parallel
|
|
|
|
|
|
|
|
B<find -exec> offers some of the same possibilities as GNU B<parallel>.
|
|
|
|
|
|
|
|
B<find -exec> only works on files. Processing other input (such as
|
|
|
|
hosts or URLs) will require creating these inputs as files. B<find
|
|
|
|
-exec> has no support for running commands in parallel.
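If parallel runs are needed, a common workaround is to let B<find> only produce the names and hand them to B<xargs -P>. A sketch, assuming GNU findutils (B<-print0>, B<-0>, B<-P>) and two made-up files:

```shell
tmp=$(mktemp -d)
touch "$tmp/a b.txt" "$tmp/c.txt"

# find emits NUL-terminated names; xargs runs up to 2 jobs at a time.
found=$(find "$tmp" -type f -name '*.txt' -print0 |
        xargs -0 -n1 -P2 basename | sort)
echo "$found"
rm -rf "$tmp"
```

This inherits the limitations of B<xargs> described above, such as ungrouped output.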
|
|
|
|
|
|
|
|
https://www.gnu.org/software/findutils/ (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN make -j AND GNU Parallel
|
|
|
|
|
|
|
|
B<make -j> can run jobs in parallel, but requires a crafted Makefile
|
|
|
|
to do this. That results in extra quoting to get filenames containing
|
|
|
|
newlines to work correctly.
|
|
|
|
|
|
|
|
B<make -j> computes a dependency graph before running jobs. Jobs run
|
|
|
|
by GNU B<parallel> do not depend on each other.
|
|
|
|
|
|
|
|
(Very early versions of GNU B<parallel> were coincidentally implemented
|
|
|
|
using B<make -j>).
|
|
|
|
|
|
|
|
https://www.gnu.org/software/make/ (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN ppss AND GNU Parallel
|
|
|
|
|
|
|
|
B<ppss> is also a tool for running jobs in parallel.
|
|
|
|
|
|
|
|
The output of B<ppss> is status information and thus not useful for
|
|
|
|
using as input for another command. The output from the jobs is put
|
|
|
|
into files.
|
|
|
|
|
|
|
|
The argument replace string ($ITEM) cannot be changed. Arguments must
|
|
|
|
be quoted - thus arguments containing special characters (space '"&!*)
|
|
|
|
may cause problems. More than one argument is not supported. Filenames
|
|
|
|
containing newlines are not processed correctly. When reading input
|
|
|
|
from a file null cannot be used as a terminator. B<ppss> needs to read
|
|
|
|
the whole input file before starting any jobs.
|
|
|
|
|
|
|
|
Output and status information is stored in ppss_dir and thus requires
|
|
|
|
cleanup when completed. If the dir is not removed before running
|
|
|
|
B<ppss> again it may cause nothing to happen as B<ppss> thinks the
|
|
|
|
task is already done. GNU B<parallel> will normally not need cleaning
|
|
|
|
up if running locally and will only need cleaning up if stopped
|
|
|
|
abnormally and running remote (B<--cleanup> may not complete if
|
|
|
|
stopped abnormally). The example B<Parallel grep> would require extra
|
|
|
|
postprocessing if written using B<ppss>.
|
|
|
|
|
|
|
|
For remote systems PPSS requires 3 steps: config, deploy, and
|
|
|
|
start. GNU B<parallel> only requires one step.
|
|
|
|
|
|
|
|
=head3 EXAMPLES FROM ppss MANUAL
|
|
|
|
|
|
|
|
Here are the examples from B<ppss>'s manual page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
B<1> ./ppss.sh standalone -d /path/to/files -c 'gzip '
|
|
|
|
|
|
|
|
B<1> find /path/to/files -type f | parallel gzip
|
|
|
|
|
|
|
|
B<2> ./ppss.sh standalone -d /path/to/files -c 'cp "$ITEM" /destination/dir '
|
|
|
|
|
|
|
|
B<2> find /path/to/files -type f | parallel cp {} /destination/dir
|
|
|
|
|
|
|
|
B<3> ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q '
|
|
|
|
|
|
|
|
B<3> parallel -a list-of-urls.txt wget -q
|
|
|
|
|
|
|
|
B<4> ./ppss.sh standalone -f list-of-urls.txt -c 'wget -q "$ITEM"'
|
|
|
|
|
|
|
|
B<4> parallel -a list-of-urls.txt wget -q {}
|
|
|
|
|
|
|
|
B<5> ./ppss config -C config.cfg -c 'encode.sh ' -d /source/dir -m
|
|
|
|
192.168.1.100 -u ppss -k ppss-key.key -S ./encode.sh -n nodes.txt -o
|
|
|
|
/some/output/dir --upload --download ; ./ppss deploy -C config.cfg ;
|
|
|
|
./ppss start -C config
|
|
|
|
|
|
|
|
B<5> # parallel does not use configs. If you want a different username put it in nodes.txt: user@hostname
|
|
|
|
|
|
|
|
B<5> find source/dir -type f | parallel --sshloginfile nodes.txt --trc {.}.mp3 lame -a {} -o {.}.mp3 --preset standard --quiet
|
|
|
|
|
|
|
|
B<6> ./ppss stop -C config.cfg
|
|
|
|
|
|
|
|
B<6> killall -TERM parallel
|
|
|
|
|
|
|
|
B<7> ./ppss pause -C config.cfg
|
|
|
|
|
|
|
|
B<7> Press: CTRL-Z or killall -SIGTSTP parallel
|
|
|
|
|
|
|
|
B<8> ./ppss continue -C config.cfg
|
|
|
|
|
|
|
|
B<8> Enter: fg or killall -SIGCONT parallel
|
|
|
|
|
|
|
|
B<9> ./ppss.sh status -C config.cfg
|
|
|
|
|
|
|
|
B<9> killall -SIGUSR2 parallel
|
|
|
|
|
|
|
|
https://github.com/louwrentius/PPSS
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN pexec AND GNU Parallel
|
|
|
|
|
|
|
|
B<pexec> is also a tool for running jobs in parallel.
|
|
|
|
|
|
|
|
=head3 EXAMPLES FROM pexec MANUAL
|
|
|
|
|
|
|
|
Here are the examples from B<pexec>'s info page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
B<1> pexec -o sqrt-%s.dat -p "$(seq 10)" -e NUM -n 4 -c -- \
|
|
|
|
'echo "scale=10000;sqrt($NUM)" | bc'
|
|
|
|
|
|
|
|
B<1> seq 10 | parallel -j4 'echo "scale=10000;sqrt({})" | bc > sqrt-{}.dat'
|
|
|
|
|
|
|
|
B<2> pexec -p "$(ls myfiles*.ext)" -i %s -o %s.sort -- sort
|
|
|
|
|
|
|
|
B<2> ls myfiles*.ext | parallel sort {} ">{}.sort"
|
|
|
|
|
|
|
|
B<3> pexec -f image.list -n auto -e B -u star.log -c -- \
|
|
|
|
'fistar $B.fits -f 100 -F id,x,y,flux -o $B.star'
|
|
|
|
|
|
|
|
B<3> parallel -a image.list \
|
|
|
|
'fistar {}.fits -f 100 -F id,x,y,flux -o {}.star' 2>star.log
|
|
|
|
|
|
|
|
B<4> pexec -r *.png -e IMG -c -o - -- \
|
|
|
|
'convert $IMG ${IMG%.png}.jpeg ; "echo $IMG: done"'
|
|
|
|
|
|
|
|
B<4> ls *.png | parallel 'convert {} {.}.jpeg; echo {}: done'
|
|
|
|
|
|
|
|
B<5> pexec -r *.png -i %s -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
|
|
|
|
B<5> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {}.jpg'
|
|
|
|
|
|
|
|
B<6> for p in *.png ; do echo ${p%.png} ; done | \
|
|
|
|
pexec -f - -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
|
|
|
|
B<6> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
|
|
|
|
|
|
|
|
B<7> LIST=$(for p in *.png ; do echo ${p%.png} ; done)
|
|
|
|
pexec -r $LIST -i %s.png -o %s.jpg -c 'pngtopnm | pnmtojpeg'
|
|
|
|
|
|
|
|
B<7> ls *.png | parallel 'pngtopnm < {} | pnmtojpeg > {.}.jpg'
|
|
|
|
|
|
|
|
B<8> pexec -n 8 -r *.jpg -y unix -e IMG -c \
|
|
|
|
'pexec -j -m blockread -d $IMG | \
|
|
|
|
jpegtopnm | pnmscale 0.5 | pnmtojpeg | \
|
|
|
|
pexec -j -m blockwrite -s th_$IMG'
|
|
|
|
|
|
|
|
B<8> Combining GNU B<parallel> and GNU B<sem>.
|
|
|
|
|
|
|
|
B<8> ls *jpg | parallel -j8 'sem --id blockread cat {} | jpegtopnm |' \
|
|
|
|
'pnmscale 0.5 | pnmtojpeg | sem --id blockwrite cat > th_{}'
|
|
|
|
|
|
|
|
B<8> If reading and writing are done to the same disk, this may be
|
|
|
|
faster as only one process will be either reading or writing:
|
|
|
|
|
|
|
|
B<8> ls *jpg | parallel -j8 'sem --id diskio cat {} | jpegtopnm |' \
|
|
|
|
'pnmscale 0.5 | pnmtojpeg | sem --id diskio cat > th_{}'
|
|
|
|
|
|
|
|
https://www.gnu.org/software/pexec/
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN xjobs AND GNU Parallel
|
|
|
|
|
|
|
|
B<xjobs> is also a tool for running jobs in parallel. It only supports
|
|
|
|
running jobs on your local computer.
|
|
|
|
|
|
|
|
B<xjobs> deals badly with special characters just like B<xargs>. See
|
|
|
|
the section B<DIFFERENCES BETWEEN xargs AND GNU Parallel>.
|
|
|
|
|
|
|
|
Here are the examples from B<xjobs>'s man page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
B<1> ls -1 *.zip | xjobs unzip
|
|
|
|
|
|
|
|
B<1> ls *.zip | parallel unzip
|
|
|
|
|
|
|
|
B<2> ls -1 *.zip | xjobs -n unzip
|
|
|
|
|
|
|
|
B<2> ls *.zip | parallel unzip >/dev/null
|
|
|
|
|
|
|
|
B<3> find . -name '*.bak' | xjobs gzip
|
|
|
|
|
|
|
|
B<3> find . -name '*.bak' | parallel gzip
|
|
|
|
|
|
|
|
B<4> ls -1 *.jar | sed 's/\(.*\)/\1 > \1.idx/' | xjobs jar tf
|
|
|
|
|
|
|
|
B<4> ls *.jar | parallel jar tf {} '>' {}.idx
|
|
|
|
|
|
|
|
B<5> xjobs -s script
|
|
|
|
|
|
|
|
B<5> cat script | parallel
|
|
|
|
|
|
|
|
B<6> mkfifo /var/run/my_named_pipe;
|
|
|
|
xjobs -s /var/run/my_named_pipe &
|
|
|
|
echo unzip 1.zip >> /var/run/my_named_pipe;
|
|
|
|
echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
|
|
|
|
|
|
|
|
B<6> mkfifo /var/run/my_named_pipe;
|
|
|
|
cat /var/run/my_named_pipe | parallel &
|
|
|
|
echo unzip 1.zip >> /var/run/my_named_pipe;
|
|
|
|
echo tar cf /backup/myhome.tar /home/me >> /var/run/my_named_pipe
|
|
|
|
|
|
|
|
http://www.maier-komor.de/xjobs.html (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN prll AND GNU Parallel
|
|
|
|
|
|
|
|
B<prll> is also a tool for running jobs in parallel. It does not
|
|
|
|
support running jobs on remote computers.
|
|
|
|
|
|
|
|
B<prll> encourages using BASH aliases and BASH functions instead of
|
|
|
|
scripts. GNU B<parallel> supports scripts directly, functions if they
|
|
|
|
are exported using B<export -f>, and aliases if using B<env_parallel>.
|
|
|
|
|
|
|
|
B<prll> generates a lot of status information on stderr (standard
|
|
|
|
error) which makes it harder to use the stderr (standard error) output
|
|
|
|
of the job directly as input for another program.
|
|
|
|
|
|
|
|
Here is the example from B<prll>'s man page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
prll -s 'mogrify -flip $1' *.jpg
|
|
|
|
parallel mogrify -flip ::: *.jpg
|
|
|
|
|
|
|
|
https://github.com/exzombie/prll (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN dxargs AND GNU Parallel
|
|
|
|
|
|
|
|
B<dxargs> is also a tool for running jobs in parallel.
|
|
|
|
|
|
|
|
B<dxargs> does not deal well with more simultaneous jobs than SSHD's
|
|
|
|
MaxStartups. B<dxargs> is only built for remote run jobs, but does not
|
|
|
|
support transferring of files.
|
|
|
|
|
|
|
|
https://web.archive.org/web/20120518070250/http://www.semicomplete.com/blog/geekery/distributed-xargs.html (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN mdm/middleman AND GNU Parallel
|
|
|
|
|
|
|
|
B<middleman> (B<mdm>) is also a tool for running jobs in parallel.
|
|
|
|
|
|
|
|
Here are the shellscripts of
|
|
|
|
https://web.archive.org/web/20110728064735/http://mdm.berlios.de/usage.html
|
|
|
|
ported to GNU B<parallel>:
|
|
|
|
|
|
|
|
seq 19 | parallel buffon -o - | sort -n > result
|
|
|
|
cat files | parallel cmd
|
|
|
|
find dir -execdir sem cmd {} \;
|
|
|
|
|
|
|
|
https://github.com/cklin/mdm (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN xapply AND GNU Parallel
|
|
|
|
|
|
|
|
B<xapply> can run jobs in parallel on the local computer.
|
|
|
|
|
|
|
|
Here are the examples from B<xapply>'s man page with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
B<1> xapply '(cd %1 && make all)' */
|
|
|
|
|
|
|
|
B<1> parallel 'cd {} && make all' ::: */
|
|
|
|
|
|
|
|
B<2> xapply -f 'diff %1 ../version5/%1' manifest | more
|
|
|
|
|
|
|
|
B<2> parallel diff {} ../version5/{} < manifest | more
|
|
|
|
|
|
|
|
B<3> xapply -p/dev/null -f 'diff %1 %2' manifest1 checklist1
|
|
|
|
|
|
|
|
B<3> parallel --link diff {1} {2} :::: manifest1 checklist1
|
|
|
|
|
|
|
|
B<4> xapply 'indent' *.c
|
|
|
|
|
|
|
|
B<4> parallel indent ::: *.c
|
|
|
|
|
|
|
|
B<5> find ~ksb/bin -type f ! -perm -111 -print | xapply -f -v 'chmod a+x' -
|
|
|
|
|
|
|
|
B<5> find ~ksb/bin -type f ! -perm -111 -print | parallel -v chmod a+x
|
|
|
|
|
|
|
|
B<6> find */ -... | fmt 960 1024 | xapply -f -i /dev/tty 'vi' -
|
|
|
|
|
|
|
|
B<6> sh <(find */ -... | parallel -s 1024 echo vi)
|
|
|
|
|
|
|
|
B<6> find */ -... | parallel -s 1024 -Xuj1 vi
|
|
|
|
|
|
|
|
B<7> find ... | xapply -f -5 -i /dev/tty 'vi' - - - - -
|
|
|
|
|
|
|
|
B<7> sh <(find ... |parallel -n5 echo vi)
|
|
|
|
|
|
|
|
B<7> find ... |parallel -n5 -uj1 vi
|
|
|
|
|
|
|
|
B<8> xapply -fn "" /etc/passwd
|
|
|
|
|
|
|
|
B<8> parallel -k echo < /etc/passwd
|
|
|
|
|
|
|
|
B<9> tr ':' '\012' < /etc/passwd | xapply -7 -nf 'chown %1 %6' - - - - - - -
|
|
|
|
|
|
|
|
B<9> tr ':' '\012' < /etc/passwd | parallel -N7 chown {1} {6}
|
|
|
|
|
|
|
|
B<10> xapply '[ -d %1/RCS ] || echo %1' */
|
|
|
|
|
|
|
|
B<10> parallel '[ -d {}/RCS ] || echo {}' ::: */
|
|
|
|
|
|
|
|
B<11> xapply -f '[ -f %1 ] && echo %1' List | ...
|
|
|
|
|
|
|
|
B<11> parallel '[ -f {} ] && echo {}' < List | ...
|
|
|
|
|
|
|
|
https://web.archive.org/web/20160702211113/http://carrera.databits.net/~ksb/msrc/local/bin/xapply/xapply.html
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN AIX apply AND GNU Parallel
|
|
|
|
|
|
|
|
B<apply> can build command lines based on a template and arguments -
|
|
|
|
very much like GNU B<parallel>. B<apply> does not run jobs in
|
|
|
|
parallel. B<apply> does not use an argument separator (like B<:::>);
|
|
|
|
instead the template must be the first argument.
|
|
|
|
|
|
|
|
Here are the examples from IBM's Knowledge Center and the
|
|
|
|
corresponding command using GNU B<parallel>:
|
|
|
|
|
|
|
|
1. To obtain results similar to those of the B<ls> command, enter:
|
|
|
|
|
|
|
|
apply echo *
|
|
|
|
parallel echo ::: *
|
|
|
|
|
|
|
|
2. To compare the file named B<a1> to the file named B<b1>, and the
|
|
|
|
file named B<a2> to the file named B<b2>, enter:
|
|
|
|
|
|
|
|
apply -2 cmp a1 b1 a2 b2
|
|
|
|
parallel -N2 cmp ::: a1 b1 a2 b2
|
|
|
|
|
|
|
|
3. To run the B<who> command five times, enter:
|
|
|
|
|
|
|
|
apply -0 who 1 2 3 4 5
|
|
|
|
parallel -N0 who ::: 1 2 3 4 5
|
|
|
|
|
|
|
|
4. To link all files in the current directory to the directory
|
|
|
|
B</usr/joe>, enter:
|
|
|
|
|
|
|
|
apply 'ln %1 /usr/joe' *
|
|
|
|
parallel ln {} /usr/joe ::: *
|
|
|
|
|
|
|
|
https://www-01.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.cmds1/apply.htm (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN paexec AND GNU Parallel
|
|
|
|
|
|
|
|
B<paexec> can run jobs in parallel on both the local and remote computers.
|
|
|
|
|
|
|
|
B<paexec> requires commands to print a blank line as the last
|
|
|
|
output. This means you will have to write a wrapper for most programs.
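Such a wrapper can be small. The sketch below assumes that tasks arrive one per line on stdin and that an empty line marks the end of a task's output; B<paexec_wrap> is a hypothetical name, and B<echo got> stands in for a real program:

```shell
# Hypothetical wrapper: run the given command once per task line,
# then print the empty line that marks the end of that task's output.
paexec_wrap() {
  while IFS= read -r task; do
    "$@" "$task"
    echo
  done
}

# Stand-alone check without paexec: two tasks, each followed by a blank line.
out=$(printf '%s\n' a b | paexec_wrap echo got)
printf '%s\n' "$out"
```

A program whose own output can contain empty lines would need a different end marker, which this sketch does not handle.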
|
|
|
|
|
|
|
|
B<paexec> has a job dependency facility so a job can depend on another
|
|
|
|
job to be executed successfully. Sort of a poor-man's B<make>.
|
|
|
|
|
|
|
|
Here are the examples from B<paexec>'s example catalog with the equivalent
|
|
|
|
using GNU B<parallel>:
|
|
|
|
|
|
|
|
=over 1
|
|
|
|
|
|
|
|
=item 1_div_X_run:
|
|
|
|
|
|
|
|
../../paexec -s -l -c "`pwd`/1_div_X_cmd" -n +1 <<EOF [...]
|
|
|
|
parallel echo {} '|' `pwd`/1_div_X_cmd <<EOF [...]
|
|
|
|
|
|
|
|
=item all_substr_run:
|
|
|
|
|
|
|
|
../../paexec -lp -c "`pwd`/all_substr_cmd" -n +3 <<EOF [...]
|
|
|
|
parallel echo {} '|' `pwd`/all_substr_cmd <<EOF [...]
|
|
|
|
|
|
|
|
=item cc_wrapper_run:
|
|
|
|
|
|
|
|
../../paexec -c "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
|
|
|
|
-n 'host1 host2' \
|
|
|
|
-t '/usr/bin/ssh -x' <<EOF [...]
|
|
|
|
parallel echo {} '|' "env CC=gcc CFLAGS=-O2 `pwd`/cc_wrapper_cmd" \
|
|
|
|
-S host1,host2 <<EOF [...]
|
|
|
|
# This is not exactly the same, but avoids the wrapper
|
|
|
|
parallel gcc -O2 -c -o {.}.o {} \
|
|
|
|
-S host1,host2 <<EOF [...]
|
|
|
|
|
|
|
|
=item toupper_run:
|
|
|
|
|
|
|
|
../../paexec -lp -c "`pwd`/toupper_cmd" -n +10 <<EOF [...]
|
|
|
|
parallel echo {} '|' ./toupper_cmd <<EOF [...]
|
|
|
|
# Without the wrapper:
|
|
|
|
parallel echo {} '| awk {print\ toupper\(\$0\)}' <<EOF [...]
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
https://github.com/cheusov/paexec
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN map(sitaramc) AND GNU Parallel
|
|
|
|
|
|
|
|
B<map> sees it as a feature to have fewer features and in doing so it
|
|
|
|
also handles corner cases incorrectly. A lot of GNU B<parallel>'s code
|
|
|
|
is to handle corner cases correctly on every platform, so you will not
|
|
|
|
get a nasty surprise if a user, for example, saves a file called: I<My
|
|
|
|
brother's 12" records.txt>
|
|
|
|
|
|
|
|
B<map>'s example showing how to deal with special characters fails on
|
|
|
|
special characters:
|
|
|
|
|
|
|
|
echo "The Cure" > My\ brother\'s\ 12\"\ records
|
|
|
|
|
|
|
|
ls | \
|
|
|
|
map 'echo -n `gzip < "%" | wc -c`; echo -n '*100/'; wc -c < "%"' |
|
|
|
|
bc
|
|
|
|
|
|
|
|
It works with GNU B<parallel>:
|
|
|
|
|
|
|
|
ls | \
|
|
|
|
parallel \
|
|
|
|
'echo -n `gzip < {} | wc -c`; echo -n '*100/'; wc -c < {}' | bc
|
|
|
|
|
|
|
|
And you can even get the file name prepended:
|
|
|
|
|
|
|
|
ls | \
|
|
|
|
parallel --tag \
|
|
|
|
'(echo -n `gzip < {} | wc -c`'*100/'; wc -c < {}) | bc'
|
|
|
|
|
|
|
|
B<map> has no support for grouping. So this gives the wrong results
|
|
|
|
without any warnings:
|
|
|
|
|
|
|
|
parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} \
|
|
|
|
::: a b c d e f
|
|
|
|
ls -l a b c d e f
|
|
|
|
parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
|
|
|
|
map -p 4 'grep 1' a b c d e f > out.map-unbuf
|
|
|
|
map -p 4 'grep --line-buffered 1' a b c d e f > out.map-linebuf
|
|
|
|
map -p 1 'grep --line-buffered 1' a b c d e f > out.map-serial
|
|
|
|
ls -l out*
|
|
|
|
md5sum out*
|
|
|
|
|
|
|
|
The documentation shows a workaround, but not only does that mix
|
|
|
|
stdout (standard output) with stderr (standard error) it also fails
|
|
|
|
completely for certain jobs (and may even be considered less readable):
|
|
|
|
|
|
|
|
parallel echo -n {} ::: 1 2 3
|
|
|
|
|
|
|
|
map -p 4 'echo -n % 2>&1 | sed -e "s/^/$$:/"' 1 2 3 | \
|
|
|
|
sort | cut -f2- -d:
|
|
|
|
|
|
|
|
B<map>'s replacement strings (% %D %B %E) can be simulated in GNU
|
|
|
|
B<parallel> by putting this in B<~/.parallel/config>:
|
|
|
|
|
|
|
|
--rpl '%'
|
|
|
|
--rpl '%D $_=Q(::dirname($_));'
|
|
|
|
--rpl '%B s:.*/::;s:\.[^/.]+$::;'
|
|
|
|
--rpl '%E s:.*\.::'
|
|
|
|
|
|
|
|
B<map> does not have an argument separator on the command line, but
|
|
|
|
uses the first argument as command. This makes quoting harder which again
|
|
|
|
may affect readability. Compare:
|
|
|
|
|
|
|
|
map -p 2 'perl -ne '"'"'/^\S+\s+\S+$/ and print $ARGV,"\n"'"'" *
|
|
|
|
|
|
|
|
parallel -q perl -ne '/^\S+\s+\S+$/ and print $ARGV,"\n"' ::: *
|
|
|
|
|
|
|
|
B<map> can do multiple arguments with context replace, but not without
|
|
|
|
context replace:
|
|
|
|
|
|
|
|
parallel --xargs echo 'BEGIN{'{}'}END' ::: 1 2 3
|
|
|
|
|
|
|
|
map "echo 'BEGIN{'%'}END'" 1 2 3
|
|
|
|
|
|
|
|
B<map> requires Perl v5.10.0 making it harder to use on old systems.
|
|
|
|
|
|
|
|
B<map> has no way of using % in the command (GNU B<parallel> has -I to
|
|
|
|
specify another replacement string than B<{}>).
|
|
|
|
|
|
|
|
By design B<map> is option-incompatible with B<xargs>. It does not
|
|
|
|
have remote job execution, a structured way of saving results,
|
|
|
|
multiple input sources, progress indicator, configurable record
|
|
|
|
delimiter (only field delimiter), logging of jobs run with possibility
|
|
|
|
to resume, keeping the output in the same order as input, --pipe
|
|
|
|
processing, or dynamic timeouts.
|
|
|
|
|
|
|
|
https://github.com/sitaramc/map
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN ladon AND GNU Parallel
|
|
|
|
|
|
|
|
B<ladon> can run multiple jobs on files in parallel.
|
|
|
|
|
|
|
|
B<ladon> only works on files and the only way to specify files is
|
|
|
|
using a quoted glob string (such as \*.jpg). It is not possible to
|
|
|
|
list the files manually.
|
|
|
|
|
|
|
|
As replacement strings it uses FULLPATH DIRNAME BASENAME EXT RELDIR
|
|
|
|
RELPATH
|
|
|
|
|
|
|
|
These can be simulated using GNU B<parallel> by putting this in
|
|
|
|
B<~/.parallel/config>:
|
|
|
|
|
|
|
|
--rpl 'FULLPATH $_=Q($_);chomp($_=qx{readlink -f $_});'
|
|
|
|
--rpl 'DIRNAME $_=Q(::dirname($_));chomp($_=qx{readlink -f $_});'
|
|
|
|
--rpl 'BASENAME s:.*/::;s:\.[^/.]+$::;'
|
|
|
|
--rpl 'EXT s:.*\.::'
|
|
|
|
--rpl 'RELDIR $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
|
|
|
|
s:\Q$c/\E::;$_=::dirname($_);'
|
|
|
|
--rpl 'RELPATH $_=Q($_);chomp(($_,$c)=qx{readlink -f $_;pwd});
|
|
|
|
s:\Q$c/\E::;'
|
|
|
|
|
|
|
|
B<ladon> deals badly with filenames containing " and newline, and it
|
|
|
|
fails for output larger than 200k:
|
|
|
|
|
|
|
|
ladon '*' -- seq 36000 | wc
|
|
|
|
|
|
|
|
=head3 EXAMPLES FROM ladon MANUAL
|
|
|
|
|
|
|
|
It is assumed that the '--rpl's above are put in B<~/.parallel/config>
|
|
|
|
and that it is run under a shell that supports '**' globbing (such as B<zsh>):
|
|
|
|
|
|
|
|
B<1> ladon "**/*.txt" -- echo RELPATH
|
|
|
|
|
|
|
|
B<1> parallel echo RELPATH ::: **/*.txt
|
|
|
|
|
|
|
|
B<2> ladon "~/Documents/**/*.pdf" -- shasum FULLPATH >hashes.txt
|
|
|
|
|
|
|
|
B<2> parallel shasum FULLPATH ::: ~/Documents/**/*.pdf >hashes.txt
|
|
|
|
|
|
|
|
B<3> ladon -m thumbs/RELDIR "**/*.jpg" -- convert FULLPATH -thumbnail 100x100^ -gravity center -extent 100x100 thumbs/RELPATH
|
|
|
|
|
|
|
|
B<3> parallel mkdir -p thumbs/RELDIR\; convert FULLPATH -thumbnail 100x100^ -gravity center -extent 100x100 thumbs/RELPATH ::: **/*.jpg
|
|
|
|
|
|
|
|
B<4> ladon "~/Music/*.wav" -- lame -V 2 FULLPATH DIRNAME/BASENAME.mp3
|
|
|
|
|
|
|
|
B<4> parallel lame -V 2 FULLPATH DIRNAME/BASENAME.mp3 ::: ~/Music/*.wav
|
|
|
|
|
|
|
|
https://github.com/danielgtaylor/ladon (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN jobflow AND GNU Parallel

B<jobflow> can run multiple jobs in parallel.

Just like B<xargs>, output from B<jobflow> jobs running in parallel
mixes together by default. B<jobflow> can buffer into files (placed in
/run/shm), but these are not cleaned up if B<jobflow> dies
unexpectedly (e.g. by Ctrl-C). If the total output is big (in the
order of RAM+swap) it can cause the system to slow to a crawl and
eventually run out of memory.

B<jobflow> gives no error if the command is unknown, and like B<xargs>,
redirection and composed commands require wrapping with B<bash -c>.

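The B<bash -c> wrapping can be sketched with B<xargs>, which has the
same limitation. This is only an illustration and is not taken from the
B<jobflow> documentation: the scratch directory in outdir and the
job-N.txt file names are made up for the demo. The input is passed as a
positional parameter ($1) instead of being pasted into the script
text, so special characters in the input are not re-interpreted by the
shell.

```shell
#!/usr/bin/env bash
# xargs executes the command directly, without a shell, so '>' and ';'
# only work when the job is wrapped in 'bash -c'.
outdir=$(mktemp -d)   # scratch directory for the demo (assumption)

seq 3 |
  xargs -I{} bash -c 'echo "start $1" > "$2/job-$1.txt"; echo "end $1" >> "$2/job-$1.txt"' _ {} "$outdir"

# Each job wrote its own file through the redirections inside bash -c
cat "$outdir"/job-*.txt
```

GNU B<parallel> runs each job through a shell, so a composed command or
a redirection can be given in the command template without such a
wrapper.
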
Input lines can at most be 4096 bytes. You can at most have 16 {}'s in
the command template. More than that either crashes the program or
simply does not execute the command.

B<jobflow> has no equivalent for B<--pipe> or B<--sshlogin>.

B<jobflow> makes it possible to set resource limits on the running
jobs. This can be emulated by GNU B<parallel> using B<bash>'s B<ulimit>:

  jobflow -limits=mem=100M,cpu=3,fsize=20M,nofiles=300 myjob

  parallel 'ulimit -v 102400 -t 3 -f 204800 -n 300; myjob'

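A limit set with B<ulimit> applies to the shell that sets it and to
every process started from that shell afterwards, which is why it is
placed inside the job's own command line. The sketch below shows this
with a subshell standing in for one job. The value 64 and the function
name limited_job are arbitrary choices for the demo, and printing the
limit stands in for running myjob.

```shell
#!/usr/bin/env bash
# The parentheses start a subshell, so the lowered limit only affects
# commands run inside it and never leaks into the calling shell.
limited_job() {
  (
    ulimit -S -n 64   # soft limit: at most 64 open files for this job
    ulimit -S -n      # stand-in for 'myjob': report the limit it sees
  )
}

before=$(ulimit -S -n)
inside=$(limited_job)
after=$(ulimit -S -n)
echo "before=$before inside=$inside after=$after"
```

The caller's limit is the same before and after, while the job sees
the lowered value.
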
=head3 EXAMPLES FROM jobflow README

B<1> cat things.list | jobflow -threads=8 -exec ./mytask {}

B<1> cat things.list | parallel -j8 ./mytask {}

B<2> seq 100 | jobflow -threads=100 -exec echo {}

B<2> seq 100 | parallel -j100 echo {}

B<3> cat urls.txt | jobflow -threads=32 -exec wget {}

B<3> cat urls.txt | parallel -j32 wget {}

B<4> find . -name '*.bmp' | jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg

B<4> find . -name '*.bmp' | parallel -j8 bmp2jpeg {.}.bmp {.}.jpg

https://github.com/rofl0r/jobflow


=head2 DIFFERENCES BETWEEN gargs AND GNU Parallel

B<gargs> can run multiple jobs in parallel.

Older versions cache output in memory. This causes it to be extremely
slow when the output is larger than the physical RAM, and can cause
the system to run out of memory.

See more details on this in B<man parallel_design>.

Newer versions cache output in files, but leave files in $TMPDIR if
B<gargs> is killed.

Output to stderr (standard error) is changed if the command fails.

Here are the two examples from the B<gargs> website:

B<1> seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"

B<1> seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"

B<2> cat t.txt | gargs --sep "\s+" -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'"

B<2> cat t.txt | parallel --colsep "\\s+" -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'"

https://github.com/brentp/gargs


=head2 DIFFERENCES BETWEEN orgalorg AND GNU Parallel

B<orgalorg> can run the same job on multiple machines. This is related
to B<--onall> and B<--nonall>.

B<orgalorg> supports entering the SSH password - provided it is the
same for all servers. GNU B<parallel> advocates using B<ssh-agent>
instead, but it is possible to emulate B<orgalorg>'s behavior by
setting SSHPASS and by using B<--ssh "sshpass ssh">.

To make the emulation easier, make a simple alias:

  alias par_emul="parallel -j0 --ssh 'sshpass ssh' --nonall --tag --lb"

If you want to supply a password run:

  SSHPASS=`ssh-askpass`

or set the password directly (single quotes keep the shell from
expanding $$ and !):

  SSHPASS='P4$$w0rd!'

If the above is set up you can then do:

  orgalorg -o frontend1 -o frontend2 -p -C uptime
  par_emul -S frontend1 -S frontend2 uptime

  orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
  par_emul -S frontend1 -S frontend2 top -bid 1

  orgalorg -o frontend1 -o frontend2 -p -er /tmp -n \
    'md5sum /tmp/bigfile' -S bigfile
  par_emul -S frontend1 -S frontend2 --basefile bigfile --workdir /tmp \
    md5sum /tmp/bigfile

B<orgalorg> has a progress indicator for the transferring of a
file. GNU B<parallel> does not.

https://github.com/reconquest/orgalorg


=head2 DIFFERENCES BETWEEN Rust parallel AND GNU Parallel

Rust parallel focuses on speed. It is almost as fast as B<xargs>. It
implements a few features from GNU B<parallel>, but lacks many
functions. All these fail:

  # Read arguments from file
  parallel -a file echo
  # Changing the delimiter
  parallel -d _ echo ::: a_b_c_

These do something different from GNU B<parallel>:

  # -q to protect quoted $ and space
  parallel -q perl -e '$a=shift; print "$a"x10000000' ::: a b c
  # Generation of combination of inputs
  parallel echo {1} {2} ::: red green blue ::: S M L XL XXL
  # {= perl expression =} replacement string
  parallel echo '{= s/new/old/ =}' ::: my.new your.new
  # --pipe
  seq 100000 | parallel --pipe wc
  # linked arguments
  parallel echo ::: S M L :::+ sml med lrg ::: R G B :::+ red grn blu
  # Run different shell dialects
  zsh -c 'parallel echo \={} ::: zsh && true'
  csh -c 'parallel echo \$\{\} ::: shell && true'
  bash -c 'parallel echo \$\({}\) ::: pwd && true'
  # Rust parallel does not start before the last argument is read
  (seq 10; sleep 5; echo 2) | time parallel -j2 'sleep 2; echo'
  tail -f /var/log/syslog | parallel echo

Most of the examples from the book GNU Parallel 2018 do not work, thus
Rust parallel is not close to being a compatible replacement.

Rust parallel has no remote facilities.

It uses /tmp/parallel for tmp files and does not clean up if
terminated abruptly. If another user on the system uses Rust parallel,
then /tmp/parallel will have the wrong permissions and Rust parallel
will fail. A malicious user can set up the right permissions and
symlink the output file to one of the user's files, and the next time
the user uses Rust parallel it will overwrite this file.

  attacker$ mkdir /tmp/parallel
  attacker$ chmod a+rwX /tmp/parallel
  # Symlink to the file the attacker wants to zero out
  attacker$ ln -s ~victim/.important-file /tmp/parallel/stderr_1
  victim$ seq 1000 | parallel echo
  # This file is now overwritten with stderr from 'echo'
  victim$ cat ~victim/.important-file

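The attack works because the directory name is fixed and shared
between users. A common way to avoid that class of problem is a private
temporary directory created with B<mktemp -d>. The following is a
generic sketch, not the implementation of Rust parallel or of GNU
B<parallel>; the name prefix job-output and the file name stderr_1 are
only illustrative.

```shell
#!/usr/bin/env bash
# mktemp -d creates a new directory with an unpredictable name that only
# the calling user can access, so nobody else can pre-create it or
# plant a symlink inside it.
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/job-output.XXXXXXXX") || exit 1

# Remove the directory on normal exit; turn Ctrl-C/TERM into an exit so
# the same cleanup runs then too.
cleanup() { rm -rf -- "$tmpdir"; }
trap cleanup EXIT
trap 'exit 1' INT TERM

echo "stderr of job 1" > "$tmpdir/stderr_1"
ls -ld "$tmpdir"
```

The directory listing shows permissions for the owner only, and the
directory is gone again when the script ends.
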
If /tmp/parallel runs full during the run, Rust parallel does not
report this, but finishes with success - thereby risking data loss.

https://github.com/mmstick/parallel


=head2 DIFFERENCES BETWEEN Rush AND GNU Parallel

B<rush> (https://github.com/shenwei356/rush) is written in Go and
based on B<gargs>.

Just like GNU B<parallel>, B<rush> buffers in temporary files. But
unlike GNU B<parallel>, B<rush> does not clean up if the process
dies abnormally.

B<rush> has some string manipulations that can be emulated by putting
this into ~/.parallel/config (/ is used instead of %, and % is used
instead of ^ as that is closer to bash's ${var%postfix}):

  --rpl '{:} s:(\.[^/]+)*$::'
  --rpl '{:%([^}]+?)} s:$$1(\.[^/]+)*$::'
  --rpl '{/:%([^}]*?)} s:.*/(.*)$$1(\.[^/]+)*$:$1:'
  --rpl '{/:} s:(.*/)?([^/.]+)(\.[^/]+)*$:$2:'
  --rpl '{@(.*?)} /$$1/ and $_=$1;'

Here are the examples from B<rush>'s website with the equivalent
command in GNU B<parallel>.

=head3 EXAMPLES

B<1. Simple run, quoting is not necessary>

  $ seq 1 3 | rush echo {}

  $ seq 1 3 | parallel echo {}

B<2. Read data from file (`-i`)>

  $ rush echo {} -i data1.txt -i data2.txt

  $ cat data1.txt data2.txt | parallel echo {}

B<3. Keep output order (`-k`)>

  $ seq 1 3 | rush 'echo {}' -k

  $ seq 1 3 | parallel -k echo {}


B<4. Timeout (`-t`)>

  $ time seq 1 | rush 'sleep 2; echo {}' -t 1

  $ time seq 1 | parallel --timeout 1 'sleep 2; echo {}'

B<5. Retry (`-r`)>

  $ seq 1 | rush 'python unexisted_script.py' -r 1

  $ seq 1 | parallel --retries 2 'python unexisted_script.py'

Use B<-u> to see it is really run twice:

  $ seq 1 | parallel -u --retries 2 'python unexisted_script.py'

B<6. Dirname (`{/}`) and basename (`{%}`) and remove custom
suffix (`{^suffix}`)>

  $ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'

  $ echo dir/file_1.txt.gz |
      parallel --plus echo {//} {/} {%_1.txt.gz}

B<7. Get basename, and remove last (`{.}`) or any (`{:}`) extension>

  $ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'

  $ echo dir.d/file.txt.gz | parallel 'echo {.} {:} {/.} {/:}'

B<8. Job ID, combine fields index and other replacement strings>

  $ echo 12 file.txt dir/s_1.fq.gz |
      rush 'echo job {#}: {2} {2.} {3%:^_1}'

  $ echo 12 file.txt dir/s_1.fq.gz |
      parallel --colsep ' ' 'echo job {#}: {2} {2.} {3/:%_1}'

B<9. Capture submatch using regular expression (`{@regexp}`)>

  $ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'

  $ echo read_1.fq.gz | parallel 'echo {@(.+)_\d}'

B<10. Custom field delimiter (`-d`)>

  $ echo a=b=c | rush 'echo {1} {2} {3}' -d =

  $ echo a=b=c | parallel --colsep = echo {1} {2} {3}

B<11. Send multi-lines to every command (`-n`)>

  $ seq 5 | rush -n 2 -k 'echo "{}"; echo'

  $ seq 5 |
      parallel -n 2 -k \
        'echo {=-1 $_=join"\n",@arg[1..$#arg] =}; echo'

  $ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '

  $ seq 5 | parallel -n 2 -k 'echo {}; echo'


B<12. Custom record delimiter (`-D`), note that empty records are not used.>

  $ echo a b c d | rush -D " " -k 'echo {}'

  $ echo a b c d | parallel -d " " -k 'echo {}'

  $ echo abcd | rush -D "" -k 'echo {}'

Cannot be done by GNU B<parallel>.

  $ cat fasta.fa
  >seq1
  tag
  >seq2
  cat
  gat
  >seq3
  attac
  a
  cat

  $ cat fasta.fa | rush -D ">" \
      'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
  # rush fails to join the multiline sequences

  $ cat fasta.fa | (read -n1 ignore_first_char;
      parallel -d '>' --colsep '\n' echo FASTA record {#}: \
        name: {1} sequence: '{=2 $_=join"",@arg[2..$#arg]=}'
    )

B<13. Assign value to variable, like `awk -v` (`-v`)>

  $ seq 1 |
      rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen

  $ seq 1 |
      parallel -N0 \
        'fname=Wei; lname=Shen; echo Hello, ${fname} ${lname}!'

  $ for var in a b; do \
      seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
    done

In GNU B<parallel> you would typically do:

  $ seq 1 3 | parallel -k echo var: {1}, data: {2} ::: a b :::: -

If you I<really> want the var:

  $ seq 1 3 |
      parallel -k var={1} ';echo var: $var, data: {2}' ::: a b :::: -

If you I<really> want the B<for>-loop:

  $ for var in a b; do
  >   export var;
  >   seq 1 3 | parallel -k 'echo var: $var, data: {}';
  > done

Contrary to B<rush> this also works if the value is complex like:

  My brother's 12" records


B<14. Preset variable (`-v`), avoid repeatedly writing verbose replacement strings>

  # naive way
  $ echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'

  $ echo read_1.fq.gz | parallel 'echo {:%_1} {:%_1}_2.fq.gz'

  # macro + removing suffix
  $ echo read_1.fq.gz |
      rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'

  $ echo read_1.fq.gz |
      parallel 'p={:%_1}; echo $p ${p}_2.fq.gz'

  # macro + regular expression
  $ echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'

  $ echo read_1.fq.gz | parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'

Contrary to B<rush> GNU B<parallel> works with complex values:

  echo "My brother's 12\"read_1.fq.gz" |
    parallel 'p={@(.+?)_\d}; echo $p ${p}_2.fq.gz'

B<15. Interrupt jobs by `Ctrl-C`, rush will stop unfinished commands and exit.>

  $ seq 1 20 | rush 'sleep 1; echo {}'
  ^C

  $ seq 1 20 | parallel 'sleep 1; echo {}'
  ^C

B<16. Continue/resume jobs (`-c`). When some jobs failed (by
execution failure, timeout, or canceling by user with `Ctrl + C`),
please switch flag `-c/--continue` on and run again, so that `rush`
can save successful commands and ignore them in I<NEXT> run.>

  $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
  $ cat successful_cmds.rush
  $ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c

  $ seq 1 3 | parallel --joblog mylog --timeout 2 \
      'sleep {}; echo {}'
  $ cat mylog
  $ seq 1 3 | parallel --joblog mylog --retry-failed \
      'sleep {}; echo {}'

Multi-line jobs:

  $ seq 1 3 | rush 'sleep {}; echo {}; \
    echo finish {}' -t 3 -c -C finished.rush
  $ cat finished.rush
  $ seq 1 3 | rush 'sleep {}; echo {}; \
    echo finish {}' -t 3 -c -C finished.rush

  $ seq 1 3 |
      parallel --joblog mylog --timeout 2 'sleep {}; echo {}; \
        echo finish {}'
  $ cat mylog
  $ seq 1 3 |
      parallel --joblog mylog --retry-failed 'sleep {}; echo {}; \
        echo finish {}'

B<17. A comprehensive example: downloading 1K+ pages given by
three URL list files using `phantomjs save_page.js` (some page
contents are dynamically generated by Javascript, so `wget` does not
work). Here I set max jobs number (`-j`) as `20`, each job has a max
running time (`-t`) of `60` seconds and `3` retry chances
(`-r`). Continue flag `-c` is also switched on, so we can continue
unfinished jobs. Luckily, it's accomplished in one run :)>

  $ for f in $(seq 2014 2016); do \
      /bin/rm -rf $f; mkdir -p $f; \
      cat $f.html.txt | rush -v d=$f -d = \
        'phantomjs save_page.js "{}" > {d}/{3}.html' \
        -j 20 -t 60 -r 3 -c; \
    done

GNU B<parallel> can append to an existing joblog with '+':

  $ rm mylog
  $ for f in $(seq 2014 2016); do
      /bin/rm -rf $f; mkdir -p $f;
      cat $f.html.txt |
        parallel -j20 --timeout 60 --retries 4 --joblog +mylog \
          --colsep = \
          phantomjs save_page.js {1}={2}={3} '>' $f/{3}.html
    done

B<18. A bioinformatics example: mapping with `bwa`, and
processing result with `samtools`:>

  $ ref=ref/xxx.fa
  $ threads=25
  $ ls -d raw.cluster.clean.mapping/* \
    | rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
    'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz > {p}.sam; \
    samtools view -bS {p}.sam > {p}.bam; \
    samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
    samtools index {p}.sorted.bam; \
    samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
    /bin/rm {p}.bam {p}.sam;' \
    -j 2 --verbose -c -C mapping.rush

GNU B<parallel> would use a function:

  $ ref=ref/xxx.fa
  $ export ref
  $ thr=25
  $ export thr
  $ bwa_sam() {
      p="$1"
      bam="$p".bam
      sam="$p".sam
      sortbam="$p".sorted.bam
      bwa mem -t $thr -M -a $ref ${p}_1.fq.gz ${p}_2.fq.gz > "$sam"
      samtools view -bS "$sam" > "$bam"
      samtools sort -T ${p}.tmp -@ $thr "$bam" -o "$sortbam"
      samtools index "$sortbam"
      samtools flagstat "$sortbam" > "$sortbam".flagstat
      /bin/rm "$bam" "$sam"
    }
  $ export -f bwa_sam
  $ ls -d raw.cluster.clean.mapping/* |
      parallel -j 2 --verbose --joblog mylog bwa_sam

=head3 Other B<rush> features

B<rush> has:

=over 4

=item * B<awk -v> like custom defined variables (B<-v>)

With GNU B<parallel> you would simply set a shell variable:

  parallel 'v={}; echo "$v"' ::: foo
  echo foo | rush -v v={} 'echo {v}'

Also B<rush> does not like special chars. So these B<do not work>:

  echo does not work | rush -v v=\" 'echo {v}'
  echo "My brother's 12\" records" | rush -v v={} 'echo {v}'

Whereas the corresponding GNU B<parallel> version works:

  parallel 'v=\"; echo "$v"' ::: works
  parallel 'v={}; echo "$v"' ::: "My brother's 12\" records"

=item * Exit on first error(s) (-e)

This is called B<--halt now,fail=1> (or shorter: B<--halt 2>) when
used with GNU B<parallel>.

=item * Settable records sending to every command (B<-n>, default 1)

This is also called B<-n> in GNU B<parallel>.

=item * Practical replacement strings

=over 4

=item {:} remove any extension

With GNU B<parallel> this can be emulated by:

  parallel --plus echo '{/\..*/}' ::: foo.ext.bar.gz

=item {^suffix}, remove suffix

With GNU B<parallel> this can be emulated by:

  parallel --plus echo '{%.bar.gz}' ::: foo.ext.bar.gz

=item {@regexp}, capture submatch using regular expression

With GNU B<parallel> this can be emulated by:

  parallel --rpl '{@(.*?)} /$$1/ and $_=$1;' \
    echo '{@\d_(.*).gz}' ::: 1_foo.gz

=item {%.}, {%:}, basename without last extension or without any extension

With GNU B<parallel> removing any extension from the basename can be
emulated by:

  parallel echo '{= s:.*/::;s/\..*// =}' ::: dir/foo.bar.gz

And if you need it often, you define a B<--rpl> in
B<$HOME/.parallel/config> ({%.} removes only the last extension, {%:}
removes all of them):

  --rpl '{%.} s:.*/::;s:\.[^/.]+$::'
  --rpl '{%:} s:.*/::;s/\..*//'

Then you can use them as:

  parallel echo {%.} {%:} ::: dir/foo.bar.gz

=back

=item * Preset variable (macro)

E.g.

  echo foosuffix | rush -v p={^suffix} 'echo {p}_new_suffix'

With GNU B<parallel> this can be emulated by:

  echo foosuffix |
    parallel --plus 'p={%suffix}; echo ${p}_new_suffix'

Unlike B<rush>, GNU B<parallel> works fine if the input contains
double space, ' and ":

  echo "1'6\"  foosuffix" |
    parallel --plus 'p={%suffix}; echo "${p}"_new_suffix'


=item * Commands of multi-lines

While you I<can> use multi-lined commands in GNU B<parallel>, to
improve readability GNU B<parallel> discourages the use of multi-line
commands. In most cases it can be written as a function:

  seq 1 3 |
    parallel --timeout 2 --joblog my.log 'sleep {}; echo {}; \
      echo finish {}'

Could be written as:

  doit() {
    sleep "$1"
    echo "$1"
    echo finish "$1"
  }
  export -f doit
  seq 1 3 | parallel --timeout 2 --joblog my.log doit

The failed commands can be resumed with:

  seq 1 3 |
    parallel --resume-failed --joblog my.log 'sleep {}; echo {};\
      echo finish {}'

=back

https://github.com/shenwei356/rush


=head2 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel

ClusterSSH solves a different problem than GNU B<parallel>.

ClusterSSH opens a terminal window for each computer and using a
master window you can run the same command on all the computers. This
is typically used for administrating several computers that are almost
identical.

GNU B<parallel> runs the same (or different) commands with different
arguments in parallel possibly using remote computers to help
computing. If more than one computer is listed in B<-S> GNU B<parallel> may
only use one of these (e.g. if there are 8 jobs to be run and one
computer has 8 cores).

GNU B<parallel> can be used as a poor-man's version of ClusterSSH:

B<parallel --nonall -S server-a,server-b do_stuff foo bar>

https://github.com/duncs/clusterssh


=head2 DIFFERENCES BETWEEN coshell AND GNU Parallel

B<coshell> only accepts full commands on standard input. Any quoting
needs to be done by the user.

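Since B<coshell> reads complete command lines, every argument has to
be quoted before it is written to standard input. One possible way to
do that is sketched below. The helper names sh_quote and make_cmds are
made up for this example, and the result is piped to B<sh> only to
show that the quoting survives; with B<coshell> the generated lines
would be piped to B<coshell> instead.

```shell
#!/usr/bin/env bash
# sh_quote wraps a string in single quotes for POSIX sh. An embedded
# single quote is written as '\'' (close quote, escaped quote, reopen).
sh_quote() {   # hypothetical helper, not part of coshell
  local s=$1
  s=${s//\'/\'\\\'\'}
  printf "'%s'" "$s"
}

# Emit one complete, quoted command line per argument.
make_cmds() {
  local arg
  for arg in "$@"; do
    printf 'echo %s\n' "$(sh_quote "$arg")"
  done
}

# sh stands in for coshell here: both read full command lines on stdin.
make_cmds "My brother's 12\" records" 'two  spaces' | sh
```

Both arguments come out unchanged, including the quotes and the double
space.
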
Commands are run in B<sh> so any B<bash>/B<tcsh>/B<zsh> specific
syntax will not work.

Output can be buffered by using B<-d>. Output is buffered in memory,
so big output can cause swapping and therefore be terribly slow or
even cause the system to run out of memory.

https://github.com/gdm85/coshell (Last checked: 2019-01)


=head2 DIFFERENCES BETWEEN spread AND GNU Parallel

B<spread> runs commands on all directories.

It can be emulated with GNU B<parallel> using this Bash function:

  spread() {
    _cmds() {
      perl -e '$"=" && ";print "@ARGV"' "cd {}" "$@"
    }
    parallel $(_cmds "$@")'|| echo exit status $?' ::: */
  }

This works except for the B<--exclude> option.

(Last checked: 2017-11)


=head2 DIFFERENCES BETWEEN pyargs AND GNU Parallel

B<pyargs> deals badly with input containing spaces. It buffers stdout,
but not stderr. It buffers in RAM. {} does not work as replacement
string. It does not support running functions.

B<pyargs> does not support composed commands if run with B<--lines>,
and fails on B<pyargs traceroute gnu.org fsf.org>.

=head3 Examples

  seq 5 | pyargs -P50 -L seq
  seq 5 | parallel -P50 --lb seq

  seq 5 | pyargs -P50 --mark -L seq
  seq 5 | parallel -P50 --lb \
    --tagstring OUTPUT'[{= $_=$job->replaced()=}]' seq
  # Similar, but not precisely the same
  seq 5 | parallel -P50 --lb --tag seq

  seq 5 | pyargs -P50 --mark command
  # Somewhat longer with GNU Parallel due to the special
  #   --mark formatting
  cmd="$(echo "command" | parallel --shellquote)"
  wrap_cmd() {
     echo "MARK $cmd $@================================" >&3
     echo "OUTPUT START[$cmd $@]:"
     eval $cmd "$@"
     echo "OUTPUT END[$cmd $@]"
  }
  (seq 5 | env_parallel -P2 wrap_cmd) 3>&1
  # Similar, but not exactly the same
  seq 5 | parallel -t --tag command

  (echo '1 2 3';echo 4 5 6) | pyargs --stream seq
  (echo '1 2 3';echo 4 5 6) | perl -pe 's/\n/ /' |
    parallel -r -d' ' seq
  # Similar, but not exactly the same
  parallel seq ::: 1 2 3 4 5 6

https://github.com/robertblackwell/pyargs (Last checked: 2019-01)


=head2 DIFFERENCES BETWEEN concurrently AND GNU Parallel

B<concurrently> runs jobs in parallel.

The output is prepended with the job number, and may be incomplete:

  $ concurrently 'seq 100000' | (sleep 3;wc -l)
  7165

When pretty printing it caches output in memory. In test MIX (below)
the output mixes whether or not output is cached.

There seems to be no way of making a template command and have
B<concurrently> fill that with different args. The full commands must
be given on the command line.

There is also no way of controlling how many jobs should be run in
parallel at a time - i.e. "number of jobslots". Instead all jobs are
simply started in parallel.

https://github.com/kimmobrunfeldt/concurrently (Last checked: 2019-01)


=head2 DIFFERENCES BETWEEN map(soveran) AND GNU Parallel

B<map> does not run jobs in parallel by default. The README suggests using:

  ... | map t 'sleep $t && say done &'

But this fails if more jobs are run in parallel than the number of
available processes. Since there is no support for parallelization in
B<map> itself, the output also mixes:

  seq 10 | map i 'echo start-$i && sleep 0.$i && echo end-$i &'

The major difference is that GNU B<parallel> is built for parallelization
and B<map> is not. So GNU B<parallel> has lots of ways of dealing with the
issues that parallelization raises:

=over 4

=item *

Keep the number of processes manageable

=item *

Make sure output does not mix

=item *

Make Ctrl-C kill all running processes

=back

Here are the 5 examples converted to GNU Parallel:

  1$ ls *.c | map f 'foo $f'
  1$ ls *.c | parallel foo

  2$ ls *.c | map f 'foo $f; bar $f'
  2$ ls *.c | parallel 'foo {}; bar {}'

  3$ cat urls | map u 'curl -O $u'
  3$ cat urls | parallel curl -O

  4$ printf "1\n1\n1\n" | map t 'sleep $t && say done'
  4$ printf "1\n1\n1\n" | parallel 'sleep {} && say done'
  4$ parallel 'sleep {} && say done' ::: 1 1 1

  5$ printf "1\n1\n1\n" | map t 'sleep $t && say done &'
  5$ printf "1\n1\n1\n" | parallel -j0 'sleep {} && say done'
  5$ parallel -j0 'sleep {} && say done' ::: 1 1 1

https://github.com/soveran/map (Last checked: 2019-01)


=head2 DIFFERENCES BETWEEN loop AND GNU Parallel
|
|
|
|
|
|
|
|
B<loop> mixes stdout and stderr:
|
|
|
|
|
|
|
|
loop 'ls /no-such-file' >/dev/null
|
|
|
|
|
|
|
|
B<loop>'s replacement string B<$ITEM> does not quote strings:
|
|
|
|
|
|
|
|
echo 'two  spaces' | loop 'echo $ITEM'
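
The effect can be reproduced without B<loop>. The sketch below assumes nothing about how B<loop> works internally. It only shows what happens when a string containing two consecutive spaces is pasted unquoted into a command line, and how shell quoting it first (as GNU B<parallel> does for B<{}>) keeps it intact. B<printf %q> is a bash feature.

```shell
#!/bin/bash
item='two  spaces'
# Pasted in unquoted, the string is split into two words by the shell,
# and echo joins them again with a single space
sh -c "echo $item"
# Shell quoted first, the string reaches echo as one unchanged argument
sh -c "echo $(printf '%q' "$item")"
```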
|
|
|
|
|
|
|
|
B<loop> cannot run functions:
|
|
|
|
|
|
|
|
myfunc() { echo joe; }
|
|
|
|
export -f myfunc
|
|
|
|
loop 'myfunc this fails'
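
A function exported with B<export -f> is passed through the environment and is only imported by bash. The following sketch shows the difference between a command started through bash and one started without it. It says nothing about how B<loop> starts its commands. It assumes no program called B<myfunc> exists in the path.

```shell
#!/bin/bash
myfunc() { echo "joe $*"; }
export -f myfunc
# A child bash imports the exported function from the environment
bash -c 'myfunc works here'
# A program that is not bash finds no such command
env myfunc 2>/dev/null || echo "myfunc is not visible outside bash"
```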
|
|
|
|
|
|
|
|
Some of the examples from https://github.com/Miserlou/Loop/ can be
|
|
|
|
emulated with GNU B<parallel>:
|
|
|
|
|
|
|
|
# A couple of functions will make the code easier to read
|
|
|
|
$ loopy() {
|
|
|
|
yes | parallel -uN0 -j1 "$@"
|
|
|
|
}
|
|
|
|
$ export -f loopy
|
|
|
|
$ time_out() {
|
|
|
|
parallel -uN0 -q --timeout "$@" ::: 1
|
|
|
|
}
|
|
|
|
$ match() {
|
|
|
|
perl -0777 -ne 'grep /'"$1"'/,$_ and print or exit 1'
|
|
|
|
}
|
|
|
|
$ export -f match
|
|
|
|
|
|
|
|
$ loop 'ls' --every 10s
|
|
|
|
$ loopy --delay 10s ls
|
|
|
|
|
|
|
|
$ loop 'touch $COUNT.txt' --count-by 5
|
|
|
|
$ loopy touch '{= $_=seq()*5 =}'.txt
|
|
|
|
|
|
|
|
$ loop --until-contains 200 -- ./get_response_code.sh --site mysite.biz
|
|
|
|
$ loopy --halt now,success=1 './get_response_code.sh --site mysite.biz |
|
|
|
|
match 200'
|
|
|
|
|
|
|
|
$ loop './poke_server' --for-duration 8h
|
|
|
|
$ time_out 8h loopy ./poke_server
|
|
|
|
|
|
|
|
$ loop './poke_server' --until-success
|
|
|
|
$ loopy --halt now,success=1 ./poke_server
|
|
|
|
|
|
|
|
$ cat files_to_create.txt | loop 'touch $ITEM'
|
|
|
|
$ cat files_to_create.txt | parallel touch {}
|
|
|
|
|
|
|
|
$ loop 'ls' --for-duration 10min --summary
|
|
|
|
# --joblog is somewhat more verbose than --summary
|
|
|
|
$ time_out 10m loopy --joblog my.log ls; cat my.log
|
|
|
|
|
|
|
|
$ loop 'echo hello'
|
|
|
|
$ loopy echo hello
|
|
|
|
|
|
|
|
$ loop 'echo $COUNT'
|
|
|
|
# GNU Parallel counts from 1
|
|
|
|
$ loopy echo {#}
|
|
|
|
# Counting from 0 can be forced
|
|
|
|
$ loopy echo '{= $_=seq()-1 =}'
|
|
|
|
|
|
|
|
$ loop 'echo $COUNT' --count-by 2
|
|
|
|
$ loopy echo '{= $_=2*(seq()-1) =}'
|
|
|
|
|
|
|
|
$ loop 'echo $COUNT' --count-by 2 --offset 10
|
|
|
|
$ loopy echo '{= $_=10+2*(seq()-1) =}'
|
|
|
|
|
|
|
|
$ loop 'echo $COUNT' --count-by 1.1
|
|
|
|
# GNU Parallel rounds 3.3000000000000003 to 3.3
|
|
|
|
$ loopy echo '{= $_=1.1*(seq()-1) =}'
|
|
|
|
|
|
|
|
$ loop 'echo $COUNT $ACTUALCOUNT' --count-by 2
|
|
|
|
$ loopy echo '{= $_=2*(seq()-1) =} {#}'
|
|
|
|
|
|
|
|
$ loop 'echo $COUNT' --num 3 --summary
|
|
|
|
# --joblog is somewhat more verbose than --summary
|
|
|
|
$ seq 3 | parallel --joblog my.log echo; cat my.log
|
|
|
|
|
|
|
|
$ loop 'ls -foobarbatz' --num 3 --summary
|
|
|
|
# --joblog is somewhat more verbose than --summary
|
|
|
|
$ seq 3 | parallel --joblog my.log -N0 ls -foobarbatz; cat my.log
|
|
|
|
|
|
|
|
$ loop 'echo $COUNT' --count-by 2 --num 50 --only-last
|
|
|
|
# Can be emulated by running 2 jobs
|
|
|
|
$ seq 49 | parallel echo '{= $_=2*(seq()-1) =}' >/dev/null
|
|
|
|
$ echo 50| parallel echo '{= $_=2*(seq()-1) =}'
|
|
|
|
|
|
|
|
$ loop 'date' --every 5s
|
|
|
|
$ loopy --delay 5s date
|
|
|
|
|
|
|
|
$ loop 'date' --for-duration 8s --every 2s
|
|
|
|
$ time_out 8s loopy --delay 2s date
|
|
|
|
|
|
|
|
$ loop 'date -u' --until-time '2018-05-25 20:50:00' --every 5s
|
|
|
|
$ seconds=$((`date -d 2018-05-25T20:50:00 +%s` - `date +%s`))s
|
|
|
|
$ time_out $seconds loopy --delay 5s date -u
|
|
|
|
|
|
|
|
$ loop 'echo $RANDOM' --until-contains "666"
|
|
|
|
$ loopy --halt now,success=1 'echo $RANDOM | match 666'
|
|
|
|
|
|
|
|
$ loop 'if (( RANDOM % 2 )); then
|
|
|
|
(echo "TRUE"; true);
|
|
|
|
else
|
|
|
|
(echo "FALSE"; false);
|
|
|
|
fi' --until-success
|
|
|
|
$ loopy --halt now,success=1 'if (( $RANDOM % 2 )); then
|
|
|
|
(echo "TRUE"; true);
|
|
|
|
else
|
|
|
|
(echo "FALSE"; false);
|
|
|
|
fi'
|
|
|
|
|
|
|
|
$ loop 'if (( RANDOM % 2 )); then
|
|
|
|
(echo "TRUE"; true);
|
|
|
|
else
|
|
|
|
(echo "FALSE"; false);
|
|
|
|
fi' --until-error
|
|
|
|
$ loopy --halt now,fail=1 'if (( $RANDOM % 2 )); then
|
|
|
|
(echo "TRUE"; true);
|
|
|
|
else
|
|
|
|
(echo "FALSE"; false);
|
|
|
|
fi'
|
|
|
|
|
|
|
|
$ loop 'date' --until-match "(\d{4})"
|
|
|
|
$ loopy --halt now,success=1 'date | match [0-9][0-9][0-9][0-9]'
|
|
|
|
|
|
|
|
$ loop 'echo $ITEM' --for red,green,blue
|
|
|
|
$ parallel echo ::: red green blue
|
|
|
|
|
|
|
|
$ cat /tmp/my-list-of-files-to-create.txt | loop 'touch $ITEM'
|
|
|
|
$ cat /tmp/my-list-of-files-to-create.txt | parallel touch
|
|
|
|
|
|
|
|
$ ls | loop 'cp $ITEM $ITEM.bak'; ls
|
|
|
|
$ ls | parallel cp {} {}.bak; ls
|
|
|
|
|
|
|
|
$ loop 'echo $ITEM | tr a-z A-Z' -i
|
|
|
|
$ parallel 'echo {} | tr a-z A-Z'
|
|
|
|
# Or more efficiently:
|
|
|
|
$ parallel --pipe tr a-z A-Z
|
|
|
|
|
|
|
|
$ loop 'echo $ITEM' --for "`ls`"
|
|
|
|
$ parallel echo {} ::: "`ls`"
|
|
|
|
|
|
|
|
$ ls | loop './my_program $ITEM' --until-success;
|
|
|
|
$ ls | parallel --halt now,success=1 ./my_program {}
|
|
|
|
|
|
|
|
$ ls | loop './my_program $ITEM' --until-fail;
|
|
|
|
$ ls | parallel --halt now,fail=1 ./my_program {}
|
|
|
|
|
|
|
|
$ ./deploy.sh;
|
|
|
|
loop 'curl -sw "%{http_code}" http://coolwebsite.biz' \
|
|
|
|
--every 5s --until-contains 200;
|
|
|
|
./announce_to_slack.sh
|
|
|
|
$ ./deploy.sh;
|
|
|
|
loopy --delay 5s --halt now,success=1 \
|
|
|
|
'curl -sw "%{http_code}" http://coolwebsite.biz | match 200';
|
|
|
|
./announce_to_slack.sh
|
|
|
|
|
|
|
|
$ loop "ping -c 1 mysite.com" --until-success; ./do_next_thing
|
|
|
|
$ loopy --halt now,success=1 ping -c 1 mysite.com; ./do_next_thing
|
|
|
|
|
|
|
|
$ ./create_big_file -o my_big_file.bin;
|
|
|
|
loop 'ls' --until-contains 'my_big_file.bin';
|
|
|
|
./upload_big_file my_big_file.bin
|
|
|
|
# inotifywait is a better tool to detect file system changes.
|
|
|
|
# It can even make sure the file is complete
|
|
|
|
# so you are not uploading an incomplete file
|
|
|
|
$ inotifywait -qmre MOVED_TO -e CLOSE_WRITE --format %w%f . |
|
|
|
|
grep my_big_file.bin
|
|
|
|
|
|
|
|
$ ls | loop 'cp $ITEM $ITEM.bak'
|
|
|
|
$ ls | parallel cp {} {}.bak
|
|
|
|
|
|
|
|
$ loop './do_thing.sh' --every 15s --until-success --num 5
|
|
|
|
$ parallel --retries 5 --delay 15s ::: ./do_thing.sh
|
|
|
|
|
|
|
|
https://github.com/Miserlou/Loop/ (Last checked: 2018-10)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN lorikeet AND GNU Parallel
|
|
|
|
|
|
|
|
B<lorikeet> can run jobs in parallel. It does this based on a
|
|
|
|
dependency graph described in a file, so this is similar to B<make>.
|
|
|
|
|
|
|
|
https://github.com/cetra3/lorikeet (Last checked: 2018-10)
|
|
|
|
|
2019-01-21 02:16:59 +00:00
|
|
|
|
2018-11-22 23:30:23 +00:00
|
|
|
=head2 DIFFERENCES BETWEEN spp AND GNU Parallel
|
|
|
|
|
|
|
|
B<spp> can run jobs in parallel. B<spp> does not use a command
|
|
|
|
template to generate the jobs, but requires jobs to be in a
|
|
|
|
file. Output from the jobs mix.
|
|
|
|
|
2019-01-21 02:16:59 +00:00
|
|
|
https://github.com/john01dav/spp (Last checked: 2019-01)
|
|
|
|
|
2018-10-22 22:46:38 +00:00
|
|
|
|
2019-01-25 05:16:35 +00:00
|
|
|
=head2 DIFFERENCES BETWEEN paral AND GNU Parallel
|
|
|
|
|
|
|
|
B<paral> prints a lot of status information and stores the output from
|
|
|
|
the commands run into files. This means it cannot be used in the middle
|
|
|
|
of a pipe like this:
|
|
|
|
|
|
|
|
paral "echo this" "echo does not" "echo work" | wc
|
|
|
|
|
|
|
|
Instead it puts the output into files named like
|
|
|
|
B<out_#_I<command>.out.log>. To get a very similar behaviour with GNU
|
|
|
|
B<parallel> use B<--results
|
|
|
|
'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta>
|
|
|
|
|
|
|
|
B<paral> only takes arguments on the command line and each argument
|
|
|
|
should be a full command. Thus it does not use command templates.
|
|
|
|
|
|
|
|
This limits how many jobs it can run in total, because they all need
|
|
|
|
to fit on a single command line.
|
|
|
|
|
|
|
|
B<paral> has no support for running jobs remotely.
|
|
|
|
|
|
|
|
The examples from B<README.markdown> and the corresponding command run
|
|
|
|
with GNU B<parallel> (B<--results
|
|
|
|
'out_{#}_{=s/[^\sa-z_0-9]//g;s/\s+/_/g=}.log' --eta> is omitted from
|
|
|
|
the GNU B<parallel> command):
|
|
|
|
|
|
|
|
paral "command 1" "command 2 --flag" "command arg1 arg2"
|
|
|
|
parallel ::: "command 1" "command 2 --flag" "command arg1 arg2"
|
|
|
|
|
|
|
|
paral "sleep 1 && echo c1" "sleep 2 && echo c2" \
|
|
|
|
"sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
|
|
|
|
parallel ::: "sleep 1 && echo c1" "sleep 2 && echo c2" \
|
|
|
|
"sleep 3 && echo c3" "sleep 4 && echo c4" "sleep 5 && echo c5"
|
|
|
|
# Or shorter:
|
|
|
|
parallel "sleep {} && echo c{}" ::: {1..5}
|
|
|
|
|
|
|
|
paral -n=0 "sleep 5 && echo c5" "sleep 4 && echo c4" \
|
|
|
|
"sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
|
|
|
|
parallel -j0 ::: "sleep 5 && echo c5" "sleep 4 && echo c4" \
|
|
|
|
"sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
|
|
|
|
# Or shorter:
|
|
|
|
parallel -j0 "sleep {} && echo c{}" ::: 5 4 3 2 1
|
|
|
|
|
|
|
|
paral -n=1 "sleep 5 && echo c5" "sleep 4 && echo c4" \
|
|
|
|
"sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
|
|
|
|
parallel -j1 "sleep {} && echo c{}" ::: 5 4 3 2 1
|
|
|
|
|
|
|
|
paral -n=2 "sleep 5 && echo c5" "sleep 4 && echo c4" \
|
|
|
|
"sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
|
|
|
|
parallel -j2 "sleep {} && echo c{}" ::: 5 4 3 2 1
|
|
|
|
|
|
|
|
paral -n=5 "sleep 5 && echo c5" "sleep 4 && echo c4" \
|
|
|
|
"sleep 3 && echo c3" "sleep 2 && echo c2" "sleep 1 && echo c1"
|
|
|
|
parallel -j5 "sleep {} && echo c{}" ::: 5 4 3 2 1
|
|
|
|
|
|
|
|
paral -n=1 "echo a && sleep 0.5 && echo b && sleep 0.5 && \
|
|
|
|
echo c && sleep 0.5 && echo d && sleep 0.5 && \
|
|
|
|
echo e && sleep 0.5 && echo f && sleep 0.5 && \
|
|
|
|
echo g && sleep 0.5 && echo h"
|
|
|
|
parallel ::: "echo a && sleep 0.5 && echo b && sleep 0.5 && \
|
|
|
|
echo c && sleep 0.5 && echo d && sleep 0.5 && \
|
|
|
|
echo e && sleep 0.5 && echo f && sleep 0.5 && \
|
|
|
|
echo g && sleep 0.5 && echo h"
|
|
|
|
|
|
|
|
https://github.com/amattn/paral (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN concurr AND GNU Parallel
|
|
|
|
|
|
|
|
B<concurr> is built to run jobs in parallel using a client/server
|
|
|
|
model.
|
|
|
|
|
|
|
|
The examples from B<README.md>:
|
|
|
|
|
|
|
|
concurr 'echo job {#} on slot {%}: {}' : arg1 arg2 arg3 arg4
|
|
|
|
parallel 'echo job {#} on slot {%}: {}' ::: arg1 arg2 arg3 arg4
|
|
|
|
|
|
|
|
concurr 'echo job {#} on slot {%}: {}' :: file1 file2 file3
|
|
|
|
parallel 'echo job {#} on slot {%}: {}' :::: file1 file2 file3
|
|
|
|
|
|
|
|
concurr 'echo {}' < input_file
|
|
|
|
parallel 'echo {}' < input_file
|
|
|
|
|
|
|
|
cat file | concurr 'echo {}'
|
|
|
|
cat file | parallel 'echo {}'
|
|
|
|
|
|
|
|
B<concurr> deals badly with empty input files and with output larger than
|
|
|
|
64 KB.
|
|
|
|
|
|
|
|
https://github.com/mmstick/concurr (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN lesser-parallel AND GNU Parallel
|
|
|
|
|
|
|
|
B<lesser-parallel> is the inspiration for B<parallel --embed>. Both
|
|
|
|
B<lesser-parallel> and B<parallel --embed> define bash functions that
|
|
|
|
can be included as part of a bash script to run jobs in parallel.
|
|
|
|
|
|
|
|
B<lesser-parallel> implements a few of the replacement strings, but
|
|
|
|
hardly any options, whereas B<parallel --embed> gives you the full
|
|
|
|
GNU B<parallel> experience.
|
|
|
|
|
|
|
|
https://github.com/kou1okada/lesser-parallel (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN npm-parallel AND GNU Parallel
|
|
|
|
|
|
|
|
B<npm-parallel> can run npm tasks in parallel.
|
|
|
|
|
|
|
|
There are no examples and very little documentation, so it is hard to
|
|
|
|
compare to GNU B<parallel>.
|
|
|
|
|
|
|
|
https://github.com/spion/npm-parallel (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
|
|
|
=head2 DIFFERENCES BETWEEN machma AND GNU Parallel
|
|
|
|
|
|
|
|
B<machma> runs tasks in parallel. It gives time stamped
|
|
|
|
output. It buffers in RAM. The examples from README.md:
|
|
|
|
|
2019-02-22 18:00:42 +00:00
|
|
|
find . -iname '*.jpg' |
|
|
|
|
machma -- mogrify -resize 1200x1200 -filter Lanczos {}
|
|
|
|
find . -iname '*.jpg' |
|
|
|
|
parallel mogrify -resize 1200x1200 -filter Lanczos {}
|
2019-01-25 05:16:35 +00:00
|
|
|
|
|
|
|
cat /tmp/ips | machma -p 2 -- ping -c 2 -q {}
|
|
|
|
cat /tmp/ips | parallel -j 2 --tag --line-buffer ping -c 2 -q {}
|
|
|
|
|
2019-02-22 18:00:42 +00:00
|
|
|
cat /tmp/ips |
|
|
|
|
machma -- sh -c 'ping -c 2 -q $0 > /dev/null && echo alive' {}
|
|
|
|
cat /tmp/ips |
|
|
|
|
parallel --tag 'ping -c 2 -q {} > /dev/null && echo alive'
|
2019-01-25 05:16:35 +00:00
|
|
|
|
2019-02-22 18:00:42 +00:00
|
|
|
find . -iname '*.jpg' |
|
|
|
|
machma --timeout 5s -- mogrify -resize 1200x1200 -filter Lanczos {}
|
|
|
|
find . -iname '*.jpg' |
|
|
|
|
parallel --timeout 5s mogrify -resize 1200x1200 -filter Lanczos {}
|
2019-01-25 05:16:35 +00:00
|
|
|
|
2019-02-22 18:00:42 +00:00
|
|
|
find . -iname '*.jpg' -print0 |
|
|
|
|
machma --null -- mogrify -resize 1200x1200 -filter Lanczos {}
|
|
|
|
find . -iname '*.jpg' -print0 |
|
|
|
|
parallel --null mogrify -resize 1200x1200 -filter Lanczos {}
|
2019-01-25 05:16:35 +00:00
|
|
|
|
|
|
|
https://github.com/fd0/machma (Last checked: 2019-01)
|
|
|
|
|
|
|
|
|
2019-02-22 18:00:42 +00:00
|
|
|
=head2 DIFFERENCES BETWEEN interlace AND GNU Parallel
|
|
|
|
|
|
|
|
B<interlace> is built for network analysis to run network tools in parallel.
|
|
|
|
|
|
|
|
B<interlace> does not buffer output, so output mixes.
|
|
|
|
|
|
|
|
Using B<prips> the examples from https://github.com/codingo/Interlace
|
|
|
|
can be run with GNU B<parallel>:
|
|
|
|
|
|
|
|
interlace -tL ./targets.txt -threads 5 \
|
|
|
|
-c "nikto --host _target_ > ./_target_-nikto.txt" -v
|
|
|
|
parallel -a targets.txt -P5 'nikto --host {} > ./{}-nikto.txt'
|
|
|
|
|
|
|
|
interlace -tL ./targets.txt -threads 5 -c \
|
|
|
|
"nikto --host _target_:_port_ > ./_target_-_port_-nikto.txt" \
|
|
|
|
-p 80,443 -v
|
|
|
|
parallel -P5 'nikto --host {1}:{2} > ./{1}-{2}-nikto.txt' \
|
|
|
|
:::: targets.txt ::: 80 443
|
|
|
|
|
|
|
|
commands.txt:
|
|
|
|
nikto --host _target_:_port_ > _output_/_target_-nikto.txt
|
|
|
|
sslscan _target_:_port_ > _output_/_target_-sslscan.txt
|
|
|
|
testssl.sh _target_:_port_ > _output_/_target_-testssl.txt
|
|
|
|
interlace -t example.com -o ~/Engagements/example/ \
|
|
|
|
-cL ./commands.txt -p 80,443
|
|
|
|
|
|
|
|
_nikto() {
|
|
|
|
nikto --host "$1:$2"
|
|
|
|
}
|
|
|
|
_sslscan() {
|
|
|
|
sslscan "$1:$2"
|
|
|
|
}
|
|
|
|
_testssl() {
|
|
|
|
testssl.sh "$1:$2"
|
|
|
|
}
|
|
|
|
export -f _nikto
|
|
|
|
export -f _sslscan
|
|
|
|
export -f _testssl
|
|
|
|
parallel --results ~/Engagements/example/{2}:{3}{1} \
|
|
|
|
::: _nikto _sslscan _testssl ::: example.com ::: 80 443
|
|
|
|
|
|
|
|
interlace -t 192.168.12.0/24 -c "vhostscan _target_ \
|
|
|
|
-oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
|
|
|
|
prips 192.168.12.0/24 |
|
|
|
|
parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
|
|
|
|
|
|
|
|
interlace -t 192.168.12.* -c "vhostscan _target_ \
|
|
|
|
-oN _output_/_target_-vhosts.txt" -o ~/scans/ -threads 50
|
|
|
|
# Glob is not supported in prips
|
|
|
|
prips 192.168.12.0/24 |
|
|
|
|
parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
|
|
|
|
|
|
|
|
interlace -t 192.168.12.1-15 -c \
|
|
|
|
"vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
|
|
|
|
-o ~/scans/ -threads 50
|
|
|
|
# Dash notation is not supported in prips
|
|
|
|
prips 192.168.12.1 192.168.12.15 |
|
|
|
|
parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
|
|
|
|
|
|
|
|
interlace -tL ./target-list.txt -c \
|
|
|
|
"vhostscan -t _target_ -oN _output_/_target_-vhosts.txt" \
|
|
|
|
-o ~/scans/ -threads 50
|
|
|
|
cat ./target-list.txt |
|
|
|
|
parallel -P50 vhostscan -t {} -oN ~/scans/{}-vhosts.txt
|
|
|
|
|
|
|
|
./vhosts-commands.txt -tL ./target-list.txt:
|
|
|
|
vhostscan -t $target -oN _output_/_target_-vhosts.txt
|
|
|
|
interlace -cL ./vhosts-commands.txt -tL ./target-list.txt \
|
|
|
|
-threads 50 -o ~/scans
|
|
|
|
|
|
|
|
./vhosts-commands.txt -tL ./target-list.txt:
|
|
|
|
vhostscan -t "$1" -oN "$2"
|
|
|
|
parallel -P50 ./vhosts-commands.txt {} ~/scans/{} \
|
|
|
|
:::: ./target-list.txt
|
|
|
|
|
|
|
|
interlace -t 192.168.12.0/24 -e 192.168.12.0/26 -c \
|
|
|
|
"vhostscan _target_ -oN _output_/_target_-vhosts.txt" \
|
|
|
|
-o ~/scans/ -threads 50
|
|
|
|
prips 192.168.12.0/24 | grep -xv -Ff <(prips 192.168.12.0/26) |
|
|
|
|
parallel -P50 vhostscan {} -oN ~/scans/{}-vhosts.txt
|
|
|
|
|
|
|
|
|
|
|
|
https://github.com/codingo/Interlace (Last checked: 2019-02)
|
|
|
|
|
2017-09-21 22:50:39 +00:00
|
|
|
=head2 Todo
|
|
|
|
|
2018-09-20 22:15:14 +00:00
|
|
|
Url for spread
|
2018-03-21 21:57:28 +00:00
|
|
|
|
2019-01-21 02:16:59 +00:00
|
|
|
https://github.com/reggi/pkgrun
|
|
|
|
|
|
|
|
https://github.com/benoror/better-npm-run
|
|
|
|
|
|
|
|
https://github.com/bahmutov/with-package
|
|
|
|
|
|
|
|
https://github.com/spion/npm-parallel
|
|
|
|
|
|
|
|
https://github.com/royriojas/shell-executor
|
|
|
|
|
|
|
|
https://github.com/darkguy2008/parallelshell
|
|
|
|
|
2018-12-28 11:47:25 +00:00
|
|
|
https://github.com/xuchenCN/go-pssh
|
2018-10-22 22:46:38 +00:00
|
|
|
|
|
|
|
https://github.com/amritb/with-this.git
|
|
|
|
|
2018-09-20 22:15:14 +00:00
|
|
|
https://github.com/fd0/machma Requires Go >= 1.7.
|
2017-09-21 22:50:39 +00:00
|
|
|
|
|
|
|
https://github.com/k-bx/par requires Haskell to work. This limits the
|
|
|
|
number of platforms this can work on.
|
|
|
|
|
|
|
|
https://github.com/otonvm/Parallel
|
|
|
|
|
|
|
|
https://github.com/flesler/parallel
|
|
|
|
|
|
|
|
https://github.com/Julian/Verge
|
|
|
|
|
|
|
|
|
2017-05-21 19:04:37 +00:00
|
|
|
=head1 TESTING OTHER TOOLS
|
|
|
|
|
|
|
|
Certain issues are very common in parallelizing
|
|
|
|
tools. Here are a few stress tests. Be warned: If the tool is badly
|
|
|
|
coded it may overload your machine.
|
|
|
|
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
=head2 MIX: Output mixes
|
|
|
|
|
|
|
|
Output from 2 jobs should not mix. If the output is not used, this
|
|
|
|
does not matter; but if the output I<is> used then it is important
|
|
|
|
that you do not get half a line from one job followed by half a line
|
|
|
|
from another job.
|
|
|
|
|
|
|
|
If the tool does not buffer, output will most likely mix now and then.
|
|
|
|
|
|
|
|
This test stresses whether output mixes.
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
#!/bin/bash
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
paralleltool="parallel -j0"
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
cat <<-EOF > mycommand
|
|
|
|
#!/bin/bash
|
|
|
|
|
2017-12-26 14:46:00 +00:00
|
|
|
# If a, b, c, d, e, and f mix: Very bad
|
|
|
|
perl -e 'print STDOUT "a"x3000_000," "'
|
|
|
|
perl -e 'print STDERR "b"x3000_000," "'
|
|
|
|
perl -e 'print STDOUT "c"x3000_000," "'
|
|
|
|
perl -e 'print STDERR "d"x3000_000," "'
|
|
|
|
perl -e 'print STDOUT "e"x3000_000," "'
|
|
|
|
perl -e 'print STDERR "f"x3000_000," "'
|
2017-05-21 19:04:37 +00:00
|
|
|
echo
|
2017-12-26 14:46:00 +00:00
|
|
|
echo >&2
|
2017-05-21 19:04:37 +00:00
|
|
|
EOF
|
|
|
|
chmod +x mycommand
|
|
|
|
|
|
|
|
# Run 30 jobs in parallel
|
2018-09-20 22:15:14 +00:00
|
|
|
seq 30 |
|
|
|
|
$paralleltool ./mycommand > >(tr -s abcdef) 2> >(tr -s abcdef >&2)
|
2017-05-21 19:04:37 +00:00
|
|
|
|
2017-12-26 14:46:00 +00:00
|
|
|
# 'a c e' and 'b d f' should always stay together
|
2017-05-21 19:04:37 +00:00
|
|
|
# and there should only be a single line per job
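
The result can also be checked mechanically instead of by eye. A minimal sketch, assuming the output has been squeezed with B<tr -s abcdef> as in the test above, so that a complete stdout line reads "a c e " and a complete stderr line reads "b d f ". B<check_mix> is a hypothetical helper and not part of any tool.

```shell
#!/bin/bash
# check_mix: read squeezed output on stdin and succeed only if every
# line is a complete "a c e " or a complete "b d f " line.
check_mix() {
  awk '$0 != "a c e " && $0 != "b d f " { bad = 1 }
       END { exit bad + 0 }'
}

# Possible use with the test above (stdout only):
#   seq 30 | $paralleltool ./mycommand 2>/dev/null |
#     tr -s abcdef | check_mix && echo "no mixing seen"
```

A passing run does not prove that the tool never mixes output. It only shows that no mixing happened in this run.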
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
|
|
|
|
=head2 RAM: Output limited by RAM
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
Some tools cache output in RAM. This makes them extremely slow if the
|
2018-07-08 18:45:39 +00:00
|
|
|
output is bigger than physical memory, and they crash if the output is
|
2017-05-21 19:04:37 +00:00
|
|
|
bigger than the virtual memory.
|
|
|
|
|
|
|
|
#!/bin/bash
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
paralleltool="parallel -j0"
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
cat <<'EOF' > mycommand
|
|
|
|
#!/bin/bash
|
|
|
|
|
|
|
|
# Generate 1 GB output
|
|
|
|
yes "`perl -e 'print \"c\"x30_000'`" | head -c 1G
|
|
|
|
EOF
|
|
|
|
chmod +x mycommand
|
|
|
|
|
|
|
|
# Run 20 jobs in parallel
|
|
|
|
# Adjust 20 to be > physical RAM and < free space on /tmp
|
2018-03-21 21:57:28 +00:00
|
|
|
seq 20 | time $paralleltool ./mycommand | wc -c
|
2017-05-21 19:04:37 +00:00
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
|
|
|
|
=head2 DISKFULL: Incomplete data if /tmp runs full
|
|
|
|
|
|
|
|
If caching is done on disk, the disk can run full during the run. Not
|
|
|
|
all programs discover this. GNU Parallel discovers it, if it stays
|
|
|
|
full for at least 2 seconds.
|
|
|
|
|
|
|
|
#!/bin/bash
|
|
|
|
|
|
|
|
paralleltool="parallel -j0"
|
|
|
|
|
|
|
|
# This should be a dir with less than 100 GB free space
|
|
|
|
smalldisk=/tmp/shm/parallel
|
|
|
|
|
|
|
|
TMPDIR="$smalldisk"
|
|
|
|
export TMPDIR
|
|
|
|
|
|
|
|
max_output() {
|
|
|
|
# Force worst case scenario:
|
|
|
|
# Make GNU Parallel only check once per second
|
|
|
|
sleep 10
|
|
|
|
# Generate 100 GB to fill $TMPDIR
|
|
|
|
# Adjust if /tmp is bigger than 100 GB
|
|
|
|
yes | head -c 100G >$TMPDIR/$$
|
|
|
|
# Generate 10 MB output that will not be buffered due to full disk
|
|
|
|
perl -e 'print "X"x10_000_000' | head -c 10M
|
|
|
|
echo This part is missing from incomplete output
|
|
|
|
sleep 2
|
|
|
|
rm $TMPDIR/$$
|
|
|
|
echo Final output
|
|
|
|
}
|
|
|
|
|
|
|
|
export -f max_output
|
|
|
|
seq 10 | $paralleltool max_output | tr -s X
|
|
|
|
|
|
|
|
|
|
|
|
=head2 CLEANUP: Leaving tmp files at unexpected death
|
2017-05-21 19:04:37 +00:00
|
|
|
|
2017-07-20 19:38:45 +00:00
|
|
|
Some tools do not clean up tmp files if they are killed. If the tool
|
|
|
|
buffers on disk, it may leave its buffer files behind when it is killed.
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
#!/bin/bash
|
|
|
|
|
|
|
|
paralleltool=parallel
|
|
|
|
|
|
|
|
ls /tmp >/tmp/before
|
|
|
|
seq 10 | $paralleltool sleep &
|
|
|
|
pid=$!
|
|
|
|
# Give the tool time to start up
|
|
|
|
sleep 1
|
|
|
|
# Kill it without giving it a chance to cleanup
|
|
|
|
kill -9 $!
|
|
|
|
# Should be empty: No files should be left behind
|
|
|
|
diff <(ls /tmp) /tmp/before
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
|
|
|
|
=head2 SPCCHAR: Dealing badly with special file names.
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
It is not uncommon for users to create files like:
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
My brother's 12" *** record (costs $$$).jpg
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
Some tools break on this.
|
|
|
|
|
|
|
|
#!/bin/bash
|
|
|
|
|
|
|
|
paralleltool=parallel
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
touch "My brother's 12\" *** record (costs \$\$\$).jpg"
|
|
|
|
ls My*jpg | $paralleltool ls -l
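
A common reason for breaking is that the file name is pasted into a command string which a shell then parses a second time. The sketch below shows the safe alternative in plain shell: the name is handed over as a separate argument. B<safe_ls> is a hypothetical helper used for illustration only.

```shell
#!/bin/bash
# safe_ls FILE: hand FILE to ls as a single, untouched argument
safe_ls() {
  sh -c 'ls -- "$1"' _ "$1"
}

dir=$(mktemp -d)
name="My brother's 12\" *** record (costs \$\$\$).jpg"
touch "$dir/$name"
(cd "$dir" && safe_ls "$name")
# Pasting the name into the string instead, as in
#   sh -c "ls $name"
# makes the inner shell trip over the unmatched quote.
rm -rf "$dir"
```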
|
|
|
|
|
2017-05-21 19:04:37 +00:00
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
=head2 COMPOSED: Composed commands do not work
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
Some tools require you to wrap composed commands into B<bash -c>.
|
|
|
|
|
|
|
|
echo bar | $paralleltool echo foo';' echo {}
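
For comparison, this is what the wrapping looks like with B<xargs>, which executes the command directly without a shell. B<xargs> is used here only as an example of such a tool, and B<composed_with_xargs> is a hypothetical name. The argument is passed as B<$1> so that it is not pasted into the command string.

```shell
#!/bin/bash
# Wrap the composed command in bash -c and pass the input as $1
composed_with_xargs() {
  xargs -I{} bash -c 'echo foo; echo "$1"' _ {}
}

echo bar | composed_with_xargs
```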
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
|
|
|
|
=head2 ONEREP: Only one replacement string allowed
|
2017-05-21 19:04:37 +00:00
|
|
|
|
|
|
|
Some tools can only insert the argument once.
|
|
|
|
|
|
|
|
echo bar | $paralleltool echo {} foo {}
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
|
2018-09-20 22:15:14 +00:00
|
|
|
=head2 INPUTSIZE: Length of input should not be limited
|
|
|
|
|
|
|
|
Some tools limit the length of the input lines artificially with no good
|
|
|
|
reason. GNU B<parallel> does not:
|
|
|
|
|
|
|
|
perl -e 'print "foo."."x"x100_000_000' | parallel echo {.}
|
|
|
|
|
|
|
|
GNU B<parallel> limits the command to run to 128 KB due to execve(2):
|
|
|
|
|
|
|
|
perl -e 'print "x"x131_000' | parallel echo {} | wc
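
The limits that apply when a new program is executed can be inspected on the local system. This is a sketch under the assumption of a Linux system with 4 KB pages, where a single argument is limited to 128 KB. Other systems have different limits, so the second call may succeed there. B<arg_fits> is a hypothetical helper.

```shell
#!/bin/bash
# Total size allowed for arguments plus environment, in bytes
getconf ARG_MAX

# arg_fits N: succeed if one argument of N bytes can be passed to a
# newly executed program (env is used to force a real exec)
arg_fits() {
  env true "$(head -c "$1" /dev/zero | tr '\0' x)" 2>/dev/null
}

arg_fits 1000 && echo "1000 bytes: ok"
# On Linux a single argument above 128 KB is expected to be rejected
arg_fits 200000 || echo "200000 bytes: too long"
```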
|
|
|
|
|
|
|
|
|
2018-03-21 21:57:28 +00:00
|
|
|
=head2 NUMWORDS: Speed depends on number of words
|
2017-07-20 19:38:45 +00:00
|
|
|
|
|
|
|
Some tools become very slow if output lines have many words.
|
|
|
|
|
|
|
|
#!/bin/bash
|
|
|
|
|
|
|
|
paralleltool=parallel
|
|
|
|
|
|
|
|
cat <<-EOF > mycommand
|
|
|
|
#!/bin/bash
|
|
|
|
|
|
|
|
# 10 MB of lines with 1000 words
|
|
|
|
yes "`seq 1000`" | head -c 10M
|
|
|
|
EOF
|
|
|
|
chmod +x mycommand
|
|
|
|
|
|
|
|
# Run 30 jobs in parallel
|
|
|
|
seq 30 | time $paralleltool -j0 ./mycommand > /dev/null
|
|
|
|
|
2017-05-21 19:04:37 +00:00
|
|
|
|
2017-01-01 11:42:52 +00:00
|
|
|
=head1 AUTHOR
|
|
|
|
|
|
|
|
When using GNU B<parallel> for a publication please cite:
|
|
|
|
|
|
|
|
O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
|
|
|
|
The USENIX Magazine, February 2011:42-47.
|
|
|
|
|
|
|
|
This helps funding further development; and it won't cost you a cent.
|
|
|
|
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
|
|
|
|
|
|
|
|
Copyright (C) 2007-10-18 Ole Tange, http://ole.tange.dk
|
|
|
|
|
2018-09-20 22:15:14 +00:00
|
|
|
Copyright (C) 2008-2010 Ole Tange, http://ole.tange.dk
|
2017-01-01 11:42:52 +00:00
|
|
|
|
2018-12-28 11:47:25 +00:00
|
|
|
Copyright (C) 2010-2019 Ole Tange, http://ole.tange.dk and Free
|
2018-09-20 22:15:14 +00:00
|
|
|
Software Foundation, Inc.
|
2017-01-01 11:42:52 +00:00
|
|
|
|
|
|
|
Parts of the manual concerning B<xargs> compatibility are inspired by
|
|
|
|
the manual of B<xargs> from GNU findutils 4.4.2.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 LICENSE
|
|
|
|
|
|
|
|
This program is free software; you can redistribute it and/or modify
|
|
|
|
it under the terms of the GNU General Public License as published by
|
|
|
|
the Free Software Foundation; either version 3 of the License, or
|
|
|
|
at your option any later version.
|
|
|
|
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
|
|
GNU General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
|
|
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
|
|
|
|
|
|
|
=head2 Documentation license I
|
|
|
|
|
|
|
|
Permission is granted to copy, distribute and/or modify this documentation
|
|
|
|
under the terms of the GNU Free Documentation License, Version 1.3 or
|
|
|
|
any later version published by the Free Software Foundation; with no
|
|
|
|
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
|
|
|
|
Texts. A copy of the license is included in the file fdl.txt.
|
|
|
|
|
|
|
|
=head2 Documentation license II
|
|
|
|
|
|
|
|
You are free:
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item B<to Share>
|
|
|
|
|
|
|
|
to copy, distribute and transmit the work
|
|
|
|
|
|
|
|
=item B<to Remix>
|
|
|
|
|
|
|
|
to adapt the work
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
Under the following conditions:
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item B<Attribution>
|
|
|
|
|
|
|
|
You must attribute the work in the manner specified by the author or
|
|
|
|
licensor (but not in any way that suggests that they endorse you or
|
|
|
|
your use of the work).
|
|
|
|
|
|
|
|
=item B<Share Alike>
|
|
|
|
|
|
|
|
If you alter, transform, or build upon this work, you may distribute
|
|
|
|
the resulting work only under the same, similar or a compatible
|
|
|
|
license.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
With the understanding that:
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item B<Waiver>
|
|
|
|
|
|
|
|
Any of the above conditions can be waived if you get permission from
|
|
|
|
the copyright holder.
|
|
|
|
|
|
|
|
=item B<Public Domain>
|
|
|
|
|
|
|
|
Where the work or any of its elements is in the public domain under
|
|
|
|
applicable law, that status is in no way affected by the license.
|
|
|
|
|
|
|
|
=item B<Other Rights>
|
|
|
|
|
|
|
|
In no way are any of the following rights affected by the license:
|
|
|
|
|
|
|
|
=over 2
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
Your fair dealing or fair use rights, or other applicable
|
|
|
|
copyright exceptions and limitations;
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
The author's moral rights;
|
|
|
|
|
|
|
|
=item *
|
|
|
|
|
|
|
|
Rights other persons may have either in the work itself or in
|
|
|
|
how the work is used, such as publicity or privacy rights.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
=over 9
|
|
|
|
|
|
|
|
=item B<Notice>
|
|
|
|
|
|
|
|
For any reuse or distribution, you must make clear to others the
|
|
|
|
license terms of this work.
|
|
|
|
|
|
|
|
=back
|
|
|
|
|
|
|
|
A copy of the full license is included in the file as cc-by-sa.txt.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 DEPENDENCIES
|
|
|
|
|
|
|
|
GNU B<parallel> uses Perl, and the Perl modules Getopt::Long,
|
|
|
|
IPC::Open3, Symbol, IO::File, POSIX, and File::Temp. For remote usage
|
|
|
|
it also uses rsync with ssh.
|
|
|
|
|
|
|
|
|
|
|
|
=head1 SEE ALSO
|
|
|
|
|
|
|
|
B<find>(1), B<xargs>(1), B<make>(1), B<pexec>(1), B<ppss>(1),
|
|
|
|
B<xjobs>(1), B<prll>(1), B<dxargs>(1), B<mdm>(1)
|
|
|
|
|
|
|
|
=cut
|