Fixes: bug #43165: NFS as tmp broken in 20131222.

This commit is contained in:
Ole Tange 2014-09-07 17:10:44 +02:00
parent b01688b07f
commit 30ed40ee3f
6 changed files with 86 additions and 55 deletions


@@ -230,32 +230,17 @@ GNU Parallel 20140922 ('Attenborough') has been released. It is available for do
 Haiku of the month:
 
-code fork headache blues?
-option P is your new friend
-`man parallel` now!
-    -- Malcolm Cook
+<<>>
 
 New in this release:
 
-* GNU Parallel now uses the same shell it was started from as the command shell for local jobs. So if GNU Parallel is started from tcsh it will use tcsh as its shell even if the login $SHELL is different. For remote jobs the login $SHELL will be used.
-* The whole current environment in bash can be copied by using a shell wrapper function (Search manual for env_parallel).
-* --plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...} {/..} {/...}. The idea being that '+foo' matches the opposite of 'foo' and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
-* GNU Parallel now deals correctly with the combination rsync-3.1.X-client and rsync-2.5.7-server
+* If the file given as --sshloginfile is changed it will be re-read when a job finishes though at most once per second. This makes it possible to add and remove hosts while running.
+* Brutha uses GNU Parallel https://pypi.python.org/pypi/brutha/1.0.2
+* OCRmyPDF uses GNU Parallel https://github.com/fritz-hh/OCRmyPDF/
+* GNU Parallel (Sebuah Uji Coba) http://pr4ka5a.wordpress.com/2014/09/04/gnu-parallel-sebuah-uji-coba/
+* GNU Parallel was cited in: A Web Service for Scholarly Big Data Information Extraction http://patshih.ist.psu.edu/publications/Williams-CiteSeerExtractor-ICWS14.pdf
+* Comparison of the speed of different GNU Parallel versions: http://lists.gnu.org/archive/html/parallel/2014-08/msg00030.html
+* GNU Parallel was covered in the webcast 2014-08-20: Data Science at the Command Line http://www.oreilly.com/pub/e/3115
+* Distributed processing with GNU parallel http://kazjote.eu/2014/08/11/distributed-processing-with-gnu-parallel
+* A Peek into GNU Parallel http://blog.dataweave.in/post/94238943763/a-peek-into-gnu-parallel
+* Сборка GNU parallel для CentOS/RHEL http://www.stableit.ru/2014/07/gnu-parallel-centosrhel.html
 * Bug fixes and man page updates.


@@ -2992,10 +2992,10 @@ sub which {
 my %pid_parentpid_cmd;
 
 sub pid_table {
-    # return two tables:
-    #   pid -> children of pid
-    #   pid -> pid of parent
-    #   pid -> commandname
+    # Returns:
+    #   %children_of = { pid -> children of pid }
+    #   %parent_of = { pid -> pid of parent }
+    #   %name_of = { pid -> commandname }
     if(not %pid_parentpid_cmd) {
 	# Filter for SysV-style `ps`
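The three hashes named in the revised comment can all be filled in one pass over a pid/ppid/command listing. A minimal awk sketch of that bookkeeping, run on synthetic data standing in for filtered `ps` output (the real code parses `ps` itself):

```shell
# Columns: pid ppid commandname -- a stand-in for filtered `ps` output.
printf '1 0 init\n2 1 bash\n3 1 sshd\n4 2 vim\n' |
awk '{
       parent_of[$1] = $2                        # pid -> pid of parent
       children_of[$2] = children_of[$2] " " $1  # pid -> children of pid
       name_of[$1] = $3                          # pid -> commandname
     }
     END {
       print "children_of[1]:" children_of[1]
       print "parent_of[4]: " parent_of[4]
       print "name_of[3]: " name_of[3]
     }'
```

For the synthetic table this prints `children_of[1]: 2 3`, `parent_of[4]: 2`, and `name_of[3]: sshd`.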
@@ -3595,6 +3595,7 @@ sub loadavg {
 	if($self->{'string'} ne ":") {
 	    $cmd = $self->sshcommand() . " " . $self->serverlogin() . " ";
 	}
+	# TODO Is it called 'ps ax -o state,command' on other platforms?
 	$cmd .= "ps ax -o state,command";
 	# As the command can take long to run if run remote
 	# save it to a tmp file before moving it to the correct file
@@ -5986,14 +5987,15 @@ sub set_exitsignal {
 }
 
 {
-    my ($disk_full_fh,$b8193);
+    my ($disk_full_fh, $b8193, $name);
 
     sub exit_if_disk_full {
 	# Checks if $TMPDIR is full by writing 8kb to a tmpfile
 	# If the disk is full: Exit immediately.
 	# Returns:
 	#   N/A
 	if(not $disk_full_fh) {
-	    $disk_full_fh = ::tmpfile(SUFFIX => ".df");
+	    ($disk_full_fh, $name) = ::tmpfile(SUFFIX => ".df");
+	    unlink $name;
 	    $b8193 = "x"x8193;
 	}
 	my $pos = tell $disk_full_fh;
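The change above unlinks the tmpfile right after opening it while keeping the filehandle: on POSIX filesystems an open descriptor stays fully usable after its name is removed, so the 8 KiB disk-full probe still works but no `.df` file lingers in $TMPDIR (the commit message suggests the leftover file was what misbehaved with NFS as tmp). A minimal shell sketch of the same idiom:

```shell
tmp=$(mktemp)       # create a probe file in $TMPDIR
exec 3>"$tmp"       # keep a write descriptor to it
rm "$tmp"           # unlink the name; descriptor 3 stays valid

# Writing ~8 KiB through the descriptor probes for free space,
# much like exit_if_disk_full does with its ".df" file.
if head -c 8193 /dev/zero >&3; then
    echo "disk has room"
else
    echo "disk full" >&2
fi
exec 3>&-           # closing the descriptor frees the space
```

The space is reclaimed automatically when the descriptor closes, even if the process is killed, which is exactly the cleanup guarantee the Perl fix buys.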


@@ -478,8 +478,8 @@ http://perldoc.perl.org/perlre.html
 =item B<--compress>
 
 Compress temporary files. If the output is big and very compressible
-this will take up less disk space in $TMPDIR and possibly be faster due to less
-disk I/O.
+this will take up less disk space in $TMPDIR and possibly be faster
+due to less disk I/O.
 
 GNU B<parallel> will try B<lzop>, B<pigz>, B<gzip>, B<pbzip2>,
 B<plzip>, B<bzip2>, B<lzma>, B<lzip>, B<xz> in that order, and use the
@@ -594,12 +594,12 @@ To copy the full environment use this function:
     export parallel_bash_environment='() {
       '"$(echo "shopt -s expand_aliases 2>/dev/null"; alias;typeset -p | grep -vFf <(readonly; echo GROUPS; echo FUNCNAME; echo DIRSTACK; echo _; echo PIPESTATUS; echo USERNAME) | grep -v BASH_;typeset -f)"'
     }'
-    # Run as: env_parallel parallel_bash_environment "2>/dev/null;" ...
+    # Run as: env_parallel [normal parallel options]
     `which parallel` "$@"
     unset parallel_bash_environment
   }
 
 # call as:
-env_parallel ...
+env_parallel [normal parallel options]
 
 See also: B<--record-env>.
@@ -1627,7 +1627,8 @@ The sshloginfile '-' is special, too, it read sshlogins from stdin
 (standard input).
 
 If the sshloginfile is changed it will be re-read when a job finishes
-though at most once per second.
+though at most once per second. This makes it possible to add and
+remove hosts while running.
 
 This can be used to have a daemon that updates the sshloginfile to
 only contain servers that are up:
@@ -1791,8 +1792,14 @@ B<parallel -j0 'sleep {};echo -n start{};sleep {};echo {}end' ::: 1 2 3 4>
 
 B<parallel -u -j0 'sleep {};echo -n start{};sleep {};echo {}end' ::: 1 2 3 4>
 
-It also disables B<--tag>. GNU B<parallel> runs faster with B<-u>. Can
-be reversed with B<--group>.
+It also disables B<--tag>. GNU B<parallel> outputs faster with
+B<-u>. Compare the speed of these:
+
+  parallel seq ::: 300000000 >/dev/null
+  parallel -u seq ::: 300000000 >/dev/null
+  parallel --line-buffer seq ::: 300000000 >/dev/null
+
+Can be reversed with B<--group>.
 
 See also: B<--line-buffer> B<--group>
@@ -2090,6 +2097,26 @@ B<--env>:
   parallel --env doit -S server doit ::: 1 2 3
   parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
 
+If your environment (aliases, variables, and functions) is small you
+can copy the full environment without having to B<export -f>
+anything. Just run this first:
+
+  env_parallel() {
+    export parallel_bash_environment='() {
+      '"$(echo "shopt -s expand_aliases 2>/dev/null"; alias;typeset -p | grep -vFf <(readonly; echo GROUPS; echo FUNCNAME; echo DIRSTACK; echo _; echo PIPESTATUS; echo USERNAME) | grep -v BASH_;typeset -f)"'
+    }'
+    # Run as: env_parallel parallel_bash_environment "2>/dev/null;" ...
+    `which parallel` "$@"
+    unset parallel_bash_environment
+  }
+
+And then call as:
+
+  env_parallel doit ::: 1 2 3
+  env_parallel doubleit ::: 1 2 3 ::: a b
+  env_parallel -S server doit ::: 1 2 3
+  env_parallel -S server doubleit ::: 1 2 3 ::: a b
+
 =head1 EXAMPLE: Removing file extension when processing files
@@ -2114,14 +2141,13 @@ Put all converted in the same directory:
 
 B<find sounddir -type f -name '*.wav' | parallel lame {} -o mydir/{/.}.mp3>
 
-=head1 EXAMPLE: Removing two file extensions when processing files and
-calling GNU Parallel from itself
+=head1 EXAMPLE: Removing two file extensions when processing files
 
 If you have directory with tar.gz files and want these extracted in
 the corresponding dir (e.g foo.tar.gz will be extracted in the dir
 foo) you can do:
 
-B<ls *.tar.gz| parallel --er {tar} 'echo {tar}|parallel "mkdir -p {.} ; tar -C {.} -xf {.}.tar.gz"'>
+B<parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz>
 
 =head1 EXAMPLE: Download 10 images for each of the past 30 days
@@ -2147,6 +2173,25 @@ source. If the value modudo 2 is 1: Use ":" otherwise use " ":
 B<parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} ::: {0..12} ::: {0..5} ::: {0..9}>
 
+=head1 EXAMPLE: Aggregating content of files
+
+This:
+
+  parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
+  ::: X {1..5} ::: Y {01..10} ::: Z {1..5}
+
+will generate the files x1y01z1 .. x5y10z5. If you want to aggregate
+the output grouping on x and z you can do this:
+
+  parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
+
+For all values of x and z it runs commands like:
+
+  cat x1y*z1 > x1z1
+
+So you end up with x1z1 .. x1z5 each containing the content of all
+values of y.
+
 =head1 EXAMPLE: Breadth first parallel web crawler/mirrorer
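The new aggregation recipe can be tried without GNU parallel installed; plain shell loops below stand in for the two parallel invocations (scaled-down file set, same cat-with-glob trick that the {=s/y01/y*/=} substitution produces):

```shell
cd "$(mktemp -d)"
# Generate x{1,2}y{01,02}z{1,2} files, each containing its own name.
for x in 1 2; do for y in 01 02; do for z in 1 2; do
    echo "x${x}y${y}z${z}" > "x${x}y${y}z${z}"
done; done; done

# Aggregate over y, grouping on x and z: for each *y01* file, replace
# "y01" with the glob "y*" for the input side and delete it for the
# output name -- i.e. run "cat x1y*z1 > x1z1" and friends.
for f in *y01*; do
    cat "${f%%y01*}"y*"${f##*y01}" > "${f%%y01*}${f##*y01}"
done

cat x1z1   # contains the lines x1y01z1 and x1y02z1
```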
@@ -2308,7 +2353,7 @@ Using B<--results> the results are saved in /tmp/diffcount*.
 
   parallel --results /tmp/diffcount "diff -U 0 {1} {2} |tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *
 
 To see the difference between file A and file B look at the file
-'/tmp/diffcount 1 A 2 B' where spaces are TABs (\t).
+'/tmp/diffcount/1/A/2/B'.
 
 =head1 EXAMPLE: Speeding up fast jobs
@@ -2807,20 +2852,21 @@ the parts directly to the program:
 
 B<parallel --pipepart --block 100m -a bigfile --files sort | parallel -Xj1 sort -m {} ';' rm {} >>B<bigfile.sort>
 
-=head1 EXAMPLE: Running more than 500 jobs workaround
+=head1 EXAMPLE: Running more than 250 jobs workaround
 
 If you need to run a massive amount of jobs in parallel, then you will
-likely hit the filehandle limit which is often around 500 jobs. If you
+likely hit the filehandle limit which is often around 250 jobs. If you
 are super user you can raise the limit in /etc/security/limits.conf
 but you can also use this workaround. The filehandle limit is per
 process. That means that if you just spawn more GNU B<parallel>s then
-each of them can run 500 jobs. This will spawn up to 2500 jobs:
+each of them can run 250 jobs. This will spawn up to 2500 jobs:
 
 B<cat myinput | parallel --pipe -N 50 --round-robin -j50 parallel -j50 your_prg>
 
-This will spawn up to 250000 jobs (use with caution - you need 250 GB RAM to do this):
+This will spawn up to 62500 jobs (use with caution - you need 64 GB
+RAM to do this, and you may need to increase /proc/sys/kernel/pid_max):
 
-B<cat myinput | parallel --pipe -N 500 --round-robin -j500 parallel -j500 your_prg>
+B<cat myinput | parallel --pipe -N 250 --round-robin -j250 parallel -j250 your_prg>
 
 =head1 EXAMPLE: Working as mutex and counting semaphore
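The corrected job counts in the hunk above are just the product of the outer and inner -j values; a quick sanity check:

```shell
# Each of the N outer slots runs its own parallel with N inner slots,
# so the ceiling is N * N concurrent jobs.
echo $((50 * 50))     # -j50  nested in -j50  -> 2500 jobs
echo $((250 * 250))   # -j250 nested in -j250 -> 62500 jobs
```

With the filehandle limit stated as ~250 rather than ~500, the honest two-level ceiling drops from 250000 (500 x 500) to 62500, which is what the new text says.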
@@ -3967,14 +4013,12 @@ and 150 ms after that.
 
 =head3 Job startup
 
-Starting a job on the local machine takes around 3 ms. This can be a
+Starting a job on the local machine takes around 10 ms. This can be a
 big overhead if the job takes very few ms to run. Often you can group
 small jobs together using B<-X> which will make the overhead less
 significant. Or you can run multiple GNU B<parallel>s as described in
 B<EXAMPLE: Speeding up fast jobs>.
 
-Using B<--ungroup> the 3 ms can be lowered to around 2 ms.
-
 =head3 SSH
 
 When using multiple computers GNU B<parallel> opens B<ssh> connections


@ -133,7 +133,7 @@
.\" ======================================================================== .\" ========================================================================
.\" .\"
.IX Title "PARALLEL_TUTORIAL 1" .IX Title "PARALLEL_TUTORIAL 1"
.TH PARALLEL_TUTORIAL 1 "2014-08-23" "20140822" "parallel" .TH PARALLEL_TUTORIAL 1 "2014-09-04" "20140827" "parallel"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents. .\" way too many mistakes in technical documents.
.if n .ad l .if n .ad l
@ -1277,7 +1277,7 @@ Output:
\&\s-1GNU\s0 Parallel can save the output of each job into files: \&\s-1GNU\s0 Parallel can save the output of each job into files:
.PP .PP
.Vb 1 .Vb 1
\& parallel \-\-files ::: A B C \& parallel \-\-files echo ::: A B C
.Ve .Ve
.PP .PP
Output will be similar to: Output will be similar to:
@ -1292,7 +1292,7 @@ By default \s-1GNU\s0 Parallel will cache the output in files in /tmp. This
can be changed by setting \f(CW$TMPDIR\fR or \-\-tmpdir: can be changed by setting \f(CW$TMPDIR\fR or \-\-tmpdir:
.PP .PP
.Vb 1 .Vb 1
\& parallel \-\-tmpdir /var/tmp \-\-files ::: A B C \& parallel \-\-tmpdir /var/tmp \-\-files echo ::: A B C
.Ve .Ve
.PP .PP
Output will be similar to: Output will be similar to:
@ -1306,7 +1306,7 @@ Output will be similar to:
Or: Or:
.PP .PP
.Vb 1 .Vb 1
\& TMPDIR=/var/tmp parallel \-\-files ::: A B C \& TMPDIR=/var/tmp parallel \-\-files echo ::: A B C
.Ve .Ve
.PP .PP
Output: Same as above. Output: Same as above.


@@ -952,7 +952,7 @@
 <p>GNU Parallel can save the output of each job into files:</p>
 
-<pre><code> parallel --files ::: A B C</code></pre>
+<pre><code> parallel --files echo ::: A B C</code></pre>
 
 <p>Output will be similar to:</p>
@@ -962,7 +962,7 @@
 <p>By default GNU Parallel will cache the output in files in /tmp. This can be changed by setting $TMPDIR or --tmpdir:</p>
 
-<pre><code> parallel --tmpdir /var/tmp --files ::: A B C</code></pre>
+<pre><code> parallel --tmpdir /var/tmp --files echo ::: A B C</code></pre>
 
 <p>Output will be similar to:</p>
@@ -972,7 +972,7 @@
 <p>Or:</p>
 
-<pre><code> TMPDIR=/var/tmp parallel --files ::: A B C</code></pre>
+<pre><code> TMPDIR=/var/tmp parallel --files echo ::: A B C</code></pre>
 
 <p>Output: Same as above.</p>


@@ -880,7 +880,7 @@ Output:
 GNU Parallel can save the output of each job into files:
 
-  parallel --files ::: A B C
+  parallel --files echo ::: A B C
 
 Output will be similar to:
@@ -891,7 +891,7 @@ Output will be similar to:
 By default GNU Parallel will cache the output in files in /tmp. This
 can be changed by setting $TMPDIR or --tmpdir:
 
-  parallel --tmpdir /var/tmp --files ::: A B C
+  parallel --tmpdir /var/tmp --files echo ::: A B C
 
 Output will be similar to:
@@ -901,7 +901,7 @@ Output will be similar to:
 Or:
 
-  TMPDIR=/var/tmp parallel --files ::: A B C
+  TMPDIR=/var/tmp parallel --files echo ::: A B C
 
 Output: Same as above.