mirror of
https://git.savannah.gnu.org/git/parallel.git
synced 2024-11-22 14:07:55 +00:00
Fixes: bug #43165: NFS as tmp broken in 20131222.
This commit is contained in:
parent
b01688b07f
commit
30ed40ee3f
|
@ -230,32 +230,17 @@ GNU Parallel 20140922 ('Attenborough') has been released. It is available for do
|
|||
|
||||
Haiku of the month:
|
||||
|
||||
code fork headache blues?
|
||||
option P is your new friend
|
||||
`man parallel` now!
|
||||
-- Malcolm Cook
|
||||
<<>>
|
||||
|
||||
New in this release:
|
||||
|
||||
* GNU Parallel now uses the same shell it was started from as the command shell for local jobs. So if GNU Parallel is started from tcsh it will use tcsh as its shell even if the login $SHELL is different. For remote jobs the login $SHELL will be used.
|
||||
* If the file given as --sshloginfile is changed it will be re-read when a job finishes though at most once per second. This makes it possible to add and remove hosts while running.
|
||||
|
||||
* The whole current environment in bash can be copied by using a shell wrapper function (Search manual for env_parallel).
|
||||
* Brutha uses GNU Parallel https://pypi.python.org/pypi/brutha/1.0.2
|
||||
|
||||
* --plus adds the replacement strings {+/} {+.} {+..} {+...} {..} {...} {/..} {/...}. The idea being that '+foo' matches the opposite of 'foo' and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}
|
||||
* OCRmyPDF uses GNU Parallel https://github.com/fritz-hh/OCRmyPDF/
|
||||
|
||||
* GNU Parallel now deals correctly with the combination rsync-3.1.X-client and rsync-2.5.7-server
|
||||
|
||||
* GNU Parallel was cited in: A Web Service for Scholarly Big Data Information Extraction http://patshih.ist.psu.edu/publications/Williams-CiteSeerExtractor-ICWS14.pdf
|
||||
|
||||
* Comparison of the speed of different GNU Parallel versions: http://lists.gnu.org/archive/html/parallel/2014-08/msg00030.html
|
||||
|
||||
* GNU Parallel was covered in the webcast 2014-08-20: Data Science at the Command Line http://www.oreilly.com/pub/e/3115
|
||||
|
||||
* Distributed processing with GNU parallel http://kazjote.eu/2014/08/11/distributed-processing-with-gnu-parallel
|
||||
|
||||
* A Peek into GNU Parallel http://blog.dataweave.in/post/94238943763/a-peek-into-gnu-parallel
|
||||
|
||||
* Сборка GNU parallel для CentOS/RHEL http://www.stableit.ru/2014/07/gnu-parallel-centosrhel.html
|
||||
* GNU Parallel (Sebuah Uji Coba) http://pr4ka5a.wordpress.com/2014/09/04/gnu-parallel-sebuah-uji-coba/
|
||||
|
||||
* Bug fixes and man page updates.
|
||||
|
||||
|
|
14
src/parallel
14
src/parallel
|
@ -2992,10 +2992,10 @@ sub which {
|
|||
my %pid_parentpid_cmd;
|
||||
|
||||
sub pid_table {
|
||||
# return two tables:
|
||||
# pid -> children of pid
|
||||
# pid -> pid of parent
|
||||
# pid -> commandname
|
||||
# Returns:
|
||||
# %children_of = { pid -> children of pid }
|
||||
# %parent_of = { pid -> pid of parent }
|
||||
# %name_of = { pid -> commandname }
|
||||
|
||||
if(not %pid_parentpid_cmd) {
|
||||
# Filter for SysV-style `ps`
|
||||
|
@ -3595,6 +3595,7 @@ sub loadavg {
|
|||
if($self->{'string'} ne ":") {
|
||||
$cmd = $self->sshcommand() . " " . $self->serverlogin() . " ";
|
||||
}
|
||||
# TODO Is it called 'ps ax -o state,command' on other platforms?
|
||||
$cmd .= "ps ax -o state,command";
|
||||
# As the command can take long to run if run remote
|
||||
# save it to a tmp file before moving it to the correct file
|
||||
|
@ -5986,14 +5987,15 @@ sub set_exitsignal {
|
|||
}
|
||||
|
||||
{
|
||||
my ($disk_full_fh,$b8193);
|
||||
my ($disk_full_fh, $b8193, $name);
|
||||
sub exit_if_disk_full {
|
||||
# Checks if $TMPDIR is full by writing 8kb to a tmpfile
|
||||
# If the disk is full: Exit immediately.
|
||||
# Returns:
|
||||
# N/A
|
||||
if(not $disk_full_fh) {
|
||||
$disk_full_fh = ::tmpfile(SUFFIX => ".df");
|
||||
($disk_full_fh, $name) = ::tmpfile(SUFFIX => ".df");
|
||||
unlink $name;
|
||||
$b8193 = "x"x8193;
|
||||
}
|
||||
my $pos = tell $disk_full_fh;
|
||||
|
|
|
@ -478,8 +478,8 @@ http://perldoc.perl.org/perlre.html
|
|||
=item B<--compress>
|
||||
|
||||
Compress temporary files. If the output is big and very compressible
|
||||
this will take up less disk space in $TMPDIR and possibly be faster due to less
|
||||
disk I/O.
|
||||
this will take up less disk space in $TMPDIR and possibly be faster
|
||||
due to less disk I/O.
|
||||
|
||||
GNU B<parallel> will try B<lzop>, B<pigz>, B<gzip>, B<pbzip2>,
|
||||
B<plzip>, B<bzip2>, B<lzma>, B<lzip>, B<xz> in that order, and use the
|
||||
|
@ -594,12 +594,12 @@ To copy the full environment use this function:
|
|||
export parallel_bash_environment='() {
|
||||
'"$(echo "shopt -s expand_aliases 2>/dev/null"; alias;typeset -p | grep -vFf <(readonly; echo GROUPS; echo FUNCNAME; echo DIRSTACK; echo _; echo PIPESTATUS; echo USERNAME) | grep -v BASH_;typeset -f)"'
|
||||
}'
|
||||
# Run as: env_parallel parallel_bash_environment "2>/dev/null;" ...
|
||||
# Run as: env_parallel [normal parallel options]
|
||||
`which parallel` "$@"
|
||||
unset parallel_bash_environment
|
||||
}
|
||||
# call as:
|
||||
env_parallel ...
|
||||
env_parallel [normal parallel options]
|
||||
|
||||
See also: B<--record-env>.
|
||||
|
||||
|
@ -1627,7 +1627,8 @@ The sshloginfile '-' is special, too, it reads sshlogins from stdin
|
|||
(standard input).
|
||||
|
||||
If the sshloginfile is changed it will be re-read when a job finishes
|
||||
though at most once per second.
|
||||
though at most once per second. This makes it possible to add and
|
||||
remove hosts while running.
|
||||
|
||||
This can be used to have a daemon that updates the sshloginfile to
|
||||
only contain servers that are up:
|
||||
|
@ -1791,8 +1792,14 @@ B<parallel -j0 'sleep {};echo -n start{};sleep {};echo {}end' ::: 1 2 3 4>
|
|||
|
||||
B<parallel -u -j0 'sleep {};echo -n start{};sleep {};echo {}end' ::: 1 2 3 4>
|
||||
|
||||
It also disables B<--tag>. GNU B<parallel> runs faster with B<-u>. Can
|
||||
be reversed with B<--group>.
|
||||
It also disables B<--tag>. GNU B<parallel> outputs faster with
|
||||
B<-u>. Compare the speed of these:
|
||||
|
||||
parallel seq ::: 300000000 >/dev/null
|
||||
parallel -u seq ::: 300000000 >/dev/null
|
||||
parallel --line-buffer seq ::: 300000000 >/dev/null
|
||||
|
||||
Can be reversed with B<--group>.
|
||||
|
||||
See also: B<--line-buffer> B<--group>
|
||||
|
||||
|
@ -2090,6 +2097,26 @@ B<--env>:
|
|||
parallel --env doit -S server doit ::: 1 2 3
|
||||
parallel --env doubleit -S server doubleit ::: 1 2 3 ::: a b
|
||||
|
||||
If your environment (aliases, variables, and functions) is small you
|
||||
can copy the full environment without having to B<export -f>
|
||||
anything. Just run this first:
|
||||
|
||||
env_parallel() {
|
||||
export parallel_bash_environment='() {
|
||||
'"$(echo "shopt -s expand_aliases 2>/dev/null"; alias;typeset -p | grep -vFf <(readonly; echo GROUPS; echo FUNCNAME; echo DIRSTACK; echo _; echo PIPESTATUS; echo USERNAME) | grep -v BASH_;typeset -f)"'
|
||||
}'
|
||||
# Run as: env_parallel parallel_bash_environment "2>/dev/null;" ...
|
||||
`which parallel` "$@"
|
||||
unset parallel_bash_environment
|
||||
}
|
||||
|
||||
And then call as:
|
||||
|
||||
env_parallel doit ::: 1 2 3
|
||||
env_parallel doubleit ::: 1 2 3 ::: a b
|
||||
env_parallel -S server doit ::: 1 2 3
|
||||
env_parallel -S server doubleit ::: 1 2 3 ::: a b
|
||||
|
||||
|
||||
=head1 EXAMPLE: Removing file extension when processing files
|
||||
|
||||
|
@ -2114,14 +2141,13 @@ Put all converted in the same directory:
|
|||
B<find sounddir -type f -name '*.wav' | parallel lame {} -o mydir/{/.}.mp3>
|
||||
|
||||
|
||||
=head1 EXAMPLE: Removing two file extensions when processing files and
|
||||
calling GNU Parallel from itself
|
||||
=head1 EXAMPLE: Removing two file extensions when processing files
|
||||
|
||||
If you have a directory with tar.gz files and want these extracted in
|
||||
the corresponding dir (e.g foo.tar.gz will be extracted in the dir
|
||||
foo) you can do:
|
||||
|
||||
B<ls *.tar.gz| parallel --er {tar} 'echo {tar}|parallel "mkdir -p {.} ; tar -C {.} -xf {.}.tar.gz"'>
|
||||
B<parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz>
|
||||
|
||||
|
||||
=head1 EXAMPLE: Download 10 images for each of the past 30 days
|
||||
|
@ -2147,6 +2173,25 @@ source. If the value modulo 2 is 1: Use ":" otherwise use " ":
|
|||
|
||||
B<parallel -k echo {1}'{=3 $_=$_%2?":":" "=}'{2}{3} ::: {0..12} ::: {0..5} ::: {0..9}>
|
||||
|
||||
=head1 EXAMPLE: Aggregating content of files
|
||||
|
||||
This:
|
||||
|
||||
parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
|
||||
::: X {1..5} ::: Y {01..10} ::: Z {1..5}
|
||||
|
||||
will generate the files x1y01z1 .. x5y10z5. If you want to aggregate
|
||||
the output grouping on x and z you can do this:
|
||||
|
||||
parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
|
||||
|
||||
For all values of x and z it runs commands like:
|
||||
|
||||
cat x1y*z1 > x1z1
|
||||
|
||||
So you end up with x1z1 .. x1z5 each containing the content of all
|
||||
values of y.
|
||||
|
||||
|
||||
=head1 EXAMPLE: Breadth first parallel web crawler/mirrorer
|
||||
|
||||
|
@ -2308,7 +2353,7 @@ Using B<--results> the results are saved in /tmp/diffcount*.
|
|||
parallel --results /tmp/diffcount "diff -U 0 {1} {2} |tail -n +3 |grep -v '^@'|wc -l" ::: * ::: *
|
||||
|
||||
To see the difference between file A and file B look at the file
|
||||
'/tmp/diffcount 1 A 2 B' where spaces are TABs (\t).
|
||||
'/tmp/diffcount/1/A/2/B'.
|
||||
|
||||
|
||||
=head1 EXAMPLE: Speeding up fast jobs
|
||||
|
@ -2807,20 +2852,21 @@ the parts directly to the program:
|
|||
B<parallel --pipepart --block 100m -a bigfile --files sort | parallel -Xj1 sort -m {} ';' rm {} >>B<bigfile.sort>
|
||||
|
||||
|
||||
=head1 EXAMPLE: Running more than 500 jobs workaround
|
||||
=head1 EXAMPLE: Running more than 250 jobs workaround
|
||||
|
||||
If you need to run a massive amount of jobs in parallel, then you will
|
||||
likely hit the filehandle limit which is often around 500 jobs. If you
|
||||
likely hit the filehandle limit which is often around 250 jobs. If you
|
||||
are super user you can raise the limit in /etc/security/limits.conf
|
||||
but you can also use this workaround. The filehandle limit is per
|
||||
process. That means that if you just spawn more GNU B<parallel>s then
|
||||
each of them can run 500 jobs. This will spawn up to 2500 jobs:
|
||||
each of them can run 250 jobs. This will spawn up to 2500 jobs:
|
||||
|
||||
B<cat myinput | parallel --pipe -N 50 --round-robin -j50 parallel -j50 your_prg>
|
||||
|
||||
This will spawn up to 250000 jobs (use with caution - you need 250 GB RAM to do this):
|
||||
This will spawn up to 62500 jobs (use with caution - you need 64 GB
|
||||
RAM to do this, and you may need to increase /proc/sys/kernel/pid_max):
|
||||
|
||||
B<cat myinput | parallel --pipe -N 500 --round-robin -j500 parallel -j500 your_prg>
|
||||
B<cat myinput | parallel --pipe -N 250 --round-robin -j250 parallel -j250 your_prg>
|
||||
|
||||
|
||||
=head1 EXAMPLE: Working as mutex and counting semaphore
|
||||
|
@ -3967,14 +4013,12 @@ and 150 ms after that.
|
|||
|
||||
=head3 Job startup
|
||||
|
||||
Starting a job on the local machine takes around 3 ms. This can be a
|
||||
Starting a job on the local machine takes around 10 ms. This can be a
|
||||
big overhead if the job takes very few ms to run. Often you can group
|
||||
small jobs together using B<-X> which will make the overhead less
|
||||
significant. Or you can run multiple GNU B<parallel>s as described in
|
||||
B<EXAMPLE: Speeding up fast jobs>.
|
||||
|
||||
Using B<--ungroup> the 3 ms can be lowered to around 2 ms.
|
||||
|
||||
=head3 SSH
|
||||
|
||||
When using multiple computers GNU B<parallel> opens B<ssh> connections
|
||||
|
|
|
@ -133,7 +133,7 @@
|
|||
.\" ========================================================================
|
||||
.\"
|
||||
.IX Title "PARALLEL_TUTORIAL 1"
|
||||
.TH PARALLEL_TUTORIAL 1 "2014-08-23" "20140822" "parallel"
|
||||
.TH PARALLEL_TUTORIAL 1 "2014-09-04" "20140827" "parallel"
|
||||
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
|
||||
.\" way too many mistakes in technical documents.
|
||||
.if n .ad l
|
||||
|
@ -1277,7 +1277,7 @@ Output:
|
|||
\&\s-1GNU\s0 Parallel can save the output of each job into files:
|
||||
.PP
|
||||
.Vb 1
|
||||
\& parallel \-\-files ::: A B C
|
||||
\& parallel \-\-files echo ::: A B C
|
||||
.Ve
|
||||
.PP
|
||||
Output will be similar to:
|
||||
|
@ -1292,7 +1292,7 @@ By default \s-1GNU\s0 Parallel will cache the output in files in /tmp. This
|
|||
can be changed by setting \f(CW$TMPDIR\fR or \-\-tmpdir:
|
||||
.PP
|
||||
.Vb 1
|
||||
\& parallel \-\-tmpdir /var/tmp \-\-files ::: A B C
|
||||
\& parallel \-\-tmpdir /var/tmp \-\-files echo ::: A B C
|
||||
.Ve
|
||||
.PP
|
||||
Output will be similar to:
|
||||
|
@ -1306,7 +1306,7 @@ Output will be similar to:
|
|||
Or:
|
||||
.PP
|
||||
.Vb 1
|
||||
\& TMPDIR=/var/tmp parallel \-\-files ::: A B C
|
||||
\& TMPDIR=/var/tmp parallel \-\-files echo ::: A B C
|
||||
.Ve
|
||||
.PP
|
||||
Output: Same as above.
|
||||
|
|
|
@ -952,7 +952,7 @@
|
|||
|
||||
<p>GNU Parallel can save the output of each job into files:</p>
|
||||
|
||||
<pre><code> parallel --files ::: A B C</code></pre>
|
||||
<pre><code> parallel --files echo ::: A B C</code></pre>
|
||||
|
||||
<p>Output will be similar to:</p>
|
||||
|
||||
|
@ -962,7 +962,7 @@
|
|||
|
||||
<p>By default GNU Parallel will cache the output in files in /tmp. This can be changed by setting $TMPDIR or --tmpdir:</p>
|
||||
|
||||
<pre><code> parallel --tmpdir /var/tmp --files ::: A B C</code></pre>
|
||||
<pre><code> parallel --tmpdir /var/tmp --files echo ::: A B C</code></pre>
|
||||
|
||||
<p>Output will be similar to:</p>
|
||||
|
||||
|
@ -972,7 +972,7 @@
|
|||
|
||||
<p>Or:</p>
|
||||
|
||||
<pre><code> TMPDIR=/var/tmp parallel --files ::: A B C</code></pre>
|
||||
<pre><code> TMPDIR=/var/tmp parallel --files echo ::: A B C</code></pre>
|
||||
|
||||
<p>Output: Same as above.</p>
|
||||
|
||||
|
|
|
@ -880,7 +880,7 @@ Output:
|
|||
|
||||
GNU Parallel can save the output of each job into files:
|
||||
|
||||
parallel --files ::: A B C
|
||||
parallel --files echo ::: A B C
|
||||
|
||||
Output will be similar to:
|
||||
|
||||
|
@ -891,7 +891,7 @@ Output will be similar to:
|
|||
By default GNU Parallel will cache the output in files in /tmp. This
|
||||
can be changed by setting $TMPDIR or --tmpdir:
|
||||
|
||||
parallel --tmpdir /var/tmp --files ::: A B C
|
||||
parallel --tmpdir /var/tmp --files echo ::: A B C
|
||||
|
||||
Output will be similar to:
|
||||
|
||||
|
@ -901,7 +901,7 @@ Output will be similar to:
|
|||
|
||||
Or:
|
||||
|
||||
TMPDIR=/var/tmp parallel --files ::: A B C
|
||||
TMPDIR=/var/tmp parallel --files echo ::: A B C
|
||||
|
||||
Output: Same as above.
|
||||
|
||||
|
|
Loading…
Reference in a new issue