niceload: Fix wrong regexp for load average on Mac OS X. Force LANG=C.

2024-11-22 14:07:55 +00:00 · 2016-08-03 23:42:15 +02:00 · 2016-08-03 23:42:15 +02:00 · 769d2706f2
parent efcfefedf0
commit 769d2706f2
6 changed files with 145 additions and 56 deletions
--- a/doc/release_new_version
+++ b/doc/release_new_version
@ -219,30 +219,21 @@ cc:Tim Cuthbertson <tim3d.junk@gmail.com>,
   Ryoichiro Suzuki <ryoichiro.suzuki@gmail.com>,
   Jesse Alama <jesse.alama@gmail.com>

-Subject: GNU Parallel 20160722 ('Brexit') released <<[stable]>>
+Subject: GNU Parallel 20160722 ('Munich/Erdogan') released <<[stable]>>

-GNU Parallel 20160722 ('Brexit') <<[stable]>> has been released. It is available for download at: http://ftp.gnu.org/gnu/parallel/
+GNU Parallel 20160722 ('Munich/Erdogan') <<[stable]>> has been released. It is available for download at: http://ftp.gnu.org/gnu/parallel/

 <<No new functionality was introduced so this is a good candidate for a stable release.>>

 Haiku of the month:

-  Pipes are fast and good.
-  Use them in your programs, too.
-  Use GNU Parallel
+  <<>>
    -- Ole Tange

 New in this release:

-* env_parallel is now ready for wider testing. It is still beta quality.
+* GNU Parallel was cited in: Exome sequencing of geographically diverse barley landraces and wild relatives gives insights into environmental adaptation http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3612.html?WT.feed_name=subjects_genetics#references

-* env_parallel is heavily modified for all shells and testing has been increased.
-
-* Selectively choosing what to export using --env now works for env_parallel (bash, csh, fish, ksh, pdksh, tcsh, zsh).
-
-* --round-robin now gives more work to a job that processes faster instead of same amount to all jobs.
-
-* --pipepart works on block devices on GNU/Linux.

 * <<Possibly http://link.springer.com/chapter/10.1007%2F978-3-319-22053-6_46>>

@ -270,31 +261,19 @@ for Big Data Applications https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumb

 * <<link No citation: Next-generation TCP for ns-3 simulator http://www.sciencedirect.com/science/article/pii/S1569190X15300939>>

+* <<link No citation: Scalable metagenomics alignment research tool (SMART): a scalable, rapid, and complete search heuristic for the classification of metagenomic sequences from complex sequence populations http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1159-6#Bib1>>
+
 * <<No citation: Argumentation Models for Cyber Attribution http://arxiv.org/pdf/1607.02171.pdf>>

 * <<Possible: http://link.springer.com/article/10.1007/s12021-015-9290-5 http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_14>>

-* GNU Parallel was cited in: HybPiper: Extracting Coding Sequence and Introns for Phylogenetics from High-Throughput Sequencing Reads Using Target Enrichment http://www.bioone.org/doi/full/10.3732/apps.1600016
+* Easy parallelization with GNU parallel http://mpharrigan.com/2016/08/02/parallel.html

-* GNU Parallel was cited in: StrAuto - Automation and Parallelization of STRUCTURE Analysis http://www.crypticlineage.net/download/strauto/strauto_doc.pdf
+* Facebook V: Predicting Check Ins, Winner's Interview: 2nd Place, Markus Kliegl http://blog.kaggle.com/2016/08/02/facebook-v-predicting-check-ins-winners-interview-2nd-place-markus-kliegl/

-* GNU Parallel was cited in: Tools and techniques for computational reproducibility http://gigascience.biomedcentral.com/articles/10.1186/s13742-016-0135-4
+* Parallel import http://www.manitou-mail.org/blog/2016/07/parallel-import/

-* GNU Parallel was cited in: FlashPCA: fast sparse canonical correlation analysis of genomic data http://biorxiv.org/content/biorxiv/suppl/2016/04/06/047217.DC1/047217-1.pdf
-
-* GNU Parallel was cited in: Computational Design of DNA-Binding Proteins http://link.springer.com/protocol/10.1007/978-1-4939-3569-7_16
-
-* GNU Parallel was cited in: Math Indexer and Searcher under the Hood: Fine-tuning Query Expansion and Unification Strategies http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/pdf/ntcir/MathIR/05-NTCIR12-MathIR-RuzickaM.pdf
-
-* GNU Parallel was cited in: The Evolution and Fate of Super-Chandrasekhar Mass White Dwarf Merger Remnants http://arxiv.org/pdf/1606.02300.pdf
-
-* GNU Parallel was cited in: Evaluation of Coastal Scatterometer Products https://mdc.coaps.fsu.edu/scatterometry/meeting/docs/2016/Thu_AM/coastal-poster.pdf
-
-* GNU Parallel was used in: https://github.com/splitice/bulkdnsblcheck
-
-* The iconv slurp misfeature http://www.openfusion.net/linux/iconv_slurp_misfeature
-
-* แบบว่า CPU เหลือ https://veer66.wordpress.com/2016/06/15/gnu-parallel/
+* Large file batch processing using NodeJs and GNU Parallel http://www.zacorndorff.com/2016/07/27/large-file-batch-processing-using-nodejs-and-gnu-parallel/

 * Bug fixes and man page updates.

--- a/src/niceload
+++ b/src/niceload
@ -24,7 +24,7 @@
 use strict;
 use Getopt::Long;
 $Global::progname="niceload";
-$Global::version = 20160722;
+$Global::version = 20160724;
 Getopt::Long::Configure("bundling","require_order");
 get_options_from_array(\@ARGV) || die_usage();
 if($opt::version) {
@ -1005,7 +1005,7 @@ sub load_status_linux {
 	    ::die_bug("proc_loadavg");
 	}
 	close IN;
-    } elsif (open(IN,"uptime|")) {
+    } elsif (open(IN,"LANG=C uptime|")) {
 	my $upString = <IN>;
 	if($upString =~ m/averages?.\s*(\d+\.\d+)/) {
 	    $loadavg = $1;
@ -1019,7 +1019,7 @@ sub load_status_linux {

 sub load_status_darwin {
    my $loadavg = `sysctl vm.loadavg`;
-    if($loadavg =~ /vm\.loadavg: { ([0-9.]+) ([0-9.]+) ([0-9.]+) }/) {
+    if($loadavg =~ /vm\.loadavg: \{ ([0-9.]+) ([0-9.]+) ([0-9.]+) \}/) {
 	$loadavg = $1;
    } elsif (open(IN,"LANG=C uptime|")) {
 	my $upString = <IN>;
--- a/src/niceload.pod
+++ b/src/niceload.pod
@ -32,9 +32,9 @@ run 1 second, suspend (3.00-1.00) seconds, run 1 second, suspend

 =over 9

-=item B<-B> (beta testing)
+=item B<-B>

-=item B<--battery> (beta testing)
+=item B<--battery>

 Suspend if the system is running on battery. Shorthand for: -l -1 --sensor 'cat /sys/class/power_supply/BAT0/status /proc/acpi/battery/BAT0/state 2>/dev/null |grep -i -q discharging; echo $?'

@ -102,12 +102,12 @@ B<--noswap> is over limit if the system is swapping both in and out.
 B<--noswap> will set both B<--start-noswap> and B<run-noswap>.


-=item B<--net> (beta testing)
+=item B<--net>

 Shorthand for B<--nethops 3>.


-=item B<--nethops> I<h> (beta testing)
+=item B<--nethops> I<h>

 Network nice. Pause if the internet connection is overloaded.

@ -140,9 +140,9 @@ Process ID of process to suspend. You can specify multiple process IDs
 with multiple B<-p> I<PID>.


-=item B<--prg> I<program> (beta testing)
+=item B<--prg> I<program>

-=item B<--program> I<program> (beta testing)
+=item B<--program> I<program>

 Name of running program to suspend. You can specify multiple programs
 with multiple B<--prg> I<program>. If no processes with the name
--- a/src/parallel.pod
+++ b/src/parallel.pod
@ -632,9 +632,7 @@ The variable '_' is special. It will copy all exported environment
 variables except for the ones mentioned in ~/.parallel/ignored_vars.

 To copy the full environment (both exported and not exported
-variables, arrays, and functions) use B<env_parallel> as described
-under the option I<command>.
-
+variables, arrays, and functions) use B<env_parallel>.

 See also: B<--record-env>.

@ -2512,7 +2510,7 @@ B<--env>:

 If your environment (aliases, variables, and functions) is small you
 can copy the full environment without having to B<export -f>
-anything. See B<env_parallel> earlier in the man page.
+anything. See B<env_parallel>.


 =head1 EXAMPLE: Function tester
@ -2579,16 +2577,16 @@ foo) you can do:
  parallel --plus 'mkdir {..}; tar -C {..} -xf {}' ::: *.tar.gz


-=head1 EXAMPLE: Download 10 images for each of the past 30 days
+=head1 EXAMPLE: Download 24 images for each of the past 30 days

 Let us assume a website stores images like:

  http://www.example.com/path/to/YYYYMMDD_##.jpg

-where YYYYMMDD is the date and ## is the number 01-10. This will
+where YYYYMMDD is the date and ## is the number 01-24. This will
 download images for the past 30 days:

-  parallel wget http://www.example.com/path/to/'$(date -d "today -{1} days" +%Y%m%d)_{2}.jpg' ::: $(seq 30) ::: $(seq -w 10)
+  parallel wget http://www.example.com/path/to/'$(date -d "today -{1} days" +%Y%m%d)_{2}.jpg' ::: $(seq 30) ::: $(seq -w 24)

 B<$(date -d "today -{1} days" +%Y%m%d)> will give the dates in
 YYYYMMDD with B<{1}> days subtracted.
@ -4383,8 +4381,7 @@ support running jobs on remote computers.

 B<prll> encourages using BASH aliases and BASH functions instead of
 scripts. GNU B<parallel> supports scripts directly, functions if they
-are exported using B<export -f>, and aliases if using B<env_parallel>
-described earlier.
+are exported using B<export -f>, and aliases if using B<env_parallel>.

 B<prll> generates a lot of status information on stderr (standard
 error) which makes it harder to use the stderr (standard error) output
@ -4729,6 +4726,66 @@ B<4> find . -name '*.bmp' | jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg
 B<4> find . -name '*.bmp' | parallel -j8 bmp2jpeg {.}.bmp {.}.jpg


+=head2 DIFFERENCES BETWEEN gargs AND GNU Parallel
+
+B<gargs> can run multiple jobs in parallel.
+
+It caches output in memory. This causes it to be extremely slow when
+the output is larger than the physical RAM, and can cause the system
+to run out of memory.
+
+See more details on this in B<man parallel_design>.
+
+
+Output to stderr (standard error) is changed if the command fails.
+
+Here are the two examples from the B<gargs> website.
+
+B<1> seq 12 -1 1 | gargs -p 4 -n 3 "sleep {0}; echo {1} {2}"
+
+B<1> seq 12 -1 1 | parallel -P 4 -n 3 "sleep {1}; echo {2} {3}"
+
+B<2> cat t.txt | gargs --sep "\s+" -p 2 "echo '{0}:{1}-{2}' full-line: \'{}\'"
+
+B<2> cat t.txt | parallel --colsep "\\s+" -P 2 "echo '{1}:{2}-{3}' full-line: \'{}\'"
+
+
+=head2 DIFFERENCES BETWEEN orgalorg AND GNU Parallel
+
+B<orgalorg> can run the same job on multiple machines. This is related
+to B<--onall> and B<--nonall>.
+
+B<orgalorg> supports entering the SSH password, provided it is the
+same for all servers. GNU B<parallel> advocates using B<ssh-agent>
+instead, but it is possible to emulate B<orgalorg>'s behavior by
+setting SSHPASS and by using B<--ssh "sshpass -e ssh">.
+
+To make the emulation easier, make a simple alias:
+
+  alias par_emul="parallel -j0 --ssh 'sshpass -e ssh' --nonall --tag --linebuffer"
+
+If you want to supply a password run:
+
+  export SSHPASS=`ssh-askpass`
+
+or set the password directly (quoted, so the shell does not expand $$):
+
+  export SSHPASS='P4$$w0rd!'
+
+If the above is set up you can then do:
+
+  orgalorg -o frontend1 -o frontend2 -p -C uptime
+  par_emul -S frontend1 -S frontend2 uptime
+
+  orgalorg -o frontend1 -o frontend2 -p -C top -bid 1
+  par_emul -S frontend1 -S frontend2 top -bid 1
+
+  orgalorg -o frontend1 -o frontend2 -p -er /tmp -n 'md5sum /tmp/bigfile' -S bigfile
+  par_emul -S frontend1 -S frontend2 --basefile bigfile --workdir /tmp  md5sum /tmp/bigfile
+
+B<orgalorg> has a progress indicator for file transfers. GNU
+B<parallel> does not.
+

 =head2 DIFFERENCES BETWEEN ClusterSSH AND GNU Parallel

@ -4834,8 +4891,8 @@ or:

 it may be because I<command> is not known, but it could also be
 because I<command> is an alias or a function. If it is a function you
-need to B<export -f> the function first. An alias will only work if you use 
-B<env_parallel> described earlier.
+need to B<export -f> the function first. An alias will only work if
+you use B<env_parallel>.


 =head1 REPORTING BUGS
--- a/src/parallel_design.pod
+++ b/src/parallel_design.pod
@ -543,7 +543,7 @@ The wrapper looks like this:
 Transferring of variables and functions given by B<--env> is done by
 running a Perl script remotely that calls the actual command. The Perl
 script sets B<$ENV{>I<variable>B<}> to the correct value before
-exec'ing the a shell that runs the function definition followed by the
+exec'ing a shell that runs the function definition followed by the
 actual command.

 The function B<env_parallel> copies the full current environment into
@ -743,10 +743,63 @@ not need to sync them to disk.
 It gives the odd situation that a disk can be fully used, but there
 are no visible files on it.

+=head3 Comparing to buffering in memory
+
+B<gargs> is a parallelizing tool that buffers in memory. It is
+therefore a useful reference for comparing disk and memory buffering.
+
+On a system with 6 GB of free RAM and 6 GB of free swap, these
+commands were tested with different values of $size:
+
+  echo /dev/zero | gargs "head -c $size {}" >/dev/null
+  echo /dev/zero | parallel "head -c $size {}" >/dev/null
+
+The results are here:
+
+  JobRuntime      Command
+       0.344      parallel_test 1M
+       0.362      parallel_test 10M
+       0.640      parallel_test 100M
+       9.818      parallel_test 1000M
+      23.888      parallel_test 2000M
+      30.217      parallel_test 2500M
+      30.963      parallel_test 2750M
+      34.648      parallel_test 3000M
+      43.302      parallel_test 4000M
+      55.167      parallel_test 5000M
+      67.493      parallel_test 6000M
+     178.654      parallel_test 7000M
+     204.138      parallel_test 8000M
+     230.052      parallel_test 9000M
+     255.639      parallel_test 10000M
+     757.981      parallel_test 30000M
+       0.537      gargs_test 1M
+       0.292      gargs_test 10M
+       0.398      gargs_test 100M
+       3.456      gargs_test 1000M
+       8.577      gargs_test 2000M
+      22.705      gargs_test 2500M
+     123.076      gargs_test 2750M
+      89.866      gargs_test 3000M
+     291.798      gargs_test 4000M
+
+GNU B<parallel> is mostly limited by the speed of the disk: up to
+6 GB of data is written to disk but stays cached, so reading is fast.
+Above 6 GB the data is both written to and read from disk. While the
+30000M job is running, the system is slow but not unusable: if you
+are not using the disk, you hardly notice it.
+
+B<gargs> hits a wall around 2500M. Beyond that the system swaps
+heavily and becomes completely unusable. At 5000M it runs out of memory.
+
+You can make GNU B<parallel> behave similarly to B<gargs> by pointing
+$TMPDIR at a tmpfs filesystem: it will be faster for small outputs,
+but will make your system unusable for larger outputs.
+

 =head2 Disk full

-GNU B<parallel> buffers on disk. If the disk is full data may be
+GNU B<parallel> buffers on disk. If the disk is full, data may be
 lost. To check if the disk is full GNU B<parallel> writes a 8193 byte
 file every second. If this file is written successfully, it is removed
 immediately. If it is not written successfully, the disk is full. The
@ -758,7 +811,7 @@ systems, whereas 8193 did the correct thing on all tested filesystems.

 The shorthands for replacement strings make a command look more
 cryptic. Different users will need different replacement
-strings. Instead of inventing more shorthands you get more more
+strings. Instead of inventing more shorthands you get more
 flexible replacement strings if they can be programmed by the user.

 The language Perl was chosen because GNU B<parallel> is written in
@ -939,7 +992,7 @@ was obsoleted 20130222 and removed one year later.
 Until 20150122 variables and functions were transferred by looking at
 $SHELL to see whether the shell was a B<*csh> shell. If so the
 variables would be set using B<setenv>. Otherwise they would be set
-using B<=>. The caused the content of the variable to be repeated:
+using B<=>. This caused the content of the variable to be repeated:

 echo $SHELL | grep "/t\{0,1\}csh" > /dev/null && setenv VAR foo ||
 export VAR=foo
--- a/src/parallel_tutorial.pod
+++ b/src/parallel_tutorial.pod
@ -1281,7 +1281,7 @@ B<--resume-failed> reads the commands from the command line (and
 ignores the commands in the joblog), B<--retry-failed> ignores the
 command line and reruns the commands mentioned in the joblog.

-  parallel --resume-failed --joblog /tmp/log
+  parallel --retry-failed --joblog /tmp/log
  cat /tmp/log

 Output: