parallel: --number-of-cores now respects 'taskset'.

Ole Tange 2015-02-23 22:32:34 +01:00
parent 72819bdcba
commit e22467f4dd
5 changed files with 120 additions and 82 deletions

4
NEWS

@ -22,7 +22,7 @@
Experiments for Identifying Recommender Differences
http://elehack.net/research/thesis/mde-thesis.pdf
* GNU Parallel was using (unfortunately with wrong citation) in:
* GNU Parallel was used (unfortunately with wrong citation) in:
Performance and Scaling Comparison Study of RDBMS and NoSQL
(MongoDB)
http://ijact.in/wp-content/uploads/2014/11/COMPUSOFT-311-1270-1275.pdf
@ -37,7 +37,7 @@
http://biorxiv.org/content/biorxiv/early/2014/12/05/012179.full.pdf
* Zip Folders with GNU Parallel
http://fazky.github.io/Linux/2015-01-07-GNU-Parallel.html
http://fazky.github.io/posts/Linux/2015-01-07-GNU-Parallel.html
* Using GNU Parallel with Freesurfer
http://programminginadarkroom.blogspot.dk/2015/02/using-gnu-parallel-with-freesurfer.html


@ -208,44 +208,21 @@ cc:Tim Cuthbertson <tim3d.junk@gmail.com>,
Ryoichiro Suzuki <ryoichiro.suzuki@gmail.com>,
Jesse Alama <jesse.alama@gmail.com>
Subject: GNU Parallel 20150222 (' (((:~{> Krudttønden') released
Subject: GNU Parallel 20150322 ('') released
GNU Parallel 20150222 (' (((:~{> Krudttønden') has been released. It is available for download at: http://ftp.gnu.org/gnu/parallel/
GNU Parallel 20150322 ('') has been released. It is available for download at: http://ftp.gnu.org/gnu/parallel/
Haiku of the month:
xargs' space and quote
headache causing behaviour.
Use GNU Parallel
-- Ole Tange
<<>>
New in this release:
* --tmux has gotten a major overhaul.
* GNU Parallel was cited in: RIG: Recalibration and Interrelation of genomic sequence data with the GATK http://www.g3journal.org/content/early/2015/02/13/g3.115.017012.full.pdf+html
* GNU Parallel was cited in: RaftLib: A C++ Template Library for High Performance Stream Parallel Processing http://www.cs.wustl.edu/~lip/pubs/pmam15_jbeard.pdf
* GNU Parallel was cited in: MPI-blastn and NCBI-TaxCollector: Improving metagenomic analysis with high performance classification and wide taxonomic attachment http://www.worldscientific.com/doi/abs/10.1142/S0219720014500139?af=R&
* GNU Parallel was cited in: Towards Collaborative Exploration and Analysis of Big Data from Mars: A Noachis Terra Case Study http://link.springer.com/chapter/10.1007/978-3-319-13865-7_25
* GNU Parallel was cited in: Quantifying properties of hot and dense QCD matter through systematic model-to-data comparison http://arxiv.org/pdf/1502.00339.pdf
* GNU Parallel was cited in: Towards Collaborative Exploration and Analysis of Big Data from Mars: A Noachis Terra Case Study http://link.springer.com/chapter/10.1007/978-3-319-13865-7_25
* GNU Parallel was cited in: Towards Recommender Engineering Tools and Experiments for Identifying Recommender Differences http://elehack.net/research/thesis/mde-thesis.pdf
* GNU Parallel was using (unfortunately with wrong citation) in: Performance and Scaling Comparison Study of RDBMS and NoSQL (MongoDB) http://ijact.in/wp-content/uploads/2014/11/COMPUSOFT-311-1270-1275.pdf
* GNU Parallel was used (unfortunately without citation) in: Parallel Implementation of Big Data Pre-Processing Algorithms for Sentiment Analysis of Social Networking Data http://www.researchmathsci.org/IJFMAart/ijfma-v6n2-7.pdf
* GNU Parallel was used (unfortunately without citation) in: SpeedSeq: Ultra-fast personal genome analysis and interpretation http://biorxiv.org/content/biorxiv/early/2014/12/05/012179.full.pdf
* Zip Folders with GNU Parallel http://fazky.github.io/Linux/2015-01-07-GNU-Parallel.html
* Using GNU Parallel with Freesurfer http://programminginadarkroom.blogspot.dk/2015/02/using-gnu-parallel-with-freesurfer.html
* GNU Parallel is used in Velociraptor: https://github.com/ericwhyne/Velociraptor
* Marcus Beach GNU Parallel http://marcusbeach.co/gnu-parallel/
* GNU Parallel was used in: https://github.com/alexbyrnes/FCC-Political-Ads_The-Code
* Bug fixes and man page updates.


@ -953,7 +953,7 @@ sub parse_options {
sub init_globals {
# Defaults:
$Global::version = 20150222;
$Global::version = 20150223;
$Global::progname = 'parallel';
$Global::infinity = 2**31;
$Global::debug = 0;
@ -4616,20 +4616,35 @@ sub no_of_cpus_gnu_linux {
# undef if not GNU/Linux
my $no_of_cpus;
my $no_of_cores;
my $no_of_active_cores;
if(-e "/proc/cpuinfo") {
$no_of_cpus = 0;
$no_of_cores = 0;
my %seen;
open(my $in_fh, "<", "/proc/cpuinfo") || return undef;
while(<$in_fh>) {
if(/^physical id.*[:](.*)/ and not $seen{$1}++) {
$no_of_cpus++;
}
/^processor.*[:]/i and $no_of_cores++;
}
close $in_fh;
if(open(my $in_fh, "<", "/proc/cpuinfo")) {
while(<$in_fh>) {
if(/^physical id.*[:](.*)/ and not $seen{$1}++) {
$no_of_cpus++;
}
/^processor.*[:]/i and $no_of_cores++;
}
close $in_fh;
}
}
return ($no_of_cpus||$no_of_cores);
if(-e "/proc/self/status") {
# if 'taskset' is used to limit number of cores
if(open(my $in_fh, "<", "/proc/self/status")) {
while(<$in_fh>) {
if(/^Cpus_allowed:\s*(\S+)/) {
my $a = $1;
$a =~ tr/,//d;
$no_of_active_cores = unpack ("%32b*", pack ("H*",$a));
}
}
close $in_fh;
}
}
return (::min($no_of_cpus || $no_of_cores,$no_of_active_cores));
}
sub no_of_cores_gnu_linux {
@ -4637,6 +4652,7 @@ sub no_of_cores_gnu_linux {
# Number of CPU cores on GNU/Linux
# undef if not GNU/Linux
my $no_of_cores;
my $no_of_active_cores;
if(-e "/proc/cpuinfo") {
$no_of_cores = 0;
open(my $in_fh, "<", "/proc/cpuinfo") || return undef;
@ -4645,7 +4661,20 @@ sub no_of_cores_gnu_linux {
}
close $in_fh;
}
return $no_of_cores;
if(-e "/proc/self/status") {
# if 'taskset' is used to limit number of cores
if(open(my $in_fh, "<", "/proc/self/status")) {
while(<$in_fh>) {
if(/^Cpus_allowed:\s*(\S+)/) {
my $a = $1;
$a =~ tr/,//d;
$no_of_active_cores = unpack ("%32b*", pack ("H*",$a));
}
}
close $in_fh;
}
}
return (::min($no_of_cores,$no_of_active_cores));
}
sub no_of_cpus_freebsd {
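
Note on the Cpus_allowed handling above: the new code strips the commas from the hex mask in /proc/self/status and counts the set bits with pack/unpack. The same counting can be sketched in plain shell as shown below. This is an illustration only, not part of the commit; the function name count_allowed_cpus is hypothetical, and the sketch assumes the mask contains only hex digits and commas.

```shell
#!/bin/sh
# Count the set bits in a Cpus_allowed-style hex mask, e.g. "ff" or
# "00000000,0000000f". Hypothetical helper, not taken from GNU parallel.
count_allowed_cpus() {
    mask=$(printf '%s' "$1" | tr -d ',')   # drop the commas between 32-bit groups
    count=0
    while [ -n "$mask" ]; do
        rest=${mask#?}              # everything after the first hex digit
        digit=${mask%"$rest"}       # the first hex digit itself
        mask=$rest
        case $digit in
            0) ;;
            1|2|4|8) count=$((count + 1)) ;;
            3|5|6|9|a|A|c|C) count=$((count + 2)) ;;
            7|b|B|d|D|e|E) count=$((count + 3)) ;;
            f|F) count=$((count + 4)) ;;
            *) echo "unexpected character in mask: $digit" >&2; return 1 ;;
        esac
    done
    echo "$count"
}

# Usage: read the mask of the current process if the kernel exposes it.
if [ -r /proc/self/status ]; then
    allowed=$(sed -n 's/^Cpus_allowed:[[:space:]]*//p' /proc/self/status)
    count_allowed_cpus "$allowed"
fi
```

Running the script under something like 'taskset -c 0,1' should make the reported count drop to 2 on a machine with more cores, which is the effect the commit relies on when it takes the minimum of the two counts.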


@ -104,7 +104,7 @@ B<--env> and use B<env_parallel> instead of B<parallel>.
The command cannot contain the character \257 (macron: ¯).
=item B<{}> (beta testing)
=item B<{}>
Input line. This replacement string will be replaced by a full line
read from the input source. The input source is normally stdin
@ -117,7 +117,7 @@ If the command line contains no replacement strings then B<{}> will be
appended to the command line.
=item B<{.}> (beta testing)
=item B<{.}>
Input line without extension. This replacement string will be replaced
by the input with the extension removed. If the input line contains
@ -133,7 +133,7 @@ The replacement string B<{.}> can be changed with B<--er>.
To understand replacement strings see B<{}>.
=item B<{/}> (beta testing)
=item B<{/}>
Basename of input line. This replacement string will be replaced by
the input with the directory part removed.
@ -144,7 +144,7 @@ B<--basenamereplace>.
To understand replacement strings see B<{}>.
=item B<{//}> (beta testing)
=item B<{//}>
Dirname of input line. This replacement string will be replaced by the
dir of the input line. See B<dirname>(1).
@ -155,7 +155,7 @@ B<--dirnamereplace>.
To understand replacement strings see B<{}>.
=item B<{/.}> (beta testing)
=item B<{/.}>
Basename of input line without extension. This replacement string will
be replaced by the input with the directory and extension part
@ -167,7 +167,7 @@ B<--basenameextensionreplace>.
To understand replacement strings see B<{}>.
=item B<{#}> (beta testing)
=item B<{#}>
Sequence number of the job to run. This replacement string will be
replaced by the sequence number of the job being run. It contains the
@ -178,7 +178,7 @@ The replacement string B<{#}> can be changed with B<--seqreplace>.
To understand replacement strings see B<{}>.
=item B<{%}> (beta testing)
=item B<{%}>
Job slot number. This replacement string will be replaced by the job's
slot number between 1 and number of jobs to run in parallel. There
@ -190,7 +190,7 @@ The replacement string B<{%}> can be changed with B<--slotreplace>.
To understand replacement strings see B<{}>.
=item B<{>I<n>B<}> (beta testing)
=item B<{>I<n>B<}>
Argument from input source I<n> or the I<n>'th argument. This
positional replacement string will be replaced by the input from input
@ -201,7 +201,7 @@ I<n>'th last argument.
To understand replacement strings see B<{}>.
=item B<{>I<n>.B<}> (beta testing)
=item B<{>I<n>.B<}>
Argument from input source I<n> or the I<n>'th argument without
extension. It is a combination of B<{>I<n>B<}> and B<{.}>.
@ -214,7 +214,7 @@ extension removed.
To understand positional replacement strings see B<{>I<n>B<}>.
=item B<{>I<n>/B<}> (beta testing)
=item B<{>I<n>/B<}>
Basename of argument from input source I<n> or the I<n>'th argument.
It is a combination of B<{>I<n>B<}> and B<{/}>.
@ -227,7 +227,7 @@ directory (if any) removed.
To understand positional replacement strings see B<{>I<n>B<}>.
=item B<{>I<n>//B<}> (beta testing)
=item B<{>I<n>//B<}>
Dirname of argument from input source I<n> or the I<n>'th argument.
It is a combination of B<{>I<n>B<}> and B<{//}>.
@ -239,7 +239,7 @@ the I<n>'th argument (when used with B<-N>). See B<dirname>(1).
To understand positional replacement strings see B<{>I<n>B<}>.
=item B<{>I<n>/.B<}> (beta testing)
=item B<{>I<n>/.B<}>
Basename of argument from input source I<n> or the I<n>'th argument
without extension. It is a combination of B<{>I<n>B<}>, B<{/}>, and
@ -253,7 +253,7 @@ directory (if any) and extension removed.
To understand positional replacement strings see B<{>I<n>B<}>.
=item B<{=>I<perl expression>B<=}> (beta testing)
=item B<{=>I<perl expression>B<=}>
Replace with calculated I<perl expression>. B<$_> will contain the
same as B<{}>. After evaluating I<perl expression> B<$_> will be used
@ -266,7 +266,7 @@ The B<{=>I<perl expression>B<=}> must be given as a single string.
See also: B<--rpl> B<--parens>
=item B<{=>I<n> I<perl expression>B<=}> (beta testing)
=item B<{=>I<n> I<perl expression>B<=}>
Positional equivalent to B<{= perl expression =}>. To understand
positional replacement strings see B<{>I<n>B<}>.
@ -444,7 +444,7 @@ I<size> defaults to 1M.
See B<--pipe> and B<--pipepart> for use of this.
=item B<--cat> (beta testing)
=item B<--cat>
Create a temporary file with content. Normally B<--pipe>/B<--pipepart>
will give data to the program on stdin (standard input). With B<--cat>
@ -454,7 +454,7 @@ you can do: B<parallel --pipe --cat wc {}>.
See also B<--fifo>.
=item B<--cleanup> (beta testing)
=item B<--cleanup>
Remove transferred files. B<--cleanup> will remove the transferred files
on the remote computer after processing is done.
@ -568,7 +568,7 @@ If I<eof-str> is omitted, there is no end of file string. If neither
B<-E> nor B<-e> is used, no end of file string is used.
=item B<--env> I<var> (beta testing)
=item B<--env> I<var>
Copy environment variable I<var>. This will copy I<var> to the
environment that the command is run in. This is especially useful for
@ -635,7 +635,7 @@ Implies B<--semaphore>.
See also B<--bg>, B<man sem>.
=item B<--fifo> (beta testing)
=item B<--fifo>
Create a temporary fifo with content. Normally B<--pipe> and
B<--pipepart> will give data to the program on stdin (standard
@ -687,9 +687,9 @@ See also: B<--line-buffer> B<--ungroup>
Print a summary of the options to GNU B<parallel> and exit.
=item B<--halt-on-error> I<val> (beta testing)
=item B<--halt-on-error> I<val>
=item B<--halt> I<val> (beta testing)
=item B<--halt> I<val>
How should GNU B<parallel> terminate?
@ -925,7 +925,7 @@ limiting factor.
See also: B<--group> B<--ungroup>
=item B<--load> I<max-load> (beta testing)
=item B<--load> I<max-load>
Do not start new jobs on a given computer unless the number of running
processes on the computer is less than I<max-load>. I<max-load> uses
@ -973,7 +973,7 @@ See also B<-X> for context replace. If in doubt use B<-X> as that will
most likely do what is needed.
=item B<--memfree> I<size> (beta testing)
=item B<--memfree> I<size>
Minimum memory free when starting another job. The I<size> can be
postfixed with K, M, G, T, P, k, m, g, t, or p which would multiply
@ -1169,7 +1169,7 @@ Print the number of CPU cores and exit (used by GNU B<parallel> itself
to determine the number of CPU cores on remote computers).
=item B<--no-keep-order> (beta testing)
=item B<--no-keep-order>
Overrides an earlier B<--keep-order> (e.g. if set in
B<~/.parallel/config>).
@ -1392,7 +1392,7 @@ useful if some jobs fail for no apparent reason (such as network
failure).
=item B<--return> I<filename> (alpha testing)
=item B<--return> I<filename> (beta testing)
Transfer files from remote computers. B<--return> is used with
B<--sshlogin> when the arguments are files on the remote computers. When
@ -1493,7 +1493,7 @@ operating system and the B<-s> option. Pipe the input from /dev/null
to do anything.
=item B<--semaphore> (beta testing)
=item B<--semaphore>
Work as a counting semaphore. B<--semaphore> will cause GNU
B<parallel> to start I<command> in the background. When the number of
@ -1530,9 +1530,9 @@ Implies B<--semaphore>.
See also B<man sem>.
=item B<--semaphoretimeout> I<secs> (beta testing)
=item B<--semaphoretimeout> I<secs>
=item B<--st> I<secs> (beta testing)
=item B<--st> I<secs>
If I<secs> > 0: If the semaphore is not released within I<secs> seconds, take it anyway.
@ -1628,9 +1628,9 @@ I<secs> seconds after starting each ssh. I<secs> can be less than 1
seconds.
=item B<-S> I<[@hostgroups/][ncpu/]sshlogin[,[@hostgroups/][ncpu/]sshlogin[,...]]> (beta testing)
=item B<-S> I<[@hostgroups/][ncpu/]sshlogin[,[@hostgroups/][ncpu/]sshlogin[,...]]>
=item B<--sshlogin> I<[@hostgroups/][ncpu/]sshlogin[,[@hostgroups/][ncpu/]sshlogin[,...]]> (beta testing)
=item B<--sshlogin> I<[@hostgroups/][ncpu/]sshlogin[,[@hostgroups/][ncpu/]sshlogin[,...]]>
Distribute jobs to remote computers. The jobs will be run on a list of
remote computers.
@ -1768,7 +1768,7 @@ the lines will be prepended with the sshlogin instead.
B<--tag> is ignored when using B<-u>.
=item B<--tagstring> I<str> (alpha testing)
=item B<--tagstring> I<str> (beta testing)
Tag lines with a string. Each output line will be prepended with
I<str> and TAB (\t). I<str> can contain replacement strings such as
@ -1785,7 +1785,7 @@ different dir for the files. Setting B<--tmpdir> is equivalent to
setting $TMPDIR.
=item B<--tmux> (alpha testing)
=item B<--tmux> (beta testing)
Use B<tmux> for output. Start a B<tmux> session and run each job in a
window in that session. No other output will be produced.
@ -1811,7 +1811,7 @@ Print the job to be run on stderr (standard error).
See also B<-v>, B<-p>.
=item B<--transfer> (beta testing)
=item B<--transfer>
Transfer files to remote computers. B<--transfer> is used with
B<--sshlogin> when the arguments are files and should be transferred
@ -1838,7 +1838,7 @@ B<--transfer> is often used with B<--return> and B<--cleanup>.
B<--transfer> is ignored when used with B<--sshlogin :> or when not used with B<--sshlogin>.
=item B<--trc> I<filename> (beta testing)
=item B<--trc> I<filename>
Transfer, Return, Cleanup. Short hand for:


@ -198,20 +198,52 @@ shell is B<csh> (which cannot hide stderr).
=item --tmux
mkfifo I<tmpfile.tmx>;
tmux -S <tmpfile.tms> new-session -s pI<PID> -d 'sleep .2' >&/dev/null;
tmux -S <tmpfile.tms> new-window -t pI<PID> -n <<shell quoted input>> \(<<shell quoted input>>\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status\ \>\>\ I<tmpfile.tmx>\&echo\ <<shell double quoted input>>\;echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10;
exec perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and exit($1);exit$c' I<tmpfile.tmx>
mkfifo I<tmpfile>; tmux new-session -s pI<PID> -d -n <<shell quoted input>> \(<<shell quoted input>>\)\;\ perl\ -e\ \'while\(\$t++\<3\)\{\ print\ \$ARGV\[0\],\"\\n\"\ \}\'\ \$\?h/\$status/255\ \>\>\ I<tmpfile>\&echo\ <<shell double quoted input>>\;echo\ \Job\ finished\ at:\ \`date\`\;sleep\ 10; exec perl -e '$/="/";$_=<>;$c=<>;unlink $ARGV; /(\d+)h/ and exit($1);exit$c' I<tmpfile>
First a FIFO is made (.tmx). It is used for communicating exit
value. Next a new tmux session is made. This may fail if there is
already a session, so the output is ignored. If all job slots finish
at the same time, then B<tmux> will close the session. A temporary
socket is made (.tms) to avoid a race condition in B<tmux>. It is
cleaned up when GNU B<parallel> finishes.
The input is used as the name of the windows in B<tmux>. When the job
inside B<tmux> finishes, the exit value is printed to a fifo. This
fifo is opened by perl outside B<tmux>, and perl then removes the fifo
(but keeping it open). Perl blocks until the first value is read from
the fifo, and this value is used as exit value.
inside B<tmux> finishes, the exit value is printed to the FIFO (.tmx).
This FIFO is opened by B<perl> outside B<tmux>, and B<perl> then
removes the FIFO. B<Perl> blocks until the first value is read from
the FIFO, and this value is used as exit value.
To make it compatible with B<csh> and B<bash> the exit value is
printed as: $?h/$status/255 and this is parsed by perl.
printed as: $?h/$status and this is parsed by B<perl>.
Works in B<csh>.
There is a bug that makes it necessary to print the exit value 3
times. Works in B<csh>.
times.
Another bug in B<tmux> is triggered when the length of the tmux title
and command falls within certain ranges. When the length is inside such
a range, 75 '\ ' are added to the title to force it outside the range.
You can map the bad limits using:
perl -e 'map { $a=$_; print map { "$a,$_\n" } (1..17000) } (1..17000)' | shuf > ab;
cat ab | parallel --colsep , --tagstring '{1}{=$_="\t"=}{2}'
tmux -S /tmp/p{%} new-session -d -n '{=1 $_="O"x$_ =}' true'\ {=2 $_="O"x$_ =};echo $?;rm /tmp/p{%}'
> value.csv 2>/dev/null
R -e 'a<-read.table("value.csv");X11();plot(a[,1],a[,2],col=a[,3]+5,cex=0.1);Sys.sleep(1000)'
For B<tmux 1.8> 17000 can be lowered to 2100.
The interesting areas are title 0..1000 with (title + whole command)
in 996..1127 and 9331..9636.
=back
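
The FIFO hand-off described in the --tmux section can be tried without tmux. The sketch below is a simplified illustration, not the command GNU parallel generates: the file name exit.fifo and the stand-in job 'exit 3' are assumptions, and the value is written once as a plain number rather than three times in the $?h/$status form.

```shell
#!/bin/sh
# Pass a job's exit value through a FIFO (simplified; no tmux involved).
tmpdir=$(mktemp -d) || exit 1
fifo="$tmpdir/exit.fifo"     # hypothetical name; the text above uses a .tmx file
mkfifo "$fifo"

# Writer: run the job in the background, then report its exit value.
( sh -c 'exit 3'; echo "$?" > "$fifo" ) &

# Reader: blocks until the writer has opened the FIFO and written a value.
read -r exit_value < "$fifo"
wait
rm -r "$tmpdir"
echo "job exited with $exit_value"
```

The reader blocks on the FIFO just as the outer perl process does in the design above, so it needs no polling to learn when the job has finished.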