parallel: Speedup of --lb: Don't look for \r if \n is found.

Ole Tange 2022-01-08 17:40:37 +01:00
parent 01228bfa41
commit 16e6fb9a65
9 changed files with 388 additions and 211 deletions

NEWS

@ -2994,7 +2994,7 @@ New in this release:
20140922 20140922
* If the file give as --sshloginfile is changed it will be re-read * If the file given as --sshloginfile is changed it will be re-read
when a job finishes though at most once per second. This makes it when a job finishes though at most once per second. This makes it
possible to add and remove hosts while running. possible to add and remove hosts while running.
@ -3689,7 +3689,8 @@ New in this release:
pretty cool! pretty cool!
* GNU Parallel was used (unfortunately with improper citation) in: * GNU Parallel was used (unfortunately with improper citation) in:
Understanding the Impact of E-Commerce Software on the Adoption of Structured Data on the Web Understanding the Impact of E-Commerce Software on the Adoption of
Structured Data on the Web
http://link.springer.com/chapter/10.1007/978-3-642-38366-3_9#page-1 http://link.springer.com/chapter/10.1007/978-3-642-38366-3_9#page-1
* GNU Parallel was used (unfortunately with improper citation) in: * GNU Parallel was used (unfortunately with improper citation) in:


@ -1,4 +1,4 @@
AC_INIT([parallel], [20211222], [bug-parallel@gnu.org]) AC_INIT([parallel],[20211222],[bug-parallel@gnu.org])
AM_INIT_AUTOMAKE([-Wall -Werror foreign]) AM_INIT_AUTOMAKE([-Wall -Werror foreign])
AC_CONFIG_HEADERS([config.h]) AC_CONFIG_HEADERS([config.h])
AC_CONFIG_FILES([ AC_CONFIG_FILES([
@ -7,8 +7,7 @@ AC_CONFIG_FILES([
]) ])
AC_ARG_ENABLE(documentation, AC_ARG_ENABLE(documentation,
AC_HELP_STRING([--disable-documentation], AS_HELP_STRING([--disable-documentation],[Omit building and installing the documentation. (default=no)]),,
[Omit building and installing the documentation. (default=no)]),,
[enable_documentation=yes]) [enable_documentation=yes])
AM_CONDITIONAL([DOCUMENTATION], [test x$enable_documentation = xyes]) AM_CONDITIONAL([DOCUMENTATION], [test x$enable_documentation = xyes])
AC_PROG_LN_S AC_PROG_LN_S


@ -24,8 +24,8 @@ if ! $TMP/bin/parallel-20140722 --version; then
mkdir -p $TMP/ftp mkdir -p $TMP/ftp
( (
cd $TMP/ftp cd $TMP/ftp
# wget -c ftp://ftp.gnu.org/old-gnu/parallel/p* wget -c ftp://ftp.gnu.org/old-gnu/parallel/p*
wget -c ftp://ftp.uni-kl.de/pub/gnu/parallel/p* wget -c ftp://mirrors.dotsrc.org/gnu/parallel/p*
parallel 'gpg --auto-key-locate keyserver --keyserver-options auto-key-retrieve {}' ::: *.sig parallel 'gpg --auto-key-locate keyserver --keyserver-options auto-key-retrieve {}' ::: *.sig
parallel --plus 'tar xvf {.} && cd {...} && ./configure --prefix '$TMP'/ftp/{.}-bin && make && make install' ::: *sig parallel --plus 'tar xvf {.} && cd {...} && ./configure --prefix '$TMP'/ftp/{.}-bin && make && make install' ::: *sig
perl -i -pe 's/qw\(keys/(keys/' parallel*/src/parallel perl -i -pe 's/qw\(keys/(keys/' parallel*/src/parallel
@ -45,7 +45,8 @@ measure() {
CORES=$3 CORES=$3
VERSION=$4 VERSION=$4
MHZ=1700 MHZ=1700
echo Running $OUTER test with $INNER jobs each on $CORES cores
# Force cpuspeed at 1.7GHz - seems to give tighter results # Force cpuspeed at 1.7GHz - seems to give tighter results
#forever 'parallel sudo cpufreq-set -g performance -u '$MHZ'MHz -d '$MHZ'MHz -c{} ::: {0..3};sleep 10' & #forever 'parallel sudo cpufreq-set -g performance -u '$MHZ'MHz -d '$MHZ'MHz -c{} ::: {0..3};sleep 10' &


@ -53,9 +53,9 @@ to treat it as software that you have no license to use.
== Do automated scripts break if the notice is not silenced? == == Do automated scripts break if the notice is not silenced? ==
No. Not a single time has that happened. This is due to the notice No. Not a single time has that been demonstrated to happen. This is
only being printed, if the output is to the screen - not if the output due to the notice only being printed, if the output is to the screen -
is to a file or a pipe. not if the output is to a file or a pipe.
== How do I silence the citation notice? == == How do I silence the citation notice? ==
@ -86,6 +86,8 @@ The notice is only relevant if you write scientific articles.
These links say: Yes, you should cite software, and if the author These links say: Yes, you should cite software, and if the author
suggests a way of citing, use that. suggests a way of citing, use that.
* https://www.force11.org/software-citation-principles (refers to many others)
* https://www.software.ac.uk/blog/2016-09-30-oh-research-software-how-shalt-i-cite-thee
* https://blog.apastyle.org/apastyle/2015/01/how-to-cite-software-in-apa-style.html * https://blog.apastyle.org/apastyle/2015/01/how-to-cite-software-in-apa-style.html
* https://libguides.mit.edu/c.php?g=551454&p=3900280 * https://libguides.mit.edu/c.php?g=551454&p=3900280
* https://www.software.ac.uk/how-cite-software * https://www.software.ac.uk/how-cite-software
@ -94,17 +96,20 @@ suggests a way of citing, use that.
* https://journals.aas.org/policy-statement-on-software/ * https://journals.aas.org/policy-statement-on-software/
* https://guides.lib.monash.edu/c.php?g=219786&p=1454293 * https://guides.lib.monash.edu/c.php?g=219786&p=1454293
* https://www.maxqda.com/blogpost/how-to-cite-maxqda * https://www.maxqda.com/blogpost/how-to-cite-maxqda
* https://the-turing-way.netlify.app/communication/citable/citable-cite.html
* https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-citation-files * https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-on-github/about-citation-files
The CITATION.cff file format was designed to make it easy to cite
software.
If you feel the benefit from using GNU Parallel is too small to If you feel the benefit from using GNU Parallel is too small to
warrant a citation, then prove that by simply using another tool. If warrant a citation, then prove that by simply using another tool. If
you replace your use of GNU Parallel with another tool, you obviously you replace your use of GNU Parallel with another tool, you obviously
do not have to cite GNU Parallel. If it is too much work replacing the do not have to cite GNU Parallel. If it is too much work replacing the
use of GNU Parallel, then it is a good indication that the benefit is use of GNU Parallel, then it is a good indication that the
big enough to warrant a citation. contribution to the research is big enough to warrant a citation.
The citation is also needed for reproducibility. Let us assume a bug
in GNU Parallel skews the results. People replicating the research
need to have the information, so they can replicate the (possibly
wrong) results.
== Do other software tools show how to cite? == == Do other software tools show how to cite? ==
@ -126,6 +131,9 @@ refer to peer-reviewed articles - others do not:
* http://www.fon.hum.uva.nl/paul/praat.html * http://www.fon.hum.uva.nl/paul/praat.html
* https://github.com/UnixJunkie/PAR/blob/master/README * https://github.com/UnixJunkie/PAR/blob/master/README
The CITATION.cff file format was designed to make it easy to cite
software, and
== I do not like the notice. Can I fork GNU Parallel and remove it? == == I do not like the notice. Can I fork GNU Parallel and remove it? ==
@ -185,9 +193,27 @@ been public domain.
Yes. Find a way to finance future development of GNU Parallel. If you Yes. Find a way to finance future development of GNU Parallel. If you
pay me a normal salary, I will be happy to remove the citation notice. pay me a normal salary, I will be happy to remove the citation notice.
You can also pay to use a specific version, which you will then get
without the citation notice.
The citation notice is about (indirect) funding - nothing else. The citation notice is about (indirect) funding - nothing else.
== Free software should be funded, but not this way ==
OK. But then please take responsibility and do the funding. Show that
it is indeed possible to fund GNU Parallel in a different way by
actually doing it.
Talk is cheap, and action speaks louder than words. Instead of just
telling others what to do, prove that you are serious and actually
*do* the work.
I will be happy to get a "funding manager" and remove the citation
notice, if that means I can stop worrying about rent, mortgages, bills
and retirement.
== I do not want to cite == == I do not want to cite ==
If you do not want to cite, then you should use another tool. If you do not want to cite, then you should use another tool.


@ -4,6 +4,19 @@
Quote of the month: Quote of the month:
If I could only keep 5 GNU utils, parallel would make it to the list
:)
-- 5heikki@reddit
Gnu Parallel: installed in every computer i have access to.
-- raffaele messuti @atomotic@twitter
gnu parallel is a good program
-- Pwn A. Day @pwnaday@twitter
Deus salve o gnu parallel
-- marcos @guv_Tuv@twitter
@a201 @a201
4 4
@ -158,7 +171,7 @@ https://negfeedback.blogspot.com/2020/05/indispensable-command-line-tools.html
me optimise so many of my tasks and analyses. me optimise so many of my tasks and analyses.
-- Parice Brandies @PariceBrandies@twitter -- Parice Brandies @PariceBrandies@twitter
We use gnu parallel now - and happier for it. We use gnu parallel now - and happier for it.
-- Ben Davies @benjamindavies@twitter -- Ben Davies @benjamindavies@twitter
GNU Parallel makes my life so much easier. GNU Parallel makes my life so much easier.
@ -214,7 +227,6 @@ We use gnu parallel now - and happier for it.
and pool concurrency. and pool concurrency.
-- Nick Ursa @nickursa@twitter -- Nick Ursa @nickursa@twitter
I wish more command line software had example pages as robust as GNU Parallel I wish more command line software had example pages as robust as GNU Parallel
-- Lucidbeaming @lucidbeaming -- Lucidbeaming @lucidbeaming


@ -254,7 +254,7 @@ from:tange@gnu.org
to:parallel@gnu.org, bug-parallel@gnu.org to:parallel@gnu.org, bug-parallel@gnu.org
stable-bcc: Jesse Alama <jessealama@fastmail.fm> stable-bcc: Jesse Alama <jessealama@fastmail.fm>
Subject: GNU Parallel 20220122 ('James Webb/Tutu/Pillar of Shame<<>>') released <<[stable]>> Subject: GNU Parallel 20220122 ('Kazakhstan/James Webb/Tutu/Pillar of Shame<<>>') released <<[stable]>>
GNU Parallel 20220122 ('<<>>') <<[stable]>> has been released. It is available for download at: lbry://@GnuParallel:4 GNU Parallel 20220122 ('<<>>') <<[stable]>> has been released. It is available for download at: lbry://@GnuParallel:4


@ -334,8 +334,8 @@ sub parcat_script() {
for $infh (@ready) { for $infh (@ready) {
# There is only one key, namely the output file descriptor # There is only one key, namely the output file descriptor
for my $outfd (keys %{$buffer{$infh}}) { for my $outfd (keys %{$buffer{$infh}}) {
# TODO test if 65536 is optimal (2^17 is used elsewhere) # TODO test if 60800 is optimal (2^17 is used elsewhere)
$rv = sysread($infh, $buf, 65536); $rv = sysread($infh, $buf, 60800);
if (!$rv) { if (!$rv) {
if($! == EAGAIN) { if($! == EAGAIN) {
# Would block: Nothing read # Would block: Nothing read
@ -829,7 +829,30 @@ sub cat_partial($@) {
my @start_len = map { my @start_len = map {
if(++$i % 2) { $start = $_; } else { $_-$start } if(++$i % 2) { $start = $_; } else { $_-$start }
} @start_end; } @start_end;
# This can read 7 GB/s using a single core # The optimal block size differs
# It has been measured on:
# AMD 6376: n*4k-1; small n
# AMD Neo N36L: 44k-200k
# Intel i7-3632QM: 55k-
# ARM Cortex A53: 4k-28k
# Intel i5-2410M: 36k-46k
#
# I choose 2^15-1 = 32767
# q{
# expseq() {
# perl -E '
# $last = pop @ARGV;
# $first = shift || 1;
# $inc = shift || 1.03;
# for($i=$first; $i<=$last;$i*=$inc) { say int $i }
# ' "$@"
# }
#
# seq 111111111 > big;
# f() { ppar --test $1 -a big --pipepart --block -1 'md5sum > /dev/null'; }
# export -f f;
# expseq 1000 1.001 300000 | shuf | parallel -j1 --jl jl-md5sum f;
# };
my $script = spacefree my $script = spacefree
(0, (0,
q{ q{
@ -837,7 +860,7 @@ sub cat_partial($@) {
sysseek(STDIN,shift,0) || die; sysseek(STDIN,shift,0) || die;
$left = shift; $left = shift;
while($read = while($read =
sysread(STDIN,$buf, $left > 131072 ? 131072 : $left)){ sysread(STDIN,$buf, $left > 32767 ? 32767 : $left)){
$left -= $read; $left -= $read;
syswrite(STDOUT,$buf); syswrite(STDOUT,$buf);
} }
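Note on this hunk: the loop copies a byte range of STDIN to STDOUT and caps each read at a fixed block size (now 32767). The same pattern can be sketched as a standalone script. The name copy_range, the temp files and the short-write loop are illustrative assumptions, not code from GNU Parallel.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of GNU Parallel):
# copy $len bytes starting at offset $start from $in to $out,
# reading at most $block bytes per sysread call.
# Returns the number of bytes copied.
sub copy_range {
    my ($in, $out, $start, $len, $block) = @_;
    defined sysseek($in, $start, 0) or die "sysseek: $!";
    my ($left, $copied) = ($len, 0);
    while ($left > 0) {
        my $read = sysread($in, my $buf, $left > $block ? $block : $left);
        die "sysread: $!" unless defined $read;
        last if $read == 0;                 # end of file
        $left   -= $read;
        $copied += $read;
        my $off = 0;
        while ($off < $read) {              # handle short writes
            my $written = syswrite($out, $buf, $read - $off, $off);
            die "syswrite: $!" unless defined $written;
            $off += $written;
        }
    }
    return $copied;
}

# Demo: copy 70000 bytes from offset 5 of a 100000-byte temp file.
my ($src, $src_name) = tempfile(UNLINK => 1);
my ($dst, $dst_name) = tempfile(UNLINK => 1);
syswrite($src, "0123456789" x 10000);
my $n = copy_range($src, $dst, 5, 70000, 32767);
print "copied $n bytes\n";
```

Passing the block size as a parameter makes it easy to time different values on a given machine, which is the kind of measurement the comments in this hunk describe.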
@ -1635,7 +1658,7 @@ sub options_hash() {
# https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice # https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice
# https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt # https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt
# You accept to be put in a public hall of shame by removing # You accept to be put in a public hall of shame by removing
# the lines. # these lines.
"bibtex|citation" => \$opt::citation, "bibtex|citation" => \$opt::citation,
"wc|willcite|will-cite|nn|nonotice|no-notice" => \$opt::willcite, "wc|willcite|will-cite|nn|nonotice|no-notice" => \$opt::willcite,
# Termination and retries # Termination and retries
@ -1670,7 +1693,7 @@ sub options_hash() {
"exit|x" => \$opt::x, "exit|x" => \$opt::x,
# Semaphore # Semaphore
"semaphore" => \$opt::semaphore, "semaphore" => \$opt::semaphore,
"semaphoretimeout|st=i" => \$opt::semaphoretimeout, "semaphoretimeout|st=s" => \$opt::semaphoretimeout,
"semaphorename|id=s" => \$opt::semaphorename, "semaphorename|id=s" => \$opt::semaphorename,
"fg" => \$opt::fg, "fg" => \$opt::fg,
"bg" => \$opt::bg, "bg" => \$opt::bg,
@ -1705,6 +1728,8 @@ sub options_hash() {
"embed" => \$opt::embed, "embed" => \$opt::embed,
"filter=s" => \@opt::filter, "filter=s" => \@opt::filter,
"parset=s" => \$opt::parset, "parset=s" => \$opt::parset,
# Parameter for testing optimal values
"test=s" => \$opt::test,
); );
} }
@ -2561,10 +2586,12 @@ sub parse_semaphore() {
::wait_and_exit(255); ::wait_and_exit(255);
} }
} }
@opt::a = ("/dev/null");
# Append a dummy empty argument # Append a dummy empty argument
# \0 => nothing (not the empty string) # \0 => nothing (not the empty string)
push(@Global::unget_argv, [Arg->new("\0noarg")]); push(@Global::unget_argv, [Arg->new("\0noarg")]);
$Semaphore::timeout = $opt::semaphoretimeout || 0; $Semaphore::timeout = int(multiply_time_units($opt::semaphoretimeout))
|| 0;
if(defined $opt::semaphorename) { if(defined $opt::semaphorename) {
$Semaphore::name = $opt::semaphorename; $Semaphore::name = $opt::semaphorename;
} else { } else {
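Note on this hunk: --semaphoretimeout changes from an integer option (=i) to a string option (=s) whose value goes through multiply_time_units() and int(). The body of multiply_time_units() is not shown in this diff, so the following is only a sketch of what such a conversion could look like. The helper name to_seconds and the accepted suffixes (s, m, h, d) are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper; NOT GNU Parallel's multiply_time_units().
# Assumed suffixes: s(econds), m(inutes), h(ours), d(ays).
my %SECONDS_PER = (s => 1, m => 60, h => 3600, d => 86400);

sub to_seconds {
    my ($spec) = @_;
    # Unset or empty means "no timeout", mirroring the "|| 0" in the hunk
    return 0 unless defined $spec && length $spec;
    my ($num, $unit) = $spec =~ /^(\d+(?:\.\d+)?)([smhd]?)$/
        or die "Bad duration: $spec\n";
    # A bare number is taken as seconds
    return int($num * $SECONDS_PER{$unit || 's'});
}

for my $spec ('90', '2m', '1.5h', '1d') {
    printf "%-5s => %d seconds\n", $spec, to_seconds($spec);
}
```

Taking the option as a string and converting afterwards is what allows a unit suffix at all: a Getopt::Long "=i" specification accepts only integers, so a value with a suffix would be rejected before any conversion could run.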
@ -4907,8 +4934,8 @@ sub reaper() {
if($Global::delayauto or $Global::sshdelayauto) { if($Global::delayauto or $Global::sshdelayauto) {
if($job->exitstatus()) { if($job->exitstatus()) {
# Job failed: Increase delay (if $opt::(ssh)delay set) # Job failed: Increase delay (if $opt::(ssh)delay set)
$opt::delay &&= $opt::delay * 2; $opt::delay &&= $opt::delay * 1.3;
$opt::sshdelay &&= $opt::sshdelay * 2; $opt::sshdelay &&= $opt::sshdelay * 1.3;
} else { } else {
# Job succeeded: Decrease delay (if $opt::(ssh)delay set) # Job succeeded: Decrease delay (if $opt::(ssh)delay set)
$opt::delay &&= $opt::delay * 0.9; $opt::delay &&= $opt::delay * 0.9;
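Note on this hunk: the back-off factor after a failed job drops from 2 to 1.3, while the factor after a successful job stays at 0.9. With these numbers one failure is undone by roughly two to three successes instead of six to seven. A minimal sketch of the adjustment, using a hypothetical adjust_delay helper rather than the $opt:: globals:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of GNU Parallel).
# A failed job multiplies the delay by 1.3, a successful one by 0.9.
# `&&=` leaves a false value (0 or undef) untouched, so a delay that
# was never set stays unset.
sub adjust_delay {
    my ($delay, $job_failed) = @_;
    $delay &&= $delay * ($job_failed ? 1.3 : 0.9);
    return $delay;
}

my $delay = 1;
for my $failed (1, 1, 1, 0, 0) {
    $delay = adjust_delay($delay, $failed);
    printf "%s -> delay %.4f\n", ($failed ? "fail" : "ok  "), $delay;
}
```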
@ -5094,7 +5121,7 @@ sub usage() {
# https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice # https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice
# https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt # https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt
# You accept to be put in a public hall of shame by removing # You accept to be put in a public hall of shame by removing
# the lines. # these lines.
"This helps funding further development; AND IT WON'T COST YOU A CENT.", "This helps funding further development; AND IT WON'T COST YOU A CENT.",
"If you pay 10000 EUR you should feel free to use GNU Parallel without citing.", "If you pay 10000 EUR you should feel free to use GNU Parallel without citing.",
"", "",
@ -5127,7 +5154,7 @@ sub citation_notice() {
# https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice and # https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice and
# https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt # https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt
# You accept to be put in a public hall of shame by # You accept to be put in a public hall of shame by
# removing the lines. # removing these lines.
"This helps funding further development; AND IT WON'T COST YOU A CENT.", "This helps funding further development; AND IT WON'T COST YOU A CENT.",
"If you pay 10000 EUR you should feel free to use GNU Parallel without citing.", "If you pay 10000 EUR you should feel free to use GNU Parallel without citing.",
"", "",
@ -5265,7 +5292,7 @@ sub citation() {
# https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice and # https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice and
# https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt # https://git.savannah.gnu.org/cgit/parallel.git/tree/doc/citation-notice-faq.txt
# You accept to be put in a public hall of shame by removing # You accept to be put in a public hall of shame by removing
# the lines. # these lines.
"This helps funding further development; AND IT WON'T COST YOU A CENT.", "This helps funding further development; AND IT WON'T COST YOU A CENT.",
"If you pay 10000 EUR you should feel free to use GNU Parallel without citing.", "If you pay 10000 EUR you should feel free to use GNU Parallel without citing.",
"", "",
@ -5839,7 +5866,7 @@ sub which(@) {
# ash bash csh dash fdsh fish fizsh ksh ksh93 mksh pdksh # ash bash csh dash fdsh fish fizsh ksh ksh93 mksh pdksh
# posh rbash rc rush rzsh sash sh static-sh tcsh yash zsh # posh rbash rc rush rzsh sash sh static-sh tcsh yash zsh
my @shells = (qw(ash bash bsd-csh csh dash fdsh fish fizsh ksh my @shells = (qw(ash bash bsd-csh csh dash fdsh fish fizsh ksh
ksh93 lksh mksh pdksh posh rbash rc rush rzsh sash sh ksh93 lksh mksh pdksh posh rbash rc rush rzsh sash sh
static-sh tcsh yash zsh -sh -csh -bash), static-sh tcsh yash zsh -sh -csh -bash),
'-sh (sh)' # sh on FreeBSD '-sh (sh)' # sh on FreeBSD
@ -8665,16 +8692,16 @@ sub empty_input_wrapper($) {
# Returns: # Returns:
# $wrapped_command = the wrapped command # $wrapped_command = the wrapped command
my $command = shift; my $command = shift;
# The optimal block size differs
# It has been measured on:
# AMD 6376: 59000
# <big ppar --pipe --block 100M --test $1 -j1 'cat >/dev/null';
my $script = my $script =
::spacefree(0,q{ ::spacefree(0,q{
if(sysread(STDIN, $buf, 1)) { if(sysread(STDIN, $buf, 1)) {
open($fh, "|-", @ARGV) || die; open($fh, "|-", @ARGV) || die;
syswrite($fh, $buf); syswrite($fh, $buf);
# Align up to 128k block while($read = sysread(STDIN, $buf, 59000)) {
if($read = sysread(STDIN, $buf, 131071)) {
syswrite($fh, $buf);
}
while($read = sysread(STDIN, $buf, 131072)) {
syswrite($fh, $buf); syswrite($fh, $buf);
} }
close $fh; close $fh;
@ -9098,6 +9125,11 @@ sub total_failed($) {
# * cat > fifo # * cat > fifo
# * waitpid to get the exit code from $command # * waitpid to get the exit code from $command
# * be less than 1000 chars long # * be less than 1000 chars long
# The optimal block size differs
# It has been measured on:
# AMD 6376: 4095
# ppar -a big --pipepart --block -1 --test $1 --fifo 'cat {} >/dev/null';
$script = "perl -e '". $script = "perl -e '".
(::spacefree (::spacefree
(0, q{ (0, q{
@ -9108,7 +9140,7 @@ sub total_failed($) {
$pid = fork || exec $s, "-c", $c; $pid = fork || exec $s, "-c", $c;
open($o,">",$f) || die $!; open($o,">",$f) || die $!;
# cat > $PARALLEL_TMP # cat > $PARALLEL_TMP
while(sysread(STDIN,$buf,131072)){ while(sysread(STDIN,$buf,4095)){
syswrite $o, $buf; syswrite $o, $buf;
} }
close $o; close $o;
@ -9207,7 +9239,7 @@ sub wrapped($) {
# --pipepart: prepend: # --pipepart: prepend:
# < /tmp/foo perl -e 'while(@ARGV) { # < /tmp/foo perl -e 'while(@ARGV) {
# sysseek(STDIN,shift,0) || die; $left = shift; # sysseek(STDIN,shift,0) || die; $left = shift;
# while($read = sysread(STDIN,$buf, ($left > 131072 ? 131072 : $left))){ # while($read = sysread(STDIN,$buf, ($left > 60800 ? 60800 : $left))){
# $left -= $read; syswrite(STDOUT,$buf); # $left -= $read; syswrite(STDOUT,$buf);
# } # }
# }' 0 0 0 11 | # }' 0 0 0 11 |
@ -9646,7 +9678,7 @@ sub fill_templates($) {
# Returns: # Returns:
# @templates - File names of replaced templates # @templates - File names of replaced templates
my $self = shift; my $self = shift;
if(%opt::template) { if(%opt::template) {
my @template_name = my @template_name =
map { $self->{'commandline'}->replace_placeholders([$_],0,0) } map { $self->{'commandline'}->replace_placeholders([$_],0,0) }
@ -10663,11 +10695,21 @@ sub print_linebuffer($) {
my ($buf,$i,$rv); my ($buf,$i,$rv);
# 1310720 gives 1.2 GB/s # 1310720 gives 1.2 GB/s
# 131072 gives 0.9 GB/s # 131072 gives 0.9 GB/s
while($rv = sysread($in_fh, $buf,1310720)) { # The optimal block size differs
# It has been measured on:
# AMD 6376: 60800 (>70k is also reasonable)
# Intel i7-3632QM: 52-59k, 170-175k
# seq 64 | ppar --test $1 --lb 'yes {} `seq 1000`|head -c 10000000' >/dev/null
while($rv = sysread($in_fh, $buf, 60800)) {
$outputlength += $rv; $outputlength += $rv;
# TODO --recend # TODO --recend
# Treat both \n and \r as line end # Treat both \n and \r as line end
$i = ::max((rindex($buf,"\n")+1), (rindex($buf,"\r")+1)); # Only test for \r if there is no \n
# Test:
# perl -e '$a="x"x1000000;
# $b="$a\r$a\n$a\r$a\n";
# map { print $b,$_ } 1..10'
$i = ((rindex($buf,"\n")+1) || (rindex($buf,"\r")+1));
if($i) { if($i) {
# One or more complete lines were found # One or more complete lines were found
if($opt::tag or defined $opt::tagstring) { if($opt::tag or defined $opt::tagstring) {
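Note on this hunk: this is the change named in the commit title. The old expression always scanned the buffer twice (once for "\n", once for "\r") and took the larger position. The new one scans for "\r" only when no "\n" is found. The sketch below compares the two expressions outside GNU Parallel. The sub names are hypothetical, and List::Util's max stands in for the ::max used in the old line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Old approach: always scan for both "\n" and "\r".
sub line_end_both {
    my ($buf) = @_;
    return max(rindex($buf, "\n") + 1, rindex($buf, "\r") + 1);
}

# New approach: rindex returns -1 when nothing is found, so "+1"
# gives 0 (false) and "||" falls through to the "\r" scan only then.
sub line_end_short_circuit {
    my ($buf) = @_;
    return (rindex($buf, "\n") + 1) || (rindex($buf, "\r") + 1);
}

for my $buf ("abc\ndef", "abc\rdef", "abcdef", "ab\ncd\ref") {
    (my $shown = $buf) =~ s/\n/\\n/g;
    $shown =~ s/\r/\\r/g;
    printf "%-12s both=%d short=%d\n",
        $shown, line_end_both($buf), line_end_short_circuit($buf);
}
```

The two expressions differ only when a "\r" follows the last "\n" in the buffer (the last demo case): the old one returns the position after the "\r", the new one the position after the "\n". Presumably the text after that point simply waits for the next read, but that depends on code outside this hunk.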
@ -10839,7 +10881,8 @@ sub print_normal($) {
} }
} else { } else {
# Most efficient way of copying data from $in_fh to $out_fh # Most efficient way of copying data from $in_fh to $out_fh
while(sysread($in_fh,$buf,131072)) { # Intel i7-3632QM: 25k-
while(sysread($in_fh,$buf,32767)) {
print $out_fh $buf; print $out_fh $buf;
$outputlength += length $buf; $outputlength += length $buf;
if($Global::membuffer) { if($Global::membuffer) {
@ -10887,7 +10930,7 @@ sub print_results($) {
} }
} else { } else {
# Most efficient way of copying data from $in_fh to $out_fh # Most efficient way of copying data from $in_fh to $out_fh
while(sysread($in_fh,$buf,131072)) { while(sysread($in_fh,$buf,60000)) {
$outputlength += length $buf; $outputlength += length $buf;
push @{$self->{'output'}{$fdno}}, $buf; push @{$self->{'output'}{$fdno}}, $buf;
} }


@ -128,10 +128,11 @@ B<Zsh, Fish, Ksh, and Pdksh functions and aliases>: Use B<env_parallel>.
=item B<{}> =item B<{}>
Input line. This replacement string will be replaced by a full line Input line.
read from the input source. The input source is normally stdin
(standard input), but can also be given with B<-a>, B<:::>, or This replacement string will be replaced by a full line read from the
B<::::>. input source. The input source is normally stdin (standard input), but
can also be given with B<-a>, B<:::>, or B<::::>.
The replacement string B<{}> can be changed with B<-I>. The replacement string B<{}> can be changed with B<-I>.
@ -142,17 +143,21 @@ Replacement strings are normally quoted, so special characters are not
parsed by the shell. The exception is if the command starts with a parsed by the shell. The exception is if the command starts with a
replacement string; then the string is not quoted. replacement string; then the string is not quoted.
See also: B<--plus> B<{.}> B<{/}> B<{//}> B<{/.}> B<{#}> B<{%}>
B<{>I<n>B<}> B<{=>I<perl expression>B<=}>
=item B<{.}> =item B<{.}>
Input line without extension. This replacement string will be replaced Input line without extension.
by the input with the extension removed. If the input line contains
B<.> after the last B</>, the last B<.> until the end of the string This replacement string will be replaced by the input with the
will be removed and B<{.}> will be replaced with the extension removed. If the input line contains B<.> after the last
remaining. E.g. I<foo.jpg> becomes I<foo>, I<subdir/foo.jpg> becomes B</>, the last B<.> until the end of the string will be removed and
I<subdir/foo>, I<sub.dir/foo.jpg> becomes I<sub.dir/foo>, B<{.}> will be replaced with the remaining. E.g. I<foo.jpg> becomes
I<sub.dir/bar> remains I<sub.dir/bar>. If the input line does not I<foo>, I<subdir/foo.jpg> becomes I<subdir/foo>, I<sub.dir/foo.jpg>
contain B<.> it will remain unchanged. becomes I<sub.dir/foo>, I<sub.dir/bar> remains I<sub.dir/bar>. If the
input line does not contain B<.> it will remain unchanged.
The replacement string B<{.}> can be changed with B<--er>. The replacement string B<{.}> can be changed with B<--er>.
@ -161,8 +166,10 @@ To understand replacement strings see B<{}>.
=item B<{/}> =item B<{/}>
Basename of input line. This replacement string will be replaced by Basename of input line.
the input with the directory part removed.
This replacement string will be replaced by the input with the
directory part removed.
The replacement string B<{/}> can be changed with The replacement string B<{/}> can be changed with
B<--basenamereplace>. B<--basenamereplace>.
@ -172,8 +179,10 @@ To understand replacement strings see B<{}>.
=item B<{//}> =item B<{//}>
Dirname of input line. This replacement string will be replaced by the Dirname of input line.
dir of the input line. See B<dirname>(1).
This replacement string will be replaced by the dir of the input
line. See B<dirname>(1).
The replacement string B<{//}> can be changed with The replacement string B<{//}> can be changed with
B<--dirnamereplace>. B<--dirnamereplace>.
@ -183,9 +192,11 @@ To understand replacement strings see B<{}>.
=item B<{/.}> =item B<{/.}>
Basename of input line without extension. This replacement string will Basename of input line without extension.
be replaced by the input with the directory and extension part
removed. It is a combination of B<{/}> and B<{.}>. This replacement string will be replaced by the input with the
directory and extension part removed. It is a combination of B<{/}>
and B<{.}>.
The replacement string B<{/.}> can be changed with The replacement string B<{/.}> can be changed with
B<--basenameextensionreplace>. B<--basenameextensionreplace>.
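Note on these hunks: the path-based replacement strings {.}, {/}, {//} and {/.} can be illustrated with a few lines of Perl. This is a sketch of the documented behaviour using the examples from the text above. The sub names are hypothetical and the regular expressions are an assumption, not necessarily the ones GNU Parallel uses internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helpers mirroring the documented replacement strings.
sub no_ext      { my ($s) = @_; $s =~ s{\.[^/.]+\z}{}; return $s }  # {.}
sub base        { my ($s) = @_; $s =~ s{^.*/}{};       return $s }  # {/}
sub dir         { my ($s) = @_; return dirname($s) }                # {//}
sub base_no_ext { my ($s) = @_; return no_ext(base($s)) }           # {/.}

for my $input (qw(foo.jpg subdir/foo.jpg sub.dir/foo.jpg sub.dir/bar)) {
    printf "%-16s {.}=%-12s {/}=%-8s {//}=%-8s {/.}=%s\n",
        $input, no_ext($input), base($input), dir($input),
        base_no_ext($input);
}
```

The extension pattern excludes "/" so that a dot in a directory name (sub.dir/bar) is not mistaken for an extension.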
@ -195,9 +206,10 @@ To understand replacement strings see B<{}>.
=item B<{#}> =item B<{#}>
Sequence number of the job to run. This replacement string will be Sequence number of the job to run.
replaced by the sequence number of the job being run. It contains the
same number as $PARALLEL_SEQ. This replacement string will be replaced by the sequence number of the
job being run. It contains the same number as $PARALLEL_SEQ.
The replacement string B<{#}> can be changed with B<--seqreplace>. The replacement string B<{#}> can be changed with B<--seqreplace>.
@ -206,10 +218,11 @@ To understand replacement strings see B<{}>.
=item B<{%}> =item B<{%}>
Job slot number. This replacement string will be replaced by the job's Job slot number.
slot number between 1 and number of jobs to run in parallel. There
will never be 2 jobs running at the same time with the same job slot This replacement string will be replaced by the job's slot number
number. between 1 and number of jobs to run in parallel. There will never be 2
jobs running at the same time with the same job slot number.
The replacement string B<{%}> can be changed with B<--slotreplace>. The replacement string B<{%}> can be changed with B<--slotreplace>.
@ -242,14 +255,18 @@ To understand replacement strings see B<{}>.
=item B<{>I<n>B<}> =item B<{>I<n>B<}>
Argument from input source I<n> or the I<n>'th argument. This Argument from input source I<n> or the I<n>'th argument.
positional replacement string will be replaced by the input from input
source I<n> (when used with B<-a> or B<::::>) or with the I<n>'th This positional replacement string will be replaced by the input from
argument (when used with B<-N>). If I<n> is negative it refers to the input source I<n> (when used with B<-a> or B<::::>) or with the
I<n>'th last argument. I<n>'th argument (when used with B<-N>). If I<n> is negative it refers
to the I<n>'th last argument.
To understand replacement strings see B<{}>. To understand replacement strings see B<{}>.
See also: B<{}> B<{>I<n>.B<}> B<{>I<n>/B<}> B<{>I<n>//B<}>
B<{>I<n>/.B<}>
=item B<{>I<n>.B<}> =item B<{>I<n>.B<}>
@ -305,11 +322,12 @@ To understand positional replacement strings see B<{>I<n>B<}>.
=item B<{=>I<perl expression>B<=}> =item B<{=>I<perl expression>B<=}>
Replace with calculated I<perl expression>. B<$_> will contain the Replace with calculated I<perl expression>.
same as B<{}>. After evaluating I<perl expression> B<$_> will be used
as the value. It is recommended to only change $_ but you have full B<$_> will contain the same as B<{}>. After evaluating I<perl
access to all of GNU B<parallel>'s internal functions and data expression> B<$_> will be used as the value. It is recommended to only
structures. change $_ but you have full access to all of GNU B<parallel>'s
internal functions and data structures.
The expression must give the same result if evaluated twice - The expression must give the same result if evaluated twice -
otherwise the behaviour is undefined. E.g. this will not work as expected: otherwise the behaviour is undefined. E.g. this will not work as expected:
@ -386,7 +404,7 @@ See also: B<--rpl> B<--parens>
Positional equivalent to B<{=perl expression=}>. To understand Positional equivalent to B<{=perl expression=}>. To understand
positional replacement strings see B<{>I<n>B<}>. positional replacement strings see B<{>I<n>B<}>.
See also: B<{=perl expression=}> B<{>I<n>B<}>. See also: B<{=perl expression=}> B<{>I<n>B<}>
=item B<:::> I<arguments> =item B<:::> I<arguments>
@ -405,7 +423,7 @@ The following are equivalent:
parallel ::: "gzip file1" "gzip file2" parallel ::: "gzip file1" "gzip file2"
To avoid treating B<:::> as special use B<--arg-sep> to set the To avoid treating B<:::> as special use B<--arg-sep> to set the
argument separator to something else. See also B<--arg-sep>. argument separator to something else.
If multiple B<:::> are given, each group will be treated as an input If multiple B<:::> are given, each group will be treated as an input
source, and all combinations of input sources will be source, and all combinations of input sources will be
@ -427,6 +445,8 @@ B<:::> and B<::::> can be mixed. So these are equivalent:
seq 4 5 | parallel echo {1} {2} {3} :::: <(seq 6 7) - \ seq 4 5 | parallel echo {1} {2} {3} :::: <(seq 6 7) - \
::: 1 2 3 ::: 1 2 3
See also: B<--arg-sep>
=item B<:::+> I<arguments> =item B<:::+> I<arguments>
@ -446,7 +466,7 @@ Another way to write B<-a> I<argfile1> B<-a> I<argfile2> ...
B<:::> and B<::::> can be mixed. B<:::> and B<::::> can be mixed.
See B<-a>, B<:::> and B<--link>. See also: B<-a> B<:::> B<--link>
=item B<::::+> I<argfiles> =item B<::::+> I<argfiles>
@ -465,6 +485,10 @@ Use NUL as delimiter. Normally input lines will end in \n
(newline). If they end in \0 (NUL), then use this option. It is useful (newline). If they end in \0 (NUL), then use this option. It is useful
for processing arguments that may contain \n (newline). for processing arguments that may contain \n (newline).
Shortcut for B<-d '\0'>.
See also: B<-d>
=item B<--arg-file> I<input-file> =item B<--arg-file> I<input-file>
@ -481,7 +505,7 @@ contains B<a b c>. B<-a foo> B<-a bar> will result in the combinations
(1,a) (1,b) (1,c) (2,a) (2,b) (2,c). This is useful for replacing (1,a) (1,b) (1,c) (2,a) (2,b) (2,c). This is useful for replacing
nested for-loops. nested for-loops.
See also: B<--link> and B<{>I<n>B<}>. See also: B<--link> B<{>I<n>B<}>
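Note on this hunk: the text describes how several input sources are combined into all combinations, which is why they can replace nested for-loops. A minimal sketch of that combination step follows. The combinations helper is hypothetical, and the file contents are inferred from the listed result: 1 2 for foo (an assumption) and a b c for bar.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of GNU Parallel):
# takes a list of array refs (one per input source) and returns
# every combination, with the last source varying fastest.
sub combinations {
    my @sources = @_;
    my @result = ([]);
    for my $source (@sources) {
        @result = map {
            my $prefix = $_;
            map { [@$prefix, $_] } @$source;
        } @result;
    }
    return @result;
}

my @foo = (1, 2);            # assumed contents of file "foo"
my @bar = qw(a b c);         # contents of file "bar"
print join(' ', map { '(' . join(',', @$_) . ')' }
           combinations(\@foo, \@bar)), "\n";
```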
=item B<--arg-file-sep> I<sep-str> =item B<--arg-file-sep> I<sep-str>
@ -490,7 +514,7 @@ Use I<sep-str> instead of B<::::> as separator string between command
and argument files. Useful if B<::::> is used for something else by the and argument files. Useful if B<::::> is used for something else by the
command. command.
See also: B<::::>. See also: B<::::>
=item B<--arg-sep> I<sep-str> =item B<--arg-sep> I<sep-str>
@ -502,7 +526,7 @@ Also useful if your command uses B<:::> but you still want to read
arguments from stdin (standard input): Simply change B<--arg-sep> to a arguments from stdin (standard input): Simply change B<--arg-sep> to a
string that is not in the command line. string that is not in the command line.
See also: B<:::>. See also: B<:::>
=item B<--bar> =item B<--bar>
@ -546,8 +570,12 @@ Use the replacement string I<replace-str> instead of B<{/.}> for basename of inp
Use I<binexpr> as binning key and bin input to the jobs. Use I<binexpr> as binning key and bin input to the jobs.
I<binexpr> is [column number|column name] [perlexpression] e.g. 3, I<binexpr> is [column number|column name] [perlexpression] e.g.:
Address, 3 $_%=100, Address s/\D//g.
3
Address
3 $_%=100
Address s/\D//g
Each input line is split using B<--colsep>. The value of the column is Each input line is split using B<--colsep>. The value of the column is
put into $_, the perl expression is executed, the resulting value is put into $_, the perl expression is executed, the resulting value is
@ -563,7 +591,9 @@ I<bincol> is small (<10), slower if it is big (>100).
B<--bin> requires B<--pipe> and a fixed numeric value for B<--jobs>. B<--bin> requires B<--pipe> and a fixed numeric value for B<--jobs>.
See also: B<--shard>, B<--group-by>, B<--roundrobin>. See the section: SPREADING BLOCKS OF DATA.
See also: B<--group-by> B<--roundrobin> B<--shard>
=item B<--bg> =item B<--bg>
@ -572,10 +602,11 @@ Run command in background thus GNU B<parallel> will not wait for
completion of the command before exiting. This is the default if completion of the command before exiting. This is the default if
B<--semaphore> is set. B<--semaphore> is set.
See also: B<--fg>, B<man sem>.
Implies B<--semaphore>. Implies B<--semaphore>.
See also: B<--fg> B<man sem>
=cut =cut
# You accept to be added to a public hall of shame by # You accept to be added to a public hall of shame by
@ -643,11 +674,8 @@ Time out for reading block when using B<--pipe>. If it takes longer
than I<duration> to read a full block, use the partial block read so than I<duration> to read a full block, use the partial block read so
far. far.
I<duration> must be in whole seconds, but can be expressed as floats I<duration> is in seconds, but can be postfixed with s, m, h, or d
postfixed with B<s>, B<m>, B<h>, or B<d> which would multiply the (see the section TIME POSTFIXES).
float by 1, 60, 3600, or 86400. Thus these are equivalent:
B<--blocktimeout 100000> and B<--blocktimeout 1d3.5h16.6m4s>.
=item B<--cat> =item B<--cat>
@ -659,7 +687,7 @@ you can do: B<parallel --pipe --cat wc {}>.
Implies B<--pipe> unless B<--pipepart> is used. Implies B<--pipe> unless B<--pipepart> is used.
See also: B<--fifo>. See also: B<--fifo>
=item B<--cleanup> =item B<--cleanup>
@ -706,15 +734,18 @@ https://perldoc.perl.org/perlre.html
=item B<--compress> =item B<--compress>
Compress temporary files. If the output is big and very compressible Compress temporary files.
this will take up less disk space in $TMPDIR and possibly be faster
due to less disk I/O. If the output is big and very compressible this will take up less disk
space in $TMPDIR and possibly be faster due to less disk I/O.
GNU B<parallel> will try B<pzstd>, B<lbzip2>, B<pbzip2>, B<zstd>, GNU B<parallel> will try B<pzstd>, B<lbzip2>, B<pbzip2>, B<zstd>,
B<pigz>, B<lz4>, B<lzop>, B<plzip>, B<lzip>, B<lrz>, B<gzip>, B<pxz>, B<pigz>, B<lz4>, B<lzop>, B<plzip>, B<lzip>, B<lrz>, B<gzip>, B<pxz>,
B<lzma>, B<bzip2>, B<xz>, B<clzip>, in that order, and use the first B<lzma>, B<bzip2>, B<xz>, B<clzip>, in that order, and use the first
available. available.
See also: B<--compress-program>
=item B<--compress-program> I<prg> =item B<--compress-program> I<prg>
@ -745,21 +776,25 @@ When used with B<--pipe> only pass full CSV-records.
=item B<--ctag> I<str> =item B<--ctag> I<str>
Color tag. See B<--tag>. Color tag.
See also: B<--tag>
=item B<--ctagstring> I<str> =item B<--ctagstring> I<str>
Color tagstring. See B<--tagstring>. Color tagstring.
See also: B<--tagstring>
=item B<--delay> I<mytime> =item B<--delay> I<mytime>
Delay starting next job by I<mytime>. GNU B<parallel> will pause Delay starting next job by I<mytime>.
I<mytime> after starting each job. I<mytime> is normally in seconds,
but can be floats postfixed with B<s>, B<m>, B<h>, or B<d> which would GNU B<parallel> will pause I<mytime> after starting each
multiply the float by 1, 60, 3600, or 86400. Thus these are job. I<mytime> is in seconds, but can be postfixed with s, m, h, or d
equivalent: B<--delay 100000> and B<--delay 1d3.5h16.6m4s>. (see the section TIME POSTFIXES).
If you append 'auto' to I<mytime> (e.g. 13m3sauto) GNU B<parallel> will If you append 'auto' to I<mytime> (e.g. 13m3sauto) GNU B<parallel> will
automatically try to find the optimal value: If a job fails, I<mytime> automatically try to find the optimal value: If a job fails, I<mytime>
@ -770,13 +805,12 @@ is doubled. If a job succeeds, I<mytime> is decreased by 10%.
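The 'auto' rule is simple multiplicative back-off. The sketch below only models the arithmetic described above; the function name C<next_delay> and the three-decimal output format are assumptions made for illustration, not part of GNU B<parallel>.

```shell
# next_delay CURRENT STATUS: print the delay (in seconds) to use after a job
# finished. STATUS is the job's exit status (0 = success).
next_delay() {
  awk -v d="$1" -v st="$2" 'BEGIN {
    if (st != 0) d = d * 2      # a failed job doubles the delay
    else         d = d * 0.9    # a successful job decreases it by 10%
    printf "%.3f\n", d
  }'
}

next_delay 10 1   # after a failed job
next_delay 10 0   # after a successful job
```

Because a failure doubles the delay while a success only trims it by a tenth, the value backs off quickly under errors and creeps down slowly once jobs succeed again.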
=item B<-d> I<delim> =item B<-d> I<delim>
Input items are terminated by I<delim>. Quotes and backslash are not Input items are terminated by I<delim>.
special; every character in the input is taken literally. Disables
the end-of-file string, which is treated like any other argument. The The specified delimiter may be characters, C-style character escapes
specified delimiter may be characters, C-style character escapes such such as \n, or octal or hexadecimal escape codes. Octal and
as \n, or octal or hexadecimal escape codes. Octal and hexadecimal hexadecimal escape codes are understood as for the printf command.
escape codes are understood as for the printf command. Multibyte Multibyte characters are not supported.
characters are not supported.
=item B<--dirnamereplace> I<replace-str> =item B<--dirnamereplace> I<replace-str>
@ -841,7 +875,7 @@ variables except for the ones mentioned in ~/.parallel/ignored_vars.
To copy the full environment (both exported and not exported To copy the full environment (both exported and not exported
variables, arrays, and functions) use B<env_parallel>. variables, arrays, and functions) use B<env_parallel>.
See also: B<--record-env>, B<--session>. See also: B<--record-env> B<--session>
=item B<--eta> =item B<--eta>
@ -855,7 +889,7 @@ estimate will only be shown when the first job has finished.
Implies B<--progress>. Implies B<--progress>.
See also: B<--bar>, B<--progress>. See also: B<--bar> B<--progress>
=item B<--fg> =item B<--fg>
@ -870,21 +904,27 @@ foreground (opposite B<--bg>), and wait for completion of the command
before exiting. before exiting.
See also: B<--bg>, B<man sem>. See also: B<--bg> B<man sem>
=item B<--fifo> =item B<--fifo>
Create a temporary fifo with content. Normally B<--pipe> and Create a temporary fifo with content.
B<--pipepart> will give data to the program on stdin (standard
input). With B<--fifo> GNU B<parallel> will create a temporary fifo
with the name in B<{}>, so you can do: B<parallel --pipe --fifo wc {}>.
Beware: If data is not read from the fifo, the job will block forever. Normally B<--pipe> and B<--pipepart> will give data to the program on
stdin (standard input). With B<--fifo> GNU B<parallel> will create a
temporary fifo with the name in B<{}>, so you can do:
parallel --pipe --fifo wc {}
Beware: If the fifo is never opened for reading, the job will block forever:
seq 1000000 | parallel --fifo echo This will block
seq 1000000 | parallel --fifo 'echo This will not block < {}'
Implies B<--pipe> unless B<--pipepart> is used. Implies B<--pipe> unless B<--pipepart> is used.
See also: B<--cat>. See also: B<--cat>
=item B<--filter> I<filter> =item B<--filter> I<filter>
@ -1005,7 +1045,9 @@ UserID when grouping:
cat table.csv | parallel --pipe --colsep , --header : \ cat table.csv | parallel --pipe --colsep , --header : \
--group-by 'UserID s/\D//g' -kN1 wc --group-by 'UserID s/\D//g' -kN1 wc
See also: B<--shard>, B<--roundrobin>. See the section: SPREADING BLOCKS OF DATA.
See also: B<--bin> B<--shard> B<--roundrobin>
=item B<--help> =item B<--help>
@ -1145,7 +1187,7 @@ B<my_grp1_arg> may be run on either B<myserver1> or B<myserver2>,
B<third> may be run on either B<myserver1> or B<myserver3>, B<third> may be run on either B<myserver1> or B<myserver3>,
but B<arg_for_grp2> will only be run on B<myserver2>. but B<arg_for_grp2> will only be run on B<myserver2>.
See also: B<--sshlogin>, B<$PARALLEL_HOSTGROUPS>, B<$PARALLEL_ARGHOSTGROUPS>. See also: B<--sshlogin> B<$PARALLEL_HOSTGROUPS> B<$PARALLEL_ARGHOSTGROUPS>
=item B<-I> I<replace-str> =item B<-I> I<replace-str>
@ -1185,7 +1227,7 @@ If the host is long, you can use B<column -t> to pretty print it:
cat joblog | column -t cat joblog | column -t
See also: B<--resume> B<--resume-failed>. See also: B<--resume> B<--resume-failed>
=item B<--jobs> I<N> =item B<--jobs> I<N>
@ -1234,9 +1276,10 @@ B<--use-sockets-instead-of-threads>.
=item B<-P> I<-N> =item B<-P> I<-N>
Subtract N from the number of CPUs. Run this many jobs in parallel. Subtract N from the number of CPUs. Run this many jobs in parallel.
If the evaluated number is less than 1 then 1 will be used. See also If the evaluated number is less than 1 then 1 will be used.
B<--use-cores-instead-of-threads> and
B<--use-sockets-instead-of-threads>. See also: B<--use-cores-instead-of-threads>
B<--use-sockets-instead-of-threads>
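The B<-P> I<-N> rule, including the lower bound of 1, amounts to the following arithmetic. This is a sketch under assumptions: C<jobs_minus> is a hypothetical helper, and the CPU count is passed in explicitly rather than detected the way GNU B<parallel> detects it.

```shell
# jobs_minus N NCPU: number of CPUs minus N, but never fewer than 1.
jobs_minus() {
  n=$(( $2 - $1 ))
  if [ "$n" -lt 1 ]; then
    n=1               # an evaluated number below 1 is replaced by 1
  fi
  echo "$n"
}

jobs_minus 2 8    # 8 CPUs, -P -2
jobs_minus 16 8   # subtracting more than there are CPUs still gives 1 job
```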
=item B<--jobs> I<N>% =item B<--jobs> I<N>%
@ -1248,8 +1291,10 @@ B<--use-sockets-instead-of-threads>.
=item B<-P> I<N>% =item B<-P> I<N>%
Multiply N% with the number of CPUs. Run this many jobs in Multiply N% with the number of CPUs. Run this many jobs in
parallel. See also B<--use-cores-instead-of-threads> and parallel.
B<--use-sockets-instead-of-threads>.
See also: B<--use-cores-instead-of-threads>
B<--use-sockets-instead-of-threads>
=item B<--jobs> I<procfile> =item B<--jobs> I<procfile>
@ -1284,8 +1329,13 @@ to see the difference:
If used with B<--onall> or B<--nonall> the output will be grouped by If used with B<--onall> or B<--nonall> the output will be grouped by
sshlogin in sorted order. sshlogin in sorted order.
If used with B<--pipe --roundrobin> and the same input, the jobslots B<--keep-order> cannot keep the output order when used with B<--pipe
will get the same blocks in the same order in every run. --roundrobin>. Here it instead means that the jobslots will get the
same blocks as input in the same order in every run if the input is
kept the same. Run each of these twice and compare:
seq 10000000 | parallel --pipe --roundrobin 'sleep 0.$RANDOM; wc'
seq 10000000 | parallel --pipe -k --roundrobin 'sleep 0.$RANDOM; wc'
B<-k> only affects the order in which the output is printed - not the B<-k> only affects the order in which the output is printed - not the
order in which jobs are run. order in which jobs are run.
@ -1403,12 +1453,12 @@ See also: B<--group> B<--ungroup>
=item B<--link> =item B<--link>
Link input sources. Read multiple input sources like B<xapply>. If Link input sources. Read multiple input sources like the command
multiple input sources are given, one argument will be read from each B<xapply>. If multiple input sources are given, one argument will be
of the input sources. The arguments can be accessed in the command as read from each of the input sources. The arguments can be accessed in
B<{1}> .. B<{>I<n>B<}>, so B<{1}> will be a line from the first input the command as B<{1}> .. B<{>I<n>B<}>, so B<{1}> will be a line from
source, and B<{6}> will refer to the line with the same line number the first input source, and B<{6}> will refer to the line with the
from the 6th input source. same line number from the 6th input source.
Compare these two: Compare these two:
@ -1458,19 +1508,21 @@ most likely do what is needed.
=item B<--memfree> I<size> =item B<--memfree> I<size>
Minimum memory free when starting another job. The I<size> can be Minimum memory free when starting another job.
postfixed with K, M, G, T, P, k, m, g, t, or p (see UNIT PREFIX).
The I<size> can be postfixed with K, M, G, T, P, k, m, g, t, or p (see
UNIT PREFIX).
If the jobs take up very different amounts of RAM, GNU B<parallel> will If the jobs take up very different amounts of RAM, GNU B<parallel> will
only start as many as there is memory for. If less than I<size> bytes only start as many as there is memory for. If less than I<size> bytes
are free, no more jobs will be started. If less than 50% I<size> bytes are free, no more jobs will be started. If less than 50% I<size> bytes
are free, the youngest job will be killed, and put back on the queue are free, the youngest job will be killed (as per B<--termseq>), and
to be run later. put back on the queue to be run later.
B<--retries> must be set to determine how many times GNU B<parallel> B<--retries> must be set to determine how many times GNU B<parallel>
should retry a given job. should retry a given job.
See also: B<--memsuspend> See also: B<--termseq>, B<--retries>, B<--memsuspend>
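The two B<--memfree> thresholds can be read as a three-way decision. The sketch below restates that rule in shell. The function name C<memfree_action> and the labels it prints are hypothetical, and both values are assumed to be plain byte counts (no K/M/G postfix).

```shell
# memfree_action FREE SIZE: print what the --memfree rule implies.
memfree_action() {
  free=$1
  size=$2
  if [ $(( free * 2 )) -lt "$size" ]; then
    echo kill-youngest   # below 50% of size: kill the youngest job and requeue it
  elif [ "$free" -lt "$size" ]; then
    echo hold            # below size: start no more jobs
  else
    echo start           # enough free memory: another job may start
  fi
}

memfree_action 400 1000
memfree_action 700 1000
memfree_action 1500 1000
```

A job that is killed this way is only run again if B<--retries> allows it, which is why that option has to be set.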
=item B<--memsuspend> I<size> =item B<--memsuspend> I<size>
@ -1595,29 +1647,30 @@ Spread input to jobs on stdin (standard input). Read a block of data
from stdin (standard input) and give one block of data as input to one from stdin (standard input) and give one block of data as input to one
job. job.
The block size is determined by B<--block>. The strings B<--recstart> The block size is determined by B<--block> (default: 1M). The strings
and B<--recend> tell GNU B<parallel> how a record starts and/or B<--recstart> and B<--recend> tell GNU B<parallel> how a record starts
ends. The block read will have the final partial record removed before and/or ends. The block read will have the final partial record removed
the block is passed on to the job. The partial record will be before the block is passed on to the job. The partial record will be
prepended to next block. prepended to next block.
If B<--recstart> is given this will be used to split at record start. You can limit the number of records to be passed with B<-N>, and set
the record size with B<-L>.
If B<--recend> is given this will be used to split at record end.
If both B<--recstart> and B<--recend> are given both will have to
match to find a split position.
If neither B<--recstart> nor B<--recend> are given B<--recend>
defaults to '\n'. To have no record separator use B<--recend "">.
B<--files> is often used with B<--pipe>.
B<--pipe> maxes out at around 1 GB/s input, and 100 MB/s output. If B<--pipe> maxes out at around 1 GB/s input, and 100 MB/s output. If
performance is important use B<--pipepart>. performance is important use B<--pipepart>.
See also: B<--recstart>, B<--recend>, B<--fifo>, B<--cat>, B<--fifo> and B<--cat> will give stdin (standard input) on a fifo or a
B<--pipepart>, B<--files>. temporary file.
If data is arriving slowly, you can use B<--blocktimeout> to finish
reading a block early.
The data can be spread between the jobs in specific ways using
B<--round-robin>, B<--bin>, B<--shard>, B<--group-by>. See the
section: SPREADING BLOCKS OF DATA
See also: B<--block>, B<--blocktimeout>, B<--recstart>, B<--recend>,
B<--fifo>, B<--cat>, B<--pipepart>, B<-N>, B<-L>.
=item B<--pipepart> =item B<--pipepart>
@ -1646,6 +1699,8 @@ where records end.
=back =back
See also: B<--pipe>
=item B<--plain> =item B<--plain>
@ -1805,17 +1860,19 @@ I<profilename> corresponds to the file ~/.parallel/I<profilename>.
You can give multiple profiles by repeating B<--profile>. If parts of You can give multiple profiles by repeating B<--profile>. If parts of
the profiles conflict, the later ones will be used. the profiles conflict, the later ones will be used.
Default: config Default: ~/.parallel/config
=item B<--quote> =item B<--quote>
=item B<-q> =item B<-q>
Quote I<command>. If your command contains special characters that Quote I<command>.
should not be interpreted by the shell (e.g. ; \ | *), use B<--quote> to
escape these. The command must be a simple command (see B<man If your command contains special characters that should not be
bash>) without redirections and without variable assignments. interpreted by the shell (e.g. ; \ | *), use B<--quote> to escape
these. The command must be a simple command (see B<man bash>) without
redirections and without variable assignments.
See the section QUOTING. Most people will not need this. Quoting is See the section QUOTING. Most people will not need this. Quoting is
disabled by default. disabled by default.
@ -1825,7 +1882,8 @@ disabled by default.
=item B<-r> =item B<-r>
If the stdin (standard input) only contains whitespace, do not run the command. If the stdin (standard input) only contains whitespace, do not run the
command.
If used with B<--pipe> this is slow. If used with B<--pipe> this is slow.
@ -1845,13 +1903,15 @@ problem, but both swapping in and out usually indicates a problem.
B<--memfree> and B<--memsuspend> may give better results, so try using B<--memfree> and B<--memsuspend> may give better results, so try using
those first. those first.
See also: B<--memfree> B<--memsuspend>
=item B<--record-env> =item B<--record-env>
Record current environment variables in ~/.parallel/ignored_vars. This Record current environment variables in ~/.parallel/ignored_vars. This
is useful before using B<--env _>. is useful before using B<--env _>.
See also: B<--env>, B<--session>. See also: B<--env> B<--session>
=item B<--recstart> I<startstring> =item B<--recstart> I<startstring>
@ -1867,14 +1927,20 @@ I<endstring>I<startstring> will have to match to find a split
position. This is useful if either I<startstring> or I<endstring> position. This is useful if either I<startstring> or I<endstring>
match in the middle of a record. match in the middle of a record.
If neither B<--recstart> nor B<--recend> are given then B<--recend> If neither B<--recstart> nor B<--recend> are given, then B<--recend>
defaults to '\n'. To have no record separator use B<--recend "">. defaults to '\n'. To have no record separator (e.g. for binary files)
use B<--recend "">.
B<--recstart> and B<--recend> are used with B<--pipe>. B<--recstart> and B<--recend> are used with B<--pipe>.
Use B<--regexp> to interpret B<--recstart> and B<--recend> as regular Use B<--regexp> to interpret B<--recstart> and B<--recend> as regular
expressions. This is slow, however. expressions. This is slow, however.
Use B<--remove-rec-sep> to remove B<--recstart> and B<--recend> before
passing the block to the job.
See also: B<--pipe> B<--regexp> B<--remove-rec-sep>
=item B<--regexp> =item B<--regexp>
@ -1891,7 +1957,7 @@ expressions. This is slow, however.
Remove the text matched by B<--recstart> and B<--recend> before piping Remove the text matched by B<--recstart> and B<--recend> before piping
it to the command. it to the command.
Only used with B<--pipe>. Only used with B<--pipe>/B<--pipepart>.
=item B<--results> I<name> =item B<--results> I<name>
@ -2017,7 +2083,7 @@ will generate the files:
my_foo/stderr my_foo/stderr
my_foo/stdout my_foo/stdout
See also: B<--files>, B<--tag>, B<--header>, B<--joblog>. See also: B<--files> B<--tag> B<--header> B<--joblog>
=item B<--resume> =item B<--resume>
@ -2029,7 +2095,7 @@ sequence numbers in B<--joblog> then the input, the command, and
B<--joblog> all have to remain unchanged; otherwise GNU B<parallel> B<--joblog> all have to remain unchanged; otherwise GNU B<parallel>
may run wrong commands. may run wrong commands.
See also: B<--joblog>, B<--results>, B<--resume-failed>, B<--retries>. See also: B<--joblog> B<--results> B<--resume-failed> B<--retries>
=item B<--resume-failed> =item B<--resume-failed>
@ -2042,7 +2108,7 @@ numbers in B<--joblog> then the input, the command, and B<--joblog>
all have to remain unchanged; otherwise GNU B<parallel> may run wrong all have to remain unchanged; otherwise GNU B<parallel> may run wrong
commands. commands.
See also: B<--joblog>, B<--resume>, B<--retry-failed>, B<--retries>. See also: B<--joblog> B<--resume> B<--retry-failed> B<--retries>
=item B<--retry-failed> =item B<--retry-failed>
@ -2112,7 +2178,7 @@ line:
6 [...] 2 0 echo 5;sleep .5; exit 2 6 [...] 2 0 echo 5;sleep .5; exit 2
4 [...] 1 0 echo 7;sleep .7; exit 1 4 [...] 1 0 echo 7;sleep .7; exit 1
See also: B<--joblog>, B<--resume>, B<--resume-failed>, B<--retries>. See also: B<--joblog> B<--resume> B<--resume-failed> B<--retries>
=item B<--retries> I<n> =item B<--retries> I<n>
@ -2181,7 +2247,9 @@ impossible to track which input block corresponds to which output.
B<--roundrobin> implies B<--pipe>, except if B<--pipepart> is given. B<--roundrobin> implies B<--pipe>, except if B<--pipepart> is given.
See also: B<--group-by>, B<--shard>. See the section: SPREADING BLOCKS OF DATA.
See also: B<--bin> B<--group-by> B<--shard>
=item B<--rpl> 'I<tag> I<perl expression>' =item B<--rpl> 'I<tag> I<perl expression>'
@ -2309,7 +2377,7 @@ Used with B<--fg>, B<--wait>, and B<--semaphorename>.
The command B<sem> is an alias for B<parallel --semaphore>. The command B<sem> is an alias for B<parallel --semaphore>.
See also: B<man sem>. See also: B<man sem>
=item B<--semaphorename> I<name> =item B<--semaphorename> I<name>
@ -2327,20 +2395,25 @@ The semaphore is stored in ~/.parallel/semaphores/
Implies B<--semaphore>. Implies B<--semaphore>.
See also: B<man sem>. See also: B<man sem>
=item B<--semaphoretimeout> I<secs> =item B<--semaphoretimeout> I<secs>
=item B<--st> I<secs> =item B<--st> I<secs>
If I<secs> > 0: If the semaphore is not released within I<secs> seconds, take it anyway. If I<secs> > 0: If the semaphore is not released within I<secs>
seconds, take it anyway.
If I<secs> < 0: If the semaphore is not released within I<secs> seconds, exit. If I<secs> < 0: If the semaphore is not released within I<secs>
seconds, exit.
I<secs> is in seconds, but can be postfixed with s, m, h, or d (see
the section TIME POSTFIXES).
Implies B<--semaphore>. Implies B<--semaphore>.
See also: B<man sem>. See also: B<man sem>
=item B<--seqreplace> I<replace-str> =item B<--seqreplace> I<replace-str>
@ -2357,15 +2430,19 @@ variables with names in B<$PARALLEL_IGNORED_NAMES> will not be copied.
Only supported in B<Ash, Bash, Dash, Ksh, Sh, and Zsh>. Only supported in B<Ash, Bash, Dash, Ksh, Sh, and Zsh>.
See also: B<--env>, B<--record-env>. See also: B<--env> B<--record-env>
=item B<--shard> I<shardexpr> =item B<--shard> I<shardexpr>
Use I<shardexpr> as shard key and shard input to the jobs. Use I<shardexpr> as shard key and shard input to the jobs.
I<shardexpr> is [column number|column name] [perlexpression] e.g. 3, I<shardexpr> is [column number|column name] [perlexpression] e.g.:
Address, 3 $_%=100, Address s/\d//g.
3
Address
3 $_%=100
Address s/\d//g
Each input line is split using B<--colsep>. The value of the column is Each input line is split using B<--colsep>. The value of the column is
put into $_, the perl expression is executed, the resulting value is put into $_, the perl expression is executed, the resulting value is
@ -2379,7 +2456,9 @@ I<shardcol> is small (<10), slower if it is big (>100).
B<--shard> requires B<--pipe> and a fixed numeric value for B<--jobs>. B<--shard> requires B<--pipe> and a fixed numeric value for B<--jobs>.
See also: B<--bin>, B<--group-by>, B<--roundrobin>. See the section: SPREADING BLOCKS OF DATA.
See also: B<--bin> B<--group-by> B<--roundrobin>
=item B<--shebang> =item B<--shebang>
@ -2554,8 +2633,9 @@ For details on I<mytime> see B<--delay>.
=item B<--sshlogin> I<@hostgroup> =item B<--sshlogin> I<@hostgroup>
Distribute jobs to remote computers. The jobs will be run on a list of Distribute jobs to remote computers.
remote computers.
The jobs will be run on a list of remote computers.
If I<hostgroups> is given, the I<sshlogin> will be added to that If I<hostgroups> is given, the I<sshlogin> will be added to that
hostgroup. Multiple hostgroups are separated by '+'. The I<sshlogin> hostgroup. Multiple hostgroups are separated by '+'. The I<sshlogin>
@ -2595,8 +2675,8 @@ The remote host must have GNU B<parallel> installed.
B<--sshlogin> is known to cause problems with B<-m> and B<-X>. B<--sshlogin> is known to cause problems with B<-m> and B<-X>.
B<--sshlogin> is often used with B<--transferfile>, B<--return>, See also: B<--transferfile> B<--return> B<--cleanup> B<--trc>
B<--cleanup>, and B<--trc>. B<--sshloginfile> B<--workdir>
=item B<--sshloginfile> I<filename> =item B<--sshloginfile> I<filename>
@ -2721,9 +2801,11 @@ then killed. Process groups are dependent on the tty.
=item B<--tag> =item B<--tag>
Tag lines with arguments. Each output line will be prepended with the Tag lines with arguments.
arguments and TAB (\t). When combined with B<--onall> or B<--nonall>
the lines will be prepended with the sshlogin instead. Each output line will be prepended with the arguments and TAB
(\t). When combined with B<--onall> or B<--nonall> the lines will be
prepended with the sshlogin instead.
B<--tag> is ignored when using B<-u>. B<--tag> is ignored when using B<-u>.
@ -2762,9 +2844,11 @@ How many words contain a..z and how many bytes do they fill?
=item B<--termseq> I<sequence> =item B<--termseq> I<sequence>
Termination sequence. When a job is killed due to B<--timeout>, Termination sequence.
B<--memfree>, B<--halt>, or abnormal termination of GNU B<parallel>,
I<sequence> determines how the job is killed. The default is: When a job is killed due to B<--timeout>, B<--memfree>, B<--halt>, or
abnormal termination of GNU B<parallel>, I<sequence> determines how
the job is killed. The default is:
TERM,200,TERM,100,TERM,50,KILL,25 TERM,200,TERM,100,TERM,50,KILL,25
@ -2776,10 +2860,13 @@ dies before the waiting time is up.
=item B<--tmpdir> I<dirname> =item B<--tmpdir> I<dirname>
Directory for temporary files. GNU B<parallel> normally buffers output Directory for temporary files.
into temporary files in /tmp. By setting B<--tmpdir> you can use a
different dir for the files. Setting B<--tmpdir> is equivalent to GNU B<parallel> normally buffers output into temporary files in
setting $TMPDIR. /tmp. By setting B<--tmpdir> you can use a different dir for the
files. Setting B<--tmpdir> is equivalent to setting $TMPDIR.
See also: B<--compress>
=item B<--tmux> (Long beta testing) =item B<--tmux> (Long beta testing)
@ -2804,10 +2891,10 @@ If I<duration> is followed by a % then the timeout will dynamically be
computed as a percentage of the median average runtime of successful computed as a percentage of the median average runtime of successful
jobs. Only values > 100% will make sense. jobs. Only values > 100% will make sense.
I<duration> is normally in seconds, but can be floats postfixed with I<duration> is in seconds, but can be postfixed with s, m, h, or d
B<s>, B<m>, B<h>, or B<d> which would multiply the float by 1, 60, (see the section TIME POSTFIXES).
3600, or 86400. Thus these are equivalent: B<--timeout 100000> and
B<--timeout 1d3.5h16.6m4s>. See also: B<--termseq>
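The percentage form of B<--timeout> can be illustrated with a small calculation. This is a rough sketch only: C<dynamic_timeout> is a hypothetical helper, the runtimes are supplied by hand, and taking the mean of the two middle values for an even number of jobs is an assumption, since the text above does not say how GNU B<parallel> handles that case.

```shell
# dynamic_timeout PCT RUNTIME...: print PCT percent of the median of the
# given runtimes (in seconds) of successful jobs.
dynamic_timeout() {
  pct=$1
  shift
  printf '%s\n' "$@" | sort -n | awk -v pct="$pct" '
    { r[NR] = $1 }                       # runtimes arrive sorted
    END {
      if (NR % 2) med = r[(NR + 1) / 2]  # odd count: the middle value
      else        med = (r[NR / 2] + r[NR / 2 + 1]) / 2
      printf "%.2f\n", med * pct / 100
    }'
}

dynamic_timeout 200 1.0 3.0 2.0   # median 2.0, so 200% gives 4.00
```

With a percentage at or below 100 roughly half of the normal jobs would exceed the limit, which is why only values above 100% make sense.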
=item B<--verbose> =item B<--verbose>
@ -2816,7 +2903,7 @@ B<--timeout 1d3.5h16.6m4s>.
Print the job to be run on stderr (standard error). Print the job to be run on stderr (standard error).
See also: B<-v>, B<-p>. See also: B<-v> B<-p>
=item B<--transfer> =item B<--transfer>
@ -2987,10 +3074,12 @@ compatibility.
=item B<-v> =item B<-v>
Verbose. Print the job to be run on stdout (standard output). Can be reversed Verbose. Print the job to be run on stdout (standard output). Can be reversed
with B<--silent>. See also B<-t>. with B<--silent>.
Use B<-v> B<-v> to print the wrapping ssh command when running remotely. Use B<-v> B<-v> to print the wrapping ssh command when running remotely.
See also: B<-t>
=item B<--version> =item B<--version>
@ -3034,7 +3123,7 @@ Wait for all commands to complete.
Used with B<--semaphore> or B<--sqlmaster>. Used with B<--semaphore> or B<--sqlmaster>.
See also: B<man sem>. See also: B<man sem>
=item B<-X> =item B<-X>
@ -3054,7 +3143,7 @@ unexpected results if B<{}> is used as part of a word.
Support for B<-X> with B<--sshlogin> is limited and may fail. Support for B<-X> with B<--sshlogin> is limited and may fail.
See also: B<-m>. See also: B<-m>
=item B<--exit> =item B<--exit>
@ -3075,8 +3164,7 @@ with all the arguments.
Support for B<--xargs> with B<--sshlogin> is limited and may fail. Support for B<--xargs> with B<--sshlogin> is limited and may fail.
See also B<-X> for context replace. If in doubt use B<-X> as that will See also: B<-X>
most likely do what is needed.
=back =back
@ -4878,7 +4966,7 @@ a chunk to the program.
B<--pipe-part> starts one job per chunk - just like normal B<--pipe-part> starts one job per chunk - just like normal
B<--pipe>. It first finds record endings near all block borders in the B<--pipe>. It first finds record endings near all block borders in the
file and then starts the jobs. By using B<--block -1> it will set the file and then starts the jobs. By using B<--block -1> it will set the
block size to 1/I<n> * size-of-file. Used this way it will start I<n> block size to size-of-file/I<n>. Used this way it will start I<n>
jobs in total. jobs in total.
B<--round-robin> starts I<n> jobs in total. It reads a block and B<--round-robin> starts I<n> jobs in total. It reads a block and
@ -4906,6 +4994,14 @@ chunk border.
B<--group-by> can be combined with B<--round-robin> or B<--pipe-part>. B<--group-by> can be combined with B<--round-robin> or B<--pipe-part>.
=head1 TIME POSTFIXES
Arguments that give a duration are given in seconds, but can be
expressed as floats postfixed with B<s>, B<m>, B<h>, or B<d> which
would multiply the float by 1, 60, 60*60, or 60*60*24. Thus these are
equivalent: 100000 and 1d3.5h16.6m4s.
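The equivalence can be checked by hand: 1*86400 + 3.5*3600 + 16.6*60 + 4 = 86400 + 12600 + 996 + 4 = 100000. The same conversion as a shell sketch follows. C<to_seconds> is a hypothetical helper, not GNU B<parallel>'s own parser, and rounding the result to whole seconds is an assumption.

```shell
# to_seconds DURATION: convert a duration such as 1d3.5h16.6m4s to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    mult["s"] = 1; mult["m"] = 60; mult["h"] = 60 * 60; mult["d"] = 60 * 60 * 24
    total = 0
    # Repeatedly peel off a leading number and its optional unit letter.
    while (match(t, /^[0-9]*\.?[0-9]+[smhd]?/)) {
      tok = substr(t, 1, RLENGTH)
      t = substr(t, RLENGTH + 1)
      unit = substr(tok, length(tok), 1)
      if (unit ~ /[smhd]/) {
        num = substr(tok, 1, length(tok) - 1)
      } else {
        num = tok      # no postfix: the number is already in seconds
        unit = "s"
      }
      total += num * mult[unit]
    }
    printf "%.0f\n", total
  }'
}

to_seconds 1d3.5h16.6m4s
to_seconds 100000
```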
=head1 UNIT PREFIX =head1 UNIT PREFIX
Many numerical arguments in GNU B<parallel> can be postfixed with K, Many numerical arguments in GNU B<parallel> can be postfixed with K,


@ -6,7 +6,6 @@
=encoding utf8 =encoding utf8
options as wrapper scripts
=head1 Design of GNU Parallel =head1 Design of GNU Parallel