Fixed bug #45025: --pipe --retries does not reschedule on other host

Ole Tange 2015-12-13 01:54:08 +01:00
parent b14ca07563
commit 09dcafc120
6 changed files with 231 additions and 26 deletions

NEWS

@ -1,30 +1,53 @@
20151122 20151122
* GNU Parallel packaged for CERN CentOS: http://linuxsoft.cern.ch/cern/centos/7/cern/x86_64/repoview/parallel.html * GNU Parallel packaged for CERN CentOS:
http://linuxsoft.cern.ch/cern/centos/7/cern/x86_64/repoview/parallel.html
* GNU Parallel was cited in: The Outer Solar System Origins Survey: I. Design and First-Quarter Discoveries http://arxiv.org/pdf/1511.02895.pdf * GNU Parallel was cited in: The Outer Solar System Origins Survey:
I. Design and First-Quarter Discoveries
http://arxiv.org/pdf/1511.02895.pdf
* GNU Parallel was cited in: Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3431.html * GNU Parallel was cited in: Contrasting genetic architectures of
schizophrenia and other complex diseases using fast
variance-components analysis
http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3431.html
* GNU Parallel was cited in: Named-Entity Chunking for Norwegian Text using Support Vector Machines http://ojs.bibsys.no/index.php/NIK/article/viewFile/248/211 * GNU Parallel was cited in: Named-Entity Chunking for Norwegian Text
using Support Vector Machines
http://ojs.bibsys.no/index.php/NIK/article/viewFile/248/211
* GNU Parallel was cited in: Multiscale Estimation of Binding Kinetics Using Brownian Dynamics, Molecular Dynamics and Milestoning http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004381#pcbi.1004381.ref072 * GNU Parallel was cited in: Multiscale Estimation of Binding Kinetics
Using Brownian Dynamics, Molecular Dynamics and Milestoning
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004381#pcbi.1004381.ref072
* GNU Parallel was cited in: A Detailed Characterization of Errors in Logic Circuits due to Single-Event Transients http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=7302348&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7302348 * GNU Parallel was cited in: A Detailed Characterization of Errors in
Logic Circuits due to Single-Event Transients
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=7302348&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7302348
* GNU Parallel was cited in: Arabic Speaker Emotion Classification Using Rhythm Metrics and Neural Networks http://www.eurasip.org/Proceedings/Eusipco/Eusipco2015/papers/1570104855.pdf * GNU Parallel was cited in: Arabic Speaker Emotion Classification
Using Rhythm Metrics and Neural Networks
http://www.eurasip.org/Proceedings/Eusipco/Eusipco2015/papers/1570104855.pdf
* GNU Parallel was cited in: Stride Search: a general algorithm for storm detection in high resolution climate data http://www.geosci-model-dev-discuss.net/8/7727/2015/gmdd-8-7727-2015.pdf * GNU Parallel was cited in: Stride Search: a general algorithm for
storm detection in high resolution climate data
http://www.geosci-model-dev-discuss.net/8/7727/2015/gmdd-8-7727-2015.pdf
* GNU Parallel was cited in: Decomposing Digital Paintings into Layers via RGB-space Geometry http://arxiv.org/pdf/1509.03335.pdf * GNU Parallel was cited in: Decomposing Digital Paintings into Layers
via RGB-space Geometry http://arxiv.org/pdf/1509.03335.pdf
* GNU Parallel was cited in: Structure and evolutionary history of a large family of NLR proteins in the zebrafish http://www.biorxiv.org/content/biorxiv/early/2015/09/18/027151.full.pdf * GNU Parallel was cited in: Structure and evolutionary history of a
large family of NLR proteins in the zebrafish
http://www.biorxiv.org/content/biorxiv/early/2015/09/18/027151.full.pdf
* GNU Parallel was cited in: Evolution of movement strategies under competitive interactions http://digital.csic.es/bitstream/10261/115973/1/evolution_movement_strategies_Kiziridis.pdf * GNU Parallel was cited in: Evolution of movement strategies under
competitive interactions
http://digital.csic.es/bitstream/10261/115973/1/evolution_movement_strategies_Kiziridis.pdf
* Automating large numbers of tasks https://rcc.uchicago.edu/docs/tutorials/kicp-tutorials/running-jobs.html * Automating large numbers of tasks
https://rcc.uchicago.edu/docs/tutorials/kicp-tutorials/running-jobs.html
* Max out your IOPs with GNU Parallel http://blog.bitratchet.com/2015/11/11/max-out-your-iops-with-gnu-parallel/ * Max out your IOPs with GNU Parallel
http://blog.bitratchet.com/2015/11/11/max-out-your-iops-with-gnu-parallel/
* Bug fixes and man page updates. * Bug fixes and man page updates.


@ -236,6 +236,17 @@ http://www.researchgate.net/profile/Christoph_Junghans/publication/276178326_TAD
* << Update expected in June; Rachel has just replied >> GNU Parallel was used in: SISRS: Site Identification from Short Read Sequences https://github.com/rachelss/SISRS/ * << Update expected in June; Rachel has just replied >> GNU Parallel was used in: SISRS: Site Identification from Short Read Sequences https://github.com/rachelss/SISRS/
* <<Citation needed: A Cache- and Memory-Aware Mapping Algorithm
for Big Data Applications https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7323015>>
* <<Citation needed: Introspecting for RSA Key Material to Assist Intrusion Detection http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7331177&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7331177>>
* GNU Parallel was cited in: Evolution and Learning in Heterogeneous Environments http://research.gold.ac.uk/15078/1/COM_thesis_JonesD_2015.pdf
* GNU Parallel was cited in: Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis http://www.nature.com/ng/journal/v47/n12/full/ng.3431.html
* GNU Parallel was cited in: Efficient Retrieval of Key Material for Inspecting Potentially Malicious Traffic in the Cloud http://www.cs.bham.ac.uk/~bxb/Papres/2015.1.pdf
* GNU Parallel was cited in: Achieving Consistent Doppler Measurements from SDO/HMI Vector Field Inversions http://arxiv.org/pdf/1511.06500.pdf * GNU Parallel was cited in: Achieving Consistent Doppler Measurements from SDO/HMI Vector Field Inversions http://arxiv.org/pdf/1511.06500.pdf
* 使用 GNU parallel 來平行運算http://mutolisp.logdown.com/posts/316959-using-gnu-parallel-to-parallel-computing * 使用 GNU parallel 來平行運算http://mutolisp.logdown.com/posts/316959-using-gnu-parallel-to-parallel-computing


@ -424,6 +424,23 @@ sub spreadstdin {
# If there is anything left in the buffer write it # If there is anything left in the buffer write it
write_record_to_pipe($chunk_number++,\$header,\$buf,$recstart,$recend,length $buf); write_record_to_pipe($chunk_number++,\$header,\$buf,$recstart,$recend,length $buf);
if($opt::retries) {
$Global::no_more_input = 1;
# We need to start no more jobs: At most we need to retry some
# of the already running.
my @running = values %Global::running;
# Stop any virgins.
for my $job (@running) {
if(defined $job and $job->virgin()) {
close $job->fh(0,"w");
}
}
# Wait for running jobs to be done
my $sleep =1;
while($Global::total_running > 0) {
$sleep = ::reap_usleep($sleep);
}
}
$Global::start_no_new_jobs ||= 1; $Global::start_no_new_jobs ||= 1;
if($opt::roundrobin) { if($opt::roundrobin) {
for my $job (values %Global::running) { for my $job (values %Global::running) {
@ -662,7 +679,20 @@ sub write_record_to_pipe {
my $job = shift @Global::virgin_jobs; my $job = shift @Global::virgin_jobs;
# Job is no longer virgin # Job is no longer virgin
$job->set_virgin(0); $job->set_virgin(0);
if(1) {
if($opt::retries) {
# Copy $buffer[0..$endpos] to $job->{'block'}
# Remove rec_sep
# Run $job->add_transfersize
$job->set_block($header_ref,$buffer_ref,$endpos,$recstart,$recend);
if(fork()) {
# Skip
} else {
$job->write($job->block_ref());
close $job->fh(0,"w");
exit(0);
}
} else {
# We ignore the removed rec_sep which is technically wrong. # We ignore the removed rec_sep which is technically wrong.
$job->add_transfersize($endpos + length $$header_ref); $job->add_transfersize($endpos + length $$header_ref);
if(fork()) { if(fork()) {
@ -1075,10 +1105,14 @@ sub check_invalid_option_combinations {
::error("--timeout must be seconds or percentage."); ::error("--timeout must be seconds or percentage.");
wait_and_exit(255); wait_and_exit(255);
} }
if(defined $opt::fifo and $opt::cat) { if(defined $opt::fifo and defined $opt::cat) {
::error("--fifo cannot be combined with --cat."); ::error("--fifo cannot be combined with --cat.");
::wait_and_exit(255); ::wait_and_exit(255);
} }
if(defined $opt::retries and defined $opt::roundrobin) {
::error("--retries cannot be combined with --roundrobin.");
::wait_and_exit(255);
}
if(defined $opt::pipepart and if(defined $opt::pipepart and
(defined $opt::L or defined $opt::max_lines (defined $opt::L or defined $opt::max_lines
or defined $opt::max_replace_args)) { or defined $opt::max_replace_args)) {
@ -2153,7 +2187,20 @@ sub init_run_jobs {
$job->replaced(),"'\n"); $job->replaced(),"'\n");
if($job->start()) { if($job->start()) {
if($opt::pipe) { if($opt::pipe) {
push(@Global::virgin_jobs,$job); if($job->virgin()) {
push(@Global::virgin_jobs,$job);
} else {
# Block already set: This is a retry
if(fork()) {
::debug("pipe","\n\nWriting ",length ${$job->block_ref()},
" to ", $job->seq(),"\n");
close $job->fh(0,"w");
} else {
$job->write($job->block_ref());
close $job->fh(0,"w");
exit(0);
}
}
} }
debug("start", "Started as seq ", $job->seq(), debug("start", "Started as seq ", $job->seq(),
" pid:", $job->pid(), "\n"); " pid:", $job->pid(), "\n");
@ -3501,7 +3548,7 @@ sub bibtex {
" volume = {36},", " volume = {36},",
" url = {http://www.gnu.org/s/parallel},", " url = {http://www.gnu.org/s/parallel},",
" year = {2011},", " year = {2011},",
" pages = {42-47}", " pages = {42-47},",
" doi = {10.5281/zenodo.16303}", " doi = {10.5281/zenodo.16303}",
"}", "}",
"", "",
@ -6244,7 +6291,8 @@ sub set_block {
# N/A # N/A
my $self = shift; my $self = shift;
my ($header_ref,$buffer_ref,$endpos,$recstart,$recend) = @_; my ($header_ref,$buffer_ref,$endpos,$recstart,$recend) = @_;
$self->{'block'} = ($self->virgin() ? $$header_ref : "").substr($$buffer_ref,0,$endpos); $self->{'block'} = ($self->virgin() ? $$header_ref : "").
substr($$buffer_ref,0,$endpos);
if($opt::remove_rec_sep) { if($opt::remove_rec_sep) {
remove_rec_sep(\$self->{'block'},$recstart,$recend); remove_rec_sep(\$self->{'block'},$recstart,$recend);
} }
@ -7427,10 +7475,10 @@ sub should_be_retried {
if (not $opt::retries) { if (not $opt::retries) {
return 0; return 0;
} }
if(not $self->exitstatus()) { if(not $self->exitstatus() and not $self->exitsignal()) {
# Completed with success. If there is a recorded failure: forget it # Completed with success. If there is a recorded failure: forget it
$self->reset_failed_here(); $self->reset_failed_here();
return 0 return 0;
} else { } else {
# The job failed. Should it be retried? # The job failed. Should it be retried?
$self->add_failed_here(); $self->add_failed_here();
@ -7445,7 +7493,7 @@ sub should_be_retried {
::debug("run", "Retry ", $self->seq(), "\n"); ::debug("run", "Retry ", $self->seq(), "\n");
return 1; return 1;
} }
} }
} }
{ {
@ -8641,10 +8689,11 @@ sub get {
$cmd_line->populate(); $cmd_line->populate();
::debug("init","cmd_line->number_of_args ", ::debug("init","cmd_line->number_of_args ",
$cmd_line->number_of_args(), "\n"); $cmd_line->number_of_args(), "\n");
if($opt::pipe or $opt::pipepart) { if(not $Global::no_more_input and ($opt::pipe or $opt::pipepart)) {
if($cmd_line->replaced() eq "") { if($cmd_line->replaced() eq "") {
# Empty command - pipe requires a command # Empty command - pipe requires a command
::error("--pipe/--pipepart must have a command to pipe into (e.g. 'cat')."); ::error("--pipe/--pipepart must have a command to pipe into ".
"(e.g. 'cat').");
::wait_and_exit(255); ::wait_and_exit(255);
} }
} else { } else {


@ -21,4 +21,8 @@ echo '### --ssh autossh - add commands that fail here'
parallel -S lo false ::: a || echo OK should fail; parallel -S lo false ::: a || echo OK should fail;
touch foo_autossh; stdout parallel -S csh@lo --trc {}.out touch {}.out ::: foo_autossh; rm foo_autossh*; touch foo_autossh; stdout parallel -S csh@lo --trc {}.out touch {}.out ::: foo_autossh; rm foo_autossh*;
echo '### bug #45025: --pipe --retries does not reschedule on other host'
seq 1 300030| stdout parallel -k --retries 2 -S a.a,: --pipe 'wc;hostname'
stdout parallel --retries 2 --roundrobin echo ::: should fail
EOF EOF


@ -84,3 +84,13 @@ Environment variables are:
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(226) [Receiver=3.1.1] rsync error: error in rsync protocol data stream (code 12) at io.c(226) [Receiver=3.1.1]
echo '### bug #45025: --pipe --retries does not reschedule on other host'
### bug #45025: --pipe --retries does not reschedule on other host
seq 1 300030| stdout parallel -k --retries 2 -S a.a,: --pipe 'wc;hostname'
parallel: Warning: Could not figure out number of cpus on a.a (). Using 1.
165668 165668 1048571
aspire
134362 134362 940534
aspire
stdout parallel --retries 2 --roundrobin echo ::: should fail
parallel: Error: --retries cannot be combined with --roundrobin.
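
The test above relies on the idea behind this fix: with --pipe, a block read from stdin has to be kept so that it can be sent again when the first host fails. The following is a minimal shell sketch of that idea only, not GNU Parallel's implementation. The function name `retry_block` and the two stand-in commands are hypothetical; a real run would use remote hosts rather than local `sh -c` calls.

```shell
#!/bin/sh
# Hypothetical illustration: keep the input block in a variable so it can be
# re-sent to another worker if the first one fails.

# retry_block BLOCK CMD...
# Pipe the saved block into each command in turn until one succeeds.
retry_block() {
    block=$1
    shift
    for cmd in "$@"; do
        # The block is still held in $block, so a failed attempt loses no data.
        if printf '%s\n' "$block" | sh -c "$cmd"; then
            return 0
        fi
    done
    return 1
}

# Stand-in for an unreachable host: consumes its input, then fails.
failing_worker='cat >/dev/null; false'
# Stand-in for a working host: counts the lines it receives.
working_worker='wc -l'

block=$(seq 1 5)
result=$(retry_block "$block" "$failing_worker" "$working_worker")
echo "lines processed after retry: $result"
```

Without the saved copy, the data written to the failed worker's pipe would be gone and there would be nothing to retry with, which is the situation the bug title describes.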


@ -0,0 +1,108 @@
echo '### Test --onall'; parallel --onall --tag -k -S parallel@lo,csh@lo '(echo {1} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2
### Test --onall
csh@lo 1
csh@lo 2
csh@lo 1
csh@lo 2
csh@lo 1
csh@lo 2
parallel@lo 1
parallel@lo 2
parallel@lo 1
parallel@lo 2
parallel@lo 1
parallel@lo 2
echo '### Test | --onall'; seq 3 | parallel --onall --tag -k -S parallel@lo,csh@lo '(echo {1} {2}) | awk \{print\ \$2}' ::: a b c :::: -
### Test | --onall
csh@lo 1
csh@lo 2
csh@lo 3
csh@lo 1
csh@lo 2
csh@lo 3
csh@lo 1
csh@lo 2
csh@lo 3
parallel@lo 1
parallel@lo 2
parallel@lo 3
parallel@lo 1
parallel@lo 2
parallel@lo 3
parallel@lo 1
parallel@lo 2
parallel@lo 3
echo '### Test --onall -u'; parallel --onall -S parallel@lo,csh@lo -u '(echo {1} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3 | sort
### Test --onall -u
1
1
1
1
1
1
2
2
2
2
2
2
3
3
3
3
3
3
echo '### Test --nonall'; parallel --nonall -k -S parallel@lo,csh@lo pwd | sort
### Test --nonall
/home/csh
/home/parallel
echo '### Test --nonall -u - should be interleaved x y x y'; parallel --nonall -S parallel@lo,csh@lo -u 'pwd|grep -q csh && sleep 3; pwd;sleep 12;pwd;'
### Test --nonall -u - should be interleaved x y x y
/home/parallel
/home/csh
/home/parallel
/home/csh
echo '### Test read sshloginfile from STDIN'; echo parallel@lo,csh@lo | parallel -S - -k --nonall pwd; echo parallel@lo,csh@lo | parallel --sshloginfile - -k --onall pwd\; echo ::: foo
### Test read sshloginfile from STDIN
/home/csh
/home/parallel
/home/csh
foo
/home/parallel
foo
echo '**'
**
echo '### Test --nonall --basefile'; touch tmp/nonall--basefile; stdout parallel --nonall --basefile tmp/nonall--basefile -S parallel@lo,csh@lo ls tmp/nonall--basefile; stdout parallel --nonall -S parallel@lo,csh@lo rm tmp/nonall--basefile; stdout rm tmp/nonall--basefile
### Test --nonall --basefile
tmp/nonall--basefile
tmp/nonall--basefile
echo '**'
**
echo '### Test --onall --basefile'; touch tmp/onall--basefile; stdout parallel --onall --basefile tmp/onall--basefile -S parallel@lo,csh@lo ls {} ::: tmp/onall--basefile; stdout parallel --onall -S parallel@lo,csh@lo rm {} ::: tmp/onall--basefile; stdout rm tmp/onall--basefile
### Test --onall --basefile
tmp/onall--basefile
tmp/onall--basefile
echo '**'
**
echo '### Test --nonall --basefile --cleanup (rm should fail)'; touch tmp/nonall--basefile--clean; stdout parallel --nonall --basefile tmp/nonall--basefile--clean --cleanup -S parallel@lo,csh@lo ls tmp/nonall--basefile--clean; stdout parallel --nonall -S parallel@lo,csh@lo rm tmp/nonall--basefile--clean; stdout rm tmp/nonall--basefile--clean
### Test --nonall --basefile --cleanup (rm should fail)
tmp/nonall--basefile--clean
tmp/nonall--basefile--clean
rm: cannot remove tmp/nonall--basefile--clean: No such file or directory
rm: cannot remove tmp/nonall--basefile--clean: No such file or directory
echo '**'
**
echo '### Test --onall --basefile --cleanup (rm should fail)'; touch tmp/onall--basefile--clean; stdout parallel --onall --basefile tmp/onall--basefile--clean --cleanup -S parallel@lo,csh@lo ls {} ::: tmp/onall--basefile--clean; stdout parallel --onall -S parallel@lo,csh@lo rm {} ::: tmp/onall--basefile--clean; stdout rm tmp/onall--basefile--clean
### Test --onall --basefile --cleanup (rm should fail)
tmp/onall--basefile--clean
tmp/onall--basefile--clean
rm: cannot remove tmp/onall--basefile--clean: No such file or directory
rm: cannot remove tmp/onall--basefile--clean: No such file or directory
echo '**'
**
echo '### Test --workdir .'; ssh parallel@lo mkdir -p mydir; mkdir -p $HOME/mydir; cd $HOME/mydir; parallel --workdir . -S parallel@lo ::: pwd
### Test --workdir .
/home/parallel/mydir
echo '### Test --wd .'; ssh csh@lo mkdir -p mydir; mkdir -p $HOME/mydir; cd $HOME/mydir; parallel --workdir . -S csh@lo ::: pwd
### Test --wd .
/home/csh/mydir