diff --git a/doc/FUTURE_IDEAS b/doc/FUTURE_IDEAS index 6e89a550..4b3bc435 100644 --- a/doc/FUTURE_IDEAS +++ b/doc/FUTURE_IDEAS @@ -10,28 +10,24 @@ Transfer scriptfile before first job. Remove it when last job done. monitor to see which jobs are currently running http://code.google.com/p/ppss/ -Accept signal INT instead of TERM to complete current running jobs but -do not start new jobs. Print out the number of jobs waiting to -complete on STDERR. Accept sig INT again to kill now. This seems to be -hard, as all foreground processes get the INT from the shell. - If there are nomore jobs (STDIN is closed) then make sure to distribute the arguments evenly if running -X. -Parallelize so this can be done: -mdm.screen find dir -execdir mdm-run cmd {} \; -Maybe: -find dir -execdir par$ --communication-file /tmp/comfile cmd {} \; +=head1 options -find dir -execdir mutex -j4 -b cmd {} \; +One char options not used: F G J K P Q Y -=head2 Comfile +Skilletegn i sshlogin: +#=item B<--sshlogin> I<[ncpu/]sshlogin[,[ncpu/]sshlogin[,...]]> (beta testing) +# Skilletegn: +# No: "#!&()?\<>|;*'~ shellspecial +# No: @.- part of user@i.p.n.r i.p.n.r host-name +# No: , separates different sshlogins +# No: space Will make it hard to do: 8/server1,server2 +# Maybe: / 8//usr/bin/myssh,//usr/bin/ssh +# %/=:_^ -This will put a lock on /tmp/comfile. The number of locks is the number of running commands. -If the number is smaller than -j then it will start a process in the background ( cmd & ), -otherwise wait. -par$ --wait /tmp/comfile will wait until no more locks on the file =head2 mutex @@ -49,16 +45,27 @@ If -b given works like: mutex -l lockfile -n number_of_locks ; (command; mutex - Kan vi finde på lockid som giver mening? 
-=head1 options +Parallelize so this can be done: +mdm.screen find dir -execdir mdm-run cmd {} \; +Maybe: +find dir -execdir par$ --communication-file /tmp/comfile cmd {} \; -One char options not used: F G J K P Q Y +find dir -execdir mutex -j4 -b cmd {} \; + +=head2 Comfile + +This will put a lock on /tmp/comfile. The number of locks is the number of running commands. +If the number is smaller than -j then it will start a process in the background ( cmd & ), +otherwise wait. + +par$ --wait /tmp/comfile will wait until no more locks on the file + + + +=head1 Unlikely + +Accept signal INT instead of TERM to complete current running jobs but +do not start new jobs. Print out the number of jobs waiting to +complete on STDERR. Accept sig INT again to kill now. This seems to be +hard, as all foreground processes get the INT from the shell. -Skilletegn i sshlogin: -#=item B<--sshlogin> I<[ncpu/]sshlogin[,[ncpu/]sshlogin[,...]]> (beta testing) -# Skilletegn: -# No: "#!&()?\<>|;*'~ shellspecial -# No: @.- part of user@i.p.n.r i.p.n.r host-name -# No: , separates different sshlogins -# No: space Will make it hard to do: 8/server1,server2 -# Maybe: / 8//usr/bin/myssh,//usr/bin/ssh -# %/=:_^ diff --git a/src/parallel b/src/parallel index 7b26b0bd..59072ef7 100755 --- a/src/parallel +++ b/src/parallel @@ -175,6 +175,29 @@ B<-g> is the default. Can be reversed with B<-u>. Print a summary of the options to GNU B and exit. +=item B<--halt-on-error> <0|1|2> + +=item B<-H> <0|1|2> + +=over 3 + +=item 0 + +Do not halt if a job fails. This is the default. + +=item 1 + +Do not start new jobs if a job fails, but complete the running jobs +including cleanup. The exit status will be the exit status from the +last failing job. + +=item 2 + +Kill off all jobs immediately and exit without cleanup. The exit +status will be the exit status from the failing job. 
+ +=back + =item B<-I> I @@ -958,6 +981,84 @@ This will tell GNU B to not start any new jobs, but wait until the currently running jobs are finished before exiting. +=head1 ENVIRONMENT VARIABLES + +=over 9 + +=item $PARALLEL_PID - unimplemented + +The environment variable $PARALLEL_PID is set by GNU B and +is visible to the jobs started from GNU B. This makes it +possible for the jobs to communicate directly to GNU . + +B If each of the jobs tests a solution and one of jobs finds +the solution the job can tell GNU B not to start more jobs +by: B. This only works on the local +computer. + +=item $PARALLEL + +The environment variable $PARALLEL will be used as default options for +GNU B. However, because some options take arguments the +options need to be split into groups in which only the last option +takes an argument. Each group of options should be put on a line of its +own. + +B + +B + +can be written as: + +B + +B + +can be written as: + +B + +Notice the newline in the middle is needed because both B<-S> and +B<-j> take an argument and thus both need to be at the end of a group. + +=back + +=head1 INIT FILE (RC FILE) + +The file ~/.parallelrc will be read if it exists. It should be +formatted like the environment variable $PARALLEL. Lines starting with +'#' will be ignored. + + +=head1 EXIT STATUS + +If B<--halt-on-error> 0 or not specified: + +=over 6 + +=item 0 + +All jobs ran without error. + +=item 1-253 + +Some of the jobs failed. The exit status gives the number of failed jobs + +=item 254 + +More than 253 jobs failed. + +=item 255 + +Other error. + +=back + +If B<--halt-on-error> 1 or 2: Exit status of the failing job. 
+ + =head1 DIFFERENCES BETWEEN GNU Parallel AND ALTERNATIVES There are a lot programs with some of the functionality of GNU @@ -1254,58 +1355,6 @@ B>B< result> B -=head1 ENVIRONMENT VARIABLES - -=over 9 - -=item $PARALLEL_PID - unimplemented - -The environment variable $PARALLEL_PID is set by GNU B and -is visible to the jobs started from GNU B. This makes it -possible for the jobs to communicate directly to GNU . - -B If each of the jobs tests a solution and one of jobs finds -the solution the job can tell GNU B not to start more jobs -by: B. This only works on the local -computer. - -=item $PARALLEL - -The environment variable $PARALLEL will be used as default options for -GNU B. However, because some options take arguments the -options need to be split into groups in which only the last option -takes an argument. Each group of options should be put on a line of its -own. - -B - -B - -can be written as: - -B - -B - -can be written as: - -B - -Notice the newline in the middle is needed because both B<-S> and -B<-j> take an argument and thus both need to be at the end of a group. - -=back - -=head1 INIT FILE (RC FILE) - -The file ~/.parallelrc will be read if it exists. It should be -formatted like the environment variable $PARALLEL. Lines starting with -'#' will be ignored. 
- - - =head1 BUGS Filenames beginning with '-' can cause some commands to give @@ -1467,6 +1516,11 @@ init_run_jobs(); start_more_jobs(); ReapIfNeeded(); drain_job_queue(); +if($::opt_halt_on_error) { + exit $Global::halt_on_error_exitstatus; +} else { + exit(min($Global::exitstatus,254)); +} sub parse_options { # Defaults: @@ -1485,6 +1539,8 @@ sub parse_options { $Global::interactive = 0; $Global::stderr_verbose = 0; $Global::default_simultaneous_sshlogins = 9; + $Global::exitstatus = 0; + $Global::halt_on_error_exitstatus = 0; Getopt::Long::Configure ("bundling","require_order"); # Add options from .parallelrc @@ -1526,6 +1582,7 @@ sub parse_options { "trc=s" => \@::opt_trc, "transfer" => \$::opt_transfer, "cleanup" => \$::opt_cleanup, + "halt-on-error|H=s" => \$::opt_halt_on_error, # xargs-compatibility - implemented, man, unittest "max-procs|P=s" => \$::opt_P, "delimiter|d=s" => \$::opt_d, @@ -1590,7 +1647,7 @@ sub parse_options { if(defined $::opt_a) { if(not open(ARGFILE,"<",$::opt_a)) { print STDERR "$Global::progname: Cannot open input file `$::opt_a': No such file or directory\n"; - exit(-1); + exit(255); } $Global::argfile = *ARGFILE; } @@ -1922,7 +1979,7 @@ sub processes_available_by_system_limit { # The child takes one process slot # It will be killed later sleep 100000; - exit; + exit(0); } else { $max_system_proc_reached = 1; } @@ -2226,6 +2283,7 @@ sub min { # Variable structure: # $Global::running{$pid}{'seq'} = printsequence # $Global::running{$pid}{sshlogin} = server to run on +# $Global::running{$pid}{'exitstatus'} = exit status # $Global::host{$sshlogin}{'no_of_running'} = number of currently running jobs # $Global::host{$sshlogin}{'ncpus'} = number of cpus # $Global::host{$sshlogin}{'maxlength'} = max line length (currently buggy for remote) @@ -2288,7 +2346,11 @@ sub next_command_line_with_sshlogin { $post .= "$sshcmd $serverlogin rm -f ".shell_quote($file).";"; } } - return "$pre$sshcmd $serverlogin ".shell_quote($next_command_line)."; 
$post"; + if($post) { + # We need to save the exit status of the job + $post = '_EXIT_status=$?; '.$post.' exit $_EXIT_status;'; + } + return "$pre$sshcmd $serverlogin ".shell_quote($next_command_line).";".$post; } else { return $next_command_line; } @@ -2591,7 +2653,7 @@ sub sshcommand_of_sshlogin { } else { debug($master,"\n"); `$master`; - exit; + exit(0); } } } else { @@ -2661,9 +2723,13 @@ sub Reaper { # This is one of the ssh -M: ignore next; } - if($Global::keeporder) { + $Global::running{$stiff}{'exitstatus'} = $? >> 8; + debug("died ($Global::running{$stiff}{'exitstatus'}): $Global::running{$stiff}{'seq'}"); + # Force printing now if the job failed and we are going to exit + my $print_now = ($Global::running{$stiff}{'exitstatus'} and + $::opt_halt_on_error and $::opt_halt_on_error == 2); + if($Global::keeporder and not $print_now) { $Global::print_later{$Global::running{$stiff}{"seq"}} = $Global::running{$stiff}; - debug("died: $Global::running{$stiff}{'seq'}"); while($Global::print_later{$Global::job_end_sequence}) { debug("Found job end $Global::job_end_sequence"); print_job($Global::print_later{$Global::job_end_sequence}); @@ -2673,6 +2739,26 @@ sub Reaper { } else { print_job ($Global::running{$stiff}); } + if($Global::running{$stiff}{'exitstatus'}) { + # The jobs had a exit status <> 0, so error + $Global::exitstatus++; + if($::opt_halt_on_error) { + if($::opt_halt_on_error == 1) { + # If halt on error == 1 we should gracefully exit + print STDERR ("$Global::progname: Starting no more jobs. ", + "Waiting for ", scalar(keys %Global::running), + " jobs to finish. 
This job failed:\n", + $Global::running{$stiff}{"command"},"\n"); + $Global::StartNoNewJobs++; + $Global::halt_on_error_exitstatus = $Global::running{$stiff}{'exitstatus'}; + } elsif($::opt_halt_on_error == 2) { + # If halt on error == 2 we should exit immediately + print STDERR ("$Global::progname: This job failed:\n", + $Global::running{$stiff}{"command"},"\n"); + exit ($Global::running{$stiff}{'exitstatus'}); + } + } + } my $sshlogin = $Global::running{$stiff}{'sshlogin'}; $Global::host{$sshlogin}{'no_of_running'}--; $Global::running_jobs--; @@ -2690,7 +2776,7 @@ sub Reaper { sub die_usage { usage(); - exit(1); + exit(255); } sub usage { @@ -2808,8 +2894,13 @@ $Global::control_path = 0; # TODO Debian package # TODO transfer a script to be run -# TODO check that error code is passed out. echo | parallel /bin/false should give error code -# TODO halt on first error. (/bin/false; E=$?; /bin/true; echo $E; exit $E); echo $? -# TODO halt on first error --soft (let running complete) --hard (killall running) -# TODO to kill from a run script parallel should set PARALLEL_PID that can be sig termed +# -F basefile this file will be transferred to each sshlogin before a +# jobs is started. It will be removed if --cleanup is active. The file +# may be a script to run or some common base data needed for the jobs. +# Multiple -F can be specified to transfer more basefiles. 
+ +# TODO to kill from a run script parallel should set PARALLEL_PID that can be sig termed +# TAGS: parallel | parallel processing | multicore | multiprocessor | Clustering/Distributed Networks +# job control | multiple jobs | parallelization | text processing | cluster | filters +# Clustering Tools | Command Line Tools | Utilities | System Administration diff --git a/unittest/actual-results/test21 b/unittest/actual-results/test21 index ffa47261..690ea1ac 100644 --- a/unittest/actual-results/test21 +++ b/unittest/actual-results/test21 @@ -1,11 +1,11 @@ ### Test $PARALLEL 1 -ssh -l parallel parallel-server2 echo\ 1; +ssh -l parallel parallel-server2 echo\ 1; 1 -ssh parallel-server1 echo\ 2; +ssh parallel-server1 echo\ 2; 2 ### Test ~/.parallelrc -ssh -l parallel parallel-server2 echo\ 1; +ssh -l parallel parallel-server2 echo\ 1; 1 -ssh parallel-server1 echo\ 2; +ssh parallel-server1 echo\ 2; 2 diff --git a/unittest/actual-results/test22 b/unittest/actual-results/test22 new file mode 100644 index 00000000..1bc1721b --- /dev/null +++ b/unittest/actual-results/test22 @@ -0,0 +1,58 @@ +### Test exit val +0 +1 +### Test --halt-on-error +1 +parallel: Starting no more jobs. Waiting for 2 jobs to finish. This job failed: +sleep 2;false +1 +parallel: This job failed: +sleep 2;false +1 +sh: non_exist: command not found +2 +parallel: Starting no more jobs. Waiting for 3 jobs to finish. This job failed: +sleep 1;false +sh: non_exist: command not found +parallel: Starting no more jobs. Waiting for 1 jobs to finish. This job failed: +sleep 1; non_exist +127 +parallel: This job failed: +sleep 2;false +1 +### Test last dying print --halt-on-error +0 +1 +parallel: Starting no more jobs. Waiting for 9 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 1 +2 +parallel: Starting no more jobs. Waiting for 8 jobs to finish. 
This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 2 +3 +parallel: Starting no more jobs. Waiting for 7 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 3 +4 +parallel: Starting no more jobs. Waiting for 6 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 4 +5 +parallel: Starting no more jobs. Waiting for 5 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 5 +6 +parallel: Starting no more jobs. Waiting for 4 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 6 +7 +parallel: Starting no more jobs. Waiting for 3 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 7 +8 +0 +parallel: Starting no more jobs. Waiting for 2 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 8 +9 +parallel: Starting no more jobs. Waiting for 1 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 9 +9 +0 +1 +parallel: This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 1 +1 diff --git a/unittest/tests-to-run/test22.sh b/unittest/tests-to-run/test22.sh new file mode 100755 index 00000000..121bdd45 --- /dev/null +++ b/unittest/tests-to-run/test22.sh @@ -0,0 +1,34 @@ +#!/bin/bash + +PAR=parallel +SERVER1=parallel-server1 +SERVER2=parallel-server2 + +( +echo '### Test exit val' +echo true | parallel +echo $? +echo false | parallel +echo $? + +echo '### Test --halt-on-error' +(echo "sleep 1;true"; echo "sleep 2;false";echo "sleep 3;true") | parallel --halt-on-error 0 +echo $? +(echo "sleep 1;true"; echo "sleep 2;false";echo "sleep 3;true") | parallel --halt-on-error 1 +echo $? 
+(echo "sleep 1;true"; echo "sleep 2;false";echo "sleep 3;true") | parallel --halt-on-error 2 +echo $? + +(echo "sleep 1;true"; echo "sleep 1;false";echo "sleep 1;true";echo "sleep 1; non_exist") | parallel -H0 +echo $? +(echo "sleep 1;true"; echo "sleep 1;false";echo "sleep 1;true";echo "sleep 1; non_exist") | parallel -H1 +echo $? +(echo "sleep 1;true"; echo "sleep 2;false";echo "sleep 3;true";echo "sleep 4; non_exist") | parallel -H2 +echo $? + +echo '### Test last dying print --halt-on-error' +(seq 0 8;echo 0; echo 9) | parallel -kqH1 perl -e 'sleep $ARGV[0];print STDERR @ARGV,"\n"; exit shift' +echo $? +(seq 0 8;echo 0; echo 9) | parallel -kqH2 perl -e 'sleep $ARGV[0];print STDERR @ARGV,"\n"; exit shift' +echo $? +) 2>&1 diff --git a/unittest/wanted-results/test21 b/unittest/wanted-results/test21 index ffa47261..690ea1ac 100644 --- a/unittest/wanted-results/test21 +++ b/unittest/wanted-results/test21 @@ -1,11 +1,11 @@ ### Test $PARALLEL 1 -ssh -l parallel parallel-server2 echo\ 1; +ssh -l parallel parallel-server2 echo\ 1; 1 -ssh parallel-server1 echo\ 2; +ssh parallel-server1 echo\ 2; 2 ### Test ~/.parallelrc -ssh -l parallel parallel-server2 echo\ 1; +ssh -l parallel parallel-server2 echo\ 1; 1 -ssh parallel-server1 echo\ 2; +ssh parallel-server1 echo\ 2; 2 diff --git a/unittest/wanted-results/test22 b/unittest/wanted-results/test22 new file mode 100644 index 00000000..1bc1721b --- /dev/null +++ b/unittest/wanted-results/test22 @@ -0,0 +1,58 @@ +### Test exit val +0 +1 +### Test --halt-on-error +1 +parallel: Starting no more jobs. Waiting for 2 jobs to finish. This job failed: +sleep 2;false +1 +parallel: This job failed: +sleep 2;false +1 +sh: non_exist: command not found +2 +parallel: Starting no more jobs. Waiting for 3 jobs to finish. This job failed: +sleep 1;false +sh: non_exist: command not found +parallel: Starting no more jobs. Waiting for 1 jobs to finish. 
This job failed: +sleep 1; non_exist +127 +parallel: This job failed: +sleep 2;false +1 +### Test last dying print --halt-on-error +0 +1 +parallel: Starting no more jobs. Waiting for 9 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 1 +2 +parallel: Starting no more jobs. Waiting for 8 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 2 +3 +parallel: Starting no more jobs. Waiting for 7 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 3 +4 +parallel: Starting no more jobs. Waiting for 6 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 4 +5 +parallel: Starting no more jobs. Waiting for 5 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 5 +6 +parallel: Starting no more jobs. Waiting for 4 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 6 +7 +parallel: Starting no more jobs. Waiting for 3 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 7 +8 +0 +parallel: Starting no more jobs. Waiting for 2 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 8 +9 +parallel: Starting no more jobs. Waiting for 1 jobs to finish. This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 9 +9 +0 +1 +parallel: This job failed: +perl -e sleep\ \$ARGV[0]\;print\ STDERR\ @ARGV,\"\\n\"\;\ exit\ shift 1 +1
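Note on the exit status handling in this patch: the change to next_command_line_with_sshlogin saves the job's exit status before the cleanup commands run and exits with it afterwards, and the main program caps the count of failed jobs at 254. The following is a minimal sketch of both ideas in plain shell, not the code from src/parallel. The function names run_with_cleanup and cap_exit_status are hypothetical and exist only for illustration; the variable name _EXIT_status is taken from the patch.

```shell
#!/bin/sh
# Sketch only: run_with_cleanup and cap_exit_status are hypothetical helpers,
# not part of GNU parallel.

# Run a job, then a cleanup command, and return the job's exit status
# rather than the status of the cleanup command.
run_with_cleanup() {
    job=$1
    cleanup=$2
    # A subshell keeps 'exit' from terminating the calling script.
    (
        _EXIT_status=0
        # The patch uses '_EXIT_status=$?' right after the job; the form
        # below does the same and also works when 'set -e' is active.
        sh -c "$job" || _EXIT_status=$?
        # A failing cleanup must not hide the job's status.
        sh -c "$cleanup" || true
        exit "$_EXIT_status"
    )
}

# Map a count of failed jobs to an exit status, assuming the rule from the
# EXIT STATUS section: 1-253 is the count itself, 254 means more than 253.
cap_exit_status() {
    failed=$1
    if [ "$failed" -gt 254 ]; then
        echo 254
    else
        echo "$failed"
    fi
}
```

For example, run_with_cleanup 'sleep 1; false' 'rm -f /tmp/somefile' (the file name is a placeholder) would return the status of false even though rm succeeds, which is the behaviour the '_EXIT_status=$?; ... exit $_EXIT_status;' wrapper is meant to give for remote jobs with --cleanup.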