parallel: Use --total-jobs + --bar if input is generated slowly.

Ole Tange 2022-11-02 20:09:16 +01:00
parent 53e3f5b7a3
commit 466e184d5f
8 changed files with 62 additions and 33 deletions

@@ -176,6 +176,10 @@ torsocks git push
torsocks git push origin $TAG
torsocks git push origin $YYYYMMDD
git branch -d premaster
git branch premaster
== Zenodo ==
Add tar.bz2 [Start upload] and [Publish].
@@ -255,40 +259,23 @@ from:tange@gnu.org
to:parallel@gnu.org, bug-parallel@gnu.org
stable-bcc: Jesse Alama <jessealama@fastmail.fm>
Subject: GNU Parallel 20221022 ('Nord Stream') released
Subject: GNU Parallel 20221122 ('Lula/#MashaAmina') released
GNU Parallel 20221022 ('Nord Stream') has been released. It is available for download at: lbry://@GnuParallel:4
GNU Parallel 20221122 ('<<>>') has been released. It is available for download at: lbry://@GnuParallel:4
Quote of the month:
If used properly, #gnuparallel actually enables time travel.
-- Dr. James Wasmuth @jdwasmuth@twitter
<<>>
New in this release:
* --latest-line truncates lines at the terminal width.
* Determine max command length faster on Microsoft Windows.
* <<>>
* Bug fixes and man page updates.
News about GNU Parallel:
* Distributed Task Processing with GNU Parallel https://www.youtube.com/watch?v=usbMLggdMgc
* GNU Parallel workflow for many small, independent runs https://docs.csc.fi/support/tutorials/many/
* Copy a File To Multiple Directories With A Single Command on Linux https://www.linuxfordevices.com/tutorials/linux/copy-file-to-multiple-directories-with-one-command
* Behind The Scenes: The Power Of Simple Command Line Tools At Cloud Scale https://blog.gdeltproject.org/behind-the-scenes-the-power-of-simple-command-line-tools-at-cloud-scale/
* Run lz4 compression in parallel using GNU parallel https://www.openguru.com/2022/09/
* Xargs / Parallel With Code Examples https://www.folkstalk.com/2022/09/xargs-parallel-with-code-examples.html
* Parallel processing on a single node with GNU Parallel https://www3.cs.stonybrook.edu/~cse416/Section01/Slides/SeaWulfIntro_CSE416_09222022.pdf
* Using GNU parallel painlessly -- from basics to bioinformatics job orchestration https://www.youtube.com/watch?v=qypUdm-IE9c
* <<>>
GNU Parallel - For people who live life in the parallel lane.

@@ -1922,6 +1922,8 @@ sub options_completion_hash() {
("eta[Show the estimated number of seconds before finishing]"
=> \$opt::eta),
"bar[Show progress as a progress bar]" => \$opt::bar,
("total-jobs|totaljobs|total=s".
"[Set total number of jobs]" => \$opt::totaljobs),
"shuf[Shuffle jobs]" => \$opt::shuf,
("arg-sep|argsep=s".
"[Use sep-str instead of ::: as separator string]:sep-str"
@@ -8873,7 +8875,9 @@ sub total_jobs($) {
::error("--pipe is incompatible with --eta/--bar/--shuf");
::wait_and_exit(255);
}
if($opt::sqlworker) {
if($opt::totaljobs) {
$self->{'total_jobs'} = $opt::totaljobs;
} elsif($opt::sqlworker) {
$self->{'total_jobs'} = $Global::sql->total_jobs();
} else {
my $record;

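In the total_jobs() hunk above, an explicit --total-jobs value now takes precedence over the --sqlworker count; only when neither applies does the remaining branch (cut off in this view) work the total out itself, which, per the --total-jobs documentation in this commit, means reading all jobs first. The bash sketch below mirrors that precedence under stated assumptions: pick_total is a hypothetical shell function, not part of GNU Parallel, the SQL branch is left out, and the fallback is assumed to simply count one job per input line.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the precedence in the total_jobs() hunk:
# an explicit total wins; otherwise all of stdin is read to count jobs.
# The --sqlworker branch is deliberately left out.
pick_total() {
  local explicit="$1"
  if [ -n "$explicit" ]; then
    # Like --total-jobs: use the given number and leave stdin unread,
    # so jobs could start before the input is complete.
    echo "$explicit"
  else
    # Fallback: drain stdin to count the jobs (one per line); nothing
    # could start until the producer has finished.
    wc -l | tr -d ' '
  fi
}

printf '%s\n' a b c | pick_total ""    # prints 3
printf '%s\n' a b c | pick_total 10    # prints 10
```

Passing an explicit total skips the draining step entirely, which is what lets jobs start while a slow producer is still running.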
@@ -556,6 +556,8 @@ It is compatible with B<zenity>:
2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
zenity --progress --auto-kill) | wc
See also: B<--eta> B<--progress> B<--total-jobs>
=item B<--basefile> I<file>
@@ -3053,6 +3055,10 @@ If I<password> is given, B<sshpass> will be used. Otherwise the
sshlogin must not require a password (B<ssh-agent> and B<ssh-copy-id>
may help with that).
If the hostname is an IPv6 address, the port can be separated from it
with p or #. If the address is enclosed in [], you can also use :.
E.g. ::1p2222 ::1#2222 [::1]:2222
The sshlogin ':' is special, it means 'no ssh' and will therefore run
on the local computer.
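The three IPv6 port forms in the sshlogin hunk above can be told apart with two patterns. The sketch below is a hypothetical bash helper, split_hostport, not GNU Parallel's actual parser; it assumes a plain address made of hex digits and colons (no IPv4-mapped or zone-id forms).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not GNU Parallel's parser: split an IPv6 sshlogin
# host into "address port" for the forms ADDRpPORT, ADDR#PORT and
# [ADDR]:PORT. Assumes a plain address of hex digits and colons.
split_hostport() {
  local s="$1"
  local bracketed='^\[([^]]+)]:([0-9]+)$'
  local bare='^([0-9A-Fa-f:]+)[p#]([0-9]+)$'
  if [[ $s =~ $bracketed ]]; then
    # [::1]:2222 - the brackets make ':' unambiguous
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  elif [[ $s =~ $bare ]]; then
    # ::1p2222 or ::1#2222 - 'p' and '#' never occur in the address
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    # No recognised port suffix: return the host unchanged
    echo "$s"
  fi
}

for login in '::1p2222' '::1#2222' '[::1]:2222' '::1'; do
  split_hostport "$login"
done
```

p and # work without brackets because neither can appear in such an address, whereas a bare : is already the IPv6 group separator, so it only becomes usable once [] marks where the address ends.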
@@ -3280,6 +3286,20 @@ dies before the waiting time is up.
See also: B<--halt> B<--timeout> B<--memfree>
=item B<--total-jobs> I<jobs> (alpha testing)
=item B<--total> I<jobs> (alpha testing)
Provide the total number of jobs. It is used for computing the ETA
(B<--eta>) and for B<--bar>.
When B<--eta> or B<--bar> is used without B<--total-jobs>, GNU Parallel
reads all jobs before starting the first one to find the total.
B<--total-jobs> is therefore useful if the input is generated slowly.
See also: B<--bar> B<--eta>
=item B<--tmpdir> I<dirname>
Directory for temporary files.
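The --total-jobs entry above notes that, without it, GNU Parallel reads all jobs before starting one when it needs the total. The bash sketch below illustrates why a total supplied up front removes that wait: progress can be reported as finished/total while the producer is still emitting lines. slow_producer and run_with_known_total are hypothetical illustration functions, not GNU Parallel's code.

```bash
#!/usr/bin/env bash
# Illustration only: report "finished/total" progress while the input
# is still arriving. TOTAL is known up front (the role --total-jobs
# plays), so each job can run as soon as its line is read.
slow_producer() {
  local i
  for i in 1 2 3; do
    echo "$i"
    sleep 0.2              # input trickles in slowly
  done
}

run_with_known_total() {
  local total="$1" finished=0 line
  while IFS= read -r line; do
    # A real runner would execute the job here; we only count it.
    finished=$((finished + 1))
    echo "job $line: $finished/$total"
  done
}

slow_producer | run_with_known_total 3
```

Each progress line appears as soon as its input line arrives; a runner that had to count first could print nothing until the producer exits, roughly 0.6 s here.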
@@ -4295,8 +4315,8 @@ by others, the output might help them figure out the problem.
Whether you have watched the intro videos
(https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1), walked
through the tutorial (man parallel_tutorial), and read the EXAMPLE
section in the man page (man parallel - search for EXAMPLE:).
through the tutorial (man parallel_tutorial), and read the examples
(man parallel_examples).
=back

@@ -772,13 +772,16 @@ printed as soon as possible you can use B<-u>.
Compare the output of:
parallel wget --limit-rate=100k \
parallel wget --progress=dot --limit-rate=100k \
https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
::: {12..16}
parallel --line-buffer wget --limit-rate=100k \
parallel --line-buffer wget --progress=dot --limit-rate=100k \
https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
::: {12..16}
parallel -u wget --limit-rate=100k \
parallel --latest-line wget --progress=dot --limit-rate=100k \
https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
::: {12..16}
parallel -u wget --progress=dot --limit-rate=100k \
https://ftpmirror.gnu.org/parallel/parallel-20{}0822.tar.bz2 \
::: {12..16}

@@ -8,6 +8,20 @@
# Each should take 10-30s and be possible to run in parallel
# I.e.: No race conditions, no logins
par_totaljobs() {
. `which env_parallel.bash`
myrun() {
total="$@"
slowseq() { seq "$@" | pv -qL 3; }
elapsed() { /usr/bin/time -f %e stdout "$@" 2>&1 >/dev/null; }
slowseq 5 | elapsed parallel -j 1 $total --bar 'sleep 1; true'
}
export -f myrun
parset mytime myrun ::: '' '--total 5'
# --total should run > 2 sec faster
perl -E '($without, $with) = @ARGV; say $with + 2 < $without ? "OK" : "Error: --total should be faster"' ${mytime[0]} ${mytime[1]}
}
par_ll_long_line() {
echo '### --latest-line with lines longer than terminal width'
COLUMNS=30 parallel --delay 0.3 --tagstring '{=$_.="x"x$_=}' \

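The 2-second threshold in par_totaljobs can be checked with a back-of-envelope estimate. The sketch assumes pv -qL 3 releases exactly 3 bytes per second, that each job starts as soon as its input line is complete, and it ignores process start-up and pv's chunking; estimate is a hypothetical helper and the numbers are a model, not measurements.

```bash
#!/usr/bin/env bash
# Rough timing model for par_totaljobs (assumptions in the lead-in).
# Input "1\n2\n3\n4\n5\n" is 10 bytes; 5 jobs of 'sleep 1' run with -j 1.
estimate() {
  awk 'BEGIN {
    rate = 3; bytes = 10; first_line = 2; jobs = 5; job_s = 1
    # --bar without --total: read all input first, then run the jobs
    t_without = bytes / rate + jobs * job_s
    # --total 5: start once line 1 ("1\n") has arrived; later lines
    # arrive every 2/3 s, faster than the 1 s jobs consume them
    t_with = first_line / rate + jobs * job_s
    printf "without --total: %.2fs\n", t_without
    printf "with --total 5:  %.2fs\n", t_with
    printf "difference:      %.2fs\n", t_without - t_with
  }'
}

estimate
```

Under this model the expected gain is about 2.7 s, only a little above the test's 2 s threshold, so the test has little headroom against timing noise.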
@@ -960,10 +960,10 @@ par_sem_quote ### sem --quote should not add empty argument
par_sem_quote echo
par_sem_quote
par_shellcompletion ### --shellcompletion
par_shellcompletion 863f31c091219fc53dc89fd707f5995b -
par_shellcompletion 863f31c091219fc53dc89fd707f5995b -
par_shellcompletion 88a69a99c93b79b5ed6491c80e9762b0 -
par_shellcompletion 88a69a99c93b79b5ed6491c80e9762b0 -
par_shellcompletion 139a52b9a64a9fd8ec1f63c2d78ff9ac -
par_shellcompletion 139a52b9a64a9fd8ec1f63c2d78ff9ac -
par_shellcompletion 01947895bda95d99e1b8948a31b1c1f7 -
par_shellcompletion 01947895bda95d99e1b8948a31b1c1f7 -
par_slow_pipe_regexp ### bug #53718: --pipe --regexp -N blocks
par_slow_pipe_regexp This should take a few ms, but took more than 2 hours
par_slow_pipe_regexp 0 1 1

@@ -1074,6 +1074,7 @@ par_tmp_full parallel: Error: Change $TMPDIR with --tmpdir or use --compress.
par_tmux_fg bug #50107: --tmux --fg should also write how to access it
par_tmux_fg See output with: tmux -S tmp attach
par_tmux_fg open terminal failed: not a terminal
par_totaljobs OK
par_xargs_compat xargs compatibility
par_xargs_compat ### Test -L -l and --max-lines
par_xargs_compat a_b

@@ -120,7 +120,7 @@ par_kill_hup parallel: bash -c 'sleep 3 & pid=$!; wait $pid'
par_kill_hup bash---pstree
par_ll_lb_color bug #62386: --color (--ctag but without --tag)
par_ll_lb_color bug #62438: See last line from multiple jobslots
par_ll_lb_color 29fcbb4944fef7ba0cd0fa8358dba815 -
par_ll_lb_color c13699ada05324a5bab5aee05d97aa55 -
par_more_than_9_relative_sshlogin ### Check more than 9(relative) simultaneous sshlogins
par_more_than_9_relative_sshlogin 1
par_more_than_9_relative_sshlogin 2