mirror of
https://git.savannah.gnu.org/git/parallel.git
synced 2024-12-23 13:17:54 +00:00
parallel man page: parallel web spider
parent
9705fcfeeb
commit
5298af094d
@@ -177,50 +177,15 @@ cc:Peter Simons <simons@cryp.to>, Sandro Cazzaniga <kharec@mandriva.org>,
 Christian Faulhammer <fauli@gentoo.org>, Ryoichiro Suzuki <ryoichiro.suzuki@gmail.com>,
 Jesse Alama <jesse.alama@gmail.com>
 
-Subject: GNU Parallel 20110722 ('Murdoch') released
+Subject: GNU Parallel 20110822 ('Utøya') released
 
-GNU Parallel 20110722 ('Murdoch') has been released. It is
+GNU Parallel 20110822 ('Utøya') has been released. It is
 available for download at: http://ftp.gnu.org/gnu/parallel/
 
 New in this release:
 
-* niceload: --hard will suspend a program if a limit is reached - as
-opposed to just slowing the program down.
-
-* niceload: --soft will slow the program down - as opposed to
-suspending the program completely.
-
-* niceload: --run-io will slow down a program if disk io goes above a
-certain limit.
-
-* niceload: --run-load will slow down a program if loadaverage goes
-above a certain limit.
-
-* niceload: --run-mem will slow down a program if free memory goes
-below a certain limit.
-
-* niceload: --run-noswap will slow down a program if the computer is
-swapping.
-
-* niceload: --start-io, --start-load, --start-mem, --start-noswap will
-defer starting a program until the system is below the limit.
-
-* --io, --load, --mem, and --noswap sets both --run-* and --start-*.
-
-* niceload got a major rewrite and is now object oriented.
-
-* GNU Parallel was presented at Nordic Perl Workshop 2011.
-http://conferences.yapceurope.org/npw2011/talk/3416
-
-* Blog post about zcat and GNU Parallel. Thanks to Dr. John.
-http://drjohnstechtalk.com/blog/2011/06/gnu-parallel-really-helps-with-zcat/
-
-* 2 blog posts in Japanese. Thanks to Negima.
-http://d.hatena.ne.jp/negima1976/20110607/1307412660
-http://d.hatena.ne.jp/negima1976/20110628/1309252494
-
-* Blog post for bioinformatics. Thanks to Chris Miller.
-http://chrisamiller.com/science/2010/05/26/use-parallel-for-easy-multi-processor-execution/
+* Blog post about optimizing JPEGs. Thanks to Thomas Jost.
+http://schnouki.net/2011/07/22/optimizing-jpeg-pictures/
 
 * Bug fixes and man page updates.
 
@@ -1523,6 +1523,37 @@ B<$(date -d "today -{1} days" +%Y%m%d)> with give the dates in
 YYYYMMDD with {1} days subtracted.
 
 
+=head1 EXAMPLE: Parallel spider
+
+This script below will spider a URL in parallel (breadth first). Run
+like this:
+
+B<PARALLEL=-j50 ./parallel-spider http://www.gnu.org/software/parallel>
+
+#!/bin/bash
+
+# E.g. http://www.gnu.org/software/parallel
+URL=$1
+URLLIST=$(mktemp urllist.XXXX)
+URLLIST2=$(mktemp urllist.XXXX)
+SEEN=$(mktemp seen.XXXX)
+
+# Spider to get the URLs
+echo $URL >$URLLIST
+cp $URLLIST $SEEN
+
+while [ -s $URLLIST ] ; do
+cat $URLLIST |
+parallel lynx -listonly -image_links -dump {} \; echo Spidered: {} \>\&2 |
+perl -ne 's/#.*//; s/\s+\d+.\s(\S+)$/$1/ and do { $seen{$1}++ or print }' |
+grep -F $URL |
+grep -v -x -F -f $SEEN | tee -a $SEEN > $URLLIST2
+mv $URLLIST2 $URLLIST
+done
+
+rm -f $URLLIST $URLLIST2 $SEEN
+
+
 =head1 EXAMPLE: Process files from a tar file while unpacking
 
 If the files to be processed are in a tar file then unpacking one file
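Note on the spider's breadth-first step: the script added by this commit builds each round's URL list with `grep -v -x -F -f $SEEN | tee -a $SEEN`, which keeps only URLs not yet recorded and records them in the same pass. The following is a minimal sketch of that one step in isolation, fed with made-up `example.org` URLs instead of `lynx` output; the helper name `next_frontier` and the temporary directory layout are assumptions for illustration and are not part of the commit.

```shell
#!/bin/bash
# Sketch of the spider's per-round dedup step, fed with made-up URLs
# instead of lynx output. next_frontier is a hypothetical helper name.

# Usage: next_frontier SEEN_FILE < candidates
# Prints the candidates that are not yet in SEEN_FILE and appends them to it.
next_frontier() {
  local seen=$1
  # -v: invert the match, -x: match whole lines only, -F: fixed strings,
  # -f: read the patterns (the already-seen URLs) from the file.
  grep -v -x -F -f "$seen" | tee -a "$seen"
}

workdir=$(mktemp -d)
seen="$workdir/seen"
echo "http://example.org/" > "$seen"   # the start URL counts as seen

echo "Round 1:"
printf '%s\n' http://example.org/ http://example.org/a http://example.org/b |
  next_frontier "$seen"

echo "Round 2:"
printf '%s\n' http://example.org/a http://example.org/c |
  next_frontier "$seen"

rm -rf "$workdir"
```

Round 1 prints the `/a` and `/b` URLs, and round 2 prints only `/c`, because `/a` was recorded in round 1. This step does not remove duplicates inside a single round; in the committed script the preceding `perl` filter (`$seen{$1}++ or print`) does that. The loop ends when a round yields no new URLs, which is what the `[ -s $URLLIST ]` test checks.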