optional/R: Package builds without warnings.

Ole Tange 2014-01-25 23:29:25 +01:00
parent 74014985fe
commit b78d652486
16 changed files with 447 additions and 81 deletions

@@ -204,112 +204,52 @@ cc:Sandro Cazzaniga <kharec@mandriva.org>,
Ryoichiro Suzuki <ryoichiro.suzuki@gmail.com>,
Jesse Alama <jesse.alama@gmail.com>
-Subject: GNU Parallel 20140122 ('Opportunity') released
+Subject: GNU Parallel 20140222 ('') released
-GNU Parallel 20140122 ('Opportunity') has been released. It is
-available for download at: http://ftp.gnu.org/gnu/parallel/
-No new functionality was introduced so this is a good candidate for a
-stable release.
+GNU Parallel 20140222 ('') has been released. It is available for download at: http://ftp.gnu.org/gnu/parallel/
New in this release:
-* GNU Parallel was cited in: On the likelihood of multiple bit upsets
-in logic circuits
-http://arxiv-web3.library.cornell.edu/pdf/1401.1003
-* --tollef has been retired.
-* HaploClique uses GNU Parallel
-https://github.com/armintoepfer/haploclique
-* Scraping NSScreencast
-https://blog.nicolai86.eu/posts/2014-01-12/scraping-nsscreencast/
-* 30 Cool Open Source Software I Discovered in 2013
-http://www.cyberciti.biz/open-source/30-cool-best-open-source-softwares-of-2013/
-* [Unix] 13. The power of doing things in Parallel
-http://leetaey.tistory.com/384
-* Parallel the execution of a job that read from stdin
-http://www.linuxask.com/questions/parallel-the-execution-of-a-job-that-read-from-stdin
-* Mon Make à moi (6:38-11:50)
-http://videos.rennes.inria.fr/ReNaBI-GO2013/indexPierreLindenbaum.html
-* Shell-Abarbeitung beschleunigen: Wie Sie mit parallelen Prozesse
-effizienter in der Shell arbeiten
-https://www.hosteurope.ch/blog/shell-abarbeitung-beschleunigen-wie-sie-mit-parallelen-prozesse-effizienter-in-der-shell-arbeiten/
-* Summary of GNU Parallel tutorial
-http://hacktracking.blogspot.dk/2014/01/gnu-parallel-tutorial.html
-* GNU Parallel is co-distributed with RepeatExplorer
-http://www.vcru.wisc.edu/simonlab/bioinformatics/programs/repeatexplorer/README.txt
-* Bug fixes and man page updates.
= About GNU Parallel =
-GNU Parallel is a shell tool for executing jobs in parallel using one
-or more computers. A job is can be a single command or a small script
-that has to be run for each of the lines in the input. The typical
-input is a list of files, a list of hosts, a list of users, a list of
-URLs, or a list of tables. A job can also be a command that reads from
-a pipe. GNU Parallel can then split the input and pipe it into
-commands in parallel.
+GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job is can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.
-If you use xargs and tee today you will find GNU Parallel very easy to
-use as GNU Parallel is written to have the same options as xargs. If
-you write loops in shell, you will find GNU Parallel may be able to
-replace most of the loops and make them run faster by running several
-jobs in parallel. GNU Parallel can even replace nested loops.
+If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.
-GNU Parallel makes sure output from the commands is the same output as
-you would get had you run the commands sequentially. This makes it
-possible to use output from GNU Parallel as input for other programs.
+GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.
-You can find more about GNU Parallel at:
-http://www.gnu.org/s/parallel/
+You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/
-You can install GNU Parallel in just 10 seconds with:
-(wget -O - pi.dk/3 || curl pi.dk/3/) | bash
+You can install GNU Parallel in just 10 seconds with: (wget -O - pi.dk/3 || curl pi.dk/3/) | bash
-Watch the intro video on
-http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
+Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
-Walk through the tutorial (man parallel_tutorial). Your commandline
-will love you for it.
+Walk through the tutorial (man parallel_tutorial). Your commandline will love you for it.
-When using programs that use GNU Parallel to process data for
-publication please cite:
+When using programs that use GNU Parallel to process data for publication please cite:
-O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login:
-The USENIX Magazine, February 2011:42-47.
+O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.
= About GNU SQL =
-GNU sql aims to give a simple, unified interface for accessing
-databases through all the different databases' command line
-clients. So far the focus has been on giving a common way to specify
-login information (protocol, username, password, hostname, and port
-number), size (database and table size), and running queries.
+GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.
-The database is addressed using a DBURL. If commands are left out you
-will get that database's interactive shell.
+The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
When using GNU SQL for a publication please cite:
-O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different
-Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.
+O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.
= About GNU Niceload =
-GNU niceload slows down a program when the computer load average (or
-other system activity) is above a certain limit. When the limit is
-reached the program will be suspended for some time. If the limit is a
-soft limit the program will be allowed to run for short amounts of
-time before being suspended again. If the limit is a hard limit the
-program will only be allowed to run when the system is below the
-limit.
>>>>>
+GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.

@@ -0,0 +1,15 @@
Package: gnuparallel
Type: Package
Title: Loading of GNU Parallel --results output
Version: 0.1
Date: 2014-01-22
Author: Ole Tange
Maintainer: Ole Tange <tange@gnu.org>
Description: Load GNU Parallel --results output as R objects
License: GPL (>= 3)
LazyLoad: yes
Collate: 'gnuparallel.R'
Date/Publication: 2014-01-22 21:19:14
Packaged: 2014-01-21 02:35:20 UTC; tange
NeedsCompilation: no
Depends: R (>= 2.12.1), data.table, plyr

@@ -0,0 +1,4 @@
c521738aeecfa8e237500f4a3263143e DESCRIPTION
d186dcbdce42279894f9241df00ed5a2 NAMESPACE
55cef8c319d20ad1a7825e2bafb52a34 R/gnuparallel.R
2fbd375e38ecad68a4e96c80ebad2143 man/gnuparallel-package.Rd

@@ -0,0 +1,7 @@
export(heckbert)
export(gnu.parallel.filenames)
export(gnu.parallel.load)
export(gnu.parallel.load.lines)
export(gnu.parallel.load.data.frame)
export(gnu.parallel.load.data.table)

@@ -0,0 +1,224 @@
## Copyright (C) 2014 Ole Tange, David Rosenberg, and Free Software
## Foundation, Inc.
##
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, see <http://www.gnu.org/licenses/>
## or write to the Free Software Foundation, Inc., 51 Franklin St,
## Fifth Floor, Boston, MA 02110-1301 USA
#' Functions for reading GNU Parallel results dir
#'
#' \tabular{ll}{
#' Package: \tab gnuparallel\cr
#' Type: \tab Package\cr
#' Version: \tab 0.1\cr
#' Date: \tab 2014-01-22\cr
#' License: \tab GPL >= 3\cr
#' LazyLoad: \tab yes\cr
#' }
#'
#' Implements a number of functions for reading GNU Parallel results dir
#'
#' @name gnuparallel-package
#' @aliases gnuparallel
#' @docType package
#' @title Results dir loading
#' @author Ole Tange \email{tange@@gnu.org}
#' @references
#' Tange, O. (2011) GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.
#' Talbot, J. (2011) labeling R-package, CRAN 2011.
#' @keywords parallel
#' @seealso \code{\link{heckbert}}, \code{\link{gnu.parallel.filenames}}
#' @examples
#' heckbert(8.1, 14.1, 4) # 5 10 15
#' # When plotting, extend the plot range to include the labeling
#' # Should probably have a helper function to make this easier
#' data(iris)
#' x <- iris$Sepal.Width
#' y <- iris$Sepal.Length
#' xl <- heckbert(min(x), max(x), 6)
#' yl <- heckbert(min(y), max(y), 6)
#' plot(x, y,
#' xlim=c(min(x,xl),max(x,xl)),
#' ylim=c(min(y,yl),max(y,yl)),
#' axes=FALSE, main="Extended labeling")
#' axis(1, at=xl)
#' axis(2, at=yl)
c()
#' Find the filenames in a results dir
#'
#' @param resdir results dir from GNU Parallel's --results
#' @return filenametable with a column for each of GNU Parallel's input sources and a column each for the stdout and stderr file names
#' @export
gnu.parallel.filenames <- function(resdir) {
## Find files called .../stdout
stdoutnames <- list.files(path=resdir, pattern="stdout", recursive=T);
## Find files called .../stderr
stderrnames <- list.files(path=resdir, pattern="stderr", recursive=T);
if(length(stdoutnames) == 0) {
## Return empty data frame if no files found
return(data.frame());
}
## The argument names are every other dir level, starting with the first
## The argument values are the levels in between, starting with the second
## e.g. my/results/dir/age/18/chromosome/20/stdout
m <- matrix(unlist(strsplit(stdoutnames, "/")),nrow = length(stdoutnames),byrow=T);
filenametable <- as.table(m[,c(F,T)]);
## Append the stdout and stderr filenames
filenametable <- cbind(filenametable,
paste(resdir,unlist(stdoutnames),sep="/"),
paste(resdir,unlist(stderrnames),sep="/"));
colnames(filenametable) <- c(strsplit(stdoutnames[1],"/")[[1]][c(T,F)],"stderr");
return(filenametable);
}
#' Read the contents of the stdout and stderr files as 1 field each
#'
#' @param filenametable from gnu.parallel.filenames
#' @return table with a column for each of GNU Parallel's input sources and 2 columns for content of stdout and stderr
#' @export
gnu.parallel.load <- function(filenametable) {
## Read the files given in column stdout
stdoutcontents <-
lapply(filenametable[,c("stdout")],
function(filename) {
return(readChar(filename, file.info(filename)$size));
} );
## Read the files given in column stderr
stderrcontents <-
lapply(filenametable[,c("stderr")],
function(filename) {
return(readChar(filename, file.info(filename)$size));
} );
## Replace filenames with file contents
filenametable[,c("stdout","stderr")] <-
c(as.character(stdoutcontents),as.character(stderrcontents));
return(filenametable);
}
#' Read the contents of the stdout and produce a row for each line of stdout
#'
#' @param filenametable from gnu.parallel.filenames
#' @param split regular expression used to split stdout into lines (default newline)
#' @return table with a column for each of GNU Parallel's input sources and a column for content of stdout
#' @export
gnu.parallel.load.lines <- function(filenametable,split="\n") {
raw <- gnu.parallel.load(filenametable);
## Keep all columns except stdout and stderr
varnames = setdiff(colnames(raw), c("stdout","stderr"))
## Find the id of the non-stdout and non-stderr columns
header_cols = which(colnames(raw) %in% varnames)
## Split stdout on \n
splits = strsplit(raw[,"stdout"], split)
## Compute lengths of all the lines
lens = sapply(splits, length)
## The arguments should be repeated as many times as there are lines
reps = rep(1:nrow(raw), lens)
## Merge the repeating argument and the lines into a matrix
m = cbind(raw[reps, header_cols], unlist(splits))
return(m)
}
#' Read the contents of the stdout and produce a row for each line of stdout split into columns on \t
#'
#' @param filenametable from gnu.parallel.filenames
#' @return table with a column for each of GNU Parallel's input sources and stdout split by \t
#' @export
gnu.parallel.load.data.table <- function(filenametable, ...) {
raw <- gnu.parallel.load(filenametable);
require(data.table)
## Keep all columns except stdout and stderr
varnames = setdiff(colnames(raw), c("stdout","stderr"))
## after data.table feature request the as.data.frame can be skipped
## and will thus be much faster
ddt = as.data.table(as.data.frame(raw,stringsAsFactors=FALSE))
## ensure fread knows stdout is string and not filename by appending \n
ddt[, stdout := paste0(stdout,"\n")]
## drop files with empty stdout
ddd = ddt[nchar(stdout)>1,fread(stdout, header=FALSE, ...), by=varnames]
return(ddd)
}
#' Read the contents of the stdout and produce a row for each line of stdout split into columns on \t
#'
#' @param filenametable from gnu.parallel.filenames
#' @return table with a column for each of GNU Parallel's input sources and stdout split by \t
#' @export
gnu.parallel.load.data.frame <- function(filenametable, ...) {
raw <- gnu.parallel.load(filenametable);
require(plyr)
## Convert to data.frame without factors
raw = as.data.frame(raw, stringsAsFactors = FALSE)
## Keep all columns except stdout and stderr
varnames = setdiff(colnames(raw), c("stdout","stderr"))
dd = ddply(raw, .variables=varnames, function(row) {
## Ignore empty stdouts
if (nchar(row[,"stdout"]) == 0) {
return(NULL)
}
## Read stdout with read.table
con <- textConnection(row[,"stdout"], open = "r")
d = read.table(con, header=FALSE, ...)
return(d)
})
return(dd)
}
#' Heckbert's labeling algorithm
#'
#' @param dmin minimum of the data range
#' @param dmax maximum of the data range
#' @param m number of axis labels
#' @return vector of axis label locations
#' @references
#' Heckbert, P. S. (1990) Nice numbers for graph labels, Graphics Gems I, Academic Press Professional, Inc.
#' @author Justin Talbot \email{jtalbot@@stanford.edu}
#' @export
heckbert <- function(dmin, dmax, m)
{
range <- .heckbert.nicenum((dmax-dmin), FALSE)
lstep <- .heckbert.nicenum(range/(m-1), TRUE)
lmin <- floor(dmin/lstep)*lstep
lmax <- ceiling(dmax/lstep)*lstep
seq(lmin, lmax, by=lstep)
}
.heckbert.nicenum <- function(x, round)
{
e <- floor(log10(x))
f <- x / (10^e)
if(round)
{
if(f < 1.5) nf <- 1
else if(f < 3) nf <- 2
else if(f < 7) nf <- 5
else nf <- 10
}
else
{
if(f <= 1) nf <- 1
else if(f <= 2) nf <- 2
else if(f <= 5) nf <- 5
else nf <- 10
}
nf * (10^e)
}
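As a worked check of the heckbert(8.1, 14.1, 4) # 5 10 15 example above: the data span 14.1 - 8.1 = 6 is rounded up (round = FALSE) to the nice number 10, the raw step 10 / (4 - 1), about 3.33, is rounded (round = TRUE) to 5, and the end points are pushed outward to the enclosing multiples of 5, giving 5, 10, 15. The R sketch below retraces those steps. nicenum() and heckbert_trace() are hypothetical helpers that mirror .heckbert.nicenum() and heckbert() from this commit but also return the intermediate values; the input guard is an addition, not part of the committed code.

```r
# Hypothetical helper mirroring .heckbert.nicenum() from the commit.
nicenum <- function(x, round) {
  e <- floor(log10(x))   # decimal exponent of x
  f <- x / 10^e          # mantissa, roughly in [1, 10)
  nf <- if (round) {
    if (f < 1.5) 1 else if (f < 3) 2 else if (f < 7) 5 else 10
  } else {
    if (f <= 1) 1 else if (f <= 2) 2 else if (f <= 5) 5 else 10
  }
  nf * 10^e
}

# Hypothetical variant of heckbert() that also reports intermediate values.
heckbert_trace <- function(dmin, dmax, m) {
  stopifnot(dmax > dmin, m >= 2)              # added guard; heckbert() has none
  range <- nicenum(dmax - dmin, FALSE)        # span rounded up to a nice number
  lstep <- nicenum(range / (m - 1), TRUE)     # nice tick spacing
  lmin  <- floor(dmin / lstep) * lstep        # first tick at or below dmin
  lmax  <- ceiling(dmax / lstep) * lstep      # last tick at or above dmax
  list(range = range, step = lstep, labels = seq(lmin, lmax, by = lstep))
}

tr <- heckbert_trace(8.1, 14.1, 4)
str(tr)
```

The guard makes an assumption of the committed code explicit: heckbert() as written stops with an error when dmax equals dmin or when m is 1, because .heckbert.nicenum() then receives 0 or Inf and its comparisons see NaN.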

@@ -0,0 +1,23 @@
\name{gnu.parallel.filenames}
\alias{gnu.parallel.filenames}
\title{Find the filenames in a GNU Parallel results dir}
\usage{
gnu.parallel.filenames(resdir)
}
\arguments{
\item{resdir}{results dir from GNU Parallel's --results}
}
\value{
filenametable with a column for each of GNU Parallel's input sources
and a column each for the stdout and stderr file names
}
\description{
Find the filenames in a GNU Parallel results dir
}
\author{
Ole Tange \email{tange@gnu.org}, David Rosenberg
}
\references{
Tange, O. (2011) \emph{GNU Parallel - The Command-Line Power Tool} ;login:
The USENIX Magazine, February 2011:42-47.
}
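gnu.parallel.filenames() relies on the --results directory layout, in which each job's path alternates input-source names and values, e.g. my/results/dir/age/18/chromosome/20/stdout as in the source comment. A minimal sketch of that split, assuming hypothetical relative paths of the kind list.files(resdir, recursive = TRUE) returns; the variable names are illustrative only:

```r
# Hypothetical relative paths for a run with two input sources, "age" and
# "chromosome"; each job directory holds a stdout file.
paths <- c("age/18/chromosome/20/stdout",
           "age/18/chromosome/21/stdout",
           "age/19/chromosome/20/stdout")

# One row per job, one column per directory level.
parts <- do.call(rbind, strsplit(paths, "/", fixed = TRUE))

# A short logical index is recycled across the columns:
# c(FALSE, TRUE) keeps levels 2, 4, ... (the values),
# c(TRUE, FALSE) keeps levels 1, 3, 5, ... (the names plus the final "stdout").
arg_values <- parts[, c(FALSE, TRUE), drop = FALSE]
arg_names  <- parts[1, c(TRUE, FALSE)]
arg_names  <- arg_names[-length(arg_names)]   # drop the trailing "stdout"

arg_table <- data.frame(arg_values, stringsAsFactors = FALSE)
colnames(arg_table) <- arg_names
print(arg_table)
```

Like the committed code, this assumes every path has the same depth, i.e. every job saw the same input sources. Note also that pattern = "stdout" in list.files() is a regular expression matched anywhere in the file name, so an anchored "^stdout$" would be stricter.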

@@ -0,0 +1,22 @@
\name{gnu.parallel.load}
\alias{gnu.parallel.load}
\title{Read the contents of the stdout and stderr files as 1 field each}
\usage{
gnu.parallel.load(filenametable)
}
\arguments{
\item{filenametable}{filenametable from gnu.parallel.filenames}
}
\value{
table with a column for each of GNU Parallel's input sources and 2 columns for content of stdout and stderr
}
\description{
Read the contents of the stdout and stderr files as 1 field each.
}
\author{
Ole Tange \email{tange@gnu.org}
}
\references{
Tange, O. (2011) \emph{GNU Parallel - The Command-Line Power Tool} ;login:
The USENIX Magazine, February 2011:42-47.
}

@@ -0,0 +1,25 @@
\name{gnu.parallel.load.data.frame}
\alias{gnu.parallel.load.data.frame}
\title{Read the contents of the stdout and produce a row for each line of stdout split into columns on TAB}
\usage{
gnu.parallel.load.data.frame(filenametable,...)
}
\arguments{
\item{filenametable}{filenametable from gnu.parallel.filenames}
\item{...}{further arguments passed on to \code{read.table}}
}
\value{
table with a column for each of GNU Parallel's input sources and
stdout split by TAB.
}
\description{
Read the contents of the stdout and produce a row for each line of
stdout split into columns on TAB.
}
\author{
Ole Tange \email{tange@gnu.org}, David Rosenberg
}
\references{
Tange, O. (2011) \emph{GNU Parallel - The Command-Line Power Tool} ;login:
The USENIX Magazine, February 2011:42-47.
}
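gnu.parallel.load.data.frame() parses each job's stdout with read.table() through a textConnection and stacks the pieces with plyr::ddply(). The sketch below swaps in base R (lapply(), read.table(text = ...), rbind()) to show the same reshaping on a hypothetical two-job input. It passes sep = "\t" explicitly, whereas the committed function uses read.table()'s default separator, which splits on any whitespace, unless sep is supplied through ....

```r
# Hypothetical stand-in for gnu.parallel.load() output: one row per job.
raw <- data.frame(letters = c("a", "b"),
                  numbers = c("4", "5"),
                  stdout  = c("outa\t4\nline2a\t4\n", "outb\t5\n"),
                  stringsAsFactors = FALSE)

parsed <- lapply(seq_len(nrow(raw)), function(i) {
  if (nchar(raw$stdout[i]) == 0) return(NULL)   # skip jobs with empty stdout
  # read.table(text = ...) opens and closes its own text connection.
  d <- read.table(text = raw$stdout[i], sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE)
  # Repeat this job's argument columns once per parsed row.
  cbind(raw[rep(i, nrow(d)), c("letters", "numbers"), drop = FALSE], d)
})

result <- do.call(rbind, parsed)   # rbind() ignores NULL entries
rownames(result) <- NULL
print(result)
```

Processing row by row gives the same grouping as ddply() over the argument columns as long as each combination of arguments occurs once, which is the case for a --results directory.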

@@ -0,0 +1,25 @@
\name{gnu.parallel.load.data.table}
\alias{gnu.parallel.load.data.table}
\title{Read the contents of the stdout and produce a row for each line of stdout split into columns on TAB}
\usage{
gnu.parallel.load.data.table(filenametable,...)
}
\arguments{
\item{filenametable}{filenametable from gnu.parallel.filenames}
\item{...}{further arguments passed on to \code{fread}}
}
\value{
table with a column for each of GNU Parallel's input sources and
stdout split by TAB.
}
\description{
Read the contents of the stdout and produce a row for each line of
stdout split into columns on TAB.
}
\author{
Ole Tange \email{tange@gnu.org}, David Rosenberg
}
\references{
Tange, O. (2011) \emph{GNU Parallel - The Command-Line Power Tool} ;login:
The USENIX Magazine, February 2011:42-47.
}

@@ -0,0 +1,24 @@
\name{gnu.parallel.load.lines}
\alias{gnu.parallel.load.lines}
\title{Read the contents of the stdout and produce a row for each line of stdout}
\usage{
gnu.parallel.load.lines(filenametable,split="\n")
}
\arguments{
\item{filenametable}{filenametable from gnu.parallel.filenames}
\item{split}{regular expression passed to \code{strsplit} to split stdout into lines; defaults to newline}
}
\value{
table with a column for each of GNU Parallel's input sources and a
column for content of stdout
}
\description{
Read the contents of the stdout and produce a row for each line of stdout.
}
\author{
Ole Tange \email{tange@gnu.org}, David Rosenberg
}
\references{
Tange, O. (2011) \emph{GNU Parallel - The Command-Line Power Tool} ;login:
The USENIX Magazine, February 2011:42-47.
}
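The core of gnu.parallel.load.lines() is a split-and-repeat idiom: split each job's stdout into lines, count the lines, and repeat that job's argument columns once per line with rep(). A minimal base-R sketch of the idiom, using a hypothetical two-job data frame in place of gnu.parallel.load() output (the committed function works on a matrix and treats split as a regular expression; fixed = TRUE here is a simplification that is equivalent for "\n"):

```r
# Hypothetical stand-in for gnu.parallel.load() output: one row per job.
raw <- data.frame(letters = c("a", "b"),
                  numbers = c("4", "5"),
                  stdout  = c("outa\t4\nline2a\t4\n", "outb\t5\n"),
                  stringsAsFactors = FALSE)

splits <- strsplit(raw$stdout, "\n", fixed = TRUE)  # one vector of lines per job
lens   <- lengths(splits)                           # number of lines per job
reps   <- rep(seq_len(nrow(raw)), lens)             # job index repeated per line

# Repeat each job's argument columns once per output line and attach the lines.
lines_table <- data.frame(raw[reps, c("letters", "numbers"), drop = FALSE],
                          line = unlist(splits),
                          stringsAsFactors = FALSE)
rownames(lines_table) <- NULL
print(lines_table)
```

The drop = FALSE matters with a single input source: without it, indexing one column returns a plain vector (R's default drop = TRUE), so the column name is not carried into the result. The committed raw[reps, header_cols] has no drop = FALSE.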

@@ -0,0 +1,53 @@
\docType{package}
\name{gnuparallel-package}
\alias{gnuparallel}
\alias{gnuparallel-package}
\title{GNU Parallel}
\description{
Loading of GNU Parallel --results output
}
\details{
\tabular{ll}{ Package: \tab gnuparallel\cr Type: \tab
Package\cr Version: \tab 2014.01.22\cr Date: \tab 2014-01-22\cr
License: \tab GPL (>= 3)\cr LazyLoad: \tab yes\cr }
Implements a number of functions for reading GNU Parallel results dir.
}
\examples{
library(gnuparallel)
system("parallel --header : --results foobar printf out{1}\\\\\\\\tout{2}\\\\\\\\nline2{1}\\\\\\\\t{2}\\\\\\\\n ::: letters a b c ::: numbers 4 5 6")
fn <- gnu.parallel.filenames("foobar")
gnu.parallel.load(fn)
gnu.parallel.load.lines(fn)
gnu.parallel.load.data.frame(fn)
gnu.parallel.load.data.table(fn)
heckbert(8.1, 14.1, 4) # 5 10 15
# When plotting, extend the plot range to include the labeling
# Should probably have a helper function to make this easier
data(iris)
x <- iris$Sepal.Width
y <- iris$Sepal.Length
xl <- heckbert(min(x), max(x), 6)
yl <- heckbert(min(y), max(y), 6)
plot(x, y,
xlim=c(min(x,xl),max(x,xl)),
ylim=c(min(y,yl),max(y,yl)),
axes=FALSE, main="Extended labeling")
axis(1, at=xl)
axis(2, at=yl)
}
\author{
Ole Tange \email{tange@gnu.org}, David Rosenberg
}
\references{
Tange, O. (2011) GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
}
\seealso{
\code{\link{gnu.parallel.filenames}}, \code{\link{gnu.parallel.load}},
\code{\link{gnu.parallel.load.data.frame}}, \code{\link{gnu.parallel.load.data.table}},
\code{\link{gnu.parallel.load.lines}}
}
\keyword{parallel}

@@ -1649,11 +1649,12 @@ sub progress {
if($opt::bar) {
my $arg = $Global::newest_job ?
$Global::newest_job->{'commandline'}->simple_replace_placeholders($Global::replace{"{}"}) : "";
-my $bar = sprintf("%d%% %ds %s", $pctcomplete*100, $this_eta, $arg);
+my $bar_text = sprintf("%d%% %d:%d=%ds %s",
+$pctcomplete*100, $completed, $left, $this_eta, $arg);
my $rev = '';
my $reset = '';
my $terminal_width = terminal_columns();
-my $s = sprintf("%-${terminal_width}s",$bar);
+my $s = sprintf("%-${terminal_width}s",$bar_text);
my $width = int($terminal_width * $pctcomplete);
$s =~ s/^(.{$width})/$1$reset/;
$s = "\r# ".int($this_eta)." sec $arg" . "\r". $pctcomplete*100 # Prefix with zenity header

@@ -124,7 +124,7 @@
.\" ========================================================================
.\"
.IX Title "PARALLEL_TUTORIAL 1"
-.TH PARALLEL_TUTORIAL 1 "2014-01-14" "20140122" "parallel"
+.TH PARALLEL_TUTORIAL 1 "2014-01-25" "20140122" "parallel"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
@@ -2213,6 +2213,7 @@ job by matching the header with \-\-header. If headers start with \f(CW%:\fR
Output (the order may be different):
.PP
.Vb 10
\& JOB1
\& %head1
\& %head2
\& 1

@@ -1528,6 +1528,7 @@ job by matching the header with --header. If headers start with %:</p>
cat num_%header | parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat</pre>
<p>Output (the order may be different):</p>
<pre>
JOB1
%head1
%head2
1

Binary file not shown.

@@ -1650,6 +1650,7 @@ job by matching the header with --header. If headers start with %:
Output (the order may be different):
JOB1
%head1
%head2
1