gnuparallel: Simple loading of GNU parallel result files.

The gnuparallel package provides a single function, `load`, which
loads results from files generated by GNU parallel into a Pandas
DataFrame object. See `help(gnuparallel.load)` for details.

Installation (pandas is required; the aptitude command is for Debian-based systems):

    # python setup.py install
    # aptitude install python-pandas

Sample usage:

1. Generate some result files by running GNU parallel from the command line:

    # mkdir out
    # parallel --header : --results out echo {arg1} {arg2} ::: arg1 1 2 ::: arg2 three four

2. Load the results using the gnuparallel Python package:

    # python
    Python 2.7.3 (default, Apr 24 2012, 00:00:54) 
    [GCC 4.7.0 20120414 (prerelease)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import gnuparallel
    >>> help(gnuparallel.load)
    >>> my_df = gnuparallel.load('out')
    >>> my_df
      _stream  arg1    arg2                       resfile
    0  stdout     2   three  out/arg1/2/arg2/three/stdout
    1  stdout     2    four   out/arg1/2/arg2/four/stdout
    2  stdout     1   three  out/arg1/1/arg2/three/stdout
    3  stdout     1    four   out/arg1/1/arg2/four/stdout
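
The resfile column in the table above suggests how the directory layout
maps to columns: each path alternates argument names and values under the
results directory and ends with the stream name. The following is a
minimal sketch of that mapping using only the standard library. It is not
the implementation of `gnuparallel.load`; the function name
`parse_result_path` is hypothetical, and the layout is assumed to match
the sample paths shown here.

```python
import os


def parse_result_path(resfile, root):
    """Turn a path such as out/arg1/2/arg2/three/stdout into a dict.

    Assumes the layout seen in the sample output: alternating
    name/value directories below `root`, then a file named after
    the stream.
    """
    parts = os.path.relpath(resfile, root).split(os.sep)
    pairs, stream = parts[:-1], parts[-1]
    if len(pairs) % 2 != 0:
        raise ValueError("expected name/value directory pairs: %r" % resfile)
    row = {"_stream": stream, "resfile": resfile}
    # Even positions hold argument names, odd positions hold their values.
    for name, value in zip(pairs[0::2], pairs[1::2]):
        row[name] = value
    return row
```

For example, `parse_result_path('out/arg1/2/arg2/three/stdout', 'out')`
yields a dict with `arg1` set to `'2'`, `arg2` set to `'three'` and
`_stream` set to `'stdout'`. Values stay strings in this sketch; any type
conversion is left to the caller.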

See the pandas documentation (http://pandas.pydata.org/) for instructions
on how to access and manipulate the loaded results.
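
As one illustration of working with the loaded results, the sketch below
reads the file named in each row's resfile column and attaches its text as
a new column. It assumes only that the DataFrame has a `resfile` column, as
in the sample output; the helper names `read_output` and `with_outputs` and
the `output` column name are hypothetical and not part of the gnuparallel
package.

```python
import pandas as pd


def read_output(path):
    """Return the text of one result file, without the trailing newline."""
    with open(path) as handle:
        return handle.read().rstrip("\n")


def with_outputs(df):
    """Return a copy of df with an added 'output' column.

    Each value is the content of the file named in that row's
    'resfile' column. The input DataFrame is left unchanged.
    """
    outputs = pd.Series(
        [read_output(path) for path in df["resfile"]],
        index=df.index,
        name="output",
    )
    return df.assign(output=outputs)
```

With the session above, `with_outputs(my_df)` would give a DataFrame whose
`output` column holds what each `echo` job printed. Stripping the trailing
newline is a convenience choice in this sketch; drop the `rstrip` call if
the exact bytes matter.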