@run4flat
Created June 26, 2012 20:54
revised P::G::Prima::plot() calling convention
use strict;
use warnings;
use PDL;
my $x = sequence(100)/10;
my $y = $x/2 - 3 + $x->grandom*3;
my $y_err = 2*$x->grandom->abs + 1;
use PDL::Graphics::Prima::Simple;
# Option 1
# dataset *names* end with "_" and a letter indicating the data type
plot(
    -linear_p => [
        $x,
        $y,
        diamonds    => {filled => 'yes'},
        error_bars  => {y_err => $y_err},
        trend_lines => {
            weights    => $y_err,
            lineWidths => 2,
            colors     => pdl(255, 0, 0)->rgb_to_color,
        },
    ],
    -expected_f => [
        sub { $_[0] / 2 - 3 },
        lines => {},
    ],
);
# Option 2 (old option)
plot(
    -data => [ pair =>
        $x,
        $y,
        plotTypes => [
            [diamonds => filled => 'yes'],
            [error_bars => y_err => $y_err],
            [trend_lines =>
                weights    => $y_err,
                lineWidths => 2,
                colors     => pdl(255, 0, 0)->rgb_to_color,
            ],
        ],
    ],
);

Mission Statement: Dynamic Data Analysis

These are my notes and reflections on some ideas recently suggested by Joel on how to make the interface to plot() easier and more Perlish. In the course of writing up this document, I have come to the conclusion that PDL::Graphics::Prima is, first and foremost, a plotting library targeting dynamic data analysis. If we can write nice static plotting wrappers around the basic bindings (as PDL::Graphics::Prima::Simple attempts to do), that's great.

Eventually, I want to add a "Properties" menu item to the right-click menu, one which would allow you to change the title, specify details about the axes (including handling multiple axes automatically), and manipulate the dataSets and their PlotTypes. It is through this lens that I evaluate API ideas.

Syntax Goals

The current form of the constructor and plot() command is meant to parallel the API through which you interact with the plot widget. For example, x- and y-axis labels are currently specified like so:

plot(
    ...
    x => {
        label => 'Distance (m)',
    },
    y => {
        label => 'Height (m)',
    },
    title => 'Ballistic Trajectory',
    ...
);

However, I could equally have chosen to specify the x- and y-axis labels with a labels key:

plot(
    ...
    labels => {
        x => 'Distance (m)',
        y => 'Height (m)',
    },
    title => 'Ballistic Trajectory',
    ...
);

I chose the former rather than the latter because it emphasizes that the label is a property of the axis, not the plot. Later, if you want to change the x-label when working with an interactive plot widget, it will come as no surprise that you do this by saying

$plot_widget->x->label('Time (s)');

With the second form, you would expect to be able to do something like this:

$plot_widget->labels(x => 'Time (s)');

So, the goal of the plot syntax is to reflect the actual API used later when manipulating the widget. For the case of axis labels, I actually see no reason not to support both ideas. Consider this to be a low-priority planned feature. :-)
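Supporting both forms could be as simple as folding the labels key into the per-axis hashes before the rest of the constructor runs. The following is only a sketch under that assumption: normalize_axis_labels is a hypothetical helper, not something PDL::Graphics::Prima provides, and the choice to let the per-axis label win on a conflict is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::Graphics::Prima): fold a top-level
# `labels` hash into the per-axis form so later code sees only one shape.
sub normalize_axis_labels {
    my %args = @_;
    my $labels = delete($args{labels}) || {};
    for my $axis (sort keys %$labels) {
        # Copy the axis options so the caller's hash is not modified
        my %axis_opts = %{ $args{$axis} || {} };
        # Assumption: an explicit x => { label => ... } wins over labels => { x => ... }
        $axis_opts{label} = $labels->{$axis} unless exists $axis_opts{label};
        $args{$axis} = \%axis_opts;
    }
    return %args;
}

my %opts = normalize_axis_labels(
    labels => { x => 'Distance (m)', y => 'Height (m)' },
    title  => 'Ballistic Trajectory',
);
print "$opts{x}{label}, $opts{y}{label}\n";   # Distance (m), Height (m)
```

With this kind of normalization the widget itself would still only know about $plot_widget->x->label, so the constructor and the widget API stay parallel.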

I am open to improving the calling convention, or even altering the underlying structure of the plot widget. I am even open to breaking backwards compatibility, at least while the plotting library is young.

Declaring PlotTypes

Discussions with Joel got me thinking about plot types and how to make it easier to declare them. Compare the current means for declaring plotTypes:

# Single plot type
plot(
    -data => ds::Pair($x, $y,
        plotType => ppair::Lines(thread_like => 'points')
    ),
);
# Multiple plot types
plot(
    -data => ds::Pair($x, $y,
        plotTypes => [
            ppair::Lines(thread_like => 'points'),
            ppair::Diamonds(filled => 0),
            ppair::Histogram,
        ],
    ),
);

with this tentative declaration:

# Single plot type
plot(
    -data => ds::Pair($x, $y,
        lines => { thread_like => 'points' }
    ),
);
# Multiple plot types
plot(
    -data => ds::Pair($x, $y,
        lines     => { thread_like => 'points' },
        diamonds  => { filled => 0 },
        histogram => {},
    ),
);

It's so much simpler! Also, there's no reason one couldn't create a plotType outside the plot for later manipulation with

# Multiple plot types
my $hist_plot_type = ppair::Histogram;
plot(
    -data => ds::Pair($x, $y,
        lines     => { thread_like => 'points' },
        diamonds  => { filled => 0 },
        $hist_plot_type,
    ),
);

And, there's no reason one couldn't specify multiple plotTypes of the same "key" so long as the arguments to the dataSet are processed properly.

Such a nice, simplifying idea! :-)
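To make "processed properly" concrete, here is a rough sketch of how the arguments that follow the data could be walked as a list rather than loaded into a hash, so that repeated keys and pre-built objects both survive. The function name parse_plot_types and the My::Histogram stand-in class are hypothetical; this is not the library's implementation.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical parser: walks the arguments left to right, so that two
# `lines => {...}` entries are both kept, in the order they were given.
sub parse_plot_types {
    my @args = @_;
    my @plot_types;
    while (@args) {
        my $item = shift @args;
        if (blessed($item)) {
            # A pre-built plotType object: keep it as is
            push @plot_types, $item;
        }
        elsif (@args and ref($args[0]) eq 'HASH') {
            # A name => { options } pair
            push @plot_types, { type => $item, options => shift @args };
        }
        else {
            die "Expected an options hashref after '$item'\n";
        }
    }
    return @plot_types;
}

my $hist = bless {}, 'My::Histogram';   # stand-in for ppair::Histogram
my @types = parse_plot_types(
    lines => { thread_like => 'points' },
    lines => { lineWidths => 3 },
    $hist,
);
print scalar(@types), " plot types\n";   # 3 plot types
```

The important design point is that the argument list is never assigned to a hash, since that would silently drop the first of the two lines entries.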

Declaring DataSets

The current means for specifying a dataSet is to use the short-form constructors:

plot(
    -data => ds::Pair($x, $y, plotType => ppair::Lines),
    ... other options here ...
);

Why, Joel has asked, must one provide an arbitrary name to every dataSet? First, the name is a form of self-documentation. Second, it provides a means to specify the plot order of the dataSets (alphabetic). Third, it allows for a simple way to get back to the dataSets for later manipulation, namely through the tied hash that comes back when you call the dataSets method on the widget. For example, here is a way to access a dataSet named function:

my $func_ds = $plot_widget->dataSets->{function};

At the moment, there are no methods to manipulate a dataSet (such as getting the list of PlotTypes or changing the data), but those are fairly trivial to add and are planned. Also, using the tied hash interface you can add new dataSets by simply creating a new key in the tied hash:

$plot_widget->dataSets->{new_data} = ds::Pair(...);

I decided to go with the tied hash interface because it makes programmatic manipulation of the dataSets easy and self-documenting. If I wanted to change the color of the lab_results dataset, I simply say:

$plot_widget->dataSets->{lab_results}->color($new_color);  # not yet implemented

So, the declaration of the dataSets mostly reflects the API of the widget itself. It would completely reflect that API if the plot command had a key called dataSets that accepted an anonymous hash with name => ds::Thing pairs, but that seemed unnecessary and overly indented.

Tentative DataSet Declaration

The problem, as pointed out, is that names as tags in the constructor are somewhat strange. The lack of a dash in front of the dataSet accessor is an inconsistency (albeit a fixable one). The whole -name => con::Structor() approach is a bit awkward and leads to lots of typing. Currently, if you forget the dash I believe that nothing happens, i.e. no warnings are issued (though I need to check that), which means that creating dataSets is error-prone.

Joel has argued that the Perlish way to get at a dataSet for later manipulation is to declare it ahead-of-time, like so:

my $data = ds::Pair($x, $y);
my $plot_widget = $main_windows->insert(Plot =>
    -data => $data,
    ...
);

# Later...
$data->do_something;

This sort of approach lets us dispense with the dataSet names altogether: if the programmer wants to get at a dataSet at a later time, they only need to create a variable pointing to it. All other dataSets can be anonymous, like so:

my $data = ds::Pair($x, $y);
my $plot_widget = $main_windows->insert(Plot =>
    $data,
    [$x, $y, pair => options, ...],
    ...
);

Anonymous Arrays as DataSet Declarations Are Hard

In the last code snippet, directly above, I declare a dataSet by simply using an anonymous array with the arguments. One might expect that it's possible to differentiate between different dataSets by examining the contents of that anonymous array. Unfortunately, this is quite difficult, if not impossible, to do in the general case.

First, there are currently three different dataSets that expect one piddle: ds::Set expects a distribution where the first dimension is the size of the populations and higher dimensions are threaded, ds::Matrix expects an image of scalar values where the first two dimensions are width and height and higher dimensions are threaded (a bit awkwardly, unfortunately), and ds::Image expects an image where the first dimension is the colorspace (3 for rgb) and the second and third are the width and height and the fourth and higher dimensions are threaded. They all expect to get different shapes of data but it is impossible to know which is which just by looking at the dimensions.

One could hope to examine the other contents of the dataSet. For example, x/y boundaries are required for image and matrix dataSets but not for set dataSets, and I could use the ds::Image type if the first dimension is a small number. But, what if I concoct another dataSet that utilizes a single piddle? How then do I differentiate? For example, I have given thought to creating a two-color matrix plot in which (say) the intensity in the red indicates the population size and the intensity in blue indicates annual rainfall. Phoenix is dark red. Rural Washington state is dark blue. Seattle is dark purple. A desert is black or white depending on the details of the color scaling.

If I use two piddles, then I have to find a way to differentiate this dataSet from the pairwise dataSet, which may be possible because it'll require x/y boundaries. But what next? Using anonymous arrays substantially limits the library's ability to grow and add new dataSets. The DWIMmery gets complicated very fast.

Note that a wrapper library does not suffer this limitation. A wrapper library could simply declare that it only supports certain dataSets and explain the criterion for making a dataSet choice. The DWIMmery is limited and probably works for 95% of the use cases. Where it fails, the user can fall back on the full plotting library.

That, however, leads to a pedagogical issue: if there is a more sugary wrapper that uses different mental constructs and the user finds themselves unable to fit their plot into those constructs, it's that much harder for the user to jump to the general plot command. In general, I'm happy with any development using PDL::Graphics::Prima (and there should be more than one way to do it, right?), but I would want any derived libraries to either be very similar to P::G::Prima or substantially easier to use.

Fixing DataSets Declared with Anonymous Arrays

If I want to incorporate the anonymous array declaration into PDL::Graphics::Prima, one solution would be for the user to explicitly specify the dataSet type for each anonymous dataSet:

my $data = ds::Pair($x, $y);

# Idea 1
my $plot_widget = $main_windows->insert(Plot =>
    $data,
    pair => [$x, $y, option => values],
    ...
);

# Idea 2
my $plot_widget = $main_windows->insert(Plot =>
    $data,
    [$x, $y, option => values, type => 'pair'],
    ...
);

I personally prefer Idea 1 over Idea 2 because it's shorter. Also, the second form suggests that there should be a type method for a dataSet that would allow you to change its type, which is nonsensical.
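Idea 1 amounts to a dispatch table keyed on the type name. A minimal sketch of how the constructor might separate dataSets from the remaining options follows, assuming a hypothetical extract_datasets function and stand-in constructors in place of ds::Pair and ds::Func. A real version would need a rule for option names that collide with type names, which this sketch ignores.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in constructors; the real ones would be ds::Pair, ds::Func, etc.
my %constructor_for = (
    pair => sub { bless { kind => 'pair', args => [@_] }, 'My::DataSet' },
    func => sub { bless { kind => 'func', args => [@_] }, 'My::DataSet' },
);

# Hypothetical: collect dataSets in the order given, pass everything else on
sub extract_datasets {
    my @args = @_;
    my (@datasets, @rest);
    while (@args) {
        my $item = shift @args;
        if (blessed($item) and $item->isa('My::DataSet')) {
            # Already-constructed dataSet object
            push @datasets, $item;
        }
        elsif (defined($item) and !ref($item)
            and exists $constructor_for{$item}
            and ref($args[0]) eq 'ARRAY')
        {
            # type => [ args ] pair: build the dataSet from the array contents
            push @datasets, $constructor_for{$item}->(@{ shift @args });
        }
        else {
            # Anything else is an ordinary widget option
            push @rest, $item;
        }
    }
    return (\@datasets, \@rest);
}

my $data = $constructor_for{pair}->(1, 2);
my ($ds, $rest) = extract_datasets(
    $data,
    func  => [0, 10, sub { $_[0] / 2 - 3 }],
    title => 'Example',
);
print join(', ', map { $_->{kind} } @$ds), "\n";   # pair, func
```

Adding a new dataSet type then only requires a new entry in the table, with no guessing based on the array's contents.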

As to the issues raised above, the order would be the order in which the dataSets are specified (instead of alphabetic). Alternatively, one could have an optional layer or z_order option. To retrieve and manipulate the dataSets, one could define the dataSets ahead of time (as demonstrated), or the dataSets method could return a tied array of dataSets that could be iterated. Actually, the original dataSet interface used a tied array instead of a tied hash, but it made programmatic manipulation without variables error-prone. For example, you could start with this:

my $plot_widget = $main_windows->insert(Plot =>
    func => [$x->minmax, \&my_sub],
    pair => [$x, $y],
    ...
);

# Later, change the color of the pairwise data:
$plot_widget->dataSets->[1]->color($new_color);

But what if you change the plot widget declaration and insert a dataSet between the function and the pairwise data? This leads to an irritating bug, easily solved but easily re-created later. Also, inserting a dataSet in-between two other dataSets involves a use of splice, a command that I do not much use and would rather avoid. Under this proposal, the best solution is to create the dataSet outside the plot constructor. If a programmer wanted the self-documenting names for an unspecified number of dataSets, they could emulate the hash dataSet retrieval by using their own hash like so:

my %dataSets = (
    data => ds::Pair($x, $y),
    func => ds::Func($x->minmax, sub { $_[0]->do_something }),
);

my $plot_widget = $main_windows->insert(Plot =>
    values %dataSets,
    ...
);

# Later...
$dataSets{data}->color($new_color);

The only drawback here is that if you add a new dataSet to the plot widget, you also have to add it to the hash.

In practice, I find that I don't often need to programmatically change the order of the dataSets, so referring to them by number is probably fine. I just want to make sure that I have a sane programmatic interface for reasons discussed at the top of this file.

@jberger

jberger commented Jul 1, 2012

Is the container holding the datasets really a tie-d hash? If so what magic is needed?

@run4flat
Author

run4flat commented Jul 1, 2012

Yes, it is. See https://github.com/run4flat/PDL-Graphics-Prima/blob/master/lib/PDL/Graphics/Prima/DataSet.pm#L1138

The tied hash is most useful for the STORE operation because it automatically sets the dataSet's widget field to the widget that holds the dataSet container. This way, you can say something like

my $ds = ds::Pair($x, $y);
$plot_widget->dataSets->{new_data} = $ds;
say +($ds->widget != $plot_widget ? 'not ' : ''), 'ok';

$ds wouldn't know the plot widget to which it belongs unless you manually set it, which is a pessimisation easily solved by a tied hash. Also, there are all kinds of interface functions that I don't have to write by choosing to use the tied hash interface.
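For readers unfamiliar with tie, the mechanism can be sketched in a few lines using the core Tie::ExtraHash class. This is only an illustration of the STORE idea, not the code in DataSet.pm: the package name My::DataSetCollection is made up, and plain hashrefs stand in for the widget and the dataSet.

```perl
use strict;
use warnings;

package My::DataSetCollection;
use Tie::Hash;                       # core module; also defines Tie::ExtraHash
our @ISA = ('Tie::ExtraHash');

# Usage: tie %h, 'My::DataSetCollection', $widget
# Tie::ExtraHash keeps the real hash in $self->[0] and any extra
# arguments to tie after it, so the widget ends up in $self->[1].
sub STORE {
    my ($self, $name, $dataset) = @_;
    $dataset->{widget} = $self->[1];   # record the owning widget
    $self->[0]{$name} = $dataset;
}

package main;

my $widget = { name => 'plot' };       # stand-in for the plot widget
tie my %datasets, 'My::DataSetCollection', $widget;

my $ds = { kind => 'pair' };           # stand-in for ds::Pair($x, $y)
$datasets{new_data} = $ds;
print +($ds->{widget} == $widget ? '' : 'not '), "ok\n";   # ok
```

FETCH, DELETE, EXISTS and the iteration methods are all inherited, which is the sense in which the tied hash saves writing interface functions.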

You might ask why dataSets need to know their parent widget. It is useful for cascading styles: https://github.com/run4flat/PDL-Graphics-Prima/blob/master/lib/PDL/Graphics/Prima/DataSet.pm#L105 and getting access to the axis structures such as here: https://github.com/run4flat/PDL-Graphics-Prima/blob/master/lib/PDL/Graphics/Prima/DataSet.pm#L517, and here: https://github.com/run4flat/PDL-Graphics-Prima/blob/master/lib/PDL/Graphics/Prima/DataSet.pm#L1054
