New Plot Types in Tablesaw

In a prior post, I showed how to create some native Java scatter plots and a quantile plot in Tablesaw. Since then, I’ve added a few more plot types.

When it comes to plotting, Tablesaw integrates other libraries and tries to make their use as consistent as possible. Like the earlier scatter plots, this line chart is rendered using XChart under the covers:

[Figure: Monthly Boston Armed Robberies Jan. 1966 - Oct. 1975]

The dramatic increase in armed robberies is shown by plotting each month's count against its position in the sequence. The code looks like this:

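// Load the monthly robbery counts and plot them in sequence
Table robberies = Table.createFromCsv("data/boston-robberies.csv");
NumericColumn x = robberies.nCol("Record");     // record (row) number
NumericColumn y = robberies.nCol("Robberies");  // robberies in that month
Line.show("Monthly Boston Armed Robberies Jan. 1966 - Oct. 1975", x, y);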

Histograms are a must-have. We use the plotting capabilities of the Smile machine learning library to create the one below.

[Figure: Distribution of team batting averages]

Although they’re from different libraries, the Tablesaw API is similar:

Table baseball = Table.createFromCsv("data/baseball.csv");
NumericColumn x = baseball.nCol("BA");
Histogram.show("Distribution of team batting averages", x);
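To make concrete what a histogram summarizes, here is a plain-Java sketch of equal-width binning. It is not Smile's or Tablesaw's implementation: the class name HistogramBins, the bin count, and the sample values are all hypothetical, and real plotting libraries may choose bin edges differently.

```java
import java.util.Arrays;

// Hypothetical plain-Java sketch of equal-width histogram binning.
// Not Smile or Tablesaw code; shown only to illustrate the idea.
public class HistogramBins {

    // Counts how many values fall into each of `bins` equal-width intervals
    // spanning [min, max]. The maximum value is placed in the last bin.
    static int[] binCounts(double[] values, int bins) {
        if (values.length == 0 || bins <= 0) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        int[] counts = new int[bins];
        for (double v : values) {
            // If all values are equal the width is 0, so everything goes in bin 0.
            int index = width == 0 ? 0 : (int) ((v - min) / width);
            // Clamp so the maximum (and any rounding overshoot) lands in the last bin.
            counts[Math.min(index, bins - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical batting averages, for illustration only.
        double[] battingAverages = {0.240, 0.251, 0.255, 0.262, 0.270, 0.280};
        System.out.println(Arrays.toString(binCounts(battingAverages, 4)));
    }
}
```

The bar heights in a histogram are exactly these counts; changing the number of bins changes the shape you see, which is why plotting libraries usually let you set it.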

This is currently the only Smile plot we’re using, but there’s more to come. Heatmaps, contour plots, and QQ plots are coming soon. We’re also starting to integrate Smile’s machine learning capabilities, which will be a huge step forward for Tablesaw.

Bar plots are unglamorous but very useful. Tablesaw can produce both horizontal and vertical bar plots, and it also creates Pareto charts directly as a convenience. They’re all based on the JavaFX chart library and, like the other Tablesaw plots, they’re rendered in Swing windows. Here we show a Pareto chart of tornado fatalities by US state.

[Figure: Tornado Fatalities by State (Pareto chart)]

The code to produce this chart, including a filter that keeps only tornadoes with more than three fatalities, is shown below. The grouping is done using the summarize method, which produces tabular summaries that can be passed directly to the plotting API.

Note the use of sum as the summary function. Any numerical summary supported by Tablesaw (standard deviation, median, sumOfLogs, etc.) can be substituted to plot a different statistic.

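Table table = Table.createFromCsv("data/tornadoes_1950-2014.csv");
// Keep only tornadoes with more than three fatalities
table = table.selectWhere(column("Fatalities").isGreaterThan(3));
// Sum fatalities per state; the Pareto chart sorts the groups and plots them
Pareto.show("Tornado Fatalities by State",
    table.summarize("Fatalities", sum).by("State"));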

As you can see, loading from a CSV, filtering the data, grouping, summing, sorting, and plotting are all done in three lines of code.
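Conceptually, the summarize-then-Pareto pipeline filters rows, groups them by a key, reduces each group with a summary function, and sorts the groups in descending order. The sketch below spells out those steps in plain Java; ParetoSketch, Row, and the sample records are hypothetical and are not meant to reflect Tablesaw's internals. Passing the summary as a function mirrors how sum can be swapped for another statistic such as the median.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical plain-Java sketch of filter -> group -> summarize -> sort.
public class ParetoSketch {

    // One row of hypothetical input: a state and a fatality count.
    record Row(String state, double fatalities) {}

    // Groups rows by state, reduces each group's values with `aggregate`,
    // and returns (state, value) pairs sorted by value, largest first.
    static List<Map.Entry<String, Double>> paretoOrder(
            List<Row> rows, ToDoubleFunction<List<Double>> aggregate) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.state(), k -> new ArrayList<>()).add(r.fatalities());
        }
        List<Map.Entry<String, Double>> result = new ArrayList<>();
        for (Map.Entry<String, List<Double>> e : groups.entrySet()) {
            result.add(Map.entry(e.getKey(), aggregate.applyAsDouble(e.getValue())));
        }
        result.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return result;
    }

    static double sum(List<Double> xs) {
        return xs.stream().mapToDouble(Double::doubleValue).sum();
    }

    static double median(List<Double> xs) {
        double[] s = xs.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical tornado records, for illustration only.
        List<Row> rows = List.of(
            new Row("AL", 5), new Row("AL", 10), new Row("TX", 4),
            new Row("TX", 2), new Row("OK", 8));
        // Keep only tornadoes with more than three fatalities.
        List<Row> filtered = rows.stream().filter(r -> r.fatalities() > 3).toList();
        System.out.println(paretoOrder(filtered, ParetoSketch::sum));    // [AL=15.0, OK=8.0, TX=4.0]
        System.out.println(paretoOrder(filtered, ParetoSketch::median)); // [OK=8.0, AL=7.5, TX=4.0]
    }
}
```

Swapping sum for median reorders the bars, which is the kind of substitution the summary-function argument makes easy.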

Finally, we have a box plot.

[Figure: Tornado Injuries by Scale (box plot)]

For box plots, the groups can be formed using Table’s splitOn() method, or more simply by passing the table along with the names of the summary and grouping columns:

Table table = Table.createFromCsv("data/tornadoes_1950-2014.csv");
Box.show("Tornado Injuries by Scale", table, "injuries", "scale");
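Each box in a box plot draws a five-number summary for one group: minimum, first quartile, median, third quartile, and maximum. The following plain-Java sketch splits values by a grouping key (much like splitting a table on a grouping column) and computes that summary per group. BoxSummary and the sample data are hypothetical, and the sketch uses linear-interpolation quartiles, which is only one of several conventions, so the charting library may compute slightly different quartiles or draw whiskers and outliers differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of per-group five-number summaries.
public class BoxSummary {

    // Percentile by linear interpolation between the closest ranks
    // (one common convention; plotting libraries may use another).
    static double percentile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Returns {min, Q1, median, Q3, max} for one non-empty group.
    static double[] fiveNumber(List<Double> values) {
        double[] s = values.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        return new double[] {
            s[0], percentile(s, 0.25), percentile(s, 0.5), percentile(s, 0.75), s[s.length - 1]
        };
    }

    // Splits parallel (group, value) lists into groups and summarizes each;
    // TreeMap keeps the group keys in sorted order.
    static Map<String, double[]> summarizeBy(List<String> groups, List<Double> values) {
        if (groups.size() != values.size()) {
            throw new IllegalArgumentException("groups and values must have equal length");
        }
        Map<String, List<Double>> split = new TreeMap<>();
        for (int i = 0; i < groups.size(); i++) {
            split.computeIfAbsent(groups.get(i), k -> new ArrayList<>()).add(values.get(i));
        }
        Map<String, double[]> result = new TreeMap<>();
        split.forEach((k, v) -> result.put(k, fiveNumber(v)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical injury counts grouped by tornado scale, for illustration only.
        List<String> scale = List.of("F1", "F1", "F1", "F1", "F1", "F3", "F3", "F3");
        List<Double> injuries = List.of(0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0);
        summarizeBy(scale, injuries).forEach(
            (k, v) -> System.out.println(k + " " + Arrays.toString(v)));
    }
}
```

For the sample data this prints F1 [0.0, 1.0, 2.0, 3.0, 4.0] and F3 [10.0, 15.0, 20.0, 25.0, 30.0]; each array corresponds to one box.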

I hope you’ll find Tablesaw useful for your data analytics work.

 
