New Plot Types in Tablesaw

In a prior post, I showed how to create some native Java scatter plots and a quantile plot in Tablesaw. Since then, I’ve added a few more plot types.

When it comes to plotting, Tablesaw integrates other libraries and tries to make their use as consistent as possible. Like the earlier scatter plots, this line chart is rendered using XChart under the covers:

[Image: boston_robberies]

The dramatic increase in armed robberies is shown by plotting the monthly robbery counts against their position in the sequence. The code looks like this:

Table robberies = Table.createFromCsv("data/boston-robberies.csv");
NumericColumn x = robberies.nCol("Record");
NumericColumn y = robberies.nCol("Robberies");
Line.show("Monthly Boston Armed Robberies Jan. 1966 - Oct. 1975", x, y);

Histograms are a must-have. We use the plotting capabilities of the Smile machine learning library to create the one below.

[Image: batting_histogram]

Although the line chart and the histogram are rendered by different libraries, the Tablesaw API for each is similar:

Table baseball = Table.createFromCsv("data/baseball.csv");
NumericColumn x = baseball.nCol("BA");
Histogram.show("Distribution of team batting averages", x);
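
For readers curious about what a histogram computes before anything is drawn, here is a minimal plain-Java sketch of equal-width binning. It does not use Tablesaw or Smile; the class name HistogramSketch, the method binCounts, and the sample values are all hypothetical, and Smile's own binning rules may well differ.

```java
import java.util.Arrays;

class HistogramSketch {

    /** Counts how many values fall into each of binCount equal-width bins spanning [min, max]. */
    static int[] binCounts(double[] values, int binCount) {
        if (values.length == 0 || binCount <= 0) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / binCount;
        int[] counts = new int[binCount];
        for (double v : values) {
            // When all values are equal the width is 0, so everything goes in bin 0.
            int bin = width == 0 ? 0 : (int) ((v - min) / width);
            // The maximum value would index one past the end; clamp it into the last bin.
            counts[Math.min(bin, binCount - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up batting averages, purely for illustration.
        double[] battingAverages = {0.240, 0.251, 0.255, 0.262, 0.263, 0.270, 0.281};
        System.out.println(Arrays.toString(binCounts(battingAverages, 4)));
    }
}
```

The bar heights in a histogram are these counts; the plotting library's job is choosing a sensible number of bins and drawing them.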

This is currently the only Smile plot we’re using, but there’s more to come: heatmaps, contour plots, and QQ plots are on the way. We’re also starting to integrate Smile’s machine learning capabilities, which will be a huge step forward for Tablesaw.

Bar plots are unglamorous, but very useful. Tablesaw can produce both horizontal and vertical bar plots, and also creates Pareto charts directly as a convenience. They’re all based on the JavaFX chart library, and like the other Tablesaw plots, they’re rendered in Swing windows. Here we show a Pareto chart of tornado fatalities by US state.

[Image: pareto]

The code to produce this chart, including a filter that keeps only records with more than three fatalities, is shown below. The grouping is done using the summarize method, which produces tabular summaries that can be passed directly to the plotting API.

Note the use of the #sum method. Any numerical summary supported by Tablesaw (standard deviation, median, sumOfLogs, etc.) can be substituted for easy plotting.

Table table = Table.createFromCsv("data/tornadoes_1950-2014.csv");
table = table.selectWhere(column("Fatalities").isGreaterThan(3));
Pareto.show("Tornado Fatalities by State", 
    table.summarize("fatalities", sum).by("State"));

As you can see, loading from a CSV, filtering the data, grouping, summing, sorting, and plotting is all done in three lines of code.
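
To make the filter, group, sum, and sort steps concrete, here is a rough plain-Java sketch of the same pipeline using streams. It is not how Tablesaw implements summarize or Pareto; the Tornado class, the fatalitiesByState method, and the threshold parameter are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ParetoSketch {

    /** One tornado record; the field names are illustrative only. */
    static final class Tornado {
        final String state;
        final int fatalities;

        Tornado(String state, int fatalities) {
            this.state = state;
            this.fatalities = fatalities;
        }
    }

    /** Keeps records above the threshold, sums fatalities per state, and orders states by descending total. */
    static Map<String, Integer> fatalitiesByState(List<Tornado> rows, int threshold) {
        // Filter individual records first, then group by state and sum.
        Map<String, Integer> totals = rows.stream()
                .filter(t -> t.fatalities > threshold)
                .collect(Collectors.groupingBy(t -> t.state, Collectors.summingInt(t -> t.fatalities)));

        // A Pareto chart draws its bars from largest to smallest, so sort by total, descending.
        // LinkedHashMap preserves that order for whoever iterates the result.
        return totals.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

Note that the filter runs on individual records before grouping, just as the selectWhere call does in the Tablesaw version, so a state only appears if at least one of its records passes the threshold.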

Finally, we have a box plot.

[Image: tornado_boxplot]

For box plots, the groups are formed using Table’s splitOn() method, or simply by passing the names of the summary and grouping columns along with the Table:

Table table = Table.createFromCsv("data/tornadoes_1950-2014.csv");
Box.show("Tornado Injuries by Scale", table, "injuries", "scale");
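
As a rough illustration of what a box plot summarizes for each group, the plain-Java sketch below groups values by a key and computes a five-number summary (minimum, first quartile, median, third quartile, maximum). It is independent of Tablesaw; the class and method names are hypothetical, and the linear-interpolation quartile rule is an assumption, since plotting libraries use several different conventions and often draw whiskers and outliers differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BoxPlotSketch {

    /** Linear-interpolation quantile of an already sorted, non-empty list; p is between 0 and 1. */
    static double quantile(List<Double> sorted, double p) {
        double position = p * (sorted.size() - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double fraction = position - lower;
        return sorted.get(lower) + fraction * (sorted.get(upper) - sorted.get(lower));
    }

    /** Groups values by key, then returns {min, Q1, median, Q3, max} for each group. */
    static Map<String, double[]> fiveNumberSummaries(List<String> groups, List<Double> values) {
        // Collect the values belonging to each group (for example, each tornado scale).
        Map<String, List<Double>> byGroup = new TreeMap<>();
        for (int i = 0; i < groups.size(); i++) {
            byGroup.computeIfAbsent(groups.get(i), k -> new ArrayList<>()).add(values.get(i));
        }
        // Sort each group's values and read off the five summary statistics.
        Map<String, double[]> summaries = new TreeMap<>();
        for (Map.Entry<String, List<Double>> e : byGroup.entrySet()) {
            List<Double> sorted = e.getValue();
            Collections.sort(sorted);
            summaries.put(e.getKey(), new double[] {
                quantile(sorted, 0.0), quantile(sorted, 0.25), quantile(sorted, 0.5),
                quantile(sorted, 0.75), quantile(sorted, 1.0)});
        }
        return summaries;
    }
}
```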

I hope you’ll find Tablesaw useful for your data analytics work.

Tablesaw gets Graphic

Today we introduced the first elements of what will be Tablesaw’s support for exploratory data visualization in pure Java. As Tablesaw expands its scope to integrate statistical and machine learning capabilities, this kind of visualization will be critical.

[Image: tornados]

This slightly ghostly US map image was created as a simple scatter plot of the starting latitude and longitude for every US tornado between 1950 and 2014. The code below loads the data, filters out missing records, and renders the plot:

Table tornado = Table.createFromCsv("data/tornadoes_1950-2014.csv");

tornado = tornado.selectWhere(
    both(column("Start Lat").isGreaterThan(0f),
         column("Scale").isGreaterThanOrEqualTo(0)));

Scatter.show("US Tornados 1950-2014",
    tornado.numericColumn("Start Lon"),
    tornado.numericColumn("Start Lat"));

These plots provide visual feedback to the analyst while she’s working. They’re for discovery, rather than for presentation, and ease of use is stressed over beauty. Behind the scenes, the charts are created with Tim Molter’s awesome XChart library: https://github.com/timmolter/XChart.

The following chart is taken from a baseball data set. It shows how to split a table on the values of one or more columns, producing a series for each group. In this case, we color the marks differently if the team made the playoffs.

[Image: winsByYear]

Here’s the code:

Table baseball = Table.createFromCsv("data/baseball.csv");
Scatter.show("Regular season wins by year",
    baseball.numericColumn("W"),
    baseball.numericColumn("Year"),
    baseball.splitOn(baseball.column("Playoffs")));
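
The idea of splitting rows into one series per group can be sketched in plain Java as follows. This is only an illustration of the concept, not Tablesaw’s splitOn implementation; SplitSketch, Point, and splitIntoSeries are hypothetical names, and the key column is assumed to hold strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SplitSketch {

    /** An (x, y) point in a scatter series. */
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Splits parallel x/y arrays into one series per distinct key, in first-seen order. */
    static Map<String, List<Point>> splitIntoSeries(double[] x, double[] y, String[] key) {
        Map<String, List<Point>> series = new LinkedHashMap<>();
        for (int i = 0; i < key.length; i++) {
            // Each distinct key value (for example, playoffs vs. no playoffs) gets its own list of points.
            series.computeIfAbsent(key[i], k -> new ArrayList<>()).add(new Point(x[i], y[i]));
        }
        return series;
    }
}
```

A charting library can then draw each entry of the returned map as a separate series with its own color.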

A chart that looks like a scatter plot and works like a histogram is a Quantile Plot. The plot below presents the distribution of public opinion poll ratings for one US president.

bush_quantiles

This chart was built using the Quantile class, where bush is a Table of poll results that has already been loaded:

String title = "Quantiles: George W. Bush (Feb. 2001 - Feb. 2004)";
Quantile.show(title, bush.numericColumn("approval"));
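
Conceptually, a quantile plot sorts the values and plots each one against the fraction of the data at or below it. The plain-Java sketch below shows that computation under one assumed plotting-position rule, (i + 0.5) / n; Tablesaw’s Quantile class may use a different rule, and the names QuantileSketch and quantilePoints are hypothetical.

```java
import java.util.Arrays;

class QuantileSketch {

    /** Returns rows of {fraction, value}: the sorted values paired with their plotting positions. */
    static double[][] quantilePoints(double[] values) {
        // Sort a copy so the caller's array is left untouched.
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double[][] points = new double[n][2];
        for (int i = 0; i < n; i++) {
            points[i][0] = (i + 0.5) / n; // assumed plotting position; other conventions exist
            points[i][1] = sorted[i];
        }
        return points;
    }
}
```

Plotting the first column on the x axis and the second on the y axis gives the scatter-like shape described above: flat stretches mean many similar values, and steep stretches mean few.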

Further down the line, I expect to add JavaScript plot support based on D3. These plots will be focused more on presentation, especially Web-based presentation, as Tablesaw becomes a complete platform for data science.