On this site, I propose that Java can be a good language for data science and analytics.

With a toolchain that includes Hadoop, HDFS, and related tools, Java is very popular for big-data analytics. However, it is not widely used for smaller datasets: quick analytics, exploration, or data massaging.

Here we focus on exactly that: using Java for datasets of ten to tens of millions of observations. In spite of the hype around big data, most datasets are of this scale. And you don't want to set up HDFS just to generate a simple histogram.

The primary tool used here is Outlier, which I developed as a Java alternative to data frames in R, Julia, or Pandas. Outlier lets you bend and twist datasets until they do what you want. It’s being integrated with advanced statistical tools for regression, classification, machine learning, and (someday) deep learning, so that you can go seamlessly from exploration to prediction.

Developing Outlier is great fun for me. I hope you have fun using it.
