On this site, I argue that Java can be a good language for data science and analytics.
With a toolchain that includes Hadoop and HDFS, Java is already popular for big-data analytics. It is much less common, however, on smaller datasets, whether for quick analyses, exploration, or data massaging.
Here we focus on exactly that: using Java for datasets ranging from ten to tens of millions of observations. In spite of the hype around big data, most datasets are of this scale, and you don't want to set up HDFS just to generate a simple histogram.
The primary tool used here is Outlier, which I developed as a Java alternative to the data frames of R, Julia, or Python's pandas. Outlier lets you bend and twist datasets until they do what you want. It is being integrated with advanced statistical tools for regression, classification, machine learning, and (someday) deep learning, so that you can move seamlessly from exploration to prediction.
Developing Outlier is great fun for me. I hope you have fun using it.