In an earlier post, I compared Outlier's performance importing data from a CSV file against published figures for Pandas and Python. Python took 3,047 seconds (about 51 minutes) to load 8 million rows of data. Outlier loaded 10 million rows of the same data in 2 minutes, or “25% more data, 25 times faster”.
Tablesaw loads the larger dataset from a CSV in 79 seconds: 25% more data, 38 times faster. Better still, that data can then be saved in Tablesaw format in 1 second. Subsequent reads of the saved data take 3 seconds, about 1,015 times faster than the original Python load.
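
For anyone who wants to check the arithmetic, here is a minimal sketch of how the "times faster" figures fall out of the timings quoted above. The constant and function names are my own, invented for illustration. The ratios compare wall-clock seconds only and do not adjust for the difference in row counts (8 million versus 10 million), and I am assuming the quoted figures are truncated to whole numbers rather than rounded.

```python
# Timings quoted in the post, in seconds.
PYTHON_CSV_SECONDS = 3047          # Pandas/Python, 8 million rows
OUTLIER_CSV_SECONDS = 120          # Outlier, 10 million rows ("2 minutes")
TABLESAW_CSV_SECONDS = 79          # Tablesaw, 10 million rows from CSV
TABLESAW_SAVED_READ_SECONDS = 3    # Tablesaw, re-reading the saved data


def speedup(baseline_seconds, new_seconds):
    """Return how many times faster the new timing is than the baseline."""
    return baseline_seconds / new_seconds


def whole_speedups():
    """Speedups over the Python load, truncated to whole numbers (an assumption)."""
    return {
        "outlier_csv": int(speedup(PYTHON_CSV_SECONDS, OUTLIER_CSV_SECONDS)),
        "tablesaw_csv": int(speedup(PYTHON_CSV_SECONDS, TABLESAW_CSV_SECONDS)),
        "tablesaw_saved": int(speedup(PYTHON_CSV_SECONDS, TABLESAW_SAVED_READ_SECONDS)),
    }


# "25% more data": 10 million rows versus 8 million rows.
extra_rows_percent = (10_000_000 - 8_000_000) / 8_000_000 * 100

if __name__ == "__main__":
    for name, factor in whole_speedups().items():
        print(f"{name}: {factor} times faster")
    print(f"extra rows: {extra_rows_percent:.0f}%")
```

Because the faster runs also handle 25% more rows, a per-row comparison would favor them by an even wider margin than these ratios suggest.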