Tablesaw performance: first results

In an earlier post, I compared Outlier performance importing data from a CSV against published data with Pandas and Python. In Python it took 3,047 seconds (50 minutes) to load the 8 million rows of data.  Outlier loaded 10 million rows of the same data in 2 minutes, or “25% more data, 25 times faster”.

Tablesaw loads the larger dataset from a CSV in 79 seconds: 25% more data, 38 times faster.  Better still, that data can be saved in Tablesaw format in 1 second.  Subsequent reads now take 3 seconds, or 1,015 times faster than in the original Python data. 

2 thoughts on “Tablesaw performance: first results

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s