New: Load data from any RDBMS

Ted Codd

As of today, you can easily import data into Tablesaw from any data source with a JDBC driver. Meaning, pretty much every relational database. Meaning, we are now fully compliant with the 1970s, and with Ted Codd, from whom I’ve stolen many ideas. Now I’m repaying Ted by putting his photo in this post.

Thank you, Ted.

To use this feature, write standard Java/JDBC client code, execute a query, and pass the returned ResultSet to the static create() method on Table. There’s a simple example below.

So bring on your databases.


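// Embedded Derby URL; create=true creates CoffeeDB if it does not exist yet.
String DB_URL = "jdbc:derby:CoffeeDB;create=true";
String sql = "SELECT * FROM Customer";

Table customer = null;
// try-with-resources closes the result set, statement, and connection
// (in that order), even if the query or the import throws.
try (Connection conn = DriverManager.getConnection(DB_URL);
     Statement stmt = conn.createStatement();
     ResultSet results = stmt.executeQuery(sql)) {
  customer = Table.create(results, "Customer");
}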


Tablesaw performance: first results

In an earlier post, I compared Outlier’s performance importing data from a CSV file against published results for Pandas and Python. In Python, it took 3,047 seconds (about 51 minutes) to load 8 million rows of data. Outlier loaded 10 million rows of the same data in 2 minutes, or “25% more data, 25 times faster”.

Tablesaw loads the larger dataset from a CSV in 79 seconds: 25% more data, 38 times faster. Better still, that data can be saved in Tablesaw format in 1 second. Subsequent reads take 3 seconds, or 1,015 times faster than the original Python load.
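
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the ratios from the timings quoted above. The class and method names (LoadSpeedup, speedup, percentMoreRows) are made up for this illustration and are not part of Tablesaw. The ratios are plain wall-clock divisions, and since the Python run covered 8 million rows against 10 million here, they ignore the difference in row counts.

```java
import java.util.Locale;

// Hypothetical helper for this post only; not part of Tablesaw.
final class LoadSpeedup {

    // Timings (seconds) and row counts quoted in the post.
    static final double PYTHON_CSV_SECONDS = 3047.0;
    static final double OUTLIER_CSV_SECONDS = 120.0;
    static final double TABLESAW_CSV_SECONDS = 79.0;
    static final double TABLESAW_NATIVE_SECONDS = 3.0;
    static final long PYTHON_ROWS = 8_000_000L;
    static final long TABLESAW_ROWS = 10_000_000L;

    // How many times faster the new timing is than the baseline.
    static double speedup(double baselineSeconds, double newSeconds) {
        if (newSeconds <= 0) {
            throw new IllegalArgumentException("newSeconds must be positive");
        }
        return baselineSeconds / newSeconds;
    }

    // Percentage by which newRows exceeds baselineRows.
    static double percentMoreRows(long baselineRows, long newRows) {
        return 100.0 * (newRows - baselineRows) / baselineRows;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "More rows: %.0f%%%n",
                percentMoreRows(PYTHON_ROWS, TABLESAW_ROWS));
        System.out.printf(Locale.ROOT, "Outlier CSV load: %.1fx%n",
                speedup(PYTHON_CSV_SECONDS, OUTLIER_CSV_SECONDS));
        System.out.printf(Locale.ROOT, "Tablesaw CSV load: %.1fx%n",
                speedup(PYTHON_CSV_SECONDS, TABLESAW_CSV_SECONDS));
        System.out.printf(Locale.ROOT, "Tablesaw native read: %.1fx%n",
                speedup(PYTHON_CSV_SECONDS, TABLESAW_NATIVE_SECONDS));
    }
}
```

The figures in the post are these ratios with the fraction dropped, so 38.6 is reported as 38 and 1,015.7 as 1,015.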