1 millisecond selects with 1/2 billion rows

Correction: There was a bug in the testHow embarrassing.

It really took about 20 ms to retrieve the 500 or so ‘hits’ out of the unsorted table. With some performance fixes, it’s now down to ~2 ms per request.  1 ms to go.

cheetah

I started running tests tonight on the largest data set I’ve used to date. This new test searches a medical records table with 4 columns (lab name, lab value, date, and patientId) and 500,000,000 rows. The CSV file that held all this data used 35 GB of disk.

The test was performed on a Macbook pro with one 4-core CPU and 16 GB of RAM.

The first bit of goodness was that loading the data from disk was reasonably performant.

Loaded 500,000,000 records from column storage in 174 seconds

On an un-indexed column, it took about 8 1/2 seconds to execute two queries, each returning about 5,000 records.  The code for the queries looked like this:

Table result = t.selectWhere(column("lab").isEqualTo(randomLab1));

and the timings:

lab found in 5317349 micros
lab found in 3020043 micros

Next I created an index on the patient id column and ran some queries on that. Each of these returns about 500 records. Creating the index can be done in a background thread.

total retrieval time 988 micros
total retrieval time 668 micros

Those were the original (and incorrect) results. After numerous tweaks, we’re now at:

total retrieval time 2085 micros
total retrieval time 1973 micros

One thing to keep in mind with these results is that when you’re measuring in low milliseconds, little things (like a minor garbage collection) can skew individual results. Which is to say ‘your mileage may vary’.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s