What is a Column-Oriented Database

Increasingly businesses are realizing a one size fits all isn't working for databases. When you want to analyse terabytes of data, with analytical queries that span 1000's of rows, column-oriented databases can provide a 100x speedup. Particularly for time-series data, column-oriented databases have been successfully used to deliver fast storage and analysis.

What is a column-oriented database?
Why columns not rows?
Database Vendors

What is a column-oriented database?

Data Storage in a Column-Oriented Database

A column-oriented database stores each column continuously. i.e. on disk or in-memory each column on the left will be stored in sequential blocks.

For analytical queries that perform aggregate operations over a small number of columns retrieving data in this format is extremely fast. As PC storage is optimized for block access, by storing the data beside each other we exploit locality of reference. On hard disk drives this is partiularly important which due to their performance characteristics provide optimal performace for sequential access.

Why columns not rows?

Look at the above image, now imagine which areas need read when you perform a query like "average price" for all dates. In row-oriented databases we have to read over large areas, in column-oriented databases the prices are stored as one sequential region and we can read just that region. Column-oriented databases are therefore extremely quick at aggregate queries (sum, average, min, max, etc.).

Why are most databases row-oriented? I hear you ask. Imagine we want to add one row somewhere in the middle of our data for 2011-02-26, on the row oriented database no problem, column oriented we will have to move almost all the data! Lucky since we mostly deal with time series new data only appends to the end of our table.

Column-Oriented Database vs Row Oriented Database

Operation	Column-Oriented Database	Row-Oriented Database
Aggregate Calulation of Single Column e.g. sum(price)	fast	slow
Compression	Higher. As stores similar data together	-
Retrieval of a few columns from a table with many columns	Faster	has to skip over unnecessary data
Insertion/Updating of single new record	Slow	Fast
Retrieval of a single record	Slow	Fast

Database Vendors

The major column-oriented databases include:

MonetDB (open source)
C-Store (open source)
Teradata
Vectorwise/Paraccel
Sybase IQ
...

What is a Column-Oriented Database

Contents