The Top Column-Oriented Databases (Updated 2024)
Updated Feb 11, 2024
This is a list of the top commercial, financial and open source column-oriented / tick databases available.
Businesses are realizing that a one-size-fits-all approach isn't working for databases. With the increasing acceptance and widespread adoption of alternative data storage systems such as NoSQL, column-oriented databases now receive more attention, and a number of major vendors have started to offer columnar storage as a value-add to their existing databases.
Open Source Column-Oriented Databases
The earliest databases (1993-2007) were based on the work of research groups that later saw commercial spinoffs.
From 2010 onward, a new wave of open source column databases arrived, typically used by web companies for storing and analysing user data.
Product | Vendor (release year) | Description | Score | License |
---|---|---|---|---|
DuckDB | DuckDB Foundation (2018) | An embeddable, in-process, column-oriented SQL OLAP RDBMS, often described as the OLAP counterpart of SQLite. | 8 | MIT License |
ClickHouse | Started at Yandex (2016) | A very fast OLAP database with a cloud version available. It started about 10 years ago at Yandex to store data for the Russian equivalent of Google Analytics and was open sourced in 2016. Commercialization followed, with some of the original Russian developers moving to the US to form a company offering a cloud service. | 8 | Apache License 2.0 |
Doris | Started at Baidu (2017) | A very fast OLAP database with a cloud version available. It started at Baidu (the "Chinese Google") and was open sourced in 2017. | ? | Apache License 2.0 |
InfluxDB | (2013) | Originally built by a startup for monitoring and alerting, it now specializes in time-series analysis and IoT. Provides an SQL-like query language. | 7 | MIT License |
Druid | Started at Metamarkets (2011) | A distributed data store written in Java. Druid is designed to quickly ingest massive quantities of event data and provide aggregated queries on top. Historically it was designed only to store data in aggregate, but it has increasingly expanded to support full granularity. | 7 | Apache License 2.0 |
LucidDB | Originally a research project (2007) | An open source project that DynamoBI attempted to commercialise but that never really took off. Part Java, part C++; only limited connectivity options are available, but the architecture is clearly documented and looks good. | 2 | Apache License |
C-Store | Universities: Brown/Brandeis/MIT (2006) | An early open source, read-optimized column-oriented database produced as a joint research project. Mike Stonebraker of MIT moved on from C-Store to commercialise Vertica. | 2 | |
MonetDB | Research centre based in the Netherlands (1993) | An early, pioneering column store whose technology has been imitated by others and directly led to the Actian Vectorwise commercial product. An extremely fast column-oriented database that can handle large amounts of data; however, its origins as a research project show through in some frustrating aspects (areas of little research value can have outstanding issues for months). | 0 | Mozilla Public License 1.1 |
Benchmarks
As the results below show, column-oriented databases can be hundreds of times faster for certain queries.
Results reproduced from Mark Litwintschik's excellent article.
Setup | Total Query Time (lower = better) | Note |
---|---|---|
kdb+/q & 4 Intel Xeon Phi 7210 CPUs | 1.04 | |
ClickHouse, 3 x c5d.9xlarge cluster | 4.06 | |
Clickhouse on DoubleCloud, s1-c32-m128 | 5.77 | |
Redshift, 6-node ds2.8xlarge cluster | 8.03 | |
Vertica, Intel Core i5 4670K | 147.30 | |
Spark 1.6, 5-node m3.xlarge cluster w/ S3 | 2158.00 | NOT column oriented. |
SQLite 3, Parquet & HDFS | 6342.00 | NOT column oriented. |
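Much of that gap comes from storage layout. A minimal sketch in Python, assuming a toy in-memory table with hypothetical trip columns (`trip_id`, `passengers`, `fare`), shows the idea: summing one column in a column layout touches only that column's array, while a row layout interleaves every field of every record. Counting the whole row buffer as "read" is a simplifying assumption that models a disk-based row store fetching whole pages of records.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]

# Hypothetical toy data: three trips with three fields each.
SCHEMA = ["trip_id", "passengers", "fare"]
TRIPS: List[Record] = [
    {"trip_id": 1, "passengers": 2, "fare": 12.5},
    {"trip_id": 2, "passengers": 1, "fare": 7.0},
    {"trip_id": 3, "passengers": 4, "fare": 30.25},
]


def to_row_layout(records: List[Record], schema: List[str]) -> List[float]:
    """Row layout: all fields of record 0, then all fields of record 1, and so on."""
    return [record[col] for record in records for col in schema]


def to_column_layout(records: List[Record], schema: List[str]) -> Dict[str, List[float]]:
    """Column layout: each column is stored as its own contiguous array."""
    return {col: [record[col] for record in records] for col in schema}


def sum_row_layout(buffer: List[float], schema: List[str], column: str) -> Tuple[float, int]:
    """Return (sum, values read). The wanted column is strided through the buffer,
    and we assume the whole buffer is read, as with whole-record pages on disk."""
    width = len(schema)
    offset = schema.index(column)
    return sum(buffer[offset::width]), len(buffer)


def sum_column_layout(columns: Dict[str, List[float]], column: str) -> Tuple[float, int]:
    """Return (sum, values read). Only the requested column's array is touched."""
    values = columns[column]
    return sum(values), len(values)


row_buffer = to_row_layout(TRIPS, SCHEMA)
column_store = to_column_layout(TRIPS, SCHEMA)
print(sum_row_layout(row_buffer, SCHEMA, "fare"))  # (49.75, 9)
print(sum_column_layout(column_store, "fare"))     # (49.75, 3)
```

With three columns the column layout reads a third of the values. On wide tables with dozens of columns, an analytical query that touches only a few of them skips most of the data entirely.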
Column Database Benchmarks
ClickBench results:
System & Machine | Relative time (lower is better) | Note |
---|---|---|
ClickHouse (c6a.metal, 500 GB gp2) | ×1.59 | |
SelectDB (c6a.metal, 500 GB gp2) | ×1.88 | |
ClickHouse (m5d.24xlarge) | ×2.15 | |
StarRocks (c6a.metal, 500 GB gp2) | ×2.16 | |
Redshift (4×ra3.16xlarge) | ×2.20 | |
DuckDB (c6a.metal, 500 GB gp2) | ×2.74 | |
MariaDB ColumnStore (c6a.4xlarge, 500 GB gp2)† | ×59.27 | |
Druid (c6a.4xlarge, 500 GB gp2)† | ×150.50 | |
PostgreSQL (c6a.4xlarge, 500 GB gp2) | ×883.89 | NOT column oriented |
Financial Tick Databases
Product | Vendor (release year) | Description |
---|---|---|
kdb+ | KX (1998) | An early column-oriented database that has proven itself fast and capable of holding massive amounts of data, and is widely used in the finance industry. It provides its own vector-based language, q, and offers a variant of SQL specialised for ordered/time-series queries. Because it was mostly created by one man, Arthur Whitney, it has a unique conciseness and consistency compared to other, more monolithic databases. |
One Tick Database | OneTick (2005) | A column/row-oriented database targeted at the financial sector and specialised for tick data, created by Leonid Frants, who had built a tick solution while at Goldman Sachs. |
eXtremeDB | McObject (2001) | A fast, embedded, mostly in-memory database targeted at financial firms and time-series data. Its raw API and ability to be embedded within a process make it fast; however, this means a higher configuration cost and a steeper learning curve to get started. |
Commercial Column-Oriented Database Vendors
*Column-Oriented: not all column-oriented databases can be considered equal; there are in fact differing levels of how column-oriented a database is, depending on how …
Product | Vendor (release year) | Description | Column-Oriented* | Grid Framework | Compression | Download |
---|---|---|---|---|---|---|
SingleStore | SingleStore (2012) | A mixed database that tries to perform well for both transactional and analytical queries. | Yes | Shared-nothing scale-out | Yes | Cloud trial |
InfiniDB | Calpont (2000) | A MySQL-compatible columnar warehouse engine capable of handling multiple terabytes. | Yes | Shared-nothing scale-out | Yes | Community Edition (single-node limit) |
Greenplum | GoPivotal (2003) | A hybrid column/row-oriented database based on PostgreSQL, with many enhancements to allow efficient parallel execution across multiple machines. | Medium | Shared-nothing MPP architecture | Yes (append-only tables; supports zlib, QuickLZ and run-length encoding) | Trial version |
Teradata Database | Teradata (1979) | One of the longest-established and largest suppliers of column-oriented databases, with a full supporting stack of associated software. It continues to innovate and recently purchased Kickfire, a column-oriented database that used FPGAs to accelerate SQL queries. | Medium | Shared-nothing | Yes (automatically chooses from among six types of compression based on the column demographics: run length, dictionary, trim, delta on mean, null and UTF8) | Express Edition (size limits vary by platform) |
Vectorwise/ParAccel | Actian (2008) | A modern "database architected for the new bottleneck: memory access." Based on research around the open source MonetDB and the X100 project, including efficient memory handling and vectorized query execution (SIMD). Consistently scores highly in the TPC-H benchmarks. | Hybrid | - | Yes (dictionary for strings; proprietary fast compression of numeric data) | 30-day trial (requires signup) |
Sybase IQ | SAP (1994) | A mature column-oriented database from one of the first commercial vendors, with many deployments (2000+) and good tooling support. It may be showing its age: I've heard reports that it can struggle to handle very large amounts of data or be slower than newer entrants. However, this is hearsay, and the Sybase version history shows a good track record of feature updates. | High | Shared-disk architecture | Yes (token/dictionary) | Express Edition (5 GB limit) |
Vertica | HP (2005) | A modern parallel column-oriented database designed to run on multiple commodity servers. Co-founded by database researcher Michael Stonebraker based on previous open source/academic work on C-Store. | Yes | Shared-nothing | Yes (LZO, run-length encoding, delta) | Community Edition (3 nodes / 1 TB / feature limits) |
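Several entries in the Compression column name run-length, dictionary and delta encoding. These suit column stores well because a single column holds values of one type that are often repetitive or sorted. The following is a minimal textbook sketch of the three encodings in Python; it is not any listed vendor's actual format, and Teradata's "delta on mean" variant differs from the plain previous-value delta shown here. The ticker symbols and timestamps are made-up example data.

```python
from itertools import accumulate, groupby
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")


def rle_encode(values: List[T]) -> List[Tuple[T, int]]:
    """Run-length encoding: collapse each run of equal adjacent values into (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(runs: List[Tuple[T, int]]) -> List[T]:
    return [value for value, count in runs for _ in range(count)]


def dict_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Dictionary encoding: store each distinct string once; the column becomes integer codes."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dict_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    return [dictionary[code] for code in codes]


def delta_encode(values: List[int]) -> List[int]:
    """Delta encoding: keep the first value, then only the difference from the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    # A running sum restores the original values.
    return list(accumulate(deltas))


symbols = ["AAPL", "AAPL", "AAPL", "MSFT", "MSFT"]
timestamps = [1000, 1001, 1003, 1006]
print(rle_encode(symbols))       # [('AAPL', 3), ('MSFT', 2)]
print(dict_encode(symbols))      # (['AAPL', 'MSFT'], [0, 0, 0, 1, 1])
print(delta_encode(timestamps))  # [1000, 1, 2, 3]
```

Run-length encoding pays off most when a column is sorted, so equal values sit next to each other; delta encoding turns steadily increasing values such as tick timestamps into small numbers that a further compression step can pack tightly.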
The major benchmark for analytical queries among these vendors is the TPC-H decision support benchmark; you can download the benchmark and view past results. Vendors not listed that may be added to the table later include Exasol, MS SQL Server ColumnStore, Infobright and IBM DB2.
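The Vectorwise entry mentions vectorized query execution. In that model, operators pass fixed-size batches (vectors) of column values to each other instead of one tuple at a time, which amortizes per-call overhead and lets the inner loops run over contiguous arrays that compilers can turn into SIMD instructions. The sketch below, assuming a hypothetical `BATCH_SIZE` and made-up `fares` data, only illustrates the batch-at-a-time operator pipeline; plain Python does not use SIMD.

```python
from typing import Iterable, Iterator, List

# Hypothetical batch size; real engines pick a size that keeps a batch in CPU cache.
BATCH_SIZE = 1024


def scan(column: List[float], batch_size: int = BATCH_SIZE) -> Iterator[List[float]]:
    """Scan operator: yield the column in fixed-size vectors, not value by value."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]


def select_greater(batches: Iterable[List[float]], threshold: float) -> Iterator[List[float]]:
    """Filter operator: evaluate the predicate over a whole vector per call."""
    for batch in batches:
        kept = [value for value in batch if value > threshold]
        if kept:
            yield kept


def sum_aggregate(batches: Iterable[List[float]]) -> float:
    """Aggregate operator: fold each incoming vector into a running total."""
    total = 0.0
    for batch in batches:
        total += sum(batch)
    return total


# Made-up data: 10,000 fares cycling through 0.0 .. 49.0.
fares = [float(i % 50) for i in range(10_000)]
# Roughly equivalent to: SELECT SUM(fare) FROM trips WHERE fare > 40
print(sum_aggregate(select_greater(scan(fares), 40.0)))  # 81000.0
```

Each operator is called once per batch rather than once per row, so with 10,000 values the scan produces only 10 calls downstream instead of 10,000.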