A Columnar DBMS Benchmark

MonetDB vs. Vertica vs. MariaDB Columnstore

As discussed in  a previous article where PostgreSQL was pit against HPE Vertica, the choice of database is critical to a good BI solution. This article will further dive into columnar databases with a comparison of HPE Vertica, MonetDB and -the still in alpha- MariaDB Column Store (MariaDBCS).


MonetDB was developed at the CWI in Amsterdam and released as open-source software in 2004.

Its architecture is comprised of 3 layers; a query interface front-end, the optimization mid-layer and the MonetDB Assembly Language (MAL) interpreter. It is aimed at maximum efficiency of hardware for large scale processing. MonetDB also has (limited)  support for clustering.

HPE Vertica

A cluster-based analytics platform acquired by HP Software in 2011.

Designed to manage large, fast-growing volumes of data with fast query performance. It offers compression and uses it along with sorting and encoding for ‘cost-effective loads and reads’. Vertica is designed to run in a cluster and as such the results of this test would be vastly different when using even bigger datasets and multiple nodes. For the purpose of this test though, Vertica is set up in a single node format.

MariaDB ColumnStore

The new analytics engine MariaDB ColumnStore is MariaDB’s effort to provide a product which is scalable, reliable and open-source with a hundred percent SQL compatibility. The engine itself runs under the familiar MariaDB interface and as such is backwards-compatible with components like previous JDBC drivers.

Running some tests

Testing is done on a VirtualBox running Ubuntu 14.04 ( Trusty Tahr), with 8GB Memory and 50GB storage. Each database will have full reign of the machine as we’ll have consecutive runs for testing. This dataset is inserted into a data warehouse running on each database using Pentaho Data Integration. It contains 20.520.383 records total.

  • fact_ratings
  • dim_movies
  • dim_links
  • dim_tags

The DDL statements for each table:

CREATE TABLE dim_movies (
      movieId DECIMAL(20)

, title VARCHAR(200)

, genres VARCHAR(120)


CREATE TABLE dim_links (
      movieId DECIMAL(25)

, imdbId DECIMAL(25)

, tmdbId DECIMAL(25)


CREATE TABLE dim_tags (  
       userId DECIMAL(20)

, movieId DECIMAL(20)

, tag VARCHAR(500)

, "timestamp" VARCHAR(50)


CREATE TABLE "fact_ratings"(
      "fact_id" BIGINT , "userId" DECIMAL

, "movieId" DECIMAL

, "rating" DOUBLE

, "timestamp" VARCHAR(40)


Keep in mind that all 3 of these databases will require modification of the create table statements above. 

Noticeable when testing MariaDBCS was that it did not agree with some data types and did not allow LOCAL statement when using LOAD DATA.

The DWH writing times for each database:

MonetDB HPE Vertica MariaDBCS
3m 23s 3m 49s 4m 13s

Each test will be a simple every-day-use query.

The first test is a simple average query: “SELECT AVG(fact_id) FROM fact_ratings;”

MonetDB HPE Vertica MariaDBCS
1.40s 0.39s 28.37s

Result: 10000131 MariaDBCS takes a very long time to calculate and display the average here while the other 2 complete the task under a few seconds.

The second test query is a count of all the ratings:

SELECT rating, count(*) from fact_ratings group by rating order by rating;”

MonetDB HPE Vertica MariaDBCS
1.22s 0.95ms 29.24s


rating count
0.5 239.125
1 680.732
1.5 279.252
2 1.430.997
2.5 883.398
3 4.291.193
3.5 2.200.156
4 5.561.926
4.5 1.534.824
5 2.898.660

The third test is somewhat more complex. The goal is to have the top 20 movies (of the table) and their average ratings. “select dim_movies.title, avg(fact_ratings.rating)

from dim_movies

join fact_ratings on dim_movies.movieId = fact_ratings.”movieId”

group by dim_movies.title DESC

order by dim_movies.title

limit 20;”

MonetDB HPE Vertica MariaDBCS
6.53s 14.94s 5m 22s


title average
"Great Performances" Cats (1998) 2.75
chicagoGirl: The Social Network Takes on a Dictator (2013) 3.67
$ (Dollars) (1971) 2.83
$5 a Day (2008) 2.87
$9.99 (2008) 3.01
$ellebrity (Sellebrity) (2012) 2
'71 (2014) 3.66
'Hellboy': The Seeds of Creation (2004) 3.03
'Human' Factor, The (Human Factor, The) (1975) 2.5
'Neath the Arizona Skies (1934) 2.32
'R Xmas (2001) 2.58
'Round Midnight (1986) 3.61
'Salem's Lot (2004) 3.11
'Til There Was You (1997) 2.85
'Twas the Night Before Christmas (1974) 3.37
'burbs, The (1989) 3.07
'night Mother (1986) 3.45
(500) Days of Summer (2009) 3.80
(A)sexual (2011) 3.33
(Absolutions) Pipilotti's Mistakes ((Entlastungen)) Pipilottis Fehler) (1988) 1

The fourth test builds further on the third one: now we want the top 20 movies, their rating and imdbid; that are larger than the average rating. “select dim_movies.title, fact_ratings.rating, dim_links.imdbid

from dim_movies

left outer join fact_ratings on dim_movies.movieId = fact_ratings."movieId"

left outer join dim_links on dim_movies.movieId = dim_links.movieid

where fact_ratings.rating > (select avg(fact_ratings.rating) from fact_ratings)

order by fact_ratings.rating DESC

limit 20;”

MonetDB HPE Vertica MariaDBCS
8.47s 10.12s 6m 4s


title rating imdbid
District 9 (2009) 5 1.136.608
Lord of the Rings: The Two Towers, The (2002) 5 167.261
Lord of the Rings: The Return of the King, The (2003) 5 167.260
Freaks (1932) 5 22.913
Mr. Holland's Opus (1995) 5 113.862
From Dusk Till Dawn (1996) 5 116.367
Star Wars: Episode IV - A New Hope (1977) 5 76.759
Legends of the Fall (1994) 5 110.322
Jurassic Park (1993) 5 107.290
Blade Runner (1982) 5 83.658
Terminator 2: Judgment Day (1991) 5 103.064
2001: A Space Odyssey (1968) 5 62.622
Star Wars: Episode V - The Empire Strikes Back (1980) 5 80.684
Star Wars: Episode VI - Return of the Jedi (1983) 5 86.190
Alien (1979) 5 78.748
Femme Nikita, La (Nikita) (1990) 5 100.263
Darjeeling Limited, The (2007) 5 838.221
Diving Bell and the Butterfly, The (Scaphandre et le papillon, Le) (2007) 5 401.383
Kung Fu Panda (2008) 5 441.773
Lord of the Rings: The Fellowship of the Ring, The (2001) 5 120.737


From these results it's clear that MonetDB and HPE Vertica are the winners of this benchmark. MonetDB is the fastest choice when agglomerating multiple tables together to produce an end result, where as HPE Vertica offers the best results when executing simple queries and calculations. As the newcomer to this type of storage, MariaDB has offered up a valuable candidate which albeit slow in comparison; does offer full SQL.

Again I feel the need to throw in a reminder that if we were using a larger dataset (read: terra/petabytes) with Vertica properly set up, it would dwarf the other two in comparison.

Suggested is that; when choosing a DBMS; look at the dataset and the type of queries that will be used. Considering each database has its pitfalls, which is the best option for you? 

To conclude; speed is optimal and it’s clear there isn’t a monopoly on it just yet.