While most databases dis­trib­ute in­for­ma­tion in rows, columnar databases operate dif­fer­ent­ly. Here, the data is struc­tured in columns. What’s the point of this? Read on and find out what ad­van­tages these columnar databases offer.

How are Columnar Databases Struc­tured?

Typically, databases are struc­tured row by row. For each entry, the database man­age­ment system (DBMS) creates a row. The fields con­tain­ing the re­spec­tive in­for­ma­tion are therefore listed one after the other. Re­la­tion­al databases in par­tic­u­lar are based on this principle. Columnar databases flip this system over. With this system, each column rep­re­sents an entry: the data for each entry is ac­cord­ing­ly arranged ver­ti­cal­ly (and not hor­i­zon­tal­ly as is the case for row-based variants).

An il­lus­tra­tive example of the row-based system is shown below:

Number Last name First name Key
1 Skywalker Luke 3FN-Z768
2 Kenobi Obi-wan 7TR-K345
3 Organa Leia 8NN-R266

The columnar database es­sen­tial­ly turns this table on its side:

Number 1 2 3
Last name Skywalker Kenobi Organa
First name Luke Obi-wan Leia
Key 3FN-Z768 7TR-K345 8NN-R266

On the hard drive itself, however, the data appears one-di­men­sion­al: it’s displayed one piece after the other. For row-based databases, this looks as follows:

1, Skywalker, Luke, 3FN-Z768; 2, Kenobi, Obi-wan, 7TR-K345; 3, Organa, Leia, 8NN-R266

Columnar databases also save the in­for­ma­tion one piece after the other. But this al­ter­na­tive ori­en­ta­tion results in a different data sequence:

1, 2, 3; Skywalker, Kenobi, Organa; Luke, Obi-wan, Leia; 3FN-Z768, 7TR-K345, 8NN-R266
Tip

There are other al­ter­na­tives to the con­ven­tion­al model besides columnar databases. NoSQL databases or their sub­cat­e­go­ry document stores as well as the related key-value databases are used in­creas­ing­ly often. Graph databases are also becoming more popular since this model is able to map highly networked data very ef­fec­tive­ly.

Ap­pli­ca­tions of Columnar Databases

Re­la­tion­al systems based on rows are primarily used when lots of trans­ac­tions have to be performed quickly. Writing, changing and deleting entries works very well with re­la­tion­al databases. Columnar databases are par­tic­u­lar­ly useful when analyzing large volumes of data.

In many ap­pli­ca­tions—for example, in research and many other fields—data is analyzed on a con­tin­u­ous basis. This is much faster with columnar DBs. The reason for this is that fewer hard drive accesses are required. A category’s data is found close to each other. To read and analyses a dataset, only one block needs to be loaded. The entire database doesn’t need to be read in order to gather up the dis­trib­uted in­for­ma­tion.

Ad­van­tages and Dis­ad­van­tages of a Columnar Database

Columnar database man­age­ment systems play to their strengths when analyzing large volumes of data—such as big data. Since hard drive access con­sti­tutes the re­spec­tive bot­tle­neck when reading any database, and this access is more efficient with a column-oriented DBMS, the columnar variant is really effective here.

For trans­ac­tion­al ap­pli­ca­tions, the accesses typically work in a different way. In these cases, new in­for­ma­tion has to be dis­trib­uted across the entire database. A columnar database would perform this task more slowly than the con­ven­tion­al model.

Another advantage of columnar databases is the pos­si­bil­i­ty of com­press­ing. Data in one column is always of the same type, for example, a string or an integer. Since all entries of a type are found close together, they can be com­pressed more ef­fi­cient­ly.

The Most Well-Known Columnar Databases

Columnar database man­age­ment systems have been in use for a rel­a­tive­ly long time, but the number of available im­ple­men­ta­tions is still limited. The demand for re­la­tion­al databases is simply greater. Nonethe­less, a few systems have become es­tab­lished.

  • Amazon Redshift: As part of Amazon Web Services (AWS), Redshift offers a column-based data warehouse for big data.
  • MariaDB Column­Store: The open-source DBMS MariaDB (fork of MySQL) offers a com­bi­na­tion of a columnar and re­la­tion­al database with Column­Store.
  • SAP HANA: The SAP de­vel­op­ment platform likewise uses a com­bi­na­tion of a re­la­tion­al and columnar database.
  • Apache Cassandra: The free software is built on Apache Hadoop and is compiled in Java.
  • MonetDB: This open-source software was developed with a special focus on data mining.
Go to Main Menu