Databases are necessary for organizing information in a practical manner. However, databases can be structured in various ways. In electronic data processing, relational databases are particularly common. Besides these, there are document-based databases. These are based on a simple structure of keys and documents for storing information. How do these databases work and what are their advantages?

What is a Document Store?

Document-oriented databases, also known as document stores, are used to manage semi-structured data. This data does not adhere to a fixed structure; instead, it carries its own structure. The information can be ordered using markers within the semi-structured data. Because it lacks a predefined structure, this data is poorly suited to relational databases, since its information cannot readily be arranged in tables.

A document database creates a simple pair: a key is assigned to a specific document. The actual information is then located within this document, which may be formatted as XML, JSON, or YAML. Since the document does not require a specific schema, different types of documents can also be stored together in a document store. Changes to the structure of a document do not have to be communicated to the database.
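As a rough illustration of this key-document pairing, the following Python sketch models a store as a plain dictionary whose values are JSON strings. The keys, field names, and the `store` variable are invented for this example and do not correspond to any particular product.

```python
import json

# Hypothetical in-memory "document store": each key maps to one JSON document.
store = {}

# Two documents with different fields; no shared schema is declared anywhere.
store["user:1"] = json.dumps({"name": "Ada", "email": "ada@example.com"})
store["product:7"] = json.dumps({"title": "Keyboard", "price": 49.9, "tags": ["usb"]})

# Reading a document: look it up by its key, then parse its contents.
user = json.loads(store["user:1"])
product = json.loads(store["product:7"])

print(user["name"])
print(sorted(product.keys()))
```

The two documents share no fields, yet both sit in the same store, and nothing had to be declared up front before either was saved.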

Note

Document-based databases are very similar to other database models: the system can be considered a subcategory of NoSQL databases, and it’s closely related to key-value databases due to the combination of keys and documents. As a row-oriented system, it stands in contrast to column-oriented databases.

How Do Document Databases Work?

In theory, data in all sorts of formats, even without a consistent schema, can be stored in a document-based database. In practice, however, a single file format is typically used for the documents and the information is ordered in a certain structure. This makes it easier to work with the information and the database. For example, search queries can be processed more efficiently when the documents follow a known structure. You can generally perform the same actions in a document-based database as in a relational system: information can be added, changed, deleted, and queried.

To allow these actions to take place, each document is given a unique ID. How this identifier is formed is not particularly important: either a simple sequence of numbers or the complete path can be used to address the document. When searching for information, the documents themselves are checked. In other words, the data is pulled directly from the documents rather than from columns within the database.
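The four basic actions and the role of the unique ID can be sketched with a toy in-memory store. This is only an illustration under simplifying assumptions: the `DocumentStore` class and its method names are hypothetical, a random UUID stands in for whatever identifier a real system would assign, and queries simply scan every document.

```python
import uuid


class DocumentStore:
    """Toy in-memory store; every name here is illustrative, not a real API."""

    def __init__(self):
        self._docs = {}  # maps document ID -> document (a dict)

    def insert(self, doc):
        doc_id = str(uuid.uuid4())  # any unique value would do as the ID
        self._docs[doc_id] = dict(doc)
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def update(self, doc_id, changes):
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        del self._docs[doc_id]

    def find(self, **criteria):
        # The query inspects the documents themselves, field by field.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(k) == v for k, v in criteria.items())
        ]


db = DocumentStore()
ada_id = db.insert({"name": "Ada", "city": "London"})
bob_id = db.insert({"name": "Bob", "city": "Paris", "age": 41})

db.update(ada_id, {"city": "Paris"})  # change
in_paris = db.find(city="Paris")      # query
db.delete(bob_id)                     # delete
```

Scanning every document keeps the sketch short; production systems typically maintain indexes so that queries do not have to read each document.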

What Are the Pros and Cons of Document Stores?

In conventional relational databases, a field has to exist for each piece of information in every entry. If the information is not available, the cell is left empty, but it must still exist. Document-oriented databases are much more flexible: the structure of individual documents does not have to be consistent. Even large volumes of unstructured data can be accommodated in the database.

Plus, it’s easier to integrate new information. In a relational database, a new attribute must be added to every dataset, whereas in a document store it only needs to be included in the documents it applies to. The additional content can be added to further documents, but it’s not required.
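The difference can be made concrete with a small comparison. The sketch below uses Python's built-in `sqlite3` module for the relational side and a plain dictionary as a stand-in for a document store; the table, keys, and field names are made up for the example.

```python
import sqlite3

# Relational side: the new column becomes part of every row in the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Bob",)])
conn.execute("ALTER TABLE customers ADD COLUMN phone TEXT")
conn.execute("UPDATE customers SET phone = ? WHERE name = ?", ("555-0100", "Ada"))
rows = conn.execute("SELECT name, phone FROM customers ORDER BY id").fetchall()
conn.close()

# Document side: only the document that has the information gains the field.
documents = {
    "c1": {"name": "Ada"},
    "c2": {"name": "Bob"},
}
documents["c1"]["phone"] = "555-0100"

print(rows)       # Bob's row carries an empty (NULL) phone cell
print(documents)  # Bob's document has no phone field at all
```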

Moreover, with document stores the information is not distributed over multiple linked tables. Everything is contained in a single location, and this can result in better performance. However, this speed advantage only holds as long as you don’t attempt to use relational elements: references don’t really suit the concept of document stores. If you do try to interlink the documents, the system becomes highly complex and cumbersome. So, a relational database system is more advisable for heavily interlinked data.
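To show why references add work, the following sketch stores the same order twice: once with the customer embedded in the document and once with only a reference to a separate customer document. The dictionaries and the `load_order_with_customer` helper are hypothetical and only model the idea.

```python
# Embedded: one lookup returns the order together with its customer details.
orders_embedded = {
    "order:1": {
        "items": ["keyboard", "mouse"],
        "customer": {"name": "Ada", "city": "London"},
    }
}

# Referenced: the order only stores a key, so the application must resolve it.
customers = {"customer:1": {"name": "Ada", "city": "London"}}
orders_referenced = {
    "order:1": {"items": ["keyboard", "mouse"], "customer_id": "customer:1"}
}


def load_order_with_customer(order_key):
    """Follow the reference by hand; the 'join' happens in application code."""
    order = dict(orders_referenced[order_key])  # copy so the stored order is untouched
    order["customer"] = customers[order.pop("customer_id")]
    return order


embedded = orders_embedded["order:1"]           # single read
resolved = load_order_with_customer("order:1")  # two reads plus merge logic
```

Both variants end up with the same data, but the referenced version needs a second lookup and extra code for every link, which is the overhead that grows as documents become more interconnected.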

The Most Well-Known Document Databases

Document databases are especially important for the development of web apps. Due to the increased demand resulting from web development, numerous database management systems (DBMSs) have since been released on the market. The most well-known examples are outlined below:

  • BaseX: This open-source project uses Java and XML. BaseX is supplied with a graphical user interface.
  • CouchDB: The Apache Software Foundation released the open-source software CouchDB. The database management system is written in Erlang, uses JavaScript, and is utilized in Ubuntu and Facebook applications among others.
  • Elasticsearch: This search engine works based on a document-oriented database. JSON documents are used to this end.
  • eXist: The open-source DBMS runs on a Java virtual machine and can therefore be used regardless of the operating system. XML documents are primarily used.
  • MongoDB: MongoDB is by far the most widespread NoSQL database. The software is written in C++ and uses JSON-like documents.
  • SimpleDB: With SimpleDB (written in Erlang), Amazon developed its own DBMS for the company’s Cloud services. The provider charges a fee for use.