How do Document Stores work?

Databases are necessary for organizing information in a practical manner. However, databases can be structured in various ways. In electronic data processing, relational databases are particularly common and widespread. Alongside them, there are document-based databases. These are based on a simple structure in which keys are paired with documents that hold the information. How do these databases work and what are their advantages?

What is a Document Store?

Document-oriented databases, also known as document stores, are used to manage semi-structured data. This data does not adhere to a fixed structure; instead, it carries its own structure. The information can be ordered using markers within the semi-structured data, such as tags or field names. Due to the lack of a predefined structure, this data is poorly suited to relational databases because its information cannot readily be arranged in tables.

A document database creates a simple pair: a key is assigned to a specific document. The actual information is then located within this document, which may be formatted as XML, JSON, or YAML. Since the document does not require a specific schema, different types of documents can also be stored together in one document store. Changes to a document’s structure do not have to be declared to the database beforehand.
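As a minimal sketch of this key-to-document pairing, the following Python snippet uses an ordinary dictionary as a stand-in for a document store. The name `store`, the keys, and the sample documents are hypothetical and not tied to any particular product.

```python
import json

# A plain dict stands in for the document store: key -> serialized document.
store = {}

# Two documents with different structures sit side by side; no schema is declared.
store["customer:1"] = json.dumps({"name": "Ada", "email": "ada@example.com"})
store["invoice:7"] = json.dumps({"total": 129.5, "items": ["keyboard", "mouse"]})

# Reading a document means looking up its key and parsing the content.
customer = json.loads(store["customer:1"])
invoice = json.loads(store["invoice:7"])
print(customer["name"], invoice["items"])
```

The store itself knows nothing about the fields inside each document; it only maps a key to the document's content.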

Note

Document-based databases are very similar to other database models: the system can be considered a subcategory of NoSQL databases and it’s closely related to key-value databases due to the combination of keys and documents. As a row-oriented system, it stands in contrast to column-oriented databases.

How Do Document Databases Work?

In theory, data in all sorts of formats, even without a consistent schema, can be stored in a document-based database. In practice, however, a single file format is typically used for the documents and the information is ordered in a consistent structure. This makes it easier to work with the information and the database. For example, search queries can be processed more efficiently when the documents share a structure. You can generally perform the same actions in a document-based database as with a relational system: information can be added, changed, deleted, and queried.

To allow these actions to take place, each document is given a unique ID. How this identifier is formed is not particularly important: either a simple sequential number or the complete path can be used to address the document. When searching for information, the documents themselves are checked. In other words, the data is pulled directly from the documents rather than from columns within the database.
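The four actions and the role of the unique ID can be sketched in a few lines of Python. The class `TinyDocumentStore` and its method names are hypothetical and exist only for illustration; real document databases offer their own APIs and usually add indexes so that not every document has to be scanned.

```python
import uuid


class TinyDocumentStore:
    """Hypothetical in-memory store illustrating ID-based access."""

    def __init__(self):
        self._docs = {}

    def insert(self, document):
        doc_id = str(uuid.uuid4())  # any unique identifier would do
        self._docs[doc_id] = dict(document)
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def update(self, doc_id, changes):
        self._docs[doc_id].update(changes)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def find(self, **criteria):
        # The search inspects each document itself rather than table columns.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


db = TinyDocumentStore()
book_id = db.insert({"type": "book", "title": "Dune", "year": 1965})
db.insert({"type": "film", "title": "Dune", "director": "Villeneuve"})
db.update(book_id, {"author": "Frank Herbert"})
types = [doc["type"] for doc in db.find(title="Dune")]
```

Here `find` matches both documents even though they have different fields, because only the fields named in the query are compared.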

What Are the Pros and Cons of Document Stores?

In conventional relational databases, a field has to exist for each piece of information in every entry. If the information is not available, the cell is left empty, but it must still exist. Document-oriented databases are much more flexible: the structure of individual documents does not have to be consistent. Even large volumes of unstructured data can be accommodated in the database.

Plus, it’s easier to integrate new information. In a relational database, a new attribute must be added to all datasets, whereas in a document store it only needs to be included in the documents it applies to. The additional content can be added to further documents, but it’s not required.
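A short Python sketch illustrates this flexibility. The customer documents and the `newsletter` field are invented for the example; the point is only that a new field can appear in some documents without touching the rest.

```python
# Hypothetical customer documents; none of them has a "newsletter" field yet.
customers = {
    1: {"name": "Ada"},
    2: {"name": "Grace", "phone": "555-0100"},
    3: {"name": "Linus"},
}

# Add the new piece of information only where it is known.
customers[2]["newsletter"] = True

# Queries have to tolerate documents that lack the field.
subscribed = [c["name"] for c in customers.values() if c.get("newsletter")]
without_field = [c["name"] for c in customers.values() if "newsletter" not in c]
```

The trade-off is that code reading the documents can no longer assume a field exists and has to handle its absence.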

Moreover, with document stores the information is not distributed over multiple linked tables. Everything is contained in a single location, and this can result in better performance. However, this speed advantage only holds as long as you don’t attempt to use relational elements: references don’t really suit the concept of document stores. If you do try to interlink the documents, the system becomes complex and cumbersome. A relational database system is therefore more advisable for highly interconnected data.
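The difference between keeping everything in one document and linking documents can be sketched as follows. All names and sample data are hypothetical; the example only contrasts a single lookup with a lookup that has to resolve a reference in application code.

```python
# Embedded: the order carries everything it needs, so one lookup suffices.
orders_embedded = {
    "order:1": {
        "customer": {"name": "Ada", "city": "London"},
        "items": [{"sku": "kb-01", "qty": 1}],
    }
}

# Referenced: the order only points at another document,
# much as a relational design would.
customers = {"customer:1": {"name": "Ada", "city": "London"}}
orders_referenced = {
    "order:1": {"customer_id": "customer:1", "items": [{"sku": "kb-01", "qty": 1}]}
}


def city_embedded(order_id):
    return orders_embedded[order_id]["customer"]["city"]


def city_referenced(order_id):
    # The application has to resolve the reference with a second lookup.
    customer_id = orders_referenced[order_id]["customer_id"]
    return customers[customer_id]["city"]
```

With many such references, each read turns into a chain of lookups that the application has to manage itself, which is the complexity described above.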

The Most Well-Known Document Databases

Document databases are especially important for the development of web apps. Driven by this demand, numerous database management systems (DBMSs) have been released. The best-known examples are outlined below:

BaseX: This open-source project uses Java and XML. BaseX is supplied with a graphical user interface.
CouchDB: The Apache Software Foundation released the open-source software CouchDB. The database management system is written in Erlang, uses JavaScript, and is utilized in Ubuntu and Facebook applications among others.
Elasticsearch: This search engine works based on a document-oriented database. JSON documents are used to this end.
eXist: The open-source DBMS runs on a Java virtual machine and can therefore be used regardless of the operating system. XML documents are primarily used.
MongoDB: MongoDB is one of the most widely used NoSQL databases. The software is written in C++ and uses JSON-like documents.
SimpleDB: With SimpleDB (written in Erlang), Amazon developed its own DBMS for the company’s Cloud services. The provider charges a fee for use.
