Aggregation in MongoDB is a valuable tool for analyzing and filtering the data in a database. Its pipeline system lets you build queries step by step, allowing for highly customized outputs.

What is aggregation in MongoDB?

MongoDB is a non-relational and document-oriented database that is designed for use with large and diverse amounts of data. By forgoing rigid tables and using techniques like sharding (storing data on different nodes), the NoSQL solution can scale horizontally while remaining highly flexible and resilient to failures.

Documents in the binary JSON format BSON are bundled in collections and can be queried and edited using the MongoDB Query Language (MQL). Even though this language offers many options, its basic queries are of limited use for data analysis. That’s why MongoDB provides aggregation.

In computer science, this term refers to various processes. In MongoDB, aggregation refers to analyzing and summarizing data using various operations to produce a single, clear result. During this process, data from one or more documents is analyzed and filtered according to user-defined criteria.

In the following sections, we not only look at the possibilities that MongoDB aggregation offers for comprehensive data analysis, but also provide examples of how you can use the aggregate() method with the database management system.

What do I need for MongoDB aggregation?

There are only a few requirements for using aggregation in MongoDB. The method is executed in the shell and works according to logical rules that you can tailor to meet the needs of your analysis.

To use aggregation in MongoDB, you need to have MongoDB already installed on your computer. If it isn’t, you can find out how to download, install and run the database in our comprehensive MongoDB tutorial.

You should also use a powerful firewall and make sure your database is set up according to all current security standards. To run aggregation in MongoDB, your user needs at least read access to the collection in question. Stages that write data, such as $out, require write privileges as well.

The database works across all platforms, so the steps described below apply to all operating systems.

What is the pipeline in the MongoDB aggregation framework?

In MongoDB, you can carry out simple searches or queries, with the database immediately displaying the results. However, this method is very limited, as it can only display results that already exist within the stored documents. This type of query is not intended for in-depth analysis, for detecting recurring patterns or for deriving further information.

Sometimes different sources within a database need to be taken into account in order to draw meaningful conclusions. MongoDB aggregation is used for situations like these. To achieve such results, the aggregate() method uses pipelines.

Role of the pipeline

Aggregation pipelines in MongoDB are processes in which existing data is analyzed and filtered in a series of steps in order to produce the result users are looking for. These steps are referred to as stages. Depending on the requirements, a pipeline can consist of one or more stages. These are executed one after the other, each taking the output of the previous stage as its input, so that the information you are looking for is available at the end.

While the input is made up of numerous pieces of data, the output (i.e., the end result) is a single result set. We’ll explain the different stages of MongoDB aggregation below.

Syntax of the MongoDB aggregation pipeline

First, it’s worth taking a brief look at the syntax of aggregation in MongoDB. The method is always structured according to the same format and can be adapted to your specific requirements. The basic structure looks like this:

db.collection_name.aggregate(pipeline, options)

Here, collection_name is the name of the collection in question. The stages of MongoDB aggregation are listed under pipeline as an array. options is an optional document with further parameters that control how the aggregation is run.
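
As a minimal sketch of how the two arguments fit together, the following snippet passes a one-stage pipeline plus an options document. The collection name customers and the variable names are assumptions chosen for illustration; allowDiskUse and maxTimeMS are two of the options the method accepts. The typeof check is only there so that the snippet can also be loaded outside the MongoDB shell, where db does not exist.

```javascript
// First argument: the pipeline, an array of stage documents.
const samplePipeline = [
  { $limit: 5 } // pass on at most five documents
];

// Second argument: optional settings for the aggregate() call itself.
const sampleOptions = {
  allowDiskUse: true, // let memory-hungry stages write temporary files to disk
  maxTimeMS: 5000     // abort the operation after five seconds
};

// "customers" is a placeholder collection name (assumption).
// `db` only exists inside the MongoDB shell, hence the guard.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(samplePipeline, sampleOptions).toArray());
}
```

If you don’t need any options, you can simply leave out the second argument.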

Pipeline stages

There are numerous stages for the aggregation pipeline in MongoDB. Most of them can be used multiple times within a pipeline. It would go beyond the scope of this article to list all the options here, especially as some are only required for very specific operations. However, to give you an idea of the stages, we’ll list a few of the most frequently used ones here:

  • $count: Returns the number of documents that reach this stage of the pipeline.
  • $group: Groups documents by a specified key and can calculate values such as sums or averages for each group.
  • $limit: Limits the number of documents passed to the next stage in the pipeline.
  • $match: Filters the documents so that only those matching the specified conditions are passed to the next stage.
  • $out: Writes the results of the MongoDB aggregation to a collection. This stage can only be used at the end of a pipeline.
  • $project: Use $project to select specific fields from the documents.
  • $skip: Skips a specified number of documents and passes the remaining ones to the next stage.
  • $sort: Sorts the documents passing through the pipeline. The documents stored in the collection are not changed.
  • $unset: Excludes certain fields. It is the counterpart to selecting fields with $project.
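
To make the list a little more concrete, here is a sketch that chains three of these stages. It assumes a collection named customers whose documents have country and quantity fields, like the example collection created further below. The variable name totalsPipeline and the output field totalQuantity are made up for this illustration.

```javascript
const totalsPipeline = [
  // $group: one output document per country, with the quantities added up
  { $group: { _id: "$country", totalQuantity: { $sum: "$quantity" } } },
  // $sort: largest total first
  { $sort: { totalQuantity: -1 } },
  // $limit: keep only the three countries with the highest totals
  { $limit: 3 }
];

// `db` only exists inside the MongoDB shell, hence the guard.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(totalsPipeline).toArray());
}
```

With the ten sample documents used later in this article, this should return Italy (31), Germany (28) and Spain (19).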

An example of aggregation in MongoDB

To help you better understand how aggregation in MongoDB works, we’ll show you some examples of different stages and how to use them. To use MongoDB aggregation, open the shell with a user that is allowed to read from and write to the database. Normally, the shell starts in the test database. If you want to work with a different database, switch to it with the use command.

For this example, let’s imagine a database that contains the data of customers who have purchased a specific product. To keep things simple, this database has just ten documents, which are all structured the same:

{
	"name" : "Smith",
	"city" : "Los Angeles",
	"country" : "United States",
	"quantity" : 14
}

The following information about the customers has been included: their name, place of residence, country and the number of products they have purchased.

If you want to try aggregation in MongoDB, you can use the insertMany() method to add all documents with customer data to the collection named “customers”:

db.customers.insertMany([
	{ "name": "Smith", "city": "Los Angeles", "country": "United States", "quantity": 14 },
	{ "name": "Meyer", "city": "Hamburg", "country": "Germany", "quantity": 26 },
	{ "name": "Lee", "city": "Birmingham", "country": "England", "quantity": 5 },
	{ "name": "Rodriguez", "city": "Madrid", "country": "Spain", "quantity": 19 },
	{ "name": "Nowak", "city": "Krakow", "country": "Poland", "quantity": 13 },
	{ "name": "Rossi", "city": "Milano", "country": "Italy", "quantity": 10 },
	{ "name": "Arslan", "city": "Ankara", "country": "Turkey", "quantity": 18 },
	{ "name": "Martin", "city": "Lyon", "country": "France", "quantity": 9 },
	{ "name": "Mancini", "city": "Rome", "country": "Italy", "quantity": 21 },
	{ "name": "Schulz", "city": "Munich", "country": "Germany", "quantity": 2 }
])

The shell then confirms the insert and lists the object ID that was generated for each individual document.

How to use $match

To illustrate the possibilities of aggregation in MongoDB, we’ll first apply the $match stage to our “customers” collection. With an empty filter, this would simply output the complete list of customer data listed above.

In the following example, however, we’ve instructed it to only show us customers from Italy. Here’s the command:

db.customers.aggregate([
	{ $match: { "country": "Italy" } }
])

You’ll now only be shown the object IDs and information of the two customers from Italy.
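
$match is not restricted to exact values. As a sketch under the same assumptions (the ten sample documents in the “customers” collection), the following pipeline uses the comparison operator $gte to keep only customers who bought at least 15 units and then counts them with $count. The names largeOrdersPipeline and largeOrders are chosen freely for this illustration.

```javascript
const largeOrdersPipeline = [
  // keep only documents whose quantity is greater than or equal to 15
  { $match: { quantity: { $gte: 15 } } },
  // replace the remaining documents with a single document holding their number
  { $count: "largeOrders" }
];

// `db` only exists inside the MongoDB shell, hence the guard.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(largeOrdersPipeline).toArray());
}
```

With the sample data, four customers (Meyer, Rodriguez, Arslan and Mancini) meet the condition, so the result is a single document with largeOrders set to 4.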

Use $sort for a better overview

If you want to organize your customer database, you can use the $sort stage. In the following example, we instruct the system to sort all customer data according to the number of units purchased, starting with the highest number. The input looks like this:

db.customers.aggregate([
	{ $sort: { "quantity": -1 } }
])
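
Sorting becomes particularly useful in combination with $skip and $limit, for example to page through a ranking. The following sketch, again assuming the sample “customers” collection, sorts by quantity and returns places four to six. The variable name secondPagePipeline is only an illustration.

```javascript
const secondPagePipeline = [
  // highest quantity first
  { $sort: { quantity: -1 } },
  // ignore the first three documents of the sorted list
  { $skip: 3 },
  // pass on the next three documents only
  { $limit: 3 }
];

// `db` only exists inside the MongoDB shell, hence the guard.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(secondPagePipeline).toArray());
}
```

With the sample data, this returns the documents for Arslan (18), Smith (14) and Nowak (13). Note that the order of the stages matters: placing $limit before $sort would sort only the first three documents of the collection.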

Limit the output with $project

With the stages used so far, you’ll see that the output is relatively extensive. For example, in addition to the actual information within the documents, the object ID is also always output. You can use $project in the MongoDB aggregation pipeline to determine which information should be output. To do this, we set the value 1 for the fields we want to see. All fields that aren’t listed are left out automatically. Only _id is included by default and has to be switched off explicitly with 0. Apart from _id, the values 1 and 0 can’t be mixed in the same $project stage, so the fields city and country are simply not mentioned. In our example, we only want to see the customer name and the number of products purchased. To do this, we enter the following:

db.customers.aggregate([
	{ $project: { _id: 0, name: 1, quantity: 1 } }
])
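
The same output can also be described the other way round, by naming the fields you want to drop. The following sketch assumes the sample “customers” collection and a MongoDB version that provides the $unset stage (4.2 or later). The variable name unsetPipeline is only an illustration.

```javascript
const unsetPipeline = [
  // remove the listed fields; everything else (name, quantity) is kept
  { $unset: ["_id", "city", "country"] }
];

// `db` only exists inside the MongoDB shell, hence the guard.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(unsetPipeline).toArray());
}
```

Which variant is more convenient usually depends on whether the list of fields to keep or the list of fields to drop is shorter.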

Combine multiple stages with aggregation in MongoDB

MongoDB aggregation also gives you the option of applying several stages in succession. These are then run through one after the other, and at the end there is an output that takes all the desired parameters into account. For example, if you only want to display the names and purchases of U.S. customers in descending order, you can use the stages described above as follows:

db.customers.aggregate([
	{ $match: { "country": "United States" } },         // keep U.S. customers only
	{ $project: { _id: 0, name: 1, quantity: 1 } },     // output name and quantity
	{ $sort: { "quantity": -1 } }                       // highest quantity first
])

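
Because the sample data contains only one customer from the United States, the sorting has no visible effect in that result. As a sketch of the same three stages applied to a larger subset, the following variant selects customers from Germany and Italy with the $in operator. It assumes the sample “customers” collection, and the variable name combinedPipeline is only an illustration.

```javascript
const combinedPipeline = [
  // keep customers whose country is one of the listed values
  { $match: { country: { $in: ["Germany", "Italy"] } } },
  // output only name and quantity
  { $project: { _id: 0, name: 1, quantity: 1 } },
  // highest quantity first
  { $sort: { quantity: -1 } }
];

// `db` only exists inside the MongoDB shell, hence the guard.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(combinedPipeline).toArray());
}
```

With the sample data, the output lists Meyer (26), Mancini (21), Rossi (10) and Schulz (2), in that order.
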
Tip

Want to find out more about MongoDB? We’ve got a lot more information in our Digital Guide. For example, you can read about how the list databases command works or how you can use MongoDB Sort to specify the order of your data output.