Convolutional neural networks (ConvNets, CNNs) are artificial neural networks that apply convolutional layers to input data in order to extract features and ultimately identify objects. ConvNets are essential to deep learning.

What are convolutional neural networks (CNNs)?

A convolutional neural network is a specialized type of artificial neural network that is particularly effective at processing and analyzing visual data such as images and videos. CNNs play a central role in machine learning, especially in deep learning, a subset of ML.

ConvNets consist of layers of nodes: an input layer, one or more hidden layers and an output layer. The individual nodes are interconnected, and each one has associated weights and a threshold. Once the output of a node exceeds its threshold, the node activates and sends data to the next layer of the network.
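
The weight-and-threshold behavior of a single node can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular framework implements a node; the function names and the numeric values are hypothetical and chosen only for the example.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    return sum(x * w for x, w in zip(inputs, weights))


def node_fires(inputs, weights, threshold):
    """Return True when the node's output exceeds its threshold."""
    return weighted_sum(inputs, weights) > threshold


# Hypothetical weights and threshold; in a real network they are learned.
weights = [0.5, -0.5, 0.25]
threshold = 0.5

print(node_fires([1.0, 0.0, 1.0], weights, threshold))  # True  (0.75 > 0.5)
print(node_fires([0.0, 1.0, 1.0], weights, threshold))  # False (-0.25 <= 0.5)
```

Modern networks usually replace the hard threshold with a smooth or piecewise-linear activation function, but the principle of weighting inputs and deciding how strongly to pass them on stays the same.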

Different types of neural networks are used for different types of data and use cases. For example, recurrent neural networks are often used for natural language processing and speech recognition, while CNNs are more commonly employed for classification and computer vision tasks. The ability of neural networks to recognize complex patterns in data makes them a significant tool in artificial intelligence.

How do convolutional neural networks work?

ConvNets distinguish themselves from other neural networks through their superior performance on image, speech and audio signals. They have three main types of layers. With each successive layer, the CNN becomes more complex and is able to identify larger portions of the input, for example larger parts of an image.

How the ConvNet algorithm processes images

Computers represent images as combinations of numbers, or more specifically, as pixel values, and the CNN algorithm does the same. For example, a black-and-white (grayscale) image with height m and width n is represented as a 2-dimensional array of size m × n. A color image of the same size is represented as a 3-dimensional array of size m × n × 3. Each cell in the array contains the corresponding pixel value, and a color image stores its pixel values in three different channels: red, green and blue.
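
To make the array representation concrete, here is a minimal sketch using plain Python lists. The pixel values are made up, the `shape` helper is hypothetical, and the (red, green, blue) channel ordering is an assumption, since some libraries store channels in a different order.

```python
# A 2 x 3 grayscale image: one brightness value (0-255) per pixel.
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# A color image of the same size: each pixel holds three channel values,
# assumed here to be ordered (red, green, blue).
color = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 0), (0, 0, 0), (255, 255, 255)],
]


def shape(image):
    """Return (rows, columns, channels) for a nested-list image."""
    rows, cols = len(image), len(image[0])
    first_pixel = image[0][0]
    channels = len(first_pixel) if isinstance(first_pixel, tuple) else 1
    return rows, cols, channels


print(shape(gray))   # (2, 3, 1)
print(shape(color))  # (2, 3, 3)
```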

Next, the most important features of the image are identified. They are extracted using an operation known as convolution, in which one function modifies (folds) the shape of another function. In classical image processing, convolutions are commonly used for sharpening, smoothing and enhancement. In CNNs, however, convolutions are employed to extract significant features from images.

A filter, also called a kernel, is used to extract key features from an image. A filter is a small array of weights that represents a specific feature to be extracted. The filter is slid across the input array, and the result is a two-dimensional array that shows where and how strongly the feature appears in the image. This output matrix is known as a feature map.
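
The following sketch shows how sliding a filter over an image yields a feature map. It assumes a stride of 1 and no padding, uses a hand-picked vertical-edge filter (in a trained CNN the filter values are learned), and, like most deep learning libraries, does not flip the kernel, so strictly speaking it computes a cross-correlation. The function name `convolve2d` is chosen for this example only.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    return the feature map of weighted sums."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# 4 x 4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is 3 × 3 rather than 4 × 4 because the 2 × 2 filter only fits in three positions per row and column. The high values in the middle column mark where the edge is located.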

Characteristics of the different layers in a CNN

During the convolution process, filters transform the input field into a smaller field that retains the spatial correlation between pixels. Below we’ll take a look at the three main types of layers in a CNN.

  • Convolutional layer: This layer is the first layer of a convolutional network. Using filters (small weight matrices) that slide over the image, the layer recognizes local features such as edges, corners and textures. Each filter creates a feature map that highlights a specific pattern. Several convolutional layers can be stacked, creating a hierarchical structure in which each subsequent layer sees the pixels located in the receptive fields of the previous layers.
  • Pooling layer: This layer reduces the size of the feature maps by summarizing local areas, for example by keeping only the largest value in each area (max pooling), and discarding less relevant information. This reduces computational complexity while retaining the most important information.
  • Fully-connected layer: As in a conventional neural network, every neuron in this layer is connected to every output of the previous layer. Used for making the final classification, it combines the extracted features to identify an object in an image.
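
As an illustration of how a pooling layer summarizes local areas, here is a minimal sketch of max pooling with non-overlapping 2 × 2 windows. Max pooling is only one possible summary (average pooling is another), and the function name and input values are hypothetical.

```python
def max_pool2d(feature_map, size=2):
    """Keep the largest value from each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Made-up 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 1],
]

print(max_pool2d(feature_map))  # [[4, 2], [2, 7]]
```

The 4 × 4 map shrinks to 2 × 2, so later layers process a quarter of the values while the strongest responses in each area are kept.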

A more detailed look at the convolution process

Imagine you are trying to determine whether an image contains a human face. You can think of the face as the sum of its parts: two eyes, a nose, a mouth, two ears, etc. The convolution process then looks like this:

  1. First convolutional layers: The first convolutional layers use filters to recognize simple features in small groups of neighboring pixels. For example, a filter might recognize a vertical edge that forms part of the outline of an eye. As mentioned, these local features form patterns that are registered as feature maps during convolution. In this case, the feature maps might represent the edges of the eyes, the nose and the mouth.
  2. Additional convolutional layers: After the first convolutional layers, either additional convolutional layers or pooling layers can be applied. Subsequent convolutional layers combine simple features into more complex patterns, so the individual patterns gradually form a face. For example, edges and corners can be combined into shapes like eyes. The added layers see larger areas of the image (receptive fields) and recognize composite structures, building what is known as a feature hierarchy. A later layer would be able to recognize that two eyes and a mouth arranged in a certain way form a face.
  3. Pooling layers: These reduce the size of the feature maps and further abstract the features. This reduces the amount of data that needs to be processed while still retaining the essential features.
  4. Fully-connected layer: The last layer of the ConvNet is the fully-connected layer. In this layer, the CNN combines the extracted features into a final classification, for example a score indicating how likely the image is to contain a human face. Thanks to the features extracted by the convolutions, that face can also be clearly distinguished from other faces.
Image: Diagram of a convolutional neural network
ConvNets automatically extract the features needed to identify objects in an image.
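
The final classification step can be sketched as follows. The example assumes the pooled feature map has already been computed, flattens it into a vector, and passes it through one fully-connected layer followed by a softmax, a common way to turn scores into probabilities. The labels, weights and biases are hypothetical; in a real CNN the weights and biases are learned during training.

```python
import math


def fully_connected(features, weights, biases):
    """Compute one score per class: weighted sum of all features plus a bias."""
    return [
        sum(f * w for f, w in zip(features, class_weights)) + b
        for class_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed output of the last pooling layer, flattened into a vector.
pooled = [[4, 2], [2, 7]]
features = [value for row in pooled for value in row]

# Hypothetical class labels and parameters, for illustration only.
labels = ["face", "no face"]
weights = [
    [0.2, 0.1, 0.1, 0.3],    # weights for "face"
    [-0.1, 0.0, 0.0, -0.2],  # weights for "no face"
]
biases = [0.0, 0.5]

probabilities = softmax(fully_connected(features, weights, biases))
prediction = labels[probabilities.index(max(probabilities))]
print(prediction)  # face
```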

Additionally, regularization techniques such as dropout optimize CNNs by reducing overfitting. Activation functions like ReLU (Rectified Linear Unit) introduce non-linearity, which allows the network to recognize more complex patterns than a purely linear model could. Finally, batch normalization stabilizes and speeds up training by normalizing the inputs to each layer across a mini-batch.
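
The three techniques can be sketched in simplified form as shown below. The sketch assumes "inverted" dropout (survivors are scaled up during training) and a batch normalization without the learnable scale and shift parameters that real implementations add. The function names and input values are hypothetical.

```python
import random


def relu(values):
    """ReLU: pass positive values through, replace negatives with zero."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout (training only): zero each value with probability
    `rate` and scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def batch_norm(values, eps=1e-5):
    """Normalize one feature across a mini-batch to roughly zero mean and
    unit variance (learnable scale and shift omitted)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (variance + eps) ** 0.5 for v in values]


print(relu([-2.0, 0.5, 3.0]))  # [0.0, 0.5, 3.0]

rng = random.Random(0)  # fixed seed so the run is repeatable
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))

print(batch_norm([1.0, 2.0, 3.0]))
```

Dropout is applied only during training. At inference time all values are passed through unchanged, which is why the survivors are scaled up beforehand.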

What can convolutional neural networks be used for?

Before CNNs existed, objects were identified in images using time-consuming feature extraction methods that had to be carried out manually. Convolutional neural networks offer a more scalable approach to image classification and object detection. Employing principles of linear algebra (in particular, matrix multiplication), CNNs are able to recognize patterns in an image. They are now widely used in:

  • Image and speech recognition: CNNs automatically recognize objects or people in images and videos, for example for photo tagging on smartphones, in facial recognition systems and in voice assistants like Siri or Alexa.
  • Medical diagnostics: Here, AI image recognition technology enhances diagnostics by aiding in the analysis of medical images such as X-rays, CT scans and MRIs.
  • Autonomous vehicles: ConvNets are used, for example, in self-driving cars to recognize road features and obstacles.
  • Social media: CNNs are used for text mining, which allows social media platforms to automatically moderate content and create personalized advertising.
  • Marketing and retail: CNNs are used to mine data, enabling visual product searches and product placement.

What are the advantages and disadvantages of convolutional neural networks?

Convolutional neural networks can automatically extract relevant features from data and achieve a high level of accuracy. However, training CNNs effectively requires substantial resources, including large volumes of labeled data and powerful GPUs, to produce optimal results.

Advantages                   | Disadvantages
Automatic feature extraction | High computational requirements
High level of accuracy       | Large datasets needed

Summary

CNNs have revolutionized the field of artificial intelligence and offer immense benefits across various sectors. Further developments, such as hardware improvements, new data collection methods and advanced architectures like Capsule Networks, can optimize CNNs even more and integrate them into additional technologies, opening them up to a wider range of use cases.
