What is metadata?

The term metadata has been on everyone’s lips for a few years now. Today, billions of people around the world use digital media. Large amounts of metadata are constantly being generated in the process. The term “transparent citizen” is sometimes used to describe the resulting data protection risk.

The evaluation of metadata by artificial intelligence makes it possible to predict people’s behavior. In the long run, this poses a serious threat to citizens’ privacy and to democracy in practice. Yet metadata is not inherently a bad thing. In this article, we explain what metadata actually is.

What’s the difference between metadata and data?

Definition

Metadata: The term refers to information that supplements actual data. Often, metadata provides more details about the context of the content or gives instructions on how to handle data. In this way, metadata plays a major role in both computing and traditional data processing (including things like library catalogs or the postal system).

To become more familiar with the term metadata, imagine a simple example: You send a letter through the mail. Now the document contained in the envelope corresponds to the actual, primary data. This data is private and protected by law against access by third parties – the secrecy of correspondence applies.

The envelope contains the metadata of the letter. This is additional data that accompanies the primary data:

Address and sender
Stamp and postmark
Where required, additional identifiers like bar codes

As you can see, this is the data that makes sending the letter possible in the first place. The metadata of the letter is visible to anyone who handles it. It is therefore not specially protected by the secrecy of correspondence, although postal secrecy does apply.

So, what is the danger posed by metadata? It’s not a problem if an individual piece of metadata can be read. If, for example, a third party learns of the existence of a single envelope, that is usually no cause for concern. However, this changes when large amounts of data are stored and evaluated. On a larger scale, patterns emerge that reveal a lot about a person’s behavior: Who communicated with whom and when? Networks and chains of communication can be identified.

The distinction between data and metadata is fluid. The classification depends on the context and on perspective. Here’s another example. A book contains primary data, such as the title of the book and its content. Furthermore, a set of metadata is available for the publication of a book:

Author
Publisher
Time and place the book was published
Edition
ISBN

Let’s imagine that the metadata of many publications is collected in a database. From the perspective of this database, the publication information is primary data. In addition, there is a new set of metadata for each entry. For example, the database could store when each entry was added and by which user.

What types of metadata exist and how are these used?

Metadata is found in all areas of data storage and processing, so its uses cannot be listed exhaustively. Here are three major areas of use:

1. To provide context for information.

Metadata often describes the process that led to the creation of information. Think, for example, of the geographic coordinates with which digital photos are tagged. Once lost, this context cannot be reconstructed, which is why it is stored.

2. To provide information that would otherwise be difficult to find.

Here, consider the length of a video. The duration is stored as metadata in the video file. If it were not saved, it would have to be calculated. One possible approach would be to count the frames and divide their number by the frame rate, which takes considerable effort.

3. Linking information, making it easily retrievable and searchable.

The main goal here is to support human-readable information with machine-readable data. The aim is to use automated processes to establish relationships between pieces of information. This applies in particular to structured data, which, when linked together, forms the so-called “semantic web”.
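
The calculation described under point 2 can be sketched in a few lines of Python. The function name and the sample values are illustrative assumptions; for a real video file, the frames would first have to be decoded and counted, which is the expensive part.

```python
from fractions import Fraction


def duration_seconds(frame_count: int, frame_rate: Fraction) -> float:
    """Playback duration = number of frames / frames per second."""
    if frame_rate <= 0:
        raise ValueError("frame rate must be positive")
    return float(frame_count / frame_rate)


# Hypothetical clip: 4,500 frames at 25 frames per second.
print(duration_seconds(4500, Fraction(25)))  # 180.0
```

Using a fraction for the frame rate keeps rates such as 30000/1001 exact. Storing the resulting duration as metadata saves every player from repeating this work.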

Metadata that describes images

Images taken with digital cameras and smartphones contain a large amount of metadata. On the one hand, this is technical data, such as image dimensions, the camera used, focal length, etc. These factors are defined in the EXIF standard and are created automatically by the camera. Furthermore, the IPTC standard defines metadata that describes the content of the photo and is entered by the user.

Standard	Image metadata	Creation
EXIF	Image information like dimensions, color space, color channels, etc.; photographic information, such as exposure time, aperture, ISO, etc.	Automatic when recording
IPTC	Keywords, copyrights, location and time information, content descriptions, etc.	Manually done by user

When sharing digital images, you should be careful: the image metadata can contain private information about the photographer, such as where the picture was taken. Many apps and social networks automatically strip this metadata from images when they are uploaded. But it’s best not to rely on this. When in doubt, use a dedicated tool to delete the image metadata before sharing.

Metadata that is embedded in digital videos

A video file typically consists of a container that holds various data. The primary data of a video includes the encoded video and audio content. Additional metadata that is embedded includes:

Length of the video
Data rate and image dimensions
Details of the audio and video codec used
Subtitles, where applicable in several languages

Metadata that is assigned to files

A file in a digital system includes two primary pieces of data: the contents of the file and its name. In addition, each file has a set of metadata associated with it. The file metadata is managed by the operating system and is also known as “file attributes”. Here is an overview of common file metadata:

File metadata	Description
Time stamp	Time of creation, last modification, and last access
Saved location	File path in the data system
Ownership	Owner and group
File permissions	Read, write, execute: for user, group, and others
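
As a minimal sketch, the attributes from the table can be read with Python’s standard library. The helper name `file_metadata` and the temporary sample file are assumptions made for illustration. Note that the creation time is not available on every platform, so the sketch only reports the modification and access times.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def file_metadata(path: str) -> dict:
    """Collect the file attributes the operating system keeps for `path`."""
    info = os.stat(path)
    return {
        "location": os.path.abspath(path),
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
        "owner_id": info.st_uid,  # numeric user ID (0 on Windows)
        "group_id": info.st_gid,  # numeric group ID (0 on Windows)
        "permissions": stat.filemode(info.st_mode),  # e.g. "-rw-------"
    }


# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(b"primary data")
    sample_path = handle.name

metadata = file_metadata(sample_path)
for key, value in metadata.items():
    print(f"{key}: {value}")

os.remove(sample_path)
```

The file contents ("primary data") never appear in the output: everything printed is metadata kept by the file system.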

In addition to file attributes, some file types include their own specific metadata, which is managed by the respective application. This metadata, too, can disclose confidential information when the file is shared.

Metadata that is created when an email is sent

An email includes – analogous to the classic postal letter – two key parts:

Email body
Email header

The body contains the actual message, which corresponds to the document in the envelope. Like the envelope, the header contains the addresses of the sender and recipient. As with the envelope, some information in the header can be easily forged. For the recipient, it then appears as if an email came from a different sender. This is a trick that is often used in spoofing attacks.

The email header usually contains a lot of other metadata, such as:

Various timestamps
Information on the formatting and coding of the message
Stages the email has passed through during transmission
Evaluation of the email by spam filters
Note on whether the email was checked by a virus scanner

The metadata of the email header is written and read by server software and application programs. The information generated in the process reveals a lot about an email and the path it has taken through the Internet. Among other things, statements can be made about the authenticity and confidentiality of an email. Furthermore, the header can contain the host name of the user’s own device and reveal the location from which an email was sent.
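
The following sketch shows how header metadata can be separated from the body with Python’s standard `email` package. The message itself is made up: all addresses, host names, and IP addresses are placeholders, and real headers are usually much longer.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Made-up message; all hosts and addresses are placeholders.
RAW_EMAIL = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
\tby mx.example.com; Mon, 01 Feb 2021 12:13:40 +0000
Received: from laptop.local (unknown [198.51.100.7])
\tby mail.example.org; Mon, 01 Feb 2021 12:13:36 +0000
From: Alice <alice@example.org>
To: Bob <bob@example.com>
Subject: Lunch?
Date: Mon, 01 Feb 2021 12:13:34 +0000
Content-Type: text/plain; charset="utf-8"

See you at noon.
"""

message = message_from_string(RAW_EMAIL)

sender = message["From"]
sent_at = parsedate_to_datetime(message["Date"])
# Each relay prepends its own Received line, so the list runs newest to oldest.
hops = message.get_all("Received")
body = message.get_payload()

print("Sender:", sender)
print("Sent at:", sent_at.isoformat())
print("Number of hops:", len(hops))
print("First hop names the sending device:", "laptop.local" in hops[-1])
```

Even in this tiny example, the oldest `Received` line exposes the host name of the sender’s device, while the `From` line is plain text that the sender can set freely.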

Metadata that is generated when you visit a website

From a technical point of view, visiting a website is retrieving an HTML document. The user’s browser retrieves the document from a server at the specified address. The HTTP or HTTPS protocol is used for this.

In addition to the actual HTML document that is displayed in the browser, metadata known as HTTP headers is transmitted. The HTTP headers are comparable to the fields of the email header. They contain information about the encoding, transmission, encryption, and compression of the HTTP connection.

Furthermore, the transfer generates metadata that accumulates on the server. This includes log files in which accesses to the server are recorded and which are needed for log file analyses. For each access, another line is written to the log file. In addition, the browser usually sends queries to a DNS server. These queries also generate metadata, which the respective server operator may store and evaluate.

Confusingly, in addition to the HTTP header already mentioned, there is also the HTML header, that is, the head element of the document. While the former refers to the connection, the latter contains metadata describing the contents of the document. Below is an overview of an HTTP server response. The introductory lines are the HTTP header. This is followed by the HTML source code with HTML head and HTML body elements:

HTTP/1.1 200 OK
Date: Mon, 01 Feb 2021 12:13:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 148
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
Accept-Ranges: bytes
Connection: close

<html>
    <head>
        <title>An Example Page</title>
    </head>
    <body>
        <p> The human readable text is in the body of the document</p>
    </body>
</html>
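
To make the two layers tangible, here is a small sketch that takes a shortened, made-up response apart. It assumes the standard HTTP/1.1 layout, in which an empty line separates the header fields from the body; the function and class names are illustrative.

```python
from html.parser import HTMLParser

# Shortened, made-up response; CRLF line endings as used by HTTP/1.1.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Connection: close\r\n"
    "\r\n"
    "<html><head><title>An Example Page</title></head>"
    "<body><p>Human-readable text</p></body></html>"
)


def split_response(raw):
    """Split a raw response into status line, HTTP header fields, and body."""
    header_block, _, body = raw.partition("\r\n\r\n")
    status_line, *field_lines = header_block.split("\r\n")
    headers = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


class TitleParser(HTMLParser):
    """Pick the <title> text out of the HTML head."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


status_line, http_headers, html_document = split_response(RAW_RESPONSE)
title_parser = TitleParser()
title_parser.feed(html_document)

print(status_line)                   # metadata about the connection
print(http_headers["content-type"])  # metadata about the connection
print(title_parser.title)            # metadata about the document
```

The first two values come from the HTTP header, the last one from the HTML head. In practice, an HTTP client library handles the first step for you.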

What metadata means for online marketing and search engine optimization

In this section, we focus on metadata that is embedded in an HTML document. We’ll leave out the HTTP metadata already mentioned, as well as server-side metadata such as log files. Usually, HTML metadata is embedded in the head of the HTML document.

Many of the elements used in the HTML head directly serve search engine optimization. Search engine bots crawl the content of an HTML document. The human-readable part in the HTML body is extracted and indexed. In addition, there is special metadata that is intended exclusively for bots. Here, we distinguish between “classic” and “modern” variants.

Website metadata illustrated with classic HTML head elements

The classic HTML head elements include the title and a handful of critical meta tags. The title is also visible to the user in various forms. For example, it is displayed in bookmarks or in the browser tab header. The other classic “<meta>” tags are used exclusively for search engine optimization. Here is an overview of the most important classic HTML head elements:

Tag	Description	Importance
<title>	Title of the document, displayed in results of a search	Critical
<meta name="description">	Description of the document, displayed in the search results	Critical
<meta name="keywords">	Keywords of the document, not displayed in search results	Minimal
<meta name="robots">	Directions for search engine bots for processing the document	Critical

Website metadata displayed with modern HTML head elements

In addition to the classic HTML head elements, a variety of other elements are used today to include metadata on a website. Search engine operators and large technology groups are constantly defining new metadata. The elements “<meta>” and “<link>” are ideal for this, as they can be expanded. Here is an overview of frequently used modern website metadata:

Tag	Description	Importance
<link rel="canonical">	Canonical tag to avoid duplicate content	Critical, if duplicate content exists
<link rel="alternate" hreflang="en">	Provide alternative language versions of the same document per hreflang	Optional
<meta property="og:…">	Open Graph for publication on social media	Optional

For the “<meta>” element, the “name” attribute specifies the type of metadata. For the “<link>” element, the “rel” attribute is used in a similar way. Depending on the metadata standard used, two alternative notations to “name” can be found for the “<meta>” element. We summarize all three here:

How it’s written	Metadata standard
<meta name="">	HTML5
<meta property="">	RDFa
<meta itemprop="">	HTML Microdata
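
As a rough sketch of how a bot might read this metadata, the following parser collects “<meta>” and “<link>” elements regardless of which of the three notations is used. The class name and the sample document (including the example.com URLs) are assumptions; real crawlers are considerably more tolerant of broken markup.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <meta> and <link> metadata from an HTML document."""

    # The three attributes that can name a piece of <meta> metadata.
    KEY_ATTRIBUTES = ("name", "property", "itemprop")

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            for key_attribute in self.KEY_ATTRIBUTES:
                if key_attribute in attributes:
                    key = attributes[key_attribute]
                    self.meta[key] = attributes.get("content", "")
                    break
        elif tag == "link" and "rel" in attributes:
            self.links.append((attributes["rel"], attributes.get("href", "")))


SAMPLE_HTML = """
<html>
  <head>
    <title>An Example Page</title>
    <meta name="description" content="A short summary of the page.">
    <meta name="robots" content="index, follow">
    <meta property="og:title" content="An Example Page">
    <link rel="canonical" href="https://example.com/page">
  </head>
  <body><p>Human-readable text</p></body>
</html>
"""

parser = HeadMetadataParser()
parser.feed(SAMPLE_HTML)

for key, value in parser.meta.items():
    print(f"{key}: {value}")
for rel, href in parser.links:
    print(f"link {rel}: {href}")
```

None of the collected values is rendered in the page itself; they exist only for programs that process the document.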

Website metadata defined with the Open Graph

Open Graph is a protocol developed by Facebook to enrich a web document with metadata. The Open Graph data provides the information that is displayed as a preview when the document is shared on social networks. In this way, optimized images, titles, and descriptive texts can be specified. This makes sense, since each platform applies its own restrictions on the length of texts, the dimensions of images, and the like. The protocol is used extensively by Facebook and Twitter. Here is an overview of the essential Open Graph metadata:

Open Graph metadata	Explanation
<meta property="og:title">	Title of the object
<meta property="og:type">	The type of the object, e.g. image, web document, video, etc.
<meta property="og:image">	An image that represents an object
<meta property="og:url">	The canonical URL of the object
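
The table lists only the property names; in the Open Graph protocol, the value of each property is carried in the “content” attribute. A minimal sketch for generating the four essential tags could look like this. The helper name and all sample values are assumptions.

```python
from html import escape


def open_graph_tags(properties: dict) -> str:
    """Render Open Graph properties as <meta> elements for the HTML head."""
    lines = []
    for name, value in properties.items():
        # escape() protects the markup against quotes and ampersands in values.
        lines.append(
            f'<meta property="og:{escape(name)}" content="{escape(value)}">'
        )
    return "\n".join(lines)


# Placeholder values for a hypothetical article page.
tags = open_graph_tags({
    "title": "What is metadata?",
    "type": "article",
    "image": "https://example.com/preview.png",
    "url": "https://example.com/what-is-metadata",
})
print(tags)
```

Escaping the values matters because titles frequently contain quotation marks or ampersands, which would otherwise break the attribute.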

Tip

If your web content is displayed incorrectly when shared on Facebook, faulty Open Graph entries are often the cause. In this case, a simple trick can fix the error: log in to your Facebook account and try the Sharing Debugger. This prompts Facebook to read the Open Graph information again.

Website metadata defined with Rich Cards

Besides Open Graph, there is a further metadata format: Rich Cards, developed by Google. Rich Cards enrich a web document with structured metadata. For example, the website of a restaurant can be supplemented with information on geographical location, prices, opening hours, etc. The Rich Card information can be placed in the HTML head or in the HTML body.

Technically, Rich Cards are derived from the metadata standard Schema.org. Various formats are used to mark up the metadata. Besides the older formats RDFa and microdata, JSON-LD is also available today. Google officially recommends the use of JSON-LD.
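
To illustrate the restaurant example, the following sketch builds a JSON-LD block using the Schema.org type “Restaurant”. The restaurant, its address, and all values are invented, and the helper name is an assumption; which properties a search engine actually evaluates is defined in its own documentation.

```python
import json

# Invented sample data, described with Schema.org vocabulary.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "name": "Example Bistro",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "Sampletown",
    },
    "geo": {"@type": "GeoCoordinates", "latitude": 52.52, "longitude": 13.405},
    "priceRange": "$$",
    "openingHours": "Mo-Fr 11:00-22:00",
}


def json_ld_script(data: dict) -> str:
    """Wrap structured data in a script element for the HTML head or body."""
    # Escaping "</" keeps a value from closing the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


snippet = json_ld_script(restaurant)
print(snippet)
```

Unlike RDFa and microdata, which are woven into the visible markup, the JSON-LD block stands on its own and can be generated independently of the page layout.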
