What is a Byte Order Mark (BOM)?

IONOS editorial team · 2023-07-31 · 4 mins


Information sent over the internet has to be read in the right order. The recipient (for example, a browser loading an HTML page) needs to know how to interpret the bytes it receives. To make this possible, markers are placed in the data. One such marker is the byte order mark (BOM). But what exactly is the marker for?

Why Do You Need the BOM?

Characters can be encoded in various ways. Today UTF-8 is the most widely used encoding, but UTF-16 was popular earlier and is still often used. UTF-32 also appears occasionally. Unlike UTF-8, however, these two encodings store text in units larger than one byte, so the order of the bytes within each unit must be known.

UTF-8 works with single bytes (8 bits). Characters outside ASCII take two to four bytes, but these always appear in a fixed sequence, so there is nothing to swap. UTF-16, on the other hand, works with two-byte (16-bit) units. For such a unit to be interpreted correctly, it must be clear which of its two bytes comes first. Depending on this, a completely different value results:

First byte as the high-order byte: 01101010 00110101 is 6a35 in hexadecimal notation
First byte as the low-order byte: 01101010 00110101 is 356a in hexadecimal notation

Looked up in a Unicode table, these two values (U+6A35 and U+356A) are two completely different characters. The first interpretation is called Big Endian (BE), the second Little Endian (LE): with Big Endian, the most significant byte comes first; with Little Endian, the least significant byte comes first.
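The two readings above can be reproduced in a few lines. This is a minimal sketch in Python (chosen only as an illustration language; the variable names are our own): the same two bytes are turned into a number once with the most significant byte first and once with the least significant byte first, and Python's UTF-16 codecs are shown to apply the same rule.

```python
# The two bytes from the example: 01101010 00110101
data = bytes([0b01101010, 0b00110101])  # 0x6A, 0x35

big = int.from_bytes(data, "big")        # most significant byte first (BE)
little = int.from_bytes(data, "little")  # least significant byte first (LE)

print(f"BE: U+{big:04X}")     # BE: U+6A35
print(f"LE: U+{little:04X}")  # LE: U+356A

# Decoding as UTF-16 follows the same rule, giving two different characters.
assert data.decode("utf-16-be") == chr(0x6A35)
assert data.decode("utf-16-le") == chr(0x356A)
```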

Fact

In everyday life, we write numbers the Big Endian way: the most significant digit comes first. But this is just a convention. Computers can store multi-byte values in either order, so it makes sense to mark which one a text uses.

To indicate the order in which the bytes are to be read, a BOM is placed at the very beginning of the text. The BOM is the Unicode character U+FEFF, originally named zero-width no-break space: a space that has no width and does not allow a line break, so it stays invisible. In UTF-16, it is stored as the bytes FE FF in Big Endian and FF FE in Little Endian. A program that finds FF FE at the start of a file therefore knows the text is Little Endian.
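A short Python sketch of how a program might use this: it shows that U+FEFF serializes to FE FF or FF FE depending on the byte order, and checks the first two bytes of some data. The function `utf16_byte_order` is a hypothetical helper; it only looks for UTF-16 BOMs (a UTF-32 LE BOM, FF FE 00 00, would also match its LE branch). Python's `utf-16` codec performs this detection itself when decoding.

```python
import codecs
from typing import Optional

BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE

# The same character, serialized in both UTF-16 byte orders
assert BOM.encode("utf-16-be") == codecs.BOM_UTF16_BE == b"\xfe\xff"
assert BOM.encode("utf-16-le") == codecs.BOM_UTF16_LE == b"\xff\xfe"


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Hypothetical helper: 'BE' or 'LE' if data starts with a UTF-16 BOM, else None."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "BE"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "LE"
    return None


# "Hi" prefixed with a BOM, once in each byte order
be_text = (BOM + "Hi").encode("utf-16-be")
le_text = (BOM + "Hi").encode("utf-16-le")
print(utf16_byte_order(be_text), utf16_byte_order(le_text))  # BE LE

# The generic "utf-16" decoder reads the BOM, picks the order, and drops the BOM.
print(be_text.decode("utf-16"), le_text.decode("utf-16"))  # Hi Hi
```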

UTF-8 doesn’t actually need a BOM, since its single-byte units have no byte order. Yet one is sometimes found in UTF-8 texts, stored as the byte sequence EF BB BF. It is either a remnant of a conversion from UTF-16 or UTF-32 to UTF-8, or it was inserted automatically by an editor. Editors can do this because the mark usually does no harm in UTF-8: it is not displayed, and it signals that the file is UTF-8.
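The following Python sketch illustrates what a UTF-8 BOM looks like to a program: decoding with the plain `utf-8` codec keeps U+FEFF as an invisible first character, while Python's `utf-8-sig` codec strips it when present (and writes it when encoding). The HTML snippet is only sample data.

```python
BOM_UTF8 = "\ufeff".encode("utf-8")
assert BOM_UTF8 == b"\xef\xbb\xbf"  # identical to codecs.BOM_UTF8

raw = BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")

# A plain UTF-8 decode keeps the BOM as an invisible first character...
plain = raw.decode("utf-8")
print(repr(plain[0]))  # '\ufeff'

# ...while the "utf-8-sig" codec removes it.
clean = raw.decode("utf-8-sig")
print(clean)  # <!DOCTYPE html>
```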

Issues with the Byte Order Mark

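Problems arise when the receiving system does not know how to handle the BOM. PHP, for example, normally sends a BOM that sits in front of the opening <?php tag straight to the browser as output, which can break the layout of a website or cause "headers already sent" errors. On Unix-like systems, a BOM in front of a script's #! line prevents the interpreter line from being recognized.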
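Before a BOM causes trouble on a server, it can help to find the affected files. This is a minimal Python sketch under stated assumptions: `files_with_utf8_bom` and `starts_with_utf8_bom` are hypothetical helpers, only UTF-8 BOMs are detected, and the `.php`/`.sh` suffixes are example choices.

```python
import codecs
import tempfile
from pathlib import Path


def starts_with_utf8_bom(path: Path) -> bool:
    """True if the file's first three bytes are the UTF-8 BOM (EF BB BF)."""
    with path.open("rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def files_with_utf8_bom(root, suffixes=(".php", ".sh")):
    """Hypothetical helper: files under root with a matching suffix that start with a BOM."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes and starts_with_utf8_bom(p)
    )


# Demo: one PHP file saved with a BOM, one without
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "with_bom.php").write_bytes(codecs.BOM_UTF8 + b"<?php echo 'hi';")
    (Path(tmp) / "no_bom.php").write_bytes(b"<?php echo 'hi';")
    for p in files_with_utf8_bom(tmp):
        print(p.name)  # with_bom.php
```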

Problems can also arise between HTTP and HTML. The HTTP Content-Type header, which comes from the server settings, can already declare a character encoding (charset). If the HTML document was saved with a BOM but the header tells the browser something different, this can also lead to display errors. The HTML5 specification resolves this conflict: a BOM at the beginning of the document takes precedence over the encoding given in the HTTP header. Older browser versions, however, may not implement this rule.
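The precedence rule can be sketched as a small function. This is a simplified Python illustration, not a browser's actual algorithm: `choose_encoding` is a hypothetical name, only the three BOMs that HTML's BOM sniffing recognizes (UTF-8, UTF-16 BE, UTF-16 LE) are checked, and the UTF-8 fallback when neither a BOM nor a header charset is present is our own assumption (real browsers apply further detection steps).

```python
import codecs
from typing import Optional

# BOMs checked before any other encoding information
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def choose_encoding(body: bytes, header_charset: Optional[str]) -> str:
    """Simplified: a BOM wins over the HTTP charset; otherwise use the header,
    falling back to UTF-8 (an assumed default)."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    return header_charset or "utf-8"


page = codecs.BOM_UTF8 + b"<p>Hello</p>"
print(choose_encoding(page, "iso-8859-1"))             # utf-8 (the BOM wins)
print(choose_encoding(b"<p>Hello</p>", "iso-8859-1"))  # iso-8859-1
```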

How to Remove the BOM

To remove the byte order mark from source code, you need a text editor that lets you choose whether to save a file with or without the mark. Open the file containing the BOM, then save it again without the BOM, which converts the encoding. The mark should then no longer appear. In the popular text editor Notepad++, for example, you can change the encoding and then save the file without the BOM.
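If many files are affected, the same result can be scripted instead of using an editor. A minimal Python sketch, assuming the files are UTF-8: the hypothetical helper `strip_utf8_bom` drops the three leading BOM bytes and leaves every other byte (including line endings) untouched. A UTF-16 file would need to be decoded and re-encoded instead, which this sketch does not do.

```python
import codecs
import tempfile
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Hypothetical helper: remove a leading UTF-8 BOM in place.
    Returns True if a BOM was found and removed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_bytes(codecs.BOM_UTF8 + b"<!DOCTYPE html>")
    print(strip_utf8_bom(page))  # True
    print(page.read_bytes())     # b'<!DOCTYPE html>'
    print(strip_utf8_bom(page))  # False (already clean)
```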

With a text editor like Notepad++, you can remove the BOM by converting the file.

Note

In older versions of Notepad++, the menu entry is called "UTF-8 without BOM". In newer versions, this corresponds to the entry "UTF-8"; the variant with the mark is called "UTF-8-BOM".
