Structuring data plays an important role in the development of programs and websites. Well-structured project data can be read easily and precisely by other software. On the web, this matters especially for text-based search engines such as Google, Bing or Yahoo, which rely on structured markup to capture the content of a website.

Structured data is generally worthwhile in software development, whether for web or desktop applications, wherever programs or services have to exchange data via interfaces and high processing speed is desired. This article explains the role the serialization format Protocol Buffers (Protobuf) can play here and how this structuring method differs from the well-known alternative JSON.

What is Protobuf (Protocol Buffers)?

Protocol Buffers, or Protobuf for short, is a data interchange format that Google originally developed for internal use and has offered to the general public as an open source project (partly under the Apache 2.0 license) since 2008. The binary format lets applications store and exchange structured data in an uncomplicated way, even when these programs are written in different programming languages. Supported languages include, among others:

  • C#
  • C++
  • Go
  • Objective-C
  • Java
  • Python
  • Ruby

In combination with HTTP and RPCs (Remote Procedure Calls), Protobuf is also used for local and remote client-server communication, where it describes the required interfaces. This protocol combination is known as gRPC.

What are the benefits of Google’s Protocol Buffers?

When developing Protobuf, Google emphasized two factors: simplicity and performance. At the time of development, the format (as mentioned, initially used internally at Google) was meant to replace the comparable XML format. Today it also competes with other solutions such as JSON or FlatBuffers. Because Protocol Buffers is still the better choice for many projects, a closer look at the characteristics and strengths of this structuring method is worthwhile:

Clear, cross-application schemas

The basis of every successful application is a well-organized database system. A great deal of attention is paid to the organization of this system, including the data it contains, but the underlying structures are lost at the latest when the data is forwarded to a third-party service. Encoding the data unambiguously according to a Protocol Buffers schema ensures that your project forwards structured data as intended, without these structures being broken up.

Backward and forward compatibility

Using Protobuf spares you tedious version checks, which usually result in "ugly" code. To maintain backward compatibility with older versions and forward compatibility with newer ones, Protocol Buffers uses numbered fields that identify each value in a message. A reader that does not know a field number can skip that field, so you do not have to adapt the entire code base in order to publish new features and functions.
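
How numbered fields keep older readers working can be sketched in plain Java. The following is a minimal, hand-written illustration of the Protobuf wire format and not the official library: the class name UnknownFieldDemo, the sample bytes and the helper methods are assumptions made for this example. An "old" reader that only knows field 1 (name) and field 2 (id) receives a message that also contains a newer field 3 and skips it.

```java
import java.nio.charset.StandardCharsets;

// Hand-written illustration of the Protobuf wire format; not the official library.
final class UnknownFieldDemo {

    // A message written by a "newer" sender: name = "Ada" (field 1),
    // id = 300 (field 2) and a field 3 that the old reader does not know.
    static final byte[] NEWER_MESSAGE = {
        0x0A, 0x03, 'A', 'd', 'a',   // tag (1 << 3 | 2), length 3, UTF-8 bytes
        0x10, (byte) 0xAC, 0x02,     // tag (2 << 3 | 0), varint 300
        0x1A, 0x03, 'a', '@', 'b'    // tag (3 << 3 | 2), length 3, UTF-8 bytes
    };

    // Reads a base-128 varint starting at pos[0] and advances pos[0].
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // An "old" reader that only knows field 1 (name) and field 2 (id).
    static String readWithOldSchema(byte[] data) {
        int[] pos = {0};
        String name = "";
        long id = 0;
        int skipped = 0;
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (wireType == 0) {                       // varint value
                long value = readVarint(data, pos);
                if (fieldNumber == 2) { id = value; } else { skipped++; }
            } else if (wireType == 2) {                // length-delimited value
                int length = (int) readVarint(data, pos);
                if (fieldNumber == 1) {
                    name = new String(data, pos[0], length, StandardCharsets.UTF_8);
                } else {
                    skipped++;                         // unknown field: ignore its bytes
                }
                pos[0] += length;
            } else {
                throw new IllegalArgumentException("wire type not covered by this sketch: " + wireType);
            }
        }
        return "name=" + name + ", id=" + id + ", skipped=" + skipped;
    }

    public static void main(String[] args) {
        System.out.println(readWithOldSchema(NEWER_MESSAGE));
    }
}
```

Because every value is prefixed with its field number and wire type, the reader knows how many bytes to skip even for fields it has never heard of. The real Protobuf runtimes cover more wire types than this sketch.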

Flexibility and convenience

When writing a Protobuf schema, you attach a modifier (required, optional or repeated) to each field, which simplifies the programming work considerably. This way you determine the data structure at schema level, while the implementation details of the classes for the different programming languages are generated automatically. You can also change a modifier later, for example from "required" to "optional", although this needs care because older readers still reject messages in which a required field is missing. The transport of data structures can be handled with Protocol Buffers as well: by defining generic request and response structures, data can be transferred between multiple services flexibly and reliably.

Less boilerplate code

Boilerplate code (or simply boilerplate) plays a larger or smaller role in programming, depending on the type and complexity of a project. Put simply, it consists of reusable code blocks that are needed in many places in the software and can usually only be customized slightly. Such code is often used, for example, to prepare the use of functions from libraries. Boilerplate is particularly common in the web languages JavaScript, PHP, HTML and CSS, although it is not ideal for the performance of the web application. A suitable Protocol Buffers schema helps to reduce boilerplate code and can thereby improve performance in the long term.

Easy language interoperability

It is standard today for applications not to be written in a single language, but to combine program parts or modules in different languages. Protobuf simplifies the interaction between the individual code components considerably. If new components are added whose language differs from the current project language, you can translate the Protocol Buffers schema into the respective target language using the appropriate code generator, which reduces your own effort to a minimum. The prerequisite is, of course, that the languages used are supported by Protobuf, either by default, like the languages already listed, or via a third-party add-on.

Protobuf vs. JSON: The two formats in comparison

Google developed Protocol Buffers first and foremost as an alternative to XML (Extensible Markup Language) and surpassed the markup language in many ways. Structuring data with Protobuf not only tends to be simpler; according to the search engine giant, it also produces data that is three to ten times smaller and can be processed 20 to 100 times faster than a comparable XML structure.

Protocol Buffers is also often compared directly with JSON (JavaScript Object Notation), although both technologies were designed with different objectives: JSON is a message format that originated from JavaScript, exchanges its messages as text and is supported by practically all common programming languages. Protobuf covers more than just a message format, as the Google technology also offers various rules and tools for defining and exchanging messages. Protobuf generally outperforms JSON when it comes to sending messages, but the following "Protobuf vs. JSON" table shows that both structuring techniques have their advantages and disadvantages:

| | Protobuf | JSON |
|---|---|---|
| Developer | Google | Douglas Crockford |
| Function | Serialization format for structured data (storage and transmission) and library | Serialization format for structured data (storage and transmission) |
| Binary format | Yes | No |
| Standardization | No | Yes |
| Human-readable format | Partially | Yes |
| Community/Documentation | Small community, online manuals with room for improvement | Huge community, good official documentation as well as various online tutorials etc. |

So if you need a well-documented serialization format that stores and transmits structured data in human-readable form, you should use JSON instead of Protocol Buffers. This is especially true if the server-side part of the application is written in JavaScript and a large part of the data is processed directly by browsers. If, on the other hand, flexibility and performance of the data structure play a decisive role, Protocol Buffers tends to be the more efficient and better solution.
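
The size difference between a text format and a binary format can be made tangible with a small sketch. The class SizeComparisonDemo and the sample record (name "Ada", ID 300) are assumptions made for this illustration; the binary encoding is written by hand following the Protobuf wire format (field tag, then value) and does not use the official library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-written size comparison; not the official Protobuf library.
final class SizeComparisonDemo {

    // Appends value as a base-128 varint: 7 payload bits per byte,
    // the high bit signals that more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes name as field 1 (length-delimited) and id as field 2 (varint).
    static byte[] encodeBinary(String name, int id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (1 << 3) | 2);          // tag: field 1, wire type 2
        writeVarint(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        writeVarint(out, (2 << 3) | 0);          // tag: field 2, wire type 0
        writeVarint(out, id);
        return out.toByteArray();
    }

    // The same data as compact JSON text (no escaping; fine for this sample value).
    static byte[] encodeJson(String name, int id) {
        String json = "{\"name\":\"" + name + "\",\"id\":" + id + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("binary: " + encodeBinary("Ada", 300).length + " bytes");
        System.out.println("JSON:   " + encodeJson("Ada", 300).length + " bytes");
    }
}
```

For this tiny record the binary form takes 8 bytes and the JSON text 23 bytes, mainly because JSON repeats the field names in every message while the binary form only carries field numbers. Actual ratios depend heavily on the data.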

Tutorial: Practical introduction to Protobuf using the example of Java

Protocol Buffers can make the difference in many software projects, but as is often the case, the first step is to get to know the particularities and syntax of the serialization technology and how to apply them. To give you an initial impression of Protobuf's syntax and message exchange, the following tutorial explains the basic steps, from defining your own format in a .proto file to compiling the Protocol Buffers structures. A simple Java address book application that can read contact information from a file and write it to a file serves as the code base. Each address book entry is assigned the parameters "name", "ID", "email address" and "telephone number".

Define your own data format in the .proto file

You first describe every data structure that you want to implement with Protocol Buffers in the .proto file, the schema definition file of the serialization format. For each structure that you want to serialize, that is, convert into a sequence of bytes, add a message to this file. Then you specify a name and a type for each field of this message and append the desired modifier. One modifier is required per field.

For the Java address book, one possible mapping of the data structures in the .proto file looks as follows:

// proto2 is needed here: "required" and explicit defaults do not exist in proto3.
syntax = "proto2";
package tutorial;
option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";
// One entry in the address book.
message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }
    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }
    // Zero or more phone numbers per person.
    repeated PhoneNumber phones = 4;
}
// The address book itself is a list of persons.
message AddressBook {
    repeated Person people = 1;
}

The syntax of Protocol Buffers is strongly reminiscent of C++ or Java. The Protobuf syntax version is always declared first (here proto2, because the "required" modifier and explicit default values used in this example are not available in proto3), followed by the package declaration for the data you want to structure. This consists of a unique name ("tutorial") and, in this code example, the two Java-specific options "java_package" (the Java package in which the generated classes are placed) and "java_outer_classname" (the name of the wrapper class that contains the generated classes).

This is followed by the Protobuf messages, which can be composed of any number of fields. Typical data types such as "bool", "int32", "float", "double" or "string" are available, some of which are used in the example. As already mentioned, each field of a message must be assigned a modifier, i.e. one of the following:

  • required: a value for the field is mandatory. If this value is missing, the message is considered "uninitialized"; it cannot be built and serialized as usual, and parsing such a message fails.
  • optional: a value can be provided for an optional field, but does not have to be. If none is set, a default value is used. In the code above, for example, the default value "HOME" (landline number at home) is defined for the telephone number type.
  • repeated: fields with the "repeated" modifier can be repeated any number of times (including zero times).
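
What the three modifiers mean for calling code can be sketched with a small hand-written Java stand-in. This is not the code that protoc generates: the class ModifierDemo, its builder and the IllegalStateException are hypothetical and only mimic the behavior described above, assuming the PhoneNumber message from the example schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written stand-in for the three modifiers; not protoc output.
final class ModifierDemo {

    enum PhoneType { MOBILE, HOME, WORK }

    static final class PhoneNumber {
        private final String number;   // required
        private final PhoneType type;  // optional, null means "not set"

        private PhoneNumber(String number, PhoneType type) {
            this.number = number;
            this.type = type;
        }

        String getNumber() { return number; }

        boolean hasType() { return type != null; }

        // An unset optional field falls back to the declared default (HOME).
        PhoneType getType() { return type != null ? type : PhoneType.HOME; }
    }

    static final class PhoneNumberBuilder {
        private String number;
        private PhoneType type;

        PhoneNumberBuilder setNumber(String number) { this.number = number; return this; }

        PhoneNumberBuilder setType(PhoneType type) { this.type = type; return this; }

        // A missing required field leaves the message uninitialized, so building fails.
        PhoneNumber build() {
            if (number == null) {
                throw new IllegalStateException("required field 'number' is not set");
            }
            return new PhoneNumber(number, type);
        }
    }

    public static void main(String[] args) {
        // repeated: a list that may hold zero or more entries.
        List<PhoneNumber> phones = new ArrayList<>();
        phones.add(new PhoneNumberBuilder().setNumber("555-4321").build());
        phones.add(new PhoneNumberBuilder().setNumber("555-1234").setType(PhoneType.MOBILE).build());
        for (PhoneNumber p : phones) {
            System.out.println(p.getNumber() + " " + p.getType() + " explicitlySet=" + p.hasType());
        }
    }
}
```

The builders generated for Java follow the same idea: calling build() without a required field fails, while an unset optional field returns its default from the getter.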

You can find detailed instructions on how to define your own data format with Protocol Buffers in Google's official Protocol Buffers documentation.

Compile your own Protocol Buffers schema

Once your data structures are defined as desired in the .proto file, generate the classes needed to read and write the Protobuf messages. To do this, run the Protocol Buffers compiler (protoc) on the .proto file. If you have not yet installed it, download the current version from the official GitHub repository. Unzip the ZIP file at the desired location; the compiler is located in the "bin" folder and is run from the command line.

Note

Make sure you have the appropriate edition of the Protobuf compiler: protoc is available for 32-bit and 64-bit architectures (Windows, Linux or macOS).

When calling the compiler, you specify:

  • the source directory which contains the code of your program (here the placeholder "SRC_DIR"),
  • the destination directory in which the generated code is to be stored (here the placeholder "DST_DIR"),
  • and the path to the .proto file.

Because you want to generate Java classes, you also use the --java_out option (similar options are available for the other supported languages). The complete compile command is as follows:

protoc -I=$SRC_DIR --java_out=$DST_DIR $SRC_DIR/addressbook.proto

Tip

Google offers a more detailed Protobuf Java tutorial, which explains, among other things, how to write and read messages with Protocol Buffers, in the "Developers" section, the search engine giant's in-house project area for developers. There you will also find instructions for the other supported languages such as C++, Go or Python.