GlusterFS vs. Ceph: a comparison of two storage systems

Distributed file systems are a solution for storing and managing data that no longer fits on a single typical server. Capacity limits are not only a matter of data volume. For example, if the data to be stored is unstructured, a classic file system with a fixed file hierarchy is a poor fit.

Saving large volumes of data – GlusterFS and Ceph make it possible

With bulk data, the eventual volume is often unknown at the start of a project. Storage systems must therefore be easy to expand with additional servers that are integrated seamlessly into the existing system while it is running. To a user, a distributed file system looks like a single conventional file system; they do not notice that individual files, or even a large part of the data, may be spread across several servers, sometimes in different geographical locations. GlusterFS and Ceph run as software layers on Linux operating systems, so they place no special demands on the hardware. Linux runs on standard server hardware and supports all common types of hard drives.

High availability is decisive

High availability is a central requirement for distributed file systems. The impact of hardware failures must be kept to a minimum, and the software must keep running without interruption while new components are added. Maintenance must be possible during operation, and important metadata should not be stored in a single central location. Access to metadata must be decentralized, and data must be stored redundantly at all times. The failure of one server should never compromise the consistency of the system as a whole. GlusterFS and Ceph take different approaches to these requirements; both can be expanded to almost any size and used to store and search the data of large projects in one system.
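
One way to picture decentralized lookup combined with redundancy is rendezvous hashing: every client ranks the servers by a hash of the server name and the file name and treats the top entries as replica holders, so no central lookup table is needed. The sketch below is an illustration under that assumption only. It is neither the placement logic of GlusterFS nor that of Ceph, and the server names and the functions `replica_servers` and `locate` are hypothetical.

```python
import hashlib


def _score(server: str, key: str) -> int:
    """Deterministic pseudo-random score for a (server, key) pair."""
    digest = hashlib.sha256(f"{server}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replica_servers(key: str, servers: list[str], replicas: int = 2) -> list[str]:
    """Return the servers that should hold copies of `key`, best score first."""
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replica count")
    ranked = sorted(servers, key=lambda s: _score(s, key), reverse=True)
    return ranked[:replicas]


def locate(key: str, servers: list[str], failed: set[str], replicas: int = 2):
    """Return the first replica holder that is still alive, or None."""
    for server in replica_servers(key, servers, replicas):
        if server not in failed:
            return server
    return None


# Hypothetical four-node cluster with two copies of every file.
servers = ["node-a", "node-b", "node-c", "node-d"]
holders = replica_servers("report.csv", servers)
survivor = locate("report.csv", servers, failed={holders[0]})
```

Because every client computes the same ranking, the file stays reachable through its second holder when the first one fails, and adding a server only moves files for which the new server ranks among the top holders.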

Fact

The term “big data” is used in relation to very large, complex, and unstructured bulk data that is collected from scientific sensors (for example, GPS satellites), weather networks, or statistical sources. In addition to storage, efficient search options and the systematization of the data also play a vital role with big data.

A short introduction to GlusterFS

GlusterFS is a distributed file system with a modular design. Various servers are connected to one another using a TCP/IP network. As a POSIX (Portable Operating System Interface)-compatible file system, GlusterFS can easily be integrated into existing Linux server environments. This is also the case for FreeBSD, OpenSolaris, and macOS, which support POSIX. Integration into Windows environments can only be achieved in the roundabout way of using a Linux server as a gateway.
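
In practice, POSIX compatibility means that an application uses a mounted volume through ordinary file calls, with no storage-specific client code. The following minimal sketch assumes a volume that is already mounted somewhere on the host; the mount point `/mnt/glusterfs` mentioned in the comment and the helper `write_and_read` are hypothetical, and the demo uses a temporary directory as a stand-in so that it runs on any machine.

```python
import os
import tempfile
from pathlib import Path


def write_and_read(mount_point: Path, relative_name: str, payload: bytes) -> bytes:
    """Write a file below the mount point with plain POSIX-style calls and read it back."""
    target = mount_point / relative_name
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the data to the backing store
    return target.read_bytes()


def demo() -> bytes:
    # On a real host the path could be a mounted volume such as /mnt/glusterfs
    # (hypothetical); a temporary directory stands in for it here.
    with tempfile.TemporaryDirectory() as stand_in:
        return write_and_read(Path(stand_in), "projects/readings.bin", b"sensor data")
```

The same function would work unchanged against a local disk or a network mount, which is the point of a POSIX-compatible interface.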

Functionalities of GlusterFS

GlusterFS began as a classic file-based storage system and later became object-oriented, with particular importance placed on integration into the well-known open-source cloud solution OpenStack. In the background, GlusterFS still operates on a file basis: each file is assigned an object that is integrated into the file system through a hard link. Users do not deal with dedicated servers; they save their data through the interfaces GlusterFS provides, and the storage appears to them as one complete system.

Pros	Cons
Easy integration into Linux systems	Integration into Windows systems can only be done indirectly
POSIX-compatibility
Supports FUSE (File System in User Space)

A short introduction to Ceph

The distributed open-source storage solution Ceph is an object-based storage system that operates on binary objects, thereby eliminating the rigid block structure of classic storage media. Physically, Ceph still stores data on hard drives, but it uses its own algorithm to manage the binary objects, which can be distributed among several servers and later reassembled.
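
The idea of cutting data into binary objects, spreading them over servers, and putting them back together can be sketched in a few lines. This is a simplified illustration and not Ceph's actual placement algorithm: it assumes a plain hash-modulo placement, an object size chosen by the caller, and hypothetical names throughout (`split_into_objects`, `pick_server`, `distribute`, `reassemble`, and the server labels).

```python
import hashlib
from collections import defaultdict


def split_into_objects(name: str, data: bytes, object_size: int) -> dict[str, bytes]:
    """Cut `data` into fixed-size objects with zero-padded, sortable IDs."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return {
        f"{name}.{offset // object_size:08d}": data[offset:offset + object_size]
        for offset in range(0, len(data), object_size)
    }


def pick_server(object_id: str, servers: list[str]) -> str:
    """Map an object ID to a server by hashing (simplified stand-in for real placement)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def distribute(objects: dict[str, bytes], servers: list[str]) -> dict[str, dict[str, bytes]]:
    """Store every object on the server its ID hashes to."""
    cluster: dict[str, dict[str, bytes]] = defaultdict(dict)
    for object_id, payload in objects.items():
        cluster[pick_server(object_id, servers)][object_id] = payload
    return cluster


def reassemble(name: str, cluster: dict[str, dict[str, bytes]]) -> bytes:
    """Collect all objects belonging to `name` and join them in ID order."""
    parts = {
        object_id: payload
        for stored in cluster.values()
        for object_id, payload in stored.items()
        if object_id.startswith(name + ".")
    }
    return b"".join(parts[object_id] for object_id in sorted(parts))
```

A caller would run `distribute(split_into_objects("video", data, 64), ["osd-0", "osd-1", "osd-2"])` and later pass the resulting mapping to `reassemble("video", ...)` to get the original bytes back. Plain modulo placement reshuffles most objects when the number of servers changes, which is one reason production systems use more elaborate algorithms.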

Functionalities of Ceph

Every component is decentralized, and all OSDs (Object Storage Daemons) are equal to one another. As such, any number of servers with different hard drives can be connected to create a single storage system. Ceph can be integrated into existing system environments through three major interfaces: CephFS as a Linux file system driver, RADOS Block Devices (RBD) as Linux devices that can be integrated directly, and RADOS Gateway, which is compatible with Swift and Amazon S3.

Pros	Cons
Easy integration into all systems, irrespective of the operating system being used	Weaker file system functions
Block device for Linux	Higher integration effort needed due to completely new storage structures
CephFS file system for Linux
Amazon S3 API
Seamless connection to Keystone authentication
FUSE module (File System in User Space) to support systems without a CephFS client

Comparison: GlusterFS vs. Ceph

Because GlusterFS and Ceph differ technically, there is no clear winner. Ceph is essentially an object store for unstructured data, whereas GlusterFS organizes data in hierarchical file system trees on block storage. GlusterFS has its origins in a highly efficient, file-based storage system that continues to be developed in a more object-oriented direction. In contrast, Ceph was developed as binary object storage from the start and not as a classic file system, which can make standard file system operations weaker.

GlusterFS	Ceph
File system strengths	Object storage strengths
Quicker storage algorithm	Better performance on simpler hardware
No central metadata server necessary	Easy integration into all systems, no matter the operating system being used
Lower complexity	Block device for Linux
Better suitability for saving larger files (starting at around 4 MB per file)	Easier possibilities to create customer-specific modifications
Better suitability for data with sequential access	RADOS compatibility

When should which system be used?

Because of its diverse APIs, Ceph works well in heterogeneous networks in which other operating systems are used alongside Linux. The strengths of GlusterFS come to the fore when storing large numbers of classic files, including larger ones. Since Ceph was developed as an open-source solution from the very start, it could be integrated in many places earlier than GlusterFS, which only later became open source. A major application for distributed storage is cloud solutions. In this regard, OpenStack is one of the most important software projects offering architectures for cloud computing. GlusterFS and Ceph both work equally well with OpenStack.

Reviewer

Christian Heldmaier
Christian Heldmaier is an experienced online marketing and SEO specialist from Karlsruhe. He has been working as an SEO Manager at IONOS since July 2020.
