HDFS is a highly available file system used to store large quantities of data in a computer cluster; it is therefore responsible for storage within the framework. To this end, files are split into blocks of data, which are then distributed redundantly across different nodes without any predefined organizational scheme. According to the developers, HDFS is able to manage files numbering in the millions.
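The splitting of a file into blocks can be illustrated with a minimal Python sketch. This is not HDFS code: the function name `split_into_blocks` is hypothetical, and the block size of 128 MiB is an assumption for illustration (the actual value is configurable per cluster).

```python
from typing import List, Tuple

# Assumed block size for this sketch (128 MiB); real clusters configure this value.
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of `file_size` bytes.

    Hypothetical helper: every block is `block_size` bytes long except
    possibly the last one, which holds the remainder.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    mib = 1024 * 1024
    for offset, length in split_into_blocks(300 * mib):
        print(f"block at offset {offset // mib} MiB, length {length // mib} MiB")
```

Under these assumptions, a 300 MiB file yields two full blocks and one shorter final block; each of these blocks would then be replicated independently.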
The Hadoop cluster generally functions according to the master/slave model. The architecture of this framework is composed of a master node to which numerous subordinate ‘slave’ nodes are assigned. This principle is found again in the HDFS structure, which is based on a NameNode and various subordinate DataNodes. The NameNode manages all metadata for the file system, including the directory structures and files. The actual data storage takes place on the subordinate DataNodes. To minimize the risk of data loss, files are split into individual blocks and stored multiple times on different nodes. In the standard configuration, each data block is stored in triplicate.
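The division of labor between the NameNode (metadata) and the DataNodes (block storage) can be sketched as follows. This is a toy model under stated assumptions, not the real implementation: the class `ToyNameNode` and its methods are hypothetical, and the simple round-robin placement stands in for the real placement policy, which is more involved.

```python
from typing import Dict, List


class ToyNameNode:
    """Hypothetical model: keeps only metadata (which node holds which block)."""

    def __init__(self, datanodes: List[str], replication: int = 3):
        # replication=3 mirrors the triplicate default described in the text.
        if len(datanodes) < replication:
            raise ValueError("need at least as many DataNodes as replicas")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_locations: Dict[str, List[str]] = {}
        self._cursor = 0  # round-robin start position (assumed placement policy)

    def place_block(self, block_id: str) -> List[str]:
        """Pick `replication` distinct DataNodes for a block and record them."""
        n = len(self.datanodes)
        targets = [self.datanodes[(self._cursor + i) % n] for i in range(self.replication)]
        self._cursor = (self._cursor + 1) % n
        self.block_locations[block_id] = targets
        return targets


if __name__ == "__main__":
    namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
    for block_id in ("blk_0", "blk_1", "blk_2"):
        print(block_id, "->", namenode.place_block(block_id))
```

The point of the sketch is that the NameNode stores only the mapping from block to nodes, while the block contents themselves would live on the DataNodes.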
Every DataNode sends the NameNode a sign of life, known as a ‘heartbeat’, at regular intervals. Should this signal fail to materialize, the NameNode declares the corresponding slave to be ‘dead’ and, with the help of the remaining copies, ensures that enough copies of the affected data blocks are available in the cluster. The NameNode occupies a central role within the framework. To reduce the risk of it becoming a ‘single point of failure’, it’s common practice to provide this master node with a SecondaryNameNode. This component periodically checkpoints the changes made to the metadata, making it possible to restore the central control instance of HDFS; it is not a hot standby, however, and does not take over automatically if the NameNode fails.
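The heartbeat mechanism can be sketched in a few lines of Python. This is a simplified illustration, not HDFS code: the names `HeartbeatMonitor` and `under_replicated` are hypothetical, the 30-second timeout is an arbitrary assumption, and timestamps are passed in explicitly so that the example stays deterministic.

```python
from typing import Dict, List, Set

HEARTBEAT_TIMEOUT = 30.0  # seconds; arbitrary value assumed for this sketch


class HeartbeatMonitor:
    """Hypothetical tracker for the last heartbeat seen from each DataNode."""

    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def under_replicated(block_locations: Dict[str, List[str]], dead: Set[str],
                     replication: int = 3) -> Dict[str, int]:
    """Return {block_id: missing_copies} for blocks with too few live replicas."""
    missing = {}
    for block_id, nodes in block_locations.items():
        live = [n for n in nodes if n not in dead]
        if len(live) < replication:
            missing[block_id] = replication - len(live)
    return missing


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    for node in ("dn1", "dn2", "dn3"):
        monitor.heartbeat(node, now=0.0)
    monitor.heartbeat("dn1", now=40.0)
    monitor.heartbeat("dn2", now=40.0)  # dn3 stays silent

    dead = set(monitor.dead_nodes(now=45.0))
    locations = {"blk_0": ["dn1", "dn2", "dn3"], "blk_1": ["dn1", "dn2", "dn4"]}
    print("dead:", dead)
    print("blocks to re-replicate:", under_replicated(locations, dead))
```

In this sketch, a block that has lost a replica is only reported; in a real cluster, the NameNode would instruct a live DataNode to copy the block to another node until the target replica count is reached again.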
During the transition from Hadoop 1 to Hadoop 2, HDFS gained further safeguards: NameNode HA (high availability) adds a failover mechanism that automatically activates a backup component whenever a NameNode crashes. What’s more, a snapshot function makes it possible to roll the system back to a prior state. Additionally, the Federation extension makes it possible to operate multiple NameNodes within a cluster.