Introduction
GlusterFS is an open-source distributed file system for building highly available, scalable storage infrastructure. It consolidates many storage servers into a single, consistent namespace. One of its key features is the ability to create several types of volumes to suit different use cases. In this post, we will look at the different GlusterFS volume types and walk through how to create each one.
GlusterFS Volume Types
A volume is a logical collection of bricks in a trusted pool of GlusterFS storage nodes; each brick is an export directory on one of the pool's servers. The first step in creating a new volume is defining the bricks that will make it up. Once created, a volume must be started before it can be mounted. The volume types and the commands to create them are covered below. The prerequisite steps are as follows:
- Installation of GlusterFS storage nodes
- Creation of Trusted Pool of GlusterFS storage nodes
- Creation of bricks on the GlusterFS storage nodes, or unused disks on those nodes on which to create bricks
If you are not familiar with the above steps, please refer to my blog article “GlusterFS Installation on Ubuntu 22.04”.
Distributed Volume
In a distributed volume, whole files are spread across multiple bricks: each file is stored in its entirety on exactly one brick, with GlusterFS's distributed hashing deciding the placement. This increases aggregate capacity and performance without any replication, so there is no redundancy: if a brick is lost, the files on it become unavailable. Follow these steps to create a distributed volume:
$ sudo gluster volume create <volume_name> <node1>:<path to brick> <node2>:<path to brick>...
By default, GlusterFS creates a distributed volume if no replica or disperse count is provided.
$ sudo gluster volume create distributed_vol glsfshost01:/glsvol01/dist_br glsfshost02:/glsvol01/dist_br glsfshost03:/glsvol01/dist_br glsfshost04:/glsvol01/dist_br
$ sudo gluster volume start distributed_vol
$ sudo gluster volume info distributed_vol
Volume Name: distributed_vol
Type: Distribute
Volume ID: 6fe4b892-31ba-4cad-9d71-861c06055873
Status: Started
Snapshot Count: 0
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: glsfshost01:/glsvol01/dist_br
Brick2: glsfshost02:/glsvol01/dist_br
Brick3: glsfshost03:/glsvol01/dist_br
Brick4: glsfshost04:/glsvol01/dist_br
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
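Once started, the volume can be mounted with the native GlusterFS (FUSE) client and the distribution behaviour verified directly; a minimal sketch, assuming the hypothetical mount point /mnt/distributed_vol:

```shell
# Mount the distributed volume via the native GlusterFS client.
# Any node of the trusted pool can serve as the mount server.
sudo mkdir -p /mnt/distributed_vol
sudo mount -t glusterfs glsfshost01:/distributed_vol /mnt/distributed_vol

# Create a handful of files; each whole file lands on exactly one brick.
for i in $(seq 1 8); do touch /mnt/distributed_vol/file$i; done

# On each node, list the brick directory to see which files it received.
ls /glsvol01/dist_br
```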
Replicated Volume
A replicated volume keeps an identical copy of every file on each of its bricks, guaranteeing both high availability and redundancy. When you build a replicated volume in GlusterFS, an exact copy of the data is maintained across multiple bricks, so the volume keeps serving data even when a brick or server fails. This volume type is helpful for applications that require fault tolerance. The following commands must be executed to configure a replicated volume:
$ sudo gluster volume create <volume_name> replica <N> <node1>:<path to brick> <node2>:<path to brick> …
Where N is the number of replicated copies of each file across the bricks in the volume. For a pure replicated volume, the replica count must equal the number of bricks.
$ sudo gluster volume create replicated_vol replica 4 glsfshost01:/glsvol01/replicated_br glsfshost02:/glsvol01/replicated_br glsfshost03:/glsvol01/replicated_br glsfshost04:/glsvol01/replicated_br
$ sudo gluster volume start replicated_vol
$ sudo gluster volume info replicated_vol
Volume Name: replicated_vol
Type: Replicate
Volume ID: eff60316-3492-43cb-8c5f-fd38ece87d7a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: glsfshost01:/glsvol01/replicated_br
Brick2: glsfshost02:/glsvol01/replicated_br
Brick3: glsfshost03:/glsvol01/replicated_br
Brick4: glsfshost04:/glsvol01/replicated_br
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
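After a node outage, the self-heal daemon resynchronises the copies; the heal state of the volume created above can be inspected with the standard heal commands:

```shell
# List entries that still need to be healed on each replica brick.
sudo gluster volume heal replicated_vol info

# Show per-brick summary counts of entries pending heal.
sudo gluster volume heal replicated_vol info summary

# Trigger a heal of any out-of-sync entries manually.
sudo gluster volume heal replicated_vol
```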
Distributed Replicated Volume
A distributed replicated volume combines the benefits of distributed and replicated volumes: files are distributed across several replica sets, and each replica set keeps identical copies of its files, so the volume scales capacity while tolerating brick failures. The total number of bricks must be a multiple of the replica count; with two-way mirroring this means an even number of bricks (4, 6, 8, and so on). The steps needed to build a distributed replicated volume are as follows:
$ sudo gluster volume create <volume_name> replica <N> <node1>:<path to brick> <node2>:<path to brick> …
The command syntax is the same as for a replicated volume, but here the replica count N sets the number of mirrored copies within each distributed sub-volume rather than across all bricks. For example, to create a four-brick distributed replicated volume with a two-way mirror, use N=2.
$ sudo gluster volume create dist_rep_vol replica 2 glsfshost01:/glsvol01/dist_rep_br glsfshost02:/glsvol01/dist_rep_br glsfshost03:/glsvol01/dist_rep_br glsfshost04:/glsvol01/dist_rep_br
$ sudo gluster volume start dist_rep_vol
$ sudo gluster volume info dist_rep_vol
Volume Name: dist_rep_vol
Type: Distributed-Replicate
Volume ID: 0aea8b3d-59d5-435d-8d43-d6e61ea0ce50
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: glsfshost01:/glsvol01/dist_rep_br
Brick2: glsfshost02:/glsvol01/dist_rep_br
Brick3: glsfshost03:/glsvol01/dist_rep_br
Brick4: glsfshost04:/glsvol01/dist_rep_br
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
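The layout arithmetic above can be sketched in shell. Bricks are grouped into replica sets in the order they appear on the command line, so with replica 2 the first two bricks mirror each other, the next two form the second mirror, and so on; the values below mirror the four-node example:

```shell
# Distributed-replicated layout arithmetic for the example above:
# 4 bricks with replica 2 form 2 mirrored sub-volumes (2 x 2 = 4).
BRICKS=4
REPLICA=2

# The brick count must be a multiple of the replica count.
if [ $(( BRICKS % REPLICA )) -ne 0 ]; then
    echo "brick count must be a multiple of replica count" >&2
    exit 1
fi

SUBVOLS=$(( BRICKS / REPLICA ))
echo "Number of Bricks: ${SUBVOLS} x ${REPLICA} = ${BRICKS}"
# -> Number of Bricks: 2 x 2 = 4  (matches the volume info output)
```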
Dispersed Volume
The dispersed volume type offers data redundancy and fault tolerance by spreading encoded data across a number of bricks. Dispersed volumes are based on erasure codes, which protect data against disk and server failures without the storage cost of full replication. A configurable level of reliability can thus be achieved with minimal wasted space. A small, encoded fragment of the original file is written to each brick, and the entire file can be recovered by decoding a subset of the fragments.
A dispersed volume can be created by specifying the number of bricks, the number of redundancy bricks, or both. If disperse is given without a <count>, the entire brick list on the command line is treated as a single dispersed sub-volume. If redundancy is not specified, it is automatically set to the optimal value; when that value cannot be determined it defaults to 1, and a warning message is displayed while the volume is created. The redundancy must be greater than 0, and the total number of bricks must be greater than 2 * redundancy. In other words, a dispersed volume needs at least three bricks.
<Usable Volume Size> = <Brick Size> * (Number of Bricks - Redundancy)
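As a worked example of the formula, assuming four bricks of 100 GB each and redundancy 1 (the sizes are illustrative):

```shell
# Usable size of a dispersed volume:
#   usable = brick_size * (bricks - redundancy)
BRICK_SIZE_GB=100
BRICKS=4
REDUNDANCY=1

USABLE_GB=$(( BRICK_SIZE_GB * (BRICKS - REDUNDANCY) ))
echo "Usable capacity: ${USABLE_GB} GB"   # 100 * (4 - 1) = 300 GB
```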
It is best to keep all bricks in a dispersed volume the same size; otherwise, the volume becomes full as soon as the smallest brick fills up. Below are the steps to create a dispersed volume.
$ sudo gluster volume create <volume_name> disperse <N> redundancy <M> <node1>:<path to brick> <node2>:<path to brick>…
In this example, we will create a dispersed volume with a disperse count N=4 (each file is encoded into 3 data fragments plus 1 redundancy fragment) and a redundancy count M=1 (the volume tolerates the loss of one brick).
$ sudo gluster volume create dispersed_vol disperse 4 redundancy 1 glsfshost01:/glsvol01/dispersed_br glsfshost02:/glsvol01/dispersed_br glsfshost03:/glsvol01/dispersed_br glsfshost04:/glsvol01/dispersed_br
$ sudo gluster volume start dispersed_vol
$ sudo gluster volume info dispersed_vol
Volume Name: dispersed_vol
Type: Disperse
Volume ID: 3875c42b-139d-479c-a3db-a342c4775ef8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (3 + 1) = 4
Transport-type: tcp
Bricks:
Brick1: glsfshost01:/glsvol01/dispersed_br
Brick2: glsfshost02:/glsvol01/dispersed_br
Brick3: glsfshost03:/glsvol01/dispersed_br
Brick4: glsfshost04:/glsvol01/dispersed_br
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
Distributed Dispersed Volume
A distributed dispersed volume distributes files among several dispersed sub-volumes, combining the benefits of distributed and dispersed volumes in a single package. It is analogous to a distributed replicated volume, except that erasure coding (disperse) protects the data instead of replication. The command syntax is the same as for a dispersed volume, but the total number of bricks must be a multiple of the disperse count; each group of consecutive bricks forms one dispersed sub-volume.
$ sudo gluster volume create <volume_name> disperse <N> redundancy <M> <node1>:<path to brick> <node2>:<path to brick>…
In the example below, eight bricks with a disperse count of 4 form two dispersed sub-volumes of four bricks each.
$ sudo gluster volume create dispt_vol disperse 4 redundancy 1 glsfshost01:/glsvol01/dp_br glsfshost02:/glsvol01/dp_br glsfshost03:/glsvol01/dp_br glsfshost04:/glsvol01/dp_br glsfshost05:/glsvol01/dp_br glsfshost06:/glsvol01/dp_br glsfshost07:/glsvol01/dp_br glsfshost08:/glsvol01/dp_br
$ sudo gluster volume start dispt_vol
$ sudo gluster volume info dispt_vol
Volume Name: dispt_vol
Type: Distributed-Disperse
Volume ID: 9fd349aa-3ca4-4925-90d0-7f35d056f818
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (3 + 1) = 8
Transport-type: tcp
Bricks:
Brick1: glsfshost01:/glsvol01/dp_br
Brick2: glsfshost02:/glsvol01/dp_br
Brick3: glsfshost03:/glsvol01/dp_br
Brick4: glsfshost04:/glsvol01/dp_br
Brick5: glsfshost05:/glsvol01/dp_br
Brick6: glsfshost06:/glsvol01/dp_br
Brick7: glsfshost07:/glsvol01/dp_br
Brick8: glsfshost08:/glsvol01/dp_br
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
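The brick-count rule can be sketched as shell arithmetic; with 8 bricks and disperse 4 (redundancy 1), two dispersed sub-volumes are formed, matching the "2 x (3 + 1) = 8" line in the output above:

```shell
# Distributed-dispersed layout arithmetic for the example above.
BRICKS=8
DISPERSE=4      # bricks per dispersed sub-volume
REDUNDANCY=1    # redundancy bricks within each sub-volume

# The total brick count must be a multiple of the disperse count.
if [ $(( BRICKS % DISPERSE )) -ne 0 ]; then
    echo "brick count must be a multiple of disperse count" >&2
    exit 1
fi

SUBVOLS=$(( BRICKS / DISPERSE ))
DATA=$(( DISPERSE - REDUNDANCY ))
echo "Number of Bricks: ${SUBVOLS} x (${DATA} + ${REDUNDANCY}) = ${BRICKS}"
# -> Number of Bricks: 2 x (3 + 1) = 8  (matches the volume info output)
```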
The following volume types are deprecated in newer GlusterFS versions.
Striped Volume
In a striped volume, each file is split into stripes that are spread across a number of bricks, which improves read and write throughput for large files. This type offers no redundancy or fault tolerance: if a brick is lost, the striped files become unreadable. Striped volumes are deprecated and no longer recommended for GlusterFS.
Distributed Striped Volume
Distributed striped volumes stripe data across two or more nodes in the cluster. They were intended for scaling storage in environments with heavy concurrent access, where very large files must be read quickly. This type is also deprecated in newer versions of GlusterFS.
Distributed Striped Replicated Volume
Distributed striped replicated volumes distribute striped data across replicated bricks in the cluster. They were intended for highly concurrent workloads where parallel access to very big files and performance are of crucial importance. In the releases that supported them, only workloads with MapReduce-style access patterns were supported on this volume type. These volumes are likewise deprecated.
Striped Replicated Volume
Striped replicated volumes stripe data across replicated bricks in the cluster. They were recommended for applications with parallel access to big files that require high performance. In the releases that supported them, only MapReduce workloads were supported on this volume type. It has also been removed from newer versions of GlusterFS.
Conclusion
GlusterFS is flexible and scalable because it offers several volume types to meet different storage needs. In this article, we covered the configuration of distributed, replicated, distributed replicated, dispersed, and distributed dispersed volumes, and reviewed the deprecated striped variants. By following the provided commands, you can set up the volume type that best matches your workload and build a robust, efficient distributed file system.