
How to set up a network shared folder using GlusterFS on Ubuntu Servers with a backup server array and auto mount

By Tan Nguyen | May 10, 2017

Introduction

GlusterFS, NFS and Samba are three of the most popular ways to set up a network shared folder on Linux.

In this tutorial, we will learn how to set up a network shared folder using GlusterFS on Ubuntu Servers.

For demonstration purposes, we will set up the shared folder on two server machines, server1.example.com and server2.example.com, and mount that shared folder on one client machine, client1.example.com.

Let’s say the servers’ IP addresses are 10.128.0.1 and 10.128.0.2, and the client’s IP address is 10.128.0.3.

Prepare the environment

Configure DNS resolution

In order for our different components to be able to communicate with each other easily, it is best to set up some kind of hostname resolution between each computer.

The easiest way to do this is editing the hosts file on each computer.

Open this file with root privileges on your first computer:

$ sudo nano /etc/hosts

You should see something that looks like this:

127.0.0.1       localhost client1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Below the local host definition, you should add each machine’s IP address followed by the long and short names you wish to use to reference it.

It should look something like this when you are finished:

server1/server2/client1

127.0.0.1       localhost hostname

# add our machines here
10.128.0.1 server1.example.com server1
10.128.0.2 server2.example.com server2
10.128.0.3 client1.example.com client1
# end add our machines

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

The server1.example.com and server1 portions of the lines can be changed to whatever name you would like to use to access each computer. We will be using these settings for this guide.

When you are finished, copy the lines you added and add them to the /etc/hosts files on your other computers. Each /etc/hosts file should contain the lines that link your IPs to the names you’ve selected.

Save and close each file when you are finished.
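
If you would rather not edit each file by hand, the same entries can be appended with a small script. The following is only a sketch: the function names hosts_entries and add_hosts_entries are made up for this example, and the addresses are the ones assumed in this guide. It adds a line only when that exact line is missing, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper: print the host entries used in this guide.
hosts_entries() {
    cat <<'EOF'
10.128.0.1 server1.example.com server1
10.128.0.2 server2.example.com server2
10.128.0.3 client1.example.com client1
EOF
}

# Hypothetical helper: append each entry to the given hosts file
# unless that exact line is already present.
add_hosts_entries() {
    hosts_file=$1
    hosts_entries | while IFS= read -r entry; do
        grep -qxF "$entry" "$hosts_file" || printf '%s\n' "$entry" >> "$hosts_file"
    done
}

# Usage (as root, on each machine):
#   add_hosts_entries /etc/hosts
```

Trying it against a copy of the file first is a cheap way to confirm that it does what you expect before touching /etc/hosts.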

Set Up Software Sources

Although Ubuntu 12.04 contains GlusterFS packages, they are fairly out-of-date, so we will be using the latest stable version as of the time of this writing (version 3.4) from the GlusterFS project.

We will be setting up the software sources on all of the computers that will function as nodes within our cluster, as well as on the client computer.

We will actually be adding a PPA (personal package archive) that the project recommends for Ubuntu users. This will allow us to manage our packages with the same tools as other system software.

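First, we need to install the python-software-properties package, which will allow us to manage PPAs easily with apt (on newer Ubuntu releases the add-apt-repository tool is provided by the software-properties-common package instead):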

server1/server2/client1

$ sudo apt-get update
$ sudo apt-get install python-software-properties

Once the PPA tools are installed, we can add the PPA for the GlusterFS packages by typing:

server1/server2/client1

$ sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.4

With the PPA added, we need to refresh our local package database so that our system knows about the new packages available from the PPA:

server1/server2/client1

$ sudo apt-get update

Repeat these steps on all of the machines that you are using for this guide.

Install Server Components

On our cluster member machines (server1 and server2), we can install the GlusterFS server package by typing:

server1/server2

$ sudo apt-get install glusterfs-server

Once this is installed on both nodes, we can begin to set up our storage volume.

On one of the hosts, we need to peer with the second host. It doesn’t matter which server you use, but we will be performing these commands from our server1 server for simplicity:

server1

$ sudo gluster peer probe server2.example.com

Console should output:

peer probe: success

This means that the peering was successful. We can check that the nodes are communicating at any time by typing:

server1

$ sudo gluster peer status

Console should output:

Number of Peers: 1

Hostname: server2.example.com
Port: 24007
Uuid: 7bcba506-3a7a-4c5e-94fa-1aaf83f5729b
State: Peer in Cluster (Connected)

At this point, our two servers are communicating and they can set up storage volumes together.
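
If you want to check the peering from a script rather than by eye, one option is to count the peers that report the connected state. This is a minimal sketch that assumes the output format shown above; connected_peers is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: count peers reported as connected.
# Reads the text printed by `gluster peer status` on standard input.
# Like grep, it returns a non-zero status when the count is 0.
connected_peers() {
    grep -cF 'State: Peer in Cluster (Connected)'
}

# Usage on a server node (should print 1 in this two-node pool):
#   sudo gluster peer status | connected_peers
```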

Create a Storage Volume

Now that we have our pool of servers available, we can make our first volume.

This step needs to be run only on either one of the two servers. In this guide we will be running from server1.

Because we are interested in redundancy, we will set up a volume that has replica functionality. This will allow us to keep multiple copies of our data, saving us from a single point-of-failure.

Since we want one copy of data on each of our servers, we will set the replica option to “2”, which is the number of servers we have. The general syntax we will be using to create the volume is this:

$ sudo gluster volume create volume_name replica num_of_servers transport tcp domain1.com:/path/to/data/directory domain2.com:/path/to/data/directory force

The exact command we will run is this:

server1

$ sudo gluster volume create volume1 replica 2 transport tcp server1.example.com:/gluster-storage server2.example.com:/gluster-storage force

The console should output:

volume create: volume1: success: please start the volume to access data

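This will create a volume called volume1. It will store the data from this volume in directories on each host at /gluster-storage. If this directory does not exist, it will be created. The trailing force is needed because GlusterFS otherwise refuses to create a brick on the system’s root partition.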
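
To make the relationship between the replica count and the list of bricks explicit, here is a small sketch that builds the command from a list of hosts. It only prints the command and does not run it. The function name volume_create_cmd is an assumption made for this example; the command layout is the one used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a replicated "volume create" command
# whose replica count equals the number of hosts given.
# Usage: volume_create_cmd VOLUME BRICK_PATH HOST...
volume_create_cmd() {
    volume=$1
    brick_path=$2
    shift 2
    bricks=""
    for host in "$@"; do
        bricks="$bricks $host:$brick_path"
    done
    # "$#" is now the number of hosts, which becomes the replica count.
    printf 'gluster volume create %s replica %d transport tcp%s force\n' \
        "$volume" "$#" "$bricks"
}

volume_create_cmd volume1 /gluster-storage server1.example.com server2.example.com
```

With the two servers from this guide, it prints the same command we ran above, minus the sudo.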

At this point, our volume is created, but inactive. We can start the volume and make it available for use by typing:

server1

$ sudo gluster volume start volume1

Console should output:

volume start: volume1: success

Our volume should now be online.

Install and Configure the Client Components

Now that we have our volume configured, it is available for use by our client machine.

Before we begin though, we need to actually install the relevant packages from the PPA we set up earlier.

On your client machine (client1 in this example), type:

client1

$ sudo apt-get install glusterfs-client

This will install the client application, along with the FUSE filesystem tools needed to provide filesystem functionality outside of the kernel.

We are going to mount our remote storage volume on our client computer. In order to do that, we need to create a mount point. Traditionally, this is in the /mnt directory, but anywhere convenient can be used.

We will create a directory at /storage-pool:

client1

$ sudo mkdir /storage-pool

With that step out of the way, we can mount the remote volume. To do this, we just need to use the following syntax:

$ sudo mount -t glusterfs domain1.com:/volume_name path_to_mount_point

Notice that we are using the volume name in the mount command. GlusterFS abstracts the actual storage directories on each host. We are not looking to mount the /gluster-storage directory, but the volume1 volume.

Also notice that we only have to specify one member of the storage cluster.

The actual command that we are going to run is this:

client1

$ sudo mount -t glusterfs server1.example.com:/volume1 /storage-pool

This should mount our volume. If we use the df command, we will see that our GlusterFS volume is mounted at the correct location.
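
For a scripted check, one possibility is to look the mount point up in /proc/mounts. This is a sketch under two assumptions: the helper name is_gluster_mounted is invented here, and the filesystem type reported for the FUSE mount contains the word glusterfs (typically fuse.glusterfs).

```shell
#!/bin/sh
# Hypothetical helper: succeed if MOUNT_POINT appears as a GlusterFS mount.
# The optional second argument names a mounts file to read instead of
# /proc/mounts, which makes the check easy to try against a saved copy.
is_gluster_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" \
        '$2 == mp && $3 ~ /glusterfs/ { found = 1 } END { exit !found }' \
        "$mounts_file"
}

# Usage on the client:
#   is_gluster_mounted /storage-pool && echo "volume is mounted"
```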

Testing the Redundancy Features

Now that we have set up our client to use our pool of storage, let’s test the functionality.

On our client machine (client1), we can type this to add some files into our storage-pool directory:

client1

$ cd /storage-pool
$ sudo touch file{1..20}

This will create 20 files in our storage pool.

If we look at our /gluster-storage directories on each storage host, we will see that all of these files are present on each system:

server1/server2

# on server1.example.com and server2.example.com
$ cd /gluster-storage
$ ls

Console should output:

file1  file10  file11  file12  file13  file14  file15  file16  file17  file18  file19  file2  file20  file3  file4  file5  file6  file7  file8  file9

As you can see, this has written the data from our client to both of our nodes.

If one of the nodes in your storage cluster is ever down while changes are made to the filesystem, doing a read operation on the client mount point after the node comes back online should alert it to get any missing files:

client1

$ ls /storage-pool

Set Up Backup Server(s) for the Client

Normally, once client1 has connected to server1, server1 will send all the nodes’ information to client1, so that client1 can connect to any node in the pool to get the data afterwards.

However, in our setup so far, if server1 is unavailable when client1 first tries to connect, client1 will never learn about server2 and therefore can’t connect to our gluster volume.

To enable client1 to connect to server2 when server1 is not available, we can use the backupvolfile-server option as follows:

client1

$ sudo mount -t glusterfs server1.example.com:/volume1 /storage-pool -o backupvolfile-server=server2.example.com

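If our gluster pool has more than one backup server, we can list all of the servers using the backupvolfile-servers option as follows (notice the plural s at the end of the option name). Newer GlusterFS documentation spells this option backup-volfile-servers, so check which form your version accepts: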

client1

$ sudo mount -t glusterfs server1.example.com:/volume1 /storage-pool -o backupvolfile-servers=server2.example.com:server3.example.com:server4.example.com

Set Up Auto Mounting on the Client

In theory adding the following line to the client’s fstab file should make the client mount the GlusterFS share at boot:

client1

server1.example.com:/volume1 /storage-pool glusterfs defaults,_netdev 0 0

Normally this should work since the _netdev option should force the filesystem to wait for a network connection.
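
The backup server from the previous section can be carried into fstab as well, since the options column is passed on to the GlusterFS mount helper. The sketch below just prints such a line; fstab_entry is a name invented for this example, and the result should be reviewed before it is pasted into /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab line for a GlusterFS volume.
# Usage: fstab_entry SERVER VOLUME MOUNT_POINT [BACKUP_SERVER]
fstab_entry() {
    options="defaults,_netdev"
    if [ -n "${4:-}" ]; then
        # Same option as used with mount -o in the backup server section.
        options="$options,backupvolfile-server=$4"
    fi
    printf '%s:/%s %s glusterfs %s 0 0\n' "$1" "$2" "$3" "$options"
}

fstab_entry server1.example.com volume1 /storage-pool server2.example.com
```

Leaving out the fourth argument gives the plain line shown above.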

If this doesn’t work for you because the GlusterFS client wasn’t running when the fstab file was processed, try opening root’s crontab file and adding a command to mount the share at reboot. This command opens the crontab file:

client1

$ sudo crontab -u root -e

Add this line, and press control-o and return to save changes, and control-x to quit from nano:

client1

@reboot sleep 10;mount -t glusterfs server1.example.com:/volume1 /storage-pool -o backupvolfile-server=server2.example.com

This will execute two commands when the client boots up: the first is just a 10-second delay to allow the GlusterFS daemon to start, and the second mounts the volume.

You may need to make your client wait longer before running mount. If your client doesn’t mount the volume when it boots, try using ‘sleep 15’ instead. This isn’t an ideal way to fix this problem, but it’s ok for most uses.
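
Instead of guessing a single sleep value, one alternative is to retry the mount a few times with a short pause between attempts. The following is a sketch rather than a tested recipe: retry is a helper invented for this example, and it would live in a small root-owned script that the @reboot entry calls.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, up to ATTEMPTS times,
# sleeping DELAY seconds after each failed try.
# Usage: retry ATTEMPTS DELAY COMMAND [ARG...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    try=1
    while ! "$@"; do
        if [ "$try" -ge "$attempts" ]; then
            return 1    # give up after the last allowed attempt
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    return 0
}

# Usage inside the script called at reboot (up to 6 tries, 5 seconds apart):
#   retry 6 5 mount -t glusterfs server1.example.com:/volume1 /storage-pool
```

The attempt count and delay in the usage line are arbitrary starting points, not recommended values.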

Another way to set up auto mounting is to add the mount command to the /etc/rc.local file instead of mounting the GlusterFS share manually on the client. We will not add it to /etc/fstab here, because rc.local is always executed after the network is up, which a network file system requires.

Open /etc/rc.local

client1

$ sudo nano /etc/rc.local

Add the following line (on a stock Ubuntu rc.local, place it before the final exit 0 line):

client1

[...]
/usr/sbin/mount.glusterfs server1.example.com:/volume1 /storage-pool

To test if your modified /etc/rc.local is working, reboot the client:

$ sudo reboot

After the reboot, you should find the share in the outputs of…

$ df -h

… and…

$ mount

Restrict Access to the Volume

Now that we have verified that our storage pool can be mounted and replicate data to both of the machines in the cluster, we should lock down our pool.

Currently, any computer can connect to our storage volume without any restrictions. We can change this by setting an option on our volume.

On one of your storage nodes, type:

server1

$ sudo gluster volume set volume1 auth.allow gluster_client_IP_addr

You will have to substitute the IP address of your cluster client (client1) in this command. Currently, at least with /etc/hosts configuration, domain name restrictions do not work correctly. If you set a restriction this way, it will block all traffic. You must use IP addresses instead.

If you need to remove the restriction at any point, you can type:

server1

$ sudo gluster volume set volume1 auth.allow '*'

This will allow connections from any machine again. This is insecure, but may be useful for debugging issues.

If you have multiple clients, you can specify their IP addresses at the same time, separated by commas:

server1

$ sudo gluster volume set volume1 auth.allow gluster_client1_ip,gluster_client2_ip
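
When the list of clients grows, building the comma-separated value by hand becomes error-prone. Here is a small sketch that joins the addresses for us. The function name auth_allow_value is made up for this example, and 10.128.0.4 stands in for a hypothetical second client that is not part of this guide's setup.

```shell
#!/bin/sh
# Hypothetical helper: join client IP addresses into one auth.allow value.
# Usage: auth_allow_value IP...
auth_allow_value() {
    value=""
    for ip in "$@"; do
        # Add a comma only when something is already in the list.
        value="${value:+$value,}$ip"
    done
    printf '%s\n' "$value"
}

auth_allow_value 10.128.0.3 10.128.0.4

# Usage on a server node:
#   sudo gluster volume set volume1 auth.allow "$(auth_allow_value 10.128.0.3 10.128.0.4)"
```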

Getting Info with GlusterFS Commands

When you begin changing some of the settings for your GlusterFS storage, you might get confused about what options you have available, which volumes are live, and which nodes are associated with each volume.

There are a number of different commands that are available on your nodes to retrieve this data and interact with your storage pool.

If you want information about each of your volumes, type:

server1/server2

$ sudo gluster volume info

Console output:

Volume Name: volume1
Type: Replicate
Volume ID: 3634df4a-90cd-4ef8-9179-3bfa43cca867
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1.example.com:/gluster-storage
Brick2: server2.example.com:/gluster-storage
Options Reconfigured:
auth.allow: 111.111.1.11
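
Because this output is a list of "Key: value" lines, a single field is easy to pull out in a script. This is a minimal sketch that assumes the layout shown above; volume_info_field is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every "Key: value" line whose key
# matches the argument. Reads `gluster volume info` output on standard input.
volume_info_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Usage on a server node (should print Started for a running volume):
#   sudo gluster volume info volume1 | volume_info_field Status
```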

Similarly, to get information about the peers that this node is connected to, you can type:

server1/server2

$ sudo gluster peer status

Console output:

Number of Peers: 1

Hostname: server1.example.com
Port: 24007
Uuid: 6f30f38e-b47d-4df1-b106-f33dfd18b265
State: Peer in Cluster (Connected)

If you want detailed information about how each node is performing, you can profile a volume by typing:

server1/server2

$ sudo gluster volume profile volume_name start

When this command is complete, you can obtain the information that was gathered by typing:

server1/server2

$ sudo gluster volume profile volume_name info

Console output:

Brick: server2.example.com:/gluster-storage
--------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             20     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
     10.80     113.00 us     113.00 us     113.00 us              1    GETXATTR
     28.68     150.00 us     139.00 us     161.00 us              2      STATFS
     60.52     158.25 us     117.00 us     226.00 us              4      LOOKUP
 
    Duration: 8629 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
. . .

You will receive a lot of information about each node with this command.

For a list of all of the GlusterFS associated components running on each of your nodes, you can type:

server1/server2

$ sudo gluster volume status

Console output:

Status of volume: volume1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick server1.example.com:/gluster-storage             49152   Y       2808
Brick server2.example.com:/gluster-storage             49152   Y       2741
NFS Server on localhost                                 2049    Y       3271
Self-heal Daemon on localhost                           N/A     Y       2758
NFS Server on server1.example.com                      2049    Y       3211
Self-heal Daemon on server1.example.com                N/A     Y       2825

There are no active volume tasks
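
A monitoring script could use this output to confirm that every brick is online. The sketch below assumes that each brick is printed on a single line and that the Online column is the second-to-last field, as in the sample output above; other GlusterFS versions may lay the table out differently. The name all_bricks_online is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if at least one brick line was seen and
# every brick line reports Y in the Online column.
# Reads `gluster volume status` output on standard input.
all_bricks_online() {
    awk '$1 == "Brick" { seen = 1; if ($(NF - 1) != "Y") bad = 1 }
         END { exit !(seen && !bad) }'
}

# Usage on a server node:
#   sudo gluster volume status volume1 | all_bricks_online || echo "a brick is offline"
```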

If you are going to be administering your GlusterFS storage volumes, it may be a good idea to drop into the GlusterFS console. This will allow you to interact with your GlusterFS environment without needing to type sudo gluster before everything:

server1/server2

$ sudo gluster

This will give you a prompt where you can type your commands. This is a good one to get yourself oriented:

> help

When you are finished, exit like this:

> exit

Conclusion

At this point, you should have a redundant storage system that lets you write to two separate servers simultaneously. This can be useful for a great number of applications and can ensure that your data is available even when one server goes down.