How to set up a network shared folder using GlusterFS on Ubuntu servers, with a backup server array and auto mounting
Introduction
GlusterFS, NFS, and Samba are the three most popular ways to set up a network shared folder on Linux.
In this tutorial, we will learn how to set up a network shared folder using GlusterFS on Ubuntu servers.
For demonstration purposes, we will set up the shared folder on two server machines, server1.example.com and server2.example.com, and mount that shared folder on one client machine, client1.example.com.
Let’s say the servers’ IPs are 10.128.0.1 and 10.128.0.2, and the client’s IP is 10.128.0.3.
Prepare the environment
Configure DNS resolution
In order for our different components to be able to communicate with each other easily, it is best to set up some kind of hostname resolution between each computer.
The easiest way to do this is to edit the hosts file on each computer.
Open this file with root privileges on your first computer:
$ sudo nano /etc/hosts
You should see something that looks like this:
127.0.0.1       localhost client1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Below the local host definition, you should add each VPS’s IP address followed by the long and short names you wish to use to reference it.
It should look something like this when you are finished:
server1/server2/client1
127.0.0.1       localhost hostname

# add our machines here
10.128.0.1      server1.example.com server1
10.128.0.2      server2.example.com server2
10.128.0.3      client1.example.com client1
# end add our machines

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
The server1.example.com and server1 portions of each line can be changed to whatever names you would like to use to access each computer. We will be using these settings for this guide.
When you are finished, copy the lines you added and add them to the /etc/hosts files on your other computers. Each /etc/hosts file should contain the lines that link your IPs to the names you’ve selected.
Save and close each file when you are finished.
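As a quick sanity check, you can confirm that each machine can resolve the others by name, for example:
server1/server2/client1
$ ping -c 3 server2.example.com
$ getent hosts server1
Both commands should report the IP addresses you entered in /etc/hosts.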
Set Up Software Sources
Although Ubuntu 12.04 contains GlusterFS packages, they are fairly out-of-date, so we will be using the latest stable version as of the time of this writing (version 3.4) from the GlusterFS project.
We will be setting up the software sources on all of the computers that will function as nodes within our cluster, as well as on the client computer.
We will actually be adding a PPA (personal package archive) that the project recommends for Ubuntu users. This will allow us to manage our packages with the same tools as other system software.
First, we need to install the python-software-properties package, which will allow us to manage PPAs easily with apt:
server1/server2/client1
$ sudo apt-get update
$ sudo apt-get install python-software-properties
Once the PPA tools are installed, we can add the PPA for the GlusterFS packages by typing:
server1/server2/client1
$ sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.4
With the PPA added, we need to refresh our local package database so that our system knows about the new packages available from the PPA:
server1/server2/client1
$ sudo apt-get update
Repeat these steps on all of the VPS instances that you are using for this guide.
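If you want to verify that apt will now install the PPA version rather than the older Ubuntu package, you can check the candidate version on each machine:
server1/server2/client1
$ apt-cache policy glusterfs-server glusterfs-client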
Install Server Components
On our cluster member machines (server1 and server2), we can install the GlusterFS server package by typing:
server1/server2
$ sudo apt-get install glusterfs-server
Once this is installed on both nodes, we can begin to set up our storage volume.
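Before peering, it is worth confirming that the Gluster management daemon, glusterd, is running on both nodes:
server1/server2
$ pgrep glusterd
If this prints a process ID on each node, the daemon is up and the servers are ready to be peered.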
On one of the hosts, we need to peer with the second host. It doesn’t matter which server you use, but we will be performing these commands from server1 for simplicity:
server1
$ sudo gluster peer probe server2.example.com
Console should output:
peer probe: success
This means that the peering was successful. We can check that the nodes are communicating at any time by typing:
server1
$ sudo gluster peer status
Console should output:
Number of Peers: 1

Hostname: server2.example.com
Port: 24007
Uuid: 7bcba506-3a7a-4c5e-94fa-1aaf83f5729b
State: Peer in Cluster (Connected)
At this point, our two servers are communicating and they can set up storage volumes together.
Create a Storage Volume
Now that we have our pool of servers available, we can make our first volume.
This step only needs to be run on one of the two servers. In this guide we will run it from server1.
Because we are interested in redundancy, we will set up a volume that has replica functionality. This will allow us to keep multiple copies of our data, saving us from a single point-of-failure.
Since we want one copy of data on each of our servers, we will set the replica option to “2”, which is the number of servers we have. The general syntax we will be using to create the volume is this:
$ sudo gluster volume create volume_name replica num_of_servers transport tcp domain1.com:/path/to/data/directory domain2.com:/path/to/data/directory force
The exact command we will run is this:
server1
$ sudo gluster volume create volume1 replica 2 transport tcp server1.example.com:/gluster-storage server2.example.com:/gluster-storage force
The console should output something like this:
volume create: volume1: success: please start the volume to access data
This will create a volume called volume1. It will store the data from this volume in directories on each host at /gluster-storage. If this directory does not exist, it will be created.
At this point, our volume is created, but inactive. We can start the volume and make it available for use by typing:
server1
$ sudo gluster volume start volume1
Console should output:
volume start: volume1: success
Our volume should now be online.
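As a quick check before moving on to the client, you can ask Gluster to describe the volume (this command is covered in more detail later in this guide):
server1
$ sudo gluster volume info volume1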
Install and Configure the Client Components
Now that we have our volume configured, it is available for use by our client machine.
Before we begin though, we need to actually install the relevant packages from the PPA we set up earlier.
On your client machine (client1 in this example), type:
client1
$ sudo apt-get install glusterfs-client
This will install the client application, as well as the FUSE filesystem tools needed to provide filesystem functionality outside of the kernel.
We are going to mount our remote storage volume on our client computer. In order to do that, we need to create a mount point. Traditionally, this is in the /mnt directory, but anywhere convenient can be used.
We will create a directory at /storage-pool:
client1
$ sudo mkdir /storage-pool
With that step out of the way, we can mount the remote volume. To do this, we just need to use the following syntax:
$ sudo mount -t glusterfs domain1.com:volume_name path_to_mount_point
Notice that we are using the volume name in the mount command. GlusterFS abstracts the actual storage directories on each host. We are not looking to mount the /gluster-storage directory, but the volume1 volume.
Also notice that we only have to specify one member of the storage cluster.
The actual command that we are going to run is this:
client1
$ sudo mount -t glusterfs server1.example.com:/volume1 /storage-pool
This should mount our volume. If we run the df command, we will see our GlusterFS volume mounted at the correct location.
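For example, both of the following should now list the GlusterFS mount:
client1
$ df -h /storage-pool
$ mount | grep storage-pool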
Testing the Redundancy Features
Now that we have set up our client to use our pool of storage, let’s test the functionality.
On our client machine (client1), we can type this to add some files into our storage-pool directory:
client1
$ cd /storage-pool
$ sudo touch file{1..20}
This will create 20 files in our storage pool.
If we look at our /gluster-storage directories on each storage host, we will see that all of these files are present on each system:
server1/server2
# on server1.example.com and server2.example.com
$ cd /gluster-storage
$ ls
Console should output:
file1 file10 file11 file12 file13 file14 file15 file16 file17 file18 file19 file2 file20 file3 file4 file5 file6 file7 file8 file9
As you can see, this has written the data from our client to both of our nodes.
If one of the nodes in your storage cluster goes down while changes are made to the filesystem, doing a read operation on the client mount point after the node comes back online should alert it to retrieve any missing files:
client1
$ ls /storage-pool
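You can also trigger the self-heal from one of the servers rather than waiting for a client read, and list any entries that still need healing. These commands operate on the volume1 volume created earlier:
server1
$ sudo gluster volume heal volume1
$ sudo gluster volume heal volume1 info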
Set Up Backup Server(s) for the Client
Normally, once client1 has connected to server1, server1 will send all the nodes’ information to client1, so that client1 can connect to any node in the pool to get the data afterwards.
However, in our set up so far, if server1 is not available before client1 first connects to server1, client1 will not know about server2 and therefore can’t connect to our gluster volume.
To enable client1 to connect to server2 when server1 is not available, we can use the backupvolfile-server mount option as follows:
client1
$ sudo mount -t glusterfs server1.example.com:/volume1 /storage-pool -o backupvolfile-server=server2.example.com
If our Gluster pool has more than one backup server, we can list all of the servers using the backupvolfile-servers option as follows (notice the plural “s” at the end of the parameter name):
client1
$ sudo mount -t glusterfs server1.example.com:/volume1 /storage-pool -o backupvolfile-servers=server2.example.com:server3.example.com:server4.example.com
Set Up Auto Mounting on the Client
In theory, adding the following line to the client’s fstab file should make the client mount the GlusterFS share at boot:
client1
server1.example.com:/volume1 /storage-pool glusterfs defaults,_netdev 0 0
Normally this should work, since the _netdev option should force the filesystem to wait for a network connection.
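You can test the fstab entry without rebooting by asking mount to process every entry in fstab and then checking the mount point:
client1
$ sudo mount -a
$ df -h /storage-pool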
If this didn’t work for you because the GlusterFS client wasn’t running when the fstab file was processed, try opening root’s crontab file and adding a command to mount the share at reboot. This command opens the crontab file:
client1
$ sudo crontab -u root -e
Add this line, and press control-o and return to save changes, and control-x to quit from nano:
client1
@reboot sleep 10;mount -t glusterfs server1.example.com:/volume1 /storage-pool -o backupvolfile-server=server2.example.com
This will execute two commands when the server boots up: the first is just a 10 second delay to allow the GlusterFS daemon to boot, and the second command mounts the volume.
You may need to make your client wait longer before running mount. If your client doesn’t mount the volume when it boots, try using ‘sleep 15’ instead. This isn’t an ideal way to fix this problem, but it’s ok for most uses.
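If a fixed delay feels too fragile, one alternative is to retry the mount a few times instead of guessing at a sleep value. This is only a sketch; it assumes the mountpoint utility is available, which it is on a default Ubuntu install:
client1
@reboot for i in 1 2 3 4 5 6; do mountpoint -q /storage-pool && break; sleep 10; mount -t glusterfs server1.example.com:/volume1 /storage-pool -o backupvolfile-server=server2.example.com; done
This tries the mount once every 10 seconds, up to six times, and stops as soon as the volume is mounted.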
Another way to set up auto mounting is, instead of mounting the GlusterFS share manually on the client, to add the mount command to the /etc/rc.local file. We are not adding it to /etc/fstab in this case because rc.local is always executed after the network is up, which is required for a network file system.
Open /etc/rc.local
client1
$ sudo nano /etc/rc.local
Append the following line:
client1
[...] /usr/sbin/mount.glusterfs server1.example.com:/volume1 /storage-pool
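On a default Ubuntu system, /etc/rc.local ends with an exit 0 line, so be sure to add the mount command above it. The finished file should look roughly like this:
client1
#!/bin/sh -e
[...]
/usr/sbin/mount.glusterfs server1.example.com:/volume1 /storage-pool
exit 0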
To test whether your modified /etc/rc.local is working, reboot the client:
$ reboot
After the reboot, you should find the share in the outputs of…
$ df -h
… and…
$ mount
Restrict Access to the Volume
Now that we have verified that our storage pool can be mounted and replicate data to both of the machines in the cluster, we should lock down our pool.
Currently, any computer can connect to our storage volume without any restrictions. We can change this by setting an option on our volume.
On one of your storage nodes, type:
server1
$ sudo gluster volume set volume1 auth.allow gluster_client_IP_addr
You will have to substitute the IP address of your cluster client (client1) in this command. Currently, at least with an /etc/hosts-based configuration, domain name restrictions do not work correctly; if you set a restriction by hostname, it will block all traffic. You must use IP addresses instead.
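With the example addresses used in this guide, where the client’s IP is 10.128.0.3, the command would be:
server1
$ sudo gluster volume set volume1 auth.allow 10.128.0.3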
If you need to remove the restriction at any point, you can type:
server1
$ sudo gluster volume set volume1 auth.allow "*"
This will allow connections from any machine again. This is insecure, but may be useful for debugging issues.
If you have multiple clients, you can specify their IP addresses at the same time, separated by commas:
server1
$ sudo gluster volume set volume1 auth.allow gluster_client1_ip,gluster_client2_ip
Getting Info with GlusterFS Commands
When you begin changing some of the settings for your GlusterFS storage, you might get confused about what options you have available, which volumes are live, and which nodes are associated with each volume.
There are a number of different commands that are available on your nodes to retrieve this data and interact with your storage pool.
If you want information about each of your volumes, type:
server1/server2
$ sudo gluster volume info
Console output:
Volume Name: volume1
Type: Replicate
Volume ID: 3634df4a-90cd-4ef8-9179-3bfa43cca867
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server1.example.com:/gluster-storage
Brick2: server2.example.com:/gluster-storage
Options Reconfigured:
auth.allow: 111.111.1.11
Similarly, to get information about the peers that this node is connected to, you can type:
server1/server2
$ sudo gluster peer status
Console output:
Number of Peers: 1

Hostname: server1.example.com
Port: 24007
Uuid: 6f30f38e-b47d-4df1-b106-f33dfd18b265
State: Peer in Cluster (Connected)
If you want detailed information about how each node is performing, you can profile a volume by typing:
server1/server2
$ sudo gluster volume profile volume_name start
Once profiling has been running for a while, you can obtain the information that was gathered by typing:
server1/server2
$ sudo gluster volume profile volume_name info
Console output:
Brick: server2.example.com:/gluster-storage
--------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             20     RELEASE
      0.00       0.00 us       0.00 us       0.00 us              6  RELEASEDIR
     10.80     113.00 us     113.00 us     113.00 us              1    GETXATTR
     28.68     150.00 us     139.00 us     161.00 us              2      STATFS
     60.52     158.25 us     117.00 us     226.00 us              4      LOOKUP

    Duration: 8629 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

. . .
You will receive a lot of information about each node with this command.
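Profiling keeps collecting statistics for as long as it is enabled, so once you have the information you need you can turn it off again:
server1/server2
$ sudo gluster volume profile volume_name stop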
For a list of all of the GlusterFS associated components running on each of your nodes, you can type:
server1/server2
$ sudo gluster volume status
Console output:
Status of volume: volume1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick server1.example.com:/gluster-storage              49152   Y       2808
Brick server2.example.com:/gluster-storage              49152   Y       2741
NFS Server on localhost                                 2049    Y       3271
Self-heal Daemon on localhost                           N/A     Y       2758
NFS Server on server1.example.com                       2049    Y       3211
Self-heal Daemon on server1.example.com                 N/A     Y       2825

There are no active volume tasks
If you are going to be administering your GlusterFS storage volumes, it may be a good idea to drop into the GlusterFS console. This will allow you to interact with your GlusterFS environment without needing to type sudo gluster before every command:
server1/server2
$ sudo gluster
This will give you a prompt where you can type your commands. This is a good one to get yourself oriented:
> help
When you are finished, exit like this:
> exit
Conclusion
At this point, you should have a redundant storage system that allows you to write to two separate servers simultaneously. This can be useful for a great number of applications and ensures that your data remains available even when one server goes down.