Monday, 6 January 2014

Distributed Storage - too complicated to try?

The thing about distributed storage is that all the pieces that make the magic happen are... well, distributed! The distributed nature of the components can be a significant hurdle for people looking to evaluate whether distributed storage is right for them. Not only do people have to set up multiple servers, but they also have to get to grips with services/daemons, new terms and potentially clustering complexity.

So what can be done?

Well, the first thing is to look for a distributed storage architecture that tries to make things simple in the first place. Life's too short for unnecessary complexity.

The next question is "Does the platform provide an easy to use and understand deployment tool?"

Confession time - I'm involved with the gluster community. A while ago I started a project called gluster-deploy which aims to make the first time configuration of a gluster cluster child's play. I originally blogged about an early release of the tool in October, so perhaps now is a good time to revisit the project and see how easy it is to get started with gluster (completely unbiased view naturally!).

At a high level, all distributed storage platforms consist of a minimum of two layers;
  • cluster layer - binding the servers together, into a single namespace
  • aggregated disk capacity - pooling storage from each of the servers together to present easy to consume capacity to the end user/applications
So the key thing is to deliver usable capacity as quickly and as pain-free as possible - whilst ensuring that the storage platform is configured correctly. Now I could proceed to show you a succession of screenshots of gluster-deploy in action - but to prevent 'death-by-screenshot' syndrome, I'll refrain from that and just pick out the highlights.

I won't cover installing the gluster rpms, but I will point out that if you're using fedora, they are in the standard repository, and if you're not using fedora, head on over to the gluster download site.

So let's assume that you have several servers available; each one has an unused disk and gluster installed and started. If you grab the gluster-deploy tool from the gluster-deploy link above you'll have a tar.gz archive that you can untar onto one of your test nodes. Log in to one of the nodes as 'root' and untar the archive;

>tar xvzf gluster-deploy.tar.gz && cd gluster-deploy

This will untar the archive and place you in the gluster-deploy directory, so before we run it let's take a look at the options the program supports;

[root@mynode gluster-deploy]# ./ -h
Usage: [options]

  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -n, --no-password     Skip access key checking (debug only)
  -p PORT, --port=PORT  Port to run UI on (> 1024)
  -f CFGFILE, --config-file=CFGFILE
                        Config file providing server list bypassing subnet

Ok. So there is some tweaking we can do, but for now let's just run it.

[root@mynode gluster-deploy]# ./

gluster-deploy starting

    Configuration file
        -> Not supplied, UI will perform subnet selection/scan

    Web server details:
        Access key  - pf20hyK8p28dPgIxEaExiVm2i6
        Web Address -

    Setup Progress

Taking the URL displayed in the CLI, and pasting into a browser, starts the configuration process.

The deployment tool basically walks through a series of pages that gather some information about how we'd like our cluster and storage to look. Once the information is gathered, the tool then does all the leg-work across the cluster nodes to complete the configuration, resulting in a working cluster and a volume ready to receive application data.
At a high level, gluster-deploy performs the following tasks;

- Build the cluster;
  • via a subnet scan - the user chooses which subnet to scan (based on the subnets seen on the server running the tool) 
  • via a config file that supplies the nodes to use in the cluster (-f invocation parameter) 
- Configure passwordless login across the nodes, enabling automation 

- Perform disk discovery. Any unused disks show up in the UI

- You then choose which of the discovered disks you want gluster to use

- Once the disks are selected, you define how you want the disks managed
  • lvm (default)
  • lvm with dm-thinp
  • btrfs (not supported yet, but soon!)
  NB Choosing snapshot support (lvm with dm-thinp or btrfs) requires confirmation, since these are 'future' features, typically there for developers.

- Once the format is complete, you define the volume that you want gluster to present to your application(s). The volume create process includes 'some' intelligence to make life a little easier
  • tuning presets are provided for common gluster workloads like OpenStack cinder and glance, ovirt/rhev, and hadoop
  • distributed and distributed-replicated volume types are supported
  • for volumes that use replication, the UI prevents disks (bricks) from the same server being assigned to the same replica set
  • UI shows a summary of the capacity expectation for the volume given the brick configuration and replication overheads
Now, let's take a closer look at what you can expect to see during these phases.

The image above shows the results from the subnet scan. Four nodes have been discovered on the selected subnet that have gluster running on them. You then select which nodes you want from the left hand 'box' and click the 'arrow' icon to add them to the cluster nodes. Once you're happy, click 'Create'.
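If you're curious what a scan like this involves, here is a minimal Python sketch. It is not the code gluster-deploy uses; it simply assumes that a node "has gluster running" when it accepts TCP connections on glusterd's default management port (24007). The function names are made up for illustration.

```python
import ipaddress
import socket

GLUSTERD_PORT = 24007  # glusterd's default management port


def candidate_hosts(cidr):
    """Return every usable host address in the subnet as a string."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(ip) for ip in network.hosts()]


def port_open(host, port=GLUSTERD_PORT, timeout=0.5):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan_subnet(cidr, port=GLUSTERD_PORT):
    """List the hosts in the subnet that accept connections on the port."""
    return [host for host in candidate_hosts(cidr) if port_open(host, port)]
```

Calling scan_subnet("192.168.1.0/24") on a hypothetical lab subnet would return the addresses that answered. The scan is sequential, so a large subnet with many silent addresses will take a while; a real tool would probably probe in parallel.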

Passwordless login is a feature of ssh, which enables remote login by shared public keys. This capability is used by the tool to enable automation across the nodes.
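Done by hand, this step usually means generating a key pair and pushing the public key to each node. The sketch below only builds the command lines for the standard ssh-keygen and ssh-copy-id utilities; the helper names, key path and node names are hypothetical, and I'm not claiming this is how gluster-deploy distributes its keys.

```python
def keygen_command(key_path):
    """argv that creates an RSA key pair with an empty passphrase at key_path."""
    return ["ssh-keygen", "-t", "rsa", "-N", "", "-f", key_path]


def copy_id_commands(nodes, key_path, user="root"):
    """One ssh-copy-id argv per node, installing key_path.pub for the user."""
    return [
        ["ssh-copy-id", "-i", key_path + ".pub", "{}@{}".format(user, node)]
        for node in nodes
    ]


if __name__ == "__main__":
    # Hypothetical key location and node names, printed rather than executed.
    print(" ".join(keygen_command("/root/.ssh/id_rsa")))
    for cmd in copy_id_commands(["node1", "node2"], "/root/.ssh/id_rsa"):
        print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True). ssh-copy-id will prompt for the remote password once per node; after that, logins with the key need no password.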

With the public keys in place, the tool can scan for 'free' disks.
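What counts as a 'free' disk is the tool's decision, so treat the following as an assumption: a whole disk with no partitions, no filesystem signature and no mount point. The sketch filters the JSON that newer versions of util-linux produce with "lsblk --json -o NAME,TYPE,FSTYPE,MOUNTPOINT"; the sample data is invented.

```python
import json


def unused_disks(lsblk_json):
    """Return /dev paths of whole disks with no partitions, filesystem or mount."""
    free = []
    for dev in json.loads(lsblk_json)["blockdevices"]:
        if dev.get("type") != "disk":
            continue  # skip cdroms, loop devices and so on
        if dev.get("children") or dev.get("fstype") or dev.get("mountpoint"):
            continue  # partitioned, formatted or mounted: treat as in use
        free.append("/dev/" + dev["name"])
    return free


# Invented sample in the shape lsblk --json produces.
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "type": "disk", "fstype": null, "mountpoint": null,
   "children": [{"name": "sda1", "type": "part", "fstype": "xfs", "mountpoint": "/"}]},
  {"name": "sdb", "type": "disk", "fstype": null, "mountpoint": null},
  {"name": "sr0", "type": "rom", "fstype": null, "mountpoint": null}
]}
"""

if __name__ == "__main__":
    print(unused_disks(SAMPLE))  # ['/dev/sdb']
```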

Choosing the disks to use is just a simple checkbox, and if they all look right - just click on the checkbox in the table heading. Understanding which disks to use is phase one; the next step is to confirm how you want to manage these disks (which at a low level defines the characteristics for the Logical Volume Manager).

Clicking on the "Build Bricks" button initiates a format process across the servers to prepare the disks, building the low-level filesystem and updating the node's filesystem table (fstab). These bricks then become the component parts of the gluster volume that gets mounted by the users or applications.
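For the fstab part of that step, an entry is just six whitespace-separated fields: device, mount point, filesystem type, options, dump flag and fsck pass. Here is a small sketch of building one and adding it only once. The device path, mount point and the choice of xfs are my assumptions for illustration, not values taken from gluster-deploy.

```python
def fstab_entry(device, mount_point, fstype="xfs", options="defaults"):
    """Format one fstab line; dump and fsck pass are both left at 0."""
    return "{} {} {} {} 0 0".format(device, mount_point, fstype, options)


def add_entry(fstab_text, entry):
    """Return fstab_text with entry appended, unless its mount point is already listed."""
    mount_point = entry.split()[1]
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if not is_comment and len(fields) >= 2 and fields[1] == mount_point:
            return fstab_text  # already present, leave the file alone
    if fstab_text and not fstab_text.endswith("\n"):
        fstab_text += "\n"
    return fstab_text + entry + "\n"


if __name__ == "__main__":
    # Hypothetical logical volume and brick mount point.
    print(fstab_entry("/dev/vg_brick/lv_brick1", "/gluster/brick1"))
```

Working on the text rather than the file keeps the sketch safe to run; writing the result back to /etc/fstab is left out on purpose.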

Volumes can be tuned/optimised for different workloads, so the tool has a number of presets to choose from. Choose a 'Use case' that best fits your workload, and then a volume type (distributed or replicated) that meets your data availability requirements. Now you can see a list of bricks on the left and an empty table on the right. Select which bricks you want in the volume and click the arrow to add them to the table. A Volume Summary is presented at the bottom of the page showing you what will be built (space usable, brick count, fault tolerance). Once you're happy, simply click the "Create" button.
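To make the replica-set rule and the capacity summary concrete, here is a rough Python sketch. It relies on the fact that "gluster volume create" groups consecutive bricks into replica sets, and it assumes each set contributes the size of its smallest brick, which is a simplification. The Brick type, function names, node names and paths are all hypothetical; this is not the tool's own code.

```python
from collections import namedtuple

Brick = namedtuple("Brick", ["server", "path", "size_gb"])


def replica_sets(bricks, replica):
    """Split the ordered brick list into consecutive groups of `replica` bricks."""
    if replica < 1 or len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def validate(sets):
    """Raise if any replica set holds two bricks from the same server."""
    for group in sets:
        servers = [brick.server for brick in group]
        if len(set(servers)) != len(servers):
            raise ValueError("replica set reuses a server: {}".format(servers))


def usable_gb(sets):
    """Sum the smallest brick of each replica set (simplified capacity estimate)."""
    return sum(min(brick.size_gb for brick in group) for group in sets)


def create_command(volume, sets, replica):
    """argv for `gluster volume create`, bricks listed in replica-set order."""
    cmd = ["gluster", "volume", "create", volume]
    if replica > 1:
        cmd += ["replica", str(replica)]
    cmd += ["{}:{}".format(b.server, b.path) for group in sets for b in group]
    return cmd


if __name__ == "__main__":
    bricks = [Brick("node%d" % n, "/gluster/brick1", 100) for n in range(1, 5)]
    sets = replica_sets(bricks, 2)
    validate(sets)
    print(usable_gb(sets))  # 200
    print(" ".join(create_command("myvol", sets, 2)))
```

With four 100 GB bricks and two-way replication, half the raw capacity is usable and each replica set can lose one brick, which is the sort of trade-off the Volume Summary spells out before you click "Create".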

The volume will be created and started, making it available to clients straight away. In my test environment, configuring the cluster and storage took less than a minute.

So, if you can use a mouse and a web browser, you can now configure and enjoy the gluster distributed filesystem - no excuses!

For a closer look at the tool's workflow, I've posted a video to youtube.

In a future post, I'll show you how to use foreman to simplify the provisioning of the gluster nodes themselves.