Tuesday, 26 April 2016

Using LIO with Gluster

In the past, gluster users have been able to open up their gluster volumes to iSCSI using the tgt daemon. This has been covered on other blogs and is also documented on gluster.org.

But tgt has been superseded in more recent distros by LIO. LIO provides a number of different local storage options to be utilised as SCSI targets, including FILEIO, BLOCK, PSCSI and RAMDISK. These SCSI targets are implemented as modules in kernel space, but what isn't immediately obvious is that LIO also provides a userspace framework called TCMU. TCMU enables userspace files to become iSCSI targets.

With LIO, the easiest way to exploit gluster as an iSCSI target has been through the FILEIO 'storage engine' over FUSE. However, the high number of context switches incurred within FUSE is likely to limit the performance seen by your 'client' - especially for random I/O access patterns.

Until now, FUSE was your only option. But Andy Grover at Red Hat has just changed things. Andy has developed tcmu-runner, which utilises the TCMU framework, allowing a glusterfs target to be used over gluster's libgfapi interface. Typically, with libgfapi you can expect less context switching and improved performance.

For those, like me, with short attention spans, here's what the improvement looked like when I compared LIO/FUSE with LIO/gfapi using a couple of fio-based workloads.

[chart: Read Improvement]
[chart: Mixed Workload Improvement]

In both charts, IOPS and latency improve significantly using LIO/gfapi, and further still by adopting the arbiter volume.

As you can see, for a young project, these results are really encouraging. The bad news is that to try tcmu-runner you'll need to either build systems based on Fedora F24/rawhide or compile it yourself from the github repo. Let's face it, there's always a price to pay for new shiny stuff :)

For the remainder of this article, I'll walk through the configuration of LIO and the iSCSI client that I used during my comparisons.

Preparing Your Environment

In the interests of brevity, I'm assuming that you know how to build servers, create a gluster trusted pool and define volumes. Here's a checklist of the tasks you should complete to prepare a test environment:
  1. build 3 Fedora 24 nodes and install gluster (3.7.11) on each peer/node
  2. on each node, ensure /etc/glusterfs/glusterd.vol contains the setting "option rpc-auth-allow-insecure on" - this is needed for gfapi access. Once added, you'll need to restart glusterd.
  3. install targetcli (targetcli-2.1.fb43-1) and tcmu-runner (tcmu-runner-1.0.4-1) on each of your gluster nodes
  4. form a gluster trusted pool, and create a replica 3 volume or a replica volume with an arbiter (or both!)
  5. issue "gluster vol set <vol_name> server.allow-insecure on" to enable libgfapi access to the volume (steps 2 and 5 are shown together in the snippet after this list)
There are several ways to configure the iSCSI environment, but for my tests I adopted the following approach:
  • two of my three gluster nodes will be iSCSI gateways (LIO targets)
  • each gateway will have its own iqn (iSCSI Qualified Name)
  • each gateway will only access the gluster volume from itself, so if gluster is down on a node, so is the path for any client attached through it (this keeps things simple)
  • high availability for the LUN is provided by client side multipathing
Before moving on, you can confirm that targetcli/tcmu-runner are providing the gluster integration by simply running 'ls' from within the targetcli shell.

# targetcli ls
o- / ...............
  o- backstores ....
  | o- block .......
  | o- fileio ......
  | o- pscsi .......
  | o- ramdisk .....
  | o- user:glfs ...    <--- gluster gfapi available through tcmu
  | o- user:qcow ...
  o- iscsi .........
  o- loopback ......
  o- vhost ......

With the preparation complete, let's configure the LIO gateways.

Configuring LIO - Node 1

The following steps provide an example configuration. You'll need to make changes to naming etc. specific to your test environment.

  1. Mount the volume (called iscsi-pool), and allocate the file that will become the LUN image
     # fallocate -l 100G mytest.img
  2. Enter the targetcli shell. The remaining steps all take place within this shell.
  3. Create the backing store connection to the glusterfs file
     /backstores/user:glfs create myLUN 100G iscsi-pool@iscsi-3/mytest.img
  4. Create the node's target portal (this is the name the client will connect to). In this example 'iscsi-3' is the node name
     /iscsi/ create iqn.2016-04.org.gluster:iscsi-3
     NB. this will create the target IQN, and the iscsi portal will be enabled and listening on port 3260
  5. On the client, grab its iqn from /etc/iscsi/initiatorname.iscsi, then add it to the gateway
     /iscsi/iqn.2016-04.org.gluster:iscsi-3/tpg1/acls/ create iqn.1994-05.com.redhat:14a2b41fe9e4
  6. Add the LUN, "myLUN", to the target and automatically map it to the client(s)
     /iscsi/iqn.2016-04.org.gluster:iscsi-3/tpg1/luns create /backstores/user:glfs/myLUN 0
  7. Issue saveconfig to commit the configuration (config is stored in /etc/target/saveconfig.json)
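
As an aside, targetcli can also be driven non-interactively by passing the command as arguments, which is handy if you'd rather script the gateway setup than type into the shell. Steps 3-7 would look something like this (same example names as above - treat it as a sketch):

# targetcli /backstores/user:glfs create myLUN 100G iscsi-pool@iscsi-3/mytest.img
# targetcli /iscsi create iqn.2016-04.org.gluster:iscsi-3
# targetcli /iscsi/iqn.2016-04.org.gluster:iscsi-3/tpg1/acls create iqn.1994-05.com.redhat:14a2b41fe9e4
# targetcli /iscsi/iqn.2016-04.org.gluster:iscsi-3/tpg1/luns create /backstores/user:glfs/myLUN 0
# targetcli saveconfig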

Configuring LIO - Node 2 

When a LUN is defined by targetcli, a wwn is automatically generated for it. This is neat, but to ensure multipathing works we need the LUN exported by the gateways to share the same wwn - if they don't match, the client will see two devices, not two paths to the same device.

So for subsequent nodes, the steps are slightly different.
  1. On the first node, look at /etc/target/saveconfig.json. You'll see a storage object item for the gluster file you've just created, together with the wwn that was assigned:
       "storage_objects": [
         {
           "config": "glfs/iscsi-pool@iscsi-3/mytest.img",
           "name": "myLUN",
           "plugin": "user",
           "size": 107374182400,
           "wwn": "653e4072-8aad-4e9d-900e-4059f0e19e7e"
         }
  2. Open the targetcli shell on node 2, and define a LUN pointing to the same backing file as node 1, but this time explicitly specifying the wwn from step 1
     /backstores/user:glfs create myLUN 100G iscsi-pool@iscsi-1/mytest.img 653e4072-8aad-4e9d-900e-4059f0e19e7e
     (if you cd to /backstores/user:glfs and use help create, you'll see a summary of the options available when creating the LUN)
  3. With the LUN in place, you can follow steps 4-7 above to create the iqn, portal and LUN masking for this node.
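
Incidentally, if you have jq installed (an extra - it wasn't in the checklist above), you can pull the wwn straight out of the json rather than eyeballing the file:

# jq -r '.storage_objects[] | select(.name == "myLUN") | .wwn' /etc/target/saveconfig.json
653e4072-8aad-4e9d-900e-4059f0e19e7e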

At this point you have:
  • 3 gluster nodes
  • a gluster volume with a file defined, serving as an iscsi target
  • 2 gluster nodes defined as iscsi gateways
  • each gateway exports the same LUN to a client (supporting multipathing)

Next up...configuring the client.

Client Configuration

To get the client to connect to your 'exported' LUN(s), you first need to ensure that the following rpms are installed on the client: device-mapper-multipath, iscsi-initiator-utils and preferably sg3_utils. With these packages in place, you can move on to configure multipathing and connect to your LUN(s).
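On a Fedora client, that amounts to:

# dnf install device-mapper-multipath iscsi-initiator-utils sg3_utils
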
  • Multipathing : the example below shows a devices section from /etc/multipath.conf that I used to ensure my exported LUNs are seen as multipath devices. With this in place, you can take a node down for maintenance and your LUN remains accessible (as long as your volume has quorum!)
#
# LIO iSCSI
devices {
    device {
        vendor "LIO-ORG"
        path_grouping_policy "multibus"
# I tested with a path_selector of "round-robin" and "queue-length"
        path_selector "queue-length 0"
        path_checker "directio"
        prio "const"
        rr_weight "uniform"
    }
}
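
One thing to remember: after dropping this into /etc/multipath.conf, multipathd needs to be running and the config reloaded. On Fedora, something along these lines should do it (mpathconf ships with device-mapper-multipath; treat the exact sequence as a sketch):

# mpathconf --enable
# systemctl enable --now multipathd
# multipath -r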

  • iscsi discovery/login : to login to the gluster iscsi gateways, just use the iscsiadm command (from the iscsi-initiator-utils rpm)

# iscsiadm -m discovery -t st -p <your_gluster_node_1> -l
# iscsiadm -m discovery -t st -p <your_gluster_node_2> -l

# #check your paths are working as expected with multipath command
# multipath -ll
mpathd (36001405891b9858f4b0440285cacbcca) dm-2 LIO-ORG ,TCMU device   
size=8.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 33:0:0:1 sdc 8:32 active ready running
  `- 34:0:0:1 sde 8:64 active ready running
mpathb (3600140596a3a65692104740a88516aba) dm-3 LIO-ORG ,TCMU device   
size=8.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 33:0:0:0 sdb 8:16 active ready running
  `- 34:0:0:0 sdd 8:48 active ready running
mpathf (36001405653e40728aad4e9d900e4059f) dm-6 LIO-ORG ,TCMU device   
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 35:0:0:0 sdf 8:80 active ready running
  `- 33:0:0:2 sdg 8:96 active ready running

You can see in this example I have three LUNs exported, and each one has two active paths (one to each gluster node). By default, the iscsi node definitions (in /var/lib/iscsi/nodes) use a setting of node.startup=automatic, which means LUN(s) will automagically reappear on the client following a reboot.
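
You can confirm (or change) that behaviour per target with iscsiadm - for example, to check the setting, or to flip a target to manual startup:

# iscsiadm -m node -T iqn.2016-04.org.gluster:iscsi-3 | grep node.startup
node.startup = automatic
# iscsiadm -m node -T iqn.2016-04.org.gluster:iscsi-3 -o update -n node.startup -v manual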

But from the client's perspective, how do you know which LUN is from which glusterfs volume/file? For this, sg_inq is your friend...

# sg_inq -i /dev/dm-6
VPD INQUIRY: Device Identification page
  Designation descriptor number 1, descriptor length: 49
    designator_type: T10 vendor identification,  code_set: ASCII
    associated with the addressed logical unit
      vendor id: LIO-ORG
      vendor specific: 653e4072-8aad-4e9d-900e-4059f0e19e7e
  Designation descriptor number 2, descriptor length: 20
    designator_type: NAA,  code_set: Binary
    associated with the addressed logical unit
      NAA 6, IEEE Company_id: 0x1405
      Vendor Specific Identifier: 0x653e40728
      Vendor Specific Identifier Extension: 0xaad4e9d900e4059f
      [0x6001405653e40728aad4e9d900e4059f]
  Designation descriptor number 3, descriptor length: 39
    designator_type: vendor specific [0x0],  code_set: ASCII
    associated with the addressed logical unit
      vendor specific: glfs/iscsi-pool@iscsi-3/mytest.img

The vendor-specific string in the third descriptor shows the configuration string you specified when you created the LUN in targetcli. If you run the same command against the devices themselves (/dev/sdf or /dev/sdg), you'd see the connection string from each of the respective gateways. Nice and easy!
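
If you want to map every multipath device back to its gluster file in one pass, a quick loop does the job (just grepping the sg_inq output for the glfs config string):

# for dm in /dev/mapper/mpath*; do echo -n "$dm : "; sg_inq -i $dm | grep glfs; done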


And Finally...

Remember, this is all shiny and new - so if you try it, expect some rough edges! However, I have to say that it looks promising, and during my tests I didn't lose any data...but YMMV :)

Happy testing!