So storage needs to be 'always on'.
But maintenance still needs to happen, so the storage architecture needs to support maintenance without taking the storage offline.
I've been playing with Ceph recently, so I thought I'd look at the implications of upgrading my test cluster to the latest Dumpling release.
Here's my test environment:
As you can see it's only a little lab, but it will hopefully serve to indicate how well the Ceph guys are handling the requirement for 'always-on' storage.
The first thing I did was head on over to the upgrade docs at ceph.com. They're pretty straightforward, with a well-defined sequence and some fairly good instructions (albeit with a focus skewed towards Ubuntu!)
For my test I decided to create an rbd volume, mount it from the cluster and then continue to access the disk while performing the upgrade.
Preparation
[root@ceph2-admin /]# rbd create test.img --size 5120
[root@ceph2-admin /]# rbd info test.img
rbd image 'test.img':
size 5120 MB in 1280 objects
order 22 (4096 KB objects)
block_name_prefix: rb.0.1353.2ae8944a
format: 1
[root@ceph2-admin /]# rbd map test.img
[root@ceph2-admin /]# mkfs.xfs /dev/rbd1
log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/rbd1 isize=256 agcount=9, agsize=162816 blks
= sectsz=512 attr=2, projid32bit=0
data = bsize=4096 blocks=1310720, imaxpct=25
= sunit=1024 swidth=1024 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=8 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@ceph2-admin /]# mount /dev/rbd1 /mnt/temp
[root@ceph2-admin /]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 481M 0 481M 0% /dev
tmpfs 498M 0 498M 0% /dev/shm
tmpfs 498M 1.9M 496M 1% /run
tmpfs 498M 0 498M 0% /sys/fs/cgroup
/dev/mapper/fedora-root 11G 3.6G 7.0G 35% /
tmpfs 498M 0 498M 0% /tmp
/dev/vda1 485M 70M 390M 16% /boot
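To give the 'always-on' claim a proper workout, I kept I/O running against the mounted filesystem for the whole upgrade. The exact workload doesn't matter much; a simple loop along these lines (the file name and sizes here are arbitrary) keeps the rbd volume busy:
# keep the rbd-backed filesystem busy for the duration of the upgrade
# (file name and sizes are arbitrary)
while true; do
    dd if=/dev/zero of=/mnt/temp/iotest bs=1M count=64 oflag=direct
    dd if=/mnt/temp/iotest of=/dev/null bs=1M iflag=direct
    sleep 1
done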
Upgrade Process
I didn't want to update the OS on each node and Ceph at the same time, so I decided to only bring Ceph up to the current level. The Dumpling release has a couple of additional dependencies, so I installed those first on each node;
>yum install python-requests python-flask
Then I ran the rpm update on each node with the following command;
>yum update --disablerepo="*" --enablerepo="ceph"
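Before restarting anything, it's worth a quick check that each node really has picked up the new packages; something along these lines will do:
# confirm the node's packages and binaries are now at the Dumpling (0.67.x) level
rpm -qa | grep ceph
ceph --version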
Once this was done, the upgrade guide simply indicates that a service restart is needed at each layer.
Monitors first
e.g.
[root@ceph2-3 ceph]# /etc/init.d/ceph restart mon.3
=== mon.3 ===
=== mon.3 ===
Stopping Ceph mon.3 on ceph2-3...kill 846...done
=== mon.3 ===
Starting Ceph mon.3 on ceph2-3...
Starting ceph-create-keys on ceph2-3...
Once all the monitors were done, I noticed that the admin box could no longer talk to the cluster...gasp...but the update from 0.61 to 0.67 changed the protocol and port used by the monitors, so this is an expected outcome until the admin client is also updated.
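From one of the nodes that had already been updated, though, the cluster can still be queried; something along these lines will show whether the monitors have re-formed quorum on the new version:
# check that the monitors have re-formed quorum after the restarts
ceph mon stat
ceph quorum_status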
Now each of the OSD processes needed to be restarted.
e.g.
[root@ceph2-4 ~]# /etc/init.d/ceph restart osd.4
=== osd.4 ===
=== osd.4 ===
Stopping Ceph osd.4 on ceph2-4...kill 956...done
=== osd.4 ===
create-or-move updated item name 'osd.4' weight 0.01 at location {host=ceph2-4,root=default} to crush map
Starting Ceph osd.4 on ceph2-4...
starting osd.4 at :/0 osd_data /var/lib/ceph/osd/ceph-4 /var/lib/ceph/osd/ceph-4/journal
[root@ceph2-4 ~]#
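Between OSD restarts I'd suggest waiting for the cluster to settle before moving on to the next daemon; a quick check along these lines (not captured above) is enough:
# wait for the restarted OSD to rejoin and the placement groups to settle
ceph osd stat
ceph health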
Now my client's rbd volume obviously depends on the OSDs, but even with the OSD restarts the mounted rbd volume/filesystem carried on regardless.
Kudos to the Ceph guys - a non-disruptive upgrade (at least for my small lab!)
I finished off the upgrade process by upgrading the rpms on the client, and now my lab is running Dumpling.
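With the client now on the same release as the monitors, it could talk to the cluster again; a final check along these lines confirms the volume stayed mapped and the cluster is healthy:
# confirm the rbd volume is still mapped and the cluster reports healthy
rbd showmapped
ceph -s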
After any upgrade I'd always recommend a sanity check before you consider it 'job done'. For this one I used a simple performance metric: a 'dd' against the rbd device, run before and after the upgrade.
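I haven't reproduced my exact invocation here, but a non-destructive sequential read from the mapped device along these lines gives a comparable throughput figure (the block size and count are arbitrary):
# sequential read throughput from the mapped rbd device
dd if=/dev/rbd1 of=/dev/null bs=4M count=256 iflag=direct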
The chart below shows the results; as you can see, the profile is very similar, with only minimal disparity between the two releases.
It's good to see that open source storage is also delivering on the 'always-on' principle. The next release of Gluster (v3.5) is scheduled for December this year, so I'll run through the same scenario with Gluster then.