Open Source Storage: April 2018

Thursday, 5 April 2018

Monitoring VDO Volumes

My previous post showed you how to get deduplication working on Linux with VDO. In some ways, that's the post that could cause trouble - if you start using vdo across a number of hosts, how can you easily establish monitoring or even alerting?

So that's the problem we're going to focus on in this post.

Monitoring

There are a heap of different ways to monitor systems, but the rising star currently is Prometheus. Historically, I've used monitoring systems that require clients to push data to a central server, but Prometheus turns this around: data collection is initiated by the Prometheus server itself, in what's called a 'scrape' job. This approach simplifies client configuration and management, which is a huge bonus for large installations.

To make vdo data available, we need an exporter. The exporter provides an HTTP endpoint that the Prometheus server scrapes metrics from. There are a heap of exporters available for Prometheus covering a plethora of different subsystems, but since vdo is new there isn't something you can just pick up and run with. Well, that was the case...

vdo_exporter Project

The scrape job simply issues a GET request to the "/metrics" HTTP endpoint on a host. Developing an API endpoint for this in Python is fairly straightforward, and given the metrics themselves are all nicely grouped together under sysfs, developing an exporter seemed a bit of a no-brainer. My exporter can be found here. The project's repo contains the Python code, a systemd unit file and what I hope is a sensible README file documenting how to install the exporter (if you have a firewall active, remember to open port 9286!)
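To give a feel for what an exporter like this does, here's a minimal Python sketch. It is not the vdo_exporter code: it assumes each statistic is a file holding a single integer under /sys/kvdo/<vol_name>/statistics, and the metric naming (a `vdo_` prefix plus a `volume` label) and the function names are my own hypothetical choices. Every metric is exported as a gauge for simplicity.

```python
"""Minimal sketch of a VDO metrics exporter (not the vdo_exporter project's code).

Assumption: every statistic is a file holding one integer under
/sys/kvdo/<vol_name>/statistics/<stat_name>.
"""
import os
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

SYSFS_ROOT = "/sys/kvdo"  # statistics location described in the VDO post
PORT = 9286               # same port as vdo_exporter


def read_stats(root=SYSFS_ROOT):
    """Return {volume: {stat_name: int}} for each volume found under root."""
    stats = {}
    if not os.path.isdir(root):
        return stats
    for volume in sorted(os.listdir(root)):
        stat_dir = os.path.join(root, volume, "statistics")
        if not os.path.isdir(stat_dir):
            continue  # not a volume directory
        values = {}
        for name in sorted(os.listdir(stat_dir)):
            try:
                with open(os.path.join(stat_dir, name)) as f:
                    values[name] = int(f.read().strip())
            except (OSError, ValueError):
                continue  # skip unreadable or non-numeric entries
        stats[volume] = values
    return stats


def render(stats):
    """Format stats in the Prometheus text exposition format, one group per metric."""
    names = sorted({name for values in stats.values() for name in values})
    lines = []
    for name in names:
        # Hypothetical naming scheme: vdo_<stat_name>{volume="<vol_name>"}
        metric = "vdo_" + re.sub(r"[^a-zA-Z0-9_]", "_", name)
        lines.append(f"# TYPE {metric} gauge")  # simplification: everything is a gauge
        for volume in sorted(stats):
            if name in stats[volume]:
                lines.append(f'{metric}{{volume="{volume}"}} {stats[volume][name]}')
    return "".join(line + "\n" for line in lines)


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered statistics on GET /metrics, 404 for anything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render(read_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("", PORT), MetricsHandler).serve_forever()
```

Grouping the output by statistic name rather than by volume keeps all samples of one metric together, which is what the Prometheus text format expects when several volumes are present.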

I'm leaving the installation of the exporter as an exercise for the reader, and will use the rest of this article to show you how to quickly stand up Prometheus and Grafana to collect and visualise the vdo statistics. For this example I'm again using Fedora, so on other distributions you may have to tweak 'stuff'.

Containers to the Rescue!

The Prometheus and Grafana projects both provide images on Docker Hub, so assuming you already have docker installed on your machine, you can grab the images with the following:

docker pull docker.io/prom/prometheus

docker pull docker.io/grafana/grafana:4.6.3

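Container filesystems are ephemeral, but for monitoring and dashboards the data has to survive a container being recreated, so these containers need to either use docker volumes or persist data to the host's filesystem. For this exercise, I'll be exposing some directories on the host's filesystem (change these to suit!). The ownership set below matches the users the containers run as: uid 104/gid 107 for Grafana 4.x, and 65534 (nobody) for Prometheus.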

mkdir -p /opt/docker/grafana-prom/{etc,data}

chown 104 /opt/docker/grafana-prom/{etc,data}

chgrp 107 /opt/docker/grafana-prom/{etc,data}

mkdir -p /opt/docker/grafana-prom/prom-{etc,data}

chown 65534 /opt/docker/grafana-prom/prom-{etc,data}

chgrp 65534 /opt/docker/grafana-prom/prom-{etc,data}

To launch the containers and manage them as a unit, I'm using "docker-compose" - so if you don't have that installed, talk nicely to your package manager :)

Assuming you have docker-compose available, you just need a compose file (docker-compose.yml) to bring the containers together:

version: '2'
services:
  grafana:
    image: docker.io/grafana/grafana:4.6.3
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - /opt/docker/grafana-prom/etc:/etc/grafana:Z
      - /opt/docker/grafana-prom/data:/var/lib/grafana:Z
    depends_on:
      - prometheus
  prometheus:
    image: docker.io/prom/prometheus
    container_name: prometheus
    # With host networking Prometheus listens on the host's port 9090 directly,
    # so no "ports" mapping is needed (Docker ignores or rejects one in this mode)
    network_mode: "host"
    volumes:
      - /opt/docker/grafana-prom/prom-etc:/etc/prometheus:Z
      - /opt/docker/grafana-prom/prom-data:/prometheus:Z

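With the directories in place for the containers' persistent data, and the compose file ready, you just need to start the containers. One catch: the bind mount over /etc/prometheus hides the default configuration shipped in the Prometheus image, so make sure a prometheus.yml exists in /opt/docker/grafana-prom/prom-etc first (copy the image's default or write your own), otherwise Prometheus will fail to start. Run the docker-compose command from the directory that holds your docker-compose.yml file.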

[root@myhost grafana_prom]# docker-compose up -d 

Creating network "grafanaprom_default" with the default driver

Creating prometheus ... 

Creating prometheus ... done

Creating grafana ... 

Creating grafana ... done

Configuring Prometheus

You should already have the vdo_exporter service running on the hosts that are using vdo, so the next step is to create a scrape job in Prometheus telling it to go and fetch the data. This is done by editing the prometheus.yml file, which in my case lives in /opt/docker/grafana-prom/prom-etc. Under the scrape_configs section, add something like this to collect data from your vdo host(s):

# VDO Information
- job_name: "vdo_stats"
  static_configs:
    - targets: ['192.168.122.98:9286']

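Now reload Prometheus to start the data collection. Sending SIGHUP to the Prometheus process (PID 1 inside the container) makes it re-read its configuration: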

[root@myhost grafana_prom]# docker exec -it prometheus sh

/prometheus $ kill -SIGHUP 1 

Configuring Grafana

To visualize the vdo statistics that Prometheus is collecting, Grafana needs two things: a data source definition pointing to the Prometheus container, and a dashboard that presents the data.

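Log in to your Grafana instance (http://localhost:3000) using the default credentials (admin/admin)
Click on the Grafana icon in the top left, and select Data Sources
Click the "Add data source" button
Enter the Prometheus details (and ensure you set the data source as the default). Because the prometheus container uses host networking, it isn't reachable by name from the grafana container, so use the host's IP address and port 9090 in the URL (e.g. http://<host-ip>:9090)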

The grafana directory in the vdo_exporter project holds a file called VDO_Information.json. This JSON file is the dashboard definition, so we need to import it.

Click on the Grafana icon again, highlight the Dashboards entry, then select the Import option from the pop-up menu.
Click "Upload .json File", and pick the VDO_Information.json file to upload.

Now select the dashboard icon (to the right of the Grafana logo), and select "VDO Information". You should then see something like this

As you add more hosts that are vdo enabled, just add each host's IP to the Prometheus scrape configuration and reload Prometheus. Simples!
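Once there are more than a handful of vdo hosts, editing the targets list by hand gets tedious. Here's a minimal Python sketch that renders the scrape job shown earlier from a list of host addresses. It assumes every host runs the exporter on port 9286, the host list is hypothetical, and the function name vdo_scrape_job is my own; the output is meant to be pasted under scrape_configs in prometheus.yml before reloading Prometheus.

```python
def vdo_scrape_job(hosts, port=9286, job_name="vdo_stats"):
    """Render a Prometheus scrape job (YAML text) covering the given vdo hosts."""
    targets = ", ".join(f"'{host}:{port}'" for host in hosts)
    return (
        "# VDO Information\n"
        f'- job_name: "{job_name}"\n'
        "  static_configs:\n"
        f"    - targets: [{targets}]\n"
    )


if __name__ == "__main__":
    # Hypothetical host list: replace with the addresses of your vdo hosts
    print(vdo_scrape_job(["192.168.122.98", "192.168.122.99"]), end="")
```

For larger fleets, Prometheus can also read targets from a file via file_sd_configs and pick up changes to that file without a reload, which may be a better fit than regenerating prometheus.yml.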

Grafana provides a notifications feature that lets you define threshold-based alerting. You could define a trigger for low "physical space" conditions, or alert when recovery is active; I leave that up to you! Grafana supports a number of different notification endpoints, including PagerDuty, Sensu and even email, so take some time to review the docs and see how Grafana could best integrate into your environment.
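Grafana alerts are configured in the UI against whatever metric names the exporter exposes, so the exact query depends on the exporter. Purely to show the arithmetic behind a low "physical space" trigger, here's a Python sketch. It assumes statistics named physical_blocks, data_blocks_used and overhead_blocks_used (raw block counts), that physical usage is data plus overhead blocks, and a hypothetical 10% threshold; the function names are my own.

```python
def physical_free_percent(stats):
    """Percentage of the physical device not yet used by data or VDO metadata.

    Assumes stats holds raw block counts keyed by the sysfs statistic names
    physical_blocks, data_blocks_used and overhead_blocks_used.
    """
    used = stats["data_blocks_used"] + stats["overhead_blocks_used"]
    return 100.0 * (stats["physical_blocks"] - used) / stats["physical_blocks"]


def low_physical_space(stats, threshold_percent=10.0):
    """True when free physical space falls below threshold_percent (hypothetical default)."""
    return physical_free_percent(stats) < threshold_percent


if __name__ == "__main__":
    # Illustrative block counts for a 20G device, not read from a real volume
    example = {"physical_blocks": 5_242_880, "data_blocks_used": 4_000_000,
               "overhead_blocks_used": 1_048_576}
    print(f"{physical_free_percent(example):.1f}% free, alert={low_physical_space(example)}")
```

The threshold is worth tuning to how quickly your data grows: on a thin-provisioned vdo volume the filesystem can look nearly empty while the physical device is close to full.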

And Remember...

VDO is not the proverbial "silver bullet". The savings from any compression and deduplication technology depend on the data you're storing, and vdo is no different. Also, each vdo volume requires additional RAM, so if you want to move vdo out of the test environment into production you'll need to plan for additional CPU and RAM to "make the magic happen"™.

Wednesday, 4 April 2018

Shrinking Your Storage Requirements with VDO

Whether you're using proprietary storage arrays or software defined storage, the actual cost of capacity can sometimes provoke responses like, "why do you need all that space?" or "OK, but that's all the storage you're going to get, so make it last".

The problem is that storage is a commodity resource; it's like toner or ink in a printer. When you run out, things stop and lots of people tend to lose their sense of humor. Controlling storage growth has been going on for over 10 years in the proprietary storage space, with one of the most successful companies being NetApp, who introduced data deduplication with their A-SIS (Advanced Single Instance Storage) feature back in 2007. The message was that if you wanted to reduce storage consumption, you basically had to buy the more expensive "stuff" in the first place.

This was the "status quo" until Red Hat acquired Permabit in mid-2017... now compression and deduplication features are heading towards a Linux server near you!

That's the history lesson; now let's look at how you can kick the tyres on open source compression and deduplication. For the remainder of this article, I'll walk through the steps you need to quickly get "dedupe" up and running on Fedora.

Installation

Since we're just testing, create a VM and install Fedora 27. Use libvirt, Parallels, VirtualBox... whatever takes your fancy, or maybe just use a cloud image in AWS. The choice is yours! Just try to ensure the VM has something like 2 vCPUs, 4GB RAM, an OS disk (20GB) and a separate data disk for vdo testing.

Once it's installed, you'll need to enable an additional repository to pick up the vdo deduplication modules (kvdo, the kernel virtual data optimizer):

dnf copr enable rhawalsh/dm-vdo

dnf install vdo kmod-kvdo

depmod

Configuration

In my test environment, I'm using a 20G virtual disk (vdb) for my vdo testing.

[root@f27-vdo ~]# lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    252:0    0   4G  0 disk
└─vda1 252:1    0   4G  0 part /
vdb    252:16   0  20G  0 disk

Now, with the kvdo module in place, let's create a vdo volume with a logical size of 100G on top of the 20G /dev/vdb device:

[root@f27-vdo ~]# vdo create --name=vdo0 --device=/dev/vdb \
    --vdoLogicalSize=100g

Creating VDO vdo0

Starting VDO vdo0

Starting compression on VDO vdo0

VDO instance 0 volume is ready at /dev/mapper/vdo0

Not exactly complicated :) A couple of things are worth noting, though:

by default, new volumes are created with compression and deduplication enabled. If you don't want that, you can use the --compression or --deduplication options (for example, --compression=disabled).
a vdo volume is actually a device mapper device, in this case /dev/mapper/vdo0. It's this 'dm' device that you'll use from here on in.

Usage

Now that you have a vdo volume, the next step is to get it deployed and understand how to report on space savings. The first thing is filesystem formatting. Make sure you use the -K switch to stop mkfs from issuing discards; remember, a vdo volume is in effect a thin-provisioned volume, so there's nothing to discard on a fresh volume and discarding the whole logical size would just slow mkfs down.

[root@f27-vdo ~]# mkfs.xfs -K /dev/mapper/vdo0

With the filesystem in place, the next step would normally be updating fstab... right? Well, not this time. For vdo volumes, the boot-time ordering between fstab mounts and the vdo service is a problem (the mount can be attempted before the vdo service has started the volume), so we need to use a mount unit to ensure vdo volumes are mounted correctly.

The vdo rpm provides a sample mount service definition (/usr/share/doc/vdo/examples/systemd/VDO.mount.example). For this example, I'm going to mount the vdo volume at /mnt/vdo0

mkdir /mnt/vdo0
cp /usr/share/doc/vdo/examples/systemd/VDO.mount.example /etc/systemd/system/mnt-vdo0.mount

Then update the mount unit to look like this:

[Unit]
Description = Mount filesystem that lives on VDO0
name = mnt-vdo0.mount
Requires = vdo.service systemd-remount-fs.service
After = multi-user.target
Conflicts = umount.target

[Mount]
What = /dev/mapper/vdo0
Where = /mnt/vdo0
Type = xfs
Options = discard

[Install]
WantedBy = multi-user.target

Reminder: mount units must be named after the mount point they control, so the unit for /mnt/vdo0 is called mnt-vdo0.mount.
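systemd derives a mount unit's name from the mount point path, and on a real system `systemd-escape --path --suffix=mount /mnt/vdo0` prints the correct name. For illustration only, the Python below is a simplified sketch of those escaping rules, not systemd's own implementation: "/" separators become "-", while a literal "-", a leading ".", and any character outside letters, digits, ":", "_" and "." are written as \xNN hex escapes. The function name mount_unit_name is my own.

```python
import string

# Characters systemd leaves unescaped in unit names (a leading "." is still escaped)
ALLOWED = set(string.ascii_letters + string.digits + ":_.")


def mount_unit_name(path):
    """Simplified sketch of `systemd-escape --path --suffix=mount PATH`."""
    parts = [p for p in path.split("/") if p]  # drop leading, trailing and duplicate "/"
    if not parts:
        return "-.mount"  # the root filesystem
    joined = "/".join(parts)
    out = []
    for i, byte in enumerate(joined.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")  # path separator
        elif ch in ALLOWED and not (ch == "." and i == 0):
            out.append(ch)
        else:
            out.append("\\x%02x" % byte)  # e.g. "-" becomes \x2d
    return "".join(out) + ".mount"


if __name__ == "__main__":
    print(mount_unit_name("/mnt/vdo0"))
```

The escaping is why a mount point containing a dash, such as /mnt/vdo-data, ends up as mnt-vdo\x2ddata.mount rather than mnt-vdo-data.mount.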

Now reload systemd, enable the mount and start it:

systemctl daemon-reload

systemctl enable mnt-vdo0.mount

systemctl start mnt-vdo0.mount

[root@f27-vdo ~]# df -h /mnt/vdo0

Filesystem         Size  Used Avail Use% Mounted on

/dev/mapper/vdo0   100G  135M  100G    1% /mnt/vdo0

At this point you've used the vdo command to create the volume, but there is also a command to look at the volume's statistics, called vdostats. To give us something to look at, I copied the same 200MB disk image to the volume 20 times, which will also help to explain vdo overheads.

[root@f27-vdo ~]# df -h /mnt/vdo0
Filesystem         Size  Used Avail Use% Mounted on
/dev/mapper/vdo0   100G  4.5G   96G   5% /mnt/vdo0

[root@f27-vdo ~]# vdostats --hu vdo0

Device               Size   Used   Available   Use% Space saving%

vdo0                20.0G   4.2G       15.8G    21%           95%

Wait a minute...at a logical layer, the filesystem says that it's 4.5G used, but at the physical vdo layer it's saying practically the same thing AND that there's a 95% saving! So which is right? The answer is both :) The vdo subsystem persists metadata on the volume (lookup maps etc), which accounts for a chunk of the physical space used, and the savings value is derived purely from the logical blocks "in" and the physical, unique blocks written. If you need to understand more you can dive into the sysfs filesystem.

Each vdo volume stores and maintains statistics under /sys/kvdo/<vol_name>/statistics (which is where vdostats gets its information from!)

The most useful stats I've found for understanding how space is consumed are:

overhead_blocks_used: metadata for the volume. The overhead isn't simply proportional to the physical size of the volume: on my 20G test device it was around 4GB, while on an 8TB device it was around 9GB
data_blocks_used: this is the count of the physical blocks consumed by user data
logical_blocks_used: the count of blocks consumed at the filesystem level

In my case, the "overhead_blocks_used" was 4GB, and the "data_blocks_used" around 200MB. The savings% value only applies to actual user data written to the volume, so it's derived as 1 - (data_blocks_used / logical_blocks_used); with around 200MB of unique data behind roughly 4GB of logical writes, that equates to around 95%. Now it makes sense!
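To make the arithmetic concrete, here's a small Python sketch relating the three statistics to the vdostats figures. It assumes VDO's 4KiB block size and that vdostats reports Used as data plus overhead blocks; the block counts are round numbers chosen to roughly match my test (20 copies of a 200MB image), not values read from the volume, and the function names are my own.

```python
BLOCK_SIZE = 4096  # VDO works in 4KiB blocks


def physical_used_bytes(stats):
    """Physical space consumed: unique user data blocks plus VDO metadata blocks."""
    return (stats["data_blocks_used"] + stats["overhead_blocks_used"]) * BLOCK_SIZE


def space_saving_percent(stats):
    """Share of logical blocks that did not need a physical data block of their own."""
    logical = stats["logical_blocks_used"]
    if logical == 0:
        return 0.0
    return 100.0 * (logical - stats["data_blocks_used"]) / logical


if __name__ == "__main__":
    # Illustrative numbers only, roughly matching the test described above
    stats = {
        "data_blocks_used": 51_200,         # 200MiB of unique data
        "logical_blocks_used": 1_024_000,   # 20 copies of the same 200MiB
        "overhead_blocks_used": 1_048_576,  # 4GiB of metadata
    }
    print(f"physical used: {physical_used_bytes(stats) / 2**30:.1f} GiB")
    print(f"space saving:  {space_saving_percent(stats):.0f}%")
```

With these numbers the physical usage comes out at about 4.2 GiB, almost all of it overhead, while the saving is 95%, which is the same shape as the vdostats output above.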

Final Words

Deduplication is a complex beast, but hopefully the above will at least get you up and running with this new Linux feature.

If you decide to use vdo across a number of servers, running vdostats on each one isn't really a viable option. For that it would be more useful to leave the command line behind and look at solutions like Prometheus and Grafana to track capacity usage and generate alerts. Spoiler alert!... that's the subject of my next post :)

Useful Links

Public vdo mirror on github - https://github.com/dm-vdo/
Mail list - vdo-devel@redhat.com