The Definitive Guide: Ceph Cluster on Raspberry Pi


3 Node Ceph Cluster on Raspberry Pi

A Ceph cluster on Raspberry Pi is an awesome way to create a RADOS home storage solution (NAS) that is highly redundant and uses little power. It’s also a low-cost way to get into Ceph, which may or may not be the future of storage (software-defined storage as a whole definitely is). Ceph on ARM is an interesting idea in and of itself. I built one of these as a development environment (playground) for home, and it can be done on a relatively small budget. Since this was a spur-of-the-moment idea, I purchased everything locally. I opted for the Raspberry Pi 2 B for its 4 cores and 1GB of RAM, and I’d really recommend it: you get one core and 256MB of RAM for each USB port (potential OSD). In this guide I will outline the parts and software I used, along with some options for achieving better performance. This guide assumes you have access to a Linux PC with an SD card reader, a working knowledge of Linux in general and a passing familiarity with Ceph.

Parts

Although I explain many options in this guide, this is the minimum you will need to get a cluster up and running. The list assumes 3 Pi nodes.

I used 3 x 64GB flash drives, 3 x 32GB MicroSD and existing ports on my router. My cost came in at about $250. You can add to this list based on what you add to your setup throughout the guide, but this is pretty much the minimum for a fully functional Ceph cluster.

Operating System

Raspbian. The Raspbian testing repository has pre-compiled packages for Ceph 0.80.9 (Firefly) and its dependencies, which is everything you’ll need for this tutorial, and Raspbian is the “de facto” OS of choice for flexibility on the Raspberry Pi. You can download the Raspbian image here: Raspbian Download. Once you have the image, put it on an SD card. For this application I recommend at least a 16GB MicroSD card (preferably Class 10, since OS drive speed matters for the Ceph monitor processes). To transfer the image on Linux, use dd. Insert the card into your card reader and run lsblk to display your devices, then use dd to transfer the image to the card. The command assumes the image is named raspbian-wheezy.img and lives in your present working directory, and that your SD card is located at /dev/mmcblk0. Adjust these accordingly, and make sure the card is empty or holds nothing important, because dd will overwrite it.
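A minimal sketch of the write step, assuming the file and device names from the text. The run wrapper and the DRY_RUN variable are hypothetical helpers of my own, not part of any tool: with DRY_RUN=1 the function only prints the commands, which is a cheap way to double-check the target device before dd overwrites it.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Write a Raspbian image to a MicroSD card.
# img: the image file (raspbian-wheezy.img in the text)
# dev: the whole card (/dev/mmcblk0), not a partition such as /dev/mmcblk0p1
write_image() {
    local img="$1" dev="$2"
    run sudo dd if="$img" of="$dev" bs=4M  # overwrites everything on $dev
    run sync                               # flush the write cache before removal
}
```

Run lsblk first, preview with DRY_RUN=1 write_image raspbian-wheezy.img /dev/mmcblk0, then run it again without DRY_RUN=1.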

This command will take a few minutes to complete. Once it does, run sync to flush the write cache to disk so the card is safe to remove. Then boot into Raspbian, resize the filesystem to the full size of your MicroSD, set a memorable password and overclock if you want.

Once this is done, there are a few modifications to make; we’ll get into these in the installation section below. I don’t recommend using too large a MicroSD, because later in this tutorial we will image the whole OS from our first MicroSD for deployment to our other Pi nodes.

Hardware Limitations

The first limitation to consider is overall storage space. Ceph OSD processes require roughly 1MB of RAM per GB of storage. Since we are co-locating monitor processes, the effective storage limit is 512GB RAW per Pi 2 B (4 x 128GB sticks), before Ceph replication or erasure coding overhead. Network speed is also a factor, as discussed later in this document: you will hit the limit of the Pi 2 B’s 100Mbit network port before you hit the limit of its single USB 2.0 bus (480Mbit/s).
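The arithmetic behind that limit, as a small shell sketch. The 1MB-per-GB rule comes from the text; reserving 512MB of the Pi 2 B’s 1GB for the OS and the co-located monitor is my assumption about how the 512GB figure is reached. The function names are hypothetical.

```shell
# Raw OSD capacity one node can support under the ~1 MB RAM per GB rule.
# ram_mb:      total RAM (1024 on a Pi 2 B)
# reserved_mb: RAM set aside for the OS and the monitor (assumed 512)
max_raw_gb() {
    local ram_mb="$1" reserved_mb="$2"
    echo $(( ram_mb - reserved_mb ))   # 1 MB of RAM per GB of storage
}

# Usable capacity of the whole cluster with N-way replication.
usable_gb() {
    local nodes="$1" raw_per_node_gb="$2" replicas="$3"
    echo $(( nodes * raw_per_node_gb / replicas ))
}
```

max_raw_gb 1024 512 prints 512, and usable_gb 3 512 3 prints 512: three fully loaded nodes give about 1.5TB raw, but only about 512GB usable with 3 replicas, before filesystem and journal overhead.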

Network

In this setup I used empty ports on my router. I run a local DNS server on my home router and use static assignments for local DNS. You may want to consider a flat 5 or 8 port (depending on the number of nodes you plan to have) gigabit switch for the cluster network, and WiPi modules for the public network (connected to your router via WiFi). The nice thing about a flat layer 2 switch is that if all the Pi nodes are in the same subnet, you don’t have to worry about a gateway. It also keeps the cost down (compared to using router ports) and keeps Ceph replication traffic off your home network. A dedicated switch for the cluster network will also increase cluster performance, especially given the 100Mbit limit of the Pi 2 B’s network port. By using a BGN (802.11b/g/n) WiFi dongle for the public network and a dedicated switch for the cluster network, you will get a speedier cluster. The dongle takes one of your 4 USB ports, so you will get one fewer OSD per Pi. Keep in mind that, depending on whether you use replication or erasure coding, cluster (private) traffic can be a multiple of client IO: in a standard 3-replica pool, the primary OSD forwards every client write to two more OSDs. Whether that matters depends on your application. Of course, this is all optional and for additional “clustery goodness”; it really depends on budget, usage and so on.
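A rough network budget, as a sketch. It assumes 8 bits per byte with no protocol overhead, and a replicated pool in which the primary OSD forwards each client write to the other (size - 1) replicas; real throughput will be lower. The function names are hypothetical.

```shell
# Convert a link speed in Mbit/s to MB/s (protocol overhead not counted).
mbit_to_mbyte() {
    awk -v mbit="$1" 'BEGIN { printf "%.1f\n", mbit / 8 }'
}

# MB sent between OSDs for a given amount of client writes (in MB)
# when the pool keeps "size" replicas.
replication_mb() {
    local client_mb="$1" size="$2"
    echo $(( client_mb * (size - 1) ))
}
```

mbit_to_mbyte 100 prints 12.5, so a Pi 2 B port tops out around 12.5MB/s, and replication_mb 100 3 prints 200: writing 100MB to a 3-replica pool puts another 200MB on the cluster network. If that traffic shares the same 100Mbit port, it competes directly with client IO.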

Object Storage Daemons

In this guide, I co-located the OSD journals on the OSD drives. For better performance, you can use a faster USB drive like the SanDisk Extreme 3.0 (keep in mind that you’ll be limited by the 60MB/s theoretical maximum of USB 2.0). Using a dedicated (faster) journal drive will yield much better performance, but you don’t really need to worry about it unless you are using multiple networks as outlined above. If you are not, 4 decent USB sticks will saturate each node’s 100Mbit NIC. There is a lot more to learn about Ceph architecture than I cover in this article, and I highly recommend you do so here.

OSD Filesystem

XFS is the default in Ceph Firefly. I prefer BTRFS as an OSD filesystem for several reasons, and I use it in this tutorial.

Installation

Assuming you have set up your network and operating system, and have 3 nodes and the hardware you want to use, we can begin. The first thing to do is wire up power and network as you see fit. After that, run through the initial raspi-config on what will become your admin node. Then it’s time to make some changes. Once your admin node is booted and configured, you have to edit /etc/apt/sources.list: Raspbian Wheezy has archaic versions of Ceph in the main repository, but the latest Firefly version is in the testing repository. Before we delve into this, I find it useful to install some basic tools and requirements. Connect via SSH or directly to a terminal and issue this command from the Pi:
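A sketch of this step. btrfs-tools and vim are named in the text; iotop and sysstat are my stand-ins for the unnamed performance diagnostics tools (an assumption, swap in your own). The run wrapper is the same hypothetical dry-run helper.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

install_base_tools() {
    run sudo apt-get update
    # btrfs-tools: mkfs.btrfs and friends for the OSD drives
    # iotop, sysstat (iostat): assumed diagnostics tools
    run sudo apt-get install -y btrfs-tools vim iotop sysstat
}
```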

From this point forward we will assume you are connecting to your Pi nodes via SSH. You’ve just installed btrfs-tools, vim (better than vi) and some performance diagnostics tools I like. Now that we have vim, it’s time to edit our sources:

You’ll see the contents of your sources file, which will look like this:

Modify it to look like this:

We’ve replaced wheezy with testing. Once this is done, issue this command:
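If you prefer not to edit the file by hand, the wheezy-to-testing swap can be scripted. A sketch, assuming the stock file names the release with the word wheezy; it keeps a .bak copy. The install function assumes this step installs the ceph packages (a commenter below reports ceph and ceph-common installed at this stage). SUDO is a hypothetical override: leave it unset to use sudo, or set it empty to run against a scratch file.

```shell
# Swap one release name for another, whole words only, keeping a backup.
switch_release() {
    local from="$1" to="$2" file="${3:-/etc/apt/sources.list}"
    ${SUDO-sudo} cp "$file" "$file.bak"
    ${SUDO-sudo} sed -i "s/\<$from\>/$to/g" "$file"   # GNU sed word boundaries
}

install_ceph() {
    ${SUDO-sudo} apt-get update
    ${SUDO-sudo} apt-get install -y ceph ceph-common
}
```

Usage: switch_release wheezy testing && install_ceph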

Once this process has completed, it’s time to start getting the OS ready for Ceph. Everything we do in this section, up to the point of imaging the OS, is needed on every node that will run Ceph.

First we will create a ceph user and give it password-less sudo access. To do so issue these commands:

Set a memorable password, as it will be used on all of your nodes in this guide. Now we need to give the ceph user sudo access:
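A sketch of these two steps, following the pattern in Ceph’s preflight documentation: a ceph user with a home directory, and a sudoers drop-in granting it password-less root. The function names are hypothetical; sudoers_line is split out only so the rule can be checked before it is installed.

```shell
# The sudoers rule: the named user may run any command as root without a password.
sudoers_line() {
    echo "$1 ALL = (root) NOPASSWD:ALL"
}

create_ceph_user() {
    local user="${1:-ceph}"
    sudo useradd -d "/home/$user" -m "$user"     # -m creates the home directory
    sudo passwd "$user"                          # set the shared, memorable password
    sudoers_line "$user" | sudo tee "/etc/sudoers.d/$user"
    sudo chmod 0440 "/etc/sudoers.d/$user"       # read-only, as sudo expects for drop-ins
}
```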

We’ll be using ceph-deploy later, and it’s best to have a default user to log in as all the time. Issue this command:

Then create this file using vi:

I assume 3 nodes in this tutorial and a naming convention of piY, where Y is the node number starting from 1.
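Assuming the file in question is ~/.ssh/config (which is where Ceph’s preflight documentation sets the default login user for ceph-deploy), the entries for pi1 through pi3 can be generated with a small sketch; the function name is hypothetical.

```shell
# Print ~/.ssh/config entries so that "ssh piN" logs in as the ceph user.
ssh_config_for() {
    local count="$1" user="${2:-ceph}" i
    for i in $(seq 1 "$count"); do
        printf 'Host pi%s\n    Hostname pi%s\n    User %s\n' "$i" "$i" "$user"
    done
}
```

Usage: mkdir -p ~/.ssh && ssh_config_for 3 >> ~/.ssh/config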

Save the file and exit. As for hostnames, you can of course use whatever you want. As I mentioned, I run local DNS and DHCP with static assignments. If you do not, you’ll need to edit /etc/hosts so that your nodes can resolve each other. You can do this after imaging the OS, as each node will have a different IP.
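If you do manage names through /etc/hosts, the entries follow the piY convention. A sketch that prints them; the function name is hypothetical, and the 192.168.1.101 to .103 addresses in the usage line are placeholders for your own.

```shell
# Print /etc/hosts lines for nodes pi1..piN on consecutive addresses.
# prefix: first three octets, e.g. 192.168.1 (placeholder)
# start:  last octet of pi1's address
hosts_entries() {
    local prefix="$1" start="$2" count="$3" i
    for i in $(seq 1 "$count"); do
        printf '%s.%s\tpi%s\n' "$prefix" "$((start + i - 1))" "$i"
    done
}
```

Usage, on every node: hosts_entries 192.168.1 101 3 | sudo tee -a /etc/hosts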

Now it’s time to install the ceph-deploy tool. Raspbian’s wget can be strange with HTTPS, so we will skip certificate verification (do so at your own peril):

Now that we’ve added the Ceph repository, we can install ceph-deploy:
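A sketch of the repository and install steps. The key and repository URLs are deliberately left as parameters: take the release key and the debian-firefly repository URL from the Ceph installation docs for the release you are following. Using wheezy as the repository codename is an assumption. --no-check-certificate is the certificate skip the text warns about. The function names are hypothetical.

```shell
# The apt line for the Ceph repository (codename defaults to wheezy, an assumption).
ceph_repo_line() {
    echo "deb $1 ${2:-wheezy} main"
}

add_ceph_repo() {
    local key_url="$1" repo_url="$2"
    # Skips TLS certificate verification, at your own peril.
    wget --no-check-certificate -q -O- "$key_url" | sudo apt-key add -
    ceph_repo_line "$repo_url" | sudo tee /etc/apt/sources.list.d/ceph.list
    sudo apt-get update
    sudo apt-get install -y ceph-deploy
}
```

Usage: add_ceph_repo KEY_URL REPO_URL, with both values taken from the Ceph docs.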

Since we are installing Ceph from the Raspbian repositories, we need to change the default behavior of ceph-deploy:

Change

To

This prevents ceph-deploy from altering the repositories, because the Ceph armhf (the Raspberry Pi’s processor architecture) repositories are mostly empty.

Finally, we should revert the contents of /etc/apt/sources.list:

You’ll see the contents of your sources file, which will look like this:

Modify it to look like this:

We’ve replaced testing with wheezy. Once this is done, issue this command:

Kernel Tweaks

We are also going to tweak some kernel parameters for better stability. To do so, we will edit /etc/sysctl.conf.

At the bottom of the file, add the following lines:
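The author’s specific lines are not reproduced here, so as a hedged sketch, here is an idempotent way to set a parameter in /etc/sysctl.conf. The function name is hypothetical, and vm.swappiness 10 in the usage line is an illustrative placeholder, not the author’s tweak. SUDO is a hypothetical override: unset means use sudo, empty means run directly.

```shell
# Set "key = value" in a sysctl file, replacing an existing entry if present.
set_sysctl() {
    local key="$1" value="$2" file="${3:-/etc/sysctl.conf}"
    if grep -q "^$key *=" "$file"; then
        ${SUDO-sudo} sed -i "s|^$key *=.*|$key = $value|" "$file"
    else
        echo "$key = $value" | ${SUDO-sudo} tee -a "$file" > /dev/null
    fi
}
```

Usage: set_sysctl vm.swappiness 10 (placeholder), then sudo sysctl -p to load the file without a reboot.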

Imaging the OS

Now we have a good baseline for deploying Ceph to our other Pi nodes. It’s time to shut down our admin node and image its drive (the MicroSD). Issue:

Then unplug power to your Pi node and remove the MicroSD. Insert the MicroSD into your SD adapter, then the SD adapter into your Linux PC. You’ll need at least as much free drive space on your PC as the size of the MicroSD card. Where /dev/mmcblk0 is your SD card and ceph-pi.img is your image destination, run:

This can take a very long time, depending on the size of your SD card. Once the command returns, run sync to flush the cache to disk and make sure you can safely remove the MicroSD. For long-term storage you can compress the image with gzip or xz (empty space compresses really well, it turns out).
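A sketch of the imaging step with the names used here (/dev/mmcblk0 and ceph-pi.img), with the same hypothetical dry-run wrapper. xz -k compresses a copy and keeps the raw image next to it.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

# Copy a whole MicroSD card into an image file on the PC.
backup_image() {
    local dev="$1" img="$2"
    run sudo dd if="$dev" of="$img" bs=4M
    run sync
    run xz -k "$img"          # optional: writes $img.xz and keeps $img
}
```

Usage: backup_image /dev/mmcblk0 ceph-pi.img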

Imaging Your Nodes’ OS Drives

Now that you have a good baseline image on your PC, you are ready to crank out “Ceph-Pi” nodes without redoing all of the above. To do so, insert a fresh MicroSD into your adapter and the adapter into your PC. Then, assuming ceph-pi.img is your OS image and /dev/mmcblk0 is your MicroSD card, run:

Repeat this for as many nodes as you intend to deploy.
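A sketch for flashing each node’s card that also accepts an xz-compressed image, so you don’t need to decompress it first. The function and dry-run wrapper are hypothetical; the device name is the same /dev/mmcblk0 assumption.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

flash_node_card() {
    local img="$1" dev="$2"
    case "$img" in
        *.xz) run sh -c "xz -dc '$img' | sudo dd of='$dev' bs=4M" ;;  # stream-decompress
        *)    run sudo dd if="$img" of="$dev" bs=4M ;;
    esac
    run sync
}
```

Usage, once per card: flash_node_card ceph-pi.img.xz /dev/mmcblk0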

Create a Ceph Cluster on Raspberry Pi

Insert your ceph-pi MicroSD cards into your Pi nodes and power them all on. You’ve made it this far; now it’s time to get “cephy”. Deploying with ceph-deploy is a breeze. First, SSH to the admin node. If you are not using local DNS and DHCP with static assignments, make sure you have set up IPs, networking and /etc/hosts on all Pi nodes.

We need to generate and distribute an SSH key for password-less authentication between nodes. To do so, run (leave the passphrase blank):

Now copy the key to all nodes (assuming 3 with the naming convention from above):

You will be prompted for the password you created for the ceph user each time to establish initial authentication.
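A sketch of the key steps, run on the admin node. -N "" gives the empty passphrase the text asks for; the node names assume the piY convention, and the function and dry-run wrapper are hypothetical.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

distribute_ssh_key() {
    local count="$1" user="${2:-ceph}" i
    # Generate a key with an empty passphrase unless one already exists.
    [ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    for i in $(seq 1 "$count"); do
        run ssh-copy-id "$user@pi$i"   # asks for the ceph user's password once per node
    done
}
```

Usage: distribute_ssh_key 3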

Once that is done, and while you are connected to your admin node (the 1st node in the cluster) as the pi user, create an admin node directory:

Creating an initial Ceph Configuration

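We are going to create an initial Ceph configuration with all 3 Pi nodes as monitors. If you have more nodes, keep in mind that you always want an odd number of monitors: the monitors need a majority to form a quorum, which is what prevents a split-brain scenario, so an even number tolerates no more monitor failures than the odd number below it. To do this, run: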
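A sketch combining the directory step and the initial configuration. ceph-deploy new writes ceph.conf and the monitor keyring into the current directory, which is why it runs inside the new directory. The directory name ceph-pi is my assumption, and the function and dry-run wrapper are hypothetical.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

new_cluster() {
    local dir="$1"; shift     # remaining arguments: the initial monitor hosts
    run mkdir -p "$dir"
    run cd "$dir"             # cd is a builtin, so this changes the calling shell's directory
    run ceph-deploy new "$@"
}
```

Usage: new_cluster ceph-pi pi1 pi2 pi3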

Now there are some special tweaks that should be made for best stability and performance within the hardware limitations of the Raspberry Pi 2 B. To apply these changes we’ll need to edit the ceph.conf here on the admin node before it is distributed. To do so:

After the existing lines add:
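The author’s full list of tweaks is not reproduced here. The one value the text does pin down is the 1GB journal size mentioned in the OSD section, which in ceph.conf is osd journal size, in megabytes. A hedged sketch that appends it once to the generated file; since ceph-deploy new writes only a [global] section, the line lands there. The function name is hypothetical.

```shell
# Append "osd journal size" (in MB) to ceph.conf unless it is already set.
add_journal_size() {
    local conf="${1:-ceph.conf}" size_mb="${2:-1024}"
    grep -q '^osd journal size' "$conf" ||
        printf 'osd journal size = %s\n' "$size_mb" >> "$conf"
}
```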

 

Creating Initial Monitors

Now we can deploy our spiffy ceph.conf, create our initial monitor daemons, deploy our authentication keyring and chmod it as needed. We will be deploying to all 3 nodes for the purposes of this guide:
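A sketch of those steps with ceph-deploy: mon create-initial deploys the monitors named in ceph.conf and gathers the keys, admin copies ceph.conf and the client.admin keyring to each host, and the chmod lets non-root users on the admin node read that keyring. The function and dry-run wrapper are hypothetical.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

deploy_monitors() {
    run ceph-deploy mon create-initial
    run ceph-deploy admin "$@"
    run sudo chmod +r /etc/ceph/ceph.client.admin.keyring
}
```

Usage, from the ceph-pi directory: deploy_monitors pi1 pi2 pi3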

Creating OSDs (Object Storage Daemons)

Ready to create some storage? I know I am. Insert your USB keys of choice into your Pi USB ports. For the purposes of this guide I will be deploying 1 OSD (USB key) per Pi node. I will also be using the BTRFS filesystem and co-locating the journals on the OSDs with a journal size of 1GB (assuming 2 * 40MB/s throughput max and the default filestore max sync interval of 5 seconds). This value is hard-coded into our ceph-pi config above. The formula is:
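This appears to be the journal sizing rule from the Ceph documentation. Reading the text’s “2 * 40MB/s” as the rule’s factor of two applied to 40MB/s of expected throughput, the arithmetic is:

```latex
\text{osd journal size} = 2 \times (\text{expected throughput} \times \text{filestore max sync interval})
                        = 2 \times (40\ \text{MB/s} \times 5\ \text{s}) = 400\ \text{MB} < 1024\ \text{MB}
```

So a 1GB (1024MB) journal leaves more than twice the space the rule calls for.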

So let’s deploy our OSDs. Once the USB keys are plugged in, use lsblk to display the device locations. To make sure the drives are clean and have a GPT partition table, run gdisk on each OSD drive on each node. Assuming /dev/sda as our OSD:

gdisk /dev/sda

Create a new partition table, write it to disk and exit (in gdisk, o creates a new empty GPT and w writes it and exits). Do this for each OSD on each node. You can craft a bash for loop if you are feeling “bashy” or programmatic.
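If you’d rather script it, a sketch of the loop. Instead of interactive gdisk it uses ceph-deploy disk zap (firefly-era host:disk syntax), which wipes the disk’s partition tables and serves the same purpose. This destroys everything on the disk. Node and disk names follow the text; the function and dry-run wrapper are hypothetical.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

zap_disks() {
    local disk="$1"; shift    # remaining arguments: node names
    local node
    for node in "$@"; do
        run ceph-deploy disk zap "$node:$disk"   # destroys all data on $disk
    done
}
```

Usage: zap_disks sda pi1 pi2 pi3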

Once all OSD drives have a fresh partition table you can use ceph-deploy to create your OSDs (using BTRFS for this guide) where pi1 is our present node and /dev/sda is the OSD we are creating:

Repeat this for all OSD drives on all nodes (or write a for loop). Once you’ve created at least 3, you are ready to move on.
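The loop version, as a sketch in the firefly-era ceph-deploy syntax (host:disk, with --fs-type selecting BTRFS); newer ceph-deploy releases changed this syntax. With one USB key per node, every node uses the same device name. The function and dry-run wrapper are hypothetical.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

create_osds() {
    local disk="$1"; shift    # remaining arguments: node names
    local node
    for node in "$@"; do
        # No third host:disk:journal field, so the journal stays on the same disk.
        run ceph-deploy osd create --fs-type btrfs "$node:$disk"
    done
}
```

Usage: create_osds sda pi1 pi2 pi3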

Checking Cluster Health

Congratulations! You should have a working Ceph-Pi cluster. Trust, but verify. Get the health status of your cluster using this command:

And for less verbose output:
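A sketch of both checks, plus a small wait loop (a hypothetical helper) that polls ceph health until it reports HEALTH_OK, which is handy right after the OSDs come up while placement groups are still peering. The retry count and interval defaults are arbitrary.

```shell
# ceph -s: full status (monitors, OSDs, placement groups, usage)
# ceph health: one-line summary
show_status() {
    ceph -s
    ceph health
}

# Poll until the cluster reports HEALTH_OK; give up after "tries" attempts.
wait_for_health_ok() {
    local tries="${1:-30}" interval="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        case "$(ceph health 2>/dev/null)" in
            HEALTH_OK*) return 0 ;;
        esac
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```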

What to do now?

Use your storage cluster! Create an RBD image, map and mount it, and export it over NFS or CIFS. There is a lot of reading out there. Now you know how to deploy a Ceph cluster on Raspberry Pi.
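As a first step in that direction, a sketch that creates and inspects a small RBD image. In Firefly, rbd create takes --size in megabytes and uses the rbd pool by default; the image name pi-test is hypothetical. Mapping it with rbd map needs the kernel rbd module, which a stock Raspbian kernel may not include.

```shell
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }  # hypothetical dry-run helper

make_test_rbd() {
    local name="${1:-pi-test}" size_mb="${2:-1024}"
    run rbd create "$name" --size "$size_mb"   # size in MB, default pool "rbd"
    run rbd info "$name"
}
```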

References

http://millibit.blogspot.com/2014/12/ceph-pi-installing-ceph-on-raspberry-pi.html

http://ceph.com/docs/v0.80.5/start/

https://www.raspberrypi.org/

14 Comments

  1. Hi,

    This looks like an interesting use of the Raspberry Pi, but I wonder if this is really that cost-effective of a solution?

    When I crunched the numbers, it came out to about $1 / GB of storage, if you maxed out your nodes with 4 128GB drives and had 3 replicas… but it seems like, once you need to scale above a TB or so of storage, it’s more cost effective to just build “real” servers using spinning drives at a much higher capacity per node?

    • Of course, this is more of a proof-of-concept for learning ceph. Not meant to be cheaper per GB, but cheaper for initial cost. A x86_64 ceph cluster with 10Gbit networking costs 5 figures. This is a 3 figure cost of entry way to begin learning ceph.

  2. Hey, it’s working now on my 3 Raspberry Pi 2s too, with a SaltStack implementation and an automatic installation script :)!
    Thanks for this documentation!
    I overlooked that you changed the sources.list twice, and only for the Ceph installation, on my first try.

    • Yeah, I automated the install as well. However, I am a fan of making people perform the commands so that they learn, rather than:

      wget bash.sh
      chmod 755 bash.sh
      sudo ./bash.sh

      Teaches bad form (and security)!

      Thanks for going through the tutorial. Is there a link to your implementation for others to use?

  3. Hi Bryan,

    I’m getting stuck at the apt-get install ceph-deploy with the following error:
    Reading state information… Done
    E: Unable to locate package ceph-deploy

    Any thoughts on why this may be? Using Wheezy, also tried Jessie same result.

    ceph and ceph-common have been installed.

    Thanks,
    Niels

  4. just a note: doesn’t work for Debian Jessie. I either have to backport to Wheezy (not optimal) or go through a ton of various hacking and such without using ceph-deploy.

    just a heads up. 😉

  5. Hi Bryan,

    Great article on Ceph installation. I have one problem, at the step of installing ceph-deploy. It is not found in the package, I have tried different revisions of ceph and the package is just not found. The ceph and ceph-common packages are installed fine. What might I be doing wrong?

    Thanks for this intro to a cost effective ceph cluster 🙂

    Cheers,

    Niels

  6. Hi Bryan,
    Very useful article, thanks for posting. I want to implement a storage server in my area where the client side runs Windows. Is it possible to do that with this project?

  7. Thanks a lot for this nice tutorial. Quick question: my deployment fails when I do:
    ceph-deploy mon create-initial

    It connects to the remote host, runs a bunch of stuff, then comes up with this error:
    Failed to execute command: sudo systemctl enable ceph.target

    I’m stuck; don’t know what to do next. If I run that command manually, I get the same message.
    Failed to execute operation: No such file or directory

  8. HI Bryan,

    Do you have any experience with the Ubuntu Mate on arm processor? I have the new Odroid which is much better (hardware-wise) than Rpi and I have trouble getting stuff to work. It installs CEPH just fine from repositories, but then.. I’m stuck 🙂
    Any advice?
