RADOS Objects in RBD
RADOS block devices (RBD) offer block device-like access on top of RADOS. A common use case is virtual machine images, for which the IO path would look like the following:
- Application writes on guest filesystem
- Filesystem accesses a block device on a volume manager like LVM
- Volume manager uses a virtual block device provided by the hypervisor
- Hypervisor implements the virtual block device with librbd
- Librbd converts block device accesses into RADOS object accesses
- The RADOS client communicates with OSDs to store the objects
- The OSD uses its object store backend to persist objects
- The FileStore (default object store) stores objects on a filesystem and journal
- Filesystem and journal are on disk/SSD
RBD is available as a Linux kernel module, as a FUSE filesystem, and as the librbd library, which many clients such as QEMU and OpenStack Cinder use.
RBD clients usually use a single pool to store multiple images. Each image is consistently named, with blocks striped across multiple objects. Along with the block data objects there are also well-known metadata objects, used for synchronization, settings, and as a directory for images.
Metadata objects
rbd_directory
List of RBD images in this pool. Implemented as an object map containing key/value pairs for id to name mappings and vice versa.
rbd_children
Used when cloning. An attached object map maps a parent image to a list of child images.
rbd_lock
Used for RBD locking operations.
rbd_pool_settings
If present, contains pool-specific settings, for example for RBD mirroring.
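The two-way lookup that rbd_directory provides can be illustrated with a toy in-memory stand-in. This is only a sketch: the dictionary replaces the real object map, the function names are hypothetical, and the `name_` / `id_` key prefixes are an assumption made for illustration, not something stated in this article.

```python
# Toy stand-in for the rbd_directory object map (a plain dict here).
# ASSUMPTION: keys use "name_" and "id_" prefixes; the real key layout
# is not described in this article.

def dir_add_image(omap: dict, name: str, image_id: str) -> None:
    """Register an image so it can be looked up by name and by ID."""
    name_key = f"name_{name}"
    id_key = f"id_{image_id}"
    if name_key in omap or id_key in omap:
        raise KeyError("image name or ID already registered")
    omap[name_key] = image_id   # name -> ID
    omap[id_key] = name         # ID -> name


def dir_get_id(omap: dict, name: str) -> str:
    """Resolve a human-readable image name to its ID."""
    return omap[f"name_{name}"]


def dir_get_name(omap: dict, image_id: str) -> str:
    """Resolve an image ID back to its name."""
    return omap[f"id_{image_id}"]


def dir_list(omap: dict) -> list:
    """List all image names in the pool, sorted."""
    prefix = "name_"
    return sorted(k[len(prefix):] for k in omap if k.startswith(prefix))
```

Storing both directions in one map keeps either lookup a single key read, at the cost of having to update two keys together when an image is created, renamed, or removed.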
Images
Images come in two flavors: old style and new style. The data and metadata object names and the supported features differ between the two. Here are the per-image new-style object names:
rbd_id.<NAME>
Contains the ID of the image. The name is a human readable string set when the image is created.
rbd_header.<ID>
The image header, in the form of an object map. Contains settings such as the enabled features, the prefix used for data objects, the image size, and the layout settings for striping the objects.
rbd_object_map
Optional, and only used when the object-map feature is enabled. It tracks allocation and speeds up cloning.
rbd_data.<ID>.<STRIPE>
The data objects
Data Objects
Data objects have the following format: rbd_data.ID.STRIPE. The ID is hex encoded. rbd_directory stores the mapping from ID to name. STRIPE is a zero-padded, hex-encoded number.
By default, RBD fills 4 MB objects sequentially. Changing the default requires RBD version 2 with the stripingv2 feature enabled. Images are sparse; an object only exists when non-zero blocks are in its range.
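With the default layout, finding the data object for a byte offset is a plain division by the object size. The sketch below illustrates this with hypothetical helper names. The padding width of 16 hex digits for STRIPE is an assumption, as the article only says the number is zero padded.

```python
# Sketch of the default (sequential) layout described above.
DEFAULT_OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB default object size


def data_object_name(image_id: str, object_no: int, width: int = 16) -> str:
    """Build rbd_data.<ID>.<STRIPE> with a zero-padded hex object number.

    ASSUMPTION: width=16 hex digits; the article does not give the width.
    """
    return f"rbd_data.{image_id}.{object_no:0{width}x}"


def locate(offset: int, object_size: int = DEFAULT_OBJECT_SIZE) -> tuple:
    """Return (object number, offset inside that object) for an image offset."""
    return divmod(offset, object_size)


def objects_for_extent(image_id: str, offset: int, length: int,
                       object_size: int = DEFAULT_OBJECT_SIZE) -> list:
    """Names of all data objects touched by the byte range [offset, offset+length)."""
    if length <= 0:
        return []
    first = offset // object_size
    last = (offset + length - 1) // object_size
    return [data_object_name(image_id, n) for n in range(first, last + 1)]
```

Because images are sparse, a name returned by `objects_for_extent` need not exist in the pool; a reader would treat a missing object as a range of zeros.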
Tools
rbd
- Command line tool that exposes most librbd operations
rbd-fuse
- FUSE filesystem. Presents images as files.
Clients
OpenStack Cinder
OpenStack Cinder has two RADOS clients: a volume driver and a backup driver. Both use RBD, albeit with different defaults.
Volume
Backup
Uses the volume driver, but enables additional RBD features: RBD_FEATURE_LAYERING and RBD_FEATURE_STRIPINGV2 to set the following default striper layout [3]:
rbd_stripe_unit
- 0
rbd_stripe_count
- 0
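To show what a stripe unit and stripe count do to the object layout, here is a minimal sketch of a striped offset-to-object mapping. It rests on one assumption: a value of 0 for either setting is taken to mean "use the default", that is, a stripe unit equal to the object size and a stripe count of 1, which reduces to sequential filling. The function names are hypothetical.

```python
def resolve_layout(object_size: int, stripe_unit: int = 0, stripe_count: int = 0) -> tuple:
    """Replace zero values with defaults.

    ASSUMPTION: 0 means "default", i.e. stripe_unit = object_size and
    stripe_count = 1.
    """
    if stripe_unit == 0:
        stripe_unit = object_size
    if stripe_count == 0:
        stripe_count = 1
    if object_size % stripe_unit != 0:
        raise ValueError("object size must be a multiple of the stripe unit")
    return stripe_unit, stripe_count


def map_offset(offset: int, object_size: int,
               stripe_unit: int = 0, stripe_count: int = 0) -> tuple:
    """Map an image byte offset to (object number, offset inside the object)."""
    su, sc = resolve_layout(object_size, stripe_unit, stripe_count)
    units_per_object = object_size // su
    unit_no = offset // su                 # index of the stripe unit in the image
    stripe_no = unit_no // sc              # row across the object set
    stripe_pos = unit_no % sc              # column: which object in the set
    object_set = stripe_no // units_per_object
    object_no = object_set * sc + stripe_pos
    in_object = (stripe_no % units_per_object) * su + offset % su
    return object_no, in_object
```

With a 4 MB object size, a 1 MB stripe unit, and a stripe count of 2, consecutive 1 MB chunks alternate between objects 0 and 1 until both are full, and only then move on to objects 2 and 3.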
QEMU RBD
QEMU has built-in RBD support [4].
It supports the following options:
BLOCK_OPT_CLUSTER_SIZE
- RBD object size. Defaults to the RADOS default (4 MB)
BLOCK_OPT_SIZE
- Virtual disk size
The stripe layouts are not customizable.
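The relationship between these two options and the resulting data objects can be sketched as follows. This is an illustration rather than QEMU's code: the helper names are hypothetical, and the validation rules (power of two, at least 4096 bytes) and the representation of the object size as a power-of-two order are assumptions.

```python
DEFAULT_CLUSTER_SIZE = 4 * 1024 * 1024  # 4 MB default object size


def object_order(cluster_size: int = DEFAULT_CLUSTER_SIZE) -> int:
    """Return n such that the object size is 2**n bytes.

    ASSUMPTION: the size must be a power of two and at least 4096 bytes.
    """
    if cluster_size < 4096 or cluster_size & (cluster_size - 1):
        raise ValueError("object size must be a power of two >= 4096")
    return cluster_size.bit_length() - 1


def max_data_objects(disk_size: int, order: int) -> int:
    """Upper bound on data objects for a fully written image of disk_size bytes."""
    object_size = 1 << order
    return (disk_size + object_size - 1) // object_size  # round up
```

For a 10 GB virtual disk with the default object size this gives an order of 22 and at most 2560 data objects; a sparse image will usually hold fewer.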
Footnotes
Source: Version for this article, Master Branch
Source: Version for this article, Master branch