From 441a5de8fe9873576bb1067f3f61cfdaef6664b8 Mon Sep 17 00:00:00 2001
From: Florian Haas
Date: Tue, 26 Nov 2019 18:25:12 +0100
Subject: [PATCH] doc: RBD exclusive locks

A discussion on the ceph-users list uncovered a bit of uncertainty
about how exclusive locking works:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg56910.html

Add a bit of background information about exclusive locks, and
cross-reference the documentation on OpenStack and on CephX.

Signed-off-by: Florian Haas
---
 doc/rbd/rbd-exclusive-locks.rst | 81 +++++++++++++++++++++++++++++++++
 doc/rbd/rbd-openstack.rst       |  5 +-
 doc/rbd/rbd-operations.rst      |  1 +
 3 files changed, 86 insertions(+), 1 deletion(-)
 create mode 100644 doc/rbd/rbd-exclusive-locks.rst

diff --git a/doc/rbd/rbd-exclusive-locks.rst b/doc/rbd/rbd-exclusive-locks.rst
new file mode 100644
index 000000000000..f07651d32eda
--- /dev/null
+++ b/doc/rbd/rbd-exclusive-locks.rst
@@ -0,0 +1,81 @@
+.. _rbd-exclusive-locks:
+
+====================
+ RBD Exclusive Locks
+====================
+
+.. index:: Ceph Block Device; RBD exclusive locks; exclusive-lock
+
+Exclusive locks are a mechanism designed to prevent multiple processes
+from accessing the same RADOS Block Device (RBD) in an uncoordinated
+fashion. Exclusive locks are heavily used in virtualization (where
+they prevent VMs from clobbering each other's writes), and also in RBD
+mirroring (where they are a prerequisite for journaling).
+
+Exclusive locks are enabled on newly created images by default, unless
+overridden via the ``rbd_default_features`` configuration option or
+the ``--image-feature`` flag for ``rbd create``.
+
+To ensure proper exclusive locking operation, any client using an
+RBD image whose ``exclusive-lock`` feature is enabled should use a
+CephX identity whose capabilities include ``profile rbd``.
+
+Exclusive locking is mostly transparent to the user:
+
+#. Whenever any ``librbd`` client process or kernel RBD client
+   starts using an RBD image on which exclusive locking has been
+   enabled, it obtains an exclusive lock on the image before the first
+   write.
+
+#. Whenever any such client process gracefully terminates, it
+   automatically relinquishes the lock.
+
+#. This subsequently enables another process to acquire the lock and
+   write to the image.
+
+Note that it is perfectly possible for two or more concurrently
+running processes to open the image and read from it. A client
+acquires the exclusive lock only when attempting to write to the
+image.
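+
+As an illustration of this behavior, the following minimal sketch
+uses the ``rbd`` Python bindings. The pool name ``rbd``, the image
+name ``test-img``, and the configuration file path are assumptions
+made for this example; the sketch also assumes that the image already
+has the ``exclusive-lock`` feature enabled and that no other client
+is currently using it.
+
+.. code-block:: python
+
+   import rados
+   import rbd
+
+   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
+   cluster.connect()
+   ioctx = cluster.open_ioctx('rbd')      # assumed pool name
+
+   image = rbd.Image(ioctx, 'test-img')   # assumed image name
+   try:
+       # Merely opening and reading the image does not acquire the
+       # exclusive lock.
+       image.read(0, 512)
+       print(image.is_exclusive_lock_owner())   # expected: False
+
+       # The first write triggers acquisition of the exclusive lock.
+       image.write(b'\0' * 512, 0)
+       print(image.is_exclusive_lock_owner())   # expected: True
+   finally:
+       image.close()
+       ioctx.close()
+       cluster.shutdown()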
+
+
+Blacklisting
+============
+
+Sometimes, a client process (or, in the case of a krbd client, a
+client node's kernel thread) that previously held an exclusive lock on
+an image does not terminate gracefully, but dies abruptly. This may
+be because the process received a ``KILL`` or ``ABRT`` signal, for
+example, or because the client node was hard-rebooted or lost power.
+In that case, the exclusive lock is never gracefully released. Thus,
+when a new process starts and attempts to use the device, it needs a
+way to break the previously held exclusive lock.
+
+However, a process (or kernel thread) may also hang, or merely lose
+network connectivity to the Ceph cluster for some amount of time. In
+that case, simply breaking the lock would be potentially catastrophic:
+the hung process or connectivity issue may resolve itself, and the old
+process may then compete with the one that started in the interim,
+accessing RBD data in an uncoordinated and destructive manner.
+
+Thus, in the event that a lock cannot be acquired in the standard
+graceful manner, the overtaking process not only breaks the lock, but
+also blacklists the previous lock holder. This is negotiated between
+the new client process and the Ceph Monitor: upon receiving the
+blacklist request,
+
+* the Mon instructs the relevant OSDs to no longer serve requests from
+  the old client process;
+* once the associated OSD map update is complete, the Mon grants the
+  lock to the new client;
+* once the new client has acquired the lock, it can commence writing
+  to the image.
+
+Blacklisting is thus a form of storage-level resource `fencing`_.
+
+In order for blacklisting to work, the client must have the ``osd
+blacklist`` capability. This capability is included in the ``profile
+rbd`` capability profile, which should generally be set on all Ceph
+:ref:`client identities <user-management>` using RBD.
+
+.. _fencing: https://en.wikipedia.org/wiki/Fencing_(computing)
diff --git a/doc/rbd/rbd-openstack.rst b/doc/rbd/rbd-openstack.rst
index 3ee2359d0ea3..65e74d3bc3a9 100644
--- a/doc/rbd/rbd-openstack.rst
+++ b/doc/rbd/rbd-openstack.rst
@@ -55,7 +55,10 @@ Three parts of OpenStack integrate with Ceph's block devices:
   advantageous because it allows you to perform maintenance operations easily
   with the live-migration process. Additionally, if your hypervisor dies it is
   also convenient to trigger ``nova evacuate`` and run the virtual machine
-  elsewhere almost seamlessly.
+  elsewhere almost seamlessly. In doing so,
+  :ref:`exclusive locks <rbd-exclusive-locks>` prevent multiple
+  compute nodes from concurrently accessing the guest disk.
+

 You can use OpenStack Glance to store images in a Ceph Block Device, and you
 can use Cinder to boot a VM using a copy-on-write clone of an image.
diff --git a/doc/rbd/rbd-operations.rst b/doc/rbd/rbd-operations.rst
index d6489667a19a..92694ea7cfc8 100644
--- a/doc/rbd/rbd-operations.rst
+++ b/doc/rbd/rbd-operations.rst
@@ -6,6 +6,7 @@
    :maxdepth: 1

    Snapshots
+   Exclusive Locking <rbd-exclusive-locks>
    Mirroring
    Live-Migration
    Persistent Cache
--
2.47.3