doc: RBD exclusive locks

author Florian Haas <florian@citynetwork.eu>

Tue, 26 Nov 2019 17:25:12 +0000 (18:25 +0100)

committer Florian Haas <florian@citynetwork.eu>

Wed, 4 Dec 2019 15:03:08 +0000 (16:03 +0100)
diff --git a/doc/rbd/rbd-exclusive-locks.rst b/doc/rbd/rbd-exclusive-locks.rst
new file mode 100644
index 0000000..f07651d
--- /dev/null
+++ b/doc/rbd/rbd-exclusive-locks.rst
@@ -0,0 +1,81 @@
+.. _rbd-exclusive-locks:
+
+====================
+ RBD Exclusive Locks
+====================
+
+.. index:: Ceph Block Device; RBD exclusive locks; exclusive-lock
+
+Exclusive locks are a mechanism designed to prevent multiple processes
+from accessing the same RADOS Block Device (RBD) in an uncoordinated
+fashion. Exclusive locks are heavily used in virtualization (where
+they prevent VMs from clobbering each other's writes), and also in RBD
+mirroring (where they are a prerequisite for journaling).
+
+Exclusive locks are enabled on newly created images by default, unless
+overridden via the ``rbd_default_features`` configuration option or
+the ``--image-feature`` flag for ``rbd create``.
+
+In order to ensure proper exclusive locking operations, any client
+using an RBD image whose ``exclusive-lock`` feature is enabled should
+be using a CephX identity whose capabilities include ``profile rbd``.
+
+Exclusive locking is mostly transparent to the user.
+
+#. Whenever any ``librbd`` client process or kernel RBD client
+   starts using an RBD image on which exclusive locking has been
+   enabled, it obtains an exclusive lock on the image before the first
+   write.
+
+#. Whenever any such client process gracefully terminates, it
+   automatically relinquishes the lock.
+
+#. This subsequently enables another process to acquire the lock, and
+   write to the image.
+
+Note that it is perfectly possible for two or more concurrently
+running processes to merely open the image, and also to read from
+it. The client acquires the exclusive lock only when attempting to
+write to the image.
+
+
+Blacklisting
+============
+
+Sometimes, a client process (or, in the case of a krbd client, a client
+node's kernel thread) that previously held an exclusive lock on an
+image does not terminate gracefully, but dies abruptly. This may be
+due to having received a ``KILL`` or ``ABRT`` signal, for example, or
+a hard reboot or power failure of the client node. In that case, the
+exclusive lock is never gracefully released. Thus, when a new process
+starts and attempts to use the device, it needs a way to break the
+previously held exclusive lock.
+
+However, a process (or kernel thread) may also hang, or merely lose
+network connectivity to the Ceph cluster for some amount of time. In
+that case, simply breaking the lock would be potentially catastrophic:
+the hung process or connectivity issue may resolve itself, and the old
+process may then compete with one that has started in the interim,
+accessing RBD data in an uncoordinated and destructive manner.
+
+Thus, in the event that a lock cannot be acquired in the standard
+graceful manner, the overtaking process not only breaks the lock, but
+also blacklists the previous lock holder. This is negotiated between
+the new client process and the Ceph Mon: upon receiving the blacklist
+request,
+
+* the Mon instructs the relevant OSDs to no longer serve requests from
+  the old client process;
+* once the associated OSD map update is complete, the Mon grants the
+  lock to the new client;
+* once the new client has acquired the lock, it can commence writing
+  to the image.
+
+Blacklisting is thus a form of storage-level resource `fencing`_.
+
+In order for blacklisting to work, the client must have the ``osd
+blacklist`` capability. This capability is included in the ``profile
+rbd`` capability profile, which should generally be set on all Ceph
+:ref:`client identities <user-management>` using RBD.
+
+.. _fencing: https://en.wikipedia.org/wiki/Fencing_(computing)
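
The lock handover described in the new ``rbd-exclusive-locks.rst`` above can be modelled as a toy state machine. The sketch below is purely illustrative and is not part of the patch: ``ToyCluster``, its methods, and the client names ``vm-a`` and ``vm-b`` are hypothetical and do not correspond to any librbd, krbd, or Ceph Mon API. It only mirrors the sequence in the text: reads need no lock, the lock is taken before the first write, a graceful exit releases it, and a dead holder is blacklisted before the lock changes hands.

```python
class ToyCluster:
    """Hypothetical in-memory model of one RBD image's exclusive lock."""

    def __init__(self):
        self.lock_owner = None   # client currently holding the lock
        self.blacklist = set()   # clients whose requests are refused
        self.data = []           # writes accepted by the image

    def read(self, client):
        # Opening and reading does not require the exclusive lock.
        if client in self.blacklist:
            raise PermissionError(f"{client} is blacklisted")
        return list(self.data)

    def write(self, client, payload):
        # Blacklisted clients are fenced off: their requests are rejected.
        if client in self.blacklist:
            raise PermissionError(f"{client} is blacklisted")
        # The lock is acquired lazily, before the first write.
        if self.lock_owner is None:
            self.lock_owner = client
        if self.lock_owner != client:
            raise RuntimeError(f"lock held by {self.lock_owner}")
        self.data.append((client, payload))

    def release(self, client):
        # Graceful termination: the owner relinquishes the lock.
        if self.lock_owner == client:
            self.lock_owner = None

    def break_lock(self, new_client):
        # Fence first, then hand over: the old holder is blacklisted
        # before the new client becomes the lock owner.
        old = self.lock_owner
        if old is not None and old != new_client:
            self.blacklist.add(old)
        self.lock_owner = new_client


def demo():
    cluster = ToyCluster()
    cluster.write("vm-a", "block 0")    # vm-a acquires the lock
    cluster.break_lock("vm-b")          # vm-a is presumed dead
    cluster.write("vm-b", "block 1")    # vm-b now owns the lock
    try:
        cluster.write("vm-a", "stale")  # vm-a resurfaces after a hang
    except PermissionError:
        fenced = True
    else:
        fenced = False
    return cluster.data, fenced
```

In this model, breaking the lock without adding the old holder to ``blacklist`` would let the resurfacing ``vm-a`` reach the ownership check only, which is the weaker guarantee the Blacklisting section warns about; fencing rejects the stale client before it can touch the image at all.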
diff --git a/doc/rbd/rbd-openstack.rst b/doc/rbd/rbd-openstack.rst
index 3ee2359d0ea34cc89c457ee6ccaf0a63fa9621fa..65e74d3bc3a9dbff5a1aa00783ecd7f944e1f826 100644
--- a/doc/rbd/rbd-openstack.rst
+++ b/doc/rbd/rbd-openstack.rst
@@ -55,7 +55,10 @@ Three parts of OpenStack integrate with Ceph's block devices:
    advantageous because it allows you to perform maintenance operations easily
    with the live-migration process. Additionally, if your hypervisor dies it is
    also convenient to trigger ``nova evacuate`` and  run the virtual machine
-  elsewhere almost seamlessly.
+  elsewhere almost seamlessly. In doing so,
+  :ref:`exclusive locks <rbd-exclusive-locks>` prevent multiple
+  compute nodes from concurrently accessing the guest disk.
+
  
  You can use OpenStack Glance to store images in a Ceph Block Device, and you
  can use Cinder to boot a VM using a copy-on-write clone of an image.
diff --git a/doc/rbd/rbd-operations.rst b/doc/rbd/rbd-operations.rst
index d6489667a19a3889ea7fef9cda82858c80e46ab5..92694ea7cfc80ea3779dfec73d2321fd3443e293 100644
--- a/doc/rbd/rbd-operations.rst
+++ b/doc/rbd/rbd-operations.rst
@@ -6,6 +6,7 @@
     :maxdepth: 1
  
     Snapshots<rbd-snapshot>
+   Exclusive Locking <rbd-exclusive-locks>
     Mirroring <rbd-mirroring>
     Live-Migration <rbd-live-migration>
     Persistent Cache <rbd-persistent-cache>