From d862d74a0988d7d29ea5f00b036625c84bb0318a Mon Sep 17 00:00:00 2001
From: Patrick Donnelly
Date: Tue, 31 Dec 2024 14:46:03 -0500
Subject: [PATCH] doc: add docs for CephFS charmap config

Signed-off-by: Patrick Donnelly
---
 PendingReleaseNotes    |   4 +
 doc/cephfs/charmap.rst | 187 +++++++++++++++++++++++++++++++++++++++++
 doc/cephfs/index.rst   |   1 +
 3 files changed, 192 insertions(+)
 create mode 100644 doc/cephfs/charmap.rst

diff --git a/PendingReleaseNotes b/PendingReleaseNotes
index 146b7e18ac033..e14ad5bf5a19f 100644
--- a/PendingReleaseNotes
+++ b/PendingReleaseNotes
@@ -27,6 +27,10 @@
   of the column showing the state of a group snapshot in the unformatted CLI
   output is changed from 'STATUS' to 'STATE'. The state of a group snapshot
   that was shown as 'ok' is now shown as 'complete', which is more descriptive.
+* CephFS: Directories may now be configured with case-insensitive or
+  normalized directory entry names. This is an inheritable configuration making
+  it apply to an entire directory tree. For more information, see
+  https://docs.ceph.com/en/latest/cephfs/charmap/
 * Based on tests performed at scale on an HDD based Ceph cluster, it was found
   that scheduling with mClock was not optimal with multiple OSD shards. For
   example, in the test cluster with multiple OSD node failures, the client
diff --git a/doc/cephfs/charmap.rst b/doc/cephfs/charmap.rst
new file mode 100644
index 0000000000000..d49b2706778de
--- /dev/null
+++ b/doc/cephfs/charmap.rst
@@ -0,0 +1,187 @@
+.. _charmap:
+
+CephFS Directory Entry Name Normalization and Case Folding
+==========================================================
+
+CephFS allows configuring directory trees to **normalize** and possibly **case
+fold** directory entry names. This is typically a useful property for file
+systems exported by gateways like Samba which enforce a case-insensitive view
+of the file system, typically with performance penalties on file systems which
+are not case-insensitive.
+
+The following virtual extended attributes control the **character mapping**
+rules for directory entries:
+
+* ``ceph.dir.casesensitive``: A boolean setting for the case sensitivity of the directory. If false, case fold the directory entry names.
+* ``ceph.dir.normalization``: A string setting for the type of Unicode normalization to apply for directory entry names. Currently the normalization forms D (``nfd``), C (``nfc``), KD (``nfkd``), and KC (``nfkc``) are understood by the client.
+* ``ceph.dir.encoding``: A string setting for the encoding to use and enforce for directory entry names. The default and presently only supported encoding is UTF-8 (``utf8``).
+
+There is also a convenience virtual extended attribute that is useful for
+getting the JSON encoding of the case sensitivity, normalization, and encoding
+configurations:
+
+* ``ceph.dir.charmap``: The complete character mapping configuration for a directory.
+
+It can also be used to **remove** all settings and restore the default CephFS behavior
+for directory entry names: uninterpreted bytes without ``/`` that are NUL terminated.
+
+Note the following restrictions on manipulating any of these extended attributes:
+
+* The directory must be empty.
+* The directory must not be part of a snapshot.
+
+New subdirectories created under a directory with a ``charmap`` configuration will
+inherit (copy) the parent's configuration.
+
+.. note:: You can remove a ``charmap`` on a subdirectory which inherited
+          the configuration so long as the preconditions apply: it is empty
+          and not part of an existing snapshot.
+
+
+Normalization
+-------------
+
+The ``ceph.dir.normalization`` attribute accepts the following normalization forms:
+
+* **nfd**: Form D (Canonical Decomposition)
+* **nfc**: Form C (Canonical Decomposition, followed by Canonical Composition)
+* **nfkd**: Form KD (Compatibility Decomposition)
+* **nfkc**: Form KC (Compatibility Decomposition, followed by Canonical Composition)
+
+The default normalization for a character mapping configuration is ``nfd``.
+
+.. note:: For more information about Unicode normalization forms, please see `Unicode normalization standard documents`_.
+
+Whenever a directory entry name is generated during path traversal or lookup,
+the client will apply the normalization to the name before submitting any
+operation to the MDS. On the MDS side, the directory entry names which
+are stored are only these normalized names.
+
+For example, to set the normalization on a directory:
+
+::
+
+    $ setfattr -n ceph.dir.normalization -v "" foo/
+
+    $ getfattr -n ceph.dir.charmap foo/
+    # file: foo/
+    ceph.dir.charmap="{\"casesensitive\":true,\"normalization\":\"nfd\",\"encoding\":\"utf8\"}"
+
+    $ getfattr -n ceph.dir.normalization foo/
+    # file: foo/
+    ceph.dir.normalization="nfd"
+
+.. note:: Setting the empty string will cause the MDS to pick the default normalization.
+
+All character mapping configurations must have a normalization enabled. Removing the normalization
+will cause the default to be restored:
+
+::
+
+    $ setfattr -n ceph.dir.normalization -v nfc foo/
+    $ getfattr -n ceph.dir.normalization foo/
+    # file: foo/
+    ceph.dir.normalization="nfc"
+
+    $ setfattr -x ceph.dir.normalization foo/
+    $ getfattr -n ceph.dir.normalization foo/
+    # file: foo/
+    ceph.dir.normalization="nfd"
+
+To remove normalization on a directory, you must remove the ``ceph.dir.charmap``
+configuration.
+
+.. note:: The MDS maintains an ``alternate_name`` metadata (also used for
+          encryption) for directory entries which allows the client to persist the
+          original un-normalized name used by the application. The MDS does not
+          interpret this metadata in any way; it's only used by clients to reconstruct
+          the original name of the directory entry.
+
+
+Case Folding
+------------
+
+The ``ceph.dir.casesensitive`` attribute accepts a boolean value. By
+default, names are case-sensitive (as normal in a POSIX file system). Setting
+this value to false will make the directory (and its children)
+case-insensitive.
+
+Case folding requires that names are also normalized. By default, after setting
+a directory to be case-insensitive, the ``charmap`` will be:
+
+::
+
+    $ setfattr -n ceph.dir.casesensitive -v 0 foo/
+    $ getfattr -n ceph.dir.casesensitive foo/
+    # file: foo/
+    ceph.dir.casesensitive="0"
+
+    $ getfattr -n ceph.dir.charmap foo/
+    # file: foo/
+    ceph.dir.charmap="{\"casesensitive\":false,\"normalization\":\"nfd\",\"encoding\":\"utf8\"}"
+
+Note that setting the case sensitivity on a directory will cause the default
+normalization to be selected.
+
+.. note:: Normalization is applied before case folding. The directory entry name used
+          by the MDS is the case folded and normalized name.
+
+
+Removing Character Mapping
+--------------------------
+
+If a directory is empty and not part of a snapshot, the ``charmap`` can be
+removed:
+
+::
+
+    $ setfattr -x ceph.dir.charmap foo/
+
+One can confirm that this restores the normal CephFS behavior:
+
+::
+
+    $ getfattr -n ceph.dir.charmap foo/
+    foo/: ceph.dir.charmap: No such attribute
+
+If the attribute does not exist, then there is no character mapping for the
+directory. Note that a (future) child or parent directory may have a charmap
+configuration but it will have no effect on this directory. A charmap
+configuration is only inherited at directory creation.
+
+
+.. note:: The default charmap includes normalization that cannot be disabled.
+          The only way to turn off this functionality is by removing
+          this ``charmap`` virtual extended attribute.
+
+
+Restricting Incompatible Client Access
+--------------------------------------
+
+The MDS protects access to directory trees with a ``charmap`` via a new client
+feature bit. The MDS will not allow a client that does not understand the
+``charmap`` feature to modify a directory with a ``charmap`` configuration
+except to unlink files or remove subdirectories.
+
+You can also require that all clients understand the ``charmap`` feature
+to use the file system at all:
+
+.. prompt:: bash #
+
+   ceph fs required_client_features add charmap
+
+.. note:: The kernel driver does not understand the ``charmap`` feature
+          and probably will not because existing kernel libraries have
+          opinionated case folding and normalization forms. For this reason,
+          adding ``charmap`` to the required client features is not
+          recommended.
+
+Permissions
+-----------
+
+As with other CephFS virtual extended attributes, a client may only set the
+``charmap`` configuration on a directory with the **p** MDS auth cap. Viewing
+the configuration does not require this cap.
+
+
+.. _Unicode normalization standard documents: https://unicode.org/reports/tr15/
diff --git a/doc/cephfs/index.rst b/doc/cephfs/index.rst
index 630d29f1956be..a1ec0533188d5 100644
--- a/doc/cephfs/index.rst
+++ b/doc/cephfs/index.rst
@@ -145,6 +145,7 @@ CephFS Concepts
     Distributed Metadata Cache
     Dynamic Metadata Management in CephFS
     CephFS IO Path
+    Case Sensitivity and Normalization
     LazyIO
     Directory fragmentation
     Multiple active MDS daemons
-- 
2.39.5
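
The ordering described in the new page (normalization is applied first, then case folding when ``casesensitive`` is false) can be sketched in Python. This is only an illustration under stated assumptions: the helper name ``stored_name`` is hypothetical, and Python's ``unicodedata.normalize`` and ``str.casefold`` stand in for whatever the CephFS client actually uses, so results may differ from the real client for some characters.

```python
import unicodedata


def stored_name(name: str, normalization: str = "nfd",
                casesensitive: bool = True) -> str:
    """Approximate the name the MDS would store for a directory entry.

    Hypothetical helper for illustration; not part of any Ceph API.
    """
    # Step 1: normalize. The charmap forms nfd/nfc/nfkd/nfkc map onto the
    # NFD/NFC/NFKD/NFKC names that unicodedata.normalize expects.
    normalized = unicodedata.normalize(normalization.upper(), name)
    if casesensitive:
        return normalized
    # Step 2: case fold the already-normalized name.
    return normalized.casefold()


# Precomposed and decomposed spellings collapse to one name under nfd.
print(stored_name("Caf\u00e9") == stored_name("Cafe\u0301"))
# With casesensitive=False, names differing only in case also collapse.
print(stored_name("CAFE\u0301", casesensitive=False)
      == stored_name("caf\u00e9", casesensitive=False))
```

Two names that produce the same result here would, under these assumptions, refer to the same directory entry in a directory with the corresponding ``charmap``.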