From 4b4aa6a59513b91b005039d8d50849edefef57a7 Mon Sep 17 00:00:00 2001
From: Sidharth Anupkrishnan
Date: Tue, 14 Jan 2020 19:05:47 +0530
Subject: [PATCH] mds: add ephemeral pinning for subtrees

This PR introduces the inode xattrs export_ephemeral_random and
export_ephemeral_distributed, which enable two different metadata
distribution strategies. The first suits depthwise scaling of metadata
(the height of the tree keeps increasing) and the second suits
horizontal scaling (many subtrees under a single parent).

export_ephemeral_distributed is not hierarchical. Any direct descendant
directory (i.e. a child directory) has an ephemeral export pin applied
to it according to a consistent hash of the child directory's inode
number.

export_ephemeral_random is hierarchical like "export_pin". Any CDir
loaded into the cache may be ephemerally pinned to a random rank. As
with "export_ephemeral_distributed", the rank is determined by a
consistent hash.

Both strategies use John Lamping and Eric Veach's Jump Consistent
Hashing as the consistent hash algorithm. This algorithm eliminates the
need to store data structures representing the consistent hash cluster
state, and it performs as well as Akamai's original implementation
while providing a fairly uniform distribution. It only works for
distributed systems whose buckets (nodes) are numbered in ascending
order and in which cluster resizes do not leave holes in the numbering,
e.g. (0, 1, 2, 3) --[removing node 1]--> (0, 1, 2). CephFS satisfies
these conditions because the MDSs are arranged as numbered ranks and
cluster modifications do not produce holes in the resulting arrangement
of ranks.
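For readers unfamiliar with Jump Consistent Hashing, the following is a
minimal sketch of the algorithm as published by Lamping and Veach. It is
not copied from this patch: the function names jump_consistent_hash and
pick_rank, and the idea of feeding the inode number in directly as the
key, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Jump consistent hash (Lamping and Veach).
// Maps a 64-bit key to a bucket in [0, num_buckets) without storing any
// per-cluster state. Illustrative sketch; not the code added by this patch.
inline int32_t jump_consistent_hash(uint64_t key, int32_t num_buckets) {
  assert(num_buckets > 0);
  int64_t b = -1;  // last bucket the key "jumped" to
  int64_t j = 0;   // next candidate bucket
  while (j < num_buckets) {
    b = j;
    // 64-bit linear congruential step used as a cheap PRNG seeded by key.
    key = key * 2862933555777941757ULL + 1;
    // Jump forward by a pseudo-random factor >= 1, so j always exceeds b.
    j = static_cast<int64_t>(
        static_cast<double>(b + 1) *
        (static_cast<double>(1LL << 31) /
         static_cast<double>((key >> 33) + 1)));
  }
  return static_cast<int32_t>(b);
}

// Hypothetical helper: choose an MDS rank for a directory inode number,
// assuming ranks are numbered 0..max_mds-1 with no holes.
inline int32_t pick_rank(uint64_t ino, int32_t max_mds) {
  return jump_consistent_hash(ino, max_mds);
}
```

When the number of buckets grows from n to n+1, a key either keeps its
bucket or moves to the new bucket n, which is why the scheme depends on
the ranks being numbered contiguously.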

Fixes: https://tracker.ceph.com/issues/41302
Signed-off-by: Sidharth Anupkrishnan
Signed-off-by: Patrick Donnelly
(cherry picked from commit ced15ed7ef70ff832d4bebedecb89944276b0395)

Conflicts:
	src/mds/MDSRank.cc
---
 src/common/options.cc      |  12 +++
 src/mds/CDir.cc            |   1 +
 src/mds/CInode.cc          | 170 ++++++++++++++++++++++++++++++++++++-
 src/mds/CInode.h           |  24 ++++++
 src/mds/MDBalancer.cc      |   9 +-
 src/mds/MDCache.cc         |  72 +++++++++++++++-
 src/mds/MDCache.h          |  17 +++-
 src/mds/MDSRank.cc         |   4 +-
 src/mds/Migrator.cc        |  30 ++++++-
 src/mds/Server.cc          |  43 ++++++++++
 src/mds/events/EMetaBlob.h |   9 +-
 src/mds/journal.cc         |   5 ++
 src/mds/mdstypes.h         |  20 ++++-
 13 files changed, 405 insertions(+), 11 deletions(-)

diff --git a/src/common/options.cc b/src/common/options.cc
index 331dac6a21104..c456ea173a87a 100644
--- a/src/common/options.cc
+++ b/src/common/options.cc
@@ -7831,6 +7831,18 @@ std::vector