mds: improve eviction usability and update docs

author John Spray <john.spray@redhat.com>

Mon, 24 Apr 2017 21:19:43 +0000 (17:19 -0400)

committer John Spray <john.spray@redhat.com>

Tue, 23 May 2017 09:22:17 +0000 (05:22 -0400)
diff --git a/doc/cephfs/eviction.rst b/doc/cephfs/eviction.rst

index a881dc6af4c1c05e85f98b32ec30ec95f4ce29f6..e1400a1cec2e8b4e42990e06e013405f2fedff66 100644 (file)
--- a/doc/cephfs/eviction.rst
+++ b/doc/cephfs/eviction.rst
@@ -1,4 +1,5 @@
  
+===============================
  Ceph filesystem client eviction
  ===============================
  
@@ -6,122 +7,139 @@ When a filesystem client is unresponsive or otherwise misbehaving, it
  may be necessary to forcibly terminate its access to the filesystem.  This
  process is called *eviction*.
  
-This process is somewhat thorough in order to protect against data inconsistency
-resulting from misbehaving clients.
+Evicting a CephFS client prevents it from communicating further with MDS
+daemons and OSD daemons.  If a client was doing buffered IO to the filesystem,
+any un-flushed data will be lost.
+
+Clients may either be evicted automatically (if they fail to communicate
+promptly with the MDS), or manually (by the system administrator).
+
+The client eviction process applies to clients of all kinds, including
+FUSE mounts, kernel mounts, nfs-ganesha gateways, and any process using
+libcephfs.
+
+Automatic client eviction
+=========================
+
+There are two situations in which a client may be evicted automatically:
+
+On an active MDS daemon, if a client has not communicated with the MDS for
+over ``mds_session_autoclose`` seconds (300 seconds by default), then it
+will be evicted automatically.
+
+During MDS startup (including on failover), the MDS passes through a
+state called ``reconnect``.  During this state, it waits for all the
+clients to connect to the new MDS daemon.  If any clients fail to do
+so within the time window (``mds_reconnect_timeout``, 45 seconds by default)
+then they will be evicted.
+
+A warning message is sent to the cluster log if either of these situations
+arises.
  
-OSD blacklisting
-----------------
+Manual client eviction
+======================
  
-First, prevent the client from performing any more data operations by *blacklisting*
-it at the RADOS level.  You may be familiar with this concept as *fencing* in other
-storage systems.
+Sometimes, the administrator may want to evict a client manually.  This
+could happen if a client has died and the administrator does not
+want to wait for its session to time out, or it could happen if
+a client is misbehaving and the administrator does not have access to
+the client node to unmount it.
  
-Identify the client to evict from the MDS session list:
+It is useful to inspect the list of clients first:
  
  ::
  
-    # ceph daemon mds.a session ls
+    ceph tell mds.0 client ls
+
      [
-        { "id": 4117,
-          "num_leases": 0,
-          "num_caps": 1,
-          "state": "open",
-          "replay_requests": 0,
-          "reconnecting": false,
-          "inst": "client.4117 172.16.79.251:0\/3271",
-          "client_metadata": { "entity_id": "admin",
-              "hostname": "fedoravm.localdomain",
-              "mount_point": "\/home\/user\/mnt"}}]
-
-In this case the 'fedoravm' client has address ``172.16.79.251:0/3271``, so we blacklist
-it as follows:
+        {
+            "id": 4305,
+            "num_leases": 0,
+            "num_caps": 3,
+            "state": "open",
+            "replay_requests": 0,
+            "completed_requests": 0,
+            "reconnecting": false,
+            "inst": "client.4305 172.21.9.34:0/422650892",
+            "client_metadata": {
+                "ceph_sha1": "ae81e49d369875ac8b569ff3e3c456a31b8f3af5",
+                "ceph_version": "ceph version 12.0.0-1934-gae81e49 (ae81e49d369875ac8b569ff3e3c456a31b8f3af5)",
+                "entity_id": "0",
+                "hostname": "senta04",
+                "mount_point": "/tmp/tmpcMpF1b/mnt.0",
+                "pid": "29377",
+                "root": "/"
+            }
+        }
+    ]
+    
+
+
+Once you have identified the client you want to evict, you can
+evict it using its unique ID, or using other attributes that identify it:
  
  ::
+    
+    # These all work
+    ceph tell mds.0 client evict id=4305
+    ceph tell mds.0 client evict client_metadata.=4305
+
  
-    # ceph osd blacklist add 172.16.79.251:0/3271
-    blacklisting 172.16.79.251:0/3271 until 2014-12-09 13:09:56.569368 (3600 sec)
+Advanced: Un-blacklisting a client
+==================================
  
-OSD epoch barrier
------------------
+Ordinarily, a blacklisted client may not reconnect to the servers: it
+must be unmounted and then mounted anew.
  
-While the evicted client is now marked as blacklisted in the central (mon) copy of the OSD
-map, it is now necessary to ensure that this OSD map update has propagated to all daemons
-involved in subsequent filesystem I/O.  To do this, use the ``osdmap barrier`` MDS admin
-socket command.
+However, in some situations it may be useful to permit a client that
+was evicted to attempt to reconnect.
  
-First read the latest OSD epoch:
+Because CephFS uses the RADOS OSD blacklist to control client eviction,
+CephFS clients can be permitted to reconnect by removing them from
+the blacklist:
  
  ::
  
-    # ceph osd dump
-    epoch 12
-    fsid fd61ca96-53ff-4311-826c-f36b176d69ea
-    created 2014-12-09 12:03:38.595844
-    modified 2014-12-09 12:09:56.619957
-    ...
+    ceph osd blacklist ls
+    # ... identify the address of the client ...
+    ceph osd blacklist rm <address>
  
-In this case it is 12.  Now request the MDS to barrier on this epoch:
+Doing this may put data integrity at risk if other clients have accessed
+files that the blacklisted client was doing buffered IO to.  It is also not
+guaranteed to result in a fully functional client -- the best way to get
+a fully healthy client back after an eviction is to unmount the client
+and do a fresh mount.
  
-::
+If you are trying to reconnect clients in this way, you may also
+find it useful to set ``client_reconnect_stale`` to true in the
+FUSE client, to prompt the client to try to reconnect.
  
-    # ceph daemon mds.a osdmap barrier 12
+Advanced: Configuring blacklisting
+==================================
  
-MDS session eviction
---------------------
+If you are experiencing frequent client evictions, due to slow
+client hosts or an unreliable network, and you cannot fix the underlying
+issue, then you may want to ask the MDS to be less strict.
  
-Finally, it is safe to evict the client's MDS session, such that any capabilities it held
-may be issued to other clients.  The ID here is the ``id`` attribute from the ``session ls``
-output:
+It is possible to respond to slow clients by simply dropping their
+MDS sessions, while still permitting them to re-open sessions and
+to continue talking to OSDs.  To enable this mode, set
+``mds_session_blacklist_on_timeout`` to false on your MDS nodes.
  
-::
+For the equivalent behaviour on manual evictions, set
+``mds_session_blacklist_on_evict`` to false.
+
+Note that if blacklisting is disabled, then evicting a client will
+only have an effect on the MDS you send the command to.  On a system
+with multiple active MDS daemons, you would need to send an
+eviction command to each active daemon.  When blacklisting is enabled
+(the default), sending an eviction command to just a single
+MDS is sufficient, because the blacklist propagates it to the others.
+
+Advanced options
+================
+
+``mds_blacklist_interval`` - this setting controls how many seconds
+entries remain in the blacklist.
  
-    # ceph daemon mds.a session evict 4117
-
-That's it!  The client has now been evicted, and any resources it had locked will
-now be available for other clients.
-
-Background: blacklisting and OSD epoch barrier
-----------------------------------------------
-
-After a client is blacklisted, it is necessary to make sure that 
-other clients and MDS daemons have the latest OSDMap (including
-the blacklist entry) before they try to access any data objects
-that the blacklisted client might have been accessing.
-
-This is ensured using an internal "osdmap epoch barrier" mechanism.
-See :doc:
-
-The purpose of the barrier is to ensure that when we hand out any
-capabilities which might allow touching the same RADOS objects, the
-clients we hand out the capabilities to must have a sufficiently recent
-OSD map to not race with cancelled operations (from ENOSPC) or
-blacklisted clients (from evictions)
-
-More specifically, the cases where we set an epoch barrier are:
-
- * Client eviction (where the client is blacklisted and other clients
-   must wait for a post-blacklist epoch to touch the same objects)
- * OSD map full flag handling in the client (where the client may
-   cancel some OSD ops from a pre-full epoch, so other clients must
-   wait until the full epoch or later before touching the same objects).
- * MDS startup, because we don't persist the barrier epoch, so must
-   assume that latest OSD map is always required after a restart.
-
-Note that this is a global value for simplicity: we could maintain this on
-a per-inode basis.  We don't, because:
-
- * It would be more complicated
- * It would use an extra 4 bytes of memory for every inode
- * It would not be much more efficient as almost always everyone has the latest
-   OSD map anyway, in most cases everyone will breeze through this barrier
-   rather than waiting.
- * We only do this barrier in very rare cases, so any benefit from per-inode
-   granularity would only very rarely be seen.
-
-The epoch barrier is transmitted along with all capability messages, and
-instructs the receiver of the message to avoid sending any more RADOS
-operations to OSDs until it has seen this OSD epoch.  This mainly applies
-to clients (doing their data writes directly to files), but also applies
-to the MDS because things like file size probing and file deletion are
-done directly from the MDS.
  
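The two automatic-eviction rules described in the eviction.rst text above can be sketched as a small predicate. This is a minimal illustration under stated assumptions, not the MDS implementation: ``SessionView``, ``MdsPhase`` and ``should_auto_evict`` are hypothetical names, and only the default values (300 and 45 seconds) come from the documentation.

```cpp
// Hypothetical snapshot of one client session (not a Ceph type).
struct SessionView {
  double idle_seconds;  // seconds since the client last communicated with the MDS
  bool reconnected;     // true once the client has reconnected after MDS startup
};

// Only the two MDS states that matter for automatic eviction.
enum class MdsPhase { Reconnect, Active };

// Defaults quoted in eviction.rst.
constexpr double kSessionAutoclose = 300.0;  // mds_session_autoclose
constexpr double kReconnectTimeout = 45.0;   // mds_reconnect_timeout

bool should_auto_evict(MdsPhase phase, const SessionView& s,
                       double seconds_in_reconnect,
                       double autoclose = kSessionAutoclose,
                       double reconnect_timeout = kReconnectTimeout) {
  if (phase == MdsPhase::Reconnect) {
    // Assumption: the window counts as expired once the timeout is reached.
    return !s.reconnected && seconds_in_reconnect >= reconnect_timeout;
  }
  // Active MDS: evict once the client has been silent for over the limit.
  return s.idle_seconds > autoclose;
}
```

For example, ``should_auto_evict(MdsPhase::Active, SessionView{301.0, true}, 0.0)`` is true under the defaults, while a client that reconnected in time is never evicted by the reconnect rule.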
diff --git a/src/mds/MDSDaemon.cc b/src/mds/MDSDaemon.cc

index 68f7a984504b3234094e6e5e6632263bd1d5cba1..29aa5c2ce178c7eba043d0995d3c48c106f6c53c 100644 (file)
--- a/src/mds/MDSDaemon.cc
+++ b/src/mds/MDSDaemon.cc
@@ -661,9 +661,15 @@ COMMAND("cpu_profiler " \
  COMMAND("session ls " \
         "name=filters,type=CephString,n=N,req=false",
         "List client sessions", "mds", "r", "cli,rest")
+COMMAND("client ls " \
+       "name=filters,type=CephString,n=N,req=false",
+       "List client sessions", "mds", "r", "cli,rest")
  COMMAND("session evict " \
         "name=filters,type=CephString,n=N,req=false",
         "Evict client session(s)", "mds", "rw", "cli,rest")
+COMMAND("client evict " \
+       "name=filters,type=CephString,n=N,req=false",
+       "Evict client session(s)", "mds", "rw", "cli,rest")
  COMMAND("damage ls",
         "List detected metadata damage", "mds", "r", "cli,rest")
  COMMAND("damage rm name=damage_id,type=CephInt",
diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc

index 536350a99759ed67c8a67133a1ae4cbf978aaeea..25975a1a67df5479113d143437e7ce0676bc90a2 100644 (file)
--- a/src/mds/MDSRank.cc
+++ b/src/mds/MDSRank.cc
@@ -2795,7 +2795,7 @@ bool MDSRankDispatcher::handle_command(
    std::string prefix;
    cmd_getval(g_ceph_context, cmdmap, "prefix", prefix);
  
-  if (prefix == "session ls") {
+  if (prefix == "session ls" || prefix == "client ls") {
      std::vector<std::string> filter_args;
      cmd_getval(g_ceph_context, cmdmap, "filters", filter_args);
  
@@ -2810,7 +2810,7 @@ bool MDSRankDispatcher::handle_command(
      f->flush(*ds);
      delete f;
      return true;
-  } else if (prefix == "session evict") {
+  } else if (prefix == "session evict" || prefix == "client evict") {
      std::vector<std::string> filter_args;
      cmd_getval(g_ceph_context, cmdmap, "filters", filter_args);
  
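The hunks above accept the new ``client ls`` and ``client evict`` spellings by extending each ``prefix`` comparison. One alternative way to express the same aliasing, shown only as a sketch, is to normalise the prefix once before dispatch. ``canonical_prefix`` is a hypothetical helper and is not part of this commit or of the Ceph source.

```cpp
#include <string>

// Hypothetical helper: maps the new "client ..." spellings onto the
// existing "session ..." prefixes and leaves every other prefix unchanged.
std::string canonical_prefix(const std::string& prefix) {
  if (prefix == "client ls") {
    return "session ls";
  }
  if (prefix == "client evict") {
    return "session evict";
  }
  return prefix;
}
```

With such a helper the dispatcher would compare against ``session ls`` and ``session evict`` only. The commit's approach of adding ``||`` clauses keeps the change smaller and avoids introducing a new function.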
diff --git a/src/mds/Server.cc b/src/mds/Server.cc

index 7d48420209e9f2430c4366770c0b760a031efda3..6db12f668d5e92d20a5c687f93c6b227ad370a84 100644 (file)
--- a/src/mds/Server.cc
+++ b/src/mds/Server.cc
@@ -753,8 +753,8 @@ void Server::find_idle_sessions()
    for (const auto &session: to_evict) {
      utime_t age = now;
      age -= session->last_cap_renew;
-    mds->clog->info() << "closing stale session " << session->info.inst
-       << " after " << age;
+    mds->clog->warn() << "evicting unresponsive client " << *session
+                      << ", after " << age << " seconds";
      dout(10) << "autoclosing stale session " << session->info.inst << " last "
               << session->last_cap_renew << dendl;
  
@@ -1023,6 +1023,10 @@ void Server::reconnect_tick()
        assert(session);
        dout(1) << "reconnect gave up on " << session->info.inst << dendl;
  
+      mds->clog->warn() << "evicting unresponsive client " << *session
+                        << ", after waiting " << g_conf->mds_reconnect_timeout
+                        << " seconds during MDS startup";
+
        if (g_conf->mds_session_blacklist_on_timeout) {
          std::stringstream ss;
          mds->evict_client(session->info.inst.name.num(), false, true, ss,
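The blacklisting switch checked in the hunk above also changes how far an eviction reaches, as eviction.rst notes: with blacklisting enabled, a command sent to a single MDS is sufficient, and with it disabled each active MDS needs its own command. The sketch below only restates that rule. ``ranks_to_evict_on`` is a hypothetical helper, and the rank numbers are illustrative.

```cpp
#include <vector>

// Hypothetical helper: returns the MDS ranks that should receive an
// eviction command, following the rule stated in eviction.rst.
std::vector<int> ranks_to_evict_on(const std::vector<int>& active_ranks,
                                   bool blacklisting_enabled) {
  if (active_ranks.empty()) {
    return {};
  }
  if (blacklisting_enabled) {
    // The blacklist propagates the eviction, so one rank is enough.
    return {active_ranks.front()};
  }
  // Without blacklisting, every active rank must be told separately.
  return active_ranks;
}
```

An administrator's script could use the result to decide how many ``ceph tell mds.<rank> client evict ...`` commands to issue.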