John Spray [Thu, 7 Aug 2014 14:56:40 +0000 (15:56 +0100)]
mds: convert IO contexts
As of this change, the only thing in the MDS inheriting
directly from Context is MDSContext.
The only files touching mds_lock explicitly are MDS, MDLog and
MDSContext -- everyone else should be getting their locking behaviour
via the contexts. (one minor exception made for an assertion in
Locker).
John Spray [Thu, 7 Aug 2014 14:52:58 +0000 (15:52 +0100)]
osdc/Journaler: use finisher for public callbacks
This is needed because of occasional lock cycles with
external callers doing e.g. write_head.
We do get some weird-looking multiply-nested
C_OnFinisher(C_OnFinisher(...)) from this approach,
where one finisher exists to protect journaler from
lock cycles wrt objecter, and the other exists
to protect the MDS from lock cycles wrt journaler.
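A minimal sketch of the nested-wrapper idea, assuming a hypothetical `SketchFinisher` that is drained by hand (the real Finisher runs callbacks on its own thread) and a hypothetical `on_finisher()` helper standing in for C_OnFinisher. Each wrapper defers its inner callback by one hop, so nesting two wrappers yields two hops.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical, single-threaded stand-in for a finisher: callbacks are
// queued now and run later, when drain() is called.
class SketchFinisher {
public:
  void queue(std::function<void()> fn) { pending.push_back(std::move(fn)); }

  // Runs everything queued so far and returns how many callbacks ran.
  int drain() {
    int n = 0;
    while (!pending.empty()) {
      std::function<void()> fn = std::move(pending.front());
      pending.pop_front();
      fn();
      ++n;
    }
    return n;
  }

private:
  std::deque<std::function<void()>> pending;
};

// Hypothetical analogue of C_OnFinisher: calling the returned callback
// does not run `inner`, it only queues `inner` on `fin`.
std::function<void()> on_finisher(SketchFinisher &fin,
                                  std::function<void()> inner) {
  return [&fin, inner] { fin.queue(inner); };
}
```

With two finishers, `on_finisher(a, on_finisher(b, cb))` runs `cb` only after both `a` and `b` have been drained, so `cb` is never invoked from inside the code that triggered the completion.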
John Spray [Wed, 6 Aug 2014 13:35:57 +0000 (14:35 +0100)]
mds: add MDSContext subclasses
These allow contexts within the MDS to identify themselves
as either 'internal' contexts (expecting to be called within
the big MDS lock) or 'IO' contexts (which should take the big
mds lock themselves when called back).
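A minimal sketch of the internal/IO split described above, assuming a hypothetical `BigLock` wrapper around `std::mutex` in place of the real mds_lock. The names `SketchContext`, `InternalContext`, `IOContext` and `C_IO_Record` are illustrative only and are not the classes added by this commit.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the big MDS lock. `held` only records that
// someone holds the lock, which is enough for a single-threaded sketch.
struct BigLock {
  std::mutex m;
  bool held = false;
  void lock() { m.lock(); held = true; }
  void unlock() { held = false; m.unlock(); }
};

struct SketchContext {
  explicit SketchContext(BigLock &l) : big_lock(l) {}
  virtual ~SketchContext() = default;
  virtual void complete(int r) = 0;  // entry point used by callers
  virtual void finish(int r) = 0;    // the actual work
  BigLock &big_lock;
};

// Internal context: the caller must already hold the big lock.
struct InternalContext : SketchContext {
  explicit InternalContext(BigLock &l) : SketchContext(l) {}
  void complete(int r) override {
    assert(big_lock.held);
    finish(r);
  }
};

// IO context: called back from outside, so it takes the big lock itself.
struct IOContext : SketchContext {
  explicit IOContext(BigLock &l) : SketchContext(l) {}
  void complete(int r) override {
    std::lock_guard<BigLock> guard(big_lock);
    finish(r);
  }
};

// Example IO context that records its result and whether the lock was
// held while finish() ran.
struct C_IO_Record : IOContext {
  explicit C_IO_Record(BigLock &l) : IOContext(l) {}
  int result = 0;
  bool saw_lock = false;
  void finish(int r) override {
    result = r;
    saw_lock = big_lock.held;
  }
};
```

The locking policy lives in `complete()`, so a subclass only writes `finish()` and picks its base class according to where it expects to be called from.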
John Spray [Mon, 28 Jul 2014 16:22:59 +0000 (17:22 +0100)]
osdc/Objecter: make homeless_session a pointer
Having a non-pointer member that's a RefCountedObject
was awkward, e.g. tripping nref==0 assertion during
destruction. Rather than play games with refcount
during destruction, just make it a new/delete instance
instead.
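A rough sketch of the pointer-member approach, assuming a hypothetical `SketchRefCounted` base whose destructor asserts that the count has reached zero. The owner holds the session by pointer and drops its reference with put(), so the assertion is satisfied without adjusting the count by hand. All names here are illustrative.

```cpp
#include <cassert>

// Hypothetical refcounted base: starts with one reference and expects
// the count to be zero by the time it is destroyed.
class SketchRefCounted {
public:
  SketchRefCounted() : nref(1) {}
  SketchRefCounted *get() { ++nref; return this; }
  void put() {
    if (--nref == 0)
      delete this;
  }
  int refs() const { return nref; }

protected:
  virtual ~SketchRefCounted() { assert(nref == 0); }

private:
  int nref;
};

struct SketchSession : SketchRefCounted {
  explicit SketchSession(bool *flag) : destroyed(flag) {}
  bool *destroyed;  // set to true when the session is destroyed

protected:
  ~SketchSession() override { *destroyed = true; }
};

// Owner that holds the session as a new/delete instance and releases
// its reference in the destructor.
struct SketchOwner {
  explicit SketchOwner(bool *flag)
      : homeless_session(new SketchSession(flag)) {}
  ~SketchOwner() { homeless_session->put(); }
  SketchSession *homeless_session;
};
```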
John Spray [Wed, 23 Jul 2014 16:35:24 +0000 (17:35 +0100)]
mds: update mds_lock handling in Locker contexts
For some contexts, we expect to be called back from the objecter/filer
on an I/O completion, so we must take mds_lock before updating any
MDS metadata. In others, we expect to be called back from the MDCache
in response to updates to a CInode's state, so we assert that mds_lock
is already held.
John Spray [Wed, 23 Jul 2014 16:32:57 +0000 (17:32 +0100)]
osdc: Use a finisher from Journaler
Completions from I/O operations (i.e. the objecter) hop
through the finisher twice, because of the three layers of
locking (MDS::mds_lock -> Journaler::lock -> Objecter osd session lock).
Because on the way "right" we take the locks in that order, to avoid
deadlock we can't take the locks in the opposite order on the way
"left", hence the finishers.
John Spray [Fri, 25 Jul 2014 16:26:22 +0000 (17:26 +0100)]
mds: fix calls to Objecter::wait_for_map
These were wrong in the earlier commit:
"mds: use lock-safe OSDMap accessors; adjust Objecter wait_for_map call"
Rather than checking epoch explicitly and dropping the lock before
calling wait_for_map, just make a single call to wait_for_map and handle
the return code to learn whether we are waiting or not.
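A sketch of the single-call pattern, assuming a hypothetical `SketchMapSource` whose `wait_for_map()` returns true when the requested epoch is already available and false when the callback has been registered. This return convention is an assumption of the sketch and is not taken from the Objecter source. The check and the registration happen under one lock, so there is no window between them.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class SketchMapSource {
public:
  // Returns true if `epoch` is already available (the callback is not
  // kept). Otherwise registers the callback and returns false.
  bool wait_for_map(unsigned epoch, std::function<void()> on_ready) {
    std::lock_guard<std::mutex> g(lock);
    if (current_epoch >= epoch)
      return true;
    waiters.emplace_back(epoch, std::move(on_ready));
    return false;
  }

  // Advances the map and fires every waiter whose epoch has been reached.
  void advance_to(unsigned epoch) {
    std::vector<std::function<void()>> ready;
    {
      std::lock_guard<std::mutex> g(lock);
      if (epoch > current_epoch)
        current_epoch = epoch;
      auto it = waiters.begin();
      while (it != waiters.end()) {
        if (it->first <= current_epoch) {
          ready.push_back(std::move(it->second));
          it = waiters.erase(it);
        } else {
          ++it;
        }
      }
    }
    for (auto &cb : ready)
      cb();  // run callbacks outside the lock
  }

private:
  std::mutex lock;
  unsigned current_epoch = 0;
  std::vector<std::pair<unsigned, std::function<void()>>> waiters;
};
```

A caller would continue immediately when the call returns true and simply return when it returns false, leaving the rest of the work to the callback.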
Sage Weil [Mon, 21 Jul 2014 21:11:42 +0000 (14:11 -0700)]
librados: wait for map on create_ioctx failure
Ensure we have a map so we don't simply complain that a pool doesn't
exist. Only take the lock and wait if we fail to look up the pool,
though, so we avoid contending the lock in the general case.
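A small sketch of the lookup-then-wait pattern, assuming a hypothetical `SketchCluster` in which `wait_for_latest_map()` stands in for the blocking map refresh and -1 stands in for the lookup error. None of these names come from librados.

```cpp
#include <cassert>
#include <map>
#include <string>

struct SketchCluster {
  std::map<std::string, long> known;     // pools in the map we hold
  std::map<std::string, long> upstream;  // pools in the latest map
  int waits = 0;                         // how often we had to wait

  // Returns the pool id, or -1 if the pool is not in the map we hold.
  long lookup(const std::string &name) const {
    auto it = known.find(name);
    return it == known.end() ? -1 : it->second;
  }

  // Hypothetical blocking refresh; here it just copies the latest map.
  void wait_for_latest_map() {
    ++waits;
    known = upstream;
  }
};

long lookup_pool(SketchCluster &c, const std::string &name) {
  long id = c.lookup(name);  // common case: found, no waiting
  if (id >= 0)
    return id;
  c.wait_for_latest_map();   // slow path, taken only on a miss
  return c.lookup(name);
}
```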
Sage Weil [Mon, 21 Jul 2014 03:50:00 +0000 (20:50 -0700)]
client: let Objecter dispatch directly
Add Objecter as a direct dispatcher. Drop all of the callbacks and
messages we were passing along. Wrap the IO completions in client_lock
(via C_Lock) and shunt them to the objecter_finisher.
Sage Weil [Sun, 20 Jul 2014 22:00:55 +0000 (15:00 -0700)]
mds: push objecter completions to a Finisher
Most/all of the MDS completions need to be reentrant (and potentially
call back into the Objecter). Shove them all onto a Finisher to make
sure that is safe.
Sage Weil [Sun, 20 Jul 2014 21:51:28 +0000 (14:51 -0700)]
mds: mark objecter completions with _IO_, take mds_lock
For any completion we pass directly to Objecter, make sure we take the
mds_lock in finish(), and mark the class with _IO_ in the name.
Note that this doesn't address the use of Journaler. And this assumes that
we are not holding the mds_lock already when Objecter::handle_osd_op_reply
is called.
Yehuda Sadeh [Wed, 4 Jun 2014 18:46:50 +0000 (11:46 -0700)]
objecter: split objecter initialization
Separate objecter initialization into non-cluster-related work (e.g.,
internal data structures, other registrations) and operations that
can initiate cluster interaction. This is so that we don't hit a rare
race where we get called indirectly from one of the dispatcher callbacks,
e.g., into handle_osd_map(), before initialization has finished.
This requires that objecter->init() be called before
messenger->add_dispatcher_head(), and objecter->start() after it.
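The required ordering can be sketched as follows, assuming hypothetical `SketchObjecter` and `SketchMessenger` types that only mimic the calls named above. The point of the sketch is that a dispatcher callback may fire as soon as it is registered, so local state must be ready before registration.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

struct SketchObjecter {
  bool initialized = false;
  bool started = false;
  int maps_handled = 0;

  void init() { initialized = true; }  // local setup only
  void start() {                       // may interact with the cluster
    assert(initialized);
    started = true;
  }
  void handle_osd_map() {              // reached via a dispatcher callback
    assert(initialized);
    ++maps_handled;
  }
};

struct SketchMessenger {
  std::vector<std::function<void()>> dispatchers;

  void add_dispatcher_head(std::function<void()> d) {
    dispatchers.insert(dispatchers.begin(), std::move(d));
  }
  // Simulates an incoming message reaching every dispatcher.
  void deliver() {
    for (auto &d : dispatchers)
      d();
  }
};

// Follows the order init() -> add_dispatcher_head() -> start() and
// returns the number of maps handled along the way.
int bring_up(SketchObjecter &o, SketchMessenger &m) {
  o.init();
  m.add_dispatcher_head([&o] { o.handle_osd_map(); });
  m.deliver();  // a message may arrive before start(); init() already ran
  o.start();
  return o.maps_handled;
}
```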
Yehuda Sadeh [Wed, 28 May 2014 19:12:31 +0000 (12:12 -0700)]
objecter: don't serialize responses if there's no object name
This implicitly fixes an issue where list_objects() is reentrant
and triggers a lock dependency issue. The better solution would be to
have the callback context specify whether it's reentrant or not, but
that would require a much bigger change.
Yehuda Sadeh [Mon, 12 May 2014 23:58:31 +0000 (16:58 -0700)]
objecter: shard completion_lock
Object op responses are sharded, with the lock chosen by hashing the
object name. This guarantees ordering on the same object. Cross-object
order is no longer guaranteed.
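A minimal sketch of lock sharding by object name, assuming `std::hash` as the hash function and a hypothetical `ShardedLocks` class. The hash and shard count actually used by the objecter are not stated in this message.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>

template <std::size_t N>
class ShardedLocks {
public:
  // The same name always maps to the same shard.
  std::size_t shard_of(const std::string &name) const {
    return std::hash<std::string>{}(name) % N;
  }

  // Returns the lock that serializes completions for `name`.
  std::mutex &lock_for(const std::string &name) {
    return locks[shard_of(name)];
  }

private:
  std::array<std::mutex, N> locks;
};
```

Two responses for the same object always contend on the same mutex and are therefore serialized, while responses for different objects usually land on different shards and are free to complete in either order.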
Yehuda Sadeh [Wed, 4 Jun 2014 21:55:13 +0000 (14:55 -0700)]
objecter: a major refactoring
Fixes: #7619
Removed the client_lock (that used to be passed in as a param) and replaced
it with a read-write lock (completely controlled by the objecter). Also
added a per-session read-write lock. Adapted code to use the new locking
scheme and removed locking where not needed. Replaced various counters with
atomics instead of grabbing the lock for updates. Moved ops to live
under the session.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Xiaoxi Chen [Wed, 20 Aug 2014 07:35:44 +0000 (15:35 +0800)]
CrushWrapper: pick a ruleset same as rule_id
Originally, in the add_simple_ruleset function, the ruleset_id
is not reused but rule_id is reused. So after some add/remove
operations on rules, a newly created rule is likely to have
ruleset != rule_id.
We don't want this to happen because we are trying to hold the constraint
that ruleset == rule_id.
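One way to keep the constraint is to pick the lowest id that is free both as a rule id and as a ruleset id. The sketch below assumes a hypothetical `pick_shared_id()` helper working on plain sets. It illustrates the idea only and is not the CrushWrapper code.

```cpp
#include <cassert>
#include <set>

// Returns the lowest id in [0, max_ids) that is unused both as a rule id
// and as a ruleset id, or -1 if no such id exists.
int pick_shared_id(const std::set<int> &rule_ids,
                   const std::set<int> &ruleset_ids,
                   int max_ids) {
  for (int id = 0; id < max_ids; ++id) {
    if (rule_ids.count(id) == 0 && ruleset_ids.count(id) == 0)
      return id;
  }
  return -1;
}
```

For example, with rule ids {0, 2} and ruleset ids {0, 1} in use, ids 1 and 2 are each taken on one side, so the helper returns 3 and the new rule can use 3 for both values.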