John Spray [Fri, 22 Aug 2014 12:37:46 +0000 (13:37 +0100)]
osdc/Objecter: disable lockdep for double lock
There is a special case in _recalc_linger_op_target
where we lock two sessions at once to transfer an op
between them. It is deadlock safe because it's the only
place we lock two at once, and we hold rwlock for write
while we do it.
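A minimal C++17 sketch of why the double lock is safe. It assumes that session locks are only ever taken while rwlock is already held (read or write), so rwlock always comes first in the lock order. `Session`, `MiniObjecter` and `transfer_linger_op` are hypothetical stand-ins, not the real Objecter types.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <shared_mutex>

// Hypothetical, heavily simplified session: a lock plus the linger ops it owns.
struct Session {
  std::mutex lock;
  std::set<int> linger_ops;
};

struct MiniObjecter {
  std::shared_mutex rwlock;  // objecter-wide lock, ordered before session locks

  // Move a linger op between sessions while holding both session locks.
  // Holding rwlock for write excludes every other thread that could hold a
  // session lock (they take session locks only under rwlock), so no other
  // thread can be part of a lock-order cycle with us.
  void transfer_linger_op(int op_id, Session& from, Session& to) {
    std::unique_lock<std::shared_mutex> wl(rwlock);
    if (&from == &to) return;  // locking the same std::mutex twice would deadlock
    std::lock_guard<std::mutex> l_from(from.lock);
    std::lock_guard<std::mutex> l_to(to.lock);
    if (from.linger_ops.erase(op_id) == 1) {
      to.linger_ops.insert(op_id);
    }
  }
};
```

If that invariant ever stopped holding, locking both mutexes with `std::scoped_lock` (which uses a deadlock-avoidance algorithm) would be the more general alternative.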
John Spray [Fri, 15 Aug 2014 00:26:20 +0000 (01:26 +0100)]
osdc/Objecter: fix resource management
The refactor introduced various reference leaks, and
lacked cleanup in shutdown.
Things done here:
* Reinstate _recalc_linger_op_target, which was accidentally
disabled and led to freezes in notify() (#9112)
* Make reference counting on OSDSessions much more explicit, using
put_session and get_session everywhere
* Add assertions in ~OSDSession and ~Objecter that the various
maps of operations have been emptied.
* Reassign ops away from closing session to homeless session in
close_session()
* Delete/deref all the ops from the objecter-wide maps of operations
in shutdown()
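The explicit get/put discipline can be pictured as a small intrusive reference count. This is a minimal sketch: `Session`, `get_session` and `put_session` are hypothetical simplifications, and the destructor assertion only stands in for the kind of check the commit adds to ~OSDSession.

```cpp
#include <atomic>
#include <cassert>
#include <map>

// Hypothetical session with an intrusive refcount; not the real OSDSession.
struct Session {
  std::atomic<int> nref{1};  // the creator holds the first reference
  std::map<long, int> ops;   // in-flight ops keyed by tid
  ~Session() {
    assert(ops.empty());     // every op must have been moved or finished
  }
};

// Take an extra reference; returns the same pointer for convenience.
Session* get_session(Session* s) {
  s->nref.fetch_add(1);
  return s;
}

// Drop a reference. Returns true if this call released the last reference
// and deleted the session.
bool put_session(Session* s) {
  if (s->nref.fetch_sub(1) == 1) {  // fetch_sub returns the previous value
    delete s;
    return true;
  }
  return false;
}
```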
John Spray [Fri, 15 Aug 2014 00:28:28 +0000 (01:28 +0100)]
librados: separate ::notify return values
There is a return code from objecter for committing
the notify linger op, and then later a code in the
CEPH_MSG_WATCH_NOTIFY message handled by RadosClient directly.
AFAICT there isn't any ordering guarantee between them,
so they could stamp on each other. Use a SaferCond
for the submit one.
I don't think this was related to #9112 but while
I'm here...
Get rid of a level of intermediate classes with confusing names and put
the notify and notify finish logic in a single place so that it is easier
to follow and understand.
Pass the return value from the notify completion message to the caller.
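A minimal sketch of keeping the two results apart, assuming a SaferCond-like one-shot completion that stores its own return code. `SaferCondSketch`, `NotifyResults` and `notify_sketch` are hypothetical; Ceph's actual SaferCond and RadosClient::notify differ in detail.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical one-shot completion: complete(r) once, wait() returns r.
class SaferCondSketch {
 public:
  void complete(int r) {
    std::lock_guard<std::mutex> l(m_);
    rval_ = r;
    done_ = true;
    cv_.notify_all();
  }
  int wait() {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [this] { return done_; });
    return rval_;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
  int rval_ = 0;
};

// Two independent result slots, so neither stage can overwrite the other.
struct NotifyResults {
  SaferCondSketch submitted;  // completed when the linger op commits
  SaferCondSketch finished;   // completed when the notify-complete message arrives
};

// Report a commit failure if there was one, otherwise the notify's own result.
int notify_sketch(NotifyResults& res) {
  int r = res.submitted.wait();
  if (r < 0) return r;
  return res.finished.wait();
}
```

In real use the two `complete()` calls would come from other threads (the objecter and the message handler); ordering between them no longer matters because each writes only its own slot.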
Sage Weil [Mon, 11 Aug 2014 00:52:18 +0000 (17:52 -0700)]
osd: include ETIMEDOUT in notify reply on timeout
If a notify operation times out (not all watchers ACK in time), include
an ETIMEDOUT in the final error message back to the client, so that they
know about it.
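A hypothetical sketch of how the final result might be chosen when the timeout fires. Watcher bookkeeping is reduced to sets of ids, and `notify_final_result` is not a real OSD function.

```cpp
#include <cassert>
#include <cerrno>
#include <set>

// Returns -ETIMEDOUT if any watcher has not acked by the time the notify
// times out, otherwise 0.
int notify_final_result(const std::set<int>& watchers,
                        const std::set<int>& acked) {
  for (int w : watchers) {
    if (acked.count(w) == 0) return -ETIMEDOUT;
  }
  return 0;
}
```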
John Spray [Thu, 14 Aug 2014 13:39:10 +0000 (14:39 +0100)]
librados: avoid unnecessary locks
Revise wait_for_osdmap to be called outside of RadosClient::lock
and only take the lock if it has to wait for a map.
Also, now that objecter handles its own locking nicely,
there are various places where it is no longer necessary
for RadosClient to take its own lock -- all the calls that
go directly into objecter (RadosClient::pool_*) don't need
to hold RadosClient::lock.
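The "only take the lock if we have to wait" idea can be sketched with an atomic epoch on the fast path and a mutex plus condition variable on the slow path. `MapWaiter` is a hypothetical class, not RadosClient.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

class MapWaiter {
 public:
  // Fast path: if a map is already loaded, return without locking.
  // Slow path: take the lock and sleep until a map arrives.
  void wait_for_osdmap() {
    if (epoch_.load() > 0) return;
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return epoch_.load() > 0; });
  }

  // Publish the epoch under the lock so a waiter cannot check the predicate,
  // miss the update, and then sleep through the notification.
  void handle_osd_map(unsigned e) {
    {
      std::lock_guard<std::mutex> l(lock_);
      epoch_.store(e);
    }
    cond_.notify_all();
  }

  unsigned epoch() const { return epoch_.load(); }

 private:
  std::atomic<unsigned> epoch_{0};
  std::mutex lock_;
  std::condition_variable cond_;
};
```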
John Spray [Thu, 14 Aug 2014 10:56:07 +0000 (11:56 +0100)]
librados: fix race on osdmap initialization
This would cause occasional failures where calls
to lookup_pool immediately after connect() would
fail to find any pool because the OSD map had not
yet been loaded. The wait for the map was lost when
the pool name cache was lost in ce176b827.
To avoid similar issues, the pool_requires_alignment
and pool_required_alignment helpers need the same
wait_for_osdmap before proceeding. Usually callers
would call lookup_pool before these guys but it's
not guaranteed.
John Spray [Wed, 13 Aug 2014 01:19:22 +0000 (02:19 +0100)]
librados: update Objecter shutdown
Previously checking for CONNECTED was equivalent to
checking the objecter had been initialized, but since
the separation between init() and start() that is
no longer the case. Avoid the need to be smart by
just reading Objecter::initialized to learn whether
to call Objecter::shutdown.
Fixes: #9067
Signed-off-by: John Spray <john.spray@redhat.com>
John Spray [Tue, 12 Aug 2014 16:47:01 +0000 (17:47 +0100)]
tools: update for Journaler/Objecter interfaces
Journaler now requires a Finisher: construct one in
MDSUtility.
Objecter now requires separate calls to init() and start();
do that in MDSUtility, and also take advantage of Objecter's
new ability to act as its own dispatcher.
John Spray [Fri, 8 Aug 2014 00:49:26 +0000 (01:49 +0100)]
osdc: Add lock to Filer::Probe
This is necessary now that Objecter can call back
from multiple OSD op completions in parallel: otherwise
we get multiple threads trying to update
the same Probe object.
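A simplified sketch of the shared state a probe accumulates, assuming stat completions can arrive on several threads at once. `ProbeSketch` and `handle_reply` are hypothetical and do not mirror Filer's actual fields.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

struct ProbeSketch {
  std::mutex lock;       // protects everything below
  int pending = 0;       // outstanding stat requests
  uint64_t max_end = 0;  // furthest byte offset observed so far

  // Called from OSD op completions, possibly concurrently. Returns true for
  // exactly one caller: the one whose reply finished the probe.
  bool handle_reply(uint64_t obj_start, uint64_t obj_size) {
    std::lock_guard<std::mutex> l(lock);
    if (obj_start + obj_size > max_end) max_end = obj_start + obj_size;
    return --pending == 0;
  }
};
```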
John Spray [Thu, 7 Aug 2014 14:56:40 +0000 (15:56 +0100)]
mds: convert IO contexts
As of this change, the only thing in the MDS inheriting
directly from Context is MDSContext.
The only files touching mds_lock explicitly are MDS, MDLog and
MDSContext -- everyone else should be getting their locking behaviour
via the contexts. (one minor exception made for an assertion in
Locker).
John Spray [Thu, 7 Aug 2014 14:52:58 +0000 (15:52 +0100)]
osdc/Journaler: use finisher for public callbacks
This is needed because of occasional lock cycles with
external callers doing e.g. write_head.
We do get some weird-looking multiply-nested
C_OnFinisher(C_OnFinisher(...)) from this approach,
where one finisher exists to protect journaler from
lock cycles wrt objecter, and the other exists
to protect the MDS from lock cycles wrt journaler.
John Spray [Wed, 6 Aug 2014 13:35:57 +0000 (14:35 +0100)]
mds: add MDSContext subclasses
These allow contexts within the MDS to identify themselves
as either 'internal' contexts (expecting to be called within
the big MDS lock) or 'IO' contexts (which should take the big
mds lock themselves when called back).
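A minimal sketch of the two context flavours, assuming a single global big lock. The class names are hypothetical and only loosely modelled on the split described above; the `mds_lock_held` flag is a debugging aid standing in for an "is the lock held" assertion.

```cpp
#include <cassert>
#include <mutex>

std::mutex big_mds_lock;
bool mds_lock_held = false;  // only modified while big_mds_lock is held

class MDSContextSketch {
 public:
  virtual ~MDSContextSketch() = default;
  virtual void complete(int r) = 0;

 protected:
  virtual void finish(int r) = 0;
};

// 'Internal' context: the caller must already hold the big lock.
class MDSInternalContextSketch : public MDSContextSketch {
 public:
  void complete(int r) override {
    assert(mds_lock_held);
    finish(r);
    delete this;
  }
};

// 'IO' context: called back from I/O completion, so it takes the lock itself.
class MDSIOContextSketch : public MDSContextSketch {
 public:
  void complete(int r) override {
    {
      std::lock_guard<std::mutex> l(big_mds_lock);
      mds_lock_held = true;
      finish(r);
      mds_lock_held = false;
    }
    delete this;
  }
};

// Example IO completion that records the result it was given.
class C_IO_RecordResult : public MDSIOContextSketch {
 public:
  explicit C_IO_RecordResult(int* out) : out_(out) {}

 protected:
  void finish(int r) override {
    assert(mds_lock_held);  // finish() always runs under the big lock
    *out_ = r;
  }

 private:
  int* out_;
};
```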
John Spray [Mon, 28 Jul 2014 16:22:59 +0000 (17:22 +0100)]
osdc/Objecter: make homeless_session a pointer
Having a non-pointer member that's a RefCountedObject
was awkward, e.g. it tripped the nref==0 assertion during
destruction. Rather than play games with the refcount
during destruction, just make it a new/delete instance
instead.
John Spray [Wed, 23 Jul 2014 16:35:24 +0000 (17:35 +0100)]
mds: update mds_lock handling in Locker contexts
For some contexts, we expect to be called back from the objecter/filer
on an I/O completion, so we must take mds_lock before updating any
MDS metadata. In others, we expect to be called back from the MDCache
in response to updates to a CInode's state, so we assert that mds_lock
is already held.
John Spray [Wed, 23 Jul 2014 16:32:57 +0000 (17:32 +0100)]
osdc: Use a finisher from Journaler
Completions from I/O operations (i.e. from the objecter) hop
through the finisher twice, because there are three layers of
locking (MDS::mds_lock -> Journaler::lock -> Objecter OSD session lock).
Because on the way "right" we take the locks in that order, to avoid
deadlock we can't take them in the opposite order on the way
"left", hence the finishers.
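A minimal Finisher-style queue, assuming the only requirement is that callbacks run on a separate thread with none of the completing layer's locks held. `FinisherSketch` is hypothetical; Ceph's Finisher has a richer interface.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class FinisherSketch {
 public:
  FinisherSketch() : worker_([this] { run(); }) {}

  // Drains any queued callbacks, then stops the worker thread.
  ~FinisherSketch() {
    {
      std::lock_guard<std::mutex> l(m_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Safe to call while holding locks: it only appends to the queue.
  void queue(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> l(m_);
      q_.push_back(std::move(fn));
    }
    cv_.notify_one();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(m_);
    for (;;) {
      cv_.wait(l, [this] { return stop_ || !q_.empty(); });
      if (q_.empty()) return;  // stop requested and nothing left to run
      std::function<void()> fn = std::move(q_.front());
      q_.pop_front();
      l.unlock();
      fn();  // runs with no finisher lock (and no caller lock) held
      l.lock();
    }
  }

  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> q_;
  bool stop_ = false;
  std::thread worker_;  // declared last so the members above exist before it starts
};
```

A layer holding Journaler::lock would queue its public callback here instead of invoking it directly, so the callback is free to take mds_lock without inverting the lock order.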
John Spray [Fri, 25 Jul 2014 16:26:22 +0000 (17:26 +0100)]
mds: fix calls to Objecter::wait_for_map
These were wrong in the earlier commit:
"mds: use lock-safe OSDMap accessors; adjust Objecter wait_for_map call"
Rather than checking epoch explicitly and dropping the lock before
calling wait_for_map, just make a single call to wait_for_map and handle
the return code to learn whether we are waiting or not.
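A sketch of a single-call wait_for_map, assuming a convention where it returns true when the epoch is already present and otherwise registers the callback and returns false. `MapTracker` is hypothetical, and its return convention is an assumption rather than a statement of Objecter's signature. The point is that the epoch check and the waiter registration happen under one lock.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

class MapTracker {
 public:
  // true: epoch already available, callback NOT registered.
  // false: callback queued to run once that epoch arrives.
  bool wait_for_map(unsigned epoch, std::function<void()> cb) {
    std::lock_guard<std::mutex> l(lock_);
    if (current_ >= epoch) return true;
    waiters_[epoch].push_back(std::move(cb));
    return false;
  }

  void handle_new_map(unsigned epoch) {
    std::vector<std::function<void()>> ready;
    {
      std::lock_guard<std::mutex> l(lock_);
      current_ = epoch;
      auto end = waiters_.upper_bound(epoch);  // all waiters for <= epoch
      for (auto it = waiters_.begin(); it != end; ++it)
        for (auto& cb : it->second) ready.push_back(std::move(cb));
      waiters_.erase(waiters_.begin(), end);
    }
    for (auto& cb : ready) cb();  // run callbacks without holding lock_
  }

 private:
  std::mutex lock_;
  unsigned current_ = 0;
  std::map<unsigned, std::vector<std::function<void()>>> waiters_;
};
```

With this shape a caller needs no separate epoch check and no lock juggling: it either proceeds immediately (true) or returns and lets the callback resume the work (false).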
Sage Weil [Mon, 21 Jul 2014 21:11:42 +0000 (14:11 -0700)]
librados: wait for map on create_ioctx failure
Ensure we have a map so we don't simply complain that a pool doesn't
exist. Only take the lock and wait if we fail to look up the pool,
though, so we avoid contending the lock in the general case.
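The lookup-first, wait-on-failure pattern, sketched with hypothetical callbacks standing in for the real pool lookup and map wait. `create_ioctx_pool_id` is an illustrative name, not a librados function.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Returns a pool id (>= 0), or a negative value if the pool still cannot be
// found after waiting for a map.
int64_t create_ioctx_pool_id(
    const std::function<int64_t(const std::string&)>& lookup,
    const std::function<void()>& wait_for_latest_map,
    const std::string& name) {
  int64_t id = lookup(name);
  if (id >= 0) return id;  // general case: no waiting, no extra locking
  wait_for_latest_map();   // maybe we simply have not seen a map yet
  return lookup(name);     // still negative means the pool really is missing
}
```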
Sage Weil [Mon, 21 Jul 2014 03:50:00 +0000 (20:50 -0700)]
client: let Objecter dispatch directly
Add Objecter as a direct dispatcher. Drop all of the callbacks and
messages we were passing along. Wrap the IO completions in client_lock
(via C_Lock) and shunt them to the objecter_finisher.
Sage Weil [Sun, 20 Jul 2014 22:00:55 +0000 (15:00 -0700)]
mds: push objecter completions to a Finisher
Most/all of the MDS completions need to be reentrant (and potentially
call back into the Objecter). Shove them all onto a Finisher to make
sure that is safe.
Sage Weil [Sun, 20 Jul 2014 21:51:28 +0000 (14:51 -0700)]
mds: mark objecter completions with _IO_, take mds_lock
For any completion we pass directly to Objecter, make sure we take the
mds_lock in finish(), and mark the class with _IO_ in the name.
Note that this doesn't address the use of Journaler. And this assumes that
we are not holding the mds_lock already when Objecter::handle_osd_op_reply
is called.
Yehuda Sadeh [Wed, 4 Jun 2014 18:46:50 +0000 (11:46 -0700)]
objecter: split objecter initialization
Split objecter initialization into non-cluster-related work (e.g.,
internal data structures, other registrations) and operations that
can initiate cluster interaction. This is so that we don't hit a rare
race where we get called indirectly from one of the dispatcher callbacks
(e.g., into handle_osd_map()) before we have been initialized.
This requires that objecter->init() be called before
messenger->add_dispatcher_head(), and objecter->start() after it.
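A sketch of the required ordering with hypothetical `MessengerSketch` and `ObjecterInitSketch` types. Once the dispatcher is registered a message may arrive immediately, so init() must already have run, while anything that talks to the cluster waits for start().

```cpp
#include <cassert>
#include <vector>

struct DispatcherSketch {
  virtual ~DispatcherSketch() = default;
  virtual void ms_dispatch(int msg) = 0;
};

struct MessengerSketch {
  std::vector<DispatcherSketch*> dispatchers;
  void add_dispatcher_head(DispatcherSketch* d) {
    dispatchers.insert(dispatchers.begin(), d);
  }
  // Once a dispatcher is registered, messages may be delivered at any time.
  void deliver(int msg) {
    for (DispatcherSketch* d : dispatchers) d->ms_dispatch(msg);
  }
};

struct ObjecterInitSketch : DispatcherSketch {
  bool initialized = false;
  bool started = false;
  int maps_handled = 0;

  void init() { initialized = true; }  // internal structures only, no cluster I/O
  void start() {
    assert(initialized);
    started = true;                    // from here on we may contact the cluster
  }
  void ms_dispatch(int /*msg*/) override {
    assert(initialized);               // a handler must never see half-built state
    ++maps_handled;
  }
};
```

Usage follows the order from the commit message: `o.init()`, then `m.add_dispatcher_head(&o)`, then `o.start()`.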
Yehuda Sadeh [Wed, 28 May 2014 19:12:31 +0000 (12:12 -0700)]
objecter: don't serialize responses if there's no object name
This implicitly fixes an issue with list_objects() being reentrant,
which triggers a lock dependency issue. The better solution would be to
have the callback context specify whether it's reentrant or not, but that
would require a much bigger change.
Yehuda Sadeh [Mon, 12 May 2014 23:58:31 +0000 (16:58 -0700)]
objecter: shard completion_lock
Object op responses are sharded, with the lock hashed by object name.
This guarantees ordering on the same object. Cross-object ordering is
no longer guaranteed.
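A sketch of a sharded completion lock with a hypothetical shard count of 32. It also shows the bypass from the commit above: with no object name there is nothing to order against, so no lock is taken. `ShardedCompletionLocks` is an illustrative name, not Objecter code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>

class ShardedCompletionLocks {
 public:
  static constexpr std::size_t kShards = 32;  // illustrative shard count

  // Replies for the same object hash to the same mutex, so they run one at a
  // time in arrival order; replies for different objects may run in parallel.
  void deliver(const std::string& oid, const std::function<void()>& on_reply) {
    if (oid.empty()) {  // no object name: skip serialization entirely
      on_reply();
      return;
    }
    std::lock_guard<std::mutex> l(shards_[shard_of(oid)]);
    on_reply();
  }

  static std::size_t shard_of(const std::string& oid) {
    return std::hash<std::string>{}(oid) % kShards;
  }

 private:
  std::mutex shards_[kShards];
};
```

Because the shard lock is held across the callback, a callback that re-entered `deliver()` for the same shard would deadlock on the non-recursive mutex; skipping the lock when there is no object name sidesteps that for unnamed ops.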
Yehuda Sadeh [Wed, 4 Jun 2014 21:55:13 +0000 (14:55 -0700)]
objecter: a major refactoring
Fixes: #7619
Removed the client_lock (which used to be passed in as a param) and replaced
it with a read-write lock (completely controlled by the objecter). Also
added a per-session read-write lock. Adapted the code to use the new locking
scheme and removed locking where not needed. Replaced various counters
with atomics instead of grabbing the lock for updates. Moved ops to live
under the session.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
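A simplified sketch of the layered scheme the refactor describes, assuming the lock order "objecter rwlock before session lock". All names are hypothetical: an objecter-wide rwlock guards the session map, each session has its own rwlock for the ops it owns, and the tid counter is an atomic that needs no lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>

struct SessionSketch {
  std::shared_mutex lock;
  std::map<uint64_t, int> ops;  // tid -> target (illustrative payload)
};

class ObjecterRWSketch {
 public:
  // Submitting only reads the session map, so threads targeting different
  // sessions do not contend. Returns the new tid, or 0 if there is no session
  // (the tid is still consumed in that case).
  uint64_t submit(int osd, int target) {
    uint64_t tid = ++last_tid_;  // atomic increment, no lock needed
    std::shared_lock<std::shared_mutex> rl(rwlock_);
    auto it = sessions_.find(osd);
    if (it == sessions_.end()) return 0;
    std::unique_lock<std::shared_mutex> sl(it->second->lock);
    it->second->ops[tid] = target;
    return tid;
  }

  // Changing the set of sessions requires the rwlock for write.
  void open_session(int osd) {
    std::unique_lock<std::shared_mutex> wl(rwlock_);
    sessions_.emplace(osd, std::make_unique<SessionSketch>());
  }

  std::size_t num_ops(int osd) {
    std::shared_lock<std::shared_mutex> rl(rwlock_);
    auto it = sessions_.find(osd);
    if (it == sessions_.end()) return 0;
    std::shared_lock<std::shared_mutex> sl(it->second->lock);
    return it->second->ops.size();
  }

 private:
  std::shared_mutex rwlock_;
  std::map<int, std::unique_ptr<SessionSketch>> sessions_;
  std::atomic<uint64_t> last_tid_{0};
};
```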