Dan Mick [Tue, 12 Aug 2014 23:31:22 +0000 (16:31 -0700)]
ceph.spec.in: tests for rhel or centos must not include _version
rhel_version and centos_version are apparently the openSUSE Build Service
names; the native macros are just "rhel" and "centos" (which contain
a version number, should one be needed).
Dan Mick [Tue, 12 Aug 2014 21:09:43 +0000 (14:09 -0700)]
ceph.spec.in: No version on ceph-libs Obsoletes.
If we are installing with the new package structure, we never want the
new package to coexist with the old one; this includes the mistakenly
released v0.81 on Fedora, which should be removed in favor of this
version.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Erik Logtenberg [Thu, 31 Jul 2014 22:13:50 +0000 (00:13 +0200)]
ceph.spec.in, init-ceph.in: Don't autostart ceph service on Fedora.
This patch is taken from the current Fedora package and makes the upstream
ceph.spec compliant with Fedora policy. The goal is to be fully compliant
upstream so that we can replace the current Fedora package with the upstream
package and fix many bugs in Fedora.
Addition from Dan Mick <dan.mick@inktank.com>:
Do this for RHEL and CentOS as well, since they will surely benefit
from the same policy. Note: this requires changes to the
autobuild-ceph and ceph-build scripts, which currently copy
only the dist tarball to the rpmbuild/SOURCES dir.
Signed-off-by: Erik Logtenberg <erik@logtenberg.eu>
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Erik Logtenberg [Thu, 31 Jul 2014 21:49:56 +0000 (23:49 +0200)]
ceph.spec.in: add ceph-libs-compat
Added a ceph-libs-compat package in accordance with Fedora packaging
guidelines [1], to handle the recent package split more gracefully.
In Fedora this is necessary because other packages already depend on
ceph-libs and need to be adjusted to depend on the new split packages
instead. In the meantime, ceph-libs-compat prevents breakage.
common: config: let us obtain a diff between current and default config
It's mildly annoying, when trying to figure out what has been changed in
a running system's config options, to have to rely on whatever is set in
ceph.conf and on the admin's memory of what has been injected.
With this we can simply ask the daemon for the diff between what would be
its default and what is its current config.
The current form will, however, output extraneous information that was not
directly supplied by the user, such as 'host', 'fsid' and 'daemonize', as
well as defaults we may rewrite ourselves (leveldb tunables on the monitor,
for instance). Nonetheless, it's way better than the alternative, and
considering it should be used solely for debug purposes, I think we can
get away with it.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
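The idea of diffing the current config against the defaults can be sketched as follows. This is a minimal illustration, not Ceph's md_config_t code: the function name config_diff and the modelling of a config as a plain name-to-string map are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: a config is modelled as option name -> value string.
// For every option whose current value differs from its default (or that has
// no default at all), report the pair (default, current).
std::map<std::string, std::pair<std::string, std::string>>
config_diff(const std::map<std::string, std::string>& defaults,
            const std::map<std::string, std::string>& current) {
  std::map<std::string, std::pair<std::string, std::string>> diff;
  for (const auto& [name, value] : current) {
    auto it = defaults.find(name);
    if (it == defaults.end()) {
      diff[name] = {std::string(), value};  // no default known: always report
    } else if (it->second != value) {
      diff[name] = {it->second, value};     // changed from its default
    }
  }
  return diff;
}
```

Note that an option such as 'host' shows up in the diff as soon as the daemon fills it in, which is exactly the kind of extraneous output the commit message mentions.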
Sage Weil [Tue, 26 Aug 2014 15:16:29 +0000 (08:16 -0700)]
osd/OSDMap: encode blacklist in deterministic order
When we use an unordered_map, the encoding order is non-deterministic,
which is problematic for OSDMap. Construct an ordered map<> on encode
and use that. This lets us keep the hash table for lookups in the general
case.
Fixes: #9211
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
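A minimal sketch of "construct an ordered map<> on encode": the hash table stays the primary structure for lookups, and only the encoder pays for a sorted copy. The key and value types are simplified assumptions (the real blacklist is keyed by entity_addr_t with a utime_t expiry), and encode_blacklist with its text format is hypothetical, not Ceph's bufferlist encoding.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in: address string -> expiry timestamp.
using Blacklist = std::unordered_map<std::string, long>;

// Copy the entries into an ordered std::map first, so the encoded output
// depends only on the contents, not on hash-table iteration order.
std::string encode_blacklist(const Blacklist& bl) {
  std::map<std::string, long> ordered(bl.begin(), bl.end());
  std::ostringstream out;
  out << ordered.size() << ';';
  for (const auto& [addr, expires] : ordered) {
    out << addr << '=' << expires << ';';
  }
  return out.str();
}
```

Two maps with the same contents but different insertion histories then encode identically, which is what lets every node produce the same OSDMap bytes.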
Loic Dachary [Mon, 25 Aug 2014 15:05:04 +0000 (17:05 +0200)]
common: ROUND_UP_TO accepts any rounding factor
The ROUND_UP_TO function was limited to rounding factors that are powers
of two. That saves a modulo operation, but the function is not used anywhere
the saving would make a difference. The implementation is changed so it is generic.
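The difference between the two approaches can be shown with a pair of small functions. This is an illustrative sketch, not the exact text of Ceph's ROUND_UP_TO macro; the names round_up_pow2 and round_up_to are assumptions for the example.

```cpp
#include <cassert>
#include <cstdint>

// Power-of-two-only variant: masking with (d - 1) replaces the modulo, but
// the result is wrong when d is not a power of two.
constexpr uint64_t round_up_pow2(uint64_t n, uint64_t d) {
  return (n + d - 1) & ~(d - 1);
}

// Generic variant: correct for any non-zero rounding factor d.
constexpr uint64_t round_up_to(uint64_t n, uint64_t d) {
  return (n % d) ? (n + d - (n % d)) : n;
}
```

For example, round_up_pow2(5, 3) yields 5, which is not a multiple of 3, while round_up_to(5, 3) yields 6.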
We need to identify whether an object is composed of just a head, or
also has a tail. The test for pre-firefly objects ("explicit objs") was
broken because it only looked at the number of explicit objs in the
manifest. That is insufficient: the head may be empty, in which case it
would not appear, so we need to check whether the sole object actually
points at the head.
Somnath Roy [Mon, 18 Aug 2014 23:59:36 +0000 (16:59 -0700)]
CollectionIndex: Collection name is added to the access_lock name
The CollectionIndex constructor is changed to accept the coll_t
so that the collection name can be used to form the access_lock (RWLock)
name. This is needed because lockdep requires a unique lock name for each
Index object; otherwise it reports a recursive lock error and asserts.
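A minimal sketch of deriving a per-collection lock name. The class CollectionIndexSketch, the use of a plain string for coll_t and std::shared_mutex for RWLock, and the exact name prefix are all assumptions for illustration; the point is only that each index instance ends up with a distinct name for a lockdep-style checker.

```cpp
#include <cassert>
#include <shared_mutex>
#include <string>

// Hypothetical sketch: coll_t is reduced to a string, RWLock to a
// std::shared_mutex plus the name a lockdep-style checker would register.
class CollectionIndexSketch {
 public:
  explicit CollectionIndexSketch(const std::string& coll)
      : access_lock_name_("CollectionIndex::access_lock::" + coll) {}

  const std::string& access_lock_name() const { return access_lock_name_; }
  std::shared_mutex& access_lock() { return access_lock_; }

 private:
  std::string access_lock_name_;  // distinct for every collection
  std::shared_mutex access_lock_;
};
```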
Sage Weil [Mon, 25 Aug 2014 04:18:00 +0000 (21:18 -0700)]
msg/Accepter: do not unlearn_addr on bind()
It is dangerous to set need_addr = true as it means someone may set the
addr to something else (specifically the port) in a racing thread.
However, it is not necessary: the only reason we added it way back in 5d5045d31a9e10d21b44eb1bd137db9ae53128ff was so that
local_connection->peer_addr would get updated, and bind() now calls that
unconditionally.
Fixes: #9079
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
John Spray [Mon, 25 Aug 2014 00:45:22 +0000 (01:45 +0100)]
osd: update handle_osd_map call
I had changed the implementation in Objecter
to avoid a spurious get/put cycle in "osdc/Objecter: fix resource
management", but this guy was still doing a get() before
calling handle_osd_map.
John Spray [Mon, 25 Aug 2014 00:16:39 +0000 (01:16 +0100)]
osdc/Objecter: fix op_cancel on homeless session
I wrote this block without realizing that op_cancel
takes the session lock for write, and that doing so
is undefined when you already hold it for read.
Fixes: #9214
Signed-off-by: John Spray <john.spray@redhat.com>
John Spray [Sun, 24 Aug 2014 22:48:57 +0000 (23:48 +0100)]
osdc/Objecter: hold session ref longer in resend
This is mostly cosmetic: in fact we are getting an extra
ref in _map_session and holding the session lock, so
it's safe, but it's awkward to be giving up the ref on
a session and then continuing to refer to it.
John Spray [Fri, 22 Aug 2014 12:37:46 +0000 (13:37 +0100)]
osdc/Objecter: disable lockdep for double lock
There is a special case in _recalc_linger_op_target
where we lock two sessions at once to transfer an op
between them. It is deadlock-safe because it's the only
place we lock two at once, and we hold rwlock for write
while we do it.
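The reasoning above can be sketched with standard C++ locks. This is a simplified model, not the Objecter code: the Session type, the global rwlock and transfer_op are hypothetical, and the deadlock-freedom argument rests on the stated assumption that every other path takes rwlock before any session lock and never holds two session locks at once.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical model: one Objecter-wide rwlock plus one lock per session.
struct Session {
  std::shared_mutex lock;
  std::vector<int> ops;  // ids of the ops currently owned by this session
};

std::shared_mutex rwlock;  // Objecter-wide lock

// Move op `id` from `from` to `to`. Holding rwlock exclusively shuts out
// every other thread that could be taking session locks, so acquiring the
// two session locks here cannot form a lock cycle.
void transfer_op(Session& from, Session& to, int id) {
  std::unique_lock<std::shared_mutex> wl(rwlock);
  if (&from == &to)
    return;  // same session: nothing to move, and double-locking is undefined
  std::unique_lock<std::shared_mutex> l1(from.lock);
  std::unique_lock<std::shared_mutex> l2(to.lock);
  auto it = std::find(from.ops.begin(), from.ops.end(), id);
  if (it != from.ops.end()) {
    from.ops.erase(it);
    to.ops.push_back(id);
  }
}
```

A lock-order checker that only sees "two locks of the same class" cannot know about this exclusive outer lock, which is why the real code disables lockdep for the second acquisition.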
John Spray [Fri, 15 Aug 2014 00:26:20 +0000 (01:26 +0100)]
osdc/Objecter: fix resource management
The refactor introduced various reference leaks, and
lacked cleanup in shutdown.
Things done here:
* Reinstate _recalc_linger_op_target, which was accidentally
disabled and led to freezes in notify() (#9112)
* Make reference counting on OSDSessions much more explicit, using
put_session and get_session everywhere
* Add assertions in ~OSDSession and ~Objecter that the various
maps of operations have been emptied.
* Reassign ops away from closing session to homeless session in
close_session()
* Delete/deref all the ops from the objecter-wide maps of operations
in shutdown()
John Spray [Fri, 15 Aug 2014 00:28:28 +0000 (01:28 +0100)]
librados: separate ::notify return values
There is a return code from objecter for committing
the notify linger op, and then later a code in the
CEPH_MSG_WATCH_NOTIFY message handled directly by RadosClient.
AFAICT there isn't any ordering guarantee here,
so they could stamp on each other. Use a SaferCond
for the submit one.
I don't think this was related to #9112 but while
I'm here...
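The idea of giving the submit result its own completion object can be sketched with a small condition-variable wrapper. SaferCondSketch is a hypothetical stand-in written for this example, not Ceph's SaferCond from common/Cond.h; it only illustrates that each completion carries exactly one return value, so two codes cannot overwrite each other.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical one-shot completion: complete(r) records r and wakes the
// waiter; wait() blocks until completion and returns that same r.
class SaferCondSketch {
 public:
  void complete(int r) {
    std::lock_guard<std::mutex> l(lock_);
    rval_ = r;
    done_ = true;
    cond_.notify_all();
  }
  int wait() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return done_; });
    return rval_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  bool done_ = false;
  int rval_ = 0;
};
```

With one such object for the linger-op commit and a separate path for the notify-complete result, each caller reads only the code it is waiting for.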
Get rid of a level of intermediate classes with confusing names and put
the notify and notify finish logic in a single place so that it is easier
to follow and understand.
Pass the return value from the notify completion message to the caller.
Sage Weil [Mon, 11 Aug 2014 00:52:18 +0000 (17:52 -0700)]
osd: include ETIMEDOUT in notify reply on timeout
If a notify operation times out (not all watchers ACK in time), include
an ETIMEDOUT in the final error message back to the client, so that they
know about it.
John Spray [Thu, 14 Aug 2014 13:39:10 +0000 (14:39 +0100)]
librados: avoid unnecessary locks
Revise wait_for_osdmap to be called outside of RadosClient::lock
and only take the lock if it has to wait for a map.
Also, now that objecter handles its own locking nicely,
there are various places where it is no longer necessary
for RadosClient to take its own lock -- all the calls that
go directly into objecter (RadosClient::pool_*) don't need
to hold RadosClient::lock.
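"Only take the lock if it has to wait for a map" can be sketched as a lock-free fast path with a locked slow path. ClientSketch, handle_map and the atomic have_map_ flag are assumptions for the example; the real code checks the Objecter's OSD map rather than a boolean.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

class ClientSketch {
 public:
  // Called by the map-handling path when the first OSD map arrives. The flag
  // is set under the lock so a waiter cannot miss the notification.
  void handle_map() {
    std::lock_guard<std::mutex> l(lock_);
    have_map_.store(true);
    cond_.notify_all();
  }

  // Fast path: once a map is known, return without touching the lock.
  // Slow path: take the lock and wait for handle_map().
  void wait_for_osdmap() {
    if (have_map_.load())
      return;
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return have_map_.load(); });
  }

  bool have_map() const { return have_map_.load(); }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::atomic<bool> have_map_{false};
};
```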
John Spray [Thu, 14 Aug 2014 10:56:07 +0000 (11:56 +0100)]
librados: fix race on osdmap initialization
This would cause occasional failures where calls
to lookup_pool immediately after connect() would
fail to find any pool because the OSD map had not
yet been loaded. The wait for the map was lost when
the pool name cache was lost in ce176b827.
To avoid similar issues, the pool_requires_alignment
and pool_required_alignment helpers need the same
wait_for_osdmap before proceeding. Usually callers
would call lookup_pool before these guys but it's
not guaranteed.
John Spray [Wed, 13 Aug 2014 01:19:22 +0000 (02:19 +0100)]
librados: update Objecter shutdown
Previously, checking for CONNECTED was equivalent to
checking that the objecter had been initialized, but since
the separation between init() and start() that is
no longer the case. Avoid the need to be smart by
just reading Objecter::initialized to learn whether
to call Objecter::shutdown.
Fixes: #9067
Signed-off-by: John Spray <john.spray@redhat.com>
John Spray [Tue, 12 Aug 2014 16:47:01 +0000 (17:47 +0100)]
tools: update for Journaler/Objecter interfaces
Journaler now requires a Finisher: construct one in
MDSUtility.
Objecter now requires separate calls to init() and start();
do that in MDSUtility, and also take advantage of Objecter's
new ability to act as its own dispatcher.
John Spray [Fri, 8 Aug 2014 00:49:26 +0000 (01:49 +0100)]
osdc: Add lock to Filer::Probe
This is necessary now that Objecter can call back
from multiple OSD op completions in parallel: otherwise
we get multiple threads trying to update
the same Probe object.
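Why Probe needs its own lock can be illustrated with a stripped-down state object updated from several completions. ProbeSketch and on_complete are hypothetical: the real Filer::Probe tracks far more state, but the pattern of serializing concurrent completion callbacks with a per-object mutex is the same.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>

// Hypothetical, heavily simplified probe state.
struct ProbeSketch {
  std::mutex lock;
  std::set<uint64_t> ops;  // object numbers with an outstanding stat op
  uint64_t max_size = 0;   // largest size reported so far

  // May run concurrently from several op completions; the lock keeps the
  // updates to `ops` and `max_size` consistent. Returns true only for the
  // completion that finishes the probe.
  bool on_complete(uint64_t objno, uint64_t size) {
    std::lock_guard<std::mutex> l(lock);
    ops.erase(objno);
    if (size > max_size)
      max_size = size;
    return ops.empty();
  }
};
```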
John Spray [Thu, 7 Aug 2014 14:56:40 +0000 (15:56 +0100)]
mds: convert IO contexts
As of this change, the only thing in the MDS inheriting
directly from Context is MDSContext.
The only files touching mds_lock explicitly are MDS, MDLog and
MDSContext; everyone else should be getting their locking behaviour
via the contexts (with one minor exception, an assertion in
Locker).
John Spray [Thu, 7 Aug 2014 14:52:58 +0000 (15:52 +0100)]
osdc/Journaler: use finisher for public callbacks
This is needed because of occasional lock cycles with
external callers doing e.g. write_head.
We do get some weird-looking multiply-nested
C_OnFinisher(C_OnFinisher(...)) from this approach,
where one finisher exists to protect journaler from
lock cycles wrt objecter, and the other exists
to protect the MDS from lock cycles wrt journaler.
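The finisher idea (queue a callback, run it later on a separate thread so it never executes under the caller's locks) can be sketched as below. FinisherSketch is a hypothetical minimal version, not Ceph's Finisher class; it drains any queued callbacks before its destructor returns.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class FinisherSketch {
 public:
  FinisherSketch() : thread_([this] { run(); }) {}
  ~FinisherSketch() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stop_ = true;
    }
    cond_.notify_all();
    thread_.join();  // run() drains the queue before exiting
  }

  // Called with arbitrary caller locks held; only queues the callback.
  void queue(std::function<void()> fn) {
    {
      std::lock_guard<std::mutex> l(lock_);
      q_.push_back(std::move(fn));
    }
    cond_.notify_all();
  }

 private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    for (;;) {
      cond_.wait(l, [this] { return stop_ || !q_.empty(); });
      if (q_.empty())
        return;  // stop_ is set and nothing is left to run
      auto fn = std::move(q_.front());
      q_.pop_front();
      l.unlock();
      fn();  // run the callback without holding the queue lock
      l.lock();
    }
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::deque<std::function<void()>> q_;
  bool stop_ = false;
  std::thread thread_;  // declared last so the members above exist first
};
```

Wrapping a callback this way twice, once per layer, is what produces the nested C_OnFinisher(C_OnFinisher(...)) shape described above.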
John Spray [Wed, 6 Aug 2014 13:35:57 +0000 (14:35 +0100)]
mds: add MDSContext subclasses
These allow contexts within the MDS to identify themselves
as either 'internal' contexts (expecting to be called within
the big MDS lock) or 'IO' contexts (which should take the big
mds lock themselves when called back).
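The internal/IO split can be sketched as two subclasses of a common context base. All names here (ContextSketch, InternalContextSketch, IOContextSketch, finish_io, C_RecordResult) and the global std::mutex standing in for the big MDS lock are assumptions for illustration; Ceph's actual MDSContext hierarchy differs in detail.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the big MDS lock.
std::mutex mds_lock;

struct ContextSketch {
  virtual ~ContextSketch() = default;
  virtual void finish(int r) = 0;
  // Run the callback, then destroy the context.
  void complete(int r) {
    finish(r);
    delete this;
  }
};

// Internal context: the caller is expected to already hold mds_lock.
struct InternalContextSketch : ContextSketch {};

// IO context: called back from outside the MDS (e.g. on an OSD reply), so it
// takes mds_lock itself before running the real work.
struct IOContextSketch : ContextSketch {
  void finish(int r) final {
    std::lock_guard<std::mutex> l(mds_lock);
    finish_io(r);
  }
  virtual void finish_io(int r) = 0;
};

// Example IO context: records the result while mds_lock is held.
struct C_RecordResult : IOContextSketch {
  explicit C_RecordResult(int* out) : out_(out) {}
  void finish_io(int r) override { *out_ = r; }
  int* out_;
};
```

Usage: `(new C_RecordResult(&result))->complete(r)` runs finish_io under mds_lock and then frees the context.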