Sage Weil [Fri, 11 Jul 2014 18:31:22 +0000 (11:31 -0700)]
osd/osd_types: be pedantic about encoding last_force_op_resend without feature bit
The addition of the value is completely backward compatible, but if the
mon feature bits don't match it can cause monitor scrub noise (due to the
parallel OSDMap encoding). Avoid that by only adding the new field if the
feature (which was added 2 patches after the encoding, see
3152faf79f498a723ae0fe44301ccb21b15a96ab and
45e79a17a932192995f8328ae9f6e8a2a6348d10) is present.
Fixes: #8815
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
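A minimal sketch of feature-gated encoding, not the actual osd_types code: the feature constant, the struct layout, and both function names below are hypothetical. It only illustrates the idea that the new field is appended when the feature bit is set, so that encoders working from the same feature set produce identical bytes.

```python
import struct

# Hypothetical feature bit; the real Ceph feature constant is not shown here.
FEATURE_FORCE_OP_RESEND = 1 << 0


def encode_pool(last_change, last_force_op_resend, features):
    """Encode a toy pool record, appending the new field only when the
    (hypothetical) feature bit is present in `features`."""
    if features & FEATURE_FORCE_OP_RESEND:
        version = 2
        payload = struct.pack("<II", last_change, last_force_op_resend)
    else:
        version = 1
        payload = struct.pack("<I", last_change)
    return struct.pack("<B", version) + payload


def decode_pool(buf):
    """Decode either version; older encodings default the new field to 0."""
    version = buf[0]
    (last_change,) = struct.unpack_from("<I", buf, 1)
    last_force_op_resend = 0
    if version >= 2:
        (last_force_op_resend,) = struct.unpack_from("<I", buf, 5)
    return last_change, last_force_op_resend
```

Without the feature bit the new value never reaches the wire, so two encoders that disagree only on that value still emit the same bytes.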
`/etc/init.d/rbdmap start` was doing `mount -a`. Although (arguably)
`mount -a -O _netdev` could be less disruptive, it is not the RBD mapping
job's place to mount unrelated devices and potentially do it at the wrong time.
The solution is to call `mount {device}`, which works as expected and mounts
the device even if it is given in the form `mount /dev/rbd/pool/imagename` while
`/etc/fstab` uses UUID or LABEL notation.
Furthermore, this commit
* fixes the global exit code (it was always 0): now it is 0 only when
all devices were (un)mounted successfully; otherwise non-zero.
* replaces `mount -a` with per-device post-mapping `mount {dev}`.
* shows mapping progress using LSB functions per device instead of once per
{start|stop} invocation.
* captures output of `(u)mount` (if any) and reports it as "info".
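The per-device mount and aggregate exit code described above can be sketched as follows. This is an illustration in Python, not the init script itself (which is a shell script); the function name and the injectable `run` parameter are assumptions made so the logic can be exercised without touching real devices.

```python
import subprocess


def mount_devices(devices, run=subprocess.run):
    """Mount each mapped device individually and return an aggregate
    exit code: 0 only if every mount succeeded, 1 otherwise."""
    status = 0
    for dev in devices:
        # One `mount {dev}` per device instead of a global `mount -a`.
        result = run(["mount", dev], capture_output=True, text=True)
        output = (result.stdout + result.stderr).strip()
        if output:
            # Report whatever mount printed as an informational message.
            print(f"info: {dev}: {output}")
        if result.returncode != 0:
            status = 1  # remember the failure but keep going
    return status
```

A failure on one device does not stop the loop; it only makes the final status non-zero.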
mon: OSDMonitor: be scary about inconsistent pool tier ids
We may not crash your cluster, but you'll know that this is not something
that should have happened. Big letters make it obvious. We'd make them
red too if we bothered to look for the ANSI code.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Josh Durgin [Thu, 29 May 2014 19:23:30 +0000 (12:23 -0700)]
os: add prototype KineticStore
Implement the KeyValueDB interface using libkinetic_client,
and allow it to be configured as the backend for the KeyValueStore,
running the entire OSD on it.
This prototype implementation has no transaction safety, and is
only suitable as a proof of concept. Since the libkinetic_client
API does not provide reverse iteration over keys without also reading
the value off disk, it implements iterators in a very slow but correct way.
These are used heavily by the KeyValueDB callers, so this is a bottleneck
in performance.
Matt Benjamin [Thu, 29 May 2014 14:34:20 +0000 (10:34 -0400)]
Work around an apparent binding bug (GCC 4.8).
A reference to h->seq passed to std::pair ostensibly could not bind
because the header structure is packed. At first this looked like
a more general unaligned access problem, but the only location the
compiler rejects is a false positive.
Guang Yang [Wed, 9 Jul 2014 11:20:36 +0000 (11:20 +0000)]
Fix the PG listing issue which could miss objects for EC pools (where objects have a shard and a generation).
Backport: firefly
Signed-off-by: Guang Yang (yguang@yahoo-inc.com)
Revert "qa: add an fsx run which turns on kernel debugging"
This reverts commit 29c33f0c057acc4e0f4e5022c97553a2dc095b21.
We don't need the debugging any more, and having two separate fsx runners
already caused one update-in-the-wrong-place issue.
Haomai Wang [Thu, 20 Mar 2014 08:20:39 +0000 (16:20 +0800)]
Add random cache and replace SharedLRU in KeyValueStore
SharedLRU performs poorly in KeyValueStore with a large header cache size,
so a performance-optimized RandomCache could improve it.
RandomCache records the lookup frequency of each key. When evicting an element,
it randomly samples several elements, compares their frequencies, and evicts
the least frequently used one.
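The eviction policy described above can be sketched roughly as follows. This is a toy Python model, not the Ceph RandomCache; the method names, the `sample_size` parameter, and its default are assumptions.

```python
import random


class RandomCache:
    """Toy cache that evicts the least frequently looked-up entry among
    a few randomly sampled entries (sample size is an assumed parameter)."""

    def __init__(self, capacity, sample_size=3, rng=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.sample_size = sample_size
        self.rng = rng or random.Random()
        self.entries = {}  # key -> [value, lookup_count]

    def lookup(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry[1] += 1  # record the lookup frequency
        return entry[0]

    def add(self, key, value):
        if key in self.entries:
            self.entries[key][0] = value
            return
        if len(self.entries) >= self.capacity:
            self._evict()
        self.entries[key] = [value, 0]

    def _evict(self):
        # Sample a few keys at random and drop the least-used of them.
        k = min(self.sample_size, len(self.entries))
        candidates = self.rng.sample(list(self.entries), k)
        victim = min(candidates, key=lambda c: self.entries[c][1])
        del self.entries[victim]
```

Sampling keeps eviction cheap because no global ordering of entries has to be maintained, at the cost of sometimes evicting an entry that is not the globally least-used one.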
Sage Weil [Fri, 9 May 2014 15:41:33 +0000 (08:41 -0700)]
osd: cancel agent_timer events on shutdown
We need to cancel all agent timer events on shutdown. This also needs to
happen early so that any in-progress events will execute before we start
flushing and cleaning up PGs.
Backport: firefly
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 8 Jul 2014 23:11:44 +0000 (16:11 -0700)]
osd: s/applying repop/canceling repop/
The 'applying' language dates back to when we would wait for acks from
replicas before applying writes locally. We don't do any of that any more;
now, this loop just cancels the repops with remove_repop() and some other
cleanup.
Sage Weil [Tue, 8 Jul 2014 23:10:58 +0000 (16:10 -0700)]
osd: separate cleanup from PGBackend::on_change()
The generic portion of on_change() cleans up temporary on-disk objects
and requires a Transaction. The rest is clearing out in-memory state and
does not. Separate the two.
A while ago we bumped the head version and reset the compat version to 0.
Doing so happens to make the messenger assume that the message does
not support compat versioning and set the compat version to the head
version -- thus making compat = 2 when it should have been 1.
The nasty side-effect of this is that upgrading from emperor to firefly
will have emperor-leaders being unable to decode forwarded messages from
firefly-peons.
Fixes: #8727
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
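The versioning problem described above can be modelled with a small sketch. Both function names are hypothetical, and the assumption that an emperor leader understands only version 1 is inferred from the "should have been 1" remark rather than stated outright.

```python
def effective_compat_version(head_version, compat_version):
    """Model the behaviour described above: a compat version of 0 is
    read as 'no compat versioning', so compat falls back to head."""
    if compat_version == 0:
        return head_version
    return compat_version


def can_decode(receiver_head_version, sender_compat_version):
    """A receiver can decode a message if it understands at least the
    sender's compat version."""
    return receiver_head_version >= sender_compat_version


# Buggy case: head bumped to 2, compat reset to 0, so compat becomes 2.
buggy = effective_compat_version(2, 0)
# Intended case: head is 2 but compat stays at 1.
intended = effective_compat_version(2, 1)
```

Under these assumptions a receiver at version 1 rejects the buggy message and accepts the intended one.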
Sage Weil [Tue, 1 Jul 2014 21:31:11 +0000 (14:31 -0700)]
osd: clear Sessions for loopback Connections on shutdown
Starting with the fast dispatch patches, we are calling the handle_connect
on loopback. Make sure we zap them on shutdown to break the Session <->
Connection ref cycle.
Dan Mick [Thu, 3 Jul 2014 23:08:44 +0000 (16:08 -0700)]
Fix/add missing dependencies:
- rbd-fuse depends on librados2/librbd1
- ceph-devel depends on specific releases of libs and libcephfs_jni1
- librbd1 depends on librados2
- python-ceph does not depend on libcephfs1
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
We observed that the WBThrottle perfcounters are leaking upwards
at a rate of around 50-100 ios_dirtied per day. The counters are
currently not decremented in clear_object, so that's the likely
explanation. Decrement them like elsewhere in WBThrottle.
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
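A toy model of the leak, assuming the counters are incremented when a write is queued and must be decremented wherever the pending entry is dropped. The class and its `queue` method are illustrative and are not the real WBThrottle interface; only `clear_object` and `ios_dirtied` are names taken from the message above.

```python
class ToyThrottle:
    """Tracks dirtied I/Os per object alongside a global counter."""

    def __init__(self):
        self.pending = {}     # object id -> I/Os not yet flushed
        self.ios_dirtied = 0  # stand-in for the perf counter

    def queue(self, oid, ios):
        self.pending[oid] = self.pending.get(oid, 0) + ios
        self.ios_dirtied += ios

    def clear_object(self, oid):
        ios = self.pending.pop(oid, 0)
        # Without this decrement the counter drifts upwards every time
        # an object is cleared before it is flushed.
        self.ios_dirtied -= ios
```

With the decrement in place, the counter returns to zero once every queued object has been cleared.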
Haomai Wang [Thu, 20 Mar 2014 06:09:49 +0000 (14:09 +0800)]
Remove exclusive lock on GenericObjectMap
Most GenericObjectMap interfaces now take the header as an argument rather than
the combination of coll_t and ghobject_t, so the caller is responsible for
maintaining exclusive access to the header.
Haomai Wang [Thu, 20 Mar 2014 06:04:45 +0000 (14:04 +0800)]
Add Header cache to KeyValueStore
Recent performance statistics show that header lookup has become the main time
consumer for read/write operations. In most cases about 50% of the time is spent
on header lookup and decode/encode logic.
Add a header cache using the SharedLRU structure, which maintains the header
cache and gives callers a pointer to the real header. This also avoids the
overhead of excessive header copies.
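A minimal sketch of the caching idea, using a plain LRU rather than Ceph's SharedLRU: the class name, the `loader` callback, and the hit/miss counters are all hypothetical. It shows the two properties mentioned above, namely that repeated lookups skip the expensive load and decode, and that callers receive the cached object itself rather than a copy.

```python
from collections import OrderedDict


class HeaderCache:
    """Plain LRU cache that hands out the cached header object itself."""

    def __init__(self, capacity, loader):
        self._capacity = capacity
        self._loader = loader          # expensive lookup + decode
        self._entries = OrderedDict()  # key -> header, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        header = self._loader(key)
        self.misses += 1
        self._entries[key] = header
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop least recently used
        return header
```

Because the same object is returned on every hit, a caller that mutates the header must coordinate with other holders of it.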