mon: DataHealthService: monitor backing store's size and report it
If the store's size grows beyond what we believe to be reasonable, we must
let the user know that something fishy may be going on. This is intended to
act as an early warning system for monitors suffering from leveldb
compaction issues. However, if the monitor's store is just growing a lot
due to normal cluster behaviour, we made sure that the warning threshold
is adjustable by tuning 'mon_leveldb_size_warn' (defaulting to 40GB).
Fixes: #5909
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
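The threshold check described above can be sketched roughly as follows. This is a minimal illustration, not the DataHealthService code: only the option name 'mon_leveldb_size_warn' and the 40GB default come from the commit message; the function name, the message text, and reading 40GB as 40 * 2^30 bytes are assumptions.

```python
# Hypothetical helper illustrating the size warning; not the monitor's code.
GIB = 1024 ** 3
# Assumption: the 40GB default of 'mon_leveldb_size_warn' is read as 40 GiB.
DEFAULT_SIZE_WARN = 40 * GIB


def store_size_warning(store_bytes, warn_bytes=DEFAULT_SIZE_WARN):
    """Return a warning string if the store grew beyond the threshold, else None."""
    if store_bytes > warn_bytes:
        return "store size %d bytes exceeds threshold %d bytes" % (
            store_bytes, warn_bytes)
    return None
```

A cluster whose store legitimately grows large would pass a higher warn_bytes instead of silencing the check.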
os: KeyValueDB: expose interface to obtain estimated store size
On LevelDBStore, instead of using leveldb's GetApproximateSizes() function,
we assess the store's raw size from the contents of
the store dir (this means .sst's, .log's, etc). The reason behind this
approach is that GetApproximateSizes() would expect us to provide a range
of keys for which to obtain an approximate size; on the other hand, what we
really want is to obtain the size of the store -- not the size of the
data (besides, with the compaction issues we've been seeing, we wonder
how reliable such approximation would be).
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
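As a rough illustration of the "raw size from the store dir" approach, the sketch below sums the sizes of all files under a directory. It is a Python analogue under stated assumptions, not the LevelDBStore implementation; the function name estimate_store_size is hypothetical.

```python
import os


def estimate_store_size(store_dir):
    """Sum the on-disk size, in bytes, of every file under store_dir."""
    total = 0
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # A file may disappear between listing and stat
                # (for instance if compaction removes it); skip it.
                continue
    return total
```

This measures what the store occupies on disk (.sst, .log, and so on), which is the quantity the commit says it cares about, rather than the size of the data for a key range.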
Sage Weil [Thu, 15 Aug 2013 21:36:57 +0000 (14:36 -0700)]
config: fix stringification of config values
The std::copy construct leaves a trailing separator character, which breaks
parsing for booleans (among other things) and probably mangles everything
else too.
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
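The trailing-separator problem can be reproduced with a small Python analogue. The original issue is in C++ (a std::copy construct); the functions below are hypothetical stand-ins, and the strict boolean parser is an assumption made only to show why the extra character matters.

```python
def stringify_with_trailing_sep(values, sep=" "):
    """Mimic writing each element followed by a separator (the buggy shape)."""
    return "".join(str(v) + sep for v in values)


def stringify(values, sep=" "):
    """Put the separator only between elements."""
    return sep.join(str(v) for v in values)


def parse_bool(text):
    """Hypothetical strict parser: rejects anything but exact literals."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not a boolean: %r" % text)
```

With one value, the first form yields "true " (note the trailing space), which the strict parser rejects, while the second yields "true".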
Sage Weil [Thu, 15 Aug 2013 20:42:50 +0000 (13:42 -0700)]
config: fix stringification of config values
The std::copy construct leaves a trailing separator character, which breaks
parsing for booleans (among other things) and probably mangles everything
else too.
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Josh Durgin [Wed, 14 Aug 2013 22:28:19 +0000 (15:28 -0700)]
rados.py: fix Rados() backwards compatibility
Previously it had no name parameter, so the default will be used by
old clients. However, if an old client set rados_id, a new check that
both rados_id and name are set would result in an error. Fix this by
only applying the default names after the check, and add tests of this
behavior.
Fixes: #5970
Reported-by: Michael Morgan <mmorgan@dca.net>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
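The ordering described above (validate the caller's arguments first, apply defaults afterwards) might look like the sketch below. It is not the rados.py source: the function name resolve_client_name, the 'client.' prefix handling, and the 'client.admin' default are assumptions for illustration.

```python
def resolve_client_name(rados_id=None, name=None):
    """Return the entity name to use, checking for conflicts before defaults."""
    # Check the raw arguments first. If defaults were applied before this
    # check, an old client passing only rados_id would trip it.
    if rados_id is not None and name is not None:
        raise ValueError("cannot supply both rados_id and name")
    if rados_id is not None:
        # Assumption: an id maps to a 'client.<id>' entity name.
        return "client." + rados_id
    if name is None:
        # Assumption: default entity name.
        name = "client.admin"
    return name
```

An old-style call such as resolve_client_name(rados_id="foo") then succeeds, while passing both arguments is still rejected.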
Sage Weil [Tue, 13 Aug 2013 19:52:41 +0000 (12:52 -0700)]
librados: fix async aio completion wakeup
For aio flush, we register a wait on the most recent write. The write
completion code, however, was *only* waking the waiter if they were waiting
on that write, without regard to previous writes (completed or not).
For example, we might have 6 and 7 outstanding and wait on 7. If they
finish in order all is well, but if 7 finishes first we do the flush
completion early. Similarly, if we
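The 6-and-7 scenario can be modelled with a small tracker. Since the message above is cut off, this is only one possible way to avoid the early wakeup, not the librados fix; the class and method names are hypothetical.

```python
class WriteTracker:
    """Toy model: a flush completes only when no earlier write is pending."""

    def __init__(self):
        self.outstanding = set()   # sequence numbers of in-flight writes
        self.flush_target = None   # most recent write at flush time
        self.flush_done = False

    def start_write(self, seq):
        self.outstanding.add(seq)

    def wait_flush(self):
        if not self.outstanding:
            self.flush_done = True
            return
        self.flush_target = max(self.outstanding)

    def complete_write(self, seq):
        self.outstanding.discard(seq)
        if self.flush_target is None:
            return
        # Wake the flush waiter only when every write up to and including
        # the target has completed, regardless of completion order.
        if not any(s <= self.flush_target for s in self.outstanding):
            self.flush_done = True
```

With writes 6 and 7 in flight and a flush waiting on 7, completing 7 first leaves the flush pending until 6 also completes.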
Josh Durgin [Tue, 13 Aug 2013 02:17:09 +0000 (19:17 -0700)]
librados: fix locking for AioCompletionImpl refcounting
Add an already-locked helper so that C_Aio{Safe,Complete} can
increment the reference count when their caller holds the
lock. C_AioCompleteAndSafe's caller is not holding the lock, so call
regular get() to ensure no racing updates can occur.
This eliminates all direct manipulations of AioCompletionImpl->ref,
and makes the necessary locking clear.
The only place C_AioCompleteAndSafe is used is in handling
aio_flush_async(). This could cause a missing completion.
Refs: #5919
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Tested-by: Oliver Francke <Oliver.Francke@filoo.de>
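The split between a locking get() and an already-locked helper can be sketched as below. This is a Python analogue with hypothetical names (Completion, _get), not the AioCompletionImpl code.

```python
import threading


class Completion:
    """Toy refcounted object with a locked and an already-locked getter."""

    def __init__(self):
        self.lock = threading.Lock()
        self.ref = 1

    def get(self):
        """Take the lock, then bump the count. For callers not holding it."""
        with self.lock:
            self._get()

    def _get(self):
        """Bump the count. The caller must already hold self.lock."""
        # locked() only tells us that someone holds the lock, not that the
        # caller does, so this is a sanity check rather than a guarantee.
        if not self.lock.locked():
            raise RuntimeError("_get() called without holding the lock")
        self.ref += 1
```

Routing every increment through one of these two methods keeps direct manipulation of the counter out of the callers, which is the point the commit makes.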
Laurent Barbe [Tue, 13 Aug 2013 15:16:35 +0000 (17:16 +0200)]
init-rbdmap: fix for recursive umount
Umount is not always done in the correct order.
For example, in this case:
/dev/rbd1 on /myrbd
/dev/rbd2 on /myrbd/.snapshots/@GMT-2013.08.09-10.14.44
rbd2 should be umounted before rbd1
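One way to get a safe order is to unmount deeper mount points first. The sketch below illustrates that idea only; it is not how the init-rbdmap script implements the fix, and the function name umount_order is hypothetical.

```python
def umount_order(mount_points):
    """Return mount points sorted so that nested ones come before parents."""
    # A mount nested under another has more path components, so sorting by
    # depth in descending order unmounts children before their parents.
    return sorted(
        mount_points,
        key=lambda p: p.rstrip("/").count("/"),
        reverse=True,
    )
```

For the example above, the snapshot mount under /myrbd/.snapshots is listed before /myrbd.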
Gary Lowell [Tue, 6 Aug 2013 00:57:26 +0000 (17:57 -0700)]
Makefile.am: fix libglobal.la races
Several targets had libglobal.la in the _LDFLAGS macro definition
when it should have been in the _LDADD macro. Remove those occurrences
and add the LIBGLOBAL_LDA macro to the targets' _LDADD instead.
Yehuda Sadeh [Mon, 12 Aug 2013 17:05:44 +0000 (10:05 -0700)]
rgw: fix multi delete
Fixes: #5931
Backport: bobtail, cuttlefish
Fix a bad check, where we compare the wrong field. Instead of
comparing the ret code to 0, we compare the string value to 0,
which triggers an implicit conversion, hence the crash.
Sage Weil [Sat, 10 Aug 2013 01:02:32 +0000 (18:02 -0700)]
ceph-disk: fix mount options passed to move_mount
Commit 6cbe0f021f62b3ebd5f68fcc01a12fde6f08cff5 added a mount_options but
in certain cases it may be blank. Fill in with the defaults, just as we
do in mount().
Backport: cuttlefish
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
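Falling back to defaults when the options are blank could be sketched like this. The table of defaults and the function name are assumptions for illustration, not values taken from ceph-disk.

```python
# Hypothetical per-filesystem defaults; the real values may differ.
DEFAULT_MOUNT_OPTIONS = {
    "xfs": "noatime",
    "ext4": "noatime,user_xattr",
}


def effective_mount_options(fstype, options=None):
    """Return the given options, or the filesystem's defaults if blank."""
    if not options:
        # Covers both None and the empty string.
        return DEFAULT_MOUNT_OPTIONS.get(fstype, "")
    return options
```

Using the same helper from both the mount and the move-mount paths would keep the two consistent.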
common: pick_addresses: fix bug with observer class that triggered #5205
The Observer class we defined to observe conf changes and thus avoid
triggering #5205 (as fixed by eb86eebe1ba42f04b46f7c3e3419b83eb6fe7f9a),
was always returning the same const static array, which would lead us to
always populate the observer's list with an observer for 'public_addr'.
This would of course become a problem when trying to obtain the observer
for 'cluster_addr' during md_config_t::set_val() -- thus triggering the
same assert as initially reported on #5205.
Backport: cuttlefish
Fixes: #5205
Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Yehuda Sadeh [Fri, 9 Aug 2013 18:52:25 +0000 (11:52 -0700)]
rgw: return 423 Locked response when failing to lock object
Fixes: #5882
Translate the EBUSY we get when trying to lock a shard / object
to 423 Locked response. Beforehand it was just translated to the
default 500.
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
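The translation described above amounts to a lookup with a 500 fallback. The sketch below is a Python illustration, not the rgw error table; the function name and the handling of 0 as success are assumptions.

```python
import errno

# Only the EBUSY -> 423 entry comes from the commit message.
ERRNO_TO_HTTP = {
    errno.EBUSY: 423,  # 423 Locked
}


def http_status_for(ret):
    """Map a negative-errno style return code to an HTTP status."""
    if ret == 0:
        return 200  # assumption: 0 means success
    # Anything without an explicit entry falls back to the default 500.
    return ERRNO_TO_HTTP.get(abs(ret), 500)
```

Before the change, -EBUSY would have taken the fallback path and produced 500.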
Yehuda Sadeh [Fri, 9 Aug 2013 16:31:28 +0000 (09:31 -0700)]
rgw: rename data receive callbacks in swift token revocation
Fixes: #5921
As part of the work done for dumpling, the http
client in-data callback was renamed in order to avoid confusion.
However, we missed the rename in a couple of places, which this
patch amends.
Reported-by: Roald van Loon <roaldvanloon@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Dan Mick [Tue, 6 Aug 2013 01:18:59 +0000 (18:18 -0700)]
ceph.in: Re-enable ceph interactive mode (missing its output).
Also, loop on error. There's no reason to exit the interpreter loop on
an error, and it's probably less annoying if we don't. Print the error,
and any output, and continue.
Fixes: #5746
Signed-off-by: Dan Mick <dan.mick@inktank.com>
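A loop that prints errors and keeps going can be sketched as follows. This is an illustration of the behaviour described, not the ceph.in code; the function and its three callables are hypothetical.

```python
def interactive_loop(read_line, run_command, write):
    """Run commands until read_line() returns None; never exit on an error."""
    while True:
        line = read_line()
        if line is None:  # end of input
            break
        line = line.strip()
        if not line:
            continue
        try:
            output = run_command(line)
        except Exception as exc:
            # Report the failure and stay in the interpreter loop.
            write("error: %s" % exc)
            continue
        if output:
            write(output)
```

Passing the reader, runner, and writer in as callables keeps the loop easy to exercise without a terminal.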
Sage Weil [Thu, 8 Aug 2013 15:30:01 +0000 (08:30 -0700)]
mon: fix 'osd crush rule rm ...' dup arg
This was broken way back in 0d66c9ebbf626117c641c975a8682a0aaba588c4, but
we were ignoring the dup until recently.
Signed-off-by: Sage Weil <sage@inktank.com>
If an entity already existed, 'auth add' would smash its key and caps
with whatever was on the supplied keyring file; if a keyring weren't
specified, we would simply generate a new key and destroy all existing
caps (unless caps were specified and happened to be different from the
already in-place caps). This behaviour is obviously sketchy.
With this patch we now enforce the following behaviour:
- if entity does not exist in current state, check if we are about to
create it (by checking the pending state); if so, wait for the new state
to be committed and re-handle the command then, so we don't get bad
results from pending request
- if the command reproduces the current state (same key, same caps), we
return 0; else,
- if entity exists and supplied key OR caps are different, return -EINVAL
- else create a new entity.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
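The decision rules listed above can be written down as a small function. This is a sketch of the rules only, not the monitor's code: the function name, the dict-based committed and pending states, and the string action names are assumptions, and it compares the supplied key and caps as given.

```python
def auth_add_decision(entity, key, caps, committed, pending):
    """Decide how to handle 'auth add' for entity.

    committed and pending map entity name -> (key, caps).
    Returns one of: "wait", "create", "noop", "reject".
    """
    if entity not in committed:
        if entity in pending:
            # Creation is in flight: wait for the commit, then re-handle.
            return "wait"
        return "create"
    current_key, current_caps = committed[entity]
    if key == current_key and caps == current_caps:
        # Command reproduces the current state: corresponds to returning 0.
        return "noop"
    # Key or caps differ: corresponds to returning -EINVAL.
    return "reject"
```

The important property is that an existing entity is never overwritten: the command either matches the current state or is rejected.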
Dan Mick [Sat, 3 Aug 2013 04:26:51 +0000 (21:26 -0700)]
mon/PGMonitor: add 'pg dump pgs_brief' subcommand
It is useful to map OSDs to PGs and vice-versa; pg dump gives that
information, but gives a lot of other stuff. This is the same dump
as pg dump pgs, but omitting everything except pgid, state, and
osd up and acting sets.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
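Conceptually, the brief dump is the full per-PG dump with most fields dropped. The sketch below shows that projection on plain dicts; the key names "up" and "acting" and the function name are assumptions, and this is not the PGMonitor code.

```python
# Fields kept by the brief dump, per the commit message:
# pgid, state, and the osd up and acting sets.
BRIEF_FIELDS = ("pgid", "state", "up", "acting")


def pgs_brief(pg_stats):
    """Reduce each PG record to the brief set of fields."""
    return [{field: pg[field] for field in BRIEF_FIELDS} for pg in pg_stats]
```

Everything else in a record (for example object or byte counts) is left out of the result.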