mon: check changes to the whole CRUSH map and to tunables against cluster features
When we change the tunables, or set a new CRUSH map, we need to make sure it's
supported by all the monitors and OSDs currently participating in the cluster.
If the test is run against a cluster started with vstart.sh (which is
the case for make check), the --asok-does-not-need-root option disables
the use of sudo and allows the test to run without requiring privileged
user permissions.
Somnath Roy [Wed, 2 Jul 2014 18:51:38 +0000 (11:51 -0700)]
OSD: adjust share_map() to handle the case that the osd is down
The assert was being hit while the OSD is waiting to become healthy
in handle_osd_map(). This can happen while IO is in progress and
OSDs are forcefully marked down, e.g. by the osd thrash command.
So the fix is to simply return from here instead of asserting.
Fixes: #8646
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
mon: OSDMonitor: 'osd pool' - if we can set it, we must be able to get it
Add support to get the values for the following variables:
- target_max_objects
- target_max_bytes
- cache_target_dirty_ratio
- cache_target_full_ratio
- cache_min_flush_age
- cache_min_evict_age
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Somnath Roy [Wed, 2 Jul 2014 18:20:29 +0000 (11:20 -0700)]
ReplicatedPG: Removed the redundant register_snapset_context call
In get_object_context(), get_snapset_context() is called, and
register_snapset_context() is already invoked from there, so this
call is redundant.
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
Somnath Roy [Wed, 2 Jul 2014 18:01:55 +0000 (11:01 -0700)]
OpTracker: use mark_event rather than _mark_event
The mark_event() interface changed to accept a time, defaulting
to 'now'. mark_event() is the wrapper around _mark_event() and
also checks whether op tracking is enabled.
_mark_event() is now a private function.
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
Sage Weil [Wed, 2 Jul 2014 17:38:43 +0000 (10:38 -0700)]
qa/workunits/rest/test.py: make osd create test idempotent
Avoid the possibility that we create multiple OSDs due to retries by
passing in the optional uuid arg. (A stray osd id will make the 'osd
tell' tests a few lines down fail.)
Fixes: #8728
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Tue, 1 Jul 2014 18:04:51 +0000 (11:04 -0700)]
OSD: wake_pg_waiters after dropping pg lock
Otherwise, we dispatch_session_waiting while still holding the pg lock,
which is obviously wrong. Unfortunately, this places an additional
burden on any user of _create_lock_pg, but I think it's unavoidable
since that method must atomically add the pg to the map and lock it.
Greg Farnum [Fri, 27 Jun 2014 21:59:23 +0000 (14:59 -0700)]
OSD: await_reserved_maps() prior to calling mark_down
send_message_osd_cluster() et al are *trying* to protect their Connection
lookups (and not re-open zapped Connections) via map reservations, but
that only works if we know that we haven't already called mark_down() on
the entities they might be looking up. So we need to await_reserved_maps
before we do any mark_down calls.
Since the waiting might take some time (fast dispatch in progress), only do
so if we are actually going to mark somebody down.
Fixes: #8512 Signed-off-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Loic Dachary [Mon, 30 Jun 2014 19:10:41 +0000 (21:10 +0200)]
osd: improve tests for configuration updates
Extract the default values from the actual configuration instead of
having them hardcoded. Also check that lowering the osd_map_cache_size
triggers the expected warning.
John Spray [Tue, 29 Apr 2014 15:22:51 +0000 (16:22 +0100)]
mon: warn in newfs if crash_replay_interval=0
This is the setting we would apply to data pools
created automatically, so notify the user if they're
failing to use it for data pools they have created
by hand.
Signed-off-by: John Spray <john.spray@inktank.com>
John Spray [Tue, 29 Apr 2014 14:39:45 +0000 (15:39 +0100)]
osdmap: Don't create FS pools by default
Because many Ceph users don't use the filesystem,
don't create the 'data' and 'metadata' pools by
default -- they will be created by newfs if
they are needed.
Signed-off-by: John Spray <john.spray@inktank.com>
Ilya Dryomov [Wed, 25 Jun 2014 16:23:44 +0000 (20:23 +0400)]
krbd: rework the unmap retry loop
The retry loop in the unmap path turned out to be insufficient for
doing long fsx -K runs. Replace it with a single retry and then call
out to 'udevadm settle' if the retry doesn't help.
Andrey Kuznetsov [Fri, 27 Jun 2014 06:33:58 +0000 (10:33 +0400)]
[RGW, memory leak] Memory leak in RGW has been fixed: deletion of the allocated pointer to the Log object has been added to the "on_exit" handler.
Memory leaks detector report:
$ valgrind --leak-check=full /usr/bin/radosgw -c /etc/ceph/ceph.conf -n
client.radosgw.gateway -
...
==16986== 8 bytes in 1 blocks are definitely lost in loss record 14 of 83
==16986== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298)
==16986== by 0x58980B8: ceph::log::Log::set_flush_on_exit() (in /usr/lib64/librados.so.2.0.0)
==16986== by 0x6BE3CA: global_init(std::vector<char const*, std::allocator<char const*> >*, st
==16986== by 0x5B6B0A: main (in /usr/bin/radosgw)
...
Andrey Kuznetsov [Thu, 19 Jun 2014 13:59:31 +0000 (17:59 +0400)]
[RGW, memory leak] Memory leak in RGW GC (pointer lost while allocating the Ceph context) has been fixed.
Memory leaks detector report:
...
==117947== 11,725 (200 direct, 11,525 indirect) bytes in 25 blocks are definitely lost in loss
record 82 of 82
==117947== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298)
==117947== by 0x687CC1: RGWGC::process(int, int) (in /usr/bin/radosgw)
==117947== by 0x6884D1: RGWGC::process() (in /usr/bin/radosgw)
==117947== by 0x68875C: RGWGC::GCWorker::entry() (in /usr/bin/radosgw)
==117947== by 0x55F890A: Thread::_entry_func(void*) (in /usr/lib64/librados.so.2.0.0)
==117947== by 0x34D46079D0: start_thread (in /lib64/libpthread-2.12.so)
==117947== by 0x34D42E8B6C: clone (in /lib64/libc-2.12.so)
...
Andrey Kuznetsov [Thu, 19 Jun 2014 13:56:01 +0000 (17:56 +0400)]
[RGW, memory leaks] Memory leak in RGW initialization (new connection inserted into the connections map without checking for an existing entry) has been fixed.
Memory leaks detector report:
$ valgrind --leak-check=full /usr/bin/radosgw -c /etc/ceph/ceph.conf -n
client.radosgw.gateway -f
...
==16986== 1,262 (48 direct, 1,214 indirect) bytes in 1 blocks are definitely lost in loss record 81
of 83
==16986== at 0x4A075BC: operator new(unsigned long) (vg_replace_malloc.c:298)
==16986== by 0x618F0D: RGWRados::init_complete() (in /usr/bin/radosgw)
==16986== by 0x618FE6: RGWRados::initialize() (in /usr/bin/radosgw)
==16986== by 0x63BB23: RGWRados::initialize(CephContext*, bool) (in /usr/bin/radosgw)
==16986== by 0x634D20: RGWStoreManager::init_storage_provider(CephContext*, bool) (in
/usr/bin/radosgw)
==16986== by 0x5B8970: RGWStoreManager::get_storage(CephContext*, bool) (in /usr/bin/radosgw)
==16986== by 0x5B6D5D: main (in /usr/bin/radosgw)
...
Ilya Dryomov [Thu, 26 Jun 2014 13:34:19 +0000 (17:34 +0400)]
map-unmap.sh: fail if 'rbd rm' fails
Fail if 'rbd rm' fails - most probably it would fail with "image still
has watchers", and in that case it's a bug in the kernel client that we
do want to notice. Also nuke the trap-based error handling - cleanup()
is half-baked and not really necessary here.
Ilya Dryomov [Thu, 26 Jun 2014 13:34:19 +0000 (17:34 +0400)]
map-unmap.sh: drop the get_id() logic
Take advantage of the fact that 'rbd map' will now talk to udev and
output the device that got assigned by the kernel to the newly created
mapping. Drop the get_id() cruft, udevadm settle and chown calls.
Ilya Dryomov [Fri, 27 Jun 2014 18:43:39 +0000 (22:43 +0400)]
test_librbd_fsx: use private RNG context
It is at the core of fsx to be able to reproduce the exact op sequence
that led to a failure. Use reentrant glibc RNG functions to make sure
that nothing else has a chance to mess up the sequence.