Jason Dillaman [Thu, 30 Apr 2015 20:04:28 +0000 (16:04 -0400)]
librbd: simplify state machine handling of exclusive lock
It is expected that all IO is flushed and all async ops are cancelled
prior to releasing the exclusive lock. Therefore, replace handling of
lost exclusive locks in state machines with an assertion.
Jason Dillaman [Thu, 30 Apr 2015 19:32:38 +0000 (15:32 -0400)]
librbd: ObjectMap::aio_update can acquire snap_lock out-of-order
Detected during an fsx run where a refresh and CoR were occurring
concurrently. The refresh held the snap_lock and was waiting on
the object_map_lock, while the CoR held object_map_lock and was
waiting for snap_lock.
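
A minimal sketch of the ordering rule behind this deadlock, not the actual librbd fix: if every code path takes the two locks in the same order, the cycle described above cannot form. The lock objects and function names below are hypothetical stand-ins, and the order (snap_lock first) is an assumption inferred from the commit title.

```python
import threading

# Hypothetical stand-ins for librbd's snap_lock and object_map_lock.
snap_lock = threading.Lock()
object_map_lock = threading.Lock()

events = []

def refresh():
    # Take snap_lock first, then object_map_lock.
    with snap_lock:
        with object_map_lock:
            events.append("refresh")

def copy_on_read_update():
    # Same order as refresh(). Taking object_map_lock first here
    # would recreate the cycle described in the commit message.
    with snap_lock:
        with object_map_lock:
            events.append("cor")

threads = [
    threading.Thread(target=refresh),
    threading.Thread(target=copy_on_read_update),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```

With a single agreed order, both threads always finish regardless of how they interleave.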
Jason Dillaman [Thu, 16 Apr 2015 18:15:10 +0000 (14:15 -0400)]
librbd: move copyup class method call to CopyupRequest
Move AbstractWrite's invocation of copyup to the CopyupRequest
class. The AioRequest write path will now always create a
CopyupRequest, which will now append the actual write ops to the
copyup.
Moved all parent overlap computation to within AioRequest so that
callers don't need to independently compute the overlap. Also
removed the need to pass the snap_id for write operations since
it can only be CEPH_NOSNAP.
Danny Al-Gaaf [Sat, 14 Mar 2015 00:16:31 +0000 (01:16 +0100)]
librbd/AioRequest.h: fix UNINIT_CTOR
Fix for:
CID 1274319: Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member m_object_state is not
initialized in this constructor nor in any functions that it calls.
In order to support the invariant that all state machine
callbacks occur without holding locks, transitions that
don't always involve a librados call should queue their
callback.
Jason Dillaman [Thu, 7 May 2015 19:32:27 +0000 (15:32 -0400)]
librbd: disable lockdep on AioCompletion
It is only used by clients and it causes a large slowdown
in performance due to the rate at which the lock is constructed/
destructed for each IO request.
Jason Dillaman [Thu, 30 Apr 2015 17:40:16 +0000 (13:40 -0400)]
librbd: complete cache read in a new thread context
The ObjectCacher completes the read callback while still holding
the cache lock. This introduces lock ordering issues which are
resolved by queuing the completion to execute in a clean (unlocked)
context.
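
The general pattern can be sketched as follows, assuming a simple work queue drained by a separate thread. All names here are hypothetical; this is not the ObjectCacher or librbd code, only an illustration of deferring a callback so that it runs in a thread that does not hold the cache lock.

```python
import queue
import threading

cache_lock = threading.Lock()   # hypothetical stand-in for the cache lock
completions = queue.Queue()     # callbacks queued for a clean context
held = threading.local()        # per-thread flag: "I hold cache_lock"
observed = []

def on_read_complete(result):
    # Record whether the calling thread holds the cache lock.
    observed.append((result, getattr(held, "cache_lock", False)))

def cache_read():
    with cache_lock:
        held.cache_lock = True
        try:
            data = "cached-bytes"
            # Do not invoke the callback here; queue it instead.
            completions.put(lambda: on_read_complete(data))
        finally:
            held.cache_lock = False

def worker():
    # Drain queued callbacks; None is the shutdown sentinel.
    while True:
        job = completions.get()
        if job is None:
            break
        job()

t = threading.Thread(target=worker)
t.start()
cache_read()
completions.put(None)
t.join()
```

The callback runs on the worker thread, which never takes the cache lock, so it is free to acquire other locks in any order.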
Jason Dillaman [Thu, 30 Apr 2015 17:29:12 +0000 (13:29 -0400)]
common: lockdep now supports unregistering once destructed
librbd's use of an image hierarchy resulted in lock names being
re-used and incorrectly analyzed. librbd now uses unique lock
names per instance, but to prevent an unbounded growth of
tracked locks, we now remove lock tracking once a lock is
destructed.
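
A toy illustration of the idea, not Ceph's lockdep API: each lock instance registers under a unique name and removes its entry when it is torn down, so the set of tracked locks stays bounded. The class names and the explicit close() call (used instead of a destructor to keep the example deterministic) are assumptions made for this sketch.

```python
import itertools

class LockRegistry:
    """Toy registry of tracked lock names (hypothetical, not lockdep)."""

    def __init__(self):
        self._ids = {}
        self._next_id = itertools.count()

    def register(self, name):
        if name in self._ids:
            raise ValueError("lock name re-used: " + name)
        self._ids[name] = next(self._next_id)
        return self._ids[name]

    def unregister(self, name):
        # Forget the lock so tracking does not grow without bound.
        self._ids.pop(name, None)

    def __len__(self):
        return len(self._ids)

registry = LockRegistry()

class TrackedLock:
    _seq = itertools.count()

    def __init__(self, base_name):
        # A per-instance suffix keeps names unique across images.
        self.name = "{}#{}".format(base_name, next(TrackedLock._seq))
        registry.register(self.name)

    def close(self):
        registry.unregister(self.name)

locks = [TrackedLock("ImageCtx::snap_lock") for _ in range(3)]
peak = len(registry)
for lock in locks:
    lock.close()
```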
Jason Dillaman [Tue, 23 Jun 2015 15:14:51 +0000 (11:14 -0400)]
librbd: only update image flags when holding exclusive lock
It was possible for a client to open an image while another client
was shrinking an image. This would result in the former invalidating
the object map on-disk if it opened the image between updating the
image header and resizing the object map.
Ken Dreyer [Tue, 14 Apr 2015 13:58:17 +0000 (07:58 -0600)]
debian: move ceph_argparse into ceph-common
Prior to this commit, if a user installed the "ceph-common" Debian
package without installing "ceph", then /usr/bin/ceph would crash
because it was missing the ceph_argparse library.
Ship the ceph_argparse library in "ceph-common" instead of "ceph". (This
was the intention of the original commit that moved argparse to "ceph", 2a23eac54957e596d99985bb9e187a668251a9ec)
http://tracker.ceph.com/issues/11388 Refs: #11388
Reported-by: Jens Rosenboom <j.rosenboom@x-ion.de>
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
(cherry picked from commit 110608e5bdd9e2f03020ad41f0c2d756684d4417)
Conflicts:
debian/ceph.install
There is no ceph_daemon.py in hammer
debian/control
Depends/Replaces/Breaks version adapted (from 9.0.0 to 0.94.2)
also adapted ceph-dbg Replaces/Breaks
Zhiqiang Wang [Fri, 20 Mar 2015 08:15:42 +0000 (16:15 +0800)]
test: potential memory leak in FlushAioPP
Should call the release function instead of deleting it to free
librbd::RBD::AioCompletion and librbd::AioCompletion. Otherwise there is
a potential memory leak.
Jason Dillaman [Tue, 28 Apr 2015 14:56:15 +0000 (10:56 -0400)]
krbd: fix incorrect types in the krbd API
The C API functions were referencing the C++ CephContext
instead of the C rados_config_t. Additionally, the ceph
namespace was missing on the Formatter class.
Thorsten Behrens [Sun, 15 Mar 2015 23:13:38 +0000 (00:13 +0100)]
Conditional-compile against minimal tcmalloc.
Certain older systems (SLE11 in this case) do not provide the full
tcmalloc functionality, due to e.g. incomplete libunwind
pieces. Use --with-tcmalloc-minimal to enable the cut-down
version.
Here's how the various mem allocator switches interact now:
--with-jemalloc: overrides --with-tcmalloc & --with-tcmalloc-minimal
--with-tcmalloc-minimal: overrides --with-tcmalloc
--with-tcmalloc: the default. use --without-tcmalloc to disable
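
The precedence listed above can be modelled as a small function. This is only a sketch of the stated rules, not the configure script itself; the function name, the parameter names, and the "libc" fallback label are assumptions made for illustration.

```python
def select_allocator(with_jemalloc=False,
                     with_tcmalloc_minimal=False,
                     with_tcmalloc=True):
    """Return the allocator implied by the three switches."""
    if with_jemalloc:
        # Overrides both tcmalloc variants.
        return "jemalloc"
    if with_tcmalloc_minimal:
        # Overrides the full tcmalloc.
        return "tcmalloc_minimal"
    if with_tcmalloc:
        # The default when nothing else is requested.
        return "tcmalloc"
    # --without-tcmalloc and no alternative: assumed system allocator.
    return "libc"
```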
crush/CrushTester: return EINVAL if crushtool returns non-zero
this backports a tiny part of ec02441, otherwise
CrushTester will return 1, and "ceph" cli will take it
as EPERM, which is misleading, and fails
osd-crush.sh:TEST_crush_reject_empty.
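
Why a raw exit status of 1 is read as EPERM can be seen from the errno table: EPERM has the value 1. A minimal sketch of the mapping follows. The function name is hypothetical, and returning a negative errno is an assumption about the calling convention, not something the commit message states.

```python
import errno

def map_exit_status(status):
    """Map a child's exit status to 0 or -EINVAL.

    Passing a raw status of 1 through would be indistinguishable
    from errno.EPERM, whose value is also 1.
    """
    if status == 0:
        return 0
    return -errno.EINVAL
```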
* Back in Hammer, the osd-crush.sh individual tests did not run the
monitor, it was taken care of by the run() function. An attempt to run
another mon fails with:
This problem was introduced by cc1cc033930e8690a57674e842a003f6bbc7a242
from https://github.com/ceph/ceph/pull/4936
* replace test/mon/mon-test-helpers.sh with test/ceph-helpers.sh as
we need run_osd() in this newly added test
* update the run-dir of commands: ceph-helpers.sh uses a different
convention for the run-dir of daemons.
Loic Dachary [Wed, 10 Jun 2015 21:16:01 +0000 (23:16 +0200)]
tests: display the output of failed make check runs
After a make check fails, it shows a summary but not the output of the
failed tests although they contain information to diagnose the problem.
Set the VERBOSE=true automake variable which is documented to collect
and display the failed script output at the end of a run (the content of
the test-suite.log file, valid from automake-1.11 up).
Kefu Chai [Mon, 25 May 2015 12:14:32 +0000 (20:14 +0800)]
mon: validate new crush for unknown names
* the "osd tree dump" command enumerates all buckets/osds found in either the
crush map or the osd map, but a newly set crush map is not validated for
dangling references. so when a new crush map is sent to the monitor, check
whether any item in it references an unknown type/name, and reject the map
if so.
Kefu Chai [Tue, 26 May 2015 04:08:09 +0000 (12:08 +0800)]
crush/CrushTester: add check_name_maps() method
* check for dangling bucket name or type names referenced by the
buckets/items in the crush map.
* also check for the references from Item(0, 0, 0) which does not
necessarily exist in the crush map under testing. the rationale
behind this is: the "ceph osd tree" will also print stray OSDs
whose id is greater than or equal to 0. so it would be useful to
check if the crush map offers the type name indexed by "0"
(the name of OSDs is always "OSD.{id}", so we don't need to
look up the name of an OSD item in the crushmap).
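
The two checks described above can be sketched roughly as below. This is not the CrushTester implementation; the function name and the data layout (items as (id, type_id) pairs, names held in plain dicts) are assumptions made for illustration. It relies on the crush convention that OSD ids are non-negative while bucket ids are negative.

```python
def find_dangling(items, type_names, item_names):
    """Return a list of ("type" | "name", id) pairs with no map entry.

    items      -- iterable of (item_id, type_id) pairs
    type_names -- dict mapping type_id to type name
    item_names -- dict mapping bucket id to bucket name
    """
    problems = []
    # Also check a stray OSD item with id 0 and type 0, which may
    # not exist in the map under test.
    for item_id, type_id in [(0, 0)] + list(items):
        if type_id not in type_names:
            problems.append(("type", type_id))
        # OSDs (id >= 0) need no name lookup; buckets (id < 0) do.
        if item_id < 0 and item_id not in item_names:
            problems.append(("name", item_id))
    return problems

good = find_dangling([(-1, 1), (3, 0)],
                     {0: "osd", 1: "host"},
                     {-1: "host-a"})
bad = find_dangling([(-1, 1), (-2, 1)],
                    {1: "host"},
                    {-1: "host-a"})
```

A map would be rejected when the returned list is non-empty.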
Samuel Just [Tue, 7 Jul 2015 18:43:01 +0000 (11:43 -0700)]
OSDMonitor: allow addition of cache pool with non-empty snaps with config
We need to be able to allow the version of ceph_test_* from earlier
versions of ceph to continue to work. This patch also adjusts the
work unit to use a single rados snap to test the condition without
--force-nonempty to ensure that we don't need to be careful about
the config value when running that script.
This fixes a problem, wherein calamari does not provide
popup drill-downs for warnings or errors, should the summary
be missing.
Calamari gets health info from /api/v1/cluster/$FSID/health.
If the data here has a summary field, this summary is provided
in a popup window:
/api/v1/cluster/$FSID/health is populated (ultimately) with
status obtained via librados python bindings from the ceph
cluster. In the case where there's clock skew, the summary
field supplied by the ceph cluster is empty.
No summary field, no popup window with more health details.