Adam Crume [Wed, 13 Aug 2014 18:42:00 +0000 (11:42 -0700)]
lttng: Remove tracing from libcommon
This is a short-term fix for issues caused by tracepoints in libcommon.
Code crashes at runtime if the same tracepoints are linked into the
program multiple times. This happens with libcommon because it is
statically linked into dynamic libraries such as librados, then
statically linked into executables because symbols from libcommon are
not exposed in librados. Therefore, any programs that use librados and
libcommon would crash because of duplicate tracepoints.
Adam Crume [Thu, 7 Aug 2014 16:05:00 +0000 (09:05 -0700)]
rbd-replay: Fix compiler warning in unit tests
Was getting:
test/test_rbd_replay.cc:44:3: warning: converting ‘false’ to pointer type for argument 1 of ‘char testing::internal::IsNullLiteralHelper(testing::internal::Secret*)’ [-Wconversion-null]
Fixed by changing EXPECT_EQ(false, xxx) to EXPECT_FALSE(xxx).
For completeness, also changed EXPECT_EQ(true, xxx) to EXPECT_TRUE(xxx).
Adam Crume [Thu, 31 Jul 2014 23:22:44 +0000 (16:22 -0700)]
rbd-replay: Support replaying partial traces
Tracing may start after the application is started, and image open calls
may be missed. To support replaying these traces, additional information is
traced, allowing missing open calls to be generated.
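
A minimal sketch of how missing open calls could be generated during replay. It assumes the additional traced information is the image name on every event. The commit message does not say what is actually traced, so the event layout and the fill_missing_opens helper are hypothetical.

```python
def fill_missing_opens(events):
    """Insert a synthetic open before the first use of an unopened image.

    `events` is a list of dicts with an "op" and an "image" key
    (a hypothetical layout, not the real trace format).
    """
    open_images = set()
    out = []
    for ev in events:
        image = ev["image"]
        if ev["op"] == "open":
            open_images.add(image)
        elif image not in open_images:
            # The trace started after this image was opened:
            # generate the open call that was missed.
            out.append({"op": "open", "image": image, "synthetic": True})
            open_images.add(image)
        out.append(ev)
        if ev["op"] == "close":
            open_images.discard(image)
    return out
```

Events for an image that was opened inside the trace pass through unchanged; only images first seen mid-use get a generated open.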
Adam Crume [Mon, 28 Jul 2014 23:32:15 +0000 (16:32 -0700)]
lttng: Preload liblttng-ust-fork.so in TESTS_ENVIRONMENT
This adds LD_PRELOAD=liblttng-ust-fork.so to TESTS_ENVIRONMENT.
This prevents lttng from complaining when processes are forked.
The complaints otherwise taint the output and cause tests to fail.
Adam Crume [Thu, 17 Jul 2014 22:01:42 +0000 (15:01 -0700)]
rbd-replay: Switch logging from cout to dout
To enable logs, we also have to use global_init to parse our
command-line args, so we now have other standard Ceph goodies
such as picking up config options from the environment.
This adds objectstore tracepoints for the filestore. It'd be nice to add
these to the objectstore interface somehow so we can get all
implementations for free, but that might just be a bit difficult
especially since each impl will apply transactions in a different way.
TrackedOp: Removed redundant lock in OpTracker::_mark_event()
ops_in_flight_lock seems redundant in OpTracker::_mark_event()
and this lock is highly contended. Removing it
gives a significant performance boost.
Somnath Roy [Mon, 18 Aug 2014 23:59:36 +0000 (16:59 -0700)]
CollectionIndex: Collection name is added to the access_lock name
The CollectionIndex constructor is changed to accept the coll_t
so that the collection name can be used to form the access_lock (RWLock)
name. This is needed because otherwise lockdep will report a recursive lock error
and assert. lockdep needs unique lock names for each Index object.
Fixes: #9145
Signed-off-by: Somnath Roy <somnath.roy@sandisk.com>
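
A minimal sketch of why a shared lock name trips a name-keyed checker. ToyLockdep is a toy stand-in, not the real lockdep, and the name format produced by access_lock_name is an assumption made for illustration.

```python
class ToyLockdep:
    """Toy checker that identifies locks by name only."""

    def __init__(self):
        self.held = []

    def acquire(self, name):
        # Two distinct lock objects with the same name look like
        # the same lock being taken twice.
        if name in self.held:
            raise RuntimeError("recursive lock of " + name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def access_lock_name(coll=None):
    """Build a lock name; the format here is hypothetical."""
    base = "CollectionIndex::access_lock"
    return base if coll is None else base + "::" + coll
```

With one fixed name, holding the access locks of two different Index objects at once raises the error; with the collection name appended, each Index gets its own name and both locks can be held.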
Loic Dachary [Mon, 18 Aug 2014 23:30:15 +0000 (01:30 +0200)]
erasure-code: preload the jerasure plugin
Load the jerasure plugin when ceph-osd starts to avoid the following
scenario:
* ceph-osd-v1 is running but did not load jerasure
* ceph-osd-v2 is being installed but takes time: the files
are installed before ceph-osd is restarted
* ceph-osd-v1 is required to handle an erasure coded placement group and
loads jerasure (the v2 version which is not API compatible)
* ceph-osd-v1 calls the v2 jerasure plugin and does not reference the
expected part of the code and crashes
Although this problem shows in the context of teuthology, it is unlikely
to happen on a real cluster because it involves upgrading immediately
after installing and running an OSD. Once it is backported to firefly,
it will not even happen in teuthology tests because the upgrade from
firefly to master will use the firefly version including this fix.
While it would be possible to walk the plugin directory and preload
whatever it contains, that would not work for plugins such as jerasure
that load other plugins depending on the CPU features, or even plugins
such as isa which only work on specific CPUs.
Sage Weil [Tue, 12 Aug 2014 03:54:38 +0000 (20:54 -0700)]
mon/OSDMonitor: respect CRUSH weights for reweight-by-pg
Do not assume that all OSDs are weighted equally for reweight-by-pg.
Note that reweight-by-utilization already reweights based on the size of
the OSD volume; we presume that this is already reflected by the CRUSH
weights.
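
A minimal sketch of the idea, assuming each OSD's expected share of PGs is proportional to its CRUSH weight. The overloaded_osds helper, its parameters, and the 120% overload threshold are illustrative assumptions, not the OSDMonitor code.

```python
def overloaded_osds(pg_counts, crush_weights, oload=120):
    """Return a scale factor for each OSD holding more PGs than expected.

    pg_counts:     {osd_id: number of PGs mapped to the OSD}
    crush_weights: {osd_id: CRUSH weight}
    oload:         overload threshold in percent (assumed value)
    """
    total_pgs = sum(pg_counts.values())
    total_weight = sum(crush_weights.values())
    result = {}
    for osd, n in pg_counts.items():
        # Expected PG count is proportional to the CRUSH weight,
        # rather than an equal share for every OSD.
        expected = total_pgs * crush_weights[osd] / total_weight
        if n > expected * oload / 100.0:
            result[osd] = expected / n
    return result
```

An OSD with twice the CRUSH weight is expected to hold twice the PGs, so it is not flagged merely for holding more than the plain average.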
Sage Weil [Wed, 6 Aug 2014 15:51:18 +0000 (08:51 -0700)]
mon/OSDMonitor: reweight-by-pg for pool(s)
Allow the reweight-by-pg to look at a specific set of pools. If the list
is omitted, use PGs from all pools. This allows you to focus on a
specific pool (the one that will dominate data usage). Otherwise things
may not be quite right because other pools may have PGs that contain
much less data.
Sage Weil [Wed, 6 Aug 2014 15:35:07 +0000 (08:35 -0700)]
mon/OSDMonitor: adjust weights up, when possible
Note when OSDs are underloaded, as well. If that is the case, adjust the
OSD reweight value up, if possible. (It won't always be possible since
weights are capped at 1.)
Note that we set the underload threshold to the average, as we want to
aggressively adjust weights up (back to 1.0) whenever possible. This gets
us a more efficient mapping calculation and reduces the amount of "noise"
in the weights.
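
A minimal sketch of the upward adjustment, assuming the new weight scales the old one by average/count and is capped at 1.0. The adjust_up helper and its scaling rule are assumptions for illustration, not the OSDMonitor code.

```python
def adjust_up(pg_counts, weights, max_weight=1.0):
    """Raise reweight values of OSDs that hold fewer PGs than average.

    pg_counts: {osd_id: number of PGs mapped to the OSD}
    weights:   {osd_id: current reweight value in [0, 1]}
    """
    # The underload threshold is the average itself.
    average = sum(pg_counts.values()) / len(pg_counts)
    new_weights = {}
    for osd, n in pg_counts.items():
        weight = weights[osd]
        # An OSD already at the cap cannot be adjusted up.
        if n < average and weight < max_weight:
            # With zero PGs there is nothing to scale by; go to the cap.
            scaled = weight * average / n if n else max_weight
            new_weights[osd] = min(max_weight, scaled)
    return new_weights
```

An underloaded OSD whose weight is already 1.0 is left out of the result, which is the case where no adjustment is possible.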
Sage Weil [Mon, 4 Aug 2014 22:40:35 +0000 (15:40 -0700)]
mon/OSDMonitor: reweight-by-pg
This is just like reweight-by-utilization, but looks purely at the PG to
OSD mapping, not at the number of bytes used on the target disks. This
allows the reweighting to be done before any data is written into the
cluster, when no data will need to migrate as a result of the reweight.
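
A minimal sketch of reweighting from the PG to OSD mapping alone. It assumes an OSD is overloaded when its PG count exceeds 120% of the average, and that its weight is scaled by average/count. The reweight_by_pg function, the threshold, and the scaling rule are illustrative assumptions, not the OSDMonitor code.

```python
from collections import Counter


def reweight_by_pg(pg_to_osds, weights, oload=120):
    """Return new reweight values for OSDs that hold too many PGs.

    pg_to_osds: {pg_id: [osd_id, ...]} mapping of PGs to OSDs
    weights:    {osd_id: current reweight value}
    oload:      overload threshold in percent (assumed value)
    """
    # Count PGs per OSD; bytes used on disk are never consulted.
    counts = Counter(osd for osds in pg_to_osds.values() for osd in osds)
    average = sum(counts.values()) / len(weights)
    threshold = average * oload / 100.0
    new_weights = {}
    for osd, weight in weights.items():
        n = counts.get(osd, 0)
        if n > threshold:
            new_weights[osd] = weight * average / n
    return new_weights
```

Because the input is only the mapping, this can run on an empty cluster, before any data exists to migrate.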