Alex Mikheev [Mon, 12 Jun 2017 08:32:38 +0000 (08:32 +0000)]
msg/async/rdma: fixes crash in fio
fio creates multiple CephContext in a single process.
Crashes happen because the rdma stack has global resources that
are still used by one ceph context after they have already been
destroyed by another context.
The commit removes global instances of RDMA dispatcher and infiniband
and makes them context (rdma stack) specific.
Signed-off-by: Adir Lev <adirl@mellanox.com> Signed-off-by: Alex Mikheev <alexm@mellanox.com>
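A minimal sketch of the ownership change described above, not the actual Ceph code. `Device`, `Dispatcher` and `Stack` are hypothetical stand-ins for the infiniband object, the RDMA dispatcher and the per-context rdma stack. Each stack owns its own resources, so tearing down one stack cannot invalidate another's:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; the names are illustrative, not Ceph's classes.
struct Device {
  static int live;              // counts live instances, for the demo only
  Device() { ++live; }
  ~Device() { --live; }
};
int Device::live = 0;

struct Dispatcher {
  explicit Dispatcher(Device* d) : dev(d) {}
  bool usable() const { return dev != nullptr; }
  Device* dev;                  // non-owning; the Stack keeps the Device alive
};

// Each stack (one per context) owns its device and dispatcher instead of
// sharing process-wide globals.
class Stack {
 public:
  Stack() : device_(new Device), dispatcher_(new Dispatcher(device_.get())) {}
  Dispatcher& dispatcher() { return *dispatcher_; }
 private:
  std::unique_ptr<Device> device_;          // destroyed after dispatcher_
  std::unique_ptr<Dispatcher> dispatcher_;
};

// Creates two stacks, destroys the first, and reports whether the second
// still has a usable dispatcher and its own live device.
bool second_stack_survives() {
  std::unique_ptr<Stack> a(new Stack);
  std::unique_ptr<Stack> b(new Stack);
  a.reset();                    // tear down one "context"
  return b->dispatcher().usable() && Device::live == 1;
}
```

With process-wide globals, the equivalent of `a.reset()` would have destroyed the device that `b` still depends on.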
Sage Weil [Wed, 9 Aug 2017 20:40:43 +0000 (16:40 -0400)]
qa/suites/upgrade/jewel-x/parallel: thrash layout
We can't kill and restart osds because that will interfere with
the upgrade process. We can, however, thrash the layout by
tweaking osd weights and so on. This will exercise osd recovery
paths during the upgrade that aren't normally exercised (outside
of stress-split, which doesn't upgrade individual osds while they
are non-clean).
Sage Weil [Wed, 9 Aug 2017 16:50:57 +0000 (12:50 -0400)]
osd/PG: force rebuild of missing set on jewel upgrade
Previously we were detecting the need to rebuild missing based on
whether the "divergent_priors" omap key was present. Unfortunately,
jewel does not always set this, so it is not a reliable indicator.
(It only gets set if you actually have a divergent prior at some
point in the PG's lifetime on that OSD.)
Fix by using the info_struct_v on the PG to detect whether we need
to do the conversion. We didn't bump the value when we added
the missing persistence, but the fastinfo was also added during
the same period between jewel and kraken, so it will work just as
well.
Fixes: http://tracker.ceph.com/issues/20958 Signed-off-by: Sage Weil <sage@redhat.com>
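The two detection strategies can be contrasted in a small sketch. The function names and the threshold constant are assumptions made for illustration; in particular, the numeric value below is a placeholder and the real one lives in the Ceph source:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Placeholder value (assumption): the first on-disk PG info version written
// by releases that also persist the missing set.
constexpr uint8_t kFirstStructVWithFastinfo = 10;

// Old, unreliable check: the key exists only if the PG ever had a divergent
// prior on this OSD.
inline bool needs_rebuild_by_key(const std::set<std::string>& omap_keys) {
  return omap_keys.count("divergent_priors") > 0;
}

// New check: any PG last written with an older struct version needs the
// conversion, whether or not the key is present.
inline bool needs_rebuild_by_version(uint8_t info_struct_v) {
  return info_struct_v < kFirstStructVWithFastinfo;
}
```

A PG that never had a divergent prior has no such key, so the first check skips it, while the version check still catches it.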
Sage Weil [Tue, 8 Aug 2017 22:43:22 +0000 (18:43 -0400)]
mon/Elector: force election epoch bump on start
We are generally careful when bumping the epoch so that we can join
existing rounds. However, if we restart in the middle of an election,
and change versions, we need to be certain that our previous ACK (as
$version - 1) isn't accepted as truth for the restarted daemon (running
$version) keeping the same epoch.
The conservatism with bumping is to avoid spurious election cycles, but
mon restarts are more rare, and we need them here.
Fixes: http://tracker.ceph.com/issues/20949 Signed-off-by: Sage Weil <sage@redhat.com>
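A simplified model of the idea, assuming that an ack is only valid for the epoch it was sent in. The helper names are hypothetical and this is not the Ceph Elector:

```cpp
#include <cassert>
#include <cstdint>

// Epoch to use after a restart: always move past the persisted value so the
// restarted daemon never reuses the epoch it acked in before going down.
inline uint64_t epoch_on_start(uint64_t persisted_epoch) {
  return persisted_epoch + 1;
}

// An ack is only meaningful for the round (epoch) it was sent in.
inline bool ack_is_current(uint64_t ack_epoch, uint64_t current_epoch) {
  return ack_epoch == current_epoch;
}
```

Under this model, an ack sent before the restart carries the old epoch and no longer matches once the daemon starts with the bumped one.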
amitkuma [Wed, 9 Aug 2017 10:11:59 +0000 (15:41 +0530)]
messages: Initializing members in MOSDPGUpdateLogMissing
Fixes the coverity issues:
** 1355242 Uninitialized scalar field
2. uninit_member: Non-static class member map_epoch is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member min_epoch is not initialized
in this constructor nor in any functions that it calls.
CID 1355242 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member rep_tid is not initialized
in this constructor nor in any functions that it calls.
** 1355243 Uninitialized scalar field
2. uninit_member: Non-static class member map_epoch is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member min_epoch is not initialized
in this constructor nor in any functions that it calls.
CID 1355243 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member rep_tid is not initialized
in this constructor nor in any functions that it calls.
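One common way to address UNINIT_CTOR reports like these is to use default member initializers. The sketch below is hypothetical: the field names come from the report above, but the struct and the field types are assumptions, not the actual message class:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical message type; only the field names come from the report.
struct UpdateLogMissingSketch {
  uint32_t map_epoch = 0;
  uint32_t min_epoch = 0;
  uint64_t rep_tid = 0;

  // The default constructor no longer leaves any scalar indeterminate.
  UpdateLogMissingSketch() {}
};
```

Initializing at the point of declaration covers every constructor, including ones added later.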
James Page [Wed, 9 Aug 2017 09:04:37 +0000 (10:04 +0100)]
Align use of uint64_t in service_daemon::AttributeType
size_t on a 32-bit architecture is a 32 bit unsigned int which
created ambiguity when casting to bool, uint64_t or std::string
(which are boost::variants for service_daemon::AttributeType).
Align to use of uint64_t to resolve compilation failures in
all 32-bit architectures.
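The ambiguity can be illustrated with a plain overload set standing in for the variant's alternatives. This is a sketch under that assumption: `kind` and `kind_of_count` are hypothetical helpers, not Ceph code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Overloads standing in for the alternatives bool, uint64_t and std::string.
inline const char* kind(bool) { return "bool"; }
inline const char* kind(uint64_t) { return "uint64_t"; }
inline const char* kind(const std::string&) { return "string"; }

inline const char* kind_of_count(std::size_t n) {
  // kind(n) would be ambiguous where size_t is a 32-bit unsigned int:
  // the conversions to bool and to uint64_t rank equally. The explicit
  // cast selects the uint64_t overload on every architecture.
  return kind(static_cast<uint64_t>(n));
}
```

Where size_t and uint64_t are the same type, the unqualified call is an exact match, which is why the failure only showed up on 32-bit builds.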
Adam C. Emerson [Thu, 27 Jul 2017 04:55:36 +0000 (00:55 -0400)]
throttle: Do not destroy condition variables with waiters
Destroying a condition variable on which someone is waiting is Undefined
Behavior. It's bad and terrible and awful. On some machines it makes
the destructor just outright hang.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Marcus Watts [Sat, 5 Aug 2017 00:01:32 +0000 (20:01 -0400)]
Test bytes_sent bugs.
Rearrange logic to make it easier to measure accumulation.
Instrument the boto request/response loop to count bytes in and out.
Accumulate byte counts in a usage-like structure.
Compare actual usage reported by ceph against local usage measured.
Report and assert if there are any shortcomings.
Remove zone placement rule that was newly added at end: tests should be rerunnable.
Nit: the logic to wait for "delete_obj" is not quite right.
Fixes: http://tracker.ceph.com/issues/19870 Signed-off-by: Marcus Watts <mwatts@redhat.com>
Marcus Watts [Sun, 18 Jun 2017 22:18:39 +0000 (18:18 -0400)]
Fix bytes_sent bugs.
log bytes sent/received.
add cct to bufferingfilter
add cct to RGWRestfulIO
AccountingFilter - save cct for debugging output
implement AccountingFilter::complete_request() - account for bytes reported here.
BufferingFilter<T>::complete_request() - ignore counts from send_content_length() complete_header();
Code quality note:
this patch makes "cct" available for a lot of newly added debug
statements. The debug statements are mostly not very useful (and should
go away in the future) - *But* the "cct" logic should be redone and
incorporated into some base class (such as RestfulClient) so that it is
possible to easily add in debug statements such as these in the future.
Fixes: http://tracker.ceph.com/issues/19870 Signed-off-by: Marcus Watts <mwatts@redhat.com>
Sage Weil [Fri, 4 Aug 2017 17:58:17 +0000 (13:58 -0400)]
common/LogClient: assign seq and queue atomically
The _get_mon_log_message() assumes that log_last and log_queue
are in sync, but it was previously possible to increment log_last
when setting e.seq in do_log(), and only later queue the entry. If a
racing thread ran _get_mon_log_message() in the meantime it would
fail an assertion.
Fix by assigning the seq and queueing it atomically. If the
cluster log is not enabled, use the get_next_seq() helper so that
graylog or syslog messages still have a seq assigned.
Fixes: http://tracker.ceph.com/issues/18209 Signed-off-by: Sage Weil <sage@redhat.com>
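A minimal sketch of the fix's core idea, assuming a single lock that guards both the sequence counter and the queue. `LogQueueSketch` and `Entry` are hypothetical and much simpler than LogClient; the path for a disabled cluster log is not modelled:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>

struct Entry {
  uint64_t seq;
  std::string msg;
};

class LogQueueSketch {
 public:
  // Assign the sequence number and enqueue under one lock, so no other
  // thread can observe last_ advanced while the entry is still unqueued.
  uint64_t log(const std::string& msg) {
    std::lock_guard<std::mutex> l(lock_);
    Entry e{++last_, msg};
    queue_.push_back(e);
    return e.seq;
  }

  // Invariant a reader relies on: the newest queued entry carries last_.
  bool in_sync() const {
    std::lock_guard<std::mutex> l(lock_);
    return queue_.empty() || queue_.back().seq == last_;
  }

 private:
  mutable std::mutex lock_;
  uint64_t last_ = 0;
  std::deque<Entry> queue_;
};
```

Because both steps happen inside the same critical section, a reader taking the lock sees either neither change or both.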
amitkuma [Tue, 8 Aug 2017 18:28:06 +0000 (23:58 +0530)]
messages: Initialization of is_primary
Fixes the coverity issue:
** 717269 Uninitialized scalar field
CID 717269 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member is_primary is not initialized
in this constructor nor in any functions that it calls.