On IBM Z the Boost tagged pointer implementation cannot use
"pointer compression" as there are no unused bits in an address;
the whole 64-bit address space is available to user space code.
Instead, Boost uses 16-byte atomics. This is always supported
on IBM Z, but depending on the particular compiler (version)
it may require linking against libatomic. The existing checks
in CheckCxxAtomic.cmake do not catch this, however, as they only
test for (up to) 8-byte atomic support.
Fixed by adding a test for 16-byte atomic support on IBM Z.
Sage Weil [Tue, 24 Sep 2019 17:05:24 +0000 (12:05 -0500)]
osd/PeeringState: skip wait state if osd set is empty
If there are no down OSDs from prior intervals, then the normal peering
process will end up contacting all of the prior OSDs and ensuring that
their prior interval is terminated during peering.
Sage Weil [Mon, 23 Sep 2019 19:46:07 +0000 (14:46 -0500)]
osd: is_replica() -> is_nonprimary()
The 'replica' term does not map well onto EC pools. More importantly,
the implementation is often wrong for EC pools, where the role may be 0
or 1 regardless of whether the OSD is the primary.
Introduce 'nonprimary' to mean an acting osd that is not the primary.
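To illustrate why a role-based "replica" test can misfire while "acting and not primary" cannot, here is a minimal sketch. ActingView, role_of, is_replica_by_role and is_nonprimary are hypothetical names, not Ceph's actual PeeringState API. The sketch assumes role is simply the position in the acting set, as it would be for an EC shard index.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of a PG's acting set (not Ceph's types).
struct ActingView {
  std::vector<int> acting;  // OSD ids in acting order; for EC, index = shard
  int primary;              // OSD id of the primary
};

// Role modeled as the position in the acting set, -1 if not acting.
// For an EC pool this position says nothing about primary-ness by itself.
int role_of(const ActingView& v, int osd) {
  for (int i = 0; i < static_cast<int>(v.acting.size()); ++i)
    if (v.acting[i] == osd) return i;
  return -1;
}

// Fragile "replica" check: assumes role 0 always means primary.
bool is_replica_by_role(const ActingView& v, int osd) {
  return role_of(v, osd) > 0;
}

// "nonprimary": any acting OSD that is not the primary,
// independent of how roles happen to be numbered.
bool is_nonprimary(const ActingView& v, int osd) {
  return role_of(v, osd) >= 0 && osd != v.primary;
}
```

With acting {5, 3, 8} and primary 3, the role-based check wrongly treats OSD 5 (role 0) as not a replica and the primary (role 1) as one, whereas is_nonprimary() answers both correctly.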
Sage Weil [Tue, 6 Aug 2019 22:04:44 +0000 (17:04 -0500)]
osd/PeeringState: piggyback lease and ack on activation messages
The lease goes out with the MOSDPGLog or info, and the ack comes back with
the info.
We no longer need to renew the lease explicitly in
all_activated_and_committed() because we *just* piggybacked on activation.
We can just wait for the normal renew event to fire.
Sage Weil [Tue, 6 Aug 2019 03:05:38 +0000 (22:05 -0500)]
osd/PeeringState: renew before activate messages; send after activated
We want to renew before we prepare or send activate messages so that we
have the opportunity to include leases in them (coming soon!).
And we do not want to send explicit lease messages until we know that the
peers have activated. In particular, we want to avoid queueing a notify
(via pending_activators) and then sending a lease that will arrive before
it.
If we see that a prior_readable_down_osd is known to be dead, we can
remove it from the set. And if the set is empty, we can skip the rest of
our waiting period and leave the WAIT state.
Sage Weil [Tue, 23 Jul 2019 19:07:59 +0000 (14:07 -0500)]
osd/PeeringState: track down OSDs relevant to prior_readable_until_ub
Keep track of which OSDs from the prior set we care about that affect
the prior_readable_until_ub. Note that it is only the *down* OSDs that
we have to track here, since everything in the *probe* set we will already
contact during peering (they are still up), guaranteeing that those PGs
are aware of the interval change and are no longer readable in the prior
interval.
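The bookkeeping described in these entries can be sketched as a small state holder. It tracks only the *down* OSDs from the prior interval, drops any that are later known to be dead, and leaves WAIT once the set is empty. PriorReadableTracker, start_interval and note_osd_dead are hypothetical names, and the timer that ends WAIT once prior_readable_until_ub passes is deliberately omitted.

```cpp
#include <cassert>
#include <set>

// Hypothetical sketch of the WAIT-state bookkeeping; not Ceph's actual code.
struct PriorReadableTracker {
  std::set<int> prior_readable_down_osds;  // down OSDs from the prior interval
  bool waiting = false;                    // models the WAIT state bit

  // New interval: only *down* prior OSDs matter, because up ones are
  // contacted during peering and thereby learn of the interval change.
  void start_interval(const std::set<int>& prior_down_osds) {
    prior_readable_down_osds = prior_down_osds;
    waiting = !prior_readable_down_osds.empty();
  }

  // An OSD known to be dead cannot still be serving reads, so drop it;
  // once nothing is left, skip the rest of the waiting period.
  void note_osd_dead(int osd) {
    prior_readable_down_osds.erase(osd);
    if (waiting && prior_readable_down_osds.empty())
      waiting = false;
  }
};
```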
bloom-filter: Remove POD overloads for insert and contains
These are not used in Ceph code currently, and should not
be used in the future either, since any use will introduce
byte-order dependent behavior. Remove them to prevent
accidental use.
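The byte-order problem with a generic POD overload can be shown with a toy example. Hashing an object's in-memory bytes gives different results on big- and little-endian hosts, while extracting bytes from a uint32_t value by shifting does not. FNV-1a is only a stand-in here. hash_pod and hash_u32 are hypothetical names, and neither is the hash scheme Ceph's bloom_filter actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy byte-wise hash (32-bit FNV-1a), standing in for the filter's hashes.
std::uint32_t fnv1a(const unsigned char* p, std::size_t len) {
  std::uint32_t h = 2166136261u;
  for (std::size_t i = 0; i < len; ++i) {
    h ^= p[i];
    h *= 16777619u;
  }
  return h;
}

// "POD overload" style: hash the raw object bytes. Their order depends on
// host endianness, so the same value hashes differently across hosts.
template <typename T>
std::uint32_t hash_pod(const T& v) {
  unsigned char buf[sizeof(T)];
  std::memcpy(buf, &v, sizeof(T));
  return fnv1a(buf, sizeof(T));
}

// Value-based: extract bytes most-significant first with shifts, which
// yields the same byte sequence (and hash) on every host.
std::uint32_t hash_u32(std::uint32_t v) {
  const unsigned char buf[4] = {
      static_cast<unsigned char>(v >> 24), static_cast<unsigned char>(v >> 16),
      static_cast<unsigned char>(v >> 8), static_cast<unsigned char>(v)};
  return fnv1a(buf, 4);
}
```

On a little-endian host hash_pod(0x01020304u) hashes the bytes 04 03 02 01, while hash_u32 always hashes 01 02 03 04, so the two disagree exactly as described for the real overloads.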
The integer bloom filter test cases do not really match typical usage
of the bloom filter in actual Ceph code. In particular:
- the tests use consecutive integer ranges, while Ceph code uses
hash values uniformly distributed over the uint32_t space;
- the tests pass "int" to the insert and contains functions, which
causes the generic C++ POD type overload to be selected instead
of the uint32_t overload that is used by Ceph code. The POD
overload depends on host byte order, and actually behaves
differently from the uint32_t overload on little-endian systems.
To fix these issues, this patch changes the integer tests to
always pass in uint32_t (instead of int), and to use results
of a pseudo-random number generator instead of consecutive
sequences. (We assume the period of the generator is long
enough that all values generated within one test instance
are distinct.)
This not only makes the tests pass on both big- and little-endian
hosts, but also allows tightening the allowed false positive
rates, since the actual rates now match the expected values much
more closely.
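The kind of check these tests perform (insert pseudo-random uint32_t keys, probe with other pseudo-random keys, and compare the measured false positive rate with the expected one) can be sketched as follows. The expected rate uses the textbook approximation (1 - e^(-kn/m))^k for m bits, n keys and k hashes. ToyBloom is a hypothetical double-hashing filter, not Ceph's bloom_filter, and the seed and sizes are arbitrary.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Textbook approximation of the false positive rate: (1 - e^{-kn/m})^k.
double expected_fpr(double m, double n, double k) {
  return std::pow(1.0 - std::exp(-k * n / m), k);
}

// Hypothetical toy filter over uint32_t keys using double hashing.
class ToyBloom {
 public:
  ToyBloom(std::size_t m, unsigned k) : bits_(m), k_(k) {}
  void insert(std::uint32_t key) {
    for (unsigned i = 0; i < k_; ++i) bits_[index(key, i)] = true;
  }
  bool contains(std::uint32_t key) const {
    for (unsigned i = 0; i < k_; ++i)
      if (!bits_[index(key, i)]) return false;
    return true;
  }

 private:
  static std::uint64_t mix(std::uint64_t x) {  // splitmix64 finalizer
    x += 0x9e3779b97f4a7c15ull;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ull;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebull;
    return x ^ (x >> 31);
  }
  // i-th probe position: (h1 + i * h2) mod m, with h2 forced odd.
  std::size_t index(std::uint32_t key, unsigned i) const {
    std::uint64_t h = mix(key);
    std::uint64_t h1 = h & 0xffffffffu, h2 = (h >> 32) | 1;
    return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
  }
  std::vector<bool> bits_;
  unsigned k_;
};

// Insert n pseudo-random keys, then probe with further pseudo-random keys,
// assuming (as the tests do) that the probes never repeat an inserted key.
double measured_fpr(std::size_t m, std::size_t n, unsigned k, std::size_t probes) {
  std::mt19937 rng(42);
  ToyBloom f(m, k);
  for (std::size_t i = 0; i < n; ++i) f.insert(static_cast<std::uint32_t>(rng()));
  std::size_t hits = 0;
  for (std::size_t i = 0; i < probes; ++i)
    if (f.contains(static_cast<std::uint32_t>(rng()))) ++hits;
  return static_cast<double>(hits) / probes;
}
```

For m = 10000, n = 1000 and k = 7 the formula gives roughly 0.0082. Because the keys are spread uniformly rather than drawn from a consecutive range, the measured rate lands close enough to that figure to justify a tight bound.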
Sage Weil [Fri, 27 Sep 2019 17:08:37 +0000 (12:08 -0500)]
Merge PR #30525 into master
* refs/pull/30525/head:
qa/tasks/ceph.conf.template: disable power-of-2 warning
qa/standalone/mon/health-mute: use power of 2 for pg_num
osd/OSDMap: remove remaining g_conf() usage
PendingReleaseNotes: add note for 14.2.5 so we can backport this
osd/OSDMap: health alert for non-power-of-two pg_num
Sage Weil [Fri, 27 Sep 2019 15:11:58 +0000 (10:11 -0500)]
Merge PR #30431 into master
* refs/pull/30431/head:
pybind/ceph_argparse: add :int or :float to numerical args
pybind/ceph_argparse: simplify osd name and target types
pybind/ceph_argparse: prefer field names to types in help output
pybind/ceph_argparse: more concise n=N '...'
pybind/ceph_argparse: [] (not {}) around optional args
Jeff Layton [Wed, 18 Sep 2019 12:09:25 +0000 (08:09 -0400)]
vstart_runner: allow the use of it with kernel mounts
Add a new command-line switch to allow it to use the kernel client
instead, and add all of the machinery to handle local kcephfs mounts.
Document this in the developer guide, along with the appropriate scary
warnings about using this on a machine that you care about. While we're
in there, also correct a typo about FUSE configuration.
Fixes: https://tracker.ceph.com/issues/41910
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Patrick Donnelly [Fri, 27 Sep 2019 05:12:03 +0000 (22:12 -0700)]
Merge PR #29906 into master
* refs/pull/29906/head:
vstart_runner: name booleans for options differently
qa/vstart_runner.py: add an option to remove old log
qa/vstart_runner.py: make log initialization code reusable
qa/vstart_runner.py: make printing of stdout of ps optional
qa/vstart_runner.py: add timeout for vstart.sh and stop.sh
qa/vstart_runner.py: add an option to teardown cluster
Sage Weil [Tue, 23 Jul 2019 18:16:53 +0000 (13:16 -0500)]
osd/PeeringState: set WAIT state and block ops to wait for prior readable_until
If we start a new interval and the prior interval may have OSDs that
are still readable, set the WAIT state bit and block operations until
sufficient time has elapsed.
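Blocking operations until the prior interval can no longer be readable amounts to a gate keyed on a deadline. The sketch below passes the current time explicitly so that its behavior is deterministic. WaitGate, new_interval, submit and tick are hypothetical names, and plain strings stand in for queued ops.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical sketch: while in WAIT, ops are parked; once the prior
// interval's readable_until upper bound has passed, they are released.
class WaitGate {
 public:
  // Enter WAIT only if the prior interval may still be readable at `now`.
  void new_interval(Clock::time_point prior_readable_until_ub,
                    Clock::time_point now) {
    until_ = prior_readable_until_ub;
    waiting_ = now < until_;
  }
  // Run the op immediately, or block it while in WAIT.
  void submit(const std::string& op, Clock::time_point now) {
    tick(now);
    if (waiting_)
      blocked_.push_back(op);
    else
      ran_.push_back(op);
  }
  // Timer callback: leave WAIT and requeue blocked ops in arrival order.
  void tick(Clock::time_point now) {
    if (waiting_ && now >= until_) {
      waiting_ = false;
      for (const auto& op : blocked_) ran_.push_back(op);
      blocked_.clear();
    }
  }
  bool waiting() const { return waiting_; }
  const std::vector<std::string>& ran() const { return ran_; }

 private:
  Clock::time_point until_{};
  bool waiting_ = false;
  std::deque<std::string> blocked_;
  std::vector<std::string> ran_;
};
```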
Sage Weil [Fri, 19 Jul 2019 21:52:17 +0000 (16:52 -0500)]
osd/PeeringState: refresh prior_readable_until_ub in pg_history_t on share
Before we share pg_history_t, refresh the prior_readable_until_ub to be
a simple duration from *now*, so that it is completely clock-independent.
The receiver can interpret it based on the receive time for the message,
which loses a bit of precision but is safe since this is an upper bound.
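Sending the bound as a relative duration and re-anchoring it at the receiver's clock can be sketched in a few lines. Each node's unsynchronized clock is modeled as a plain duration since an arbitrary epoch. encode_ub and decode_ub are hypothetical helpers, not the fields or functions Ceph actually uses.

```cpp
#include <cassert>
#include <chrono>

using namespace std::chrono;

// Sender: turn an absolute upper bound on its own clock into
// "this long from now" (zero if the bound has already passed).
nanoseconds encode_ub(nanoseconds ub_abs_sender, nanoseconds now_sender) {
  return ub_abs_sender > now_sender ? ub_abs_sender - now_sender
                                    : nanoseconds::zero();
}

// Receiver: anchor the duration at its own receive time. The message was
// sent before it was received, so this can only overestimate the bound,
// which is safe for an upper bound.
nanoseconds decode_ub(nanoseconds duration, nanoseconds recv_time_receiver) {
  return recv_time_receiver + duration;
}
```

For example, a sender whose clock reads 140s has a bound of 150s and sends a duration of 10s. The receiver, whose clock runs on a different epoch, gets the message when its clock reads 12s and treats the prior interval as possibly readable until 22s on its own clock. No clock synchronization is needed, and the transit delay only makes the bound more conservative.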
Patrick Donnelly [Thu, 26 Sep 2019 13:25:17 +0000 (06:25 -0700)]
Merge PR #29818 into master
* refs/pull/29818/head:
client/MetaRequest: Add age to MetaRequest dump
osdc/Objecter: Add age to the ops
common/ceph_time: Use fixed floating-point notation for mono_clock
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Thu, 26 Sep 2019 13:20:48 +0000 (06:20 -0700)]
Merge PR #30202 into master
* refs/pull/30202/head:
mds: Explicitly call slave_updates with 0 size
mds: Move log_segment_seq_t into class LogSegment
mds: Reorganize class members in LogSegment header
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>