Sage Weil [Tue, 24 Sep 2019 17:05:24 +0000 (12:05 -0500)]
osd/PeeringState: skip wait state if osd set is empty
If there are no down OSDs from prior intervals, then the normal peering
process will end up contacting all of the prior OSDs and ensuring that
their prior interval is terminated during peering.
Sage Weil [Mon, 23 Sep 2019 19:46:07 +0000 (14:46 -0500)]
osd: is_replica() -> is_nonprimary()
The 'replica' term does not map well onto EC pools. More importantly,
the implementation is often wrong for EC pools, where role may be 0 or 1
independent of whether the OSD is the primary or not.
Introduce 'nonprimary' to mean an acting osd that is not the primary.
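The definition above can be sketched as a membership test. This is a minimal illustration in Python rather than the OSD's C++; the function name mirrors the commit title, but the arguments (`whoami`, `acting`, `primary`) are assumptions made for the example.

```python
def is_nonprimary(whoami, acting, primary):
    """Return True if OSD `whoami` is an acting OSD that is not the primary.

    Hypothetical helper: `acting` is a list of OSD ids and `primary` is the
    id of the primary OSD. No role number is consulted at all.
    """
    return whoami in acting and whoami != primary


# Illustrative acting set; the ids are made up.
acting = [4, 7, 2]
primary = 4

for osd in (4, 7, 9):
    print(osd, is_nonprimary(osd, acting, primary))
```

Because the test depends only on acting-set membership and the primary's id, it gives the same answer for replicated and EC pools.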
Sage Weil [Tue, 6 Aug 2019 22:04:44 +0000 (17:04 -0500)]
osd/PeeringState: piggyback lease and ack on activation messages
The lease goes out with the MOSDPGLog or info, and the ack comes back with
the info.
We no longer need to renew the lease explicitly in
all_activated_and_committed() because we *just* piggybacked on activation.
We can just wait for the normal renew event to fire.
Sage Weil [Tue, 6 Aug 2019 03:05:38 +0000 (22:05 -0500)]
osd/PeeringState: renew before activate messages; send after activated
We want to renew before we prepare or send activate messages so that we
have the opportunity to include leases in them (coming soon!).
And we do not want to send explicit lease messages until we know that the
peers have activated. In particular, we want to avoid queueing a notify
(via pending_activators) and then sending a lease that will arrive before
it.
If we see that a prior_readable_down_osd is known to be dead, we can
remove it from the set. And if the set is empty, we can skip the rest of
our waiting period and leave the WAIT state.
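The pruning rule in the paragraph above can be sketched with plain sets. This is a hedged Python illustration, not the PeeringState code; `prune_dead` and `can_leave_wait` are hypothetical names, and how an OSD becomes "known to be dead" is outside the sketch.

```python
def prune_dead(prior_readable_down_osds, known_dead):
    """Drop OSDs that are known to be dead from the set we are waiting on."""
    return set(prior_readable_down_osds) - set(known_dead)


def can_leave_wait(prior_readable_down_osds):
    """An empty set means nobody from the prior interval can still be
    serving reads, so the rest of the waiting period can be skipped."""
    return len(prior_readable_down_osds) == 0


waiting_on = {3, 5}
waiting_on = prune_dead(waiting_on, known_dead={3})
print(waiting_on, can_leave_wait(waiting_on))
waiting_on = prune_dead(waiting_on, known_dead={5})
print(waiting_on, can_leave_wait(waiting_on))
```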
Sage Weil [Tue, 23 Jul 2019 19:07:59 +0000 (14:07 -0500)]
osd/PeeringState: track down OSDs relevant to prior_readable_until_ub
Keep track of which OSDs from the prior set we care about that affect
the prior_readable_until_ub. Note that it is only the *down* OSDs that
we have to track here, since everything in the *probe* set we will already
contact during peering (they are still up), guaranteeing that those PGs
are aware of the interval change and are no longer readable in the prior
interval.
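A minimal sketch of the selection described above, in Python for brevity. The helper name and its arguments are assumptions; the only point illustrated is that OSDs which are still up are left to the normal probe path, so only the down ones need tracking.

```python
def down_osds_to_track(prior_osds, up_osds):
    """Return the prior-interval OSDs that are currently down.

    Hypothetical inputs: `prior_osds` is the set of OSD ids from the prior
    interval that we care about, `up_osds` is the set of OSD ids that are
    up now (and will therefore be contacted during peering).
    """
    return {osd for osd in prior_osds if osd not in up_osds}


print(sorted(down_osds_to_track({1, 2, 3}, up_osds={1, 3, 8})))
```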
Sage Weil [Tue, 23 Jul 2019 18:16:53 +0000 (13:16 -0500)]
osd/PeeringState: set WAIT state and block ops to wait for prior readable_until
If we start a new interval and the prior interval may have OSDs that
are still readable, set the WAIT state bit and block operations until
sufficient time has elapsed.
Sage Weil [Fri, 19 Jul 2019 21:52:17 +0000 (16:52 -0500)]
osd/PeeringState: refresh prior_readable_until_ub in pg_history_t on share
Before we share pg_history_t, refresh the prior_readable_until_ub to be
a simple duration from *now*, so that it is completely clock-independent.
The receiver can interpret it based on the receive time for the message,
which loses a bit of precision but is safe since this is an upper bound.
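The conversion described above can be sketched as two small functions. This is an illustrative Python sketch with times as float seconds; the function names are hypothetical, and clamping at zero for a bound that has already passed is an assumption of the sketch.

```python
def encode_for_share(prior_readable_until_ub, now):
    """Sender side: turn an absolute bound on the sender's clock into a
    duration measured from `now`. Clamped at zero (an assumption)."""
    return max(prior_readable_until_ub - now, 0.0)


def decode_on_receive(remaining, receive_time):
    """Receiver side: rebuild an absolute bound on the receiver's own clock
    from the time the message was received."""
    return receive_time + remaining


# The two clocks are unrelated; only the duration crosses the wire.
remaining = encode_for_share(prior_readable_until_ub=1008.0, now=1000.0)
bound_on_receiver = decode_on_receive(remaining, receive_time=50.25)
print(remaining, bound_on_receiver)
```

Since the message is received after it was sent, the rebuilt bound can only be later than the true one by the transit time, which keeps it a valid upper bound.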
Patrick Donnelly [Thu, 26 Sep 2019 13:25:17 +0000 (06:25 -0700)]
Merge PR #29818 into master
* refs/pull/29818/head:
client/MetaRequest: Add age to MetaRequest dump
osdc/Objecter: Add age to the ops
common/ceph_time: Use fixed floating-point notation for mono_clock
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Thu, 26 Sep 2019 13:20:48 +0000 (06:20 -0700)]
Merge PR #30202 into master
* refs/pull/30202/head:
mds: Explicitly call slave_updates with 0 size
mds: Move log_segment_seq_t into class LogSegment
mds: Reorganize class members in LogSegment header
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Jeff Layton [Thu, 26 Sep 2019 11:50:28 +0000 (07:50 -0400)]
ceph.spec.in: fix Cython package dependency for Fedora
Fedora distros do not have python3?-Cython packages, but they do have
python3-Cython ones. Fix the BuildRequires so that we only use the
python3_version_nodots based version string for RHEL.
Fixes: https://tracker.ceph.com/issues/42032
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Kefu Chai [Fri, 30 Aug 2019 11:49:28 +0000 (19:49 +0800)]
ceph.spec.in: s/pkgversion/version_nodots/
`python3_pkgversion` is now defined as 3, but we don't have packages
like python3-Cython yet in EPEL7. We do have `python36-Cython`, so
let's use `python3_version_nodots` instead.
Patrick Donnelly [Tue, 24 Sep 2019 11:32:28 +0000 (04:32 -0700)]
Merge PR #29824 into master
* refs/pull/29824/head:
qa: whitelist new FS_INLINE_DATA_DEPRECATED health warning
mds: add a HEALTH_WARN message when inline_data is enabled
mds: log a warning message when mds is started on an fs with inline_data
mon: deprecate CephFS inline_data support
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Douglas Fuller <dfuller@redhat.com>
* pass CEPH_BIN env variable if necessary
* do not 'make' unless necessary
* use `cmake --build` as the developer might be using a different cmake
generator for building ceph.
Sage Weil [Mon, 23 Sep 2019 18:20:29 +0000 (13:20 -0500)]
mon/MonClient: skip CEPHX_V2 challenge if client doesn't support it
If the client doesn't support the CEPHX_V2 challenge, and we don't require
it, skip it. This allows the client to authenticate without getting an
error like
cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
Note that we don't have this problem in the monitor exchange in
Monitor::handle_auth_request() because that verify_authorizer() caller is
only used for msgrv2, and all such clients support CEPHX_V2. Instead,
those clients authenticate via the MAuth messages, a path that does not use
authorizers at all.
Fixes: https://tracker.ceph.com/issues/40716
Signed-off-by: Sage Weil <sage@redhat.com>
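The rule in the mon/MonClient commit above reduces to a small truth table. A minimal sketch in Python, not the actual C++; `should_send_challenge` and both parameter names are hypothetical. The commit message only describes the skipped case, so the sketch assumes the challenge is kept in the other three.

```python
def should_send_challenge(client_supports_v2, require_v2):
    """Skip the CEPHX_V2 challenge only when the client does not support it
    AND we do not require it; otherwise keep it (assumed)."""
    return client_supports_v2 or require_v2


for supports in (True, False):
    for require in (True, False):
        print(supports, require, should_send_challenge(supports, require))
```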
Sage Weil [Mon, 23 Sep 2019 14:12:42 +0000 (09:12 -0500)]
Merge PR #30475 into master
* refs/pull/30475/head:
qa/standalone/ceph-helpers: default pg autoscale mode off for standalone
os/bluestore: fix objectstore_blackhole read-after-write
test,misc: do not specify pg_num per pool
mgr/volumes: do not specify pg_num
pybind/ceph_volume_client: do not specify pg_num for new pools
doc: remove all pg_num arguments to 'osd pool create'
mon: do not require pg_num to 'osd pool create'
common: default pg_autoscale_mode=on for new pools
ceph-volume: do not fail when trying to remove crypt mapper
In a containerized context, we may at some point need to run `simple scan`
on a device from a separate container (not the existing, running container
that corresponds to that device). This can't work, because when it tries to
remove the mapper, which is still in use by the corresponding running osd
container, it fails.
ceph-volume can be a bit more permissive here and simply log a warning.
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
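The "warn instead of fail" behavior described in the ceph-volume commit above follows a common pattern, sketched here. This is not the ceph-volume source: `remove_mapper_permissive`, the `remove` callable, the logger name, and the use of `RuntimeError` as the failure signal are all assumptions for illustration.

```python
import logging

logger = logging.getLogger("ceph_volume_sketch")  # hypothetical logger name


def remove_mapper_permissive(remove, mapper_name):
    """Try to remove a crypt mapper; log a warning instead of failing.

    `remove` is a hypothetical callable that raises RuntimeError when the
    mapper cannot be removed (for example because it is still in use).
    Returns True if the mapper was removed, False if only a warning was logged.
    """
    try:
        remove(mapper_name)
    except RuntimeError as err:
        logger.warning("could not remove mapper %s: %s", mapper_name, err)
        return False
    return True


def busy_remove(name):
    # Stand-in for a removal that fails because the mapper is in use.
    raise RuntimeError("device or resource busy")


print(remove_mapper_permissive(busy_remove, "example-mapper"))
print(remove_mapper_permissive(lambda name: None, "example-mapper"))
```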