Sage Weil [Mon, 29 Apr 2019 16:04:15 +0000 (11:04 -0500)]
Merge PR #27804 into master
* refs/pull/27804/head:
common, osd: remove erroneous 'section' in dump functions
dencoder: include some missed types
common: make utime_t dencoder-compatible
common: make uuid_d dencoder-compatible
osd: make watch_item_t dencoder-compatible
crimson/mon: hold rotating_keyring using a pointer
we could have kept rotating_keyring as a plain member variable, but
`AuthClientHandler` keeps a weak reference to it, and we are not able to
update that reference to point to the new rotating_keyring in the
active_con created from the connected pending connection, because the move
ctor of ceph::mon::Connection creates a new RotatingKeyring. this
has two consequences:
- the raw pointer held by AuthClientHandler is no longer valid after
the pending connection is destroyed when it is promoted to active_con,
so we write to freed memory when renewing the rotating keyring.
- we won't have access to the updated keyring.
if we used a std::vector<unique_ptr<Connection>> for pending_conns, this
change would not be needed. we could do that in a future change.
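The dangling-pointer problem described above can be sketched as follows. This is a minimal illustration, not the crimson code: `Keyring`, `Handler`, `ConnByValue`, `ConnByPointer`, `tracks_own_keyring` and `renew` are all hypothetical names standing in for RotatingKeyring, AuthClientHandler and ceph::mon::Connection.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins; these are not the real Ceph classes.
struct Keyring {
  int version = 0;
};

// Keeps a non-owning raw pointer, like the handler described above.
struct Handler {
  Keyring* keyring = nullptr;
};

// Variant A: the keyring is a plain member. The move ctor builds a new
// Keyring in the destination, but the copied handler still points at the
// source's member, which dies with the source object.
struct ConnByValue {
  Keyring keyring;
  Handler handler;
  ConnByValue() { handler.keyring = &keyring; }
  ConnByValue(ConnByValue&& other) noexcept
    : keyring(std::move(other.keyring)), handler(other.handler) {}
};

// Variant B: the keyring lives on the heap. Moving the connection moves
// ownership only, so the address seen by the handler does not change.
struct ConnByPointer {
  std::unique_ptr<Keyring> keyring = std::make_unique<Keyring>();
  Handler handler;
  ConnByPointer() { handler.keyring = keyring.get(); }
  ConnByPointer(ConnByPointer&&) noexcept = default;
};

// True if the handler points at the keyring owned by the same connection.
inline bool tracks_own_keyring(const ConnByValue& c) {
  return c.handler.keyring == &c.keyring;
}
inline bool tracks_own_keyring(const ConnByPointer& c) {
  return c.handler.keyring == c.keyring.get();
}

// Renewal writes through the handler's pointer, so it must be valid.
inline void renew(Handler& h) {
  assert(h.keyring != nullptr);
  ++h.keyring->version;
}
```

After moving a `ConnByValue`, the handler in the new object still refers to the old object's keyring; after moving a `ConnByPointer`, it refers to the keyring the new object owns, so `renew()` stays safe once the source is destroyed.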
auth,mon,crimson: pass KeyStore by const reference
AuthAuthorizeHandler::verify_authorizer() neither changes the keystore
nor expects a nullptr. so we should pass the keystore by const reference
for better readability.
crimson-common and ceph-common have subtle differences, which are
conditionalized by the WITH_SEASTAR macro. the different implementations
share the same names, so we should not link crimson executables against ceph-common.
crimson messenger's tests do not use ConfigProxy at the time of writing,
and ldout() cannot tell whether a log message should be written to the
logfile without querying the config subsystem. so, to avoid involving the
config system in crimson messenger's tests, the log line is disabled if
WITH_SEASTAR is defined. we will remove this workaround once we ditch crimson/net/Config.h.
auth/cephx: do not require a cct for accessing config
crimson/msgr does not depend on ConfigProxy at the time of writing. we
want to keep it decoupled from the config subsystem, in the hope of having
fewer cross-subsystem dependencies and faster compilation.
but, to introduce cephx to msgr, we need to enable msgr to read from
ConfigProxy at runtime. so, to avoid keeping a CephContext instance
simply for accessing the ConfigProxy reference in it, it would be more
efficient to define a free function `conf()` which hides the difference
between crimson and classic users of auth/cephx.
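A free function of this kind might look roughly like the sketch below. It is an assumption-laden illustration: the `ConfigProxy` and `CephContext` structs are simplified stand-ins, `example_option` is not a real option, and `local_conf()` is a hypothetical process-wide accessor used only to show the crimson branch.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for the real types.
struct ConfigProxy {
  int example_option = 0;  // illustrative option, not a real one
};
struct CephContext {
  ConfigProxy _conf;
};

#ifdef WITH_SEASTAR
// crimson flavour: config comes from a process-wide instance (assumed
// here), so the CephContext argument is ignored and may be nullptr.
inline ConfigProxy& local_conf() {
  static ConfigProxy instance;
  return instance;
}
inline const ConfigProxy& conf(CephContext*) {
  return local_conf();
}
#else
// classic flavour: config is reached through the CephContext.
inline const ConfigProxy& conf(CephContext* cct) {
  assert(cct != nullptr);
  return cct->_conf;
}
#endif

// Caller code is identical in both builds.
inline int read_example_option(CephContext* cct) {
  return conf(cct).example_option;
}
```

The point of the design is that callers such as `read_example_option()` never mention WITH_SEASTAR; only `conf()` knows where the configuration lives.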
CephxAuthorizeHandler::verify_authorizer() uses it for querying config
and for a RNG, so we have to pass it a CephContext which has `_conf` and
`random()`.
Sage Weil [Sun, 28 Apr 2019 13:43:28 +0000 (08:43 -0500)]
msg: set_require_authorizer on messenger, not dispatcher
This flag is used for compatibility with pre-nautilus OSDs, which do not
send authorizers on the OSD heartbeat connections. However, because the
AuthServer is implemented by MonClient, which is shared across all
OSD messengers, we can't set this to false for the OSD without disabling
all auth. Instead, make it a Messenger property, and set it only on the
heartbeat server messengers.
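The idea of a per-messenger flag can be sketched as below. Only the name `set_require_authorizer` comes from the commit subject; the `Messenger` class, its `accepts()` method and the example function are hypothetical simplifications, and the acceptance rule is an assumption about how such a flag would be consulted.

```cpp
#include <cassert>

// Hypothetical model; the real Messenger has a far larger interface.
class Messenger {
public:
  void set_require_authorizer(bool b) { require_authorizer = b; }
  // Decide whether an incoming connection may proceed.
  bool accepts(bool peer_sent_authorizer) const {
    return peer_sent_authorizer || !require_authorizer;
  }
private:
  bool require_authorizer = true;  // strict by default
};

// Example: only the heartbeat server messenger is relaxed, while the
// other messengers sharing the same auth server stay strict.
inline void heartbeat_compat_example() {
  Messenger cluster_msgr;
  Messenger hb_server_msgr;
  hb_server_msgr.set_require_authorizer(false);
  assert(!cluster_msgr.accepts(false));
  assert(hb_server_msgr.accepts(false));
  assert(cluster_msgr.accepts(true));
}
```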
Jason Dillaman [Fri, 19 Apr 2019 18:46:02 +0000 (14:46 -0400)]
librbd: removed 'ImageCtx::md_lock'
This lock used to protect the IO pathway to prevent writes, but
that is now handled by the io::ImageRequestWQ. Additional
historical uses have been temporarily moved to the
'ImageCtx::image_lock'.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
crimson/net: prefer <fmt/chrono.h> over <fmt/time.h>
<fmt/time.h> is deprecated in the latest libfmt.
this change silences warnings like
/home/kchai/ceph/src/fmt/include/fmt/time.h:13:2: warning: #warning
fmt/time.h is deprecated, use fmt/chrono.h instead [-Wcpp]
#warning fmt/time.h is deprecated, use fmt/chrono.h instead
^~~~~~~
In file included from
/home/kchai/ceph/src/seastar/include/seastar/core/reactor.hh:72:0,
from
/home/kchai/ceph/src/seastar/include/seastar/core/sharded.hh:24,
from /home/kchai/ceph/src/crimson/net/Fwd.h:18,
from /home/kchai/ceph/src/crimson/net/Protocol.h:9,
from /home/kchai/ceph/src/crimson/net/ProtocolV2.h:6,
from /home/kchai/ceph/src/crimson/net/ProtocolV2.cc:4:
Sage Weil [Fri, 26 Apr 2019 15:40:31 +0000 (10:40 -0500)]
Merge PR #27655 into master
* refs/pull/27655/head:
common/options: flag misc ms_* options STARTUP
common/options: flag misc options STARTUP
common/options: mark cluster log options with FLAG_RUNTIME, use get_val
common/options: mark a bunch of options with FLAG_STARTUP
Jan Fajerski [Mon, 15 Apr 2019 13:35:09 +0000 (15:35 +0200)]
monitoring: add a few prometheus alerts
Alerts are from
https://github.com/SUSE/DeepSea/blob/SES5/srv/salt/ceph/monitoring/prometheus/files/ses_default_alerts.yml
but updated for the mgr module and node_exporter >= 0.15.
osd/PG: do not use approx_missing_objects pre-nautilus
We changed the async recovery cost calculation in nautilus to also take
approx_missing_objects into account, in ab241bf7e927cda2d0ed1698383d18dc4a4b601c.
That commit depends on https://github.com/ceph/ceph/pull/23663 and hence
wasn't backported to mimic.
Mimic only uses the difference in length of logs as the cost. Due to this,
the same OSD might have different costs in a mixed mimic and nautilus (or above)
cluster. This can lead to choose_acting() cycling between OSDs when trying
to select the acting set and async_recovery_targets.
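The version-dependent cost can be sketched as follows. This is a simplified model rather than the code in osd/PG: the function names, the parameters and the `all_osds_nautilus` flag are hypothetical, and the sketch assumes the nautilus cost is the log-length difference plus approx_missing_objects.

```cpp
#include <cassert>
#include <cstdint>

// Absolute difference between the authoritative and candidate log versions.
inline std::uint64_t log_length_diff(std::uint64_t auth_version,
                                     std::uint64_t candidate_version) {
  return auth_version > candidate_version
           ? auth_version - candidate_version
           : candidate_version - auth_version;
}

// Cost of recovering a candidate OSD. approx_missing_objects is only
// added when every OSD understands the nautilus calculation, so that
// mimic and nautilus peers compute the same value for the same OSD.
inline std::uint64_t async_recovery_cost(std::uint64_t auth_version,
                                         std::uint64_t candidate_version,
                                         std::uint64_t approx_missing_objects,
                                         bool all_osds_nautilus) {
  std::uint64_t cost = log_length_diff(auth_version, candidate_version);
  if (all_osds_nautilus) {
    cost += approx_missing_objects;
  }
  return cost;
}

// Example: the same OSD costs 10 in a mixed cluster and 15 otherwise.
inline void mixed_cluster_example() {
  assert(async_recovery_cost(100, 90, 5, false) == 10);
  assert(async_recovery_cost(100, 90, 5, true) == 15);
}
```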
David Zafman [Thu, 18 Apr 2019 02:41:58 +0000 (19:41 -0700)]
mgr: If the requested OSD is down don't trust osd_stat info
If we have a down AND out OSD, it may contain an osd_stat with num_pgs == 0.
When not all PGs are active+clean we need an accurate value, or we consider
the osd to be missing stat info.
Fixes: https://tracker.ceph.com/issues/38930
Signed-off-by: David Zafman <dzafman@redhat.com>
Sage Weil [Thu, 25 Apr 2019 15:49:04 +0000 (10:49 -0500)]
os/bluestore: correctly measure deferred writes into new blobs
Writes into new blobs were all counted as write_small_new, but those can
still be deferred later in _do_alloc_write if they are <= the
prefer_deferred setting.
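One plausible reading of the corrected accounting is sketched below. The `WriteCounters` struct, its field names and `account_new_blob_write` are hypothetical; the sketch only assumes that a write into a new blob is classified by comparing its length with the prefer_deferred threshold at the point where the deferral decision is made.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters; the real perf counters have different names.
struct WriteCounters {
  std::uint64_t small_new = 0;
  std::uint64_t small_deferred = 0;
};

// Classify a small write into a new blob. Writes no longer than the
// threshold are counted as deferred instead of as plain new-blob writes.
inline void account_new_blob_write(WriteCounters& c,
                                   std::uint64_t length,
                                   std::uint64_t prefer_deferred_size) {
  assert(length > 0);
  if (length <= prefer_deferred_size) {
    ++c.small_deferred;
  } else {
    ++c.small_new;
  }
}
```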
librbd: the first post-migration snapshot isn't always dirty
Currently, the first post-migration snapshot is always marked EXISTS
(i.e. dirty). This is wrong, because the data can be inherited from
a pre-migration snapshot, handled by deep copy.
Mark all post-migration snapshots EXISTS_CLEAN in this case.
librbd: don't update snapshot object maps if copyup data is all zeros
If the data read from the parent is all zeros, deep copyup isn't
performed. However, snapshot object maps are updated unconditionally,
causing inconsistencies where nonexistent objects are marked
OBJECT_EXISTS or OBJECT_EXISTS_CLEAN.
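The guard implied by this fix can be sketched as below. This is a minimal illustration under assumptions: `is_all_zeros` and `should_update_snapshot_object_maps` are hypothetical helpers, and a `std::vector<char>` stands in for the buffer type librbd actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// True if every byte read from the parent is zero (also true when empty).
inline bool is_all_zeros(const std::vector<char>& data) {
  return std::all_of(data.begin(), data.end(),
                     [](char c) { return c == 0; });
}

// Snapshot object maps should only be updated when deep copyup will
// actually create the object, i.e. when the parent data is not all zeros.
inline bool should_update_snapshot_object_maps(
    const std::vector<char>& copyup_data) {
  return !is_all_zeros(copyup_data);
}

// Example: a zero-filled read skips the update, any non-zero byte does not.
inline void copyup_example() {
  std::vector<char> zeros(4096, 0);
  std::vector<char> data(4096, 0);
  data[100] = 1;
  assert(!should_update_snapshot_object_maps(zeros));
  assert(should_update_snapshot_object_maps(data));
}
```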