Kefu Chai [Fri, 15 Mar 2019 04:36:09 +0000 (12:36 +0800)]
crimson/osd: call engine().exit(0) after mkfs
* crimson/osd/main.cc:
If crimson-osd is launched as a daemon, we can always stop it using
SIGINT, and seastar::reactor stops itself when handling this signal.
But if we start crimson-osd as an ordinary command-line tool, we should
stop it explicitly once it is done with its job, so call
seastar::engine().exit().
* crimson/osd/osd.cc:
Do not stop uninitialized services in OSD::stop(). OSD is initialized as
a sharded service in main.cc, so we have to stop it before stopping the
engine. If the OSD is used for mkfs, internal services like heartbeat,
monc and mgrc are not initialized at all, so we should not stop them in
this case. And in theory, a user could stop crimson-osd while it is
booting, so we need to check the pointers for null before dereferencing them.
Kefu Chai [Tue, 5 Mar 2019 04:05:31 +0000 (12:05 +0800)]
crimson: add AuthService
AuthService is introduced to allow Dispatchers to access authorizers,
even if they are not chained with a Dispatcher implementing
Dispatcher::ms_get_authorizer(). In this case, we need to grant such
access to the Heartbeat class: it has its own messengers dedicated to
heartbeat traffic, while it is mon::Client which provides the
authorization facilities via the Dispatcher interface.
We could just cast mon::Client to ceph::common::Dispatch to access
Dispatcher::ms_get_authorizer(), but I want to make this explicit using
AuthService, because the consumers of the Dispatcher interface are the
messenger and ChainedDispatcher, not the domain-specific classes.
In the future, we need to either implement Auth{Client,Server} or adapt
this machinery for msgr V2.
Kefu Chai [Mon, 4 Mar 2019 05:01:50 +0000 (13:01 +0800)]
crimson/{net,osd}: make ms_get_authorizer() sync
The authorizer manager does not perform (significant) I/O to build an
authorizer; see CephXTicketHandler::build_authorizer(). All it does is
read random bytes using getentropy(3), which uses getrandom(2).
getrandom(2) could potentially block if the system has just booted and
does not have enough randomness, but I think it's safe to assume that we
have enough entropy when crimson-osd starts.
Patrick Donnelly [Tue, 19 Mar 2019 20:46:39 +0000 (13:46 -0700)]
Merge PR #26056 into master
* refs/pull/26056/head:
mds: check earlier if directories are already exported
mds: dont print auth trees if they are too many
mds: dont print subtrees if they are too many/big
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Tue, 19 Mar 2019 20:13:05 +0000 (13:13 -0700)]
Merge PR #26895 into master
* refs/pull/26895/head:
mds: convert unnecessary list usage to vector
mds: convert get_*dirfrags to use vector
mds: convert iterator loop to generic loop
mds: list to std::list
mds: constantize dump_loads
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Zheng Yan <zyan@redhat.com>
Mykola Golub [Tue, 19 Mar 2019 12:04:23 +0000 (12:04 +0000)]
librbd: fix typo in deep_copy::ObjectCopyRequest::compute_read_ops
The second argument of the interval_set insert method is the length of
the interval being inserted, but the end position was passed instead. It
still worked correctly because the end position value is always larger
than the truncated length.
The stress-split thrasher already had this off, but the ec variant did
not. We don't support ceph-objectstore-tool exports/imports between major
versions.
Fixes: http://tracker.ceph.com/issues/38294
Signed-off-by: Sage Weil <sage@redhat.com>
Kefu Chai [Sun, 17 Mar 2019 07:21:14 +0000 (15:21 +0800)]
osd: transpose two wait lists in comment
See PrimaryLogPG::do_request(); we check for:
1. is_peered(), then
2. flushes_in_progress, then
3. is_active(), then
4. scrubber.is_chunky_scrub_active() && write_blocked_by_scrub(head) in
PrimaryLogPG::do_op(), which is called by PrimaryLogPG::do_request().
PrimaryLogPG::on_change(), however, requeues the waiting requests in the
reverse order, so the comment is out of sync with the code. In this
change, "waiting_for_active" and "waiting_for_flush" are transposed in
the comment explaining the blocked-request wait lists.
Also, sync the preconditions of "waiting_for_peered" and
"waiting_for_flush" with PrimaryLogPG::do_request().
Sage Weil [Sat, 16 Mar 2019 20:06:00 +0000 (15:06 -0500)]
mon/OSDMonitor: allow 'osd pool set pgp_num_actual'
Normally we let the mgr control pgp_num_actual for us in a nice, safe, controlled
way. However, it is very conservative, and only makes changes if all PGs are healthy.
There are situations where the user wants to be more aggressive than this.
For example, if you have a pool with many PGs (say, 4096) and set pg_num_target to a
small number like 4, the mgr will adjust pgp_num way down. This can lead to an OSD
hitting max_pgs_per_osd, which prevents the PGs from becoming active+clean. That,
in turn, prevents the mgr from adjusting pgp_num back up, even if the user sets the
target to a larger value.
This patch lets the user directly adjust pgp_num_actual. Note that we still do
not expose access to pg_num_actual, since there are much stricter conditions that
must be true in order to safely make downward adjustments.
The idea of this change is to use a more allocation-efficient structure.
For reasons I don't understand, this patch caused CInode::get_nested_dirfrags
and CInode::get_subtree_dirfrags to fail to compile when inlined in the
header. The problem was that the inlined methods tried to access CDir
internals while CDir is an incomplete type. What confuses me is that those
inlined methods ever compiled. In any case, I have moved the methods to the
CInode.cc source file to avoid the issue.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Fri, 15 Mar 2019 17:24:52 +0000 (12:24 -0500)]
osd/PG: fix pg merge check for rc clusters
If a cluster had a pg merge pending before last_pg_merge_meta was
introduced then the source_pgid will be pg_t(). If that's the case,
skip these new checks.
Likewise, if we decode a legacy pg_pool_t, put the old merge les/lec
values into the correct location.
Sage Weil [Fri, 15 Mar 2019 17:08:34 +0000 (12:08 -0500)]
Merge PR #26965 into nautilus
* refs/pull/26965/head:
ms/async/ProtocolV2: add ms_die_on_bug and assert rxbuf/txbuf don't get big
msg/async/ProtocolV2: do not reenable pre_auth buffering on from reset_recv_state
Neha Ojha [Mon, 4 Mar 2019 04:29:05 +0000 (20:29 -0800)]
osd/PG: skip rollforward when !transaction_applied during append_log()
Previously, when !transaction_applied, we called
pg_log.roll_forward(&handler), which advanced the crt and trimmed the
entries in rollforward(). Because of this, when
_merge_object_divergent_entries() later tried to roll back entries, those
objects were not found in the backend, and we hit
http://tracker.ceph.com/issues/36739.
With this change, we advance the crt value without deleting the objects,
so that _merge_object_divergent_entries() does not fail because of
deleted objects.