Kefu Chai [Fri, 15 Mar 2019 11:15:43 +0000 (19:15 +0800)]
crimson/osd: create msgrs in main.cc
messengers are sharded<Service>. we should not create them in another
sharded service's start() method. to ensure that sharded services are
stopped in the right order, we should create the sharded services in
main() and register their stop() methods in the proper order.
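
A minimal sketch of the ordering idea in plain C++, without seastar. StopRegistry and run_and_stop() are hypothetical names, not crimson or seastar APIs, and the sketch assumes that "the proper order" means stopping services in the reverse order of their creation:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a crimson or seastar class): collects stop
// callbacks in creation order and runs them in reverse.
class StopRegistry {
public:
  void on_stop(std::function<void()> fn) {
    stoppers_.push_back(std::move(fn));
  }

  // Services registered last are stopped first.
  void stop_all() {
    for (auto it = stoppers_.rbegin(); it != stoppers_.rend(); ++it) {
      (*it)();
    }
    stoppers_.clear();
  }

private:
  std::vector<std::function<void()>> stoppers_;
};

// Stands in for main(): the messengers are created first, then the OSD
// that uses them. Returns the names in the order they were stopped.
std::vector<std::string> run_and_stop() {
  std::vector<std::string> stopped;
  StopRegistry registry;
  registry.on_stop([&] { stopped.push_back("cluster_msgr"); });
  registry.on_stop([&] { stopped.push_back("client_msgr"); });
  registry.on_stop([&] { stopped.push_back("osd"); });
  registry.stop_all();
  return stopped;
}
```

Because every service is created in one place, the stop order is visible in one place too: the OSD is stopped before the messengers it depends on.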
Kefu Chai [Thu, 28 Feb 2019 10:44:04 +0000 (18:44 +0800)]
crimson/osd: skip fast_info if it is not around
fast_info is optional. for instance, the info might be written to the
store for the first time, in which case there is no delta, i.e. no
fast info, yet.
Kefu Chai [Sat, 16 Mar 2019 02:48:17 +0000 (10:48 +0800)]
crimson/osd: pass unknown args to ConfigProxy::parse_args()
these args are not recognized by seastar's app_template's option parser,
so they are likely to be the ceph options and should be fed to
ConfigProxy::parse_args()
Kefu Chai [Thu, 14 Mar 2019 10:53:36 +0000 (18:53 +0800)]
crimson/osd: do not pass ceph options to seastar
the program_option parser used by seastar::app_template does not allow
unrecognized options. but ceph options can be specified on the command
line of ceph applications. for instance, we can
specify the "key" or "keyfile" when creating an objectstore using
"--mkfs", like:
ceph-osd --mkfs --key <key>
in this change, all options known by seastar's app_template are
enumerated and stored in a separate vector, so it can be passed to
"app". the unknown ones are passed to ceph functions.
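
A rough sketch of the splitting step in plain C++. split_args() and its parameters are hypothetical and do not mirror the code in main.cc. It assumes options look like "--name", "--name=value" or "--name value", and that the caller supplies the set of names the seastar parser knows:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Args = std::vector<std::string>;

// Splits arguments into those named in `known` (for seastar's "app") and
// everything else (for the ceph option parser). A known option listed in
// `takes_value` also consumes the argument that follows it.
std::pair<Args, Args> split_args(const Args& args,
                                 const std::set<std::string>& known,
                                 const std::set<std::string>& takes_value) {
  Args for_app;
  Args for_ceph;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];
    const bool has_inline_value = arg.find('=') != std::string::npos;
    // Option name without an inline "=value" suffix.
    const std::string name = arg.substr(0, arg.find('='));
    if (known.count(name) == 0) {
      for_ceph.push_back(arg);
      continue;
    }
    for_app.push_back(arg);
    // "--name value" form: keep the value next to its option.
    if (takes_value.count(name) != 0 && !has_inline_value &&
        i + 1 < args.size()) {
      for_app.push_back(args[++i]);
    }
  }
  return {for_app, for_ceph};
}
```

For example, with "--smp" as the only known option, the arguments "--smp 2 --mkfs --key abc" split into "--smp 2" for the app and "--mkfs --key abc" for the ceph side.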
Kefu Chai [Fri, 15 Mar 2019 04:36:09 +0000 (12:36 +0800)]
crimson/osd: call engine().exit(0) after mkfs
* crimson/osd/main.cc:
if crimson-osd is launched as a daemon, we can always stop it using
SIGINT, and seastar::reactor does stop itself when handling this signal.
but if we start crimson-osd as an ordinary command line tool, we should
stop it explicitly after it is done with its job. so call
seastar::engine().exit().
* crimson/osd/osd.cc:
do not stop uninitialized services in OSD::stop(). OSD is initialized as
a sharded service in main.cc, so we have to stop it before stopping the
engine. if OSD is used for mkfs, the internal services like heartbeat,
monc, mgrc are not initialized at all, so we should not stop them in
this case. and in theory, the user could stop crimson-osd while it boots,
so we need to check the pointers for null before dereferencing them.
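
The null checks can be illustrated with a simplified, synchronous sketch. Service and Osd here are hypothetical stand-ins, not the crimson classes, and the real OSD::stop() is asynchronous:

```cpp
#include <initializer_list>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an internal service such as heartbeat or monc.
struct Service {
  std::string name;
  void stop(std::vector<std::string>& log) { log.push_back(name); }
};

// Hypothetical stand-in for OSD: the services are only created on boot,
// so after a mkfs-only run all three pointers are still null.
struct Osd {
  std::unique_ptr<Service> heartbeat;
  std::unique_ptr<Service> monc;
  std::unique_ptr<Service> mgrc;

  // Stops only the services that were actually created and returns
  // the names of those that were stopped.
  std::vector<std::string> stop() {
    std::vector<std::string> stopped;
    for (Service* svc : {heartbeat.get(), monc.get(), mgrc.get()}) {
      if (svc != nullptr) {
        svc->stop(stopped);
      }
    }
    return stopped;
  }
};
```

An Osd that only ran mkfs stops nothing, and one that was interrupted halfway through boot stops just the services it had already created.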
Kefu Chai [Tue, 5 Mar 2019 04:05:31 +0000 (12:05 +0800)]
crimson: add AuthService
AuthService is introduced to allow a Dispatcher to access
authorizers, even if it is not chained with a Dispatcher implementing
Dispatcher::ms_get_authorizer(). in this case, we need to grant access to
the Heartbeat class, which has its own messengers dedicated to heartbeat
traffic. it's mon::Client which provides the authorization facilities
via the Dispatcher interface.
we could just cast mon::Client to ceph::common::Dispatch for accessing
Dispatcher::ms_get_authorizer(), but i want to make this explicit using
AuthService, as the consumers of the Dispatch interface are the messenger
and ChainedDispatcher, not the domain specific classes.
in future, we need to either implement Auth{Client,Server} or adapt to
this machinery for msgr V2.
Kefu Chai [Mon, 4 Mar 2019 05:01:50 +0000 (13:01 +0800)]
crimson/{net,osd}: make ms_get_authorizer() sync
the authorizer manager does not perform (significant) i/o for building
an authorizer. see CephXTicketHandler::build_authorizer(). all it does
is read random bytes using getentropy(3), which uses getrandom(2).
getrandom(2) could potentially block if the system has just booted and
does not have enough randomness. but i think it's safe to assume that we
have enough entropy when crimson-osd starts.
Patrick Donnelly [Tue, 19 Mar 2019 20:46:39 +0000 (13:46 -0700)]
Merge PR #26056 into master
* refs/pull/26056/head:
mds: check earlier if directories are already exported
mds: dont print auth trees if they are too many
mds: dont print subtrees if they are too many/big
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Tue, 19 Mar 2019 20:13:05 +0000 (13:13 -0700)]
Merge PR #26895 into master
* refs/pull/26895/head:
mds: convert unnecessary list usage to vector
mds: convert get_*dirfrags to use vector
mds: convert iterator loop to generic loop
mds: list to std::list
mds: constantize dump_loads
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Zheng Yan <zyan@redhat.com>
Mykola Golub [Tue, 19 Mar 2019 12:04:23 +0000 (12:04 +0000)]
librbd: fix typo in deep_copy::ObjectCopyRequest::compute_read_ops
The second arg of the interval_set insert method is the length of the
inserted interval, but the end position was provided instead. It still
worked correctly, because the end position value is always larger
than the truncated length.
Kefu Chai [Mon, 18 Mar 2019 05:42:46 +0000 (13:42 +0800)]
messages,osd: remove MPGStats::had_map_for
MPGStats::had_map_for was added back in 7844d0e5. the last release that
still checks this field was mimic -- the monitor sends the OSD incremental
osdmaps if it finds that the pg_stats' had_map_for is greater
than 30 and the epoch is less than that of the latest osdmap.
but DaemonServer, as the new consumer of MPGStats, does not check
had_map_for anymore -- it simply updates the cluster state with the
pg_stats reported by the OSD. and we have directed OSDs to send pg_stats
to mgr since mimic. so, we can safely drop the support of had_map_for in
octopus, as it has been 2 releases.
The stress-split thrasher already had this off, but the ec variant did
not. We don't support ceph-objectstore-tool exports/imports between major
versions.
Fixes: http://tracker.ceph.com/issues/38294
Signed-off-by: Sage Weil <sage@redhat.com>
Kefu Chai [Sun, 17 Mar 2019 07:21:14 +0000 (15:21 +0800)]
osd: transpose two wait lists in comment
see PrimaryLogPG::do_request(), we check for
1. is_peered(), then
2. flushes_in_progress, then
3. is_active()
4. scrubber.is_chunky_scrub_active() && write_blocked_by_scrub(head) in
PrimaryLogPG::do_op() which is called by PrimaryLogPG::do_request().
while in PrimaryLogPG::on_change()
we requeue the waiting requests in the reverse order,
so the comment is not in sync with the code. in this change,
"waiting_for_active" and "waiting_for_flush" are transposed in the
comment explaining blocked request wait lists.
also, sync the pre-conditions of "waiting_for_peered" and
"waiting_for_flush" with "PrimaryLogPG::do_request()"
Sage Weil [Sat, 16 Mar 2019 20:06:00 +0000 (15:06 -0500)]
mon/OSDMonitor: allow 'osd pool set pgp_num_actual'
Normally we let the mgr control pgp_num_actual for us in a nice, safe, controlled
way. However, it is very conservative, and only makes changes if all PGs are healthy.
There are situations where the user wants to be more aggressive than this.
For example, if you have a pool with many PGs (say, 4096) and set pg_num_target to a
small number like 4, the mgr will adjust pgp_num way down. This can lead to an OSD
hitting max_pgs_per_osd. That prevents the PGs from being active+clean, however,
which prevents the mgr adjusting pgp_num back up even if the user sets the target to
a larger value.
This patch lets the user directly adjust pgp_num_actual. Note that we still do
not expose access to pg_num_actual, since there are much stricter conditions that
must be true in order to safely make downward adjustments.
Kefu Chai [Mon, 11 Mar 2019 11:08:03 +0000 (19:08 +0800)]
osd/osd_types: remove copy ctor of osd_reqid_t
do not define the copy constructor of osd_reqid_t explicitly. the compiler
will define a default one for us. the implicitly defined one will call
the copy constructor of each member variable, so it is as good as the
user-defined one.
another reason to remove this copy ctor is that it prevents the
compiler from defining a default move constructor, without which we
cannot return `future<object_info_t>`, as `object_info_t` contains an
instance of `osd_reqid_t`.
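
The effect can be checked with standard type traits. The structs below are hypothetical stand-ins, not the real osd_reqid_t and object_info_t. The sketch also assumes that the requirement behind `future<T>` is a nothrow move constructible `T`, which is how i understand seastar's constraint:

```cpp
#include <type_traits>

// Hypothetical stand-in for osd_reqid_t before the change.
struct reqid_with_copy_ctor {
  int tid = 0;
  reqid_with_copy_ctor() = default;
  // User-provided copy constructor: the compiler no longer declares an
  // implicit move constructor, and this one is not noexcept.
  reqid_with_copy_ctor(const reqid_with_copy_ctor& o) : tid(o.tid) {}
};

// Hypothetical stand-in for osd_reqid_t after the change: all special
// member functions are implicitly defined.
struct reqid_defaulted {
  int tid = 0;
};

// Hypothetical stand-in for object_info_t, which contains a reqid.
template <typename ReqId>
struct info {
  ReqId reqid;
};

template <typename T>
constexpr bool nothrow_movable() {
  return std::is_nothrow_move_constructible<T>::value;
}

// "Moving" the first type silently falls back to its copy constructor,
// so it is move constructible, but not nothrow move constructible.
static_assert(std::is_move_constructible<reqid_with_copy_ctor>::value, "");
static_assert(!nothrow_movable<reqid_with_copy_ctor>(), "");
// The property propagates to any type containing such a member.
static_assert(!nothrow_movable<info<reqid_with_copy_ctor>>(), "");
static_assert(nothrow_movable<info<reqid_defaulted>>(), "");
```

With the user-provided copy constructor removed, the containing type regains a noexcept implicit move constructor.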