Aravind Ramesh [Thu, 18 Aug 2022 09:06:48 +0000 (14:36 +0530)]
crimson/zns: ZNSSegmentManager::release() should reset the zone.
For a ZNS device, a open/full zone has to be reset before it can be
reused to write from start. Seastore releases a segment/zone and marks
it empty and expects to be able to write to it from start. So as a part
of release reset the zone, so it moves to empty state on the device.
crimson/zns: segment_close() should finish the zone.
Zones in IMP-OPEN, EXP-OPEN, CLOSED states in a ZNS device are
counted as active resources. ZNS SSDs can have a limit on the
number of zones that can be active at the same time (max_active_resources).
If CLOSED zones reach max_active_zones supported by the device, then
opening/writing to newer zones will fail.
So a close_segment() from Seastore is essentially a FINISH
operation on a ZNS zone.
Do FINISH operation on a zone instead of CLOSE from segment_close().
crimson/zns: advance write pointer before writing tail-info.
SegmentAllocator::close_segment() writes tail information to a
segment before closing the segment, and this is written at the
end of segment. However, for ZNS SSDs, the writes have to always happen
at write pointer, so writing tail info at the end of a zone fails if
the WP is not at the offset requested by close_segment().
If the write pointer is not at lba where the tail information is written,
then advance write pointer by writing zeroes to the zone from it's current
write pointer. Then write the tail information at the end of zone.
Added advance_wp() function which advances the write pointer and then write
tail information, in case of ZNS devices but for a regular device it
continues to write at the end of segment.
Do close_segment() call after writing tail information, closing a segment
first and then writing tail information can cause potential race conditions
on a zns backed segment.
Aravind Ramesh [Wed, 29 Jun 2022 10:57:40 +0000 (16:27 +0530)]
crimson/zns: ensure writes happen at write pointer.
For ZNS SSDs, every write to a segment/zone has to happen at the zone's
write pointer. Any write request to an offset which is not at
write pointer will be failed by the drive.
In ZNSSegment::write() error out if write offset is not same as WP.
J. Eric Ivancich [Tue, 23 Aug 2022 20:44:24 +0000 (16:44 -0400)]
rgw: remove dout_subsys defs from header files
Each compilation unit should be able to define its own dout_subsys
without generating a redefinition warning. When dout_subsys is defined
in header files, it complicates this matter. This commit removes
definitions and header files and makes sure definitions are added to
.cc files as needed.
Additionally, at Adam Emerson's suggestion, use "static constexpr"
rather than "#define" to set "dout_subsys" in a few places as a
reminder to ultimately do it more broadly.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Ilya Dryomov [Tue, 30 Aug 2022 09:45:44 +0000 (11:45 +0200)]
rbd-mirror: skip setting error code on snapshot replayer shutdown
This is regarding failures in unregister_remote_update_watcher() and
unregister_local_update_watcher(). handle_replay_complete() can't be
called in these cases anymore as it would blindly attempt to unregister
watchers from scratch again. Dropping handle_replay_complete() calls
there means that these failures would only be logged and would not be
surfaced by snapshot replayer. But the only caller ignores them
anyway:
void ImageReplayer<I>::shut_down(int r) {
...
// close the replayer
if (m_replayer != nullptr) {
ctx = new LambdaContext([this, ctx](int r) {
m_replayer->destroy();
m_replayer = nullptr;
ctx->complete(0); <------
});
ctx = new LambdaContext([this, ctx](int r) {
m_replayer->shut_down(ctx);
});
}
Ilya Dryomov [Wed, 24 Aug 2022 10:56:31 +0000 (12:56 +0200)]
rbd-mirror: resume pending shutdown on error in snapshot replayer
If a shutdown is requested, e.g. by update_pool_replayers() because
remote RADOS instance got blocklisted, and Replayer::shut_down() pends
it on completion of current snapshot sync, it gets stuck if replayer
encounters an error in the interim. This is particularly likely in the
blocklist case: a higher layer may detect that client got blocklisted
and request a shutdown first, and then when replayer sees EBLOCKLISTED
in turn, it calls handle_replay_complete() -- which does not resume
a pending shutdown. Because update_pool_replayers() blocks on shutdown
with Mirror::m_lock held, eventually the entire daemon hangs in
perpetuity.
The addition of unselectable prompts to these three files
completes the work begun in PR#47810 (d8064b4), which sought
to bring dashboard.rst into line with the unselectable prompt
standard introduced by Kefu Chai in 2020.
Ronen Friedman [Thu, 18 Aug 2022 15:27:47 +0000 (18:27 +0300)]
common: improving fmtlib handling of ceph::utime_t
1. fixing the output to show local-time instead of UTC format, matching
operator<<() handling (and all the rest of our logs)
2. adding a 'short' mode (as {:s}) for when, e.g. in most scrub logs,
we only need 3 digits for the sub-second, and do not need the
trailing TZ designation.
so we can use the formatter defined for `LogEntry` in fmtlib v9.
in this new version of fmtlib, it is required to define a specialization
for the formatted type even when it comes to the types with an override of
operator<<(). since we already have an override for `LogEntry`, let's define
the specialization for `fmt::formatter<LogEntry>`.
this change should address the FTBFS when building with fmtlib v9.
Kefu Chai [Sat, 27 Aug 2022 02:27:01 +0000 (10:27 +0800)]
common/Journald: include msg/msg_fmt.h
so we can use the formatter defined for `entity_name_t`. in fmtlib v9,
it is required to define a specialization for the formatted type even
the type has an override of operator<<(). now that we already have a
formatter for `entity_name_t`, let's just use it.
this change should address the FTBFS when building with fmtlib v9.
Ilya Dryomov [Sat, 27 Aug 2022 09:09:00 +0000 (11:09 +0200)]
librbd: use actual monitor addresses when creating a peer bootstrap token
Relying on mon_host config option is fragile, as the user may confuse
v1 and v2 addresses, group them incorrectly, etc. Get mon_host value
only as a fallback.
Kefu Chai [Sat, 27 Aug 2022 15:46:00 +0000 (23:46 +0800)]
mon/MgrMonitor: do not propse again for "mgr fail"
in 23c3f76018b446fb77bbd71fdd33bddfbae9e06d, the change to fail the mgr
is proposed immediately. but `MgrMonitor::prepare_command()` method still
returns `true` in this case. its indirect caller of
`PaxosService::dispatch()` considers this as a sign that it needs to
propose the change with `propose_pending()`. but the pending change has
already been proposed by `MgrMonitor::prepare_command()`, and
`have_pending` is also cleared by this call. as we don't allow
consecutive paxos proposals, the second `propose_pending()` call is
delayed with a configured latency. but when the timer is fired, this
poseponed call would find itself trying to propose nothing. the change
to fail the mgr has been proposed. that's why we have
`ceph_assert(have_pending)` assertion failures.
in this change, the second proposal is not proposed anymore if the
proposal is proposed immediately. this should avoid the assertion
failure.
Kefu Chai [Sat, 27 Aug 2022 01:51:02 +0000 (09:51 +0800)]
cmake: set CMP0135 policy
so the `DOWNLOAD_EXTRACT_TIMESTAMP` property of
`ExternalProject_Add()` command is set by default on CMake v3.24 and up.
it helps to set the a more accurate timestamp for the downloaded
content, hence the targets depending on the extracted content can be
rebuilt if the URL changes.
see also https://cmake.org/cmake/help/latest/policy/CMP0135.html
Adam King [Thu, 25 Aug 2022 16:09:49 +0000 (12:09 -0400)]
mgr/orchestrator/tests: don't match exact whitespace in table output
It seems that the exact spacing may differ a bit between
python versions. Currently seeing py3 (which cooresponds to py 3.6
on my system) passing these tests and py37 (which is python 3.7
obviously) failing. I think verifying against the exact whitespace
is unnecessary anyhow. As long as it isn't egregious, we don't
really need to worry about exactly what the spacing is.
Zac Dover [Thu, 25 Aug 2022 15:56:41 +0000 (01:56 +1000)]
doc/mgr: add prompt directives to dashboard.rst
This commit adds prompt directives (.. prompt:: bash $) to
the commands in dashboard.rst.
There are several ".. include::" directives in the dashboard.rst
file, which means that part of this page is sourced from elsewhere
than the dashboard.rst file. Because I have not yet added prompt
directives to those files, there is an inconsistency in the rendering
of this file. Most of the commands on this page have unselectable
prompts (unselectable prompts are the prompts that don't get added to
the buffer when you copy them to one of the clipboards). But the
commands on this page that come from those ".. include::" directives
do not yet have unselectable prompts.
This file is over 1600 lines long. It was perhaps not optimally wise
of me to have edited all of it in one fell swoop. It took many hours,
and carefully checking it will probably take at least one hour. I
suggest that whoever reviews this should not spend much time on it,
but should instead make a quick pass over the page and make sure that
it looks passable.
The English syntax on this page (and throughout the Dashboard doc-
umentation) will be tightened to remove ambiguity and to improve
readability in the near future, so hold all English-language-related
comments for a future pull request.
John Mulligan [Thu, 25 Aug 2022 13:58:55 +0000 (09:58 -0400)]
pybind/mgr: tox.ini remove redundant `tox` env
Fixes: https://tracker.ceph.com/issues/57153
The envlist contained an environment named `lint`. There was no specific
customization of the lint testenv so it is essentially the same as
running the `py3` testenv.
This was probably a typo and was meant to be `pylint`. Unfortunately,
the pylint test env does not appear to work, probably because it was
never run as part of any automation. At the risk of leaving old stuff
behind I'm not removing the pylint testenv at the moment, only the
`lint` item in order to not run redundant tests.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Adam King [Wed, 24 Aug 2022 14:36:53 +0000 (10:36 -0400)]
doc/cephadm: fix example for specifying networks for rgw
count_per_host must be used with underscores rather
than dashes to work, you need to pass service_id not
service_name and the option for the port is called
rgw_frontend_port not just "port"
Zac Dover [Tue, 23 Aug 2022 06:59:04 +0000 (16:59 +1000)]
doc/mgr: edit orchestrator.rst
This PR improves the English language in the "Orchestrator CLI"
section of the MGR documentation. It adds a couple of section
headers in order to signpost the information in the document
a bit more than had already been done, but it makes no major
structural changes to the presentation of the information here.
This PR was motivated by feedback from the 2022 Ceph User Survey
in which one of the respondents wrote "better ceph orch documen-
tation".
The final section on this page, "Current Implementation Status",
must be verified by someone who is familiar with the current state
of "ceph orch" and a date stamp should be applied to the top of
the section so that the word "current" has a meaningful referent.