With auto-deletion of trashed snapshots, it is relatively easy to lose
a race to "rbd flatten" as follows:
- when V2_GET_PARENT runs, the image is technically still a clone
- when V2_REFRESH_PARENT runs, the image is fully flattened and the
snapshot in the parent image is deleted
This results in a spurious ENOENT error, mainly when trying to open the
image (e.g. for "rbd info"). This race condition has always been there
but auto-deletion of trashed snapshots makes it much worse.
Retry ENOENT in V2_REFRESH_PARENT the same way as in V2_GET_SNAPSHOTS.
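
The race and the retry can be sketched as a small Python model. This is an
illustration only, not librbd code: FakeImage, get_parent, refresh_parent,
refresh and the max_attempts bound are all hypothetical names and choices
made for the sketch.

```python
import errno


class FakeImage:
    """Hypothetical stand-in for a clone that may be flattened mid-refresh."""

    def __init__(self):
        self.has_parent = True
        self.parent_snap_exists = True

    def flatten(self):
        # Flatten detaches the parent; auto-deletion then removes the
        # trashed snapshot in the parent image.
        self.has_parent = False
        self.parent_snap_exists = False


def get_parent(image):
    # Models V2_GET_PARENT: reports what the image header says right now.
    return "parent@snap" if image.has_parent else None


def refresh_parent(image, parent):
    # Models V2_REFRESH_PARENT: fails with ENOENT if the snapshot vanished.
    if parent is not None and not image.parent_snap_exists:
        raise OSError(errno.ENOENT, "parent snapshot is gone")
    return parent


def refresh(image, between_steps=None, max_attempts=3):
    """Return (parent, attempts); restart the sequence on ENOENT."""
    for attempt in range(1, max_attempts + 1):
        parent = get_parent(image)
        if between_steps is not None:
            between_steps(image)  # lets a caller inject the racing flatten
        try:
            return refresh_parent(image, parent), attempt
        except OSError as e:
            if e.errno != errno.ENOENT:
                raise
            # The parent info was stale; start over and re-read it.
    raise OSError(errno.ENOENT, "parent kept disappearing")
```

With a flatten injected between the two steps, the first attempt hits ENOENT
and the second one sees an image with no parent, so the open succeeds.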
librbd: fix a bunch of issues with restarting RefreshRequest
Make RefreshRequest properly restartable, at least up to and including
the V2_REFRESH_PARENT step:
- clear m_migration_spec when skipping GET_MIGRATION_HEADER
- don't rely on potentially stale m_incomplete_update on retry
- reset m_legacy_parent when retrying more than just V2_GET_PARENT
- don't rely on potentially stale m_parent_md.overlap and
m_head_parent_overlap on retry
- clear m_metadata before fetching image metadata (but not before
fetching pool metadata)
- clear m_op_features when skipping V2_GET_OP_FEATURES
- clear m_group_spec on EOPNOTSUPP error in V2_GET_GROUP
- reset m_legacy_snapshot when retrying more than just V2_GET_SNAPSHOTS
- don't rely on potentially stale m_snap_parents on retry
The addition of unselectable prompts to these three files
completes the work begun in PR#47810 (d8064b4), which sought
to bring dashboard.rst into line with the unselectable prompt
standard introduced by Kefu Chai in 2020.
Ronen Friedman [Thu, 18 Aug 2022 15:27:47 +0000 (18:27 +0300)]
common: improving fmtlib handling of ceph::utime_t
1. fixing the output to show local-time instead of UTC format, matching
operator<<() handling (and all the rest of our logs)
2. adding a 'short' mode (as {:s}) for when, e.g. in most scrub logs,
we only need 3 digits for the sub-second, and do not need the
trailing TZ designation.
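
The difference between the two modes can be illustrated with a short Python
sketch. This is not the fmtlib formatter itself; format_stamp is a
hypothetical helper and the exact field layout is an assumption made for the
illustration.

```python
from datetime import datetime, timedelta, timezone


def format_stamp(ts, short=False):
    """Format a timezone-aware datetime in a full or a short form.

    The layout is assumed: the full form keeps microseconds and the TZ
    offset, the short form keeps 3 sub-second digits and drops the TZ.
    """
    base = ts.strftime("%Y-%m-%dT%H:%M:%S")
    if short:
        return f"{base}.{ts.microsecond // 1000:03d}"
    return f"{base}.{ts.microsecond:06d}{ts.strftime('%z')}"


# A fixed local time at UTC+3, chosen only for the example.
ts = datetime(2022, 8, 18, 18, 27, 47, 123456,
              tzinfo=timezone(timedelta(hours=3)))
full_form = format_stamp(ts)
short_form = format_stamp(ts, short=True)
```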
so we can use the formatter defined for `LogEntry` in fmtlib v9.
in this new version of fmtlib, a specialization of `fmt::formatter` is
required for the formatted type even if the type already has an overload
of operator<<(). since we already have such an overload for `LogEntry`,
let's define the specialization for `fmt::formatter<LogEntry>`.
this change should address the FTBFS when building with fmtlib v9.
Kefu Chai [Sat, 27 Aug 2022 02:27:01 +0000 (10:27 +0800)]
common/Journald: include msg/msg_fmt.h
so we can use the formatter defined for `entity_name_t`. in fmtlib v9,
it is required to define a specialization for the formatted type even
if the type has an overload of operator<<(). now that we already have a
formatter for `entity_name_t`, let's just use it.
this change should address the FTBFS when building with fmtlib v9.
Kefu Chai [Sat, 27 Aug 2022 15:46:00 +0000 (23:46 +0800)]
mon/MgrMonitor: do not propose again for "mgr fail"
in 23c3f76018b446fb77bbd71fdd33bddfbae9e06d, the change to fail the mgr
is proposed immediately. but the `MgrMonitor::prepare_command()` method
still returns `true` in this case. its indirect caller,
`PaxosService::dispatch()`, takes this as a sign that it needs to
propose the change with `propose_pending()`. but the pending change has
already been proposed by `MgrMonitor::prepare_command()`, and
`have_pending` has been cleared by that call. as we don't allow
consecutive paxos proposals, the second `propose_pending()` call is
delayed by a configured latency. when the timer fires, this postponed
call finds itself trying to propose nothing, because the change to fail
the mgr has already been proposed. that's why we have
`ceph_assert(have_pending)` assertion failures.
in this change, `propose_pending()` is not called a second time if the
change has already been proposed immediately. this should avoid the
assertion failure.
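
the double proposal can be modelled in a few lines of Python. this is a toy
model, not the monitor code: PaxosServiceModel and its skip_second_proposal
switch are hypothetical, the delay timer is left out, and signalling "already
proposed" through the return value is an assumption of the sketch.

```python
class PaxosServiceModel:
    """Toy model of the prepare/propose hand-off described above."""

    def __init__(self, skip_second_proposal):
        self.skip_second_proposal = skip_second_proposal
        self.have_pending = False
        self.proposals = 0

    def propose_pending(self):
        # Stands in for ceph_assert(have_pending).
        if not self.have_pending:
            raise AssertionError("nothing pending to propose")
        self.have_pending = False
        self.proposals += 1

    def prepare_command(self, cmd):
        self.have_pending = True
        if cmd == "mgr fail":
            # The change is proposed immediately, which clears have_pending.
            self.propose_pending()
            # Old behaviour: still return True, so the caller proposes again.
            return not self.skip_second_proposal
        return True

    def dispatch(self, cmd):
        # A True result means "please propose the pending change".
        if self.prepare_command(cmd):
            self.propose_pending()
```

with skip_second_proposal=False, dispatch("mgr fail") trips the assertion;
with skip_second_proposal=True, exactly one proposal is made.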
Adam King [Thu, 25 Aug 2022 16:09:49 +0000 (12:09 -0400)]
mgr/orchestrator/tests: don't match exact whitespace in table output
It seems that the exact spacing may differ a bit between
python versions. Currently seeing py3 (which corresponds to py 3.6
on my system) passing these tests and py37 (which is python 3.7
obviously) failing. I think verifying against the exact whitespace
is unnecessary anyhow. As long as it isn't egregious, we don't
really need to worry about exactly what the spacing is.
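
One way to compare table output without matching exact whitespace is
sketched below. The helper names and the sample table are hypothetical and
are not taken from the actual tests.

```python
def normalize_table(text):
    """Collapse runs of whitespace on each line and drop blank lines."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if line]


def assert_same_table(actual, expected):
    # Compare cell contents and row order, ignoring column padding.
    assert normalize_table(actual) == normalize_table(expected)


# Two renderings of the same made-up table with different padding.
narrow = "NAME  PORTS\nrgw.foo  *:80\n"
wide = "NAME     PORTS   \nrgw.foo  *:80    \n"
assert_same_table(narrow, wide)
```

This still catches missing rows, reordered columns and wrong values; it only
stops caring how wide each column is padded.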
Zac Dover [Thu, 25 Aug 2022 15:56:41 +0000 (01:56 +1000)]
doc/mgr: add prompt directives to dashboard.rst
This commit adds prompt directives (.. prompt:: bash $) to
the commands in dashboard.rst.
There are several ".. include::" directives in the dashboard.rst
file, which means that part of this page is sourced from elsewhere
than the dashboard.rst file. Because I have not yet added prompt
directives to those files, there is an inconsistency in the rendering
of this file. Most of the commands on this page have unselectable
prompts (unselectable prompts are the prompts that don't get added to
the buffer when you copy them to one of the clipboards). But the
commands on this page that come from those ".. include::" directives
do not yet have unselectable prompts.
This file is over 1600 lines long. It was perhaps not optimally wise
of me to have edited all of it in one fell swoop. It took many hours,
and carefully checking it will probably take at least one hour. I
suggest that whoever reviews this should not spend much time on it,
but should instead make a quick pass over the page and make sure that
it looks passable.
The English syntax on this page (and throughout the Dashboard
documentation) will be tightened to remove ambiguity and to improve
readability in the near future, so hold all English-language-related
comments for a future pull request.
John Mulligan [Thu, 25 Aug 2022 13:58:55 +0000 (09:58 -0400)]
pybind/mgr: tox.ini remove redundant `lint` env
Fixes: https://tracker.ceph.com/issues/57153
The envlist contained an environment named `lint`. There was no specific
customization of the lint testenv so it is essentially the same as
running the `py3` testenv.
This was probably a typo and was meant to be `pylint`. Unfortunately,
the pylint test env does not appear to work, probably because it was
never run as part of any automation. At the risk of leaving old stuff
behind I'm not removing the pylint testenv at the moment, only the
`lint` item in order to not run redundant tests.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Adam King [Wed, 24 Aug 2022 14:36:53 +0000 (10:36 -0400)]
doc/cephadm: fix example for specifying networks for rgw
count_per_host must be written with underscores rather than dashes
to work, service_id must be passed rather than service_name, and
the option for the port is called rgw_frontend_port, not just "port".
Zac Dover [Tue, 23 Aug 2022 06:59:04 +0000 (16:59 +1000)]
doc/mgr: edit orchestrator.rst
This PR improves the English language in the "Orchestrator CLI"
section of the MGR documentation. It adds a couple of section
headers in order to signpost the information in the document
a bit more than had already been done, but it makes no major
structural changes to the presentation of the information here.
This PR was motivated by feedback from the 2022 Ceph User Survey
in which one of the respondents wrote "better ceph orch
documentation".
The final section on this page, "Current Implementation Status",
must be verified by someone who is familiar with the current state
of "ceph orch" and a date stamp should be applied to the top of
the section so that the word "current" has a meaningful referent.
Yingxin Cheng [Wed, 17 Aug 2022 07:06:19 +0000 (15:06 +0800)]
crimson/os/seastore: move AsyncCleaner to EPM
There are two purposes related to generic device tiering:
* Make AsyncCleaner (a generic interface class needs to be introduced
later) hide the differences between the segment-based and RB-based
tier cleaner implementations.
* Make EPM coordinate cleaning transactions across tiers with general
AsyncCleaners.