In the monitors we hold two copies of `disallowed_leaders`:
one in the MonMap class and one in the Elector class.
When computing the ConnectivityScore for the monitors during
an election, we use the Elector class's `disallowed_leaders`
to determine which monitors must not be allowed to lead.
The function `set_elector_disallowed_leaders` sets the Elector
class's `disallowed_leaders`: it inherits the MonMap class's copy,
which contains the `tiebreaker_monitor`, and additionally adds the
monitors that are dead due to a zone failure.
The step that adds the dead monitors, however, is only performed if
we can enter stretch_mode. This is a problem when a stretch cluster
zone fails over and the entire zone is then revived: while the revived
monitors are in the "probing" state, they cannot enter stretch_mode
because PaxosServices such as osdmon become unreadable (this is
expected).
Solution:
We unconditionally add the monitors in
`monmap->stretch_marked_down_mons` to the
`disallowed_leaders` list in
`Monitor::set_elector_disallowed_leaders`, since
monitors in `monmap->stretch_marked_down_mons`
most likely belong to a marked-down zone and are
not fit to lead.
This fixes the problem of newly revived monitors
having a different disallowed_leaders set
and getting stuck in the election.
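A minimal sketch of why the fix makes the sets agree. It assumes a simplified model in which monitors are identified by name and "monitors dead due to a zone failure" are approximated by `stretch_marked_down_mons`. `MonMapSketch` and both functions are hypothetical and are not the real MonMap/Elector code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for the MonMap state named above.
// Monitors are identified by name here; the real Elector may use ranks.
struct MonMapSketch {
  std::set<std::string> disallowed_leaders;        // e.g. the tiebreaker monitor
  std::set<std::string> stretch_marked_down_mons;  // monitors in a marked-down zone
};

// Old behaviour (simplified): the dead-zone monitors were only added when
// stretch mode could be entered, so a probing monitor computed a smaller set.
std::set<std::string> old_elector_disallowed_sketch(const MonMapSketch& monmap,
                                                    bool can_enter_stretch_mode) {
  std::set<std::string> result = monmap.disallowed_leaders;
  if (can_enter_stretch_mode) {
    result.insert(monmap.stretch_marked_down_mons.begin(),
                  monmap.stretch_marked_down_mons.end());
  }
  return result;
}

// Fixed behaviour (simplified): start from the MonMap copy and always add
// stretch_marked_down_mons, whether or not stretch mode can be entered.
std::set<std::string> elector_disallowed_leaders_sketch(const MonMapSketch& monmap) {
  std::set<std::string> result = monmap.disallowed_leaders;
  result.insert(monmap.stretch_marked_down_mons.begin(),
                monmap.stretch_marked_down_mons.end());
  return result;
}
```

With the old logic, a probing monitor and an active monitor derive different sets from the same MonMap. With the unconditional union, both derive the same set, so they agree on who may lead.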
Ilya Dryomov [Tue, 10 Oct 2023 10:31:28 +0000 (12:31 +0200)]
qa/suites/rbd: drop redundant ignorelist entries
CACHE_POOL_NO_HIT_SET is retained in *api_tests*.yaml and
rbd_mirror.yaml snippets for TestLibRBD.ListChildrenTiered and
TestClusterWatcher.CachePools tests.
With cache tiering facets gone, "pool" facets are strictly about the
--data-pool option now. Rename them to "data-pool" and create symlinks
to a common directory.
Cache tiering facets have been a constant source of job timeouts
accompanied by "slow request" warnings on the OSDs for at least two
years. The same workloads pass without pool/small-cache-pool.yaml or
thrashers/cache.yaml.
See cache tiering deprecation note added in commit 535b8db33ea0 ("doc:
deprecate the cache tiering").
Venky Shankar [Mon, 9 Oct 2023 05:06:49 +0000 (10:36 +0530)]
Revert "mds: disable delegating inode ranges to clients"
This isn't necessary: the MDS properly handles delegating inode ranges
to clients from its preallocated inode set. The suspected bug, in which
the sessionmap was not persisted and asserts were triggered during
replay, isn't an issue. The preallocated set is persisted with the log
event, and the MDS correctly rebuilds the set from this information
during replay.
Ramana Raja [Mon, 2 Oct 2023 16:39:26 +0000 (12:39 -0400)]
librbd/ManagedLock: kickstart ExclusiveLock state machine
... that is stalled waiting for lock. Do this when trying to reacquire
the lock in the ImageWatcher's rewatch mechanism. This enables the
ExclusiveLock state machine to propagate the blocklist error to the
caller trying to perform an image operation that requires an exclusive
lock.
The previous attempt (e66db763) to fix the hang caused by exclusive lock
acquisition (stuck waiting for lock) racing with client blocklisting
did not always work. e66db763 kickstarted the ExclusiveLock state
machine when the ImageWatcher tried to schedule an exclusive lock
request and the blocklisting was detected. However, there is a short
window between a watch getting deregistered and client blocklisting
getting detected as part of rewatching. If a lock request was scheduled
within that window, the ExclusiveLock state machine wasn't kickstarted,
the blocklist error wasn't propagated, and the hang resurfaced.
A more robust approach is taken to resume the ExclusiveLock state
machine stuck waiting for lock during client blocklisting. Whenever
a client's ImageWatcher loses its connection to the cluster, as happens
during blocklisting, the ImageWatcher initiates a mechanism to rewatch
the image and tries to reacquire the lock. Piggyback on this rewatch
mechanism, which is triggered during client blocklisting: when trying
to reacquire the lock, kickstart the ExclusiveLock state machine
stalled waiting for lock (STATE_WAITING_FOR_LOCK).
Fixes: https://tracker.ceph.com/issues/63009
Signed-off-by: Ramana Raja <rraja@redhat.com>
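A heavily simplified model of the idea, assuming a single-threaded state machine where waiters park in STATE_WAITING_FOR_LOCK until something resumes them. `LockStateMachineSketch`, `kickstart_on_rewatch` and `kBlocklistedError` are hypothetical names. This is not librbd's ManagedLock/ExclusiveLock code.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in error code for "client was blocklisted".
constexpr int kBlocklistedError = -1;

class LockStateMachineSketch {
 public:
  enum State { STATE_UNLOCKED, STATE_WAITING_FOR_LOCK, STATE_LOCKED };

  // An image operation that needs the exclusive lock parks here until the
  // lock is acquired or an error is reported.
  void acquire_lock(std::function<void(int)> on_finish) {
    waiters_.push_back(std::move(on_finish));
    state_ = STATE_WAITING_FOR_LOCK;
  }

  // Called from the (hypothetical) rewatch path while trying to reacquire
  // the lock: resuming the stalled machine lets it report the blocklist
  // error to its waiters instead of hanging forever.
  void kickstart_on_rewatch(bool blocklisted) {
    if (state_ != STATE_WAITING_FOR_LOCK) {
      return;  // nothing is stalled, nothing to resume
    }
    int r = blocklisted ? kBlocklistedError : 0;
    state_ = blocklisted ? STATE_UNLOCKED : STATE_LOCKED;
    std::vector<std::function<void(int)>> waiters;
    waiters.swap(waiters_);
    for (auto& on_finish : waiters) {
      on_finish(r);  // propagate the result (e.g. the blocklist error)
    }
  }

  State state() const { return state_; }

 private:
  State state_ = STATE_UNLOCKED;
  std::vector<std::function<void(int)>> waiters_;
};
```

Hooking the resume into the rewatch path means the resume happens whenever blocklisting triggers a rewatch. It no longer depends on whether a lock request happened to be scheduled inside the detection window.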
Zac Dover [Sat, 7 Oct 2023 21:43:43 +0000 (07:43 +1000)]
doc/architecture: repair RBD sentence
Improve an ambiguous sentence in doc/architecture.rst.
The problem presented by the original sentence is that the phrasal verb
"to provide with" is implicated in one of its possible readings.
Interpreted in that way, the sentence seems to express the incorrect
idea that RBD furnishes block devices with snapshotting and cloning, as
though snapshotting and cloning are being delivered to the block
devices. In fact, snapshotting and cloning are just features of RBD, and
are features that are described on this page:
https://docs.ceph.com/en/quincy/rbd/rbd-snapshot/.
Adam King [Fri, 6 Oct 2023 15:20:57 +0000 (11:20 -0400)]
mgr/cephadm: fix upgrades with nvmeof
Currently, nvmeof is treated as if it used
a ceph image during upgrades. This causes logging
of messages like the following (I've removed the nvmeof daemon id):
log [WRN] : Upgrade daemon: nvmeof.<id>: Cannot redeploy
nvmeof.<id> with a new image: Supported types are: mgr, mon,
crash, osd, mds, rgw, rbd-mirror, cephfs-mirror, ceph-exporter,
iscsi, nfs
and if you had set a custom image for the
mgr/cephadm/container_image_nvmeof setting, this would
be undone as part of the upgrade process.
Fixes: https://tracker.ceph.com/issues/63127
Signed-off-by: Adam King <adking@redhat.com>
rgw/lua/doc: support reloading lua packages on all RGWs
without requiring a restart of the RGWs
test instructions:
https://gist.github.com/yuvalif/95b8ed9ea73ab4591c59644a050e01e2
also use capitalized "Lua" in logs/doc
Rishabh Dave [Thu, 28 Sep 2023 17:34:51 +0000 (23:04 +0530)]
mon/AuthMonitor: check if entity is absent before creating it
Although this code path is not used for creating entities yet, it is
better to fix the bug sooner rather than later. Method
AuthMonitor::_update_or_create_entity() must exit (with an appropriate
error code) when the entity to be created on the Ceph cluster is
already present.
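A sketch of the check, assuming an in-memory map from entity name to key. `KeyringSketch` and `update_or_create_entity_sketch` are hypothetical, and the real method has a different signature. Returning -EEXIST for the create case is one reasonable choice of "appropriate error code". The -ENOENT result for updating a missing entity is an added assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the auth database: entity -> key.
using KeyringSketch = std::map<std::string, std::string>;

// Returns 0 on success, -EEXIST when asked to create an entity that already
// exists, and (assumption) -ENOENT when asked to update a missing entity.
int update_or_create_entity_sketch(KeyringSketch& db, const std::string& entity,
                                   const std::string& key, bool create) {
  auto it = db.find(entity);
  if (create) {
    if (it != db.end()) {
      return -EEXIST;  // check that the entity is absent before creating it
    }
    db.emplace(entity, key);
    return 0;
  }
  if (it == db.end()) {
    return -ENOENT;
  }
  it->second = key;
  return 0;
}
```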
Casey Bodley [Thu, 5 Oct 2023 15:59:52 +0000 (11:59 -0400)]
rgw: fix http error checks in keystone/barbican/vault clients
when RGWHTTPManager encounters an http error, it uses
rgw_http_error_to_errno() to map it to a negative posix error code.
RGWHTTPClient::process() returns that mapped error code and exposes the
original http error via get_http_status()
the http client code for keystone, barbican, and vault was returning
early on errors from process(), so it never reached the http error
checks
these clients now check for specific http errors before testing the
result of process()
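a sketch of the ordering problem, assuming a hypothetical client type whose process() returns an already-mapped errno. `HttpClientSketch`, `kKeyNotFound` and the two fetch functions are made up for illustration. The -EIO mapping used in the usage below is arbitrary and is not what rgw_http_error_to_errno() necessarily returns.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical stand-in for RGWHTTPClient: process() returns the negative
// errno mapped from the HTTP error, and the raw status stays available
// through get_http_status().
struct HttpClientSketch {
  int http_status = 200;
  int mapped_error = 0;  // arbitrary here; really produced by the manager's mapping
  int process() { return mapped_error; }
  int get_http_status() const { return http_status; }
};

// Hypothetical caller-specific result for "key not found" (HTTP 404).
constexpr int kKeyNotFound = -ENOENT;

// Broken ordering: returning early on process() errors makes the
// 404-specific branch unreachable.
int fetch_key_broken(HttpClientSketch& c) {
  int r = c.process();
  if (r < 0) {
    return r;
  }
  if (c.get_http_status() == 404) {
    return kKeyNotFound;
  }
  return 0;
}

// Fixed ordering: inspect the specific HTTP status first, then fall back
// to the generic result of process().
int fetch_key_fixed(HttpClientSketch& c) {
  int r = c.process();
  if (c.get_http_status() == 404) {
    return kKeyNotFound;
  }
  if (r < 0) {
    return r;
  }
  return 0;
}
```

with a 404 whose mapped error is -EIO, the broken version returns -EIO while the fixed one returns the caller's specific not-found result.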
Dhairya Parmar [Thu, 5 Oct 2023 08:12:31 +0000 (13:42 +0530)]
doc: remove egg fragment from dev/developer_guide/running-tests-locally
DEPRECATION: git+https://github.com/ceph/teuthology#egg=teuthology[test]
contains an egg fragment with a non-PEP 508 name pip 25.0 will enforce
this behaviour change. A possible replacement is to use the req @ url syntax,
and remove the egg fragment. Discussion can be found at
https://github.com/pypa/pip/issues/11617
Rishabh Dave [Wed, 4 Oct 2023 18:52:51 +0000 (00:22 +0530)]
src/MDSMonitor: make use of imported namespace symbols
Symbols imported into current namespace should be used directly; there
is no need to mention their parent namespace while using them. IOW, to use
"std::string" after it has been imported, just write "string" instead of
"std::string".
Rishabh Dave [Wed, 4 Oct 2023 18:16:32 +0000 (23:46 +0530)]
mon/FSCommands: make use of imported namespace symbols
There's no need to mention the "home" namespace of a symbol while using
it after it has been imported into the current namespace. IOW, no need to
write, for example, "std::string" after it has been imported from its
namespace; instead simply writing "string" will suffice.
Rishabh Dave [Wed, 4 Oct 2023 18:22:27 +0000 (23:52 +0530)]
mon/AuthMonitor: make use of imported namespace symbols
Once a symbol has been imported into the current namespace, no need to
mention the original namespace while using it. IOW, no need to write
"std::string" after it has been imported from the namespace "std" into the
current namespace.
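A minimal illustration of the convention these three commits apply. It assumes the names are brought in with using-declarations such as `using std::string;`. A `using namespace std;` directive has the same effect at the use sites. `join_names` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

using std::string;
using std::vector;

// After the using-declarations above, the unqualified names already refer
// to the std types, so writing "std::" at each use site adds nothing.
string join_names(const vector<string>& names) {
  string out;
  for (const auto& n : names) {
    if (!out.empty()) {
      out += ",";
    }
    out += n;
  }
  return out;
}
```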