At some point the debug builds for wip branches no longer had the .git
directory available, so the Debug build type was unset. This meant we were
no longer doing numerous checks (like mutex ownership checks) that we
would normally be doing in the qa suite.
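For illustration, here is a minimal self-contained sketch (hypothetical names,
not Ceph's actual ceph::mutex machinery) of the kind of ownership check that
silently disappears when no Debug build type is set and NDEBUG ends up defined:

    // Hypothetical sketch: an ownership-checking mutex wrapper.  Without a
    // Debug build type (NDEBUG defined), assert() compiles to nothing and a
    // caller that forgot to take the lock goes unnoticed.
    #include <cassert>
    #include <mutex>
    #include <thread>

    class checked_mutex {
      std::mutex m;
      std::thread::id owner;   // only meaningful while the mutex is held
    public:
      void lock()   { m.lock(); owner = std::this_thread::get_id(); }
      void unlock() { owner = std::thread::id(); m.unlock(); }
      bool is_locked_by_me() const { return owner == std::this_thread::get_id(); }
    };

    checked_mutex lock;
    int protected_value = 0;

    void update() {
      assert(lock.is_locked_by_me());  // the check we lose without a Debug build
      ++protected_value;
    }

    int main() {
      lock.lock();
      update();
      lock.unlock();
      return 0;
    }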
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Thu, 12 Oct 2023 17:03:10 +0000 (19:03 +0200)]
pybind/rbd: don't produce info on errors in aio_mirror_image_get_info()
Check the completion return value before attempting to decode c_info.
Otherwise, when the client is blocklisted for example, we are guaranteed
to access invalid memory in decode_cstr() while trying to compute the
global_id string length.
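As a minimal self-contained illustration of the pattern the fix enforces
(hypothetical types, not the actual pybind/librbd code): inspect the
completion's return value first and only decode the info payload on success:

    // Hypothetical sketch: never decode the info buffer when the completion
    // reports an error -- on error the buffer was never filled in, so taking
    // strlen() of global_id would read invalid memory.
    #include <cerrno>
    #include <cstring>
    #include <iostream>
    #include <optional>
    #include <string>

    struct completion { int ret; };                  // stand-in for an aio completion
    struct mirror_image_info { const char* global_id; };

    std::optional<std::string>
    decode_info(const completion& c, const mirror_image_info& raw) {
      if (c.ret < 0) {
        return std::nullopt;                         // e.g. blocklisted client
      }
      return std::string(raw.global_id, std::strlen(raw.global_id));
    }

    int main() {
      mirror_image_info never_filled{};              // call failed, buffer untouched
      completion failed{-EIO};
      if (!decode_info(failed, never_filled)) {
        std::cout << "completion reported an error, info not decoded\n";
      }
      return 0;
    }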
In the monitors we hold two copies of disallowed_leaders:
1. in the MonMap class
2. in the Elector class
When computing the ConnectivityScore for the monitors during the
election, we use the disallowed_leaders from the Elector class to
determine which monitors we shouldn't allow to lead.
We rely on the function `set_elector_disallowed_leaders` to set the
disallowed_leaders of the Elector class. The MonMap copy of
disallowed_leaders contains the `tiebreaker_monitor`, so we inherit
that and additionally add the monitors that are dead due to a zone
failure.
The "adding dead monitors" step, however, is only allowed if we can
enter stretch_mode. This is a problem when failing over a stretch
cluster zone and reviving the entire zone: the revived monitors
can't enter stretch_mode while they are still "probing", since
PaxosServices like osdmon become unreadable (this is expected).
Solution:
We unconditionally add monitors that are in
`monmap->stretch_marked_down_mons` to the
`disallowed_leaders` list in
`Monitor::set_elector_disallowed_leaders`, since monitors in
`monmap->stretch_marked_down_mons` most likely belong to a
marked-down zone and are not fit to lead.
This fixes the problem of newly revived monitors
having a different disallowed_leaders set
and getting stuck in election.
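A simplified standalone sketch of the intended behavior (hypothetical helper,
not the actual Monitor::set_elector_disallowed_leaders code): the
stretch-marked-down monitors are merged into the disallowed set
unconditionally, independent of whether this monitor can enter stretch_mode yet:

    // Hypothetical, simplified sketch of building the elector's disallowed set.
    #include <iostream>
    #include <set>
    #include <string>

    std::set<std::string> compute_disallowed_leaders(
        const std::set<std::string>& monmap_disallowed,       // includes tiebreaker
        const std::set<std::string>& stretch_marked_down_mons) {
      std::set<std::string> disallowed = monmap_disallowed;    // inherit from MonMap
      // Unconditionally treat monitors of a marked-down zone as unfit to lead,
      // even on a monitor that is still probing and can't enter stretch mode yet.
      disallowed.insert(stretch_marked_down_mons.begin(),
                        stretch_marked_down_mons.end());
      return disallowed;
    }

    int main() {
      for (const auto& m :
           compute_disallowed_leaders({"tiebreaker"}, {"mon.c", "mon.d"})) {
        std::cout << m << '\n';    // mon.c, mon.d, tiebreaker
      }
      return 0;
    }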
this structure should be created at the frontend and trickle all the way
down to the RADOS layer, holding: the dout prefix, the optional yield,
and the trace.
in this commit it was only added to the "complete()" SAL interface
and to the "write_meta()" RADOS interface.
in the future it should be added to more SAL interfaces, replacing the
current way in which the dpp and optional yield are passed as separate
arguments to all functions.
in addition, if more information is needed, it should be possible
to add it to the request context struct without changing
many function prototypes.
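as a rough standalone sketch (placeholder types and names, not the actual RGW
definitions), the request context bundles the per-request pieces that are
currently passed as separate arguments:

    // Hypothetical sketch of a per-request context handed from the frontend
    // down through the SAL interfaces to the RADOS layer.
    #include <iostream>
    #include <optional>
    #include <string>

    struct DoutPrefix  { std::string prefix; };   // placeholder for the dout prefix
    struct YieldHandle { };                       // placeholder for optional yield
    struct TraceSpan   { std::string name; };     // placeholder for the trace

    struct req_context {
      const DoutPrefix* dpp = nullptr;
      std::optional<YieldHandle> y;               // optional yield
      TraceSpan* trace = nullptr;
      // more per-request information can be added here later without
      // changing every function prototype that takes the context
    };

    // instead of f(dpp, y, ...) everywhere, interfaces take one context:
    void write_meta(const req_context& rctx) {
      std::cout << (rctx.dpp ? rctx.dpp->prefix : "") << "writing metadata\n";
    }

    int main() {
      DoutPrefix dpp{"req 1234: "};
      req_context rctx{&dpp, std::nullopt, nullptr};
      write_meta(rctx);
      return 0;
    }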
John Mulligan [Tue, 3 Oct 2023 20:52:09 +0000 (16:52 -0400)]
cephadm: convert ceph exporter type to a ContainerDaemonForm
CephExporter was being (partially) over-shadowed by the Ceph class,
which listed 'ceph-exporter' as one of the daemon types it handled.
This change updates CephExporter to a ContainerDaemonForm and breaks
the link between Ceph and 'ceph-exporter', allowing CephExporter to
handle all the duties of managing ceph-exporter and continuing the
process of establishing clearer logical responsibilities and a clearer
class hierarchy in cephadm.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 3 Oct 2023 20:51:49 +0000 (16:51 -0400)]
cephadm: mock os.path.listdir in daemon forms test
Prevent classes that want to check the filesystem from breaking the
simple daemon forms instantiation test case. A better fix would be to
avoid checking the file system during __init__ of the class, but that
is left for future improvements.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Tue, 3 Oct 2023 20:43:59 +0000 (16:43 -0400)]
cephadm: stop directly using Ceph.daemons property
The Ceph.daemons property has two unfortunate behaviors: most
importantly, it includes ceph-exporter, which causes the separate
CephExporter class to be over-shadowed in the DaemonForms mechanism.
Second, it couples all functions that want to know the names of ceph
daemon types to the Ceph class, preventing future refactoring of that
class.
Break the existing coupling by adding a new `ceph_daemons` function
similar to `get_supported_daemons` but returning the same value that
Ceph.daemons used to provide. This will permit future fixes and
improvements.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 28 Sep 2023 18:15:55 +0000 (14:15 -0400)]
cephadm: eliminate _dispatch_deploy function
Eliminate the _dispatch_deploy function, folding it into the
_common_deploy function: the mass of if-elif lines has been replaced,
so keeping it as a separate function no longer serves a useful
purpose.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Ilya Dryomov [Tue, 10 Oct 2023 10:31:28 +0000 (12:31 +0200)]
qa/suites/rbd: drop redundant ignorelist entries
CACHE_POOL_NO_HIT_SET is retained in *api_tests*.yaml and
rbd_mirror.yaml snippets for TestLibRBD.ListChildrenTiered and
TestClusterWatcher.CachePools tests.
With cache tiering facets gone, "pool" facets are strictly about the
--data-pool option now. Rename them to "data-pool" and create symlinks
to a common directory.
Cache tiering facets have been a constant source of job timeouts
accompanied by "slow request" warnings on the OSDs for at least two
years. Same workloads pass without pool/small-cache-pool.yaml or
thrashers/cache.yaml.
See cache tiering deprecation note added in commit 535b8db33ea0 ("doc:
deprecate the cache tiering").
Venky Shankar [Mon, 9 Oct 2023 05:06:49 +0000 (10:36 +0530)]
Revert "mds: disable delegating inode ranges to clients"
This isn't necessary -- the MDS handles delegating inode ranges
to clients from its preallocated inode set properly. The suspected
bug involving not persisting the sessionmap and causing asserts
during replay isn't an issue: the preallocated set is persisted
with the log event and the MDS correctly rebuilds the set from
this information during replay.
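For intuition, a toy standalone sketch (hypothetical types, not the MDS code)
of why replay can rebuild the state: the delegated range is recorded in the
journaled event, so the session's preallocated set can be reconstructed even
without a freshly persisted SessionMap:

    // Hypothetical sketch: rebuild a session's preallocated inode set from
    // journaled log events during replay.
    #include <cstdint>
    #include <iostream>
    #include <set>

    struct LogEventSketch {
      uint64_t client_id = 0;
      std::set<uint64_t> prealloc_inos;   // recorded when the event was journaled
    };

    struct SessionSketch {
      std::set<uint64_t> prealloc_inos;   // empty after a restart
    };

    void replay(SessionSketch& session, const LogEventSketch& ev) {
      // the journal carries the delegated inode range, so the set can be rebuilt
      session.prealloc_inos.insert(ev.prealloc_inos.begin(), ev.prealloc_inos.end());
    }

    int main() {
      SessionSketch s;
      replay(s, {42, {1001, 1002, 1003}});
      std::cout << "rebuilt " << s.prealloc_inos.size() << " preallocated inodes\n";
      return 0;
    }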
Ramana Raja [Mon, 2 Oct 2023 16:39:26 +0000 (12:39 -0400)]
librbd/ManagedLock: kickstart ExclusiveLock state machine
... that is stalled waiting for lock. Do this when trying to reacquire
the lock in the ImageWatcher's rewatch mechanism. This enables the
ExclusiveLock state machine to propagate the blocklist error to the
caller trying to perform an image operation that requires an exclusive
lock.
The previous attempt, e66db763, to fix the hang caused by exclusive lock
acquisition (stuck waiting for lock) racing with client blocklisting
did not always work. e66db763 kickstarted the ExclusiveLock state
machine when the ImageWatcher tried to schedule an exclusive lock
request and the blocklisting was detected. However, there is a short
window between a watch getting deregistered and client blocklisting
getting detected as part of rewatching. If that window was hit when
trying to schedule a lock request, the ExclusiveLock state machine
wasn't kickstarted, the blocklist error wasn't propagated, and the
hang resurfaced.
A more robust approach is taken to resume an ExclusiveLock state
machine stuck waiting for lock during client blocklisting. Whenever
a client's ImageWatcher loses its connection to the cluster, as happens
during blocklisting, the ImageWatcher initiates a mechanism to rewatch
the image and tries to reacquire the lock. Piggyback on this rewatch
mechanism that gets triggered during client blocklisting: when trying
to reacquire the lock, kickstart the ExclusiveLock state machine
stalled waiting for lock (STATE_WAITING_FOR_LOCK).
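A toy standalone sketch of the idea (hypothetical classes, not the actual
librbd code): when the rewatch path runs after the connection is lost, any
acquisition stalled in the waiting-for-lock state is kicked so it completes
with the blocklist error instead of hanging:

    // Hypothetical sketch: resume a lock state machine stalled in
    // WAITING_FOR_LOCK from the rewatch/reacquire path.
    #include <functional>
    #include <iostream>
    #include <utility>
    #include <vector>

    enum class State { UNLOCKED, WAITING_FOR_LOCK, LOCKED };

    struct ExclusiveLockSketch {
      State state = State::UNLOCKED;
      std::vector<std::function<void(int)>> waiters;   // stalled acquire callbacks

      void request_lock(std::function<void(int)> on_finish) {
        state = State::WAITING_FOR_LOCK;
        waiters.push_back(std::move(on_finish));       // stalled until kicked
      }

      // called from the rewatch path with the error to propagate
      void kickstart(int err) {
        if (state != State::WAITING_FOR_LOCK) return;
        state = State::UNLOCKED;
        for (auto& w : waiters) w(err);                // unblock the callers
        waiters.clear();
      }
    };

    int main() {
      ExclusiveLockSketch lock;
      lock.request_lock([](int r) { std::cout << "acquire finished: " << r << '\n'; });
      // watcher lost its connection (blocklisted) -> rewatch -> reacquire kicks us
      lock.kickstart(-108 /* blocklist error, for example */);
      return 0;
    }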
Fixes: https://tracker.ceph.com/issues/63009
Signed-off-by: Ramana Raja <rraja@redhat.com>
Zac Dover [Sat, 7 Oct 2023 21:43:43 +0000 (07:43 +1000)]
doc/architecture: repair RBD sentence
Improve an ambiguous sentence in doc/architecture.rst.
The problem with the original sentence is that, in one of its possible
readings, the phrasal verb "to provide with" is implicated.
Interpreted that way, the sentence seems to express the incorrect
idea that RBD furnishes block devices with snapshotting and cloning, as
though snapshotting and cloning were being delivered to the block
devices. In fact, snapshotting and cloning are simply features of RBD,
and are described on this page:
https://docs.ceph.com/en/quincy/rbd/rbd-snapshot/.
Adam King [Fri, 6 Oct 2023 15:20:57 +0000 (11:20 -0400)]
mgr/cephadm: fix upgrades with nvmeof
Currently, nvmeof is treated as if it uses a ceph image during
upgrades. This causes logging of messages like the following
(I've removed the nvmeof daemon id):
log [WRN] : Upgrade daemon: nvmeof.<id>: Cannot redeploy
nvmeof.<id> with a new image: Supported types are: mgr, mon,
crash, osd, mds, rgw, rbd-mirror, cephfs-mirror, ceph-exporter,
iscsi, nfs
Additionally, if you had set a custom image via the
mgr/cephadm/container_image_nvmeof setting, it would
be undone as part of the upgrade process.
Fixes: https://tracker.ceph.com/issues/63127
Signed-off-by: Adam King <adking@redhat.com>
rgw/lua/doc: support reloading lua packages on all RGWs
without requiring a restart of the RGWs
test instructions:
https://gist.github.com/yuvalif/95b8ed9ea73ab4591c59644a050e01e2
also use capitalized "Lua" in logs/doc