Ramana Raja [Wed, 15 Feb 2023 15:12:54 +0000 (10:12 -0500)]
mgr/rbd_support: recover from rados client blocklisting
In certain scenarios the OSDs were slow to process RBD requests.
This lead to the rbd_support module's RBD client not being able to
gracefully handover a RBD exclusive lock to another RBD client.
After the condition persisted for some time, the other RBD client
forcefully acquired the lock by blocklisting the rbd_support module's
RBD client, and consequently blocklisted the module's RADOS client. The
rbd_support module stopped working. To recover the module, the entire
mgr service had to be restarted which reloaded other mgr modules.
Instead of recovering the rbd_support module from client blocklisting
by being disruptive to other mgr modules, recover the module
automatically without restarting the mgr serivce. On client getting
blocklisted, shutdown the module's handlers and blocklisted client,
create a new rados client for the module, and start the new handlers.
librbd: localize snap_remove op for mirror snapshots
A client may attempt a lock request not quickly enough to
obtain exclusive lock for operations when another competing
client responds quicker. This can happen when a peer site has
different performance characteristics or latency. Instead of
relying on this unpredictable behavior, localize operation to
primary cluster.
Fixes: https://tracker.ceph.com/issues/59393 Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit ac552c9b4d65198db8038d397a3060d5a030917d)
librbd: always refresh after creating snapshot in CreatePrimaryRequest
Up until now this was conditioned on whether the caller expressed
interest in the ID of the created snapshot and happened to work only
because CreatePrimaryRequest wasn't actually consulting any mirror
snapshot metadata. This has just changed with unlink_peer() needing to
see an up-to-date complete flag which is set in SetImageStateRequest
following the write out of image state object(s).
librbd: remove previous incomplete primary snapshot after successfully creating a new one
Problem:
-------
At a high level, creating a primary snapshot consists of three steps:
1. actually creating a snapshot in the mirror namespace
2. generating a set of image state objects with additional metadata for
the snapshot
3. marking the snapshot as complete after the image state objects are
written out
Depending on the circumstances, a request to create a primary snapshot
can be forwarded to rbd-mirror daemon. If that happens and rbd-mirror
daemon gets axed for some practical reason after completing steps (1)
and/or (2) but before completing step (3), we are left with a
permanently incomplete primary snapshot because upon retrying that
primary snapshot creation request, librbd notices that such snapshot
already exists. It does not check whether this "pre-existing" snapshot
is complete.
Solution:
--------
As part of the next mirror snapshot create (say triggered by the
scheduler) the unlink_peer() is called, it checks if there exists any
incomplete snapshot and delete them accordingly.
Zac Dover [Fri, 12 May 2023 10:35:25 +0000 (20:35 +1000)]
doc/cephfs: rectify prompts in fs-volumes.rst
Make sure all prompts are unselectable. This PR is meant to be
backported to Reef, Quincy, and Pacific, to get all of the prompts into
a fit state so that a line-edit can be performed on the Englsh language
in this file.
Zac Dover [Fri, 5 May 2023 06:35:28 +0000 (16:35 +1000)]
doc/cephfs: repairing inaccessible FSes
Add a procedure to doc/cephfs/troubleshooting.rst that explains how to
restore access to FileSystems that became inaccessible after
post-Nautilus upgrades. The procedure included here was written by Harry
G Coin, and merely lightly edited by me. I include him here as a
"co-author", but it should be noted that he did the heavy lifting on
this.
See the email thread here for more context:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/HS5FD3QFR77NAKJ43M2T5ZC25UYXFLNW/
Co-authored-by: Harry G Coin <hgcoin@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@redhat.com> Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
(cherry picked from commit 8177a748bd831568417df5c687109fbbbd9b981d)
Pere Diaz Bou [Mon, 6 Mar 2023 19:32:24 +0000 (20:32 +0100)]
mgr/dashboard: replace ajsf with formly
ajsf json schema library for angular doesn't seem to be actively
maintained. Instead, fromly is a well maintained replacement with extra
stuff like validators builtin, support for json schemas, custom
components, etc...
Textareas weren't supported on ajsf, therefore, it made sense to move to
this dep instead.
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com> Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 2c43dd0c16e3cc3b3eada03ed11958a689cc4bcd)
Laura Flores [Mon, 1 May 2023 16:28:54 +0000 (16:28 +0000)]
mgr: add urllib3==1.26.15 to mgr/requirements.txt
We do not depend on any particular version of
urllib3, but as a workaround to the incompatibility
of urllib3 constraints between kubernetes and
requests, we need to pin it temporarily to
the version both are happy with.
Fixes: https://tracker.ceph.com/issues/59591 Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 80d460005e44649191aa862fa78bd278644b5237)
It actually didn't test the invalid path but still ended with
ENOENT(which is expected in case path is invalid) as the test
didn't create a fs, and it failed saying "FS nfs-cephfs not found"
which too raises ENOENT and thus it always passed.
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit 5cc0857)
mgr/nfs/utils: changes to helper func to check cephfs path
- Renamed to cephfs_path_is_dir
- Removed exception handling to prevent redundant log statements like:
"No such file or directory error in stat: b'/mnt/testdir_symlink': No such file or directory [Errno 2]"
Exceptions handled inside caller eliminates this redundancy
- Set modifier flag AT_SYMLINK_NOFOLLOW
- Removed string "{path} is not a dir" when raising NotADirectoryError
Rationale: will be handled in export.py
- change mock to cephfs_path_is_dir
Signed-off-by: Dhairya Parmar <dparmar@redhat.com>
(cherry picked from commit f3d7370)
qa/: Override mClock profile to 'high_recovery_ops' for qa tests
The qa tests are not client I/O centric and mostly focus on triggering
recovery/backfills and monitor them for completion within a finite amount
of time. The same holds true for scrub operations.
Therefore, an mClock profile that optimizes background operations is a
better fit for qa related tests. The osd_mclock_profile is therefore
globally overriden to 'high_recovery_ops' profile for the Rados suite as
it fits the requirement.
Also, many standalone tests expect recovery and scrub operations to
complete within a finite time. To ensure this, the osd_mclock_profile
options is set to 'high_recovery_ops' as part of the run_osd() function
in ceph-helpers.sh.
A subset of standalone tests explicitly used 'high_recovery_ops' profile.
Since the profile is now set as part of run_osd(), the earlier overrides
are redundant and therefore removed from the tests.
doc/: Modify mClock configuration documentation to reflect profile changes
Modify the relevant documentation to reflect:
- change in the default mClock profile to 'balanced'
- new allocations for ops across mClock profiles
- change in the osd_max_backfills limit
- miscellaneous changes related to warnings.
common/options/osd.yaml.in: Change mclock max sequential bandwidth for SSDs
The osd_mclock_max_sequential_bandwidth_ssd is changed to 1200 MiB/s as
a reasonable middle ground considering the broad range of SSD capabilities.
This allows the mClock's cost model to extract the SSDs capability
depending on the cost of the IO being performed.
osd/: Retain the default osd_max_backfills limit to 1 for mClock
The earlier limit of 3 was still aggressive enough to have an impact on
the client and other competing operations. Retain the current default
for mClock. This can be modified if necessary after setting the
osd_mclock_override_recovery_settings option.
Edit the "stretch mode" section in doc/rados/operations/stretch-mode.rst
so that the procedure is formatted as a procedure and the sentences
correctly have heads.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit a19ff7a5ea9bbd24365648a90abfa1b720c5b231)
Mistakenly removed in commit d79f2a81541c ("docs: warning and remove
few docs section for Filestore Update docs after filestore removal.").
The kernel client, however new, will continue to be able to talk to
FileStore OSDs for as long as they exist.
Samuel Just [Tue, 11 Apr 2023 15:10:04 +0000 (08:10 -0700)]
osd/scheduler/mClockScheduler: avoid limits for recovery
Now that recovery operations are split between background_recovery and
background_best_effort, rebalance qos params to avoid penalizing
background_recovery while idle.
Samuel Just [Thu, 6 Apr 2023 05:57:42 +0000 (22:57 -0700)]
osd/: differentiate scheduler class for undersized/degraded vs data movement
Recovery operations on pgs/objects that have fewer than the configured
number of copies should be treated more urgently than operations on
pgs/objects that simply need to be moved to a new location.
Samuel Just [Tue, 4 Apr 2023 23:34:17 +0000 (23:34 +0000)]
osd/scheduler: simplify qos specific params in OpSchedulerItem
is_qos_item() was only used in operator<< for OpSchedulerItem. However,
it's actually useful to see priority for mclock items since it affects
whether it goes into the immediate queues and, for some types, the
class. Unconditionally display both class_id and priority.