Matan Breizman [Wed, 30 Aug 2023 08:57:18 +0000 (08:57 +0000)]
osd/OSD: introduce trim_stale_maps
```
/**
* trim_stale_maps
*
* trim_maps had a possible (rare) leak which resulted in stale osdmaps.
* This method will clean up any existing osdmaps from the store
* with an epoch earlier than the superblock's oldest_map epoch.
* See: https://tracker.ceph.com/issues/61962
*/
```
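The trimming rule in the comment above can be modeled in a few lines of Python. This is a minimal sketch, not the OSD's C++ implementation: the store is a hypothetical dict mapping epoch to encoded map, `oldest_map` stands in for the superblock field, and `trim_stale_epochs` is an illustrative name.

```python
from typing import Dict, List


def trim_stale_epochs(store: Dict[int, bytes], oldest_map: int) -> List[int]:
    """Hypothetical model of trim_stale_maps.

    `store` maps epoch -> encoded osdmap; `oldest_map` mirrors the
    superblock's oldest_map epoch. Every map older than that epoch is a
    leftover from the leaky trim and is removed. Returns the removed
    epochs in ascending order.
    """
    stale = sorted(epoch for epoch in store if epoch < oldest_map)
    for epoch in stale:
        del store[epoch]
    return stale


if __name__ == "__main__":
    store = {e: b"map" for e in (3, 5, 10, 11, 12)}
    print(trim_stale_epochs(store, oldest_map=10))  # [3, 5]
    print(sorted(store))  # [10, 11, 12]
```

Running it a second time removes nothing, since everything left is at or above `oldest_map`.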
Patrick Donnelly [Fri, 15 Dec 2023 13:18:28 +0000 (08:18 -0500)]
Merge PR #52196 into main
* refs/pull/52196/head:
qa: configure balancer for multi-mds workloads
qa: create qa subvolumes in named subvolumegroup
qa: do not rely on default max_mds value
qa: add automate_balance to dashboard qa schema
doc/cephfs: add docs for balance_automate
doc/cephfs: use bash prompt for shell code
mds: add balance_automate fs setting
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Ilya Dryomov [Sun, 10 Dec 2023 16:01:24 +0000 (17:01 +0100)]
test/pybind/rbd: don't ignore from_snapshot in check_diff()
Despite the test in test_diff_iterate() being correct, it started
failing:
> check_diff(self.image, 0, IMG_SIZE, 'snap1', [(0, 512, False)])
...
a = [], b = [(0, 512, False)]
...
> assert a == b
E AssertionError
This is because check_diff() drops the 'snap1' argument on the floor and
passes None to image.diff_iterate() instead. This goes back to 2013,
see commit e88fe3cbbc8f ("rbd.py: add some missing functions").
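A sketch of the corrected helper, assuming the callback-based `diff_iterate(offset, length, from_snapshot, iterate_cb)` shape of the rbd Python bindings, where the callback receives `(offset, length, exists)`. `FakeImage` is a hypothetical stand-in so the example runs without a cluster; with the old behavior (passing None) it would return `[]` exactly as in the failing assertion.

```python
from typing import Callable, Dict, List, Optional, Tuple

Extent = Tuple[int, int, bool]


class FakeImage:
    """Hypothetical stand-in for rbd.Image: canned diffs per source snapshot."""

    def __init__(self, diffs: Dict[Optional[str], List[Extent]]):
        self.diffs = diffs  # from_snapshot (None = beginning of time) -> extents

    def diff_iterate(self, offset: int, length: int,
                     from_snapshot: Optional[str],
                     iterate_cb: Callable[[int, int, bool], None]) -> None:
        for off, ln, exists in self.diffs[from_snapshot]:
            # Only report extents overlapping [offset, offset + length).
            if off < offset + length and off + ln > offset:
                iterate_cb(off, ln, exists)


def check_diff(image, offset: int, length: int,
               from_snapshot: Optional[str], expected: List[Extent]) -> None:
    extents: List[Extent] = []

    def cb(off: int, ln: int, exists: bool) -> None:
        extents.append((off, ln, exists))

    # The fix: forward from_snapshot instead of hard-coding None.
    image.diff_iterate(offset, length, from_snapshot, cb)
    assert extents == expected, (extents, expected)


if __name__ == "__main__":
    image = FakeImage({None: [], "snap1": [(0, 512, False)]})
    check_diff(image, 0, 1 << 20, "snap1", [(0, 512, False)])
```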
Existing coverage includes:
- beginning of time -> HEAD, through intermediate snap
- snap -> snap, directly
- snap -> HEAD, directly
But coverage is too weak: none of the weird OBJECT_PENDING cases and
only a single diff-iterate vs deep-copy case is tested, for example.
Coverage is missing completely for:
- beginning of time -> HEAD, directly
- beginning of time -> snap, directly
- beginning of time -> snap, through intermediate snap
- snap -> snap, through intermediate snap
- snap -> HEAD, through intermediate snap
Casey Bodley [Wed, 13 Dec 2023 14:30:35 +0000 (09:30 -0500)]
rgw/multisite: forwarded requests always pass a bufferlist
d2dbe7550296da6db885b5344c71f77f9acbfd8f added a rgw_forward_request_to_master()
that took the input bufferlist by pointer instead of reference so it
could be optional; however, RGWRESTSimpleRequest::forward_request()
omits the Content-Length header when the data is nullptr. This was an
unintended change and broke the forwarding of some requests.
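The failure mode can be modeled in Python. This is a hypothetical sketch, not RGW code: `build_forward_headers` mimics the described forward_request() behavior (no body, no Content-Length), with `None` playing the role of a null bufferlist pointer, and `forward_to_master` models the fix of always passing a buffer, even an empty one.

```python
from typing import Dict, Optional


def build_forward_headers(body: Optional[bytes]) -> Dict[str, str]:
    """Hypothetical model: None (a null bufferlist pointer) omits Content-Length."""
    headers: Dict[str, str] = {}
    if body is not None:
        headers["Content-Length"] = str(len(body))
    return headers


def forward_to_master(body: bytes) -> Dict[str, str]:
    # Modeled fix: callers always supply a buffer, so Content-Length is
    # always sent, even when it is 0.
    return build_forward_headers(body)


print(build_forward_headers(None))  # {}
print(forward_to_master(b""))  # {'Content-Length': '0'}
```

Taking the buffer by reference removes the "no body" state entirely, which is why the header can no longer go missing.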
Patrick Donnelly [Fri, 23 Jun 2023 21:01:00 +0000 (17:01 -0400)]
mds: add balance_automate fs setting
Add a setting to turn off the automatic ("default") balancer in multiple MDS
clusters. The new default is "off" because the balancer is a constant source of
problems and surprises for administrators trying multiple actives. Instead,
turning it on should be a deliberate decision, usually combined with
customization such as the "bal_rank_mask" setting or pinning.
Fixes: https://tracker.ceph.com/issues/61378
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Fri, 8 Dec 2023 14:19:02 +0000 (15:19 +0100)]
librbd: OBJECT_PENDING should always be treated as dirty
OBJECT_PENDING is a transition state which normally isn't encountered
in (snapshot) object maps. In case it's encountered, for example when
a snapshot is taken after losing power at the time a discard was being
handled, the object should be treated as dirty and produce a diff as
a result.
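The rule can be illustrated with a simplified per-object model in Python. This is a hypothetical sketch, not librbd's DiffRequest state machine: the state names mirror librbd's OBJECT_* object map states, and `object_changed` is an illustrative helper that decides whether an object shows up in a diff between two consecutive snapshot object maps.

```python
from enum import IntEnum


class ObjectState(IntEnum):
    # Names mirror librbd's OBJECT_* object map states (illustrative model).
    NONEXISTENT = 0
    EXISTS = 1
    PENDING = 2
    EXISTS_CLEAN = 3


def is_dirty(state: ObjectState) -> bool:
    # PENDING is a transition state; if it shows up in a snapshot object
    # map, the object may have changed, so treat it like EXISTS.
    return state in (ObjectState.EXISTS, ObjectState.PENDING)


def object_changed(prev: ObjectState, cur: ObjectState) -> bool:
    """Hypothetical model: does the object produce a diff from prev to cur?"""
    if cur == ObjectState.NONEXISTENT:
        return prev != ObjectState.NONEXISTENT  # something was removed
    if prev == ObjectState.NONEXISTENT:
        return True  # object came into existence
    return is_dirty(cur)
```

The key line is `is_dirty`: counting PENDING as clean would silently drop a possibly interrupted discard from the diff.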
Assuming an object is marked OBJECT_PENDING, theoretically there are
four cases with respect to the object's state in the next snapshot:
Prior to commit b81cd2460de7 ("librbd/object_map: diff state machine
should track object existence"), (3) was handled incorrectly (diff set
to DIFF_STATE_NONE instead of DIFF_STATE_UPDATED).
Post commit 399a45e11332 ("librbd/object_map: rbd diff between two
snapshots lists entire image content"), (4) is handled incorrectly
(diff set to DIFF_STATE_DATA instead of DIFF_STATE_DATA_UPDATED).
Similar to DiffIterateTest.DiffIterateDeterministic, systematically
cover the most common cases involving full-object discards. With this
in place, issue [1] can be reproduced by any of:
(preparatory) before snap3 is taken
(1) beginning of time -> HEAD
(2) snap1 -> HEAD
(5) beginning of time -> snap3
(6) snap1 -> snap3
Sub-object discards aren't covered here because of further issues
[2][3].
Ilya Dryomov [Fri, 10 Nov 2023 10:14:42 +0000 (11:14 +0100)]
librbd: resurrect "exists" assert in simple_diff_cb()
This effectively reverts commit 3ccc3bb4bd35 ("librbd: diff_iterate
needs to handle holes in parent images") which just dropped the assert
instead of addressing the root cause of reported crashes.
Ilya Dryomov [Thu, 9 Nov 2023 19:44:18 +0000 (20:44 +0100)]
librbd: diff-iterate shouldn't ever report "new hole" against a hole
If an object doesn't exist in both start and end versions but there is
an intermediate snapshot which contains it (i.e. the object is written
to and captured at some point but then discarded prior to or in the end
version), diff-iterate reports "new hole" -- callback is invoked with
exists=false. This occurs both on the slow list_snaps path and in
fast-diff mode.
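The intended semantics can be sketched per object in Python. This is a hypothetical model, not the librbd code path: `history[i]` says whether the object exists at version i, `start=None` means "beginning of time", `touched` says whether the object was modified anywhere in the interval, and the return value is either None (no callback) or the `exists` flag the callback would receive.

```python
from typing import List, Optional


def diff_iterate_report(history: List[bool], start: Optional[int],
                        end: int, touched: bool) -> Optional[bool]:
    """Hypothetical per-object model of what diff-iterate should report."""
    if not touched:
        return None
    existed = history[start] if start is not None else False
    exists = history[end]
    if not existed and not exists:
        # A hole in both versions: never report "new hole", even if an
        # intermediate snapshot captured the object.
        return None
    return exists  # False = genuine new hole, True = data


# Written, captured in a snapshot, then discarded: no callback at all.
print(diff_iterate_report([False, True, False], None, 2, touched=True))  # None
```

The buggy behavior corresponds to returning False in the first case instead of None.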
Despite going all the way back to the introduction of diff-iterate in
commit 0296c7cdae91 ("librbd: implement diff_iterate"), this behavior
is wrong and contradicts the diff-iterate API documentation added in
commit a69532e86450 ("librbd: document diff_iterate in header") in the
same series:
If the source snapshot name is NULL, we interpret that as
the beginning of time and return all allocated regions of the
image.
It also triggered an assert added in commit c680531e070a ("librbd:
change diff_iterate interface to be more C-friendly") in the same
series. Unfortunately, commit f1f6407221a0 ("test_librbd: add
diff_iterate test including discard"), also part of the same series,
added a test which expected the wrong behavior. Very confusing!
A year later, a different manifestation of this bug was fixed in
commit 9a1ab95176fe ("rbd: Fix rbd diff for non-existent objects"), but the
fix only covered the case where calc_snap_set_diff() goes past the end
snap ID while processing clones. The case where it runs out of clones
to process before reaching the end snap ID remained mishandled.
A year after that, commit 3ccc3bb4bd35 ("librbd: diff_iterate needs to
handle holes in parent images") dropped the assert mentioned above and
this bug got enshrined in the newly introduced fast-diff mode.
Finally, a few years later, deep-copy actually started relying on this
bug in commit e5a21e904142 ("librbd: deep-copy image copy state machine
skips clean objects"). This necessitates bifurcation in DiffRequest
because deep-copy wants the "has this object been touched" semantics,
which is different from diff-iterate (and also potentially much more
expensive to produce!).
This commit brings a minimal update to TestMockObjectMapDiffRequest
tests and DiffIterateTest.DiffIterateDiscard. Coverage is expanded in
the following commits.
Patrick Donnelly [Mon, 11 Dec 2023 13:37:39 +0000 (08:37 -0500)]
Merge PR #54726 into main
* refs/pull/54726/head:
PendingReleaseNotes: announce cephfs-shell avail. on rhel9
qa: test fs:shell on all distros
qa: add cephfs-shell to installed rpm packages
ceph.spec.in: enable support for cephfs-shell by default via EPEL9
John Mulligan [Wed, 15 Nov 2023 21:39:07 +0000 (16:39 -0500)]
cephadm: move abstract script handling functions to runscripts.py
Add a new file runscripts.py for the lower-level management of scripts
and related files that are invoked by systemd units. This patch ended up
uglier than I desired because there was a bunch of daemon-specific logic
that remains in cephadm.py and those functions all needed to be updated
to avoid calling functions that write to the scripts directly.
Now customizations are done by passing a list of commands: these
commands can be either a string that will be literally added to the
script, a list that will be quoted and then added to the script, or
a ContainerCommand, which is basically a wrapper around the arguments to
_write_container_cmd_to_bash.
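The three command forms can be sketched in Python. This is a hypothetical illustration, not the actual runscripts.py code: `ContainerCommandStub` stands in for cephadm's ContainerCommand, `render_script` is an illustrative name, and the `#!/bin/sh` header is an assumption. Lists are quoted with the standard library's `shlex.join`.

```python
import shlex
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ContainerCommandStub:
    """Hypothetical stand-in for cephadm's ContainerCommand wrapper."""
    args: List[str]
    comment: str = ""


Command = Union[str, List[str], ContainerCommandStub]


def render_script(commands: List[Command]) -> str:
    lines = ["#!/bin/sh"]
    for cmd in commands:
        if isinstance(cmd, str):
            lines.append(cmd)  # literal: added verbatim
        elif isinstance(cmd, list):
            lines.append(shlex.join(cmd))  # each argument shell-quoted
        elif isinstance(cmd, ContainerCommandStub):
            if cmd.comment:
                lines.append(f"# {cmd.comment}")
            lines.append(shlex.join(cmd.args))
        else:
            raise TypeError(f"unsupported command type: {type(cmd)!r}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_script([
        "mkdir -p /var/run/ceph",
        ["echo", "hello world"],  # rendered as: echo 'hello world'
        ContainerCommandStub(["podman", "run", "--rm", "img"], comment="main"),
    ]))
```

Passing data rather than letting daemon-specific code write to the file directly keeps all script I/O in one module.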
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 28 Sep 2023 21:18:18 +0000 (17:18 -0400)]
cephadm: add a higher-level function for managing systemd units
Add the function update_files to systemd_unit.py to encapsulate and
abstract the details regarding the generation of systemd unit files.
This will make it simpler in the future to add more advanced systemd
configurations, including managing customized unit files and systemd
unit drop-in files.
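A hypothetical sketch of the kind of single entry point described above, writing a unit file plus its drop-ins. It is not cephadm's update_files (which takes a full daemon identity); `write_unit_files` and its parameters are illustrative. It relies only on systemd's convention that drop-ins for `foo.service` live in `foo.service.d/*.conf`.

```python
import tempfile
from pathlib import Path
from typing import Dict


def write_unit_files(unit_dir: Path, unit_name: str, unit_text: str,
                     dropins: Dict[str, str]) -> None:
    """Hypothetical helper: write a unit file and its drop-ins together."""
    unit_dir.mkdir(parents=True, exist_ok=True)
    (unit_dir / unit_name).write_text(unit_text)
    if dropins:
        # systemd reads drop-ins from "<unit>.d/*.conf".
        dropin_dir = unit_dir / f"{unit_name}.d"
        dropin_dir.mkdir(exist_ok=True)
        for name, text in dropins.items():
            (dropin_dir / name).write_text(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_unit_files(Path(d), "example.service", "[Unit]\n",
                         {"10-extra.conf": "[Service]\n"})
        print(sorted(p.name for p in Path(d).rglob("*")))
```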
Some additional work was needed to update the recently added
command_unit_install function, because the new systemd_unit.update_files
function requires a full daemon identity. The command_unit_install
function now requires a daemon name. In addition, while testing this
change it was found that the function could not have worked as it was
because it required the fsid but neither used the infer_fsid decorator
nor provided a `--fsid` argument. Both were added.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 28 Sep 2023 21:17:03 +0000 (17:17 -0400)]
cephadm: update unit file test imports
Update the systemd unit file tests to use the canonical module for
systemd unit functions rather than importing them indirectly from
cephadm.py.
This future-proofs the tests in case the imports in cephadm.py
change.
Signed-off-by: John Mulligan <jmulligan@redhat.com>