CMake 3.28 has turned stricter when executing string(REPLACE …) and
expects four or more commands. In case of distro package builds from
tarball, it happens that git is not present. CTags.cmake tries to
catch that by veriying the exit status of the command, but as there
is in fact git | awk, awk returns 0 even when git does not exist.
Ensure that the variable submodules has been defined before trying
to replace substrings in this variable.
Zac Dover [Tue, 19 Dec 2023 09:15:57 +0000 (19:15 +1000)]
doc/install: update "update submodules"
Remove misleading material that would give readers the wrong idea about
when stale submodules are present. This commit is made in response to
information given to me by Ilya Dryomov here: https://github.com/ceph/ceph/pull/54929#issuecomment-1859237986.
Matan Breizman [Wed, 30 Aug 2023 08:57:18 +0000 (08:57 +0000)]
osd/OSD: introduce trim_stale_maps
```
/**
* trim_stale_maps
*
* trim_maps had a possible (rare) leak which resulted in stale osdmaps.
* This method will cleanup any existing osdmaps from the store
* with an epoch earlier than the superblock's oldest_map epoch.
* See: https://tracker.ceph.com/issues/61962
*/
```
Kefu Chai [Fri, 15 Dec 2023 10:18:01 +0000 (18:18 +0800)]
cmake: differentiate WITH_CEPHFS from WITH_LIBCEPHFS
WITH_CEPHFS is used for enabling the server side of cephfs, i.e., mds,
while WITH_LIBCEPHFS is for the client side of it, namely libcephfs
and its python bindings. since we have these two different options,
in theory, user is allowed to build with WITH_CEPHFS=ON and
WITH_LIBCEPHFS=OFF, and in that case, `vstart` should not depend on
the non-existent target of cephfs and cython_cephfs. the same applies to
the build of cython_cephfs. this is an unusual combination, but
it is a valid one.
in this change, we
* build the python binding only if WITH_LIBCEPHFS is ON
* add cephfs and cython_cephfs as dependencies of vstart only if
WITH_LIBCEPHFS is ON
- Modify ClientRequest::snaps_need_to_recover() to return all
relevant snaps including the operation target.
- Update ClientRequest::snaps_need_to_recover() and
CommonClientRequest::recover_missings to use set::set
for snaps.
- Remove special handling for soid from
CommonClientRequest::recover_missings
- Simplify CommonClientRequest::recover_missings.
Patrick Donnelly [Fri, 15 Dec 2023 13:18:28 +0000 (08:18 -0500)]
Merge PR #52196 into main
* refs/pull/52196/head:
qa: configure balancer for multi-mds workloads
qa: create qa subvolumes in named subvolumegroup
qa: do not rely on default max_mds value
qa: add automate_balance to dashboard qa schema
doc/cephfs: add docs for balance_automate
doc/cephfs: use bash prompt for shell code
mds: add balance_automate fs setting
Reviewed-by: Venky Shankar <vshankar@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com>
Ilya Dryomov [Sun, 10 Dec 2023 16:01:24 +0000 (17:01 +0100)]
test/pybind/rbd: don't ignore from_snapshot in check_diff()
Despite the test in test_diff_iterate() being correct, it started
failing:
> check_diff(self.image, 0, IMG_SIZE, 'snap1', [(0, 512, False)])
...
a = [], b = [(0, 512, False)]
...
> assert a == b
E AssertionError
This is because check_diff() drops 'snap1' argument on the floor and
passes None to image.diff_iterate() instead. This goes back to 2013,
see commit e88fe3cbbc8f ("rbd.py: add some missing functions").
- beginning of time -> HEAD, through intermediate snap
- snap -> snap, directly
- snap -> HEAD, directly
But coverage is too weak: none of the weird OBJECT_PENDING cases and
only a single diff-iterate vs deep-copy case is tested, for example.
Coverage is missing completely for:
- beginning of time -> HEAD, directly
- beginning of time -> snap, directly
- beginning of time -> snap, through intermediate snap
- snap -> snap, through intermediate snap
- snap -> HEAD, through intermediate snap
Casey Bodley [Wed, 13 Dec 2023 14:30:35 +0000 (09:30 -0500)]
rgw/multisite: forwarded requests always pass a bufferlist
d2dbe7550296da6db885b5344c71f77f9acbfd8f added a rgw_forward_request_to_master()
that took the input bufferlist by pointer instead of reference so it
could be optional; however, RGWRESTSimpleRequest::forward_request()
omits the Content-Length header when the data is nullptr. this was an
unintended change and broke the forwarding of some requests
Patrick Donnelly [Fri, 23 Jun 2023 21:01:00 +0000 (17:01 -0400)]
mds: add balance_automate fs setting
To turn off the automatic ("default") balancer in multiple MDS clusters. The
new default is "off" as the balancer is a constant source of problems and
surprise for administrators trying multiple actives. Instead, it should be a
deliberate decision to turn it on and usually with customization like the
"bal_rank_mask" setting or pinning.
Fixes: https://tracker.ceph.com/issues/61378 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Fri, 8 Dec 2023 14:19:02 +0000 (15:19 +0100)]
librbd: OBJECT_PENDING should always be treated as dirty
OBJECT_PENDING is a transition state which normally isn't encountered
in (snapshot) object maps. In case it's encountered, for example when
a snapshot is taken after losing power at the time a discard was being
handled, the object should be treated as dirty and produce a diff as
a result.
Assuming an object is marked OBJECT_PENDING, theoretically there are
four cases with respect to object's state in the next snapshot:
Prior to commit b81cd2460de7 ("librbd/object_map: diff state machine
should track object existence"), (3) was handled incorrectly (diff set
to DIFF_STATE_NONE instead of DIFF_STATE_UPDATED).
Post commit 399a45e11332 ("librbd/object_map: rbd diff between two
snapshots lists entire image content"), (4) is handled incorrectly
(diff set to DIFF_STATE_DATA instead of DIFF_STATE_DATA_UPDATED).
Similar to DiffIterateTest.DiffIterateDeterministic, systematically
cover the most common cases involving full-object discards. With this
in place, issue [1] can be reproduced by any of:
(preparatory) before snap3 is taken
(1) beginning of time -> HEAD
(2) snap1 -> HEAD
(5) beginning of time -> snap3
(6) snap1 -> snap3
Sub-object discards aren't covered here because of further issues
[2][3].
Ilya Dryomov [Fri, 10 Nov 2023 10:14:42 +0000 (11:14 +0100)]
librbd: resurrect "exists" assert in simple_diff_cb()
This effectively reverts commit 3ccc3bb4bd35 ("librbd: diff_iterate
needs to handle holes in parent images") which just dropped the assert
instead of addressing the root cause of reported crashes.
Ilya Dryomov [Thu, 9 Nov 2023 19:44:18 +0000 (20:44 +0100)]
librbd: diff-iterate shouldn't ever report "new hole" against a hole
If an object doesn't exist in both start and end versions but there is
an intermediate snapshot which contains it (i.e. the object is written
to and captured at some point but then discarded prior to or in the end
version), diff-iterate reports "new hole" -- callback is invoked with
exists=false. This occurs both on the slow list_snaps path and in
fast-diff mode.
Despite going all the way back to the introduction of diff-iterate in
commit 0296c7cdae91 ("librbd: implement diff_iterate"), this behavior
is wrong and contradicts diff-iterate API documentation added in commit a69532e86450 ("librbd: document diff_iterate in header") in the same
series:
If the source snapshot name is NULL, we interpret that as
the beginning of time and return all allocated regions of the
image.
It also triggered an assert added in commit c680531e070a ("librbd:
change diff_iterate interface to be more C-friendly") in the same
series. Unfortunately, commit f1f6407221a0 ("test_librbd: add
diff_iterate test including discard"), also part of the same series,
added a test which expected the wrong behavior. Very confusing!
A year later, a different manifestation of this bug was fixed in commit 9a1ab95176fe ("rbd: Fix rbd diff for non-existent objects"), but the
fix only covered the case where calc_snap_set_diff() goes past the end
snap ID while processing clones. The case where it runs out of clones
to process before reaching the end snap ID remained mishandled.
A year after that, commit 3ccc3bb4bd35 ("librbd: diff_iterate needs to
handle holes in parent images") dropped the assert mentioned above and
this bug got enshrined in the newly introduced fast-diff mode.
Finally, a few years later, deep-copy actually started relying on this
bug in commit e5a21e904142 ("librbd: deep-copy image copy state machine
skips clean objects"). This necessitates bifurcation in DiffRequest
because deep-copy wants the "has this object been touched" semantics,
which is different from diff-iterate (and also potentially much more
expensive to produce!).
This commit brings a minimal update to TestMockObjectMapDiffRequest
tests and DiffIterateTest.DiffIterateDiscard. Coverage is expanded in
the following commits.
Patrick Donnelly [Mon, 11 Dec 2023 13:37:39 +0000 (08:37 -0500)]
Merge PR #54726 into main
* refs/pull/54726/head:
PendingReleaseNotes: announce cephfs-shell avail. on rhel9
qa: test fs:shell on all distros
qa: add cephfs-shell to installed rpm packages
ceph.spec.in: enable support for cephfs-shell by default via EPEL9