This makes ceph-volume report partitions in inventory.
A partition is a valid device for `ceph-volume lvm prepare`
so we should report them in inventory (when using `--list-all`
parameter).
Nizamudeen A [Tue, 16 Jan 2024 05:21:56 +0000 (10:51 +0530)]
admin/doc-requirements: bump Sphinx to 5.0.2
```
Running Sphinx v4.5.0
Sphinx version error:
The sphinxcontrib.applehelp extension used by this project needs at least Sphinx v5.0; it therefore cannot be built with this version.
```
Ilya Dryomov [Sat, 6 Jan 2024 16:08:04 +0000 (17:08 +0100)]
librbd: try to preserve object map for diff-iterate in fast-diff mode
As an optimization, try to ensure that the object map for the end
version is preloaded through the acquisition of exclusive lock and
as a consequence remains around until exclusive lock is released.
If it's not around, DiffRequest would (re)load it on each call.
Ilya Dryomov [Sat, 6 Jan 2024 16:05:39 +0000 (17:05 +0100)]
librbd/object_map: potentially use in-memory object map in DiffRequest
If the object map for the end version is around (already loaded in
memory, either due to the end version being a snapshot or due to
exclusive lock being held), use it to run diff-iterate against the
beginning of time. Since it's the only object map needed in that
case, such calls would be satisfied locally.
Conflicts:
src/test/librbd/mock/MockObjectMap.h [ commit 87459c23aa05
("librbd,rbd_mirror: do not include RWLock.h unless it is
used") not in pacific ]
Ilya Dryomov [Fri, 5 Jan 2024 12:15:54 +0000 (13:15 +0100)]
librbd/object_map: decouple object map processing in DiffRequest
In preparation for potentially using in-memory object map, decouple
object map processing from loading object maps and place the logic in
prepare_for_object_map() and process_object_map().
Ilya Dryomov [Fri, 5 Jan 2024 11:23:24 +0000 (12:23 +0100)]
common/bit_vector: fix iterator vs reference constness confusion
T (ConstIterator or Iterator) is confused with const T here:
IteratorImpl dereference operator is wrongly overloaded on const
and returns Reference instead of ConstReference for ConstIterator.
This then fails inside bufferlist bowels because Reference is
incompatible with bufferlist::const_iterator.
Ilya Dryomov [Thu, 4 Jan 2024 10:39:20 +0000 (11:39 +0100)]
librbd/object_map: don't resize object map in handle_load_object_map()
Currently it's done in two cases:
- if the loaded object map is larger than expected based on byte size,
it's truncated to expected number of objects
- in case of deep-copy, if the loaded object map is smaller than diff
state, it's expanded to get "track the largest of all versions in the
set" semantics
Both of these cases can be easily dealt with without modifying the
object map. Being able to process a const object map is needed for
working on in-memory object map which is external to DiffRequest.
It's totally broken: instead of returning the current position and
moving to the next position, it returns the next position and doesn't
move anywhere. Luckily it hasn't been used until now.
Ilya Dryomov [Thu, 28 Dec 2023 09:14:18 +0000 (10:14 +0100)]
librbd: propagate diff-iterate range to parent in fast-diff mode
When getting parent diff, pass the overlap-reduced image extent instead
of the entire 0..overlap range to avoid a similar quadratic slowdown on
cloned images.
Ilya Dryomov [Wed, 27 Dec 2023 17:07:05 +0000 (18:07 +0100)]
librbd/object_map: add support for ranged diff-iterate
Currently diff-iterate in fast-diff mode is performed on the entire
image no matter what image extent is passed to the API. Then, unused
diff just gets discarded as DiffIterate ends up querying only objects
that the passed image extent maps to. This hasn't been an issue for
internal consumers ("rbd du", "rbd diff", etc) because they work on the
entire image, but turns out to lead to quadratic slowdown in some QEMU
use cases.
0..UINT64_MAX range is carved out for deep-copy which is unranged by
definition. To get effectively unranged diff-iterate, 0..UINT64_MAX-1
range can be used.
Ilya Dryomov [Sat, 23 Dec 2023 14:19:09 +0000 (15:19 +0100)]
test/librbd: expand TestMockObjectMapDiffRequest edge case coverage
For each covered edge case or error, run through the following
scenarios:
- where the edge case concerns snap_id_start
- where the edge case concerns snap_id_end
- where the edge case concerns intermediate snapshot and
snap_id_start == 0 (diff against the beginning of time)
- where the edge case concerns intermediate snapshot and
snap_id_start != 0 (diff from snapshot)
Ilya Dryomov [Sat, 23 Dec 2023 13:47:54 +0000 (14:47 +0100)]
librbd/object_map: allow intermediate snaps to be skipped on diff-iterate
In case of diff-iterate against the beginning of time, the result
depends only on the end version. Loading and processing object maps
or intermediate snapshots is redundant and can be skipped.
This optimization is made possible by commit be507aaed15f ("librbd:
diff-iterate shouldn't ever report "new hole" against a hole") and, to
a lesser extent, the previous commit.
Getting FastDiffInvalid, LoadObjectMapError and ObjectMapTooSmall to
pass required tweaking not just expectations, but also start/end snap
ids and thus also the meaning of these tests. This is addressed in the
next commit.
Ilya Dryomov [Fri, 22 Dec 2023 17:50:20 +0000 (18:50 +0100)]
librbd/object_map: resurrect diff-iterate behavior when image is shrunk
The new "track the largest of all versions in the set, diff state is
only ever grown" semantics introduced in commit 330f2a7bb94f ("librbd:
helper state machine for computing diffs between object-maps") don't
make sense for diff-iterate. It's a waste because DiffIterate won't
query beyond the end version size -- this is baked into the API.
Limit this behavior to deep-copy and resurrect the original behavior
from 2015 for diff-iterate.
Ilya Dryomov [Fri, 22 Dec 2023 15:10:12 +0000 (16:10 +0100)]
librbd/object_map: fix diff from snapshot when image is grown
Commit 399a45e11332 ("librbd/object_map: rbd diff between two
snapshots lists entire image content") fixed most of the damage caused
by commit b81cd2460de7 ("librbd/object_map: diff state machine should
track object existence"), but the case of a "resize diff" when diffing
from snapshot was missed. An area that was freshly allocated in image
resize is the same in principle as a freshly created image and objects
marked OBJECT_EXISTS_CLEAN are no exception. Diff for such objects in
such an area should be set to DIFF_STATE_DATA_UPDATED, however
currently when diffing from snapshot, it's set to DIFF_STATE_DATA.
Ilya Dryomov [Wed, 20 Dec 2023 11:22:17 +0000 (12:22 +0100)]
librbd/object_map: drop bogus if in handle_load_object_map()
It became redundant with commit b81cd2460de7 ("librbd/object_map: diff
state machine should track object existence") -- it != end_it condition
in the loop is sufficient.
In preparation for multiple similarly configured MockTestImageCtx
objects being used in a single test, centralize their creation and add
a couple of helpers for setting expectations from a callback.
rgw/sts: code to check IAM policy and return an
appropriate error incase Resource specified in the
IAM policy is incorrect and is discarded. The IAM
policy can be a resource policy or an identity policy.
This is for policies that have already been set.
rgw/sts: code for returning an error when an IAM policy
resource belongs to someone else's tenant.
While parsing the policy it discards the resource element,
but then when an operation is evaluated, since the resource element
is empty, it doesnt evaluate the resource at all and the policy
ends up erroneously allowing actions on resources in other tenants.
Eliminate m_max_recent and set the capacity of the m_recent ring buffer
when log_max_recent changes. In order to call set_capacity(),
ConcreteEntry needed its move constructor set noexcept.
I haven't followed the boost code all the way down but I suspect that
setting the ring buffer capacity to anything less than 1 entry will
probably cause problems, so restrict log_max_recent to >=1.
Also fix a wrong variable used for printing the max new entries during
"log dump".
Initialize standalone test for stretched clusters,
testing uneven weight warnings and != 2 buckets
warnings.
Added `wait_for_health_gone()` function in ceph-helpers.sh
this function allows us to wait for health condition to
disappear when doing standalone tests.
mon/Monitor: during shutdown don't accept new authentication and create new sessions
During shutdown, the monitor is designed not to accept new authentication requests
or create new sessions. However, a problem arises when the monitor marks its status
as "shutdown" but still accepts new authentication requests and creates new sessions.
This issue causes the monitor to fail when checking the session list.
To fix this problem, an update is implemented. With this fix,
a monitor in the "shutdown" state will correctly reject new authentication requests
and prevent the creation of new sessions.
This ensures that the monitor operates as intended during the shutdown process.
NitzanMordhai [Thu, 16 Nov 2023 07:09:29 +0000 (07:09 +0000)]
Tools/rados: Improve Error Messaging for Object Name Resolution
The current implementation of 'rados clearomap' exhibits a behavior where
an error message is generated without the associated object name or,
in the case of a non-existent object name, may result in a segmentation fault.
The proposed fix addresses this issue by enhancing the error message.
After applying the fix, error messages will consistently display the correct
object name, providing users with more accurate and actionable information.
Casey Bodley [Mon, 8 Jan 2024 16:24:18 +0000 (08:24 -0800)]
make-dist: don't use --continue option for wget
the boost jfrog mirror is broken and returns an HTML error page instead
of the archive. the file size of this page is 11534 bytes
when download_from() retries the download from download.ceph.com, the -c
option tells it to resume the download of the existing file. the
resulting boost_1_82_0.tar.bz2 ends up with the correct total file size
of 121325129 bytes, but the first 11534 bytes still correspond to the
HTML from jfrog. that causes the sha256sum mismatch
remove the -c option so that wget fetches the archive in its entirety
Ilya Dryomov [Wed, 22 Nov 2023 13:39:13 +0000 (14:39 +0100)]
librados: make querying pools for selfmanaged snaps reliable
If get_pool_is_selfmanaged_snaps_mode() is invoked on a fresh RADOS
client instance that still lacks an osdmap, it returns false, same as
for "this pool is not in selfmanaged snaps mode". The same happens if
the pool in question doesn't exist since the signature doesn't allow to
return an error.
The motivation for this API was to prevent users from running "rados
cppool" on a pool with unmanaged snapshots and deleting the original
thinking that they have a full copy. Unfortunately, it's exactly
"rados cppool" that fell into this trap, so no warning is printed and
--yes-i-really-mean-it flag isn't enforced.
Ilya Dryomov [Sun, 10 Dec 2023 16:01:24 +0000 (17:01 +0100)]
test/pybind/rbd: don't ignore from_snapshot in check_diff()
Despite the test in test_diff_iterate() being correct, it started
failing:
> check_diff(self.image, 0, IMG_SIZE, 'snap1', [(0, 512, False)])
...
a = [], b = [(0, 512, False)]
...
> assert a == b
E AssertionError
This is because check_diff() drops 'snap1' argument on the floor and
passes None to image.diff_iterate() instead. This goes back to 2013,
see commit e88fe3cbbc8f ("rbd.py: add some missing functions").
- beginning of time -> HEAD, through intermediate snap
- snap -> snap, directly
- snap -> HEAD, directly
But coverage is too weak: none of the weird OBJECT_PENDING cases and
only a single diff-iterate vs deep-copy case is tested, for example.
Coverage is missing completely for:
- beginning of time -> HEAD, directly
- beginning of time -> snap, directly
- beginning of time -> snap, through intermediate snap
- snap -> snap, through intermediate snap
- snap -> HEAD, through intermediate snap
Ilya Dryomov [Fri, 8 Dec 2023 14:19:02 +0000 (15:19 +0100)]
librbd: OBJECT_PENDING should always be treated as dirty
OBJECT_PENDING is a transition state which normally isn't encountered
in (snapshot) object maps. In case it's encountered, for example when
a snapshot is taken after losing power at the time a discard was being
handled, the object should be treated as dirty and produce a diff as
a result.
Assuming an object is marked OBJECT_PENDING, theoretically there are
four cases with respect to object's state in the next snapshot:
Prior to commit b81cd2460de7 ("librbd/object_map: diff state machine
should track object existence"), (3) was handled incorrectly (diff set
to DIFF_STATE_NONE instead of DIFF_STATE_UPDATED).
Post commit 399a45e11332 ("librbd/object_map: rbd diff between two
snapshots lists entire image content"), (4) is handled incorrectly
(diff set to DIFF_STATE_DATA instead of DIFF_STATE_DATA_UPDATED).
Similar to DiffIterateTest.DiffIterateDeterministic, systematically
cover the most common cases involving full-object discards. With this
in place, issue [1] can be reproduced by any of:
(preparatory) before snap3 is taken
(1) beginning of time -> HEAD
(2) snap1 -> HEAD
(5) beginning of time -> snap3
(6) snap1 -> snap3
Sub-object discards aren't covered here because of further issues
[2][3].
Ilya Dryomov [Fri, 10 Nov 2023 10:14:42 +0000 (11:14 +0100)]
librbd: resurrect "exists" assert in simple_diff_cb()
This effectively reverts commit 3ccc3bb4bd35 ("librbd: diff_iterate
needs to handle holes in parent images") which just dropped the assert
instead of addressing the root cause of reported crashes.
Ilya Dryomov [Thu, 9 Nov 2023 19:44:18 +0000 (20:44 +0100)]
librbd: diff-iterate shouldn't ever report "new hole" against a hole
If an object doesn't exist in both start and end versions but there is
an intermediate snapshot which contains it (i.e. the object is written
to and captured at some point but then discarded prior to or in the end
version), diff-iterate reports "new hole" -- callback is invoked with
exists=false. This occurs both on the slow list_snaps path and in
fast-diff mode.
Despite going all the way back to the introduction of diff-iterate in
commit 0296c7cdae91 ("librbd: implement diff_iterate"), this behavior
is wrong and contradicts diff-iterate API documentation added in commit a69532e86450 ("librbd: document diff_iterate in header") in the same
series:
If the source snapshot name is NULL, we interpret that as
the beginning of time and return all allocated regions of the
image.
It also triggered an assert added in commit c680531e070a ("librbd:
change diff_iterate interface to be more C-friendly") in the same
series. Unfortunately, commit f1f6407221a0 ("test_librbd: add
diff_iterate test including discard"), also part of the same series,
added a test which expected the wrong behavior. Very confusing!
A year later, a different manifestation of this bug was fixed in commit 9a1ab95176fe ("rbd: Fix rbd diff for non-existent objects"), but the
fix only covered the case where calc_snap_set_diff() goes past the end
snap ID while processing clones. The case where it runs out of clones
to process before reaching the end snap ID remained mishandled.
A year after that, commit 3ccc3bb4bd35 ("librbd: diff_iterate needs to
handle holes in parent images") dropped the assert mentioned above and
this bug got enshrined in the newly introduced fast-diff mode.
Finally, a few years later, deep-copy actually started relying on this
bug in commit e5a21e904142 ("librbd: deep-copy image copy state machine
skips clean objects"). This necessitates bifurcation in DiffRequest
because deep-copy wants the "has this object been touched" semantics,
which is different from diff-iterate (and also potentially much more
expensive to produce!).
This commit brings a minimal update to TestMockObjectMapDiffRequest
tests and DiffIterateTest.DiffIterateDiscard. Coverage is expanded in
the following commits.
Xiubo Li [Mon, 27 Nov 2023 07:55:42 +0000 (15:55 +0800)]
mds: set the loner to true for LOCK_EXCL_XSYN
For filelock when in LOCK_EXCL_XSYN state the non-loner clients
should be issued empty caps, but since the loner of this state
is set to false and it could make the Locker to issue the Fcb caps
to them, which is incorrect.