ceph-volume: migrate unit tests from 'mock' to 'unittest.mock'
The unit tests in ceph-volume were still using the external 'mock'
library, which is unnecessary since 'unittest.mock' has been part of
the Python standard library since Python 3.3.
This commit updates all imports to use 'unittest.mock' instead,
ensuring better maintainability and removing the need for an extra
dependency.
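The change itself is mechanical; for illustration, a typical before/after
looks like this (the patched target below is just an example, not a
specific ceph-volume test):
```
# before: required the external 'mock' package
# from mock import patch, MagicMock

# after: standard library only (available since Python 3.3)
from unittest.mock import patch, MagicMock

with patch('os.path.exists', return_value=True) as m:
    import os
    assert os.path.exists('/dev/vdb')
    m.assert_called_once_with('/dev/vdb')
```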
This refactors `get_physical_osds()`.
The calculation of `data_slots` is now more concise. The handling of
`dev_size`, `rel_data_size`, and `abs_size` is standardized.
The initialization of `free_size` is moved outside the loop
for clarity. Redundant checks and assignments are removed to simplify
the code.
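As a rough, runnable sketch of the flow described above (argument names
mirror the batch options; everything else is simplified and illustrative,
not the actual ceph-volume code):
```
def get_physical_osds(device_sizes_gb, osds_per_device=1, data_slots=0,
                      data_allocate_fraction=1.0):
    # slot count: whichever of osds_per_device / data_slots is larger
    data_slots = max(osds_per_device, data_slots)
    # one standardized relative size used for every device
    rel_data_size = data_allocate_fraction / data_slots
    osds = []
    for dev_size in device_sizes_gb:
        abs_size = dev_size * rel_data_size  # absolute size of each data LV
        free_size = dev_size                 # initialized once, outside the slot loop
        for _ in range(osds_per_device):
            osds.append({'abs_size_gb': abs_size, 'free_before_gb': free_size})
            free_size -= abs_size
    return osds

print(get_physical_osds([200.0], osds_per_device=2))
```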
ceph-volume: support splitting db even on collocated scenario
This change enables ceph-volume to create OSDs where the DB is
explicitly placed on a separate logical volume, even in collocated
scenarios (i.e., block and DB on the same device).
This helps mitigate BlueStore fragmentation issues.
Given that ceph-volume can't automatically predict a proper default size for the db device,
the idea is to use the `--block-db-size` parameter:
Passing `--block-db-size` and `--db-devices` makes ceph-volume create db devices
on dedicated devices (current implementation):
```
Total OSDs: 2
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
data /dev/vdb 200.00 GB 100.00%
block_db /dev/vdd 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdc 200.00 GB 100.00%
block_db /dev/vdd 4.00 GB 2.00%
```
Passing `--block-db-size` without `--db-devices` makes ceph-volume create a separate
LV for the db on the same device (new behavior):
```
Total OSDs: 2
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
data /dev/vdb 196.00 GB 98.00%
block_db /dev/vdb 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdc 196.00 GB 98.00%
block_db /dev/vdc 4.00 GB 2.00%
```
This new behavior is supported with the `--osds-per-device` parameter:
```
Total OSDs: 4
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
data /dev/vdb 96.00 GB 48.00%
block_db /dev/vdb 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdb 96.00 GB 48.00%
block_db /dev/vdb 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdc 96.00 GB 48.00%
block_db /dev/vdc 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdc 96.00 GB 48.00%
block_db /dev/vdc 4.00 GB 2.00%
```
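The numbers in these reports follow from straightforward arithmetic; a
small illustrative calculation (the function below is hypothetical, just
to show the math):
```
def split_collocated(dev_size_gb, osds_per_device, block_db_size_gb):
    # each OSD gets an equal slot of the device; the db LV is carved out
    # of the same slot, shrinking the data LV accordingly
    slot_gb = dev_size_gb / osds_per_device
    return slot_gb - block_db_size_gb, block_db_size_gb

# 200 GB device, 1 OSD, 4 GB db -> (196.0, 4): 98% data + 2% db
print(split_collocated(200, 1, 4))
# 200 GB device, 2 OSDs, 4 GB db -> (96.0, 4): 48% data + 2% db per OSD
print(split_collocated(200, 2, 4))
```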
This adds Python type annotations to `ceph_volume.util.device`,
along with all necessary adjustments to ensure compatibility
and maintain code clarity.
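A minimal sketch of what such annotations look like (the signatures here
are illustrative, not copied from `ceph_volume.util.device`):
```
import os
from typing import List, Optional


class Device:
    def __init__(self, path: str) -> None:
        self.path: str = path
        self.lvs: List[str] = []
        self._exists: Optional[bool] = None

    @property
    def exists(self) -> bool:
        # cache the filesystem check so repeated lookups stay cheap
        if self._exists is None:
            self._exists = os.path.exists(self.path)
        return self._exists
```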
ceph-volume: set default value for BlueStore.block_lv to None
This change updates the `BlueStore` class in
`ceph_volume.objectstore` by initializing the `block_lv` attribute
to `None` with the type `Optional[Volume]`. This ensures that the
attribute has a default value and avoids potential runtime errors
when the attribute is accessed before being explicitly assigned.
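A minimal sketch of the change, with a stand-in for the real Volume type:
```
from typing import Optional


class Volume:  # stand-in for ceph_volume's Volume
    pass


class BlueStore:
    def __init__(self) -> None:
        # default to None so code that reads block_lv before it is
        # explicitly assigned gets a well-typed value instead of
        # raising AttributeError
        self.block_lv: Optional[Volume] = None
```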
ceph-volume: improve wipefs retry logic in lvm.zap
- Simplify the initialization of the `tries` and `interval` variables for clarity.
- Adjust the retry logic in the `wipefs` function to:
  - Include the attempt count in the warning message for better debugging.
  - Start the retry loop at 1 and increment up to `tries`.
- Remove unnecessary unpacking of `stdout` and `stderr` since they were unused.
- Update the loop to increment `tries` by 1 to reflect the intended number of attempts.
This change improves code readability and makes retry behavior more transparent.
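A minimal, runnable sketch of the resulting loop (simplified: `subprocess`
stands in for ceph-volume's process helpers, and the defaults are
illustrative):
```
import logging
import subprocess
import time

logger = logging.getLogger(__name__)


def wipefs(path: str, tries: int = 8, interval: int = 5) -> None:
    # attempts are numbered from 1 so the warning can report "attempt N of M"
    for attempt in range(1, tries + 1):
        # stdout/stderr are not captured since they were unused
        if subprocess.run(['wipefs', '--all', path]).returncode == 0:
            return
        logger.warning('failed to wipefs device %s, retrying (attempt %d/%d)',
                       path, attempt, tries)
        time.sleep(interval)
    raise RuntimeError(f'could not complete wipefs on device: {path}')
```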
Ilya Dryomov [Tue, 18 Feb 2025 16:51:47 +0000 (17:51 +0100)]
test/rbd_mirror: clear Namespace::s_instance at the end of a test
TestMockPoolReplayer.Namespaces and NamespacesError tests leave behind
a dangling pointer to a stack-allocated MockNamespace which leads to an
easily reproducible use-after-free and segfault when tests are shuffled.
Ilya Dryomov [Mon, 17 Feb 2025 11:41:51 +0000 (12:41 +0100)]
test/rbd_mirror: flush watch/notify callbacks in TestImageReplayer
TestImageReplayer establishes its own (i.e. outside of the SUT code)
watch on the header of the remote image to be able to synchronize the
execution of the test with certain notifications. This watch is
established before the remote image is opened and is torn down only
after the remote image is closed, but while the image replayer is still
running. The flush that is part of the image close sequence thus isn't
guaranteed to cover all callbacks, especially for snapshot-based
mirroring where UnlinkPeerRequest spawned from Replayer::unlink_peer()
generates a notification on the remote image for each completed unlink.
Since TestImageReplayer then immediately deletes C_WatchCtx, pretty
much any test can segfault when C_WatchCtx::handle_notify() is invoked
by the TestWatchNotify infrastructure. Because it's a virtual method,
the segfault often involves a completely bogus instruction pointer.
Improve the grammar and correct the formatting of the "Upgrading root ca
certificates" procedure that was added to the documentation in https://github.com/ceph/ceph/pull/61867
Matan Breizman [Tue, 18 Feb 2025 10:40:24 +0000 (12:40 +0200)]
script/lib-build: Use clang 14
Updating to a newer clang requires multiple fixes.
Don't use a clang newer than 14. If needed, we could backport
the fixes from [1] and then use newer releases.
Zac Dover [Mon, 10 Feb 2025 08:12:34 +0000 (18:12 +1000)]
doc/cephadm: improve "Activate Existing OSDs".
Make three minor changes to doc/cephadm/services/osd.rst. These three
changes were suggested by Eugen Block, who reviewed this procedure after
developing it.
Zac Dover [Fri, 7 Feb 2025 01:32:20 +0000 (11:32 +1000)]
doc/cephadm: improve "Activate Existing OSDs"
Improve the section "Activate Existing OSDs".
Supplement the information in the "Activate Existing OSDs" section with
a procedure developed by Eugen Block, here:
https://heiterbiswolkig.blogs.nde.ag/2025/02/06/cephadm-activate-existing-osds/
This procedure explains how to activate OSDs on a host that, for
whatever reason, has had to have its operating system reinstalled.
John Mulligan [Tue, 20 Aug 2024 19:01:05 +0000 (15:01 -0400)]
src/script: add a script to help build ceph using containers
The build-with-container script tries to encapsulate nearly all major
build tasks using docker/podman containers. If there's no build image
locally it will create one for you. It provides targets for building
(make), testing (make check), and building rpm or deb packages, and
is designed to be fairly easily extended.
View the comment at the top of the source file for usage details.
John Mulligan [Tue, 20 Aug 2024 19:00:57 +0000 (15:00 -0400)]
build: add files needed to create a build container
A build container contains all the tools and dependencies needed to
build ceph. It provides a Containerfile and a small script that
helps bootstrap the container setup. This script installs a few extra
things we need before farming most of the work out to install-deps.sh.
John Mulligan [Sat, 14 Sep 2024 10:31:23 +0000 (06:31 -0400)]
build: small script tweak to allow different build dirs
Move the mkdir line to allow build dir naming schemes other than those
that appear in the .gitignore file. A tiny bit of added flexibility
at little cost.
John Mulligan [Mon, 14 Nov 2022 15:57:25 +0000 (10:57 -0500)]
src/script: add helper function has_build_dir
This function returns successfully if $BUILD_DIR exists and is valid.
This is a useful building block for automation around the build and
can be used to avoid re-running commands that fail if the build dir
already exists.
Ilya Dryomov [Wed, 29 Jan 2025 11:56:34 +0000 (12:56 +0100)]
librbd: stop filtering async request error codes
The roots of this go back to 2015, when snap create was changed to
filter EEXIST in commit 63f6c9bac9a4 ("librbd: fixed snap create race
conditions") and flatten to filter EINVAL in commit ef7e210c3f74
("librbd: better handling for duplicate flatten requests"). From there
this pattern made it to most other operations that can be proxied,
including "rbd migration execute".
The motivation was to suppress generation of an "expected" error in
response to a duplicate async request notification for the operation.
However, doing this at the top of the handler (right before returning
to the caller) and for an error as generic as EINVAL is super fragile.
It's trivial for an error that is being filtered to sneak in with
a lower level change completely unnoticed. For example, live migration
recently added NBD stream which is implemented on top of libnbd and it
turns out that some libnbd APIs return EINVAL on various occasions when
the NBD endpoint disappears and an error like ENOTCONN would make more
sense. If this occurs during the "rbd migration execute" operation, the
rest of librbd never learns that migration was disrupted and the image
is transitioned to MIGRATION_STATE_EXECUTED, thus handing a partially
imported (read: corrupted) image to the user.
Luckily, with commits 07fbc4b71df4 ("librbd: track complete async
operation requests") and 96bc20445afb ("librbd: track complete async
operation return code"), the scenario which originally prompted error
code filtering isn't an issue anymore. Despite a few shortcomings
(e.g. when an async request notification is acked with result 0, it's
impossible to tell whether a) a new operation was kicked off, b) there
is an operation that is still in progress or c) it's for an operation
that completed earlier but hasn't "expired" yet), even just commit 07fbc4b71df4 by itself prevents a duplicate notification from kicking
off a second operation that could generate an error for something that
actually succeeded. With that in mind, eradicate error code filtering
from the Operations class.
John Mulligan [Tue, 21 Jan 2025 21:28:42 +0000 (16:28 -0500)]
container: add label ceph=True back
Add back to the new Containerfile a label that cephadm uses internally
and that was always set by ceph-container [1]. This should prevent the
cephadm shell command from mistakenly concluding that official ceph
images are not official ceph images.
Dan Mick [Thu, 23 Jan 2025 02:28:15 +0000 (18:28 -0800)]
container/build.sh: fix up org vs. repo naming
Release builds were using the wrong container repo name because of
confused variable naming and inadequate separation. Keep the hostname,
org name, and repo name in separate variables, and assemble the full
path with a version when tagging is done.
Ilya Dryomov [Thu, 30 Jan 2025 19:30:18 +0000 (20:30 +0100)]
doc/rbd: use https links in live import examples
Even though it's explicitly said that the "http" stream can be used to
import via both HTTP and HTTPS, it can still be confusing that "type":
"http" is expected to go with "url": "https://...". Switch example
URLs from HTTP to HTTPS to make it more obvious.
Ilya Dryomov [Mon, 27 Jan 2025 11:29:54 +0000 (12:29 +0100)]
osd/OSDCap: fix misleading grammar comments
The restrictions on pool name and namespace have been independent of
each other for ages. Specifying namespace[=]<namespace> doesn't require
specifying pool[=]<pool> as is currently suggested -- neither for
regular "allow" grants nor for "profile" grants.
Ilya Dryomov [Fri, 24 Jan 2025 19:47:11 +0000 (20:47 +0100)]
mon/OSDMonitor: relax cap enforcement for unmanaged snapshots
Since commit 4972e054b32c ("mon/OSDMonitor: enforce caps when
creating/deleting unmanaged snapshots"), one of the following is
required: a) write access to the MON service, b) write access to the
OSD service for a pool, or c) permission for the "osd pool op
unmanaged-snap" command for a pool. For "profile rbd" we configure
read-only access to the MON service and rely on write access to the
OSD service; however, the corresponding check in is_osd_writable() is
too strict.
An OSD cap like "profile rbd namespace=myns" or "allow w namespace=myns"
allows write access to the myns namespace of any pool, but
is_osd_writable() disallows operations with unmanaged snapshots under
such a cap because its match.pool_namespace.pool_name.empty() is true.
This condition appears to serve as the "doesn't include support for the
application tag" guard, but it should actually be
match.pool_tag.is_match_all() (or match.pool_tag.application.empty()
if open-coded) -- the absence of a restriction on the pool name doesn't
automatically mean that there is a restriction on the application tag.
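To make the polarity concrete, an illustrative Python model of the guard
(this is not the C++ code; the class and field names are simplified):
```
class MatchRule:
    def __init__(self, pool_name='', namespace='', application=''):
        self.pool_name = pool_name      # '' means no pool-name restriction
        self.namespace = namespace      # '' means no namespace restriction
        self.application = application  # '' means no application-tag restriction


def old_guard_allows(match):
    # too strict: treats any grant without an explicit pool name as unusable
    return match.pool_name != ''


def fixed_guard_allows(match):
    # only an actual application-tag restriction should make the grant unusable
    return match.application == ''


cap = MatchRule(namespace='myns')  # models "allow w namespace=myns"
print(old_guard_allows(cap), fixed_guard_allows(cap))  # False True
```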