Aashish Sharma [Thu, 28 Nov 2024 05:58:59 +0000 (11:28 +0530)]
mgr/dashboard: Add ceph_daemon filter to rgw overview grafana panel queries
Currently rgw_servers filtering does not work in the RGW Overview Grafana graphs.
They show data for all RGW services, even when the filter is set to a single service.
This PR fixes the issue by adding a ceph_daemon filter to the panel queries.
ceph-volume: allow zapping partitions on multipath devices
ceph-volume refuses to zap a device if it is a partition on a multipath
device due to an overly strict condition. This change ensures that only
full mapper devices (excluding partitions) are blocked from being zapped,
allowing partitions on multipath devices to be processed correctly.
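The relaxed condition can be sketched as follows. This is a minimal illustration, not the actual ceph-volume code: the `Device` class and its `is_mapper` and `is_partition` attributes are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical stand-in for ceph-volume's device object.
    path: str
    is_mapper: bool     # device is managed by device-mapper (e.g. multipath)
    is_partition: bool  # device is a partition rather than a whole device


def zap_is_blocked(dev: Device) -> bool:
    """Block only full mapper devices; partitions on them may be zapped."""
    return dev.is_mapper and not dev.is_partition
```

Under this sketch a whole multipath device is still refused, while a partition on it is allowed through.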
John Mulligan [Fri, 14 Feb 2025 19:51:03 +0000 (14:51 -0500)]
doc: document the new container build tool and link to it in README
Add a new markdown file in the root of the tree, ContainerBuild.md, that
can serve as a basic introduction to the new container build tools
recently merged to ceph.
Add a small 'breadcrumb' section to the project README.md to help find
this new document.
John Mulligan [Thu, 20 Feb 2025 00:17:30 +0000 (19:17 -0500)]
script/build-with-container: add support for overlay dir
The source dir (aka homedir, default /ceph) is mounted in the container
read-write. This is needed as the various ceph build scripts expect to
write things into the tree - often this is in the build directory - but
not always. This can lead to small messes and/or situations that are
confusing to debug, especially if one is jumping between distros often.
Add an option to use an overlay volume for the homedir - by default we
enable a persistent overlay with a supplied "upper dir" where files that
were written will appear. One can also enable a temporary overlay that
forgets the writes when the container exits - maybe useful when doing
experiments in 'interactive' mode.
To use this option run the command with the `--overlay-dir <dir>` option.
For example: `./src/script/build-with-container.py -b build.inner
--overlay-dir build.ovr`. This will create a directory
`build.ovr/content` automatically and all new files will appear there.
For example the build directory will appear at
`build.ovr/content/build.inner`.
To use the temporary overlay use a `-` as the directory name. For
example: `./src/script/build-with-container.py -b build.inner
--overlay-dir -`
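The path mapping described above can be sketched as a small helper. The function name is hypothetical and this is not code from the script; it only restates the `content` subdirectory and the `-` convention from the message.

```python
from pathlib import PurePosixPath


def overlay_build_path(overlay_dir: str, build_dir: str):
    """Return where build_dir appears on the host, or None for a temporary overlay."""
    if overlay_dir == "-":
        # Temporary overlay: writes are forgotten when the container exits.
        return None
    # Persistent overlay: new files land under <overlay_dir>/content.
    return PurePosixPath(overlay_dir) / "content" / build_dir
```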
John Mulligan [Thu, 20 Feb 2025 14:50:49 +0000 (09:50 -0500)]
script/build-with-container: skip dnf cache dir volume mounts on docker
When using docker the --volume option is not available during build
(docker [buildx] build), unlike podman. Since passing these volumes must
be conditional on them being set up I see no way to handle this short of
just disabling the option on docker. Log the fact that it's being
skipped - the only other issue is that we pointlessly set up some dirs
and the build may be a bit slower.
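A minimal sketch of the conditional, assuming the engine is identified by a plain string and the cache dirs are given as a host-to-container mapping. The function name and the mapping are illustrative assumptions, not the script's actual structure.

```python
import logging

log = logging.getLogger(__name__)


def build_volume_args(engine: str, cache_dirs: dict) -> list:
    """Return --volume arguments for the image build, or nothing on docker."""
    if engine == "docker":
        # docker build has no --volume option, so the cache mounts are skipped.
        log.warning("skipping dnf cache dir volume mounts on docker")
        return []
    return [f"--volume={host}:{ctr}" for host, ctr in cache_dirs.items()]
```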
John Mulligan [Wed, 19 Feb 2025 18:20:36 +0000 (13:20 -0500)]
script/build-with-container: remove default --volume arg from ctr build
On the original github pr #59841 user fayak kindly informed us that the
--volume option was not supported by docker build. Since this section
was a leftover from a previous way of constructing the builder image and
was no longer needed we simply removed it.
John Mulligan [Wed, 19 Feb 2025 18:20:01 +0000 (13:20 -0500)]
script/build-with-container.py: build builder image with --pull=always
Construct the builder image using the --pull=always flag to initiate a
pull of the base image (centos, ubuntu, etc) in order to avoid using a
stale base image. Since the script automatically (by default) avoids
building if a matching tag is in local container storage it is handy to
use a fresh base when it *is* time to build something. Otherwise, you
end up in a situation like I sometimes do - using a months-old base
unintentionally.
John Mulligan [Fri, 14 Feb 2025 19:50:42 +0000 (14:50 -0500)]
script/build-with-container: add a common packages target
Add a `packages` target to build-with-container.py that requests a build
of packages, whatever package type is native to the distro selected.
For example `./src/script/build-with-container.py -d ubuntu22.04 -e
packages` will automatically select a deb packages build where
`./src/script/build-with-container.py -d centos9 -e packages` will
trigger rpm packages to be built. The underlying package-type specific
targets remain unchanged.
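The selection can be sketched as a lookup from distro name to native package type. The function name and the prefix matching are assumptions made for illustration; only the ubuntu-to-deb and centos-to-rpm pairing comes from the message.

```python
def native_package_type(distro: str) -> str:
    """Pick the package type native to the selected distro."""
    if distro.startswith("ubuntu"):
        return "deb"
    if distro.startswith("centos"):
        return "rpm"
    raise ValueError(f"no native package type known for {distro!r}")
```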
John Mulligan [Fri, 14 Feb 2025 16:44:35 +0000 (11:44 -0500)]
script/build-with-container: support custom tag suffixes
Previously, one could use the `--tag` option to completely override the
container tag generated by the script. However, there are cases where
one may want to add information to the tag rather than override it.
Allow the tag value to start with a plus (+) character that indicates
that the remainder of the string is to be suffixed to the generated tag.
Add a command line option --base-branch that allows the user to supply a
custom base branch name. git doesn't make determining this easy so we
always assume a base branch of 'main' by default - but this option lets
one change that.
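The `--tag` handling can be sketched like this. The function name, the example tag, and the `.` separator used when joining the suffix are assumptions; the message only says the remainder is suffixed to the generated tag.

```python
def resolve_tag(generated: str, tag_arg=None, sep: str = ".") -> str:
    """Combine the generated tag with the user-supplied --tag value."""
    if not tag_arg:
        return generated                       # no --tag: keep the generated tag
    if tag_arg.startswith("+"):
        return f"{generated}{sep}{tag_arg[1:]}"  # leading '+': append as a suffix
    return tag_arg                             # otherwise: full override
```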
John Mulligan [Fri, 14 Feb 2025 16:24:29 +0000 (11:24 -0500)]
src/script: rename CEPH_BRANCH to CEPH_BASE_BRANCH for build container
Previously, we were passing build argument of CEPH_BRANCH, but that was
a bit misleading as we expect the current branch to vary a bit (as users
will be using branches to develop and test the code). What we actually
care about is the base branch ('main', 'squid', etc) as that is fed into
our bootstrap script and we want the option to make simple variations
based on the name of said base branch.
Rename CEPH_BRANCH to CEPH_BASE_BRANCH for clarity.
Add a new --current-branch argument that lets the user supply a name for
the current branch. This allows the automatic tag generation to avoid
calling git - something useful if the tree is not using a git checkout
(like a tarball). It also allows you to pull a temporary branch in git
but ignore it and act like the temporary branch is the base branch.
John Mulligan [Tue, 11 Feb 2025 23:36:13 +0000 (18:36 -0500)]
script/build-with-container: add more distro aliases
Add a system to define distro name aliases and use that to define some
additional aliases, primarily to match ubuntu codenames rather than
version numbers. Requested by Zack.
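An alias system of this kind can be sketched as a simple lookup table. The table contents and names below are illustrative assumptions (jammy and noble are the Ubuntu codenames for 22.04 and 24.04); the aliases the script actually defines may differ.

```python
# Illustrative alias table; not copied from the script.
DISTRO_ALIASES = {
    "jammy": "ubuntu22.04",
    "noble": "ubuntu24.04",
}


def canonical_distro(name: str) -> str:
    """Map an alias to its canonical distro name; pass other names through."""
    return DISTRO_ALIASES.get(name, name)
```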
Ilya Dryomov [Mon, 3 Mar 2025 16:59:35 +0000 (17:59 +0100)]
test/pybind/rbd: fix read offset in write zeroes tests
Random data is written and write zeroes is invoked on 0~256, but the
read is done on 256~256. This means that if write zeroes malfunctions
the test wouldn't catch it (especially in the thick provision case).
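Why the wrong offset hides a failure can be shown with a toy model. This sketch uses a plain bytearray rather than the rbd bindings, a fixed non-zero pattern instead of random data, and a hypothetical `broken` switch that turns write zeroes into a no-op.

```python
def write_zeroes(image: bytearray, offset: int, length: int, broken: bool = False) -> None:
    """Toy stand-in for write zeroes; broken=True simulates a malfunction."""
    if not broken:
        image[offset:offset + length] = bytes(length)


def read(image: bytearray, offset: int, length: int) -> bytes:
    return bytes(image[offset:offset + length])


image = bytearray(512)        # thick-provisioned: unwritten extents read as zeros
image[0:256] = b"\xab" * 256  # data written on 0~256
write_zeroes(image, 0, 256, broken=True)

# Reading 256~256 sees zeros regardless, so the check passes despite the bug.
wrong_read_passes = read(image, 256, 256) == bytes(256)
# Reading 0~256 still sees the written data, so the check fails as it should.
right_read_passes = read(image, 0, 256) == bytes(256)
```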
VinayBhaskar-V [Tue, 26 Nov 2024 11:18:51 +0000 (16:48 +0530)]
librbd: add rbd_diff_iterate3() API to take source snapshot by ID
Allow a diff to start from a non-user snapshot. This would be used by
the "rbd du" command, and in other places, to account for non-user
snapshots, which are currently just skipped, potentially resulting in
underreported space usage.
Ilya Dryomov [Sun, 2 Mar 2025 08:24:52 +0000 (09:24 +0100)]
librbd: fix a deadlock on image_lock caused by Mirror::image_disable()
With Mirror::image_disable() taking image_lock for write and calling
list_children() under it, the following deadlock is possible:
1. Mirror::image_disable() takes image_lock for write and calls
list_children()
2. AbstractWriteLog::periodic_stats() timer fires (it runs every
5 seconds) and ImageCacheState::write_image_cache_state() is called
under a global timer_lock
3. ImageCacheState::write_image_cache_state() successfully takes
owner_lock and blocks attempting to take image_lock for read because
it's already held for write by Mirror::image_disable()
4. list_children() blocks inside of a call to ImageState::close() on
a descendant image
5. The descendant image close can't proceed because TokenBucketThrottle
requires a global timer_lock to complete QosImageDispatch shutdown
6. safe_timer thread which is holding timer_lock can't proceed because
ImageCacheState::write_image_cache_state() is effectively blocked on
the descendant image close through Mirror::image_disable()
Until commit 281a64acf920 ("librbd: remove snapshot mirror image-meta
when disabling"), Mirror::image_disable() was taking image_lock only for
read meaning that this deadlock wasn't possible. The only other change
that commit 281a64acf920 made to the code block protected by image_lock
was using child_mirror_image_internal for cls_client::mirror_image_get()
call on descendant images instead of mirror_image_internal to preserve
the value of mirror_image_internal for later. Both are local variables
that have nothing to do with image_lock, so I'm going back and making
Mirror::image_disable() take image_lock only for read again.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/nvmeof-namespaces-form/nvmeof-namespaces-form.component.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/api/nvmeof.service.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/api/nvmeof.service.ts
- restore it to the request type where groups is not present
workunit/dencoder: fix corpus test for backward and forward compatibility
- changed the check for non-deterministic, return code 1 is also legit
- unneeded check for is_dir, if it exists
- limit the number of threads to prevent error
Fixes: https://tracker.ceph.com/issues/67263
Signed-off-by: NitzanMordhai <nmordech@redhat.com>
(cherry picked from commit 30921272ddee5e7c8aaf4bdb8d69645ce92ba379)
Dan Mick [Thu, 27 Feb 2025 00:16:26 +0000 (16:16 -0800)]
container/build.sh: remove local container images
Optionally, for those that want to run build.sh locally and
use the images. The default is to remove, for Jenkins builders,
which will build, push, and rmi.
Ilya Dryomov [Thu, 20 Feb 2025 15:38:41 +0000 (16:38 +0100)]
qa/workunits/rbd: add a test for force promote with a user snapshot
Add a reproducer for the crash on a bad variant access which was fixed
in commit 7d75161051da ("librbd: fix a crash in get_rollback_snap_id").
The reproducer deliberately works around many other issues with force
promote in snapshot-based mirroring: stopping rbd-mirror daemon
shouldn't be necessary (let alone with SIGKILL), get_rollback_snap_id()
and its caller can_create_primary_snapshot() are flawed and can pick
the wrong snapshot to roll back to or skip rollback when it's actually
required, the user snapshot in this scenario should be removed as part
of force promoting because it's incomplete and won't be usable after
the image is promoted, etc.
Zac Dover [Mon, 3 Feb 2025 13:37:34 +0000 (23:37 +1000)]
doc/rados: improve pg_num/pgp_num info
Improve the guidance around setting pg_num, and clear up confusion
around whether pgp_num should be set manually or, indeed, if it even can
be set manually.
This PR was raised in response to Mark Schouten's email here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/CBDJTLTTIEZVG7GVZBX37UAWGYNSSMPD/
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit c43e7337212fe38e8db63d00345fa9858b3cb10a)
N Balachandran [Sat, 15 Feb 2025 13:26:31 +0000 (18:56 +0530)]
rbd-mirror: fix possible recursive lock of ImageReplayer::m_lock
If periodic status update (LambdaContext which is queued from
handle_update_mirror_image_replay_status()) races with shutdown and
ends up being the last in-flight operation that shutdown was pending
on, we attempt to recursively acquire m_lock in shut_down() because
m_in_flight_op_tracker.finish_op() is called with m_lock (and also
m_threads->timer_lock) held. These locks are needed only for the call
to schedule_update_mirror_image_replay_status() and should be unlocked
immediately.
Fixes: https://tracker.ceph.com/issues/69978
Co-authored-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: N Balachandran <nithya.balachandran@ibm.com>
(cherry picked from commit c60514087bc29540d3babd7855c5a4e28f2bf1b0)
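The locking pattern in the rbd-mirror fix above can be illustrated with a toy Python model. The class and method names echo the commit message, but the logic is heavily simplified and is not the rbd-mirror code; a non-blocking acquire is used so the recursive attempt raises instead of hanging.

```python
import threading


class Replayer:
    """Toy model of the race; only the lock handling is represented."""

    def __init__(self):
        self.m_lock = threading.Lock()  # non-reentrant
        self.scheduled = False
        self.shut_down_ran = False

    def shut_down(self):
        # Runs once the last in-flight op finishes and needs m_lock itself.
        if not self.m_lock.acquire(blocking=False):
            raise RuntimeError("recursive acquisition of m_lock")
        try:
            self.shut_down_ran = True
        finally:
            self.m_lock.release()

    def finish_op(self):
        # Assume this was the last in-flight op, so shutdown proceeds.
        self.shut_down()

    def status_update_buggy(self):
        with self.m_lock:
            self.scheduled = True
            self.finish_op()  # m_lock is still held here

    def status_update_fixed(self):
        with self.m_lock:
            self.scheduled = True  # only the scheduling step needs the lock
        self.finish_op()  # called after the lock is released
```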
Zac Dover [Tue, 25 Feb 2025 04:57:11 +0000 (14:57 +1000)]
doc/releases: correct squid release order
Put the releases of Squid in descending order. This change alters the
order of the Squid releases so that it is the same as the order of the
other Ceph releases.
ceph-volume: migrate unit tests from 'mock' to 'unittest.mock'
The unit tests in ceph-volume were still using the external 'mock'
library, which is unnecessary since 'unittest.mock' is part
of the Python standard library (available since Python 3.3).
This commit updates all imports to use 'unittest.mock' instead,
ensuring better maintainability and removing the need for an extra
dependency.
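The import change looks like the following sketch. The `device_exists` helper and the device path are hypothetical and only give `patch` something to act on.

```python
# Before: from mock import patch   (external 'mock' package)
# After:  standard library import, no extra dependency
from unittest.mock import patch
import os


def device_exists(path: str) -> bool:
    # Hypothetical helper, used only to demonstrate patching.
    return os.path.exists(path)


with patch("os.path.exists", return_value=True) as fake_exists:
    result = device_exists("/dev/hypothetical0")
```

The `patch` API is the same in both packages, so only the import lines need to change.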
This refactors `get_physical_osds()`.
The calculation of `data_slots` is now more concise. The handling of
`dev_size`, `rel_data_size`, and `abs_size` is standardized.
The initialization of `free_size` is moved outside the loop
for clarity. Redundant checks and assignments are removed to simplify
the code.
ceph-volume: support splitting db even on collocated scenario
This change enables ceph-volume to create OSDs where the DB is
explicitly placed on a separate LVM partition, even in collocated
scenarios (i.e., block and DB on the same device).
This helps mitigate BlueStore fragmentation issues.
Given that ceph-volume can't automatically predict a proper default size for the db device,
the idea is to use the `--block-db-size` parameter:
Passing `--block-db-size` and `--db-devices` makes ceph-volume create db devices
on dedicated devices (current implementation):
```
Total OSDs: 2
Type        Path        LV Size      % of device
----------------------------------------------------------------------------------------------------
data        /dev/vdb    200.00 GB    100.00%
block_db    /dev/vdd    4.00 GB      2.00%
----------------------------------------------------------------------------------------------------
data        /dev/vdc    200.00 GB    100.00%
block_db    /dev/vdd    4.00 GB      2.00%
```
Passing `--block-db-size` without `--db-devices` makes ceph-volume create a separate
LV for db device on the same device (new behavior):
```
Total OSDs: 2
Type        Path        LV Size      % of device
----------------------------------------------------------------------------------------------------
data        /dev/vdb    196.00 GB    98.00%
block_db    /dev/vdb    4.00 GB      2.00%
----------------------------------------------------------------------------------------------------
data        /dev/vdc    196.00 GB    98.00%
block_db    /dev/vdc    4.00 GB      2.00%
```
This new behavior also works in combination with the `--osds-per-device` parameter:
```
Total OSDs: 4
Type        Path        LV Size      % of device
----------------------------------------------------------------------------------------------------
data        /dev/vdb    96.00 GB     48.00%
block_db    /dev/vdb    4.00 GB      2.00%
----------------------------------------------------------------------------------------------------
data        /dev/vdb    96.00 GB     48.00%
block_db    /dev/vdb    4.00 GB      2.00%
----------------------------------------------------------------------------------------------------
data        /dev/vdc    96.00 GB     48.00%
block_db    /dev/vdc    4.00 GB      2.00%
----------------------------------------------------------------------------------------------------
data        /dev/vdc    96.00 GB     48.00%
block_db    /dev/vdc    4.00 GB      2.00%
```
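The sizes in the collocated reports above are consistent with the following arithmetic, shown here as a sketch. The function name and return shape are hypothetical and this is not the `get_physical_osds()` implementation; it assumes each device is split evenly into slots and the DB size is taken out of each slot.

```python
def collocated_layout(dev_size_gb: float, block_db_size_gb: float, osds_per_device: int = 1) -> list:
    """Split one device into per-OSD data and block_db sizes (in GB and % of device)."""
    slot_gb = dev_size_gb / osds_per_device
    data_gb = slot_gb - block_db_size_gb
    if data_gb <= 0:
        raise ValueError("block_db size leaves no room for data")
    return [
        {
            "data_gb": data_gb,
            "data_pct": 100 * data_gb / dev_size_gb,
            "block_db_gb": block_db_size_gb,
            "block_db_pct": 100 * block_db_size_gb / dev_size_gb,
        }
        for _ in range(osds_per_device)
    ]
```

For a 200 GB device with a 4 GB DB, one OSD per device gives 196 GB of data (98%), and two OSDs per device give 96 GB each (48%), matching the reports.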