John Mulligan [Fri, 14 Feb 2025 19:51:03 +0000 (14:51 -0500)]
doc: document the new container build tool and link to it in README
Add a new markdown file in the root of the tree, ContainerBuild.md, that
can serve as a basic introduction to the new container build tools
recently merged to ceph.
Add a small 'breadcrumb' section to the project README.md to help find
this new document.
John Mulligan [Thu, 20 Feb 2025 00:17:30 +0000 (19:17 -0500)]
script/build-with-container: add support for overlay dir
The source dir (aka homedir, default /ceph) is mounted in the container
read-write. This is needed as the various ceph build scripts expect to
write things into the tree - often this is in the build directory - but
not always. This can lead to small messes and/or situations that are
confusing to debug, especially if one is jumping between distros often.
Add an option to use an overlay volume for the homedir - by default we
enable a persistent overlay with a supplied "upper dir" where files that
were written will appear. One can also enable a temporary overlay that
forgets the writes when the container exits - maybe useful when doing
experiments in 'interactive' mode.
To use this option run the command with the `--overlay=<dir>` option.
For example: `./src/script/build-with-container.py -b build.inner
--overlay-dir build.ovr`. This will create a directory
`build.ovr/content` automatically and all new files will appear there.
For example the build directory will appear at
`build.ovr/content/build.inner`.
To use the temporary overlay use a `-` as the directory name. For
example: `./src/script/build-with-container.py -b build.inner
--overlay-dir -`
John Mulligan [Thu, 20 Feb 2025 14:50:49 +0000 (09:50 -0500)]
script/build-with-container: skip dnf cache dir volume mounts on docker
When using docker the --volume option is not available during build
(docker [buildx] build), unlike podman. Since passing these volumes must
be conditional on them being set up I see no way to handle this short of
just disabling the option on docker. Log the fact that it's being
skipped - the only other issue is that we pointlessly set up some dirs
and the build may be a bit slower.
John Mulligan [Wed, 19 Feb 2025 18:20:36 +0000 (13:20 -0500)]
script/build-with-container: remove default --volume arg from ctr build
On the original github pr #59841 user fayak kindly informed us that the
--volume option was not supported by docker build. Since this section
was a leftover from a previous way of constructing the builder image and
was no longer needed we simply removed it.
John Mulligan [Wed, 19 Feb 2025 18:20:01 +0000 (13:20 -0500)]
script/build-with-container.py: build builder image with --pull=always
Construct the builder image using the --pull=always flag to initiate a
pull of the base image (centos, ubuntu, etc) in order to avoid using a
stale base image. Since the script automatically (by default) avoids
building if a matching tag is in local container storage it is handy to
use a fresh base when it *is* time to build something. Otherwise, you
end up in a situation like I sometimes do - using a months old base
unintentionally.
John Mulligan [Fri, 14 Feb 2025 19:50:42 +0000 (14:50 -0500)]
script/build-with-container: add a common packages target
Add a `packages` target to build-with-container.py that requests a build
of packages, whatever package type is native to the distro selected.
For example `./src/script/build-with-container.py -d ubuntu22.04 -e
packages` will automatically select a deb packages build where
`./src/script/build-with-container.py -d centos9 -e packages` will
trigger rpm packages to be built. The underlying package-type specific
targets remain unchanged.
John Mulligan [Fri, 14 Feb 2025 16:44:35 +0000 (11:44 -0500)]
script/build-with-container: support custom tag suffixes
Previously, one could use the `--tag` option to completely override the
container tag generated by the script. However, there are cases where
one may want to add information to the tag rather than override it.
Allow the tag value to start with a plus (+) character that indicates
that the remainder of the string is to be suffixed to the generated tag.
Add a command line option --base-branch that allows the user to supply a
custom base branch name. git doesn't make determining this easy so we
always assume a base branch of 'main' by default - but this option lets
one change that.
John Mulligan [Fri, 14 Feb 2025 16:24:29 +0000 (11:24 -0500)]
src/script: rename CEPH_BRANCH to CEPH_BASE_BRANCH for build container
Previously, we were passing build argument of CEPH_BRANCH, but that was
a bit misleading as we expect the current branch to vary a bit (as users
will be using branches to develop and test the code). What we actually
care about is the base branch ('main', 'squid', etc) as that is fed into
our bootstrap script and we want the option to simple variations based
on the name of said base branch.
Rename CEPH_BRANCH to CEPH_BASE_BRANCH for clarity.
Add a new --current-branch argument that lets the user supply a name for
the current branch. This allows the automatic tag generation to avoid
calling git - something useful if the tree is not using a git checkout
(like a tarball). It also allows you to pull a temporary branch in git
but ignore it and act like the temporary branch is the base branch.
John Mulligan [Tue, 11 Feb 2025 23:36:13 +0000 (18:36 -0500)]
script/build-with-container: add more distro aliases
Add a system to define distro name aliases and use that to define some
additional aliases, primarily to match ubuntu codenames rather than
version numbers. Requested by Zack.
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/nvmeof-namespaces-form/nvmeof-namespaces-form.component.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/api/nvmeof.service.spec.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/api/nvmeof.service.ts
- restore it to the request type where groups is not present
workunit/dencoder: fix corpus test for backword and forward compability
- changed the check for non-deterministic, return code 1 is also legit
- unneeded check for is_dir, if it exist
- limit the number of threads to prevent error
Fixes: https://tracker.ceph.com/issues/67263 Signed-off-by: NitzanMordhai <nmordech@redhat.com>
(cherry picked from commit 30921272ddee5e7c8aaf4bdb8d69645ce92ba379)
Zac Dover [Mon, 3 Feb 2025 13:37:34 +0000 (23:37 +1000)]
doc/rados: improve pg_num/pgp_num info
Improve the guidance around setting pg_num, and clear up confusion
around whether pgp_num should be set manually or, indeed, if it even can
be set manually.
This PR was raised in response to Mark Schouten's email here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/CBDJTLTTIEZVG7GVZBX37UAWGYNSSMPD/
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit c43e7337212fe38e8db63d00345fa9858b3cb10a)
Zac Dover [Tue, 25 Feb 2025 04:57:11 +0000 (14:57 +1000)]
doc/releases: correct squid release order
Put the releases of Squid in descending order. This change alters the
order of the Squid releases so that it is the same as the order of the
other Ceph releases.
ceph-volume: migrate unit tests from 'mock' to 'unittest.mock'
unit tests in ceph-volume was still using the external 'mock' library
for unit tests, which is unnecessary since 'unittest.mock' is part
of the Python standard library (available since Python 3.3).
This commit updates all imports to use 'unittest.mock' instead,
ensuring better maintainability and removing the need for an extra
dependency.
This refactors `get_physical_osds()`.
The calculation of `data_slots` is now more concise. The handling of
`dev_size`, `rel_data_size`, and `abs_size` is standardized.
The initialization of `free_size` is moved outside the loop
for clarity. Redundant checks and assignments are removed to simplify
the code.
ceph-volume: support splitting db even on collocated scenario
This change enables ceph-volume to create OSDs where the DB is
explicitly placed on a separate LVM partition, even in collocated
scenarios (i.e., block and DB on the same device).
This helps mitigate BlueStore fragmentation issues.
Given that ceph-volume can't automatically predict a proper default size for the db device,
the idea is to use the `--block-db-size` parameter:
Passing `--block-db-size` and `--db-devices` makes ceph-volume create db devices
on dedicated devices (current implementation):
```
Total OSDs: 2
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
data /dev/vdb 200.00 GB 100.00%
block_db /dev/vdd 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdc 200.00 GB 100.00%
block_db /dev/vdd 4.00 GB 2.00%
```
Passing `--block-db-size` without `--db-devices` makes ceph-volume create a separate
LV for db device on the same device (new behavior):
```
Total OSDs: 2
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
data /dev/vdb 196.00 GB 98.00%
block_db /dev/vdb 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdc 196.00 GB 98.00%
block_db /dev/vdc 4.00 GB 2.00%
```
This new behavior is supported with the `--osds-per-device` parameter:
```
Total OSDs: 4
Type Path LV Size % of device
----------------------------------------------------------------------------------------------------
data /dev/vdb 96.00 GB 48.00%
block_db /dev/vdb 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdb 96.00 GB 48.00%
block_db /dev/vdb 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdc 96.00 GB 48.00%
block_db /dev/vdc 4.00 GB 2.00%
----------------------------------------------------------------------------------------------------
data /dev/vdc 96.00 GB 48.00%
block_db /dev/vdc 4.00 GB 2.00%
```
This adds Python type annotations to `ceph_volume.util.device`,
along with all necessary adjustments to ensure compatibility
and maintain code clarity.
ceph-volume: set default value for BlueStore.block_lv to None
This change updates the `BlueStore` class in
`ceph_volume.objectstore` by initializing the `block_lv` attribute
to `None` with the type `Optional[Volume]`. This ensures that the
attribute has a default value and avoids potential runtime errors
when the attribute is accessed before being explicitly assigned.
ceph-volume: improve wipefs retry logic in lvm.zap
- Simplify the initialization of `tries` and `interval` variables for clarity.
- Adjust the retry logic in the `wipefs` function to:
- Include the attempt count in the warning message for better debugging.
- Start the retry loop at 1 and increment up to `tries`.
- Remove unnecessary unpacking of `stdout` and `stderr` since they were unused.
- Update the loop to increment `tries` by 1 to reflect the intended number of attempts.
This change improves code readability and makes retry behavior more transparent.
Ilya Dryomov [Tue, 18 Feb 2025 16:51:47 +0000 (17:51 +0100)]
test/rbd_mirror: clear Namespace::s_instance at the end of a test
TestMockPoolReplayer.Namespaces and NamespacesError tests leave behind
a dangling pointer to a stack-allocated MockNamespace which leads to an
easily reproducible use-after-free and segfault when tests are shuffled.
Ilya Dryomov [Mon, 17 Feb 2025 11:41:51 +0000 (12:41 +0100)]
test/rbd_mirror: flush watch/notify callbacks in TestImageReplayer
TestImageReplayer establishes its own (i.e. outside of the SUT code)
watch on the header of the remote image to be able to synchronize the
execution of the test with certain notifications. This watch is
established before the remote image is opened and is teared down until
after the remote image is closed but while the image replayer is still
running. The flush that is part of image close sequence thus isn't
guaranteed to cover all callbacks, especially for snapshot-based
mirroring where UnlinkPeerRequest spawned from Replayer::unlink_peer()
generates a notification on the remote image for each completed unlink.
Since TestImageReplayer further immediately deletes C_WatchCtx, pretty
much any test can segfault when C_WatchCtx::handle_notify() is invoked
by TestWatchNotify infrastructure. Because it's a virtual method, the
segfault often involves a completely bogus instruction pointer:
Kefu Chai [Sun, 24 Mar 2024 10:41:40 +0000 (18:41 +0800)]
squid: crush: use std::vector instead of variable length arrays
despite that variable length arrays (VLA for short) has been around
for a long time, it is an extension supported by GCC and Clang, and is
not a part of C++ standard, its implementation allocates the dynamically
sized array on stack, hence is a source of potential stack overflow.
when compiling with Clang, it complains. so in this change, we switch
to std::vector<>, which is defined by the C++ standard, and it allocates
the storage on heap, so it is immune to the possible stack overflow problem.
```
/home/kefu/dev/ceph/src/crush/CrushWrapper.h:1613:16: warning: variable length arrays in C++ are a Clang extension [-Wvla-cxx-extension]
1613 | int rawout[maxout];
| ^~~~~~
/home/kefu/dev/ceph/src/crush/CrushWrapper.h:1613:16: note: function parameter 'maxout' with unknown value cannot be used in a constant expression
/home/kefu/dev/ceph/src/crush/CrushWrapper.h:1610:60: note: declared here
1610 | void do_rule(int rule, int x, std::vector<int>& out, int maxout,
| ^
/home/kefu/dev/ceph/src/crush/CrushWrapper.h:1614:15: warning: variable length arrays in C++ are a Clang extension [-Wvla-cxx-extension]
1614 | char work[crush_work_size(crush, maxout)];
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/ceph/src/crush/CrushWrapper.h:1614:31: note: implicit use of 'this' pointer is only allowed within the evaluation of a call to a 'constexpr' member function
1614 | char work[crush_work_size(crush, maxout)];
| ^
2 warnings generated.
```
Naman Munet [Wed, 22 Jan 2025 10:59:20 +0000 (16:29 +0530)]
mgr/dashboard: Add confirmation textbox for resource name on delete action
Before:
=====
User was able to delete a single or multiple critical resources like ( images, snapshots, subvolumes, subvolume-groups, pools, hosts , OSDs, buckets, file system, services ) by just clicking on a checkbox.
After:
=====
User now has to type the resource name that they are deleting in the textbox on the delete modal, and then only they will be able to delete the critical resource.
Also from now onwards multiple selection for deletions of critical resources is not possible. Hence, user can delete only single resource at a time. On the other side, non-critical resources can be deleted in one go.
Adam Kupczyk [Fri, 20 Dec 2024 20:21:14 +0000 (20:21 +0000)]
os/bluestore: Add health warning for bluestore fragmentation
Changed "bluestore/fragmentation_micros" from quick imprecise to
slow but more representative score.
Introduced config "bluestore_warn_on_free_fragmentation" that controls
when free space fragmentation score becomes a health warning.
Currently calculation of fragmentation score might be non-instant for
severly fragmented disks. It might induce stalls to write IO.
Config value "bluestore_fragmentation_check_period" control score
calculation period.
In future, costly score calculation will be replaced with method that
continously updates score.
Improve the grammar and correct the formatting of the "Upgrading root ca
certificates" procedure that was added to the documentation in https://github.com/ceph/ceph/pull/61867
Matan Breizman [Tue, 18 Feb 2025 10:40:24 +0000 (12:40 +0200)]
script/lib-build: Use clang 14
Updating to newer clang requires multiple fixes.
Don't use newer clang than 14. If needed, we could backport
the fixes from [1] and then use newer releases.
Laura Flores [Mon, 17 Feb 2025 22:53:56 +0000 (22:53 +0000)]
qa/tasks: improve ignorelist for thrashing OSDs
This yaml file is used in rados/thrash-old-clients.
In this commit, I added some pattern matching for
warnings that show up in the cluster log detail that
are related to degraded PGs. In these tests, we are intentionally
marking down or killing OSDs, which leads to these states
showing up in the cluster log. So, the warnings are intended
and can be ignored in the context of OSD thrashing.
This directly aligns the squid ignorelist with main's, rather than
just a direct backport of
https://github.com/ceph/ceph/pull/61723/commits/ec50fc720c4fc0bdd3165d0014f19c36e9819012.
Fixes: https://tracker.ceph.com/issues/69962 Signed-off-by: Laura Flores <lflores@ibm.com>
Ronen Friedman [Thu, 30 Jan 2025 09:27:58 +0000 (03:27 -0600)]
osd/scrub: discard repair_oinfo_oid()
repair_oinfo_oid(), called every scrub, has a very specific
functionality: fix the object ID specified in the Object Info
attribute, if different from the ID of the owning object.
This fix was added in 2017, as a response to a unique failure
scenario that was observed in Sepia - probably following a
filesystem bug. See https://tracker.ceph.com/issues/18409 &
https://tracker.ceph.com/issues/20471.
The limited functionality of repair_oinfo_oid() -
only repairing this one specific issue, and only if the OI_ATTR
exists and is decodable - does not justify the overhead of
running it every scrub.