Jos Collin [Fri, 26 Sep 2025 15:03:02 +0000 (20:33 +0530)]
Merge PR #63069 into wip-jcollin-testing-20250926.150239-reef
* refs/pull/63069/head:
qa/cephfs: test that user created pool is not deleted by...
mgr/vol: don't delete user-created pool in "volume create" command
PendingReleaseNote: add note that "volume create" accepts pool names...
doc/cephfs: mention new options for "fs volume create" cmd
qa/cephfs: test passing pool names to "fs volume create" cmd
qa/cephfs: separate the tests for "ceph fs volume create" cmd
mgr/vol: allow passing pool names to "fs volume create" cmd
Jos Collin [Fri, 26 Sep 2025 15:02:58 +0000 (20:32 +0530)]
Merge PR #63224 into wip-jcollin-testing-20250926.150239-reef
* refs/pull/63224/head:
qa/cephfs: use "snapshot getpath" cmd instead of constructing...
qa/cephfs: add tests for "snapshot getpath" cmd against v1 and...
qa/cephfs: add a helper method to construct the snapshot path
qa/cephfs: move tests for "snapshot getpath" cmd to a separate class
Rishabh Dave [Fri, 7 Feb 2025 11:41:53 +0000 (17:11 +0530)]
mgr/vol: print proper message when subvolume metadata filename is too...
long.
When combination of subvolume group name and subvolume name is longer
than 248 characters, it leads to a failure because the metadata file is
a combination of both of these along with ":", "_" and ".meta".
Currently, when this combination is longer than 248 characters, a bunch of
stacktraces are printed along with multiple errors. This confuses the
user and looks bad.
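The length check described above can be sketched as follows. This is only
an illustration, not the actual mgr/vol code: the function names, the exact
ordering of "_", ":" and ".meta" in the filename, and the 255-byte limit
(248 + 7 extra characters) are assumptions.

```python
NAME_MAX = 255  # assumption: 248 name characters + "_" + ":" + ".meta"


def meta_filename(group_name, subvol_name):
    # Hypothetical layout; the commit only says the filename combines both
    # names along with ":", "_" and ".meta".
    return f"_{group_name}:{subvol_name}.meta"


def validate_meta_filename(group_name, subvol_name):
    """Return the metadata filename, or raise one clear error if too long."""
    filename = meta_filename(group_name, subvol_name)
    length = len(filename.encode("utf-8"))
    if length > NAME_MAX:
        raise ValueError(
            f"metadata filename would be {length} bytes long, "
            f"exceeding the limit of {NAME_MAX}; use shorter "
            "subvolume group and subvolume names")
    return filename
```

Raising one descriptive error at this point is what lets the caller print a
single message instead of a series of stacktraces.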
Fixes: https://tracker.ceph.com/issues/69865
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 82fc1e7cac3d9157d9aaf95c45ded04c1b278183)
Rishabh Dave [Wed, 16 Apr 2025 08:24:27 +0000 (13:54 +0530)]
mgr/vol: don't delete user-created pool in "volume create" command
If one of the pool names passed to "ceph fs volume create" command
(through --data-pool and --meta-pool name) is absent, don't delete the
pool that is present and passed to this command during the cleanup code
of this command.
IOW, "volume create" command should continue deleting pool created by it
but not delete pool created by the user.
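A minimal sketch of the cleanup rule described above, assuming a
hypothetical in-memory FakeCluster and a hypothetical default pool naming
scheme. It is not the mgr/vol implementation; it only shows the idea of
remembering which pools the command itself created.

```python
class FakeCluster:
    """Hypothetical stand-in for the cluster's set of pools."""

    def __init__(self, pools):
        self.pools = set(pools)

    def create_pool(self, name):
        self.pools.add(name)

    def delete_pool(self, name):
        self.pools.discard(name)


def volume_create(cluster, volname, data_pool=None, meta_pool=None):
    created_by_us = []  # only these pools may be removed during cleanup
    try:
        for role, name in (("data", data_pool), ("meta", meta_pool)):
            if name is None:
                # Hypothetical default name for a pool the command creates.
                name = f"cephfs.{volname}.{role}"
                cluster.create_pool(name)
                created_by_us.append(name)
            elif name not in cluster.pools:
                raise LookupError(f"pool '{name}' does not exist")
    except LookupError:
        # Cleanup: delete what this command created, never user pools.
        for name in created_by_us:
            cluster.delete_pool(name)
        raise
```

With this bookkeeping, a user-created data pool survives a failed call in
which the metadata pool name turns out to be absent.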
Fixes: https://tracker.ceph.com/issues/70945
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 4299f660ba9c83ef9305b2834c195da9008810a9)
Rishabh Dave [Mon, 3 Mar 2025 16:36:10 +0000 (22:06 +0530)]
doc/cephfs: mention new options for "fs volume create" cmd
Command "ceph fs volume create" accepts 2 new options to allow users to
pass data and metadata pool name. Update docs to include mention of both
the options.
Rishabh Dave [Fri, 11 Oct 2024 19:03:29 +0000 (00:33 +0530)]
qa/cephfs: extend wait for trash empty
Trash directory for a volume is not created by default. If
_wait_for_trash_empty() in test_volumes.py encounters absence of trash
directory, return true.
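The behaviour described above can be sketched roughly as follows. The
function below is a standalone illustration with assumed parameter names
(trash_dir, timeout, interval), not the helper from test_volumes.py.

```python
import os
import time


def wait_for_trash_empty(trash_dir, timeout=30, interval=1.0):
    """Return True once trash_dir is empty or absent, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if not os.listdir(trash_dir):
                return True
        except FileNotFoundError:
            # The trash directory is not created by default, so its
            # absence is treated the same as an empty trash.
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```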
Rishabh Dave [Sat, 6 Jan 2024 14:42:31 +0000 (20:12 +0530)]
qa/cephfs: add tests for config option pause_purging
Setting MGR config option mgr/volumes/pause_purging to true halts
all ongoing purges and allows no new purging to begin until this option
is changed to false. Add tests for this.
Conflicts:
qa/tasks/cephfs/test_volumes.py
- First conflict occurred because the import of safe_while is missing
in the Reef branch compared to the main branch. Along with resolving this
conflict, safe_while has been imported as it is used by the tests.
- Second conflict occurred due to absence of some test methods right
before where TestPausePurging was to be added.
- Third conflict occurred because entire contextutil was imported instead
of just safe_while and only CommandFailedError was imported from
teuthology.exceptions while this commit imports MaxWhileTries too.
Rishabh Dave [Fri, 12 Jan 2024 10:28:41 +0000 (15:58 +0530)]
qa/cephfs: don't strip any whitespace for get_shell_stdout
Whitespace is not removed from the end of the stdout returned by the
method get_ceph_cmd_stdout(). Follow the same policy here since it is
better to not do so (this whitespace can be useful, when copying Ceph
auth keyrings from stdout to a file) and also for sake of uniformity of
interfaces.
Conflicts:
qa/tasks/cephfs/mount.py
- Conflict occurred for 2 reasons -
- One, method get_shell_stdout() is absent on Reef branch but not in
main so this patch which makes modification to it will obviously run
into conflict
- Two, run_shell_payload() lies right next to get_shell_stdout() in
main branch and its definition is quite different, leading to
conflict again.
Rishabh Dave [Tue, 3 Sep 2024 10:01:07 +0000 (15:31 +0530)]
mgr/vol: add pause/resume mechanism for async jobs
Add a mechanism that allows pausing/resuming the entire async job
machinery that queues, launches and picks the next async job, for both
kinds of async jobs: clones as well as purges.
And then add mgr/vol config option pause_purging and pause_cloning so
that both of these async jobs can be paused and resumed individually.
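A much simplified sketch of such a pause/resume switch, for illustration
only. The class and function names are hypothetical, and only the "pick
the next job" step is modelled; the real machinery also covers queuing
and launching.

```python
from collections import deque


class AsyncJobs:
    """Hypothetical job machinery for one kind of job (clone or purge)."""

    def __init__(self):
        self.queue = deque()
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False

    def queue_job(self, job):
        self.queue.append(job)

    def get_next_job(self):
        # While paused, no job is handed out even if some are queued.
        if self.paused or not self.queue:
            return None
        return self.queue.popleft()


def apply_config(purger, cloner, config):
    """Map the two config options onto the two job machineries."""
    for jobs, option in ((purger, "pause_purging"),
                         (cloner, "pause_cloning")):
        if config.get(option, False):
            jobs.pause()
        else:
            jobs.resume()
```

Keeping one switch per job kind is what allows purges to be paused while
clones keep running, and vice versa.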
Fixes: https://tracker.ceph.com/issues/61903
Fixes: https://tracker.ceph.com/issues/68630
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 01d37d5e1ba0e250e9d3a5f28ec7f3fa3597c63f)
Conflicts:
src/pybind/mgr/volumes/module.py
- Code where patch was to be applied was slightly different
Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]
pybind/mgr: pin cheroot version in requirements-required.txt
With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package, and a GitHub issue describing the same problem
we're seeing has been opened by another user
https://github.com/cherrypy/cheroot/issues/769
It is worth noting that the workaround described in that
issue does also work for us. If you add
John Mulligan [Fri, 12 Sep 2025 17:52:25 +0000 (13:52 -0400)]
build-with-container: add argument groups to organize options
Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.
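For readers unfamiliar with the feature, argparse's add_argument_group can
be used as in the small sketch below. The group titles and option names
here are illustrative assumptions, not the script's actual option set.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="build-with-container")

    # Each group gets its own titled section in the --help output.
    basic = parser.add_argument_group("Basic options")
    basic.add_argument("--distro", default="centos9",
                       help="distribution to build for (illustrative)")

    container = parser.add_argument_group("Container options")
    container.add_argument("--image-repo",
                           help="image repository to use (illustrative)")
    return parser
```

Groups only affect how help is displayed; parsing works the same as when
every option is added directly to the parser.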
mgr/dashboard: fix zone update API forcing STANDARD storage class
The zone update REST API (`edit_zone`) always attempted to configure a
placement target for the `STANDARD` storage class, even when the request
was intended for a different storage class name.
This caused failures in deployments where `STANDARD` is not defined.
Changes:
Combine the add placement target and add storage class methods into a single
add_placement_targets_storage_class_zone method which takes the storage
class as a param alongside the rest of the placement params.
Rishabh Dave [Tue, 18 Feb 2025 12:30:03 +0000 (18:00 +0530)]
qa/cephfs: ignore warning that pg is stuck peering for upgrade jobs
Health warning "pg .* is stuck peering" is seen while Ceph cluster is
under the upgrade process during fs/upgrade QA job. Being an expected
warning, it should be added to the ignorelist.
And besides this one, we already ignore more severe warnings ("pg is
stuck inactive" and "pg is degraded") for fs/upgrade jobs.
Fixes: https://tracker.ceph.com/issues/70023
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9748de76e02254c6dc284dcc20ec5d5761760dcb)
cephfs-journal-tool: Don't reset the journal trim position
If the fs had to go through journal recovery and reset,
the cephfs-journal-tool resets the journal trim position
because of which the old unused journal objects just stay
forever in the metadata pool. The patch fixes the issue.
Now, the old stale journal objects are trimmed during the
regular trimming cycle helping to recover space in the
metadata pool.
Validates that the cephfs-journal-tool reset
doesn't reset the trim position so that the
journal trim takes care of trimming the older
unused journal objects helping to recover the
space in metadata pool.
Laura Flores [Tue, 3 Dec 2024 22:15:19 +0000 (16:15 -0600)]
qa/workunits/mon: ensure election strategy is "connectivity" for stretch mode
The election strategy is randomly chosen for this type of test. Sometimes,
the test passes if the "connectivity" election strategy happens to be picked.
But if a different strategy, i.e. "classic", is picked, then the test will fail.
We can ensure that the election strategy is "connectivity" by setting it in the
workunit with the ceph CLI command. Although connectivity was specified in
stretch-mode-5-mons-8-osds.yaml, that config ultimately gets overridden by
the "qa/mon_config" yaml.
Problem:
Current dump for "removed_ranks" and "disallowed_leaders"
is not in a format that the python test
script can parse.
Solution:
Modified the values so that they are in the correct format
Conflicts:
src/mon/MonmapMonitor.cc - replace `goto reply` with
`goto reply_no_propose`
src/mon/OSDMonitor.cc - replace `rule_valid_for_pool_type`
with `get_rule_type` since
`rule_valid_for_pool_type` is not
backported.
Update the "Disconnected+Remounted FS" section in
doc/cephfs/troubleshooting.rst, as suggested by Venky Shankar in https://github.com/ceph/ceph/pull/65129/files#r2312903062
mgr/dashboard: show non default realm sync status in rgw overview page
Currently, we show only the sync status of the default realm in the rgw
overview page. This PR shows the sync status of non-default realms
as well. Multisite sync status can be viewed for any active daemon
that runs in a default or non-default realm.
Dan Mick [Tue, 26 Aug 2025 00:45:21 +0000 (17:45 -0700)]
Remove git clean -fdx
either
1) a source tarball is supplied, in which case the local dir is
irrelevant, or
2) make-debs calls make-dist, which doesn't care about a dirty cwd
so it just punishes the unaware by removing things that they may
have wanted to keep.
Dan Mick [Sat, 23 Aug 2025 00:43:24 +0000 (17:43 -0700)]
make-debs.sh: invoke tar with --no-same-owner
When running as a normal user, tar does not attempt to preserve
owners set on the tar content files. When running as root, it does.
Containerized builds are running as root. Stop make-debs.sh from
trying to set other owners for files, and leaving files in the
host system with mapped UIDs other than the user running the container
(which causes jenkins to be unable to clear the workspace).
Dan Mick [Thu, 21 Aug 2025 20:00:43 +0000 (13:00 -0700)]
make-debs.sh: make "skip debug packages" conditional
Now that we're using make-debs.sh as a builder inside containers,
the default should be to build all the packages, including debug.
(Also, fix a typo.)
Niklas Hambüchen [Sat, 21 Jun 2025 17:46:13 +0000 (19:46 +0200)]
doc/rados/configuration: Mention show-with-defaults and ceph-conf
A small improvement based on
"Why is it still so difficult to just dump all config and where it comes from?"
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EZSLRYBYEWDA6YIARQVMUKQUWHAE3PGR/
`show-with-defaults` is very useful, and `ceph-conf` is mentioned
so that it's clear that it's legacy, and the user doesn't have to
wonder if it's actually useful but was forgotten in the list.
Zac Dover [Fri, 22 Aug 2025 08:39:29 +0000 (18:39 +1000)]
doc/cephfs: edit troubleshooting.rst (Slow MDS)
Move the "Slow requests (MDS)" section immediately after the first
section in this document ("Slow/Stuck Operations"), because the first
procedure on the page directs the reader to undertake the operation in
"Slow requests (MDS)" before trying anything else.
Improve source rpm detection by adding a new detection method that
executes an rpm command in a container to get exactly the version of
the source rpm that the ceph.spec file would have generated. For
backwards compatibility, and because I don't entirely trust myself to have
tested this, the old methods are still available.
The old `--rpm-no-match-sha` is now an alias for `--srpm-match=any` to
cause it to build any (unique) ceph srpm it finds.
`--srpm-match=versionglob` retains the previous default behavior of
using a glob matching on the git id or ceph version value. The new
default of `--srpm-match=auto` implements the rpm command based behavior
described above.
All of this is wrapped in a new step `find-rpm` but that's mostly an
implementation detail and for testing.
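One way to wire up an option with a legacy alias like this in argparse is
sketched below. It is an assumption about the shape of the code, not a copy
of it, and the choices listed are only the three values named above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="build-with-container")
    parser.add_argument(
        "--srpm-match",
        choices=("auto", "any", "versionglob"),
        default="auto",
        help="how to select the source rpm to build")
    # Legacy flag kept as an alias: it stores "any" into the same dest.
    parser.add_argument(
        "--rpm-no-match-sha",
        dest="srpm_match",
        action="store_const",
        const="any",
        help="alias for --srpm-match=any")
    return parser
```

Because both options share one dest, the rest of the code only has to look
at a single srpm_match value.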
Dan Mick [Wed, 13 Aug 2025 19:16:45 +0000 (12:16 -0700)]
pybind/mgr/dashboard/frontend: add NPM_CACHEDIR envvar, use in bwc
Add an optional NPM_CACHEDIR environment variable to serve as the
cache parameter for npm in the dashboard frontend build. The idea
is to allow it to persist across builds so that we decrease the load
on registry.npmjs.org, which has been throttling our requests when
using build-with-container.py, and also hopefully improve the time
of the frontend npm operations.
build-with-container.py also grows a --npm-cache-path option to allow
setting it for container builds and passing the envvar to the build.
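As a rough sketch of the idea, an optional NPM_CACHEDIR could be turned
into npm's --cache parameter like this. The helper name npm_command is
hypothetical and the real build logic may look quite different.

```python
import os


def npm_command(args, environ=None):
    """Build an npm command line, adding --cache if NPM_CACHEDIR is set."""
    environ = os.environ if environ is None else environ
    cmd = ["npm", *args]
    cache_dir = environ.get("NPM_CACHEDIR")
    if cache_dir:
        # A persistent cache directory lets later builds reuse downloads.
        cmd += ["--cache", cache_dir]
    return cmd
```

When the variable is unset the command is unchanged, so existing builds
keep their current behaviour.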
John Mulligan [Wed, 21 May 2025 21:46:40 +0000 (17:46 -0400)]
dashboard: fix the workaround for unpacking node sources
My previous workaround in the dashboard for unpacking a tarball not owned
by root as the fake root of a container did not work because of the
strange quoting/escaping behavior of cmake (it tried to run `id -u` as a
single command, not a command and an argument).
Use single quoted string and old school backticks to work around this issue.
Fixes: 24dbfb5da4813c6588f9cd199b9f527bb67f1e88
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 3a36180a373d91adcf9726660204f0cc1dcecba3)
John Mulligan [Fri, 2 May 2025 15:17:53 +0000 (11:17 -0400)]
dashboard: ensure nodeenv downloaded content is owned by current user
When testing ceph builds in a container we discovered that certain files
could not be deleted by jenkins after a build. This was due to the way
the container maps IDs - files owned by the root user in the container
become owned by the "real" user/jenkins user on the "host".
However, the node tarball that is fetched and unpacked by nodeenv has
a different owner name/uid that is preserved in the tree and this id
gets mapped to something that can be managed by the "fake root" of the
container but not by the "regular" user outside the container.
The simplest workaround I can think of is to chown the tree back
to the current user and avoid leaving files on disk with uncleanly
mapped uids.
John Mulligan [Fri, 20 Jun 2025 23:34:45 +0000 (19:34 -0400)]
Dockerfile.build: make WITH_CRIMSON a build arg
We've chosen to enable crimson by default to match the CI, but that
is not always something a developer may want, so make WITH_CRIMSON
a build argument that can be toggled off if necessary.
John Mulligan [Thu, 29 May 2025 17:41:45 +0000 (13:41 -0400)]
mgr/dashboard: add a cobertura xml file workaround variable
Add an environment variable REWRITE_COVERAGE_ROOTDIR that
changes the "hardcoded" path in the cobertura-coverage.xml file.
This can be used to map the paths used in a container build to
the paths known to a jenkins job (or whatever else you want to
do with the file).
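The kind of rewrite this variable enables might look like the sketch
below. It assumes the path in question is the text of the <source> element
of the cobertura report and uses a hypothetical helper name; the actual
change may rewrite the file differently.

```python
import os
import xml.etree.ElementTree as ET


def rewrite_coverage_root(xml_text, environ=None):
    """Replace <source> paths if REWRITE_COVERAGE_ROOTDIR is set."""
    environ = os.environ if environ is None else environ
    new_root = environ.get("REWRITE_COVERAGE_ROOTDIR")
    if not new_root:
        return xml_text  # variable unset: leave the report untouched
    root = ET.fromstring(xml_text)
    for source in root.iter("source"):
        # Assumption: every <source> entry holds the container build path.
        source.text = new_root
    return ET.tostring(root, encoding="unicode")
```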