Kefu Chai [Sat, 1 Jun 2024 00:39:03 +0000 (08:39 +0800)]
debian: package mgr/rgw in ceph-mgr-modules-core
in 110db72e, we added the rgw mgr module to ceph-mgr-modules-core
rpm package. but we didn't add this module to the corresponding
debian package.
rgw mgr module provides a simple interface to deploy RGW multisite
setup. so it would be nice to have it in ceph's debian packages as
well.
despite that rgw is not part of the core features, since this module
is already in ceph-mgr-modules-core rpm package, and it is relatively
small and does not pulling extra dependencies, let's added to the
debian packge with the same name. we can revisit this decision and
extract it out in a following up change if it is necessary in future.
Disable OSD bench from benchmarking the OSDs for teuthology tests. This is to
help prevent a cluster warning pertaining to the IOPS value not lying within
a typical threshold range from being raised.
The tests can rely on the built-in static values as defined by
osd_mclock_max_capacity_iops_[ssd|hdd] which should be good enough.
Commit fccb609 backported changes that do not apply to Reef.
PR #63074 and the commit referenced therein as cherry-pick do not
correspond to the diff. Revert the commit because:
- Wildcard SAN feature in 3c24753 only since Tentacle.
- Disable multisite sync traffic feature in d620ba6 only since Squid.
- Shutdown delay feature in b84bb72 only since Tentacle.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Kefu Chai [Wed, 24 Dec 2025 05:55:26 +0000 (13:55 +0800)]
debian/control: add iproute2 to build dependencies
Test scripts like qa/tasks/cephfs/mount.py expect the ip command to be
available in the container environment. Without it, tests fail with:
```
/bin/bash: line 1: ip: command not found
File "/ceph/qa/tasks/cephfs/mount.py", line 96, in cleanup_stale_netnses_and_bridge
p = remote.run(args=['ip', 'netns', 'list'],
...
teuthology.exceptions.CommandFailedError: Command failed with status 127: 'ip netns list'
```
Add iproute2 to the debian package build dependencies when the
<pkg.ceph.check> build profile is enabled. This ensures the package is
available during container-based builds, since buildcontainer-setup.sh
→ script/run-make.sh → install-deps.sh → debian/control → generated
dependency package chain respects build profiles configured via
`FOR_MAKE_CHECK` and `WITH_CRIMSON` environment variables set in
Dockerfile.build.
David Galloway [Tue, 16 Dec 2025 22:08:00 +0000 (17:08 -0500)]
install-deps: Replace apt-mirror
apt-mirror.front.sepia.ceph.com has happened to always work because we set up CNAMEs to gitbuilder.ceph.com.
That host is making its way to a new home upstate (literally and figuratively) so we'll get rid of the front subdomain since it's publicly accessible anyway and add TLS while we're at it.
chungfengz [Thu, 6 Nov 2025 09:46:51 +0000 (09:46 +0000)]
bluestore/BlueFS: fix bytes_written_slow counter with aio_write
The bytes_written_slow performance counter was incorrectly reporting
0 when using async I/O.
When aio_write() is called with a bufferlist, it uses claim_append()
to transfer ownership of the buffer to the aio structure, leaving the
source bufferlist empty. Using t.length() after aio_write() returns 0
instead of the actual bytes written.
Fix by using the pre-calculated x_len value which contains the actual
write size and is not affected by the buffer ownership transfer.
Fixes: https://tracker.ceph.com/issues/73735 Signed-off-by: chungfengz <chungfengz@synology.com>
(cherry picked from commit 5aa5b03932c4345343e5263431914c2484bd2b14)
Rishabh Dave [Fri, 11 Oct 2024 19:03:29 +0000 (00:33 +0530)]
qa/cephfs: extend wait for trash empty
Trash directory for a volume is not created by default. If
_wait_for_trash_empty() in test_volumes.py encounters absence of trash
directory, return true.
Rishabh Dave [Sat, 6 Jan 2024 14:42:31 +0000 (20:12 +0530)]
qa/cephfs: add tests for config option pause_purging
Setting MGR config option mgr/volumes/pause_purging to true halts
all ongoing purges and allows no new purging to begin until this option
is changed to false. Add tests for this.
Conflicts:
qa/tasks/cephfs/test_volumes.py
- First conflict occurred due to missing import of safe_while which
in Reef branch compared to main branch. Along with resolving this
conflict this has been imported as it used by the tests.
- Second conflict occured due to absence of some test methods right
before where TestPausePurging was to be added.
- Third conflict occured because entire contextutil was imported instead
of just safe_while and only CommandFailedError was imported from
teuthology.exceptions while this commit imports MaxWhileTries too.
Rishabh Dave [Fri, 12 Jan 2024 10:28:41 +0000 (15:58 +0530)]
qa/cephfs: don't strip any whitespace for get_shell_stdout
Whitespace is not removed from the end of the stdout returned by the
method get_ceph_cmd_stdout(). Follow the same policy here since it is
better to not do so (this whitespace can be useful, when copying Ceph
auth keyrings from stdout to a file) and also for sake of uniformity of
interfaces.
Conflicts:
qa/tasks/cephfs/mount.py
- Conflict occured for 2 reasons -
- One, method get_shell_stdout() is absent on Reef branch but not in
main so this patch which makes modification to it will obviously run
in to conflict
- Two, run_shell_payload() lies right next to get_shell_stdout() in
main branch and its definition is quite different, leading to
conflict again.
Rishabh Dave [Tue, 3 Sep 2024 10:01:07 +0000 (15:31 +0530)]
mgr/vol: add pause/resume mechanism for async jobs
Add mechansim that allows pausing/resuming of the entire async job
machinery that queues, launches and picks next async job; both async
jobs, clones as well as purges.
And then add mgr/vol config option pause_purging and pause_cloning so
that both of these async jobs can be paused and resumed individually.
Fixes: https://tracker.ceph.com/issues/61903 Fixes: https://tracker.ceph.com/issues/68630 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 01d37d5e1ba0e250e9d3a5f28ec7f3fa3597c63f)
Conflicts:
src/pybind/mgr/volumes/module.py
- Code where patch was to be applied was slighty different
Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]
pybind/mgr: pin cheroot version in requirements-required.txt
With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package and a github issues describing the same problem
we're seeing has been opened by another user
https://github.com/cherrypy/cheroot/issues/769
It is worth noting that the workaround described in that
issue does also work for us. If you add
John Mulligan [Fri, 12 Sep 2025 17:52:25 +0000 (13:52 -0400)]
build-with-container: add argument groups to organize options
Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.
mgr/dashboard: fix zone update API forcing STANDARD storage class
The zone update REST API (`edit_zone`) always attempted to configure a
placement target for the `STANDARD` storage class, even when the request
was intended for a different storage class name.
This caused failures in deployments where `STANDARD` is not defined.
Changes:
Club add placement target and add storage class methods into one single
add_placement_targets_storage_class_zone method which takes the storage
class as a param as well alongside the rest of the placement params.
Laura Flores [Tue, 3 Dec 2024 22:15:19 +0000 (16:15 -0600)]
qa/workunits/mon: ensure election strategy is "connectivity" for stretch mode
The election strategy is randomly chosen for this type of test. Sometimes,
the test passes if the "connectivity" election strategy happens to be picked.
But if a different strategy, i.e. "classic", is picked, then the test will fail.
We can ensure that the election strategy is "connectivity" by setting it in the
workunit with the ceph CLI command. Although connectivity was specified in
stretch-mode-5-mons-8-osds.yaml, that config ultimately gets overridden by
the "qa/mon_config" yaml.
Problem:
Current dump for "removed_ranks" and "disallowed_leaders"
doesn't have the correct format so the python test
script can parse through these values.
Solution:
Modified the values such that it is in the correct format
Conflicts:
src/mon/MonmapMonitor.cc - replace `goto reply` with
`goto reply_no_propose`
src/mon/OSDMonitorcc - replace `rule_valid_for_pool_type`
with `get_rule_type` since
`rule_valid_for_pool_type` is not
backported.
Laura Flores [Fri, 5 Sep 2025 21:46:20 +0000 (16:46 -0500)]
doc/rados/operations: add kernel client procedure to read balancer documentation
As of now, the kernel client does not support `pg-upmap-primary`. I have
added some troubleshooting steps to help users who are unable to
mount images and filesystems with the kernel client while using `pg-upmap-primary`.
Once the feature is supported by the kernel client, users will be able
to perform mounts along with `pg-upmap-primary`.
Update the "Disconnected+Remounted FS" section in
doc/cephfs/troubleshooting.rst, as suggested by Venky Shankar in https://github.com/ceph/ceph/pull/65129/files#r2312903062
mgr/dashboard: show non default realm sync status in rgw overview page
Currently, we just show the sync status of the default realm in rgw
overview page. This PR is to show the sync status of non-default realms
as well. Multisite sync status can be viewed for any of the active daemon
which runs in default/non-default realm.
Dan Mick [Tue, 26 Aug 2025 00:45:21 +0000 (17:45 -0700)]
Remove git clean -fdx
either
1) a source tarball is supplied, in which case the local dir is
irrelevant, or
2) make-debs calls make-dist, which doesn't care about a dirty cwd
so it just punishes the unaware by removing things that they may
have wanted to keep.
Dan Mick [Sat, 23 Aug 2025 00:43:24 +0000 (17:43 -0700)]
make-debs.sh: invoke tar with --no-same-owner
When running as a normal user, tar does not attempt to preserve
owners set on the tar content files. When running as root, it does.
Containerized builds are running as root. Stop make-debs.sh from
trying to set other owners for files, and leaving files in the
host system with mapped UIDs other than the user running the container
(which causes jenkins to be unable to clear the workspace).
Dan Mick [Thu, 21 Aug 2025 20:00:43 +0000 (13:00 -0700)]
make-debs.sh: make "skip debug packages" conditional
Now that we're using make-debs.sh as a builder inside containers,
the default should be to build all the packages, including debug.
(Also, fix a typo.)
Niklas Hambüchen [Sat, 21 Jun 2025 17:46:13 +0000 (19:46 +0200)]
doc/rados/configuration: Mention show-with-defaults and ceph-conf
A small improvement based on
"Why is it still so difficult to just dump all config and where it comes from?"
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EZSLRYBYEWDA6YIARQVMUKQUWHAE3PGR/
`show-with-defaults` is very useful, and `ceph-conf` is mentioned
so that it's clear that it's legacy, and the user doesn't have to
wonder if it's actually useful but was forgotten in the list.
Zac Dover [Fri, 22 Aug 2025 08:39:29 +0000 (18:39 +1000)]
doc/cephfs: edit troubleshooting.rst (Slow MDS)
Move the "Slow requests (MDS)" section immediately after the first
section in this document ("Slow/Stuck Operations"), because the first
procedure on the page directs the reader to undertake the operation in
"Slow requests (MDS)" before trying anything else.
Improve source rpm detection by adding a new detection method that
executes and rpm command in a container to get exactly the version of
the source rpm that the ceph.spec file would have generated. For
backwards compatibility and that I don't entirely trust myself to have
tested this the old methods are still available.
The old `--rpm-no-match-sha` is now an alias for `--srpm-match=any` to
cause it to build any (unique) ceph srpm it finds.
`--srpm-match=versionglob` retains the previous default behavior of
using a glob matching on the git id or ceph version value. The new
default of `--srpm-match=auto` implements the rpm command based behavior
described above.
All of this is wrapped in a new step `find-rpm` but that's mostly an
implementation detail and for testing.
Dan Mick [Wed, 13 Aug 2025 19:16:45 +0000 (12:16 -0700)]
pybind/mgr/dashboard/frontend: add NPM_CACHEDIR envvar, use in bwc
Add an optional NPM_CACHEDIR environment variable to serve as the
cache parameter for npm in the dashboard frontend build. The idea
is to allow it to persist across builds so that we decrease the load
on registry.npmjs.org, which has been throttling our requests when
using build-with-container.py, and also hopefully improve the time
of the frontend npm operations.
build-with-container.py also grows a --npm-cache-path option to allow
setting it for container builds and passing the envvar to the build.