Kefu Chai [Wed, 24 Dec 2025 05:55:26 +0000 (13:55 +0800)]
debian/control: add iproute2 to build dependencies
Test scripts like qa/tasks/cephfs/mount.py expect the ip command to be
available in the container environment. Without it, tests fail with:
```
/bin/bash: line 1: ip: command not found
File "/ceph/qa/tasks/cephfs/mount.py", line 96, in cleanup_stale_netnses_and_bridge
p = remote.run(args=['ip', 'netns', 'list'],
...
teuthology.exceptions.CommandFailedError: Command failed with status 127: 'ip netns list'
```
Add iproute2 to the debian package build dependencies when the
<pkg.ceph.check> build profile is enabled. This ensures the package is
available during container-based builds, since buildcontainer-setup.sh
→ script/run-make.sh → install-deps.sh → debian/control → generated
dependency package chain respects build profiles configured via
`FOR_MAKE_CHECK` and `WITH_CRIMSON` environment variables set in
Dockerfile.build.
David Galloway [Tue, 16 Dec 2025 22:08:00 +0000 (17:08 -0500)]
install-deps: Replace apt-mirror
apt-mirror.front.sepia.ceph.com has happened to always work because we set up CNAMEs to gitbuilder.ceph.com.
That host is making its way to a new home upstate (literally and figuratively) so we'll get rid of the front subdomain since it's publicly accessible anyway and add TLS while we're at it.
Rishabh Dave [Tue, 18 Feb 2025 12:30:03 +0000 (18:00 +0530)]
qa/cephfs: ignore warning that pg is stuck peering for upgrade jobs
Health warning "pg .* is stuck peering" is seen while Ceph cluster is
under the upgrade process during fs/upgrade QA job. Being an expected
warning, it should be added to the ignorelist.
And besides this one, we already ignore more severe warnings ("pg is
stuck inactive" and "pg is degrarded") for fs/upgrade jobs.
Fixes: https://tracker.ceph.com/issues/70023 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9748de76e02254c6dc284dcc20ec5d5761760dcb)
Conflicts:
qa/cephfs/overrides/pg_health.yaml
- Line before the point where the patch was to be applied is different
comapred to main branch.
In statfs, when the quota root for a dir is discovered,
it uses that dir to base values for max_files and max_bytes.
This can be an issue when a dir is found with only one of two potential quota
fields. Take for instance, a dir with only max_files set and parent dir
has only max_bytes set. During a statfs call, it will then use the max_files
value for provided dir, but does not have a value for max_bytes. In this case,
this behavior will cause the size of the filesystem to be displayed.
Instead, find the quota root for max_files and max_bytes separately. This will
allow for mixed quotas to inherit missing values from its parent. In the above
example, max_files from current dir and max_bytes from parent dir will be
displayed.
Fixes: https://tracker.ceph.com/issues/73487 Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit dd02ea9b18502b87ce815eba4286ae3516e334b3)
In cases where there is a single element in a batch_op_map,new_batch_head
is a nullptr, when this is retried at Finisher we'd hit one of the asserts when
dereferencing
In SingletonClient::init(), objecter->start() called before
monc->authenticate(), it makes conns of monc authencated before
monc->authenticate() called if mons reply faster, in this case,
monc will not subsribe monmap/config.
mds: client is evicted when an export subtree task is interrupted
The importer will force open some sessions provided by the exporter but the client does not know about
the new sessions until the exporter notifies it, and the notifications cannot be sent if the exporter
is interrupted. The client does not renew the sessions regularly that it does not know about, so the client
will be evicted by the importer after `session_autoclose` seconds (300 seconds by default).
The sessions that are forced opened in the importer need to be closed when the import process is reversed.
Zhansong Gao [Fri, 26 May 2023 04:20:17 +0000 (12:20 +0800)]
mds: session in the importing state cannot be cleared if an export subtree task is interrupted while the state of importer is acking
The related sessions in the importer are in the importing state(`Session::is_importing` return true) when the state of importer is `acking`,
`Migrator::import_reverse` called by `MDCache::handle_resolve` should reverse the process to clear the importing state if the exporter restarts
at this time, but it doesn't do that actually because of its bug. And it will cause these sessions to not be cleared when the client is
unmounted(evicted or timeout) until the mds is restarted.
The bug in `import_reverse` is that it contains the code to handle state `IMPORT_ACKING` but it will never be executed because
the state is modified to `IMPORT_ABORTING` at the beginning. Move `stat.state = IMPORT_ABORTING` to the end of import_reverse
so that it can handle the state `IMPORT_ACKING`.
Casey Bodley [Fri, 3 Oct 2025 16:24:18 +0000 (12:24 -0400)]
rgw: fix 'bucket rm --bypass-gc' for copied objects
the `--bypass-gc` argument to `radosgw-admin bucket rm` causes us to
call `RadosBucket::remove_bypass_gc()`, which loops over the tail
objects and removes each with `RGWRados::delete_raw_obj_aio()`
however, this was removing the objects with `cls_rgw_remove_obj()`,
which is for head objects, not tails. tail objects must be removed with
`cls_refcount_put()`, which preserves them until the last copy is
removed
rename `delete_raw_obj_aio()` to `delete_tail_obj_aio()` to clarify its
purpose
Nitzan Mordechai [Wed, 22 Oct 2025 05:41:56 +0000 (05:41 +0000)]
tasks/cbt_performance: Tolerate exceptions during performance data updates
If an exception occurs during the POST request to update CBT performance,
log the error instead of failing the entire job. This ensures that
intermittent update failures do not block the main workflow.
The unlink subcommand did not handle unsharded bucket indices
appropriately. These are when the number of shards listed in the
bucket instance object is 0. In that case there will actually be 1
shard.
When number of shards as 0 is passed into the function that maps
object names to shards, it returns -1. And that was not handled
properly. That is now fixed.
Henry Richter [Wed, 8 Oct 2025 23:00:34 +0000 (01:00 +0200)]
rgw: asio/beast add ssl hot-reload
Adds the `ssl_reload` config option to the beast frontend.
This sets an interval in seconds to periodically reload the ssl context to pick up changes without restarting. It can be disabled (default) be setting it to `0`.
Nitzan Mordechai [Tue, 10 Dec 2024 09:04:34 +0000 (09:04 +0000)]
msg/async: race condition between reset_recv_state and shutdown_connections
when shutting down monitors and valgrind is involved, we can,
sometimes, to hit race condition and locks that causing the shutdown
process to hang for a long time.
reset_recv_state - issuing a message without proper locks that
causing the shutdown to hang during shutdown connection (drain network)
1. Fixes the promql expr used to calculate "In" OSDs in
ceph-cluster-advanced.json.
2. Fixes the color coding for the single state panels used in the OSDs
grafana panel like "In", "Out" etc
according to `dpkg-buildflags`, ubuntu 24 raised this value to
`-D_FORTIFY_SOURCE=3` which causes `error: "_FORTIFY_SOURCE" redefined`
compilation failures because Ceph itself adds `-D_FORTIFY_SOURCE=2`
`_FORTIFY_SOURCE` is a hardening option. both our rpm and debian builds
already specify that via environment variables, so Ceph's cmake should
leave it alone
Anoop C S [Mon, 23 Sep 2024 07:06:55 +0000 (12:36 +0530)]
client: Gracefully handle empty pathname for statxat()
man statx(2)[1] says the following:
. . .
AT_EMPTY_PATH
If pathname is an empty string, operate on the file referred to by
dirfd (which may have been obtained using the open(2) O_PATH flag).
In this case, dirfd can refer to any type of file, not just a
directory.
If dirfd is AT_FDCWD, the call operates on the current working
directory.
. . .
Look out for an empty pathname and use the relative fd's inode in the
presence of AT_EMPTY_PATH flag before calling internal _getattr().
Fixes: https://tracker.ceph.com/issues/68189
Review with: git show -w
Anoop C S [Thu, 17 Oct 2024 16:15:17 +0000 (21:45 +0530)]
libcephfs.h: Fix API documentation for ceph_statxat
flags parameter for ceph_statxat() API is supposed to accept only
AT_STATX_DONT_SYNC and AT_SYMLINK_NOFOLLOW. Modify the corresponding
documentation to reflect the acceptance of above two flags.
Anoop C S [Fri, 20 Sep 2024 08:49:01 +0000 (14:19 +0530)]
client: Gracefully handle empty pathname for chownat()
man fchownat(2)[1] says the following:
. . .
AT_EMPTY_PATH (since Linux 2.6.39)
If pathname is an empty string, operate on the file referred to by
dirfd (which may have been obtained using the open(2) O_PATH flag).
In this case, dirfd can refer to any type of file, not just a
directory. If dirfd is AT_FDCWD, the call operates on the current
working directory.
. . .
Look out for an empty pathname and use the relative fd's inode in the
presence of AT_EMPTY_PATH flag before calling internal _setattr().
Fixes: https://tracker.ceph.com/issues/68189
Review with: git show -w
test/rbd-mirror: eliminate a race in ResyncRequestedRemoteNotPrimary
Adjust the wait_for_notification call in TestMockImageReplayerSnapshotReplayer.ResyncRequestedRemoteNotPrimary
to expect 2 notifications instead of 1. This allows the test to correctly wait for both expected events
i.e for finish_sync() and handle_replay_complete(locker, -EREMOTEIO, "remote image demoted"), ensuring the
replayer transitions to STATE_COMPLETE and is_replaying() returns false as intended.
Fixes: https://tracker.ceph.com/issues/72325 Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
(cherry picked from commit b5a013f6170bb4445da8f5469243e4869b760a81)
VinayBhaskar-V [Tue, 13 May 2025 20:25:44 +0000 (01:55 +0530)]
rbd-mirror: prevent image deletion if remote image is not primary
A resync on a mirrored image may incorrectly results in the local
image being deleted even when the remote image is no longer primary.
This issue can occur under the following conditions:
* if resync is requested on the secondary before the remote image has
been fully demoted
* if the demotion of the primary image is not mirrored
due to the rbd-mirror daemon being offline.
This can be fixed by ensuring that image deletion during a resync is
only allowed when the remote image is confirmed to be primary.
This commit fixes the issue only for snapshot based mirroring mode
Fixes: https://tracker.ceph.com/issues/70948 Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
(cherry picked from commit e14afbc95a5fb8f5a33e7ea23a035992b966d671)
Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]
pybind/mgr: pin cheroot version in requirements-required.txt
With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package and a github issues describing the same problem
we're seeing has been opened by another user
https://github.com/cherrypy/cheroot/issues/769
It is worth noting that the workaround described in that
issue does also work for us. If you add
John Mulligan [Fri, 12 Sep 2025 17:52:25 +0000 (13:52 -0400)]
build-with-container: add argument groups to organize options
Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.
mgr/dashboard: fix zone update API forcing STANDARD storage class
The zone update REST API (`edit_zone`) always attempted to configure a
placement target for the `STANDARD` storage class, even when the request
was intended for a different storage class name.
This caused failures in deployments where `STANDARD` is not defined.
Changes:
Club add placement target and add storage class methods into one single
add_placement_targets_storage_class_zone method which takes the storage
class as a param as well alongside the rest of the placement params.
Conflicts:
src/pybind/mgr/volumes/fs/async_cloner.py
src/pybind/mgr/volumes/fs/operations/versions/subvolume_v1.py
- commit 8c536f78907f was missing which led to conflict.
Adam C. Emerson [Mon, 8 Sep 2025 18:19:20 +0000 (14:19 -0400)]
rgw: Record the `service_unique_id`, if present, in the SrviceMap
For consistency and ease associating the two.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 3a94a7b2ed02d20b2bc839b283e60cf4778f69e4) Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Fri, 5 Sep 2025 15:31:40 +0000 (11:31 -0400)]
common: Allow PerfCounters to return a provided service ID
Dashboard has asked for a unique identifier that can be associated
with services. This commit provides a component of that
functionality. Enforcing uniqueness is beyond the scope of this PR and
is the responsibility of cluster setup and orchestration. The scope of
uniqueness is a matter of policy and up to the design of cluster setup
and orchestration software.
We provide the `--service_unique_id` argument that can be passed on
the command line when executing a Ceph service that uses
`global_init`. If non-empty, a `service_unique_id` section is added to
the PerfCounters dump for that service. This section has a single
entry whose name is set to the argument of `service_unique_id` and
whose value is arbitrary. If unspecified or empty, no
`service_unique_id` section is added.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 6dc322421f7a3758251fe29e3f35934231358011)
Conflicts:
src/common/options/global.yaml.in
- Preceding options not in Squid
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>