Rishabh Dave [Tue, 18 Feb 2025 12:30:03 +0000 (18:00 +0530)]
qa/cephfs: ignore warning that pg is stuck peering for upgrade jobs
The health warning "pg .* is stuck peering" is seen while the Ceph cluster
is being upgraded during the fs/upgrade QA job. Being an expected
warning, it should be added to the ignorelist.
Besides this one, we already ignore more severe warnings ("pg is
stuck inactive" and "pg is degraded") for fs/upgrade jobs.
Fixes: https://tracker.ceph.com/issues/70023
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 9748de76e02254c6dc284dcc20ec5d5761760dcb)
Conflicts:
qa/cephfs/overrides/pg_health.yaml
- The line before the point where the patch was to be applied is different
compared to the main branch.
In statfs, when the quota root for a dir is discovered,
that dir is used as the basis for both max_files and max_bytes.
This can be an issue when a dir is found with only one of the two potential quota
fields. Take, for instance, a dir with only max_files set whose parent dir
has only max_bytes set. During a statfs call, the max_files value of the
provided dir is used, but there is no value for max_bytes. In this case,
the size of the filesystem is displayed.
Instead, find the quota root for max_files and max_bytes separately. This will
allow mixed quotas to inherit missing values from the parent dir. In the above
example, max_files from current dir and max_bytes from parent dir will be
displayed.
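The separate lookup described above can be sketched in Python as follows. This is only an illustration, not the CephFS client code: the `Dir` class, `nearest_quota` and `statfs_limits` are hypothetical names, and a value of 0 is assumed to mean that the quota is not set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dir:
    """Toy directory node; a value of 0 is assumed to mean "quota not set"."""
    name: str
    parent: Optional["Dir"] = None
    max_files: int = 0
    max_bytes: int = 0


def nearest_quota(d: Optional[Dir], field: str) -> int:
    """Walk towards the root and return the first non-zero value of `field`."""
    while d is not None:
        value = getattr(d, field)
        if value:
            return value
        d = d.parent
    return 0  # no ancestor sets this quota


def statfs_limits(d: Dir) -> tuple[int, int]:
    # Resolve each field independently so that a missing value is inherited.
    return nearest_quota(d, "max_files"), nearest_quota(d, "max_bytes")


parent = Dir("parent", max_bytes=10 * 2**30)
child = Dir("child", parent=parent, max_files=1000)
print(statfs_limits(child))
```

With a single shared quota root, the child would report its own max_files and no max_bytes at all. Here each field is resolved on its own walk, so the child picks up max_bytes from the parent.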
Fixes: https://tracker.ceph.com/issues/73487
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit dd02ea9b18502b87ce815eba4286ae3516e334b3)
In cases where there is a single element in a batch_op_map, new_batch_head
is a nullptr. When this is retried at the Finisher, we'd hit one of the asserts when
dereferencing it.
In SingletonClient::init(), objecter->start() is called before
monc->authenticate(). If the mons reply quickly, the monc connections are
authenticated before monc->authenticate() is called, and in that case
monc will not subscribe to monmap/config.
mds: client is evicted when an export subtree task is interrupted
The importer will force open some sessions provided by the exporter but the client does not know about
the new sessions until the exporter notifies it, and the notifications cannot be sent if the exporter
is interrupted. The client does not regularly renew sessions that it does not know about, so the client
will be evicted by the importer after `session_autoclose` seconds (300 seconds by default).
The sessions that are force-opened in the importer need to be closed when the import process is reversed.
Zhansong Gao [Fri, 26 May 2023 04:20:17 +0000 (12:20 +0800)]
mds: session in the importing state cannot be cleared if an export subtree task is interrupted while the state of importer is acking
The related sessions in the importer are in the importing state (`Session::is_importing` returns true) when the state of the importer is `acking`.
If the exporter restarts at this point, `Migrator::import_reverse`, called by `MDCache::handle_resolve`, should reverse the process and clear the importing state,
but because of a bug it does not. As a result, these sessions are not cleared when the client is
unmounted (evicted or timed out) until the MDS is restarted.
The bug in `import_reverse` is that it contains the code to handle state `IMPORT_ACKING` but it will never be executed because
the state is modified to `IMPORT_ABORTING` at the beginning. Move `stat.state = IMPORT_ABORTING` to the end of import_reverse
so that it can handle the state `IMPORT_ACKING`.
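The ordering problem can be illustrated with a small Python model. It is a sketch only: the real code is C++ in the MDS `Migrator`, and `ImportState`, `Session` and both functions below are hypothetical stand-ins that keep just the control flow described above.

```python
from enum import Enum, auto


class ImportState(Enum):
    ACKING = auto()
    ABORTING = auto()


class Session:
    def __init__(self) -> None:
        self.importing = True


def import_reverse_buggy(stat: dict, sessions: list) -> None:
    stat["state"] = ImportState.ABORTING      # state is overwritten first...
    if stat["state"] is ImportState.ACKING:   # ...so this branch never runs
        for s in sessions:
            s.importing = False


def import_reverse_fixed(stat: dict, sessions: list) -> None:
    if stat["state"] is ImportState.ACKING:   # still sees the original state
        for s in sessions:
            s.importing = False
    stat["state"] = ImportState.ABORTING      # assignment moved to the end


buggy_sessions = [Session()]
import_reverse_buggy({"state": ImportState.ACKING}, buggy_sessions)

fixed_sessions = [Session()]
import_reverse_fixed({"state": ImportState.ACKING}, fixed_sessions)

print(buggy_sessions[0].importing, fixed_sessions[0].importing)
```

In the buggy variant the session stays marked as importing. In the fixed variant the flag is cleared before the state changes to aborting.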
Casey Bodley [Fri, 3 Oct 2025 16:24:18 +0000 (12:24 -0400)]
rgw: fix 'bucket rm --bypass-gc' for copied objects
the `--bypass-gc` argument to `radosgw-admin bucket rm` causes us to
call `RadosBucket::remove_bypass_gc()`, which loops over the tail
objects and removes each with `RGWRados::delete_raw_obj_aio()`
however, this was removing the objects with `cls_rgw_remove_obj()`,
which is for head objects, not tails. tail objects must be removed with
`cls_refcount_put()`, which preserves them until the last copy is
removed
rename `delete_raw_obj_aio()` to `delete_tail_obj_aio()` to clarify its
purpose
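A toy Python model of why a tail object shared by copies needs a reference drop rather than an outright removal. The class and function names are hypothetical and only mimic the behaviour described above; they are not the `cls_rgw` or `cls_refcount` APIs.

```python
class TailObject:
    """Toy tail object that may be shared by several copies of an object."""

    def __init__(self, refs):
        self.refs = set(refs)
        self.exists = True


def remove_obj(tail):
    # Unconditional removal: destroys data that other copies still reference.
    tail.exists = False


def refcount_put(tail, tag):
    # Drop one reference; delete only when the last reference is gone.
    tail.refs.discard(tag)
    if not tail.refs:
        tail.exists = False


shared = TailObject({"original", "copy"})
refcount_put(shared, "original")
print(shared.exists)   # the copy still holds a reference
refcount_put(shared, "copy")
print(shared.exists)   # the last reference is gone
```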
Nitzan Mordechai [Wed, 22 Oct 2025 05:41:56 +0000 (05:41 +0000)]
tasks/cbt_performance: Tolerate exceptions during performance data updates
If an exception occurs during the POST request to update CBT performance,
log the error instead of failing the entire job. This ensures that
intermittent update failures do not block the main workflow.
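A minimal sketch of the pattern, assuming the upload is done by some callable `post(url, payload)`. The function name, the callable and the URL are hypothetical; the actual task code may be structured differently.

```python
import logging

log = logging.getLogger(__name__)


def update_performance_data(post, url, payload):
    """Try to upload results; log a failure instead of raising it."""
    try:
        post(url, payload)
        return True
    except Exception:
        # Record the full traceback, but let the job continue.
        log.exception("failed to update CBT performance data at %s", url)
        return False


def failing_post(url, payload):
    raise ConnectionError("server unreachable")


ok = update_performance_data(failing_post, "http://example.invalid/api", {})
print(ok)
```

Returning a boolean lets the caller decide whether to note the missed update without aborting the rest of the run.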
The unlink subcommand did not handle unsharded bucket indices
appropriately. A bucket index is unsharded when the number of shards listed in the
bucket instance object is 0. In that case there will actually be 1
shard.
When 0 is passed as the number of shards into the function that maps
object names to shards, it returns -1, and that was not handled
properly. That is now fixed.
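The edge case can be sketched in Python as below. Both functions are hypothetical stand-ins, the hash is a placeholder, and mapping -1 to index 0 is only one plausible way to handle it; the real fix may represent the single shard differently.

```python
def shard_for_object(name: str, num_shards: int) -> int:
    """Stand-in for the mapping function: returns -1 when num_shards is 0."""
    if num_shards <= 0:
        return -1
    return sum(name.encode()) % num_shards  # placeholder hash, not RGW's


def index_shard(name: str, num_shards: int) -> int:
    shard = shard_for_object(name, num_shards)
    # An unsharded index still has exactly one shard, so treat -1 as that
    # single shard instead of passing a negative index further along.
    return 0 if shard < 0 else shard


print(index_shard("photo.jpg", 0))
print(0 <= index_shard("photo.jpg", 11) < 11)
```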
Henry Richter [Wed, 8 Oct 2025 23:00:34 +0000 (01:00 +0200)]
rgw: asio/beast add ssl hot-reload
Adds the `ssl_reload` config option to the beast frontend.
This sets an interval in seconds to periodically reload the ssl context to pick up changes without restarting. It can be disabled (default) by setting it to `0`.
qa/tasks/ceph_manager: population must be a sequence
This patch addresses a TypeError raised for rados_bench when running
under, for example, python3.11.
2025-04-17T17:05:45.719 INFO:tasks.thrashosds.thrasher:Traceback (most recent call last):
File "/home/debian/src/github.com_kshtsk_ceph_b8b19a59890781db2f405500155c975cbdeb38a1/qa/tasks/ceph_manager.py", line 192, in wrapper
return func(self)
^^^^^^^^^^
File "/home/debian/src/github.com_kshtsk_ceph_b8b19a59890781db2f405500155c975cbdeb38a1/qa/tasks/ceph_manager.py", line 1439, in _do_thrash
self.choose_action()()
File "/home/debian/src/github.com_kshtsk_ceph_b8b19a59890781db2f405500155c975cbdeb38a1/qa/tasks/ceph_manager.py", line 855, in grow_pool
pool = self.ceph_manager.get_pool()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/debian/src/github.com_kshtsk_ceph_b8b19a59890781db2f405500155c975cbdeb38a1/qa/tasks/ceph_manager.py", line 2221, in get_pool
return random.sample(self.pools.keys(), 1)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/random.py", line 439, in sample
raise TypeError("Population must be a sequence. "
TypeError: Population must be a sequence. For dicts or sets, use sorted(d).
This happens because dict.keys() returns a dict_keys view instead of a list,
whereas random.sample() requires a sequence as its first argument:
sampling from a set was deprecated in Python 3.9 and
removed in version 3.11.
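A small sketch of the kind of change that avoids the error, using a made-up `pools` dict in place of `self.pools`. The exact expression used in the patch is not shown here, so treat this as one possible form.

```python
import random

# Hypothetical stand-in for self.pools in ceph_manager.py.
pools = {"rbd": 1, "cephfs_data": 2, "cephfs_metadata": 3}

# random.sample(pools.keys(), 1) raises TypeError on Python 3.11 and later,
# so convert the keys to a list (a sequence) before sampling.
pool = random.sample(list(pools), 1)[0]

# random.choice over a list is an equivalent way to pick a single key.
other = random.choice(list(pools))

print(pool in pools, other in pools)
```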
Zac Dover [Thu, 1 May 2025 07:31:33 +0000 (17:31 +1000)]
doc/src/common/options: mgr.yaml.in edit
Improve the "desc" field under the "mgr_data" entry in
src/common/options/mgr.yaml.in.
This is a test to determine whether the Jenkins tests can be passed.
This test is made after the mystifying failure of
https://github.com/ceph/ceph/pull/62983.
Conflicts:
src/pybind/mgr/dashboard/controllers/saml2.py
- kept the config changes as is on squid
src/pybind/mgr/dashboard/tox.ini
- kept the file as it is
Paulo E. Castro [Sat, 5 Apr 2025 20:47:55 +0000 (21:47 +0100)]
pybind/mgr: Hack around the 'ImportError: PyO3 modules may only be initialized once per interpreter process' issue.
Fixes: https://tracker.ceph.com/issues/64213
Signed-off-by: Paulo E. Castro <pecastro@wormholenet.com>
(cherry picked from commit 5b2aa8f8c61d7c2a56e1480c479801079a1ff822)
Edit the section "Data Pool Damage" in doc/cephfs/disaster-recovery.rst.
This commit is part of the project of improving the data-recovery parts
of the CephFS documentation, as requested in the Ceph Power Users
Feedback Summary in mid-2025.
Patrick Donnelly [Wed, 16 Apr 2025 20:24:23 +0000 (16:24 -0400)]
test/libcephfs: copy DT_NEEDED entries from input libraries
On Ubuntu 22.04, the linker is now stumbling, thinking the libceph-common
library is missing on the command-line. This appears to be a bug and the only
workaround I've found is to copy the DT_NEEDED entries for the input shared
objects (which is traditional linker behavior). I don't have an explanation for
why this occurs only for a few test executables.
Zac Dover [Wed, 11 Jun 2025 12:44:32 +0000 (22:44 +1000)]
doc/rados/ops: edit cache-tiering.rst
Add material to doc/rados/operations/cache-tiering.rst, as suggested by
Anthony D'Atri in
https://github.com/ceph/ceph/pull/63745#discussion_r2127887785.
Ville Ojamo [Wed, 30 Apr 2025 18:17:14 +0000 (01:17 +0700)]
doc/radosgw: Improve rgw-cache.rst
Try to improve the language by completely rewriting some sentences.
Attempt to format the document more like the rest of the docs.
Fix several errors in punctuation, capitalization, spaces etc.
Use blocks with bash prompts for CLI commands instead of hardcoded
prompts.
Fix section hierarchy and section title underline lengths.
Use admonition.