When the libcephfs aio tests (src/test/client) are run
with the object cacher disabled (ceph_test_client --client_oc=false),
the TestClient.LlreadvLlwritev test fails and dumps core. The client
hits the assert 'caps_ref[c]<0'.
This patch fixes that. There is no need to release the cap_ref
and take it again between the multiple reads caused by short reads.
In some cases, get_caps used to fail in C_Read_Sync_NonBlocking::finish,
causing cap_ref to go negative when put_cap_ref was finally called in
C_Read_Finish::finish_io.
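The invariant behind this fix can be illustrated with a toy model. The sketch below is Python, not the Ceph C++ code; `CapRefs`, `read_fully`, and the cap name "Fr" are hypothetical names chosen for illustration. It only shows the idea of holding a single reference across all short reads, so that there is exactly one get and one put.

```python
from collections import defaultdict
from typing import Callable, DefaultDict


class CapRefs:
    """Toy per-capability reference counter (not the Ceph implementation)."""

    def __init__(self) -> None:
        self.refs: DefaultDict[str, int] = defaultdict(int)

    def get(self, cap: str) -> None:
        self.refs[cap] += 1

    def put(self, cap: str) -> None:
        # Dropping a reference that is not held is a bug, so fail loudly.
        assert self.refs[cap] > 0, f"unbalanced put for {cap}"
        self.refs[cap] -= 1


def read_fully(refs: CapRefs,
               read_chunk: Callable[[int, int], bytes],
               length: int) -> bytes:
    """Loop over short reads while holding one reference the whole time."""
    refs.get("Fr")  # taken once, before the first read
    try:
        buf = b""
        while len(buf) < length:
            chunk = read_chunk(len(buf), length - len(buf))
            if not chunk:  # end of data
                break
            buf += chunk
        return buf
    finally:
        refs.put("Fr")  # released once, after the last read
```

Because the reference is never dropped between iterations, there is no re-acquire step that can fail and leave the final put unbalanced.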
Cause:
In the aio path, the client_lock was not held
in the internal callback that runs after the I/O is done, where
it is expected to be held, leading to corruption.
Venky Shankar [Fri, 30 May 2025 18:11:19 +0000 (18:11 +0000)]
client: asynchronous fsync can decrement request ref twice
The double decrement happens after the asynchronous execution context
is woken up while waiting for the Fb caps reference to be released, and
it causes the client to crash as per:
```
0x00007f3115b2452c in __pthread_kill_implementation () from /lib64/libc.so.6
0x00007f3115ad7686 in raise () from /lib64/libc.so.6
0x00007f3115ac1833 in abort () from /lib64/libc.so.6
0x00007f3113375d0a in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/assert.cc:74
0x00007f3113375e6f in ceph::__ceph_assert_fail (ctx=...) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/assert.cc:79
0x00007f311237db1d in xlist<MetaRequest*>::item::~item (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/xlist.h:31
MetaRequest::~MetaRequest (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/MetaRequest.cc:65
Client::put_request (this=0x564b491726c0, request=0x7f301c0165c0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:2140
0x00007f31123c88ad in Client::C_nonblocking_fsync_state::advance (this=0x7f307002e9f0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11905
0x00007f3112331ccd in Context::complete (this=0x7f3070009250, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f311246a964 in Client::signal_context_list(std::__cxx11::list<Context*, std::allocator<Context*> >&) [clone .constprop.0] (ls=std::__cxx11::list = {...}, this=<optimized out>)
at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:4257
0x00007f3112395f45 in Client::put_cap_ref (this=0x564b491726c0, in=0x7f306807be90, cap=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:3611
0x00007f31123331f3 in Client::C_Write_Finisher::finish_io (r=0, this=0x7f30240442d0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11381
Client::CWF_iofinish::finish (this=<optimized out>, r=0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.h:1481
0x00007f3112331ccd in Context::complete (this=0x7f302401afd0, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f31123c5242 in Client::C_Lock_Client_Finisher::finish (this=0x7f302403c9d0, r=0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11372
0x00007f3112331ccd in Context::complete (this=0x7f302403c9d0, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f31134374ad in Finisher::finisher_thread_entry (this=0x564b491730b0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/Finisher.cc:72
0x00007f3115b227e2 in start_thread () from /lib64/libc.so.6
0x00007f3115ba7800 in clone3 () from /lib64/libc.so.6
0x0000000000000000 in ?? ()
```
Shachar Sharon [Tue, 22 Oct 2024 12:06:54 +0000 (15:06 +0300)]
client: fix memory leak in Client::CRF_iofinish::complete
Commit 1210ddf7a ("Client: Add non-blocking helper classes") introduced
Client::C_Read_Finisher Context object for async READ operations, but
it has a read-after-free bug which may cause a memory leak when calling
libcephfs's non-blocking ceph_ll_nonblocking_readv_writev API with async
READ:
  ceph_ll_nonblocking_readv_writev (READ)
    Client::ll_preadv_pwritev
      ...
      Client::_read_async
        Context::complete
          Client::CRF_iofinish::complete
            Client::CRF_iofinish::finish
              CRF->finish_io()
                Client::C_Read_Finisher::finish_io
                  ...
                  delete this; // frees CRF_iofinish->CRF
            if (CRF->iofinished) // use-after-free of CRF
              delete this; // may not get here
Whether the leak occurs depends on timing: a racing allocation by
another thread may overwrite the freed memory at CRF->iofinished with
false, in which case the last delete operation is skipped.
The check of `if (CRF->iofinished)` is unnecessary: it is always set to
true upon calling CRF->finish_io(). Thus, there is no need to have the
override function Client::CRF_iofinish::complete() as it now has the
same logic as Context::complete(). Removed.
Ronen Friedman [Thu, 19 Jun 2025 15:27:38 +0000 (10:27 -0500)]
osd/scrub: clarify that osd_scrub_auto_repair_num_errors counts objects
'osd_scrub_auto_repair_num_errors' limits the number of damaged objects
that we will try to auto-repair during a scrub. Its documentation
referred to "number of errors", which did not fit the implementation.
Fixes: https://tracker.ceph.com/issues/71754
Fixes: Red Hat BZ2316244
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
(cherry picked from commit 680b58ffd0bf5b213ec525f8d783297fb0b14343)
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
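The difference between counting errors and counting damaged objects can be sketched as follows. This is an illustrative Python model, not the OSD's C++ logic; `should_auto_repair` and its arguments are hypothetical, and the exact comparison used by the OSD is an assumption here.

```python
from typing import Dict


def should_auto_repair(errors_by_object: Dict[str, int],
                       max_damaged_objects: int) -> bool:
    """Decide on auto-repair from the number of damaged objects.

    errors_by_object maps an object name to how many errors the scrub
    found on it. One object with many errors still counts once.
    """
    damaged = sum(1 for count in errors_by_object.values() if count > 0)
    return 0 < damaged <= max_damaged_objects
```

For example, two damaged objects carrying seven errors in total would still pass a limit of five under this object-based reading, whereas an error-based reading would reject them.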
Add an instruction that includes the --enable-auth flag in a "ceph orch
apply mgmt-gateway" command, in accordance with a request made by
afreen23 here: https://github.com/ceph/ceph/pull/60440#discussion_r1953530599
Improve the English in the "desc" field of the
"osd_deep_scrub_interval_cv" variable, as suggested by Anthony D'Atri in
https://github.com/ceph/ceph/pull/63490#discussion_r2124893516.
Change the wording of a sentence in doc/radosgw/metrics.rst so that its
articles read as though they were written by a native speaker of the
English language.
This commit is being raised as part of a diagnostic process aimed at
discovering why the ReadtheDocs check is failing on PR
https://github.com/ceph/ceph/pull/62877.
John Mulligan [Fri, 1 Nov 2024 15:25:35 +0000 (11:25 -0400)]
squid: python-common: fix mypy errors in earmarking.py
Fix various errors found by running mypy with python 3.12 on the
python-common subtree. Uses a Protocol as a stand-in for actual file
system integration objects.
Part of an effort to get ceph tox environments passing on Python 3.12.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit b77829c45e213d3e984789b903d50d9267be5c74)
John Mulligan [Fri, 1 Nov 2024 15:25:35 +0000 (11:25 -0400)]
python-common: fix mypy errors in earmarking.py
Fix various errors found by running mypy with python 3.12 on the
python-common subtree. Uses a Protocol as a stand-in for actual file
system integration objects.
Part of an effort to get ceph tox environments passing on Python 3.12.
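The "Protocol as a stand-in" approach can be sketched roughly as below. This is a minimal illustration, not the contents of earmarking.py: `XattrStore`, `InMemoryFS`, `set_earmark`, `get_earmark`, and the xattr name are all hypothetical. The point is that mypy can check callers against a structural type without importing the real file system bindings.

```python
from typing import Dict, Optional, Protocol, Tuple


class XattrStore(Protocol):
    """Hypothetical structural type: any object with these two methods."""

    def getxattr(self, path: str, name: str) -> Optional[str]: ...

    def setxattr(self, path: str, name: str, value: str) -> None: ...


# Hypothetical attribute name, used only for this example.
EARMARK_XATTR = "user.example.earmark"


def set_earmark(fs: XattrStore, path: str, value: str) -> None:
    fs.setxattr(path, EARMARK_XATTR, value)


def get_earmark(fs: XattrStore, path: str) -> Optional[str]:
    return fs.getxattr(path, EARMARK_XATTR)


class InMemoryFS:
    """Test double; it satisfies XattrStore without inheriting from it."""

    def __init__(self) -> None:
        self._xattrs: Dict[Tuple[str, str], str] = {}

    def getxattr(self, path: str, name: str) -> Optional[str]:
        return self._xattrs.get((path, name))

    def setxattr(self, path: str, name: str, value: str) -> None:
        self._xattrs[(path, name)] = value
```

Any real file system object that exposes the same two methods would type-check against `XattrStore` as well, which is what makes the Protocol a usable placeholder.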
During upgrade I get
```
Failed to set Dashboard config for nvmeof: dashboard nvmeof-gateway-add failed: JSON array/object not allowed {"prefix": "dashboard nvmeof-gateway-add", "name": "nvmeof.rbd", "group": null, "daemon_name": "nvmeof.rbd.ceph-node-01.irpssg"} retval: -22
```
which is fixed by handling the group_name when it's not there in the spec.
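One plausible way to handle the missing group is sketched below; the actual fix may differ. `build_gateway_add_cmd` is a hypothetical helper, and substituting an empty string for an absent group is an assumption made for illustration. The keys mirror the command shown in the error above.

```python
import json
from typing import Any, Dict, Optional


def build_gateway_add_cmd(name: str,
                          daemon_name: str,
                          group: Optional[str]) -> Dict[str, Any]:
    """Build the command payload, never emitting a JSON null for group."""
    return {
        "prefix": "dashboard nvmeof-gateway-add",
        "name": name,
        # Assumption: an empty string stands in for "no group in the spec".
        "group": group if group is not None else "",
        "daemon_name": daemon_name,
    }


cmd = build_gateway_add_cmd("nvmeof.rbd",
                            "nvmeof.rbd.ceph-node-01.irpssg",
                            None)
payload = json.dumps(cmd)
```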
And the other error was
```
Failed to set Dashboard config for nvmeof: dashboard nvmeof-gateway-add failed: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1864, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 499, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/mgr_module.py", line 535, in check
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/nvmeof_cli.py", line 28, in add_nvmeof_gateway
    NvmeofGatewaysConfig.add_gateway(name, service_url, group, daemon_name)
  File "/usr/share/ceph/mgr/dashboard/services/nvmeof_conf.py", line 61, in add_gateway
    gateway['daemon_name'] = daemon_name
TypeError: 'str' object does not support item assignment
retval: -22
```
which is fixed by properly updating the config to the newer format that
is available in newer versions.
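The TypeError above suggests that an older config stored each gateway as a plain string, while newer code expects a mapping it can add keys to. A migration along those lines might look like the sketch below. The old and new layouts shown here are assumptions for illustration, and `migrate_gateways` is a hypothetical name; the real dashboard config format may differ.

```python
from typing import Any, Dict


def migrate_gateways(gateways: Dict[str, Any]) -> Dict[str, Dict[str, str]]:
    """Convert string-valued entries into dict-valued entries.

    Assumed old format: {"<name>": "<service_url>"}
    Assumed new format: {"<name>": {"service_url": ..., "group": ...,
                                    "daemon_name": ...}}
    """
    migrated: Dict[str, Dict[str, str]] = {}
    for name, entry in gateways.items():
        if isinstance(entry, str):
            # Old-style entry: wrap the URL and fill the new keys.
            entry = {"service_url": entry, "group": "", "daemon_name": ""}
        migrated[name] = dict(entry)  # copy so the input is not mutated
    return migrated
```

After migration, an assignment such as `gateway['daemon_name'] = daemon_name` operates on a dict and no longer fails.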
Zac Dover [Tue, 10 Jun 2025 02:54:18 +0000 (12:54 +1000)]
doc/mgr: edit telemetry.rst (lines 300-400)
Edit doc/mgr/telemetry.rst (lines 300-400).
Follow up on the suggestions made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/63741 (except for the one about
including Lovecraftian lore in the dummy user data in this file).
Zac Dover [Tue, 10 Jun 2025 10:58:22 +0000 (20:58 +1000)]
doc/rados: enhance "pools.rst"
Add a link to the instructions for modifying a user's caps for a given
pool. Add this link where it makes sense to add it. Add this link where
the reader would naturally want to have the link.
Zac Dover [Tue, 10 Jun 2025 10:38:54 +0000 (20:38 +1000)]
doc/rbd: add mirroring troubleshooting info
Add a note to doc/rbd/rbd-mirroring.rst that directs the reader to set
both "site-a" and "site-b" to have the same pool names in the event that
rbd throws the error message "failed to import peer bootstrap token".
This information was reported to the Ceph upstream by Petr Tlapa in June
of 2025, and credit for its development goes to Petr.
Zac Dover [Tue, 10 Jun 2025 03:04:13 +0000 (13:04 +1000)]
doc/rados: edit ops/user-management.rst
Edit a sentence in the imperative mood so that it matches the general
form of imperative sentences immediately preceding commands that contain
replaceable portions.
This commit targets only the Squid release branch.
Follows up on https://github.com/ceph/ceph/pull/58235/.