Venky Shankar [Fri, 30 May 2025 18:11:19 +0000 (18:11 +0000)]
client: asynchronous fsync can decrement request ref twice
When the asynchronous execution context is woken up after waiting for
the Fb caps reference to be released, the request reference can be
decremented twice, causing the client to crash with:
```
0x00007f3115b2452c in __pthread_kill_implementation () from /lib64/libc.so.6
0x00007f3115ad7686 in raise () from /lib64/libc.so.6
0x00007f3115ac1833 in abort () from /lib64/libc.so.6
0x00007f3113375d0a in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/assert.cc:74
0x00007f3113375e6f in ceph::__ceph_assert_fail (ctx=...) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/assert.cc:79
0x00007f311237db1d in xlist<MetaRequest*>::item::~item (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/xlist.h:31
MetaRequest::~MetaRequest (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/MetaRequest.cc:65
Client::put_request (this=0x564b491726c0, request=0x7f301c0165c0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:2140
0x00007f31123c88ad in Client::C_nonblocking_fsync_state::advance (this=0x7f307002e9f0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11905
0x00007f3112331ccd in Context::complete (this=0x7f3070009250, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f311246a964 in Client::signal_context_list(std::__cxx11::list<Context*, std::allocator<Context*> >&) [clone .constprop.0] (ls=std::__cxx11::list = {...}, this=<optimized out>)
at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:4257
0x00007f3112395f45 in Client::put_cap_ref (this=0x564b491726c0, in=0x7f306807be90, cap=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:3611
0x00007f31123331f3 in Client::C_Write_Finisher::finish_io (r=0, this=0x7f30240442d0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11381
Client::CWF_iofinish::finish (this=<optimized out>, r=0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.h:1481
0x00007f3112331ccd in Context::complete (this=0x7f302401afd0, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f31123c5242 in Client::C_Lock_Client_Finisher::finish (this=0x7f302403c9d0, r=0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11372
0x00007f3112331ccd in Context::complete (this=0x7f302403c9d0, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f31134374ad in Finisher::finisher_thread_entry (this=0x564b491730b0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/Finisher.cc:72
0x00007f3115b227e2 in start_thread () from /lib64/libc.so.6
0x00007f3115ba7800 in clone3 () from /lib64/libc.so.6
0x0000000000000000 in ?? ()
```
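The failure is a reference count that gets dropped twice because the same execution context runs more than once. The Python model below is a hypothetical sketch: `Request` and `AsyncFsyncState` are illustrative stand-ins, not Ceph's `MetaRequest` or `C_nonblocking_fsync_state`, and it is not necessarily how the upstream fix works. It shows one way to make the release idempotent, with the state object recording whether it still holds its reference and putting it at most once.

```python
class Request:
    """Hypothetical stand-in for a refcounted client request."""

    def __init__(self) -> None:
        self.refs = 1  # the creator's reference
        self.freed = False

    def get(self) -> None:
        assert not self.freed, "get() on a freed request"
        self.refs += 1

    def put(self) -> None:
        # A second put() for the same reference frees the request while
        # another owner still uses it, which is the crash in the backtrace.
        assert not self.freed, "put() on a freed request: double decrement"
        self.refs -= 1
        if self.refs == 0:
            self.freed = True


class AsyncFsyncState:
    """Hypothetical async fsync state machine owning one request ref."""

    def __init__(self, req: Request) -> None:
        self.req = req
        req.get()
        self.holds_ref = True

    def advance(self) -> None:
        # advance() may be re-entered on every wakeup (for example, once
        # the caps reference is released); only the first pass puts the ref.
        if self.holds_ref:
            self.holds_ref = False
            self.req.put()
```

With this guard, a spurious second wakeup leaves the creator's reference intact instead of freeing the request early.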
Shachar Sharon [Tue, 22 Oct 2024 12:06:54 +0000 (15:06 +0300)]
client: fix memory leak in Client::CRF_iofinish::complete
Commit 1210ddf7a ("Client: Add non-blocking helper classes") introduced
the Client::C_Read_Finisher Context object for async READ operations,
but it has a use-after-free bug that may cause a memory leak when
calling libcephfs's non-blocking ceph_ll_nonblocking_readv_writev API
with async READ:
  ceph_ll_nonblocking_readv_writev (READ)
    Client::ll_preadv_pwritev
      ...
        Client::_read_async
          Context::complete
            Client::CRF_iofinish::complete
              Client::CRF_iofinish::finish
                CRF->finish_io()
                  Client::C_Read_Finisher::finish_io
                    ...
                    delete this;          // frees CRF_iofinish->CRF
              if (CRF->iofinished)        // use-after-free of CRF
                delete this;              // may not get here
Whether the leak actually occurs depends on timing: if an allocation in
another thread reuses the freed memory and overwrites CRF->iofinished
with false, the final delete is skipped.
The `if (CRF->iofinished)` check is unnecessary: iofinished is always
set to true when CRF->finish_io() is called. Consequently, the override
Client::CRF_iofinish::complete() is not needed, since it now has the
same logic as Context::complete(), so it has been removed.
Ronen Friedman [Thu, 19 Jun 2025 15:27:38 +0000 (10:27 -0500)]
osd/scrub: clarify that osd_scrub_auto_repair_num_errors counts objects
'osd_scrub_auto_repair_num_errors' limits the number of damaged objects
that we will try to auto-repair during a scrub. Its documentation
referred to "number of errors", which did not match the implementation.
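The difference between counting errors and counting damaged objects matters when one object carries several errors. The sketch below is hypothetical: `count_damaged_objects` and `may_auto_repair` are illustrative names, and it assumes that auto-repair is skipped when the number of damaged objects exceeds the configured limit.

```python
from typing import Dict, List


def count_damaged_objects(errors_by_object: Dict[str, List[str]]) -> int:
    """Count objects with at least one error; each object counts once."""
    return sum(1 for errs in errors_by_object.values() if errs)


def may_auto_repair(errors_by_object: Dict[str, List[str]],
                    auto_repair_num_errors: int) -> bool:
    # Assumption: repair is skipped when the number of damaged objects
    # (not the total number of individual errors) exceeds the limit.
    return count_damaged_objects(errors_by_object) <= auto_repair_num_errors


scrub_result = {
    "obj1": ["data_digest_mismatch", "size_mismatch"],  # two errors, one object
    "obj2": ["missing"],
    "obj3": [],                                         # clean object
}
```

Here `scrub_result` holds three errors but only two damaged objects, so a limit of 2 still permits auto-repair under this model.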
Fixes: https://tracker.ceph.com/issues/71754
Fixes: Red Hat BZ2316244
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
(cherry picked from commit 680b58ffd0bf5b213ec525f8d783297fb0b14343)
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
Add an instruction that includes the --enable-auth flag in a "ceph orch
apply mgmt-gateway" command, in accordance with a request made by
afreen23 here: https://github.com/ceph/ceph/pull/60440#discussion_r1953530599
Improve the English in the "desc" field of the
"osd_deep_scrub_interval_cv" variable, as suggested by Anthony D'Atri in
https://github.com/ceph/ceph/pull/63490#discussion_r2124893516.
Change the wording of a sentence in doc/radosgw/metrics.rst so that its
articles read as though they were written by a native speaker of the
English language.
This commit is being raised as part of a diagnostic process aimed at
discovering why the ReadtheDocs check is failing on PR
https://github.com/ceph/ceph/pull/62877.
John Mulligan [Fri, 1 Nov 2024 15:25:35 +0000 (11:25 -0400)]
squid: python-common: fix mypy errors in earmarking.py
Fix various errors found by running mypy with python 3.12 on the
python-common subtree. Uses a Protocol as a stand-in for actual file
system integration objects.
Part of an effort to get ceph tox environments passing on Python 3.12.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit b77829c45e213d3e984789b903d50d9267be5c74)
John Mulligan [Fri, 1 Nov 2024 15:25:35 +0000 (11:25 -0400)]
python-common: fix mypy errors in earmarking.py
Fix various errors found by running mypy with python 3.12 on the
python-common subtree. Uses a Protocol as a stand-in for actual file
system integration objects.
Part of an effort to get ceph tox environments passing on Python 3.12.
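Using a `typing.Protocol` lets mypy check code against the methods it actually calls, without importing a concrete file system binding. The sketch below is hypothetical: `FSOperations`, `FakeFS`, `get_earmark`, `set_earmark`, and the xattr name `EARMARK_XATTR` are illustrative and are not taken from earmarking.py.

```python
from typing import Dict, Protocol, Tuple

EARMARK_XATTR = "user.example.earmark"  # hypothetical attribute name


class FSOperations(Protocol):
    """Hypothetical interface: only the methods the earmark code calls."""

    def getxattr(self, path: str, key: str) -> bytes: ...

    def setxattr(self, path: str, key: str, value: bytes, flags: int) -> None: ...


def get_earmark(fs: FSOperations, path: str) -> str:
    return fs.getxattr(path, EARMARK_XATTR).decode("utf-8")


def set_earmark(fs: FSOperations, path: str, earmark: str) -> None:
    fs.setxattr(path, EARMARK_XATTR, earmark.encode("utf-8"), 0)


class FakeFS:
    """In-memory test double; matches FSOperations structurally."""

    def __init__(self) -> None:
        self.xattrs: Dict[Tuple[str, str], bytes] = {}

    def getxattr(self, path: str, key: str) -> bytes:
        return self.xattrs[(path, key)]

    def setxattr(self, path: str, key: str, value: bytes, flags: int) -> None:
        self.xattrs[(path, key)] = value
```

`FakeFS` never inherits from `FSOperations`; mypy accepts it wherever the protocol is expected because its method signatures match.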
During an upgrade, I got:
```
Failed to set Dashboard config for nvmeof: dashboard nvmeof-gateway-add failed: JSON array/object not allowed {"prefix": "dashboard nvmeof-gateway-add", "name": "nvmeof.rbd", "group": null, "daemon_name": "nvmeof.rbd.ceph-node-01.irpssg"} retval: -22
```
This is fixed by handling group_name when it is not present in the spec.
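The log above shows the command being rejected because `"group"` is `null`. The sketch below is hypothetical: `build_gateway_add_cmd` is an illustrative helper, and substituting an empty string for a missing group is an assumption about an acceptable default, not necessarily what the actual fix does.

```python
from typing import Dict, Optional


def build_gateway_add_cmd(name: str, group: Optional[str],
                          daemon_name: str) -> Dict[str, str]:
    """Build the command dict, never passing null for a missing group."""
    return {
        "prefix": "dashboard nvmeof-gateway-add",
        "name": name,
        # Assumption: an empty string is an acceptable "no group" value.
        "group": group if group is not None else "",
        "daemon_name": daemon_name,
    }
```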
The other error was:
```
Failed to set Dashboard config for nvmeof: dashboard nvmeof-gateway-add failed: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1864, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 499, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/mgr_module.py", line 535, in check
    return func(*args, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/nvmeof_cli.py", line 28, in add_nvmeof_gateway
    NvmeofGatewaysConfig.add_gateway(name, service_url, group, daemon_name)
  File "/usr/share/ceph/mgr/dashboard/services/nvmeof_conf.py", line 61, in add_gateway
    gateway['daemon_name'] = daemon_name
TypeError: 'str' object does not support item assignment
retval: -22
```
This is fixed by properly updating the config to the newer format used
by newer versions.
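The `TypeError` arises because an old-format entry is a plain string, so item assignment on it fails. The sketch below is hypothetical: the legacy layout (name mapped to a service URL string, or to a single dict) and the newer list-of-dicts layout are assumptions for illustration, and `migrate_gateways` and `add_gateway` are not the dashboard's real functions.

```python
from typing import Any, Dict, List

Entry = Dict[str, Any]


def migrate_gateways(config: Dict[str, Any]) -> Dict[str, Any]:
    """Upgrade assumed legacy entries to a list-of-dicts layout in place."""
    gateways = config.setdefault("gateways", {})
    for name, value in list(gateways.items()):
        if isinstance(value, str):
            # Assumed legacy layout: name -> service URL string
            gateways[name] = [{"service_url": value}]
        elif isinstance(value, dict):
            # Assumed intermediate layout: name -> single gateway dict
            gateways[name] = [value]
    return config


def add_gateway(config: Dict[str, Any], name: str, service_url: str,
                group: str, daemon_name: str) -> Dict[str, Any]:
    migrate_gateways(config)
    entries: List[Entry] = config["gateways"].setdefault(name, [])
    for gw in entries:
        if gw.get("service_url") == service_url:
            # Safe after migration: every entry is now a dict
            gw["group"] = group
            gw["daemon_name"] = daemon_name
            return config
    entries.append({"service_url": service_url, "group": group,
                    "daemon_name": daemon_name})
    return config
```

Migrating before mutating means the item assignment that failed in the traceback always operates on a dict.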
Zac Dover [Tue, 10 Jun 2025 02:54:18 +0000 (12:54 +1000)]
doc/mgr: edit telemetry.rst (lines 300-400)
Edit doc/mgr/telemetry.rst (lines 300-400).
Follow up on the suggestions made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/63741 (except for the one about
including Lovecraftian lore in the dummy user data in this file).
Zac Dover [Tue, 10 Jun 2025 10:58:22 +0000 (20:58 +1000)]
doc/rados: enhance "pools.rst"
Add a link to the instructions for modifying a user's caps for a given
pool, placed where the reader would naturally want to have it.
Zac Dover [Tue, 10 Jun 2025 10:38:54 +0000 (20:38 +1000)]
doc/rbd: add mirroring troubleshooting info
Add a note to doc/rbd/rbd-mirroring.rst that directs the reader to make
"site-a" and "site-b" use the same pool names in the event that rbd
throws the error message "failed to import peer bootstrap token".
This information was reported to the Ceph upstream by Petr Tlapa in June
of 2025, and credit for its development goes to Petr.
Zac Dover [Tue, 10 Jun 2025 03:04:13 +0000 (13:04 +1000)]
doc/rados: edit ops/user-management.rst
Edit a sentence in the imperative mood so that it matches the general
form of the imperative sentences that immediately precede commands
containing replaceable portions.
This commit targets only the Squid release branch.
Follows up on https://github.com/ceph/ceph/pull/58235/.
Ville Ojamo [Fri, 18 Apr 2025 07:43:27 +0000 (14:43 +0700)]
doc/radosgw: Improve and more consistent formatting
* Use inline code formatting consistently for command-line switches,
  data, hostnames, etc.
* Correctly indent text and child lists in list items.
* Remove mid-sentence double spaces.
* Capitalize "RGW" and "API" in text.
* Remove unordered lists that are just regular text everywhere else.
* Use the correct prompt # instead of $ for privileged commands.
* Use line continuation for multi-line example commands instead of
  rendering them incorrectly as separate single-line commands.
* Use Title Case in the few section headers that lacked it.
* multisite.rst: Don't repeat "(RGW)" after "RADOS Gateway" beyond the
  first instance in the same paragraph.
* multisite.rst: Change one "multisite" to "multi-site" because all
  other instances use this spelling (except, apparently, the title of
  the document).
* multisite.rst: Fix indentation of continuation lines in prompted
  example commands.
* Use a preformatted block, as elsewhere in the docs, instead of an
  odd unordered list plus inline code for a syntax example.
* Add a space before the backslash for multi-line command continuation.
Casey Bodley [Mon, 19 May 2025 21:05:43 +0000 (17:05 -0400)]
doc/rgw: use 'confval' directive to render sts config options
the 'confval' directive reads the config options from
common/options/rgw.yaml and renders them nicely. this keeps
everything consistent between the options and their docs.
improve the config option descriptions:
* add existing note about rgw_sts_key length/format
* add example openssl command to generate a conforming sts key
* add notes about sharing sts key between gateways/zones
format the last remaining 'Note' with the 'note' directive
J. Eric Ivancich [Tue, 25 Mar 2025 22:10:27 +0000 (18:10 -0400)]
rgw: fix bug with rgw-gap-list
rgw-gap-list would fail if it reached the end of the second file
before the first, thereby causing an infinite loop.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Signed-off-by: Michael J. Kidd <linuxkidd@gmail.com>
(cherry picked from commit 0cfbc57d2c43ea88845561f14e295d0d48e44b32)
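The bug concerns a comparison of two sorted inputs that did not terminate when the second ran out first. The Python sketch below is not the rgw-gap-list script itself; it is a hypothetical illustration (the name `missing_from_second` is invented) of a merge-style comparison in which exhausting the second input is an explicit case, so the loop always advances through the first input and terminates.

```python
from typing import Iterable, Iterator


def missing_from_second(first: Iterable[str],
                        second: Iterable[str]) -> Iterator[str]:
    """Yield items of sorted `first` that are absent from sorted `second`."""
    it2 = iter(second)
    cur2 = next(it2, None)  # None marks that `second` is exhausted
    for item in first:
        # Skip entries of `second` that sort before the current item.
        while cur2 is not None and cur2 < item:
            cur2 = next(it2, None)
        # Once `second` is exhausted, every remaining item of `first` is
        # reported; the outer loop still advances, so it cannot spin.
        if cur2 is None or cur2 != item:
            yield item
```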