Nizamudeen A [Tue, 12 Aug 2025 04:15:54 +0000 (09:45 +0530)]
mgr/dashboard: close token status subscription properly
Since it's not returning any subscription back to the `this.subs`
property, those subscriptions are not properly closed in the
workbench-layout when it's destroyed. So ensure that the subscription
is properly returned.
Nizamudeen A [Fri, 8 Aug 2025 06:42:20 +0000 (12:12 +0530)]
mgr/dashboard: fix memory leak in prometheus service
The Prometheus API calls for the Cluster Utilization card are subscribed
to multiple times in a for loop, but these subscriptions are never
properly unsubscribed. The longer we stay on the dashboard page, the
more memory is leaked, which eventually makes the UI lag. Attempt to fix
it by properly handling the subscriptions.
Nizamudeen A [Mon, 28 Jul 2025 08:22:36 +0000 (13:52 +0530)]
mgr/dashboard: fix table dom re-rendering
Each table refresh creates new data or updates the existing data. This
causes the existing data to be completely replaced with a newer copy,
thereby losing the trackBy functionality. So I am modifying the data
in place so that the memory reference doesn't change.
The arm64-only module uadk needs numa.h to build; nothing else
ensures it's available. Make it an unconditional ceph build
dependency on behalf of the arm64 build.
libcephfs_proxy: implement client side support for embedded perms
Implement the code to handle embedded perms or not depending on a
feature flag negotiated during connection.
If embedded permissions are enabled, ceph_userperm_new() will allocate a
local structure with the provided credentials instead of sending it to
the server. ceph_userperm_destroy() will just destroy the allocated
structure. If it's disabled, these functions will work as any other
function, sending the request to the server.
libcephfs_proxy: extend the protocol to support embedded permissions
This patch adds the changes to the protocol definition to support
sending the user credentials along with the request that requires it.
Using protocol version 1, instead of sending a pointer to a previously
allocated UserPerm structure, the caller will embed the uid, gid and the
list of additional groups in the request itself.
This change doesn't modify the binary format of the protocol structures
so that they remain backward compatible, but it modifies how they are
declared to make it easier to extend them with future protocol
improvements.
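
The difference between the two request styles can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual libcephfs_proxy wire format: every name here (`demo_request`,
`demo_embedded_perms`, `demo_fill_request`, `DEMO_MAX_GROUPS`) is
hypothetical, and the field layout is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_GROUPS 16 /* hypothetical limit, chosen for this sketch */

/* Hypothetical credentials carried inside a request (protocol version 1). */
struct demo_embedded_perms {
    uint32_t uid;
    uint32_t gid;
    uint32_t ngroups;
    uint32_t groups[DEMO_MAX_GROUPS];
};

/* Hypothetical request: it carries either a handle to a previously
 * allocated server-side UserPerm (version 0) or the credentials
 * themselves (version 1). */
struct demo_request {
    uint32_t version;
    uint64_t userperm_handle;
    struct demo_embedded_perms perms;
};

/* Fill a request according to the negotiated version.
 * Returns 0 on success, -1 if too many groups were supplied. */
int demo_fill_request(struct demo_request *req, uint32_t version,
                      uint64_t handle, uint32_t uid, uint32_t gid,
                      const uint32_t *groups, uint32_t ngroups)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->version = version;

    if (version == 0) {
        /* Old style: refer to a structure the server already holds. */
        req->userperm_handle = handle;
        return 0;
    }

    if (ngroups > DEMO_MAX_GROUPS) {
        return -1;
    }

    /* New style: embed uid, gid and the additional groups directly. */
    req->perms.uid = uid;
    req->perms.gid = gid;
    req->perms.ngroups = ngroups;
    if (ngroups > 0) {
        memcpy(req->perms.groups, groups, ngroups * sizeof(groups[0]));
    }
    return 0;
}
```

With embedded credentials the client needs no extra round trip to create
or destroy a permissions object on the server, which appears to be the
motivation for the change.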
This change is created in the same spirit as bb1fa818.
When building the tree with clang-21, the following warning was raised:
```
/home/kefu/dev/ceph/src/libcephfs_proxy/proxy_async.c:43:9: warning: arithmetic on a pointer to void is a GNU extension [-Wgnu-pointer-arith]
43 | data += iov->iov_len;
| ~~~~ ^
1 warning generated.
```
This change should address the warning by casting the `void*` pointer to
a `char*` pointer before performing arithmetic on it.
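
The kind of fix described above can be sketched as follows. This is a
minimal illustration, not the actual proxy_async.c code: the helper name
`gather_iov` and its parameters are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: copy `count` iovec segments into the contiguous
 * buffer `dst` of size `dst_len`. Returns the number of bytes copied. */
size_t gather_iov(void *dst, size_t dst_len,
                  const struct iovec *iov, int count)
{
    assert(dst != NULL);

    /* Arithmetic on void * is a GNU extension; arithmetic on char * is
     * standard C, so cast once and advance the char * instead. */
    char *out = (char *)dst;
    size_t total = 0;

    for (int i = 0; i < count; i++) {
        if (iov[i].iov_len > dst_len - total) {
            break; /* stop rather than overflow the destination */
        }
        memcpy(out + total, iov[i].iov_base, iov[i].iov_len);
        total += iov[i].iov_len;
    }
    return total;
}
```

Building a file like this with `-Wgnu-pointer-arith` should produce no
warning, since no arithmetic is performed on a `void*`.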
Zac Dover [Sat, 9 Aug 2025 05:53:59 +0000 (15:53 +1000)]
doc/cephfs: edit troubleshooting.rst
Edit the section "RADOS Health" in the file
doc/cephfs/troubleshooting.rst. Add a Sphinx directive to the
doc/rados/troubleshooting/index.rst file that directs to the index of
the RADOS troubleshooting documentation.
qa/suites/krbd: use a standard fixed-1 cluster in unmap subsuite
A custom "fixed-1, but with the client on a separate node" cluster was
needed only for pre-single-major.yaml kernel which is no longer around.
This can be a single-node job now -- see commits 311a450163cf
("krbd/unmap: put client.0 on a separate remote") and 39a579144cd8
("qa/suites/krbd: drop pre-single-major test").
Bill Scales [Fri, 1 Aug 2025 15:17:58 +0000 (16:17 +0100)]
doc: erasure coding enhancements for tentacle
* Document new pool flag allow_ec_optimizations
* Reference new conf setting osd_pool_default_flag_ec_optimizations
* Add section describing Erasure Code Optimizations
Zac Dover [Thu, 7 Aug 2025 05:03:22 +0000 (15:03 +1000)]
doc/cephfs: edit troubleshooting.rst
Follow up on comments made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/64832 and make other small changes to
increase the ease of reading this text.
Ronen Friedman [Wed, 6 Aug 2025 05:38:07 +0000 (00:38 -0500)]
osd/scrub: do not limit operator-initiated repairs
'auto-repair' scrubs are limited to a maximum of
'scrub_auto_repair_num_errors' damaged objects.
However, operator-initiated repairs should not be limited
by that number. Alas, a bug in a previous commit
(97de817ad1c253ee1c7c9c9302981ad2435301b9) modified the
code in such a way that it applied the
'scrub_auto_repair_num_errors' limit to all repairs,
including operator-initiated ones. This commit fixes that.
Zac Dover [Tue, 5 Aug 2025 11:24:41 +0000 (21:24 +1000)]
doc/cephfs: edit troubleshooting.rst
Edit "Stuck in up:replay" under the "Stuck During Recovery" section of
doc/cephfs/troubleshooting.rst. I had planned to edit the entire "Stuck
During Recovery" section in a single commit, but I think that the
material is too involved for that.
Naman Munet [Tue, 22 Jul 2025 17:08:42 +0000 (22:38 +0530)]
mgr/dashboard: user accounts enhancements
fixes: https://tracker.ceph.com/issues/72072
PR covers:
1) Displaying the account name instead of the account ID in the bucket list page and bucket edit form for account-owned buckets
2) Non-root account users can now be assigned managed policies that allow them to perform operations
3) The root user indication is moved next to the username in the users list, with a new icon, rather than being shown on the Account Name
Nitzan Mordechai [Thu, 19 Jun 2025 08:54:43 +0000 (08:54 +0000)]
monitor: Enhance historic ops command output and error handling
Dumping monitor historic operations currently yields no results
and incorrectly issues an error message indicating that
"mon_enable_op_tracker" is not enabled, even when it should be.
This commit addresses these issues by:
- Adding previously missing commands for historic operations.
- Correcting the dump operations check to only issue an error when
"mon_enable_op_tracker" is genuinely not enabled.
- Tracking "mon_enable_op_tracker" changes.
- Refactoring and organizing the historic operations dump command code.
- Improving the appearance and clarity of error messages.
test/rbd-mirror: eliminate a race in ResyncRequestedRemoteNotPrimary
Adjust the wait_for_notification call in
TestMockImageReplayerSnapshotReplayer.ResyncRequestedRemoteNotPrimary
to expect 2 notifications instead of 1. This allows the test to
correctly wait for both expected events, i.e., finish_sync() and
handle_replay_complete(locker, -EREMOTEIO, "remote image demoted"),
ensuring that the replayer transitions to STATE_COMPLETE and that
is_replaying() returns false as intended.
Fixes: https://tracker.ceph.com/issues/72325
Signed-off-by: VinayBhaskar-V <vvarada@redhat.com>
(cherry picked from commit b5a013f6170bb4445da8f5469243e4869b760a81)
Alex Ainscow [Mon, 12 May 2025 17:30:02 +0000 (18:30 +0100)]
interval_map: non_const iterator
The interval_map code cannot cope with iterators which change the size
of an interval, so it uses const iterators. However, many other
modifications to intervals ARE ok, and more efficient, nicer-looking
code can be written with non-const iterators.
This PR adds non-const iterators, but also adds some policing that the
size of the bufferlist has not changed over the interval.
Everything is hidden behind a template, as this changes the behaviour
of interval_map in a way that we don't want to use without careful
testing of each instance.
Alex Ainscow [Wed, 11 Jun 2025 15:24:12 +0000 (16:24 +0100)]
osd: Remove all references to hinfo from optimized EC
Legacy EC used hinfo to store two things:
1. Shard size
2. CRCs of the shards
However:
* Optimized EC stores different object sizes on each shard
* Optimized EC scrub calculates the correct sizes of shards and checks them, so shard size checks are not needed in hinfo.
* Bluestore checks the CRC.
* Seastore checks the CRC.
As such, the hinfo object is redundant, so we remove it in
optimized EC:
1. Remove all references/upgrades to hinfo.
2. Delete hinfo attribute if found on recovery/backfill.
3. Redirect all scrub references for hinfo to legacy EC.
Incorporate into doc/cephfs/ceph-dokan.rst the suggestions made by
Anthony D'Atri in https://github.com/ceph/ceph/pull/64737, and make a
few other small improvements to the English language in that file.