Bill Scales [Fri, 5 Dec 2025 09:39:41 +0000 (09:39 +0000)]
DO NOT MERGE: osd: Tweaks to PeeringState
Some tweeaks to PeeringState to refine this fix to avoid updating last_update on
stray shards outside of the acting set because of pwlc. This was causing
problems when a shard had last_complete != last_update because it was just given
an info update so could not update its missing list. If the shard then later
becomes part of the acting set again it causes problems.
MERGE WITH PREVIOUS COMMIT AFTER TESTING
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
Bill Scales [Wed, 1 Oct 2025 14:52:23 +0000 (15:52 +0100)]
osd: Optimized EC missing list not updated on recovering shard
Shards that are recovering (last_complete != last_update) that
skip transactions and log entries because of partial writes are
using pwlc to advance last_update. However simply incrementing
last_update is not sufficient - there are scenarios where the
needed version of a missing object has to be updated.
If the shard is already missing object X at version V1 and there was
a partial write at V2 that did not update the shard, it does not need
to retain the log entry, but it does need to update the missing list
to say it needs V2 rather than V1. This ensures all shards report
a need for an object at the same version and avoids an assert in
MissingLoc::add_active_missing when the primary is trying to
combine the missing lists from all the shards to work out what has
to be recovered. Avoiding applying pwlc during the early phase
of the peering process ensures the missing list gets updated.
However if a shard is not missing object X and there was a partial
write at V2 that did not update the shard then at the end of peering
it is still necessary to advance last_upadte by applying pwlc. This
ensures that in later peering cycles the code does not change its
mind and think the shard is now missing object X.
The fix is to be more sophisticated about when pwlc can be used
to advance last_update for a recovering shard. The code now
passes in a parameter indicating whether we are in the early
(pre activate) or later phase of peering. This also means that
additional calls to apply_pwlc are needed when peering gets to
activating and is searching for missing to make updates that were
not made earlier.
Fixes: https://tracker.ceph.com/issues/73249 Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
errors in do_transaction_no_callbacks fail with vague abort message
showing only the transaction dump.
Instead, add meaningful abort message with the actual error and method.
This could be replaced by assert_all_func, though we would likely
rewrite this method soon.
Afreen Misbah [Tue, 21 Oct 2025 16:37:46 +0000 (22:07 +0530)]
mgr/dashboard: Carbonize the Change Password Form
Fixes https://tracker.ceph.com/issues/73193
- using carbon based stylings, typography and components
- used grid layout for form arrangement
- breadcrumb is slightly off, which needs to be fixed by applying grid layout to the app shell
Kefu Chai [Sat, 22 Nov 2025 00:24:36 +0000 (08:24 +0800)]
qa/suites/rados/encoder: exclude ceph-osd-* when installing LTS releases
In a37b5b5, the ceph-osd-classic and ceph-osd-crimson packages were
added to qa/packages/packages.yaml. The "install" task uses this file as
the default package list for all branches, including LTS releases like
Reef.
However, a37b5b5 only exists in the main branch and won't be backported
to LTS branches. This causes installation failures in the rados/encoder
test suite, which verifies forward compatibility by installing LTS
releases and testing whether they can decode the latest corpus.
Exclude ceph-osd-classic and ceph-osd-crimson from LTS installations to
ensure the test suite can successfully install ceph-dencoder, which is
required for the interoperability tests.
Afreen Misbah [Wed, 19 Nov 2025 20:03:26 +0000 (01:33 +0530)]
monitoring: Fixes for development
- fixes tox.ini using and undefined env - `grafonnet-check`( instead of `jsonnet-check`)
- adds steps for local development of mixins and building jsonnet
- added help command in Makefile
- added comments and descriptions for Makefile and tox.ini
Shraddha Agrawal [Mon, 17 Nov 2025 19:50:44 +0000 (01:20 +0530)]
qa/clusters/crimson: increase reactors in fixed-1 cluster
Issue: Various different tests were failing randomly due to slow
ops. There was no common ground between them, it was happening
across differnet object stores (seastore and bluestore) and
across different tests.
Cause: Since this is happening quite randomly, this is likely
happening due to low reactor count.
Solution: We are opting the solution to increase reactors used
for testing. I've increased them to 3 from the initial 2 value.
Matan Breizman [Sun, 16 Nov 2025 12:52:05 +0000 (12:52 +0000)]
qa/suites: exclude ceph-osd-classic
a37b5b5bde8c2e8d6890f16b31046119ed55f25d added ceph-osd-classic
package.
old-clients and upgrade tests should not try to install the new package
as it is not available in older releases.
Anoop C S [Tue, 21 Oct 2025 08:53:50 +0000 (14:23 +0530)]
mgr/smb: Disable posix locking in share definition
The prerequisites for supporting durable handles[1] in Samba include
disabling the mapping of POSIX locks, as well as setting the `kernel
oplocks` and `kernel sharemodes` parameters to disabled. Currently
this configuration is hard‑coded, but in the future it could be made
conditional and combined with other settings to enable persistent
handles on continuously available shares.