git.apps.os.sepia.ceph.com Git

]> git.apps.os.sepia.ceph.com Git - ceph.git/log

projects / ceph.git / log

commit | commitdiff | tree

Yuri Weinstein [Tue, 23 Sep 2025 16:31:37 +0000 (09:31 -0700)]

qa/tests: added messages to the whitelist

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>

commit | commitdiff | tree

Adam King [Tue, 23 Sep 2025 12:56:14 +0000 (08:56 -0400)]

Merge pull request #65635 from adk3798/tentacle-cephadm-pin-cheroot

tentacle: pybind/mgr: pin cheroot version in requirements-required.txt

Reviewed-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

David Galloway [Tue, 23 Sep 2025 02:27:32 +0000 (19:27 -0700)]

Merge pull request #65628 from phlogistonjohn/jjm-t-65514

tentacle: build-with-container: add argument groups to organize options

commit | commitdiff | tree

Adam King [Mon, 22 Sep 2025 21:05:07 +0000 (17:05 -0400)]

pybind/mgr: pin cheroot version in requirements-required.txt

With python 3.10 (didn't seem to happen with python 3.12) the
pybind/mgr/cephadm/tests/test_node_proxy.py test times out.
This appears to be related to a new release of the cheroot
package and a github issues describing the same problem
we're seeing has been opened by another user
https://github.com/cherrypy/cheroot/issues/769

It is worth noting that the workaround described in that
issue does also work for us. If you add

```
import cheroot
cheroot.server.HTTPServer._serve_unservicable = lambda: None
```

after the existing imports in test_node_proxy.py the
test hanging issue also disappears. Also worth noting the
particular pin of

cheroot~=10.0

was chosen as it matches the existing pin being used
in pybind/mgr/dashboard/constraints.txt

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 6231955b5d00ae6b3630ee94e85b2449092ef0fe)

commit | commitdiff | tree

Yuri Weinstein [Mon, 22 Sep 2025 18:16:14 +0000 (11:16 -0700)]

Merge pull request #65485 from tobias-urdin/tentacle-rgw-admin-bucket-pagination

tentacle: rgw/admin: Add max-entries and marker to bucket list

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

commit | commitdiff | tree

Yuri Weinstein [Mon, 22 Sep 2025 18:15:25 +0000 (11:15 -0700)]

Merge pull request #65488 from BBoozmen/wip-72970-tentacle

tentacle: RGW: multi object delete op; skip olh update for all deletes but the last one

Reviewed-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Adam King [Mon, 22 Sep 2025 15:08:56 +0000 (11:08 -0400)]

Merge pull request #65594 from adk3798/tentacle-cephadm-nvmeof-stray

tentacle: mgr/cephadm: don't mark nvmeof daemons without pool and group in name as stray

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

John Mulligan [Fri, 12 Sep 2025 17:52:25 +0000 (13:52 -0400)]

build-with-container: add argument groups to organize options

Use the argparse add_argument_group feature to organize the mass of
arguments into more sensible categories. Hopefully, someone reading
over the `--help` output can now more easily see options that
are useful rather than being overwhelmed by a wall of text.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 71a1be4dd0aea004da56c2f518ee70a281a3f7d3)

commit | commitdiff | tree

Anthony D'Atri [Mon, 22 Sep 2025 13:27:24 +0000 (09:27 -0400)]

Merge pull request #65617 from spuiuk/tentacle-doc-provider

tentacle: doc/mgr/smb: document the 'provider' option for smb share

commit | commitdiff | tree

Jos Collin [Mon, 22 Sep 2025 13:03:50 +0000 (18:33 +0530)]

Merge pull request #65259 from joscollin/wip-72284-tentacle

tentacle: mds: wrong snap check for directory with parent snaps

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Dhairya Parmar [Mon, 22 Sep 2025 11:19:07 +0000 (16:49 +0530)]

Merge pull request #64885 from vshankar/wip-72391-tentacle

tentacle: mds/MDSDaemon: unlock `mds_lock` while shutting down Beacon and others

commit | commitdiff | tree

Dhairya Parmar [Mon, 22 Sep 2025 10:59:30 +0000 (16:29 +0530)]

Merge pull request #64888 from vshankar/wip-72285-tentacle

tentacle: qa/suites/upgrade: add "Replacing daemon mds" to ignorelist

commit | commitdiff | tree

Dhairya Parmar [Mon, 22 Sep 2025 10:54:35 +0000 (16:24 +0530)]

Merge pull request #64953 from batrick/wip-72514-tentacle

tentacle: mds: skip charmap handler check for MDS requests

commit | commitdiff | tree

Dhairya Parmar [Mon, 22 Sep 2025 10:46:21 +0000 (16:16 +0530)]

Merge pull request #65132 from chrisphoffman/wip-72644-tentacle

tentacle: client: use path supplied in statfs

commit | commitdiff | tree

Dhairya Parmar [Mon, 22 Sep 2025 10:19:41 +0000 (15:49 +0530)]

Merge pull request #65163 from joscollin/wip-72153-tentacle

tentacle: mds: dump export_ephemeral_random_pin as double

commit | commitdiff | tree

Dhairya Parmar [Mon, 22 Sep 2025 10:15:00 +0000 (15:45 +0530)]

Merge pull request #64650 from rishabh-d-dave/wip-72201-tentacle

tentacle: mgr/vol: keep and show clone source info

commit | commitdiff | tree

Jos Collin [Mon, 22 Sep 2025 09:09:43 +0000 (14:39 +0530)]

Merge pull request #65346 from joscollin/wip-72803-tentacle

tentacle: mds: Fix readdir when osd is full.

Reviewed-by: Kotresh HR <khiremat@redhat.com>

commit | commitdiff | tree

Sachin Prabhu [Thu, 1 May 2025 10:59:54 +0000 (11:59 +0100)]

doc/mgr/smb: document the 'provider' option for smb share

Signed-off-by: Sachin Prabhu <sp@spui.uk>
(cherry picked from commit 742659b18a21cd8ccc36a0f0a53bea265a13a541)
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>

commit | commitdiff | tree

Jos Collin [Mon, 22 Sep 2025 08:20:20 +0000 (13:50 +0530)]

Merge pull request #65564 from xhernandez/wip-73075-tentacle

tentacle: Add normalization and casesensitive options to the subvolume group creation command

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>

commit | commitdiff | tree

Jos Collin [Mon, 22 Sep 2025 06:17:16 +0000 (11:47 +0530)]

Merge pull request #65262 from joscollin/wip-71831-tentacle

tentacle: mgr/volumes: Keep mon caps if auth key has remaining mds/osd caps

Reviewed-by: Kotresh HR <khiremat@redhat.com>

commit | commitdiff | tree

SrinivasaBharathKanta [Sat, 20 Sep 2025 12:27:53 +0000 (17:57 +0530)]

Merge pull request #65540 from NitzanMordhai/wip-72996-tentacle

tentacle: qa/workunits/rados: remove cache tier test

commit | commitdiff | tree

SrinivasaBharathKanta [Sat, 20 Sep 2025 12:27:34 +0000 (17:57 +0530)]

Merge pull request #65369 from Naveenaidu/wip-72819-tentacle

tentacle: qa/suites/rados/thrash-old-clients: Add OSD warnings to ignore list

commit | commitdiff | tree

Yuri Weinstein [Fri, 19 Sep 2025 22:43:31 +0000 (15:43 -0700)]

Merge pull request #65213 from ifed01/wip-ifed-discard-threads-better-lifecycle-tent

tentacle: blk/kernel: improve DiscardThread life cycle.

Reviewed-by: YiteGu <yitegu0@gmail.com>

commit | commitdiff | tree

Nizamudeen A [Fri, 19 Sep 2025 03:15:38 +0000 (08:45 +0530)]

Merge pull request #65560 from rhcs-dashboard/wip-73063-tentacle

tentacle: mgr/dashboard: fix missing schedule interval in rbd API

commit | commitdiff | tree

Adam King [Wed, 7 May 2025 20:02:56 +0000 (16:02 -0400)]

mgr/cephadm: don't mark nvmeof daemons without pool and group in name as stray

Cephadm's naming of these daemons always includes the pool and
group name associated with the nvmeof service. Nvmeof recently
has started to register with the cluster using names that
don't include that, resulting in warnings likes

```
[WRN] CEPHADM_STRAY_DAEMON: 1 stray daemon(s) not managed by cephadm
    stray daemon nvmeof.vm-01.hwwhfc on host vm-01 not managed by cephadm
```

where cephadm knew that nvmeof daemon as

```
[ceph: root@vm-00 /]# ceph orch ps --daemon-type nvmeof
NAME                            HOST   PORTS                   STATUS   REFRESHED  AGE  MEM USE  MEM LIM  VERSION    IMAGE ID
nvmeof.foo.group1.vm-01.hwwhfc  vm-01  *:5500,4420,8009,10008  stopped     5m ago  25m        -        -  <unknown>  <unknown>
```

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 695680876eb8af0891e3776888b6361dc8728c86)

commit | commitdiff | tree

Yuri Weinstein [Thu, 18 Sep 2025 19:24:12 +0000 (12:24 -0700)]

Merge pull request #65570 from shraddhaag/wip-shraddhaag-availability-default-tentacle

tentacle: options/mon: disable availability tracking by default

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

commit | commitdiff | tree

Yuri Weinstein [Thu, 18 Sep 2025 19:21:25 +0000 (12:21 -0700)]

Merge pull request #65562 from rzarzynski/ec_fixpack3_pr-tentacle

tentacle: EC fixpack 3 (with depedencies)

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Alex Ainscow <aainscow@uk.ibm.com>

commit | commitdiff | tree

Shraddha Agrawal [Tue, 16 Sep 2025 13:52:27 +0000 (19:22 +0530)]

options/mon: disable availability tracking by default

Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit ef7effaa33bd6b936d7433e668d36f80ed7bee65)

commit | commitdiff | tree

Shraddha Agrawal [Wed, 17 Sep 2025 20:15:51 +0000 (01:45 +0530)]

Merge pull request #65520 from shraddhaag/wip-73013-tentacle

tentacle: mon: add config option to change availability score update interval

commit | commitdiff | tree

Yuri Weinstein [Wed, 17 Sep 2025 20:11:39 +0000 (13:11 -0700)]

Merge pull request #65218 from cbodley/wip-72714-tentacle

tentacle: rgw/s3: remove 'aws-chunked' from Content-Encoding response

Reviewed-by: Adam Emerson <aemerson@redhat.com>

commit | commitdiff | tree

Yuri Weinstein [Wed, 17 Sep 2025 20:09:17 +0000 (13:09 -0700)]

Merge pull request #65543 from leonidc/wip-73048-tentacle

tentacle: nvmeofgw:

Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com>

commit | commitdiff | tree

Yuri Weinstein [Wed, 17 Sep 2025 20:08:39 +0000 (13:08 -0700)]

Merge pull request #65542 from leonidc/wip-73045-tentacle

tentacle: nvmeofgw: cleanup pending map upon monitor restart

Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com>

commit | commitdiff | tree

Adam King [Wed, 17 Sep 2025 13:18:28 +0000 (09:18 -0400)]

Merge pull request #64610 from avanthakkar/wip-72209-tentacle

tentacle: mgr/prometheus: add smb_metadata metric

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Adam King [Wed, 17 Sep 2025 13:05:24 +0000 (09:05 -0400)]

Merge pull request #64895 from adk3798/tentacle=cephadm-limit-list-server-calls

tentacle: mgr/cephadm: limit calls to list_servers

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>
Reviewed-by: Kushal Deb <Kushal.Deb@ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>

commit | commitdiff | tree

Jon Bailey [Wed, 20 Aug 2025 10:11:09 +0000 (11:11 +0100)]

osd: Reduce the amount of status invalidations when rolling shards forwards during peering

Currently stats invalidations happen during peering when rolling forward shards.
We can reduce this so we only invalidate the stats when we don't have any other shards at the version we want to roll the stats forwards to.
In the cases where we have a shard with the stats at the correct version, we use those stats instead of invalidating.
If we do not have any shards with the correct version of stats, we do the invalidate as before.

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
(cherry picked from commit b5cad2694569b7f0eef173f87a7eecb2ddd6b27e)

commit | commitdiff | tree

Bill Scales [Wed, 27 Aug 2025 13:44:08 +0000 (14:44 +0100)]

osd: Optimized EC incorrectly rolled backwards write

A bug in choose_acting in this scenario:

* Current primary shard has been absent so has missed the latest few writes
* All the recent writes are partial writes that have not updated shard X
* All the recent writes have completed

The authorative shard is chosen from the set of primary-capable shards
that have the highest last epoch started, these have all got log entries
for the recent writes.

The get log shard is chosen from the set of shards that have the highest
last epoch started, this chooses shard X because its furthest behind

The primary shard last update is not less than get log shard last
update so this if statement decides that it has a good enough log:

if ((repeat_getlog != nullptr) &&
    get_log_shard != all_info.end() &&
    (info.last_update < get_log_shard->second.last_update) &&
    pool.info.is_nonprimary_shard(get_log_shard->first.shard)) {

We then proceed through peering using the primary log and the
log from shard X. Neither have details about the recent writes
which are then incorrectly rolled back.

The if statement should be looking at last_update for the
authorative shard rather than the get_log_shard, the code
would then realize that it needs to get the log from the
authorative shard first and then have a second pass
where it gets the log from the get log shard.

Peering would then have information about the partial writes
(obtained from the authorative shards log) and could correctly
roll these writes forward by deducing that the get_log_shard
didn't have these log entries because they were partial writes.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit ac4e0926bbac4ee4d8e33110b8a434495d730770)

commit | commitdiff | tree

Alex Ainscow [Tue, 12 Aug 2025 16:12:45 +0000 (17:12 +0100)]

osd: Clear zero_for_decode for shards where read failed on recovery

Not clearing this can lead to a failed decode, which panics, rather than
a recovery or IO failure.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 6365803275b1b6a142200cc2db9735d48c86ae03)

commit | commitdiff | tree

Alex Ainscow [Fri, 8 Aug 2025 15:20:32 +0000 (16:20 +0100)]

osd: Reduce buffer-printing debug strings to debug level 30

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
# Conflicts:
# src/osd/ECBackend.cc
(cherry picked from commit b4ab3b1dcef59a19c67bb3b9e3f90dfa09c4f30b)

commit | commitdiff | tree

Alex Ainscow [Fri, 8 Aug 2025 09:25:53 +0000 (10:25 +0100)]

osd: Fix segfault in EC debug string

The old debug_string implementation was potentially reading up to 3
bytes off the end of an array. It was also doing lots of unnecessary
bufferlist reconstructs. This refactor of this function fixes both
issues.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit da3ccdf4d03e40b747f8876449199102e53e00ce)

commit | commitdiff | tree

Bill Scales [Fri, 8 Aug 2025 08:58:14 +0000 (09:58 +0100)]

osd: Optimized EC backfill interval has wrong versions

Bug in the optimized EC code creating the backfill
interval on the primary. It is creating a map with
the object version for each backfilling shard. When
there are multiple backfill targets the code was
overwriting oi.version with the version
for a shard that has had partial writes which
can result in the object not being backfilled.

Can manifest as a data integirty issue, scrub
error or snapshot corruption.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit acca514f9a3d0995b7329f4577f6881ba093a429)

commit | commitdiff | tree

Bill Scales [Mon, 4 Aug 2025 15:24:41 +0000 (16:24 +0100)]

osd: Optimized EC choose_acting needs to use best primary shard

There have been a couple of corner case bugs with choose_acting
with optimized EC pools in the scenario where a new primary
with no existing log is choosen and find_best_info selects
a non-primary shard as the authorative shard.

Non-primary shards don't have a full log so in this scenario
we need to get the log from a shard that does have a complete
log first (so our log is ahead or eqivalent to authorative shard)
and then repeat the get log for the authorative shard.

Problems arise if we make different decisions about the acting
set and backfill/recovery based on these two different shards.
In one bug we osicillated between two different primaries
because one primary used one shard to making peering decisions
and the other primary used the other shard, resulting in
looping flip/flop changes to the acting_set.

In another bug we used one shard to decide that we could do
async recovery but then tried to get the log from another
shard and asserted because we didn't have enough history in
the log to do recovery and should have choosen to do a backfill.

This change makes optimized EC pools always choose the
best !non_primary shard when making decisions about peering
(irrespective of whether the primary has a full log or not).
The best overall shard is now only used for get log when
deciding how far to rollback the log.

It also sets repeat_getlog to false if peering fails because
the PG is incomplete to avoid looping forever trying to get
the log.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit f3f45c2ef3e3dd7c7f556b286be21bd5a7620ef7)

commit | commitdiff | tree

Alex Ainscow [Fri, 1 Aug 2025 14:09:58 +0000 (15:09 +0100)]

osd: Do not sent PDWs if read count > k

The main point of PDW (as currently implemented) is to reduce the amount
of reading performed by the primary when preparing for a read-modify-write (RMW).

It was making the assumption that if any recovery was required by a
conventional RMW, then a PDW is always better. This was an incorrect assumption
as a conventional RMW performs at most K reads for any plugin which
supports PDW. As such, we tweak this logic to perform a conventional RMW
if the PDW is going to read k or more shards.

This should improve performance in some minor areas.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit cffd10f3cc82e0aef29209e6e823b92bdb0291ce)

commit | commitdiff | tree

Alex Ainscow [Wed, 18 Jun 2025 19:46:49 +0000 (20:46 +0100)]

osd: Fix decode for some extent cache reads.

The extent cache in EC can cause the backend to perform some surprising reads. Some
of the patterns were discovered in test that caused the decode to attempt to
decode more data than was anticipated during the read planning, leading to an
assert. This simple fix reduces the scope of the decode to the minimum.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 2ab45a22397112916bbcdb82adb85f99599e03c0)

commit | commitdiff | tree

Bill Scales [Fri, 1 Aug 2025 10:48:18 +0000 (11:48 +0100)]

osd: Optimized EC calculate_maxles_and_minlua needs to use ...
exclude_nonprimary_shards

When an optimized EC pool is searching for the best shard that
isn't a non-primary shard then the calculation for maxles and
minlua needs to exclude nonprimary-shards

This bug was seen in a test run where activating a PG was
interrupted by a new epoch and only a couple of non-primary
shards became active and updated les. In the next epoch
a new primary (without log) failed to find a shard that
wasn't non-primary with the latest les. The les of
non-primary shards should be ignored when looking for
an appropriate shard to get the full log from.

This is safe because an epoch cannot start I/O without
at least K shards that have updated les, and there
are always K-1 non-primary shards. If I/O has started
then we will find the latest les even if we skip
non-primary shards. If I/O has not started then the
latest les ignoring non-primary shards is the
last epoch in which I/O was started and has a good
enough log+missing list.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 72d55eec85afa4c00fac8dc18a1fb49751e61985)

commit | commitdiff | tree

Bill Scales [Fri, 1 Aug 2025 09:39:16 +0000 (10:39 +0100)]

osd: Optimized EC choose_async_recovery_ec must use auth_shard

Optimized EC pools modify how GetLog and choose_acting work,
if the auth_shard is a non-primary shard and the (new) primary
is behind the auth_shard then we cannot just get the log from
the non-primary shard because it will be missing entries for
partial writes. Instead we need to get the log from a shard
that has the full log first and then repeat GetLog to get
the log from the auth_shard.

choose_acting was modifying auth_shard in the case where
we need to get the log from another shard first. This is
wrong - the remainder of the logic in choose_acting and
in particular choose_async_recovery_ec needs to use the
auth_shard to calculate what the acting set will be.
Using a different shard occasional can cause a
different acting set to be selected (because of
thresholds about the number of log entries behind
a shard needs to be to perform async recovery) and
this can lead to two shards flip/flopping with
different opinions about what the acting set should be.

Fix is to separate out which shard will be returned
to GetLog from the auth_shard which will be used
for acting set calculations.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 3c2161ee7350a05e0d81a23ce24cd0712dfef5fb)

commit | commitdiff | tree

Bill Scales [Fri, 1 Aug 2025 09:22:47 +0000 (10:22 +0100)]

osd: Optimized EC don't try to trim past crt

If there is an exceptionally long sequence of partial writes
that did not update a shard that is followed by a full write
then it is possible that the log trim point is ahead of the
previous write to the shard (and hence crt). We cannot trim
beyond crt. In this scenario its fine to limit the trim to crt
because the shard doesn't have any of the log entries for the
partial writes so there is nothing more to trim.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 645cdf9f61e79764eca019f58a4d9c6b51768c81)

commit | commitdiff | tree

Bill Scales [Fri, 1 Aug 2025 08:56:23 +0000 (09:56 +0100)]

osd: Optimized EC missing call to apply_pwlc after updating pwlc

update_peer_info was updating pwlc with a newer version received
from another shard, but failed to update the peer_info's to
reflect the new pwlc by calling apply_pwlc.

Scenario was primary receiving an update from shard X which had
newer information about shard Y. The code was calling apply_pwlc
for shard X but not for shard Y.

The fix simplifies the logic in update_peer_info - if we are
the primary update all peer_info's that have pwlc. If we
are a non-primary and there is pwlc then update info.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d19f3a3bcbb848e530e4d31cbfe195973fa9a144)

commit | commitdiff | tree

Bill Scales [Wed, 30 Jul 2025 11:44:10 +0000 (12:44 +0100)]

osd: Optimized EC don't apply pwlc for divergent writes

Split pwlc epoch into a separate variable so that we
can use epoch and version number when comparing if
last_update is within a pwlc range. This ensures that
pwlc is not applied to a shard that has a divergent
write, but still tracks the most recent update of pwlc.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d634f824f229677aa6df7dded57352f7a59f3597)

commit | commitdiff | tree

Bill Scales [Wed, 30 Jul 2025 11:41:34 +0000 (12:41 +0100)]

osd: Optimized EC present_shards no longer needed

present_shards is no longer needed in the PG log entry, this has been
replaced with code in proc_master_log that calculates which shards were
in the last epoch started and are still present.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 880a17e39626d99a0b6cc8259523daa83c72802c)

commit | commitdiff | tree

Bill Scales [Mon, 28 Jul 2025 08:26:36 +0000 (09:26 +0100)]

osd: Optimized EC proc_master_log fix roll-forward logic when shard is absent

Fix bug in optimized EC code where proc_master_log incorrectly did not
roll forward a write if one of the written shards is missing in the current
epoch and there is a stray version of that shard that did not receive the
write.

As long as the currently present shards that participated in les and were
updated by a write have the update then the write should be rolled-forward.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit e0e8117769a8b30b2856f940ab9fc00ad1e04f63)

commit | commitdiff | tree

Bill Scales [Mon, 28 Jul 2025 08:21:54 +0000 (09:21 +0100)]

osd: Refactor find_best_info and choose_acting

Refactor find_best_info to have separate function to calculate
maxles and minlua. The refactor makes history_les_bound
optional, tidy up the choose_acting interface removing this
where it is not used.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit f1826fdbf136dc7c96756f0fb8a047c9d9dda82a)

commit | commitdiff | tree

Bill Scales [Thu, 17 Jul 2025 18:17:27 +0000 (19:17 +0100)]

osd: EC Optimizations proc_master_log boundary case bug fixes

Fix a couple of bugs in proc_master_log for optimized EC
pools dealing with boundary conditions such as an empty
log and merging two logs that diverge from the very first
entry.

Refactor the code to handle the boundary conditions and
neaten up the code.

Predicate the code block with if (pool.info.allows_ecoptimizations())
to make it clear this code path is only for optimized EC pools.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 1b44fd9991f5f46b969911440363563ddfad94ad)

commit | commitdiff | tree

Jon Bailey [Fri, 25 Jul 2025 13:16:35 +0000 (14:16 +0100)]

osd: Invalidate stats during peering if we are rolling a shard forwards.

This change will mean we always recalculate stats upon rolling stats forwards. This prevent the situation where we end up with incorrect statistics due to where we always take the stats of the oldest shard during peering; causing outdated pg stats being applied for cases where the oldest shards are shards that don't see partial writes where num_bytes has changed on other places after that point on that shard.

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
(cherry picked from commit b178ce476f4a5b2bb0743e36d78f3a6e23ad5506)

commit | commitdiff | tree

Radoslaw Zarzynski [Wed, 21 May 2025 16:33:15 +0000 (16:33 +0000)]

osd: ECTransaction.h includes OSDMap.h

Needed for crimson.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 6dd393e37f6afb9063c4bed3e573557bd0efb6bd)

commit | commitdiff | tree

Radoslaw Zarzynski [Mon, 21 Apr 2025 08:49:55 +0000 (08:49 +0000)]

osd: bypass messenger for local EC reads

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit b07d1f67625c8b621b2ebf5a7f744c588cae99d3)

commit | commitdiff | tree

Radoslaw Zarzynski [Fri, 18 Jul 2025 10:35:09 +0000 (10:35 +0000)]

osd: fix buildability after get_write_plan() shuffling

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 7f4cb19251345849736e83bd0c7cc15ccdcdf48b)

commit | commitdiff | tree

Radoslaw Zarzynski [Sun, 11 May 2025 10:40:55 +0000 (10:40 +0000)]

osd: just shuffle get_write_plan() from ECBackend to ECCommon

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 9d5bf623537b8ee29e000504d752ace1c05964d7)

commit | commitdiff | tree

Radoslaw Zarzynski [Sun, 11 May 2025 09:20:29 +0000 (09:20 +0000)]

osd: prepare get_write_plan() for moving from ECBackend to ECCommon

For the sake of sharing with crimson.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit dc5b0910a500363b62cfda8be44b4bed634f9cd6)

commit | commitdiff | tree

Radoslaw Zarzynski [Sun, 11 May 2025 06:51:23 +0000 (06:51 +0000)]

osd: separate producing EC's WritePlan out into a dedicated method

For the sake of sharing with crimson in next commits.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit e06c0c6dd08fd6d2418a189532171553d63a9deb)

commit | commitdiff | tree

Radoslaw Zarzynski [Wed, 23 Apr 2025 11:42:00 +0000 (11:42 +0000)]

osd: fix unused variable warning in ClientReadCompleter

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit eb3a3bb3a70e6674f6e23a88dd1b2b86551efda2)

commit | commitdiff | tree

Radoslaw Zarzynski [Thu, 9 May 2024 21:00:05 +0000 (21:00 +0000)]

osd: shuffle ECCommon::RecoveryBackend from ECBackend.cc to ECCommon.cc

It's just code movement; there is no changes apart that.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit ef644c9d29b8adaef228a20fc96830724d1fc3f5)

commit | commitdiff | tree

Radoslaw Zarzynski [Thu, 9 May 2024 20:32:32 +0000 (20:32 +0000)]

osd: drop junky `#if 1` in recovery backend

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit d43bded4a02532c4612d53fc4418db8e4e829c3f)

commit | commitdiff | tree

Radoslaw Zarzynski [Thu, 9 May 2024 19:11:14 +0000 (19:11 +0000)]

osd: move ECCommon::RecoveryBackend from ECBackend.cc to ECCommon.cc

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit debd035a650768ead64e0707028bb862f4767bef)

commit | commitdiff | tree

Radoslaw Zarzynski [Thu, 9 May 2024 19:09:50 +0000 (19:09 +0000)]

osd: replace get_obc() with maybe_load_obc() in EC recovery

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 266773625f997ff6a1fda82b201e023948a5c081)

commit | commitdiff | tree

Radoslaw Zarzynski [Thu, 9 May 2024 19:07:32 +0000 (19:07 +0000)]

osd: abstract sending MOSDPGPush during EC recovery

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 1d54eaff41ec8d880bcf9149e4c71114e0ffdc09)

commit | commitdiff | tree

Radoslaw Zarzynski [Tue, 26 Mar 2024 14:28:16 +0000 (14:28 +0000)]

osd: prepare ECCommon::RecoveryBackend for shuffling to ECCommon.cc

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit e3ade5167d3671524eb372a028157f2a46e7a219)

commit | commitdiff | tree

Radoslaw Zarzynski [Tue, 26 Mar 2024 14:20:56 +0000 (14:20 +0000)]

osd: squeeze RecoveryHandle out of ECCommon::RecoveryBackend

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 1e0feb73a4b91bd8b7b3ecc164d28fe005b97ed1)

commit | commitdiff | tree

Radosław Zarzyński [Wed, 27 Sep 2023 12:17:06 +0000 (14:17 +0200)]

osd: just shuffle RecoveryMessages to ECCommon.h

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit bc28c16a9a83b0f12d3d6463eaeacbab40b0890b)

commit | commitdiff | tree

Radoslaw Zarzynski [Tue, 26 Mar 2024 11:59:42 +0000 (11:59 +0000)]

osd: prepare RecoveryMessages for shuffling to ECCommon.h

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 0581926035113b1a9cb38f76233242d6b32a7dc6)

commit | commitdiff | tree

Radoslaw Zarzynski [Mon, 25 Mar 2024 13:02:07 +0000 (13:02 +0000)]

osd: ECCommon::RecoveryBackend doesn't depend on ECBackend anymore

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 6ead960b23a95211847250d90e3d2945c6254345)

commit | commitdiff | tree

Radoslaw Zarzynski [Fri, 18 Apr 2025 08:42:18 +0000 (08:42 +0000)]

osd: fix buildability after the RecoveryBackend shuffling

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit c9d18cf3024e5ba681bed5dc315f70527e99b3f1)

commit | commitdiff | tree

Radoslaw Zarzynski [Mon, 25 Mar 2024 11:08:23 +0000 (11:08 +0000)]

osd: just shuffle RecoveryBackend from ECBackend.h to ECCommon.h

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 98480d2f75a7b99aa72562a6a6daa5f39db3d425)

commit | commitdiff | tree

Venky Shankar [Thu, 11 Sep 2025 03:35:19 +0000 (03:35 +0000)]

qa/cephfs: fix test_subvolume_group_charmap_inheritance test

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit fe3d6417bfaef571a9bb4093b5a8dfdb3cc3e59d)

commit | commitdiff | tree

Xavi Hernandez [Thu, 3 Jul 2025 08:34:37 +0000 (10:34 +0200)]

doc: add name mangling documentation for subvolume group creation

Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
(cherry picked from commit b47bbf8afdfcb81ee8aed7ef9c27b45dd8d5a589)

commit | commitdiff | tree

Xavi Hernandez [Thu, 3 Jul 2025 08:33:49 +0000 (10:33 +0200)]

qa: add tests for name mangling in subvolume group creation

Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
(cherry picked from commit ea2d8e9fc04f249d576e4799c3bdc44302cf1226)

commit | commitdiff | tree

Xavi Hernandez [Thu, 3 Jul 2025 08:27:10 +0000 (10:27 +0200)]

pybind/mgr: add name mangling options to subvolume group creation

Signed-off-by: Xavi Hernandez <xhernandez@gmail.com>
(cherry picked from commit f98990ac1bbdf4ca0f05ea0336289cb32001159f)

commit | commitdiff | tree

Nizamudeen A [Thu, 11 Sep 2025 04:13:13 +0000 (09:43 +0530)]

mgr/dashboard: fix missing schedule interval in rbd API

Fetching the rbd image schedule interval through the rbd_support module
schedule list command

GET /api/rbd will have the following field per image
```
"schedule_info": {
                    "image": "rbd/rbd_1",
                    "schedule_time": "2025-09-11 03:00:00",
                    "schedule_interval": [
                        {
                            "interval": "5d",
                            "start_time": null
                        },
                        {
                            "interval": "3h",
                            "start_time": null
                        }
                    ]
                },
```

Also fixes the UI where schedule interval was missing in the form and
also disable editing the schedule_interval.

Extended the same thing to the `GET /api/pool` endpoint.

Fixes: https://tracker.ceph.com/issues/72977
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 72cebf0126bd07f7d42b0ae7b68646c527044942)

commit | commitdiff | tree

Laura Flores [Tue, 16 Sep 2025 22:09:30 +0000 (17:09 -0500)]

Merge pull request #65411 from aainscow/wip-72561-tentacle

tentacle: Optimized Erasure Coding - Fixpack 2

commit | commitdiff | tree

Adam King [Wed, 30 Jul 2025 19:51:11 +0000 (15:51 -0400)]

mgr/cephadm: don't use list_servers to get active mgr host for prometheus SD config

Having a lot of calls into list_servers causes issues with
the core ceph mgr on large clusters. Additionally, we were
using it purely to get the active mgr's host here, which
cephadm should be able to do without needing a mgr api call

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit 726bb5a95de7857c220953a1ed26ed3263213c6f)

commit | commitdiff | tree

Adam King [Wed, 30 Jul 2025 19:49:20 +0000 (15:49 -0400)]

mgr/cephadm: add interval control for stray daemon checks

Primarily to avoid running list_servers (which we kind of
need to do stray daemon checks since the whole point is
to check against a source that isn't cephadm). It was
found on larger clusters calling into list_servers
often can cause issues with the core ceph mgr

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit ee0364761e1ee29e6ad527dddd0eafc01c1f1aaa)

commit | commitdiff | tree

Vallari Agrawal [Tue, 16 Sep 2025 19:37:29 +0000 (01:07 +0530)]

Merge pull request #64030 from VallariAg/wip-71724-tentacle

tentacle: qa: reduce nvmeof thrasher fio to 32 devices from 200

commit | commitdiff | tree

Yuri Weinstein [Tue, 16 Sep 2025 19:36:53 +0000 (12:36 -0700)]

Merge pull request #65429 from nbalacha/wip-72905-tentacle

tentacle: rgw/logging: fixes data loss during rollover

Reviewed-by: Adam Emerson <aemerson@redhat.com>
Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>

commit | commitdiff | tree

Yuri Weinstein [Tue, 16 Sep 2025 19:36:03 +0000 (12:36 -0700)]

Merge pull request #65271 from smanjara/wip-72570-tentacle

tentacle: rgw/multisite: url-encode list_bucket query param 'key-marker'

Reviewed-by: Adam Emerson <aemerson@redhat.com>

commit | commitdiff | tree

Yuri Weinstein [Tue, 16 Sep 2025 19:19:39 +0000 (12:19 -0700)]

Merge pull request #64862 from adamemerson/wip-71066-tentacle

tentacle: rgw/multisite: Fix lifetime issues

Reviewed-by: Casey Bodley <cbodley@redhat.com>

commit | commitdiff | tree

Anthony D'Atri [Tue, 16 Sep 2025 18:36:58 +0000 (13:36 -0500)]

Merge pull request #65546 from stackhpc/doc-balancer-tentacle

tentacle: doc: Fixes a typo in balancer operations

commit | commitdiff | tree

Tyler Brekke [Tue, 24 Jun 2025 19:12:33 +0000 (12:12 -0700)]

doc: Fixes a typo in balancer operations

Signed-off-by: Tyler Brekke <tbrekke@digitalocean.com>
(cherry picked from commit b038b8093d01a5e676ffa419607489a79261ef29)

commit | commitdiff | tree

Leonid Chernin [Tue, 24 Jun 2025 13:00:49 +0000 (16:00 +0300)]

nvmeofgw: fixing GW delete issues
1.fixing the issue when gw is deleted based on invalid subsystem info
2. in function track_deleting_gws: break from loop only if
  delete was really done
        3. fix published rebalance index - publish ana-group instead of
  index
        4. do not dump gw-id string after gw was removed

Fixes: https://tracker.ceph.com/issues/71896
Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
(cherry picked from commit 77a11a7206748fa4be383da9f00a5df50e437e4a)

commit | commitdiff | tree

Leonid Chernin [Tue, 5 Aug 2025 10:19:59 +0000 (13:19 +0300)]

nvmeofgw: cleanup pending map upon monitor restart
fixes https://tracker.ceph.com/issues/72434

Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
(cherry picked from commit 924acd1f2c11784790abb2b9c5ff5dacd32934e1)

commit | commitdiff | tree

Nitzan Mordechai [Tue, 15 Jul 2025 10:58:40 +0000 (10:58 +0000)]

workunits/rados: remove cache tier test

Fixes: https://tracker.ceph.com/issues/71930
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>
(cherry picked from commit 2b57f435a1de1a99dd7bcb47478938965587713b)

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:30:56 +0000 (11:30 -0400)]

Merge pull request #65160 from adk3798/wip-72668-tentacle

tentacle: cephadm/cephadmlib: Eliminate false warnings about old sysctl conf files

Reviewed-by: Gil Bregman <gbregman@il.ibm.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:30:26 +0000 (11:30 -0400)]

Merge pull request #65068 from adk3798/tentacle-smb-remotectl

tentacle: smb: add remote control server

Reviewed-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:27:59 +0000 (11:27 -0400)]

Merge pull request #64724 from adk3798/wip-72268-tentacle

tentacle: mgr/cephadm: updating maintenance health status in the serve…

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>
Reviewed-by: Kushal Deb <Kushal.Deb@ibm.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:26:56 +0000 (11:26 -0400)]

Merge pull request #64723 from adk3798/wip-72264-tentacle

tentacle: mgr/cephadm: Provide appropriate exit codes for orch operations

Reviewed-by: Kushal Deb <Kushal.Deb@ibm.com>
Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:26:13 +0000 (11:26 -0400)]

Merge pull request #64722 from adk3798/tentacle-cephadm-nvmeof-add-force-tls-flag

tentacle: mgr/cephadm/nvmeof: Add "force TLS" flag to NVMeOF spec file.

Reviewed-by: Gil Bregman <gbregman@il.ibm.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:25:18 +0000 (11:25 -0400)]

Merge pull request #64721 from adk3798/tentacle-cephadm-nvmeof-increase-default-max-namespaces

tentacle: mgr/cephadm/nvmeof: Increase the default limit of max_namespaces

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Gil Bregman <gbregman@il.ibm.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:22:31 +0000 (11:22 -0400)]

Merge pull request #64691 from adk3798/wip-72135-tentacle

tentacle: mgr/cephadm: disallow changing OSD service type to non-OSD types

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:21:33 +0000 (11:21 -0400)]

Merge pull request #64674 from adk3798/tentacle-cephadm-undefined-variable-haproxy-config

tentacle: mgr/cephadm: handle possibly undefined template variable in haproxy.cfg.j2

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:19:44 +0000 (11:19 -0400)]

Merge pull request #64673 from adk3798/wip-72138-tentacle

tentacle: mgr/rgw: don't fail realm bootstrap if system user exists already

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

Adam King [Mon, 15 Sep 2025 15:19:10 +0000 (11:19 -0400)]

Merge pull request #64672 from adk3798/tentacle-teuth-add-cephadm-file-path

tentacle: qa/tasks/cephadm: allow to select from 'cephadm' and 'cephadm.py'

Reviewed-by: Guillaume Abrioux <gabrioux@ibm.com>

commit | commitdiff | tree

Shraddha Agrawal [Thu, 21 Aug 2025 12:38:18 +0000 (18:08 +0530)]

doc: add config option and usage docs

This commit adds docs for the new config option introduced as
well as updates the feature docs on how to use this config
option.

Fixes: https://tracker.ceph.com/issues/72619
Signed-off-by: Shraddha Agrawal <shraddhaag@ibm.com>
(cherry picked from commit 1cbe41bde12eb1d0437b746164edb689393cc5ad)

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom