qa/suites/krbd: use a standard fixed-1 cluster in unmap subsuite
A custom "fixed-1, but with the client on a separate node" cluster was
needed only for the pre-single-major.yaml kernel, which is no longer around.
This can be a single-node job now -- see commits 311a450163cf
("krbd/unmap: put client.0 on a separate remote") and 39a579144cd8
("qa/suites/krbd: drop pre-single-major test").
Bill Scales [Fri, 1 Aug 2025 15:17:58 +0000 (16:17 +0100)]
doc: erasure coding enhancements for tentacle
* Document new pool flag allow_ec_optimizations
* Reference new conf setting osd_pool_default_flag_ec_optimizations
* Add section describing Erasure Code Optimizations
Zac Dover [Thu, 7 Aug 2025 05:03:22 +0000 (15:03 +1000)]
doc/cephfs: edit troubleshooting.rst
Follow up on comments made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/64832 and make other small changes to
increase the ease of reading this text.
Ronen Friedman [Wed, 6 Aug 2025 05:38:07 +0000 (00:38 -0500)]
osd/scrub: do not limit operator-initiated repairs
'auto-repair' scrubs are limited to a maximum of
'scrub_auto_repair_num_errors' damaged objects.
However, operator-initiated repairs should not be limited
by that number. Alas, a bug in a previous commit
(97de817ad1c253ee1c7c9c9302981ad2435301b9) modified the
code in such a way that it applied the
'scrub_auto_repair_num_errors' limit to all repairs,
including operator-initiated ones. This commit fixes that.
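The intended behavior can be sketched as follows. This is a minimal Python illustration, not the OSD's C++ code: the function name `repair_allowed` and its parameters are hypothetical, and treating the limit as inclusive is an assumption. Only the option name 'scrub_auto_repair_num_errors' comes from the commit message.

```python
def repair_allowed(num_errors: int, operator_initiated: bool,
                   auto_repair_limit: int) -> bool:
    """Return True if a scrub may repair the damaged objects it found.

    auto_repair_limit stands in for 'scrub_auto_repair_num_errors'.
    """
    if operator_initiated:
        # Operator-requested repairs are never capped by the limit.
        return True
    # Auto-repair proceeds only while the error count stays within the
    # limit (assumed inclusive in this sketch).
    return num_errors <= auto_repair_limit
```

With a limit of 5, an auto-repair scrub that finds 6 damaged objects is refused, while an operator-initiated repair of the same objects goes ahead.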
Zac Dover [Tue, 5 Aug 2025 11:24:41 +0000 (21:24 +1000)]
doc/cephfs: edit troubleshooting.rst
Edit "Stuck in up:replay" under the "Stuck During Recovery" section of
doc/cephfs/troubleshooting.rst. I had planned to edit the entire "Stuck
During Recovery" section in a single commit, but I think that the
material is too involved for that.
Naman Munet [Tue, 22 Jul 2025 17:08:42 +0000 (22:38 +0530)]
mgr/dashboard: user accounts enhancements
fixes: https://tracker.ceph.com/issues/72072
PR covers:
1) Displaying the account name instead of the account ID in the bucket list page and bucket edit form for account-owned buckets
2) Non-root account users can now be assigned managed policies that allow them to perform operations
3) The root user indicator now appears next to the username in the users list, with a new icon, rather than next to the account name
Nitzan Mordechai [Thu, 19 Jun 2025 08:54:43 +0000 (08:54 +0000)]
monitor: Enhance historic ops command output and error handling
Dumping monitor historic operations currently yields no results
and incorrectly issues an error message indicating that
"mon_enable_op_tracker" is not enabled, even when it should be.
This commit addresses these issues by:
- Adding previously missing commands for historic operations.
- Correcting the dump operations check to only issue an error when
"mon_enable_op_tracker" is genuinely not enabled.
- Tracking "mon_enable_op_tracker" changes
- Refactoring and organizing the historic operations dump command code.
- Improving the appearance and clarity of error messages.
Alex Ainscow [Mon, 12 May 2025 17:30:02 +0000 (18:30 +0100)]
interval_map: non_const iterator
The interval_map code cannot cope with iterators that change the size
of an interval, so it uses const iterators. However, many other
modifications to intervals are fine, and more efficient, nicer-looking
code can be written with non-const iterators.
This PR adds non-const iterators, but also adds some policing that the
size of the bufferlist has not changed over the interval.
Everything is hidden behind a template, as this changes the behaviour of
interval_map in a way that we don't want to use without careful testing
of each instance.
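The policing idea can be illustrated with a small Python sketch. It is not the C++ interval_map: the dict-of-bytearrays layout and the name `mutate_intervals` are assumptions made for illustration. It only shows the rule that in-place edits are allowed as long as the buffer length stays the same.

```python
from typing import Callable, Dict


def mutate_intervals(intervals: Dict[int, bytearray],
                     fn: Callable[[int, bytearray], None]) -> None:
    """Apply fn to every (offset, buffer) pair, allowing in-place edits."""
    for offset, buf in intervals.items():
        size_before = len(buf)
        fn(offset, buf)  # fn may overwrite bytes in place
        if len(buf) != size_before:
            # A resize would invalidate the interval bookkeeping.
            raise RuntimeError(
                f"interval at offset {offset} changed size")
```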
Incorporate into doc/cephfs/ceph-dokan.rst the suggestions made by
Anthony D'Atri in https://github.com/ceph/ceph/pull/64737, and make a
few other small improvements to the English language in that file.
John Mulligan [Fri, 20 Jun 2025 23:03:22 +0000 (19:03 -0400)]
script/build-with-container: add rocky10 to built-in distros
Add "rocky10" (also aliased to "rockylinux10") to the known distro bases
so that the team can begin to experiment with the Rocky Linux 10 distro
for containerized builds.
John Mulligan [Fri, 27 Jun 2025 15:04:44 +0000 (11:04 -0400)]
install-deps.sh: add a temporary repo for missing el10 deps
Add a new dnf/yum repository hosted in the ceph lab infra for providing
the last few dependencies missing from other el10 repos.
Hopefully we can remove this soon but it serves as a stopgap as we work
on getting el10 builds working in the ceph CI infra and tested.
Adam C. Emerson [Thu, 5 Jun 2025 17:09:36 +0000 (13:09 -0400)]
rgw/multisite: Don't rerun recovery periodically
Recovery is so conservative it creates many, many datalog entries,
slowing sync.
Fixes: https://tracker.ceph.com/issues/71465
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit eb686df10f1b9dc474a26ebc9b4fc3891b9d330b)
Fixes: https://tracker.ceph.com/issues/72174
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
qa/suites/krbd: request msgr1 explicitly in unmap subsuite
Since commit 5011cc926cd4 ("qa/suites/krbd: run unmap subsuite with
msgr1 only"), unmap.t is run only against msgr1. pre-single-major.yaml
kernel has actually been gone for some time now, but there is still
value in maintaining a msgr1-only test. With the default switched to
msgr2 in commit a577f6fa405c ("krbd: "rbd device map" command should
use msgr2 by default"), msgr1 needs to be requested explicitly.
John Mulligan [Tue, 17 Jun 2025 19:09:20 +0000 (15:09 -0400)]
cephadm: add support for specific network binds to smb service
Add a bunch of code to support specific IP address (and/or interface -
see below) binds for the smb service. When the smb service is not
clustered it is using container networking - in this case we use
publish options for the container manager to only listen on the supplied
addresses.
When the smb service is clustered we need to jump through a bunch of
hoops to configure each service individually. Many are easy with just
a short set of CLI options. CTDB only listens on the (first) node
address that it can bind to and only that. smbd has complex interactions
based on the `interfaces` and `bind interfaces only` config parameters.
Because these parameters may be unique to a node (addresses certainly
will be - and interface names could be) we cannot store this in
the registry based conf. Instead, we take the slightly hacky approach
of generating a stub conf file with just the interfaces related params
in it and telling sambacc to generate a config that includes this
stub config.
IMPORTANT: When using ctdb with public addresses smbd doesn't know what
additional IPs it may need to listen to, so instead of binding to
a fixed IP we configure it to use an interface. The downside is that
smbd may also listen on other addresses on the same interface that we
don't want it to use. Additionally, I have observed that as addresses
are added or removed from the interface by ctdb, smbd doesn't
consistently start listening to those addresses.
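A rough sketch of what generating such a stub conf could look like is shown below. The helper name `interfaces_stub_conf` and the exact file layout are assumptions, not cephadm's actual code. `interfaces` and `bind interfaces only` are the smb.conf parameters named above.

```python
from typing import Iterable


def interfaces_stub_conf(interfaces: Iterable[str]) -> str:
    """Build a tiny smb.conf fragment holding only the bind settings."""
    names = " ".join(interfaces)
    lines = [
        "[global]",
        f"interfaces = {names}",
        # Restrict smbd to the interfaces/addresses listed above.
        "bind interfaces only = yes",
        "",
    ]
    return "\n".join(lines)
```

The entries may be interface names or addresses, which is what makes the per-node stub necessary: the values differ from host to host.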
John Mulligan [Wed, 18 Jun 2025 21:18:30 +0000 (17:18 -0400)]
mgr/cephadm: teach ctdb nodes logic about bind_addrs
Within the cephadm smb service class we have logic to help manage CTDB's
nodes. Ensure that this node handling logic also conforms to the recent
addition of the smb service's bind_addrs field.
John Mulligan [Mon, 16 Jun 2025 20:05:22 +0000 (16:05 -0400)]
mgr/cephadm: add filter_host_candidates method to smb service class
Add a filter_host_candidates method to the smb service class allowing
that class to act as a HostSelector. The HostSelector was added in an
earlier commit to allow classes like this one to make specific host
selections based on criteria unique to that class (or its spec).
This method uses the newly added `bind_addrs` field of the smb service
spec to ensure only hosts that meet the desired set of
networks/addresses get used in placement.
John Mulligan [Mon, 16 Jun 2025 20:04:35 +0000 (16:04 -0400)]
python-common/deployment: add bind_addrs and related type for smb
Add a `bind_addrs` field and `SMBClusterBindIPSpec` to the smb service
spec. If specified, the `bind_addrs` field can contain one or more
SMBClusterBindIPSpec values. In JSON these values can contain either an
address `{"address": "192.168.76.10"}` or network `{"network":
"192.168.76.0/24"}`.
These specs will be used by cephadm to place the smb service only on
hosts that have IPs matching the supplied IP Address/Network values. It
will also instruct the smb services to only bind to these addresses.
A suggested future enhancement might be to include an IP address range
representation for the SMBClusterBindIPSpec.
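The address/network matching described above can be sketched with Python's standard `ipaddress` module. The helpers `parse_bind_addrs` and `host_matches` are hypothetical names, not the real spec classes. Only the JSON shapes (`address`, `network`) are taken from the commit message.

```python
import ipaddress
import json
from typing import Iterable, List, Union

BindSpec = Union[ipaddress.IPv4Address, ipaddress.IPv6Address,
                 ipaddress.IPv4Network, ipaddress.IPv6Network]
_NETWORKS = (ipaddress.IPv4Network, ipaddress.IPv6Network)


def parse_bind_addrs(raw: str) -> List[BindSpec]:
    """Parse a JSON list of {"address": ...} / {"network": ...} entries."""
    specs: List[BindSpec] = []
    for entry in json.loads(raw):
        if "address" in entry:
            specs.append(ipaddress.ip_address(entry["address"]))
        elif "network" in entry:
            specs.append(ipaddress.ip_network(entry["network"]))
        else:
            raise ValueError(f"unsupported bind spec: {entry!r}")
    return specs


def host_matches(host_ips: Iterable[str], specs: List[BindSpec]) -> bool:
    """True if any host IP equals an address or falls inside a network."""
    for ip in map(ipaddress.ip_address, host_ips):
        for spec in specs:
            if isinstance(spec, _NETWORKS):
                if ip in spec:
                    return True
            elif ip == spec:
                return True
    return False
```

A host with IP 192.168.76.42 would match the network example above, while a host that only has 10.0.0.1 would be excluded from placement.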
John Mulligan [Mon, 16 Jun 2025 20:05:14 +0000 (16:05 -0400)]
mgr/cephadm: teach serve.py about host selector support
A previous commit added a HostSelector protocol type to the schedule
code. This change makes it so the function calling upon the
HostAssignment class detects if a CephService provides a
filter_host_candidates method - meaning the service class can act as a
HostSelector. If the class can be a HostSelector pass it to the
HostAssignment so that the custom selection operation can be run.
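The detection step can be sketched as a simple attribute check. The helper name `host_selector_for` is hypothetical and the real code in serve.py may be structured differently. Only the method name `filter_host_candidates` comes from the commit message.

```python
from typing import Any, Optional


def host_selector_for(service: Any) -> Optional[Any]:
    """Return the service itself if it can act as a HostSelector."""
    method = getattr(service, "filter_host_candidates", None)
    # Only a callable attribute counts as implementing the interface.
    return service if callable(method) else None
```

The result, either the service or None, would then be handed to HostAssignment as its optional selector argument.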
John Mulligan [Mon, 16 Jun 2025 20:05:01 +0000 (16:05 -0400)]
mgr/cephadm: prepare schedule.py for per-service-type host filtering
Prepare schedule.py for per-service-type host filtering based on allowed
host addresses/networks. Add a new HostSelector protocol type to the
module defining what the filtering interface looks like.
This interface is intended to allow CephService classes to "take over" the
network based filtering of nodes prior to placement and customize the
behavior of this step in cephadm's placement algorithm.
Note that the type must be passed in to the HostAssignment class as an
optional argument. If nothing is passed the class behaves as it did
before.
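A minimal sketch of this pattern using `typing.Protocol` follows. The method signature, the plain-string host names, and the `ToyHostAssignment` class are simplifying assumptions for illustration and do not reproduce the actual schedule.py definitions.

```python
from typing import Iterable, List, Optional, Protocol


class HostSelector(Protocol):
    """Anything with a filter_host_candidates method qualifies."""

    def filter_host_candidates(self, candidates: Iterable[str]) -> List[str]:
        ...


class ToyHostAssignment:
    """Stand-in for HostAssignment with an optional selector argument."""

    def __init__(self, hosts: Iterable[str],
                 host_selector: Optional[HostSelector] = None) -> None:
        self.hosts = list(hosts)
        self.host_selector = host_selector

    def candidates(self) -> List[str]:
        if self.host_selector is None:
            # No selector passed: behave exactly as before.
            return list(self.hosts)
        return self.host_selector.filter_host_candidates(self.hosts)
```

Because the selector defaults to None, existing callers that pass nothing keep the previous behavior.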