Zac Dover [Sun, 27 Oct 2024 12:04:16 +0000 (22:04 +1000)]
doc/rados: add blaum_roth coding guidance
Direct Ceph administrators using blaum_roth coding for erasure-coded
pools to change the default value of w=7 to a different value in order
to ensure that w+1 is prime.
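As a rough illustration (not taken from the Ceph documentation), the constraint can be checked with a few lines of Python; the function names here are made up for the example:
```
# Illustrative only: verify that w + 1 is prime before using w with blaum_roth.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def valid_blaum_roth_w(w: int) -> bool:
    return is_prime(w + 1)

# The default w=7 fails because 8 is not prime; w=10 works because 11 is prime.
assert not valid_blaum_roth_w(7)
assert valid_blaum_roth_w(10)
```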
This information was provided to the Ceph upstream by Benjamin Mare in
September of 2024.
Zac Dover [Wed, 23 Oct 2024 08:34:25 +0000 (18:34 +1000)]
doc/rados: standardize markup of "clean"
Standardize the markup of the status "clean" throughout the documentation so
that readers do not infer, from inconsistent presentation of the word "clean",
a difference in meaning that was never intended.
This introduces a new `ceph orch device replace` command to improve the user
experience when replacing the underlying device of an OSD.
```
ceph_volume/util/disk.py:1374: error: Incompatible types in assignment (expression has type "Optional[str]", variable has type "str") [assignment]
```
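The usual fix for this class of mypy error is to narrow the Optional value before assigning it to a plainly-typed variable. A minimal sketch with made-up names, not the actual ceph_volume code:
```
from typing import Optional

def lookup_label(device: str) -> Optional[str]:
    # hypothetical stand-in for a call that may return None
    return None

def describe(device: str) -> str:
    label: str = ""
    maybe_label = lookup_label(device)
    if maybe_label is not None:  # narrow Optional[str] to str before assigning
        label = maybe_label
    return label
```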
Adam Kupczyk [Tue, 15 Oct 2024 12:41:22 +0000 (12:41 +0000)]
os/bluestore: Fix repair of multilabel when it collides with BlueFS
The problem was that BDEV_FIRST_LABEL_POSITION was removed from the
bdev_label_valid_locations set.
Now, if the label at BDEV_FIRST_LABEL_POSITION is valid, it is kept in the set.
Fixes: https://tracker.ceph.com/issues/68558
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 7343be720870d4a5f82b55beee4685457a003067)
Nizamudeen A [Wed, 9 Oct 2024 14:45:55 +0000 (20:15 +0530)]
mgr/dashboard: fix group name bugs in the nvmeof API
There are two issues:
1. In cephadm, the first daemon was always used to populate the group in all
the services for the dashboard config.
2. In the API, if more than one gateway is listed in the config, rather than
choosing a random gateway from the group, raise an exception and warn the user
to specify the gw_group parameter in the API request (see the sketch after
this list).
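A minimal sketch of the behaviour described in issue 2, with hypothetical names rather than the dashboard's actual code:
```
from typing import Dict, List, Optional

class AmbiguousGatewayError(Exception):
    pass

def pick_gateway(gateways: List[Dict[str, str]],
                 gw_group: Optional[str] = None) -> Dict[str, str]:
    """Refuse to pick an arbitrary gateway when the group is ambiguous."""
    if gw_group is not None:
        matches = [gw for gw in gateways if gw.get("group") == gw_group]
        if not matches:
            raise AmbiguousGatewayError(f"no gateway found in group {gw_group!r}")
        return matches[0]
    if len(gateways) > 1:
        raise AmbiguousGatewayError(
            "multiple gateways configured; specify the gw_group parameter "
            "in the API request")
    return gateways[0]
```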
BlueStore::read_allocation_from_drive_for_bluestore_tool was not aware that
multiple bdev labels can exist and reserve space, so the comparison of the real
allocation map against the recovered allocation map was failing.
Fixes: https://tracker.ceph.com/issues/68560
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 358f33a148c9a65478e33648f16e8c8af73c98f2)
Aashish Sharma [Wed, 9 Oct 2024 14:02:49 +0000 (19:32 +0530)]
mgr/cephadm: RGW service deployment defaults to 'default' realm/zonegroup/zone despite non-default spec in service
When we create an RGW service using the ceph orch apply command, the service is always deployed in the default realm, zonegroup, and zone, even if we specify a different realm, zonegroup, or zone in the service spec. This happens because certain configuration values, like rgw_realm, rgw_zonegroup, and rgw_zone, need to be set for the RGW instances before the daemons are deployed.

Currently, these configurations are applied after the RGW daemons are deployed, which requires a service restart to reflect the correct realm, zonegroup, and zone. Ideally, these configurations should be applied before the RGW daemons are deployed, so they are placed in the desired realm, zonegroup, and zone from the start.
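A sketch of the intended ordering, with hypothetical helper names rather than cephadm's actual code:
```
# Set the realm/zonegroup/zone options first, then deploy the daemons, so they
# come up in the desired realm/zonegroup/zone instead of 'default'.
def deploy_rgw_service(spec, mon_command, create_daemons):
    who = "client.rgw.{}".format(spec["service_id"])
    for opt in ("rgw_realm", "rgw_zonegroup", "rgw_zone"):
        if spec.get(opt):
            mon_command({"prefix": "config set", "who": who,
                         "name": opt, "value": spec[opt]})
    # only now start the daemons; no restart is needed to pick up the options
    create_daemons(spec)
```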
* the pthread name is saved in thread_local storage
* the thread_local name is copied into the Entry object in its ctor
* Log::dump_recent() reads the thread name from the Entry object's data member
when dumping logs (see the analogy sketch below)
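An analogy sketch of that flow in Python (the actual change is in Ceph's C++ logging code): the thread name is captured when the entry is created, so dumping recent entries later does not have to resolve it.
```
import threading
from dataclasses import dataclass, field

@dataclass
class Entry:
    message: str
    # copied from the creating thread at construction time
    thread_name: str = field(default_factory=lambda: threading.current_thread().name)

recent = []

def log(message: str) -> None:
    recent.append(Entry(message))

def dump_recent() -> None:
    for e in recent:
        # the thread name comes from the Entry itself, not from a lookup at dump time
        print("[{}] {}".format(e.thread_name, e.message))
```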
Afreen Misbah [Fri, 11 Oct 2024 08:57:24 +0000 (14:27 +0530)]
mgr/dashboard: Fix listener deletion
Listener deletion was broken because the wrong gateway address was passed.
Include `traddr` in the listener DELETE API so that the correct gateway address
is chosen for deletion.
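A minimal sketch of the idea; the endpoint path and client here are hypothetical, not the dashboard's actual API:
```
import requests

def delete_listener(base_url: str, nqn: str, traddr: str, trsvcid: int) -> None:
    # pass traddr so the request is routed to the correct gateway
    requests.delete(
        "{}/api/nvmeof/listener/{}".format(base_url, nqn),  # hypothetical path
        params={"traddr": traddr, "trsvcid": trsvcid},
        timeout=10,
    )
```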
Nizamudeen A [Thu, 19 Sep 2024 03:39:20 +0000 (09:09 +0530)]
mgr/dashboard: ignore exceptions raised when no cert/key found
for nvmeof client, when there are no cert found, it raises an exception
which gets logged more often because the dashboard polls the client
frequently.
```
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: log_channel(cephadm) log [ERR] : No secret found for entity nvmeof_root_ca_cert with service name nvmeof.rbd.default
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in wrapper
return OrchResult(f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 3271, in cert_store_get_cert
raise OrchSecretNotFound(entity=entity, service_name=service_name, hostname=hostname)
cephadm.inventory.OrchSecretNotFound: No secret found for entity nvmeof_root_ca_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: [dashboard INFO orchestrator] is orchestrator available: True,
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: [cephadm ERROR orchestrator._interface] No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in wrapper
return OrchResult(f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 3271, in cert_store_get_cert
raise OrchSecretNotFound(entity=entity, service_name=service_name, hostname=hostname)
cephadm.inventory.OrchSecretNotFound: No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: 2024-09-18T13:40:54.529+0000 7fbbd9272640 -1 log_channel(cephadm) log [ERR] : No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: Traceback (most recent call last):
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in wrapper
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: return OrchResult(f(*args, **kwargs))
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: File "/usr/share/ceph/mgr/cephadm/module.py", line 3271, in cert_store_get_cert
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: raise OrchSecretNotFound(entity=entity, service_name=service_name, hostname=hostname)
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: cephadm.inventory.OrchSecretNotFound: No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: log_channel(cephadm) log [ERR] : No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in wrapper
return OrchResult(f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 3271, in cert_store_get_cert
raise OrchSecretNotFound(entity=entity, service_name=service_name, hostname=hostname)
cephadm.inventory.OrchSecretNotFound: No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: [dashboard INFO nvmeof_client] Insecurely connecting to: 192.168.100.101:5500
```
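A minimal sketch of the fix described above, with a hypothetical wrapper rather than the dashboard's actual code: a missing cert/key is treated as "no cert" instead of producing a traceback on every poll.
```
def get_optional_cert(orch, entity: str, service_name: str):
    try:
        return orch.cert_store_get_cert(entity, service_name=service_name)
    except Exception:
        # no cert/key configured for this entity; fall back to an insecure
        # connection instead of logging a traceback on every poll
        return None
```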
Naman Munet [Mon, 7 Oct 2024 05:11:29 +0000 (10:41 +0530)]
mgr/dashboard: unable to edit pipe config for bucket level policy of a bucket
Fixes: https://tracker.ceph.com/issues/68387
Fixes include:
1) Passing additional parameters for 'user' and 'mode', since the user can be
system/dashboard or another value when creating a pipe.
2) Previously, removing the src/dest bucket field left the same old values in
place when editing a pipe; now the field becomes '*' if an empty value is
passed from the frontend (see the sketch after this list).
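An illustrative sketch of the two fixes, using made-up names rather than the dashboard's actual code:
```
def build_pipe_params(source_bucket: str, destination_bucket: str,
                      user: str = "", mode: str = "") -> dict:
    return {
        # an empty value from the frontend becomes '*'
        "source_bucket": source_bucket or "*",
        "destination_bucket": destination_bucket or "*",
        # forwarded so the backend knows whether a system/dashboard user created the pipe
        "user": user,
        "mode": mode,
    }
```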
Ronen Friedman [Tue, 8 Oct 2024 13:25:56 +0000 (08:25 -0500)]
qa/standalone/scrub: remove TEST_recovery_scrub_2
That test no longer matches the actual requirements and implementation of
scrubbing. It was already deactivated in
https://github.com/ceph/ceph/pull/59590. Here it is fully removed, mainly for
the sake of backporting.
Instead of calling ioctx.get_last_version() after a rados operation, callers
now pass a version_t* as an output parameter. In the null_yield case, that
version is populated from ioctx.get_last_version() as before. In the async
case, the version is taken from librados::async_operate()'s return value.
librados/asio: add version_t to completion signatures
IoCtx::aio_operate() doesn't update IoCtx::get_last_version(). To make the
resulting version_t available to the caller, we have to read it out of the
AioCompletionImpl and return it to the caller.
liangmingyuan [Mon, 5 Aug 2024 07:30:33 +0000 (15:30 +0800)]
rgw/beast: optimize accept handling when an error occurs while listening
It is not appropriate to stop accepting on the listening socket whenever an
error occurs during a previous listen or accept. Doing so can leave radosgw
unable to serve new connections after an occasional failure. For example, a
"Too many open files" error may occur at high IOPS (or just after a reshard,
when the number of open sockets may increase because operations are blocked).
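An analogy sketch of the intended behaviour in Python (the actual change is in the C++ beast frontend): a transient accept() failure such as EMFILE should not tear down the accept loop.
```
import errno
import socket

def accept_loop(listener: socket.socket, handle_connection, should_stop):
    while not should_stop():
        try:
            conn, addr = listener.accept()
        except OSError as e:
            if e.errno in (errno.EMFILE, errno.ENFILE, errno.ECONNABORTED):
                # transient condition: keep the listening socket and keep accepting
                continue
            raise
        handle_connection(conn, addr)
```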