Adam Kupczyk [Fri, 10 Jan 2025 08:26:54 +0000 (08:26 +0000)]
os/bluestore: Fix BlueFS::truncate()
In `struct bluefs_fnode_t` there is a vector `extents` and a companion
vector `extents_index` that serves as a log2 seek cache.
Until truncate() was modified, extents were never removed from files.
The modified truncate() removed extents but did not update extents_index.
For example, a file with 10 extents, when truncated to 0, will have:
0 extents, 10 extents_index entries.
After writing some data to the file:
1 extent, 11 extents_index entries.
Now `bluefs_fnode_t::seek` binary-searches extents_index; let's say it
locates the seek position at item #3.
It will then jump from extent #0 (which exists) to extent #3, which
does not exist.
The worst part is that execution simply continues from there, because
the iterator for #3 != extents.end(), so nothing catches the overshoot.
There are 3 parts to the fix:
1) an assert in `bluefs_fnode_t::seek` to protect against
jumping outside the extents
2) code in BlueFS::truncate() to keep `extents_index` in sync with
`extents` (a simplified sketch follows this entry)
3) a dampened-down assert in _replay to give a way out of cases
where an incorrect "offset 12345" (12345 being the file size) instead of
"offset 20000" (the space the allocations occupy) was written to the log.
Fixes: https://tracker.ceph.com/issues/69481
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 7f3601089d41bfc23f530c7bf3fb7efad2d055ec)
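A minimal, self-contained sketch of the invariant the fix restores. The names `FnodeModel`, `append_extent`, and `truncate_to_zero` are illustrative stand-ins, not the actual BlueFS code; the model assumes one cumulative-offset index entry per extent, which matches the counts quoted above.

```
// Simplified stand-in for bluefs_fnode_t: one cumulative logical offset
// per extent, binary-searched by seek(). Not the real BlueFS code.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Extent { uint64_t len; };

struct FnodeModel {
  std::vector<Extent> extents;
  std::vector<uint64_t> extents_index;  // logical start of extents[i]

  void append_extent(Extent e) {
    uint64_t start = extents_index.empty()
        ? 0 : extents_index.back() + extents.back().len;
    extents_index.push_back(start);
    extents.push_back(e);
  }

  // Binary search the seek cache for the extent containing 'offset'.
  size_t seek(uint64_t offset) const {
    auto it = std::upper_bound(extents_index.begin(),
                               extents_index.end(), offset);
    size_t idx = size_t(it - extents_index.begin()) - 1;
    // Part 1 of the fix: never return an index past the extents we hold.
    assert(idx < extents.size());
    return idx;
  }

  // Part 2 of the fix: when truncate drops extents, the seek cache must
  // shrink with them; the broken truncate() cleared only 'extents'.
  void truncate_to_zero() {
    extents.clear();
    extents_index.clear();
  }
};

int main() {
  FnodeModel f;
  for (int i = 0; i < 10; ++i) f.append_extent({4096});
  f.truncate_to_zero();     // 0 extents; a stale cache would still hold 10
  f.append_extent({4096});  // 1 extent; a stale cache would hold 11
  // With a stale 11-entry cache, seek(20000) would pick index 5 of a
  // 1-element 'extents' vector; with the cache in sync it returns 0.
  assert(f.seek(0) == 0);
  return 0;
}
```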
Adam Kupczyk [Fri, 10 Jan 2025 10:07:18 +0000 (10:07 +0000)]
os/bluestore: bluefs unittest for truncate bug
A unit test showing 2 different flavours of the problem:
1) bluefs log corruption
2) a bluefs SIGSEGV
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit f2b5e2fa0a9274c1667fccafa597fff9be7a74b1)
+fixup for bad usage of std::string's fill constructor
Zac Dover [Sun, 27 Oct 2024 12:04:16 +0000 (22:04 +1000)]
doc/rados: add blaum_roth coding guidance
Direct Ceph administrators who use blaum_roth coding for erasure-coded
pools to change the default value of w=7 to a value for which w+1 is
prime; the Blaum-Roth technique requires w+1 to be prime, and the
default gives w+1=8, which is not.
This information was provided to the Ceph upstream by Benjamin Mare in
September 2024.
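A hedged example of a profile that satisfies the constraint (the profile name and the k/m values are illustrative; w=10 makes w+1=11 prime):

```
ceph osd erasure-code-profile set blaum-roth-profile \
    plugin=jerasure technique=blaum_roth k=4 m=2 w=10
```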
Zac Dover [Wed, 23 Oct 2024 08:34:25 +0000 (18:34 +1000)]
doc/rados: standardize markup of "clean"
Standardize the markup around the status "clean" in the documentation
so that readers don't mistakenly infer, from inconsistent presentation
of the word "clean", a never-stated difference between one instance
and another.
This introduces a new `ceph orch device replace` command to improve
the user experience of replacing the underlying device of an OSD.
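A hedged usage sketch (hostname and device path are illustrative; consult `ceph orch device replace -h` on your release for the exact arguments and flags):

```
ceph orch device replace ceph-node-01 /dev/sdb
```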
```
ceph_volume/util/disk.py:1374: error: Incompatible types in assignment (expression has type "Optional[str]", variable has type "str") [assignment]
```
Adam Kupczyk [Tue, 15 Oct 2024 12:41:22 +0000 (12:41 +0000)]
os/bluestore: Fix repair of multilabel when collides with BlueFS
The problem was that BDEV_FIRST_LABEL_POSITION was removed from the
bdev_label_valid_locations set.
Now, if the label at BDEV_FIRST_LABEL_POSITION is valid, it remains in
the set.
Fixes: https://tracker.ceph.com/issues/68558
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 7343be720870d4a5f82b55beee4685457a003067)
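A small illustrative sketch of the repaired bookkeeping; the function and its parameters are simplified stand-ins for the BlueStore code, and the validity check is injected rather than reading a real device.

```
#include <cstdint>
#include <functional>
#include <set>

static constexpr uint64_t BDEV_FIRST_LABEL_POSITION = 0;

// Keep every candidate position whose label verifies. The label at
// BDEV_FIRST_LABEL_POSITION is treated like any other: if it is valid,
// it stays in the set (previously it was dropped, breaking repair).
std::set<uint64_t> collect_valid_label_locations(
    const std::set<uint64_t>& candidates,
    const std::function<bool(uint64_t)>& label_valid_at) {
  std::set<uint64_t> valid;
  for (uint64_t pos : candidates) {
    if (label_valid_at(pos)) {
      valid.insert(pos);
    }
  }
  return valid;
}
```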
Nizamudeen A [Wed, 9 Oct 2024 14:45:55 +0000 (20:15 +0530)]
mgr/dashboard: fix group name bugs in the nvmeof API
There are 2 issues:
1. In cephadm, the first daemon was always used to populate the group
in all the services for the dashboard config.
2. In the API, if more than 1 gateway is listed in the config,
rather than choosing a random gateway from the group, raise an
exception and warn the user to specify the gw_group parameter in the
API request.
BlueStore::read_allocation_from_drive_for_bluestore_tool was
not informed that multiple bdev labels can exist and reserve space,
so the comparison of the real alloc vs the recovered alloc was failing.
Fixes: https://tracker.ceph.com/issues/68560
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 358f33a148c9a65478e33648f16e8c8af73c98f2)
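An illustrative sketch of the missing step, assuming a simple offset-to-length map for the recovered allocations (names and types simplified, not the actual tool code):

```
#include <cstdint>
#include <map>
#include <set>

// Before comparing the recovered allocation map with the real one, mark
// every valid bdev label location as occupied: labels reserve space that
// must not be treated as free. Simplified stand-in code.
void reserve_bdev_label_space(std::map<uint64_t, uint64_t>& recovered,
                              const std::set<uint64_t>& label_positions,
                              uint64_t label_size) {
  for (uint64_t pos : label_positions) {
    recovered[pos] = label_size;  // offset -> length of the reserved range
  }
}
```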
Aashish Sharma [Wed, 9 Oct 2024 14:02:49 +0000 (19:32 +0530)]
mgr/cephadm: RGW service deployment defaults to 'default' realm/zonegroup/zone despite non-default spec in service
When we create an RGW service using the ceph orch apply command, the service is always deployed in the default realm, zonegroup, and zone, even if we specify a different realm, zonegroup, or zone in the service spec.

This happens because certain configuration values, like rgw_realm, rgw_zonegroup, and rgw_zone, need to be set for the RGW instances before the daemons are deployed. Currently, these configurations are applied only after the RGW daemons are deployed, which requires a service restart for the correct realm, zonegroup, and zone to take effect. Ideally, these configurations should be applied before the RGW daemons are deployed, so the daemons are placed in the desired realm, zonegroup, and zone from the start.
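A hedged sketch of the ordering this change enforces, written as the equivalent manual steps (the service id and the realm/zonegroup/zone names are illustrative, and the exact config section cephadm targets may differ by release):

```
ceph config set client.rgw.myrgw rgw_realm myrealm
ceph config set client.rgw.myrgw rgw_zonegroup myzonegroup
ceph config set client.rgw.myrgw rgw_zone myzone
ceph orch apply rgw myrgw --realm myrealm --zonegroup myzonegroup --zone myzone
```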
* the pthread name is saved in thread_local storage
* the thread_local name is copied into the Entry object in its ctor
* Log::dump_recent() reads the thread name from the Entry
object's data member when dumping logs
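A self-contained sketch of that flow; the names are illustrative rather than the actual src/log code, and `pthread_setname_np` is used in its Linux form.

```
// Each thread caches its own name once instead of resolving it per entry.
#include <pthread.h>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

thread_local char tl_thread_name[16] = {0};  // Linux limit: 15 chars + NUL

void set_thread_name(const char* name) {
  pthread_setname_np(pthread_self(), name);
  strncpy(tl_thread_name, name, sizeof(tl_thread_name) - 1);
}

struct Entry {
  std::string msg;
  char thread_name[16];
  explicit Entry(std::string m) : msg(std::move(m)) {
    // Copy the thread_local name in the ctor, so dumping never has to
    // resolve a thread id that may already have exited.
    memcpy(thread_name, tl_thread_name, sizeof(thread_name));
  }
};

void dump_recent(const std::vector<Entry>& entries) {
  for (const auto& e : entries)
    std::cout << e.thread_name << ": " << e.msg << "\n";
}

int main() {
  set_thread_name("log_writer");
  std::vector<Entry> recent;
  recent.emplace_back("example message");
  dump_recent(recent);
  return 0;
}
```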
Afreen Misbah [Fri, 11 Oct 2024 08:57:24 +0000 (14:27 +0530)]
mgr/dashboard: Fix listener deletion
Listener deletion was broken because the wrong gateway address was
passed.
The listener DELETE API now includes `traddr` so that the correct
gateway address is chosen for the deletion.
Nizamudeen A [Thu, 19 Sep 2024 03:39:20 +0000 (09:09 +0530)]
mgr/dashboard: ignore exceptions raised when no cert/key found
For the nvmeof client, when no cert is found, it raises an exception
that gets logged over and over because the dashboard polls the client
frequently.
```
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: log_channel(cephadm) log [ERR] : No secret found for entity nvmeof_root_ca_cert with service name nvmeof.rbd.default
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in wrapper
return OrchResult(f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 3271, in cert_store_get_cert
raise OrchSecretNotFound(entity=entity, service_name=service_name, hostname=hostname)
cephadm.inventory.OrchSecretNotFound: No secret found for entity nvmeof_root_ca_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: [dashboard INFO orchestrator] is orchestrator available: True,
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: [cephadm ERROR orchestrator._interface] No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in wrapper
return OrchResult(f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 3271, in cert_store_get_cert
raise OrchSecretNotFound(entity=entity, service_name=service_name, hostname=hostname)
cephadm.inventory.OrchSecretNotFound: No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: 2024-09-18T13:40:54.529+0000 7fbbd9272640 -1 log_channel(cephadm) log [ERR] : No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: Traceback (most recent call last):
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in wrapper
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: return OrchResult(f(*args, **kwargs))
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: File "/usr/share/ceph/mgr/cephadm/module.py", line 3271, in cert_store_get_cert
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: raise OrchSecretNotFound(entity=entity, service_name=service_name, hostname=hostname)
Sep 18 13:40:54 ceph-node-00 ceph-0377c7c2-75c1-11ef-bb0e-5254000e47d2-mgr-ceph-node-00-cvrrld[2712]: cephadm.inventory.OrchSecretNotFound: No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: log_channel(cephadm) log [ERR] : No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Traceback (most recent call last):
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 140, in wrapper
return OrchResult(f(*args, **kwargs))
File "/usr/share/ceph/mgr/cephadm/module.py", line 3271, in cert_store_get_cert
raise OrchSecretNotFound(entity=entity, service_name=service_name, hostname=hostname)
cephadm.inventory.OrchSecretNotFound: No secret found for entity nvmeof_server_cert with service name nvmeof.rbd.default
Sep 18 13:40:54 ceph-node-00 ceph-mgr[2716]: [dashboard INFO nvmeof_client] Insecurely connecting to: 192.168.100.101:5500
```
Naman Munet [Mon, 7 Oct 2024 05:11:29 +0000 (10:41 +0530)]
mgr/dashboard: unable to edit pipe config for bucket level policy of a bucket
Fixes: https://tracker.ceph.com/issues/68387
The fixes include:
1) Passing additional 'user' and 'mode' parameters, as the user can be
system/dashboard or other values while creating the pipe.
2) Previously, on removing the src/dest bucket field, the same old
values came back when editing the pipe; now the field becomes '*' if
an empty value is passed from the frontend.