Nizamudeen A [Fri, 7 Jun 2024 13:49:42 +0000 (19:19 +0530)]
mgr/dashboard: fix edit bucket failing in other selected gateways
Even when gateway 8002 is selected, the bucket policy request still goes through 8000, which doesn't find the bucket:
```
2024-06-07T13:40:33.161+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET req: /hello?policy data: None
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8000 "GET /hello?policy HTTP/1.1" 404 174
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard ERROR rest_client] RGW REST API failed GET req status: 404
2024-06-07T13:40:33.164+0000 7f563be00700 0 [dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
File "/ceph/src/pybind/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
return handler(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
return self.callable(*self.args, **self.kwargs)
File "/ceph/src/pybind/mgr/dashboard/controllers/_base_controller.py", line 263, in inner
ret = func(*args, **kwargs)
File "/ceph/src/pybind/mgr/dashboard/controllers/_rest_controller.py", line 193, in wrapper
return func(*vpath, **params)
File "/ceph/src/pybind/mgr/dashboard/controllers/rgw.py", line 463, in get
result['bucket_policy'] = self._get_policy(bucket_name)
File "/ceph/src/pybind/mgr/dashboard/controllers/rgw.py", line 381, in _get_policy
return rgw_client.get_bucket_policy(bucket)
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 543, in func_wrapper
**kwargs)
File "/ceph/src/pybind/mgr/dashboard/services/rgw_client.py", line 957, in get_bucket_policy
raise e
File "/ceph/src/pybind/mgr/dashboard/services/rgw_client.py", line 949, in get_bucket_policy
request = request()
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 325, in __call__
data, raw_content, headers)
File "/ceph/src/pybind/mgr/dashboard/rest_client.py", line 428, in do_request
resp.content)
dashboard.rest_client.RequestException: RGW REST API failed request with status code 404
(b'{"Code":"NoSuchBucket","Message":"","BucketName":"hello","RequestId":"tx0000'
b'0d73bbbad485175ea-0066630dd1-18785-zone1-zg1-realm1","HostId":"18785-zone1-z'
b'g1-realm1-zg1-realm1"}')
```
But for the same bucket, the encryption and other requests go through the correct gateway:
```
2024-06-07T13:40:32.704+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8002 "GET /hello?versioning HTTP/1.1" 200 2
2024-06-07T13:40:32.745+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET res status: 200 content: {}
2024-06-07T13:40:32.745+0000 7f563be00700 0 [dashboard INFO rgw_client] Found RGW daemon with configuration: host=172.20.0.5, port=8000, ssl=False
2024-06-07T13:40:32.746+0000 7f563be00700 0 [dashboard INFO rgw_client] Found RGW daemon with configuration: host=172.20.0.5, port=8002, ssl=False
2024-06-07T13:40:32.746+0000 7f563be00700 0 [dashboard DEBUG rest_client] RGW REST API GET req: /hello?encryption data: None
2024-06-07T13:40:32.747+0000 7f563be00700 0 [dashboard DEBUG urllib3.connectionpool] http://172.20.0.5:8002 "GET /hello?encr
```
Zack Cerza [Fri, 14 Jun 2024 19:37:16 +0000 (13:37 -0600)]
qa/tasks/qemu: Fix OS version comparison
See: https://sentry.ceph.com/share/issue/21ed88d705854238bdafbf6711e795ee/
They're strings, not floats.
This surfaced as a result of https://github.com/ceph/teuthology/pull/1953
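A minimal sketch of why the type matters, not the code from this fix: comparing version strings directly is lexicographic, and converting them to floats drops trailing zeros. The helper names `version_tuple` and `at_least` are hypothetical, and the sketch assumes purely numeric dotted versions.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '20.04' into (20, 4)."""
    return tuple(int(part) for part in version.split("."))


def at_least(os_version: str, minimum: str) -> bool:
    """Compare two version strings component by component, as integers."""
    return version_tuple(os_version) >= version_tuple(minimum)


# Pitfalls the helper avoids:
#   "9.0" >= "20.04" is True as a string comparison, because '9' > '2'.
#   float("8.10") == float("8.1"), so two distinct releases collapse into one.
#   "22.04" >= 20.04 raises TypeError in Python 3 (str compared with float).
```

With this approach `at_least("22.04", "20.04")` holds and `at_least("9.0", "20.04")` does not, which matches the intended ordering.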
Ilya Dryomov [Wed, 5 Jun 2024 06:36:12 +0000 (08:36 +0200)]
pybind/rbd: parse access and modify timestamps in UTC
It appears that commits 08cee16d0a4b ("pybind/rbd: always parse
timestamps in UTC") and 809c5430c292 ("librbd: add image access/last
modified timestamps") raced with each other and we ended up with two
more timezone-dependent timestamps.
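As an illustration only (this is not the binding's code), the difference between a timezone-dependent and a UTC parse of the same epoch value can be sketched with the standard library. The function names `parse_local` and `parse_utc` are hypothetical.

```python
from datetime import datetime, timezone


def parse_local(seconds: int) -> datetime:
    """Naive datetime in the process's local timezone; the result changes with TZ."""
    return datetime.fromtimestamp(seconds)


def parse_utc(seconds: int) -> datetime:
    """Timezone-aware datetime in UTC; the result is the same on every host."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

Only `parse_utc` gives a value that can be compared across hosts without knowing each host's timezone configuration.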
Ilya Dryomov [Tue, 4 Jun 2024 19:37:49 +0000 (21:37 +0200)]
test/pybind/rbd: make timestamp tests meaningful
The existing asserts don't really test anything, with some of them
being for inequality against a literal of a mismatching type. As
a result, a bug in access_timestamp() and modify_timestamp() went
unnoticed for years.
Ilya Dryomov [Tue, 4 Jun 2024 19:19:40 +0000 (21:19 +0200)]
test/pybind/rbd: fix tests that compare strings with b''
assert_not_equal(b'', self.image.id()) is bogus because Image::id()
returns a string (str), not bytes. If the types don't match, values
are guaranteed to not match.
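A short sketch of the pitfall, using a hypothetical helper `is_nonempty_str` rather than the test suite's actual assertions: in Python 3 a `bytes` value never compares equal to a `str`, so an inequality check against `b''` passes even for an empty string.

```python
def is_nonempty_str(value: object) -> bool:
    """Check both the type and the content, so an empty str is rejected."""
    return isinstance(value, str) and value != ""


# The bogus pattern: this is True even though the string is empty,
# because bytes and str are different types and never compare equal.
bogus_check_passes = (b"" != "")
```

Comparing against a literal of the matching type (`''` for a `str` result) is what makes the assertion able to fail.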
Adam King [Tue, 30 Apr 2024 18:17:58 +0000 (14:17 -0400)]
python-common/service_spec: fix some mypy complaints
The python/mypy combination on the jenkins nodes the CI
is running on doesn't seem to care, but locally I get
mypy: commands[0]> mypy --config-file=../mypy.ini -p ceph
ceph/deployment/service_spec.py: note: In member "validate" of class "NvmeofServiceSpec":
ceph/deployment/service_spec.py:1497: error: Unsupported operand types for > ("float" and "None") [operator]
ceph/deployment/service_spec.py:1497: note: Left operand is of type "Optional[float]"
ceph/deployment/service_spec.py:1500: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1500: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1503: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1503: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1506: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1506: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1509: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1509: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1512: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1512: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1515: error: Unsupported operand types for > ("float" and "None") [operator]
ceph/deployment/service_spec.py:1515: note: Left operand is of type "Optional[float]"
ceph/deployment/service_spec.py:1518: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1518: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1521: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1521: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1524: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1524: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1527: error: Unsupported operand types for > ("int" and "None") [operator]
ceph/deployment/service_spec.py:1527: note: Left operand is of type "Optional[int]"
ceph/deployment/service_spec.py:1530: error: Unsupported operand types for > ("float" and "None") [operator]
ceph/deployment/service_spec.py:1530: note: Left operand is of type "Optional[float]"
Found 12 errors in 1 file (checked 27 source files)
The errors make sense to me, so I think we should fix them.
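For illustration, one common way to satisfy mypy for this class of error is to narrow the `Optional` value with an explicit `None` check before comparing. This is a sketch under assumptions, not the change made to `service_spec.py`; `SpecValidationError` and `validate_positive` are hypothetical names.

```python
from typing import Optional


class SpecValidationError(Exception):
    """Hypothetical error type standing in for the spec's validation error."""


def validate_positive(name: str, value: Optional[float]) -> None:
    """Reject non-positive values; an unset (None) value is left alone."""
    # The 'is not None' check narrows Optional[float] to float, so the
    # comparison below type-checks and cannot raise TypeError at runtime.
    if value is not None and value <= 0:
        raise SpecValidationError(f"{name} must be greater than zero")
```

Without the narrowing, comparing `None` with a number raises `TypeError` at runtime, which is the failure mypy is warning about.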
osd_op_params_t::user_at_version was populated from
osd_op_params_t::at_version before the call to prepare_transaction,
which incremented osd_op_params_t::at_version.version. As a result,
the value stored in object_info_t::user_version ended up one version
behind object_info_t::version. The log entry, on the other hand,
ended up with the correct version as OpsExecutor::prepare_transaction
populates it directly from at_version. As a result, the primary could
return different versions to the client depending on whether the IO was
already in the log.
This commit eliminates osd_op_params_t::user_at_version and updates
PGBackend::mutate_object to behave like prepare_transaction. Because
the prior commit removes the prepare_transaction increment, this isn't
strictly necessary, but it is simpler.
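The off-by-one described above can be modeled with a toy Python sketch. This is not the crimson C++ code: versions are reduced to plain integers, and `ObjectInfo`, `write_old`, and `write_new` are hypothetical names used only to show the ordering problem.

```python
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    version: int
    user_version: int


def write_old(next_version: int) -> ObjectInfo:
    """Old ordering: user_at_version is copied before at_version is bumped."""
    at_version = next_version
    user_at_version = at_version  # copied before the increment
    at_version += 1               # increment that happened in prepare_transaction
    return ObjectInfo(version=at_version, user_version=user_at_version)


def write_new(next_version: int) -> ObjectInfo:
    """New ordering: both fields are populated from the same at_version."""
    at_version = next_version
    return ObjectInfo(version=at_version, user_version=at_version)
```

In the toy model `write_old(5)` yields a `user_version` one behind `version`, while `write_new(5)` keeps the two equal.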
Samuel Just [Thu, 9 May 2024 03:39:18 +0000 (20:39 -0700)]
crimson/osd/ops_executor: only increment osd_op_params_t::at_version for clone
Previously, we incremented prior to usage in both prepare_transaction
and execute_clone. Because at_version is initialized from
PG::get_next_version(), this results in log entries skipping
values. Instead, only increment after populating clone
object_info.
Samuel Just [Thu, 9 May 2024 02:31:35 +0000 (19:31 -0700)]
crimson/osd/pg: remove slightly confusing assert from PG::submit_transaction
This assert should always hold, so it's not wrong. However,
osd_op_p.at_version is chosen during transaction construction time
using the current epoch via PG::get_next_version. Asserting it here
is more confusing than helpful.
It was (probably) assumed that the pattern `addr:` would be present only
once. With the introduction of node-proxy, this isn't true anymore.
Now that the cephadm binary can embed some external libraries we can leverage pyyaml.
The idea is to use proper yaml format instead so it is easier to process the data.
Adam King [Thu, 2 May 2024 17:35:41 +0000 (13:35 -0400)]
mgr/cephadm: make SMB and NVMEoF upgrade last in staggered upgrade
This needs to happen as some work on the NVMEoF side (still unmerged
as of writing this) will make the NVMEoF daemon dependent on the mon.
Prior to this patch, in a staggered upgrade, all daemons not using the
ceph image were upgraded after the mgr since we typically only care
about the default image changing or potential changes to how we handle
our systemd units, which only need the mgr to be upgraded to be applied.
This NVMEoF dependency on the mon changes this and we can no longer
upgrade it directly after the mgr. This patch changes it so the NVMEoF
daemon is instead upgraded after all ceph image daemons have been
upgraded in a staggered upgrade scenario. Non-staggered upgrades
are unaffected as the NVMEoF daemon was already upgraded near the
end in that scenario. The SMB daemon has no reason it needs to be
upgraded later, but it's in the (small) pool of daemons that don't
use the ceph image and aren't for monitoring, so it's been affected
by this as well.
NOTE: This is a bit of an ugly patch imo and shows that a refactoring
of the upgrade code is likely required. Hopefully this patch is more
of a stopgap until that larger effort can be made.