Samuel Just [Wed, 19 Jun 2024 04:10:34 +0000 (21:10 -0700)]
crimson/.../object_context: drop recovery_read_marker
This doesn't seem to serve a purpose with current crimson. classic
uses ObjectState::recovery_read_marker to indicate that backfill
should be requeued upon wakeup, but that hasn't been necessary so
far in crimson. We can reintroduce this if it becomes useful.
Samuel Just [Thu, 13 Jun 2024 00:47:08 +0000 (00:47 +0000)]
crimson/.../tri_mutex: lock() methods return normal future
f63d76a2 modified the lock() variants on tri_mutex so that the obc
loading pathway wouldn't invoke .then() on returned future known
statically to be ready. Now that the loading pathway uses demotion
mechanisms that cannot block and do not return futures, we no longer
have any users like that and can drop the extra std::nullopt
possibility.
In a larger sense, if lock() *can* return a non-ready future in a
particular pathway, there's no semantic difference between returning
std::optional<future<>> and future<> as the caller would still have to
deal with a possible non-ready future return even if std::nullopt is
also possible. If the pathway can be demonstrated statically to be
non-blocking, as with the obc loading mechanism, we really want to use a
mechanism that obviously cannot block rather relying on a mechanism with
a return signature of std::optional<future<>> to return std::nullopt.
Patrick Donnelly [Sun, 23 Jun 2024 18:29:53 +0000 (14:29 -0400)]
Merge PR #57754 into main
* refs/pull/57754/head:
mds: set the proper extra bl for the create request
mds: encode the correct extra info depending on the feature bits
mds: add set_reply_extra_bl() helper support
mds: cleanup the code to make it to be more readable
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com> Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Because we just constructed the obc, we know that we can get an
exclusive lock without blocking. Introduce
ObjectContext::load_then_with_lock to take an exclusive lock
unconditionally, load, downgrade (which we also know must be safe), and
then run the passed function.
Laura Flores [Wed, 19 Jun 2024 21:57:45 +0000 (16:57 -0500)]
qa/suites/upgrade/telemetry-upgrade/reef-x: update how cephadm is pulled and change image reference
Update how cephadm is pulled:
`cephadm_git_url` and `cephadm_branch` are used in releases older than reef
to install cephadm. Both of these keys are needed to install it from the github
repo.
However, in reef and on, the compiled zipapp cephadm needs to be pulled differently
than the old single python script `cephadm` from earlier releases.
Laura Flores [Wed, 19 Jun 2024 21:07:31 +0000 (16:07 -0500)]
qa/suites/upgrade/telemetry-upgrade: add more ignorelist items and require_osd_release=squid
The warnings added to the ignorelist show up in the cluster log, but they are
expected during upgrades and should thus be ignored.
We also need to set require_osd_release=squid to avoid this warning:
```
cluster [WRN] Health check failed: all OSDs are running squid or later but require_osd_release < squid (OSD_UPGRADE_FINISHED)
```
Patrick Donnelly [Tue, 18 Jun 2024 17:31:14 +0000 (13:31 -0400)]
mon/AuthMonitor: add `ceph auth rotate` command
Add command to rotate the permanent key of an entity. This avoids the need to
delete / recreate the key when it is compromised, lost, or just scheduled for
rotation.
Fixes: https://tracker.ceph.com/issues/66509 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Note from Dan van der Ster:
This is a test that should succeed,
it definitely used to succeed back in the L/O days of Ceph. At some
point peering code changed and this behaviour regressed. In short,
an OSD goes down then comes up, and no objects were modified in the mean time.
There should be no degraded PGs in this case.
As this commit is currently breaking make check on all PRs, I think it should
be re-evaluated and merged so whatever fix is needed along with this test to
make it work are merged together.
Fixes: https://tracker.ceph.com/issues/66556 Signed-off-by: Laura Flores <lflores@ibm.com>
Patrick Donnelly [Thu, 20 Jun 2024 15:43:17 +0000 (11:43 -0400)]
script/backport-create-issue: update tag custom field
The redmineup_tags was added and, apparently, the old "Tags" custom field with
id == 3 was deleted and then recreated with id == 31. This broke the script.
Update and refactor the id for that "Tags" custom field.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
The NFS tab in object and File nav uses same route due to which both
gets activated when one of them is clicked.
Hence, this PR separates the routing for Object and File nav.
Object-> NFS: /rgw/nfs
File-> NFS: /cephfs/nfs
Both routes use same NFS List and Form component but under different
routes as mentioned above.
Changes summary
- updated route for File from "/fs" to "/cephfs/<any_other_sub_route>"
to support both fs and nfs tabs. Since using `/fs` and `/fs/nfs` will
activate both paths and it will be an undesirable user experience.
- `getFsalRouteFromPath` helper function to set the storage backend from
route.
- removed `stoarge-backend` field from nfs form as now route decides teh
storage backend
- breadcrumbs redirect to respective navs
- updated e2e tests
- updated unit tests
- changes list page of object-> nfs page to say Bucket instead of Path
Xiubo Li [Wed, 19 Jun 2024 03:27:33 +0000 (11:27 +0800)]
common/TrackedOp: do not count the ops marked as nowarn
If an op is marked as nowarn then it won't be counted as the slow
requests, but currently it will count the initiated time when
iterating the inflight ops.
For example:
[WRN] : 1 slow requests, 1 included below; oldest blocked for > 38.764892 secs
[WRN] : slow request 33.875059 seconds old, received at 2024-06-17T14:14:34.228261+0000: client_request(client.78109915:11369251 mkdir #0x1008ecedea2/chk-89588 2024-06-17T14:14:34.097825+0000 caller_uid=1002960000, caller_gid=0{0,1002960000,}) currently failed to wrlock, waiting
The oldest blocked request is 38.764892 old, but the oldest slow
request reported is 33.875059 old.
Fixes: commit e4160d7e783 ("mds: don't report slow request for blocked filelock request") Fixes: https://tracker.ceph.com/issues/66557 Signed-off-by: Xiubo Li <xiubli@redhat.com>
Patrick Donnelly [Wed, 19 Jun 2024 19:19:56 +0000 (15:19 -0400)]
Merge PR #55792 into main
* refs/pull/55792/head:
tools/cephfs: recover alternate_name of dentries from journal
qa: add test to verify recovery of alternate_name from journal
tools/cephfs/JournalTool: add some more debugging
tools/cephfs/JournalTool: remove extraneous 0x in debug output
mds: dump alternate_name to formatter
mds: add warning about encoding new fields
Reviewed-by: Christopher Hoffman <choffman@redhat.com> Reviewed-by: Xiubo Li <xiubli@redhat.com>
Adam King [Wed, 19 Jun 2024 19:04:21 +0000 (15:04 -0400)]
mgr/rgw: fix error handling in rgw zone create
This was returning either a list of strings or
a HandleCommandResult and in the latter case
it would error out trying to build the final
return message which covered up the original
error
Fixes: https://tracker.ceph.com/issues/66568 Signed-off-by: Adam King <adking@redhat.com>
John Mulligan [Mon, 6 May 2024 20:35:31 +0000 (16:35 -0400)]
mgr/smb: include login_control content when generating share config
The login_control list (modified by restrict_access) defines the
smb.conf params 'read list', 'write list', 'admin users', 'invalid
users', and 'valid users'.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 6 May 2024 20:35:41 +0000 (16:35 -0400)]
mgr/smb: add login_control and restrict_access attributes to share
The login_control attribute contains a list of LoginAccessEntry objects.
These will be mapped to samba share access parameters.
The restrict_access attribute is a boolean that makes access to the
share contingent on being listed in the login_control list, as
either a listed user or member of a listed group, for any access
level other than 'none' (as that always denies access).
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Afreen Misbah [Fri, 14 Jun 2024 05:21:47 +0000 (10:51 +0530)]
mgr/dashboard: fix service page e2e tests
- service page now uses defaults value for the placement count due to which mds test failing
- in test we pass "1" while "2" which is the default count for mds is already populated, making it 21 and causing unable to create mds service
Zack Cerza [Fri, 14 Jun 2024 19:37:16 +0000 (13:37 -0600)]
qa/tasks/qemu: Fix OS version comparison
See: https://sentry.ceph.com/share/issue/21ed88d705854238bdafbf6711e795ee/
They're strings, not floats.
This surfaced as a result of https://github.com/ceph/teuthology/pull/1953
Nizamudeen A [Sun, 16 Jun 2024 11:15:43 +0000 (16:45 +0530)]
mgr/dashboard: disable telemetry notification in e2e
After the new UI shell, the telemetry notifications are displayed on the
bottom side, which kind of exists on top of some buttons and that causes
some cypress failures while clicking Submit button in the form.
I am disabling the notification itself in e2e because its not checked in
the e2e at all so we can afford to disable it. Incase we decide to add
an e2e for the notification, we can just toggle it on later on and we
can check it
Fixes: https://tracker.ceph.com/issues/66506 Signed-off-by: Nizamudeen A <nia@redhat.com>
Xuehan Xu [Fri, 24 May 2024 09:30:41 +0000 (17:30 +0800)]
crimson/osd/osd_operations/client_request_common: `PeeringState::needs_recovery()`
may fail if the object is under backfill
Meanwhile, set the correct version for backfill:
From Classic:
```
if (is_degraded_or_backfilling_object(head)) {
if (can_backoff && g_conf()->osd_backoff_on_degraded) {
add_backoff(session, head, head);
maybe_kick_recovery(head);
}
```