Patrick Donnelly [Tue, 25 Jun 2024 16:27:28 +0000 (12:27 -0400)]
Merge PR #53503 into main
* refs/pull/53503/head:
qa: add tests for `mds last-seen` command
doc/cephfs: add documentation for `mds last-seen`
PendingReleaseNotes: add note on last-seen command
mon/MDSMonitor: add command to lookup when mds was last seen
mon/MDSMonitor: set birth time on FSMap during encode
pybind/mgr/dashboard: show context diff for openapi check
Venky Shankar [Tue, 25 Jun 2024 07:09:33 +0000 (12:39 +0530)]
Merge PR #56429 into main
* refs/pull/56429/head:
mds: fix rank root doesn't insert root ino into its subtree map when starting
mds: flush mds log before finishing STATE_STARTING
mds/FSMap: go back to STARTING state when rank doesn't make it pass STARTING
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Zac Dover [Mon, 24 Jun 2024 10:32:30 +0000 (20:32 +1000)]
doc/rados: edit troubleshooting-osd.rst
Make minor changes to the "Debugging Slow Requests" section of
doc/rados/troubleshooting/troubleshooting-osd.rst in preparation
for an expansion of this section in response to a request from Joel
Davidow.
Samuel Just [Wed, 19 Jun 2024 04:10:34 +0000 (21:10 -0700)]
crimson/.../object_context: drop recovery_read_marker
This doesn't seem to serve a purpose with current crimson. classic
uses ObjectState::recovery_read_marker to indicate that backfill
should be requeued upon wakeup, but that hasn't been necessary so
far in crimson. We can reintroduce this if it becomes useful.
Samuel Just [Thu, 13 Jun 2024 00:47:08 +0000 (00:47 +0000)]
crimson/.../tri_mutex: lock() methods return normal future
f63d76a2 modified the lock() variants on tri_mutex so that the obc
loading pathway wouldn't invoke .then() on a returned future known
statically to be ready. Now that the loading pathway uses demotion
mechanisms that cannot block and do not return futures, we no longer
have any users like that and can drop the extra std::nullopt
possibility.
In a larger sense, if lock() *can* return a non-ready future in a
particular pathway, there's no semantic difference between returning
std::optional<future<>> and future<> as the caller would still have to
deal with a possible non-ready future return even if std::nullopt is
also possible. If the pathway can be demonstrated statically to be
non-blocking, as with the obc loading mechanism, we really want to use a
mechanism that obviously cannot block rather than relying on a mechanism with
a return signature of std::optional<future<>> to return std::nullopt.
Patrick Donnelly [Sun, 23 Jun 2024 18:29:53 +0000 (14:29 -0400)]
Merge PR #57754 into main
* refs/pull/57754/head:
mds: set the proper extra bl for the create request
mds: encode the correct extra info depending on the feature bits
mds: add set_reply_extra_bl() helper support
mds: cleanup the code to make it to be more readable
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Because we just constructed the obc, we know that we can get an
exclusive lock without blocking. Introduce
ObjectContext::load_then_with_lock to take an exclusive lock
unconditionally, load, downgrade (which we also know must be safe), and
then run the passed function.
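The sequence described above (lock exclusively, load, downgrade, run the callback) could be sketched as follows. This is only an illustration in Python, not crimson's C++ code: the class `SketchObjectContext`, the `LockState` enum, and the choice of a read lock as the downgrade target are all assumptions made for the example.

```python
from enum import Enum


class LockState(Enum):
    NONE = "none"
    READ = "read"
    EXCL = "excl"


class SketchObjectContext:
    """Hypothetical stand-in for an obc; not crimson's ObjectContext."""

    def __init__(self, oid):
        self.oid = oid
        self.state = LockState.NONE
        self.loaded = False

    def lock_excl_sync(self):
        # Cannot block: a freshly constructed obc has no other holders.
        assert self.state is LockState.NONE, "fresh obc must be unlocked"
        self.state = LockState.EXCL

    def demote_excl_to_read(self):
        # Downgrading from exclusive never has to wait for anyone else.
        assert self.state is LockState.EXCL
        self.state = LockState.READ

    def unlock(self):
        self.state = LockState.NONE

    def load_then_with_lock(self, loader, func):
        """Lock exclusively, load, downgrade, then run func under the lock."""
        self.lock_excl_sync()
        try:
            loader(self)
            self.loaded = True
            self.demote_excl_to_read()
            return func(self)
        finally:
            self.unlock()
```

No step in this path can wait on another lock holder, which is why the sketch uses plain synchronous calls instead of returning a future.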
Laura Flores [Wed, 19 Jun 2024 21:57:45 +0000 (16:57 -0500)]
qa/suites/upgrade/telemetry-upgrade/reef-x: update how cephadm is pulled and change image reference
Update how cephadm is pulled:
`cephadm_git_url` and `cephadm_branch` are used in releases older than reef
to install cephadm. Both of these keys are needed to install it from the GitHub
repo.
However, in reef and later, the compiled zipapp cephadm needs to be pulled differently
from the old single Python script `cephadm` of earlier releases.
Laura Flores [Wed, 19 Jun 2024 21:07:31 +0000 (16:07 -0500)]
qa/suites/upgrade/telemetry-upgrade: add more ignorelist items and require_osd_release=squid
The warnings added to the ignorelist show up in the cluster log, but they are
expected during upgrades and should thus be ignored.
We also need to set require_osd_release=squid to avoid this warning:
```
cluster [WRN] Health check failed: all OSDs are running squid or later but require_osd_release < squid (OSD_UPGRADE_FINISHED)
```
Nizamudeen A [Fri, 7 Jun 2024 07:45:06 +0000 (13:15 +0530)]
mgr/dashboard: select default daemon based on the default zonegroup
If multisite is configured, the default daemon needs to be selected
based on the default zonegroup. Otherwise the dashboard shows incorrect
details when doing the period commit.
The issue occurs when you run a period update --commit and then reload one
of the block pages: the API assigns the zonegroup of the second gateway
because, for a moment, the first gateway reflects the period changes...
This is wrong because the default zonegroup is that of the previously active
gateway; even though the back end correctly reports the active
zonegroup, the dashboard API reports the wrong one.
Fixes: https://tracker.ceph.com/issues/66394
Signed-off-by: Nizamudeen A <nia@redhat.com>
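A minimal sketch of the selection rule described above, assuming each gateway daemon is represented as a dict with a `zonegroup_name` key. The function name, the dict layout, and the fallback to the first daemon are hypothetical and are not taken from the dashboard code.

```python
def select_default_daemon(daemons, default_zonegroup):
    """Return the daemon that serves the default zonegroup.

    Falls back to the first daemon (an assumed policy) when none matches,
    and to None when the list is empty.
    """
    for daemon in daemons:
        if daemon.get("zonegroup_name") == default_zonegroup:
            return daemon
    return daemons[0] if daemons else None
```

With two gateways listed as `zg2` then `zg1` and a default zonegroup of `zg1`, the second gateway is chosen regardless of list order.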
Patrick Donnelly [Tue, 18 Jun 2024 17:31:14 +0000 (13:31 -0400)]
mon/AuthMonitor: add `ceph auth rotate` command
Add command to rotate the permanent key of an entity. This avoids the need to
delete / recreate the key when it is compromised, lost, or just scheduled for
rotation.
Fixes: https://tracker.ceph.com/issues/66509
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Note from Dan van der Ster:
This is a test that should succeed;
it definitely used to succeed back in the L/O days of Ceph. At some
point the peering code changed and this behaviour regressed. In short,
an OSD goes down and then comes up, and no objects were modified in the meantime.
There should be no degraded PGs in this case.
As this commit is currently breaking make check on all PRs, I think it should
be re-evaluated and merged so that the test and whatever fix is needed to
make it pass are merged together.
Fixes: https://tracker.ceph.com/issues/66556
Signed-off-by: Laura Flores <lflores@ibm.com>
Patrick Donnelly [Thu, 20 Jun 2024 15:43:17 +0000 (11:43 -0400)]
script/backport-create-issue: update tag custom field
The redmineup_tags was added and, apparently, the old "Tags" custom field with
id == 3 was deleted and then recreated with id == 31. This broke the script.
Update and refactor the id for that "Tags" custom field.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
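One way to keep such an id in a single place is a named constant plus small helpers, sketched below. The constant and function names are hypothetical and not the script's own; the `custom_fields` list of `{"id", "value"}` entries follows the shape of Redmine's issue JSON.

```python
# Id of the "Tags" custom field after it was recreated (previously 3).
REDMINE_CUSTOM_FIELD_ID_TAGS = 31


def tags_custom_field(value):
    """Build the custom_fields entry used to set "Tags" on an issue."""
    return {"id": REDMINE_CUSTOM_FIELD_ID_TAGS, "value": value}


def get_tags(issue):
    """Return the "Tags" value from an issue dict, or None if it is absent."""
    for field in issue.get("custom_fields", []):
        if field.get("id") == REDMINE_CUSTOM_FIELD_ID_TAGS:
            return field.get("value")
    return None
```

If the field id changes again, only the constant needs updating.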
The NFS tabs in the Object and File navs use the same route, so both
are activated when either one is clicked.
Hence, this PR separates the routing for the Object and File navs.
Object-> NFS: /rgw/nfs
File-> NFS: /cephfs/nfs
Both routes use the same NFS List and Form components, but under the
different routes mentioned above.
Changes summary
- updated the route for File from "/fs" to "/cephfs/<any_other_sub_route>"
to support both the fs and nfs tabs, since using `/fs` and `/fs/nfs` would
activate both paths, which is an undesirable user experience.
- `getFsalRouteFromPath` helper function to set the storage backend from
the route.
- removed the `storage-backend` field from the nfs form, as the route now decides the
storage backend
- breadcrumbs redirect to respective navs
- updated e2e tests
- updated unit tests
- changed the list page of Object-> NFS to say Bucket instead of Path
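The routing idea above can be illustrated with a short Python sketch (the dashboard frontend itself is TypeScript, so this is not the actual helper). The function names, the FSAL strings `RGW` and `CEPH`, and the `/cephfs/fs` route are assumptions for the example. The first function shows why prefix-matched routes such as `/fs` and `/fs/nfs` activate together; the second derives the storage backend from the route.

```python
def active_prefix_routes(url, routes):
    """Routes that a prefix-matching nav would mark active for url."""
    return [r for r in routes if url == r or url.startswith(r + "/")]


def get_fsal_from_path(path):
    """Map a route to a storage backend name (names are assumed)."""
    if path.startswith("/rgw/nfs"):
        return "RGW"
    if path.startswith("/cephfs/nfs"):
        return "CEPH"
    return None
```

With the old layout, `active_prefix_routes("/fs/nfs", ["/fs", "/fs/nfs"])` returns both routes; with sibling routes such as `/cephfs/fs` and `/cephfs/nfs`, only `/cephfs/nfs` matches.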
Xiubo Li [Wed, 19 Jun 2024 03:27:33 +0000 (11:27 +0800)]
common/TrackedOp: do not count the ops marked as nowarn
If an op is marked as nowarn, it is not counted as a slow
request, but currently its initiated time is still counted when
iterating the in-flight ops.
For example:
[WRN] : 1 slow requests, 1 included below; oldest blocked for > 38.764892 secs
[WRN] : slow request 33.875059 seconds old, received at 2024-06-17T14:14:34.228261+0000: client_request(client.78109915:11369251 mkdir #0x1008ecedea2/chk-89588 2024-06-17T14:14:34.097825+0000 caller_uid=1002960000, caller_gid=0{0,1002960000,}) currently failed to wrlock, waiting
The oldest blocked request is 38.764892 seconds old, but the oldest slow
request reported is 33.875059 seconds old.
Fixes: commit e4160d7e783 ("mds: don't report slow request for blocked filelock request")
Fixes: https://tracker.ceph.com/issues/66557
Signed-off-by: Xiubo Li <xiubli@redhat.com>
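The intended behaviour can be sketched in Python as below. This is a simplified model, not the C++ in common/TrackedOp: the `Op` dataclass, the function name, and the `complaint_time` parameter are assumptions for illustration. The point is that a nowarn op is skipped before its age is examined, so it cannot skew the "oldest blocked" figure.

```python
from dataclasses import dataclass


@dataclass
class Op:
    initiated_at: float  # seconds, on the same clock as `now`
    nowarn: bool = False


def summarize_slow_ops(ops, now, complaint_time):
    """Return (slow_count, oldest_age), ignoring ops marked nowarn."""
    slow = 0
    oldest_age = 0.0
    for op in ops:
        if op.nowarn:
            continue  # contributes to neither the count nor the oldest age
        age = now - op.initiated_at
        oldest_age = max(oldest_age, age)
        if age >= complaint_time:
            slow += 1
    return slow, oldest_age
```

For example, with a nowarn op initiated at t=61 and a regular op initiated at t=66, `summarize_slow_ops(ops, now=100.0, complaint_time=30.0)` returns `(1, 34.0)`, so the reported oldest age matches the one slow request that is listed.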