Yuri Weinstein [Wed, 12 May 2021 19:27:01 +0000 (12:27 -0700)]
Merge pull request #41227 from ceph/wip-yuriw-nautilus-p2p
nautilus: qa/tests: advanced nautilus initial version to 14.2.20
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Yuri Weinstein [Wed, 12 May 2021 17:00:42 +0000 (10:00 -0700)]
Merge pull request #41156 from smithfarm/wip-50366-nautilus
nautilus: rgw: during reshard lock contention, adjust logging
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
Yuri Weinstein [Tue, 11 May 2021 19:48:16 +0000 (12:48 -0700)]
qa/tests: resolved comment - changed to 14.2.20
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
Neha Ojha [Tue, 11 May 2021 16:55:42 +0000 (09:55 -0700)]
Merge pull request #41213 from tchaikov/nautilus-49919
nautilus: mon/OSDMonitor: drop stale failure_info after a grace period
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Ernesto Puerta [Tue, 11 May 2021 07:46:46 +0000 (09:46 +0200)]
Merge pull request #41253 from rhcs-dashboard/wip-50724-nautilus
nautilus: mgr/dashboard: fix base-href: revert it to previous approach
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Yuri Weinstein [Mon, 10 May 2021 14:46:15 +0000 (07:46 -0700)]
Merge pull request #40667 from smithfarm/wip-49092-nautilus
nautilus: rgw/http: add timeout to http client
Reviewed-by: Yuval Lifshitz <yuvalif@yahoo.com>
Yuri Weinstein [Mon, 10 May 2021 14:45:03 +0000 (07:45 -0700)]
Merge pull request #40713 from smithfarm/wip-49471-nautilus
nautilus: qa: bump osd heartbeat grace for ffsb workload
Reviewed-by: Ramana Raja <rraja@redhat.com>
Yuri Weinstein [Mon, 10 May 2021 14:43:46 +0000 (07:43 -0700)]
Merge pull request #41173 from ifed01/wip-ifed-better-onode-trim-nau
nautilus: os/bluestore: do not count pinned entries as trimmed ones.
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Avan Thakkar [Fri, 7 May 2021 09:38:11 +0000 (15:08 +0530)]
mgr/dashboard: fix base-href: revert it to previous approach
Fixes: https://tracker.ceph.com/issues/50684
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit
b6f92922f5c80223fd288d98ce85405a650c0135)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/app.module.ts
- Adopt the changes coming from master for this file.
Yuri Weinstein [Sat, 8 May 2021 19:52:59 +0000 (12:52 -0700)]
Merge pull request #36183 from smithfarm/wip-46480-nautilus
nautilus: mds: send scrub status to ceph-mgr only when scrub is running
Reviewed-by: Ramana Raja <rraja@redhat.com>
Yuri Weinstein [Fri, 7 May 2021 19:55:25 +0000 (12:55 -0700)]
Merge pull request #40920 from neha-ojha/wip-50403-nautilus
nautilus: common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Yuri Weinstein [Fri, 7 May 2021 16:07:46 +0000 (09:07 -0700)]
qa/tests: advanced nautilus initial version to 14.2.20
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
Kefu Chai [Sun, 14 Mar 2021 03:56:59 +0000 (11:56 +0800)]
osd: drop entry in failure_pending when resetting stale peer
no need to keep it in the pending list anymore.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
ff077fc3ea7d1595679c7053cda3b16d68aefd01)
Kefu Chai [Thu, 11 Mar 2021 13:13:13 +0000 (21:13 +0800)]
mon/OSDMonitor: drop stale failure_info
failure_info keeps strong references to the MOSDFailure messages
sent by osds or peon monitors. whenever the monitor starts to handle
an MOSDFailure message, it registers it in its OpTracker, and
the failure report message is unregistered when the monitor acks it
by either canceling it or replying to the reporter with a new
osdmap marking the target osd down. but if this does not happen,
the failure reports just pile up in OpTracker, the monitor considers
them slow ops, and they are reported as a SLOW_OPS health warning.
in theory, it does not take long to mark an unresponsive osd down if
we have enough reporters. but there is a chance that a reporter fails
to cancel its report before it reboots, and the monitor also fails
to collect enough reports to mark the target osd down. in that case the
target osd never gets an osdmap marking it down, so it won't send
an alive message to the monitor to fix this.
in this change, we check for stale failure info in tick(), and
simply drop the stale reports, so the messages can be released and
marked "done".
Fixes: https://tracker.ceph.com/issues/47380
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
a124ee85b03e15f4ea371358008ecac65f9f4e50)
Conflicts:
src/mon/OSDMonitor.h: trivial resolution
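The tick()-based cleanup described in "mon/OSDMonitor: drop stale failure_info" can be illustrated with a minimal Python sketch. It is not the monitor's C++ code: the FailureInfo class, the drop_stale_reports() helper, and the grace parameter are hypothetical names chosen for illustration, and time is modelled as plain seconds.

```python
from dataclasses import dataclass, field


@dataclass
class FailureInfo:
    # Hypothetical model: reporter id -> time (seconds) the report arrived.
    reporters: dict = field(default_factory=dict)


def drop_stale_reports(failure_info, now, grace):
    """Drop reports older than `grace` seconds.

    Returns the dropped (target_osd, reporter) pairs so a caller could
    release the corresponding messages and mark them "done".
    """
    dropped = []
    # Iterate over copies of the keys because entries are deleted in the loop.
    for target in list(failure_info):
        info = failure_info[target]
        for reporter, reported_at in list(info.reporters.items()):
            if now - reported_at > grace:
                del info.reporters[reporter]
                dropped.append((target, reporter))
        if not info.reporters:
            # No reporter left for this target: forget it entirely.
            del failure_info[target]
    return dropped


failure_info = {
    1: FailureInfo({10: 0.0, 11: 95.0}),
    2: FailureInfo({12: 1.0}),
}
dropped = drop_stale_reports(failure_info, now=100.0, grace=20.0)
```

Iterating over copies of the keys sidesteps the problem mentioned in the loop-restructuring commit, where the map is mutated while it is being walked.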
Kefu Chai [Thu, 11 Mar 2021 10:28:18 +0000 (18:28 +0800)]
mon/OSDMonitor: restructure OSDMonitor::check_failures() loop
a follow-up change will add a call that trims failures inside the loop,
which mutates failure_info while we are still iterating over this map,
so the loop has to be restructured a little bit.
Fixes: https://tracker.ceph.com/issues/47380
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
6e512b2f1e228eb808d6bff1e5c159c4d16667ef)
Yuri Weinstein [Wed, 5 May 2021 21:04:57 +0000 (14:04 -0700)]
Merge pull request #40753 from smithfarm/wip-50073-nautilus
nautilus: mgr/PyModule: put mgr_module_path before Py_GetPath()
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Yuri Weinstein [Wed, 5 May 2021 17:56:39 +0000 (10:56 -0700)]
Merge pull request #41167 from tchaikov/nautilus-doc-build
nautilus: build python extensions using distutils
Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Igor Fedotov [Wed, 5 May 2021 13:02:24 +0000 (16:02 +0300)]
os/bluestore: do not count pinned entries as trimmed ones.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Kefu Chai [Mon, 8 Jul 2019 14:08:57 +0000 (22:08 +0800)]
pybind: use distutils.sysconfig for compiling flags
this allows the maintainer to override the compiler flags when
cross-compiling Ceph.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
6050f28870e15ca80e59e0804783205d222f8493)
Kefu Chai [Sat, 23 Feb 2019 14:26:48 +0000 (22:26 +0800)]
pybind: encode flattened dict
in python3, the keys and values in dict are unicode strings, so we need
to encode them before passing them to underlying librados' C API.
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
6cb23f9ce6b3848a7fd93d135acc143f2ae7cba1)
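As an illustration of "pybind: encode flattened dict", the sketch below flattens a dict of str keys and values and encodes the result to bytes before it would be handed to a C API. The NUL-terminated layout and the UTF-8 encoding are assumptions made for this example; the commit message only states that the strings need to be encoded.

```python
from itertools import chain


def flatten_dict(d):
    """Flatten {key: value} into b"key\\0value\\0..." (assumed layout).

    Under python3 both keys and values are unicode str objects, so the
    joined string is encoded to bytes as the last step.
    """
    items = chain.from_iterable(d.items())
    flat = "".join(item + "\0" for item in items)
    return flat.encode("utf-8")


flattened = flatten_dict({"a": "1", "b": "é"})
```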
Kefu Chai [Sat, 23 Feb 2019 14:19:11 +0000 (22:19 +0800)]
pybind: extract flatten_dict() out
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
31ea5a7e930978151265678aedf4e37ae9e97776)
Kefu Chai [Thu, 21 Feb 2019 14:54:30 +0000 (22:54 +0800)]
pybind: set language_level for cythonize explicitly
Compiling rbd.pyx because it changed.
[1/1] Cythonizing rbd.pyx
/usr/lib/python2.7/dist-packages/Cython/Compiler/Main.py:367:
FutureWarning: Cython directive 'language_level' not set, using 2 for
now (Py2). This will change in a later release!
File: /var/ssd/ceph/src/pybind/rbd/rbd.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
this warning is raised by cython 0.29.2
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
fb760dd7115d46547794d855b413ab0c3139a37e)
J. Eric Ivancich [Wed, 14 Apr 2021 17:55:22 +0000 (13:55 -0400)]
rgw: during reshard lock contention, adjust logging
When RGW fails to get a lock on a reshard log, we log it in such a way
that it looks like an error. Instead we'll make sure that the log
message is informational.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit
6d3dee37791ad427a3435c493a1d7874ba075674)
Yuri Weinstein [Tue, 4 May 2021 16:38:59 +0000 (09:38 -0700)]
Merge pull request #41099 from amathuria/wip-50125-nautilus
nautilus: mon: Modifying trim logic to change paxos_service_trim_max dynamically
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Yuri Weinstein [Tue, 4 May 2021 16:38:13 +0000 (09:38 -0700)]
Merge pull request #41098 from dvanders/nautilus_neg_progress
nautilus: mon: ensure progress is [0,1] before printing
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Yuri Weinstein [Tue, 4 May 2021 16:37:22 +0000 (09:37 -0700)]
Merge pull request #41016 from idryomov/wip-reset-authenticate-err-nautilus
nautilus: mon/MonClient: reset authenticate_err in _reopen_session()
Reviewed-by: Kefu Chai <kchai@redhat.com>
Yuri Weinstein [Tue, 4 May 2021 15:31:40 +0000 (08:31 -0700)]
Merge pull request #40987 from trociny/wip-50481-nautilus
nautilus: os/FileStore: don't propagate split/merge error to "create"/"remove"
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Aishwarya Mathuria [Fri, 12 Mar 2021 11:27:40 +0000 (16:57 +0530)]
mon: Modifying trim logic to change paxos_service_trim_max dynamically
Currently, the Paxos Service trim logic is bounded by a max value (paxos_service_trim_max). This change dynamically modifies the max value when the number of logs to be trimmed is higher than paxos_service_trim_max.
The paxos_service_trim_max_multiplier has been added in case we want to increase paxos_service_trim_max by a certain factor. If this option is enabled we get a new upper bound when trim sizes are high.
Fixes: https://tracker.ceph.com/issues/50004
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
(cherry picked from commit
2e1141e43980a0a44b18159860ebf9cc38316435)
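One way to picture the dynamic bound from "mon: Modifying trim logic to change paxos_service_trim_max dynamically" is the sketch below. The function name versions_to_trim() and the midpoint heuristic are assumptions made for illustration; the commit message only says that the bound grows when the backlog exceeds paxos_service_trim_max and that paxos_service_trim_max_multiplier provides a new upper bound.

```python
def versions_to_trim(to_remove, trim_max, trim_max_multiplier=0):
    """Return how many versions to trim in one round.

    to_remove           -- number of versions eligible for trimming
    trim_max            -- static bound (paxos_service_trim_max); 0 = unbounded
    trim_max_multiplier -- optional cap factor; 0 = disabled
    """
    if trim_max == 0 or to_remove < trim_max:
        return to_remove
    # Backlog exceeds the static bound: raise the bound toward the backlog.
    # The midpoint is an assumed heuristic, not taken from the commit message.
    new_max = (trim_max + to_remove) // 2
    if trim_max_multiplier:
        # With the multiplier enabled, never exceed trim_max * multiplier.
        new_max = min(new_max, trim_max * trim_max_multiplier)
    return new_max
```

For example, with trim_max = 500, a backlog of 100 is trimmed in full, a backlog of 2000 is trimmed by 1250 per round, and with a multiplier of 2 the same backlog is capped at 1000.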
Aishwarya Mathuria [Mon, 29 Mar 2021 14:05:12 +0000 (19:35 +0530)]
mon: Adding variables for Paxos trim
1. Define variables for paxos_service_trim_min and paxos_service_trim_max.
2. Use them in place of g_conf()->paxos_service_trim_min and g_conf()->paxos_service_trim_max
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
(cherry picked from commit
45c59f2f0d0d90beb9163804e86139c551cf505b)
Dan van der Ster [Fri, 30 Apr 2021 06:31:07 +0000 (08:31 +0200)]
nautilus: mon: ensure progress is [0,1] before printing
Ensure that progress is in the expected range [0,1] before
rendering a progress bar.
Nautilus only because this is avoided in future releases thanks
to
5f95ec4457059889bc4dbc2ad25cdc0537255f69.
Related-to: https://tracker.ceph.com/issues/50587
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
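The range check from "mon: ensure progress is [0,1] before printing" amounts to clamping the value before it is turned into a bar. A minimal sketch, with a hypothetical render_progress_bar() helper and an assumed bar layout:

```python
def render_progress_bar(progress, width=20):
    """Render a text progress bar; out-of-range input is clamped to [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    filled = int(progress * width)
    return "[" + "=" * filled + "." * (width - filled) + "]"
```

Without the clamp, a negative value would produce a negative fill width and a value above 1 would overflow the bar.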
Nathan Cutler [Thu, 29 Apr 2021 19:52:35 +0000 (21:52 +0200)]
Merge pull request #39912 from gerald-yang/nautilus-49640
nautilus: common: Fix assertion when disabling and re-enabling clog_to_monitors
Reviewed-by: Neha Ojha <nojha@redhat.com>
Yuri Weinstein [Thu, 29 Apr 2021 16:50:46 +0000 (09:50 -0700)]
Merge pull request #40744 from smithfarm/wip-50255-nautilus
nautilus: mds: trim cache regularly for standby-replay
Reviewed-by: Ramana Raja <rraja@redhat.com>
Yuri Weinstein [Wed, 28 Apr 2021 21:25:32 +0000 (14:25 -0700)]
Merge pull request #40730 from smithfarm/wip-50179-nautilus
nautilus: cephfs: client: only check pool permissions for regular files
Reviewed-by: Ramana Raja <rraja@redhat.com>
Yuri Weinstein [Wed, 28 Apr 2021 19:09:08 +0000 (12:09 -0700)]
Merge pull request #40722 from smithfarm/wip-50026-nautilus
nautilus: client: fire the finish_cap_snap() after buffer being flushed
Reviewed-by: Ramana Raja <rraja@redhat.com>
Yuri Weinstein [Wed, 28 Apr 2021 19:08:45 +0000 (12:08 -0700)]
Merge pull request #40720 from smithfarm/wip-49853-nautilus
nautilus: mds: fix race of fetching large dirfrag
Reviewed-by: Ramana Raja <rraja@redhat.com>
Dan van der Ster [Wed, 28 Apr 2021 11:34:18 +0000 (13:34 +0200)]
Merge pull request #41060 from dvanders/50549
nautilus: os/bluestore: be more verbose in _open_super_meta by default.
Igor Fedotov [Fri, 11 Oct 2019 14:34:58 +0000 (17:34 +0300)]
os/bluestore: be more verbose in _open_super_meta by default.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit
4087f82aea674df4c7b485bf804f3a9c98ae3741)
Kefu Chai [Tue, 30 Mar 2021 18:32:38 +0000 (02:32 +0800)]
mgr/PyModule: put mgr_module_path before Py_GetPath()
pip comes with _vendor/progress. so there is chance to import the vendored
version of "progress" module instead of the "progress" mgr module, and
fail to import the latter.
in this change, the order of paths is rearranged so that the configured
`mgr_module_path` is put before the return value of `Py_GetPath()`.
Fixes: https://tracker.ceph.com/issues/50058
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
8638f526a9d04c3dfd758073980d709165070336)
Conflicts:
src/mgr/PyModule.cc
- nautilus has a preprocessor directive "#if PY_MAJOR_VERSION >= 3"
which is not there in master
- since we still need to support python2, apply the same change to
the #else branch at line 351
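Why the ordering in "mgr/PyModule: put mgr_module_path before Py_GetPath()" matters can be shown with a toy model of first-match-wins module lookup. The directory names and the first_provider() helper are hypothetical; the sketch only mirrors the rule that the first search path entry providing a module is the one that gets imported.

```python
def first_provider(module, search_path, contents):
    """Return the first directory in search_path that provides `module`."""
    for directory in search_path:
        if module in contents.get(directory, ()):
            return directory
    return None


# Hypothetical layout: both locations ship a module named "progress".
contents = {
    "/hypothetical/pip/_vendor": {"progress"},
    "/hypothetical/ceph/mgr": {"progress", "dashboard"},
}

# Interpreter paths first: the vendored module shadows the mgr module.
before = first_provider(
    "progress", ["/hypothetical/pip/_vendor", "/hypothetical/ceph/mgr"], contents
)
# mgr_module_path first: the mgr module wins.
after = first_provider(
    "progress", ["/hypothetical/ceph/mgr", "/hypothetical/pip/_vendor"], contents
)
```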
Yuri Weinstein [Tue, 27 Apr 2021 21:06:04 +0000 (14:06 -0700)]
Merge pull request #40697 from smithfarm/wip-49567-nautilus
nautilus: tests: ceph_test_rados_api_watch_notify: Allow for reconnect
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Yuri Weinstein [Tue, 27 Apr 2021 19:27:55 +0000 (12:27 -0700)]
Merge pull request #40751 from smithfarm/wip-50144-nautilus
nautilus: qa/tasks/vstart_runner.py: start max required mgrs
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Yuri Weinstein [Tue, 27 Apr 2021 19:27:11 +0000 (12:27 -0700)]
Merge pull request #40752 from smithfarm/wip-50211-nautilus
nautilus: os/bluestore/BlueFS: do not _flush_range deleted files
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Yuri Weinstein [Tue, 27 Apr 2021 18:58:59 +0000 (11:58 -0700)]
Merge pull request #40747 from smithfarm/wip-49731-nautilus
nautilus: osd: do not dump an osd multiple times
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Ernesto Puerta [Tue, 27 Apr 2021 17:21:48 +0000 (19:21 +0200)]
Merge pull request #40818 from rhcs-dashboard/wip-50172-nautilus
nautilus: mgr/dashboard: debug nodeenv hangs
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Ernesto Puerta [Tue, 27 Apr 2021 17:18:59 +0000 (19:18 +0200)]
Merge pull request #39984 from aaSharma14/wip-49656-nautilus
nautilus: mgr/dashboard: test prometheus rules through promtool
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Ernesto Puerta [Tue, 27 Apr 2021 17:13:40 +0000 (19:13 +0200)]
Merge pull request #41021 from rhcs-dashboard/wip-50417-nautilus
nautilus: mgr/dashboard: filesystem pool size should use stored stat
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Ernesto Puerta [Tue, 27 Apr 2021 17:12:59 +0000 (19:12 +0200)]
Merge pull request #40650 from rhcs-dashboard/wip-50202-nautilus
nautilus: mgr/dashboard: Revoke read-only user's access to Manager modules
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Ernesto Puerta [Tue, 27 Apr 2021 17:12:17 +0000 (19:12 +0200)]
Merge pull request #40490 from aaSharma14/wip-50050-nautilus
nautilus: mgr/dashboard: Remove username, password fields from Manager Modules/dashboard,influx
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Yuri Weinstein [Tue, 27 Apr 2021 16:50:51 +0000 (09:50 -0700)]
Merge pull request #40750 from smithfarm/wip-50122-nautilus
nautilus: crush/CrushLocation: do not print logging message in constructor
Reviewed-by: Kefu Chai <kchai@redhat.com>
Yuri Weinstein [Tue, 27 Apr 2021 16:48:26 +0000 (09:48 -0700)]
Merge pull request #40700 from smithfarm/wip-50130-nautilus
nautilus: monmaptool: Don't call set_port on an invalid address
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Yuri Weinstein [Tue, 27 Apr 2021 16:47:30 +0000 (09:47 -0700)]
Merge pull request #40676 from smithfarm/wip-49531-nautilus
nautilus: mgr: add --max <n> to 'osd ok-to-stop' command
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Yuri Weinstein [Tue, 27 Apr 2021 14:39:17 +0000 (07:39 -0700)]
Merge pull request #39903 from singuliere/wip-49376-nautilus
nautilus: cmake: build static libs if they are internal ones
Reviewed-by: Kefu Chai <kchai@redhat.com>
Nathan Cutler [Thu, 8 Apr 2021 20:21:01 +0000 (22:21 +0200)]
qa: bump osd heartbeat grace for ffsb workload
This is a manual backport of
84528b1693f2abdeff5b816253a1e01ce4a19d36
A cherry-pick was not undertaken because the structure of the Nautilus yaml
files under qa/ is very different than in master.
Signed-off-by: Nathan Cutler <ncutler@suse.com>
Yuri Weinstein [Mon, 26 Apr 2021 23:55:05 +0000 (16:55 -0700)]
Merge pull request #40709 from smithfarm/wip-49562-nautilus
nautilus: qa: delete all fs during tearDown
Reviewed-by: Ramana Raja <rraja@redhat.com>
Yuri Weinstein [Mon, 26 Apr 2021 23:54:36 +0000 (16:54 -0700)]
Merge pull request #40704 from smithfarm/wip-49516-nautilus
nautilus: pybind/cephfs: DT_REG and DT_LNK values are wrong
Reviewed-by: Ramana Raja <rraja@redhat.com>
Yuri Weinstein [Mon, 26 Apr 2021 23:54:01 +0000 (16:54 -0700)]
Merge pull request #40701 from smithfarm/wip-49473-nautilus
nautilus: test: use std::atomic<bool> instead of volatile for cb_done var
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Yuri Weinstein [Mon, 26 Apr 2021 22:31:52 +0000 (15:31 -0700)]
Merge pull request #40714 from smithfarm/wip-49613-nautilus
nautilus: qa: add sleep for blocklisting to take effect
Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
Nizamudeen A [Tue, 6 Apr 2021 15:54:51 +0000 (21:24 +0530)]
mgr/dashboard: Revoke read-only user's access to Manager modules
This prevents read-only users from opening the Manager Modules page in
the Ceph Dashboard, where some security-related information is
shown.
Fixes: https://tracker.ceph.com/issues/50174
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit
fb607f1561371340d2c9d4e16c4eaceb365fd926)
Conflicts:
src/pybind/mgr/dashboard/services/access_control.py
- Some of the changes are not backported because those features are
not implemented in nautilus, so they were left as they are
Avan Thakkar [Thu, 15 Apr 2021 13:28:52 +0000 (18:58 +0530)]
mgr/dashboard: filesystem pool size should use stored stat
Fixes: https://tracker.ceph.com/issues/50195
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Replaces 'bytes_used' with 'stored' stat to see the correct results
of CephFS pool stats.
(cherry picked from commit
7110fd4e0c257d20aa56591f05d74a2851a2fe00)
Ilya Dryomov [Thu, 22 Apr 2021 10:29:59 +0000 (12:29 +0200)]
mon/MonClient: reset authenticate_err in _reopen_session()
Otherwise, if "mon host" list has at least one unqualified IP address
without a port and both msgr1 and msgr2 are turned on, there is a race
affecting MonClient::authenticate().
For backwards compatibility reasons such an address is expanded into
two entries, each being treated as a separate monitor. For example,
"mon host = 1.2.3.4" generates the following initial monmap:
0: v1:1.2.3.4:6789/0
1: v2:1.2.3.4:3300/0
See MonMap::_add_ambiguous_addr() for details.
Then, the following can happen:
1. we connect to both endpoints and attempt to authenticate
2. authenticate() sets authenticate_err to 1 and sleeps on auth_cond
3. msgr1 authenticates first (i.e. it gets the final MAuth message
before msgr2 gets the monmap)
4. active_con is set to msgr1 connection, msgr2 connection is closed
as redundant
5. _finish_auth() sets authenticate_err to 0 and signals auth_cond,
but before either the monmap is received or authenticate() wakes
up, msgr1 connection is closed due to a network hiccup
6. ms_handle_reset() calls _reopen_session() which clears active_con
and again connects to both endpoints and attempts to authenticate
7. authenticate() wakes up, sees that there is no active_con and goes
back to sleep, but this time with authenticate_err == 0
8. msgr2 authenticates first but doesn't call _finish_auth() because
it is called only if authenticate_err == 1
9. active_con is set to msgr2 connection, msgr1 connection is closed
as redundant
10. authenticate() hangs on auth_cond until timeout defaulting to 5
minutes
The discrepancy between msgr1 and msgr2 plays a key role. For msgr1,
authentication is considered to be complete as soon as the final MAuth
message is received -- the monmap is not waited for. For msgr2,
authentication is considered to be complete only after the monmap is
received.
Avoid the race by setting authenticate_err to 1 in _reopen_session(),
so that _finish_auth() is called on/after every authentication attempt
instead of just the first one.
Fixes: https://tracker.ceph.com/issues/50477
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
8c9de31c9806629d22c30b35769e664446090046)
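The race fixed by "mon/MonClient: reset authenticate_err in _reopen_session()" can be replayed with a small single-threaded model. This is a sketch, not the MonClient code: the MonClientModel class and run_scenario() are hypothetical, and the condition variable is reduced to a counter of wake-up signals.

```python
class MonClientModel:
    """Toy model of the authenticate_err flag; 1 means "auth in progress"."""

    def __init__(self, reset_on_reopen):
        self.reset_on_reopen = reset_on_reopen
        self.authenticate_err = 0
        self.active_con = None
        self.signals = 0  # how many times auth_cond would have been signalled

    def authenticate_begin(self):
        self.authenticate_err = 1

    def on_auth_done(self, con):
        self.active_con = con
        # _finish_auth() is only reached while authenticate_err == 1.
        if self.authenticate_err == 1:
            self._finish_auth()

    def _finish_auth(self):
        self.authenticate_err = 0
        self.signals += 1

    def reopen_session(self):
        self.active_con = None
        if self.reset_on_reopen:
            # The fix: every new attempt ends in _finish_auth() again.
            self.authenticate_err = 1


def run_scenario(reset_on_reopen):
    client = MonClientModel(reset_on_reopen)
    client.authenticate_begin()    # step 2
    client.on_auth_done("msgr1")   # steps 3-5
    client.reopen_session()        # step 6, after the connection reset
    client.on_auth_done("msgr2")   # steps 8-9
    return client.signals


signals_without_fix = run_scenario(reset_on_reopen=False)
signals_with_fix = run_scenario(reset_on_reopen=True)
```

In the model without the fix only one signal is ever sent, so a waiter that went back to sleep after the reset is never woken; with the fix the second authentication signals again.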
Ilya Dryomov [Thu, 22 Apr 2021 10:29:59 +0000 (12:29 +0200)]
mon/MonClient: remove reopen_session() callback mechanism
It's been unused for over 5 years, since commit
17d24292b812 ("osd:
remove old stats backoff mechanism").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
853c04b5a66721755830c5b46b695f6c86cb406b)
Conflicts:
src/mon/MonClient.cc [ commit
85157d5aae3d ("mon:
s/Mutex/ceph::mutex/") not in nautilus ]
src/mon/MonClient.h [ commit
a144cacdd88b ("mon/MonClient:
add send_mon_message(MessageRef)") not in nautilus ]
Aashish Sharma [Thu, 25 Mar 2021 05:55:37 +0000 (11:25 +0530)]
mgr/dashboard: Simplify some complex calculations in test_alerts.yml
run-promtool-unittests is failing because of differences in floating point values in some complex calculations. This PR simplifies those calculations to fix the issue.
Fixes: https://tracker.ceph.com/issues/49952
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit
8d2f39e6c568afb6880689160212bcc93057e194)
Kefu Chai [Mon, 22 Mar 2021 06:07:54 +0000 (14:07 +0800)]
ceph.spec,install-deps: use golang-github-prometheus for promtools
instead of installing docker for using promtools, install
golang-github-prometheus.
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
e33e3a931db97d01318643ec686fe63fdd614082)
Conflicts:
ceph.spec.in (new changes at #287-295 overrode these lines; merged correctly now)
install-deps.sh (changed dnf to yumdnf)
Kefu Chai [Fri, 19 Mar 2021 02:32:16 +0000 (10:32 +0800)]
test: run promtool test without docker on ubuntu/focal
before this change, we use docker for running promtools offered by
a docker image, but this is not efficient, and quite a few developers
do not want to use docker for running "make check". this change was
introduced by #39246, the reason was that, in Ceph's CI process, we
are using Ubuntu/Bionic for running "make check" jobs, but prometheus
packaged by Bionic does not offer the "test rules" command. so, to
address the problem, we are using the "dnanexus/promtool:2.9.2" docker image
for verifying monitoring/prometheus/alerts/test_alerts.yml.
after this change, we use prometheus packaged by debian derivatives
instead of pulling a docker image.
* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert
53a5816deda0874a3a37e131e9bc22d88bb2a588, as we don't need to
pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
command, bail out if it is not.
see also: https://tracker.ceph.com/issues/49653
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
f381aa8bf0e175940153975fa1534ef0559ecadd)
Conflicts:
debian/control (python-cherrypy3 conflicts with python-cherrypy3 | python3-cherrypy3 in nautilus)
install-deps.sh (preload_wheels_for_tox method not in nautilus so removed that)
src/test/CMakeLists.txt (new changes at #561-594 overrode these lines; merged correctly now)
Aashish Sharma [Wed, 3 Feb 2021 07:23:56 +0000 (12:53 +0530)]
mgr/dashboard: test prometheus rules through promtool
This PR adds unit testing for prometheus rules using promtool. To run the tests, run the 'run-promtool-unittests.sh' file.
Fixes: https://tracker.ceph.com/issues/45415
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit
53a5816deda0874a3a37e131e9bc22d88bb2a588)
Conflicts:
install-deps.sh (changed dnf to yumdnf, preload_wheels method not in nautilus so removed that)
src/test/CMakeLists.txt (new changes at #561-594 overrode these lines; merged correctly now)
Aashish Sharma [Mon, 8 Mar 2021 09:44:00 +0000 (15:14 +0530)]
mgr/dashboard: Remove username, password fields from Cluster/Manager Modules/dashboard
The username and password fields are empty in Cluster/Manager Modules/dashboard. They date from when the dashboard supported a single user and password, so they now need to be removed from here.
Fixes: https://tracker.ceph.com/issues/49645
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit
d8fba40d982bb1ad824961aa210475bd7aa51524)
Conflicts:
src/pybind/mgr/dashboard/services/access_control.py (no check_migrate_v0_to_current and check_migrate_v1_to_current methods in nautilus)
src/pybind/mgr/dashboard/tests/test_access_control.py (no test_load_v2 method in nautilus, 'time' import no longer needed with new changes)
Mykola Golub [Mon, 19 Apr 2021 07:32:01 +0000 (08:32 +0100)]
os/FileStore: don't propagate split/merge error to "create"/"remove"
Either ignore or terminate, otherwise it may confuse the
"create"/"remove" caller.
Fixes: https://tracker.ceph.com/issues/50395
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit
936898b8caf7b13a120ea6108df0b0dac29882c4)
Ernesto Puerta [Wed, 21 Apr 2021 13:56:46 +0000 (15:56 +0200)]
Merge pull request #40959 from rhcs-dashboard/wip-50459-nautilus
nautilus: vstart.sh: disable "auth_allow_insecure_global_id_reclaim"
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 15 Apr 2021 13:07:53 +0000 (21:07 +0800)]
vstart.sh: disable "auth_allow_insecure_global_id_reclaim"
to silence the health warning of "mons are allowing insecure global_id
reclaim", which prevents the cluster from being active+clean. a couple of
tests expect a warning-free cluster before they start.
this option is enabled by default to appease old clients, but when it
comes to most upstream testing, we can just disable it.
Fixes: https://tracker.ceph.com/issues/50374
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit
77a8376d0731c24e7bbf24523d3d7450e9f978af)
Yuri Weinstein [Tue, 20 Apr 2021 18:11:55 +0000 (11:11 -0700)]
Merge pull request #40270 from kotreshhr/wip-49903-nautilus
nautilus: mgr/volumes: Retain suid guid bits in clone
Reviewed-by: Ramana Raja <rraja@redhat.com>
Ilya Dryomov [Tue, 20 Apr 2021 08:56:25 +0000 (10:56 +0200)]
Merge branch 'nautilus-saved' into nautilus
Neha Ojha [Thu, 15 Apr 2021 16:44:27 +0000 (16:44 +0000)]
common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned
This option controls the rate of trimming of onodes and the earlier default of
64 has been seen to be too low for large clusters, leading to buildup of
onodes resulting in memory growth.
Increase the default value to 1000, since there are no known downsides to it.
Fixes: https://tracker.ceph.com/issues/50217
Signed-off-by: Neha Ojha <nojha@redhat.com>
(cherry picked from commit
26d7336d6b65f987298ede5d2c5c435191f1405c)
Conflicts:
src/common/options/global.yaml.in - file does not exist in nautilus
Jenkins Build Slave User [Mon, 19 Apr 2021 14:11:15 +0000 (14:11 +0000)]
14.2.20
Patrick Donnelly [Tue, 30 Mar 2021 03:09:30 +0000 (20:09 -0700)]
mds: trim cache regularly for standby-replay
This change is slightly awkward because standby-replay MDS do not do all
the kinds of upkeep a normal active MDS does. In particular, it is not
going to recall client state from clients.
This diff also merges the extra recall_client_state in
MDCache::check_memory_usage into its only caller (the upkeep thread)
where it was also doing a recall. That's just a matter of merging the
recall flags. This has the added benefit of making
MDCache::check_memory_usage callable for all MDS daemons regardless of
state.
Fixes: https://tracker.ceph.com/issues/50048
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit
19293d9b9d19c32af4de655cd59e206056b2417d)
Conflicts:
src/mds/MDCache.cc
- conflict was caused by an additional line "upkeep_last_trim = clock::now();"
which should have been dropped by
da45b4e7ca3a1c88c44424256c104b07710e9679
(nautilus cherry-pick of master commit
af4cac5ec7bab4e5bf8936cd685f6ee8dcb38127)
src/mds/MDCache.h
Patrick Donnelly [Tue, 30 Mar 2021 03:07:25 +0000 (20:07 -0700)]
mds: remove extra heap release
We now regularly do this unconditionally in the MDS, see the upkeep
thread.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit
5a9d6c080d77c7e3644b02cab4f8c91900f4fe8f)
Conflicts:
src/mds/MDCache.h
Xiubo Li [Mon, 29 Mar 2021 04:02:09 +0000 (12:02 +0800)]
client: only check pool permissions for regular files
There is no need to do a check_pool_perm() on anything that isn't
a regular file, as the MDS is what handles talking to the OSD in
those cases. Just return 0 if it's not a regular file.
Signed-off-by: Xiubo Li <xiubli@redhat.com>
(cherry picked from commit
59a3006b88f479cb5333e16fe30201ea14ab1717)
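The early return described in "client: only check pool permissions for regular files" can be sketched as follows. The check_pool_perm() signature here is hypothetical (the real client method takes different arguments); the `do_check` callable stands in for the actual permission probe against the pool.

```python
import stat


def check_pool_perm(mode, do_check):
    """Run the pool permission check only for regular files.

    mode     -- st_mode style file mode of the inode
    do_check -- callable performing the real check, returning 0 or -errno
    """
    if not stat.S_ISREG(mode):
        # Directories, symlinks, etc.: the MDS talks to the OSDs, so skip.
        return 0
    return do_check()


dir_result = check_pool_perm(stat.S_IFDIR | 0o755, lambda: -13)
file_result = check_pool_perm(stat.S_IFREG | 0o644, lambda: -13)
```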
Ilya Dryomov [Thu, 15 Apr 2021 13:18:58 +0000 (15:18 +0200)]
auth/cephx: make KeyServer::build_session_auth_info() less confusing
The second KeyServer::build_session_auth_info() overload is used only
by the monitor, for mon <-> mon authentication. The monitor passes in
service_secret (mon secret) and secret_id (-1). The TTL is irrelevant
because there is no rotation.
However the signature doesn't make it obvious. Clarify that
service_secret and secret_id are input parameters and info is the only
output parameter.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
6f12cd3688b753633c8ff29fb3bd64758f960b2b)
Ilya Dryomov [Thu, 15 Apr 2021 07:48:13 +0000 (09:48 +0200)]
auth/cephx: cap ticket validity by expiration of "next" key
If auth_mon_ticket_ttl is increased by several times as done in
commit
522a52e6c258 ("auth/cephx: rotate auth tickets less often"),
active clients eventually get stuck because the monitor sends out an
auth ticket with a bogus validity. The ticket is secured with the
"current" secret that is scheduled to expire according to the old TTL,
but the validity of the ticket is set to the new TTL. As a result,
the client simply doesn't attempt to renew, letting the secrets rotate
potentially more than once. When that happens, the client first hits
auth authorizer errors as it tries to renew service tickets and when
it finally gets to renewing the auth ticket, it hits the insecure
global_id reclaim wall.
Cap TTL by expiration of "next" key -- the "current" key may be
milliseconds away from expiration and still be used, legitimately.
Do it in KeyServerData alongside key rotation code and propagate the
capped TTL to the upper layer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
370c9b13970d47a55b1b20ef983c6f01236c9565)
Conflicts:
src/auth/cephx/CephxKeyServer.cc [ commit
ef3c42cd6481 ("auth:
EACCES, not EPERM") not in nautilus ]
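The capping rule from "auth/cephx: cap ticket validity by expiration of "next" key" reduces to taking the smaller of the configured TTL and the time left until the "next" key expires. A minimal sketch with a hypothetical capped_ticket_ttl() helper; times are plain seconds and the floor at zero is an assumption of this example:

```python
def capped_ticket_ttl(now, configured_ttl, next_key_expiration):
    """Return the ticket validity, capped by the expiration of the "next" key."""
    remaining = next_key_expiration - now
    # Floor at zero so an already expired key never yields a negative TTL.
    return max(0.0, min(configured_ttl, remaining))
```

With now = 100, a configured TTL of 3600 and a "next" key expiring at 700, the ticket is valid for 600 seconds rather than 3600, so the client renews before the secrets it depends on have rotated away.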
Ilya Dryomov [Thu, 15 Apr 2021 07:47:50 +0000 (09:47 +0200)]
auth/cephx: drop redundant KeyServerData::get_service_secret() overload
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit
3078af716505ae754723864786a41a6d6af0534c)
Ernesto Puerta [Tue, 6 Apr 2021 11:45:15 +0000 (13:45 +0200)]
mgr/dashboard: debug nodeenv hangs
Increase verbosity in nodeenv command for debugging purposes.
Fixes: https://tracker.ceph.com/issues/50044
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit
2c2a397f84455147e1cc5c7b5fc1289e47bbe5ee)
Conflicts:
make-dist
src/pybind/mgr/dashboard/CMakeLists.txt
- Adopted the master branch changes.
Kotresh HR [Thu, 18 Mar 2021 12:54:44 +0000 (18:24 +0530)]
mgr/volumes: Retain suid/guid bits in subvolume clone
Fixes: https://tracker.ceph.com/issues/49882
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 92dc982318fa7d49c3185615b84a7a7764c6ed42)
Conflicts:
qa/tasks/cephfs/test_volumes.py: A few of the testcases are not present
in octopus, hence the conflicts.
Kotresh HR [Thu, 18 Mar 2021 12:51:05 +0000 (18:21 +0530)]
pybind/cephfs: Add lchmod python binding
Fixes: https://tracker.ceph.com/issues/49882
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit b2375adce085e98bef521422441b80a945e38c80)
Conflicts:
src/pybind/cephfs/mock_cephfs.pxi : Not present in octopus
src/pybind/cephfs/c_cephfs.pxd : Not present in octopus
src/pybind/cephfs/cephfs.pyx : A few of the fops are not part of octopus
and got pulled in as part of this backport
src/test/pybind/test_cephfs.py : A few of the fops are not part of
octopus and got pulled in as part of this backport. Added missing
stat import.
Kotresh HR [Thu, 18 Mar 2021 12:51:05 +0000 (18:21 +0530)]
client/libcephfs: Add lchmod
Fixes: https://tracker.ceph.com/issues/49882
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit bb1fd87e3bc45b20f377438cddde6c6307299a29)
Sage Weil [Sun, 28 Mar 2021 22:07:57 +0000 (18:07 -0400)]
qa/standalone: default to disable insecure global id reclaim
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 72c4fc75ad301980baebc7789ed6391444057e5b)
Sage Weil [Fri, 26 Mar 2021 22:08:46 +0000 (18:08 -0400)]
qa/tasks/ceph[adm].conf[.template]: disable insecure global_id reclaim health alerts
Turn these off everywhere for our tests so they don't interfere with our health checks.
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 9f6fd4fe563c9cd4cf65316921d511b677c972e4)
Conflicts:
qa/tasks/cephadm.conf [ no cephadm in nautilus ]
Sage Weil [Thu, 25 Mar 2021 22:07:53 +0000 (18:07 -0400)]
mon/HealthMonitor: raise AUTH_INSECURE_GLOBAL_ID_RENEWAL[_ALLOWED]
Two new alerts:
- AUTH_INSECURE_GLOBAL_ID_RENEWAL_ALLOWED if we are allowing clients to reclaim
global_ids in an insecure manner (for backwards compatibility until
clients are upgraded)
- AUTH_INSECURE_GLOBAL_ID_RENEWAL if there are currently clients connected that
do not know how to securely renew their global_id, as exposed by
auth_expose_insecure_global_id_reclaim=true. The client auth names and IPs
are listed in the alert details (up to a limit, at least).
The docs recommend that operators mute these alerts rather than disable them, but
we still include options that allow the alerts to be disabled entirely.
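As a rough illustration of how the two alerts relate, here is a Python sketch, not the HealthMonitor code. The function name, the session dictionaries, the detail strings and the max_detail limit are all hypothetical; only the two alert names come from the message above:

```python
def insecure_global_id_alerts(allow_insecure_reclaim, sessions, max_detail=3):
    """Build a mapping of alert name -> detail lines (hypothetical layout)."""
    alerts = {}
    if allow_insecure_reclaim:
        # Raised whenever the permissive setting is still enabled.
        alerts["AUTH_INSECURE_GLOBAL_ID_RENEWAL_ALLOWED"] = [
            "insecure global_id reclaim is allowed"
        ]
    insecure = [s for s in sessions if s["insecure_reclaim"]]
    if insecure:
        # List client names and addresses, capped at max_detail entries.
        detail = [f'{s["name"]} at {s["addr"]}' for s in insecure[:max_detail]]
        if len(insecure) > max_detail:
            detail.append(f"...{len(insecure) - max_detail} more")
        alerts["AUTH_INSECURE_GLOBAL_ID_RENEWAL"] = detail
    return alerts


example = insecure_global_id_alerts(
    True,
    [
        {"name": "client.a", "addr": "10.0.0.1", "insecure_reclaim": True},
        {"name": "client.b", "addr": "10.0.0.2", "insecure_reclaim": False},
    ],
)
```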
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 18b343b06e5dd904af425dc99e2c848e12f3b552)
Conflicts:
doc/rados/operations/health-checks.rst [ MON_DISK_* alerts
present but not documented in nautilus; "ceph health mute"
not in nautilus -- silencing temporarily is not possible ]
src/mon/HealthMonitor.cc [ commits e4bf716bfa07 ("mon: store
a reference as member variable") and d0eb22f3ba55
("mon/health_checks: associate a count with health_alert_t")
not in nautilus ]
Ilya Dryomov [Tue, 2 Mar 2021 14:09:26 +0000 (15:09 +0100)]
auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys
When handling CEPHX_GET_AUTH_SESSION_KEY requests from nautilus+
clients, ignore CEPH_ENTITY_TYPE_AUTH in CephXAuthenticate::other_keys.
Similarly, when handling CEPHX_GET_PRINCIPAL_SESSION_KEY requests,
ignore CEPH_ENTITY_TYPE_AUTH in CephXServiceTicketRequest::keys.
These fields are intended for requesting service tickets, the auth
ticket (which is really a ticket granting ticket) must not be shared
this way.
Otherwise we end up sharing an auth ticket that a) isn't encrypted
with the old session key even if needed (should_enc_ticket == true)
and b) has the wrong validity, namely auth_service_ticket_ttl instead
of auth_mon_ticket_ttl. In the CEPHX_GET_AUTH_SESSION_KEY case, this
undue ticket immediately supersedes the actual auth ticket already
encoded in the same reply (the reply frame ends up containing two auth
tickets).
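The filtering itself amounts to masking one bit out of the requested-keys bitmask. A minimal Python sketch follows; the constant values are assumed to mirror Ceph's CEPH_ENTITY_TYPE_* flags, and service_keys_to_issue is a hypothetical helper name, not a function from the patch:

```python
# Entity type bits (values assumed to match Ceph's CEPH_ENTITY_TYPE_* flags).
CEPH_ENTITY_TYPE_MON = 0x01
CEPH_ENTITY_TYPE_MDS = 0x02
CEPH_ENTITY_TYPE_OSD = 0x04
CEPH_ENTITY_TYPE_MGR = 0x10
CEPH_ENTITY_TYPE_AUTH = 0x20


def service_keys_to_issue(requested_keys):
    """Drop the auth bit so only service tickets are shared this way."""
    return requested_keys & ~CEPH_ENTITY_TYPE_AUTH


# A client asking for OSD and MDS tickets plus (unduly) the auth ticket.
requested = CEPH_ENTITY_TYPE_OSD | CEPH_ENTITY_TYPE_MDS | CEPH_ENTITY_TYPE_AUTH
issued = service_keys_to_issue(requested)
```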
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 05772ab6127bdd9ed2f63fceef840f197ecd9ea8)
Ilya Dryomov [Mon, 22 Mar 2021 18:16:32 +0000 (19:16 +0100)]
auth/cephx: rotate auth tickets less often
If unauthorized global_id (re)use is disallowed, a client that has
been disconnected from the network long enough for keys to rotate
and its auth ticket to expire (i.e. become invalid/unverifiable)
would not be able to reconnect.
The default TTL is 12 hours, resulting in a 12-24 hour reconnect
window (the previous key is kept around, so the actual window can be
up to double the TTL). The setting has stayed the same since 2009,
but it also hasn't been enforced. Bump it to get a 72 hour reconnect
window to cover for something breaking on Friday and not getting fixed
until Monday.
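The window arithmetic can be written out as a short Python sketch. The helper name is hypothetical, and the 72 hour TTL in the second call is an assumption: the message above only states the resulting window, not the new setting.

```python
def reconnect_window_hours(ttl_hours):
    """Return (shortest, longest) reconnect window for a given ticket TTL.

    The previous key is kept around, so the window can be up to double the TTL.
    """
    return (ttl_hours, 2 * ttl_hours)


old_window = reconnect_window_hours(12)  # default TTL of 12 hours
new_window = reconnect_window_hours(72)  # assumed bumped TTL of 72 hours
```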
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 522a52e6c258932274f0753feb623ce008519216)
Ilya Dryomov [Thu, 25 Mar 2021 19:59:13 +0000 (20:59 +0100)]
mon: fail fast when unauthorized global_id (re)use is disallowed
When unauthorized global_id (re)use is disallowed, we don't want to
let unpatched clients in because they wouldn't be able to reestablish
their monitor session later, resulting in subtle hangs and disrupted
user workloads.
Denying the initial connect for all legacy (CephXAuthenticate < v3)
clients is not feasible because a large subset of them never stopped
presenting their ticket on reconnects and are therefore compatible with
enforcing mode: most notably all kernel clients but also pre-luminous
userspace clients. They don't need to be patched and excluding them
would significantly hamper the adoption of enforcing mode.
Instead, force clients that we are not sure about to reconnect shortly
after they go through authentication and obtain global_id. This is
done in Monitor::dispatch_op() to capture both msgr1 and msgr2, most
likely instead of dispatching mon_subscribe.
We need to let mon_getmap through for "ceph ping" and "ceph tell" to
work. This does mean that we share the monmap, which lets the client
return from MonClient::authenticate() considering authentication to be
finished and causing the potential reconnect error to not propagate to
the user -- the client would hang waiting for remaining cluster maps.
For msgr1, this is unavoidable because the monmap is sent immediately
after the final MAuthReply. But for msgr2 this is rare: most of the
time we get to their mon_subscribe and cut the connection before they
process the monmap!
Regardless, the user doesn't get a chance to start a workload since
there is no proper higher-level session at that point.
To help with identifying clients that need patching, add global_id and
global_id_status to "sessions" output.
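The dispatch decision described above can be sketched in Python as follows. This is an illustration and not the Monitor::dispatch_op() code; should_dispatch and the needs_reconnect flag are hypothetical names standing in for "a client that we are not sure about":

```python
def should_dispatch(enforcing, needs_reconnect, msg_type):
    """Return True to dispatch the message, False to cut the connection."""
    if not (enforcing and needs_reconnect):
        # Permissive mode, or a client known to be compatible: business as usual.
        return True
    # mon_getmap must get through for "ceph ping" and "ceph tell" to work;
    # anything else (most likely mon_subscribe) triggers the forced reconnect.
    return msg_type == "mon_getmap"


decisions = [
    should_dispatch(True, True, "mon_getmap"),
    should_dispatch(True, True, "mon_subscribe"),
    should_dispatch(False, True, "mon_subscribe"),
]
```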
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 08766a17edebb7450cd9b17cc2dc01efc068bb94)
Conflicts:
src/mon/Monitor.cc [ commit e1163b445bbf ("mon: print
entity_name along with caps to debug log") not in nautilus ]
Ilya Dryomov [Sat, 13 Mar 2021 13:53:52 +0000 (14:53 +0100)]
auth/cephx: option to disallow unauthorized global_id (re)use
global_id is a cluster-wide unique id that must remain stable for the
lifetime of the client instance. The cephx protocol has a facility to
allow clients to preserve their global_id across reconnects:
(1) the client should provide its global_id in the initial handshake
message/frame and later include its auth ticket proving previous
possession of that global_id in CEPHX_GET_AUTH_SESSION_KEY request
(2) the monitor should verify that the included auth ticket is valid
and has the same global_id and, if so, allow the reclaim
(3) if the reclaim is allowed, the new auth ticket should be
encrypted with the session key of the included auth ticket to
ensure authenticity of the client performing reclaim. (The
included auth ticket could have been snooped when the monitor
originally shared it with the client or any time the client
provided it back to the monitor as part of requesting service
tickets, but only the genuine client would have its session key
and be able to decrypt.)
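Steps (2) and (3) as intended by the protocol can be sketched in Python. This is a simplified model, not the cephx implementation: the AuthTicket class, its valid flag (standing in for the real signature and expiry checks) and handle_reclaim are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AuthTicket:
    global_id: int
    session_key: bytes
    valid: bool = True  # stands in for real signature and expiry verification


def handle_reclaim(claimed_global_id: int,
                   old_ticket: Optional[AuthTicket]) -> Tuple[bool, Optional[bytes]]:
    """Return (allowed, key the new auth ticket must be encrypted with)."""
    # (2) the included ticket must be valid and carry the same global_id
    if old_ticket is None or not old_ticket.valid:
        return (False, None)
    if old_ticket.global_id != claimed_global_id:
        return (False, None)
    # (3) encrypt the new ticket with the old session key, so that only the
    # genuine holder of that key can decrypt it
    return (True, old_ticket.session_key)


genuine = handle_reclaim(4242, AuthTicket(global_id=4242, session_key=b"k1"))
no_ticket = handle_reclaim(4242, None)
```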
Unfortunately, all (1), (2) and (3) have been broken for a while:
- (1) was broken in 2016 by commit a2eb6ae3fb57 ("mon/monclient:
hunt for multiple monitor in parallel") and is addressed in patch
"mon/MonClient: preserve auth state on reconnects"
- it turns out that (2) has never been enforced. When cephx was
being designed and implemented in 2009, two changes to the protocol
raced with each other pulling it in different directions: commits
0669ca21f4f7 ("auth: reuse global_id when requesting tickets") and
fec31964a12b ("auth: when renewing session, encrypt ticket")
added the reclaim mechanism based strictly on auth tickets, while
commit 5eeb711b6b2b ("auth: change server side negotiation a bit")
allowed the client to provide global_id in the initial handshake.
These changes didn't get reconciled and as a result a malicious
client can assign itself any global_id of its choosing by simply
passing something other than 0 in MAuth message or AUTH_REQUEST
frame and not even bother supplying any ticket. This includes
getting a global_id that is being used by another client.
- (3) was broken in 2019 with addition of support for msgr2, where
the new auth ticket ends up being shared unencrypted. However the
root cause is deeper and a malicious client can coerce msgr1 into
the same. This also goes back to 2009 and is addressed in patch
"auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys".
Because (2) has never been enforced, no one noticed when (1) got
broken and we began to rely on this flaw for normal operation in
the face of reconnects due to network hiccups or otherwise. As of
today, only pre-luminous userspace clients and kernel clients are
not exercising it on a daily basis.
Bump CephXAuthenticate version and use a dummy v3 to distinguish
between legacy clients that don't (may not) include their auth ticket
and new clients. For new clients, unconditionally disallow claiming
global_id without a corresponding auth ticket. For legacy clients,
introduce a choice between permissive (current behavior, default for
the foreseeable future) and enforcing mode.
If the reclaim is disallowed, return EACCES. While MonClient does
have some provision for global_id changes and we could conceivably
implement enforcement by handing out a fresh global_id instead of
the provided one, those code paths have never been tested and there
are too many ways a sudden global_id change could go wrong.
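The resulting policy can be summarized with a small Python sketch. It is a hedged illustration of the decision only; check_global_id_claim and its parameters are hypothetical names, and the real logic lives in the C++ cephx service handler:

```python
import errno


def check_global_id_claim(authenticate_version, has_valid_ticket, enforcing):
    """Return 0 if the global_id claim is allowed, -EACCES otherwise."""
    if has_valid_ticket:
        return 0
    if authenticate_version >= 3:
        # New clients: never allowed to claim global_id without a ticket.
        return -errno.EACCES
    # Legacy clients: depends on permissive vs enforcing mode.
    return -errno.EACCES if enforcing else 0


legacy_permissive = check_global_id_claim(2, False, enforcing=False)
legacy_enforcing = check_global_id_claim(2, False, enforcing=True)
new_client = check_global_id_claim(3, False, enforcing=False)
```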
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit abebd643cc60fa8a7cb82dc29a9d5041fb3c3d36)
Conflicts:
src/auth/AuthServiceHandler.h [ bufferlist vs
ceph::buffer::list ]
src/auth/cephx/CephxProtocol.h [ ditto ]
src/auth/cephx/CephxServiceHandler.h [ ditto ]
src/auth/none/AuthNoneServiceHandler.h [ ditto ]
Ilya Dryomov [Tue, 30 Mar 2021 09:10:17 +0000 (11:10 +0200)]
auth/cephx: make cephx_decode_ticket() take a const ticket_blob
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6b860684c6e59b11c727206819805f89f0518575)
Ilya Dryomov [Tue, 9 Mar 2021 15:33:55 +0000 (16:33 +0100)]
auth/AuthServiceHandler: keep track of global_id and whether it is new
AuthServiceHandler already has global_id field, but it is unused.
Revive it and let the handler know whether global_id is newly assigned
by the monitor or provided by the client.
Lift the setting of entity_name into AuthServiceHandler.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b50b6abd60e730176a7ef602bdd25d789a3c467d)
Conflicts:
src/auth/AuthServiceHandler.h [ bufferlist vs
ceph::buffer::list ]
src/auth/cephx/CephxServiceHandler.cc [ ditto ]
src/auth/cephx/CephxServiceHandler.h [ ditto ]
src/auth/none/AuthNoneServiceHandler.h [ ditto ]
Ilya Dryomov [Tue, 9 Mar 2021 13:36:39 +0000 (14:36 +0100)]
auth/AuthServiceHandler: build_cephx_response_header() is cephx-specific
Make the one in CephxServiceHandler private and drop the stub in
AuthNoneServiceHandler.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 49cba02a750d4c1ab68399401f0c04f9c9be5b9e)
Conflicts:
src/auth/cephx/CephxServiceHandler.h [ bufferlist vs
ceph::buffer::list ]
src/auth/none/AuthNoneServiceHandler.h [ ditto ]
Ilya Dryomov [Tue, 9 Mar 2021 13:25:39 +0000 (14:25 +0100)]
auth/AuthServiceHandler: drop unused start_session() args
session_key, connection_secret and connection_secret_required_length
aren't material for start_session() across all three implementations.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c151c9659bdb71f30b520bbd62f91cc009ec51cd)
Conflicts:
src/auth/AuthServiceHandler.h [ bufferlist vs
ceph::buffer::list ]
src/auth/cephx/CephxServiceHandler.h [ ditto ]
src/auth/none/AuthNoneServiceHandler.h [ ditto ]
Ilya Dryomov [Tue, 30 Mar 2021 13:19:41 +0000 (15:19 +0200)]
mon/MonClient: drop global_id arg from _add_conn() and _add_conns()
Passing anything but MonClient instance's global_id doesn't make
sense.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit a71f6e90d43cca5a79db92ca6a640598796ae7ee)
Conflicts:
src/mon/MonClient.cc [ commit 1e9b18008c5e ("mon: set
MonClient::_add_conn return type to void") not in nautilus ]
src/mon/MonClient.h [ ditto ]
Ilya Dryomov [Thu, 1 Apr 2021 08:55:36 +0000 (10:55 +0200)]
mon/MonClient: reset auth state in shutdown()
Destroying AuthClientHandler and not resetting global_id is another
way to get MonClient to send CEPHX_GET_AUTH_SESSION_KEY requests with
CephXAuthenticate::old_ticket not populated. This is particularly
pertinent to get_monmap_and_config() which shuts down the bootstrap
MonClient between retry attempts.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c9b022e07392979e7f9ea6c11484a7dd872cc235)
Ilya Dryomov [Mon, 8 Mar 2021 14:37:02 +0000 (15:37 +0100)]
mon/MonClient: preserve auth state on reconnects
Commit a2eb6ae3fb57 ("mon/monclient: hunt for multiple monitor in
parallel") introduced a regression where auth state (global_id and
AuthClientHandler) was no longer preserved on reconnects. The ensuing
breakage was quickly noticed and prompted a follow-on fix, 8bb6193c8f53
("mon/MonClient: persist global_id across re-connecting").
However, as evident from the subject, the follow-on fix only took
care of the global_id part. AuthClientHandler is still destroyed
and all cephx tickets are discarded. A new from-scratch instance
is created for each MonConnection and CEPHX_GET_AUTH_SESSION_KEY
requests end up with CephXAuthenticate::old_ticket not populated.
The bug is in MonClient, so both msgr1 and msgr2 are affected.
This should have resulted in a similar sort of breakage but didn't
because of a much larger bug. The monitor should have denied the
attempt to reclaim global_id with no valid ticket proving previous
possession of that global_id presented. Alas, it appears that this
aspect of the cephx protocol has never been enforced. This is dealt
with in the next patch.
To fix the issue at hand, clone AuthClientHandler into each
MonConnection so that each respective CEPHX_GET_AUTH_SESSION_KEY
request gets a copy of the current auth ticket.
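The idea of the fix, sketched in Python instead of C++. The classes below are toy stand-ins that only share names with the real ones; the tickets dictionary, build_auth_request and add_conns are hypothetical:

```python
import copy


class AuthClientHandler:
    """Toy stand-in holding the cephx tickets obtained so far."""

    def __init__(self):
        self.tickets = {}


class MonConnection:
    def __init__(self, auth):
        self.auth = auth

    def build_auth_request(self):
        # The request carries whatever auth ticket this connection's handler has.
        return {"old_ticket": self.auth.tickets.get("auth")}


def add_conns(current_auth, count):
    """Give every new connection its own clone of the current auth state."""
    return [MonConnection(copy.deepcopy(current_auth)) for _ in range(count)]


auth = AuthClientHandler()
auth.tickets["auth"] = "ticket-1"
conns = add_conns(auth, 3)
requests = [c.build_auth_request() for c in conns]
```

Cloning rather than sharing keeps the parallel connection attempts from modifying one another's handler while still letting each request present the current auth ticket.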
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 236b536b28482ec9d8b872de03da7d702ce4787b)
Conflicts:
src/mon/MonClient.cc [ commit 1e9b18008c5e ("mon: set
MonClient::_add_conn return type to void") not in nautilus ]
Ilya Dryomov [Sat, 6 Mar 2021 10:15:40 +0000 (11:15 +0100)]
mon/MonClient: claim active_con's auth explicitly
Eliminate confusion by moving auth from active_con into MonClient
instead of swapping them.
The existing MonClient::auth can be destroyed right away -- I don't
see why active_con would need it or a reason to delay its destruction
(which is what stashing in active_con effectively does).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit eec24e4d119c57c7eb5119dc0083616a61b33b89)
Sage Weil [Wed, 29 Jan 2020 21:37:03 +0000 (15:37 -0600)]
mon: dump json from 'sessions' asok/tell command
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 71a0d8a568bd0034cc1e6329cd20269f11635697)
Conflicts:
src/mon/Monitor.cc [ commit adf1486e46cb ("common/admin_socket:
pass Formatter from generic infrastructure") not in nautilus ]
Sage Weil [Mon, 5 Apr 2021 18:08:30 +0000 (13:08 -0500)]
qa/tasks/ceph.conf: shorten cephx TTL for testing
Rotate tickets frequently to exercise those code paths during testing.
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 94df76244798cdc0bafd74c9e5197adb5aa990c0)
Conflicts:
qa/tasks/cephadm.conf [ no cephadm in nautilus ]
Yuri Weinstein [Mon, 12 Apr 2021 15:23:40 +0000 (08:23 -0700)]
Merge pull request #40359 from tchaikov/nautilus-pr-39937
nautilus: mgr: add mon metadata using type of "mon"
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>