mon/OSDMonitor: drop stale failure_info even if can_mark_down()

in a124ee85b03e15f4ea371358008ecac65f9f4e50, we add a check to drop
stale failure_info reports. but if osdmap does not prohibit us from
marking the osd in question down, the branch checking the stale info
is not executed. in general, it is allowed to mark an osd down, so
the fix of a124ee85b03e15f4ea371358008ecac65f9f4e50 just fails to
work.

in this change, we check for stale failure report of osd in question
as long as the osd is not marked down in the same function. this should
address the slow ops of failure report issue.

Fixes: https://tracker.ceph.com/issues/50964
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 2d21ab905889c36bf9a9ecc6f0b66f4142c826e3)

Merge pull request #41310 from k0ste/wip-50777-nautilus

nautilus: mgr/progress: ensure progress stays between [0,1]

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #41164 from linuxbox2/wip-nautilus-41031

nautilus: rgw: check object locks in multi-object delete

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #41238 from trociny/wip-50701-nautilus

nautilus: os/FileStore: fix to handle readdir error correctly

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #41111 from k0ste/wip-50603-nautilus

nautilus: osd: compute OSD's space usage ratio via raw space utilization

Reviewed-by: Igor Fedotov <ifedotov@suse.com>

Merge pull request #40106 from xijiacun/nautilus

nautilus: rgw: Use correct bucket info when put or get large object with swift.

Reviewed-by: Casey Bodley <cbodley@redhat.com>

rgw: Use correct bucket info when put or get large object with swift.

Fixes: https://tracker.ceph.com/issues/49791
Signed-off-by: zhiming zhang <zhangzhm1@chinatelecom.cn>
Signed-off-by: yupeng chen <chenyupeng@chinatelecom.cn>
(cherry picked from commit bdd0635fbb0632c881e8f38c563f88d0957688bf)

Conflicts:
src/rgw/rgw_op.cc
src/rgw/rgw_rest_swift.cc

- In octopus:
  RGWRados::Object op_target(store->getRados(), ...)
- In nautilus:
  RGWRados::Object op_target(store, ...)

Merge pull request #41137 from tchaikov/nautilus-50456

nautilus: bind on loopback address if no other addresses are available

Merge pull request #41318 from neha-ojha/wip-50692-nautilus

nautilus: pybind/rados: should pass "name" to cstr()

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #39818 from ceph/wip-yuriw-client-upgrade-nautilus-pacific-nautilus

nautilus: qa/tests: added client-upgrade-nautilus-pacific tests

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>

Merge remote-tracking branch 'origin/nautilus-saved' into nautilus

14.2.21

mgr/dashboard: fix cookie injection issue

Fixes: CVE-2021-3509
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit b39922818bc57cde1b016e9ad41908b18063b93b)

Conflicts:
src/pybind/mgr/dashboard/controllers/docs.py
- Remove allow_empty_body and _with_token method

Update qa/suites/upgrade-clients/client-upgrade-nautilus-pacific/nautilus-client-x/rbd/1-install/nautilus-client-x.yaml

Co-authored-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Yuri Weinstein <yweinste@redhat.com>

mgr/dashboard: fix base-href: revert it to previous approach

Fixes: https://tracker.ceph.com/issues/50684
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit b6f92922f5c80223fd288d98ce85405a650c0135)

Conflicts:

src/pybind/mgr/dashboard/frontend/src/app/app.module.ts
- Adopt the changes coming from master for this file.

(cherry picked from commit 3802683035532bc15d95e16232e69e0fa96c474f)

common/pick_addr: use grading machinery to refactor pick_address()

as picking iface on the same NUMA node is not a hard requirement, the
grading machinery is a nice fit for this purpose.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 329d51c68ec6bf1864aa9430a62d65a93362a1b9)

common/pick_address: prefer non-loopback addresses

instead of filtering out loopback ifaces, check for loopback addresses,
and prefer non-loopback addresses over loopback addresses.

before this change, iface named "lo" is filtered out by default,
and "lo" is allowed if `ms_bind_exclude_lo_iface` is false.

after this change, iface with address out of 127/8 is preferred.
the iface marked down is not considered.

the option of "ms_bind_exclude_lo_iface" is removed. the tests are
updated accordingly.

Fixes: https://tracker.ceph.com/issues/50456
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit a9b9bcd53215a07608a28ac2c8e4a8c8b8e80e66)

Conflicts:
src/common/options/global.yaml.in
src/common/pick_address.cc: trivial resolution
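
The preference described in this change (non-loopback addresses ranked above loopback ones, interfaces marked down ignored) can be sketched as a small grading function. This is a rough Python illustration, not the C++ code in pick_address.cc: `Candidate`, `grade`, and `pick_address` are hypothetical names, and the real grading machinery may weigh other factors such as NUMA locality.

```python
import ipaddress
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """Hypothetical view of one interface address."""
    iface: str
    addr: str
    is_up: bool


def grade(c: Candidate) -> int:
    """Return a score; higher is better, negative means unusable."""
    if not c.is_up:
        return -1  # interfaces marked down are not considered
    # Loopback addresses stay eligible but rank below everything else.
    return 0 if ipaddress.ip_address(c.addr).is_loopback else 1


def pick_address(candidates) -> Optional[Candidate]:
    """Pick the best graded usable candidate, or None if there is none."""
    usable = [c for c in candidates if grade(c) >= 0]
    # max() keeps the first of equally graded candidates.
    return max(usable, key=grade, default=None)
```

With this shape, a host that only has a loopback address still gets an address to bind to, which a hard filter on the interface name would not allow.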

common/pick_address: Allow binding on loopback iface

in 6147c0917157efd2d35610e759685656a4989abb, "lo" is also skipped when
daemon is trying to find an address to bind. but that change reverts the
fix of 201b59204374ebdab91bb554b986577a97b19c36.

to address the problem, an option named "ms_bind_exclude_lo_iface" is
added. it defaults to "true", but it can be changed to false to allow
daemon to bind on "lo".

Fixes: https://tracker.ceph.com/issues/50012
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 7f01d36a2ca0576f1ff103ae3fa7c3662e93b722)

common/pick_address: document find_ip_in_subnet_list()

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit b106ec0bbf7fa726062989114f461f2d0a1f93a9)

common/pick_address: pass string by reference

to silence warnings from clang-tidy.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 6d0ed81f796209f27b96811f9140b7fff16a7940)

common/pick_addr: refactor pick_address.cc and ipaddr.cc

* do not replicate the same logic in IPv4 and IPv6 paths
* use helpers returning bool for filtering the candidate addresses
for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 52785d5a3607b2f2ee6d41069d18a154b3eb5d45)

Conflicts:
src/common/ipaddr.cc
src/common/pick_address.cc: trivial resolution

common/pick_address: use scope_guard for freeifaddrs()

for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit c3c110b5763ac420c4b88f8a545c1c87a71ce59a)

common/pick_address: fail if cannot bind with specified network family

this change partially reverts 9f75dfbf364f5140b3f291e0a2c6769bc3d8cbac

we should not proceed against user's will if dual stack is specified but
only one network for a network family can be found. the right fix is
to have better error messages and documentation, not to tolerate the
failure.

Fixes: https://tracker.ceph.com/issues/46845
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit d752acafa0d99c3d7cacfaaaf3ae51770e251aff)

pick_address: Warn and continue when you find at least 1 IPv4 or IPv6 address

Currently, if you specify a single public or cluster network, yet have both
`ms bind ipv4` and `ms bind ipv6` set, daemons crash when they can't find
both IPs from the same network:

unable to find any IPv4 address in networks '2001:db8:11d::/120' interfaces ''

And rightly so, of course it can't find an IPv4 network in an IPv6
network.
This patch adds a new helper method, networks_address_family_coverage,
that takes the list of networks and returns a bitmap of address families
supported.
We then check whether enough networks are defined and, if not,
warn and then continue.

Also update the network-config-ref to mention having to define addresses
of both address families for cluster and/or public networks.

Also add a warning about `ms bind ipv4` being enabled by default, which
is easy to miss, thereby enabling dual stack when you may only
expect single stack IPv6.

There is also a drive-by fix for a `note` that wasn't being displayed due
to missing RST syntax.

Signed-off-by: Matthew Oliver <moliver@suse.com>
Fixes: https://tracker.ceph.com/issues/46845
Fixes: https://tracker.ceph.com/issues/39711
(cherry picked from commit 9f75dfbf364f5140b3f291e0a2c6769bc3d8cbac)
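
The coverage check described in this commit can be sketched as follows. This is a minimal Python illustration under stated assumptions: the bit values `IPV4` and `IPV6` and the helper `missing_families` are made up for the example, and only the name `networks_address_family_coverage` comes from the commit message. What the caller does with a non-zero result (warn, or fail as a later change in this log prefers) is left out.

```python
import ipaddress

IPV4 = 0x1  # illustrative bit values, not Ceph's constants
IPV6 = 0x2


def networks_address_family_coverage(networks: str) -> int:
    """Return a bitmap of the address families found in a comma-separated network list."""
    coverage = 0
    for net in networks.split(","):
        net = net.strip()
        if not net:
            continue
        version = ipaddress.ip_network(net, strict=False).version
        coverage |= IPV4 if version == 4 else IPV6
    return coverage


def missing_families(networks: str, want_ipv4: bool, want_ipv6: bool) -> int:
    """Return the bitmap of requested families that no configured network covers."""
    wanted = (IPV4 if want_ipv4 else 0) | (IPV6 if want_ipv6 else 0)
    return wanted & ~networks_address_family_coverage(networks)
```

For the network in the error message above, `missing_families("2001:db8:11d::/120", True, True)` reports that IPv4 is requested but not covered.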

pybind/rados: should pass "name" to cstr()

it's a regression introduced by 6cb23f9c

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit dba8326bdf5c9cda08d5bc70640371220bb18073)

Merge pull request #41227 from ceph/wip-yuriw-nautilus-p2p

nautilus: qa/tests: advanced nautilus initial version to 14.2.20

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Merge pull request #41156 from smithfarm/wip-50366-nautilus

nautilus: rgw: during reshard lock contention, adjust logging

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>

mgr/progress: ensure progress stays between [0,1]

If _original_pg_count is 0 then progress can be negative.

Fixes: https://tracker.ceph.com/issues/50591
Related-to: https://tracker.ceph.com/issues/50587
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 20990a94598d0249745e2ec25c9197d842119d92)
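
Keeping a ratio inside [0,1] amounts to a clamp. A minimal Python sketch, where `clamp_progress` is a hypothetical name and not necessarily how the progress module implements it:

```python
def clamp_progress(value: float) -> float:
    """Clamp a progress ratio to the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))
```

A negative ratio, such as one produced when the original PG count is 0, is reported as 0.0 instead of leaking out to callers that render a progress bar.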

qa/tests: resolved comment - changed to 14.2.20

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>

Merge pull request #41213 from tchaikov/nautilus-49919

nautilus: mon/OSDMonitor: drop stale failure_info after a grace period

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

rgw: sanitize \r in s3 CORSConfiguration's ExposeHeader

follows up on 1524d3c0c5cb11775313ea1e2bb36a93257947f2 to escape \r as
well

Fixes: CVE-2021-3524
Reported-by: Sergey Bobrov <Sergey.Bobrov@kaspersky.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 87806f48e7a1b8891eb90711f1cedd26f1119aac)

rgw: RGWSwiftWebsiteHandler::is_web_dir checks empty subdir_name

checking for empty name avoids later assertion in RGWObjectCtx::set_atomic

Fixes: CVE-2021-3531
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 7196a469b4470f3c8628489df9a41ec8b00a5610)

Merge pull request #41253 from rhcs-dashboard/wip-50724-nautilus

nautilus: mgr/dashboard: fix base-href: revert it to previous approach

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #40667 from smithfarm/wip-49092-nautilus

nautilus: rgw/http: add timeout to http client

Reviewed-by: Yuval Lifshitz <yuvalif@yahoo.com>

Merge pull request #40713 from smithfarm/wip-49471-nautilus

nautilus: qa: bump osd heartbeat grace for ffsb workload

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #41173 from ifed01/wip-ifed-better-onode-trim-nau

nautilus: os/bluestore: do not count pinned entries as trimmed ones.

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>

mgr/dashboard: fix base-href: revert it to previous approach

Fixes: https://tracker.ceph.com/issues/50684
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit b6f92922f5c80223fd288d98ce85405a650c0135)

Conflicts:

src/pybind/mgr/dashboard/frontend/src/app/app.module.ts
- Adopt the changes coming from master for this file.

Merge pull request #36183 from smithfarm/wip-46480-nautilus

nautilus: mds: send scrub status to ceph-mgr only when scrub is running

Reviewed-by: Ramana Raja <rraja@redhat.com>

os/FileStore: fix to handle readdir error correctly

Currently filestore code does not handle readdir error.
As man readdir(3) says, we need to check errno after readdir
returns NULL to determine if error happens or not.

This patch fixes all the readdir() calls to check errno and
handle it appropriately:
- FileStore.cc ... abort if EIO error happens
- BtrfsFileStoreBackend.cc/LFNIndex.cc ... return error to upper layer

Without this fix, the primary PG could fail to correctly perform
the backfill operation, which could lead to the data loss propagation
described in #50558.

Fixes: https://tracker.ceph.com/issues/50558
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
(cherry picked from commit 5a6c6267a182f859471ee629b490777ee1e970dd)

Merge pull request #40920 from neha-ojha/wip-50403-nautilus

nautilus: common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

qa/tests: advanced nautilus initial version to 14.2.20

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>

osd: drop entry in failure_pending when resetting stale peer

no need to keep it in the pending list anymore.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit ff077fc3ea7d1595679c7053cda3b16d68aefd01)

mon/OSDMonitor: drop stale failure_info

failure_info keeps strong references of the MOSDFailure messages
sent by osd or peon monitors, whenever monitor starts to handle
an MOSDFailure message, it registers it in its OpTracker. and
the failure report message is unregistered when monitor acks them
by either canceling them or replying the reporters with a new
osdmap marking the target osd down. but if this does not happen,
the failure reports just pile up in OpTracker. and monitor considers
them as slow ops. and they are reported as SLOW_OPS health warning.

in theory, it does not take long to mark an unresponsive osd down if
we have enough reporters. but there is chance, that a reporter fails
to cancel its report before it reboots, and the monitor also fails
to collect enough reports and mark the target osd down. so the
target osd never gets an osdmap marking it down, so it won't send
an alive message to monitor to fix this.

in this change, we check for the stale failure info in tick(), and
simply drop the stale reports. so the messages can be released and
marked "done".

Fixes: https://tracker.ceph.com/issues/47380
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit a124ee85b03e15f4ea371358008ecac65f9f4e50)

Conflicts:
src/mon/OSDMonitor.h: trivial resolution
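
The periodic cleanup described in this commit can be modelled roughly as below. This is a Python sketch, not the monitor's C++ code: `FailureInfo`, `drop_stale_failures`, and the `grace` cutoff are hypothetical, since the commit message does not say how staleness is decided.

```python
from dataclasses import dataclass, field


@dataclass
class FailureInfo:
    """Hypothetical stand-in for the per-OSD failure record."""
    # reporter id -> time the report was received (seconds)
    reports: dict = field(default_factory=dict)


def drop_stale_failures(failure_info, now, grace):
    """Drop reports older than `grace` seconds; return the dropped (osd, reporter) pairs."""
    dropped = []
    # Walk a snapshot of the keys, because entries are removed from
    # `failure_info` while it is being traversed.
    for osd in list(failure_info):
        info = failure_info[osd]
        for reporter, received in list(info.reports.items()):
            if now - received > grace:
                del info.reports[reporter]
                dropped.append((osd, reporter))
        if not info.reports:
            del failure_info[osd]
    return dropped
```

Iterating over a snapshot sidesteps the problem of mutating a map while iterating it, which is the same concern that motivates restructuring the check_failures() loop.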

mon/OSDMonitor: restructure OSDMonitor::check_failures() loop

will add a trim failures call in the loop, which mutates failure_info,
while we are still iterating this map. so have to restructure the loop
a little bit.

Fixes: https://tracker.ceph.com/issues/47380
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 6e512b2f1e228eb808d6bff1e5c159c4d16667ef)

rgw: rename variable for clarity

Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit 5d22b7d29a25db4f648daf0c51be74702d4149a2)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Conflicts:
src/rgw/rgw_op.cc

rgw: fix RGWDeleteMultiObj::verify_permission

Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit ba23750bea89a0e9818887abe62db0efef02fe3a)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Conflicts:
src/rgw/rgw_op.cc

rgw: Check user permissions for governance retention bypass in multi-object delete.

fixes: https://tracker.ceph.com/issues/47586
Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit 4f1524199132cbf382877a35b040d691b12717d1)

Conflicts:
rgw_op.cc

rgw: Honour governance retention override in multi-object delete.

Allow governance retention to be overridden by a suitably privileged user.

Fixes: http://tracker.ceph.com/issues/47586
Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit 6989da1bcbe59e4d561c9d16f0ff891f6c6ef567)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Conflicts:
src/rgw/rgw_op.cc

rgw: Check S3 object lock date in multi-object delete

Multi-object delete (via the S3 API) will now check each object's retention date in the same way as single object delete does.

Fixes: http://tracker.ceph.com/issues/47586
Signed-off-by: Mark Houghton <mhoughton@microfocus.com>
(cherry picked from commit 1a3f08550813e719b34a8133b83eefa97dd43d3a)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Conflicts:
src/rgw/rgw_common.h
src/rgw/rgw_common.cc
src/rgw/rgw_op.cc

Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>

Merge pull request #40753 from smithfarm/wip-50073-nautilus

nautilus: mgr/PyModule: put mgr_module_path before Py_GetPath()

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Merge pull request #41167 from tchaikov/nautilus-doc-build

nautilus: build python extensions using distutils

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

os/bluestore: do not count pinned entries as trimmed ones.

Signed-off-by: Igor Fedotov <ifedotov@suse.com>

pybind: use distutils.sysconfig for compiling flags

this allows maintainer to override the compiling flags when
cross-compiling Ceph.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 6050f28870e15ca80e59e0804783205d222f8493)

pybind: encode flattened dict

in python3, the keys and values in dict are unicode strings, so we need
to encode them before passing them to underlying librados' C API.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 6cb23f9ce6b3848a7fd93d135acc143f2ae7cba1)
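
To illustrate the encoding step, here is a minimal Python 3 sketch. It assumes the C API expects keys and values as one NUL-separated byte string; that layout, the function name `flatten_dict_encoded`, and the `encoding` parameter are assumptions made for the example, not the binding's actual code.

```python
from itertools import chain


def flatten_dict_encoded(d: dict, encoding: str = "utf-8") -> bytes:
    """Flatten a dict of str keys and values into bytes for a C API."""
    # Assumed result layout: key NUL value NUL key NUL value NUL ...
    items = chain.from_iterable(d.items())
    return b"".join(item.encode(encoding) + b"\0" for item in items)
```

Without the `encode()` call the result would be a unicode `str`, which cannot be handed to a C function expecting `char *`.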

pybind: extract flatten_dict() out

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 31ea5a7e930978151265678aedf4e37ae9e97776)

pybind: set language_level for cythonize explicitly

Compiling rbd.pyx because it changed.
[1/1] Cythonizing rbd.pyx
/usr/lib/python2.7/dist-packages/Cython/Compiler/Main.py:367:
FutureWarning: Cython directive 'language_level' not set, using 2 for
now (Py2). This will change in a later release! File:
/var/ssd/ceph/src/pybind/rbd/rbd.pyx
tree = Parsing.p_module(s, pxd, full_module_name)

this warning is raised by cython 0.29.2

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit fb760dd7115d46547794d855b413ab0c3139a37e)

rgw: during reshard lock contention, adjust logging

When RGW fails to get a lock on a reshard log, we log it in such a way
that it looks like an error. Instead we'll make sure that the log
message is informational.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 6d3dee37791ad427a3435c493a1d7874ba075674)

Merge pull request #41099 from amathuria/wip-50125-nautilus

nautilus: mon: Modifying trim logic to change paxos_service_trim_max dynamically

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #41098 from dvanders/nautilus_neg_progress

nautilus: mon: ensure progress is [0,1] before printing

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Merge pull request #41016 from idryomov/wip-reset-authenticate-err-nautilus

nautilus: mon/MonClient: reset authenticate_err in _reopen_session()

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40987 from trociny/wip-50481-nautilus

nautilus: os/FileStore: don't propagate split/merge error to "create"/"remove"

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

osd: compute OSD's space usage ratio via raw space utilization

Fixes: https://tracker.ceph.com/issues/50533
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 81c4d82be02ee14aff2849b3025a5dea6cb0327e)

mon: Modifying trim logic to change paxos_service_trim_max dynamically

Currently, the Paxos Service trim logic is bounded by a max value (paxos_service_trim_max). This change dynamically modifies the max value when the number of logs to be trimmed is higher than paxos_service_trim_max.

The paxos_service_trim_max_multiplier has been added in case we want to increase paxos_service_trim_max by a certain factor. If this option is enabled we get a new upper bound when trim sizes are high.

Fixes: https://tracker.ceph.com/issues/50004
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
(cherry picked from commit 2e1141e43980a0a44b18159860ebf9cc38316435)
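
One plausible reading of this description, sketched in Python. The function name `versions_to_trim`, the meaning of 0 for either option, and the exact point at which the raised bound kicks in are all assumptions made for illustration; the actual monitor code may differ.

```python
def versions_to_trim(to_remove: int, trim_max: int, multiplier: int) -> int:
    """Bound the number of versions trimmed in one round."""
    if trim_max == 0:
        return to_remove  # assumed: 0 means "no bound"
    if to_remove <= trim_max:
        return to_remove  # under the configured bound, trim everything
    if multiplier > 0:
        # Backlog exceeds the bound: raise the bound by the multiplier.
        return min(to_remove, trim_max * multiplier)
    return trim_max  # assumed: multiplier disabled keeps the old bound
```

Under these assumptions a backlog of 50000 versions with a bound of 500 and a multiplier of 20 is trimmed 10000 at a time instead of 500 at a time.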

mon: Adding variables for Paxos trim

1. Define variables for paxos_service_trim_min and paxos_service_trim_max.
2. Use them in place of g_conf()->paxos_service_trim_min and g_conf()->paxos_service_trim_max

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
(cherry picked from commit 45c59f2f0d0d90beb9163804e86139c551cf505b)

nautilus: mon: ensure progress is [0,1] before printing

Ensure that progress is in the expected range [0,1] before
rendering a progress bar.

Nautilus only because this is avoided in future releases thanks
to 5f95ec4457059889bc4dbc2ad25cdc0537255f69.

Related-to: https://tracker.ceph.com/issues/50587
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>

Merge pull request #39912 from gerald-yang/nautilus-49640

nautilus: common: Fix assertion when disabling and re-enabling clog_to_monitors

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #40744 from smithfarm/wip-50255-nautilus

nautilus: mds: trim cache regularly for standby-replay

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #40730 from smithfarm/wip-50179-nautilus

nautilus: cephfs: client: only check pool permissions for regular files

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #40722 from smithfarm/wip-50026-nautilus

nautilus: client: fire the finish_cap_snap() after buffer being flushed

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #40720 from smithfarm/wip-49853-nautilus

nautilus: mds: fix race of fetching large dirfrag

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #41060 from dvanders/50549

nautilus: os/bluestore: be more verbose in _open_super_meta by default.

os/bluestore: be more verbose in _open_super_meta by default.

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 4087f82aea674df4c7b485bf804f3a9c98ae3741)

mgr/PyModule: put mgr_module_path before Py_GetPath()

pip comes with _vendor/progress. so there is chance to import the vendored
version of "progress" module instead of the "progress" mgr module, and
fail to import the latter.

in this change, the order of paths is rearranged so the configured
`mgr_module_path` is put before the return value of `Py_GetPath()`.

Fixes: https://tracker.ceph.com/issues/50058
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 8638f526a9d04c3dfd758073980d709165070336)

Conflicts:
src/mgr/PyModule.cc
- nautilus has a preprocessor directive "#if PY_MAJOR_VERSION >= 3"
which is not there in master
- since we still need to support python2, apply the same change to
the #else branch at line 351
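
Why the order matters can be shown with a toy model of a module search path, where the first directory providing a module wins. This is a Python sketch only: `build_search_path`, `first_match`, and the directory names in the usage note are hypothetical, and the real change is made in C++ in PyModule.cc.

```python
import os


def build_search_path(mgr_module_path: str, py_get_path: str) -> str:
    """Join the two paths so that mgr_module_path entries are searched first."""
    return os.pathsep.join(p for p in (mgr_module_path, py_get_path) if p)


def first_match(module: str, search_path: str, available: dict) -> str:
    """Return the first directory in `search_path` that provides `module`.

    `available` maps a directory to the set of module names found in it,
    standing in for the filesystem.
    """
    for directory in search_path.split(os.pathsep):
        if module in available.get(directory, ()):
            return directory
    raise ImportError(module)
```

If both a mgr module directory and a pip vendor directory provide "progress", placing the mgr directory first makes `first_match` return it; with the order reversed, the vendored copy would shadow the mgr module.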

Merge pull request #40697 from smithfarm/wip-49567-nautilus

nautilus: tests: ceph_test_rados_api_watch_notify: Allow for reconnect

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>

Merge pull request #40751 from smithfarm/wip-50144-nautilus

nautilus: qa/tasks/vstart_runner.py: start max required mgrs

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #40752 from smithfarm/wip-50211-nautilus

nautilus: os/bluestore/BlueFS: do not _flush_range deleted files

Reviewed-by: Igor Fedotov <ifedotov@suse.com>

Merge pull request #40747 from smithfarm/wip-49731-nautilus

nautilus: osd: do not dump an osd multiple times

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>

Merge pull request #40818 from rhcs-dashboard/wip-50172-nautilus

nautilus: mgr/dashboard: debug nodeenv hangs

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #39984 from aaSharma14/wip-49656-nautilus

nautilus: mgr/dashboard: test prometheus rules through promtool

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #41021 from rhcs-dashboard/wip-50417-nautilus

nautilus: mgr/dashboard: filesystem pool size should use stored stat

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #40650 from rhcs-dashboard/wip-50202-nautilus

nautilus: mgr/dashboard: Revoke read-only user's access to Manager modules

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>

Merge pull request #40490 from aaSharma14/wip-50050-nautilus

nautilus: mgr/dashboard: Remove username, password fields from Manager Modules/dashboard,influx

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #40750 from smithfarm/wip-50122-nautilus

nautilus: crush/CrushLocation: do not print logging message in constructor

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40700 from smithfarm/wip-50130-nautilus

nautilus: monmaptool: Don't call set_port on an invalid address

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>

Merge pull request #40676 from smithfarm/wip-49531-nautilus

nautilus: mgr: add --max <n> to 'osd ok-to-stop' command

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Merge pull request #39903 from singuliere/wip-49376-nautilus

nautilus: cmake: build static libs if they are internal ones

Reviewed-by: Kefu Chai <kchai@redhat.com>

qa: bump osd heartbeat grace for ffsb workload

This is a manual backport of 84528b1693f2abdeff5b816253a1e01ce4a19d36

A cherry-pick was not undertaken because the structure of the Nautilus yaml
files under qa/ is very different than in master.

Signed-off-by: Nathan Cutler <ncutler@suse.com>

Merge pull request #40709 from smithfarm/wip-49562-nautilus

nautilus: qa: delete all fs during tearDown

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #40704 from smithfarm/wip-49516-nautilus

nautilus: pybind/cephfs: DT_REG and DT_LNK values are wrong

Reviewed-by: Ramana Raja <rraja@redhat.com>

Merge pull request #40701 from smithfarm/wip-49473-nautilus

nautilus: test: use std::atomic<bool> instead of volatile for cb_done var

Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

Merge pull request #40714 from smithfarm/wip-49613-nautilus

nautilus: qa: add sleep for blocklisting to take effect

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>

mgr/dashboard: Revoke read-only user's access to Manager modules

This prevents read-only users from opening the Manager Modules page in
the Ceph Dashboard, where some security-related information is
shown.

Fixes: https://tracker.ceph.com/issues/50174
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit fb607f1561371340d2c9d4e16c4eaceb365fd926)

Conflicts:
src/pybind/mgr/dashboard/services/access_control.py
- Some of the changes are not backported because those features are
not implemented on nautilus, so I left them as they are.

mgr/dashboard: filesystem pool size should use stored stat

Fixes: https://tracker.ceph.com/issues/50195
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Replaces 'bytes_used' with 'stored' stat to see the correct results
of CephFS pool stats.

(cherry picked from commit 7110fd4e0c257d20aa56591f05d74a2851a2fe00)

mon/MonClient: reset authenticate_err in _reopen_session()

Otherwise, if "mon host" list has at least one unqualified IP address
without a port and both msgr1 and msgr2 are turned on, there is a race
affecting MonClient::authenticate().

For backwards compatibility reasons such an address is expanded into
two entries, each being treated as a separate monitor.  For example,
"mon host = 1.2.3.4" generates the following initial monmap:

  0: v1:1.2.3.4:6789/0
  1: v2:1.2.3.4:3300/0

See MonMap::_add_ambiguous_addr() for details.

Then, the following can happen:

1. we connect to both endpoints and attempt to authenticate
2. authenticate() sets authenticate_err to 1 and sleeps on auth_cond
3. msgr1 authenticates first (i.e. it gets the final MAuth message
   before msgr2 gets the monmap)
4. active_con is set to msgr1 connection, msgr2 connection is closed
   as redundant
5. _finish_auth() sets authenticate_err to 0 and signals auth_cond,
   but before either the monmap is received or authenticate() wakes
   up, msgr1 connection is closed due to a network hiccup
6. ms_handle_reset() calls _reopen_session() which clears active_con
   and again connects to both endpoints and attempts to authenticate
7. authenticate() wakes up, sees that there is no active_con and goes
   back to sleep, but this time with authenticate_err == 0
8. msgr2 authenticates first but doesn't call _finish_auth() because
   it is called only if authenticate_err == 1
9. active_con is set to msgr2 connection, msgr1 connection is closed
   as redundant
10. authenticate() hangs on auth_cond until timeout defaulting to 5
    minutes

The discrepancy between msgr1 and msgr2 plays a key role.  For msgr1,
authentication is considered to be complete as soon as the final MAuth
message is received -- the monmap is not waited for.  For msgr2,
authentication is considered to be complete only after the monmap is
received.

Avoid the race by setting authenticate_err to 1 in _reopen_session(),
so that _finish_auth() is called on/after every authentication attempt
instead of just the first one.

Fixes: https://tracker.ceph.com/issues/50477
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8c9de31c9806629d22c30b35769e664446090046)
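
The ten-step sequence above can be replayed with a toy state model. This is a deliberately simplified Python sketch, not the real MonClient: there are no threads or condition variables, `AuthModel` and `waiter_completes` are hypothetical names, and `signals` merely counts how often auth_cond would have been signalled.

```python
class AuthModel:
    """Toy model of the authenticate_err handshake; not the real MonClient."""

    def __init__(self, reset_on_reopen: bool):
        self.reset_on_reopen = reset_on_reopen
        self.authenticate_err = 0
        self.active_con = None
        self.signals = 0  # times auth_cond was signalled

    def authenticate(self):
        self.authenticate_err = 1  # step 2: the waiter goes to sleep

    def connection_authenticated(self, con: str):
        self.active_con = con
        if self.authenticate_err == 1:  # _finish_auth() runs only in this state
            self.authenticate_err = 0
            self.signals += 1

    def reopen_session(self):
        self.active_con = None
        if self.reset_on_reopen:
            self.authenticate_err = 1  # the fix described above


def waiter_completes(reset_on_reopen: bool) -> bool:
    """Replay the race and report whether the waiter would be woken at the end."""
    m = AuthModel(reset_on_reopen)
    m.authenticate()
    m.connection_authenticated("msgr1")  # steps 3-5
    m.reopen_session()                   # step 6
    m.signals = 0                        # step 7: wakeup consumed, no active_con, sleep again
    m.connection_authenticated("msgr2")  # steps 8-9
    return m.signals > 0 and m.active_con is not None
```

In this model `waiter_completes(False)` ends with a connection but no wakeup, which corresponds to the hang in step 10, while `waiter_completes(True)` signals the waiter again.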

mon/MonClient: remove reopen_session() callback mechanism

It's been unused for over 5 years, since commit 17d24292b812 ("osd:
remove old stats backoff mechanism").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 853c04b5a66721755830c5b46b695f6c86cb406b)

Conflicts:
src/mon/MonClient.cc [ commit 85157d5aae3d ("mon:
s/Mutex/ceph::mutex/") not in nautilus ]
src/mon/MonClient.h [ commit a144cacdd88b ("mon/MonClient:
add send_mon_message(MessageRef)") not in nautilus ]

mgr/dashboard:Simplify some complex calculations in test_alerts.yml

run-promtool-unittests is failing because of differences in floating point values in some complex calculations. This PR simplifies those calculations to fix the issue.

Fixes: https://tracker.ceph.com/issues/49952
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 8d2f39e6c568afb6880689160212bcc93057e194)

ceph.spec,install-deps: use golang-github-prometheus for promtools

instead of installing docker for using promtools, install
golang-github-prometheus.

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit e33e3a931db97d01318643ec686fe63fdd614082)

Conflicts:
ceph.spec.in (#287-295 new changes overrode these lines, merged correctly now)
install-deps.sh (changed dnf to yumdnf)

test: run promtool test without docker on ubuntu/focal

before this change, we use docker for running promtools offered by
a docker image, but this is not efficient, and quite a few developers
do not want to use docker for running "make check". this change was
introduced by #39246, the reason was that, in Ceph's CI process, we
are using Ubuntu/Bionic for running "make check" jobs, but prometheus
packaged by Bionic does not offer the "test rules" command. so, to
address the problem, we are using "dnanexus/promtool:2.9.2" docker image
for verifying monitoring/prometheus/alerts/test_alerts.yml.

after this change, we use prometheus packaged by debian derivatives
instead of pulling a docker image.

* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert
  53a5816deda0874a3a37e131e9bc22d88bb2a588, as we don't need to
  pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
  command, bail out if it is not.

see also: https://tracker.ceph.com/issues/49653

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit f381aa8bf0e175940153975fa1534ef0559ecadd)

Conflicts:
debian/control (python-cherrypy3 conflicts with python-cherrypy3 | python3-cherrypy3 in nautilus)
install-deps.sh (preload_wheels_for_tox method not in nautilus so removed that)
src/test/CMakeLists.txt (#561-594 new changes overrode these lines, merged correctly now)

mgr/dashboard:test prometheus rules through promtool

This PR adds unit testing for prometheus rules using promtool. To run the tests, run the 'run-promtool-unittests.sh' script.

Fixes: https://tracker.ceph.com/issues/45415
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 53a5816deda0874a3a37e131e9bc22d88bb2a588)

Conflicts:
install-deps.sh (changed dnf to yumdnf, preload_wheels method not in nautilus so removed that)
src/test/CMakeLists.txt (#561-594 new changes overrode these lines, merged correctly now)

mgr/dashboard: Remove username, password fields from -Cluster/Manager Modules/dashboard

The username and password fields are empty in Cluster/Manager Modules/dashboard. This functionality dates from when the dashboard supported a single user and password, so these fields now need to be removed.

Fixes: https://tracker.ceph.com/issues/49645
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit d8fba40d982bb1ad824961aa210475bd7aa51524)

Conflicts:
src/pybind/mgr/dashboard/services/access_control.py (no check_migrate_v0_to_current and check_migrate_v1_to_current methods in nautilus)
src/pybind/mgr/dashboard/tests/test_access_control.py (no test_load_v2 method in nautilus, 'time' import no longer needed with new changes)