
mon/MonClient: tolerate a rotating key that is slightly out of date

Commit 918c12c2ab5d ("monclient: avoid key renew storm on clock skew")
made wait_auth_rotating() wait for a key set with a valid "current" key
(instead of any key set, including with all keys expired if the clocks
are skewed).  While a good idea in general, this is a bit too stringent
because the monitors will hand out key sets with "current" key that is
_just_ about to expire.  There is nothing wrong with that as "next" key
is also there, valid for the entire auth_service_ticket_ttl.  So even
if the daemon is talking to the leader, it is possible to get a key set
with an expired "current" key.  If the daemon is talking to a peon, it
is pretty easy to run into in practice.  This, coupled with the fact
that _check_auth_rotating() explicitly allows the keys to go slightly
out of date, can lead to wait_auth_rotating() stalling the boot for up
to 30 seconds:

  15:41:11.824+0000  1 ... ==== auth_reply(proto 2 0 (0) Success)
  15:41:41.824+0000  0 monclient: wait_auth_rotating timed out after 30
  15:41:41.824+0000 -1 mds.b unable to obtain rotating service keys; retrying

Apply the same 30 second or less tolerance in wait_auth_rotating().

Fixes: https://tracker.ceph.com/issues/50390
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6160ed75fcc2a648da4b696fd0ec20b95c4a0a61)
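
The tolerance described above can be sketched roughly as follows. This is
a minimal illustration in Python, not the MonClient code: the function
names are hypothetical, and capping the tolerance at a quarter of the
ticket TTL is an assumption made for the example. Only the "30 second or
less" bound comes from the commit message.

```python
def rotating_key_tolerance(ticket_ttl: float) -> float:
    """Allowed staleness in seconds: 30 seconds or less.

    The ttl / 4 cap for short TTLs is an assumption of this sketch.
    """
    return min(30.0, ticket_ttl / 4.0)


def current_key_acceptable(expires_at: float, now: float, ticket_ttl: float) -> bool:
    """Accept a "current" key that expired no longer ago than the tolerance."""
    cutoff = now - rotating_key_tolerance(ticket_ttl)
    return expires_at >= cutoff
```

With a one hour TTL, a "current" key that expired 10 seconds ago would
still be accepted, while one that expired 40 seconds ago would not.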

Merge pull request #40945 from badone/wip-octopus-tracker-50414

octopus: qa/ceph-ansible: Update ansible version and ceph_stable_release

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>

Merge pull request #41157 from smithfarm/wip-50365-octopus

octopus: rgw: during reshard lock contention, adjust logging

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>

Merge pull request #40767 from smithfarm/wip-49472-octopus

octopus: qa: bump osd heartbeat grace for ffsb workload

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40743 from smithfarm/wip-50256-octopus

octopus: mds: trim cache regularly for standby-replay

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40708 from smithfarm/wip-49475-octopus

octopus: test: use std::atomic<bool> instead of volatile for cb_done var

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #41247 from idryomov/wip-posix-memalign-fix-octopus

octopus: common/buffer: adjust align before calling posix_memalign()

Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #41237 from trociny/wip-50703-octopus

octopus: os/FileStore: fix to handle readdir error correctly

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #41112 from k0ste/wip-50601-octopus

octopus: osd: compute OSD's space usage ratio via raw space utilization

Reviewed-by: Igor Fedotov <ifedotov@suse.com>

Merge pull request #40919 from neha-ojha/wip-50405-octopus

octopus: common/options/global.yaml.in: increase default value of bluestore_cache_trim_max_skip_pinned

Reviewed-by: Josh Durgin <jdurgin@redhat.com>

Merge pull request #40296 from xijiacun/octopus

octopus: rgw: Use correct bucket info when put or get large object with swift

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #39978 from singuliere/wip-49053-octopus

octopus: common/mempool: Improve mempool shard selection

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge remote-tracking branch 'origin/octopus-saved' into octopus

15.2.12

mgr/dashboard: fix cookie injection issue

Fixes: CVE-2021-3509
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit b39922818bc57cde1b016e9ad41908b18063b93b)

Conflicts:
src/pybind/mgr/dashboard/controllers/docs.py
- Remove allow_empty_body and _with_token method

mgr/dashboard: fix base-href: revert it to previous approach

Fixes: https://tracker.ceph.com/issues/50684
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit b6f92922f5c80223fd288d98ce85405a650c0135)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/app.module.ts
- Adopt the changes coming from master.

(cherry picked from commit fab19ddf55c1e3f1e61745a676785ff0309f11f2)

Merge pull request #40737 from Daniel-Pivonka/octopus-backport-40477

octopus: cephadm: fix failure when using --apply-spec and --ssh-user

Merge pull request #40657 from badone/wip-octopus-fix-pytest-double-requirement

octopus: mgr/dashboard: Remove redundant pytest requirement

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

pybind/mgr/dashboard: use setUpClass for initializing class

instead of relying on __init__(), use setUpClass() to initialize class
for testing. it turns out in pytest > 4, __init__() is called for the
test class but the attributes of the instantiated class are in turn overridden.

so we have to use setUpClass to do this job.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 71979e9b46e21dc3d3cfc6f06f4a84c9b4c7ce78)

Conflicts:
src/pybind/mgr/dashboard/tests/test_api_auditing.py:
Differences in import lines
src/pybind/mgr/dashboard/tests/test_tools.py:
Differences in import lines
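
The setUpClass() approach can be illustrated with a small, self-contained
sketch. The class name, the attribute and the run_sketch() helper are
hypothetical and are not taken from the dashboard test suite.

```python
import unittest


class ControllerTestSketch(unittest.TestCase):
    """Hypothetical test class; names are not taken from the dashboard."""

    @classmethod
    def setUpClass(cls):
        # Runs once per class, so state set here is shared by every
        # test method without relying on __init__().
        super().setUpClass()
        cls.settings = {"audit_enabled": True}

    def test_settings_available(self):
        self.assertTrue(self.settings["audit_enabled"])


def run_sketch() -> bool:
    """Run the test class programmatically and report success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ControllerTestSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

Because the attribute is attached to the class rather than to one
instance, it does not depend on how the test runner instantiates the
class.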

tools/setup-virtualenv.sh: pass --use-feature=2020-resolver to pip

as long as pip supports this option, pass it to `pip install`

to silence warnings and errors like:

ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

autopep8 1.5.4 requires pycodestyle>=2.6.0, but you'll have pycodestyle 2.5.0 which is incompatible.
pytest-cov 2.10.1 requires pytest>=4.6, but you'll have pytest 3.10.1 which is incompatible.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit fa9e2bfd4b3648f08ed3a88ce737d432ab97cce1)

pybind/mgr/dashboard: move pytest into requirements.txt

before this change, pytest is included by both requirements-lint.txt
and requirements-test.txt. this fails the install-deps.sh script when
collecting the python package wheels:

ERROR: Double requirement given: pytest<4 (from -r requirements-test.txt (line 2)) (already in pytest (from -r requirements-lint.txt (line 12)), name='pytest')

also, since pytest is unconditionally imported in the source, for instance,
in pybind/mgr/dashboard/tests/test_ceph_service.py

it would be more straightforward just to include it in requirements.txt.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit eab195566d54122f826debd8efb7f36db78fa4e1)

Conflicts:
src/pybind/mgr/dashboard/requirements-lint.txt: Additional
package lines
src/pybind/mgr/dashboard/requirements-test.txt: No mock line

pybind/mgr/dashboard: s/pytest<4/pytest/

to address following failure:

The user requested pytest<4
pytest-cov 2.10.1 depends on pytest>=4.6

when building the target of "mgr-dashboard-virtualenv"

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 128778f25eb64cd334e062d627abdb23e0ef0e49)

Conflicts:
src/pybind/mgr/dashboard/requirements-test.txt: No mock line

Merge pull request #41228 from ceph/wip-yuriw-octopus-p2p

octopus: qa/tests: advanced octopus initial version to 15.2.10

Reviewed-by: Neha Ojha <nojha@redhat.com>

qa/tests: resolved comments

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>

Merge pull request #40783 from smithfarm/wip-50286-octopus

octopus: mon: check mdsmap is resizeable before promoting standby-replay

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40779 from smithfarm/wip-50181-octopus

octopus: cephfs: client: only check pool permissions for regular files

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40778 from smithfarm/wip-50027-octopus

octopus: client: fire the finish_cap_snap() after buffer being flushed

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40777 from smithfarm/wip-49950-octopus

octopus: doc/cephfs/nfs: Add note about cephadm NFS-Ganesha daemon port

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40776 from smithfarm/wip-49934-octopus

octopus: test: reduce number of threads to 32 in LibCephFS.ShutdownRace

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40775 from smithfarm/wip-49752-octopus

octopus: doc: snap-schedule documentation

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40774 from smithfarm/wip-49851-octopus

octopus: mds: fix race of fetching large dirfrag

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40773 from smithfarm/wip-49611-octopus

octopus: qa: add sleep for blocklisting to take effect

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40772 from smithfarm/wip-49560-octopus

octopus: qa: delete all fs during tearDown

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40771 from smithfarm/wip-49518-octopus

octopus: cephfs: client: wake up the front pos waiter

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40770 from smithfarm/wip-49515-octopus

octopus: pybind/cephfs: DT_REG and DT_LNK values are wrong

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40765 from smithfarm/wip-49347-octopus

octopus: qa: for the latest kclient it will also return EIO

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40764 from smithfarm/wip-48878-octopus

octopus: mds: update defaults for recall configs

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40763 from smithfarm/wip-48836-octopus

octopus: mount.ceph: collect v2 addresses for non-legacy ms_mode options

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>

Merge pull request #40762 from smithfarm/wip-45853-octopus

octopus: tools/cephfs: don't bind to public_addr

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #40268 from kotreshhr/wip-49904-octopus

octopus: mgr/volumes: Retain suid guid bits in clone

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>

Merge pull request #41057 from rhcs-dashboard/wip-50475-octopus

octopus: mgr/dashboard: Remove username and password from request body

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge PR #40766 into octopus

* refs/pull/40766/head:
doc/cephfs/nfs: Add rook pod restart note, export and log block example

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Varsha Rao <varao@redhat.com>

rgw: sanitize \r in s3 CORSConfiguration's ExposeHeader

follows up on 1524d3c0c5cb11775313ea1e2bb36a93257947f2 to escape \r as
well

Fixes: CVE-2021-3524
Reported-by: Sergey Bobrov <Sergey.Bobrov@kaspersky.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 87806f48e7a1b8891eb90711f1cedd26f1119aac)

rgw: RGWSwiftWebsiteHandler::is_web_dir checks empty subdir_name

checking for empty name avoids later assertion in RGWObjectCtx::set_atomic

Fixes: CVE-2021-3531
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 7196a469b4470f3c8628489df9a41ec8b00a5610)

Merge pull request #41252 from rhcs-dashboard/wip-50722-octopus

octopus: mgr/dashboard: fix base-href: revert it to previous approach

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

mgr/dashboard: fix base-href: revert it to previous approach

Fixes: https://tracker.ceph.com/issues/50684
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit b6f92922f5c80223fd288d98ce85405a650c0135)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/app.module.ts
- Adopt the changes coming from master.

msg/async/ProtocolV2: catch correct bad_alloc exception

We want buffer::bad_alloc, not std::bad_alloc. Otherwise, we end
up with a confusing error

failed decoding of frame header: Bad allocation

from ProtocolV2::run_continuation(), printed after frame header is
successfully decoded.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 67bb6cf524975ea54d539c5b10ba83fa496a1ced)
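
The effect of catching the wrong exception type can be modelled with a
short Python sketch. Everything here is a stand-in: BufferBadAlloc plays
the role of buffer::bad_alloc, MemoryError plays the role of
std::bad_alloc, the generic outer handler stands in for the one in
run_continuation(), and the "failed to allocate payload" message is
invented for the example.

```python
class BufferBadAlloc(Exception):
    """Stand-in for buffer::bad_alloc; deliberately unrelated to MemoryError."""


def allocate_payload(size: int) -> bytearray:
    # Hypothetical allocation limit, used only to trigger the failure path.
    if size > 1024:
        raise BufferBadAlloc("Bad allocation")
    return bytearray(size)


def read_frame(size: int, catch_type: type) -> str:
    try:
        try:
            allocate_payload(size)
        except catch_type:
            # Intended handler: only reached if catch_type matches.
            return "failed to allocate payload"
    except Exception as exc:
        # Generic outer handler: reports an unrelated, confusing message.
        return f"failed decoding of frame header: {exc}"
    return "ok"
```

Calling read_frame(4096, MemoryError) falls through to the generic
handler, whereas read_frame(4096, BufferBadAlloc) reaches the intended
one.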

common/buffer: adjust align before calling posix_memalign()

posix_memalign() requires alignment argument to be a multiple of
sizeof(void *).  Since it is an implementation detail of buffer,
it needs to be adjusted there -- buffer consumers have no way of
knowing that passing e.g. align == 4 is incorrect.

One place already does the adjustment, but only for align == 0.
The other just asserts.  Fix both and remove the "power of two"
assertion.  Let posix_memalign() return EINVAL and handle that
by throwing buffer::bad_alloc, as expected by the consumers.

Fixes: https://tracker.ceph.com/issues/50646
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit aa31ddf0e70b3b8ef8012e09cb3158f3db4dea1b)
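
The alignment rule can be sketched in Python as plain arithmetic. This is
only a model of the behaviour described above, not the buffer code: the
function names are hypothetical, using max() for the adjustment is an
assumption, and MemoryError stands in for buffer::bad_alloc.

```python
import ctypes

# sizeof(void *) on the platform running the sketch.
POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)


def adjust_align(align: int) -> int:
    """Raise small alignments (including 0) to sizeof(void *)."""
    return max(align, POINTER_SIZE)


def is_valid_posix_memalign_alignment(align: int) -> bool:
    """posix_memalign() needs a power of two that is a multiple of sizeof(void *)."""
    power_of_two = align > 0 and (align & (align - 1)) == 0
    return power_of_two and align % POINTER_SIZE == 0


def checked_align(align: int) -> int:
    adjusted = adjust_align(align)
    if not is_valid_posix_memalign_alignment(adjusted):
        # Models posix_memalign() returning EINVAL, surfaced to the
        # caller as an allocation error.
        raise MemoryError(f"invalid alignment {align}")
    return adjusted
```

Here checked_align(4) is raised to the pointer size instead of failing,
while an alignment such as 24 is rejected with an allocation error
rather than an assertion.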

os/FileStore: fix to handle readdir error correctly

Currently the FileStore code does not handle readdir errors.
As readdir(3) says, we need to check errno after readdir()
returns NULL to determine whether an error occurred.

This patch fixes all readdir() calls to check errno and
handle it appropriately:
- FileStore.cc ... abort if EIO error happens
- BtrfsFileStoreBackend.cc/LFNIndex.cc
... return error to upper layer

Without this fix, the primary PG could fail to perform the
backfill operation correctly, which could lead to the data loss
propagation described in #50558.

Fixes: https://tracker.ceph.com/issues/50558
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
(cherry picked from commit 5a6c6267a182f859471ee629b490777ee1e970dd)

qa/tests: advanced octopus initial version to 15.2.10

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>

Merge pull request #41124 from aaSharma14/wip-50582-octopus

octopus: mgr/dashboard: OSDs placement text is unreadable

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>

Merge pull request #41017 from idryomov/wip-reset-authenticate-err-octopus

octopus: mon/MonClient: reset authenticate_err in _reopen_session()

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40988 from trociny/wip-50479-octopus

octopus: os/FileStore: don't propagate split/merge error to "create"/"remove"

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>

Merge pull request #40838 from mgfritch/octopus-backport-39415

octopus: cephadm: Allow to use paths in all <_devices> drivegroup sections

Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>

Merge pull request #40823 from mgfritch/octopus-backport-39259

octopus: mgr/cephadm: on ssh connection error, advise chmod 0600

Reviewed-by: Adam King <adking@redhat.com>

rgw: during reshard lock contention, adjust logging

When RGW fails to get a lock on a reshard log, we log it in such a way
that it looks like an error. Instead we'll make sure that the log
message is informational.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit 6d3dee37791ad427a3435c493a1d7874ba075674)

mds: do not trim the inodes from the lru list in standby_replay

In standby_replay, some dentries may have just been added/linked while
the EOpen journal events that follow them have not been replayed yet.
If upkeep_main() is executed at that point, it may trim them out
immediately. Then the later replay of the EOpen journals will fail.

In standby_replay, let's skip trimming them if dentry's linkage inode
is not nullptr.

Fixes: https://tracker.ceph.com/issues/50246
Signed-off-by: Xiubo Li <xiubli@redhat.com>
(cherry picked from commit 79bb44c1b9f1715378a9550a81984e949e454ff4)

mds: trim cache regularly for standby-replay

This change is slightly awkward because standby-replay MDS do not do all
the kinds of upkeep a normal active MDS does. In particular, it is not
going to recall client state from clients.

This diff also merges the extra recall_client_state in
MDCache::check_memory_usage into its only caller (the upkeep thread)
where it was also doing a recall. That's just a matter of merging the
recall flags. This has the added benefit of making
MDCache::check_memory_usage callable for all MDS daemons regardless of
state.

Fixes: https://tracker.ceph.com/issues/50048
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 19293d9b9d19c32af4de655cd59e206056b2417d)

mds: avoid spurious sleeps

Like trim_interval, don't sleep for small amounts of time. This avoids
spurious sleeps like:

    2020-12-25T00:14:22.242+0000 7f6a95884700 20 mds.0.cache upkeep thread waiting interval 0.000000108s
    2020-12-25T00:14:22.242+0000 7f6a95884700 20 mds.0.cache upkeep thread waiting interval 0.000000108s
    2020-12-25T00:14:22.242+0000 7f6a95884700 20 mds.0.cache upkeep thread waiting interval 0.000000108s
    2020-12-25T00:14:22.242+0000 7f6a95884700 20 mds.0.cache upkeep thread waiting interval 0.000000108s
    2020-12-25T00:14:22.242+0000 7f6a95884700 20 mds.0.cache upkeep thread waiting interval 0.000000108s
    2020-12-25T00:14:22.242+0000 7f6a95884700 20 mds.0.cache upkeep thread waiting interval 0.000000108s
    2020-12-25T00:14:22.242+0000 7f6a95884700 20 mds.0.cache upkeep thread waiting interval 0.000000108s

Also, fix the same issue in the Client.

Fixes: https://tracker.ceph.com/issues/48753
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit eb47e990c33843b9baa366e2b2a187439210e680)

Conflicts:
src/client/Client.cc
- the code being changed does not exist in octopus

mds: remove extra heap release

We now regularly do this unconditionally in the MDS, see the upkeep
thread.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 5a9d6c080d77c7e3644b02cab4f8c91900f4fe8f)

Conflicts:
src/mds/MDCache.cc
- octopus has a line, in->maybe_ephemeral_dist(false);, which is not there in
master

mgr/dashboard: OSDs placement text is unreadable

While displaying the host pattern in the OSDs placement tab, it gets split with semi-colons. Also adjusted the column size of Container Image ID and Placement columns.

Fixes: https://tracker.ceph.com/issues/50580
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 543b02436f18876a56757226c686a5c2c33c7c33)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/services/services.component.ts (PlacementPipe takes i18n as input param)

osd: compute OSD's space usage ratio via raw space utilization

Fixes: https://tracker.ceph.com/issues/50533
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 81c4d82be02ee14aff2849b3025a5dea6cb0327e)

Merge pull request #41061 from dvanders/50550

octopus: os/bluestore: be more verbose in _open_super_meta by default.

os/bluestore: be more verbose in _open_super_meta by default.

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 4087f82aea674df4c7b485bf804f3a9c98ae3741)

mgr/dashboard: Remove username and password from request body

Fixes: https://tracker.ceph.com/issues/50451
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 273a776cad8065f568f17a05804aabd14625a1f0)

Merge pull request #40894 from rhcs-dashboard/wip-50349-octopus

octopus: mgr/dashboard: improve telemetry opt-in reminder notification message

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>

Merge pull request #39987 from aaSharma14/wip-49657-octopus

octopus: mgr/dashboard: test prometheus rules through promtool

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #40816 from rhcs-dashboard/wip-50170-octopus

octopus: mgr/dashboard: debug nodeenv hangs

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #41020 from rhcs-dashboard/wip-50416-octopus

octopus: mgr/dashboard: filesystem pool size should use stored stat

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #40433 from rhcs-dashboard/labels-badge-octopus

octopus: mgr/dashboard: Add badge to the Label column in Host List

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>

Merge pull request #39802 from p-se/wip-pse-cephadm-SUSE-alertmanager-octopus

octopus: `cephadm ls` broken for SUSE downstream alertmanager container

Reviewed-by: Sebastian Wagner <swagner@suse.com>

Merge pull request #40364 from ideepika/wip-bug-48142-octopus

octopus: qa/suites/rados/cephadm/upgrade: change starting version by distro

Reviewed-by: Sage Weil <sage@redhat.com>

Merge pull request #40589 from rhcs-dashboard/wip-50070-octopus

octopus: mgr/dashboard: Fix for alert notification message being undefined

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>

Merge pull request #40758 from smithfarm/wip-50129-octopus

octopus: monmaptool: Don't call set_port on an invalid address

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40649 from rhcs-dashboard/wip-50204-octopus

octopus: mgr/dashboard: Revoke read-only user's access to Manager modules

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>

mgr/dashboard: filesystem pool size should use stored stat

Fixes: https://tracker.ceph.com/issues/50195
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Replaces 'bytes_used' with 'stored' stat to see the correct results
of CephFS pool stats.

(cherry picked from commit 7110fd4e0c257d20aa56591f05d74a2851a2fe00)

mon/MonClient: reset authenticate_err in _reopen_session()

Otherwise, if "mon host" list has at least one unqualified IP address
without a port and both msgr1 and msgr2 are turned on, there is a race
affecting MonClient::authenticate().

For backwards compatibility reasons such an address is expanded into
two entries, each being treated as a separate monitor.  For example,
"mon host = 1.2.3.4" generates the following initial monmap:

  0: v1:1.2.3.4:6789/0
  1: v2:1.2.3.4:3300/0

See MonMap::_add_ambiguous_addr() for details.

Then, the following can happen:

1. we connect to both endpoints and attempt to authenticate
2. authenticate() sets authenticate_err to 1 and sleeps on auth_cond
3. msgr1 authenticates first (i.e. it gets the final MAuth message
   before msgr2 gets the monmap)
4. active_con is set to msgr1 connection, msgr2 connection is closed
   as redundant
5. _finish_auth() sets authenticate_err to 0 and signals auth_cond,
   but before either the monmap is received or authenticate() wakes
   up, msgr1 connection is closed due to a network hiccup
6. ms_handle_reset() calls _reopen_session() which clears active_con
   and again connects to both endpoints and attempts to authenticate
7. authenticate() wakes up, sees that there is no active_con and goes
   back to sleep, but this time with authenticate_err == 0
8. msgr2 authenticates first but doesn't call _finish_auth() because
   it is called only if authenticate_err == 1
9. active_con is set to msgr2 connection, msgr1 connection is closed
   as redundant
10. authenticate() hangs on auth_cond until timeout defaulting to 5
    minutes

The discrepancy between msgr1 and msgr2 plays a key role.  For msgr1,
authentication is considered to be complete as soon as the final MAuth
message is received -- the monmap is not waited for.  For msgr2,
authentication is considered to be complete only after the monmap is
received.

Avoid the race by setting authenticate_err to 1 in _reopen_session(),
so that _finish_auth() is called on/after every authentication attempt
instead of just the first one.

Fixes: https://tracker.ceph.com/issues/50477
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8c9de31c9806629d22c30b35769e664446090046)
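
The sequence above can be replayed with a small deterministic model.
This is a sketch under simplifying assumptions, not the MonClient
implementation: the class, its methods and the signal counter are
hypothetical, and the condition variable is reduced to a count of how
often auth_cond would have been signalled.

```python
class MonClientModel:
    """Toy model of the authenticate_err / auth_cond interaction."""

    def __init__(self, reset_on_reopen: bool):
        self.reset_on_reopen = reset_on_reopen
        self.authenticate_err = 0
        self.active_con = None
        self.signals = 0  # times auth_cond would have been signalled

    def authenticate_begin(self):
        self.authenticate_err = 1

    def finish_auth(self):
        self.authenticate_err = 0
        self.signals += 1

    def connection_authenticated(self, con: str):
        self.active_con = con
        # finish_auth() is called only while authenticate_err == 1.
        if self.authenticate_err == 1:
            self.finish_auth()

    def reopen_session(self):
        self.active_con = None
        if self.reset_on_reopen:
            self.authenticate_err = 1  # the fix described above


def waiter_hangs(reset_on_reopen: bool) -> bool:
    m = MonClientModel(reset_on_reopen)
    m.authenticate_begin()               # step 2
    m.connection_authenticated("msgr1")  # steps 3-5
    m.reopen_session()                   # step 6: connection reset
    seen = m.signals                     # step 7: waiter wakes, sleeps again
    m.connection_authenticated("msgr2")  # steps 8-9
    return m.signals == seen             # step 10: no new signal, waiter hangs
```

In this model waiter_hangs(False) reproduces the hang, and
waiter_hangs(True) shows the second authentication signalling the waiter
again.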

mon/MonClient: remove reopen_session() callback mechanism

It's been unused for over 5 years, since commit 17d24292b812 ("osd:
remove old stats backoff mechanism").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 853c04b5a66721755830c5b46b695f6c86cb406b)

Merge pull request #40491 from aaSharma14/wip-50049-octopus

octopus: mgr/dashboard: Remove username, password fields from Manager Modules/dashboard,influx

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>

Merge pull request #40495 from aaSharma14/wip-50052-octopus

octopus: mgr/dashboard: Device health status is not getting listed under hosts section

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>

Merge pull request #40550 from idryomov/wip-remove-log-early-octopus

octopus: common: remove log_early configuration option

Reviewed-by: Sage Weil <sage@redhat.com>

Merge pull request #40558 from singuliere/wip-49917-octopus

octopus: mon/OSDMonitor: drop stale failure_info after a grace period

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40699 from smithfarm/wip-50123-octopus

octopus: mon: Modifying trim logic to change paxos_service_trim_max dynamically

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>

Merge pull request #40756 from smithfarm/wip-49566-octopus

octopus: tests: ceph_test_rados_api_watch_notify: Allow for reconnect

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>

Merge pull request #40757 from smithfarm/wip-49816-octopus

octopus: mon/MgrMonitor: populate available_modules from promote_standby()

Reviewed-by: Sage Weil <sage@redhat.com>

Merge pull request #40788 from smithfarm/wip-49732-octopus

octopus: osd: do not dump an osd multiple times

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40791 from smithfarm/wip-50120-octopus

octopus: crush/CrushLocation: do not print logging message in constructor

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40792 from smithfarm/wip-50143-octopus

octopus: qa/tasks/vstart_runner.py: start max required mgrs

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>

Merge pull request #40793 from smithfarm/wip-50210-octopus

octopus: os/bluestore/BlueFS: do not _flush_range deleted files

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40789 from smithfarm/wip-49378-octopus

octopus: cmake: build static libs if they are internal ones

Reviewed-by: Kefu Chai <kchai@redhat.com>

Merge pull request #40812 from yuvalif/wip-yuval-fix-48462

octopus: rgw/notification: support GetTopicAttributes API

Reviewed-by: Casey Bodley <cbodley@redhat.com>

Merge pull request #40755 from smithfarm/wip-50213-octopus

octopus: rgw: objectlock: improve client error messages

Reviewed-by: Casey Bodley <cbodley@redhat.com>

mgr/dashboard: Simplify some complex calculations in test_alerts.yml

run-promtool-unittests is failing with difference in floating point values in some complex calculations. This PR intends to simplify those calculations and fix this issue.

Fixes: https://tracker.ceph.com/issues/49952
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 8d2f39e6c568afb6880689160212bcc93057e194)

ceph.spec,install-deps: use golang-github-prometheus for promtools

instead of installing docker for using promtools, install
golang-github-prometheus.

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit e33e3a931db97d01318643ec686fe63fdd614082)

Conflicts:
install-deps.sh (changed dnf to yumdnf)

test: run promtool test without docker on ubuntu/focal

before this change, we use docker for running promtools offered by
a docker image, but this is not efficient, and quite a few developers
do not want to use docker for running "make check". this change was
introduced by #39246, the reason was that, in Ceph's CI process, we
are using Ubuntu/Bionic for running "make check" jobs, but prometheus
packaged by Bionic does not offer the "test rules" command. so, to
address the problem, we are using "dnanexus/promtool:2.9.2" docker image
for verifying monitoring/prometheus/alerts/test_alerts.yml.

after this change, we use prometheus packaged by debian derivatives
instead of pulling a docker image.

* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert
  53a5816deda0874a3a37e131e9bc22d88bb2a588, as we don't need to
  pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
  command, bail out if it is not.

see also: https://tracker.ceph.com/issues/49653

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit f381aa8bf0e175940153975fa1534ef0559ecadd)

mgr/dashboard: test prometheus rules through promtool

This PR intends to add unit testing for prometheus rules using promtool. To run the tests, run the 'run-promtool-unittests.sh' script.

Fixes: https://tracker.ceph.com/issues/45415
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 53a5816deda0874a3a37e131e9bc22d88bb2a588)

Conflicts:
install-deps.sh (changed dnf to yumdnf)

os/FileStore: don't propagate split/merge error to "create"/"remove"

Either ignore or terminate, otherwise it may confuse the
"create"/"remove" caller.

Fixes: https://tracker.ceph.com/issues/50395
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 936898b8caf7b13a120ea6108df0b0dac29882c4)

mgr/dashboard: Device health status is not getting listed under hosts section

Device health is shown as failed to retrieve data under Hosts > Device Health section. This PR intends to fix this issue.

Fixes: https://tracker.ceph.com/issues/49354
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 8f4574696c5272de4be6cbcbd3a8fc713d6b604e)

mgr/dashboard: Remove username, password fields from -Cluster/Manager Modules/dashboard

Username, password fields are empty in Cluster/Manager Modules/dashboard. This functionality dates from when the dashboard supported a single user-password pair, so these fields now need to be removed.

Fixes: https://tracker.ceph.com/issues/49645
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit d8fba40d982bb1ad824961aa210475bd7aa51524)

Merge pull request #40790 from smithfarm/wip-50081-octopus

octopus: rbd-mirror: fix UB while registering perf counters

Reviewed-by: Mykola Golub <mgolub@mirantis.com>