git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
4 years ago os/bluestore: be more verbose in _open_super_meta by default. 41061/head
Igor Fedotov [Fri, 11 Oct 2019 14:34:58 +0000 (17:34 +0300)]
os/bluestore: be more verbose in _open_super_meta by default.

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 4087f82aea674df4c7b485bf804f3a9c98ae3741)

4 years ago Merge pull request #40894 from rhcs-dashboard/wip-50349-octopus
Ernesto Puerta [Tue, 27 Apr 2021 17:21:19 +0000 (19:21 +0200)]
Merge pull request #40894 from rhcs-dashboard/wip-50349-octopus

octopus: mgr/dashboard: improve telemetry opt-in reminder notification message

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
4 years ago Merge pull request #39987 from aaSharma14/wip-49657-octopus
Ernesto Puerta [Tue, 27 Apr 2021 17:18:29 +0000 (19:18 +0200)]
Merge pull request #39987 from aaSharma14/wip-49657-octopus

octopus: mgr/dashboard: test prometheus rules through promtool

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #40816 from rhcs-dashboard/wip-50170-octopus
Ernesto Puerta [Tue, 27 Apr 2021 17:14:42 +0000 (19:14 +0200)]
Merge pull request #40816 from rhcs-dashboard/wip-50170-octopus

octopus: mgr/dashboard: debug nodeenv hangs

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #41020 from rhcs-dashboard/wip-50416-octopus
Ernesto Puerta [Tue, 27 Apr 2021 17:11:23 +0000 (19:11 +0200)]
Merge pull request #41020 from rhcs-dashboard/wip-50416-octopus

octopus: mgr/dashboard: filesystem pool size should use stored stat

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #40433 from rhcs-dashboard/labels-badge-octopus
Ernesto Puerta [Tue, 27 Apr 2021 17:10:01 +0000 (19:10 +0200)]
Merge pull request #40433 from rhcs-dashboard/labels-badge-octopus

octopus: mgr/dashboard: Add badge to the Label column in Host List

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
4 years ago Merge pull request #39802 from p-se/wip-pse-cephadm-SUSE-alertmanager-octopus
Kefu Chai [Tue, 27 Apr 2021 06:12:55 +0000 (14:12 +0800)]
Merge pull request #39802 from p-se/wip-pse-cephadm-SUSE-alertmanager-octopus

octopus: `cephadm ls` broken for SUSE downstream alertmanager container

Reviewed-by: Sebastian Wagner <swagner@suse.com>
4 years ago Merge pull request #40364 from ideepika/wip-bug-48142-octopus
Kefu Chai [Tue, 27 Apr 2021 06:12:14 +0000 (14:12 +0800)]
Merge pull request #40364 from ideepika/wip-bug-48142-octopus

octopus: qa/suites/rados/cephadm/upgrade: change starting version by distro

Reviewed-by: Sage Weil <sage@redhat.com>
4 years ago Merge pull request #40589 from rhcs-dashboard/wip-50070-octopus
Kefu Chai [Tue, 27 Apr 2021 06:09:38 +0000 (14:09 +0800)]
Merge pull request #40589 from rhcs-dashboard/wip-50070-octopus

octopus: mgr/dashboard: Fix for alert notification message being undefined

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
4 years ago Merge pull request #40758 from smithfarm/wip-50129-octopus
Kefu Chai [Tue, 27 Apr 2021 06:07:55 +0000 (14:07 +0800)]
Merge pull request #40758 from smithfarm/wip-50129-octopus

octopus: monmaptool: Don't call set_port on an invalid address

Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #40649 from rhcs-dashboard/wip-50204-octopus
Ernesto Puerta [Mon, 26 Apr 2021 08:06:35 +0000 (10:06 +0200)]
Merge pull request #40649 from rhcs-dashboard/wip-50204-octopus

octopus: mgr/dashboard: Revoke read-only user's access to Manager modules

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
4 years ago mgr/dashboard: filesystem pool size should use stored stat 41020/head
Avan Thakkar [Thu, 15 Apr 2021 13:28:52 +0000 (18:58 +0530)]
mgr/dashboard: filesystem pool size should use stored stat

Fixes: https://tracker.ceph.com/issues/50195
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Replaces 'bytes_used' with 'stored' stat to see the correct results
of CephFS pool stats.
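
A minimal sketch of why the choice of stat matters, assuming a replicated pool where 'bytes_used' counts raw space across all replicas while 'stored' counts logical client data. The dictionary layout, the values, and the helper name are hypothetical and are not the dashboard's actual code:

```python
# Hypothetical pool stats. The field names 'stored' and 'bytes_used' come from
# the commit message; the values and the dict layout are made up.
pool_stats = {
    "stored": 1_000_000,      # logical bytes written by clients
    "bytes_used": 3_000_000,  # raw bytes consumed, e.g. with 3 replicas (assumption)
}


def pool_size_for_display(stats: dict) -> int:
    """Return the value to show as the CephFS pool's used size."""
    # Reading 'bytes_used' here would overstate usage by the replication factor.
    return stats["stored"]


print(pool_size_for_display(pool_stats))
```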

(cherry picked from commit 7110fd4e0c257d20aa56591f05d74a2851a2fe00)

4 years ago Merge pull request #40491 from aaSharma14/wip-50049-octopus
Kefu Chai [Sun, 25 Apr 2021 02:55:22 +0000 (10:55 +0800)]
Merge pull request #40491 from aaSharma14/wip-50049-octopus

octopus: mgr/dashboard: Remove username, password fields from Manager Modules/dashboard,influx

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
4 years ago Merge pull request #40495 from aaSharma14/wip-50052-octopus
Kefu Chai [Sun, 25 Apr 2021 02:54:12 +0000 (10:54 +0800)]
Merge pull request #40495 from aaSharma14/wip-50052-octopus

octopus: mgr/dashboard: Device health status is not getting listed under hosts section

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #40550 from idryomov/wip-remove-log-early-octopus
Kefu Chai [Sun, 25 Apr 2021 02:53:01 +0000 (10:53 +0800)]
Merge pull request #40550 from idryomov/wip-remove-log-early-octopus

octopus: common: remove log_early configuration option

Reviewed-by: Sage Weil <sage@redhat.com>
4 years ago Merge pull request #40558 from singuliere/wip-49917-octopus
Kefu Chai [Sun, 25 Apr 2021 02:52:22 +0000 (10:52 +0800)]
Merge pull request #40558 from singuliere/wip-49917-octopus

octopus: mon/OSDMonitor: drop stale failure_info after a grace period

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #40699 from smithfarm/wip-50123-octopus
Kefu Chai [Sun, 25 Apr 2021 02:51:37 +0000 (10:51 +0800)]
Merge pull request #40699 from smithfarm/wip-50123-octopus

octopus: mon: Modifying trim logic to change paxos_service_trim_max dynamically

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
4 years ago Merge pull request #40756 from smithfarm/wip-49566-octopus
Kefu Chai [Sun, 25 Apr 2021 02:49:49 +0000 (10:49 +0800)]
Merge pull request #40756 from smithfarm/wip-49566-octopus

octopus: tests: ceph_test_rados_api_watch_notify: Allow for reconnect

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
4 years ago Merge pull request #40757 from smithfarm/wip-49816-octopus
Kefu Chai [Sun, 25 Apr 2021 02:49:01 +0000 (10:49 +0800)]
Merge pull request #40757 from smithfarm/wip-49816-octopus

octopus: mon/MgrMonitor: populate available_modules from promote_standby()

Reviewed-by: Sage Weil <sage@redhat.com>
4 years ago Merge pull request #40788 from smithfarm/wip-49732-octopus
Kefu Chai [Sun, 25 Apr 2021 02:48:01 +0000 (10:48 +0800)]
Merge pull request #40788 from smithfarm/wip-49732-octopus

octopus: osd: do not dump an osd multiple times

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #40791 from smithfarm/wip-50120-octopus
Kefu Chai [Sun, 25 Apr 2021 02:47:27 +0000 (10:47 +0800)]
Merge pull request #40791 from smithfarm/wip-50120-octopus

octopus: crush/CrushLocation: do not print logging message in constructor

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #40792 from smithfarm/wip-50143-octopus
Kefu Chai [Sun, 25 Apr 2021 02:47:08 +0000 (10:47 +0800)]
Merge pull request #40792 from smithfarm/wip-50143-octopus

octopus: qa/tasks/vstart_runner.py: start max required mgrs

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Rishabh Dave <ridave@redhat.com>
4 years ago Merge pull request #40793 from smithfarm/wip-50210-octopus
Kefu Chai [Sun, 25 Apr 2021 02:46:31 +0000 (10:46 +0800)]
Merge pull request #40793 from smithfarm/wip-50210-octopus

octopus: os/bluestore/BlueFS: do not _flush_range deleted files

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #40789 from smithfarm/wip-49378-octopus
Kefu Chai [Sat, 24 Apr 2021 10:00:49 +0000 (18:00 +0800)]
Merge pull request #40789 from smithfarm/wip-49378-octopus

octopus: cmake: build static libs if they are internal ones

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #40812 from yuvalif/wip-yuval-fix-48462
Yuri Weinstein [Fri, 23 Apr 2021 20:01:42 +0000 (13:01 -0700)]
Merge pull request #40812 from yuvalif/wip-yuval-fix-48462

octopus: rgw/notification: support GetTopicAttributes API

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years ago Merge pull request #40755 from smithfarm/wip-50213-octopus
Yuri Weinstein [Fri, 23 Apr 2021 20:01:10 +0000 (13:01 -0700)]
Merge pull request #40755 from smithfarm/wip-50213-octopus

octopus: rgw: objectlock: improve client error messages

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years ago mgr/dashboard:Simplify some complex calculations in test_alerts.yml 39987/head
Aashish Sharma [Thu, 25 Mar 2021 05:55:37 +0000 (11:25 +0530)]
mgr/dashboard:Simplify some complex calculations in test_alerts.yml

run-promtool-unittests is failing because of differences in floating-point values in some complex calculations. This PR simplifies those calculations to fix the issue.

Fixes: https://tracker.ceph.com/issues/49952
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 8d2f39e6c568afb6880689160212bcc93057e194)

4 years ago ceph.spec,install-deps: use golang-github-prometheus for promtools
Kefu Chai [Mon, 22 Mar 2021 06:07:54 +0000 (14:07 +0800)]
ceph.spec,install-deps: use golang-github-prometheus for promtools

instead of installing docker for using promtools, install
golang-github-prometheus.

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit e33e3a931db97d01318643ec686fe63fdd614082)

Conflicts:
install-deps.sh (changed dnf to yumdnf)

4 years ago test: run promtool test without docker on ubuntu/focal
Kefu Chai [Fri, 19 Mar 2021 02:32:16 +0000 (10:32 +0800)]
test: run promtool test without docker on ubuntu/focal

before this change, we used docker for running the promtool offered by
a docker image, but this is not efficient, and quite a few developers
do not want to use docker for running "make check". the docker-based
approach was introduced by #39246 because, in Ceph's CI process, we
use Ubuntu/Bionic for running "make check" jobs, but the prometheus
packaged by Bionic does not offer the "test rules" command. so, to
address this problem, we used the "dnanexus/promtool:2.9.2" docker image
for verifying monitoring/prometheus/alerts/test_alerts.yml.

after this change, we use prometheus packaged by debian derivatives
instead of pulling a docker image.

* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert
  53a5816deda0874a3a37e131e9bc22d88bb2a588, as we don't need to
  pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
  command, bail out if it is not.

see also: https://tracker.ceph.com/issues/49653

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit f381aa8bf0e175940153975fa1534ef0559ecadd)

4 years ago mgr/dashboard:test prometheus rules through promtool
Aashish Sharma [Wed, 3 Feb 2021 07:23:56 +0000 (12:53 +0530)]
mgr/dashboard:test prometheus rules through promtool

This PR adds unit testing for prometheus rules using promtool. To run the tests, run the 'run-promtool-unittests.sh' file.

Fixes: https://tracker.ceph.com/issues/45415
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 53a5816deda0874a3a37e131e9bc22d88bb2a588)

Conflicts:
install-deps.sh (changed dnf to yumdnf)

4 years ago mgr/dashboard: Device health status is not getting listed under hosts section 40495/head
Aashish Sharma [Thu, 11 Mar 2021 06:06:22 +0000 (11:36 +0530)]
mgr/dashboard: Device health status is not getting listed under hosts section

Device health is shown as failed to retrieve data under the Hosts > Device Health section. This PR intends to fix this issue.

Fixes: https://tracker.ceph.com/issues/49354
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 8f4574696c5272de4be6cbcbd3a8fc713d6b604e)

4 years ago mgr/dashboard: Remove username, password fields from -Cluster/Manager Modules/dashboard 40491/head
Aashish Sharma [Mon, 8 Mar 2021 09:44:00 +0000 (15:14 +0530)]
mgr/dashboard: Remove username, password fields from -Cluster/Manager Modules/dashboard

Username, password fields are empty in Cluster/Manager Modules/dashboard. Since this functionality dates from when the dashboard supported a single user and password, we now need to remove these fields from here.

Fixes: https://tracker.ceph.com/issues/49645
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit d8fba40d982bb1ad824961aa210475bd7aa51524)

4 years ago Merge pull request #40790 from smithfarm/wip-50081-octopus 40999/head
Yuri Weinstein [Wed, 21 Apr 2021 18:38:55 +0000 (11:38 -0700)]
Merge pull request #40790 from smithfarm/wip-50081-octopus

octopus: rbd-mirror: fix UB while registering perf counters

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
4 years ago Merge pull request #40666 from idryomov/wip-require-ceph-common-for-ioc-octopus
Nathan Cutler [Wed, 21 Apr 2021 17:39:14 +0000 (19:39 +0200)]
Merge pull request #40666 from idryomov/wip-require-ceph-common-for-ioc-octopus

octopus: packaging: require ceph-common for immutable object cache daemon

Reviewed-by: Nathan Cutler <ncutler@suse.com>
4 years ago Merge pull request #40958 from rhcs-dashboard/wip-50457-octopus
Ilya Dryomov [Wed, 21 Apr 2021 16:00:19 +0000 (18:00 +0200)]
Merge pull request #40958 from rhcs-dashboard/wip-50457-octopus

octopus: vstart.sh: disable "auth_allow_insecure_global_id_reclaim"

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
4 years ago vstart.sh: disable "auth_allow_insecure_global_id_reclaim" 40958/head
Kefu Chai [Thu, 15 Apr 2021 13:07:53 +0000 (21:07 +0800)]
vstart.sh: disable "auth_allow_insecure_global_id_reclaim"

to silence the health warning of "mons are allowing insecure global_id
reclaim", which prevents the cluster from being active+clean. a couple
of tests expect a warning-free cluster before they start.

this option is enabled by default to appease old clients, but when it
comes to most of upstream testing, we can just disable it.

Fixes: https://tracker.ceph.com/issues/50374
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 77a8376d0731c24e7bbf24523d3d7450e9f978af)

4 years ago Merge branch 'octopus-saved' into octopus
Ilya Dryomov [Tue, 20 Apr 2021 08:57:53 +0000 (10:57 +0200)]
Merge branch 'octopus-saved' into octopus

4 years ago 15.2.11 v15.2.11
Jenkins Build Slave User [Mon, 19 Apr 2021 13:47:30 +0000 (13:47 +0000)]
15.2.11

4 years ago mgr/dashboard: improve telemetry opt-in reminder notification message 40894/head
Waad Alkhoury [Tue, 30 Mar 2021 06:38:01 +0000 (08:38 +0200)]
mgr/dashboard: improve telemetry opt-in reminder notification message

Added an activation button and linked the word telemetry to the telemetry documentation

Fixes: https://tracker.ceph.com/issues/49606
(cherry picked from commit 527d912b878087672ab537b59e3addf35108a77c)
Signed-off-by: Waad Alkhoury <walkhour@redhat.com>
4 years ago auth/cephx: make KeyServer::build_session_auth_info() less confusing
Ilya Dryomov [Thu, 15 Apr 2021 13:18:58 +0000 (15:18 +0200)]
auth/cephx: make KeyServer::build_session_auth_info() less confusing

The second KeyServer::build_session_auth_info() overload is used only
by the monitor, for mon <-> mon authentication.  The monitor passes in
service_secret (mon secret) and secret_id (-1).  The TTL is irrelevant
because there is no rotation.

However the signature doesn't make it obvious.  Clarify that
service_secret and secret_id are input parameters and info is the only
output parameter.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6f12cd3688b753633c8ff29fb3bd64758f960b2b)

4 years ago auth/cephx: cap ticket validity by expiration of "next" key
Ilya Dryomov [Thu, 15 Apr 2021 07:48:13 +0000 (09:48 +0200)]
auth/cephx: cap ticket validity by expiration of "next" key

If auth_mon_ticket_ttl is increased by several times as done in
commit 522a52e6c258 ("auth/cephx: rotate auth tickets less often"),
active clients eventually get stuck because the monitor sends out an
auth ticket with a bogus validity.  The ticket is secured with the
"current" secret that is scheduled to expire according to the old TTL,
but the validity of the ticket is set to the new TTL.  As a result,
the client simply doesn't attempt to renew, letting the secrets rotate
potentially more than once.  When that happens, the client first hits
auth authorizer errors as it tries to renew service tickets and when
it finally gets to renewing the auth ticket, it hits the insecure
global_id reclaim wall.

Cap TTL by expiration of "next" key -- the "current" key may be
milliseconds away from expiration and still be used, legitimately.
Do it in KeyServerData alongside key rotation code and propagate the
capped TTL to the upper layer.
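
The capping rule can be sketched as follows. This is a minimal illustration and not the KeyServerData code: the function name, its parameters, and the use of plain seconds are hypothetical.

```python
def capped_ticket_ttl(now: float, configured_ttl: float, next_key_expires: float) -> float:
    """Validity (in seconds) for a new auth ticket.

    The ticket must not claim to be valid past the expiration of the "next"
    rotating key, even if the configured TTL is longer.
    """
    remaining = max(0.0, next_key_expires - now)
    return min(configured_ttl, remaining)


HOUR = 3600.0

# TTL was raised to 72h, but the "next" key (created under the old TTL)
# expires in 20h: the ticket validity is capped at 20h.
print(capped_ticket_ttl(now=0.0, configured_ttl=72 * HOUR, next_key_expires=20 * HOUR) / HOUR)
```

With the cap in place, the client attempts renewal before the keys protecting its ticket have rotated away.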

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 370c9b13970d47a55b1b20ef983c6f01236c9565)

4 years ago auth/cephx: drop redundant KeyServerData::get_service_secret() overload
Ilya Dryomov [Thu, 15 Apr 2021 07:47:50 +0000 (09:47 +0200)]
auth/cephx: drop redundant KeyServerData::get_service_secret() overload

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3078af716505ae754723864786a41a6d6af0534c)

4 years ago mgr/dashboard: debug nodeenv hangs 40816/head
Ernesto Puerta [Tue, 6 Apr 2021 11:45:15 +0000 (13:45 +0200)]
mgr/dashboard: debug nodeenv hangs

Increase verbosity in nodeenv command for debugging purposes.

Fixes: https://tracker.ceph.com/issues/50044
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 2c2a397f84455147e1cc5c7b5fc1289e47bbe5ee)

 Conflicts:
make-dist
src/pybind/mgr/dashboard/CMakeLists.txt
    - Adopted the master branch changes.

(cherry picked from commit 11838fb544189a59cc02ff768585bfdaa7347ef6)

4 years ago mgr/dashboard: Fix for alert notification message being undefined 40589/head
Nizamudeen A [Tue, 23 Mar 2021 07:10:46 +0000 (12:40 +0530)]
mgr/dashboard: Fix for alert notification message being undefined

The Prometheus alert notification message in the dashboard always comes up
as undefined. It's because we were showing alert.summary instead of
alert.description when displaying the message. I couldn't find the
summary field in the ceph_default_alerts.yml file, so I removed all the
Summary fields from the dashboard code.

Fixes: https://tracker.ceph.com/issues/49342
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 2921b2e9a939e1ad52b07327fdf84885568384b9)

4 years ago qa/standalone: default to disable insecure global id reclaim
Sage Weil [Sun, 28 Mar 2021 22:07:57 +0000 (18:07 -0400)]
qa/standalone: default to disable insecure global id reclaim

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 72c4fc75ad301980baebc7789ed6391444057e5b)

4 years ago qa/suites/upgrade/octopus-x: disable insecure global_id reclaim health warnings
Sage Weil [Thu, 25 Mar 2021 17:36:56 +0000 (13:36 -0400)]
qa/suites/upgrade/octopus-x: disable insecure global_id reclaim health warnings

These will trigger on upgrade; suppress them so that our health gates
will still work.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 3e80f61efeafc186ea8130984d64c05b2707d6ba)

Conflicts:
qa/suites/rados/cephadm/upgrade/3-start-upgrade.yaml [ commit
  04a3d4c927e7 ("qa/suites/rados/cephadm/upgrade: deploy a legacy
  r.z-style rgw") not in octopus ]
qa/suites/upgrade/octopus-x/parallel/1-tasks.yaml [ no octopus-x
  upgrade suite in octopus ]
qa/suites/upgrade/octopus-x/rgw-multisite/overrides.yaml [ ditto ]
qa/suites/upgrade/octopus-x/stress-split/1-start.yaml [ ditto ]

4 years ago qa/tasks/ceph[adm].conf[.template]: disable insecure global_id reclaim health alerts
Sage Weil [Fri, 26 Mar 2021 22:08:46 +0000 (18:08 -0400)]
qa/tasks/ceph[adm].conf[.template]: disable insecure global_id reclaim health alerts

Turn these off everywhere for our tests so they don't interfere with our health checks.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 9f6fd4fe563c9cd4cf65316921d511b677c972e4)

4 years ago cephadm: set auth_allow_insecure_global_id_reclaim for mon on bootstrap
Sage Weil [Fri, 26 Mar 2021 16:02:50 +0000 (12:02 -0400)]
cephadm: set auth_allow_insecure_global_id_reclaim for mon on bootstrap

If this is a fresh pacific cluster, let's assume that there won't be
legacy clients connecting.  (And if there are, let's put the burden on
the user to enable them to do so insecurely.)

This is in contrast to upgrades, where our focus is on not breaking
anything.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 7ca74183226b1125b29f4ea8f324ae9e38b46795)

Conflicts:
src/cephadm/cephadm [ commit 369989ebf90c ("cephadm: split-off
  config work on bootstrap") not in octopus ]

4 years ago mon/HealthMonitor: raise AUTH_INSECURE_GLOBAL_ID_RENEWAL[_ALLOWED]
Sage Weil [Thu, 25 Mar 2021 22:07:53 +0000 (18:07 -0400)]
mon/HealthMonitor: raise AUTH_INSECURE_GLOBAL_ID_RENEWAL[_ALLOWED]

Two new alerts:

- AUTH_INSECURE_GLOBAL_ID_RENEWAL_ALLOWED if we are allowing clients to reclaim
global_ids in an insecure manner (for backwards compatibility until
clients are upgraded)

- AUTH_INSECURE_GLOBAL_ID_RENEWAL if there are currently clients connected that
do not know how to securely renew their global_id, as exposed by
auth_expose_insecure_global_id_reclaim=true.  The client auth names and IPs
are listed in the alert details (up to a limit, at least).

The docs recommend operators mute these alerts instead of silencing, but
we still include options that allow the alerts to be disabled entirely.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 18b343b06e5dd904af425dc99e2c848e12f3b552)

Conflicts:
src/mon/HealthMonitor.cc [ commit e4bf716bfa07 ("mon: store
  a reference as member variable") not in octopus ]

4 years ago auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys
Ilya Dryomov [Tue, 2 Mar 2021 14:09:26 +0000 (15:09 +0100)]
auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys

When handling CEPHX_GET_AUTH_SESSION_KEY requests from nautilus+
clients, ignore CEPH_ENTITY_TYPE_AUTH in CephXAuthenticate::other_keys.
Similarly, when handling CEPHX_GET_PRINCIPAL_SESSION_KEY requests,
ignore CEPH_ENTITY_TYPE_AUTH in CephXServiceTicketRequest::keys.
These fields are intended for requesting service tickets, the auth
ticket (which is really a ticket granting ticket) must not be shared
this way.

Otherwise we end up sharing an auth ticket that a) isn't encrypted
with the old session key even if needed (should_enc_ticket == true)
and b) has the wrong validity, namely auth_service_ticket_ttl instead
of auth_mon_ticket_ttl.  In the CEPHX_GET_AUTH_SESSION_KEY case, this
undue ticket immediately supersedes the actual auth ticket already
encoded in the same reply (the reply frame ends up containing two auth
tickets).
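
The filtering described above amounts to masking one bit out of the requested service set. A minimal sketch, assuming the requested keys are a bitmask of entity types; the numeric flag values and the helper name are illustrative assumptions, not copied from the Ceph headers:

```python
# Entity type flags. The names follow the commit message and Ceph conventions;
# the numeric values are assumptions for illustration only.
CEPH_ENTITY_TYPE_MON = 0x01
CEPH_ENTITY_TYPE_MDS = 0x02
CEPH_ENTITY_TYPE_OSD = 0x04
CEPH_ENTITY_TYPE_AUTH = 0x20


def service_keys_to_issue(requested: int) -> int:
    """Drop the auth (ticket granting) type from a service ticket request.

    The auth ticket is issued through its own path, so it must never be
    handed out as if it were an ordinary service ticket.
    """
    return requested & ~CEPH_ENTITY_TYPE_AUTH


requested = CEPH_ENTITY_TYPE_OSD | CEPH_ENTITY_TYPE_MDS | CEPH_ENTITY_TYPE_AUTH
print(hex(service_keys_to_issue(requested)))
```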

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 05772ab6127bdd9ed2f63fceef840f197ecd9ea8)

4 years ago auth/cephx: rotate auth tickets less often
Ilya Dryomov [Mon, 22 Mar 2021 18:16:32 +0000 (19:16 +0100)]
auth/cephx: rotate auth tickets less often

If unauthorized global_id (re)use is disallowed, a client that has
been disconnected from the network long enough for keys to rotate
and its auth ticket to expire (i.e. become invalid/unverifiable)
would not be able to reconnect.

The default TTL is 12 hours, resulting in a 12-24 hour reconnect
window (the previous key is kept around, so the actual window can be
up to double the TTL).  The setting has stayed the same since 2009,
but it also hasn't been enforced.  Bump it to get a 72 hour reconnect
window to cover for something breaking on Friday and not getting fixed
until Monday.
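
The window arithmetic under the simple model described above: because the previous key is kept around, a disconnected client can reconnect for somewhere between one and two TTLs. The helper name is hypothetical.

```python
from typing import Tuple


def reconnect_window_hours(ttl_hours: float) -> Tuple[float, float]:
    """Return the (minimum, maximum) reconnect window for a given ticket TTL."""
    # Minimum: the current key has only just been issued.
    # Maximum: the previous key is still kept, doubling the usable window.
    return (ttl_hours, 2 * ttl_hours)


print(reconnect_window_hours(12))
```

Under this model, guaranteeing at least a 72 hour window requires a TTL of 72 hours, which is an inference from the arithmetic rather than a value stated in the message.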

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 522a52e6c258932274f0753feb623ce008519216)

4 years ago mon: fail fast when unauthorized global_id (re)use is disallowed
Ilya Dryomov [Thu, 25 Mar 2021 19:59:13 +0000 (20:59 +0100)]
mon: fail fast when unauthorized global_id (re)use is disallowed

When unauthorized global_id (re)use is disallowed, we don't want to
let unpatched clients in because they wouldn't be able to reestablish
their monitor session later, resulting in subtle hangs and disrupted
user workloads.

Denying the initial connect for all legacy (CephXAuthenticate < v3)
clients is not feasible because a large subset of them never stopped
presenting their ticket on reconnects and are therefore compatible with
enforcing mode: most notably all kernel clients but also pre-luminous
userspace clients.  They don't need to be patched and excluding them
would significantly hamper the adoption of enforcing mode.

Instead, force clients that we are not sure about to reconnect shortly
after they go through authentication and obtain global_id.  This is
done in Monitor::dispatch_op() to capture both msgr1 and msgr2, most
likely instead of dispatching mon_subscribe.

We need to let mon_getmap through for "ceph ping" and "ceph tell" to
work.  This does mean that we share the monmap, which lets the client
return from MonClient::authenticate() considering authentication to be
finished and causing the potential reconnect error to not propagate to
the user -- the client would hang waiting for remaining cluster maps.
For msgr1, this is unavoidable because the monmap is sent immediately
after the final MAuthReply.  But for msgr2 this is rare: most of the
time we get to their mon_subscribe and cut the connection before they
process the monmap!

Regardless, the user doesn't get a chance to start a workload since
there is no proper higher-level session at that point.

To help with identifying clients that need patching, add global_id and
global_id_status to "sessions" output.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 08766a17edebb7450cd9b17cc2dc01efc068bb94)

4 years ago auth/cephx: option to disallow unauthorized global_id (re)use
Ilya Dryomov [Sat, 13 Mar 2021 13:53:52 +0000 (14:53 +0100)]
auth/cephx: option to disallow unauthorized global_id (re)use

global_id is a cluster-wide unique id that must remain stable for the
lifetime of the client instance.  The cephx protocol has a facility to
allow clients to preserve their global_id across reconnects:

(1) the client should provide its global_id in the initial handshake
    message/frame and later include its auth ticket proving previous
    possession of that global_id in CEPHX_GET_AUTH_SESSION_KEY request

(2) the monitor should verify that the included auth ticket is valid
    and has the same global_id and, if so, allow the reclaim

(3) if the reclaim is allowed, the new auth ticket should be
    encrypted with the session key of the included auth ticket to
    ensure authenticity of the client performing reclaim.  (The
    included auth ticket could have been snooped when the monitor
    originally shared it with the client or any time the client
    provided it back to the monitor as part of requesting service
    tickets, but only the genuine client would have its session key
    and be able to decrypt.)
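
Step (2) can be sketched as a simple predicate. This is an illustration under stated assumptions: the `AuthTicket` type, its `valid` flag (standing in for real decryption and expiry checks), and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthTicket:
    global_id: int
    valid: bool  # stands in for "decrypts correctly and has not expired"


def may_reclaim(claimed_global_id: int, old_ticket: Optional[AuthTicket]) -> bool:
    """Monitor-side check: allow a reclaim only with a matching, valid ticket."""
    if old_ticket is None:
        return False  # no proof of previous possession
    return old_ticket.valid and old_ticket.global_id == claimed_global_id


print(may_reclaim(4242, AuthTicket(global_id=4242, valid=True)))
print(may_reclaim(4242, None))
```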

Unfortunately, all (1), (2) and (3) have been broken for a while:

- (1) was broken in 2016 by commit a2eb6ae3fb57 ("mon/monclient:
  hunt for multiple monitor in parallel") and is addressed in patch
  "mon/MonClient: preserve auth state on reconnects"

- it turns out that (2) has never been enforced.  When cephx was
  being designed and implemented in 2009, two changes to the protocol
  raced with each other pulling it in different directions: commits
  0669ca21f4f7 ("auth: reuse global_id when requesting tickets")
  and fec31964a12b ("auth: when renewing session, encrypt ticket")
  added the reclaim mechanism based strictly on auth tickets, while
  commit 5eeb711b6b2b ("auth: change server side negotiation a bit")
  allowed the client to provide global_id in the initial handshake.
  These changes didn't get reconciled and as a result a malicious
  client can assign itself any global_id of its choosing by simply
  passing something other than 0 in MAuth message or AUTH_REQUEST
  frame and not even bother supplying any ticket.  This includes
  getting a global_id that is being used by another client.

- (3) was broken in 2019 with addition of support for msgr2, where
  the new auth ticket ends up being shared unencrypted.  However the
  root cause is deeper and a malicious client can coerce msgr1 into
  the same.  This also goes back to 2009 and is addressed in patch
  "auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys".

Because (2) has never been enforced, no one noticed when (1) got
broken and we began to rely on this flaw for normal operation in
the face of reconnects due to network hiccups or otherwise.  As of
today, only pre-luminous userspace clients and kernel clients are
not exercising it on a daily basis.

Bump CephXAuthenticate version and use a dummy v3 to distinguish
between legacy clients that don't (may not) include their auth ticket
and new clients.  For new clients, unconditionally disallow claiming
global_id without a corresponding auth ticket.  For legacy clients,
introduce a choice between permissive (current behavior, default for
the foreseeable future) and enforcing mode.

If the reclaim is disallowed, return EACCES.  While MonClient does
have some provision for global_id changes and we could conceivably
implement enforcement by handing out a fresh global_id instead of
the provided one, those code paths have never been tested and there
are too many ways a sudden global_id change could go wrong.
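
The resulting policy can be summarized in a few lines. A minimal sketch, assuming the monitor already knows the client's CephXAuthenticate version and whether a valid matching ticket was presented; the function and parameter names are hypothetical:

```python
import errno


def handle_global_id_claim(client_version: int, has_valid_ticket: bool, enforcing: bool) -> int:
    """Return 0 to allow a client-provided global_id, or -EACCES to reject it."""
    if has_valid_ticket:
        return 0  # proper reclaim, always allowed
    if client_version >= 3:
        # New clients: claiming a global_id without a ticket is never allowed.
        return -errno.EACCES
    # Legacy clients: depends on the configured mode.
    return -errno.EACCES if enforcing else 0


# Legacy client without a ticket, permissive mode (the default): allowed.
print(handle_global_id_claim(client_version=2, has_valid_ticket=False, enforcing=False))
```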

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit abebd643cc60fa8a7cb82dc29a9d5041fb3c3d36)

Conflicts:
src/auth/cephx/CephxProtocol.h [ bufferlist vs
  ceph::buffer::list ]
src/auth/cephx/CephxServiceHandler.h [ ditto ]
src/auth/none/AuthNoneServiceHandler.h [ ditto ]

4 years ago auth/cephx: make cephx_decode_ticket() take a const ticket_blob
Ilya Dryomov [Tue, 30 Mar 2021 09:10:17 +0000 (11:10 +0200)]
auth/cephx: make cephx_decode_ticket() take a const ticket_blob

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6b860684c6e59b11c727206819805f89f0518575)

4 years ago auth/AuthServiceHandler: keep track of global_id and whether it is new
Ilya Dryomov [Tue, 9 Mar 2021 15:33:55 +0000 (16:33 +0100)]
auth/AuthServiceHandler: keep track of global_id and whether it is new

AuthServiceHandler already has global_id field, but it is unused.
Revive it and let the handler know whether global_id is newly assigned
by the monitor or provided by the client.

Lift the setting of entity_name into AuthServiceHandler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b50b6abd60e730176a7ef602bdd25d789a3c467d)

Conflicts:
src/auth/cephx/CephxServiceHandler.cc [ bufferlist vs
  ceph::buffer::list ]
src/auth/cephx/CephxServiceHandler.h [ ditto ]
src/auth/none/AuthNoneServiceHandler.h [ ditto ]

4 years ago auth/AuthServiceHandler: build_cephx_response_header() is cephx-specific
Ilya Dryomov [Tue, 9 Mar 2021 13:36:39 +0000 (14:36 +0100)]
auth/AuthServiceHandler: build_cephx_response_header() is cephx-specific

Make the one in CephxServiceHandler private and drop the stub in
AuthNoneServiceHandler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 49cba02a750d4c1ab68399401f0c04f9c9be5b9e)

Conflicts:
src/auth/cephx/CephxServiceHandler.h [ bufferlist vs
  ceph::buffer::list ]
src/auth/none/AuthNoneServiceHandler.h [ ditto ]

4 years agoauth/AuthServiceHandler: drop unused start_session() args
Ilya Dryomov [Tue, 9 Mar 2021 13:25:39 +0000 (14:25 +0100)]
auth/AuthServiceHandler: drop unused start_session() args

session_key, connection_secret and connection_secret_required_length
aren't material for start_session() across all three implementations.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c151c9659bdb71f30b520bbd62f91cc009ec51cd)

Conflicts:
src/auth/cephx/CephxServiceHandler.h [ bufferlist vs
  ceph::buffer::list ]
src/auth/none/AuthNoneServiceHandler.h [ ditto ]

4 years agomon/MonClient: drop global_id arg from _add_conn() and _add_conns()
Ilya Dryomov [Tue, 30 Mar 2021 13:19:41 +0000 (15:19 +0200)]
mon/MonClient: drop global_id arg from _add_conn() and _add_conns()

Passing anything but MonClient instance's global_id doesn't make
sense.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit a71f6e90d43cca5a79db92ca6a640598796ae7ee)

Conflicts:
src/mon/MonClient.cc [ commit 1e9b18008c5e ("mon: set
  MonClient::_add_conn return type to void") not in octopus ]
src/mon/MonClient.h [ ditto ]

4 years agomon/MonClient: reset auth state in shutdown()
Ilya Dryomov [Thu, 1 Apr 2021 08:55:36 +0000 (10:55 +0200)]
mon/MonClient: reset auth state in shutdown()

Destroying AuthClientHandler and not resetting global_id is another
way to get MonClient to send CEPHX_GET_AUTH_SESSION_KEY requests with
CephXAuthenticate::old_ticket not populated.  This is particularly
pertinent to get_monmap_and_config() which shuts down the bootstrap
MonClient between retry attempts.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c9b022e07392979e7f9ea6c11484a7dd872cc235)
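
The invariant behind this fix, that global_id and the auth handler are reset together, might be modeled as below. This is a toy Python sketch under assumed names (`AuthState`, `ClientModel`); it does not mirror the real MonClient interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AuthState:
    """Hypothetical stand-in for AuthClientHandler: tickets keyed by service."""
    tickets: Dict[str, bytes] = field(default_factory=dict)


class ClientModel:
    """Toy client whose global_id and auth handler must live and die together."""

    def __init__(self) -> None:
        self.global_id: int = 0  # assumption: 0 means "not assigned yet"
        self.auth: Optional[AuthState] = None

    def authenticate(self, global_id: int, ticket: bytes) -> None:
        self.global_id = global_id
        self.auth = AuthState({"auth": ticket})

    def shutdown(self) -> None:
        # Reset both pieces, so a later retry never presents an old
        # global_id without a ticket to back it up.
        self.auth = None
        self.global_id = 0

    def has_consistent_auth_state(self) -> bool:
        return (self.global_id == 0) == (self.auth is None)
```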

4 years agomon/MonClient: preserve auth state on reconnects
Ilya Dryomov [Mon, 8 Mar 2021 14:37:02 +0000 (15:37 +0100)]
mon/MonClient: preserve auth state on reconnects

Commit a2eb6ae3fb57 ("mon/monclient: hunt for multiple monitor in
parallel") introduced a regression where auth state (global_id and
AuthClientHandler) was no longer preserved on reconnects.  The ensuing
breakage was quickly noticed and prompted a follow-on fix 8bb6193c8f53
("mon/MonClient: persist global_id across re-connecting").

However, as evident from the subject, the follow-on fix only took
care of the global_id part.  AuthClientHandler is still destroyed
and all cephx tickets are discarded.  A new from-scratch instance
is created for each MonConnection and CEPHX_GET_AUTH_SESSION_KEY
requests end up with CephXAuthenticate::old_ticket not populated.
The bug is in MonClient, so both msgr1 and msgr2 are affected.

This should have resulted in a similar sort of breakage but didn't
because of a much larger bug.  The monitor should have denied the
attempt to reclaim global_id with no valid ticket proving previous
possession of that global_id presented.  Alas, it appears that this
aspect of the cephx protocol has never been enforced.  This is dealt
with in the next patch.

To fix the issue at hand, clone AuthClientHandler into each
MonConnection so that each respective CEPHX_GET_AUTH_SESSION_KEY
request gets a copy of the current auth ticket.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 236b536b28482ec9d8b872de03da7d702ce4787b)

Conflicts:
src/mon/MonClient.cc [ commit 1e9b18008c5e ("mon: set
  MonClient::_add_conn return type to void") not in octopus ]
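
The cloning approach can be illustrated with a small Python model. `AuthHandler`, `Connection` and `add_conns` are hypothetical stand-ins for AuthClientHandler, MonConnection and the connection setup path; the request is reduced to a dict purely for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AuthHandler:
    """Hypothetical stand-in for AuthClientHandler."""
    global_id: int
    tickets: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Connection:
    """Hypothetical stand-in for MonConnection."""
    mon_name: str
    auth: AuthHandler

    def build_session_key_request(self) -> dict:
        # old_ticket is populated from this connection's own copy.
        return {
            "global_id": self.auth.global_id,
            "old_ticket": self.auth.tickets.get("auth"),
        }


def add_conns(current: AuthHandler, mon_names: List[str]) -> List[Connection]:
    # Deep-copy so connections hunting in parallel cannot clobber each
    # other's state or the client's own handler.
    return [Connection(name, copy.deepcopy(current)) for name in mon_names]
```

Each connection then sends a request carrying the current ticket, while the client's own handler stays untouched by whatever a losing connection does with its copy.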

4 years agomon/MonClient: claim active_con's auth explicitly
Ilya Dryomov [Sat, 6 Mar 2021 10:15:40 +0000 (11:15 +0100)]
mon/MonClient: claim active_con's auth explicitly

Eliminate confusion by moving auth from active_con into MonClient
instead of swapping them.

The existing MonClient::auth can be destroyed right away -- I don't
see why active_con would need it or a reason to delay its destruction
(which is what stashing in active_con effectively does).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit eec24e4d119c57c7eb5119dc0083616a61b33b89)

4 years agomon/MonClient: resurrect "waiting for monmap|config" timeouts
Ilya Dryomov [Thu, 1 Apr 2021 08:07:00 +0000 (10:07 +0200)]
mon/MonClient: resurrect "waiting for monmap|config" timeouts

This fixes a regression introduced in commit 85157d5aae3d ("mon:
s/Mutex/ceph::mutex/").  Waiting for monmap and config indefinitely
is not just bad UX, it actually masks other more serious bugs.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6faa18e0a8e8efba6bd2978942eb9909b6568d5c)
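
A bounded wait of the kind this commit restores might look like the following Python sketch. The helper name `wait_for` and the errno-style return value are assumptions for illustration; the real code waits on a C++ condition variable.

```python
import errno
import threading


def wait_for(event: threading.Event, timeout_s: float) -> int:
    """Return 0 once the event is set, or -ETIMEDOUT if the wait expires."""
    # Event.wait() returns True if the flag was set, False on timeout.
    if event.wait(timeout=timeout_s):
        return 0
    return -errno.ETIMEDOUT
```

With a timeout, a caller that never receives the monmap or config gets an error it can report, rather than hanging and hiding the underlying failure.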

4 years agoqa/tasks/ceph.conf: shorten cephx TTL for testing 40662/head
Sage Weil [Mon, 5 Apr 2021 18:08:30 +0000 (13:08 -0500)]
qa/tasks/ceph.conf: shorten cephx TTL for testing

Rotate tickets frequently to exercise those code paths during testing.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 94df76244798cdc0bafd74c9e5197adb5aa990c0)

4 years agoMerge pull request #39949 from sebastian-philipp/octopus-remove-18.04_podman
Kefu Chai [Mon, 12 Apr 2021 15:55:12 +0000 (23:55 +0800)]
Merge pull request #39949 from sebastian-philipp/octopus-remove-18.04_podman

octopus: qa/suites/rados/cephadm: rm ubuntu_18.04_podman

Reviewed-by: Sage Weil <sage@redhat.com>
4 years agoMerge pull request #40399 from rhcs-dashboard/wip-49971-octopus
Ernesto Puerta [Mon, 12 Apr 2021 15:39:02 +0000 (17:39 +0200)]
Merge pull request #40399 from rhcs-dashboard/wip-49971-octopus

octopus: mgr/dashboard: Fix for broken User management role cloning

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
4 years agoMerge pull request #40297 from rhcs-dashboard/split-tenant-octopus
Ernesto Puerta [Mon, 12 Apr 2021 15:38:27 +0000 (17:38 +0200)]
Merge pull request #40297 from rhcs-dashboard/split-tenant-octopus

octopus: mgr/dashboard: Splitting tenant$user when creating rgw user

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
4 years agoMerge pull request #40784 from tchaikov/octopus-boost-cmake
Kefu Chai [Mon, 12 Apr 2021 14:51:27 +0000 (22:51 +0800)]
Merge pull request #40784 from tchaikov/octopus-boost-cmake

octopus: cmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT globaly

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years agorgw/notification: support GetTopicAttributes API 40812/head
Yuval Lifshitz [Wed, 18 Nov 2020 16:43:16 +0000 (18:43 +0200)]
rgw/notification: support GetTopicAttributes API

fixes: https://tracker.ceph.com/issues/46296

Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
(cherry picked from commit 3906884aa66b7b6c976d6165cc3b5dfaa8f754c4)

Conflicts:
PendingReleaseNotes
src/rgw/rgw_rest_pubsub.cc

4 years agoos/bluestore/BlueFS: do not _flush_range deleted files 40793/head
weixinwei [Sun, 4 Apr 2021 05:30:10 +0000 (13:30 +0800)]
os/bluestore/BlueFS: do not _flush_range deleted files

Fixes: https://tracker.ceph.com/issues/49861
Signed-off-by: weixinwei <weixw3@lenovo.com>
(cherry picked from commit 744bd5271cfcd2d84bc908a1893bbdfd51d2f8f0)

4 years agoqa/tasks/vstart_runner.py: start max required mgrs 40792/head
Alfonso Martínez [Wed, 31 Mar 2021 08:11:50 +0000 (10:11 +0200)]
qa/tasks/vstart_runner.py: start max required mgrs

Pass environment copy with max required mgrs when shell kwarg is True.

Fixes: https://tracker.ceph.com/issues/50077
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 45e1134e3b36ca103cef727103905d3db2960758)

4 years agocrush/CrushLocation: do not print logging message in constructor 40791/head
Alex Wu [Mon, 29 Mar 2021 02:09:50 +0000 (22:09 -0400)]
crush/CrushLocation: do not print logging message in constructor

do not use the logging facility in the constructor, as CephContext::_log is set
after CephContext::crush_location is created in the constructor of CephContext.

Fixes: https://tracker.ceph.com/issues/50047
Signed-off-by: Alex Wu <notmycupoftea@163.com>
(cherry picked from commit 68812a2a4f63e9fd7c33f14b7a00c54a6a21128e)

4 years agorbd-mirror: fix UB while registering perf counters 40790/head
Arthur Outhenin-Chalandre [Wed, 24 Mar 2021 09:05:07 +0000 (10:05 +0100)]
rbd-mirror: fix UB while registering perf counters

register_perf_counters was called before m_image_spec initialization
resulting in UB in the perf counters' name.

This moves the register_perf_counters() call to the init function
after the m_image_spec initialization.

Fixes: https://tracker.ceph.com/issues/49959
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 5e3b9d29b3a81923fed51248aa21749dbecfcd73)

4 years agocmake: build static libs if they are internal ones 40789/head
Kefu Chai [Fri, 19 Feb 2021 04:04:32 +0000 (12:04 +0800)]
cmake: build static libs if they are internal ones

a user or build script may set `BUILD_SHARED_LIBS`, in which case
these convenience libraries (using the autotools' terminology)
are built and linked but never get installed.

Fixes: https://tracker.ceph.com/issues/38611
Fixes: https://tracker.ceph.com/issues/49080
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit df841b241efd387044d9637b1cf67d198bd1398e)

Conflicts:
src/blk/CMakeLists.txt
- code being changed does not exist in octopus

4 years agoosd: ignore already dumped osd in dump_item() 40788/head
jhonxue [Fri, 5 Mar 2021 15:33:10 +0000 (23:33 +0800)]
osd: ignore already dumped osd in dump_item()

Fixes: https://tracker.ceph.com/issues/49627
Signed-off-by: Xue Yantao <jhonxue@tencent.com>
(cherry picked from commit 7813819445e73d1e7f333bd9aaaf42624cd781ec)

4 years agocmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT globaly 40784/head
Kefu Chai [Sun, 21 Mar 2021 15:06:00 +0000 (23:06 +0800)]
cmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT globaly

turns out we also need it for compiling librados tests with libboost
1.75, so just define it globally

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 7ce3ee6f346889d4d87d6424c6a1ad18badd139b)

Conflicts:
src/CMakeLists.txt
src/librbd/CMakeLists.txt: trivial resolution

4 years agocmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT for rgw tests
Kefu Chai [Fri, 19 Mar 2021 04:46:17 +0000 (12:46 +0800)]
cmake: define BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT for rgw tests

otherwise unittest_rgw_iam_policy does not compile with boost v1.75

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 36d2f006c6cf309d60857ce85325489865e8374c)

4 years agomonmaptool: Don't call set_port on an invalid address 40758/head
Brad Hubbard [Wed, 25 Nov 2020 01:57:37 +0000 (11:57 +1000)]
monmaptool: Don't call set_port on an invalid address

Verify we parse the entire address argument.

Fixes: https://tracker.ceph.com/issues/48336
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit c0aa1fdbe7b87c2fb4b572169372ec487950b03d)
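
The idea of rejecting an argument that is only partially parsed can be sketched in Python. `parse_addr` is a hypothetical helper limited to IPv4 for brevity, and using port 0 for "no port given" is an assumption; monmaptool's actual parser is C++ and handles more address forms.

```python
import ipaddress
from typing import Optional, Tuple


def parse_addr(arg: str) -> Optional[Tuple[str, int]]:
    """Parse 'IPv4[:port]'; return None unless the whole argument is valid."""
    host, sep, port_text = arg.partition(":")
    try:
        ip = ipaddress.IPv4Address(host)
    except ValueError:
        return None
    if not sep:
        return (str(ip), 0)  # assumption: 0 stands for "port not specified"
    # Trailing garbage such as '6789junk' must fail the whole parse.
    if not (port_text.isascii() and port_text.isdigit()):
        return None
    port = int(port_text)
    if port > 65535:
        return None
    return (str(ip), port)
```

A caller would only go on to set a port when the result is not None, which is the guard the commit title describes.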

4 years agomon/MgrMonitor: populate available_modules from promote_standby() 40757/head
Sage Weil [Fri, 12 Mar 2021 20:00:49 +0000 (15:00 -0500)]
mon/MgrMonitor: populate available_modules from promote_standby()

This was done in the beacon path, where there is no active mgr and we
get a new entrant, but not for this case where an existing standby is
promoted to active.

This fixes a problem during upgrade where a new (standby) mgr's modules
have a new module option but it is not reflected immediately (not until
the next beacon).

Fixes: https://tracker.ceph.com/issues/49778
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit cd0094678d0e01fd7b74e2f6f5ff47a16a60dddd)

4 years agotests: ceph_test_rados_api_watch_notify: Allow for reconnect 40756/head
Brad Hubbard [Mon, 22 Feb 2021 03:28:12 +0000 (13:28 +1000)]
tests: ceph_test_rados_api_watch_notify: Allow for reconnect

An injected socket failure may cause rados_watch_check() to return
ENOENT instead of the expected ENOTCONN.

Fixes: https://tracker.ceph.com/issues/47719
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
(cherry picked from commit 0a03a81f633f11bd3247b2f8f10f719c7b3d38e3)

4 years agorgw: objectlock: improve client error messages 40755/head
Matt Benjamin [Thu, 25 Feb 2021 22:39:08 +0000 (17:39 -0500)]
rgw: objectlock: improve client error messages

A bucket object lock configuration can only be set on buckets
created with the object-lock option enabled.  Likewise, an
object lock or object retention hold can only be set on objects
in buckets with object lock enabled.  Object lock and related
policy and policy violations are also potentially confusing
to client users.

Raise the debug level to 4, but add a human-readable client error
message, when object lock constraints are violated.

Fixes: https://tracker.ceph.com/issues/49541
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 7583374e5294b1c1c16068999123fef98827e9dc)

Conflicts:
src/rgw/rgw_op.cc

4 years agoMerge pull request #40673 from smithfarm/wip-50159-octopus
Yuri Weinstein [Fri, 9 Apr 2021 14:51:55 +0000 (07:51 -0700)]
Merge pull request #40673 from smithfarm/wip-50159-octopus

octopus: test/rgw: test_datalog_autotrim filters out new entries

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years agoMerge pull request #40672 from smithfarm/wip-50096-octopus
Yuri Weinstein [Fri, 9 Apr 2021 14:51:29 +0000 (07:51 -0700)]
Merge pull request #40672 from smithfarm/wip-50096-octopus

octopus: rgw: return error when trying to copy encrypted object without key

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
4 years agoMerge pull request #40384 from singuliere/wip-49091-octopus
Yuri Weinstein [Fri, 9 Apr 2021 14:51:00 +0000 (07:51 -0700)]
Merge pull request #40384 from singuliere/wip-49091-octopus

octopus: rgw/http: add timeout to http client

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years agomon: Modifying trim logic to change paxos_service_trim_max dynamically 40699/head
Aishwarya Mathuria [Fri, 12 Mar 2021 11:27:40 +0000 (16:57 +0530)]
mon: Modifying trim logic to change paxos_service_trim_max dynamically

Currently, the Paxos Service trim logic is bounded by a max value (paxos_service_trim_max). This change dynamically modifies the max value when the number of logs to be trimmed is higher than paxos_service_trim_max.

The paxos_service_trim_max_multiplier has been added in case we want to increase paxos_service_trim_max by a certain factor. If this option is enabled we get a new upper bound when trim sizes are high.

Fixes: https://tracker.ceph.com/issues/50004
Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
(cherry picked from commit 2e1141e43980a0a44b18159860ebf9cc38316435)

Conflicts:
doc/rados/configuration/mon-config-ref.rst
- trivial resolution
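
One way to read the trim bound described above is sketched below in Python. The function name is hypothetical, and treating 0 as "unbounded" for the max and "disabled" for the multiplier is an assumption; the monitor's actual C++ logic may differ in detail.

```python
def versions_to_trim(pending: int, trim_max: int, trim_max_multiplier: int) -> int:
    """Return how many versions to trim in one round.

    Assumptions: trim_max == 0 means no bound, and
    trim_max_multiplier == 0 means the dynamic bound is disabled.
    """
    if trim_max <= 0 or pending <= trim_max:
        return pending
    if trim_max_multiplier > 0:
        # Backlog exceeds the usual cap: raise the cap by the multiplier.
        return min(pending, trim_max * trim_max_multiplier)
    return trim_max
```

For example, with a cap of 500 and a multiplier of 20, a backlog of 600 is trimmed in full, while a backlog of 20000 is limited to 10000 per round.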

4 years agomon: Adding variables for Paxos trim
Aishwarya Mathuria [Mon, 29 Mar 2021 14:05:12 +0000 (19:35 +0530)]
mon: Adding variables for Paxos trim

1. Define variables for paxos_service_trim_min and paxos_service_trim_max.
2. Use them in place of g_conf()->paxos_service_trim_min and g_conf()->paxos_service_trim_max.

Signed-off-by: Aishwarya Mathuria <amathuri@redhat.com>
(cherry picked from commit 45c59f2f0d0d90beb9163804e86139c551cf505b)

4 years agotest/rgw: test_datalog_autotrim filters out new entries 40673/head
Casey Bodley [Mon, 15 Jun 2020 15:45:11 +0000 (11:45 -0400)]
test/rgw: test_datalog_autotrim filters out new entries

if other sync activity is racing with test_datalog_autotrim, it can
create new datalog entries after the 'datalog autotrim' command runs

instead of asserting that the datalog is empty after trim, assert that
any entries have a marker larger than the max-marker reported by
'datalog status' before the trim

Fixes: https://tracker.ceph.com/issues/45626
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit abd08f1843642e318d74dfadb0f9cf1f6b86d827)
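
The relaxed assertion can be sketched as a small Python helper. The entry layout (a dict with a "marker" key) and the lexicographic comparison of markers are assumptions for illustration, and `stale_entries` is a hypothetical name rather than the test's real code.

```python
from typing import Dict, Iterable, List


def stale_entries(entries: Iterable[Dict[str, str]],
                  max_marker_before_trim: str) -> List[Dict[str, str]]:
    """Return entries that the trim should have removed.

    Entries with a marker above the pre-trim max-marker were written by
    racing sync activity and are tolerated.
    """
    return [e for e in entries if e["marker"] <= max_marker_before_trim]
```

The test would then assert that `stale_entries(...)` is empty, instead of asserting that the datalog as a whole is empty.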

4 years agorgw: return error when trying to copy encrypted object without key 40672/head
Ilsoo Byun [Fri, 11 Dec 2020 00:57:49 +0000 (09:57 +0900)]
rgw: return error when trying to copy encrypted object without key

Fixes: https://tracker.ceph.com/issues/48554
Signed-off-by: Ilsoo Byun <ilsoobyun@linecorp.com>
(cherry picked from commit dde1303c92b39daa2d760a110f48dc9655e7765f)

4 years agopackaging: require ceph-common for immutable object cache daemon 40666/head
Ilya Dryomov [Wed, 7 Apr 2021 09:36:53 +0000 (11:36 +0200)]
packaging: require ceph-common for immutable object cache daemon

This daemon has a systemd service which starts it with --setuser ceph
--setgroup ceph.  "ceph" user and group are created by ceph-common and
won't be there unless ceph-common is installed.

Fixes: https://tracker.ceph.com/issues/50207
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit dc55f0bb43226259068545c6e13c2921d225ddbe)

4 years agoMerge pull request #40392 from neha-ojha/wip-49964-octopus
Yuri Weinstein [Wed, 7 Apr 2021 17:02:18 +0000 (10:02 -0700)]
Merge pull request #40392 from neha-ojha/wip-49964-octopus

octopus: common/options: bluefs_buffered_io=true by default

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge pull request #40476 from tchaikov/octopus-pr-38665
Yuri Weinstein [Wed, 7 Apr 2021 15:35:46 +0000 (08:35 -0700)]
Merge pull request #40476 from tchaikov/octopus-pr-38665

octopus: pybind/ceph_argparse.py: use a safe value for timeout

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
4 years agoMerge pull request #40441 from neha-ojha/wip-49990-octopus
Yuri Weinstein [Wed, 7 Apr 2021 15:35:14 +0000 (08:35 -0700)]
Merge pull request #40441 from neha-ojha/wip-49990-octopus

octopus: os/bluestore: Make Onode::put/get resiliant to split_cache

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
4 years agoMerge pull request #40424 from k0ste/wip-49995-octopus
Yuri Weinstein [Wed, 7 Apr 2021 15:34:11 +0000 (08:34 -0700)]
Merge pull request #40424 from k0ste/wip-49995-octopus

octopus: common/ipaddr: skip loopback interfaces named 'lo' and test it

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agomgr/dashboard: Revoke read-only user's access to Manager modules 40649/head
Nizamudeen A [Tue, 6 Apr 2021 15:54:51 +0000 (21:24 +0530)]
mgr/dashboard: Revoke read-only user's access to Manager modules

This prevents read-only users from opening the Manager Modules page in
the Ceph Dashboard, where some security-related information is
shown.

Fixes: https://tracker.ceph.com/issues/50174
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit fb607f1561371340d2c9d4e16c4eaceb365fd926)

4 years agoMerge pull request #39219 from k0ste/wip-49004-octopus
Yuri Weinstein [Tue, 6 Apr 2021 17:02:09 +0000 (10:02 -0700)]
Merge pull request #39219 from k0ste/wip-49004-octopus

octopus: mgr: update mon metadata when monmap is updated

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
4 years agoosd: drop entry in failure_pending when resetting stale peer 40558/head
Kefu Chai [Sun, 14 Mar 2021 03:56:59 +0000 (11:56 +0800)]
osd: drop entry in failure_pending when resetting stale peer

no need to keep it in the pending list anymore.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit ff077fc3ea7d1595679c7053cda3b16d68aefd01)

4 years agoosd: mark HeartbeatInfo::is_stale() and friends "const"
Kefu Chai [Sun, 14 Mar 2021 03:56:06 +0000 (11:56 +0800)]
osd: mark HeartbeatInfo::is_stale() and friends "const"

just for more const correctness.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 253cb8f4114e92768a54376a8834006479930b69)

4 years agomon/OSDMonitor: drop stale failure_info
Kefu Chai [Thu, 11 Mar 2021 13:13:13 +0000 (21:13 +0800)]
mon/OSDMonitor: drop stale failure_info

failure_info keeps strong references to the MOSDFailure messages
sent by osds or peon monitors. Whenever the monitor starts to handle
an MOSDFailure message, it registers it in its OpTracker. The
failure report message is unregistered when the monitor acks it,
either by canceling it or by replying to the reporter with a new
osdmap marking the target osd down. If this does not happen,
the failure reports just pile up in OpTracker, the monitor considers
them slow ops, and they are reported as a SLOW_OPS health warning.

In theory, it does not take long to mark an unresponsive osd down if
we have enough reporters. But there is a chance that a reporter fails
to cancel its report before it reboots, and the monitor also fails
to collect enough reports to mark the target osd down. The
target osd then never gets an osdmap marking it down, so it won't send
an alive message to the monitor to fix this.

In this change, we check for stale failure info in tick() and
simply drop the stale reports, so the messages can be released and
marked "done".

Fixes: https://tracker.ceph.com/issues/47380
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit a124ee85b03e15f4ea371358008ecac65f9f4e50)
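
The periodic cleanup described above can be modeled in a few lines of Python. The data layout (target osd mapped to reporter mapped to receive time) and the `max_age_s` threshold are assumptions for illustration; the monitor's real staleness criterion is not given in the commit message.

```python
from typing import Dict, List, Tuple


def drop_stale_reports(failure_info: Dict[int, Dict[int, float]],
                       now: float,
                       max_age_s: float) -> List[Tuple[int, int]]:
    """Drop reports older than max_age_s; return (target, reporter) pairs dropped.

    failure_info maps target osd -> {reporter osd: time the report arrived}.
    """
    dropped = []
    # Iterate over snapshots of the keys, since entries are deleted in the loop.
    for target in list(failure_info):
        reporters = failure_info[target]
        for reporter in list(reporters):
            if now - reporters[reporter] > max_age_s:
                del reporters[reporter]
                dropped.append((target, reporter))
        if not reporters:
            del failure_info[target]
    return dropped
```

Dropping the entry is what releases the reference to the message in this model, so the tracked operation can be marked done instead of lingering as a slow op.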

4 years agomon/OSDMonitor: restructure OSDMonitor::check_failures() loop
Kefu Chai [Thu, 11 Mar 2021 10:28:18 +0000 (18:28 +0800)]
mon/OSDMonitor: restructure OSDMonitor::check_failures() loop

A later change will add a trim-failures call in the loop, which mutates
failure_info while we are still iterating over this map, so the loop
has to be restructured a little.

Fixes: https://tracker.ceph.com/issues/47380
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 6e512b2f1e228eb808d6bff1e5c159c4d16667ef)

4 years agomon/OSDMonitor: extract get_grace_time()
Kefu Chai [Thu, 11 Mar 2021 11:49:36 +0000 (19:49 +0800)]
mon/OSDMonitor: extract get_grace_time()

for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit d42815d5e9c4ba781ea710ef299cb9319f7fc3e6)

4 years agomon/OSDMonitor: do not return old failure report when updating it
Kefu Chai [Thu, 11 Mar 2021 09:47:50 +0000 (17:47 +0800)]
mon/OSDMonitor: do not return old failure report when updating it

there is no need to return the stale report, as the caller is not
interested in it.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 09216c01be6f57938b1bdb491e45ecfb15a3f6c5)

Conflicts:
src/mon/OSDMonitor.h
- because auto in master was map<int, failure_reporter_t>::iterator in octopus