git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
4 years ago Merge pull request #40887 from rhcs-dashboard/wip-50350-pacific
Ernesto Puerta [Tue, 27 Apr 2021 17:20:35 +0000 (19:20 +0200)]
Merge pull request #40887 from rhcs-dashboard/wip-50350-pacific

pacific: mgr/dashboard: improve telemetry opt-in reminder notification message

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
4 years ago Merge pull request #40929 from rhcs-dashboard/wip-49658-pacific
Ernesto Puerta [Tue, 27 Apr 2021 17:16:43 +0000 (19:16 +0200)]
Merge pull request #40929 from rhcs-dashboard/wip-49658-pacific

pacific: mgr/dashboard: test prometheus rules through promtool

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #40815 from rhcs-dashboard/wip-50171-pacific
Ernesto Puerta [Tue, 27 Apr 2021 17:07:07 +0000 (19:07 +0200)]
Merge pull request #40815 from rhcs-dashboard/wip-50171-pacific

pacific: mgr/dashboard: debug nodeenv hangs

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #40980 from rhcs-dashboard/wip-50418-pacific
Ernesto Puerta [Tue, 27 Apr 2021 17:06:10 +0000 (19:06 +0200)]
Merge pull request #40980 from rhcs-dashboard/wip-50418-pacific

pacific: mgr/dashboard: filesystem pool size should use stored stat

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
4 years ago Merge pull request #40691 from singuliere/wip-50124-pacific
Yuri Weinstein [Mon, 26 Apr 2021 21:33:28 +0000 (14:33 -0700)]
Merge pull request #40691 from singuliere/wip-50124-pacific

pacific: mon: Modifying trim logic to change paxos_service_trim_max dynamically

Reviewed-by: Aishwarya Mathuria <amathuri@redhat.com>
4 years ago Merge pull request #40989 from trociny/wip-50480-pacific
Yuri Weinstein [Mon, 26 Apr 2021 21:29:57 +0000 (14:29 -0700)]
Merge pull request #40989 from trociny/wip-50480-pacific

pacific: os/FileStore: don't propagate split/merge error to "create"/"remove"

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years ago Merge pull request #40759 from smithfarm/wip-50154-pacific
Yuri Weinstein [Mon, 26 Apr 2021 21:29:32 +0000 (14:29 -0700)]
Merge pull request #40759 from smithfarm/wip-50154-pacific

pacific: osd/PeeringState: fix acting_set_writeable min_size check

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years ago Merge pull request #40690 from singuliere/wip-50131-pacific
Yuri Weinstein [Mon, 26 Apr 2021 21:27:43 +0000 (14:27 -0700)]
Merge pull request #40690 from singuliere/wip-50131-pacific

pacific: monmaptool: Don't call set_port on an invalid address

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Brad Hubbard <bhubbard@redhat.com>
4 years ago Merge pull request #40679 from singuliere/wip-50121-pacific
Yuri Weinstein [Mon, 26 Apr 2021 21:27:10 +0000 (14:27 -0700)]
Merge pull request #40679 from singuliere/wip-50121-pacific

pacific: crush/CrushLocation: do not print logging message in constructor

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years ago Merge pull request #40677 from singuliere/wip-50212-pacific
Yuri Weinstein [Mon, 26 Apr 2021 21:26:46 +0000 (14:26 -0700)]
Merge pull request #40677 from singuliere/wip-50212-pacific

pacific: os/bluestore/BlueFS: do not _flush_range deleted files

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years ago Merge pull request #40853 from batrick/i50285
Yuri Weinstein [Mon, 26 Apr 2021 17:18:35 +0000 (10:18 -0700)]
Merge pull request #40853 from batrick/i50285

pacific: qa: test standby_replay in workloads

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #40852 from batrick/i50287
Yuri Weinstein [Mon, 26 Apr 2021 17:17:50 +0000 (10:17 -0700)]
Merge pull request #40852 from batrick/i50287

pacific: qa: "log [ERR] : error reading sessionmap 'mds2_sessionmap'"

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #40825 from batrick/i50253
Yuri Weinstein [Mon, 26 Apr 2021 17:17:09 +0000 (10:17 -0700)]
Merge pull request #40825 from batrick/i50253

pacific: mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #40680 from singuliere/wip-50082-pacific
Yuri Weinstein [Mon, 26 Apr 2021 14:45:10 +0000 (07:45 -0700)]
Merge pull request #40680 from singuliere/wip-50082-pacific

pacific: rbd-mirror: fix UB while registering perf counters

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
4 years ago Merge pull request #40981 from rhcs-dashboard/wip-50476-pacific
Ernesto Puerta [Mon, 26 Apr 2021 08:12:39 +0000 (10:12 +0200)]
Merge pull request #40981 from rhcs-dashboard/wip-50476-pacific

pacific: mgr/dashboard: Remove username and password from request body

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #40990 from rhcs-dashboard/wip-50485-pacific
Ernesto Puerta [Fri, 23 Apr 2021 14:50:41 +0000 (16:50 +0200)]
Merge pull request #40990 from rhcs-dashboard/wip-50485-pacific

pacific: mgr/dashboard: fix duplicated rows when creating NFS export.

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
4 years ago Merge pull request #40688 from singuliere/wip-50086-pacific
Yuri Weinstein [Fri, 23 Apr 2021 14:48:52 +0000 (07:48 -0700)]
Merge pull request #40688 from singuliere/wip-50086-pacific

pacific: qa/tasks/cephfs: create enough subvolumes

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #40686 from singuliere/wip-50180-pacific
Yuri Weinstein [Fri, 23 Apr 2021 14:48:23 +0000 (07:48 -0700)]
Merge pull request #40686 from singuliere/wip-50180-pacific

pacific: client: only check pool permissions for regular files

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #40684 from singuliere/wip-50185-pacific
Yuri Weinstein [Fri, 23 Apr 2021 14:47:56 +0000 (07:47 -0700)]
Merge pull request #40684 from singuliere/wip-50185-pacific

pacific: test: disable mgr/mirroring for `test_mirroring_init_failure_with_recovery` test

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
4 years ago Merge pull request #40683 from singuliere/wip-50190-pacific
Yuri Weinstein [Fri, 23 Apr 2021 14:46:52 +0000 (07:46 -0700)]
Merge pull request #40683 from singuliere/wip-50190-pacific

pacific: qa: fix ino_release_cb racy behavior

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #40682 from singuliere/wip-50225-pacific
Yuri Weinstein [Fri, 23 Apr 2021 14:46:26 +0000 (07:46 -0700)]
Merge pull request #40682 from singuliere/wip-50225-pacific

pacific: mds: skip the buffer in UnknownPayload::decode()

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #40678 from singuliere/wip-50199-pacific
Yuri Weinstein [Fri, 23 Apr 2021 14:45:56 +0000 (07:45 -0700)]
Merge pull request #40678 from singuliere/wip-50199-pacific

pacific: tools/cephfs_mirror/PeerReplayer.cc: add missing include

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #40630 from batrick/i50127
Yuri Weinstein [Fri, 23 Apr 2021 14:43:40 +0000 (07:43 -0700)]
Merge pull request #40630 from batrick/i50127

pacific: pybind/mgr/volumes: deadlock on async job hangs finisher thread

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #40627 from petrutlucian94/wip-50187-pacific
Yuri Weinstein [Fri, 23 Apr 2021 14:42:26 +0000 (07:42 -0700)]
Merge pull request #40627 from petrutlucian94/wip-50187-pacific

pacific: cephfs: minor ceph-dokan improvements

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge PR #40746 into pacific
Sage Weil [Fri, 23 Apr 2021 12:21:50 +0000 (07:21 -0500)]
Merge PR #40746 into pacific

* refs/pull/40746/head:
doc/cephadm: fix a typo
mgr/cephadm: rewrite/simplify describe_service
mgr/orchestrator: report osds as osd.unmanaged as appropriate
mgr/orchestrator: remove IMAGE ID from 'orch ls'
cephadm: normalize unqualified repo digests to docker.io
mgr/cephadm/upgrade: normalize unqualified target image
cephadm:persist the grafana.db file
qa/tasks/cephadm: add apply() method/task
cephadm: pass '-i' to docker|podman run for shell|enter

Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
4 years ago Merge PR #40985 into pacific
Sage Weil [Thu, 22 Apr 2021 16:23:50 +0000 (11:23 -0500)]
Merge PR #40985 into pacific

* refs/pull/40985/head:
ceph-volume: fix raw listing when finding OSDs from different clusters

Reviewed-by: Sage Weil <sage@redhat.com>
4 years ago pybind/mgr/mgr_util: fix typing annotation 40630/head
Kefu Chai [Sun, 4 Apr 2021 02:02:55 +0000 (10:02 +0800)]
pybind/mgr/mgr_util: fix typing annotation

and refactor lock_timeout_log() a little bit to drop `locked`.

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 5e1c42082e7076ae6f5264d6005029690588e5aa)

4 years ago pybind/mgr/volumes: log mutex locks to help debug deadlocks
Patrick Donnelly [Mon, 22 Mar 2021 18:48:46 +0000 (11:48 -0700)]
pybind/mgr/volumes: log mutex locks to help debug deadlocks

There is a hang in get_job which is holding the mutex [1]. This debug
output is meant to help find this issue in upstream QA logs.

[1] https://tracker.ceph.com/issues/49605#note-5
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit cf2a1ad651208242868a2a7fdf25006f30ad9cc8)

4 years ago mgr/pybind/volumes: avoid acquiring lock for thread count updates
Patrick Donnelly [Mon, 22 Mar 2021 16:17:43 +0000 (09:17 -0700)]
mgr/pybind/volumes: avoid acquiring lock for thread count updates

Perform thread count updates in a dedicated tick thread. This prevents the
mgr Finisher thread from potentially hanging on a mutex deadlock
in the cloner thread management.

Fixes: https://tracker.ceph.com/issues/49605
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit b27ddfaed4a3c66bac2343c8315a1fe542edb63e)
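
The idea in the message above can be sketched as follows. This is a minimal illustration, not the code from the commit: `JobPool`, `request_thread_count` and the tick interval are hypothetical names and values. The caller that stands in for the finisher context only records the wanted count; a dedicated tick thread is the only place that takes the lock to apply it.

```python
import threading


class JobPool:
    """Hypothetical stand-in for the cloner thread management."""

    def __init__(self, nr_threads: int) -> None:
        self.lock = threading.Lock()
        self.nr_threads = nr_threads      # applied pool size, guarded by self.lock
        self.wanted_threads = nr_threads  # requested size, written without the lock
        self.stop = threading.Event()
        self.ticker = threading.Thread(target=self._tick, daemon=True)
        self.ticker.start()

    def request_thread_count(self, n: int) -> None:
        # Called from the (hypothetical) finisher context: never waits on self.lock.
        self.wanted_threads = n

    def _tick(self, interval: float = 0.05) -> None:
        # Dedicated thread: wakes up periodically and applies the wanted size.
        while not self.stop.wait(interval):
            wanted = self.wanted_threads
            with self.lock:
                if wanted != self.nr_threads:
                    self.nr_threads = wanted

    def shutdown(self) -> None:
        self.stop.set()
        self.ticker.join()
```

Even if another thread holds `lock` for a long time, `request_thread_count()` returns immediately; only the tick thread waits.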

4 years ago qa: bump debugging for mgr
Patrick Donnelly [Fri, 19 Mar 2021 21:44:37 +0000 (14:44 -0700)]
qa: bump debugging for mgr

Hunting [1].

[1] https://tracker.ceph.com/issues/49605
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 17b291e57d18d13643761570adf208fbbca06252)

4 years ago mgr: add debug output for commands dispatched
Patrick Donnelly [Fri, 19 Mar 2021 21:41:58 +0000 (14:41 -0700)]
mgr: add debug output for commands dispatched

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit bb56c30167bed615db86aff5290550a887b3731a)

4 years ago mgr/dashboard: update documentation about creating NFS export. 40990/head
Alfonso Martínez [Thu, 22 Apr 2021 12:10:25 +0000 (14:10 +0200)]
mgr/dashboard: update documentation about creating NFS export.

- State up front that orchestrator-managed clusters are detected automatically (the documentation that follows therefore applies only to user-defined clusters).
- Include nfs-ganesha in the security scope list.

Fixes: https://tracker.ceph.com/issues/50440
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 335a91d20d2e0ec68ddbda6d94f9e90dea16114f)

4 years ago mgr/dashboard: fix duplicated rows when creating NFS export.
Alfonso Martínez [Thu, 22 Apr 2021 12:09:52 +0000 (14:09 +0200)]
mgr/dashboard: fix duplicated rows when creating NFS export.

- Show an error message if the same pool & namespace is used by more than 1 cluster.
- Fix error handling when no rgw daemons are found.

Fixes: https://tracker.ceph.com/issues/50440
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 8d53d7b8e528a08358132032d11f96c774deedb8)

4 years ago os/FileStore: don't propagate split/merge error to "create"/"remove" 40989/head
Mykola Golub [Mon, 19 Apr 2021 07:32:01 +0000 (08:32 +0100)]
os/FileStore: don't propagate split/merge error to "create"/"remove"

Either ignore the error or terminate; otherwise it may confuse the
"create"/"remove" caller.

Fixes: https://tracker.ceph.com/issues/50395
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 936898b8caf7b13a120ea6108df0b0dac29882c4)

4 years ago ceph-volume: fix raw listing when finding OSDs from different clusters 40985/head
Sébastien Han [Thu, 22 Apr 2021 10:52:09 +0000 (12:52 +0200)]
ceph-volume: fix raw listing when finding OSDs from different clusters

When listing OSDs on a host that has 2 OSDs with the same ID, the output gets
overwritten by the last listed device, so only a single OSD shows up.
The ceph-volume.log shows that both disks were parsed correctly:

```
[2021-04-22 09:44:21,391][ceph_volume.devices.raw.list][DEBUG ] Examining /dev/sda1
[2021-04-22 09:44:21,391][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /dev/sda1
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout {
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "/dev/sda1": {
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "osd_uuid": "423bf64d-f241-4f4b-a589-25a66fc836d1",
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "size": 6442450944,
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "btime": "2021-04-22T09:32:55.894961+0000",
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "description": "main",
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "bfm_blocks": "1572864",
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "bfm_blocks_per_key": "128",
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "bfm_bytes_per_block": "4096",
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "bfm_size": "6442450944",
[2021-04-22 09:44:21,418][ceph_volume.process][INFO  ] stdout "bluefs": "1",
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout "ceph_fsid": "d3cd4b72-5342-4fd3-96ec-a6e581261eab",
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout "kv_backend": "rocksdb",
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout "magic": "ceph osd volume v026",
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout "mkfs_done": "yes",
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout "osd_key": "AQDGQoFg+XHqJBAAw9ZQmtrnotHCLI0Nc2to6A==",
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout "ready": "ready",
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout "whoami": "0"
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout }
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] stdout }
[2021-04-22 09:44:21,419][ceph_volume.devices.raw.list][DEBUG ] Examining /dev/sda2
[2021-04-22 09:44:21,419][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-bluestore-tool show-label --dev /dev/sda2
[2021-04-22 09:44:21,445][ceph_volume.process][INFO  ] stdout {
[2021-04-22 09:44:21,445][ceph_volume.process][INFO  ] stdout "/dev/sda2": {
[2021-04-22 09:44:21,445][ceph_volume.process][INFO  ] stdout "osd_uuid": "c7c66bbd-7b38-4dcd-ad6d-3769c516f2fe",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "size": 6442450944,
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "btime": "2021-04-22T09:32:21.814768+0000",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "description": "main",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "bfm_blocks": "1572864",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "bfm_blocks_per_key": "128",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "bfm_bytes_per_block": "4096",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "bfm_size": "6442450944",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "bluefs": "1",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "ceph_fsid": "69c40cb1-22af-42e4-9d59-4a4468a2f58f",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "kv_backend": "rocksdb",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "magic": "ceph osd volume v026",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "mkfs_done": "yes",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "osd_key": "AQCkQoFgre9SKBAANgHH6scIb+IiyKxh6MhY0A==",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "ready": "ready",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "require_osd_release": "16",
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout "whoami": "0"
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout }
[2021-04-22 09:44:21,446][ceph_volume.process][INFO  ] stdout }
```

However, a single OSD gets listed by `ceph-volume raw list`:

```
[root@2b5a3b8bf31c /]# ceph-volume raw list
{
    "0": {
        "ceph_fsid": "69c40cb1-22af-42e4-9d59-4a4468a2f58f",
        "device": "/dev/sda2",
        "osd_id": 0,
        "osd_uuid": "c7c66bbd-7b38-4dcd-ad6d-3769c516f2fe",
        "type": "bluestore"
    }
}
```

We now use the osd_uuid so the output will never conflict:

```
[root@2b5a3b8bf31c /]# ceph-volume raw list
{
    "423bf64d-f241-4f4b-a589-25a66fc836d1": {
        "ceph_fsid": "d3cd4b72-5342-4fd3-96ec-a6e581261eab",
        "dev": "/dev/sda1",
        "osd_id": 0,
        "osd_uuid": "423bf64d-f241-4f4b-a589-25a66fc836d1",
        "type": "bluestore"
    },
    "c7c66bbd-7b38-4dcd-ad6d-3769c516f2fe": {
        "ceph_fsid": "69c40cb1-22af-42e4-9d59-4a4468a2f58f",
        "dev": "/dev/sda2",
        "osd_id": 0,
        "osd_uuid": "c7c66bbd-7b38-4dcd-ad6d-3769c516f2fe",
        "type": "bluestore"
    }
}
```

Fixes: https://tracker.ceph.com/issues/50478
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit ec0f5f3b22d24754c16131a1996e42b787e4255f)
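
The overwrite described above is what happens when a report dictionary is keyed by a value that repeats. A minimal sketch, using the two labels from the log excerpt; `build_report` is a hypothetical helper, not the ceph-volume function:

```python
from typing import Dict, List

# Trimmed copies of the two labels shown in the log excerpt above.
LABELS: List[dict] = [
    {"device": "/dev/sda1", "osd_id": 0,
     "osd_uuid": "423bf64d-f241-4f4b-a589-25a66fc836d1",
     "ceph_fsid": "d3cd4b72-5342-4fd3-96ec-a6e581261eab"},
    {"device": "/dev/sda2", "osd_id": 0,
     "osd_uuid": "c7c66bbd-7b38-4dcd-ad6d-3769c516f2fe",
     "ceph_fsid": "69c40cb1-22af-42e4-9d59-4a4468a2f58f"},
]


def build_report(labels: List[dict], key: str) -> Dict[str, dict]:
    """Index labels by `key`; a repeated key silently replaces the earlier entry."""
    report: Dict[str, dict] = {}
    for label in labels:
        report[str(label[key])] = label
    return report


by_id = build_report(LABELS, "osd_id")      # both OSDs have ID 0: one entry survives
by_uuid = build_report(LABELS, "osd_uuid")  # UUIDs differ: both entries survive
print(len(by_id), len(by_uuid))
```

Keyed by `osd_id`, only the `/dev/sda2` entry remains, matching the first `raw list` output; keyed by `osd_uuid`, both devices are reported.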

4 years ago mgr/dashboard: Remove username and password from request body 40981/head
Nizamudeen A [Wed, 21 Apr 2021 08:10:39 +0000 (13:40 +0530)]
mgr/dashboard: Remove username and password from request body

Fixes: https://tracker.ceph.com/issues/50451
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 273a776cad8065f568f17a05804aabd14625a1f0)

4 years ago mgr/dashboard: filesystem pool size should use stored stat 40980/head
Avan Thakkar [Thu, 15 Apr 2021 13:28:52 +0000 (18:58 +0530)]
mgr/dashboard: filesystem pool size should use stored stat

Fixes: https://tracker.ceph.com/issues/50195
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Replace the 'bytes_used' stat with 'stored' so that CephFS pool stats
show the correct results.

(cherry picked from commit 7110fd4e0c257d20aa56591f05d74a2851a2fe00)

4 years ago mgr/dashboard:Simplify some complex calculations in test_alerts.yml 40929/head
Aashish Sharma [Thu, 25 Mar 2021 05:55:37 +0000 (11:25 +0530)]
mgr/dashboard:Simplify some complex calculations in test_alerts.yml

run-promtool-unittests is failing because of floating-point differences in some complex calculations. This PR simplifies those calculations to fix the issue.

Fixes: https://tracker.ceph.com/issues/49952
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 8d2f39e6c568afb6880689160212bcc93057e194)

4 years ago ceph.spec,install-deps: use golang-github-prometheus for promtools
Kefu Chai [Mon, 22 Mar 2021 06:07:54 +0000 (14:07 +0800)]
ceph.spec,install-deps: use golang-github-prometheus for promtools

Instead of installing docker in order to use promtool, install
golang-github-prometheus.

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit e33e3a931db97d01318643ec686fe63fdd614082)

4 years ago Merge pull request #40822 from rhcs-dashboard/wip-50303-pacific
Ernesto Puerta [Thu, 22 Apr 2021 09:01:44 +0000 (11:01 +0200)]
Merge pull request #40822 from rhcs-dashboard/wip-50303-pacific

pacific: mgr/dashboard: fix errors when creating NFS export.

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago mds/scrub: background scrub error fixes 40825/head
Milind Changire [Thu, 1 Apr 2021 09:24:10 +0000 (14:54 +0530)]
mds/scrub: background scrub error fixes

Accommodate new files and raw stats errors for dirty inodes and declare
scrub as passed.

Fixes: https://tracker.ceph.com/issues/48805
Signed-off-by: Milind Changire <mchangir@redhat.com>
(cherry picked from commit 4748053249d8b7b319ca3b36ec5a222ba6b9acef)

4 years ago mon: check mdsmap is resizeable before promoting standby-replay 40852/head
Patrick Donnelly [Wed, 7 Apr 2021 19:27:05 +0000 (12:27 -0700)]
mon: check mdsmap is resizeable before promoting standby-replay

If any MDS is up:creating, some rank data structures may not exist yet.

Fixes: https://tracker.ceph.com/issues/50215
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit e1748e7a8399f862accf35ba21a63a8e1ae8bd4f)

4 years ago qa: test standby-replay with fs:workloads 40853/head
Patrick Donnelly [Mon, 29 Mar 2021 22:08:28 +0000 (15:08 -0700)]
qa: test standby-replay with fs:workloads

Fixes: https://tracker.ceph.com/issues/50045
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 4702be4f47fa5470488a905faee669c49397caa2)

4 years ago Merge pull request #40957 from rhcs-dashboard/wip-50458-pacific
Ilya Dryomov [Wed, 21 Apr 2021 16:00:46 +0000 (18:00 +0200)]
Merge pull request #40957 from rhcs-dashboard/wip-50458-pacific

pacific: vstart.sh: disable "auth_allow_insecure_global_id_reclaim"

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
4 years ago mgr/dashboard: fix errors when creating NFS export. 40822/head
Alfonso Martínez [Fri, 9 Apr 2021 08:51:21 +0000 (10:51 +0200)]
mgr/dashboard: fix errors when creating NFS export.

- Fix daemon raw config parsing.
- Handle error when no rgw daemons are found.

Fixes: https://tracker.ceph.com/issues/49925
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 8bad7360ef23fa154622d0bee7823092b9440ca6)

4 years ago test: run promtool test without docker on ubuntu/focal
Kefu Chai [Fri, 19 Mar 2021 02:32:16 +0000 (10:32 +0800)]
test: run promtool test without docker on ubuntu/focal

Before this change, we ran promtool from a docker image. This is not
efficient, and quite a few developers do not want to use docker for
running "make check". The docker-based approach was introduced by #39246
because Ceph's CI process uses Ubuntu/Bionic for "make check" jobs, and
the prometheus packaged by Bionic does not offer the "test rules" command.
To address that problem, the "dnanexus/promtool:2.9.2" docker image was
used for verifying monitoring/prometheus/alerts/test_alerts.yml.

After this change, we use the prometheus packaged by Debian derivatives
instead of pulling a docker image.

* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert
  53a5816deda0874a3a37e131e9bc22d88bb2a588, as we don't need to
  pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
  command, bail out if it is not.

see also: https://tracker.ceph.com/issues/49653

Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit f381aa8bf0e175940153975fa1534ef0559ecadd)

4 years ago mgr/dashboard:test prometheus rules through promtool
Aashish Sharma [Wed, 3 Feb 2021 07:23:56 +0000 (12:53 +0530)]
mgr/dashboard:test prometheus rules through promtool

This PR adds unit tests for prometheus rules using promtool. To run the tests, run the 'run-promtool-unittests.sh' script.

Fixes: https://tracker.ceph.com/issues/45415
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
(cherry picked from commit 53a5816deda0874a3a37e131e9bc22d88bb2a588)

4 years ago vstart.sh: disable "auth_allow_insecure_global_id_reclaim" 40957/head
Kefu Chai [Thu, 15 Apr 2021 13:07:53 +0000 (21:07 +0800)]
vstart.sh: disable "auth_allow_insecure_global_id_reclaim"

Silence the health warning "mons are allowing insecure global_id
reclaim", which prevents the cluster from being active+clean. A couple of
tests expect a warning-free cluster before they start.

This option is enabled by default to appease old clients, but for
most upstream testing we can just disable it.

Fixes: https://tracker.ceph.com/issues/50374
Signed-off-by: Kefu Chai <kchai@redhat.com>
(cherry picked from commit 77a8376d0731c24e7bbf24523d3d7450e9f978af)

4 years ago Merge PR #40687 into pacific
Patrick Donnelly [Wed, 21 Apr 2021 00:01:13 +0000 (17:01 -0700)]
Merge PR #40687 into pacific

* refs/pull/40687/head:
doc/cephfs/nfs: add user id, fs name and key to FSAL block

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago cephfs: ceph-dokan - properly log the mounted root 40627/head
Lucian Petrut [Wed, 10 Mar 2021 07:20:37 +0000 (07:20 +0000)]
cephfs: ceph-dokan - properly log the mounted root

This is a simple change that updates the logged mounted directory.
We're incorrectly using "ceph_getcwd" instead of the actual root
path.

Fixes: https://tracker.ceph.com/issues/49662
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
(cherry picked from commit 7e8ea37c704422aade3c9b1ef34a0c8a4b17e1b3)

4 years ago cephfs: Update ceph-dokan "--removable" flag
Lucian Petrut [Wed, 10 Mar 2021 07:14:37 +0000 (07:14 +0000)]
cephfs: Update ceph-dokan "--removable" flag

"-m" is the short option for "--removable". We tried to keep the
syntax as close as possible to the old ceph-dokan project but this
flag is already used.

We'll drop the short option, keeping the long one ("--removable").

Fixes: https://tracker.ceph.com/issues/49662
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
(cherry picked from commit 5976e4f58373537374e47645647c84b568a165b8)

4 years ago cephfs: document using multiple fs on Windows
Lucian Petrut [Mon, 8 Mar 2021 15:25:30 +0000 (15:25 +0000)]
cephfs: document using multiple fs on Windows

This change updates the ceph-dokan documentation, showing how
a non-default Ceph filesystem can be mounted.

Fixes: https://tracker.ceph.com/issues/49662
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
(cherry picked from commit 5a1be89c5ac5fbde60332df8b008fcd5c6d0deb7)

4 years ago cephfs: provide additional volume details on Windows
Lucian Petrut [Mon, 8 Mar 2021 14:25:25 +0000 (14:25 +0000)]
cephfs: provide additional volume details on Windows

At the moment, the Windows volume name and serial number are hardcoded.

This change makes the volume name configurable, defaulting to "Ceph"
or "Ceph - <fs_name>". This makes it easier to identify Ceph mounts.

At the same time, we're going to retrieve the Ceph filesystem identifier
instead of using the hardcoded value.

Fixes: https://tracker.ceph.com/issues/49662
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
(cherry picked from commit 8a190816947b414366f5e1c6fac36ce0954f80d8)
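
The defaulting rule described above can be written down in a few lines. This is only an illustration in Python (ceph-dokan itself is not written in Python); `volume_name`, `configured` and `fs_name` are hypothetical names:

```python
from typing import Optional


def volume_name(configured: Optional[str] = None,
                fs_name: Optional[str] = None) -> str:
    """Pick the Windows volume name: explicit setting first, then the defaults."""
    if configured:
        return configured            # user-supplied name wins
    if fs_name:
        return f"Ceph - {fs_name}"   # non-default filesystem: include its name
    return "Ceph"                    # plain default


print(volume_name())
print(volume_name(fs_name="backup"))
print(volume_name(configured="Scratch", fs_name="backup"))
```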

4 years ago cephfs: add ceph-dokan unmap command
Lucian Petrut [Mon, 8 Mar 2021 10:44:37 +0000 (10:44 +0000)]
cephfs: add ceph-dokan unmap command

At the moment, Windows CephFS mounts can only be removed by
terminating the daemon (e.g. sending CTRL-C) or through the
Windows mount manager if the "-o -m" parameters were passed
when the mapping was created.

This change adds the "ceph-dokan unmap" command, which takes
the mountpoint as input.

Fixes: https://tracker.ceph.com/issues/49662
Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>
(cherry picked from commit cd9b098afd2d6af3c097c9f79a6a3f924de9e02b)

4 years ago Merge branch 'pacific-saved' into pacific
Ilya Dryomov [Tue, 20 Apr 2021 09:01:15 +0000 (11:01 +0200)]
Merge branch 'pacific-saved' into pacific

Conflicts:
qa/tasks/ceph.conf.template [ commit 94df76244798
  ("qa/tasks/ceph.conf: shorten cephx TTL for testing") was
  cherry-picked to 16.2.0 separately and so exists both in
  16.2.0 and pacific-saved ]
qa/tasks/cephadm.conf [ ditto ]

4 years ago 16.2.1 v16.2.1
Jenkins Build Slave User [Mon, 19 Apr 2021 13:50:07 +0000 (13:50 +0000)]
16.2.1

4 years ago auth/cephx: make KeyServer::build_session_auth_info() less confusing
Ilya Dryomov [Thu, 15 Apr 2021 13:18:58 +0000 (15:18 +0200)]
auth/cephx: make KeyServer::build_session_auth_info() less confusing

The second KeyServer::build_session_auth_info() overload is used only
by the monitor, for mon <-> mon authentication.  The monitor passes in
service_secret (mon secret) and secret_id (-1).  The TTL is irrelevant
because there is no rotation.

However, the signature doesn't make it obvious.  Clarify that
service_secret and secret_id are input parameters and info is the only
output parameter.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6f12cd3688b753633c8ff29fb3bd64758f960b2b)

4 years ago auth/cephx: cap ticket validity by expiration of "next" key
Ilya Dryomov [Thu, 15 Apr 2021 07:48:13 +0000 (09:48 +0200)]
auth/cephx: cap ticket validity by expiration of "next" key

If auth_mon_ticket_ttl is increased severalfold, as done in
commit 522a52e6c258 ("auth/cephx: rotate auth tickets less often"),
active clients eventually get stuck because the monitor sends out an
auth ticket with a bogus validity.  The ticket is secured with the
"current" secret that is scheduled to expire according to the old TTL,
but the validity of the ticket is set to the new TTL.  As a result,
the client simply doesn't attempt to renew, letting the secrets rotate
potentially more than once.  When that happens, the client first hits
auth authorizer errors as it tries to renew service tickets and when
it finally gets to renewing the auth ticket, it hits the insecure
global_id reclaim wall.

Cap TTL by expiration of "next" key -- the "current" key may be
milliseconds away from expiration and still be used, legitimately.
Do it in KeyServerData alongside key rotation code and propagate the
capped TTL to the upper layer.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 370c9b13970d47a55b1b20ef983c6f01236c9565)
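
The capping rule can be sketched as a one-line computation. This is a simplified model, not the KeyServerData code: `cap_ticket_ttl` and its parameters are hypothetical, times are plain seconds, and clamping at zero is an added assumption.

```python
def cap_ticket_ttl(now: float, ttl: float, next_key_expires: float) -> float:
    """Return the ticket validity, never outliving the "next" rotating key.

    now              -- current time in seconds (hypothetical clock)
    ttl              -- configured ticket TTL in seconds
    next_key_expires -- absolute expiry time of the "next" key
    """
    remaining = max(0.0, next_key_expires - now)  # assumption: clamp at zero
    return min(ttl, remaining)


# The TTL was raised to 72 hours, but the "next" key still expires in 1 hour
# according to the old rotation schedule: the ticket is capped at 1 hour.
print(cap_ticket_ttl(now=1000.0, ttl=72 * 3600.0, next_key_expires=1000.0 + 3600.0))
```

With the cap in place, the client sees a validity that matches the keys actually in rotation and renews before they rotate out.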

4 years ago auth/cephx: drop redundant KeyServerData::get_service_secret() overload
Ilya Dryomov [Thu, 15 Apr 2021 07:47:50 +0000 (09:47 +0200)]
auth/cephx: drop redundant KeyServerData::get_service_secret() overload

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3078af716505ae754723864786a41a6d6af0534c)

4 years ago mgr/dashboard: improve telemetry opt-in reminder notification message 40887/head
Waad Alkhoury [Tue, 30 Mar 2021 06:38:01 +0000 (08:38 +0200)]
mgr/dashboard: improve telemetry opt-in reminder notification message

Added an activation button and linked the word "telemetry" to the telemetry documentation.

Fixes: https://tracker.ceph.com/issues/49606
Signed-off-by: Waad Alkhoury <walkhour@redhat.com>
(cherry picked from commit 527d912b878087672ab537b59e3addf35108a77c)

4 years ago doc/cephadm: fix a typo 40746/head
Guillaume Abrioux [Wed, 7 Apr 2021 15:00:13 +0000 (17:00 +0200)]
doc/cephadm: fix a typo

This fixes a small typo in the cephadm/iscsi documentation

s/iSCSI Ganesha gateway/iSCSI gateway/

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 6602eb7e7cd127509c4967c84d92574ccd7296c4)

4 years ago mgr/cephadm: rewrite/simplify describe_service
Sage Weil [Fri, 9 Apr 2021 20:26:00 +0000 (16:26 -0400)]
mgr/cephadm: rewrite/simplify describe_service

The prior implementation first tried to fabricate services based on the
running daemons, and then filled in defined services on top.  This led
to duplication and a range of small errors.

Instead, flip this around: start with the services that are defined,
and only fill in 'unmanaged' services where we need to.

Drop the osd kludges and instead rely on DaemonDescription.service_id to
return the right thing.
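
The "flip" described above could be sketched roughly as follows. This is an illustration under assumed data shapes (service names as strings, daemons as (daemon_name, service_name) pairs), not the actual describe_service code.

```python
from typing import Dict, Iterable, Tuple


def describe_services(defined: Iterable[str],
                      daemons: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Start from defined services, then add unmanaged ones only as needed.

    `defined` holds the names of services that have a spec; `daemons` holds
    (daemon_name, service_name) pairs. Both shapes are hypothetical.
    """
    # Every defined service is reported, even with zero running daemons.
    services = {name: {"managed": True, "daemons": []} for name in defined}
    for daemon_name, service_name in daemons:
        # A daemon whose service has no spec gets an 'unmanaged' entry.
        entry = services.setdefault(service_name,
                                    {"managed": False, "daemons": []})
        entry["daemons"].append(daemon_name)
    return services
```

Because the defined services seed the result, a daemon can never cause a second, fabricated entry for a service that already has a spec.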

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 58d9e90425679fd715aa31d7c8f1044f4582388e)

4 years agomgr/orchestrator: report osds as osd.unmanaged as appropriate
Sage Weil [Fri, 9 Apr 2021 20:22:49 +0000 (16:22 -0400)]
mgr/orchestrator: report osds as osd.unmanaged as appropriate

If there is no osdspec_affinity or service_name (from unit.meta), then
report as 'osd.unmanaged'.
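
A rough sketch of this fallback, assuming a hypothetical helper that receives both values as optional strings. The precedence between service_name and osdspec_affinity, and the "osd.<affinity>" naming, are assumptions made for illustration.

```python
from typing import Optional


def osd_service_name(osdspec_affinity: Optional[str],
                     service_name: Optional[str]) -> str:
    """Pick the service name to report for an OSD daemon (illustrative only)."""
    if service_name:
        # Assumed: a service_name recorded in unit.meta wins if present.
        return service_name
    if osdspec_affinity:
        # Assumed naming: the OSD belongs to the spec it has affinity with.
        return f"osd.{osdspec_affinity}"
    # Neither is known, so the OSD is reported as unmanaged.
    return "osd.unmanaged"
```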

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 5adef5f7663e25dc946a9a44d5a1ac33e8452ccf)

4 years agomgr/orchestrator: remove IMAGE ID from 'orch ls'
Sage Weil [Fri, 9 Apr 2021 19:35:17 +0000 (15:35 -0400)]
mgr/orchestrator: remove IMAGE ID from 'orch ls'

This is not very useful at this level:
 - we see it from 'orch ps'
 - it can be a mix of ids during upgrade
 - some services may have multiple images at steady state (e.g., ingress)

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 2b63ae25c9af576b8cbab80b17af013b2868f7a2)

4 years agoMerge pull request #40826 from idryomov/wip-no-cephxv2-for-unmap-pacific
Ilya Dryomov [Tue, 13 Apr 2021 16:41:54 +0000 (18:41 +0200)]
Merge pull request #40826 from idryomov/wip-no-cephxv2-for-unmap-pacific

pacific: qa/suites/krbd: don't require CEPHX_V2 for unmap subsuite

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
4 years agoMerge pull request #40665 from idryomov/wip-require-ceph-common-for-ioc-pacific
Ilya Dryomov [Tue, 13 Apr 2021 09:44:48 +0000 (11:44 +0200)]
Merge pull request #40665 from idryomov/wip-require-ceph-common-for-ioc-pacific

pacific: packaging: require ceph-common for immutable object cache daemon

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
4 years agoqa/suites/krbd: don't require CEPHX_V2 for unmap subsuite 40826/head
Ilya Dryomov [Sat, 3 Apr 2021 09:13:56 +0000 (11:13 +0200)]
qa/suites/krbd: don't require CEPHX_V2 for unmap subsuite

Starting with pacific, CEPHX_V2 is required by default but
pre-single-major.yaml kernel doesn't support it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 4027eb864efeb8b85f3d459048aabdffb894b150)

4 years agoqa/standalone: default to disable insecure global id reclaim
Sage Weil [Sun, 28 Mar 2021 22:07:57 +0000 (18:07 -0400)]
qa/standalone: default to disable insecure global id reclaim

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 72c4fc75ad301980baebc7789ed6391444057e5b)

4 years agoqa/suites/upgrade/octopus-x: disable insecure global_id reclaim health warnings
Sage Weil [Thu, 25 Mar 2021 17:36:56 +0000 (13:36 -0400)]
qa/suites/upgrade/octopus-x: disable insecure global_id reclaim health warnings

These will trigger on upgrade; suppress them so that our health gates
will still work.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 3e80f61efeafc186ea8130984d64c05b2707d6ba)

Conflicts:
qa/suites/upgrade/octopus-x/rgw-multisite/overrides.yaml [
  commit b6773dd3f197 ("qa/rgw: add octopus-x upgrade suite for
  multisite") not in pacific ]

4 years agoqa/tasks/ceph[adm].conf[.template]: disable insecure global_id reclaim health alerts
Sage Weil [Fri, 26 Mar 2021 22:08:46 +0000 (18:08 -0400)]
qa/tasks/ceph[adm].conf[.template]: disable insecure global_id reclaim health alerts

Turn these off everywhere for our tests so they don't interfere with our health checks.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 9f6fd4fe563c9cd4cf65316921d511b677c972e4)

4 years agocephadm: set auth_allow_insecure_global_id_reclaim for mon on bootstrap
Sage Weil [Fri, 26 Mar 2021 16:02:50 +0000 (12:02 -0400)]
cephadm: set auth_allow_insecure_global_id_reclaim for mon on bootstrap

If this is a fresh pacific cluster, let's assume that there won't be
legacy clients connecting.  (And if there are, let's put the burden on
the user to enable them to do so insecurely.)

This is in contrast to upgrades, where our focus is on not breaking
anything.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 7ca74183226b1125b29f4ea8f324ae9e38b46795)

4 years agomon/HealthMonitor: raise AUTH_INSECURE_GLOBAL_ID_RENEWAL[_ALLOWED]
Sage Weil [Thu, 25 Mar 2021 22:07:53 +0000 (18:07 -0400)]
mon/HealthMonitor: raise AUTH_INSECURE_GLOBAL_ID_RENEWAL[_ALLOWED]

Two new alerts:

- AUTH_INSECURE_GLOBAL_ID_RENEWAL_ALLOWED if we are allowing clients to reclaim
global_ids in an insecure manner (for backwards compatibility until
clients are upgraded)

- AUTH_INSECURE_GLOBAL_ID_RENEWAL if there are currently clients connected that
do not know how to securely renew their global_id, as exposed by
auth_expose_insecure_global_id_reclaim=true.  The client auth names and IPs
are listed in the alert details (up to a limit, at least).

The docs recommend that operators mute these alerts rather than disable
them, but we still include options that allow the alerts to be disabled
entirely.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 18b343b06e5dd904af425dc99e2c848e12f3b552)

4 years agoauth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys
Ilya Dryomov [Tue, 2 Mar 2021 14:09:26 +0000 (15:09 +0100)]
auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys

When handling CEPHX_GET_AUTH_SESSION_KEY requests from nautilus+
clients, ignore CEPH_ENTITY_TYPE_AUTH in CephXAuthenticate::other_keys.
Similarly, when handling CEPHX_GET_PRINCIPAL_SESSION_KEY requests,
ignore CEPH_ENTITY_TYPE_AUTH in CephXServiceTicketRequest::keys.
These fields are intended for requesting service tickets; the auth
ticket (which is really a ticket granting ticket) must not be shared
this way.

Otherwise we end up sharing an auth ticket that a) isn't encrypted
with the old session key even if needed (should_enc_ticket == true)
and b) has the wrong validity, namely auth_service_ticket_ttl instead
of auth_mon_ticket_ttl.  In the CEPHX_GET_AUTH_SESSION_KEY case, this
undue ticket immediately supersedes the actual auth ticket already
encoded in the same reply (the reply frame ends up containing two auth
tickets).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 05772ab6127bdd9ed2f63fceef840f197ecd9ea8)

4 years agoauth/cephx: rotate auth tickets less often
Ilya Dryomov [Mon, 22 Mar 2021 18:16:32 +0000 (19:16 +0100)]
auth/cephx: rotate auth tickets less often

If unauthorized global_id (re)use is disallowed, a client that has
been disconnected from the network long enough for keys to rotate
and its auth ticket to expire (i.e. become invalid/unverifiable)
would not be able to reconnect.

The default TTL is 12 hours, resulting in a 12-24 hour reconnect
window (the previous key is kept around, so the actual window can be
up to double the TTL).  The setting has stayed the same since 2009,
but it also hasn't been enforced.  Bump it to get a 72 hour reconnect
window to cover for something breaking on Friday and not getting fixed
until Monday.
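
The window arithmetic above can be written out as a small sketch. The function name is hypothetical; it only encodes the statement that the previous key is kept around, so the window lies between one and two TTLs.

```python
from typing import Tuple


def reconnect_window_hours(ttl_hours: float) -> Tuple[float, float]:
    """Return the (shortest, longest) reconnect window for a given ticket TTL.

    The ticket stays verifiable for at least one TTL and, because the
    previous key is kept, for at most two.
    """
    return (ttl_hours, 2 * ttl_hours)
```

For the old 12 hour default this gives a window of 12 to 24 hours, matching the figures in the message.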

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 522a52e6c258932274f0753feb623ce008519216)

4 years agomon: fail fast when unauthorized global_id (re)use is disallowed
Ilya Dryomov [Thu, 25 Mar 2021 19:59:13 +0000 (20:59 +0100)]
mon: fail fast when unauthorized global_id (re)use is disallowed

When unauthorized global_id (re)use is disallowed, we don't want to
let unpatched clients in because they wouldn't be able to reestablish
their monitor session later, resulting in subtle hangs and disrupted
user workloads.

Denying the initial connect for all legacy (CephXAuthenticate < v3)
clients is not feasible because a large subset of them never stopped
presenting their ticket on reconnects and are therefore compatible with
enforcing mode: most notably all kernel clients but also pre-luminous
userspace clients.  They don't need to be patched and excluding them
would significantly hamper the adoption of enforcing mode.

Instead, force clients that we are not sure about to reconnect shortly
after they go through authentication and obtain global_id.  This is
done in Monitor::dispatch_op() to capture both msgr1 and msgr2, most
likely instead of dispatching mon_subscribe.

We need to let mon_getmap through for "ceph ping" and "ceph tell" to
work.  This does mean that we share the monmap, which lets the client
return from MonClient::authenticate() considering authentication to be
finished and causing the potential reconnect error to not propagate to
the user -- the client would hang waiting for remaining cluster maps.
For msgr1, this is unavoidable because the monmap is sent immediately
after the final MAuthReply.  But for msgr2 this is rare: most of the
time we get to their mon_subscribe and cut the connection before they
process the monmap!

Regardless, the user doesn't get a chance to start a workload since
there is no proper higher-level session at that point.

To help with identifying clients that need patching, add global_id and
global_id_status to "sessions" output.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 08766a17edebb7450cd9b17cc2dc01efc068bb94)

4 years agoauth/cephx: option to disallow unauthorized global_id (re)use
Ilya Dryomov [Sat, 13 Mar 2021 13:53:52 +0000 (14:53 +0100)]
auth/cephx: option to disallow unauthorized global_id (re)use

global_id is a cluster-wide unique id that must remain stable for the
lifetime of the client instance.  The cephx protocol has a facility to
allow clients to preserve their global_id across reconnects:

(1) the client should provide its global_id in the initial handshake
    message/frame and later include its auth ticket proving previous
    possession of that global_id in CEPHX_GET_AUTH_SESSION_KEY request

(2) the monitor should verify that the included auth ticket is valid
    and has the same global_id and, if so, allow the reclaim

(3) if the reclaim is allowed, the new auth ticket should be
    encrypted with the session key of the included auth ticket to
    ensure authenticity of the client performing reclaim.  (The
    included auth ticket could have been snooped when the monitor
    originally shared it with the client or any time the client
    provided it back to the monitor as part of requesting service
    tickets, but only the genuine client would have its session key
    and be able to decrypt.)

Unfortunately, all (1), (2) and (3) have been broken for a while:

- (1) was broken in 2016 by commit a2eb6ae3fb57 ("mon/monclient:
  hunt for multiple monitor in parallel") and is addressed in patch
  "mon/MonClient: preserve auth state on reconnects"

- it turns out that (2) has never been enforced.  When cephx was
  being designed and implemented in 2009, two changes to the protocol
  raced with each other pulling it in different directions: commits
  0669ca21f4f7 ("auth: reuse global_id when requesting tickets")
  and fec31964a12b ("auth: when renewing session, encrypt ticket")
  added the reclaim mechanism based strictly on auth tickets, while
  commit 5eeb711b6b2b ("auth: change server side negotiation a bit")
  allowed the client to provide global_id in the initial handshake.
  These changes didn't get reconciled and as a result a malicious
  client can assign itself any global_id of its choosing by simply
  passing something other than 0 in MAuth message or AUTH_REQUEST
  frame and not even bother supplying any ticket.  This includes
  getting a global_id that is being used by another client.

- (3) was broken in 2019 with addition of support for msgr2, where
  the new auth ticket ends up being shared unencrypted.  However the
  root cause is deeper and a malicious client can coerce msgr1 into
  the same.  This also goes back to 2009 and is addressed in patch
  "auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys".

Because (2) has never been enforced, no one noticed when (1) got
broken and we began to rely on this flaw for normal operation in
the face of reconnects due to network hiccups or otherwise.  As of
today, only pre-luminous userspace clients and kernel clients are
not exercising it on a daily basis.

Bump CephXAuthenticate version and use a dummy v3 to distinguish
between legacy clients that don't (may not) include their auth ticket
and new clients.  For new clients, unconditionally disallow claiming
global_id without a corresponding auth ticket.  For legacy clients,
introduce a choice between permissive (current behavior, default for
the foreseeable future) and enforcing mode.

If the reclaim is disallowed, return EACCES.  While MonClient does
have some provision for global_id changes and we could conceivably
implement enforcement by handing out a fresh global_id instead of
the provided one, those code paths have never been tested and there
are too many ways a sudden global_id change could go wrong.
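
As an illustration of the decision described above, here is a simplified Python sketch. It is not the monitor's code: the function and parameter names are hypothetical, ticket validation is reduced to a single optional value, and treating a claimed global_id of 0 as "nothing to reclaim" is inferred from the description of the MAuth/AUTH_REQUEST handshake.

```python
import errno
from typing import Optional


def check_global_id_reclaim(claimed_global_id: int,
                            ticket_global_id: Optional[int],
                            client_is_legacy: bool,
                            allow_insecure_reclaim: bool) -> int:
    """Return 0 to allow the connection or -EACCES to deny the reclaim.

    `ticket_global_id` is the global_id from a valid old auth ticket, or
    None when no valid ticket was presented.
    """
    if claimed_global_id == 0:
        # Nothing is being reclaimed; a fresh global_id would be assigned.
        return 0
    if ticket_global_id == claimed_global_id:
        # The ticket proves previous possession of this global_id.
        return 0
    if client_is_legacy and allow_insecure_reclaim:
        # Permissive mode: legacy clients may still reclaim without proof.
        return 0
    # New clients always, and legacy clients in enforcing mode, are denied.
    return -errno.EACCES
```

Note that the sketch denies the reclaim outright instead of substituting a fresh global_id, for the reason given in the paragraph above.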

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit abebd643cc60fa8a7cb82dc29a9d5041fb3c3d36)

4 years agoauth/cephx: make cephx_decode_ticket() take a const ticket_blob
Ilya Dryomov [Tue, 30 Mar 2021 09:10:17 +0000 (11:10 +0200)]
auth/cephx: make cephx_decode_ticket() take a const ticket_blob

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6b860684c6e59b11c727206819805f89f0518575)

4 years agoauth/AuthServiceHandler: keep track of global_id and whether it is new
Ilya Dryomov [Tue, 9 Mar 2021 15:33:55 +0000 (16:33 +0100)]
auth/AuthServiceHandler: keep track of global_id and whether it is new

AuthServiceHandler already has global_id field, but it is unused.
Revive it and let the handler know whether global_id is newly assigned
by the monitor or provided by the client.

Lift the setting of entity_name into AuthServiceHandler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b50b6abd60e730176a7ef602bdd25d789a3c467d)

4 years agoauth/AuthServiceHandler: build_cephx_response_header() is cephx-specific
Ilya Dryomov [Tue, 9 Mar 2021 13:36:39 +0000 (14:36 +0100)]
auth/AuthServiceHandler: build_cephx_response_header() is cephx-specific

Make the one in CephxServiceHandler private and drop the stub in
AuthNoneServiceHandler.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 49cba02a750d4c1ab68399401f0c04f9c9be5b9e)

4 years agoauth/AuthServiceHandler: drop unused start_session() args
Ilya Dryomov [Tue, 9 Mar 2021 13:25:39 +0000 (14:25 +0100)]
auth/AuthServiceHandler: drop unused start_session() args

session_key, connection_secret and connection_secret_required_length
aren't material for start_session() across all three implementations.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c151c9659bdb71f30b520bbd62f91cc009ec51cd)

4 years agomon/MonClient: drop global_id arg from _add_conn() and _add_conns()
Ilya Dryomov [Tue, 30 Mar 2021 13:19:41 +0000 (15:19 +0200)]
mon/MonClient: drop global_id arg from _add_conn() and _add_conns()

Passing anything but MonClient instance's global_id doesn't make
sense.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit a71f6e90d43cca5a79db92ca6a640598796ae7ee)

4 years agomon/MonClient: reset auth state in shutdown()
Ilya Dryomov [Thu, 1 Apr 2021 08:55:36 +0000 (10:55 +0200)]
mon/MonClient: reset auth state in shutdown()

Destroying AuthClientHandler and not resetting global_id is another
way to get MonClient to send CEPHX_GET_AUTH_SESSION_KEY requests with
CephXAuthenticate::old_ticket not populated.  This is particularly
pertinent to get_monmap_and_config() which shuts down the bootstrap
MonClient between retry attempts.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c9b022e07392979e7f9ea6c11484a7dd872cc235)

4 years agomon/MonClient: preserve auth state on reconnects
Ilya Dryomov [Mon, 8 Mar 2021 14:37:02 +0000 (15:37 +0100)]
mon/MonClient: preserve auth state on reconnects

Commit a2eb6ae3fb57 ("mon/monclient: hunt for multiple monitor in
parallel") introduced a regression where auth state (global_id and
AuthClientHandler) was no longer preserved on reconnects.  The ensuing
breakage was quickly noticed and prompted a follow-on fix 8bb6193c8f53
("mon/MonClient: persist global_id across re-connecting").

However, as evident from the subject, the follow-on fix only took
care of the global_id part.  AuthClientHandler is still destroyed
and all cephx tickets are discarded.  A new from-scratch instance
is created for each MonConnection and CEPHX_GET_AUTH_SESSION_KEY
requests end up with CephXAuthenticate::old_ticket not populated.
The bug is in MonClient, so both msgr1 and msgr2 are affected.

This should have resulted in a similar sort of breakage but didn't
because of a much larger bug.  The monitor should have denied the
attempt to reclaim global_id with no valid ticket proving previous
possession of that global_id presented.  Alas, it appears that this
aspect of the cephx protocol has never been enforced.  This is dealt
with in the next patch.

To fix the issue at hand, clone AuthClientHandler into each
MonConnection so that each respective CEPHX_GET_AUTH_SESSION_KEY
request gets a copy of the current auth ticket.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 236b536b28482ec9d8b872de03da7d702ce4787b)

4 years agomon/MonClient: claim active_con's auth explicitly
Ilya Dryomov [Sat, 6 Mar 2021 10:15:40 +0000 (11:15 +0100)]
mon/MonClient: claim active_con's auth explicitly

Eliminate confusion by moving auth from active_con into MonClient
instead of swapping them.

The existing MonClient::auth can be destroyed right away -- I don't
see why active_con would need it or a reason to delay its destruction
(which is what stashing in active_con effectively does).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit eec24e4d119c57c7eb5119dc0083616a61b33b89)

4 years agomon/MonClient: resurrect "waiting for monmap|config" timeouts
Ilya Dryomov [Thu, 1 Apr 2021 08:07:00 +0000 (10:07 +0200)]
mon/MonClient: resurrect "waiting for monmap|config" timeouts

This fixes a regression introduced in commit 85157d5aae3d ("mon:
s/Mutex/ceph::mutex/").  Waiting for monmap and config indefinitely
is not just bad UX, it actually masks other more serious bugs.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 6faa18e0a8e8efba6bd2978942eb9909b6568d5c)

4 years agoqa/tasks/ceph.conf: shorten cephx TTL for testing
Sage Weil [Mon, 5 Apr 2021 18:08:30 +0000 (13:08 -0500)]
qa/tasks/ceph.conf: shorten cephx TTL for testing

Rotate tickets frequently to exercise those code paths during testing.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 94df76244798cdc0bafd74c9e5197adb5aa990c0)

4 years agomgr/dashboard: debug nodeenv hangs 40815/head
Ernesto Puerta [Tue, 6 Apr 2021 11:45:15 +0000 (13:45 +0200)]
mgr/dashboard: debug nodeenv hangs

Increase verbosity in nodeenv command for debugging purposes.

Fixes: https://tracker.ceph.com/issues/50044
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
(cherry picked from commit 2c2a397f84455147e1cc5c7b5fc1289e47bbe5ee)

4 years agoMerge pull request #40805 from tchaikov/pacific-pr-40738
Kefu Chai [Mon, 12 Apr 2021 13:42:07 +0000 (21:42 +0800)]
Merge pull request #40805 from tchaikov/pacific-pr-40738

pacific: include/librados: fix doxygen syntax for docs build

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge pull request #40648 from rhcs-dashboard/wip-50203-pacific
Ernesto Puerta [Mon, 12 Apr 2021 09:36:37 +0000 (11:36 +0200)]
Merge pull request #40648 from rhcs-dashboard/wip-50203-pacific

pacific: mgr/dashboard: Revoke read-only user's access to Manager modules

Reviewed-by: Waadkh7 <walkhour@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
4 years agoinclude/librados: fix doxygen syntax for docs build 40805/head
Josh Durgin [Fri, 9 Apr 2021 22:01:32 +0000 (18:01 -0400)]
include/librados: fix doxygen syntax for docs build

The docs build is now warning about these like:

WARNING: Unparseable C cross-reference: '[in]'
Invalid C declaration: Expected identifier in nested name.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
(cherry picked from commit 70b8f16a2c9fe2375457bf66bf46e4296a86e31d)

4 years agoosd/PeeringState: fix acting_set_writeable min_size check 40759/head
Samuel Just [Fri, 2 Apr 2021 22:30:54 +0000 (22:30 +0000)]
osd/PeeringState: fix acting_set_writeable min_size check

acting.size() >= pool.info.min_size is meant to check min_size against
acting set participants, but acting is a vector with placeholders.
actingset is the representation with placeholders removed.
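
The difference between the two checks can be shown with a small Python sketch. The placeholder sentinel and function names are hypothetical stand-ins; the real code is C++ in PeeringState.

```python
from typing import List, Optional

# Hypothetical stand-in for the placeholder entry in an EC acting vector.
PLACEHOLDER = None


def buggy_writeable(acting: List[Optional[int]], min_size: int) -> bool:
    # Counts placeholders too, so missing shards still "satisfy" min_size.
    return len(acting) >= min_size


def fixed_writeable(acting: List[Optional[int]], min_size: int) -> bool:
    # Equivalent of actingset: the acting vector with placeholders removed.
    actingset = {osd for osd in acting if osd is not PLACEHOLDER}
    return len(actingset) >= min_size
```

With acting = [3, PLACEHOLDER, 7, PLACEHOLDER] and min_size = 3, the first check passes while the second correctly fails.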

The upshot of this bug is that the activation process will basically
ignore min_size for an ec pool allowing writes in cases where it
shouldn't.  PastIntervals::check_new_interval, however, performs
the check correctly, and will therefore discount intervals in which
we really did serve writes as not writeable.  This can trigger many
different problem conditions including but not limited to:
  - Unfound objects due to accepting a last_update with insufficient
    osds
  - Lost writes
  - Crashes due to peering rules being violated

This bug was originally introduced with recovery below min_size in
e5a96fd, and then preserved through refactors in 749a13d and 95bec9.

7cb818a exposed it with the expansion of recovery below min_size
to include ec pools (acting.size() is sufficient for replicated
pools).

Fixes: https://tracker.ceph.com/issues/48613
Fixes: https://tracker.ceph.com/issues/48417
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 642a1c165499bcbd4cfdf907af313ac7ffe44ff4)

4 years agoosd/PeeringState: fix get_backfill_priority min_size comparison
Samuel Just [Fri, 2 Apr 2021 23:06:14 +0000 (23:06 +0000)]
osd/PeeringState: fix get_backfill_priority min_size comparison

acting has placeholders for ec, need to use actingset.

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 7b2e0f4fd1c9071495dae9189428aa1cb8774c30)

4 years agocephadm: normalize unqualified repo digests to docker.io
Sage Weil [Sat, 3 Apr 2021 13:14:00 +0000 (09:14 -0400)]
cephadm: normalize unqualified repo digests to docker.io

A RepoDigests returned by docker|podman image inspect can either include
the docker.io/ prefix or not.  For reasons that aren't entirely clear,
this may vary between hosts in a cluster.  However, ceph/ceph@sha256:abc...
is the same thing as docker.io/ceph/ceph@sha256:abc..., and should be
treated as such.  Otherwise, upgrade can get into a loop where it pulls
the image on a new host, finds the other variant of the repodigests,
sees no overlap, updates target_digests, and restarts.  (It will then
find the first variant again on the first host and loop.)

Avoid this by normalizing any docker.io digests by always including the
docker.io/ prefix.
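
A minimal sketch of such a normalization, assuming a simple heuristic for "qualified": the first path component looks like a registry host (it contains a dot or a port, or is localhost). The function name and the heuristic are illustrative assumptions and may differ from what cephadm actually does.

```python
def normalize_image_digest(digest: str, default_registry: str = "docker.io") -> str:
    """Prefix an unqualified repo digest with the default registry."""
    head, sep, _rest = digest.partition("/")
    # Assumed heuristic: a registry host contains '.' or ':' or is 'localhost'.
    qualified = bool(sep) and ("." in head or ":" in head or head == "localhost")
    return digest if qualified else f"{default_registry}/{digest}"
```

Both variants of the same digest then compare equal, which is what breaks the upgrade loop described above.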

Note that it is technically possible that this assumption is wrong: it
may be that the image that already exists on the local host is from a
different registry in registries.conf's unqualified-search-registries.
However, we don't know which, since this is a search list.  In practice,
it should be exceedingly rare that an image that *we* are installing using
a fully-qualified image name will end up having an unqualified repodigest
in the local registry.  Hopefully!

Fixes: https://tracker.ceph.com/issues/50114
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit e07a73830436340e77180782216524f071a5a292)

4 years agomgr/cephadm/upgrade: normalize unqualified target image
Sage Weil [Tue, 6 Apr 2021 13:36:31 +0000 (09:36 -0400)]
mgr/cephadm/upgrade: normalize unqualified target image

If we get an unqualified target image, assume it's docker.io.  This
ensures that we're passing a fully-qualified target to docker|podman on
the various hosts and don't end up with something different based on the
per-host search path for unqualified image names.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 38f84520eefaba14d4f4ae5572ea948819a3fd7d)

4 years agocephadm:persist the grafana.db file
Paul Cuzner [Thu, 1 Apr 2021 04:38:18 +0000 (17:38 +1300)]
cephadm:persist the grafana.db file

This patch persists the grafana.db file, so the user can
create their own dashboards referencing ceph and node
metrics and not lose them on grafana restart! This also
ensures changes to users/passwords within grafana are
persisted.

Fixes: https://tracker.ceph.com/issues/49954
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 581250d943915cb3f8e7e4071440ee5cad5e5908)

4 years agoqa/tasks/cephadm: add apply() method/task
Sage Weil [Thu, 1 Apr 2021 20:37:13 +0000 (15:37 -0500)]
qa/tasks/cephadm: add apply() method/task

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 05b3ce258547d64a094b786a969d2b79c48a466e)

4 years agocephadm: pass '-i' to docker|podman run for shell|enter
Sage Weil [Mon, 5 Apr 2021 14:48:07 +0000 (09:48 -0500)]
cephadm: pass '-i' to docker|podman run for shell|enter

This allows us to pipe things to stdin.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 8091bfda762f76cacc9254d70f50eba8f80d4046)

4 years agoMerge PR #40663 into pacific
Sage Weil [Fri, 9 Apr 2021 23:04:33 +0000 (18:04 -0500)]
Merge PR #40663 into pacific

* refs/pull/40663/head:
qa/tasks/ceph.conf: shorten cephx TTL for testing

Reviewed-by: Sage Weil <sage@redhat.com>
4 years agoMerge pull request #40645 from batrick/i50015
Yuri Weinstein [Fri, 9 Apr 2021 15:41:42 +0000 (08:41 -0700)]
Merge pull request #40645 from batrick/i50015

pacific: qa: "AttributeError: 'NoneType' object has no attribute 'mon_manager'"

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agorbd-mirror: fix UB while registering perf counters 40680/head
Arthur Outhenin-Chalandre [Wed, 24 Mar 2021 09:05:07 +0000 (10:05 +0100)]
rbd-mirror: fix UB while registering perf counters

register_perf_counters was called before m_image_spec initialization
resulting in UB in the perf counters' name.

This moves the register_perf_counters() call to the init function
after the m_image_spec initialization.

Fixes: https://tracker.ceph.com/issues/49959
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 5e3b9d29b3a81923fed51248aa21749dbecfcd73)