Greg Farnum [Fri, 12 Nov 2021 23:05:02 +0000 (23:05 +0000)]
mon: MonMap: display disallowed_leaders whenever they're set
In c59a6f89465e3933631afa2ba92e8c1ae1c31c06, I erroneously changed
the CLI display output so it would only dump disallowed_leaders in
stretch mode. But they can also be set in the connectivity and disallow
election modes, and we want users to be able to see them in those modes as well.
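For reference, disallowed leaders can be set outside stretch mode through the election commands; a minimal console sketch (the mon name `a` is hypothetical):
```
ceph mon set election_strategy disallow
ceph mon add disallowed_leader a
ceph mon dump    # disallowed_leaders should now appear here regardless of stretch mode
```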
Sage Weil [Thu, 11 Nov 2021 15:31:22 +0000 (10:31 -0500)]
Merge PR #43046 into master
* refs/pull/43046/head:
mgr/rook: get running pods, auth rm, better error checking for orch nfs
qa/tasks/rook: add apply nfs to rook qa task
mgr/rook: prevent creation of NFS clusters not in .nfs rados pool
mgr/rook, mgr/nfs: update rook orchestrator to create and use .nfs pool
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Reviewed-by: Varsha Rao <rvarsha016@gmail.com>
Roland Sommer [Fri, 8 Oct 2021 06:40:26 +0000 (08:40 +0200)]
mgr/prometheus: Make standby discoverable
Enable config settings to modify the standby's behaviour on the index page.
This makes the standby discoverable by reverse proxy or load balancer
setups. Testing the '/metrics' endpoint for an empty response would
instead trigger metric collection on the active manager instance.
The newly added configuration options standby_behaviour and
standby_error_status_code are documented and flagged as runtime, as
modifying either setting has an immediate effect (no restart required).
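A minimal sketch of switching a standby to return an error status instead of the empty default page (assuming the options live under the module's usual `mgr/prometheus/` config prefix):
```
ceph config set mgr mgr/prometheus/standby_behaviour error
ceph config set mgr mgr/prometheus/standby_error_status_code 503
```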
Co-authored-by: Ernesto Puerta <37327689+epuertat@users.noreply.github.com>
Signed-off-by: Roland Sommer <rol@ndsommer.de>
Fixes: https://tracker.ceph.com/issues/53229
Patrick Donnelly [Wed, 10 Nov 2021 18:58:48 +0000 (13:58 -0500)]
Merge PR #42520 into master
* refs/pull/42520/head:
test: add cephfs-mirror HA active/active workunit and test yamls
test: add cephfs_mirror thrasher
tasks/cephfs_mirror: optionally run in foreground
mgr/mirroring: throttle directory reassigment to mirror daemons
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Joseph Sawaya [Tue, 14 Sep 2021 18:54:41 +0000 (14:54 -0400)]
mgr/rook: get running pods, auth rm, better error checking for orch nfs
This commit updates `orch ls` to show the age and the number of running
NFS pods, removes auth entities when removing an NFS service, and
implements better error checking when creating NFS daemons.
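Illustrative invocation; the new fields appear when listing the NFS services:
```
ceph orch ls nfs    # RUNNING and AGE now reflect the actual nfs pods
```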
Joseph Sawaya [Fri, 30 Jul 2021 16:07:31 +0000 (12:07 -0400)]
mgr/rook, mgr/nfs: update rook orchestrator to create and use .nfs pool
This commit moves the functionality for creating the .nfs pool from the
nfs module to the rook module and makes the rook module use the .nfs
pool when creating an NFS daemon.
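A hedged sketch of the resulting flow (the cluster name `mynfs` is hypothetical):
```
ceph nfs cluster create mynfs
ceph osd pool ls | grep '^\.nfs$'    # daemons created via rook are backed by the shared .nfs pool
```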
mds/FSMap: assign v16.2.4 compat to pre-v16.2.5 standby daemons
With v16.2.5, the monitors store an MDS's CompatSet with its mds_info_t
in the MDSMap. If an older MDS fails and rejoins the cluster, it gets
assigned the empty CompatSet. This is problematic during upgrades as an
MDS failure may prevent the upgrade process from continuing and cause
file system unavailability.
This patch makes it so the mons will assign a reasonable default: the
CompatSet used from v14.2.0 through v16.2.5.
Fixes: https://tracker.ceph.com/issues/53150
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Fri, 5 Nov 2021 15:39:07 +0000 (11:39 -0400)]
mgr/cephadm: allow osd spec removal
OSD specs/drivegroups are essentially templates for OSD creation but do
not map to the full lifecycle of the OSDs that they create. When a spec
removal is requested, remove it immediately.
If --force is not provided, the error lists which OSDs would be left behind.
If --force is passed, the service is removed.
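A hedged console sketch (the spec name `my_drivegroup` is hypothetical):
```
ceph orch rm osd.my_drivegroup            # errors out, listing the OSDs that would be left behind
ceph orch rm osd.my_drivegroup --force    # removes the spec; the OSDs themselves keep running
```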
This leaves behind a few oddities:
- When you list services, OSDs that were created by the drivegroup may
still exist, causing the drivegroup to appear in the list as
unmanaged services.
- If you create a new drivegroup with the same name, the prior OSDs will
appear to belong to the new spec instance, regardless of whether the
spec/drivegroup parameters are the same.
Kamoltat [Fri, 29 Oct 2021 21:23:52 +0000 (21:23 +0000)]
pybind/mgr/pg_autoscaler: typo default option scale-up to scale-down
Typo: `scale-up` should be `scale-down` in Module
Option.
This typo doesn't trigger a bug because we create
a key-value of `scale-down` profile in
the function `create_initial()` in `src/mon/KVMonitor.cc`.
This will override whatever is the default option
in pg_autoscaler/module.py when we start the cluster and
the monitor gets created.
The command `ceph osd pool set autoscale-profile <option>` is still the
primary way to change the autoscale profile after the pool is created.
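For example, switching the cluster to the scale-down profile:
```
ceph osd pool set autoscale-profile scale-down
```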
crimson/net: drop crimson-specific check for the addr in ClientIdentFrame.
In crimson (but not in the classic OSD) we have an extra check that
verifies the address sent by our peer in `ClientIdentFrame` matches
`AsyncConnection::target_addr` at our side. In the Rook environment
this leads to problems with all cluster entities that are lacking
the `ms_learn_addr_from_peer=false` setting in their configurations.
This is true for `ceph-mgr`:
```
[root@rook-ceph-tools-698545dc56-zxrrx /]# ceph config show mgr.a ms_learn_addr_from_peer
true
```
Unfortunately, testing has shown that:
* clients in Rook also lack this extra bit of configuration, while
* removing the extra check in crimson removes the need for any
additional configuration on the client side.
Although this still might look like a workaround for Rook setting
`ms_learn_addr_from_peer=false` solely for OSDs, I think we should
drop the check to preserve both:
* consistency of behaviour between OSD implementations,
* compatibility with Ceph clients in existing k8s clusters.
```
INFO 2021-10-26 18:53:26,067 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> unknown.? -@59700] ProtocolV2::start_accept(): target_addr=172.17.0.5:59700/0
DEBUG 2021-10-26 18:53:26,067 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> unknown.? -@59700] TRIGGER ACCEPTING, was NONE
DEBUG 2021-10-26 18:53:26,067 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> unknown.? -@59700] SEND(26) banner: len_payload=16, supported=1, required=0, banner="ceph v2
"
DEBUG 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> unknown.? -@59700] RECV(10) banner: "ceph v2
"
DEBUG 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> unknown.? -@59700] GOT banner: payload_len=16
DEBUG 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> unknown.? -@59700] RECV(16) banner features: supported=1 required=0
DEBUG 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> unknown.? -@59700] WRITE HelloFrame: my_type=osd, peer_addr=172.17.0.5:59700/0
DEBUG 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> unknown.? -@59700] GOT HelloFrame: my_type=client peer_addr=v2:172.17.0.2:6800/1270141526
INFO 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] UPDATE: peer_type=client, policy(lossy=true server=true standby=false resetcheck=false)
WARN 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] my_addr_from_peer v2:172.17.0.2:6800/1270141526 port/nonce DOES match myaddr v2:172.17.0.2:6800/1270141526
DEBUG 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] GOT AuthRequestFrame: method=2, preferred_modes={1, 2}, payload_len=174
INFO 2021-10-26 18:53:26,068 [shard 0] monc - added challenge on [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700]
DEBUG 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] WRITE AuthReplyMoreFrame: payload_len=32
DEBUG 2021-10-26 18:53:26,068 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] GOT AuthRequestMoreFrame: payload_len=174
DEBUG 2021-10-26 18:53:26,069 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] WRITE AuthDoneFrame: gid=14788, con_mode=crc, payload_len=36
DEBUG 2021-10-26 18:53:26,069 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] WRITE AuthSignatureFrame: signature=975c5d3ae09036abcb2ca7d4f7704ee681ca13151d9de2ee29394ec8aed9950c
DEBUG 2021-10-26 18:53:26,069 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] GOT AuthSignatureFrame: signature=6209032314d560a21a3109ec6d7c0623ebd78cf1ea4fc9462411dbabe28b2d8d
DEBUG 2021-10-26 18:53:26,069 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] GOT ClientIdentFrame: addrs=172.17.0.1:0/1137248631, target=v2:172.17.0.2:6800/1270141526, gid=14788, gs=9, features_supported=4540138297136906239, features_required=576460752303427584, flags=1, cookie=0
WARN 2021-10-26 18:53:26,069 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] peer's address 172.17.0.1:0/1137248631 is not v2 or not the same host with 172.17.0.5:59700/0
INFO 2021-10-26 18:53:26,070 [shard 0] ms - [osd.0(client) v2:172.17.0.2:6800/1270141526 >> client.? -@59700] execute_accepting(): fault at ACCEPTING, going to CLOSING -- std::system_error (error crimson::net:2, bad peer address)
```
This connectivity issue has been overcome by appending
`--ms_learn_addr_from_peer=false` to the `argv`:
```
[root@rook-ceph-tools-698545dc56-zxrrx /]# bin/rados bench -p test-pool 5 rand --ms_learn_addr_from_peer=false
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 1235 1219 4.76106 4.76172 0.020999 0.0129408
2 16 2531 2515 4.91158 5.0625 0.0131776 0.0126796
3 16 3746 3730 4.8563 4.74609 0.0145268 0.0128361
4 16 4951 4935 4.81889 4.70703 0.0154604 0.0129421
5 15 6236 6221 4.85972 5.02344 0.0121689 0.0128415
Total time run: 5.01136
Total reads made: 6236
Read size: 4096
Object size: 4096
Bandwidth (MB/sec): 4.86083
Average IOPS: 1244
Stddev IOPS: 43.1706
Max IOPS: 1296
Min IOPS: 1205
Average Latency(s): 0.01284
Max latency(s): 0.0244048
Min latency(s): 0.00201867
```
However, with the classic OSD and **crimson with this patch applied**
there is no need for any configurables on the client side:
```
[rook@rook-ceph-tools-698545dc56-xkkpf /]$ bin/rados bench -p test-pool 5 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 1124 1108 4.32747 4.32812 0.011878 0.0143472
2 16 2323 2307 4.50534 4.68359 0.0117413 0.0138221
3 16 3517 3501 4.55813 4.66406 0.0195142 0.0136663
4 16 4680 4664 4.55425 4.54297 0.0131425 0.0136958
5 16 5725 5709 4.45976 4.08203 0.0143174 0.0139868
Total time run: 5.01332
Total reads made: 5725
Read size: 4096
Object size: 4096
Bandwidth (MB/sec): 4.46077
Average IOPS: 1141
Stddev IOPS: 65.113
Max IOPS: 1199
Min IOPS: 1045
Average Latency(s): 0.0139892
Max latency(s): 0.0361518
Min latency(s): 0.00231195
```
During the documentation pass for the Zipper API, a number of cleanups
were found: APIs that should be slightly different, or that were
entirely unused. This is a rollup commit of all those cleanups.
- move get_multipart_upload() to Bucket
- remove unused defer_gc
- move create_bucket() into User
- rename get_bucket_info() to load_bucket() to match load_user()
- remove read_bucket_stats()
The codepaths using read_bucket_stats() used CLS data types, and the
function is confusingly named. Load the ent in load_bucket(), and use
an alternative data structure to get size stats for the bucket.
- rename get_bucket_stats to read_stats
- remove remove_metadata() from the API
- remove copy_obj_data() from the API
- rename get_obj_layout to dump_obj_layout
- use SAL range_to_ofs
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Venky Shankar [Tue, 10 Aug 2021 07:04:51 +0000 (03:04 -0400)]
tasks/cephfs_mirror: optionally run in foreground
The cephfs mirror daemon thrasher needs to send SIGTERM to mirror
daemons. The mirror daemon needs to run in the foreground to receive
signals via `daemon.signal`.
Sage Weil [Mon, 8 Nov 2021 19:43:25 +0000 (14:43 -0500)]
Merge PR #43827 into master
* refs/pull/43827/head:
qa/suites/orch/cephadm: add repave-all test case
mgr/cephadm/services/osd: less noisy
mgr/cephadm/services/osd: do not log ok-to-stop/safe-to-destroy failures
mgr/orchestrator: clean up 'orch osd rm status'
Xuehan Xu [Sun, 7 Nov 2021 07:47:02 +0000 (15:47 +0800)]
crimson/os/seastore/segment_cleaner: initialize segments' avail_bytes with segments' sizes
Currently, we initialize segments' avail_bytes with "segment_size * num_segments".
Both segment_size and num_segments are 32 bits wide, so multiplying them can
overflow; for example, with 64 MiB segments, 64 segments are already enough to
exceed the 4 GiB range of a 32-bit product.
Paul Cuzner [Wed, 3 Nov 2021 02:24:20 +0000 (15:24 +1300)]
mgr/prometheus: Update rule format and enhance SNMP support
Rules now adhere to the format defined by Prometheus.io.
This changes alert naming, and each alert now includes a
summary description to provide a quick one-liner.
In addition to the reformatting, some missing alerts for MDS and
cephadm have been added, along with corresponding tests.
The MIB has also been refactored so it now passes standard
lint tests, and a README has been included for devs to understand the
OID schema.
Fixes: https://tracker.ceph.com/issues/53111
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Laura Flores [Thu, 4 Nov 2021 17:55:51 +0000 (17:55 +0000)]
mgr/telemetry: modify stats_per_pool
There is a much easier way to collect stats_per_pool than the current implementation. Fetching 'pg_dump' from the mgr module already provides a field called "pool_stats", which matches the aggregated pg stats that the implementation computed up until this commit.
All in all, this solution should provide the information we want, with a much cleaner implementation.
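The same per-pool aggregates can be inspected from the CLI (output elided):
```
ceph pg dump pools -f json-pretty    # per-pool stat sums, analogous to the pool_stats field used here
```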
Signed-off-by: Laura Flores <lflores@redhat.com>
Backport Message: In the case that this commit is backported, it is important to note that the commits in PR #42569 should be backported first, as the implementation of "get_stat_sum_per_pool()" in #42569 precedes the removal of it here.