Adam King [Mon, 21 Aug 2023 17:48:56 +0000 (13:48 -0400)]
cephadm: make custom_configs work for tcmu-runner container
This is intended as a temporary workaround that allows
custom config files to be mounted into
the tcmu-runner container. The hope is to refactor
cephadm's iscsi handling for squid, but a patch
like this could be useful for iscsi in older
releases, where custom config files are currently
unusable for the tcmu-runner container.
What this patch actually does is write the
custom config files to a dir for the tcmu-runner
container so that the rest of the logic works without
change. I thought this would be easier to remove later
than a patch that integrates more with the container
mounts or general deployment.
Adam King [Tue, 13 Jun 2023 23:54:30 +0000 (19:54 -0400)]
cephadm: run tcmu-runner through script to do restart on failure
Currently, cephadm runs tcmu-runner as a background
process inside the unit file deployed for iscsi
(rbd-target-api is the primary process). This means
if tcmu-runner crashes for whatever reason, systemd
will not attempt to restart it. This commit sets
up a script to serve as the container entrypoint
for the tcmu-runner container that will run
tcmu-runner and also restart it on failure
(unless there are too many failures in a short
period, at which point it gives up).
The hope is to eventually drop use of this script
for a better solution in squid onward, but this
should be helpful on older releases (quincy and
pacific at least) where we won't be able to
bring that better solution.
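The restart-on-failure behaviour described above can be illustrated with a minimal Python sketch. This is not the script added by the commit (which is not shown here); the function name `supervise`, the failure limit, and the time window are hypothetical values chosen only for illustration.

```python
import subprocess
import time
from collections import deque


def supervise(cmd, max_failures=5, window=60.0,
              run=subprocess.call, clock=time.monotonic):
    """Run cmd, restarting it after each failure.

    Gives up once max_failures failures have occurred within `window`
    seconds. Returns the exit code of the last run.
    `run` and `clock` are injectable so the loop can be exercised
    without starting a real process.
    """
    failures = deque()  # timestamps of recent failures
    while True:
        rc = run(cmd)
        if rc == 0:
            return rc  # clean exit, nothing to restart
        now = clock()
        failures.append(now)
        # Forget failures that are older than the window.
        while failures and now - failures[0] > window:
            failures.popleft()
        if len(failures) >= max_failures:
            return rc  # too many failures in a short period: give up
```

A caller would pass the real command line, for example `supervise(["tcmu-runner"])`, and treat a non-zero return value as "gave up".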
Adam King [Fri, 2 Jun 2023 00:06:35 +0000 (20:06 -0400)]
cephadm: add tcmu-runner to logrotate config
The following process can be used to set up tcmu-runner
to log to a file, much like other ceph daemons:
- create the /etc/tcmu directory
- create the /etc/tcmu/tcmu.conf file with default options
- change the log dir to /var/log
- change the log level to 4
- add -v /etc/tcmu:/etc/tcmu to the tcmu-runner container podman line in unit.run
To support this (mostly for debugging), we should
add tcmu-runner to the logrotate config.
Nizamudeen A [Wed, 27 Sep 2023 11:27:32 +0000 (16:57 +0530)]
mgr/dashboard: allow tls 1.2 with a config option
Provide an option to allow tls 1.2.
Running `ceph dashboard set-enable-unsafe-tls-v1-2 True` followed by a mgr
restart enables tls 1.2.
With tls 1.2 enabled:
```
╰─$ nmap -sV --script ssl-enum-ciphers -p 11000 127.0.0.1
Starting Nmap 7.93 ( https://nmap.org ) at 2023-09-27 16:56 IST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00018s latency).
PORT STATE SERVICE VERSION
11000/tcp open ssl/http CherryPy wsgiserver
|_http-server-header: Ceph-Dashboard
| ssl-enum-ciphers:
| TLSv1.2:
| ciphers:
| TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
| TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
| TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 (ecdh_x25519) - A
| TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA (ecdh_x25519) - A
| TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA (ecdh_x25519) - A
| TLS_RSA_WITH_AES_256_GCM_SHA384 (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CCM (rsa 2048) - A
| TLS_RSA_WITH_AES_128_GCM_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_128_CCM (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CBC_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_128_CBC_SHA256 (rsa 2048) - A
| TLS_RSA_WITH_AES_256_CBC_SHA (rsa 2048) - A
| TLS_RSA_WITH_AES_128_CBC_SHA (rsa 2048) - A
| compressors:
| NULL
| cipher preference: server
| TLSv1.3:
| ciphers:
| TLS_AKE_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
| TLS_AKE_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
| TLS_AKE_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
| TLS_AKE_WITH_AES_128_CCM_SHA256 (ecdh_x25519) - A
| cipher preference: server
|_ least strength: A
Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 16.55 seconds
```
Without tls 1.2 enabled (the default, which allows only tls 1.3):
```
╰─$ nmap -sV --script ssl-enum-ciphers -p 11000 127.0.0.1
Starting Nmap 7.93 ( https://nmap.org ) at 2023-09-27 16:54 IST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000075s latency).
PORT STATE SERVICE VERSION
11000/tcp open ssl/http CherryPy wsgiserver
| ssl-enum-ciphers:
| TLSv1.3:
| ciphers:
| TLS_AKE_WITH_AES_256_GCM_SHA384 (ecdh_x25519) - A
| TLS_AKE_WITH_CHACHA20_POLY1305_SHA256 (ecdh_x25519) - A
| TLS_AKE_WITH_AES_128_GCM_SHA256 (ecdh_x25519) - A
| TLS_AKE_WITH_AES_128_CCM_SHA256 (ecdh_x25519) - A
| cipher preference: server
|_ least strength: A
|_http-server-header: Ceph-Dashboard
```
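For readers unfamiliar with how such a switch maps onto a TLS stack, here is a minimal sketch using Python's standard `ssl` module. It is not the dashboard's actual code; the function name `make_server_context` and its `allow_tls_1_2` parameter are hypothetical stand-ins for the config option.

```python
import ssl


def make_server_context(allow_tls_1_2=False):
    """Build a server-side context that accepts tls 1.3 only,
    or tls 1.2 and newer when allow_tls_1_2 is set."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if allow_tls_1_2:
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    else:
        ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

A real server would additionally load a certificate and key with `ctx.load_cert_chain(...)` before wrapping sockets.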
Tobias Urdin [Mon, 7 Aug 2023 20:34:43 +0000 (20:34 +0000)]
rgw/auth: handle HTTP OPTIONS with v4 auth
This adds code to properly verify the signature
for HTTP OPTIONS calls that are preflight CORS
requests, which pass the expected method in the
access-control-request-method header.
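One reading of the above is that, for a preflight request, the method used in signature verification comes from the access-control-request-method header rather than from the literal OPTIONS verb. The Python sketch below illustrates only that method selection under this assumption; the helper name `method_for_signature` is hypothetical and this is not RGW's implementation.

```python
def method_for_signature(method, headers):
    """Return the HTTP method to use when verifying a request signature.

    For an OPTIONS request that carries access-control-request-method
    (a CORS preflight), use the method named in that header; otherwise
    use the request's own method.
    """
    if method.upper() == "OPTIONS":
        # Header names are case-insensitive, so normalise them first.
        lowered = {k.lower(): v for k, v in headers.items()}
        requested = lowered.get("access-control-request-method")
        if requested:
            return requested.upper()
    return method.upper()
```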
Rishabh Dave [Mon, 11 Sep 2023 09:55:46 +0000 (15:25 +0530)]
doc/cephfs: write cephfs commands fully in docs
We write CephFS commands incompletely in docs. For example, "ceph tell
mds.a help" is simply written as "tell mds.a help". This might confuse
the reader, and it does no harm to write the command in full.
Fixes: https://tracker.ceph.com/issues/62791
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit e63b573d3edc272d83ee1b5eb3dace037f762d87)
* refs/pull/51045/head:
qa: Add test for per-module finisher thread
qa: allow check_counter to look at nested keys
qa: allow specifying min for check-counter
mgr: Add one finisher thread per module
qa: add "failover / failback loop" test for rbd-mirror
For snapshot-based mirroring, check that demote (or other mirror)
snapshots don't pile up. Nothing in particular to assert on for
journal-based mirroring but the test is still useful.
Ilya Dryomov [Sat, 26 Aug 2023 11:04:52 +0000 (13:04 +0200)]
librbd: make CreatePrimaryRequest remove any unlinked mirror snapshots
After commit ac552c9b4d65 ("librbd: localize snap_remove op for mirror
snapshots"), rbd-mirror daemon no longer removes mirror snapshots when
it's done syncing them -- instead it only unlinks from them. However,
CreatePrimaryRequest state machine was not adjusted to compensate and
hence two cases were missed:
- primary demotion snapshot (rbd-mirror daemon unlinks from primary
demotion snapshots just like it does from regular primary snapshots);
this comes up when an image is demoted but then promoted on the same
cluster
- non-primary demotion snapshot (unlike regular non-primary snapshots,
non-primary demotion snapshots store peer uuids and rbd-mirror daemon
does unlinking just like in the case of primary snapshots); this
comes up when an image is demoted and promoted on the other cluster
Related is the case of orphan snapshots. Since they are dummy to begin
with, CreatePrimaryRequest would now clean up the orphan snapshot after
the creation of the force promote snapshot.
Fixes: https://tracker.ceph.com/issues/61707
Co-authored-by: Christopher Hoffman <choffman@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 9c05d3d81f4b06af2cfd47376e9ad86369bdf8cf)
Conflicts:
src/librbd/mirror/snapshot/CreatePrimaryRequest.cc [ commit 3a93b40721a1 ("librbd: s/boost::variant/std::variant/") not
in pacific ]
Ilya Dryomov [Tue, 22 Aug 2023 15:27:50 +0000 (17:27 +0200)]
librbd: don't attempt to remove image state on orphan snapshots
Despite being mirror snapshots, orphan snapshots don't have image
state: see CreateNonPrimaryRequest::write_image_state() for a similar
is_orphan() check. Attempting to remove image state generates bogus
"failed to read image state object" and "failed to remove image state"
errors.
Patrick Donnelly [Mon, 17 Jul 2023 20:10:59 +0000 (16:10 -0400)]
mds: drop locks and retry when lock set changes
An optimization was added to avoid an unnecessary gather on the inode
filelock when the client can safely get the file size without also
getting issued the requested caps. However, if a retry of getattr
is necessary, this conditional inclusion of the inode filelock
can cause lock-order violations resulting in deadlock.
So, if we've already acquired some of the inode's locks then we must
drop locks and retry.
Fixes: https://tracker.ceph.com/issues/62052
Fixes: c822b3e2573578c288d170d1031672b74e02dced
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit b5719ac32fe6431131842d62ffaf7101c03e9bac)
Ilya Dryomov [Sun, 27 Aug 2023 17:09:15 +0000 (19:09 +0200)]
qa/suites/upgrade/pacific-p2p: skip TestClsRbd.mirror_snapshot test
The behavior of the class method changed in reef; the change was
backported to pacific and quincy. An older pacific binary used against
newer pacific OSDs produces an expected failure:
[ RUN ] TestClsRbd.mirror_snapshot
.../ceph-16.2.7/src/test/cls_rbd/test_cls_rbd.cc:2278: Failure
Expected equality of these values:
-85
mirror_image_snapshot_unlink_peer(&ioctx, oid, 1, "peer2")
Which is: 0
[ FAILED ] TestClsRbd.mirror_snapshot (30 ms)
TestClsRbd.snapshots_namespaces test was removed in commit 4ad9d565a15c
("librbd: simplified retrieving snapshots from image header") many years
ago.
It's a no-no to acquire locks in these "fast" messenger methods. This
can lead to messenger slowdowns in the best case, as it blocks reads
on the wire. In the worst case, the messenger may deadlock with other
threads, preventing any further message reads off the wire.
It's not obvious this method is "fast" so I've added a comment regarding
this.
Fixes: https://tracker.ceph.com/issues/61874
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 69980823e62f67d502c4045e15c41c5c44cd5127)
python-common: drive_selection: fix KeyError when osdspec_affinity is not set
When osdspec_affinity is not set, the drive selection code will fail.
This can happen when a device has multiple LVs where some are used
by Ceph and at least one LV isn't used by Ceph.
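The failure mode can be illustrated with a small Python sketch. The data shape (a list of dicts, one per LV) and the helper name `lvs_for_spec` are hypothetical and are not taken from the drive selection code; the point is only that a plain `lv["osdspec_affinity"]` lookup raises KeyError for an LV that lacks the key, while `dict.get` does not.

```python
def lvs_for_spec(lvs, spec_name):
    """Return the LVs whose osdspec_affinity matches spec_name.

    LVs that Ceph does not use may lack the key entirely, so use
    .get() instead of indexing to avoid a KeyError.
    """
    return [lv for lv in lvs if lv.get("osdspec_affinity") == spec_name]


# Hypothetical device with two Ceph LVs and one unrelated LV.
example_lvs = [
    {"name": "osd-block-a", "osdspec_affinity": "spec1"},
    {"name": "osd-block-b", "osdspec_affinity": "spec2"},
    {"name": "scratch"},  # not used by Ceph: no osdspec_affinity key
]
matching = lvs_for_spec(example_lvs, "spec1")
```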
Ilya Dryomov [Mon, 14 Aug 2023 11:16:59 +0000 (13:16 +0200)]
qa/suites/upgrade/octopus-x: skip TestClsRbd.mirror_snapshot test
The behavior of the class method changed in reef; the change was
backported to pacific and quincy. An octopus test binary used against
pacific OSDs produces an expected failure:
[ RUN ] TestClsRbd.mirror_snapshot
.../ceph-15.2.17/src/test/cls_rbd/test_cls_rbd.cc:2279: Failure
Expected equality of these values:
-85
mirror_image_snapshot_unlink_peer(&ioctx, oid, 1, "peer2")
Which is: 0
[ FAILED ] TestClsRbd.mirror_snapshot (6 ms)
liu shi [Fri, 14 May 2021 07:51:01 +0000 (03:51 -0400)]
cpu_profiler: fix asok command crash
fixes: https://tracker.ceph.com/issues/50814
Signed-off-by: liu shi <liu.shi@navercorp.com>
(cherry picked from commit be7303aafe34ae470d2fd74440c3a8d51fcfa3ff)
Patrick Donnelly [Fri, 21 Jul 2023 15:56:49 +0000 (11:56 -0400)]
mds: adjust cap acquisition throttles
For production workloads, these defaults rarely help. Adjust
accordingly. For a steady state "find" workload, these new throttles
will prevent acquiring more than ~2300 caps/second which is quite
manageable with typical recall rates.
-ln(0.5) / 30 * 100k = 2310
Fixes: https://tracker.ceph.com/issues/62114
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit f290ef9d0d2d09fb978d56c46be704c6efd45c43)
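The arithmetic in the message above, -ln(0.5) / 30 * 100k, can be checked with a short Python sketch. It assumes that 30 is the half-life in seconds of an exponentially decaying counter and that 100k is the counter value at which the throttle applies; both interpretations are assumptions and are not stated in the message. Under them, the steady-state rate is the rate at which the counter decays when it sits at the threshold.

```python
import math


def steady_state_rate(half_life_s, threshold):
    """Rate (per second) at which a counter with the given half-life
    decays when its value sits at `threshold`.

    -ln(0.5) equals ln(2), the decay constant for one half-life.
    """
    return -math.log(0.5) / half_life_s * threshold


rate = steady_state_rate(30, 100_000)
print(int(rate))  # whole caps/second, truncated
```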