pacific: mds: "cluster [WRN] Scrub error on inode 0x1000000039d (/client.0/tmp/blogbench-1.0/src/blogtest_in) see mds.a log and `damage ls` output for details"
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Mon, 22 Mar 2021 16:17:43 +0000 (09:17 -0700)]
mgr/pybind/volumes: avoid acquiring lock for thread count updates
Perform thread count updates in a dedicated tick thread. This avoids the
mgr Finisher thread from getting potentially hung via a mutex deadlock
in the cloner thread management.
Fixes: https://tracker.ceph.com/issues/49605 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit b27ddfaed4a3c66bac2343c8315a1fe542edb63e)
Alfonso Martínez [Thu, 22 Apr 2021 12:10:25 +0000 (14:10 +0200)]
mgr/dashboard: update documentation about creating NFS export.
- You learn first that orchestrator-managed clusters are detected automatically (therefore the documentation that follows is exclusively for user-defined clusters).
- Include nfs-ganesha in the security scope list.
Fixes: https://tracker.ceph.com/issues/50440 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 335a91d20d2e0ec68ddbda6d94f9e90dea16114f)
Sébastien Han [Thu, 22 Apr 2021 10:52:09 +0000 (12:52 +0200)]
ceph-volume: fix raw listing when finding OSDs from different clusters
When listing OSDs on host with 2 OSDs with the same ID, the output gets
overwritten with the last listed device. So a single OSD will show up.
See the ceph-volume.log which correctly parsed both disks:
mgr/dashboard: filesystem pool size should use stored stat
Fixes: https://tracker.ceph.com/issues/50195 Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Replaces 'bytes_used' with 'stored' stat to see the correct results
of CephFS pool stats.
Aashish Sharma [Thu, 25 Mar 2021 05:55:37 +0000 (11:25 +0530)]
mgr/dashboard:Simplify some complex calculations in test_alerts.yml
run-promtool-unittests is failing with difference in floating point values in some complex calculations. This PR intends to simplify those calculations and fix this issue.
Kefu Chai [Fri, 19 Mar 2021 02:32:16 +0000 (10:32 +0800)]
test: run promtool test without docker on ubuntu/focal
before this change, we use docker for running promtools offered by
a docker image, but this is not efficient, and quite a few developers
do not want to use docker for running "make check". this change was
introduced by #39246, the reason was that, in Ceph's CI process, we
are using Ubuntu/Bionic for running "make check" jobs, but prometheus
packaged by Bionic does not offer the "test rules" command. so, to
address problem, we are using "dnanexus/promtool:2.9.2" docker image
for verifying monitoring/prometheus/alerts/test_alerts.yml.
after this change, we use prometheus packaged by debian derivatives
instead of pulling a docker image.
* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert 53a5816deda0874a3a37e131e9bc22d88bb2a588, as we don't need to
pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
command, bail out if it is not.
to silence the health warning of "mons are allowing insecure global_id
reclaim", which prevents the cluster from being active+clean. couple
tests are expecting a warning free cluster before they starts.
as this option is enabled by default for appeasing the old clients, but when it
comes to most of upstream testing, we can just disable it.
Lucian Petrut [Wed, 10 Mar 2021 07:14:37 +0000 (07:14 +0000)]
cephfs: Update ceph-dokan "--removable" flag
"-m" is the short option for "--removable". We tried to keep the
syntax as close as possible to the old ceph-dokan project but this
flag is already used.
We'll drop the short option, keeping the long one ("--removable").
Lucian Petrut [Mon, 8 Mar 2021 10:44:37 +0000 (10:44 +0000)]
cephfs: add ceph-dokan unmap command
At the moment, Windows CephFS mounts can only be removed by
terminating the daemon (e.g. sending CTRL-C) or through the
Windows mount manager if the "-o -m" parameters were passed
when the mapping was created.
This change adds the "ceph-dokan unmap" command, which takes
the mountpoint as input.
Conflicts:
qa/tasks/ceph.conf.template [ commit 94df76244798
("qa/tasks/ceph.conf: shorten cephx TTL for testing") was
cherry-picked to 16.2.0 separately and so exists both in
16.2.0 and pacific-saved ]
qa/tasks/cephadm.conf [ ditto ]
auth/cephx: make KeyServer::build_session_auth_info() less confusing
The second KeyServer::build_session_auth_info() overload is used only
by the monitor, for mon <-> mon authentication. The monitor passes in
service_secret (mon secret) and secret_id (-1). The TTL is irrelevant
because there is no rotation.
However the signature doesn't make it obvious. Clarify that
service_secret and secret_id are input parameters and info is the only
output parameter.
auth/cephx: cap ticket validity by expiration of "next" key
If auth_mon_ticket_ttl is increased by several times as done in
commit 522a52e6c258 ("auth/cephx: rotate auth tickets less often"),
active clients eventually get stuck because the monitor sends out an
auth ticket with a bogus validity. The ticket is secured with the
"current" secret that is scheduled to expire according to the old TTL,
but the validity of the ticket is set to the new TTL. As a result,
the client simply doesn't attempt to renew, letting the secrets rotate
potentially more than once. When that happens, the client first hits
auth authorizer errors as it tries to renew service tickets and when
it finally gets to renewing the auth ticket, it hits the insecure
global_id reclaim wall.
Cap TTL by expiration of "next" key -- the "current" key may be
milliseconds away from expiration and still be used, legitimately.
Do it in KeyServerData alongside key rotation code and propagate the
capped TTL to the upper layer.
Sage Weil [Fri, 9 Apr 2021 20:26:00 +0000 (16:26 -0400)]
mgr/cephadm: rewrite/simplify describe_service
The prior implementation first tried to fabricate services based on the
running daemons, and then filled in defined services on top. This led
to duplication and a range of small errors.
Instead, flip this around: start with the services that are defined,
and only fill in 'unmanaged' services where we need to.
Drop the osd kludges and instead rely on DaemonDescription.service_id to
return the right thing.
Sage Weil [Fri, 9 Apr 2021 19:35:17 +0000 (15:35 -0400)]
mgr/orchestrator: remove IMAGE ID from 'orch ls'
This is not very useful at this level:
- we see it from 'orch ps'
- it can be a mix of ids during upgrade
- some services may have multiple images at steady state (e.g., ingress)
Conflicts:
qa/suites/upgrade/octopus-x/rgw-multisite/overrides.yaml [
commit b6773dd3f197 ("qa/rgw: add octopus-x upgrade suite for
multisite") not in pacific ]
Sage Weil [Fri, 26 Mar 2021 16:02:50 +0000 (12:02 -0400)]
cephadm: set auth_allow_insecure_global_id_reclaim for mon on bootstrap
If this is a fresh pacific cluster, let's assume that there won't be
legacy clients connecting. (And if there are, let's put the burden on
the user to enable them to do so insecurely.)
This is in contrast to upgrades, where our focus is on not breaking
anything.
- AUTH_INSECURE_GLOBAL_ID_RENEWAL_ALLOWED if we are allowing clients to reclaim
global_ids in an insecure manner (for backwards compatibility until
clients are upgraded)
- AUTH_INSECURE_GLBOAL_ID_RENEWAL if there are currently clients connected that
do not know how to securely renew their global_id, as exposed by
auth_expose_insecure_global_id_reclaim=true. The client auth names and IPs
are listed the alert details (up to a limit, at least).
The docs recommend operators mute these alerts instead of silencing, but
we still include option that allow the alerts to be disabled entirely.
Ilya Dryomov [Tue, 2 Mar 2021 14:09:26 +0000 (15:09 +0100)]
auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys
When handling CEPHX_GET_AUTH_SESSION_KEY requests from nautilus+
clients, ignore CEPH_ENTITY_TYPE_AUTH in CephXAuthenticate::other_keys.
Similarly, when handling CEPHX_GET_PRINCIPAL_SESSION_KEY requests,
ignore CEPH_ENTITY_TYPE_AUTH in CephXServiceTicketRequest::keys.
These fields are intended for requesting service tickets, the auth
ticket (which is really a ticket granting ticket) must not be shared
this way.
Otherwise we end up sharing an auth ticket that a) isn't encrypted
with the old session key even if needed (should_enc_ticket == true)
and b) has the wrong validity, namely auth_service_ticket_ttl instead
of auth_mon_ticket_ttl. In the CEPHX_GET_AUTH_SESSION_KEY case, this
undue ticket immediately supersedes the actual auth ticket already
encoded in the same reply (the reply frame ends up containing two auth
tickets).
Ilya Dryomov [Mon, 22 Mar 2021 18:16:32 +0000 (19:16 +0100)]
auth/cephx: rotate auth tickets less often
If unauthorized global_id (re)use is disallowed, a client that has
been disconnected from the network long enough for keys to rotate
and its auth ticket to expire (i.e. become invalid/unverifiable)
would not be able to reconnect.
The default TTL is 12 hours, resulting in a 12-24 hour reconnect
window (the previous key is kept around, so the actual window can be
up to double the TTL). The setting has stayed the same since 2009,
but it also hasn't been enforced. Bump it to get a 72 hour reconnect
window to cover for something breaking on Friday and not getting fixed
until Monday.
Ilya Dryomov [Thu, 25 Mar 2021 19:59:13 +0000 (20:59 +0100)]
mon: fail fast when unauthorized global_id (re)use is disallowed
When unauthorized global_id (re)use is disallowed, we don't want to
let unpatched clients in because they wouldn't be able to reestablish
their monitor session later, resulting in subtle hangs and disrupted
user workloads.
Denying the initial connect for all legacy (CephXAuthenticate < v3)
clients is not feasible because a large subset of them never stopped
presenting their ticket on reconnects and are therefore compatible with
enforcing mode: most notably all kernel clients but also pre-luminous
userspace clients. They don't need to be patched and excluding them
would significantly hamper the adoption of enforcing mode.
Instead, force clients that we are not sure about to reconnect shortly
after they go through authentication and obtain global_id. This is
done in Monitor::dispatch_op() to capture both msgr1 and msgr2, most
likely instead of dispatching mon_subscribe.
We need to let mon_getmap through for "ceph ping" and "ceph tell" to
work. This does mean that we share the monmap, which lets the client
return from MonClient::authenticate() considering authentication to be
finished and causing the potential reconnect error to not propagate to
the user -- the client would hang waiting for remaining cluster maps.
For msgr1, this is unavoidable because the monmap is sent immediately
after the final MAuthReply. But for msgr2 this is rare: most of the
time we get to their mon_subscribe and cut the connection before they
process the monmap!
Regardless, the user doesn't get a chance to start a workload since
there is no proper higher-level session at that point.
To help with identifying clients that need patching, add global_id and
global_id_status to "sessions" output.
Ilya Dryomov [Sat, 13 Mar 2021 13:53:52 +0000 (14:53 +0100)]
auth/cephx: option to disallow unauthorized global_id (re)use
global_id is a cluster-wide unique id that must remain stable for the
lifetime of the client instance. The cephx protocol has a facility to
allow clients to preserve their global_id across reconnects:
(1) the client should provide its global_id in the initial handshake
message/frame and later include its auth ticket proving previous
possession of that global_id in CEPHX_GET_AUTH_SESSION_KEY request
(2) the monitor should verify that the included auth ticket is valid
and has the same global_id and, if so, allow the reclaim
(3) if the reclaim is allowed, the new auth ticket should be
encrypted with the session key of the included auth ticket to
ensure authenticity of the client performing reclaim. (The
included auth ticket could have been snooped when the monitor
originally shared it with the client or any time the client
provided it back to the monitor as part of requesting service
tickets, but only the genuine client would have its session key
and be able to decrypt.)
Unfortunately, all (1), (2) and (3) have been broken for a while:
- (1) was broken in 2016 by commit a2eb6ae3fb57 ("mon/monclient:
hunt for multiple monitor in parallel") and is addressed in patch
"mon/MonClient: preserve auth state on reconnects"
- it turns out that (2) has never been enforced. When cephx was
being designed and implemented in 2009, two changes to the protocol
raced with each other pulling it in different directions: commits 0669ca21f4f7 ("auth: reuse global_id when requesting tickets")
and fec31964a12b ("auth: when renewing session, encrypt ticket")
added the reclaim mechanism based strictly on auth tickets, while
commit 5eeb711b6b2b ("auth: change server side negotiation a bit")
allowed the client to provide global_id in the initial handshake.
These changes didn't get reconciled and as a result a malicious
client can assign itself any global_id of its choosing by simply
passing something other than 0 in MAuth message or AUTH_REQUEST
frame and not even bother supplying any ticket. This includes
getting a global_id that is being used by another client.
- (3) was broken in 2019 with addition of support for msgr2, where
the new auth ticket ends up being shared unencrypted. However the
root cause is deeper and a malicious client can coerce msgr1 into
the same. This also goes back to 2009 and is addressed in patch
"auth/cephx: ignore CEPH_ENTITY_TYPE_AUTH in requested keys".
Because (2) has never been enforced, no one noticed when (1) got
broken and we began to rely on this flaw for normal operation in
the face of reconnects due to network hiccups or otherwise. As of
today, only pre-luminous userspace clients and kernel clients are
not exercising it on a daily basis.
Bump CephXAuthenticate version and use a dummy v3 to distinguish
between legacy clients that don't (may not) include their auth ticket
and new clients. For new clients, unconditionally disallow claiming
global_id without a corresponding auth ticket. For legacy clients,
introduce a choice between permissive (current behavior, default for
the foreseeable future) and enforcing mode.
If the reclaim is disallowed, return EACCES. While MonClient does
have some provision for global_id changes and we could conceivably
implement enforcement by handing out a fresh global_id instead of
the provided one, those code paths have never been tested and there
are too many ways a sudden global_id change could go wrong.
Ilya Dryomov [Tue, 9 Mar 2021 15:33:55 +0000 (16:33 +0100)]
auth/AuthServiceHandler: keep track of global_id and whether it is new
AuthServiceHandler already has global_id field, but it is unused.
Revive it and let the handler know whether global_id is newly assigned
by the monitor or provided by the client.
Lift the setting of entity_name into AuthServiceHandler.
Destroying AuthClientHandler and not resetting global_id is another
way to get MonClient to send CEPHX_GET_AUTH_SESSION_KEY requests with
CephXAuthenticate::old_ticket not populated. This is particularly
pertinent to get_monmap_and_config() which shuts down the bootstrap
MonClient between retry attempts.
Ilya Dryomov [Mon, 8 Mar 2021 14:37:02 +0000 (15:37 +0100)]
mon/MonClient: preserve auth state on reconnects
Commit a2eb6ae3fb57 ("mon/monclient: hunt for multiple monitor in
parallel") introduced a regression where auth state (global_id and
AuthClientHandler) was no longer preserved on reconnects. The ensuing
breakage was quickly noticed and prompted a follow-on fix 8bb6193c8f53
("mon/MonClient: persist global_id across re-connecting").
However, as evident from the subject, the follow-on fix only took
care of the global_id part. AuthClientHandler is still destroyed
and all cephx tickets are discarded. A new from-scratch instance
is created for each MonConnection and CEPHX_GET_AUTH_SESSION_KEY
requests end up with CephXAuthenticate::old_ticket not populated.
The bug is in MonClient, so both msgr1 and msgr2 are affected.
This should have resulted in a similar sort of breakage but didn't
because of a much larger bug. The monitor should have denied the
attempt to reclaim global_id with no valid ticket proving previous
possession of that global_id presented. Alas, it appears that this
aspect of the cephx protocol has never been enforced. This is dealt
with in the next patch.
To fix the issue at hand, clone AuthClientHandler into each
MonConnection so that each respective CEPHX_GET_AUTH_SESSION_KEY
request gets a copy of the current auth ticket.
Ilya Dryomov [Sat, 6 Mar 2021 10:15:40 +0000 (11:15 +0100)]
mon/MonClient: claim active_con's auth explicitly
Eliminate confusion by moving auth from active_con into MonClient
instead of swapping them.
The existing MonClient::auth can be destroyed right away -- I don't
see why active_con would need it or a reason to delay its destruction
(which is what stashing in active_con effectively does).
mon/MonClient: resurrect "waiting for monmap|config" timeouts
This fixes a regression introduced in commit 85157d5aae3d ("mon:
s/Mutex/ceph::mutex/"). Waiting for monmap and config indefinitely
is not just bad UX, it actually masks other more serious bugs.
acting.size() >= pool.info.min_size is meant to check min_size against
acting set participants, but acting is a vector with placeholders.
actingset is the representation with placeholders removed.
The upshot of this bug is that the activation process will basically
ignore min_size for an ec pool allowing writes in cases where it
shouldn't. PastIntervals::check_new_interval, however, performs
the check correctly, and will therefore discount intervals in which
we really did serve writes as not writeable. This can trigger many
different problem conditions including but not limited to:
- Unfound objects due to accepting a last_update with insufficient
osds
- Lost writes
- Crashes due to peering rules being violated
This bug was originally introduced with recovery below min_size in e5a96fd, and then preserved through refactors in 749a13d and 95bec9.
7cb818a exposed it with with expansion of recovery below min_size
to include ec pools (acting.size() is sufficient for replicated
pools).
Fixes: https://tracker.ceph.com/issues/48613 Fixes: https://tracker.ceph.com/issues/48417 Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 642a1c165499bcbd4cfdf907af313ac7ffe44ff4)
Sage Weil [Sat, 3 Apr 2021 13:14:00 +0000 (09:14 -0400)]
cephadm: normalize unqualified repo digests to docker.io
A RepoDigests returned by docker|podman image inspect can either include
the docker.io/ prefix or not. For reasons that aren't entirely clear,
this may vary between hosts in a cluster. However, ceph/ceph@sha256:abc...
is the same thing as docker.io/ceph/ceph@sha256:abc..., and should be
treated as such. Otherwise, upgrade can get into a loop where it pulls
the image on a new host, finds the other variant of the repodigests,
sees no overlap, updates target_digests, and restarts. (It will then
find the first variant again on the first host and loop.)
Avoid this by normalizing any docker.io digests by always including the
docker.io/ prefix.
Note that it is technically possible that this assumption is wrong: it
may be that the image that already exists on the local host is from a
different registry in registries.conf's unqualified-search-registries.
However, we don't know which, since this is a search list. In practice,
it should be exceeding rare that an image that *we* are installing using
a fully-qualified image name will end up having an unqualified repodigest
in the local registry. Hopefully!
If we get an unqualified target image, assume it's docker.io. This
ensures that we're passing a fully-qualified target to docker|podman on
the various hosts and don't end up with something different based on the
per-host search path for unqualified image names.
Paul Cuzner [Thu, 1 Apr 2021 04:38:18 +0000 (17:38 +1300)]
cephadm:persist the grafana.db file
This patch persists the grafana.db file, so the user can
create there own dashboard referencing ceph and node
metrics and not lose them on grafana restart! This also
ensures changes to users/passwords within grafana are
persisted.
Fixes: https://tracker.ceph.com/issues/49954 Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
(cherry picked from commit 581250d943915cb3f8e7e4071440ee5cad5e5908)