Destroying AuthClientHandler and not resetting global_id is another
way to get MonClient to send CEPHX_GET_AUTH_SESSION_KEY requests with
CephXAuthenticate::old_ticket not populated. This is particularly
pertinent to get_monmap_and_config() which shuts down the bootstrap
MonClient between retry attempts.
Ilya Dryomov [Mon, 8 Mar 2021 14:37:02 +0000 (15:37 +0100)]
mon/MonClient: preserve auth state on reconnects
Commit a2eb6ae3fb57 ("mon/monclient: hunt for multiple monitor in
parallel") introduced a regression where auth state (global_id and
AuthClientHandler) was no longer preserved on reconnects. The ensuing
breakage was quickly noticed and prompted a follow-on fix 8bb6193c8f53
("mon/MonClient: persist global_id across re-connecting").
However, as evident from the subject, the follow-on fix only took
care of the global_id part. AuthClientHandler is still destroyed
and all cephx tickets are discarded. A new from-scratch instance
is created for each MonConnection and CEPHX_GET_AUTH_SESSION_KEY
requests end up with CephXAuthenticate::old_ticket not populated.
The bug is in MonClient, so both msgr1 and msgr2 are affected.
This should have resulted in a similar sort of breakage but didn't
because of a much larger bug. The monitor should have denied the
attempt to reclaim global_id with no valid ticket proving previous
possession of that global_id presented. Alas, it appears that this
aspect of the cephx protocol has never been enforced. This is dealt
with in the next patch.
To fix the issue at hand, clone AuthClientHandler into each
MonConnection so that each respective CEPHX_GET_AUTH_SESSION_KEY
request gets a copy of the current auth ticket.
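The cloning idea can be illustrated with a minimal Python sketch. This is not the MonClient code: AuthHandler, MonConnection and start_hunting below are hypothetical stand-ins, and the ticket is modelled as a plain dict entry. It only shows that each connection gets its own copy of the current auth state instead of a from-scratch instance.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AuthHandler:
    # Hypothetical stand-in for the client-side auth state.
    global_id: int = 0
    tickets: dict = field(default_factory=dict)


@dataclass
class MonConnection:
    # Hypothetical stand-in for one connection attempt to a monitor.
    addr: str
    auth: AuthHandler


def start_hunting(current_auth, addrs):
    """Open one connection per address, each with a private copy of the auth state."""
    return [MonConnection(addr, copy.deepcopy(current_auth)) for addr in addrs]


current = AuthHandler(global_id=4242, tickets={"auth": "ticket-blob"})
cons = start_hunting(current, ["mon.a", "mon.b"])
```

Because the copies are deep, a connection that fails or is discarded during hunting cannot disturb the state held by the others.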
Ilya Dryomov [Sat, 6 Mar 2021 10:15:40 +0000 (11:15 +0100)]
mon/MonClient: claim active_con's auth explicitly
Eliminate confusion by moving auth from active_con into MonClient
instead of swapping them.
The existing MonClient::auth can be destroyed right away -- I don't
see why active_con would need it or a reason to delay its destruction
(which is what stashing in active_con effectively does).
mon/MonClient: resurrect "waiting for monmap|config" timeouts
This fixes a regression introduced in commit 85157d5aae3d ("mon:
s/Mutex/ceph::mutex/"). Waiting for monmap and config indefinitely
is not just bad UX, it actually masks other more serious bugs.
Sage Weil [Mon, 29 Mar 2021 13:42:03 +0000 (08:42 -0500)]
mgr/cephadm/upgrade: ignore deployed_by until mgr is upgraded
Until we upgrade the mgr itself, we will never be able to make our
deployed daemons have a deployed_by == target_digests. Ignore those
daemons until the mgr is the right version.
Sage Weil [Sun, 28 Mar 2021 18:07:27 +0000 (13:07 -0500)]
Merge PR #40437 into pacific
* refs/pull/40437/head:
mgr/cephadm: make upgrade progress bar mention target version, not repo digest
doc/cephadm: fix rgw realm and zone flags
mgr/volumes: do not overwrite existant mds specs
mgr/cephadm: no-overwite flag for apply command
mgr/orchestrator: remove image name field from 'orch ps' and 'orch ls'
cephadm: fix parsing of keepalived version (drop leading 'v')
cephadm: keepalived needs --cap-add=NET_RAW
cephadm: fix --cap-add=NET_ADMIN
cephadm: fix quoting for keepalived env var
mgr/cephadm: ha-rgw: use correct port
cephadm: validate fsid during cephadm shell command
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Sage Weil [Thu, 25 Mar 2021 20:05:02 +0000 (15:05 -0500)]
mgr/cephadm: make upgrade progress bar mention target version, not repo digest
The repo digest is super long and meaningless for a human user. Instead,
use the target version (as soon as we know what it is--until then, use
the target image name).
Nathan Cutler [Fri, 26 Mar 2021 10:03:34 +0000 (11:03 +0100)]
rpm: drop extraneous explicit sqlite-libs runtime dependency
Commit 75980798f19b8c11efd75ba4aae3e491d4c99f98 introduced a new package,
libcephsqlite, with a hard RPM dependency on a package "sqlite-libs" which
does not exist in openSUSE.
Since the runtime library dependencies of libcephsqlite are handled by RPM
transparently, this line is not needed.
Adam King [Thu, 18 Mar 2021 17:20:46 +0000 (13:20 -0400)]
mgr/orchestrator: remove image name field from 'orch ps' and 'orch ls'
Now that we're typically using the image digests the name isn't as helpful. We also
end up in scenarios where some images use tags for their name and others use the
digest so the image name comes out as "mix" in orch ls despite it being the same image.
Ilya Dryomov [Wed, 24 Mar 2021 15:23:44 +0000 (16:23 +0100)]
auth: require CEPHX_V2 by default
It's been almost three years and support is present in all relevant
clients.
From the security perspective, roughly the same could be achieved
with "ceph osd set-require-min-compat-client nautilus", but this is
more user friendly as the client gets ENOTSUP instead of spinning on
"feature set mismatch" faults.
Sage Weil [Fri, 26 Mar 2021 12:17:42 +0000 (07:17 -0500)]
Merge PR #40355 into pacific
* refs/pull/40355/head:
mgr/cephadm: Fix dashboard gateway configuration when using IPV6
qa/workunits/cephadm/test_cephadm: specify image separately
mgr/cephadm: retry after JSONDecodeError in wait_for_mgr_restart()
cephadm: prevent podman from breaking socket.getfqdn()
qa/tasks/cephadm: use 'orch apply mon' to deploy mons
qa/suites/rados/cephadm/upgrade: add centos upgrade on latest octopus
mgr/cephadm/upgrade: do not crash if error races with user cancellation
doc/cephfs/nfs: Add note about cephadm NFS-Ganesha daemon port
cephadm: only bootstrap using image that matches cephadm version
mgr/cephadm: redeploy daemons deployed using old image during upgrade
mgr/cephadm: add container digests of mgr that deployed daemon to unit.meta
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Dan van der Ster [Tue, 23 Mar 2021 10:28:37 +0000 (11:28 +0100)]
test_ipaddr: check that we correctly skip loopback
We should skip devices named 'lo' or of the form 'lo:0' regardless
of their IP address.
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Related-to: https://tracker.ceph.com/issues/49938
(cherry picked from commit 780125d1ed93cd7b17172752b3e76186a524103b)
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Fixes: https://tracker.ceph.com/issues/49938
(cherry picked from commit 6147c0917157efd2d35610e759685656a4989abb)
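The loopback rule described in the test_ipaddr commit above can be sketched in Python as a name-only check. The helper names are hypothetical, and treating any name that starts with "lo:" as an alias of the loopback device is an assumption based on the 'lo:0' example; the real check lives in Ceph's C++ code.

```python
def is_loopback_name(ifname: str) -> bool:
    # Assumption: 'lo' itself, or an alias such as 'lo:0', counts as loopback.
    return ifname == "lo" or ifname.startswith("lo:")


def pick_candidates(ifaces):
    """Drop loopback devices by name, regardless of their IP address.

    ifaces is an iterable of (name, address) pairs.
    """
    return [(name, addr) for name, addr in ifaces if not is_loopback_name(name)]


candidates = pick_candidates([
    ("lo", "127.0.0.1"),
    ("lo:0", "192.168.1.50"),  # skipped even though the address is not 127.x
    ("eth0", "192.168.1.10"),
])
```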
Brad Hubbard [Thu, 25 Mar 2021 23:57:14 +0000 (09:57 +1000)]
Revert "mgr/dashboard:test prometheus rules through promtool"
Reverts: https://github.com/ceph/ceph/pull/39983
This is currently blocking testing on ubuntu on the eve of a pacific
release. The problems associated with this PR have been resolved
upstream but have not been backported yet and are non-trivial.
Adam Kupczyk [Mon, 22 Mar 2021 10:20:11 +0000 (11:20 +0100)]
os/bluestore: Make Onode::put/get resiliant to split_cache
In

    OnodeCacheShard* ocs = c->get_onode_cache();
    std::lock_guard l(ocs->lock);

split_cache might have changed the OnodeCacheShard while we were waiting
for the lock. This results in adding the Onode to the wrong OnodeCacheShard.
That is obviously bad, as we will then operate (at least once) on a
different OnodeCacheShard than the one we hold the lock for. The _trim and
split_cache functions are particularly sensitive to this, as they iterate
over elements.
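One common way to tolerate this kind of race is to re-check the shard after the lock is taken and retry if it changed. The Python sketch below illustrates that pattern only; Shard, Collection and add_to_current_shard are hypothetical names, and it is not a transcription of the BlueStore change.

```python
import threading


class Shard:
    # Hypothetical stand-in for an OnodeCacheShard.
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.items = []


class Collection:
    # Hypothetical stand-in for the collection that owns a shard pointer.
    def __init__(self, shard):
        self._shard = shard

    def get_shard(self):
        return self._shard

    def set_shard(self, shard):
        # Stands in for split_cache moving the collection to another shard.
        self._shard = shard


def add_to_current_shard(coll, item):
    """Add item to the shard that is current at the time the lock is held."""
    while True:
        shard = coll.get_shard()
        with shard.lock:
            # The shard may have been swapped while we waited for the lock.
            if coll.get_shard() is shard:
                shard.items.append(item)
                return shard
        # Shard changed underneath us: drop the stale lock and try again.


a, b = Shard("a"), Shard("b")
coll = Collection(a)
first = add_to_current_shard(coll, "onode-1")
coll.set_shard(b)
second = add_to_current_shard(coll, "onode-2")
```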
Sage Weil [Tue, 23 Mar 2021 16:56:59 +0000 (11:56 -0500)]
os/bluestore: separate omap per-pool vs per-pg alerts
Currently the health alert raised does not match the docs, and the docs
do not describe what the health alert indicates.
Octopus added per-pool omap storage. This improves space accounting
and reporting.
Pacific added per-pg omap storage (object hash in key). This speeds up
PG removal.
Separate everything out into two distinct alerts raised from bluestore
and surfaced as health alerts, with corresponding config options to
disable, and update the docs accordingly.
Also update the fsck options for warn vs error, and raise separate
errors for the per-pg and per-pool cases.
mgr/cephadm: Fix dashboard gateway configuration when using IPV6
Fixes: https://tracker.ceph.com/issues/49957
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
(cherry picked from commit 1b18f4f9cb28708b544c62b3d07f9e1b4c701e41)
Kefu Chai [Thu, 25 Mar 2021 09:08:48 +0000 (17:08 +0800)]
run-make-check.sh: let ctest generate XML output
To enable the XUnit plugin of Jenkins to consume the ctest output and
publish it in the dashboard, we need to
* let ctest generate XML output instead of plain text output
* not fail the run if any test case fails. This allows the publisher
to do its job by checking the XML output.
* prevent ctest from compressing the output. See
https://issues.jenkins.io/browse/JENKINS-21737
Dan van der Ster [Thu, 12 Nov 2020 16:14:37 +0000 (17:14 +0100)]
common/options: bluefs_buffered_io=true by default
Enable bluefs_buffered_io again because it makes a huge user-visible
improvement in metadata intensive scenarios, such as but not limited to
PG deletion.
In our environment, deleting PGs from 4 hybrid OSDs (sharing one SATA SSD block.db) saturates
the block.db at 350MB/s reads and causes slow reqs and flapping on the OSDs.
Those OSDs have 3GB osd_target_memory.
Enabling bluefs_buffered_io drops the SSD IO down to <1MB/s and the OSDs
are performant again. (The underlying PG deletion inefficiency is being
solved separately, but the page cache is so much more effective than
the bluestore cache in this scenario).
Lastly, remove the comment about swap. We should separately advise
operators to disable swap on OSD machines, as it is much better in
our experience to OOM and restart than to chug along swapping.
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Related-to: https://tracker.ceph.com/issues/45765
Related-to: https://tracker.ceph.com/issues/47044
(cherry picked from commit 5ec8e8e63d409860c35e24a192090ac2b70af8f6)
Patrick Donnelly [Wed, 24 Mar 2021 23:11:03 +0000 (16:11 -0700)]
Merge PR #40317 into pacific
* refs/pull/40317/head:
cephsqlite: add julian day offset in milliseconds
doc: add libcephsqlite
ceph.spec,debian: package libcephsqlite
test/libcephsqlite,qa: add tests for libcephsqlite
libcephsqlite: rework architecture and backend
SimpleRADOSStriper: wait for finished aios after write
SimpleRADOSStriper: add new minimal async striper
mon: define simple-rados-client-with-blocklist profile
librados: define must renew lock flag
common: add timeval conversion for durations
Revert "libradosstriper: add function to read into char*"
test_libcephsqlite: test random inserts
cephsqlite: fix compiler errors
cmake: improve build inst for cephsqlite
libcephsqlite: sqlite interface to RADOS
libradosstriper: add function to read into char*
John Fulton [Wed, 17 Mar 2021 22:03:46 +0000 (18:03 -0400)]
mgr/cephadm: retry after JSONDecodeError in wait_for_mgr_restart()
'ceph mgr dump' does not always return valid JSON so cephadm
will throw an exception sometimes when applying a spec as per
the issue this PR closes. Add a try/except to catch a possible
JSONDecodeError and retry after sleeping.
Fixes: https://tracker.ceph.com/issues/49870
Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit 0aba5704d9eb1a2df6dd437785fc1f8c558c0990)
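The try/except-and-retry approach described in the wait_for_mgr_restart() commit above can be sketched as follows. This is an illustration, not the cephadm code: load_json_with_retry, its parameters, and the fetch callable (standing in for whatever returns the raw 'ceph mgr dump' output) are all hypothetical.

```python
import json
import time


def load_json_with_retry(fetch, retries=5, delay=1.0, sleep=time.sleep):
    """Call fetch() and parse its output as JSON, retrying on invalid JSON.

    fetch is a hypothetical callable returning the raw command output.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_err = None
    for _ in range(retries):
        try:
            return json.loads(fetch())
        except json.JSONDecodeError as e:
            # Output was not valid JSON yet; wait and try again.
            last_err = e
            sleep(delay)
    raise last_err


# Simulated command output: empty on the first call, valid JSON on the second.
outputs = iter(["", '{"epoch": 3, "available": true}'])
result = load_json_with_retry(lambda: next(outputs), sleep=lambda s: None)
```

Bounding the number of attempts keeps a persistently broken command from hanging the caller; whether the real code bounds its retries is not stated in the commit message.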
Sage Weil [Thu, 18 Mar 2021 18:26:48 +0000 (14:26 -0400)]
cephadm: prevent podman from breaking socket.getfqdn()
socket.getfqdn() will return the reverse lookup for 127.0.1.1, which is
the last item listed for that IP in /etc/hosts. Podman, by default, will
append the container name (ceph-$fsid-$name) to that line, which is not
a valid hostname, and not what we want the dashboard to use for the URI
it advertises in the service map.
Pass --no-hosts to podman to disable this.
Docker does not appear to modify /etc/hosts by default--or, more
importantly, does not add the container name there.
Jeff Layton [Wed, 17 Mar 2021 15:52:05 +0000 (11:52 -0400)]
test: reduce number of threads to 32 in LibCephFS.ShutdownRace
We're still occasionally hitting file descriptor limits when running
this test. Reduce the thread count to 32 for now, since it was possible
to reproduce the original problem with 10 or so threads.
Fixes: https://tracker.ceph.com/issues/49559
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit 5aec283a1c33b6c21f877a27f57a1bc03b4894a0)
Jeff Layton [Tue, 16 Mar 2021 16:22:56 +0000 (12:22 -0400)]
mds: fix removexattr logic when there aren't any
The MDS currently returns success on a removexattr if the xattr map is
completely empty. Fix the subtle logic bug and have it return -ENODATA
in that case.
Fixes: https://tracker.ceph.com/issues/49833
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit 85e73c7c7509cefbc50902436aca07a9a333eb23)
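The intended removexattr behaviour described in the MDS commit above can be modelled in a few lines of Python. The function below is a hypothetical model of the semantics only (the MDS is C++), with the xattr map represented as a dict: removing a name that is absent, including from a completely empty map, yields -ENODATA rather than success.

```python
import errno


def remove_xattr(xattrs: dict, name: str) -> int:
    """Remove name from xattrs; return 0 on success or -ENODATA if absent."""
    if name not in xattrs:
        # Covers the empty-map case as well as a missing key.
        return -errno.ENODATA
    del xattrs[name]
    return 0


empty_result = remove_xattr({}, "user.foo")
attrs = {"user.foo": b"bar"}
present_result = remove_xattr(attrs, "user.foo")
```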
Sage Weil [Wed, 24 Mar 2021 15:34:03 +0000 (10:34 -0500)]
Merge PR #40094 into pacific
* refs/pull/40094/head:
rgw/kms/vault - PendingReleaseNotes pointer
rgw/kms/vault - s3tests for both old and new test logic.
rgw/kms/vault - rework unit test logic for new transit logic.
rgw/kms/vault - 0 terminate before rapidjson
rgw/kms/vault - document configuration for new transit logic
rgw/kms/vault - new transit logic - fix compat logic
rgw/kms/vault - define attribute for new transit logic
rgw/kms/vault - "compat" option
rgw/kms/vault - encryption context - first part
rgw/kms/vault - define attribute to store encryption context
rgw/kms/vault - share get/set attr between rgw_crypt.cc and rgw_kms.cc
rgw/kms/vault - relax configuration parsing for rgw_crypt_vault_secret_engine
rgw/kms/vault - need libicu to make canonical json for encryption contexts.
rgw/kms/kmip - document configuration for a new feature: kmip kms
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - correct documentation.
rgw/kms/kmip - pykmip.py needs to make keys too.
rgw/kms/kmip - pykmip.py should actually run pykmip.
rgw/kms/kmip - python3 changes for testing.
rgw/kms/kmip - string handling cleanup.
teuthology/rgw: pykmip task
kmip: first pass at implementation logic.
kmip: configuration options.
Including cmake build logic inside of libkmip.
cmake glue to build libkmip.
Added libkmip as a submodule.