Sage Weil [Thu, 25 Mar 2021 20:05:02 +0000 (15:05 -0500)]
mgr/cephadm: make upgrade progress bar mention target version, not repo digest
The repo digest is super long and meaningless for a human user. Instead,
use the target version (as soon as we know what it is--until then, use
the target image name).
Adam King [Thu, 18 Mar 2021 17:20:46 +0000 (13:20 -0400)]
mgr/orchestrator: remove image name field from 'orch ps' and 'orch ls'
Now that we're typically using the image digests the name isn't as helpful. We also
end up in scenarios where some images use tags for their name and others use the
digest so the image name comes out as "mix" in orch ls despite it being the same image.
Sage Weil [Fri, 26 Mar 2021 12:17:42 +0000 (07:17 -0500)]
Merge PR #40355 into pacific
* refs/pull/40355/head:
mgr/cephadm: Fix dashboard gateway configuration when using IPV6
qa/workunits/cephadm/test_cephadm: specify image separately
mgr/cephadm: retry after JSONDecodeError in wait_for_mgr_restart()
cephadm: prevent podman from breaking socket.getfqdn()
qa/tasks/cephadm: use 'orch apply mon' to deploy mons
qa/suites/rados/cephadm/upgrade: add centos upgrade on latest octopus
mgr/cephadm/upgrade: do not crash if error races with user cancellation
doc/cephfs/nfs: Add note about cephadm NFS-Ganesha daemon port
cephadm: only bootstrap using image that matches cephadm version
mgr/cephadm: redeploy daemons deployed using old image during upgrade
mgr/cephadm: add container digests of mgr that deployed daemon to unit.meta
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Sage Weil [Tue, 23 Mar 2021 16:56:59 +0000 (11:56 -0500)]
os/bluestore: separate omap per-pool vs per-pg alerts
Currently the health alert raised does not match the docs, and the docs
do not describe what the health alert indicates.
Octopus added per-pool omap storage. This improves space accounting
and reporting.
Pacific added per-pg omap storage (object hash in key). This speeds up
PG removal.
Separate everthing out into two distinct alerts raised from bluestore
and surfaced as health alerts, with corresponding config options to
disable, and update the docs accordingly.
Also update the fsck options for warn vs error, and raise separate
errors for the per-pg and per-pool cases.
mgr/cephadm: Fix dashboard gateway configuration when using IPV6
Fixes: https://tracker.ceph.com/issues/49957 Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
(cherry picked from commit 1b18f4f9cb28708b544c62b3d07f9e1b4c701e41)
Kefu Chai [Thu, 25 Mar 2021 09:08:48 +0000 (17:08 +0800)]
run-make-check.sh: let ctest generate XML output
to enable XUnit plugin of jenkins to consume the ctest output and
publish it in the dashboard, we need to
* let ctest generate XML output instead of plain text output
* do not fail the test if any test case fails. this allows the publisher
to do its job by checking the XML output.
* prevent ctest from compressing the output. see
https://issues.jenkins.io/browse/JENKINS-21737
Patrick Donnelly [Wed, 24 Mar 2021 23:11:03 +0000 (16:11 -0700)]
Merge PR #40317 into pacific
* refs/pull/40317/head:
cephsqlite: add julian day offset in milliseconds
doc: add libcephsqlite
ceph.spec,debian: package libcephsqlite
test/libcephsqlite,qa: add tests for libcephsqlite
libcephsqlite: rework architecture and backend
SimpleRADOSStriper: wait for finished aios after write
SimpleRADOSStriper: add new minimal async striper
mon: define simple-rados-client-with-blocklist profile
librados: define must renew lock flag
common: add timeval conversion for durations
Revert "libradosstriper: add function to read into char*"
test_libcephsqlite: test random inserts
cephsqlite: fix compiler errors
cmake: improve build inst for cephsqlite
libcephsqlite: sqlite interface to RADOS
libradosstriper: add function to read into char*
John Fulton [Wed, 17 Mar 2021 22:03:46 +0000 (18:03 -0400)]
mgr/cephadm: retry after JSONDecodeError in wait_for_mgr_restart()
'ceph mgr dump' does not always return valid JSON so cephadm
will throw an exception sometimes when applying a spec as per
the issue this PR closes. Add a try/except to catch a possible
JSONDecodeError and retry after sleeping.
Fixes: https://tracker.ceph.com/issues/49870 Signed-off-by: John Fulton <fulton@redhat.com>
(cherry picked from commit 0aba5704d9eb1a2df6dd437785fc1f8c558c0990)
Sage Weil [Thu, 18 Mar 2021 18:26:48 +0000 (14:26 -0400)]
cephadm: prevent podman from breaking socket.getfqdn()
socket.getfqdn() will return the reverse lookup for 127.0.1.1, which is
the last item listed for that IP in /etc/hosts. Podman, by default, will
append the container name (ceph-$fsid-$name) to that line, which is not
a valid hostname, and not what we want the dashbaord to use for the URI
it advertises in the service map.
Pass --no-hosts to podman to disable this.
Docker does not appear to modify /etc/hosts by default--or, more
importantly, does not add the container name there.
Sage Weil [Wed, 24 Mar 2021 15:34:03 +0000 (10:34 -0500)]
Merge PR #40094 into pacific
* refs/pull/40094/head:
rgw/kms/vault - PendingReleaseNotes pointer
rgw/kms/vault - s3tests for both old and new test logic.
rgw/kms/vault - rework unit test logic for new transit logic.
rgw/kms/vault - 0 terminate before rapidjson
rgw/kms/vault - document configuration for new transit logic
rgw/kms/vault - new transit logic - fix compat logic
rgw/kms/vault - define attribute for new transit logic
rgw/kms/vault - "compat" option
rgw/kms/vault - encryption context - first part
rgw/kms/vault - define attribute to store encryption context
rgw/kms/vault - share get/set attr between rgw_crypt.cc and rgw_kms.cc
rgw/kms/vault - relax configuration parsing for rgw_crypt_vault_secret_engine
rgw/kms/vault - need libicu to make canonical json for encryption contexts.
rgw/kms/kmip - document configuration for a new feature: kmip kms
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - correct documentation.
rgw/kms/kmip - pykmip.py needs to make keys too.
rgw/kms/kmip - pykmip.py should actually run pykmip.
rgw/kms/kmip - python3 changes for testing.
rgw/kms/kmip - string handling cleanup.
teuthology/rgw: pykmip task
kmip: first pass at implementation logic.
kmip: configuration options.
Including cmake build logic inside of libkmip.
cmake glue to build libkmip.
Added libkmip as a submodule.
Sage Weil [Mon, 22 Mar 2021 22:40:25 +0000 (18:40 -0400)]
mgr/cephadm/upgrade: do not crash if error races with user cancellation
If the user cancels the upgrade just before the upgrade thread runs into
a problem (and these things may be correlated!), ignore the failure
instead of crashing the module.
Sage Weil [Wed, 24 Mar 2021 03:00:46 +0000 (22:00 -0500)]
cephadm: only bootstrap using image that matches cephadm version
Only allow bootstrap to deploy if the cephadm version matches the
ceph version in the container. Allow the master branch version of cephadm
to deploy the latest stable version as well (at least for now).
Provide a flag to force bootstrap to continue despite the check.
Move the _pull_image call up into bootstrap so that it is easier to see
when it happens.
Adam King [Thu, 11 Feb 2021 16:43:01 +0000 (11:43 -0500)]
mgr/cephadm: redeploy daemons deployed using old image during upgrade
Add extra check that daemons were deployed by mgr using new image
during upgrade. Makes sure unit.run file for all daemons are updated
if they changed between old and new images.
Yin Congmin [Thu, 18 Mar 2021 13:33:07 +0000 (21:33 +0800)]
librbd/cache/pwl: fix bug of flush request blocked by deferd IO
Flush requests do not need to be queued behind the defer_io queue,
should be issued immediately. Otherwise, there will be a deadlock
scenario in which dirty data is waiting for flush req, flush req is
waiting for defer_io empty, and defer_io is waiting for dirty data
persistence to release space. So this sometimes occur when the cache
is small but the IO is large or the queue depth is large.
Alfonso Martínez [Tue, 23 Mar 2021 10:14:11 +0000 (11:14 +0100)]
mgr/dashboard: fix error notification shown when no rgw daemons are running.
- Adapted code to changes introduced in: https://github.com/ceph/ceph/pull/40220
- Improved error handling.
- Increased test coverage.
- Some refactoring.
- Simplified documentation about setting default daemon host and port.
Fixes: https://tracker.ceph.com/issues/49655 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 58253a0002f8722abecaaf58161f6494fbe0eaa0)
Kefu Chai [Fri, 12 Mar 2021 04:02:22 +0000 (12:02 +0800)]
ceph.spec: build with system libpmem on fedora and el8
* build with WITH_SYSTEM_PMDK=ON on fedora, as f32 and f33 ship
libpmem1.8 and libpmem1.9 respectively. and we need libpmem v1.7
* build with WITH_SYSTEM_PMDK=ON on el8, as el8 and CentOS8 AppStream
ships libpmem v1.6,
quote from nvml.spec:
> By design, PMDK does not support any 32-bit architecture.
> Due to dependency on some inline assembly, PMDK can be compiled only
> on these architectures:
> - x86_64
> - ppc64le (experimental)
> - aarch64 (unmaintained, supporting hardware doesn't exist?)
so far, only x86_64 and ppc64le packages are built.
see also,
https://src.fedoraproject.org/rpms/nvml/blob/rawhide/f/nvml.spec
Sage Weil [Tue, 23 Mar 2021 00:20:48 +0000 (19:20 -0500)]
Merge PR #40184 into pacific
* refs/pull/40184/head:
qa/suites/rados/cephadm/orchestrator_cli: random-distro$ -> 0-random-distro$
qa/suites/rados/cephadm/smoke-roleless: distro -> 0-distro
qa/distros/podman: install kubic once per host, in parallel
qa/suites/fs/multiclient: use clients: not all: for pexec