Sage Weil [Fri, 5 Mar 2021 20:33:50 +0000 (15:33 -0500)]
Merge PR #39817 into master
* refs/pull/39817/head:
qa/suites/rados/cephadm: drop centos/rhel cephadm tests for the moment
qa/sites/rados/cephadm/thrash: rename 3-tasks.yaml/ -> 3-tasks/
qa/suites/rados/cephadm: adjust distros
qa/suites/upgrade: use kubic; test all distros
qa/suites/rados/cephadm/upgrade: use kubic on centos
qa: new kubic distro files; use kubic podman for centos/rhel
* refs/pull/38859/head:
mds: don't start purging inodes in the middle of recovery
mds: purge orphan objects created by lost async file creation
mds: track free prealloc_inos and delegated_inos separately
mds: cleanup code that purges orphan objects created by lost unsafe file creation
mds: subtract inos_to_purge from prealloc_inos when session close is logged
mds: use vector to define old_pools in PurgeItem and inode_backtrace_t
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
This broke cephadm (by triggering CEPHADM_STRAY_DAEMON) because cephadm
assumes that a daemon named rgw.r.z.foo will register as rgw.r.z.foo.
It is not clear to me that there is a way to work around this naming
mismatch that makes much sense. I think it makes more sense to focus on
the use-case that needs daemons to register under unique names and perhaps
control that naming behavior via an option or invest in providing daemons
with unique ids up front.
Kefu Chai [Sat, 13 Feb 2021 04:57:19 +0000 (12:57 +0800)]
doc/_theme: customize sphinx_rtd_theme
* move the breadcrumbs to the top
* add border around admonition elements
* use different colors and fonts for section headers
* add decoration lines at the bottom of breadcrumbs
* remove left and right borders in tables
* override the injected versions, the name of theme
is different from "sphinx_rtd_theme", but the
versions element is still displayed at the
bottom-left corner as "versions.html" defines.
without overriding .rst-badge CSS styling,
readthedocs puts the injected versions at
the default bottom-right corner, see
https://github.com/readthedocs/readthedocs.org/blob/2a519f1146142d18f6a63b61c2f08984067280e0/readthedocs/api/v2/templates/restapi/footer.html
Sage Weil [Wed, 3 Mar 2021 14:14:29 +0000 (08:14 -0600)]
qa: new kubic distro files; use kubic podman for centos/rhel
The current centos/rhel version of podman (2.2.1) is broken.
- create new qa/distros/podman/* files that install kubic podman
- include centos/rhel variants
- adjust cephadm jobs to use new yaml files
- remove old qa/distros/all/*_podman.yaml files
I did some visual cleanup too but mostly this changeset is to support
specifying subsets for each suite type. For now, only "fs" suite is
using partitions different from rados.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
* refs/pull/39724/head:
qa: skip exit-on-first-failure option for valgrind on ubuntu
mds,qa: exit instead of respawn under valgrind
qa: skip chdir for fuse_mount
qa: ignore all slow request warnings
qa: add new mds beacon grace mon config
qa: wait for MDS to join fsmap
qa: move get_valgrind_args to qa
* refs/pull/38684/head:
qa: add _check_scrub_status helper to simplify the code
qa: add run_scrub helper in filesystem class
qa: add get_scrub_status helper in filesystem class
qa: wait the scrub task to complete
qa: remove passed_validation check for test_damage
qa: move wait_until_scrub_complete helper to filesystem class
mds: simplify the C_MDS_EnqueueScrub finish code
Reviewed-by: Rishabh Dave <ridave@redhat.com> Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Thu, 4 Mar 2021 19:31:57 +0000 (14:31 -0500)]
Merge PR #39726 into master
* refs/pull/39726/head:
mgr/cephadm: document ok_to_stop output argument for clarity
mgr/DaemonServer: make warning language a bit friendlier
mgr/cephadm/upgrade: improve language a bit
mgr/cephadm/upgrade: restart multiple osds at once
mgr/cephadm: gather other osds that are safe to stop
mgr/cephadm: optional pass 'known' through to ok_to_stop
mgr/cephadm/upgrade: log start/stop/pause/resume
Sage Weil [Thu, 4 Mar 2021 13:35:24 +0000 (08:35 -0500)]
mgr/DaemonServer: osd ok-to-stop: return json when there are unknown PGs
In 791952cc01201010f298033003ba52374cc0159f we switched to return JSON
both on success and fail to describe which PGs are affected or are blocking
the ability to stop/restart OSDs. Do the same for the case where
some PG states are unknown (i.e., just after a mgr restart) so that
the cephadm upgrade process can unconditionally expect a JSON result.
Marcus Watts [Wed, 3 Feb 2021 19:26:46 +0000 (14:26 -0500)]
rgw/kms/kmip - document configuration for a new feature: kmip kms
I've written up a brief description of using kmip
with ceph. Major features:
* ceph configuration.
* making keys with a "paste-in" python script.
* pointers to PyKMIP and IBM SKLM.
Marcus Watts [Thu, 12 Nov 2020 03:38:18 +0000 (22:38 -0500)]
rgw/kms/kmip - rgw / kmip test integration.
s3tests needs to know key names in order to run kms tests.
It seems desirable to have s3tests default to discovering
the names that were created by the pykmip task, and that
if there is more than one rgw connected to more than one
pykmip, that names belonging to the appropriate pykmip
instance should be used.
This logic does the following:
rgw task: save pykmip role name.
s3tests task: set kms_key (and kms_keyid2) to
these in order of priority
1 s3tests client task property ['kms_key'] (or ['kms_key2'])
2 first (second) secret created in the matching pykmip instance.
3 testkey-1 (testkey-2)
For case 2, names from the secrets have an initial "token-" stripped from them.
The assumption here is that rgw is being run with a setting such as
rgw crypt kmip kms key template: pykmip-$keyid
therefore "pykmip-" will be prefixed back onto the key before use.
Marcus Watts [Thu, 29 Oct 2020 16:04:36 +0000 (12:04 -0400)]
rgw/kms/kmip - correct documentation.
The pykmip task should be after ceph, and before rgw.
kmip needs ssl certs in order to function correctly.
Because the openssl_keys task has an indeterminate
order of execution, it is best to create the ca as
a separate task. The ca can be shared with rgw, but
real life deployments of kmip are likely to have their
own CA.
In order to create kmip secrets, a client certificate
is necessary, so must be supplied to the pykmip task.
Marcus Watts [Thu, 29 Oct 2020 03:40:58 +0000 (23:40 -0400)]
rgw/kms/kmip - pykmip.py needs to make keys too.
The logic to deploy pykmip in teuthology was not complete.
The necessary logic to add kmip keys was missing.
Existing logic for other key services providers could use rest based
protocols directly from the teuthology host. For kmip, it is necessary
to use a special protocol, and it is more convenient to run this directly
on the pykmip server.
Marcus Watts [Tue, 27 Oct 2020 21:16:14 +0000 (17:16 -0400)]
rgw/kms/kmip - pykmip.py should actually run pykmip.
The logic to deploy pykmip in teuthology was not complete.
While it deployed all the code and certs to run pykmip,
it didn't actually run it. This commit fixes that.
Marcus Watts [Fri, 23 Oct 2020 23:07:09 +0000 (19:07 -0400)]
rgw/kms/kmip - python3 changes for testing.
python3 requires different imports and there's a different
way to get at the first element in a view.
This is to match changes introduced in the rest of ceph in these
commits: 24e7acc261a4d7258ea7fdcd
Marcus Watts [Sun, 16 Feb 2020 02:08:29 +0000 (21:08 -0500)]
kmip: first pass at implementation logic.
This implements SSE-KMS for the radosgw using kmip.
This uses symmetric raw keys with a name attribute in kmip,
so providing the same functionality as the "kv" key store
in hashicorp vault.
Nathan Cutler [Mon, 1 Mar 2021 11:07:29 +0000 (12:07 +0100)]
rpm: use PMDK system libraries on SUSE
As of a49d1dbb32e2436ff2836a85b2fa84418f0a5fff, when the rbd_rwl_cache and
rbd_ssd_cache bconds are enabled and WITH_SYSTEM_PMDK is disabled (as it is by
default), the RPM build attempts to
git clone https://github.com/ceph/pmdk.git
but of course that won't work in the OBS, where the build workers have no
Internet connectivity.
Fortunately, the openSUSE/SLE versions targeted by Ceph master and pacific ship
the necessary PMDK libraries as RPM packages.
2021-02-27T05:52:37.813 INFO:tasks.ceph.mds.d.smithi073.stderr:==00:01:03:20.556 41218== execve(0x18546740(/proc/self/exe), 0x18546670, 0x133ef310) failed, errno 2
2021-02-27T05:52:37.813 INFO:tasks.ceph.mds.d.smithi073.stderr:==00:01:03:20.556 41218== EXEC FAILED: I can't recover from execve() failing, so I'm dying.
2021-02-27T05:52:37.813 INFO:tasks.ceph.mds.d.smithi073.stderr:==00:01:03:20.556 41218== Add more stringent tests in PRE(sys_execve), or work out how to recover.
So configure the MDS to just exit so it can be restarted by QA infra (the
daemon watchdog).
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
When running under valgrind, MDS may be slow to be added to the FSMap
(especially if mons are in valgrind too). The file system creation that
follows will throw unnecessary warnings about insufficient standbys if
no MDS is available.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>