Sage Weil [Wed, 24 Mar 2021 15:34:03 +0000 (10:34 -0500)]
Merge PR #40094 into pacific
* refs/pull/40094/head:
rgw/kms/vault - PendingReleaseNotes pointer
rgw/kms/vault - s3tests for both old and new test logic.
rgw/kms/vault - rework unit test logic for new transit logic.
rgw/kms/vault - 0 terminate before rapidjson
rgw/kms/vault - document configuration for new transit logic
rgw/kms/vault - new transit logic - fix compat logic
rgw/kms/vault - define attribute for new transit logic
rgw/kms/vault - "compat" option
rgw/kms/vault - encryption context - first part
rgw/kms/vault - define attribute to store encryption context
rgw/kms/vault - share get/set attr between rgw_crypt.cc and rgw_kms.cc
rgw/kms/vault - relax configuration parsing for rgw_crypt_vault_secret_engine
rgw/kms/vault - need libicu to make canonical json for encryption contexts.
rgw/kms/kmip - document configuration for a new feature: kmip kms
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - rgw / kmip test integration.
rgw/kms/kmip - correct documentation.
rgw/kms/kmip - pykmip.py needs to make keys too.
rgw/kms/kmip - pykmip.py should actually run pykmip.
rgw/kms/kmip - python3 changes for testing.
rgw/kms/kmip - string handling cleanup.
teuthology/rgw: pykmip task
kmip: first pass at implementation logic.
kmip: configuration options.
Including cmake build logic inside of libkmip.
cmake glue to build libkmip.
Added libkmip as a submodule.
Yin Congmin [Thu, 18 Mar 2021 13:33:07 +0000 (21:33 +0800)]
librbd/cache/pwl: fix bug of flush request blocked by deferred IO
Flush requests do not need to be queued behind the defer_io queue and
should be issued immediately. Otherwise, a deadlock can occur: dirty
data waits for the flush request, the flush request waits for the
defer_io queue to empty, and defer_io waits for dirty data to be
persisted so that space is released. This sometimes occurs when the
cache is small but the IO size or the queue depth is large.
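The deadlock cycle described above can be illustrated with a toy model. This is only a sketch, not the librbd code: the class TinyWriteLog, its methods, and the flush_bypasses_deferred switch are hypothetical names chosen for illustration, and the model tracks nothing but byte counts and queue order.

```python
from collections import deque


class TinyWriteLog:
    """Toy write-back cache with a deferred-IO queue (hypothetical model)."""

    def __init__(self, capacity, flush_bypasses_deferred):
        self.capacity = capacity
        self.dirty = 0              # bytes held by writes not yet flushed
        self.deferred = deque()     # requests waiting behind the space shortage
        self.flush_bypasses_deferred = flush_bypasses_deferred
        self.completed = []         # request names in completion order

    def write(self, name, size):
        if self.deferred or self.dirty + size > self.capacity:
            self.deferred.append(("write", name, size))  # no room yet
        else:
            self.dirty += size
            self.completed.append(name)

    def flush(self, name):
        if self.deferred and not self.flush_bypasses_deferred:
            # Old rule: the flush queues behind writes that are themselves
            # waiting for a flush to release space, so nothing ever moves.
            self.deferred.append(("flush", name, 0))
            return
        self.dirty = 0              # dirty data persisted, space released
        self.completed.append(name)
        self._dispatch_deferred()

    def _dispatch_deferred(self):
        while self.deferred:
            kind, name, size = self.deferred[0]
            if kind == "write" and self.dirty + size > self.capacity:
                return              # head of queue still does not fit
            self.deferred.popleft()
            self.dirty = 0 if kind == "flush" else self.dirty + size
            self.completed.append(name)


def scenario(flush_bypasses_deferred):
    log = TinyWriteLog(capacity=10, flush_bypasses_deferred=flush_bypasses_deferred)
    log.write("w1", 8)   # fits
    log.write("w2", 8)   # deferred: cache is full of dirty data
    log.flush("f1")
    return log


stuck = scenario(flush_bypasses_deferred=False)
fixed = scenario(flush_bypasses_deferred=True)
```

With the old rule, w2 and f1 both stay in the deferred queue forever; with the flush issued immediately, f1 releases space and w2 completes.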
Alfonso Martínez [Tue, 23 Mar 2021 10:14:11 +0000 (11:14 +0100)]
mgr/dashboard: fix error notification shown when no rgw daemons are running.
- Adapted code to changes introduced in: https://github.com/ceph/ceph/pull/40220
- Improved error handling.
- Increased test coverage.
- Some refactoring.
- Simplified documentation about setting default daemon host and port.
Fixes: https://tracker.ceph.com/issues/49655
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 58253a0002f8722abecaaf58161f6494fbe0eaa0)
Kefu Chai [Fri, 12 Mar 2021 04:02:22 +0000 (12:02 +0800)]
ceph.spec: build with system libpmem on fedora and el8
* build with WITH_SYSTEM_PMDK=ON on fedora, as f32 and f33 ship
libpmem1.8 and libpmem1.9 respectively, and we need libpmem v1.7
* build with WITH_SYSTEM_PMDK=ON on el8, as el8 and CentOS8 AppStream
ship libpmem v1.6,
quote from nvml.spec:
> By design, PMDK does not support any 32-bit architecture.
> Due to dependency on some inline assembly, PMDK can be compiled only
> on these architectures:
> - x86_64
> - ppc64le (experimental)
> - aarch64 (unmaintained, supporting hardware doesn't exist?)
so far, only x86_64 and ppc64le packages are built.
see also,
https://src.fedoraproject.org/rpms/nvml/blob/rawhide/f/nvml.spec
Sage Weil [Tue, 23 Mar 2021 00:20:48 +0000 (19:20 -0500)]
Merge PR #40184 into pacific
* refs/pull/40184/head:
qa/suites/rados/cephadm/orchestrator_cli: random-distro$ -> 0-random-distro$
qa/suites/rados/cephadm/smoke-roleless: distro -> 0-distro
qa/distros/podman: install kubic once per host, in parallel
qa/suites/fs/multiclient: use clients: not all: for pexec
Sage Weil [Tue, 23 Mar 2021 00:20:39 +0000 (19:20 -0500)]
Merge PR #40202 into pacific
* refs/pull/40202/head:
qa/suites/rados/cephadm/upgrade: wait for rgw servicemap entries to refresh
mgr/cephadm: identify iscsi service by the pool
qa/distros/podman: install containernetworking-plugins along with podman
python-common: Validate characters in service_id for container names
qa/suites/rados/cephadm/smoke-roleless: deploy additional daemon types
cephadm: fix a minor typo in logging message
qa/suites/rados/cephadm/dashboard: test on centos
cephadm: use debug verbosity during container exec
mgr/cephadm/upgrade: do not repeat crash message
mgr/cephadm/upgrade: a little less verbose
mgr/cephadm: don't log not-ok-to-stop at ERR level
mgr/cephadm: is presumed -> appears
mgr/cephadm: don't double-log ok-to-stop results
mgr/cephadm/upgrade: include upgrade progress in ceph -s
mgr/cephadm: clean up misc messages
mgr/cephadm/configcheck: do not spam info every minute
mgr/cephadm: stop conflicting daemon when deploying to a specific port
mgr/cephadm: make DaemonPlacement print nicer
mgr/cephadm: fix --force remove comment
mgr/cephadm/schedule: choose an IP from a subnet list
mgr/cephadm: rgw: clean up config and config-key values on removal
mgr/cephadm: rgw: drop .crt extension when storing cert in config-key
mgr/cephadm/services: allow beast/civetweb to bind to a particular IP
python-common: add 'networks' property to ServiceSpec
mgr/cephadm/schedule: match placement ip only combination with port
mgr/cephadm: less noise about refreshing hosts
mgr/cephadm: fall back to service spec port if none on DaemonDescription
mgr/cephadm: fix redeploy when daemons have ip:port
mgr/cephadm/schedule: add test case
qa/suites/rados/cephadm/smoke-roleless: add rgw test on many ports
doc/cephadm/rgw: update docs to show count-per-host
mgr/cephadm: add support for rgw_frontend_type (beast or civetweb)
mgr/cephadm: remove ssl_frontend_ssl_key from RGWSpec
mgr/cephadm: fix beast private key config option
mgr/cephadm: fix rgw ssl cert/key config-key path
mgr/cephadm/schedule: dynamically assign ports for rgw
mgr/cephadm/schedule: only 1 port in DaemonPlacement
mgr/cephadm: move rgw frontend logic into RgwService
mgr/cephadm/schedule: return DaemonPlacement instead of HostPlacementSpec
mgr/cephadm/schedule: remove unused methods
mgr/cephadm: propagate ip:port from CephadmDaemonDeploySpec to deployment
cephadm: populate ports if known and not included in unit.meta
mgr/cephadm: gather and report ports in 'orch ps' output
qa/suites/rados/cephadm/orchestrator_cli: random-distro$ -> 0-random-distro$
qa/suites/rados/cephadm/smoke-roleless: distro -> 0-distro
qa/distros/podman: install kubic once per host, in parallel
qa/suites/fs/multiclient: use clients: not all: for pexec
mgr/cephadm: add info to 'ceph orch upgrade status' in cephadm
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Sage Weil [Tue, 23 Mar 2021 00:20:30 +0000 (19:20 -0500)]
Merge PR #40279 into pacific
* refs/pull/40279/head:
mgr/cephadm: identify rgw, cephfs-mirror in servicemap
mgr/ServiceMap: adjust 'ceph -s' summary
rgw: register daemons in servicemap by gid; include id
cephadm: fix rbd-mirror auth name
Kefu Chai [Mon, 22 Mar 2021 06:49:13 +0000 (14:49 +0800)]
qa/distros/podman: install containernetworking-plugins along with podman
/etc/cni/net.d/87-podman-bridge.conflist tries to load "bridge",
"firewall", "tuning" and "portmap" plugins, which are provided by
containernetworking-plugins package.
Sage Weil [Thu, 18 Mar 2021 16:45:48 +0000 (11:45 -0500)]
mon/MgrStatMonitor: ignore MMgrReport from non-active mgr
If it's not the active mgr, we should ignore it.
Since the mgr instance is best identified by the gid, add that to the
message. (We can't use the source_addrs for the message since that is
the MgrStandby monc addr, not the active mgr addrs in the MgrMap.)
This fixes a problem where a just-demoted mgr report gets processed and a
new mgr gets a ServiceMap with an epoch >= its pending map. (At least,
that is my theory!)
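The gid check can be sketched as follows. This is a simplified model, not the monitor code: MgrReport, MgrStatModel, and handle_report are hypothetical names, and the only behaviour modelled is "drop the report unless its gid matches the active mgr".

```python
from dataclasses import dataclass


@dataclass
class MgrReport:
    gid: int        # global id of the sending mgr, carried in the message
    payload: str


class MgrStatModel:
    """Hypothetical stand-in for the monitor side that receives mgr reports."""

    def __init__(self, active_gid):
        self.active_gid = active_gid   # gid of the active mgr per the MgrMap
        self.accepted = []

    def handle_report(self, report):
        if report.gid != self.active_gid:
            return False               # sender is not the active mgr: ignore
        self.accepted.append(report.payload)
        return True


mon = MgrStatModel(active_gid=4200)
late = mon.handle_report(MgrReport(gid=4100, payload="stale servicemap"))
current = mon.handle_report(MgrReport(gid=4200, payload="fresh servicemap"))
```

Here the report from the just-demoted mgr (gid 4100) is dropped, while the report from the active mgr is processed.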
Josh Durgin [Wed, 17 Mar 2021 18:51:27 +0000 (14:51 -0400)]
common/options: turn off bluestore_fsck_quick_fix_on_mount by default
This option enables 3 conversions:
1) pool stats, added in nautilus
2) per-pool omap, added in octopus
3) per-pg omap (replacing (2)) in pacific
Upgrading the long-running cluster in sepia from octopus to pacific
resulted in conversion (3). This conversion isn't particularly useful
yet since the follow-on optimizations of pg removal aren't in pacific
yet.
This took 25 minutes for the SSD-based osds with <10GB of omap. That's
a lot of disruption, and some clusters have 10x that much omap data.
Upgrades going from nautilus to pacific will miss the finer-grained
stats granularity, but that isn't such an important feature that it's
worth causing potential availability problems.
In the future we can orchestrate these format changes via cephadm/rook
to minimize the impact on the whole cluster, e.g. going an osd at a
time or doing it during an off-peak period, and not necessarily at the
same time as an upgrade.
Sage Weil [Sun, 21 Mar 2021 18:25:06 +0000 (13:25 -0500)]
Merge PR #40247 into pacific
* refs/pull/40247/head:
common: reset last_log_sent when clog_to_monitors is updated
logclient: move LogChannel::set_log_to_monitors(bool v) to LogClient.cc
Sage Weil [Sun, 21 Mar 2021 14:38:49 +0000 (09:38 -0500)]
Merge PR #40129 into pacific
* refs/pull/40129/head:
osd: PeeringState: implement an acting_set_writeable() function
osd: PeeringState: fix a boolean conditional direction
osd: PeeringState: fix stretch peering so PGs can go peered but not active
osd: PeeringState: don't add acting-set OSDs to candidate set in stretch mode
osd: PeeringState: fix calc_replicated_acting_stretch() syntax/logic
osd: PeeringState: respect stretch peering constraints for async recovery
osd: PeeringState: add a comment about using size as a proxy for activateable
osd: check for is_stretch_pool() in stretch_set_can_peer()
scripts: some additions to help with local testing
script: set_up_stretch_mode: include OSDs in root=default so pg creation works
myoungwon oh [Wed, 24 Feb 2021 14:16:39 +0000 (23:16 +0900)]
osd: recover unreadable snapshots when handling manifest object
The manifest object needs adjacent clones to increment/decrement
refcount when modifying the object. So, recovering the clones is needed
if the adjacent clones are unreadable.
Kefu Chai [Sat, 20 Mar 2021 05:00:01 +0000 (13:00 +0800)]
install-deps.sh: remove existing ceph-libboost of different version
we install different versions of precompiled ceph-libboost packages
for different branches when building and testing them on ubuntu test
nodes. for instance,
- nautilus: v1.72
- octopus, pacific: v1.73
they share the same set of test nodes. and these ceph-libboost packages
conflict with each other, because they install files to the same places.
in order to avoid the conflict, we should uninstall existing packages
before installing a different version of ceph-libboost packages.
ceph-libboost${version}-dev is a package providing the shared headers of
boost library, so, in this change we check if it is installed before
returning or removing the existing packages.
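The decision described above can be sketched in a few lines. This is not the install-deps.sh logic itself (which is shell); plan_boost_cleanup and the example package names other than the ceph-libboost${version}-dev pattern are hypothetical.

```python
def plan_boost_cleanup(installed, wanted_version):
    """Return the ceph-libboost packages to remove before installing wanted_version.

    installed: iterable of installed package names
    wanted_version: version string such as "1.73"
    """
    wanted_dev = f"ceph-libboost{wanted_version}-dev"
    if wanted_dev in installed:
        return []  # the wanted version is already there, nothing to remove
    # A different version (or none) is installed: drop every ceph-libboost
    # package so the new set cannot conflict over the same file paths.
    return sorted(p for p in installed if p.startswith("ceph-libboost"))


# Hypothetical package list on a shared test node last used for nautilus.
node = {"ceph-libboost1.72-dev", "ceph-libboost-system1.72.0", "curl"}
for_pacific = plan_boost_cleanup(node, "1.73")
for_nautilus = plan_boost_cleanup(node, "1.72")
```

Checking the -dev package is enough as a marker because it carries the shared headers that every build against that boost version needs.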
Sage Weil [Fri, 19 Mar 2021 12:21:18 +0000 (08:21 -0400)]
mgr/ServiceMap: adjust 'ceph -s' summary
- Do not list individual daemon ids as this won't scale for larger
clusters
- Do not contemplate multiple daemons of the same type that register with
different "daemon_type" -- not until we actually have any that do that.
- Present counts by various groupings: distinct hosts and rgw zones to
start.
services:
mon: 1 daemons, quorum a (age 4m)
mgr: x(active, since 3m)
osd: 1 osds: 1 up (since 3m), 1 in (since 3m)
cephfs-mirror: 1 daemon active (1 hosts)
rbd-mirror: 2 daemons active (1 hosts)
rgw: 2 daemons active (1 hosts, 1 zones)
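A rough sketch of how such a per-service line could be derived from daemon metadata is shown here. The function summarize and the metadata keys "hostname" and "zone" are assumptions made for illustration; the real summary is produced in C++ by the ServiceMap code.

```python
def summarize(daemons):
    """Format a count-based summary for one service type.

    daemons: list of metadata dicts; "hostname" is expected on each,
    "zone" only on daemons that belong to an rgw zone (hypothetical keys).
    """
    count = len(daemons)
    hosts = {d["hostname"] for d in daemons}
    groupings = [f"{len(hosts)} hosts"]
    zones = {d["zone"] for d in daemons if "zone" in d}
    if zones:
        groupings.append(f"{len(zones)} zones")
    noun = "daemon" if count == 1 else "daemons"
    return f"{count} {noun} active ({', '.join(groupings)})"


rgw_line = summarize([
    {"hostname": "host1", "zone": "default"},
    {"hostname": "host1", "zone": "default"},
])
mirror_line = summarize([{"hostname": "host1"}])
```

The line length stays constant as the number of daemons grows, which is the point of dropping the individual daemon ids.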
Sage Weil [Fri, 19 Mar 2021 12:25:23 +0000 (08:25 -0400)]
rgw: register daemons in servicemap by gid; include id
Registering by gid allows multiple radosgw instances to share an auth
key/identity. Including the id in the metadata allows them to still be
identified by name (even if not uniquely).
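The effect of the registration key can be shown with a small model. This is only an illustration: register_all, the dict layout, and the gid values are hypothetical, not the servicemap API.

```python
def register_all(instances, key_field):
    """Build a servicemap-like dict keyed by the chosen field of each instance."""
    daemons = {}
    for inst in instances:
        key = str(inst[key_field])
        # The id always goes into the metadata so the daemon can still be
        # identified by name, even when the name is not unique.
        daemons[key] = {"id": inst["id"], "hostname": inst["hostname"]}
    return daemons


# Two radosgw instances sharing one auth identity ("foo").
instances = [
    {"gid": 4151, "id": "foo", "hostname": "host1"},
    {"gid": 4187, "id": "foo", "hostname": "host2"},
]
by_name = register_all(instances, "id")
by_gid = register_all(instances, "gid")
```

Keyed by name, the second instance overwrites the first; keyed by gid, both entries survive and each still reports id "foo" in its metadata.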
Kefu Chai [Fri, 19 Mar 2021 04:05:45 +0000 (12:05 +0800)]
pybind/mgr/dashboard: bump flake8 to 3.9.0
to address the failure of
ERROR: Cannot install -r requirements-lint.txt (line 2) and -r requirements-lint.txt (line 8) because these package versions have conflicting dependencies.
The conflict is caused by:
flake8 3.8.4 depends on pycodestyle<2.7.0 and >=2.6.0a1
autopep8 1.5.6 depends on pycodestyle>=2.7.0
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
Adam C. Emerson [Tue, 19 Jan 2021 19:47:27 +0000 (14:47 -0500)]
rgw: Fix spurious error on empty datalog shard
-ENOENT on a shard simply means nothing has been written to it
yet. Return no entries and no error.
Also change dout_subsys target for fifo client so probes don't fill up
the logs.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Fixes: https://tracker.ceph.com/issues/48929
(cherry picked from commit 8c5c7c7a9098fc688f63503b254065e5b3b4ae45)
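The -ENOENT handling can be sketched as below. The names list_entries and fake_read_shard are hypothetical, and the (ret, entries) return convention is an assumption that mimics negative-errno style return codes.

```python
import errno


def list_entries(read_shard, shard_id):
    """Read one datalog shard, treating a missing shard as empty.

    read_shard: callable returning (ret, entries), with ret < 0 on error
    """
    ret, entries = read_shard(shard_id)
    if ret == -errno.ENOENT:
        # Nothing has been written to this shard yet: no entries, no error.
        return 0, []
    return ret, entries


def fake_read_shard(shard_id):
    """Hypothetical backend: shard 0 has data, shard 2 fails, the rest do not exist."""
    if shard_id == 0:
        return 0, ["entry-a", "entry-b"]
    if shard_id == 2:
        return -errno.EIO, []
    return -errno.ENOENT, []


written = list_entries(fake_read_shard, 0)
empty = list_entries(fake_read_shard, 1)
broken = list_entries(fake_read_shard, 2)
```

Only -ENOENT is swallowed; any other error code is passed through unchanged.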
Gaurav Sitlani [Tue, 9 Feb 2021 19:24:41 +0000 (00:54 +0530)]
doc: added missing documentation on "pubsub" in rgw_enable_apis
Fixes: https://tracker.ceph.com/issues/49203
Adding support for the term "notifications" in rgw_enable_apis
rgw/sts: fix for encoding/decoding user namespace
in RGWUserInfo, to fetch the user details correctly
for OIDC users which are created in 'oidc' namespace.
Matt Benjamin [Tue, 12 Jan 2021 22:14:57 +0000 (17:14 -0500)]
test/rgw_file: elaborate test cycle
Ensure that all delete phases are run so that script can be
re-run when desired.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 247d00817d71548d156f8855692baf1dce3a95e2)
cmake: install the ceph_test_librgw_file_* targets
these need to be installed in order to be included in packages for
testing in teuthology
Signed-off-by: Casey Bodley <cbodley@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit cefd31b6f6192cab0a50c973ac02ef0a83a3afba)
Gerald Yang [Wed, 3 Mar 2021 04:37:15 +0000 (04:37 +0000)]
common: reset last_log_sent when clog_to_monitors is updated
When clog_to_monitors is disabled, "last_log" still keeps increasing via
get_next_seq() whenever the OSD writes to the clog, but "last_log_sent"
doesn't increase. If we disable clog_to_monitors for a while and then
re-enable it, num_unsent can be bigger than log_queue_size(), which
triggers an assertion in _get_mon_log_message.
We need to reset last_log_sent to last_log before updating clog_to_monitors.
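The counter drift can be reproduced with a toy model. This is a sketch under simplifying assumptions, not the LogClient code: LogClientModel and the reset parameter are hypothetical, and clearing the queue on reset is an assumption of the model.

```python
class LogClientModel:
    """Toy model of the sequence counters involved (hypothetical)."""

    def __init__(self):
        self.last_log = 0          # last sequence number handed out
        self.last_log_sent = 0     # last sequence number sent to the monitors
        self.queue = []            # entries waiting to be sent
        self.to_monitors = True

    def log(self, msg):
        self.last_log += 1         # stands in for get_next_seq()
        if self.to_monitors:
            self.queue.append((self.last_log, msg))

    def num_unsent(self):
        return self.last_log - self.last_log_sent

    def set_log_to_monitors(self, value, reset):
        if reset:
            # The fix: resynchronise before flipping the option.
            self.queue.clear()
            self.last_log_sent = self.last_log
        self.to_monitors = value


def scenario(reset):
    client = LogClientModel()
    client.set_log_to_monitors(False, reset)
    for i in range(3):
        client.log(f"message {i}")     # sequence advances, nothing is queued
    client.set_log_to_monitors(True, reset)
    return client.num_unsent(), len(client.queue)


without_reset = scenario(reset=False)
with_reset = scenario(reset=True)
```

Without the reset, the model reports three unsent entries against an empty queue, which is the inconsistent state the assertion catches; with the reset, both values are zero.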