Alfonso Martínez [Tue, 23 Mar 2021 10:14:11 +0000 (11:14 +0100)]
mgr/dashboard: fix error notification shown when no rgw daemons are running.
- Adapted code to changes introduced in: https://github.com/ceph/ceph/pull/40220
- Improved error handling.
- Increased test coverage.
- Some refactoring.
- Simplified documentation about setting default daemon host and port.
Fixes: https://tracker.ceph.com/issues/49655
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Patrick Donnelly [Tue, 23 Mar 2021 03:03:41 +0000 (20:03 -0700)]
Merge PR #40146 into master
* refs/pull/40146/head:
test: pass peer uuid when adding cephfs mirror peers
mon: check cephfs mirror peer based on remote cluster spec and file system name
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Tue, 23 Mar 2021 03:03:02 +0000 (20:03 -0700)]
Merge PR #40145 into master
* refs/pull/40145/head:
doc: add note about disabling standby-replay during upgrades
qa: add test for standby-replay disable
mon: fail standby-replay daemons when flag is turned off
Sage Weil [Tue, 23 Mar 2021 01:57:45 +0000 (21:57 -0400)]
Merge PR #39435 into master
* refs/pull/39435/head:
mgr/cephadm: redeploy daemons deployed using old image during upgrade
mgr/cephadm: add container digests of mgr that deployed daemon to unit.meta
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Patrick Donnelly [Mon, 22 Mar 2021 17:06:08 +0000 (10:06 -0700)]
Merge PR #39191 into master
* refs/pull/39191/head:
pybind/mgr/snap_schedule: use ceph VFS
pybind/mgr/snap_schedule: idempotentize table creation
mgr: add ceph sqlite VFS
doc: add libcephsqlite
ceph.spec,debian: package libcephsqlite
test/libcephsqlite,qa: add tests for libcephsqlite
libcephsqlite: rework architecture and backend
SimpleRADOSStriper: wait for finished aios after write
SimpleRADOSStriper: add new minimal async striper
mon: define simple-rados-client-with-blocklist profile
librados: define must renew lock flag
common: add timeval conversion for durations
Revert "libradosstriper: add function to read into char*"
test_libcephsqlite: test random inserts
cephsqlite: fix compiler errors
cmake: improve build inst for cephsqlite
libcephsqlite: sqlite interface to RADOS
libradosstriper: add function to read into char*
Kefu Chai [Mon, 22 Mar 2021 06:49:13 +0000 (14:49 +0800)]
qa/distros/podman: install containernetworking-plugins along with podman
/etc/cni/net.d/87-podman-bridge.conflist tries to load the "bridge",
"firewall", "tuning" and "portmap" plugins, which are provided by the
containernetworking-plugins package.
John Fulton [Wed, 17 Mar 2021 22:03:46 +0000 (18:03 -0400)]
mgr/cephadm: retry after JSONDecodeError in wait_for_mgr_restart()
'ceph mgr dump' does not always return valid JSON, so cephadm
sometimes throws an exception when applying a spec, as described in
the issue this PR closes. Add a try/except to catch a possible
JSONDecodeError and retry after sleeping.
Fixes: https://tracker.ceph.com/issues/49870
Signed-off-by: John Fulton <fulton@redhat.com>
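The retry described above can be sketched as follows. This is a minimal illustration, not cephadm's actual `wait_for_mgr_restart()` code: the `run_mgr_dump` callable (standing in for running `ceph mgr dump`), the attempt limit and the sleep interval are all assumptions made for the example.

```python
import json
import time
from typing import Any, Callable, Dict


def get_mgr_dump_with_retry(
    run_mgr_dump: Callable[[], str],
    max_attempts: int = 5,
    sleep_seconds: float = 1.0,
) -> Dict[str, Any]:
    """Parse the output of 'ceph mgr dump', retrying on invalid JSON.

    run_mgr_dump is a hypothetical callable returning the raw command
    output; the attempt limit and sleep interval are illustrative only.
    """
    last_error = None
    for attempt in range(max_attempts):
        out = run_mgr_dump()
        try:
            return json.loads(out)
        except json.JSONDecodeError as e:
            # Output may be empty or partial (e.g. while the mgr restarts):
            # remember the error, sleep, and try again instead of failing.
            last_error = e
            if attempt + 1 < max_attempts:
                time.sleep(sleep_seconds)
    raise RuntimeError(
        f"'ceph mgr dump' never returned valid JSON: {last_error}"
    ) from last_error
```

Bounding the number of attempts is a choice of this sketch, so that a persistently broken mgr surfaces as an error rather than an endless loop.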
Xuehan Xu [Thu, 18 Mar 2021 03:15:51 +0000 (11:15 +0800)]
crimson/os/alienstore: add default behaviour for alien threads affinities
currently, we allow alienstore threads to be scheduled onto any cpu core
other than the first three, as most current tests use those cores for
the crimson-osd seastar threads.
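As a rough illustration of that default, the CPU set left to alien threads is every core except the first three. The function name, the constant and the fallback for machines with three or fewer cores are hypothetical, not crimson's implementation.

```python
import os
from typing import Optional, Set

# Leading cores assumed to be used by crimson-osd seastar threads in tests.
RESERVED_SEASTAR_CORES = 3


def default_alien_cpu_set(num_cpus: Optional[int] = None) -> Set[int]:
    """Return the CPUs alien threads may run on by default.

    Every core except the first RESERVED_SEASTAR_CORES is allowed. If no
    core is left, fall back to all cores (a hypothetical policy for this
    sketch, since an empty affinity set is invalid).
    """
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    spare = set(range(RESERVED_SEASTAR_CORES, num_cpus))
    return spare or set(range(num_cpus))
```

On Linux, the result could be applied to the calling thread with `os.sched_setaffinity(0, default_alien_cpu_set())`.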
Sage Weil [Sat, 20 Mar 2021 13:15:58 +0000 (09:15 -0400)]
Merge PR #40220 into master
* refs/pull/40220/head:
mgr/cephadm: identify rgw, cephfs-mirror in servicemap
mgr/ServiceMap: adjust 'ceph -s' summary
rgw: register daemons in servicemap by gid; include id
cephadm: fix rbd-mirror auth name
Kefu Chai [Sat, 20 Mar 2021 05:00:01 +0000 (13:00 +0800)]
install-deps.sh: remove existing ceph-libboost of different version
we install different versions of precompiled ceph-libboost packages
for different branches when building and testing them on ubuntu test
nodes. for instance,
- nautilus: v1.72
- octopus, pacific: v1.73
they share the same set of test nodes, and these ceph-libboost packages
conflict with each other because they install files to the same paths.
to avoid the conflict, we should uninstall the existing packages before
installing a different version of the ceph-libboost packages.
ceph-libboost${version}-dev is the package providing the shared headers of
the boost library, so in this change we check whether it is installed
before either returning early or removing the existing packages.
Sage Weil [Fri, 19 Mar 2021 20:42:14 +0000 (16:42 -0400)]
Merge PR #40242 into master
* refs/pull/40242/head:
mgr/cephadm/upgrade: do not repeat crash message
mgr/cephadm/upgrade: a little less verbose
mgr/cephadm: don't log not-ok-to-stop at ERR level
mgr/cephadm: is presumed -> appears
mgr/cephadm: don't double-log ok-to-stop results
mgr/cephadm/upgrade: include upgrade progress in ceph -s
Sage Weil [Fri, 19 Mar 2021 12:21:18 +0000 (08:21 -0400)]
mgr/ServiceMap: adjust 'ceph -s' summary
- Do not list individual daemon ids, as this won't scale for larger
clusters.
- Do not account for multiple daemons of the same type registering with
different "daemon_type" values until some daemons actually do that.
- Present counts by various groupings, starting with distinct hosts and
rgw zones.
Example output:
  services:
    mon: 1 daemons, quorum a (age 4m)
    mgr: x(active, since 3m)
    osd: 1 osds: 1 up (since 3m), 1 in (since 3m)
    cephfs-mirror: 1 daemon active (1 hosts)
    rbd-mirror: 2 daemons active (1 hosts)
    rgw: 2 daemons active (1 hosts, 1 zones)
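A minimal sketch of the grouping behind such summary lines, assuming each daemon's metadata is a dict with hypothetical `hostname` and `zone_name` keys (the real ServiceMap code is C++ and its field names may differ). It mirrors the sample output: "daemon" is singular for a single daemon, while the group counts keep the plural nouns as printed.

```python
from typing import Dict, List


def summarize_service(daemons: List[Dict[str, str]]) -> str:
    """Build a 'ceph -s'-style summary line for one service.

    Individual daemon ids are deliberately not listed, so the line
    stays short on large clusters; only grouped counts are shown.
    """
    n = len(daemons)
    noun = "daemon" if n == 1 else "daemons"
    # Count distinct hosts; every daemon is assumed to report a hostname.
    hosts = {d["hostname"] for d in daemons if "hostname" in d}
    groups = [f"{len(hosts)} hosts"]
    # Only services whose daemons report a zone (e.g. rgw) get a zone count.
    zones = {d["zone_name"] for d in daemons if "zone_name" in d}
    if zones:
        groups.append(f"{len(zones)} zones")
    return f"{n} {noun} active ({', '.join(groups)})"
```

For two rgw daemons on one host in one zone this yields `2 daemons active (1 hosts, 1 zones)`.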
Kefu Chai [Thu, 11 Mar 2021 13:13:13 +0000 (21:13 +0800)]
mon/OSDMonitor: drop stale failure_info
failure_info keeps strong references to the MOSDFailure messages
sent by osds or peon monitors. whenever the monitor starts to handle
an MOSDFailure message, it registers the message in its OpTracker, and
the failure report message is unregistered when the monitor acks it,
either by canceling it or by replying to the reporters with a new
osdmap marking the target osd down. if neither happens, the failure
reports just pile up in the OpTracker, where the monitor considers
them slow ops and reports them as a SLOW_OPS health warning.
in theory, it does not take long to mark an unresponsive osd down if
we have enough reporters. but there is a chance that a reporter fails
to cancel its report before it reboots, while the monitor also fails
to collect enough reports to mark the target osd down. the target osd
then never receives an osdmap marking it down, so it never sends the
monitor an alive message that would fix this.
in this change, we check for stale failure info in tick() and simply
drop the stale reports, so the messages can be released and marked
"done".
this adds a call that trims failures inside the loop, and that call
mutates failure_info while we are still iterating over the map, so the
loop has to be restructured a little bit.
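The restructured loop can be illustrated with a Python sketch: dropping entries while walking a map requires iterating over a snapshot of its keys. `FailureReport`, `drop_stale_failures`, the `max_age` threshold and the `mark_done` callback are hypothetical stand-ins for OSDMonitor's failure_info bookkeeping and OpTracker, not its real types.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FailureReport:
    reporter: str
    received_at: float  # seconds on a monotonic clock


def drop_stale_failures(
    failure_info: Dict[int, Dict[str, FailureReport]],
    now: float,
    max_age: float,
    mark_done: Callable[[FailureReport], None],
) -> None:
    """Remove failure reports older than max_age, as a periodic tick might.

    failure_info maps target osd id -> reporter name -> report.
    """
    # Iterate over snapshots of the keys: the loop body deletes entries,
    # and deleting from a dict while iterating it directly raises RuntimeError.
    for target in list(failure_info):
        reports = failure_info[target]
        for reporter in list(reports):
            report = reports[reporter]
            if now - report.received_at > max_age:
                del reports[reporter]
                mark_done(report)  # let the tracked op be released
        if not reports:
            # No reports left for this osd: drop the whole entry.
            del failure_info[target]
```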
Kefu Chai [Thu, 11 Mar 2021 09:45:49 +0000 (17:45 +0800)]
mon/OSDMonitor: do not return no_reply() again
we always send a "no_op" message to the proxy monitor at the very
beginning of `OSDMonitor::prepare_failure()`, so there is no need to
reply to the peon again when discarding the failure report.
Kefu Chai [Thu, 11 Mar 2021 09:09:57 +0000 (17:09 +0800)]
mon/Monitor: early return if routed request is not found
* return early if the routed request is not found in routed_requests,
  reducing the indent level for better readability.
* do not look up the request twice, for better performance.
* use unique_ptr<> to hold the request, for better readability.
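In Python terms, the single-lookup, early-return pattern looks like the sketch below; `dict.pop(key, None)` finds and removes the entry in one step, loosely analogous to moving a `unique_ptr` out of the map. `RoutedRequest` and `handle_route` here are simplified hypothetical stand-ins, not Monitor's C++ code.

```python
from typing import Dict, Optional


class RoutedRequest:
    """Hypothetical, simplified record of a request forwarded by a peon."""

    def __init__(self, tid: int, session: str) -> None:
        self.tid = tid
        self.session = session


def handle_route(
    routed_requests: Dict[int, RoutedRequest], tid: int, reply: str
) -> Optional[str]:
    # One lookup that also removes the entry and takes ownership of it.
    rr = routed_requests.pop(tid, None)
    if rr is None:
        # Early return: the main path below stays at one indent level.
        return None
    return f"forward {reply} to {rr.session}"
```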
Patrick Donnelly [Thu, 28 Jan 2021 23:04:01 +0000 (15:04 -0800)]
SimpleRADOSStriper: add new minimal async striper
This was developed because the two other striper implementations were
unsuitable for libcephsqlite:
- libradosstriper: while the async APIs exist, its current protocol
requires synchronously locking an object for every write/read, whether
or not that operation is async. For this reason, it's far too slow
for latency-sensitive applications.
- osdc/Filer: this requires the object name to be an inode number. It
also carries other overhead, including caching/buffering, which
libcephsqlite does not need.
SimpleRADOSStriper aims to be a minimalistic, heavily asynchronous
striper. One way it achieves this is through the use of exclusive locks
to protect access to the striped objects. Most metadata updates are
deferred until the striped file is unlocked, flushed, or closed. All
reads/writes are asynchronous (but a read implicitly gathers async
striped reads for each op). Writes are not buffered. Reads are not
cached. There is no readahead.
SimpleRADOSStriper aims to be compatible with the rados binary's
--striper option for extracting files out of RADOS, but it should not
be used for anything else.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
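To illustrate what gathering striped reads for each op involves, here is a generic sketch that splits a byte range into per-object pieces, each of which could be issued as its own async RADOS op. The fixed 4 MiB object size and the (object index, offset in object, length) layout are assumptions; the commit does not describe SimpleRADOSStriper's actual layout.

```python
from typing import List, Tuple

# Hypothetical fixed size of each stripe object (4 MiB).
OBJECT_SIZE = 4 * 1024 * 1024


def split_extent(
    offset: int, length: int, object_size: int = OBJECT_SIZE
) -> List[Tuple[int, int, int]]:
    """Split a file byte range into (object_index, offset_in_object, length).

    Each piece maps to exactly one stripe object, so the pieces can be
    submitted concurrently and a read reassembles them in order.
    """
    if offset < 0 or length < 0 or object_size <= 0:
        raise ValueError("offset/length must be >= 0 and object_size > 0")
    extents = []
    end = offset + length
    while offset < end:
        index = offset // object_size
        start_in_obj = offset % object_size
        # Stop at whichever comes first: the object boundary or the range end.
        n = min(object_size - start_in_obj, end - offset)
        extents.append((index, start_in_obj, n))
        offset += n
    return extents
```

For example, with 8-byte objects, `split_extent(3, 10, object_size=8)` gives `[(0, 3, 5), (1, 0, 5)]`.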
Patrick Donnelly [Mon, 22 Feb 2021 03:19:25 +0000 (19:19 -0800)]
librados: define must renew lock flag
This flag already exists in cls_lock but was not made externally
available via librados. Additionally, cls_lock internally refers to the
_RENEW flag as _MAY_RENEW; add an alias for librados to match.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This library provides a SQLite front-end to RADOS objects.
This effort will help alleviate the restriction on the number of key-value
pairs that can be stored in an object.
The interface is generic and imposes no constraint on the database
schema. Library clients can enforce any schema and use the SQLite API
to store data in a database backed by RADOS objects.