Sage Weil [Mon, 22 Mar 2021 22:05:16 +0000 (18:05 -0400)]
cephadm: only bootstrap using image that matches cephadm version
Only allow bootstrap to deploy if the cephadm version matches the
ceph version in the container. Allow the master branch version of cephadm
to deploy the latest stable version as well (at least for now).
Provide a flag to force bootstrap to continue despite the check.
Move the _pull_image call up into bootstrap so that it is easier to see
when it happens.
Fixes: https://tracker.ceph.com/issues/49884
Signed-off-by: Sage Weil <sage@newdream.net>
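The check described in the commit above can be sketched roughly as follows. This is an illustration only, not the cephadm code: the function names, the `cephadm_is_master`, `latest_stable_major` and `force` parameters, and the choice to compare major release numbers are all assumptions made for the example.

```python
import re
from typing import Optional


def parse_major(version_string: str) -> Optional[int]:
    """Return the major release number found in a version string.

    Hypothetical helper: accepts strings such as '16.2.0' or
    'ceph version 16.2.0 (...) pacific (stable)'.
    """
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", version_string)
    return int(m.group(1)) if m else None


def bootstrap_allowed(cephadm_version: str,
                      image_version: str,
                      cephadm_is_master: bool = False,
                      latest_stable_major: int = 16,
                      force: bool = False) -> bool:
    """Decide whether bootstrap may proceed (assumed policy, see lead-in)."""
    if force:
        # The override flag skips the check entirely.
        return True
    cephadm_major = parse_major(cephadm_version)
    image_major = parse_major(image_version)
    if cephadm_major is None or image_major is None:
        # Unparseable versions are treated as a mismatch.
        return False
    if cephadm_major == image_major:
        return True
    # A master-branch cephadm may also deploy the latest stable release.
    return cephadm_is_master and image_major == latest_stable_major
```

With these assumptions, a mismatched pair is rejected unless the caller passes `force=True`, which mirrors the "flag to force bootstrap to continue" mentioned in the commit message.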
Patrick Donnelly [Mon, 22 Mar 2021 17:06:08 +0000 (10:06 -0700)]
Merge PR #39191 into master
* refs/pull/39191/head:
pybind/mgr/snap_schedule: use ceph VFS
pybind/mgr/snap_schedule: idempotentize table creation
mgr: add ceph sqlite VFS
doc: add libcephsqlite
ceph.spec,debian: package libcephsqlite
test/libcephsqlite,qa: add tests for libcephsqlite
libcephsqlite: rework architecture and backend
SimpleRADOSStriper: wait for finished aios after write
SimpleRADOSStriper: add new minimal async striper
mon: define simple-rados-client-with-blocklist profile
librados: define must renew lock flag
common: add timeval conversion for durations
Revert "libradosstriper: add function to read into char*"
test_libcephsqlite: test random inserts
cephsqlite: fix compiler errors
cmake: improve build inst for cephsqlite
libcephsqlite: sqlite interface to RADOS
libradosstriper: add function to read into char*
Kefu Chai [Mon, 22 Mar 2021 06:49:13 +0000 (14:49 +0800)]
qa/distros/podman: install containernetworking-plugins along with podman
/etc/cni/net.d/87-podman-bridge.conflist tries to load "bridge",
"firewall", "tuning" and "portmap" plugins, which are provided by
containernetworking-plugins package.
Sage Weil [Sat, 20 Mar 2021 13:15:58 +0000 (09:15 -0400)]
Merge PR #40220 into master
* refs/pull/40220/head:
mgr/cephadm: identify rgw, cepfs-mirror in servicemap
mgr/ServiceMap: adjust 'ceph -s' summary
rgw: register daemons in servicemap by gid; include id
cephadm: fix rbd-mirror auth name
Kefu Chai [Sat, 20 Mar 2021 05:00:01 +0000 (13:00 +0800)]
install-deps.sh: remove existing ceph-libboost of different version
we install different versions of precompiled ceph-libboost packages
for different branches when building and testing them on ubuntu test
nodes. for instance,
- nautilus: v1.72
- octopus, pacific: v1.73
they share the same set of test nodes, and these ceph-libboost packages
conflict with each other because they install files to the same places.
to avoid the conflict, we should uninstall the existing packages
before installing a different version of the ceph-libboost packages.
ceph-libboost${version}-dev is a package providing the shared headers of
boost library, so, in this change we check if it is installed before
returning or removing the existing packages.
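The decision described in the install-deps.sh commit above can be sketched in Python as below. install-deps.sh itself is a shell script; this is only an illustration of the logic. The function name, the package-name prefix match, and the example package names other than ceph-libboost${version}-dev are assumptions.

```python
def plan_boost_install(installed, wanted_version):
    """Return (packages_to_remove, need_install) for ceph-libboost.

    `installed` is a set of installed package names (hypothetical input).
    If the -dev package of the wanted version is already present, nothing
    is removed and no install is needed. Otherwise every existing
    ceph-libboost package is scheduled for removal first.
    """
    dev_pkg = f"ceph-libboost{wanted_version}-dev"
    if dev_pkg in installed:
        return [], False
    stale = sorted(p for p in installed if p.startswith("ceph-libboost"))
    return stale, True


# Example: a node that last built nautilus now builds octopus.
to_remove, need_install = plan_boost_install(
    {"ceph-libboost1.72-dev", "ceph-libboost-thread1.72.0", "vim"}, "1.73")
print(to_remove, need_install)
```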
Sage Weil [Fri, 19 Mar 2021 20:42:14 +0000 (16:42 -0400)]
Merge PR #40242 into master
* refs/pull/40242/head:
mgr/cephadm/upgrade: do not repeat crash message
mgr/cephadm/upgrade: a little less verbose
mgr/cephadm: don't log not-ok-to-stop at ERR level
mgr/cephadm: is presumed -> appears
mgr/cephadm: don't double-log ok-to-stop results
mgr/cephadm/upgrade: include upgrade progress in ceph -s
Sage Weil [Fri, 19 Mar 2021 12:21:18 +0000 (08:21 -0400)]
mgr/ServiceMap: adjust 'ceph -s' summary
- Do not list individual daemon ids as this won't scale for larger
clusters
- Do not contemplate multiple daemons of the same type that register with
different "daemon_type" -- not until we actually have any that do that.
- Present counts by various groupings: distinct hosts and rgw zones to
start.
services:
mon: 1 daemons, quorum a (age 4m)
mgr: x(active, since 3m)
osd: 1 osds: 1 up (since 3m), 1 in (since 3m)
cephfs-mirror: 1 daemon active (1 hosts)
rbd-mirror: 2 daemons active (1 hosts)
rgw: 2 daemons active (1 hosts, 1 zones)
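The grouping behind the summary lines above can be sketched as follows. This is not the ServiceMap implementation (which is C++); the input layout, with `type`, `hostname` and an optional `zone` key per daemon, is an assumption made for the example.

```python
from collections import defaultdict


def summarize(daemons):
    """Build one summary line per service type, counting distinct hosts
    and, where present, distinct zones, instead of listing daemon ids."""
    by_type = defaultdict(list)
    for d in daemons:
        by_type[d["type"]].append(d)

    lines = []
    for svc in sorted(by_type):
        members = by_type[svc]
        groups = [f"{len({m['hostname'] for m in members})} hosts"]
        zones = {m["zone"] for m in members if "zone" in m}
        if zones:
            groups.append(f"{len(zones)} zones")
        noun = "daemon" if len(members) == 1 else "daemons"
        lines.append(f"{svc}: {len(members)} {noun} active ({', '.join(groups)})")
    return lines


example = [
    {"type": "cephfs-mirror", "hostname": "a"},
    {"type": "rbd-mirror", "hostname": "a"},
    {"type": "rbd-mirror", "hostname": "a"},
    {"type": "rgw", "hostname": "a", "zone": "default"},
    {"type": "rgw", "hostname": "a", "zone": "default"},
]
for line in summarize(example):
    print(line)
```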
Kefu Chai [Thu, 11 Mar 2021 13:13:13 +0000 (21:13 +0800)]
mon/OSDMonitor: drop stale failure_info
failure_info keeps strong references to the MOSDFailure messages
sent by osds or peon monitors. whenever the monitor starts to handle
an MOSDFailure message, it registers it in its OpTracker, and
the failure report message is unregistered when the monitor acks it
by either canceling it or replying to the reporter with a new
osdmap marking the target osd down. but if this does not happen,
the failure reports just pile up in OpTracker, the monitor considers
them slow ops, and they are reported as a SLOW_OPS health warning.
in theory, it does not take long to mark an unresponsive osd down if
we have enough reporters. but there is a chance that a reporter fails
to cancel its report before it reboots, and the monitor also fails
to collect enough reports to mark the target osd down. so the
target osd never gets an osdmap marking it down, and it won't send
an alive message to the monitor to fix this.
in this change, we check for stale failure info in tick() and
simply drop the stale reports, so the messages can be released and
marked "done".
we will add a trim failures call in the loop, which mutates failure_info
while we are still iterating this map, so we have to restructure the loop
a little bit.
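The idea of dropping stale reports while walking a map that is being mutated can be sketched in Python as below. This is not the OSDMonitor code: the layout of `failure_info` (target osd id mapped to reporter names and report times), the `max_age` threshold, and the function name are all assumptions for illustration.

```python
def drop_stale_failures(failure_info, now, max_age):
    """Remove failure reports older than max_age seconds.

    failure_info maps target osd id -> {reporter: reported_at}.
    Both levels are iterated over snapshots (list(...)) because entries
    are deleted from the live dicts inside the loop.
    Returns the list of (target, reporter) pairs that were dropped.
    """
    dropped = []
    for target in list(failure_info):
        reporters = failure_info[target]
        for reporter, reported_at in list(reporters.items()):
            if now - reported_at > max_age:
                del reporters[reporter]
                dropped.append((target, reporter))
        if not reporters:
            # No reports left for this target: forget it entirely.
            del failure_info[target]
    return dropped
```

Iterating over a snapshot is the Python counterpart of restructuring a C++ loop so that erasing an element does not invalidate the iterator in use.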
Kefu Chai [Thu, 11 Mar 2021 09:45:49 +0000 (17:45 +0800)]
mon/OSDMonitor: do not return no_reply() again
we always return a "no_op" message to the proxy monitor at the very
beginning of `OSDMonitor::prepare_failure()`, so there is no need to
reply to the peon again when discarding the failure report.
Kefu Chai [Thu, 11 Mar 2021 09:09:57 +0000 (17:09 +0800)]
mon/Monitor: early return if routed request is not found
* early return if routed request is not found in routed_requests.
reduce the indent level, for better readability.
* do not look up the request twice. for better performance.
* use unique_ptr<> for holding the request, for better readability
Patrick Donnelly [Thu, 28 Jan 2021 23:04:01 +0000 (15:04 -0800)]
SimpleRADOSStriper: add new minimal async striper
This was developed because the two other striper implementations were
unsuitable for libcephsqlite:
- libradosstriper: while the async APIs exist, its current protocol
requires synchronously locking an object for every write/read whether
that operation is async or not. For this reason, it's far too slow
for latency sensitive applications.
- osdc/Filer: this requires the object name to be an inode number. It
also comes with other overhead that is not necessary for
libcephsqlite, including caching/buffering.
SimpleRADOSStriper aims to be a minimalistic heavily asynchronous
striper. One way it achieves this is through the use of exclusive locks
to protect access to the striped objects. Most metadata updates are
deferred until the striped file is unlocked, flushed, (or closed). All
reads/writes are asynchronous (but a read implicitly gathers async
striped reads for each op). Writes are not buffered. Reads are not
cached. There is no readahead.
SimpleRADOSStriper aims to be compatible with the rados binary --striper
option for extracting files out of RADOS but it should not be used
otherwise.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Mon, 22 Feb 2021 03:19:25 +0000 (19:19 -0800)]
librados: define must renew lock flag
This flag already exists in cls_lock but was not made externally
available via librados. Additionally, internally cls_lock refers to the
_RENEW flag as _MAY_RENEW, add an alias for librados to match.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This library provides a SQLite front-end to the RADOS objects.
This effort will help alleviate the restriction on the number of key-value pairs
that can be stored in an object.
This interface is a generic one without any constraint on the database
schema either. Library clients can enforce any schema and use SQLite API
to store data in the database backed by RADOS Objects.
Sage Weil [Fri, 19 Mar 2021 12:25:23 +0000 (08:25 -0400)]
rgw: register daemons in servicemap by gid; include id
Registering by gid allows multiple radosgw instances to share an auth
key/identity. Including the id in the metadata allows them to still be
identified by name (even if not uniquely).
Kefu Chai [Fri, 19 Mar 2021 02:32:16 +0000 (10:32 +0800)]
test: run promtool test without docker on ubuntu/focal
before this change, we used docker to run promtool offered by
a docker image, but this is not efficient, and quite a few developers
do not want to use docker for running "make check". that approach was
introduced by #39246. the reason was that, in Ceph's CI process, we
are using Ubuntu/Bionic for running "make check" jobs, but prometheus
packaged by Bionic does not offer the "test rules" command. so, to
address this problem, we used the "dnanexus/promtool:2.9.2" docker image
for verifying monitoring/prometheus/alerts/test_alerts.yml.
after this change, we use prometheus packaged by debian derivatives
instead of pulling a docker image.
* debian/control: add prometheus as a "make check" dependency
* install-deps.sh: partially revert 53a5816deda0874a3a37e131e9bc22d88bb2a588, as we don't need to
pull docker or start docker service for using promtool anymore.
* cmake: check if promtool is capable of running "test rules"
command, bail out if it is not.
Kefu Chai [Thu, 18 Mar 2021 11:50:58 +0000 (19:50 +0800)]
install-deps.sh: install boost 1.75 on focal
we bump boost on a regular basis. let's take the opportunity of moving to
focal to use boost v1.75.
v1.73 was used before this change. since both boost 1.75 and boost 1.73
install some files at the same places, we need to remove boost 1.73
before installing boost 1.75.
Kefu Chai [Thu, 18 Mar 2021 11:43:06 +0000 (19:43 +0800)]
install-deps.sh: install libzbd on focal
WITH_ZBD is enabled for testing the build of zbd bluestore backend, and
we plan to migrate to Ubuntu/Focal for testing "make check", so we need to
install libzbd when the distro version is focal.
Kefu Chai [Fri, 19 Mar 2021 12:20:32 +0000 (20:20 +0800)]
osd/PeeringState: remove unused variable
recovery_ec_pool_below_min_size was used to verify if the osds in the cluster
are octopus and up, but since we are now quincy and up, there is no need
to verify this. so drop it for better readability and for silencing
the -Wunused-variable warning in Release build.