rpm: add "Provides: python3-*" for python packages
so users can install python3-rados instead of python36-rados, without
specifying the minor version of python. also, we should not break our
teuthology tests with this naming scheme change. for instance, our
cephfs qa suite installs `python3-cephfs` for testing `cephfs-shell`.
After PR https://github.com/ceph/ceph/pull/26572, when RGW is not
configured, selecting any entry of the /rgw drop-down (daemons, users or
buckets) appears to do nothing (not even an error is shown).
Under the hood, the ModuleStatusGuard
redirects the route to rgw/501, but as this route is now
under the parent rgw route handler, which sets CanActivateChild guards,
this triggers a new ModuleStatusGuard invocation, a subsequent
failure and a new redirection to rgw/501.
Several approaches could be taken here:
- Remove error pages from lazy-loaded modules. Probably it does not
make sense to have a 501 page per component.
- Add some whitelist to avoid this kind of loop (e.g.: 501, or any
error page).
- Set a max number of redirections (cautionary measure).
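The second and third approaches can be modelled with a minimal sketch. It is plain Python rather than the dashboard's Angular code, and every name in it (`ERROR_PAGES`, `MAX_REDIRECTS`, `module_available`, `guard`, `navigate`) is hypothetical; it only illustrates why exempting the error page breaks the loop and how a redirection cap acts as a safety net.

```python
# Hypothetical model of the routing logic; not the dashboard's Angular code.
ERROR_PAGES = {"rgw/501"}   # whitelist: routes the guard never redirects
MAX_REDIRECTS = 5           # cautionary cap on consecutive redirections


def module_available(route):
    """Stand-in for the backend status check; RGW is not configured here."""
    return not route.startswith("rgw")


def guard(route):
    """Return the route to redirect to, or None to allow activation."""
    if route in ERROR_PAGES:
        return None          # never guard an error page
    if module_available(route):
        return None
    return "rgw/501"


def navigate(route):
    """Follow guard redirections until a route activates or the cap is hit."""
    for _ in range(MAX_REDIRECTS):
        target = guard(route)
        if target is None:
            return route
        route = target
    raise RuntimeError("too many redirections")
```

With the whitelist in place, `navigate("rgw/daemon")` settles on `rgw/501`. If the whitelist is emptied, the guard keeps redirecting `rgw/501` to itself and only the cap stops it.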
* common/auth_handler.h: add an abstract AuthHandler class; whoever is interested in
an authenticated peer should implement this class
* mon/MonClient: let mon::Client implement AuthServer, as it has access to the keyring. it
will update the registered AuthHandler if the client (peer) is
authenticated.
* osd: implement the AuthHandler class. we will keep track of the connected
sessions along with their caps in a follow-up change.
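The relationship between the three pieces can be sketched as follows. This is an illustration in Python, not the C++ interface from the change: the method name `handle_authentication`, the keyring layout and the `sessions` dictionary are all assumptions, and the session tracking shown here is what the message defers to a follow-up change.

```python
from abc import ABC, abstractmethod


class AuthHandler(ABC):
    """Interface for anything that wants to learn about authenticated peers."""

    @abstractmethod
    def handle_authentication(self, peer_name, caps):
        """Called once a peer has been authenticated (hypothetical name)."""


class MonClient:
    """Plays the auth-server role because it can reach the keyring."""

    def __init__(self, keyring, handler):
        self._keyring = keyring   # assumed layout: peer name -> (secret, caps)
        self._handler = handler   # the registered AuthHandler

    def authenticate(self, peer_name, secret):
        entry = self._keyring.get(peer_name)
        if entry is None or entry[0] != secret:
            return False
        # Notify the registered handler only on success.
        self._handler.handle_authentication(peer_name, entry[1])
        return True


class OSD(AuthHandler):
    """Records the caps of each authenticated peer (follow-up behaviour)."""

    def __init__(self):
        self.sessions = {}

    def handle_authentication(self, peer_name, caps):
        self.sessions[peer_name] = caps
```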
* refs/pull/27415/head:
qa: decouple session map test from simple msgr
msg/async: move connection ref
msg/async: dec active connections when marked down
Instead of looking at the number of threads (used by the simple messenger) to
judge the coming and going of connections, use the (async) messenger perf
counters.
Plus some other minor improvements.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
msg/async: dec active connections when marked down
Otherwise, tests can't tell when a connection is stopped until it's eventually
"lazily" deleted. This should be safe since the perf counter is manipulating an
atomic value.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Jason Dillaman [Thu, 4 Apr 2019 01:35:40 +0000 (21:35 -0400)]
librbd: copyup should restart delayed ops against the same object
This avoids the potential for a race condition where an in-flight
copyup is removed from the in-flight copyup list and a subsequent
IO against the same object causes a second in-flight copyup.
Fixes: http://tracker.ceph.com/issues/39021
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Tue, 2 Apr 2019 15:51:36 +0000 (11:51 -0400)]
librbd: merge copyup object map update states
The object map HEAD and HEAD/snapshot update states have been
simplified and merged into a single state. This also fixes
several potential race conditions and an issue where CoR might
incorrectly mark the HEAD object as exists+dirty.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Fri, 29 Mar 2019 17:23:37 +0000 (13:23 -0400)]
librbd: properly hold snap/parent locks during IO
The ImageCtx::parent pointer was dereferenced without holding the lock
which could lead to a crash. The ImageCtx::migration_info structure
was also dereferenced without holding a lock.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Jason Dillaman [Mon, 1 Apr 2019 18:48:47 +0000 (14:48 -0400)]
librbd: deep-copy object copy should register an in-flight op
When handling live migrations, the source image is the parent
image of the destination image. To prevent the parent image from
being closed while a request is in-flight, the object copy
state machine now registers an async operation with the source
image.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Sage Weil [Mon, 8 Apr 2019 14:24:09 +0000 (09:24 -0500)]
Merge PR #27386 into master
* refs/pull/27386/head:
os/filestore/FileJournal: note EIO events
os/filestore: make note of EIO errors when we see them
os/filestore: note devname for later use
global/signal_handler: avoid core dump on EIO
os/bluestore/KernelDevice: note EIO metadata on aio EIO
global: add hook to annotate crash report with EIO information
some of our centos7 jenkins builders are failing to build the ceph master and
nautilus branches, because EPEL7 recently switched from python3.4 to
python3.6 as the native python3. see
https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/
one of our BuildRequires, cmake3,
is offered by EPEL7, and it was rebuilt against python3.6 as part of the
switch-over. as a result, cmake3-data-3.13.4-2.el7
started to depend on /usr/bin/python3.6, which is in turn offered by
the python36 package, so python36 gets installed as a dependency of the
updated cmake3. but in cmake, we originally check for the latest
python3 interpreter if WITH_PYTHON3 is enabled. that's why the
builders which happen to install these updated packages started to fail
when detecting the python3.6 related build dependencies.
as a fix, in d1e83082,
python%{python3_pkgversion}-{devel,setuptools,Cython} are listed as
BuildRequires to reflect this change in EPEL7. before d1e83082, we
hardwired them to python34-*.
but as the following analysis shows, there are cases where `yum-builddep`
is inconsistent with `rpmbuild`, as `yum-builddep` can change how the
`python3_pkgversion` and `python3_version` macros are expanded:
- none of the packages installed by `yum-builddep` installs the python3
related rpm macros, so the system stays with whatever python3 it was
using. in this case, `rpmbuild` won't complain, as the
`python3_pkgversion` and `python3_version` are consistent before and
after `yum-builddep`.
- system has python3.4 installed before `yum-builddep`, but
`yum-builddep` installs python3.6 and also the updated
`python-rpm-macros` packages, which point `python3_version` and
`python3_pkgversion` to 3.6 and 36 respectively. in this case,
`rpmbuild` will complain, because when we ran `yum-builddep`,
`python3_version` was still "3.4".
- system does not have python3 installed before `yum-builddep`, so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` install python36, and
also the updated `python-rpm-macros` packages, which point
`python3_version` and `python3_pkgversion` to 3.6 and 36 respectively.
in this case, `rpmbuild` will complain, because the python36 related
dependencies are missing. what the system has is the python34
dependencies.
- system does not have python3 installed before `yum-builddep`, so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` install python34, and
also the updated `python-rpm-macros` packages, which point
`python3_version` and `python3_pkgversion` to 3.4 and 34 respectively.
in this case, `rpmbuild` won't complain, as the
`python3_pkgversion` and `python3_version` are also consistent before and
after `yum-builddep`.
as we cannot tell whether the system has python3, or which python3 version
it has, before `yum-builddep` runs, what we can do is ensure that
`rpmbuild` has what it needs to build Ceph. so let's just stick with
python3.6.
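
the four cases above reduce to one comparison: the package version `yum-builddep` prepared the "BuildRequires" for versus the one `rpmbuild` later expands. a minimal sketch of that reasoning, assuming hypothetical names (`rpmbuild_complains`, `DEFAULT_PKGVERSION`) and taking the python34 fallback from the description above:

```python
# Fallback used to prepare "BuildRequires" when no python3 is installed,
# as described in the cases above.
DEFAULT_PKGVERSION = "34"


def rpmbuild_complains(before, macros_after):
    """Model whether rpmbuild finds its python3 build dependencies missing.

    before:       python3_pkgversion on the system before yum-builddep,
                  or None if python3 is not installed.
    macros_after: python3_pkgversion set by rpm macros that yum-builddep
                  pulled in, or None if it installed no such macros.
    """
    prepared_for = before or DEFAULT_PKGVERSION
    expected_by_rpmbuild = macros_after or prepared_for
    return prepared_for != expected_by_rpmbuild


# The four cases, in the order listed above.
cases = [("34", None), ("34", "36"), (None, "36"), (None, "34")]
results = [rpmbuild_complains(b, a) for b, a in cases]
```

hardwiring python3.6 removes the dependence on `before`, which is the one input we cannot know in advance.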
xie xingguo [Sat, 2 Mar 2019 08:23:12 +0000 (16:23 +0800)]
msg/async: add timeout for connections which are not yet ready
There could be various corner cases that may cause an async
connection to get stuck in the connecting stage (e.g., by manually
creating some loop back connections on the switches of our test cluster,
we can reproduce http://tracker.ceph.com/issues/37499 almost 100% of the time).
In 61b9432ef9a3847eceb96f8d5a854567c49bbf61 I tried to employ the
existing keep_alive mechanism to get those stuck connections out of the
trap, but it does not work if the corresponding connection
is not yet ready, since we always require the underlying connection to be
**ready** in order to send out a keep_alive message.
Fix by introducing a more general connecting timeout strategy.
If a connecting process cannot be finished within a specific interval,
then we simply cut it off and retry.
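
The strategy can be sketched as a periodic check on connections that are still connecting. This is a simplified Python model, not the AsyncConnection code; the class, its fields and the `tick` method are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class Connection:
    """Toy connection that restarts a connect attempt that takes too long."""

    def __init__(self, connect_timeout):
        self.connect_timeout = connect_timeout  # seconds allowed per attempt
        self.state = "connecting"
        self.started_at = 0.0                   # when the current attempt began
        self.retries = 0

    def mark_ready(self):
        """The handshake finished; the connect timeout no longer applies."""
        self.state = "ready"

    def tick(self, now):
        """Called periodically; cut off and retry an overdue connect attempt."""
        if self.state != "connecting":
            return
        if now - self.started_at >= self.connect_timeout:
            self.retries += 1
            self.started_at = now   # the retry starts a fresh interval
```

Unlike keep_alive, the check does not need the connection to be ready, which is exactly the state the stuck connections never reach.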
Sage Weil [Thu, 4 Apr 2019 20:48:40 +0000 (15:48 -0500)]
os/filestore: make note of EIO errors when we see them
This is imprecise, since we can't (easily) map an EIO back to a specific
part of the device, or even (easily) tell whether it was a read or write
error. It's enough to mark a crash dump as an EIO event, though, and to
include the name of the (primary) filestore device.
Sage Weil [Thu, 4 Apr 2019 19:47:17 +0000 (14:47 -0500)]
os/bluestore/KernelDevice: note EIO metadata on aio EIO
Note that we only do this if we're about to induce a crash. If we can
pass EIO up the stack, it's up to the upper layer to handle it or trigger
its own crash if it can't.
Sage Weil [Sun, 7 Apr 2019 18:54:59 +0000 (13:54 -0500)]
common: add --log-early command line option
Sometimes it is important and useful to see the logs from the bootstrap
phase, where we are getting the initial configs from the monitors. Add
a command-line option --log-early to do that.