Stephan Müller [Wed, 3 Apr 2019 14:13:54 +0000 (16:13 +0200)]
mgr/dashboard: Enable read only users to read again
The dashboard's Prometheus receiver API needed to receive a POST from the
frontend, but POSTs aren't allowed by default for read-only
users. As a result, the receiver API call threw a 403 error, which
redirected the user to the 403 error page. Now the last
notifications can be fetched via GET. This prevents the redirection for read-only
users, so they can get the last notifications and also
see all other allowed pages again.
Fixes: https://tracker.ceph.com/issues/39086
Signed-off-by: Stephan Müller <smueller@suse.com>
* common/auth_handler.h: add an abstract class, AuthHandler; whoever is interested in
an authenticated peer should implement this class
* mon/MonClient: let mon::Client implement AuthServer, as it has access to the keyring. it
will update the registered AuthHandler if the client (peer) is
authenticated.
* osd: implement AuthHandler class. we will keep track of the connected
sessions along with their caps in a follow-up change.
* refs/pull/27415/head:
qa: decouple session map test from simple msgr
msg/async: move connection ref
msg/async: dec active connections when marked down
Instead of looking at the number of threads (used by the simple messenger) to
judge the coming and going of connections, use the (async) messenger perf
counters.
Plus some other minor improvements.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
msg/async: dec active connections when marked down
Otherwise, tests can't tell when a connection is stopped until it's eventually
"lazily" deleted. This should be safe since the perf counter is manipulating an
atomic value.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Mon, 8 Apr 2019 14:24:09 +0000 (09:24 -0500)]
Merge PR #27386 into master
* refs/pull/27386/head:
os/filestore/FileJournal: note EIO events
os/filestore: make note of EIO errors when we see them
os/filestore: note devname for later use
global/signal_handler: avoid core dump on EIO
os/bluestore/KernelDevice: note EIO metadata on aio EIO
global: add hook to annotate crash report with EIO information
some of our centos7 jenkins builders are failing to build the ceph master and
nautilus branches, because EPEL7 recently switched from python3.4 to
python3.6 as the native python3. see
https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/
and one of our BuildRequires, cmake3,
is offered by EPEL7. it followed the python3.6 switch-over and was
rebuilt against python3.6. as a result, cmake3-data-3.13.4-2.el7
started to depend on /usr/bin/python3.6, which is in turn offered by the
python36 package, so python36 gets installed as a dependency of the
updated cmake3. but in cmake, we check for the latest
python3 interpreter if WITH_PYTHON3 is enabled. that's why the
builders which happened to install these updated packages started to fail
when detecting the python3.6 related build dependencies.
as a fix, in d1e83082,
python%{python3_pkgversion}-{devel,setuptools,Cython} are listed as
BuildRequires to reflect this change in EPEL7. before d1e83082, we
hardwired them to python34-*.
but as the following analysis shows, there are cases where `yum-builddep`
is inconsistent with `rpmbuild`, as `yum-builddep` can change how the
`python3_pkgversion` and `python3_version` macros are expanded:
- none of the packages installed by `yum-builddep` installs the python3
related rpm macros, so the system stays with whatever python3 it was
using. in this case, `rpmbuild` won't complain, as
`python3_pkgversion` and `python3_version` are consistent before and
after `yum-builddep`.
- system has python3.4 installed before `yum-builddep`. but
`yum-builddep` installed python3.6 and also the updated
`python-rpm-macros` packages, which point `python3_version` and
`python3_pkgversion` to 3.6 and 36 respectively. in this case,
`rpmbuild` will complain, because when we ran `yum-builddep`,
`python3_version` was still "3.4".
- system does not have python3 installed before `yum-builddep`. so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` install python36, and
also the updated `python-rpm-macros` packages, which point
`python3_version` and `python3_pkgversion` to 3.6 and 36 respectively.
in this case, `rpmbuild` will complain, because the python36 related
dependencies are missing. what the system has is python34
dependencies.
- system does not have python3 installed before `yum-builddep`. so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` install python34, and
also the updated `python-rpm-macros` packages, which point
`python3_version` and `python3_pkgversion` to 3.4 and 34 respectively.
in this case, `rpmbuild` won't complain, as
`python3_pkgversion` and `python3_version` are also consistent before and
after `yum-builddep`.
as we cannot tell whether the system has python3, or which python3 version
it has, before `yum-builddep` runs, all we can do is ensure
`rpmbuild` has what it needs to build Ceph. so let's just stick with
python3.6.
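the four cases above reduce to one check: `rpmbuild` complains when the `python3_pkgversion` used while preparing the "BuildRequires" differs from the one in effect after `yum-builddep`. a minimal sketch of that reasoning in Python follows; the function name and the "34" fallback for systems without python3 are illustrative assumptions taken from the cases above, not part of the actual build scripts.

```python
from typing import Optional

# Assumption taken from the cases above: when python3 is absent,
# the "BuildRequires" are prepared as if python3_pkgversion were "34".
DEFAULT_PKGVERSION = "34"


def rpmbuild_complains(before: Optional[str], after: Optional[str]) -> bool:
    """Return True if rpmbuild would miss its python3 build dependencies.

    before: python3_pkgversion before yum-builddep (None if python3 is absent)
    after:  python3_pkgversion set by macros that yum-builddep installed
            (None if it installed no python3 rpm macros)
    """
    resolved = before or DEFAULT_PKGVERSION  # version the dependencies were installed for
    effective = after or resolved            # version rpmbuild expands the macros to
    return resolved != effective
```

pinning the version to 3.6 removes the dependence on `before` altogether, which is the point of the fix.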
xie xingguo [Sat, 2 Mar 2019 08:23:12 +0000 (16:23 +0800)]
msg/async: add timeout for connections which are not yet ready
There are various corner cases that may cause an async
connection to get stuck in the connecting stage (e.g., by manually
creating some loopback connections on the switches of our test cluster,
we can reproduce http://tracker.ceph.com/issues/37499 almost 100% of the time).
In 61b9432ef9a3847eceb96f8d5a854567c49bbf61 I tried to employ the
existing keep_alive mechanism to get those stuck connections out of the
trap, but it does not work if the corresponding connection
is not yet ready, since we always require the underlying connection to be
**ready** in order to send out a keep_alive message.
Fix by introducing a more general connecting timeout strategy.
If a connecting process cannot finish within a specific interval,
we simply cut it off and retry.
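The timeout strategy described above can be sketched as a small state machine. This is a simplified Python illustration and not the actual msg/async code: the class name, the state strings, and the `tick()` method are all hypothetical, and the clock is injectable only to keep the sketch testable.

```python
import time
from typing import Callable


class ConnectingTimer:
    """Hypothetical model of a connection that is retried if it stays un-ready too long."""

    def __init__(self, connect_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.connect_timeout = connect_timeout
        self.clock = clock
        self.state = "CONNECTING"
        self.started = clock()   # start of the current connect attempt
        self.retries = 0

    def mark_ready(self) -> None:
        # Once ready, keep_alive can take over; the connect timeout no longer applies.
        self.state = "READY"

    def tick(self) -> bool:
        """Periodic check; returns True if the current attempt was cut off and restarted."""
        if self.state != "CONNECTING":
            return False
        if self.clock() - self.started < self.connect_timeout:
            return False
        # The attempt overran its interval: cut it off and retry.
        self.retries += 1
        self.started = self.clock()
        return True
```

Unlike keep_alive, this check does not need the connection to be ready, so it also covers attempts that never leave the connecting stage.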
Sage Weil [Thu, 4 Apr 2019 20:48:40 +0000 (15:48 -0500)]
os/filestore: make note of EIO errors when we see them
This is imprecise, since we can't (easily) map an EIO back to a specific
part of the device, or even (easily) tell whether it was a read or write
error. It's enough to mark a crash dump as an EIO event, though, and to
include the name of the (primary) filestore device.
Sage Weil [Thu, 4 Apr 2019 19:47:17 +0000 (14:47 -0500)]
os/bluestore/KernelDevice: note EIO metadata on aio EIO
Note that we only do this if we're about to induce a crash. If we can
pass EIO up the stack, it's up to the upper layer to handle it or trigger
its own crash if it can't.
Sage Weil [Sun, 7 Apr 2019 18:54:59 +0000 (13:54 -0500)]
common: add --log-early command line option
Sometimes it is important and useful to see the logs from the bootstrap
phase where we are getting the initial configs from the monitors. Add
a command-line option --log-early to do that.
/home/pdonnell/ceph/src/msg/async/ProtocolV2.cc: In member function ‘Ct<ProtocolV2>* ProtocolV2::handle_auth_signature(ceph::bufferlist&)’:
/home/pdonnell/ceph/src/msg/async/ProtocolV2.cc:2259:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
doc: Update documentation for the MANY_OBJECTS_PER_PG warning
The current documentation for the MANY_OBJECTS_PER_PG warning
states that the threshold can be raised to silence the health
warning by adjusting the mon_pg_warn_max_object_skew config
option on the monitors. This appears not to be true, at least
since luminous: the option should be adjusted on
the managers.
I encountered this problem and spent quite some time injecting
mon_pg_warn_max_object_skew into the monitors; I also added the option to
ceph.conf and restarted the monitors several times, but the warning
did not go away. I had to download the code to see what was
happening, and I found this:
$ git grep -A 3 mon_pg_warn_max_object_skew src/common/options.cc
src/common/options.cc:1480: Option("mon_pg_warn_max_object_skew", Option::TYPE_FLOAT, Option::LEVEL_ADVANCED)
src/common/options.cc-1481- .set_default(10.0)
src/common/options.cc-1482- .set_description("max skew few average in objects per pg")
src/common/options.cc-1483- .add_service("mgr"),
After I restarted the ceph-mgr service, the warning went away.
rgw: limit entries in remove_olh_pending_entries()
If there are too many entries to send in a single osd op, the osd rejects
the request with EINVAL. This error happens in follow_olh(), which means
that requests against the object logical head (requests with no version
id) can't be resolved to the current object version. In multisite, this
also causes data sync to get stuck in retries
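The idea of limiting entries per op can be illustrated with a generic batching sketch in Python. This is not the rgw implementation: the function name, the `send_op` callback, and the default limit of 1000 are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, List


def remove_pending_in_batches(entries: Iterable[str],
                              send_op: Callable[[List[str]], None],
                              max_entries_per_op: int = 1000) -> int:
    """Send entries in ops of at most max_entries_per_op; return the number of ops sent."""
    ops = 0
    batch: List[str] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == max_entries_per_op:
            send_op(batch)   # a full op; small enough for the receiver to accept
            ops += 1
            batch = []       # start a fresh list so the sent batch is not mutated
    if batch:
        send_op(batch)       # final partial batch
        ops += 1
    return ops
```

Each op stays under the limit, so no single request grows large enough to be rejected.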