Tim Serong [Thu, 2 Apr 2020 05:46:57 +0000 (16:46 +1100)]
mgr/PyModule: fix missing tracebacks in handle_pyerror()
In certain cases, errors raised in mgr modules don't actually result in a
proper traceback in the mgr log; all you see is a message like "'Hello'
object has no attribute 'dneasdfasdf'", but you have no idea where that
came from, which is a complete PITA to debug.
Here's what's going on: handle_pyerror() calls PyErr_Fetch() to get
information about the error that occurred, then passes that information
back to python's traceback.format_exception() function to get the traceback.
If we write code in an mgr module that explicitly raises an exception
(e.g.: 'raise RuntimeError("that didn't work")'), the error value returned
by PyErr_Fetch() is of type RuntimeError, and traceback.format_exception()
does the right thing. If however we accidentally write code that's just
broken (e.g.: 'self.dneasdfasdf += 1'), the error value returned is not
an actual exception, it's just a string. So traceback.format_exception()
freaks out with something like "'str' object has no attribute '__cause__'"
(which we don't actually ever see in the logs), which in turn dumps us in a
"catch (error_already_set const &)" block, which just prints out the
single line error string.
https://docs.python.org/3/c-api/exceptions.html#c.PyErr_NormalizeException
tells us that "Under certain circumstances, the values returned by
PyErr_Fetch() below can be “unnormalized”, meaning that *exc is a class
object but *val is not an instance of the same class.". And that's exactly
the problem we're having here. We're getting a 'str', not an Exception.
Adding a call to PyErr_NormalizeException() turns the value back into a
proper Exception type and traceback.format_exception() now always does the
right thing.
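The failure mode is easy to reproduce from pure Python: `traceback.format_exception()` is happy with a real exception instance, but fails on a bare string value, which is exactly what an unnormalized `PyErr_Fetch()` result looks like. A minimal sketch (the error strings are illustrative, and the exact attribute named in the AttributeError varies by Python version):

```python
import traceback

# A normalized value (a real exception instance) formats correctly:
try:
    raise RuntimeError("that didn't work")
except RuntimeError as exc:
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
assert any("RuntimeError" in line for line in lines)

# An unnormalized value (a bare str, as PyErr_Fetch() can return) makes
# format_exception itself fail with an AttributeError such as
# "'str' object has no attribute '__cause__'":
try:
    traceback.format_exception(
        AttributeError, "'Hello' object has no attribute 'dneasdfasdf'", None)
except AttributeError as exc:
    print(exc)
```

The second call is the path that lands us in the `catch (error_already_set const &)` block with only a one-line error string.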
I've also added calls to peek_pyerror() in the catch blocks, so if anything
else ever somehow causes traceback.format_exception to fail, we'll at least
have an idea of what it is in the log.
Fixes: https://tracker.ceph.com/issues/44799
Signed-off-by: Tim Serong <tserong@suse.com>
This commit adds a call to the `ceph-facts` role in the first play of this
playbook. This is needed so `ceph-validate` won't fail because of the
following error:
zhangdaolong [Tue, 24 Mar 2020 00:51:44 +0000 (08:51 +0800)]
pybind/rbd: don't raise ImageNotFound when no lockers are obtained
When no lockers are obtained, an ImageNotFound exception is raised
even though the image still exists. When the number of lockers is
zero, no exception should be raised.
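A hypothetical sketch of the intended behaviour (the helper name and locker tuples are illustrative, not the real pybind API): an empty locker list should simply yield an empty result rather than an exception.

```python
class ImageNotFound(Exception):
    pass

def lockers_from(raw_lockers):
    # Zero lockers is a normal state: the image still exists, so no
    # ImageNotFound should be raised here.
    if not raw_lockers:
        return []
    return [{"client": client, "cookie": cookie}
            for client, cookie in raw_lockers]
```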
Kefu Chai [Tue, 31 Mar 2020 03:19:23 +0000 (11:19 +0800)]
test/crimson: increase variance of stdev to 0.20
per Mark Nelson,
> yeah, 5% variation is way too low
> Sometimes we can stay within 5%, but especially if we are pushing the
> system hard we can see closer to 10-20% sometimes.
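The relaxed check amounts to allowing a coefficient of variation (stdev relative to the mean) of up to 0.20. A hypothetical sketch of such a check:

```python
from statistics import mean, stdev

def within_tolerance(samples, max_cv=0.20):
    # Accept runs whose stdev is within max_cv (20%) of the mean, since
    # a hard-pushed system can vary by 10-20% between runs.
    return stdev(samples) / mean(samples) <= max_cv
```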
Jason Dillaman [Thu, 2 May 2019 20:55:44 +0000 (16:55 -0400)]
librbd: re-add support for nautilus clients talking to jewel clusters
We want to support N - 3 client backward compatibility (with a special case
to support Jewel since it was an LTS release). The "get_snapshot_timestamp"
cls method does not exist in Jewel clusters, so librbd should fall back
to excluding the op if it fails.
Note that this N - 3 also needs to apply for downstream releases as well,
which implies we still need Jewel for the time being.
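The fallback pattern can be sketched in Python (the `cls_call` interface here is hypothetical, standing in for the actual librbd cls dispatch):

```python
def fetch_snapshot_timestamp(cls_call, snap_id):
    # Jewel clusters lack the get_snapshot_timestamp cls method, so
    # tolerate the failure and retry without the unsupported op.
    try:
        return cls_call("get_snapshot_timestamp", snap_id)
    except NotImplementedError:
        return None
```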
Fixes: http://tracker.ceph.com/issues/39450
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit c644121820b83c97e68f9896393a45cd34787672)
Conflicts:
src/test/librbd/image/test_mock_RefreshRequest.cc: tweaked to support pool configs
Jason Dillaman [Wed, 20 Mar 2019 18:40:50 +0000 (14:40 -0400)]
librbd: ignore -EOPNOTSUPP errors when retrieving image group membership
The Luminous release did not support adding images to a group (it only
included the bare-minimum support for creating groups). Commit f76df32666b
incorrectly dropped support for ignoring this possible failure. This
prevents Nautilus-release clients from opening images contained within
a Luminous-release cluster.
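The restored behaviour can be sketched as follows (the `cls_call` interface is hypothetical; the key point is treating -EOPNOTSUPP as a benign answer):

```python
import errno

def group_membership(cls_call):
    # Luminous clusters can't answer group-membership queries; treat
    # -EOPNOTSUPP as "not a member of any group" rather than failing.
    try:
        return cls_call("image_group_get")
    except OSError as e:
        if e.errno == errno.EOPNOTSUPP:
            return None
        raise
```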
Fixes: http://tracker.ceph.com/issues/38834
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 6f29dc69a0db3b6d982c95ab9d3b9b00a7029e37)
Sage Weil [Mon, 30 Mar 2020 13:24:59 +0000 (08:24 -0500)]
Merge PR #34248 into master
* refs/pull/34248/head:
qa/tasks/cephadm: no need to explicitly reconfig
qa/tasks/cephadm: fetch final ceph.conf
qa/tasks/cephadm: distribute ceph.conf and admin keyring to all nodes
Sage Weil [Sun, 29 Mar 2020 12:17:37 +0000 (07:17 -0500)]
Merge PR #34061 into master
* refs/pull/34061/head:
mgr/orch: Add `ceph orch ls --export`
mgr/dashboard: adapt to new `ServiceDescription.to_json()`
python-common: add `service_name` to `ServiceSpec.to_json`
python-common: make ServiceSpec and ServiceDescription compatible
src/ceph.in: add yaml to known formats
mgr/orch: add yaml to `orch ls`
mgr/orch: remove `orch spec dump`
python-common: reorder RGWSpec arguments
python-common: prevent ServiceSpec of wrong type
pybind/mgr: tox.ini: omit cov report
mgr/cephadm: test describe_service
mgr/orch: ServiceDescription: change json representation
mgr/orch: ServiceDescription: Make spec a requirement
Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Reviewed-by: Sage Weil <sage@redhat.com>
John Law [Sun, 29 Mar 2020 01:14:45 +0000 (03:14 +0200)]
doc: Fix inconsistency in logging settings
This patch fixes an inconsistency in the logging settings for the `log_flush_on_exit` and `log_to_stderr` options. It also adds `log_to_file` to the section.
Sage Weil [Sat, 28 Mar 2020 21:22:27 +0000 (16:22 -0500)]
mgr/DaemonServer: fetch metadata for new daemons (e.g., mons)
We fetch metadata for mon/osd/mds daemons on startup. If a new one comes
along *after* we start up, we need to fetch it on demand. Otherwise,
notably, we will ignore any new mons added to the cluster:
- mon.a starts
- mgr starts, fetches a's metadata
- mon.b added
- mon.b tries to open a mgr connection, and is rejected each time
This can lead to follow-on badness when the mon tries to proxy mgr
commands or do other things.
Fix by fetching metadata on demand, like we already do in the report
path.
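The cache-with-on-demand-fetch behaviour can be sketched like this (a simplified stand-in, not the actual DaemonServer code):

```python
class DaemonMetadata:
    """Cache daemon metadata fetched at startup, but fall back to an
    on-demand fetch for daemons (e.g. new mons) that appear later."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def get(self, daemon_id):
        if daemon_id not in self._cache:
            # Daemon appeared after startup: fetch its metadata now
            # instead of rejecting its connection.
            self._cache[daemon_id] = self._fetch(daemon_id)
        return self._cache[daemon_id]
```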
Fixes: https://tracker.ceph.com/issues/44798
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 27 Mar 2020 15:39:09 +0000 (10:39 -0500)]
qa/tasks/cephadm: distribute ceph.conf and admin keyring to all nodes
Revert part of 96220c0c0574eb5b896023e1552f528bef9e1ca5 so that we still
distribute a *final* ceph.conf and admin keyring to all nodes, right after
all of the mons are up.
Tatjana Dehler [Fri, 27 Mar 2020 14:58:42 +0000 (15:58 +0100)]
mgr/dashboard: do not fail on user creation
Consistent with other Ceph commands, do not fail user
creation with a non-zero error code if the user already
exists.
Instead, succeed and return the message 'User <username>
already exists'.
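The idempotent behaviour can be sketched as follows (a simplified stand-in, not the dashboard's real user-management API):

```python
def create_user(users, username):
    # Succeed (exit code 0) even when the user already exists, matching
    # the behaviour of other idempotent Ceph commands.
    if username in users:
        return 0, f"User {username} already exists"
    users[username] = {"roles": []}
    return 0, f"User {username} created"
```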
Fixes: https://tracker.ceph.com/issues/44502
Signed-off-by: Tatjana Dehler <tdehler@suse.com>