qa/tasks/ceph_deploy: install python3.6 instead of python3.4 for py3 tests
EPEL7 has switched over to python3.6 as the main python3, and we started
packaging python bindings for python3.6 since
https://github.com/ceph/ceph-build/pull/1283
rpm: add "Provides: python3-*" for python packages
so user can install python3-rados, instead of python36-rados, without
specifying the minor version of python. also, we should not break our
teuthology tests with this naming scheme change. for instance, our
cephfs qa suite installs `python3-cephfs` for testing the `cephfs-shell`.
some of our centos7 jenkins builders are failing to build the ceph master
and nautilus branches, because EPEL7 recently switched from python3.4 to
python3.6 as the native python3. see
https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/
one of our BuildRequires, cmake3, is offered by EPEL7. it also followed
the python3.6 switch-over and was rebuilt against python3.6. as a result,
cmake3-data-3.13.4-2.el7 started to depend on /usr/bin/python3.6, which
is in turn offered by the python36 package, so python36 gets installed
as a dependency of the updated cmake3. but in cmake, we originally check
for the latest python3 interpreter if WITH_PYTHON3 is enabled. that's why
the builders which happen to install these updated packages started to
fail when detecting the python3.6 related build dependencies.
as a fix, in d1e83082,
python%{python3_pkgversion}-{devel,setuptools,Cython} are listed as
BuildRequires to reflect this change in EPEL7. before d1e83082, we
hardwired them to python34-*.
but as the following analysis shows, there are cases where `yum-builddep`
is inconsistent with `rpmbuild`, because the packages `yum-builddep`
installs can change how the `python3_pkgversion` and `python3_version`
macros are expanded:
- none of the packages installed by `yum-builddep` installs the python3
related rpm macros, so the system stays with whatever python3 it was
using. in this case, `rpmbuild` won't complain, as
`python3_pkgversion` and `python3_version` are consistent before and
after `yum-builddep`.
- system has python3.4 installed before `yum-builddep`, but
`yum-builddep` installs python3.6 and also the updated
`python-rpm-macros` packages, which point `python3_version` and
`python3_pkgversion` to 3.6 and 36 respectively. in this case,
`rpmbuild` will complain, because when we ran `yum-builddep`,
`python3_version` was still "3.4".
- system does not have python3 installed before `yum-builddep`, so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` install python36, and
also the updated `python-rpm-macros` packages, which point
`python3_version` and `python3_pkgversion` to 3.6 and 36 respectively.
in this case, `rpmbuild` will complain, because the python36 related
dependencies are missing. what the system has is python34
dependencies.
- system does not have python3 installed before `yum-builddep`, so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` install python34, and
also the updated `python-rpm-macros` packages, which point
`python3_version` and `python3_pkgversion` to 3.4 and 34 respectively.
in this case, `rpmbuild` won't complain, as
`python3_pkgversion` and `python3_version` are also consistent before and
after `yum-builddep`.
as we cannot tell whether the system has python3, or which python3
version it has, before `yum-builddep` runs, what we can do is to ensure
`rpmbuild` has what it needs to build Ceph. so let's just stick with
python3.6.
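The four cases above reduce to one question: do the macros expand to the
same python3 package version when `yum-builddep` runs and when `rpmbuild`
runs? A minimal sketch of that reasoning in Python. The function name and
the string encoding of the versions are illustrative only and are not
part of any rpm tooling:

```python
def rpmbuild_complains(pkgversion_at_builddep, pkgversion_at_rpmbuild):
    """Return True if the BuildRequires resolved by yum-builddep no longer
    match what rpmbuild expects.

    Both arguments stand for the value of python3_pkgversion ("34" or
    "36") at the respective step. This models the analysis; it is not a
    real rpm API.
    """
    return pkgversion_at_builddep != pkgversion_at_rpmbuild


# (python3_pkgversion seen by yum-builddep, seen by rpmbuild)
cases = {
    "macros untouched": ("34", "34"),
    "python3.4 present, macros updated to 3.6": ("34", "36"),
    "no python3, default 34, macros updated to 3.6": ("34", "36"),
    "no python3, default 34, macros stay at 3.4": ("34", "34"),
}

for name, (before, after) in cases.items():
    verdict = "fails" if rpmbuild_complains(before, after) else "ok"
    print(name, "->", verdict)
```

Pinning the spec to python3.6 removes the dependence on the first value:
both steps then ask for the python36 packages regardless of what the
system had before.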
Boris Ranto [Thu, 4 Apr 2019 20:00:55 +0000 (22:00 +0200)]
cmake: check for MAJOR.MINOR version of python3
We can only check for MAJOR.MINOR version of python3 since
FindPython3Libs does not support checking for MAJOR.MINOR.PATCH version
of python3. We also need to make sure we use the PYTHON3 versions of
these variables.
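As an illustration of what "MAJOR.MINOR only" means, here is a small
Python sketch that truncates a full version string before it is used for
a check. The helper name is hypothetical; the actual change lives in the
cmake files, not in Python:

```python
def major_minor(version):
    """Reduce 'MAJOR.MINOR.PATCH' (or 'MAJOR.MINOR') to 'MAJOR.MINOR'."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("expected at least MAJOR.MINOR, got %r" % version)
    return ".".join(parts[:2])


print(major_minor("3.6.8"))
```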
This should fix a regression introduced by c961e00.
Sage Weil [Wed, 10 Apr 2019 21:49:17 +0000 (16:49 -0500)]
Merge PR #27387 into nautilus
* refs/pull/27387/head:
mgr/pg_autoscaler: apply bias to pg_num selection
mgr/pg_autoscaler: include pg_autoscale_bias in autoscale-status table
osd/osd_types,mon: add pg_autoscale_bias pool property
Sage Weil [Wed, 10 Apr 2019 21:48:16 +0000 (16:48 -0500)]
Merge PR #27440 into nautilus
* refs/pull/27440/head:
os/filestore/FileJournal: note EIO events
os/filestore: make note of EIO errors when we see them
os/filestore: note devname for later use
global/signal_handler: avoid core dump on EIO
os/bluestore/KernelDevice: note EIO metadata on aio EIO
global: add hook to annotate crash report with EIO information
After PR https://github.com/ceph/ceph/pull/26572, when RGW is not
configured, accessing /rgw drop-down (daemons, users or buckets)
results in nothing apparently happening (not even an error).
Under the hood, what is happening is that the ModuleStatusGuard
has redirected the route to rgw/501, but as this route is now
under the parent rgw route handler, which sets CanActivateChild guards,
this results in a new ModuleStatusGuard invocation, a subsequent
failure and a new redirection to rgw/501.
Several approaches could be taken here:
- Remove error pages from lazy-loaded modules. Probably it does not
make sense to have a 501 page per component.
- Add some whitelist to avoid this kind of loop (e.g.: 501, or any
error page).
- Set a max number of redirections (cautionary measure).
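The loop and two of the approaches listed above (a whitelist of error
pages and a cap on redirections) can be modelled in a few lines. This is
a language-neutral sketch written in Python, not the Angular guard
itself. The route names other than rgw/501, the whitelist contents and
the cap value are assumptions made for illustration:

```python
ERROR_PAGES = frozenset({"rgw/501"})  # hypothetical whitelist of error routes
MAX_REDIRECTS = 5                     # hypothetical cap, cautionary only


def guard(route, module_available, whitelist=ERROR_PAGES):
    """Return the route to redirect to, or None to allow activation."""
    if route in whitelist:
        return None
    if route.startswith("rgw") and not module_available:
        return "rgw/501"
    return None


def navigate(route, module_available, whitelist=ERROR_PAGES):
    """Follow guard redirects until a route activates or the cap is hit."""
    for _ in range(MAX_REDIRECTS):
        target = guard(route, module_available, whitelist)
        if target is None:
            return route
        route = target
    raise RuntimeError("too many redirects, last route: %s" % route)


# With the whitelist, the error page is reachable.
print(navigate("rgw/daemon", module_available=False))
```

Passing an empty whitelist reproduces the reported behaviour: the guard
keeps redirecting rgw/501 to itself until the cap raises an error.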
Ernesto Puerta [Tue, 26 Mar 2019 18:01:01 +0000 (19:01 +0100)]
mgr/dashboard: unify button/URL actions naming
- Mappings (actually an Enum) created for actions (buttons and other UI elements) and URLs: ActionLabels and URLVerbs.
- An alternative would be to fix/improve the current i18n-polyfill, which only works with literal strings (not even with 'const enums' which become literals after Typescript transpiling).
- Additionally having a predefined file with some strings to translate (actions, verbs, etc) could improve on the 1st of the 2-stage i18n process (as extraction tool has a lot of limitations).
- A corresponding ActionLabelsI18n service with translated labels (it's a service as I haven't found a way to either translate non-const strings (ngx-translate/AST parser failure) or get a static translator).
- This service could/should be extended to cover all strings that are defined in static/globally scoped objects before any I18n provider has been initialized.
- Breadcrumbs are not translated (neither were they before this change). This part remains untackled: using 'proxy' static objects and performing live translation could deal with the issue.
- New URLBuilder service created (following an established pattern in the Java/.NET world). This should avoid the need to mess with literal URLs and string composition/parsing. While the front-end is not meant to be consumed by anyone, Angular does not provide any other way for the app to navigate between components, so the URLs are a de-facto interface contract. Although this approach is not flawless, it's easier to enforce, while issues coming from free-form strings are really hard to catch.
- This could be further improved by using a router registry/dynamic routing. Most of the routes are trivial.
- As a side effect of these changes, the routing module has been refactored and some routes moved to their specific modules (pool, rbd, rgw), via loadChildren and routes.forChild() magic. Now the above mentioned components are lazy-loaded/pre-loaded (meaning right after the main code is loaded). This should also decrease the loading time (though this is probably not the biggest time eater here).
- As now modules can be loaded multiple times, not only from App module by means of lazy loading, but also from other ones (as PoolModule loads BlockModule to get QoS widgets in Pool windows), now lazy loaded modules include 2 NgModules (one with imports: RouterModule.forChild(routes), meant for lazy-loading, and another without routes).
- Caveat: Some parts might not be (fully) translated (NFS, iSCSI, mirroring), as there's been ongoing work on them and it's hard to keep up with the new code.
These changes will be a waste of time if new code does not benefit from/adhere to them, so I'm still figuring out how to spread this (nothing really fancy to demo). Maybe adding some checks/harnessing to enforce the new naming convention (ideas greatly welcome here).
Sage Weil [Sun, 7 Apr 2019 18:54:59 +0000 (13:54 -0500)]
common: add --log-early command line option
Sometimes it is important and useful to see the logs from the bootstrap
phase where we are getting the initial configs from the monitors. Add
a command-line option --log-early to do that.
Sage Weil [Thu, 4 Apr 2019 20:48:40 +0000 (15:48 -0500)]
os/filestore: make note of EIO errors when we see them
This is imprecise, since we can't (easily) map an EIO back to a specific
part of the device, or even (easily) tell whether it was a read or write
error. It's enough to mark a crash dump as an EIO event, though, and to
include the name of the (primary) filestore device.
Sage Weil [Thu, 4 Apr 2019 19:47:17 +0000 (14:47 -0500)]
os/bluestore/KernelDevice: note EIO metadata on aio EIO
Note that we only do this if we're about to induce a crash. If we can
pass EIO up the stack, it's up to the upper layer to handle it or trigger
its own crash if it can't.
Sage Weil [Fri, 5 Apr 2019 13:59:23 +0000 (08:59 -0500)]
OSD: OSDMapRef access by multiple threads is unsafe
we update OSD::osdmap in OSD::_committed_osd_maps(), which is executed
by the objectstore's finisher thread, while PG::sched_scrub() is called
by a worker thread of the OSD's sharded work queue. we push the osdmap
updates down to PGs in OSD::consume_map(), which is in turn called by
OSD::_committed_osd_maps() where osdmap is updated. so it is not a big
deal if we are checking a stale CEPH_OSDMAP_NODEEP_SCRUB flag;
this flag will be updated with the latest osdmap very soon.
Sage Weil [Tue, 2 Apr 2019 21:50:08 +0000 (16:50 -0500)]
mon/MonmapMonitor: clean up empty created stamp in monmap
Some old clusters have an empty created timestamp. This is mostly
harmless, but it is confusing/wrong, and it does currently break the
telemetry module with errors like
ValueError: time data '0.000000' does not match format '%Y-%m-%d %H:%M:%S.%f'
from 'ceph telemetry show'.
If we detect an empty created stamp, look at old monmap and use the oldest
modified stamp we can find.
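The telemetry failure is easy to reproduce with the format string from
the error message, and the repair rule ("use the oldest modified stamp
we can find") can be sketched alongside it. This is a Python model only;
the function names are hypothetical and the real fix is in the monitor
code:

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def is_empty_stamp(stamp):
    """Treat any stamp that does not parse (e.g. '0.000000') as empty."""
    try:
        datetime.strptime(stamp, STAMP_FORMAT)
        return False
    except ValueError:
        return True


def repaired_created(created, old_modified_stamps):
    """Keep a valid created stamp; otherwise use the oldest valid
    modified stamp from older monmaps (None if there is none)."""
    if not is_empty_stamp(created):
        return created
    valid = [s for s in old_modified_stamps if not is_empty_stamp(s)]
    if not valid:
        return None
    return min(valid, key=lambda s: datetime.strptime(s, STAMP_FORMAT))


old = ["2018-03-01 10:00:00.000000", "2017-01-01 00:00:00.000000"]
print(repaired_created("0.000000", old))
```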
Jason Dillaman [Tue, 2 Apr 2019 20:34:56 +0000 (16:34 -0400)]
test/librados_test_stub: ensure the log flusher thread is started
Recent changes merged in cd6a5b9c40779956629803f222c365bdb291a169
resulted in the logger flusher thread never being started for
librados_test_stub-derived unit tests.
Stephan Müller [Wed, 20 Feb 2019 11:26:44 +0000 (12:26 +0100)]
mgr/dashboard: Fixes tooltip behavior
The problem was that the tooltip element was added to the current parent
element, which caused the CSS to render the last button in a button
group as a rectangle, like the second-to-last button, although the last
element should have a rounded corner.
Fixes: https://tracker.ceph.com/issues/38932
Signed-off-by: Stephan Müller <smueller@suse.com>
(cherry picked from commit 4b23b78)
Volker Theile [Tue, 12 Mar 2019 13:12:55 +0000 (14:12 +0100)]
mgr/dashboard: Add separate option to config SSL port
There is a need to introduce this new config option because the MgrModule::get_module_option() and MgrModule::get_localized_module_option() methods will be refactored soon and will not support the default parameter anymore. Instead, the default value must be configured in the MODULE_OPTIONS. Currently we misuse the server_port depending on whether SSL is enabled or not.
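A minimal sketch of the idea: defaults are declared once in
MODULE_OPTIONS, and the port is read from the option that matches the SSL
setting instead of overloading server_port. The option names other than
server_port, the default values and the helper functions below are
assumptions for illustration, not the module's actual definitions:

```python
# Option names and defaults here are assumed for the example.
MODULE_OPTIONS = [
    {"name": "server_port", "default": 8080},
    {"name": "ssl_server_port", "default": 8443},
    {"name": "ssl", "default": True},
]

DEFAULTS = {opt["name"]: opt["default"] for opt in MODULE_OPTIONS}


def get_option(configured, name):
    """Look up an option, falling back to its MODULE_OPTIONS default."""
    return configured.get(name, DEFAULTS[name])


def listen_port(configured):
    """Pick the port from the option that matches the SSL setting."""
    if get_option(configured, "ssl"):
        return get_option(configured, "ssl_server_port")
    return get_option(configured, "server_port")


print(listen_port({}))
print(listen_port({"ssl": False, "server_port": 9000}))
```

With the defaults declared in one place, callers no longer need to pass a
per-call default that depends on whether SSL is on.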
Stephan Müller [Thu, 21 Feb 2019 10:53:46 +0000 (11:53 +0100)]
mgr/dashboard: Make preventDefault work with 400 errors
The problem was that, if an error with the status code 400 was
received by the error interceptor, the "timeoutId" was not tracked,
therefore "preventDefault" didn't prevent anything as "timeoutId"
was undefined.
Fixes: https://tracker.ceph.com/issues/38418
Signed-off-by: Stephan Müller <smueller@suse.com>
(cherry picked from commit 5aa984cc6c5a737e2dfcc7806f0fe48d1b41d1c5)
Sage Weil [Wed, 3 Apr 2019 19:54:55 +0000 (14:54 -0500)]
common/common_init: start log from common_init_finish (if not yet started)
This captures any non-global_init users who created their cct but haven't
started up the log thread yet. As long as common_init_finish() happens
after we have all of our config options (from the mon config or whatever),
we will log (or not log) to the right location(s).
Sage Weil [Mon, 25 Mar 2019 11:39:28 +0000 (06:39 -0500)]
mgr/pg_autoscaler: apply bias to pg_num selection
This is a relatively naive way to apply the bias: we just multiply
whatever we would have chosen by it. A more clever approach would be to
factor this into the overall cluster-wide PG budget, so that biasing one
pool's PGs up would put downward pressure on other pools. That is
significantly more complicated, however, and (I think) not worth the
effort.
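The naive approach amounts to one multiplication per pool. A minimal
sketch follows; the function names are hypothetical, and rounding the
result to a power of two is an assumption of this sketch rather than
something stated in this commit message:

```python
def nearest_power_of_two(n):
    """Round n to the nearest power of two (ties go up, minimum 1)."""
    n = max(1, int(round(n)))
    lower = 1 << (n.bit_length() - 1)
    upper = lower << 1
    return lower if (n - lower) < (upper - n) else upper


def biased_pg_target(raw_pg_target, bias=1.0):
    """Multiply the pg_num we would have chosen by the pool's bias.

    No cluster-wide budget is adjusted: biasing one pool up does not
    push other pools down.
    """
    return nearest_power_of_two(raw_pg_target * bias)


print(biased_pg_target(32, bias=4.0))
```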
to force cmake to use the python3 and python3 modules for building
python3 bindings
on the debian side, it's okay to continue using "-DWITH_PYTHON3=ON", as
- cmake does normalize "ON" to 3
- debian's cmake extension lives in /usr/lib/python3/dist-packages/
not in a specific /usr/lib/python3.x/dist-packages directory
a user might have multiple python3 versions installed, and one of them
may have all dependencies installed and be good enough for building Ceph.
we should not always use the latest python installed in the system and
complain that there are missing dependencies, even if the user has
installed all the python3 dependencies for the older python3.
in other words, if a user only installs the cython module for python3.4,
but she has both python3.6 and python3.4 in her system, we should not
force her to uninstall python3.6 for installing Ceph.
this change also aligns with MGR_PYTHON_VERSION. i am not applying the
same change to WITH_PYTHON2, because python2 is already stabilized, and
distros are not likely to release new python2 releases.
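The difference between "always take the latest python3" and "honour the
requested version" can be sketched as follows. This is a Python model of
the selection logic only; the function names, the module sets and the
error type are hypothetical, and the real logic lives in cmake:

```python
def parse(version):
    """Turn '3.6' into (3, 6) so versions compare numerically."""
    return tuple(int(p) for p in version.split("."))


def select_python3(installed, required, requested="3"):
    """Model of honouring a requested python3 version.

    installed: maps a version such as "3.4" to the set of modules
               available for that interpreter.
    required:  set of module names the build needs.
    requested: "3" means any python3, newest first; "3.4" pins the minor.
    Returns the chosen version, or raises if modules are missing.
    """
    want = parse(requested)
    matching = [v for v in installed if parse(v)[:len(want)] == want]
    if not matching:
        raise LookupError("no python %s installed" % requested)
    chosen = max(matching, key=parse)
    missing = set(required) - set(installed[chosen])
    if missing:
        raise LookupError(
            "python %s lacks: %s" % (chosen, ", ".join(sorted(missing))))
    return chosen


# cython is only installed for python3.4, but python3.6 is also present.
installed = {"3.4": {"cython"}, "3.6": set()}
print(select_python3(installed, {"cython"}, requested="3.4"))
```

Requesting plain "3" in this setup picks python3.6 and fails on the
missing cython module, which is the situation the commit avoids by
passing an explicit version.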