some of our centos7 jenkins builders are failing to build ceph master and
nautilus branches. because EPEL7 recently switched from python3.4 to
python3.6 as the native python3. see
https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/
and one of our BuildRequires, cmake3,
was offered by EPEL7. it also followed the python3.6 switch-over to
rebuild against python3.6. as a result, the cmake3-data-3.13.4-2.el7
started to depend on /usr/bin/python3.6, which is in turn offered by
python36 package. after installing python36 as a dependency of the
updated cmake3. but in cmake, we originally checks for the latest
python3 interpreter if WITH_PYTHON3 is enabled, that's why these
builders which happen to install these updated packages started to fail
when detecting the existence of python3.6 related build dependencies.
as a fix, in d1e83082,
python%{python3_pkgversion}-{devel,setuptools,Cython} are listed as
BuildRequires to reflect this change in EPEL7. before d1e83082, we
hardwired them to python34-*.
but as following analysis puts, there are cases where `yum-builddep`
is inconsistent with `rpmbuild`. as `yum-builddep` changes the how
`python3_pkgversion` and `python3_version` macros are expanded:
- none of the packages installed by `yum-builddep` installs the python3
related rpm macros, so the system stays with whatever python3 it was
using. in this case, `rpmbuild` won't complain, as the
`python3_pkgversion` and `python_version` are consistent before and
after `yum-builddep`.
- system has python3.4 installed before `yum-builddep`. but
`yum-builddep` installed python3.6 and also the updated
`python-rpm-macros` packages, which points `python3_version` and
`python3_pkgversion` to 3.6 and 36 respectively. in this case,
`rpmbuild` will complain, because when we run `yum-builddep`,
`python3_version` was still "3.4".
- system does not have python3 installed before `yum-builddep`. so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` installs python36, and
also the updated `python-rpm-macros` packages, which points
`python3_version` and `python3_pkgversion` to 3.6 and 36 respectively.
in this case, `rpmbuild` will complain, because the python36 related
dependencies are missing. what the system has is python34
dependencies.
- system does not have python3 installed before `yum-builddep`. so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` installs python34, and
also the updated `python-rpm-macros` packages, which points
`python3_version` and `python3_pkgversion` to 3.4 and 34 respectively.
in this case, `rpmbuild` won't complain, as the
`python3_pkgversion` and `python_version` are also consistent before and
after `yum-builddep`.
as we cannot tell if the system has python3 or what the python3 version
the system has before `yum-builddep`, so what we can do is to ensure
`rpmbuild` has what it needs to build Ceph. so let's just stick with
python3.6.
doc: Update documentation for the MANY_OBJECTS_PER_PG warning
The current documentation for the MANY_OBJECTS_PER_PG warning
states that The threshold can be raised to silence the health
warning by adjusting the mon_pg_warn_max_object_skew config
option on the monitors. It seems that this is not true (at least)
since the luminous times, and this option should be adjusted on
the managers.
I encountered this problem and I spend quite sometime injecting
the mon_pg_warn_max_object_skew to the monitors, added the option
ceph.conf and restarted the monitors several times but the warning
was not going away. I had to download the code to see what's
happening and I found out this:
$ git grep -A 3 mon_pg_warn_max_object_skew src/common/options.cc
src/common/options.cc:1480: Option("mon_pg_warn_max_object_skew", Option::TYPE_FLOAT, Option::LEVEL_ADVANCED)
src/common/options.cc-1481- .set_default(10.0)
src/common/options.cc-1482- .set_description("max skew few average in objects per pg")
src/common/options.cc-1483- .add_service("mgr"),
After I restarted the ceph-mgr service, the warning went away.
rgw: limit entries in remove_olh_pending_entries()
If there are too many entries to send in a single osd op, the osd rejects
the request with EINVAL. This error happens in follow_olh(), which means
that requests against the object logical head (requests with no version
id) can't be resolved to the current object version. In multisite, this
also causes data sync to get stuck in retries
Yingxin Cheng [Thu, 28 Feb 2019 08:40:24 +0000 (16:40 +0800)]
crimson/net: skeleton code for ProtocolV2 logic
crimson ProtocolV2 class is following a state-machine design style:
* states are defined in ProtocolV2::state_t;
* call `execute_<state_name>()` methods to trigger different states;
* V2 logics are implemented in each execute_<state_name>() methods, and
with explicit transitions to other states at the end of the execute_*;
* each state is associated with a write state defined in Protocol.h:
- none: not allowed to send;
- delay: messages can be queued, but will be delayed to send;
- open: dispatch queued message/keepalive/ack;
- drop: not send any messages, drop them all.
crimson ProtocolV2 is alike async ProtocolV2, with some considerations:
* explicit and encapsulated client/server handshake workflow.
* futurized-exception-based fault handling, which can interrupt protocol
workflow at any time in each state.
* introduced SERVER_WAIT state, meaning to wait for peer-client's socket
to reset or be replaced, and expect no further reads.
* introduced an explicit REPLACING state, async-msgr would be at the
NONE state during replacing.
Yingxin Cheng [Tue, 19 Mar 2019 14:26:59 +0000 (22:26 +0800)]
test/crimson: improved perf tool for crimson-msgr
New features:
* --jobs: start multiple client messengers from core #1 ~ #jobs;
* --core: can assign server core to get away from busy client cores;
* --rounds: a client will send <rounds>/<jobs> messages;
Improved:
* Better configuration report;
* Report individual client results plus a summary;
* Validate if CPU number is sufficient before running;
* Sleep 1 second while connecting, so it won't hurt performance;
* Simplify client logic and bug fixes;
Removed unecessary features:
* finish_decode() for MOSDOp;
* keepalive ratio;
Boris Ranto [Thu, 4 Apr 2019 20:00:55 +0000 (22:00 +0200)]
cmake: check for MAJOR.MINOR version of python3
We can only check for MAJOR.MINOR version of python3 since
FindPython3Libs does not support checking for MAJOR.MINOR.PATCH version
of python3. We also need to make sure we use the PYTHON3 versions of
these variables.
This should fix a regression introduced by c961e00.
* refs/pull/27303/head:
mds: add Octopus cephfs feature bit
mds: verify current release has a cephfs feature bit
mds: add cephfs feature bit for Nautilus
mon: generalize and refactor lambda use
qa: test featureful client with mimic base
qa: remove obsolete snap upgrade tests
to force cmake to use the python3 and python3 modules for building
python3 bindings
on the debian side, it's okay to continue using "-DWITH_PYTHON3=ON", as
- cmake does normalize "ON" to 3
- debian's cmake extension lives on /usr/lib/python3/dist-packages/
not in a specific /usr/lib/python3.x/dist-packages directory
use might have multiple python3 installed, some of them has/have all
dependencies installed and is good enough for building Ceph. we should
not always use the latest python installed in the system and complain that
there is missing dependencies, even if user has installed all the
python3 dependencies for the older python3.
put in other words, if user only installs cython module for python3.4, but
she has both python3.6 and python3.4 in her system. we should not force
her to uninstall python3.6 for installing Ceph.
this change also aligns with MGR_PYTHON_VERSION. i am not applying the
same change to WITH_PYTHON2, because python2 is already stablized. and distros
are not likely to release new python2 releases.
xie xingguo [Mon, 3 Dec 2018 08:54:19 +0000 (16:54 +0800)]
osd: automatically repair replicated replica on pulling error
However this is not a very complete solution since the broken object
info may still get lost if we switch primaries or simply power off nodes.
I think a better idea would be also adding these kind of broken objects
back into replica's own missing set simultaneously, e.g., like we handling
primary reading errors.
But for now I am not sure if that should be a concern?
Sage Weil [Wed, 3 Apr 2019 19:54:55 +0000 (14:54 -0500)]
common/common_init: start log from common_init_finish (if not yet started)
This captures any non-global_init users who created their cct but haven't
started up the log thread yet. As long as common_init_finish() happens
after we have all of our config options (from the mon config or whatever),
we will log (or not log) to the right location(s).