xie xingguo [Fri, 16 Nov 2018 06:56:59 +0000 (14:56 +0800)]
osd: fix heartbeat brain-split behaviour
Yet another issue similar to 8d8e8a359c66b5767be6a4a2327c5f7097885464.
To reproduce, construct a cluster with 3 hosts, each containing a single osd only:
- cut off osd.1's cluster network and wait for osd.1 to be marked down
- cut off both osd.2 & osd.3's cluster network
It is possible that we now end up with __two__ down osds (e.g., both osd.1 & osd.2 are down),
and then restoring osd.1's and osd.2's cluster network won't change anything.
The root cause is that, by default, we require active heartbeat connections to at
least 1/3 of the current __up__ osds before bringing a previously dead (unhealthy)
osd back to life. However, the __up__ set may itself be the minority that has been
cut off from the rest of the cluster entirely, which causes the brain-split
behaviour demonstrated above.
The simplest fix is to try to re-activate an unhealthy osd whenever it is still
safe to do so. Also keep in mind that frequent up-to-down transitions will kill
off the osd process entirely, which is why the `osd_markdown_log` related check
is needed here.
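As a rough illustration of the quorum rule above, here is a minimal sketch; it is
not the actual OSD code, and the function name and parameters are made up:
```
// Hypothetical helper, simplified from the behaviour described above.
#include <cstddef>

bool enough_up_peers_reachable(std::size_t reachable_up_peers,
                               std::size_t total_up_peers)
{
  if (total_up_peers == 0) {
    return true;  // no peers to compare against
  }
  // Require heartbeats to at least 1/3 of the current "up" osds. If the "up"
  // set is itself the isolated minority, an osd on the healthy side can never
  // satisfy this and will never be reported back up, hence the brain-split.
  return reachable_up_peers * 3 >= total_up_peers;
}
```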
xie xingguo [Fri, 16 Nov 2018 06:54:39 +0000 (14:54 +0800)]
osd: cancel pending failure reports on re-activating osd
To reproduce, construct a cluster with 3 hosts, each containing a single
osd only:
- cut off osd.1's cluster network and wait for osd.1 to be marked down
- cut off both osd.2 & osd.3's cluster network
```
Note that there are two possible outcomes for the above step:
1. osd.1's failure reports get ignored by the monitor because osd.1 has already
been marked as down. Osd.2 & osd.3 stay __up__ as a result.
2. osd.1's failure reports are considered valid. Either osd.2 or osd.3 is
marked as __down__.
We consider only case __2__ here.
```
- restore osd.1 & osd.2's cluster network
Now you get __3__ up osds.
The root cause is that the monitor simply discards any failure reports from
dead osds, whereas osds never re-send pending failure reports unless they are
reconnecting to a monitor.
Fix by cancelling any pending failure reports each time an osd transitions
from dead to active *again*.
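A hypothetical sketch of the mechanism, with made-up names; it only illustrates
keeping a pending-failure set and clearing it on the dead-to-active transition
described above, not the actual OSD code:
```
// Illustrative only, not the actual OSD failure-report tracking.
#include <set>

struct FailureReporter {
  std::set<int> pending_failures;  // peer osd ids we have reported as failed

  void queue_failure(int peer) {
    pending_failures.insert(peer);
  }

  // Called on the dead -> active transition: reports queued while we were
  // dead have already been discarded by the monitor, so keeping them pending
  // only leaves stale state behind.
  void on_reactivate() {
    pending_failures.clear();
  }
};
```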
hsiang41 [Wed, 7 Nov 2018 14:05:35 +0000 (22:05 +0800)]
mgr: Separate diskprediction cloud plugin from the diskprediction plugin
Separate the diskprediction cloud plugin from the diskprediction plugin.
Devicehealth invokes the device prediction function according to the global
configuration option "device_failure_prediction_mode".
Signed-off-by: Rick Chen <rick.chen@prophetstor.com>
Kefu Chai [Thu, 15 Nov 2018 05:56:19 +0000 (13:56 +0800)]
tools/ceph_kvstore_tool: do not open rocksdb when repairing it
Before this change, the `need_open_db` parameter was passed to the constructor
of BlueStore as `min_alloc_size`, and rocksdb would fail to repair because
Repairer::Run() also tries to acquire the db lock, which fails if the lock file
is already held by BlueStore::_mount().
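The parameter mix-up is easy to reproduce in isolation; here is a minimal,
hypothetical illustration (not the actual BlueStore constructor) of a bool
argument silently filling a numeric parameter:
```
#include <cstdint>
#include <iostream>
#include <string>

struct Store {
  Store(const std::string& path, std::uint64_t min_alloc_size)
    : path(path), min_alloc_size(min_alloc_size) {}
  std::string path;
  std::uint64_t min_alloc_size;
};

int main() {
  bool need_open_db = false;
  // The caller meant to pass a "do not open the db" flag, but the bool
  // converts implicitly and ends up as min_alloc_size == 0 instead.
  Store store("/path/to/store", need_open_db);
  std::cout << "min_alloc_size = " << store.min_alloc_size << "\n";
  return 0;
}
```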
Kefu Chai [Thu, 15 Nov 2018 01:47:27 +0000 (09:47 +0800)]
qa: use FOUND_VAR to be backward compatible with cmake 2.8.12
Before this change, we assumed that the variable set when rados::radospp is
found would be radospp_FOUND, but that is a CMake 3 feature, see
https://cmake.org/cmake/help/v3.3/module/FindPackageHandleStandardArgs.html
whereas the cmake shipped by CentOS is CMake 2.8.12, where the variable name is
<UPPERCASED_NAME>_FOUND, see
https://cmake.org/cmake/help/v2.8.12/cmake.html#module:FindPackageHandleStandardArgs
In test_envlibrados_for_rocksdb.sh we invoke cmake, not the cmake3 offered by
EPEL7, so RADOSPP_FOUND is set instead. That is why the executable
env_librados_test fails to link against rados::radospp: rados::radospp is not
defined if radospp_FOUND is not defined/set.
After this change, the second mode of FIND_PACKAGE_HANDLE_STANDARD_ARGS() is
used instead, to ensure that radospp_FOUND is defined even when CMake 2.8.12 is
used.
Also, the message() commands added for debugging purposes are removed.
Kefu Chai [Thu, 15 Nov 2018 03:34:29 +0000 (11:34 +0800)]
qa/suites: add librados2 to "extra_packages" for upgrade tests
We use the "testnodes.yml" playbook defined by ceph-cm-ansible to initialize
test nodes, and testnodes.yml uses the "testnode" role. "testnode" requires the
"qemu-system-x86" or "qemu-kvm" package to be installed, and qemu in turn
depends on librbd1 and librados2.
Before librados3 was introduced, this worked perfectly: in the ceph repo,
qa/packages/packages.yaml defines the default set of packages the "install"
tasks should install, and librados2 was listed in that yaml file. So the
package management system would overwrite the librados2 installed by the
ansible playbook with the version specified by the "install" task; apt/yum
treats this as something the user requested explicitly, so it is fine to
install a different version of librados2.
After librados3 was introduced, librados2 was removed from
qa/packages/packages.yaml, because by default we need to install librados3
instead of librados2 to prepare a nautilus cluster. The problem is that the
package list also applies to "install" tasks installing releases before
nautilus, where we still need to replace the librados2 installed by ansible.
So, to address this issue, "librados2" is added to the "extra_packages" of the
"install" tasks of tests installing old releases, so that librados2 is
installed explicitly instead of only as a dependency of other ceph packages
like librbd1.
Kefu Chai [Tue, 13 Nov 2018 08:45:10 +0000 (16:45 +0800)]
os/tests: silence -Wsign-compare warning
Silence warnings like:
In file included from
/home/jenkins-build/build/workspace/ceph-pull-requests-arm64/src/test/objectstore/store_test.cc:25:0:
/home/jenkins-build/build/workspace/ceph-pull-requests-arm64/src/googletest/googletest/include/gtest/gtest.h:
In instantiation of 'testing::AssertionResult
testing::internal::CmpHelperEQ(const char*, const char*, const T1&,
const T2&) [with T1 = int; T2 = long unsigned int]':
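For reference, here is a small self-contained example of this warning class and
the usual ways to silence it; the test name is made up and this is not the
actual store_test.cc change:
```
#include <gtest/gtest.h>
#include <vector>

TEST(SignCompareExample, AssertEq) {
  std::vector<int> v{1, 2, 3};
  // ASSERT_EQ(3, v.size());                  // int vs. size_t: -Wsign-compare
  ASSERT_EQ(3u, v.size());                    // unsigned literal, no warning
  ASSERT_EQ(3, static_cast<int>(v.size()));   // or cast the unsigned operand
}
```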
Patrick Donnelly [Tue, 13 Nov 2018 21:03:02 +0000 (13:03 -0800)]
Merge PR #24490 into master
* refs/pull/24490/head:
mds: flush dirty dirfrags that weren't logged when deactivating mds
mds: use MDlog::trim_all() to trim log when deactivating mds
mds: don't cap log when there are replicated objects
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Alfredo Deza [Mon, 12 Nov 2018 17:57:26 +0000 (12:57 -0500)]
ceph-volume tests patch Device() by splitting parametrized method
This was causing failures on systems where there is no LVM or where the device
names don't match. Patching is always recommended to avoid conflicts with the
system the tests run on.
Jerry Lee [Mon, 12 Nov 2018 06:27:53 +0000 (14:27 +0800)]
ceph-mgr: hold lock while accessing the request list and submitting request
Request creation can fire the notify event early, which can cause a race
condition where the actual request has not yet been added to the self.requests
list; this makes the submit_request() function wait forever without accepting
new requests.
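The affected module is Python, but the pattern is language-independent; here is
a minimal C++ sketch of the intended behaviour (all names are made up):
```
#include <condition_variable>
#include <deque>
#include <mutex>

struct RequestQueue {
  std::mutex lock;
  std::condition_variable cond;
  std::deque<int> requests;

  void submit(int request) {
    {
      std::lock_guard<std::mutex> guard(lock);
      requests.push_back(request);   // add the request under the lock first
    }
    cond.notify_one();               // only then wake any waiter
  }

  int wait_for_request() {
    std::unique_lock<std::mutex> guard(lock);
    // The predicate is re-checked under the lock, so a wake-up can never
    // observe a list that does not yet contain the request.
    cond.wait(guard, [this] { return !requests.empty(); });
    int request = requests.front();
    requests.pop_front();
    return request;
  }
};
```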
Kefu Chai [Sat, 10 Nov 2018 21:33:43 +0000 (13:33 -0800)]
install-deps: install setuptools before upgrading virtualenv
This should address failures when running install-deps.sh, like:
Downloading/unpacking virtualenv
Running setup.py egg_info for package virtualenv
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown
distribution option: 'python_requires'
warnings.warn(msg)
error in virtualenv setup command: 'extras_require' must be a
dictionary whose values are strings or lists of strings containing valid
project/version requirement specifiers.
Complete output from command python setup.py egg_info:
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown
distribution option: 'python_requires'
warnings.warn(msg)
error in virtualenv setup command: 'extras_require' must be a dictionary
whose values are strings or lists of strings containing valid
project/version requirement specifiers.
This only happens with the very old virtualenv shipped with RHEL 7.4.
Jason Dillaman [Tue, 30 Oct 2018 01:55:54 +0000 (21:55 -0400)]
librbd: new pool init/stat API methods
The init method is a stub for handling new pool initialization. It
currently only handles setting the application tag. The stats method
will quickly calculate the number of images and provisioned space for
those images within the pool. Querying the pool stats on a pool with
10,000 images only required approximately 2 seconds as compared to
over 2 minutes for a "rbd ls -l" scan.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
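A hedged usage sketch of the new calls; the method and constant names below are
assumptions based on this description, not a verified copy of the librbd
headers:
```
// Names below are assumptions; consult rbd/librbd.hpp for the actual API.
#include <rados/librados.hpp>
#include <rbd/librbd.hpp>
#include <cstdint>
#include <iostream>

int report_pool(librados::IoCtx& io_ctx) {
  librbd::RBD rbd;

  // Initialize the pool for rbd use (currently just sets the application tag).
  int r = rbd.pool_init(io_ctx, false /* force */);
  if (r < 0) {
    return r;
  }

  // Gather image count and provisioned bytes without listing every image.
  std::uint64_t image_count = 0;
  std::uint64_t provisioned_bytes = 0;
  librbd::PoolStats stats;
  stats.add(RBD_POOL_STAT_OPTION_IMAGES, &image_count);
  stats.add(RBD_POOL_STAT_OPTION_IMAGE_PROVISIONED_BYTES, &provisioned_bytes);
  r = rbd.pool_stats_get(io_ctx, &stats);
  if (r < 0) {
    return r;
  }

  std::cout << image_count << " images, "
            << provisioned_bytes << " bytes provisioned" << std::endl;
  return 0;
}
```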
Sage Weil [Fri, 9 Nov 2018 14:37:10 +0000 (08:37 -0600)]
Merge PR #24925 into master
* refs/pull/24925/head:
Avoid import _strptime failed
Avoid exception if remote plugin not enabled
Separate diskprediction local plugin from the diskprediction plugin
Jan Fajerski [Wed, 31 Oct 2018 13:59:05 +0000 (14:59 +0100)]
ceph-volume: add inventory command
The inventory command provides information about a node's disk inventory.
Existing logical volumes on a disk or one of its partitions are scanned
and reported.
The output can be formatted as plain text or json.
Matthew Vernon [Thu, 8 Nov 2018 17:23:36 +0000 (17:23 +0000)]
debian: correct ceph-common relationship with older radosgw package
Fixes: https://tracker.ceph.com/issues/36741
Commit 9fd30b93f7281fad70b93512f0a25e3465f5b225 moved
/etc/bash_completion.d/radosgw-admin from radosgw to ceph-common. This means
that if you try to install a newer ceph-common over an older radosgw, there is
a conflict, and the install fails:
```
Unpacking ceph-common (12.2.8-1xenial) over (10.2.9-0ubuntu0.16.04.1) ...
dpkg: error processing archive ceph-common_12.2.8-1xenial_amd64.deb (--install):
trying to overwrite '/etc/bash_completion.d/radosgw-admin', which is also in package radosgw 10.2.9-0ubuntu0.16.04.1
```
Per Debian policy (
https://www.debian.org/doc/debian-policy/ch-relationships.html#overwriting-files-in-other-packages
) the correct way to handle a package taking over a file is to declare
versioned Replaces and Breaks.
The change went into 12.0.3, so this commit adds Replaces and Breaks against
radosgw versions earlier than that. It should be backported to
Luminous to avoid issues with upgrades from older versions (Jewel and
Kraken).