Kefu Chai [Sat, 9 Jun 2018 07:15:01 +0000 (15:15 +0800)]
osd: always set query_epoch explicitly for MOSDPGLog
it's a follow-up change of 339ae18b. also remove the MOSDPGLog
constructor where query_epoch is optional. it's less error-prone if we
make this parameter mandatory.
Sage Weil [Sat, 9 Jun 2018 01:45:08 +0000 (20:45 -0500)]
messages/MOSDPGScan: encode map_epoch for query_epoch for pre-nautilus peers
Pre-nautilus OSDs do not create last_peering_reset reliably (due to not
having the previous commit's fix in place). Avoid breaking them during
an upgrade by encoding the map_epoch in place of the query_epoch.
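The compatibility rule above can be sketched as a toy encoder. This is only an illustration under stated assumptions: the ScanMessage class, the encode_epochs() helper and the peer_is_pre_nautilus flag are hypothetical names, and how the real code detects an older peer is not described in this commit message.

```python
from dataclasses import dataclass


@dataclass
class ScanMessage:
    # Hypothetical stand-in for the two epochs carried by the message.
    map_epoch: int
    query_epoch: int


def encode_epochs(msg, peer_is_pre_nautilus):
    """Return the (map_epoch, query_epoch) pair that would go on the wire."""
    if peer_is_pre_nautilus:
        # Assumption: older peers may hold an unreliable last_peering_reset,
        # so the map epoch is sent in the query_epoch slot as well.
        return (msg.map_epoch, msg.map_epoch)
    return (msg.map_epoch, msg.query_epoch)
```

Only the value written to the wire changes for older peers; the in-memory message keeps both epochs.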
Sage Weil [Fri, 8 Jun 2018 12:29:31 +0000 (07:29 -0500)]
osd/PG: create new PGs from activate in last_peering_reset epoch
If we create a new PG (e.g., a backfill target) in the current epoch, that
epoch might be > the primary's last_peering_reset. The replica can then end
up with a higher last_peering_reset than the primary's, which can in turn
lead to future messages, like pg_scan during backfill, being ignored.
Fixes: http://tracker.ceph.com/issues/24452
Signed-off-by: Sage Weil <sage@redhat.com>
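The epoch mismatch described in this commit can be illustrated with a toy model. It assumes that a receiver drops any message whose query epoch is older than its own last_peering_reset, and that the primary stamps its messages with its own last_peering_reset. Both are simplifying assumptions, and is_stale() is a hypothetical helper, not the OSD's actual check.

```python
def is_stale(query_epoch, last_peering_reset):
    """Toy rule: a message older than the receiver's reset epoch is ignored."""
    return query_epoch < last_peering_reset


primary_reset = 100   # the primary's last_peering_reset
current_epoch = 105   # the epoch in which the new PG is instantiated

# Before the fix: the replica's PG is created in the current epoch.
replica_reset_before = current_epoch
# After the fix: the replica's PG is created in the primary's reset epoch.
replica_reset_after = primary_reset

# The primary stamps pg_scan with its own last_peering_reset (assumed).
scan_query_epoch = primary_reset

ignored_before = is_stale(scan_query_epoch, replica_reset_before)
ignored_after = is_stale(scan_query_epoch, replica_reset_after)
```

With these numbers the replica ignores the scan before the fix and accepts it afterwards.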
Kefu Chai [Fri, 25 May 2018 04:34:33 +0000 (12:34 +0800)]
denc: specialize for denc(const T&, size_t&, uint64_t)
otherwise GCC complains that 'unsigned long int' is not a class or
namespace when trying to materialize is_const_iterator_v<>. not sure why
SFINAE does not work here, though.
Kefu Chai [Fri, 23 Feb 2018 06:45:10 +0000 (14:45 +0800)]
cmake: update to accommodate seastar
* add unit_test_framework to appease seastar's find_package() call,
even though we don't build seastar's tests
* some seastar functions declare their return value like:
const size_t str_len(...). and GCC does not like the "const" in it.
so silence the warning
Stephan Müller [Fri, 1 Jun 2018 15:11:35 +0000 (17:11 +0200)]
mgr/dashboard: Resolve TestBed performance issue
With this helper function you can easily resolve the TestBed resetting
performance issue. If a test suite contains several tests, it makes sense
to configure TestBed only once, as long as you are not doing a lot of
TestBed-specific stuff (I haven't hit the limitation yet). It will reduce
the test run time by around $tests * 50 %. In my case it was a test suite
with 47 tests and a run time of over 30s; after switching to the static
test bed method it ran in 1.2s. The run time was reduced to about 4 % of
the original (1.2/30 = 0.04), which is roughly a 25x speedup.
For our own security the normal way will be taken if you do not
set the _DEV_ configuration variable to true. It will be false when
"run-frontend-unittests.sh" is run.
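The timing figures quoted in this commit (a little over 30s before, 1.2s after) can be checked with a few lines of arithmetic. The variable names are illustrative, and 30s is used as a lower bound for the original run time.

```python
before_s = 30.0   # original suite run time in seconds (quoted as "over 30s")
after_s = 1.2     # run time with the static TestBed configuration

fraction_of_original = after_s / before_s   # share of the old run time left
speedup = before_s / after_s                # how many times faster the suite runs

print(f"remaining run time: {fraction_of_original:.0%}")
print(f"speedup: {speedup:.0f}x")
```

The ratio 1.2/30 is a fraction (0.04), so the remaining run time is about 4 percent, not 0.04 percent.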
Sage Weil [Thu, 7 Jun 2018 13:33:46 +0000 (08:33 -0500)]
osd/PG: normalize query processing in Stray and ReplicaActive
A stray PG may end up in ReplicaActive if it is participating in backfill.
However, whether it is or isn't, we should treat queries the same.
Otherwise we end up with weird behaviors like:
- osd's stray pg moves to ReplicaActive (gets info+log from primary)
- osd goes down and back up
- primary restarts peering, requests FULLLOG to find missing objects
- osd ignores FULLLOG because it is ReplicaActive and not Stray
Fixes: http://tracker.ceph.com/issues/24373
Reported-by: Kouya Shimura <kouya@jp.fujitsu.com>
Signed-off-by: Sage Weil <sage@redhat.com>
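The behavioral difference described in this commit can be sketched as two toy handlers. The State and QueryType enums and both handler functions are hypothetical; they model only the FULLLOG case from the list above, not the real peering state machine.

```python
from enum import Enum


class State(Enum):
    STRAY = "Stray"
    REPLICA_ACTIVE = "ReplicaActive"


class QueryType(Enum):
    INFO = "INFO"
    LOG = "LOG"
    FULLLOG = "FULLLOG"


def handle_query_before(state, query):
    """Old behavior (toy): ReplicaActive silently drops FULLLOG queries."""
    if state is State.REPLICA_ACTIVE and query is QueryType.FULLLOG:
        return None
    return f"reply:{query.value}"


def handle_query_after(state, query):
    """Normalized behavior (toy): both states answer every query the same way."""
    return f"reply:{query.value}"
```

In the normalized version the state no longer influences the reply, which is the property the commit title asks for.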
Sage Weil [Thu, 7 Jun 2018 12:07:19 +0000 (07:07 -0500)]
osd/PG: reset PG peering if osd transitions from down -> up
Consider a PG that is stray and ends up in ReplicaActive (because it is
participating as a recovery source). If it is marked down wrongly and
then comes back up, then the PG will not reset, because there was not
an interval change (the PG is not part of the up or acting sets).
This can leave the PG in an odd state, leading to questionable behavior.
(For example, a stray might be in ReplicaActive and then ignore some
types of query messages.)
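The reset condition described above can be modelled as a small predicate. This is a sketch under the assumption that a peering reset is triggered either by an interval change or by the OSD itself going from down to up; should_reset_peering() and its parameters are hypothetical names.

```python
def should_reset_peering(interval_changed, was_up, is_up):
    """Toy decision: reset on an interval change or on a down -> up transition."""
    came_back_up = (not was_up) and is_up
    return interval_changed or came_back_up
```

For a stray PG outside the up and acting sets, interval_changed stays False, so only the second term makes the reset happen after the OSD is wrongly marked down and comes back.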
ceph-volume goes to great lengths to ensure that the symlinks in the
osd dir are accurate. Having these values here is an opportunity to
get them out of sync. And that can happen very easily if the initial
mkfs was performed using a /dev/sdX device name (which is unstable
across reboots). Even after ceph-volume corrects the symlink, bluestore
will continue to use the stale device path.
Kefu Chai [Wed, 6 Jun 2018 02:27:38 +0000 (10:27 +0800)]
cmake: find liboath using the correct name
we should reference liboath by the $name in Find${name}.cmake, also the
$name should be consistent when calling find_package_handle_standard_args().
in this change
* rename Findliboath.cmake to FindOATH.cmake to be consistent with other
find_package() modules.
* use "OATH" in find_package_handle_standard_args() instead of "oath"
* set the interface properties for OATH::OATH, so the target linking
against it can reference its header directories and libraries automatically.
* remove the stale comment for find_package_handle_standard_args()
* set OATH_INCLUDE_DIRS and OATH_LIBRARIES to follow the convention of
find_package(), even though they are not used directly in this project.
Lenz Grimmer [Wed, 6 Jun 2018 10:17:31 +0000 (12:17 +0200)]
Merge pull request #21644 from p-na/grafana-proxy
mgr/dashboard: Grafana proxy backend
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Ricardo Marques <rimarques@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Sage Weil [Tue, 5 Jun 2018 22:30:14 +0000 (17:30 -0500)]
mon: add 'osd destroy-new' command that only destroys NEW osd slots
ceph-volume may run into a problem and want to clean up, but we do not
want to give it blanket access to the 'osd destroy' command. Instead,
make an 'osd destroy-new' that can only destroy new OSDs (ones that are
in the process of being created but have never booted yet).
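The restriction described in this commit can be sketched as a guard on the OSD's state. This is a toy model: osd_flags is a hypothetical mapping from OSD id to a set of state flags, and the flag names and the destroy_new() helper are assumptions made for illustration, not the monitor's actual data structures.

```python
def destroy_new(osd_flags, osd_id):
    """Destroy osd_id only if it is still marked "new"; otherwise refuse."""
    flags = osd_flags.get(osd_id)
    if flags is None:
        raise KeyError(f"osd.{osd_id} does not exist")
    if "new" not in flags:
        # The OSD has booted at least once, so the restricted command
        # must not touch it.
        raise PermissionError(f"osd.{osd_id} is not new; refusing to destroy")
    flags.discard("new")
    flags.add("destroyed")
```

A caller that only holds permission for this command can therefore clean up half-created OSDs but cannot remove one that has ever booted.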
Sage Weil [Tue, 5 Jun 2018 21:25:28 +0000 (16:25 -0500)]
Merge PR #22371 into master
* refs/pull/22371/head:
doc/conf.py: fix man page build vs governance.rst
doc/governance: adjust title
doc/governance: fix link
doc/governance: edits and add user committee
doc/governance.rst: a few notes on ceph project governance
Reviewed-by: Lenz Grimmer <lgrimmer@suse.com>
Reviewed-by: João Eduardo Luís <joao@suse.de>
Sage Weil [Mon, 4 Jun 2018 17:51:11 +0000 (12:51 -0500)]
osd/PrimaryLogPG: fix on_local_recover crash on stray clone
If there is a stray clone (one that does not appear in the SnapSet) and
we do any sort of recovery on it, the OSD will crash. Log an error instead
and continue.
This addresses a problem where a cluster has both (1) an unexpected clone
and (2) the clone is not present on all replicas. Doing repair on that
PG will both not fix the unexpected clone and also cause the remaining
OSDs to crash trying to recover it.
Include a test.
Fixes: https://tracker.ceph.com/issues/24396
Signed-off-by: Sage Weil <sage@redhat.com>
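The change from crashing to logging can be sketched as follows. This is a toy model only: note_recovered_clone() and its parameters are hypothetical, and the real on_local_recover does considerably more than this membership check.

```python
import logging

log = logging.getLogger("toy-osd")


def note_recovered_clone(clone_id, snapset_clones):
    """Return True if the clone is known to the SnapSet, False if it is stray."""
    if clone_id not in snapset_clones:
        # Previously this situation would have brought the OSD down; in this
        # sketch an error is logged and recovery carries on.
        log.error("recovered clone %s is not in the SnapSet", clone_id)
        return False
    return True
```

Returning a status instead of raising lets recovery of the remaining objects proceed even when one clone is unexpected.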