Patrick Donnelly [Wed, 21 Aug 2019 17:57:15 +0000 (10:57 -0700)]
Merge PR #28378 into master
* refs/pull/28378/head:
qa/tasks: introduce Thrasher base class
qa/tasks: Fix typo
qa/tasks: manage thrashers
qa/tasks: start DaemonWatchdog when ceph starts
qa/tasks: make watch and bark handle more daemons
qa/tasks: move DaemonWatchdog to new file
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Jos Collin [Mon, 5 Aug 2019 10:52:10 +0000 (16:22 +0530)]
qa/tasks: introduce Thrasher base class
* Introduced a Thrasher base class.
* Updated thrashers to inherit from Thrasher.
* Replaced the magic variable e with Thrasher.exception, as per the discussion.
Now the exception attribute is initialized by default, since every thrasher
inherits from the Thrasher class.
Fixes: https://github.com/ceph/ceph/pull/28378#discussion_r309337928
Fixes: https://tracker.ceph.com/issues/41133
Signed-off-by: Jos Collin <jcollin@redhat.com>
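The base-class pattern described above can be sketched as follows. This is a minimal illustration, not the actual qa/tasks code: `DummyThrasher`, `_do_thrash` and `check_thrashers` are hypothetical names. The point is that the base class initializes `exception`, so any caller can read it on every thrasher without a per-subclass `e` variable.

```python
import threading


class Thrasher(object):
    """Sketch of a common thrasher base class (hypothetical, not the qa code)."""

    def __init__(self):
        # Every thrasher starts with no recorded failure, so callers can
        # always read .exception, whatever the subclass.
        self.exception = None


class DummyThrasher(Thrasher):
    """Hypothetical thrasher whose work fails inside a background thread."""

    def __init__(self):
        super(DummyThrasher, self).__init__()
        self.thread = threading.Thread(target=self._run)

    def _do_thrash(self):
        raise RuntimeError("thrash step failed")

    def _run(self):
        try:
            self._do_thrash()
        except Exception as e:
            # Record the failure on the shared attribute instead of letting
            # it die silently inside the thread.
            self.exception = e


def check_thrashers(thrashers):
    """Return the class names of thrashers that recorded an exception."""
    return [type(t).__name__ for t in thrashers if t.exception is not None]
```

A monitoring loop only has to poll `t.exception` on each registered thrasher, which is what makes a single base class convenient for code that watches many thrashers at once.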
Nathan Cutler [Tue, 20 Aug 2019 14:23:06 +0000 (16:23 +0200)]
Merge pull request #29743 from smithfarm/wip-ceph-backport-https
script/ceph-backport.sh: carry https through to logical conclusion
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Abhishek Lekshmanan <abhishek@suse.com>
Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Nathan Cutler [Mon, 19 Aug 2019 14:57:07 +0000 (16:57 +0200)]
scripts/ceph-backport.sh: always use https://tracker.ceph.com
Completing the wave of fixes to this script in the wake of
https://tracker.ceph.com/issues/38764, this commit replaces
"http" with "https" in the comments and puts the Redmine endpoint
into a variable, along with some other cleanups.
Kefu Chai [Mon, 19 Aug 2019 07:21:06 +0000 (15:21 +0800)]
cmake,run-make-check.sh,deb,rpm: disable SPDK by default
SPDK is now disabled by default, but we still enable it in `run-make-check.sh`.
* cmake: disable SPDK by default
* run-make-check.sh: enable WITH_SPDK so at least we can ensure it
builds
* deb,rpm: add uuid-dev / libuuid-devel as a "make check" dependency
xie xingguo [Wed, 14 Aug 2019 06:12:43 +0000 (14:12 +0800)]
osd/PrimaryLogPG: fix dirty range of write_full
A write_full operation may implicitly truncate the object, so we
need to mark the truncated part as dirty as well: follow-up random
writes may still (re)extend the object size and leave holes in the
truncated range, which could cause problems during incremental-mode
recovery.
Note that write_update_size_and_usage resets oi.size, which is
why we move the mark_data_region_dirty call before it.
The ondisk_{read,write}_lock infrastructure was removed long ago when
https://github.com/ceph/ceph/pull/20177 was merged (specifically, commit
c244300ef33a044ad71fea7d92d77f33b5d41851). Hence the related comments
are removed as well, since they could be very misleading.
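The dirty-range rule and the ordering constraint above can be sketched in a few lines. This is a hypothetical model, not the PrimaryLogPG code: `write_full_dirty_range`, `ObjectInfo` and `do_write_full` are illustrative names, and the real code tracks dirty regions with interval sets in C++. The sketch assumes the dirty region must cover everything from offset 0 to the larger of the old size and the new length.

```python
def write_full_dirty_range(old_size, new_len):
    """Return (offset, length) of the region a write_full should mark dirty.

    Bytes [0, new_len) are rewritten, and if the object shrinks, bytes
    [new_len, old_size) are implicitly truncated. Both parts changed, so the
    dirty range must reach max(old_size, new_len).
    """
    return (0, max(old_size, new_len))


class ObjectInfo(object):
    """Hypothetical stand-in for the object info; only the size is modeled."""

    def __init__(self, size):
        self.size = size


def do_write_full(oi, new_len, dirty):
    # Compute the dirty range while oi.size still holds the OLD size.
    # Updating the size first would yield (0, new_len) and hide the
    # truncated tail from incremental recovery.
    dirty.append(write_full_dirty_range(oi.size, new_len))
    oi.size = new_len
```

For an 8192-byte object overwritten by a 4096-byte write_full, this marks (0, 8192) dirty; with the calls in the opposite order it would only mark (0, 4096).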
xie xingguo [Fri, 16 Aug 2019 07:28:54 +0000 (15:28 +0800)]
os/bluestore: prefix omap of temp objects by real pool
For recovery or backfill, we use a temp object at the destination
if the whole object context cannot be sent in one shot.
And since https://github.com/ceph/ceph/pull/29292, we segregate
omap keys by the object's binding pool.
However, https://github.com/ceph/ceph/pull/29292 does not
work well for recovery or backfill processes that need
to manipulate temp objects because we always (deliberately)
assign a negative pool id for each temp object which is
(obviously) different from the corresponding target object,
and we do not fix it when trying to rename the temp object
back to the target object at the end of recovery/backfill,
as a result of which we totally lose track of the omap
portion of the recovered object.
Fix by prefixing all omap-related keys of temp objects
with their real (the PG's) pool.
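The failure mode and the fix can be modeled with a toy key-value store. Everything here is an assumption for illustration: the key layout `"<pool>.<nid>.<key>"` is invented, and the temp-pool mapping `temp = -2 - pool` (named `POOL_TEMP_START` below) is assumed rather than taken from this commit. The sketch only shows why a prefix built from the negative temp pool id no longer matches once the object is renamed back into its real pool.

```python
POOL_TEMP_START = -2  # assumption: temp pool id = POOL_TEMP_START - pool


def temp_pool_of(pool):
    return POOL_TEMP_START - pool


def real_pool_of(pool):
    # Invert the (assumed) temp mapping; non-negative ids are real pools.
    return pool if pool >= 0 else POOL_TEMP_START - pool


def omap_prefix(pool, nid, fixed):
    # Hypothetical key layout "<pool>.<nid>.". Before the fix, the negative
    # temp pool id leaked into the prefix.
    prefix_pool = real_pool_of(pool) if fixed else pool
    return "%d.%d." % (prefix_pool, nid)


def recovered_keys(pool, nid, keys, fixed):
    """Write omap keys while the object is a temp object, then look them up
    after the rename, when the same onode (nid) belongs to the real pool."""
    db = {}
    for k in keys:
        db[omap_prefix(temp_pool_of(pool), nid, fixed) + k] = b"value"
    target = omap_prefix(pool, nid, fixed)
    return sorted(k[len(target):] for k in db if k.startswith(target))
```

Without the fix, `recovered_keys(3, 7, ["b", "a"], fixed=False)` finds nothing, matching the lost omap described above; with the fix, both keys are found again.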
Fixes a timeout failure caused by the module trying to access the attribute
`events` at line 437 in this block of code:
435     if marked == "in":
436         for ev_id in list(self._events):
437             ev = self.events[ev_id]
438             if isinstance(ev, PgRecoveryEvent) and osd_id in ev.which_osds:
439                 self.log.info("osd.{0} came back in, cancelling event".format(
440                     osd_id
441                 ))
442                 self._complete(ev)
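The quoted loop iterates over `self._events` but indexes `self.events`, which presumably does not exist and raises AttributeError. Below is a minimal sketch of the corrected loop under that assumption. The `Module` class, the `osd_marked` method and the `id` field on the event are hypothetical stand-ins, and logging is omitted. Iterating over `list(self._events)` is what makes it safe for `_complete` to remove entries during the loop.

```python
class PgRecoveryEvent(object):
    """Hypothetical minimal event: an id plus the OSDs it is waiting on."""

    def __init__(self, ev_id, which_osds):
        self.id = ev_id
        self.which_osds = which_osds


class Module(object):
    """Hypothetical stand-in for the module's event bookkeeping."""

    def __init__(self):
        self._events = {}
        self.completed = []

    def _complete(self, ev):
        # Removing from the dict is safe because the caller iterates over a
        # copy of the keys.
        self._events.pop(ev.id, None)
        self.completed.append(ev)

    def osd_marked(self, osd_id, marked):
        if marked == "in":
            for ev_id in list(self._events):
                ev = self._events[ev_id]  # was self.events[...]: AttributeError
                if isinstance(ev, PgRecoveryEvent) and osd_id in ev.which_osds:
                    self._complete(ev)
```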
Refactor CMake add_tox_test to automatically add py27 and/or py3 to
provided toxenvs.
Refactor tox.ini:
- Remove requirements-{py27,py3}.txt, as Python-release-dependent
packages can be handled with PEP 508 syntax.
- Remove development dependencies from requirements.
- Move pycodestyle settings to a separate section.
- Add flake8 check and other checkers (rst, naming, etc). Some of them
are commented out for future clean-ups (Ceph trackers have been opened)
- Remove pycodestyle, as flake8 is a wrapper for it.
- Add instafail plugin to report failures immediately
- Add timeout plugin to limit max run time (sometimes test_tasks hangs)
- Remove unused dependencies (lru_cache, pluggy)
Test and code linting fixes:
- Unused imports
- Fixes to HACKING.rst
Doc:
- Update HACKING.rst
Add conftest.py to mock imported modules (rados, rbd, cephfs), and also
mock the rados Error and OSError exceptions.
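A sketch of what such a conftest.py can look like, assuming Python 3's `unittest.mock`. It is not the dashboard's actual file. The exceptions need real classes because a `MagicMock` attribute cannot appear in an `except` clause: Python raises TypeError ("catching classes that do not inherit from BaseException is not allowed") as soon as an exception reaches such a clause. Making `OSError` a subclass of `Error` is an assumption that mirrors the bindings' hierarchy.

```python
# conftest.py (sketch): stub out the native Ceph bindings so unit tests can
# import modules that do "import rados" without a cluster or compiled libs.
import sys
from unittest import mock


class _MockRadosError(Exception):
    """Stand-in for rados.Error; must be a real exception class."""


class _MockRadosOSError(_MockRadosError):
    """Stand-in for rados.OSError (assumed to subclass rados.Error)."""


for _name in ("rados", "rbd", "cephfs"):
    sys.modules[_name] = mock.MagicMock()

# MagicMock attributes cannot be caught, so replace the exception types
# with real classes that "except rados.Error:" can match.
sys.modules["rados"].Error = _MockRadosError
sys.modules["rados"].OSError = _MockRadosOSError
```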
Refactor JwtManager check to avoid code duplication (found by pylint
when --jobs=1).
Fix an issue with the DashboardException.code property, which called abs()
on a potentially None attribute.
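A minimal sketch of the guard, using a hypothetical `DashboardException` with only the fields this note mentions. The real class's constructor and fallback value may differ, and returning None when neither a code nor an errno is set is an assumption. `abs(None)` raises TypeError, so the property has to check for None before calling it.

```python
class DashboardException(Exception):
    """Hypothetical sketch, not the dashboard's actual class."""

    def __init__(self, msg=None, code=None, errno=None):
        super().__init__(msg)
        self._code = code
        self.errno = errno

    @property
    def code(self):
        if self._code:
            return str(self._code)
        if self.errno is None:
            # abs(None) would raise TypeError; assumed fallback is None.
            return None
        return str(abs(self.errno))
```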
Disable doctests in services/rbd.py that are actually integration tests
(they check the value of rbd.RBD_FEATURES_NAME_MAPPING). Ideally, these
kinds of tests should be explicitly executed in an integration testing
stage, rather than in unit testing. Disabled tests have been prepended with
the @DISABLEDOCTEST token.
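Why a prepended token disables an example can be shown with the standard `doctest` parser. This sketch assumes the token is placed in front of the `>>>` prompt. The parser only recognizes a prompt preceded by nothing but spaces, so the prefixed line is no longer an example. The blank line matters too: without it, the disabled lines would be read as part of the previous example's expected output. `needs_a_cluster` is a hypothetical name.

```python
import doctest

DOC = '''
>>> 1 + 1
2

@DISABLEDOCTEST >>> needs_a_cluster()
True
'''

parser = doctest.DocTestParser()
# Only the first example is collected; the prefixed one is plain text.
examples = parser.get_examples(DOC)

# Running the docstring executes just the enabled example.
runner = doctest.DocTestRunner(verbose=False)
result = runner.run(parser.get_doctest(DOC, {}, "demo", None, 0))
```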
Sage Weil [Thu, 15 Aug 2019 17:28:26 +0000 (12:28 -0500)]
Merge PR #29422 into master
* refs/pull/29422/head:
qa/tasks/mgr/dashboard/test_health: update schema
doc/rados/operations/monitoring: document muting health alerts
qa/standalone/mon/health-mutes: add tests
doc/rados/operations/health-checks: document MON_DISK_{LOW,CRIT,BIG}
doc/rados/operations/health-checks: document OSD_NO_DOWN_OUT_INTERVAL
doc/rados/operations/health-checks: document AUTH_BAD_CAPS
doc/rados/operations/health-checks: document PG_SLOW_SNAP_TRIMMING
doc/rados/operations/health-checks: document MGR_DOWN
mon/HealthCheck: check mutes based on count, not parsing the summary string
mon/health_checks: associate a count with health_alert_t
mon/HealthMonitor: simplify health alert dump
mon/PGMap: use nice timespan for PG stuck warnings
mon/HealthMonitor: allow muted alert counts to decrease but not increase
mon/PGMap: fix summary form for bluestore health alerts
doc/rados/operations/health-alerts: document BLUESTORE_NO_COMPRESSION
mon/PGMap: fix summary form for POOL_APP_NOT_ENABLED
mon/HealthMonitor: persist summary for non-sticky mutes
mon/HealthMonitor: move get_health_status()
mon/HealthMonitor: automatically clear non-sticky mutes when alert clears
mon/HealthMonitor: add gather_all_health_checks helper
mon/HealthMonitor: add sticky flag to mutes
mon/HealthMonitor: expire mutes based on ttl
mon: apply mutes to health [detail]
mon/HealthMonitor: implement mute and unmute commands
mon/HealthMonitor: maintain list of mutes
mon: refactor/simplify health [detail]
mon/health_checks: format 'health summary' with a colon
mon/health_checks: drop dump_summary_compat
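The mute semantics named in this merge (TTL expiry, sticky mutes, clearing non-sticky mutes when the alert clears, counts that may decrease but not increase) can be sketched as a pure function. This is an interpretation of the commit subjects, not the HealthMonitor implementation: `Mute` and `apply_mutes` are hypothetical, health checks are reduced to `{code: count}`, and this sketch drops any mute whose count grows, sticky or not.

```python
class Mute(object):
    """Hypothetical mute: health code, muted count, optional TTL, sticky flag."""

    def __init__(self, code, count, now, ttl=None, sticky=False):
        self.code = code
        self.count = count
        self.expire = None if ttl is None else now + ttl
        self.sticky = sticky


def apply_mutes(checks, mutes, now):
    """checks: {code: count}. Returns (visible_checks, remaining_mutes)."""
    remaining = {}
    for code, m in mutes.items():
        if m.expire is not None and now >= m.expire:
            continue  # TTL expired
        if code not in checks:
            if m.sticky:
                remaining[code] = m  # sticky mutes survive a cleared alert
            continue  # non-sticky mutes go away when the alert clears
        if checks[code] > m.count:
            continue  # alert got worse: drop the mute
        m.count = checks[code]  # count may shrink; remember the lower value
        remaining[code] = m
    visible = {c: n for c, n in checks.items() if c not in remaining}
    return visible, remaining
```

Note that lowering the stored count means an alert that drops from 2 to 1 and then returns to 2 becomes visible again.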
Jan Fajerski [Thu, 15 Aug 2019 10:20:00 +0000 (12:20 +0200)]
ceph-volume: don't keep device lists as sets
This was introduced by #27754. The explicit device lists were cast to
sets, but other parts of the code were not updated accordingly. To avoid
touching every code path, only cast to sets for the disjointness test and
keep lists otherwise.
Fixes: https://tracker.ceph.com/issues/41292
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
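The approach above can be sketched in a couple of lines. `devices_overlap` and the device paths are hypothetical, not ceph-volume's actual code. The inputs stay lists, because list operations elsewhere break on sets (for example, `devices[0]` raises TypeError on a set). A set is only built temporarily for the disjointness check, and `set.isdisjoint` accepts any iterable as its argument.

```python
def devices_overlap(data_devices, db_devices):
    """Return True if any device appears in both lists.

    The arguments remain plain lists, so callers can still index, append
    to, or sort them; the set exists only for this check.
    """
    return not set(data_devices).isdisjoint(db_devices)
```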
Project Zipper Part 1 - Framework and RGWRadosStore
This is the first part of Project Zipper, the Store Abstraction Layer.
It introduces the basic framework, and wraps RGWRados in RGWRadosStore.
The goal over the next few weeks is to do the same for user, bucket, and
object. This will leave most of the remaining users of RGWRados wrapped
in SAL classes, allowing RGWRados to be completely absorbed into the private
RGWRadosStore. This will also expose all the APIs that need to be
pushed up to higher layers in the SAL.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>