Changcheng Liu [Fri, 28 Jun 2019 06:26:41 +0000 (14:26 +0800)]
msg/async/rdma: fix error argument to get right qp state
1. It's wrong to use "-1" as the attr_mask argument when querying queue pair state.
In the rdma library, ibv_query_qp calls ibv_cmd_query_qp to query
queue state. If "-1" is used as attr_mask, ibv_cmd_query_qp
returns the error EOPNOTSUPP, which means the query failed.
2. In class QueuePair, is_error() can use the member function get_state()
to get the queue pair state.
3. It's better to use qp_state as the queue pair state, according to
the ibv_query_qp manual page:
struct ibv_qp_attr {
    enum ibv_qp_state qp_state;     /* Current QP state */
    enum ibv_qp_state cur_qp_state; /* Current QP state - irrelevant for ibv_query_qp */
    ...
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Mon, 3 Jun 2019 05:31:09 +0000 (13:31 +0800)]
msg/async/rdma: export RDMAV_HUGEPAGES_SAFE before ibv_fork_init
In the rdma-core library, ibv_fork_init checks the environment variable
RDMAV_HUGEPAGES_SAFE to decide whether huge pages are usable on the system.
It doesn't make sense to export RDMAV_HUGEPAGES_SAFE after
calling ibv_fork_init.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Mon, 3 Jun 2019 05:00:22 +0000 (13:00 +0800)]
msg/async/rdma: use ibv_port_attr object type in Port class
1. Avoid manual memory management by holding the object directly instead
of operating on allocated space through a pointer; otherwise, memory
could leak.
2. Since the member type has changed in class Device, the member access
operator "." is needed to access the sub-members of the
object.
3. There's no need to consider the experimental API of ibv_query_port.
So, merge ibv_query_port in the prolog.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Patrick Donnelly [Wed, 21 Aug 2019 17:57:15 +0000 (10:57 -0700)]
Merge PR #28378 into master
* refs/pull/28378/head:
qa/tasks: introduce Thrasher base class
qa/tasks: Fix typo
qa/tasks: manage thrashers
qa/tasks: start DaemonWatchdog when ceph starts
qa/tasks: make watch and bark handle more daemons
qa/tasks: move DaemonWatchdog to new file
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Jos Collin [Mon, 5 Aug 2019 10:52:10 +0000 (16:22 +0530)]
qa/tasks: introduce Thrasher base class
* Introduced a Thrasher base class.
* Updated thrashers to inherit from Thrasher.
* Replaced the magic variable e with Thrasher.exception as per the discussion.
The exception variable is now set by default, since the thrashers inherit
from the Thrasher class.
Fixes: https://github.com/ceph/ceph/pull/28378#discussion_r309337928
Fixes: https://tracker.ceph.com/issues/41133
Signed-off-by: Jos Collin <jcollin@redhat.com>
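The base-class idea described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from qa/tasks: ExampleThrasher, its run() body, and the plain attribute (rather than whatever accessor the real class uses) are assumptions made for the example.

```python
import threading


class Thrasher(object):
    """Hypothetical base class holding the exception slot all thrashers share."""

    def __init__(self):
        super(Thrasher, self).__init__()
        # Set by default, so a watchdog can always read it.
        self.exception = None


class ExampleThrasher(Thrasher, threading.Thread):
    """Illustrative thrasher; the name and the failing body are made up."""

    def __init__(self):
        super(ExampleThrasher, self).__init__()

    def run(self):
        try:
            raise RuntimeError("simulated thrash failure")
        except Exception as e:
            # Record the failure on the shared attribute instead of a
            # free-standing variable named 'e'.
            self.exception = e
```

A watcher can then start the thrasher, join it, and inspect its exception attribute without knowing which concrete thrasher it is dealing with.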
Nathan Cutler [Tue, 20 Aug 2019 14:23:06 +0000 (16:23 +0200)]
Merge pull request #29743 from smithfarm/wip-ceph-backport-https
script/ceph-backport.sh: carry https through to logical conclusion
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Abhishek Lekshmanan <abhishek@suse.com>
Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Nathan Cutler [Mon, 19 Aug 2019 14:57:07 +0000 (16:57 +0200)]
scripts/ceph-backport.sh: always use https://tracker.ceph.com
Completing the wave of fixes to this script in the wake of
https://tracker.ceph.com/issues/38764, this commit replaces
"http" with "https" in the comments and puts the Redmine endpoint
into a variable, along with some other cleanups.
Kefu Chai [Tue, 20 Aug 2019 08:15:17 +0000 (16:15 +0800)]
osd: always initialize local variable
to silence a GCC warning like:
../src/osd/OSD.cc:4608:24: warning: ‘sender_delta_ub’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
4608 | ceph::signedspan sender_delta_ub;
| ^~~~~~~~~~~~~~~
Kefu Chai [Mon, 19 Aug 2019 07:21:06 +0000 (15:21 +0800)]
cmake,run-make-check.sh,deb,rpm: disable SPDK by default
but we still enable it in `run-make-check.sh`
* cmake: disable SPDK by default
* run-make-check.sh: enable WITH_SPDK so at least we can ensure it
builds
* deb,rpm: add uuid-dev / libuuid-devel as a "make check" dependency
xie xingguo [Mon, 17 Jun 2019 03:05:31 +0000 (11:05 +0800)]
osd: do not invalidate clear_regions of missing item at boot
Seems we'll always mark clear_regions as all dirty
when reading pg logs and missing items off the disk,
which as a result turns incremental recovery off by default.
Also using std::move seems to be a bit more efficient
and robust here.
xie xingguo [Wed, 14 Aug 2019 06:12:43 +0000 (14:12 +0800)]
osd/PrimaryLogPG: fix dirty range of write_full
A write_full operation may implicitly truncate the object down,
hence we need to mark the truncated part as dirty as well since
follow-up randomized writes may still be able to (re)extend the
object size and leave some holes against the truncated part,
which as a result might cause problems during incremental-mode
recovery.
Note that write_update_size_and_usage would reset oi.size
and that's why we move the mark_data_region_dirty call before that.
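The ordering issue can be illustrated with a toy model. This is only a sketch under stated assumptions: ObjectState, its dirty list, and the write_full method are hypothetical stand-ins, not the PrimaryLogPG code; only the name mark_data_region_dirty comes from the message above.

```python
class ObjectState:
    """Toy stand-in for an object's size plus its dirty byte ranges."""

    def __init__(self, size):
        self.size = size
        self.dirty = []  # list of (offset, length) tuples

    def mark_data_region_dirty(self, offset, length):
        if length > 0:
            self.dirty.append((offset, length))

    def write_full(self, new_length):
        # Cover both the new data and any tail removed by the implicit
        # truncate. This has to run before the size is reset, because
        # the old size is no longer available afterwards.
        self.mark_data_region_dirty(0, max(self.size, new_length))
        self.size = new_length
```

With a 100-byte object, write_full(40) in this model marks the whole range 0..100 dirty, not just the 40 bytes that were written.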
The ondisk_{read,write}_lock infrastructure was long gone with
https://github.com/ceph/ceph/pull/20177 merged - c244300ef33a044ad71fea7d92d77f33b5d41851,
to be specific. Hence the related comments must die since they
could be super-misleading.
xie xingguo [Fri, 16 Aug 2019 07:28:54 +0000 (15:28 +0800)]
os/bluestore: prefix omap of temp objects by real pool
For recovery or backfill, we use a temp object at the destination
if the whole object context cannot be sent in one shot.
And since https://github.com/ceph/ceph/pull/29292, we now segregate
omap keys by object's binding pool.
However, https://github.com/ceph/ceph/pull/29292 does not
work well for recovery or backfill processes that need
to manipulate temp objects, because we always (deliberately)
assign each temp object a negative pool id, which is
(obviously) different from that of the corresponding target object,
and we do not fix it when renaming the temp object
back to the target object at the end of recovery/backfill.
As a result, we totally lose track of the omap
portion of the recovered object.
Fix by prefixing all omap-related data of temp objects
with their real (PG's) pool.
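A rough sketch of the idea, with loud assumptions: the temp-pool encoding (TEMP_POOL_START and the "-2 - pool" mapping), the helper names, and the key layout are all chosen for illustration and are not taken from the BlueStore code.

```python
import struct

TEMP_POOL_START = -2  # assumed encoding, for this sketch only


def temp_pool_id(real_pool):
    """Hypothetical mapping from a real pool id to its temp pool id."""
    return TEMP_POOL_START - real_pool


def real_pool_id(pool):
    """Map a (possibly temp) pool id back to the real pool id."""
    if pool <= TEMP_POOL_START:
        return TEMP_POOL_START - pool
    return pool


def omap_prefix(pool, object_id):
    """Build an illustrative omap key prefix from the real pool and an object id."""
    return struct.pack(">qQ", real_pool_id(pool), object_id)
```

Because the prefix is derived from the real pool, the keys written for a temp object are the same ones looked up after it is renamed to the target object.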
Fixes a timeout failure caused by the module trying to access the
attribute events at line 437 in this block of code:
435 if marked == "in":
436     for ev_id in list(self._events):
437         ev = self.events[ev_id]
438         if isinstance(ev, PgRecoveryEvent) and osd_id in ev.which_osds:
439             self.log.info("osd.{0} came back in, cancelling event".format(
440                 osd_id
441             ))
442             self._complete(ev)
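The failure mode can be reproduced in isolation. In this sketch, ProgressSketch and its two lookup methods are hypothetical, and it assumes the fix is to read from _events, the same attribute the loop on line 436 iterates.

```python
class ProgressSketch:
    """Hypothetical stand-in for the module; only _events is defined."""

    def __init__(self):
        self._events = {"ev1": "recovery event"}

    def lookup_buggy(self, ev_id):
        # Mirrors line 437: there is no attribute named 'events',
        # so this raises AttributeError.
        return self.events[ev_id]

    def lookup_fixed(self, ev_id):
        # Assumed fix: use the attribute that actually exists.
        return self._events[ev_id]
```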
Refactor CMake add_tox_test to automatically add py27 and/or py3 to
provided toxenvs.
Refactor tox.ini:
- Remove requirements-{py27,py3}.txt, as Python-release-dependent
packages can be handled with PEP 508 syntax.
- Remove development dependencies from requirements.
- Move pycodestyle settings to separate section.
- Add flake8 check and other checkers (rst, naming, etc). Some of them
are commented out for future clean-ups (Ceph trackers have been opened)
- Pycodestyle removed, as flake8 is a wrapper for pycodestyle.
- Add instafail plugin to report failures immediately
- Add timeout plugin to limit max run time (sometimes test_tasks hangs)
- Remove unused dependencies (lru_cache, pluggy)
Test and code linting fixes:
- Unused imports
- Fixes to HACKING.rst
Doc:
- Update HACKING.rst
Add conftest.py to mock imported modules (rados, rbd, cephfs), and
also mock the rados Error and OSError exceptions.
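One way such a conftest.py could look is sketched below. It is a minimal illustration rather than the file from this commit: the class names, the install_mock_modules helper, and the choice to make the mocked OSError a subclass of the mocked Error are assumptions.

```python
import sys
from unittest import mock


class MockRadosError(Exception):
    """Placeholder for rados.Error, so 'except rados.Error' still works."""


class MockRadosOSError(MockRadosError):
    """Placeholder for rados.OSError."""


def install_mock_modules():
    # Register stand-ins before any code under test runs 'import rados'.
    rados = mock.Mock()
    rados.Error = MockRadosError
    rados.OSError = MockRadosOSError
    sys.modules.update({
        "rados": rados,
        "rbd": mock.Mock(),
        "cephfs": mock.Mock(),
    })


install_mock_modules()
```

Real exception classes are used instead of Mock attributes because an except clause needs an actual exception type to match against.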