Xiaoxi CHEN [Fri, 25 May 2018 09:15:08 +0000 (02:15 -0700)]
Revert "mon: no delay for single message MSG_ALIVE and MSG_PGTEMP"
This change doesn't look right: it causes twice as many proposals as we target (the rate is limited by paxos_propose_interval).
Imagine we have a sequence of pg_temp/up_thru messages during a large recovery.
now = T
The 1st up_thru/pg_temp goes through the fast path and triggers a propose at T + paxos_min_wait, with last_attempted_minwait_time = T.
now = T + paxos_min_wait
The [2, K] up_thru messages fail the check (now - last_attempted_minwait_time > g_conf->paxos_propose_interval)
and go through PaxosService::should_propose, which schedules the propose at T + paxos_propose_interval.
now = T + paxos_propose_interval + paxos_min_wait
The (K+1)th up_thru/pg_temp arrives; both (now - last_attempted_minwait_time > g_conf->paxos_propose_interval)
and (now - paxos->get_last_commit_time() > g_conf->paxos_min_wait) are satisfied, so we trigger another propose
at now + paxos_min_wait = T + paxos_propose_interval + 2 * paxos_min_wait.
Clearly we made TWO proposals in each paxos_propose_interval.
Kefu Chai [Thu, 24 May 2018 09:21:42 +0000 (17:21 +0800)]
dout: declare dpp using `decltype(auto)` instead of `auto`
This makes `pdpp` an alias of `dpp`; my guess is that this assures GCC that the
returned `sub` is a constant.
In file included from /home/kefu/dev/ceph/src/kv/LevelDBStore.h:25,
from /home/kefu/dev/ceph/src/kv/KeyValueDB.cc:6:
/home/kefu/dev/ceph/src/osd/osd_types.h: In lambda function:
/home/kefu/dev/ceph/src/common/dout.h:101:75: error: the value of ‘pdpp’
is not usable in a constant expression
dout_impl(pdpp->get_cct(),
ceph::dout::need_dynamic(pdpp->get_subsys()), v) \
^
/home/kefu/dev/ceph/src/common/dout.h:81:58: note: in definition of
macro ‘dout_impl’
return (cctX->_conf->subsys.template should_gather<sub, v>()); \
^~~
/home/kefu/dev/ceph/src/osd/osd_types.h:2992:3: note: in expansion of
macro ‘ldpp_dout’
ldpp_dout(dpp, 10) << "build_prior all_probe " << all_probe << dendl;
^~~~~~~~~
/home/kefu/dev/ceph/src/common/dout.h:100:12: note: ‘pdpp’ was not
declared ‘constexpr’
if (auto pdpp = (dpp); pdpp) /* workaround -Wnonnull-compare for
'this' */ \
^~~~
/home/kefu/dev/ceph/src/common/dout.h:100:12: note: in definition of
macro ‘ldpp_dout’
if (auto pdpp = (dpp); pdpp) /* workaround -Wnonnull-compare for
'this' */ \
^~~~
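The difference between the two declarations can be checked in isolation. The following is a minimal sketch, not the `dout.h` macro itself: `Prefix` and `alias_tracks_original` are hypothetical names, and the sketch only shows what each form deduces, not why GCC accepts one and rejects the other.

```cpp
#include <type_traits>

// Hypothetical stand-in for a prefix-provider type; not a Ceph class.
struct Prefix {
  int subsys;
};

// Returns true when `auto` produced an independent copy of the pointer
// while `decltype(auto)` produced a reference (an alias) to it.
inline bool alias_tracks_original() {
  Prefix a{1};
  Prefix b{2};
  Prefix* dpp = &a;

  auto copy = (dpp);             // deduces Prefix*: a separate pointer object
  decltype(auto) alias = (dpp);  // deduces Prefix*&: another name for dpp

  static_assert(std::is_same_v<decltype(copy), Prefix*>);
  static_assert(std::is_same_v<decltype(alias), Prefix*&>);

  dpp = &b;  // re-seat the original pointer
  return copy == &a && alias == &b;
}
```

The parentheses matter here: `decltype((dpp))` names an lvalue expression, so it yields a reference type, whereas `auto` always strips the reference and copies.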
Kefu Chai [Thu, 24 May 2018 08:21:48 +0000 (16:21 +0800)]
deb,rpm: package librgw_admin_user.{h,so.*}
* install and package librgw_admin_user.h, so developers can use it to
create rgw users.
* package librgw_admin_user, so users can use it to create rgw users.
Sage Weil [Tue, 22 May 2018 21:55:03 +0000 (16:55 -0500)]
mon/MgrMonitor: change 'unresponsive' message to info level
We generate a MGR_DOWN health warning at the appropriate points; having
this at WRN level just triggers failed teuthology runs but doesn't add much
value for the user.
Clear out teuthology whitelisting for this message.
Fixes: http://tracker.ceph.com/issues/24222
Signed-off-by: Sage Weil <sage@redhat.com>
common: OpTracker doesn't visit TrackedOp when nref == 0.
The patch fixes a race condition between
`unregister_inflight_op` and `visit_ops_in_flight` of
`OpTracker`. When a callable passed to the latter
turns the plain reference it gets into a `TrackedOpRef`,
a `TrackedOp` that is about to be destroyed (with `nref == 0`)
can be resurrected (`nref++`). This results
in an extra call to `unregister_inflight_op` for the same op,
leading to e.g. a use-after-free. For more details see:
https://tracker.ceph.com/issues/24037#note-5.
The fix deals with the problem by ensuring the visitor
is never called for ops whose `nref` is zero.
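The filtering idea can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: `Op`, `Tracker`, `register_op` and `visit_ops` are hypothetical stand-ins, not the real `TrackedOp`/`OpTracker` interfaces.

```cpp
#include <atomic>
#include <functional>
#include <list>
#include <mutex>

// Hypothetical, simplified stand-in for a reference-counted tracked op.
struct Op {
  std::atomic<int> nref;
  int id;
  Op(int id_, int refs) : nref(refs), id(id_) {}
};

// Hypothetical tracker holding plain pointers to in-flight ops.
class Tracker {
  std::mutex lock;
  std::list<Op*> in_flight;

public:
  void register_op(Op* op) {
    std::lock_guard<std::mutex> l(lock);
    in_flight.push_back(op);
  }

  // Calls `visit` for every op that still has references and returns
  // how many ops were visited. Ops with nref == 0 are skipped, so the
  // visitor never gets a chance to take a new reference on them.
  int visit_ops(const std::function<void(Op&)>& visit) {
    std::lock_guard<std::mutex> l(lock);
    int visited = 0;
    for (Op* op : in_flight) {
      if (op->nref.load() == 0)
        continue;  // on its way to being destroyed; do not expose it
      visit(*op);
      ++visited;
    }
    return visited;
  }
};
```

With one live op and one op whose count has already dropped to zero registered, `visit_ops` invokes the callable once, for the live op only.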
Jianpeng Ma [Mon, 21 May 2018 14:46:12 +0000 (22:46 +0800)]
os/bluefs: only flush dirty devices when do _fsync.
Currently _fsync calls flush_bdev to make the data safe, but flush_bdev flushes all
devices regardless of whether they hold data for this sync.
So add a new API, flush_bdev(std::array<bool, MAX_BDEV>& dirty_bdevs),
which only flushes the devices that are dirty for this sync op.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
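A minimal sketch of the overload described above, assuming a fixed number of devices. `FakeDevice`, `FakeFS` and the value of `MAX_BDEV` are illustrative assumptions; only the `flush_bdev(std::array<bool, MAX_BDEV>&)` signature comes from the commit message.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t MAX_BDEV = 3;  // assumed value, for illustration only

// Hypothetical device that merely counts how often it was flushed.
struct FakeDevice {
  int flush_count = 0;
  void flush() { ++flush_count; }
};

// Hypothetical file system object owning the devices.
struct FakeFS {
  std::array<FakeDevice, MAX_BDEV> bdev;

  // Old behaviour: flush every device unconditionally.
  void flush_bdev() {
    for (auto& d : bdev)
      d.flush();
  }

  // New overload: flush only the devices flagged dirty for this sync.
  void flush_bdev(std::array<bool, MAX_BDEV>& dirty_bdevs) {
    for (std::size_t i = 0; i < MAX_BDEV; ++i) {
      if (dirty_bdevs[i])
        bdev[i].flush();
    }
  }
};
```

A caller would set `dirty_bdevs[i] = true` whenever it writes to device `i` during the sync, then pass the array to the overload so untouched devices are left alone.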
Sage Weil [Mon, 21 May 2018 13:24:54 +0000 (08:24 -0500)]
os/bluestore: move txc on_commits assignment into ctor
This avoids adjusting the oncommits without a lock after the txc is
queued on the sequencer.
This is a bit defensive since the ObjectStore caller doesn't call
flush_commit() at the same time as queue_transaction(), but that could
change in the future.
Sage Weil [Mon, 21 May 2018 15:06:37 +0000 (10:06 -0500)]
os/bluestore: simplify and fix SharedBlob::put()
There is a narrow race possible:
A: lookup foo
A: put on foo
A: foo --nref == 0
B: lookup foo
B: put foo
B: foo --nref == 0
B: try_remove() succeeds, removes
A: try_remove() tries to remove foo again, probably crashes
We could fix this by flagging the object in some way to indicate it was
removed (maybe clearing parent?), but then we need to be careful about
dereferencing foo to get parent from put().
Fix this by moving to a simpler model: make lookup fail if nref == 0.
This eliminates the races around put() entirely because once nref reaches
0 it never goes up again.
Fixes: http://tracker.ceph.com/issues/24211
Signed-off-by: Sage Weil <sage@redhat.com>
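The "lookup fails if nref == 0" model can be sketched as below. This is an illustration under assumptions, not the BlueStore code: `Blob`, `BlobSet`, `add`, `lookup` and `put` are hypothetical names, and the compare-and-swap loop is one possible way to take a reference only while the count is nonzero.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical reference-counted object; the caller owns its storage.
struct Blob {
  std::atomic<int> nref{0};
};

// Hypothetical lookup table keyed by id.
class BlobSet {
  std::mutex lock;
  std::unordered_map<uint64_t, Blob*> by_id;

public:
  void add(uint64_t id, Blob* b) {
    std::lock_guard<std::mutex> l(lock);
    by_id[id] = b;
  }

  // Returns the blob with one extra reference taken, or nullptr if the
  // id is unknown or the blob's count has already reached zero.
  Blob* lookup(uint64_t id) {
    std::lock_guard<std::mutex> l(lock);
    auto it = by_id.find(id);
    if (it == by_id.end())
      return nullptr;
    Blob* b = it->second;
    int cur = b->nref.load();
    while (cur != 0) {  // never increment from zero
      if (b->nref.compare_exchange_weak(cur, cur + 1))
        return b;
    }
    return nullptr;
  }

  // Drops one reference. Returns true if this call dropped the last
  // reference and removed the entry; only one caller can observe that.
  bool put(uint64_t id, Blob* b) {
    if (--b->nref != 0)
      return false;
    std::lock_guard<std::mutex> l(lock);
    by_id.erase(id);
    return true;
  }
};
```

Because the count can no longer climb back up from zero, thread B in the scenario above gets `nullptr` from its lookup and only thread A performs the removal.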
Sage Weil [Mon, 21 May 2018 13:41:00 +0000 (08:41 -0500)]
Merge PR #22091 into master
* refs/pull/22091/head:
crush: update choose_args on bucket removal
crush: update choose_args on bucket removal, resize, or position mismatch
crush: create weight-set on demand when doing a choose-args reweight
test/cli/crushtool: use straw2 buckets for choose-args test
crush: weight_set_size -> weight_set_positions
Sage Weil [Fri, 18 May 2018 14:18:11 +0000 (09:18 -0500)]
os/bluestore: fix flush_commit locking
We were updating the txc state to KV_DONE and queuing the oncommits
waiters without holding any locks. This was mostly fine, *except* that
Collection|OpSequencer::flush_commit(Context *) was looking at the state
(under qlock) and also adding items to oncommits.
The flush_commit() method is only used in 2 places: osd bench, and the
PG reset_interval_flush outgoing message blocking machinery (which is
a bit ick). The first we could get rid of, but the second is hard to
remove (despite its ick factor).
The simple fix is to take qlock while updating the state value and
working with oncommits.
Fixes: http://tracker.ceph.com/issues/21480
Signed-off-by: Sage Weil <sage@redhat.com>
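The locking rule described above can be sketched roughly as follows. All names here (`TxcState`, `Txc`, `Sequencer`, `kv_done`) are hypothetical simplifications; the point is only that the state change and the `oncommits` list are both touched under the same `qlock` that `flush_commit()` holds.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

enum class TxcState { QUEUED, KV_DONE };

// Hypothetical, stripped-down transaction context.
struct Txc {
  TxcState state = TxcState::QUEUED;
  std::vector<std::function<void()>> oncommits;
};

// Hypothetical sequencer tracking a single transaction.
class Sequencer {
  std::mutex qlock;
  Txc txc;

public:
  // Returns true if the transaction is already committed; otherwise
  // queues `cb` to run at commit time and returns false.
  bool flush_commit(std::function<void()> cb) {
    std::lock_guard<std::mutex> l(qlock);
    if (txc.state == TxcState::KV_DONE)
      return true;
    txc.oncommits.push_back(std::move(cb));
    return false;
  }

  // Marks the transaction committed and runs the queued waiters.
  // State and waiter list are updated under qlock, so flush_commit()
  // either sees KV_DONE or has its callback included in `to_run`.
  void kv_done() {
    std::vector<std::function<void()>> to_run;
    {
      std::lock_guard<std::mutex> l(qlock);
      txc.state = TxcState::KV_DONE;
      to_run.swap(txc.oncommits);
    }
    for (auto& cb : to_run)
      cb();  // invoked after releasing the lock
  }
};
```

Running the callbacks after dropping the lock is a design choice of this sketch, made so that a waiter calling back into the sequencer cannot deadlock.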
Sage Weil [Mon, 21 May 2018 12:07:59 +0000 (07:07 -0500)]
Merge PR #21540 into wip-sage-testing-20180521.120735
* refs/pull/21540/head:
tests/crypto: print compile warning when NSS is unavailable.
tests/crypto: add tests for the no-bl encrypt/decrypt, part 2.
tests/crypto: add tests for the no-bl encrypt/decrypt.
auth: use OpenSSL for CryptoAESKeyHandler's no-bl encrypt/decrypt.
auth: extend CryptoKey with no-bl encrypt/decrypt.
auth: CryptoAESKeyHandler switches from NSS to OpenSSL.
auth: the outbuf of AES should be multiple of block size
auth: cache the PK11Context for CryptoAESKeyHandler
Kefu Chai [Sun, 20 May 2018 08:52:53 +0000 (16:52 +0800)]
qa/workunits/rados/test_envlibrados_for_rocksdb: use cmake not make
* so we rely on a single build system instead of two; the
other place we use cmake is cmake/modules/BuildRocksDB.cmake.
* disable gflags when building rocksdb, it's optional and does not help
in the sense of testing librados support.
* disable prompts when installing on debian, to silence warnings like:
debconf: unable to initialize frontend: Dialog
* drop --force-yes option, as it is deprecated, and is replaced with
--allow-downgrades, --allow-remove-essential,
--allow-change-held-packages, but none of them apply in our case.