Kefu Chai [Mon, 26 Aug 2019 03:36:50 +0000 (11:36 +0800)]
install-deps.sh: download wheel using 'pip wheel'
otherwise we will fail to install the build dependencies of
`lazy-object-proxy` from the wheelhouse. as `lazy-object-proxy` does not
add `setuptools_scm` in its `setup.py`, instead it lists
`setuptools_scm` in `setup.cfg` and `pyproject.toml` as a `build-system`
requires. but unfortunately, `pip download` only downloads the
install/run-time dependencies at this moment. and `lazy-object-proxy`
does not offer binary package for at least python2.7.
ideally, `pip download` should collects its dependencies like
xie xingguo [Mon, 26 Aug 2019 02:28:34 +0000 (10:28 +0800)]
osd/osd_types: add_next_event - add special handler for lost_revert
For unfound objects, we might append LOST_REVERT log entries,
which shall allow these objects to be reverted to the newest
available version later.
However, we are currently lack of support to rewind the clean_regions
portion too when marking unfound objects as lost with inc-recovery mode
enabled. Hence we must mark these unfound-revert objects as fully dirty
to make sure they can be correctly recovered.
E.g.,:
- primary is pulling object A from replica 1
- object A is corrupted on replica 1
- object A is now unfound
- mark object A as lost, replica 1 will persist a wrong
missing item for object A..
xie xingguo [Sat, 24 Aug 2019 01:51:20 +0000 (09:51 +0800)]
osd/osd_types: drop 'new_object' from constructor
There is no consumer.
Actually, I think this field is only meaningful to be used
to indicate whether we should initiate an inc-recovery or not.
If not, then we shall fall back to triggering a full-recovery
instead.
xie xingguo [Sat, 24 Aug 2019 02:56:57 +0000 (10:56 +0800)]
osd/osd_types: always call mark_fully_dirty for missing.add
In general we shall build missing set (and hence clean_regions)
based on pg log. However, currently there are still 5 cases we might call
missing.add to add a new pg_missing_item into the missing set
explicitly (or replace an existing pg_missing_item entirely):
1. we explicitly build missing set on startup, in which case
we know we are trying to be compatiable with pre-kraken versions,
so it should be ok for us to disable inc-recovery.
2. we are currently processing authoritative log, and there are
some divergent objects detected. For simplicity (and correctness),
we should disable inc-recovery entirly for these objects.
3. we are re-building missing set, e.g., due to the global
CEPH_OSDMAP_RECOVERY_DELETES policy changing.
In this case we know we are at the end of upgrading from a
pervious version that is lack of CEPH_OSDMAP_RECOVERY_DELETES support.
Hence it should be the recommended option to disable inc-recovery
simultaneously since these objects should be lack of inc-recovery support
too.
4. we are adding or re-adding missing object into primary's missing_loc.
It doesn't matter whether we have a correct clean_regions there
since we never actually refer to that field from missing_loc
when we actually start to perform object recovery later.
5. we are auto-repairing a corrupted object and hence the need of
adding it to the corresponding missing set first, e.g, by leveraging
the existing recovery procedure. In this case, we always disable
inc-recovery to make sure this object can be fully (and correctly)
recovered later.
Do this outside the standard tick interval as it needs to be driven more
frequently to keep up with client workloads that generate a lot of
capabilities.
Fixes: https://tracker.ceph.com/issues/41141 Fixes: https://tracker.ceph.com/issues/41140 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
crimson/osd: introduce OpsExecuter to uniform calling CEPH_OSD_OPS.
OSD has two entry points for executing CEPH_OSD_OP_*:
1. the MOSDOp message handler,
2. the Object Class API (cls_* and cls_cxx_* functions).
We definitely want to address these two users without code
duplication. However, exposing the entire PG to Obj Class
would break encapsulation. Moreover, there is difference
in life times between PG and sequence-of-operations-from-
MOSDOp.
crimson/osd: exceptions derive from std::system_error now.
This change is be useful especially for CEPH_OSD_OP_CALL
which will be brought by further commits. The issue here
is that the Ceph Classes will be based on existing iface
handling errors with usual, int-based ret codes. These
codes needs to be glued with the existing handling in
pg::do_osd_ops().
Patrick Donnelly [Fri, 23 Aug 2019 23:16:16 +0000 (16:16 -0700)]
Merge PR #28855 into master
* refs/pull/28855/head:
doc: document scrub summary in ceph status output
test: extend scrub control test to validate mds task status
mds: send scrub state changes to cluster log.
mds: periodically sent mds scrub status to ceph manager
mgr, mon: allow normal ceph services to register with manager
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Fri, 23 Aug 2019 08:42:22 +0000 (16:42 +0800)]
install-deps.sh,deb,rpm: move python-saml deps into debian/control and ceph.spec.in
these dependencies are only used for building python-saml which is in
turn used for the SAML support. this feature is tested using
`test_sso.py` while performing dashboard tests. we do not package or
ship python-saml along with other Ceph packages. so let's move these
dependencies to the "make check" sections in ceph.spec.in and
debian/control for simplifying install-deps.sh.
Sage Weil [Fri, 23 Aug 2019 16:25:28 +0000 (11:25 -0500)]
Merge PR #28727 into master
* refs/pull/28727/head:
test/crimson: resolve name collision
test: switch to ldout; let users specify mon debug level
test: add new ElectionLogic unit test framework
elector: const-ify a bunch of functions
elector: swap order of parameters in ElectionLogic::receive_propose
elector: Update Elector and ElectionLogic function documentation
elector: persist the epoch in bump_epoch()
elector: make some more ElectionLogic members private
elector: fix privacy and restore dout in Elector
elector: don't clear peer_info in bump_epoch()
elector: split ElectionLogic into its own compilation unit
elector: move all the elector callouts into the Elector
elector: make ElectionLogic private to Elector; undo most public shenanigans
elector: create declare_standlone_victory in Elector/Logic for Monitor
elector: make ElectionLogic::declare_victory private
elector: route _bump_epoch through the interface-to-be
elector: rename handle_propose_logic -> receive_propose
elector: hoist handle_victory into ElectionLogic
elector: hoist handle_ack into ElectionLogic
elector: hoist victory into ElectionLogic
elector: hoist expire into ElectionLogic
elector: hoist start into ElectionLogic
elector: hoist participating into ElectionLogic
elector: hoist init into ElectionLogic
elector: hoist defer into ElectionLogic
elector: split handle_propose in two and hoist into ElectionLogic
elector: hoist bump_epoch into ElectionLogic
elector: store accessors for ElectionLogic
elector: hoist Elector data bits out into a new ElectionLogic class
mon: Rearrange Paxos::dispatch to be a little cleaner
Reviewed-by: Brad Hubbard <bhubbard@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
Kefu Chai [Fri, 23 Aug 2019 02:36:44 +0000 (10:36 +0800)]
mgr/BaseMgrModule: use PyInt_Check() to compatible with py2
in python2, we have both `PyLongObject` and `PyIntObject` for presenting
`long` and `int` types. and in python3, `PyIntObject` is dropped, `int`
is represented using `PyLongObject`. and the related functions like
`PyInt_Check()` and `PyInt_AsLong()` are removed along with
`PyIntObject`.
so we should use different set of functions for converting an `int` python
object to native type in C++. since we don't use `long` for "count" when
calling `MgrModule.set_health_checks()`, in this change, we just define
a macro to unify the difference between python2 and python3.
the same applies to "max_count" parameter of `MgrModule.add_osd_perf_query()`
call.
Changcheng Liu [Fri, 2 Aug 2019 03:23:08 +0000 (11:23 +0800)]
msg/async/rdma: implement function to prefetch buffers
The original RDMAConnectedSocketImpl::read read date from buffers and
prefertch data into buffers for next round of reading. It makes the
logical a little complex and the code isn't smooth to be read.
In this patch:
1) RDMAConnectedSocketImpl::buffer_prefetch private API is added to
prefetch data into buffers at the head of read_buffers.
2) reduce one time of calling notify() to reduce context switches.
It's really not needed to notify upper layer to read data since current
read operation hasn't finished yet.
3) Simplify RDMAConnectedSocketImpl::read implementation.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Wed, 19 Jun 2019 07:53:08 +0000 (15:53 +0800)]
msg/async/rdma: remove redundant code
1. Below three bits are meaningless in pollfd::events field:
POLLERR, POLLHUP, or POLLNVAL.
2. QueuePair::pd is initialized in the initialize list.
There's no need to assign same value to it.
3. Remove the never used function Chunk::set_bound
4. Remove the never used function Chunk::set_offset
5. Remove the never used function QueuePair::is_error
6. Remove SimplePolicyMessenger used vars
7. remove socket_fd() interface since it's never used.
All data write/read is based on ConnectedSocketImpl::fd.
So, there's no need to expose socket_fd since it's never used.
8. Remove RDMAServerSocketImpl::get_fd which is not used.
BTW, RDMAServerSocketImpl::fd has the same function as get_fd.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>
Changcheng Liu [Wed, 7 Aug 2019 05:33:37 +0000 (13:33 +0800)]
msg/async/rdma: rename variable to improve readability
Device::binding_port
1. port_id is more meaningful compared to i as variable name.
2. start port_id from 1 instead of 0.
PoolAllocator::malloc
1. make clear relationship among buffer/chunk/block/memory_region with new
variable name.
2. define the variable when it's first being used.
RDMAConnectedSocketImpl::submit
1. use "wait_copy_len" to replace "need_reserve_bytes" which stands for the memory
that is waiting to be copied into chunk.
2. use "copy_start" to replace "copy_it" which stands for the start iterator to be copied.
3. use "total_copied" to replace "total" which stands for the memory that has been copied.
allocate huge page
1. use "HUGE_PAGE_SIZE_2MB" to be used for 2MB page alignment.
2. use "ALIGN_TO_PAGE_2MB" to stands align request size to 2MB.
Signed-off-by: Changcheng Liu <changcheng.liu@aliyun.com>