Kefu Chai [Thu, 9 Mar 2017 04:08:29 +0000 (12:08 +0800)]
mon/OSDMonitor: add send_pg_create() to OSDMonitor
OSDMonitor will handle the pg-create subscriptions after luminous.
1. scan new pools to get the pgs to be created
2. send pg creates using the collected pgs
3. trim the creating_pgs using the "created!" messages from OSD.
Please note that we need to wait for OSDMonitor::mapping to be fully
populated, so we cannot scan the incremental map for creating pgs until
it is applied and accepted by the other monitors.
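The three steps above can be sketched as a small bookkeeping structure. This is a simplified model under stated assumptions, not the OSDMonitor code: `PgId`, `CreatingPgsSketch`, the pool-to-pg_num input map and the `primary_of` callback (a stand-in for a lookup in OSDMonitor::mapping) are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins for Ceph types; not the real OSDMonitor API.
using pool_id_t = int64_t;
using osd_id_t = int;

struct PgId {
  pool_id_t pool;
  uint32_t seed;
  bool operator<(const PgId& o) const {
    return pool != o.pool ? pool < o.pool : seed < o.seed;
  }
};

struct CreatingPgsSketch {
  std::set<pool_id_t> known_pools;  // pools that have already been scanned
  std::set<PgId> creating;          // pgs still waiting for "created!"

  // Step 1: scan pools not seen before and record all of their pgs as creating.
  void scan_new_pools(const std::map<pool_id_t, uint32_t>& pools) {
    for (const auto& [pool, pg_num] : pools) {
      if (!known_pools.insert(pool).second) continue;  // already scanned
      for (uint32_t seed = 0; seed < pg_num; ++seed)
        creating.insert({pool, seed});
    }
  }

  // Step 2: group the creating pgs by acting primary, one batch per OSD.
  template <typename PrimaryFn>
  std::map<osd_id_t, std::vector<PgId>> build_creates(PrimaryFn primary_of) const {
    std::map<osd_id_t, std::vector<PgId>> per_osd;
    for (const auto& pg : creating)
      per_osd[primary_of(pg)].push_back(pg);
    return per_osd;
  }

  // Step 3: trim a pg once its primary reports "created!".
  void on_created(const PgId& pg) { creating.erase(pg); }
};
```

Following the note above, `build_creates()` would only give meaningful targets once the mapping has been fully populated for the epoch that was scanned.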
Kefu Chai [Tue, 28 Mar 2017 03:29:20 +0000 (11:29 +0800)]
mon: acquire lock when accessing mon->session_map
We access mon->session_map to send the osd-pg-creates messages when
OSDMonitor finishes updating its pg mapping. This can happen either in
another thread without the protection of Monitor::lock, or in a thread
that already holds Monitor::lock. So instead of changing Monitor::lock
to a recursive lock, a new lock is introduced to protect session_map.
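A minimal sketch of this locking scheme, assuming a plain non-recursive `std::mutex` for both locks; `MonitorSketch` and `session_map_lock` are illustrative names, not the real Monitor members.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical sketch: a dedicated, non-recursive mutex guards only the
// session map, so callers may take it whether or not they hold `lock`.
class MonitorSketch {
 public:
  std::mutex lock;  // stands in for Monitor::lock (non-recursive)

  void add_session(int osd, std::string addr) {
    std::lock_guard<std::mutex> g(session_map_lock);
    session_map[osd] = std::move(addr);
  }

  // Safe with or without `lock` held: it only takes session_map_lock.
  bool has_session(int osd) {
    std::lock_guard<std::mutex> g(session_map_lock);
    return session_map.count(osd) != 0;
  }

 private:
  std::mutex session_map_lock;
  std::map<int, std::string> session_map;
};
```

Because the new lock is held only around session_map accesses, a thread that already holds the monitor lock can use it without re-entering that lock; as usual with two mutexes, they should always be acquired in a consistent order to avoid deadlock.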
Kefu Chai [Sun, 19 Mar 2017 06:02:15 +0000 (14:02 +0800)]
mon/OSDMonitor: run mapping on peons also
Otherwise subscriptions on peons won't get the creating_pgs notification
when the mapping is updated. We want to send the notification from peons
as well, and the notifications should reflect the updated pg mapping.
Kefu Chai [Thu, 16 Mar 2017 09:43:19 +0000 (17:43 +0800)]
mon/OSDMonitor: mapping is not optional anymore
The mapping should always be available now: tracking creating pgs needs
it to find the acting_primary OSD of each creating pg, so that we can
send the pg-create message to that OSD if it subscribes to this information.
Kefu Chai [Fri, 10 Mar 2017 17:27:59 +0000 (01:27 +0800)]
messages/MOSDBeacon: add beacon msg
The OSD sends a beacon message to the monitor periodically to tell it
"I am still alive!". Previously, the monitor used the pg stats to check
the status of an OSD, but since the OSD will only send pg stats to the
mgr after luminous, we use a dedicated message for this purpose.
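On the monitor side, beacon-based liveness reduces to remembering when each OSD last reported and comparing that against a grace period. The sketch below assumes exactly that shape; `BeaconTracker` and the grace value are hypothetical, and this is not the MOSDBeacon handling code.

```cpp
#include <cassert>
#include <chrono>
#include <map>

// Hypothetical sketch of beacon-based liveness tracking on the monitor.
class BeaconTracker {
 public:
  using clock = std::chrono::steady_clock;

  explicit BeaconTracker(std::chrono::seconds grace) : grace_(grace) {}

  // Record the arrival time of a beacon from `osd`.
  void handle_beacon(int osd, clock::time_point now) { last_beacon_[osd] = now; }

  // An OSD is stale if it never sent a beacon or the last one is too old.
  bool is_stale(int osd, clock::time_point now) const {
    auto it = last_beacon_.find(osd);
    return it == last_beacon_.end() || now - it->second > grace_;
  }

 private:
  std::chrono::seconds grace_;
  std::map<int, clock::time_point> last_beacon_;
};
```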
Kefu Chai [Thu, 9 Mar 2017 14:14:41 +0000 (22:14 +0800)]
mon/OSDMonitor: update comment in update_from_paxos()
This change updates the comment for 7fb3804fb, 97462a3 and e807770
to reflect why we need to fix latest_full in the current code. The fix
is no longer a workaround for cuttlefish; it resolves the issue where:
0. mon.c has a latest_full of 5
1. mon.c is shut down and out of sync with the quorum
2. mon.c starts sync
3. mon.c now has osdmap[31..50], and the latest_full is still 5.
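One way to picture the fix for this scenario: if latest_full points below first_committed after a sync, fall back to the newest full map actually stored in [first_committed, last_committed]. This sketch assumes a `std::set` of stored full-map versions stands in for the monitor store; `fix_latest_full` is a hypothetical helper, not the code in update_from_paxos().

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using version_t = uint64_t;

// Hypothetical sketch: choose a usable latest_full after a sync.
// `full_maps` stands in for the versions that have a stored full osdmap.
version_t fix_latest_full(version_t latest_full, version_t first_committed,
                          version_t last_committed,
                          const std::set<version_t>& full_maps) {
  if (latest_full >= first_committed) return latest_full;  // still valid
  // Walk down from last_committed to the newest full map that exists.
  for (version_t v = last_committed; v >= first_committed && v > 0; --v) {
    if (full_maps.count(v)) return v;
  }
  return 0;  // no full map in range
}
```

For the scenario above (latest_full of 5, osdmap[31..50]), the stale value 5 is replaced by the newest stored full version in 31..50.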
Kefu Chai [Fri, 24 Feb 2017 12:38:03 +0000 (20:38 +0800)]
mon: pass const variables by const ref not pointer
* PGMapUpdater::check_down_pgs(): pass a const reference to pgmap
instead of a pointer
* PGMapUpdater::register_new_pgs(): pass a const reference to pgmap
instead of a pointer
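A small illustration of the signature change, using a hypothetical `PGMapSketch` in place of PGMap: a const reference states in the signature that the argument is non-null and read-only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for PGMap; the real class is much larger.
struct PGMapSketch {
  std::map<int, int> pg_state;  // pg seed -> state bits
};

// Before: a pointer also admits nullptr, so callees must check or assume.
// std::size_t count_pgs_old(const PGMapSketch* pgmap);

// After: a const reference cannot be null and cannot be used to mutate.
std::size_t count_pgs(const PGMapSketch& pgmap) {
  return pgmap.pg_state.size();
}
```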
Amir Vadai [Wed, 22 Mar 2017 07:05:21 +0000 (09:05 +0200)]
msg/async/rdma: Introduce RDMAConnMgr
Encapsulate all connection establishment logic in a new class,
RDMAConnMgr, and make it a friend class of RDMAConnectedSocketImpl.
This class will be inherited for every type of connection establishment.
Currently only TCP is supported; CM will be added very soon.
RDMAServerConnImpl, which only handles connection establishment, became an
abstract class, and RDMAServerConnTCP inherits from it for connections of
type TCP.
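The class split can be pictured as an abstract connection manager with one subclass per establishment method. This is a hypothetical sketch: all class names are illustrative, and it uses plain composition through an interface, whereas the commit makes RDMAConnMgr a friend of RDMAConnectedSocketImpl so it can reach the socket's internals.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base: one subclass per connection-establishment method.
class ConnMgrSketch {
 public:
  virtual ~ConnMgrSketch() = default;
  virtual int establish() = 0;  // exchange whatever is needed to connect
  virtual std::string kind() const = 0;
};

// Establishment over a side TCP connection (the only type supported so far).
class TcpConnMgrSketch : public ConnMgrSketch {
 public:
  int establish() override { return 0; }  // placeholder: report success
  std::string kind() const override { return "tcp"; }
};

// The socket delegates setup and stays agnostic of the method used.
class ConnectedSocketSketch {
 public:
  explicit ConnectedSocketSketch(std::unique_ptr<ConnMgrSketch> mgr)
      : mgr_(std::move(mgr)) {}
  int connect() { return mgr_->establish(); }
  std::string via() const { return mgr_->kind(); }

 private:
  std::unique_ptr<ConnMgrSketch> mgr_;
};
```

Adding CM support would then mean adding another subclass of the manager, without touching the socket class.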
Some of the code was left in its original file and place, and therefore
it looks misplaced. This was done to make it easier to review and rebase.
Once it is accepted, a cleanup patch will be sent to move the code into
the right place.
Issue: 995322
Change-Id: I8b0e163525ec80c2452f4b6481bf696968cc1e51
Signed-off-by: Amir Vadai <amir@vadai.me>
Dan Mick [Wed, 29 Mar 2017 03:08:13 +0000 (20:08 -0700)]
tasks/workunit.py: when cloning, use --depth=1
Help avoid killing git.ceph.com. A depth-1 clone takes about
7 seconds, whereas a full one takes about 3:40 (much of it
waiting for the server to create a huge compressed pack).
Sage Weil [Wed, 15 Mar 2017 16:46:25 +0000 (12:46 -0400)]
crush: implement try_remap_rule
Simulate a CRUSH mapping but try to identify alternative OSD
choices (based on an underfull list and overfull set) that still
respect the CRUSH rule constraints.
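A heavily simplified sketch of the idea, assuming the only rule constraint is "at most one replica per host" and that a flat OSD-to-host map is available. The real try_remap_rule works through the CRUSH rule steps and hierarchy; `try_remap_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical sketch: swap overfull OSDs in a mapping for underfull ones,
// keeping the single modelled constraint of one replica per host.
std::vector<int> try_remap_sketch(const std::vector<int>& orig,
                                  const std::set<int>& overfull,
                                  const std::vector<int>& underfull,
                                  const std::map<int, int>& host_of) {
  std::vector<int> out = orig;
  for (auto& osd : out) {
    if (!overfull.count(osd)) continue;
    for (int cand : underfull) {
      // Reject a candidate already used, or sharing a host with another replica.
      bool clash = false;
      for (int other : out) {
        if (other != osd &&
            (other == cand || host_of.at(other) == host_of.at(cand))) {
          clash = true;
          break;
        }
      }
      if (!clash) {
        osd = cand;  // first acceptable candidate wins
        break;
      }
    }
  }
  return out;
}
```

Candidates are tried in the order of the underfull list, and an overfull OSD stays in place when no candidate satisfies the constraint.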
Tim Serong [Tue, 28 Mar 2017 09:37:51 +0000 (11:37 +0200)]
pybind/mgr/rest: don't set timezone to Chicago
Setting TIME_ZONE in the Django app causes the timestamps printed in
the mgr log to suddenly be wrong after the rest module starts up
(unless, I imagine, the host happens to be in Chicago).
Amir Vadai [Wed, 22 Mar 2017 07:03:31 +0000 (09:03 +0200)]
msg/async/rdma: Extract sockets stuff from RDMAStack.h
This is a preparation commit, to make review easier. In this
commit I move code from RDMAStack.h into the new file
RDMAConnectedSocketImpl.h, without changing the code.
In the next commit, the actual logic changes will be made and the socket
classes will be split into base RDMAConnected classes and child
classes with TCP-specific connection establishment code.
Issue: 995322
Change-Id: I639fda490a6fbd02addb95d3158c5ac1e7390ef0
Signed-off-by: Amir Vadai <amir@vadai.me>
Test not only for -march support, but also the actual
presence of the intrinsic routines. Not sure why, but gcc
4.8.5 passes the first but not the second.
Fixes: http://tracker.ceph.com/issues/19386
Signed-off-by: Dan Mick <dan.mick@redhat.com>
David Zafman [Fri, 3 Mar 2017 23:04:02 +0000 (15:04 -0800)]
osd: Simplify DBObjectMap by no longer creating complete tables
Bump the version for new maps to 3
Make clone less efficient but simpler
Add rename operation (use instead of clone/unlink)
For now keep code that understands version 2 maps
David Zafman [Wed, 15 Feb 2017 23:02:33 +0000 (15:02 -0800)]
osd: Add automatic repair for DBObjectMap bug
Add repair command to ceph-osdomap-tool too
Under some situations the previous rm_keys() code would
generate a corrupt complete table. There is no way to
figure out what the table should look like now. By removing
the entries we fix the corruption and aren't much worse off,
because the corruption caused some deleted keys to re-appear.
This doesn't break the parent/child relationship during
repair, because some of the keys may still be contained
in the parent.
Samuel Just [Fri, 10 Feb 2017 23:50:57 +0000 (15:50 -0800)]
DBObjectMap: strengthen in_complete_region post condition
Previously, in_complete_region didn't guarantee anything about
where it left complete_iter pointing. It will be handy for
complete_iter to be pointing at the lowest interval which ends
after to_test. Make it so.
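The strengthened post condition can be illustrated with a map from interval begin to interval end, assuming non-overlapping half-open [begin, end) intervals. The type and function names are hypothetical, not DBObjectMap's.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Hypothetical model of a complete table: begin key -> end key, [begin, end).
using CompleteTable = std::map<std::string, std::string>;

// Returns whether to_test lies inside some interval. Post condition: `it`
// points at the lowest interval ending after to_test, or end() if none.
bool in_complete_region_sketch(const CompleteTable& t, const std::string& to_test,
                               CompleteTable::const_iterator& it) {
  it = t.upper_bound(to_test);  // first interval starting after to_test
  if (it != t.begin()) {
    auto prev = std::prev(it);  // last interval starting at or before to_test
    if (to_test < prev->second) {
      it = prev;  // prev contains to_test, so it also ends after it
      return true;
    }
  }
  return false;  // `it` is already the next interval (or end())
}
```

Since the intervals do not overlap, the interval containing to_test (if any) is the lowest one ending after it; otherwise the next interval to start is.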
Samuel Just [Fri, 10 Feb 2017 23:48:57 +0000 (15:48 -0800)]
DBObjectMap: fix next_parent()
The previous implementation assumed that
lower_bound(parent_iter->key()) always leaves the iterator
on_parent(). There isn't any guarantee, however, that that
key isn't present on the child as well.
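The failure mode can be shown with a merged child/parent view in which, on equal keys, the child entry shadows the parent one. This is a hypothetical model (names and the tie rule are assumptions), not the fixed next_parent(); it only shows why positioning on a key with lower_bound may land on the child rather than the parent.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical merged view of a child and its parent; the child wins ties.
struct MergedView {
  std::map<std::string, std::string> child, parent;

  // Position on the first key >= k and report (key, on_parent) instead of
  // assuming the parent, since the same key may exist in the child too.
  std::optional<std::pair<std::string, bool>> lower_bound(const std::string& k) const {
    auto c = child.lower_bound(k);
    auto p = parent.lower_bound(k);
    if (c == child.end() && p == parent.end()) return std::nullopt;
    if (p == parent.end() || (c != child.end() && c->first <= p->first))
      return std::make_pair(c->first, false);  // on the child
    return std::make_pair(p->first, true);     // on the parent
  }
};
```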