Yan, Zheng [Wed, 1 Mar 2017 03:57:20 +0000 (11:57 +0800)]
mds: don't break order of inter-dependent requests during mds recovers
If a recovering mds had replicated an object when it failed and a
scatterlock in the object was in MIX state, it's possible that
the recovering mds needs to take a wrlock on the scatterlock when it
replays unsafe requests. The survivor mds should delay taking rdlock
on the scatterlock for new requests. Otherwise a new request may get
processed before the unsafe requests are replayed. For example:
The recovering mds is auth mds of a dirfrag, the survivor mds is auth
mds of the corresponding inode. When running 'rm -rf' on the directory,
the rmdir request should get processed after the recovering mds replays
unsafe unlink requests.
To handle this corner case, add a flag to ScatterLock to indicate
if it was in MIX state when the recovering mds failed. If the flag
is set, delay taking rdlock on the scatterlock until the recovering
mds becomes active.
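A rough Python model of the idea described above, not the actual MDS
code: ToyScatterLock, mix_at_peer_failure and the method names are all
hypothetical, chosen only to show how a flag recorded at failure time can
hold back new rdlocks until the recovering peer is active again.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScatterLock:
    """Hypothetical stand-in for a scatterlock; not Ceph's ScatterLock."""
    state: str = "SYNC"
    # Hypothetical flag: lock was in MIX state when a peer mds failed.
    mix_at_peer_failure: bool = False
    waiters: list = field(default_factory=list)

    def note_peer_failure(self):
        # The failed peer may need a wrlock while replaying unsafe requests.
        if self.state == "MIX":
            self.mix_at_peer_failure = True

    def try_rdlock(self, request):
        # Delay new readers while the recovering peer may still replay.
        if self.mix_at_peer_failure:
            self.waiters.append(request)
            return False
        return True

    def peer_became_active(self):
        # Replay is over: clear the flag and hand back the delayed requests.
        self.mix_at_peer_failure = False
        ready, self.waiters = self.waiters, []
        return ready
```

In this model an rmdir that arrives during recovery is parked in waiters
and only retried once peer_became_active() returns it.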
Yan, Zheng [Fri, 24 Feb 2017 09:24:47 +0000 (17:24 +0800)]
mds: fix mds gets stuck in clientreplay state
When a client request in the clientreplay queue finishes, we should call
MDSRank::queue_one_replay(). Otherwise the mds gets stuck in clientreplay
state. There are several cases where a client request in the clientreplay
queue finishes but MDSRank::queue_one_replay() does not get called.
To make the code clear, add a flag to MClientRequest to indicate if
it's in the clientreplay queue.
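A small Python sketch of why a per-request flag helps, under the
assumption that every finish path can check it. ToyReplayQueue, the dict
based requests and the "queued_replay" key are hypothetical and only
mirror the shape of the fix, not the real MDSRank or MClientRequest.

```python
from collections import deque


class ToyReplayQueue:
    """Hypothetical model of a client-replay queue; not MDSRank itself."""

    def __init__(self, requests):
        self.pending = deque(requests)
        self.dispatched = []

    def queue_one_replay(self):
        # Dispatch the next queued request and tag it as a replayed one.
        if not self.pending:
            return False
        req = self.pending.popleft()
        req["queued_replay"] = True  # hypothetical flag name
        self.dispatched.append(req["name"])
        return True

    def request_finished(self, req):
        # Any finish path ends up here, so the flag alone decides whether
        # the next replayed request has to be kicked off.
        if req.get("queued_replay"):
            req["queued_replay"] = False
            self.queue_one_replay()
```

Requests that never went through the replay queue carry no flag, so
finishing them does not advance the queue.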
Yan, Zheng [Tue, 21 Feb 2017 07:11:49 +0000 (15:11 +0800)]
mds: properly record dirty sessionmap in log segment
rename may dirty sessionmap. If sessionmap gets dirtied, the sessionmap
version should be recorded in the corresponding log segment. Otherwise,
sessionmap doesn't get flushed properly when trimming log segments.
Yan, Zheng [Wed, 22 Feb 2017 07:38:06 +0000 (15:38 +0800)]
client: hold reference for newly updated snaprealm
Client::update_snap_trace() may create new snaprealms, then update
them. When Client::update_snap_trace() returns, the newly created
snaprealms get freed immediately. This is wrong because callers of
Client::update_snap_trace() expect Client::get_snap_realm() to return
the updated snaprealm.
Amir Vadai [Wed, 22 Mar 2017 07:05:21 +0000 (09:05 +0200)]
msg/async/rdma: Introduce RDMAConnMgr
Encapsulate all connection establishment code in a new class,
RDMAConnMgr, and make it a friend class of RDMAConnectedSocketImpl.
This class will be inherited for every type of connection establishment.
Currently only TCP is supported; CM will be added very soon.
RDMAServerConnImpl, which only handles connection establishment, became an
abstract class, and RDMAServerConnTCP inherits from it for connections of
type TCP.
Some of the code was left in its original file and place, and therefore
it looks misplaced. This was done to make it easier to review and rebase.
Once it is accepted a cleanup patch will be sent to move the code into
the right place.
Issue: 995322
Change-Id: I8b0e163525ec80c2452f4b6481bf696968cc1e51
Signed-off-by: Amir Vadai <amir@vadai.me>
Dan Mick [Wed, 29 Mar 2017 03:08:13 +0000 (20:08 -0700)]
tasks/workunit.py: when cloning, use --depth=1
Help avoid killing git.ceph.com. A depth 1 clone takes about
7 seconds, whereas a full one takes about 3:40 (much of it
waiting for the server to create a huge compressed pack).
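For illustration, a minimal helper that builds such a clone command.
clone_command and its parameters are hypothetical and are not the code in
tasks/workunit.py; only the git options themselves (--depth, --branch)
are real.

```python
def clone_command(repo_url, dest, branch=None, shallow=True):
    """Build a git clone argument list (hypothetical helper)."""
    cmd = ["git", "clone"]
    if shallow:
        # Fetch only the most recent commit instead of the full history.
        cmd.append("--depth=1")
    if branch is not None:
        cmd += ["--branch", branch]
    cmd += [repo_url, dest]
    return cmd
```

Note that --depth implies --single-branch, so a shallow clone only has
the tip of the requested branch; checking out some other ref or an older
commit needs a further fetch.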
Sage Weil [Wed, 15 Mar 2017 16:46:25 +0000 (12:46 -0400)]
crush: implement try_remap_rule
Simulate a CRUSH mapping but try to identify alternative OSD
choices (based on an underfull list and overfull set) that still
respect the CRUSH rule constraints.
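A toy Python sketch of the general idea, not the CRUSH implementation:
the only "rule constraint" modelled here is an assumed one-replica-per-host
rule, and try_remap, host_of and the argument shapes are hypothetical.

```python
def try_remap(mapping, overfull, underfull, host_of):
    """Swap overfull OSDs for underfull ones, keeping one OSD per host."""
    result = list(mapping)
    for i, osd in enumerate(result):
        if osd not in overfull:
            continue
        # Hosts used by the other replicas; the replacement must avoid them.
        other_hosts = {host_of[o] for j, o in enumerate(result) if j != i}
        for candidate in underfull:  # assumed to be ordered by preference
            if candidate in result:
                continue
            if host_of[candidate] in other_hosts:
                continue
            result[i] = candidate
            break
    return result
```

If no underfull candidate satisfies the constraint, the original choice
is kept, so the result is always a valid mapping under the assumed rule.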
Tim Serong [Tue, 28 Mar 2017 09:37:51 +0000 (11:37 +0200)]
pybind/mgr/rest: don't set timezone to Chicago
Setting TIME_ZONE in the Django app causes the timestamps printed in
the mgr log to suddenly be wrong after the rest module starts up
(unless, I imagine, the host happens to be in Chicago).
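The effect can be reproduced without Django. As far as I understand it,
Django applies TIME_ZONE by exporting TZ and calling time.tzset(), which
changes local time for the whole process; the sketch below does the same
by hand. utc_offset_after_setting_tz is a hypothetical helper, and the
POSIX-style "CST6" string is used so that no tz database is required.

```python
import os
import time


def utc_offset_after_setting_tz(tz):
    """Point the whole process at tz; return seconds west of UTC."""
    os.environ["TZ"] = tz
    time.tzset()  # Unix only; re-reads TZ for every later localtime() call
    return time.timezone
```

Anything in the same process that formats local time afterwards,
including a logger, picks up the new zone.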
Amir Vadai [Wed, 22 Mar 2017 07:03:31 +0000 (09:03 +0200)]
msg/async/rdma: Extract sockets stuff from RDMAStack.h
This is a preparation commit, in order to make review easier. In this
commit I move code from RDMAStack.h into the new file
RDMAConnectedSocketImpl.h - without changing the code.
In the next commit, the actual logic changes will be made and the socket
classes will be split into base RDMAConnected classes and child
classes with TCP connection establishment specific code.
Issue: 995322
Change-Id: I639fda490a6fbd02addb95d3158c5ac1e7390ef0
Signed-off-by: Amir Vadai <amir@vadai.me>
Test not only for -march support, but also the actual
presence of the intrinsic routines. Not sure why, but gcc
4.8.5 passes the first but not the second.
Fixes: http://tracker.ceph.com/issues/19386
Signed-off-by: Dan Mick <dan.mick@redhat.com>
David Zafman [Fri, 3 Mar 2017 23:04:02 +0000 (15:04 -0800)]
osd: Simplify DBObjectMap by no longer creating complete tables
Bump the version for new maps to 3
Make clone less efficient but simpler
Add rename operation (use instead of clone/unlink)
For now keep code that understands version 2 maps
David Zafman [Wed, 15 Feb 2017 23:02:33 +0000 (15:02 -0800)]
osd: Add automatic repair for DBObjectMap bug
Add repair command to ceph-osdomap-tool too
Under some situations the previous rm_keys() code would
generate a corrupt complete table. There is no way to
figure out what the table should look like now. By removing
the entries we fix the corruption and aren't much worse off
because the corruption caused some deleted keys to re-appear.
This doesn't break the parent/child relationship during
repair because some of the keys may still be contained
in the parent.
Samuel Just [Fri, 10 Feb 2017 23:50:57 +0000 (15:50 -0800)]
DBObjectMap: strengthen in_complete_region post condition
Previously, in_complete_region didn't guarantee anything about
where it left complete_iter pointing. It will be handy for
complete_iter to be pointing at the lowest interval which ends
after to_test. Make it so.
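The post condition can be sketched in Python over a sorted list of
intervals. This assumes half-open [begin, end) intervals that do not
overlap; lowest_interval_ending_after and its return shape are
hypothetical, with the returned index playing the role of complete_iter.

```python
import bisect


def lowest_interval_ending_after(intervals, to_test):
    """intervals: sorted, non-overlapping (begin, end) pairs of keys.

    Returns (index, in_region). index is the lowest interval whose end is
    greater than to_test, or len(intervals) if there is none.
    """
    ends = [end for _, end in intervals]
    # First position whose end is strictly greater than to_test.
    idx = bisect.bisect_right(ends, to_test)
    in_region = idx < len(intervals) and intervals[idx][0] <= to_test
    return idx, in_region
```

With the index always left at that interval, a caller can continue
scanning from it whether or not to_test fell inside a region.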
Samuel Just [Fri, 10 Feb 2017 23:48:57 +0000 (15:48 -0800)]
DBObjectMap: fix next_parent()
The previous implementation assumed that
lower_bound(parent_iter->key()) always leaves the iterator
on_parent(). There isn't any guarantee, however, that that
key isn't present on the child as well.