git.apps.os.sepia.ceph.com Git - ceph.git/log
12 years ago - doc: document new hadoop config options
Noah Watkins [Sun, 17 Mar 2013 19:10:16 +0000 (12:10 -0700)]
doc: document new hadoop config options

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago - doc: start Hadoop installation docs
Noah Watkins [Sun, 24 Feb 2013 22:10:35 +0000 (14:10 -0800)]
doc: start Hadoop installation docs

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago - doc: Hadoop clarifications
Noah Watkins [Sat, 23 Feb 2013 01:58:25 +0000 (17:58 -0800)]
doc: Hadoop clarifications

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago - Added -r option to usage
Christophe Courtaut [Wed, 29 May 2013 09:07:24 +0000 (11:07 +0200)]
Added -r option to usage

Added the -r option (which starts radosgw and the apache2 instance used
to access it) to the usage message.

Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
12 years ago - rbd/concurrent.sh: probe rbd module at start
Alex Elder [Thu, 30 May 2013 15:10:16 +0000 (10:10 -0500)]
rbd/concurrent.sh: probe rbd module at start

There's no guarantee the rbd module is loaded when this script is
run, so add a line that loads it if necessary.

Signed-off-by: Alex Elder <elder@inktank.com>
12 years ago - Merge pull request #331 from ceph/wip-osd-interfacecheck
Sage Weil [Thu, 30 May 2013 05:45:37 +0000 (22:45 -0700)]
Merge pull request #331 from ceph/wip-osd-interfacecheck

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago - Merge branch 'next'
Sage Weil [Thu, 30 May 2013 05:44:40 +0000 (22:44 -0700)]
Merge branch 'next'

12 years ago - osd: wait for healthy pings from peers in waiting-for-healthy state (331/head)
Sage Weil [Wed, 29 May 2013 20:26:45 +0000 (13:26 -0700)]
osd: wait for healthy pings from peers in waiting-for-healthy state

If we are (wrongly) marked down, we need to go into the waiting-for-healthy
state and verify that our network interfaces are working before trying to
rejoin the cluster.

 - make _is_healthy() check require positive proof of pings working
 - do heartbeat checks and updates in this state
 - reset the random peers every heartbeat_interval, in case we keep picking
   bad ones

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osd: distinguish between definitely healthy and definitely not unhealthy
Sage Weil [Wed, 29 May 2013 20:15:41 +0000 (13:15 -0700)]
osd: distinguish between definitely healthy and definitely not unhealthy

is_unhealthy() will assume they are healthy for some period after we
send our first ping attempt.  is_healthy() is now a strict check that we
know they are healthy.

Switch the failure report check to use is_unhealthy(); use is_healthy()
everywhere else, including the waiting-for-healthy pre-boot checks.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osd: remove down hb peers
Sage Weil [Wed, 29 May 2013 19:24:28 +0000 (12:24 -0700)]
osd: remove down hb peers

If a (say, random) peer goes down, filter it out.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osd: only add pg peers if active
Sage Weil [Wed, 29 May 2013 19:24:04 +0000 (12:24 -0700)]
osd: only add pg peers if active

We will soon be in this method for the waiting-for-healthy state.  As
a consequence, we need to remove any down peers.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osd: factor out _remove_heartbeat_peer
Sage Weil [Wed, 29 May 2013 19:16:28 +0000 (12:16 -0700)]
osd: factor out _remove_heartbeat_peer

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osd: augment osd heartbeat peers with neighbors and randoms, to up some min
Sage Weil [Wed, 29 May 2013 18:27:38 +0000 (11:27 -0700)]
osd: augment osd heartbeat peers with neighbors and randoms, to up some min

- always include our neighbors to ensure we have a fully-connected
  graph
- include some random neighbors to get at least some min number of peers.

Signed-off-by: Sage Weil <sage@inktank.com>
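A sketch of the selection rule described above, under stated assumptions: OSD ids are ints in a sorted list of up OSDs, "neighbors" means the previous and next id in that list (wrapping around), and `pick_hb_peers` and `min_peers` are hypothetical names, not the OSD's real helper or option.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical helper: augment `peers` (e.g. PG peers) for `whoami` using the
// sorted list of up OSD ids. Always adds the ring neighbors so the peer graph
// stays connected, then adds random up OSDs until at least `min_peers` are
// chosen or no candidates remain.
std::set<int> pick_hb_peers(const std::vector<int>& up_sorted, int whoami,
                            std::size_t min_peers, std::mt19937& rng,
                            std::set<int> peers = {}) {
  auto it = std::find(up_sorted.begin(), up_sorted.end(), whoami);
  if (it == up_sorted.end() || up_sorted.size() < 2) return peers;
  std::size_t i = static_cast<std::size_t>(it - up_sorted.begin());
  std::size_t n = up_sorted.size();
  peers.insert(up_sorted[(i + 1) % n]);      // next neighbor
  peers.insert(up_sorted[(i + n - 1) % n]);  // previous neighbor

  // Random candidates: every up OSD except ourselves and existing peers.
  std::vector<int> candidates;
  for (int osd : up_sorted)
    if (osd != whoami && !peers.count(osd)) candidates.push_back(osd);
  std::shuffle(candidates.begin(), candidates.end(), rng);
  for (int osd : candidates) {
    if (peers.size() >= min_peers) break;
    peers.insert(osd);
  }
  return peers;
}
```

Reshuffling on every call also matches the later "reset the random peers every heartbeat_interval" idea: a bad random pick is not kept forever.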
12 years ago - osd: initialize new_state field when we use it
Sage Weil [Wed, 29 May 2013 23:50:04 +0000 (16:50 -0700)]
osd: initialize new_state field when we use it

If we use operator[] on a new int field its value is undefined; avoid
reading it or using |= et al until we initialize it.

Fixes: #4967
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years ago - Merge branch 'wip_osd_throttle'
Samuel Just [Wed, 29 May 2013 22:06:18 +0000 (15:06 -0700)]
Merge branch 'wip_osd_throttle'

Fixes: #4782
Reviewed-by: Sage Weil
12 years ago - WBThrottle: add some comments and some asserts
Samuel Just [Wed, 29 May 2013 22:05:51 +0000 (15:05 -0700)]
WBThrottle: add some comments and some asserts

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago - WBThrottle: rename replica nocache
Samuel Just [Wed, 29 May 2013 22:05:34 +0000 (15:05 -0700)]
WBThrottle: rename replica nocache

We may want to influence the caching behavior for other
reasons.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago - osd: move health checks into a single helper
Sage Weil [Mon, 27 May 2013 22:27:59 +0000 (15:27 -0700)]
osd: move health checks into a single helper

For now we still only look at the internal heartbeats.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osd: avoid duplicate mon requests for a new osdmap
Sage Weil [Wed, 29 May 2013 20:16:24 +0000 (13:16 -0700)]
osd: avoid duplicate mon requests for a new osdmap

sub_want() returns true if this is a new sub; only renew then.

Signed-off-by: Sage Weil <sage@inktank.com>
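A sketch of the "only renew on a new sub" pattern. `SubTable`, `want()` and `request_map()` are hypothetical stand-ins modeled on the description of sub_want(); they are not the MonClient API.

```cpp
#include <map>
#include <string>

// Hypothetical subscription table: want() records interest in `what` starting
// at epoch `start` and returns true only if this changes the table.
class SubTable {
 public:
  bool want(const std::string& what, unsigned start) {
    auto it = subs_.find(what);
    if (it != subs_.end() && it->second == start) return false;  // already asked
    subs_[what] = start;
    return true;
  }
  // Caller pattern: only contact the monitor when want() reports a new sub.
  void request_map(unsigned epoch) {
    if (want("osdmap", epoch)) ++renews_;  // stand-in for sending a renew
  }
  int renew_count() const { return renews_; }

 private:
  std::map<std::string, unsigned> subs_;
  int renews_ = 0;
};
```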
12 years ago - osd: tell peers that ping us if they are dead
Sage Weil [Wed, 29 May 2013 20:16:01 +0000 (13:16 -0700)]
osd: tell peers that ping us if they are dead

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osd: simplify is_healthy() check during boot
Sage Weil [Mon, 27 May 2013 22:24:56 +0000 (15:24 -0700)]
osd: simplify is_healthy() check during boot

This has a slight behavior change in that we ask the mon for the latest
osdmap if our internal heartbeat is failing.  That isn't useful yet, but
will be shortly.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - mds: stay in SCAN state in file_eval
Sage Weil [Tue, 28 May 2013 17:51:11 +0000 (10:51 -0700)]
mds: stay in SCAN state in file_eval

If we are in the SCAN state, stay there until the recovery finishes.  Do
not jump to another state from file_eval().

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0071b8e75bd3f5a09cc46e2225a018f6d1ef0680)

12 years ago - mds: stay in SCAN state in file_eval
Sage Weil [Tue, 28 May 2013 17:51:11 +0000 (10:51 -0700)]
mds: stay in SCAN state in file_eval

If we are in the SCAN state, stay there until the recovery finishes.  Do
not jump to another state from file_eval().

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - Makefile: include new message header files
Sage Weil [Tue, 28 May 2013 22:52:46 +0000 (15:52 -0700)]
Makefile: include new message header files

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - Merge remote-tracking branch 'yan/wip-mds'
Sage Weil [Wed, 29 May 2013 17:26:56 +0000 (10:26 -0700)]
Merge remote-tracking branch 'yan/wip-mds'

Reviewed-by: Sage Weil <sage@inktank.com>
Conflicts:
src/mds/MDCache.cc

12 years ago - osd: do not assume head obc object exists when getting snapdir
Sage Weil [Wed, 29 May 2013 16:49:11 +0000 (09:49 -0700)]
osd: do not assume head obc object exists when getting snapdir

For a list-snaps operation on the snapdir, do not assume that the obc for the
head means the object exists.  This fixes a race between a head deletion and
a list-snaps that wrongly returns ENOENT, triggered by the DiffIterateStress
test when thrashing OSDs.

Fixes: #5183
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago - Merge pull request #329 from javacruft/wip-fuse-deps
Sage Weil [Wed, 29 May 2013 15:14:27 +0000 (08:14 -0700)]
Merge pull request #329 from javacruft/wip-fuse-deps

Use new fuse package instead of fuse-utils

12 years ago - Use new fuse package instead of fuse-utils (329/head)
James Page [Wed, 29 May 2013 09:57:17 +0000 (10:57 +0100)]
Use new fuse package instead of fuse-utils

The fuse-utils package was deprecated a while ago.

Switch the primary dependency for fuse tools to use
the preferred 'fuse' package.

Signed-off-by: James Page <james.page@ubuntu.com>
12 years ago - mon: disable tdump by default
Sage Weil [Wed, 29 May 2013 05:13:11 +0000 (22:13 -0700)]
mon: disable tdump by default

Grr.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - Merge remote-tracking branch 'gh/last'
Sage Weil [Wed, 29 May 2013 05:10:21 +0000 (22:10 -0700)]
Merge remote-tracking branch 'gh/last'

12 years ago - Merge branch 'wip-5172'
Sage Weil [Wed, 29 May 2013 03:44:48 +0000 (20:44 -0700)]
Merge branch 'wip-5172'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago - osd: fix note_down_osd
Sage Weil [Wed, 29 May 2013 03:38:43 +0000 (20:38 -0700)]
osd: fix note_down_osd

Fix bug introduced in 27381c0c6259ac89f5f9c592b4bfb585937a1cfc.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osd: fix hb con failure handler
Sage Weil [Wed, 29 May 2013 03:39:30 +0000 (20:39 -0700)]
osd: fix hb con failure handler

Fix a few bugs introduced by 27381c0c6259ac89f5f9c592b4bfb585937a1cfc:

- check against both front and back cons; either one may have failed.
- close *both* front and back before reopening either.  this is
  overkill, but slightly simpler code.
- fix leak of con when marking down
- handle race against osdmap update and note_down_osd

Fixes: #5172
Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - Merge pull request #319 from dalgaaf/wip-da-pylint-3
Sage Weil [Wed, 29 May 2013 02:52:41 +0000 (19:52 -0700)]
Merge pull request #319 from dalgaaf/wip-da-pylint-3

Fix some smaller Python issues

12 years ago - Merge pull request #326 from dalgaaf/wip-da-CID-727978
Sage Weil [Tue, 28 May 2013 22:48:11 +0000 (15:48 -0700)]
Merge pull request #326 from dalgaaf/wip-da-CID-727978

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years ago - v0.63 (tag: v0.63)
Gary Lowell [Tue, 28 May 2013 20:58:22 +0000 (13:58 -0700)]
v0.63

12 years ago - HashIndex: sync top directory during start_split,merge,col_split
Samuel Just [Tue, 28 May 2013 18:10:05 +0000 (11:10 -0700)]
HashIndex: sync top directory during start_split,merge,col_split

Otherwise, the links might be ordered after the in-progress
operation tag write.  We need the in-progress operation tag to
correctly recover from an interrupted merge, split, or col_split.

Fixes: #5180
Backport: cuttlefish, bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
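A minimal POSIX sketch of "sync the top directory": new links only become durable once the directory itself is fsync()ed, so the caller would create the links, call a helper like this on the top directory, and only then write the in-progress tag. `sync_dir` is a hypothetical name, not the HashIndex code.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Flush a directory's metadata (e.g. newly created links) to stable storage
// by opening the directory read-only and fsync()ing it.
// Returns 0 on success or -errno on failure.
int sync_dir(const char* path) {
  int fd = ::open(path, O_RDONLY | O_DIRECTORY);
  if (fd < 0) return -errno;
  int r = ::fsync(fd) < 0 ? -errno : 0;  // capture errno before close()
  ::close(fd);
  return r;
}
```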
12 years ago - doc/dev/osd_internals: add wbthrottle.rst (332/head)
Samuel Just [Fri, 24 May 2013 20:35:14 +0000 (13:35 -0700)]
doc/dev/osd_internals: add wbthrottle.rst

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago - WBThrottle: add perfcounters
Samuel Just [Tue, 28 May 2013 17:41:52 +0000 (10:41 -0700)]
WBThrottle: add perfcounters

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago - Merge pull request #325 from dalgaaf/wip-da-CID-727980
Sage Weil [Tue, 28 May 2013 17:27:56 +0000 (10:27 -0700)]
Merge pull request #325 from dalgaaf/wip-da-CID-727980

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years ago - Merge pull request #324 from dalgaaf/wip-da-CID-727979
Sage Weil [Tue, 28 May 2013 17:27:25 +0000 (10:27 -0700)]
Merge pull request #324 from dalgaaf/wip-da-CID-727979

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years ago - osd/OSDMap: fix Incremental dump
Sage Weil [Tue, 28 May 2013 16:16:17 +0000 (09:16 -0700)]
osd/OSDMap: fix Incremental dump

The front hb addr entry may not be present.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - Merge pull request #322 from guilhem/patch-1
Sage Weil [Tue, 28 May 2013 15:43:10 +0000 (08:43 -0700)]
Merge pull request #322 from guilhem/patch-1

Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago - kv_flat_btree_async.cc: fix AioCompletion resource leak (326/head)
Danny Al-Gaaf [Tue, 28 May 2013 10:43:12 +0000 (12:43 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer needed.

CID 727978 (#1-2 of 2): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "obj_aioc" going out of scope leaks the
  storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
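These commits fix the leaks with explicit AioCompletion::release() calls. An alternative pattern (not what the commits do) is an RAII wrapper whose deleter calls release() on every exit path. The sketch uses a hypothetical stand-in type, `FakeCompletion`, instead of librados so it compiles on its own; with librados the deleter would call release() on a librados::AioCompletion*.

```cpp
#include <memory>

// Hypothetical stand-in for a completion object that, like
// librados::AioCompletion, is freed by calling release() rather than delete.
// It only counts live instances.
struct FakeCompletion {
  static int live;
  FakeCompletion() { ++live; }
  void release() { --live; delete this; }

 private:
  ~FakeCompletion() = default;  // force callers to go through release()
};
int FakeCompletion::live = 0;

// Deleter that calls release(), so the completion is freed on every path
// out of scope, including early returns.
struct ReleaseDeleter {
  void operator()(FakeCompletion* c) const { if (c) c->release(); }
};
using CompletionPtr = std::unique_ptr<FakeCompletion, ReleaseDeleter>;

int do_op(bool fail_early) {
  CompletionPtr aioc(new FakeCompletion);
  if (fail_early) return -1;  // no leak: the deleter runs here
  // ... submit the operation and wait for aioc to complete ...
  return 0;
}
```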
12 years ago - kv_flat_btree_async.cc: fix AioCompletion resource leak (324/head)
Danny Al-Gaaf [Tue, 28 May 2013 10:38:57 +0000 (12:38 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer needed.

CID 727979 (#1-2 of 2): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "a" going out of scope leaks the storage
  it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago - kv_flat_btree_async.cc: fix AioCompletion resource leak (325/head)
Danny Al-Gaaf [Tue, 28 May 2013 10:27:37 +0000 (12:27 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer
needed.

CID 727980 (#1-4 of 4): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "aioc" going out of scope leaks
  the storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago - Remove mon socket in post-stop (322/head)
Guilhem Lettron [Mon, 27 May 2013 10:41:53 +0000 (12:41 +0200)]
Remove mon socket in post-stop

If ceph-mon segfaults, the socket file isn't removed.

By adding a removal in post-stop, upstart cleans the run directory properly.

Signed-off-by: Guilhem Lettron <guilhem@lettron.fr>
12 years ago - mds: use "open-by-ino" function to open remote link
Yan, Zheng [Sun, 26 May 2013 11:04:34 +0000 (19:04 +0800)]
mds: use "open-by-ino" function to open remote link

Also add a new config option "mds_open_remote_link_mode". The anchor
approach is used by default. If mode is non-zero, use the open-by-ino
function. In case open-by-ino function fails, if mode is 1, retry
using the anchor approach, otherwise trigger assertion.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
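The mode semantics above can be sketched as a small dispatch function. `open_remote_link`, `open_by_ino` and `open_by_anchor` are hypothetical stand-ins for the two lookup strategies (each returns true on success), not the MDS code.

```cpp
#include <cassert>
#include <functional>

// mode == 0: anchor approach only.
// mode != 0: try open-by-ino first; if it fails, mode == 1 retries with the
// anchor approach and any other mode triggers an assertion.
bool open_remote_link(int mode,
                      const std::function<bool()>& open_by_ino,
                      const std::function<bool()>& open_by_anchor) {
  if (mode == 0) return open_by_anchor();
  if (open_by_ino()) return true;
  assert(mode == 1 && "open-by-ino failed and no anchor fallback is enabled");
  return open_by_anchor();
}
```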
12 years ago - mds: open missing cap inodes
Yan, Zheng [Sat, 25 May 2013 13:30:38 +0000 (21:30 +0800)]
mds: open missing cap inodes

When a recovering MDS enters the reconnect stage, the client sends reconnect
messages to it. The message lists open files, their paths, and issued
caps. If an inode is not in the cache, the recovering MDS uses the
path the client provides to determine whether it's the inode's authority. If
not, the recovering MDS exports the inode's caps to another MDS. The
issue here is that the path the client provides isn't always accurate.

The fix is to use the recently added "open inode by ino" function to open
any missing cap inodes when the recovering MDS enters the rejoin stage,
and to send cache rejoin messages to other MDSs only after all caps'
authorities are determined.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: bump the protocol version
Yan, Zheng [Fri, 24 May 2013 05:42:15 +0000 (13:42 +0800)]
mds: bump the protocol version

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: open inode by ino
Yan, Zheng [Wed, 15 May 2013 02:28:58 +0000 (10:28 +0800)]
mds: open inode by ino

This patch adds an "open-by-ino" helper. It uses the backtrace to find
the inode's path and open the inode. The algorithm looks like:

1. Check MDS peers. If any MDS has the inode in its cache, goto step 6.
2. Fetch the backtrace. If the backtrace was previously fetched and we get
   the same backtrace again, return -EIO.
3. Traverse the path in the backtrace. If the inode is found, goto step 6;
   if a non-auth dirfrag is encountered, goto the next step. If we fail to
   find the inode in its parent dir, goto step 1.
4. Request MDS peers to traverse the path in the backtrace. If the inode
   is found, goto step 6. If an MDS peer encounters a non-auth dirfrag, it
   stops traversing. If any MDS peer fails to find the inode in its
   parent dir, goto step 1.
5. Use the same algorithm to open the inode's parent. Goto step 3 if it
   succeeds; goto step 1 if it fails.
6. Return the inode's auth MDS ID.

The algorithm has two main assumptions:
1. If an inode is in its auth MDS's cache, its on-disk backtrace
   can be out of date.
2. If an inode is not in any MDS's cache, its on-disk backtrace
   must be up to date.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
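The control flow of steps 1-6 can be sketched as follows. Everything here is a hypothetical model: the backtrace is a list of ancestor inode numbers (nearest parent first), and the cache lookups and peer messaging are replaced by callbacks in `OpenByInoEnv`. Termination relies on the algorithm's two stated assumptions; the sketch adds no retry limits.

```cpp
#include <cerrno>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using Ino = std::uint64_t;
using Backtrace = std::vector<Ino>;  // hypothetical: ancestors, parent first
enum class Traverse { Found, NonAuth, Missing };

// Hypothetical hooks standing in for cache lookups and MDS messaging.
struct OpenByInoEnv {
  std::function<std::optional<int>(Ino)> peer_has;  // step 1: auth if cached
  std::function<Backtrace(Ino)> fetch_backtrace;     // step 2
  // Steps 3 and 4: on Found, store the auth MDS id in `auth`.
  std::function<Traverse(Ino, const Backtrace&, int& auth)> traverse_local;
  std::function<Traverse(Ino, const Backtrace&, int& auth)> traverse_peers;
};

// Returns the inode's auth MDS id, or -EIO when a refetched backtrace is
// unchanged (no further progress is possible).
int open_by_ino(const OpenByInoEnv& env, Ino ino) {
  std::optional<Backtrace> prev;
  for (;;) {
    if (auto auth = env.peer_has(ino)) return *auth;  // step 1 -> step 6
    Backtrace bt = env.fetch_backtrace(ino);          // step 2
    if (prev && *prev == bt) return -EIO;
    prev = bt;
    for (;;) {
      int auth = -1;
      Traverse t = env.traverse_local(ino, bt, auth);  // step 3
      if (t == Traverse::Found) return auth;
      if (t == Traverse::Missing) break;               // back to step 1
      t = env.traverse_peers(ino, bt, auth);           // step 4
      if (t == Traverse::Found) return auth;
      if (t == Traverse::Missing) break;               // back to step 1
      // Step 5: open the parent the same way; retry step 3 on success.
      if (bt.empty() || open_by_ino(env, bt.front()) < 0) break;
    }
  }
}
```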
12 years ago - mds: move fetch_backtrace() to class MDCache
Yan, Zheng [Fri, 17 May 2013 21:49:22 +0000 (05:49 +0800)]
mds: move fetch_backtrace() to class MDCache

We may want to fetch backtrace while corresponding inode isn't
instantiated. MDCache::fetch_backtrace() will be used by later
patch.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: remove old backtrace handling
Yan, Zheng [Fri, 17 May 2013 08:11:27 +0000 (16:11 +0800)]
mds: remove old backtrace handling

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: update backtraces when unlinking inodes
Yan, Zheng [Sat, 18 May 2013 09:16:03 +0000 (17:16 +0800)]
mds: update backtraces when unlinking inodes

unlink moves inodes to the stray dir; it's a special form of rename.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: bring back old style backtrace handling
Yan, Zheng [Fri, 17 May 2013 08:43:01 +0000 (16:43 +0800)]
mds: bring back old style backtrace handling

To queue a backtrace update, the current code allocates a BacktraceInfo
structure and adds it to the log segment's update_backtraces list. The
main issue with this approach is that BacktraceInfo is independent
of the inode, so it is very inconvenient to find pending backtrace updates
for given inodes. When exporting inodes from one MDS to another
MDS, we need to find and cancel all pending backtrace updates on the
source MDS.

This patch brings back the old backtrace handling code and adapts it
for the current backtrace format. The basic idea behind the old
code is: when an inode's backtrace becomes dirty, add the inode to
the log segment's dirty_parent_inodes list.

Compared to the current backtrace handling, another difference is
that the backtrace update is journalled in EMetaBlob::fullbit.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: rename last_renamed_version to backtrace_version
Yan, Zheng [Fri, 17 May 2013 06:24:57 +0000 (14:24 +0800)]
mds: rename last_renamed_version to backtrace_version

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: journal backtrace update in EMetaBlob::fullbit
Yan, Zheng [Fri, 17 May 2013 08:02:03 +0000 (16:02 +0800)]
mds: journal backtrace update in EMetaBlob::fullbit

The current way to journal a backtrace update is to set EMetaBlob::update_bt
to true. The problem is that an EMetaBlob can include several inodes.
If an EMetaBlob's update_bt is true, journal replay code has to queue
backtrace updates for all inodes in the EMetaBlob.

This patch adds two new flags to class EMetaBlob::fullbit so that it can
journal backtrace updates.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: reorder EMetaBlob::add_primary_dentry's parameters
Yan, Zheng [Thu, 9 May 2013 03:27:53 +0000 (11:27 +0800)]
mds: reorder EMetaBlob::add_primary_dentry's parameters

Prepare for adding new state parameters such as 'dirty_parent'.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: warn on unconnected snap realms
Yan, Zheng [Wed, 15 May 2013 03:24:36 +0000 (11:24 +0800)]
mds: warn on unconnected snap realms

When there is more than one active MDS, restarting an MDS triggers the
assertion "reconnected_snaprealms.empty()" quite often. If there
is no snapshot in the FS, the items left in reconnected_snaprealms
should be other MDSs' mdsdirs. I think it's harmless.

If there are snapshots in the FS, the assertion can probably catch
real bugs. But at present the snapshot feature is broken, and fixing it
is non-trivial. So replace the assertion with a warning.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: silence MDCache::trim_non_auth()
Yan, Zheng [Wed, 22 May 2013 23:37:40 +0000 (07:37 +0800)]
mds: silence MDCache::trim_non_auth()

No need to output the function's debug message to console.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: fix check for base inode discovery
Yan, Zheng [Sat, 11 May 2013 10:47:49 +0000 (18:47 +0800)]
mds: fix check for base inode discovery

If an MDiscover message is for discovering the base inode, want_base_dir
should be false and the path should be empty.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: Fix replica's allowed caps for filelock in SYNC_LOCK state
Yan, Zheng [Mon, 6 May 2013 02:18:36 +0000 (10:18 +0800)]
mds: Fix replica's allowed caps for filelock in SYNC_LOCK state

For replica, filelock in LOCK_LOCK state doesn't allow Fc cap. So
filelock in LOCK_SYNC_LOCK/LOCK_EXCL_LOCK state shouldn't allow Fc
cap either.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: defer releasing cap if necessary
Yan, Zheng [Mon, 6 May 2013 01:09:59 +0000 (09:09 +0800)]
mds: defer releasing cap if necessary

When an inode is freezing or frozen, we defer processing MClientCaps
messages and cap releases embedded in requests. The same deferral
logic should also cover MClientCapRelease messages.

12 years ago - mds: fix Locker::request_inode_file_caps()
Yan, Zheng [Thu, 16 May 2013 17:44:23 +0000 (01:44 +0800)]
mds: fix Locker::request_inode_file_caps()

After sending a cache rejoin message, a replica needs to notify the auth
MDS when cap_wanted changes. But it can send an MInodeFileCaps message only
after receiving the auth MDS's rejoin ack. Locker::request_inode_file_caps()
has the correct wait logic, but it skips sending the MInodeFileCaps message
if the auth MDS is still in the rejoin state.

The fix is to defer sending the MInodeFileCaps message until the auth MDS
is active. This makes the function's wait logic less tricky.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: notify auth MDS when cap_wanted changes
Yan, Zheng [Mon, 6 May 2013 01:17:01 +0000 (09:17 +0800)]
mds: notify auth MDS when cap_wanted changes

So the auth MDS can choose locks' states based on our cap_wanted.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: export CInode::mds_caps_wanted
Yan, Zheng [Mon, 6 May 2013 01:06:52 +0000 (09:06 +0800)]
mds: export CInode::mds_caps_wanted

CInode::mds_caps_wanted is used to keep track of caps wanted by non-auth
MDS. The auth MDS checks it when choosing locks' states.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: export CInode::STATE_NEEDSRECOVER
Yan, Zheng [Mon, 6 May 2013 01:00:19 +0000 (09:00 +0800)]
mds: export CInode::STATE_NEEDSRECOVER

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: send slave request after target MDS is active
Yan, Zheng [Mon, 8 Apr 2013 08:17:11 +0000 (16:17 +0800)]
mds: send slave request after target MDS is active

When a peer failure is detected, MDCache::handle_mds_failure()
checks if there are requests waiting for slave replies from the
failed peer, and adds them to the "wait for active peer" list.
The "retry request" logic only covers slave requests sent before
MDCache::handle_mds_failure() is called. If a slave request was
sent while the peer wasn't up, we wait for its reply forever.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: unfreeze inode after rename rollback finishes
Yan, Zheng [Sat, 6 Apr 2013 22:35:56 +0000 (06:35 +0800)]
mds: unfreeze inode after rename rollback finishes

We should not wake up the unfreeze waiter while the inode is still
linked to a non-auth dirfrag.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: remove buggy cache rejoin code
Yan, Zheng [Tue, 7 May 2013 00:56:11 +0000 (08:56 +0800)]
mds: remove buggy cache rejoin code

I previously added code to handle a corner case of cache rejoin:
an entire subtree, together with the inode the subtree root belongs to,
was trimmed between sending cache rejoin and receiving the rejoin ack.
In this case, we should send a cache expire message to the subtree's
auth MDS. But the code is completely broken, so remove it temporarily.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: fix typo in Server::do_rename_rollback
Yan, Zheng [Sun, 7 Apr 2013 06:49:53 +0000 (14:49 +0800)]
mds: fix typo in Server::do_rename_rollback

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: fix import cancel race
Yan, Zheng [Sat, 6 Apr 2013 03:25:15 +0000 (11:25 +0800)]
mds: fix import cancel race

The current code uses the import state to detect obsolete import
discover/prep messages. It does not work in this case: cancel a subtree
import, import the same subtree again, and then the discover/prep message
for the first import gets dispatched.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
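The log does not say how the fix detects obsolete messages. One common approach, sketched here purely as an assumption, is to tag each import attempt with an id that increases monotonically per exporter, so a message from a cancelled attempt can be told apart from one belonging to a new import of the same subtree. `ImportTracker` and its methods are hypothetical.

```cpp
#include <cstdint>
#include <map>

// Hypothetical importer-side bookkeeping. Assumes attempt ids (tids) start
// at 1 and increase monotonically for each new export attempt.
class ImportTracker {
 public:
  // Accept a discover only if its tid is newer than any seen for this
  // dirfrag; a delayed discover from a cancelled attempt is rejected.
  bool on_discover(int dirfrag, std::uint64_t tid) {
    std::uint64_t& last = last_tid_[dirfrag];  // 0 if never seen
    if (tid <= last) return false;
    last = tid;
    live_[dirfrag] = tid;
    return true;
  }
  void cancel(int dirfrag) { live_.erase(dirfrag); }
  // A prep (or later) message is current only if it matches the live attempt.
  bool is_current(int dirfrag, std::uint64_t tid) const {
    auto it = live_.find(dirfrag);
    return it != live_.end() && it->second == tid;
  }

 private:
  std::map<int, std::uint64_t> last_tid_;  // highest tid seen per dirfrag
  std::map<int, std::uint64_t> live_;      // tid of the in-progress import
};
```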
12 years ago - mds: fix straydn race
Yan, Zheng [Fri, 5 Apr 2013 11:50:35 +0000 (19:50 +0800)]
mds: fix straydn race

For an unlink/rename request, the target dentry's linkage may change
before all locks are acquired. So we need to check whether the existing
stray dentry is valid.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: fix slave commit tracking
Yan, Zheng [Wed, 15 May 2013 08:35:39 +0000 (16:35 +0800)]
mds: fix slave commit tracking

MDS may crash after journalling a slave commit, but before sending
commit ack to the master. Later when the MDS restarts, it will not
send commit ack to the master. So the master waits for the commit
ack forever. The fix is to remove the failed MDS from requests' uncommitted
slave list. When the failed MDS recovers, its resolve message will tell
the master which slave requests are not committed. The master will
re-add the recovering MDS to requests' uncommitted slave list if
necessary.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
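A sketch of the master-side bookkeeping described above, with hypothetical names: request ids and MDS ranks are plain ints, and `UncommittedSlaves` stands in for the per-request uncommitted slave lists.

```cpp
#include <iterator>
#include <map>
#include <set>

// Hypothetical tracker: for each request id, the slave ranks whose commit
// ack is still outstanding.
class UncommittedSlaves {
 public:
  void add(int reqid, int slave) { pending_[reqid].insert(slave); }
  // Commit ack received; returns true once no acks remain for the request.
  bool ack(int reqid, int slave) {
    auto it = pending_.find(reqid);
    if (it == pending_.end()) return true;
    it->second.erase(slave);
    if (!it->second.empty()) return false;
    pending_.erase(it);
    return true;
  }
  // A slave failed: stop waiting for an ack it may never send.
  void handle_mds_failure(int slave) {
    for (auto it = pending_.begin(); it != pending_.end();) {
      it->second.erase(slave);
      it = it->second.empty() ? pending_.erase(it) : std::next(it);
    }
  }
  // The recovered slave's resolve message lists requests it has not
  // committed; wait on it again only for those.
  void handle_resolve(int slave, const std::set<int>& uncommitted) {
    for (int reqid : uncommitted) pending_[reqid].insert(slave);
  }
  bool waiting(int reqid) const { return pending_.count(reqid) != 0; }

 private:
  std::map<int, std::set<int>> pending_;
};
```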
12 years ago - mds: fix uncommitted master wait
Yan, Zheng [Fri, 5 Apr 2013 06:48:13 +0000 (14:48 +0800)]
mds: fix uncommitted master wait

We may add new waiters while the master is committing, so we should
take the waiters and wake them up when the master is committed.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
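A sketch of the "take the waiters, then wake them" pattern, using a hypothetical `CommitWaiters` class rather than the MDS's own waiter lists: the list is swapped out at commit-completion time, so waiters added while the commit was in flight are included, and waiters added by the callbacks themselves land in a fresh list instead of the one being iterated.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical waiter list for one committing master request.
class CommitWaiters {
 public:
  void add(std::function<void()> cb) { waiters_.push_back(std::move(cb)); }
  // On commit: take (swap out) every waiter registered so far, then run them.
  void committed() {
    std::vector<std::function<void()>> ready;
    ready.swap(waiters_);
    for (auto& cb : ready) cb();
  }
  std::size_t pending() const { return waiters_.size(); }

 private:
  std::vector<std::function<void()>> waiters_;
};
```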
12 years ago - mds: adjust subtree auth if import aborts in PREPPED state
Yan, Zheng [Thu, 4 Apr 2013 23:53:39 +0000 (07:53 +0800)]
mds: adjust subtree auth if import aborts in PREPPED state

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: don't stop at export bounds when journaling dir context
Yan, Zheng [Thu, 4 Apr 2013 03:06:09 +0000 (11:06 +0800)]
mds: don't stop at export bounds when journaling dir context

We only journal the finish of exporting a subtree, so we shouldn't
consider export bounds as subtree roots.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: fix underwater dentry cleanup
Yan, Zheng [Tue, 2 Apr 2013 07:46:51 +0000 (15:46 +0800)]
mds: fix underwater dentry cleanup

If the underwater dentry is a remote link, we shouldn't mark the
inode clean.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - mds: journal new subtrees created by rename
Yan, Zheng [Wed, 20 Mar 2013 07:42:50 +0000 (15:42 +0800)]
mds: journal new subtrees created by rename

This avoids creating bare dirfrags during journal replay.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago - PendingReleaseNotes: notes about enabling HASHPSPOOL
Sage Weil [Tue, 28 May 2013 04:16:46 +0000 (21:16 -0700)]
PendingReleaseNotes: notes about enabling HASHPSPOOL

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - osdmaptool: fix cli tests
Sage Weil [Tue, 28 May 2013 04:12:29 +0000 (21:12 -0700)]
osdmaptool: fix cli tests

Now that the default pool flags have changed.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago - Merge pull request #321 from dalgaaf/wip-da-CID-727981
Sage Weil [Mon, 27 May 2013 20:55:54 +0000 (13:55 -0700)]
Merge pull request #321 from dalgaaf/wip-da-CID-727981

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years ago - Merge pull request #320 from dalgaaf/wip-da-CID-727983
Sage Weil [Mon, 27 May 2013 20:55:24 +0000 (13:55 -0700)]
Merge pull request #320 from dalgaaf/wip-da-CID-727983

kv_flat_btree_async.cc: fix resource leak

12 years ago - doc: Updated rgw.conf example.
John Wilkins [Sat, 25 May 2013 22:13:01 +0000 (15:13 -0700)]
doc: Updated rgw.conf example.

fixes: #4608

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago - doc: Updated RGW Quickstart.
John Wilkins [Sat, 25 May 2013 22:11:49 +0000 (15:11 -0700)]
doc: Updated RGW Quickstart.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago - doc: Updated index for newer terms.
John Wilkins [Sat, 25 May 2013 22:11:06 +0000 (15:11 -0700)]
doc: Updated index for newer terms.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago - pg_pool_t: enable FLAG_HASHPSPOOL by default
Samuel Just [Fri, 24 May 2013 23:20:38 +0000 (16:20 -0700)]
pg_pool_t: enable FLAG_HASHPSPOOL by default

Fixes: #5160
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago - kv_flat_btree_async.cc: fix AioCompletion resource leak (321/head)
Danny Al-Gaaf [Fri, 24 May 2013 12:47:49 +0000 (14:47 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer
needed to free the resources.

CID 727981 (#3 of 3): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "top_aioc" going out of scope leaks the
  storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago - kv_flat_btree_async.cc: fix resource leak (320/head)
Danny Al-Gaaf [Fri, 24 May 2013 12:29:14 +0000 (14:29 +0200)]
kv_flat_btree_async.cc: fix resource leak

Call AioCompletion::release() if the completion is no longer
needed to free the resources.

CID 727983 : Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "aioc" going out of scope leaks the
  storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago - ceph-disk: remove unnecessary semicolons (319/head)
Danny Al-Gaaf [Fri, 24 May 2013 10:46:15 +0000 (12:46 +0200)]
ceph-disk: remove unnecessary semicolons

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago - ceph-disk: cast output of _check_output()
Danny Al-Gaaf [Fri, 24 May 2013 10:41:11 +0000 (12:41 +0200)]
ceph-disk: cast output of _check_output()

Cast output of _check_output() to str() to be able to use
str.split().

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago - ceph-disk: fix undefined variable
Danny Al-Gaaf [Fri, 24 May 2013 10:33:16 +0000 (12:33 +0200)]
ceph-disk: fix undefined variable

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago - ceph-disk: add missing spaces around operator
Danny Al-Gaaf [Fri, 24 May 2013 10:29:07 +0000 (12:29 +0200)]
ceph-disk: add missing spaces around operator

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago - Merge branch 'wip_scrub_tphandle' into next
Samuel Just [Fri, 24 May 2013 03:08:11 +0000 (20:08 -0700)]
Merge branch 'wip_scrub_tphandle' into next

Fixes: #5159
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago - PG: ping tphandle during omap loop as well
Samuel Just [Fri, 24 May 2013 00:40:44 +0000 (17:40 -0700)]
PG: ping tphandle during omap loop as well

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago - PG: reset timeout in _scan_list for each object, read chunk
Samuel Just [Thu, 23 May 2013 22:24:39 +0000 (15:24 -0700)]
PG: reset timeout in _scan_list for each object, read chunk

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago - OSD,PG: pass tphandle down to _scan_list
Samuel Just [Thu, 23 May 2013 22:23:05 +0000 (15:23 -0700)]
OSD,PG: pass tphandle down to _scan_list

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago - doc: Updated Ceph FS Quick Start.
John Wilkins [Fri, 24 May 2013 00:02:17 +0000 (17:02 -0700)]
doc: Updated Ceph FS Quick Start.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago - doc: Added troubleshooting to Ceph FS index.
John Wilkins [Fri, 24 May 2013 00:01:51 +0000 (17:01 -0700)]
doc: Added troubleshooting to Ceph FS index.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago - doc: Added separate troubleshooting for MDS and Ceph FS.
John Wilkins [Fri, 24 May 2013 00:01:29 +0000 (17:01 -0700)]
doc: Added separate troubleshooting for MDS and Ceph FS.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>