git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years agoconfig: Update sample config with more examples
Wido den Hollander [Fri, 29 Apr 2011 17:39:04 +0000 (10:39 -0700)]
config: Update sample config with more examples

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Signed-off-by: Wido den Hollander <wido@widodh.nl>
14 years agocommon_init: set log_file, not log_dir, by default
Colin Patrick McCabe [Fri, 29 Apr 2011 17:23:10 +0000 (10:23 -0700)]
common_init: set log_file, not log_dir, by default

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agocommon_init: don't modify log_per_instance
Colin Patrick McCabe [Fri, 29 Apr 2011 17:21:08 +0000 (10:21 -0700)]
common_init: don't modify log_per_instance

check it in DoutStreambuf instead.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agomsgr: remove dup .start() call check, remove cruft
Sage Weil [Fri, 29 Apr 2011 16:49:05 +0000 (09:49 -0700)]
msgr: remove dup .start() call check, remove cruft

There is now no ordering constraint wrt the daemonize bits; those can
safely be pulled out.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agohadoop: cleanups for libceph type update
Jim Schutt [Fri, 29 Apr 2011 15:13:59 +0000 (09:13 -0600)]
hadoop: cleanups for libceph type update

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agolfn: put lfn outside of user.ceph namespace
Sage Weil [Thu, 28 Apr 2011 23:01:23 +0000 (16:01 -0700)]
lfn: put lfn outside of user.ceph namespace

This completely hides the lfn from the ObjectStore interface users.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agoMerge remote branch 'origin/master' into lfn
Sage Weil [Thu, 28 Apr 2011 22:45:19 +0000 (15:45 -0700)]
Merge remote branch 'origin/master' into lfn

14 years agomdsmap: show mds name in summary
Sage Weil [Thu, 28 Apr 2011 22:55:20 +0000 (15:55 -0700)]
mdsmap: show mds name in summary

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agohadoop: update libceph types
Sage Weil [Thu, 28 Apr 2011 22:52:07 +0000 (15:52 -0700)]
hadoop: update libceph types

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agohypertable: update libceph types
Sage Weil [Thu, 28 Apr 2011 22:52:01 +0000 (15:52 -0700)]
hypertable: update libceph types

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolibceph: error out if USE_FILE_OFFSET64 not defined
Sage Weil [Thu, 28 Apr 2011 22:49:37 +0000 (15:49 -0700)]
libceph: error out if USE_FILE_OFFSET64 not defined

Otherwise struct dirent will not match user code and badness on readdir
will ensue.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolfn: don't return ENOENT if it's not lfn in some cases
Yehuda Sadeh [Thu, 28 Apr 2011 22:44:28 +0000 (15:44 -0700)]
lfn: don't return ENOENT if it's not lfn in some cases

14 years agomds: ignore fragment_notify when dft state doesn't match
Sage Weil [Thu, 28 Apr 2011 22:17:18 +0000 (15:17 -0700)]
mds: ignore fragment_notify when dft state doesn't match

In particular, if there is a resolve in there somewhere, we may have found
out about this refragment from the src because they send resolve messages
to all nodes (to resolve ambiguous migrations).  If that's the case we
can ignore the message.

Fixes crash like

2011-04-28 14:30:31.179106 7fe72e325710 -- 10.0.1.252:6805/22158 <== mds2 10.0.1.252:6803/25635 548 ==== fragment_notify(300000000b4#00* 1) v1 ==== 17+0+0 (2192211443 0 0) 0x2c9ec00 con 0x323d140
2011-04-28 14:30:31.179116 7fe72e325710 mds1.cache handle_fragment_notify fragment_notify(300000000b4#00* 1) v1 from mds2
2011-04-28 14:30:31.179149 7fe72e325710 mds1.cache adjust_dir_fragments 00* 1 on [inode 300000000b4 [...2,head] /syn.4114.0/dir.0/dir.0/dir.0/dir.0/dir.0/dir.5/dir.5/ auth{0=1,2=1} fragtree_t(*^2 00*^1) v188 ap=1 f(v0 m2011-04-28 14:23:59.074510 7=7+0) n(v2 rc2011-04-28 14:23:59.074510 15=14+1) (idft mix->lock g=0,2 dirty) (inest mix dirty) (ifile excl dirty) (ixattr excl) (iversion lock) caps={4114=pAsLsXsxFsx/-@1},l=4114(-1) | dirtyscattered dirfrag caps replicated dirty authpin 0x32bec70]
2011-04-28 14:30:31.179182 7fe72e325710 mds1.cache adjust_dir_fragments 00* bits 1 srcfrags 0x3080860,0x378da50 on [inode 300000000b4 [...2,head] /syn.4114.0/dir.0/dir.0/dir.0/dir.0/dir.0/dir.5/dir.5/ auth{0=1,2=1} fragtree_t(*^2 00*^1) v188 ap=1 f(v0 m2011-04-28 14:23:59.074510 7=7+0) n(v2 rc2011-04-28 14:23:59.074510 15=14+1) (idft mix->lock g=0,2 dirty) (inest mix dirty) (ifile excl dirty) (ixattr excl) (iversion lock) caps={4114=pAsLsXsxFsx/-@1},l=4114(-1) | dirtyscattered dirfrag caps replicated dirty authpin 0x32bec70]
2011-04-28 14:30:31.179218 7fe72e325710 mds1.cache  new fragtree is fragtree_t(*^2 00*^1)
mds/MDCache.cc: In function 'void MDCache::adjust_dir_fragments(CInode*, std::list<CDir*, std::allocator<CDir*> >&, frag_t, int, std::list<CDir*, std::allocator<CDir*> >&, std::list<Context*, std::allocator<Context*> >&, bool)', in thread '0x7fe72e325710'
mds/MDCache.cc: 9254: FAILED assert(srcfrags.size() == 1)
 ceph version 0.27-165-gaf908f8 (commit:af908f82924a67be3aeb2767eaa05ba04c145f42)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0xa5775e]
 2: (MDCache::adjust_dir_fragments(CInode*, std::list<CDir*, std::allocator<CDir*> >&, frag_t, int, std::list<CDir*, std::allocator<CDir*> >&, std::list<Context*, std::allocator<Context*> >&, bool)+0x2dd) [0x888bbf]
 3: (MDCache::adjust_dir_fragments(CInode*, frag_t, int, std::list<CDir*, std::allocator<CDir*> >&, std::list<Context*, std::allocator<Context*> >&, bool)+0x13d) [0x88817d]
 4: (MDCache::handle_fragment_notify(MMDSFragmentNotify*)+0x199) [0x88bac5]
 5: (MDCache::dispatch(Message*)+0x124) [0x8765ea]
 6: (MDS::handle_deferrable_message(Message*)+0x1f5) [0x77a607]
 7: (MDS::_dispatch(Message*)+0x784) [0x77ba90]

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: do not send fragment_notify to <= recovering nodes
Sage Weil [Thu, 28 Apr 2011 22:02:58 +0000 (15:02 -0700)]
mds: do not send fragment_notify to <= recovering nodes

They will get sorted out during rejoin.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix uninit warning on cur
Sage Weil [Thu, 28 Apr 2011 21:53:35 +0000 (14:53 -0700)]
mds: fix uninit warning on cur

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: handle import cancel while logging EImportStart
Sage Weil [Thu, 28 Apr 2011 21:17:49 +0000 (14:17 -0700)]
mds: handle import cancel while logging EImportStart

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoclient: do not send request to mds -1
Sage Weil [Thu, 28 Apr 2011 21:09:52 +0000 (14:09 -0700)]
client: do not send request to mds -1

If we can't find a target, or the chosen target isn't active, wait.
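
The rule above can be sketched roughly as follows. This is a minimal Python illustration, not the Ceph client code: `dispatch`, `active_ranks`, and the returned tuples are hypothetical names chosen for the example.

```python
# Hypothetical sketch: decide whether a request can be sent or must wait.
NO_TARGET = -1  # sentinel for "no MDS chosen", as in "mds -1" in the title


def dispatch(target, active_ranks):
    """Return ("send", target) if the target is usable, else ("wait", None)."""
    if target == NO_TARGET or target not in active_ranks:
        # No target was found, or the chosen MDS is not active: wait.
        return ("wait", None)
    return ("send", target)
```

A caller would retry `dispatch` once the map changes and a usable target appears.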

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolfn: set hash and file name constants
Yehuda Sadeh [Thu, 28 Apr 2011 20:57:45 +0000 (13:57 -0700)]
lfn: set hash and file name constants

14 years agoosd: remove warning about max object name length
Yehuda Sadeh [Thu, 28 Apr 2011 20:51:48 +0000 (13:51 -0700)]
osd: remove warning about max object name length

14 years agomds: try_trim_non_auth_subtree on any canceled import (including resolve)
Sage Weil [Thu, 28 Apr 2011 20:44:55 +0000 (13:44 -0700)]
mds: try_trim_non_auth_subtree on any canceled import (including resolve)

We were trimming on journal replay of an import failure, but not on a
canceled ambiguous import during resolve.  Fix that by moving the call into
the helper (and passing a CDir* instead of a dirfrag_t).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make trim_non_auth paths complete filepaths (not dnames)
Sage Weil [Thu, 28 Apr 2011 20:34:34 +0000 (13:34 -0700)]
mds: make trim_non_auth paths complete filepaths (not dnames)

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix steal_dentry dir_auth_pins adjustment
Sage Weil [Thu, 28 Apr 2011 20:22:30 +0000 (13:22 -0700)]
mds: fix steal_dentry dir_auth_pins adjustment

Pass down the correct value for dir_auth_pins (dh->auth_pins plus the
inode's auth_pins, but nothing nested beneath the inode).  The CDentry
doesn't track dir auth pins independently, and doesn't really need to.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: use tcmalloc
Sage Weil [Thu, 28 Apr 2011 20:08:34 +0000 (13:08 -0700)]
mon: use tcmalloc

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix export_prep trace format
Sage Weil [Thu, 28 Apr 2011 20:00:44 +0000 (13:00 -0700)]
mds: fix export_prep trace format

The prep message includes a spanning tree in the interior of the subtree
that includes all parent inodes of bounding dirfrags.  That used to look
like
df dentry inode (dir dentry inode)*

The code to generate those traces was stopping if the df->ino had already
been included.  The problem was that we may have done that inode on a
different dirfrag.

Change this to be

df ('-' | ('f' dir | 'd') dentry inode (dir dentry inode)*)

so that we can start with a dentry (already had the dirfrag, same check
as before) or a dirfrag (already had the inode, the new case), or a '-'
(nothing at all).  A single byte is used to indicate which it is and how
to start decoding.
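
A decoder for the new format might look roughly like this. It is a sketch only, assuming the trace has already been split into `(kind, value)` tokens; the token representation, `decode_trace`, and the result layout are hypothetical and do not reflect the actual wire encoding.

```python
def decode_trace(tokens):
    """Decode one trace of the form
    df ('-' | ('f' dir | 'd') dentry inode (dir dentry inode)*)
    from a list of (kind, value) tokens (a hypothetical representation)."""
    it = iter(tokens)

    def expect(kind):
        tok = next(it, None)
        if tok is None or tok[0] != kind:
            raise ValueError(f"expected {kind}, got {tok}")
        return tok[1]

    result = {"df": expect("df"), "start": None, "steps": []}

    # The single start marker says how to begin decoding.
    marker = next(it, None)
    if marker is None or marker[0] not in ("-", "f", "d"):
        raise ValueError(f"bad start marker: {marker}")
    start = marker[0]
    result["start"] = start

    if start == "-":
        # Nothing at all follows the dirfrag.
        if next(it, None) is not None:
            raise ValueError("unexpected tokens after '-'")
        return result

    # 'f': begin with a dirfrag (inode already sent).
    # 'd': begin with a dentry (dirfrag already sent).
    first_dir = expect("dir") if start == "f" else None
    result["steps"].append((first_dir, expect("dentry"), expect("inode")))

    # Remaining (dir dentry inode)* groups.
    while True:
        tok = next(it, None)
        if tok is None:
            break
        if tok[0] != "dir":
            raise ValueError(f"expected dir, got {tok}")
        result["steps"].append((tok[1], expect("dentry"), expect("inode")))
    return result
```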

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolibceph: no _t types
Sage Weil [Thu, 28 Apr 2011 19:34:11 +0000 (12:34 -0700)]
libceph: no _t types

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolfn: short fn length is constant and accurate
Yehuda Sadeh [Thu, 28 Apr 2011 18:24:50 +0000 (11:24 -0700)]
lfn: short fn length is constant and accurate

also, disabling real hashing for now

14 years agoosd: bump up max object name size
Yehuda Sadeh [Thu, 28 Apr 2011 18:16:17 +0000 (11:16 -0700)]
osd: bump up max object name size

14 years agocrypto: add support for SHA256
Yehuda Sadeh [Thu, 28 Apr 2011 18:15:50 +0000 (11:15 -0700)]
crypto: add support for SHA256

14 years agolibceph: typedef struct mystruct *mystruct_t
Sage Weil [Thu, 28 Apr 2011 18:11:14 +0000 (11:11 -0700)]
libceph: typedef struct mystruct *mystruct_t

Needed to drop the ceph_ prefix on the internal ceph_dir_result_t type
to prevent the ceph_dir_result_t typedef from colliding.

The mount struct is named ceph_mount_info to avoid colliding with int
ceph_mount().

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge commit 'origin/master' into lfn
Yehuda Sadeh [Thu, 28 Apr 2011 18:04:40 +0000 (11:04 -0700)]
Merge commit 'origin/master' into lfn

14 years agolibceph: include 'struct' in declarations for C compilation
Sage Weil [Thu, 28 Apr 2011 17:37:18 +0000 (10:37 -0700)]
libceph: include 'struct' in declarations for C compilation

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix auth_pin check
Sage Weil [Thu, 28 Apr 2011 16:30:29 +0000 (09:30 -0700)]
mds: fix auth_pin check

The inode only gets an auth_pin if the dirfrag is not a subtree root.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'master' of ceph.newdream.net:git/ceph
Sage Weil [Thu, 28 Apr 2011 00:08:49 +0000 (17:08 -0700)]
Merge branch 'master' of ceph.newdream.net:git/ceph

14 years agomds: handle freeze completion delayed by frozen inode
Sage Weil [Thu, 28 Apr 2011 00:07:54 +0000 (17:07 -0700)]
mds: handle freeze completion delayed by frozen inode

We can't complete a freeze_tree if we are not a subtree and the parent
inode is frozen.  If that's the case, we were just doing nothing on the
auth_unpin, but that means the freeze_tree would never complete.

Instead, retake an auth_pin (on behalf of the parent) and release it when
the parent inode unfreezes.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolfn: replace hash function
Yehuda Sadeh [Wed, 27 Apr 2011 23:32:52 +0000 (16:32 -0700)]
lfn: replace hash function

for some reason crashes when using libnss

14 years agomds: add 'mds debug auth pins' option
Sage Weil [Wed, 27 Apr 2011 23:34:16 +0000 (16:34 -0700)]
mds: add 'mds debug auth pins' option

This counts dirfrag auth_pins and ensures the inode's nested_auth_pins
count is correct.  Helped catch the bug fixed in the previous commit.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix nested_auth_pin accounting on refragment
Sage Weil [Wed, 27 Apr 2011 23:33:21 +0000 (16:33 -0700)]
mds: fix nested_auth_pin accounting on refragment

The diri gets an auth_pin on the first frag pin when it is not a subtree
root.  When we are moving dentries between frags during refragment, make
sure we use the adjust_nested_auth_pins method to have one such pin per
fragment.

Carry an auth_pin on the old fragment for the duration to ensure that the
pinning/unpinning has no side-effects.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: maintain dn pinning invariants during freezing for refragmenting
Sage Weil [Wed, 27 Apr 2011 22:52:26 +0000 (15:52 -0700)]
mds: maintain dn pinning invariants during freezing for refragmenting

fragment_mark_and_complete aims to complete the in-cache directory,
mark+pin every dentry, then drop a final auth_pin so that the whole thing
freezes.  The problem is we may not be holding the final auth_pin, and
other dentries may get added (or removed?) between the mark and freeze
stages.

Use the DNPINNEDFRAG dir state bit to maintain the invariant that that
bit is set IFF all dentries are similarly pinned and marked.  Update the
add_*_dentry and remove_dentry methods to do that.
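
The invariant can be illustrated with a toy model. This is a sketch under assumptions, not the MDS code: the `Dir` class, its `pinned` map, and the method names are hypothetical, and a boolean stands in for the DNPINNEDFRAG state bit.

```python
class Dir:
    """Toy directory: the bit is set iff every dentry is pinned and marked."""

    def __init__(self):
        self.pinned = {}           # dentry name -> holds the fragment pin?
        self.dnpinnedfrag = False  # stands in for the DNPINNEDFRAG state bit

    def mark_and_pin_all(self):
        for name in self.pinned:
            self.pinned[name] = True
        self.dnpinnedfrag = True

    def unmark_all(self):
        for name in self.pinned:
            self.pinned[name] = False
        self.dnpinnedfrag = False

    def add_dentry(self, name):
        # A dentry added while the bit is set must be pinned as well.
        self.pinned[name] = self.dnpinnedfrag

    def remove_dentry(self, name):
        # Removal drops the dentry together with any pin it held.
        del self.pinned[name]

    def invariant_holds(self):
        # Every dentry's pin state must agree with the directory bit.
        return all(p == self.dnpinnedfrag for p in self.pinned.values())
```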

Fix the success path to assert this was true and to clean up(!).  Also
fix the unwind/failure path to assert.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: freeze fragments during split/merge
Sage Weil [Wed, 27 Apr 2011 22:09:35 +0000 (15:09 -0700)]
mds: freeze fragments during split/merge

Freeze the target fragment(s) before unfreezing the old fragment(s) to
avoid any weird events going off when the unfreeze unauth_pins the dir
inode (in certain cases).  This makes the whole process cleaner and more
symmetrical.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolfn: some more fixes
Yehuda Sadeh [Wed, 27 Apr 2011 20:03:28 +0000 (13:03 -0700)]
lfn: some more fixes

14 years agoautomake: Make debug targets known but not built by default in non-debug builds.
Tommi Virtanen [Wed, 27 Apr 2011 19:16:52 +0000 (12:16 -0700)]
automake: Make debug targets known but not built by default in non-debug builds.

With this, "./configure --without-debug && make -C src testceph" will work.
Before this, it would use make builtin rules, and fail to compile in a
confusing manner.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
14 years agomds: remove erroneous fixme.
Greg Farnum [Wed, 27 Apr 2011 18:24:43 +0000 (11:24 -0700)]
mds: remove erroneous fixme.

This is for the client map journaling, but that's handled
elsewhere within this function...no idea why it ever had
a fixme there!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agomds: handle discovers that race with refragmenting
Sage Weil [Wed, 27 Apr 2011 17:59:40 +0000 (10:59 -0700)]
mds: handle discovers that race with refragmenting

Consider:

 - send discover on frag X
 - X refragments
   - we take the waiter and rediscover on frag Y
 - we get the reply for the X discover

The auth mds will correctly delay sending the reply until the refragment
completes and it unfreezes, but the reply was getting the original frag_t,
not the new one.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: Replay new client sessions on slave-rename importing.
Greg Farnum [Wed, 27 Apr 2011 17:52:09 +0000 (10:52 -0700)]
mds: Replay new client sessions on slave-rename importing.

We've been logging the sessions for ages but never
actually opened them.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agomds: pay attention to *stat staleness during split
Sage Weil [Wed, 27 Apr 2011 17:14:10 +0000 (10:14 -0700)]
mds: pay attention to *stat staleness during split

Leave only the first frag stale, since we are already doing that with the
accounted_ differential.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: merge accounted_* stats
Sage Weil [Wed, 27 Apr 2011 17:00:30 +0000 (10:00 -0700)]
mds: merge accounted_* stats

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobsync: use lxml to parse XML ACL
Colin Patrick McCabe [Tue, 26 Apr 2011 23:27:14 +0000 (16:27 -0700)]
obsync: use lxml to parse XML ACL

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agolibceph: move header file to include/ceph/libceph.h
Sage Weil [Wed, 27 Apr 2011 03:45:36 +0000 (20:45 -0700)]
libceph: move header file to include/ceph/libceph.h

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agolfn: some fixes
Yehuda Sadeh [Wed, 27 Apr 2011 00:24:53 +0000 (17:24 -0700)]
lfn: some fixes

14 years agolfn: amend long file name hashing
Yehuda Sadeh [Tue, 26 Apr 2011 23:49:19 +0000 (16:49 -0700)]
lfn: amend long file name hashing

14 years agomds: ignore resolve messages received prior to resolve stage
Sage Weil [Tue, 26 Apr 2011 23:46:57 +0000 (16:46 -0700)]
mds: ignore resolve messages received prior to resolve stage

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: handle aborted export during pre-export sync
Sage Weil [Tue, 26 Apr 2011 23:39:18 +0000 (16:39 -0700)]
mds: handle aborted export during pre-export sync

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolfn: push cid/oid translation down
Yehuda Sadeh [Tue, 26 Apr 2011 23:33:30 +0000 (16:33 -0700)]
lfn: push cid/oid translation down

compiles now, not tested, probably doesn't work

14 years agomds: drop messages to down mdss
Sage Weil [Tue, 26 Apr 2011 23:28:03 +0000 (16:28 -0700)]
mds: drop messages to down mdss

...instead of asserting in MDSMap::get_inst.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: do not send heartbeat when degraded
Sage Weil [Tue, 26 Apr 2011 23:18:21 +0000 (16:18 -0700)]
mds: do not send heartbeat when degraded

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: fix discover tid assignment
Sage Weil [Tue, 26 Apr 2011 23:09:43 +0000 (16:09 -0700)]
mds: fix discover tid assignment

Hmm!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agovstart.sh: remove cruft
Sage Weil [Tue, 26 Apr 2011 22:51:11 +0000 (15:51 -0700)]
vstart.sh: remove cruft

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: fix standby-replay assignment (again)
Sage Weil [Tue, 26 Apr 2011 22:44:56 +0000 (15:44 -0700)]
mon: fix standby-replay assignment (again)

Only assign a random node to standby-replay if it is marked as
STANDBY_ANY.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoauth: Avoid const mismatch in nss_aes_operation
Jim Schutt [Tue, 26 Apr 2011 22:39:00 +0000 (15:39 -0700)]
auth: Avoid const mismatch in nss_aes_operation

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
14 years agoconfigure.ac: check for supported compiler flags
Jim Schutt [Tue, 26 Apr 2011 22:06:46 +0000 (15:06 -0700)]
configure.ac: check for supported compiler flags

Ancient versions of gcc, such as the gcc 4.1.2 in RHEL 5.5, don't
support some -W flags that newer versions do.  Fix up configure.ac
and Makefile.am to use them if you have them.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agovstart.sh: set up pairs for each rank when -s is on
Sage Weil [Tue, 26 Apr 2011 22:31:53 +0000 (15:31 -0700)]
vstart.sh: set up pairs for each rank when -s is on

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: rework assignment of standby-replay, expansion nodes
Sage Weil [Tue, 26 Apr 2011 22:26:57 +0000 (15:26 -0700)]
mon: rework assignment of standby-replay, expansion nodes

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomon: fix standby-replay assignment logic
Sage Weil [Tue, 26 Apr 2011 21:43:24 +0000 (14:43 -0700)]
mon: fix standby-replay assignment logic

Assign a standby-replay at any time based on rank, name, or no preference.
Previously this could only happen when the MDS first started, and we would
fail if the target MDS wasn't followable at that point in time.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge branch 'osd_trans'
Sage Weil [Tue, 26 Apr 2011 20:40:34 +0000 (13:40 -0700)]
Merge branch 'osd_trans'

14 years agoMerge remote branch 'origin/stable'
Sage Weil [Tue, 26 Apr 2011 20:40:21 +0000 (13:40 -0700)]
Merge remote branch 'origin/stable'

14 years agoosd: move watch/notify effects out of do_osd_ops
Sage Weil [Tue, 26 Apr 2011 20:39:58 +0000 (13:39 -0700)]
osd: move watch/notify effects out of do_osd_ops

Apply watch/notify side effects in do_osd_op_effects() only if the
transaction will succeed.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobsync: implement RadosStore
Colin Patrick McCabe [Tue, 26 Apr 2011 18:23:18 +0000 (11:23 -0700)]
obsync: implement RadosStore

Implement RadosStore, a storage backend which accesses librados
directly, without going through RGW (Rados GateWay).

This version is still very preliminary because ACLs aren't supported.
We need ACLs even to do things like properly create buckets.
Instead, this version has ACL_HACK, which is just for testing purposes.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agoosd: mention invalid snapc in log
Sage Weil [Tue, 26 Apr 2011 19:34:21 +0000 (12:34 -0700)]
osd: mention invalid snapc in log

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: include (some) osd op flags in MOSDOp print method
Sage Weil [Tue, 26 Apr 2011 19:10:33 +0000 (12:10 -0700)]
osd: include (some) osd op flags in MOSDOp print method

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agoosd: add RWORDERED osd op flag
Sage Weil [Tue, 26 Apr 2011 19:10:12 +0000 (12:10 -0700)]
osd: add RWORDERED osd op flag

Order this op wrt reads the same way a read-modify-write would be.
(Otherwise we may get a fast/stale read result on a not-yet-complete
write.)

This fixes a problem where the Filer was marking a probe stat as a write
to get this same effect, but the OSD would EINVAL if it was a snapped
object (which happens in certain cases where the MDS is recovering the
file size of a snapped file).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agoradostool: fix getxattr / setxattr return code
Colin Patrick McCabe [Mon, 25 Apr 2011 18:52:45 +0000 (11:52 -0700)]
radostool: fix getxattr / setxattr return code

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agorbd: make showmapped output a bit prettier
Sage Weil [Tue, 26 Apr 2011 18:04:28 +0000 (11:04 -0700)]
rbd: make showmapped output a bit prettier

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agorbd: showmapped
Sage Weil [Tue, 26 Apr 2011 17:55:43 +0000 (10:55 -0700)]
rbd: showmapped

Show mapped rbd devices.

Fixes: #1024
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agopybind-rados: fix Ioctx::close
Colin Patrick McCabe [Tue, 26 Apr 2011 17:49:02 +0000 (10:49 -0700)]
pybind-rados: fix Ioctx::close

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agomds: only include head dentries in check_rstats() rstat check
Sage Weil [Tue, 26 Apr 2011 17:39:46 +0000 (10:39 -0700)]
mds: only include head dentries in check_rstats() rstat check

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agoosd: move ObjectState side effects out of do_osd_ops
Sage Weil [Tue, 26 Apr 2011 17:35:03 +0000 (10:35 -0700)]
osd: move ObjectState side effects out of do_osd_ops

We want to be able to handle a failure mid-way through an OSDOp
transaction and bail out with no side effects.  This patch

 * puts an ObjectState new_obs in the OpContext that modifications go in
 * only applies it if the transaction is a success
 * only does make_writeable (at the end!) if the transaction is a success

There are still side effects with the watch/notify stuff, though.
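
The staging pattern described above can be sketched as follows. This is an illustrative Python model, not the OSD code: `run_ops`, `OpError`, and the dict-based state are hypothetical, with the staged copy playing the role of new_obs.

```python
import copy


class OpError(Exception):
    """Raised by an op that fails mid-transaction (hypothetical)."""


def run_ops(state, ops):
    """Apply ops to a staged copy; hand back the new state only on success."""
    staged = copy.deepcopy(state)  # modifications go here, not in `state`
    try:
        for op in ops:
            op(staged)
    except OpError:
        # Bail out with no side effects: the original state is untouched.
        return state, False
    return staged, True


def set_size(n):
    """Build an op that sets the object size."""
    def op(s):
        s["size"] = n
    return op


def fail(s):
    """An op that always fails."""
    raise OpError("boom")
```

With `state = {"exists": False, "size": 0}`, running `[set_size(4), fail]` leaves `state` unchanged, while running `[set_size(4)]` returns a new dict with the updated size.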

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjectstore: implement Transaction::swap()
Sage Weil [Tue, 26 Apr 2011 17:17:47 +0000 (10:17 -0700)]
objectstore: implement Transaction::swap()

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjectstore: transaction::append()
Sage Weil [Tue, 26 Apr 2011 04:11:08 +0000 (21:11 -0700)]
objectstore: transaction::append()

Combine two transactions into one.
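
As a rough model of the idea, assuming a transaction is just an ordered list of operations (the `Transaction` class and its `ops` field here are hypothetical, not the ObjectStore type):

```python
class Transaction:
    """Toy transaction: an ordered list of operations."""

    def __init__(self, ops=None):
        self.ops = list(ops or [])

    def append(self, other):
        # Add the other transaction's ops after our own, preserving order.
        self.ops.extend(other.ops)
```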

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agobuffer: use std::swap
Sage Weil [Tue, 26 Apr 2011 17:22:20 +0000 (10:22 -0700)]
buffer: use std::swap

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agotest-obsync.py: add tests with --no-preserve-acls
Colin Patrick McCabe [Tue, 26 Apr 2011 17:06:54 +0000 (10:06 -0700)]
test-obsync.py: add tests with --no-preserve-acls

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agoosd: remove obsolete noop cruft
Sage Weil [Mon, 25 Apr 2011 22:41:31 +0000 (15:41 -0700)]
osd: remove obsolete noop cruft

The noop branching is all dead code.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: move snapset_context into ObjectContext from ObjectState
Sage Weil [Mon, 25 Apr 2011 22:28:15 +0000 (15:28 -0700)]
osd: move snapset_context into ObjectContext from ObjectState

ObjectState is now just static info (object_info_t and bool exists).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoobjectstore: drop decode support for <= v0.19 encoded transactions
Sage Weil [Tue, 26 Apr 2011 04:06:18 +0000 (21:06 -0700)]
objectstore: drop decode support for <= v0.19 encoded transactions

This only affects online upgrade or journal replay of v0.19 generated
transactions.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years agomon: fix pg stat summary
Sage Weil [Mon, 25 Apr 2011 23:27:38 +0000 (16:27 -0700)]
mon: fix pg stat summary

Had the pg state counts in there twice.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: undump-journal
Sage Weil [Tue, 26 Apr 2011 16:40:15 +0000 (09:40 -0700)]
mds: undump-journal

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agobuffer: break out read_fd method
Sage Weil [Tue, 26 Apr 2011 16:40:03 +0000 (09:40 -0700)]
buffer: break out read_fd method

Read N bytes from the provided fd into the bufferlist.
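
A minimal Python analogue of such a helper, for illustration only (this `read_fd` is a hypothetical stand-in, not the bufferlist method; handling of short reads and EOF is an assumption of the sketch):

```python
import os


def read_fd(fd, n):
    """Read up to n bytes from fd, looping over short reads; stop at EOF."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF before n bytes were available
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```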

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agorgw: check if bucket is empty before removing it
Yehuda Sadeh [Tue, 26 Apr 2011 00:15:05 +0000 (17:15 -0700)]
rgw: check if bucket is empty before removing it

14 years agoobsync: another fix for --no-preserve-acls
Colin Patrick McCabe [Mon, 25 Apr 2011 23:39:57 +0000 (16:39 -0700)]
obsync: another fix for --no-preserve-acls

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agoobsync: fix bug in --no-preserve-acls
Colin Patrick McCabe [Mon, 25 Apr 2011 23:03:19 +0000 (16:03 -0700)]
obsync: fix bug in --no-preserve-acls

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years agomds: Don't twiddle lock states in the middle of an import.
Greg Farnum [Mon, 25 Apr 2011 21:40:09 +0000 (14:40 -0700)]
mds: Don't twiddle lock states in the middle of an import.

This should have gone in a028c8954ca240ec9a12682678aaee02eb507ae3.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years agoMerge branch 'stable'
Sage Weil [Mon, 25 Apr 2011 21:34:49 +0000 (14:34 -0700)]
Merge branch 'stable'

Conflicts:
src/mds/MDLog.cc
src/osdc/Journaler.cc
src/osdc/Journaler.h

14 years agorgw: send content length on put operation
Yehuda Sadeh [Mon, 25 Apr 2011 21:31:25 +0000 (14:31 -0700)]
rgw: send content length on put operation

14 years agomds: only move the journaler expire_pos forward
Sage Weil [Mon, 25 Apr 2011 18:44:14 +0000 (11:44 -0700)]
mds: only move the journaler expire_pos forward

We were seeing weird trim errors because expire_pos was getting moved
backwards after a standby-replay -> replay transition.  Make sure the two
places that update the expire_pos only move it forward--never backward.
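
The forward-only rule amounts to a monotonic setter. A minimal sketch, assuming a hypothetical `Journal` class rather than the real Journaler:

```python
class Journal:
    """Toy journal that tracks a forward-only expire position."""

    def __init__(self, expire_pos=0):
        self.expire_pos = expire_pos

    def set_expire_pos(self, pos):
        # Only advance; a smaller value (e.g. from a stale head) is ignored.
        if pos > self.expire_pos:
            self.expire_pos = pos
        return self.expire_pos
```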

Fixes: #1023
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: always trim standby segments after rereading the head
Sage Weil [Mon, 25 Apr 2011 18:42:31 +0000 (11:42 -0700)]
mds: always trim standby segments after rereading the head

When we re-read the head we may get an expire_pos that has moved forward in
time.  That is the appropriate time to trim segments during standby-replay.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: only write head once after expiring logsegments
Sage Weil [Mon, 25 Apr 2011 18:31:24 +0000 (11:31 -0700)]
mds: only write head once after expiring logsegments

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: small journaler cleanups
Sage Weil [Mon, 25 Apr 2011 18:25:05 +0000 (11:25 -0700)]
mds: small journaler cleanups

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agojournaler: separate out trimmed_pos setter
Sage Weil [Mon, 25 Apr 2011 18:24:22 +0000 (11:24 -0700)]
journaler: separate out trimmed_pos setter

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: wait for blacklisting osdmap on standby-replay -> replay final pass
Sage Weil [Mon, 25 Apr 2011 17:15:26 +0000 (10:15 -0700)]
mds: wait for blacklisting osdmap on standby-replay -> replay final pass

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agolibrados python binding: always use 64-bit offsets
Colin Patrick McCabe [Mon, 25 Apr 2011 17:29:12 +0000 (10:29 -0700)]
librados python binding: always use 64-bit offsets

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>