git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years ago osd: trigger a store snapshot when the osdmap says to
Sage Weil [Wed, 11 May 2011 20:11:57 +0000 (13:11 -0700)]
osd: trigger a store snapshot when the osdmap says to

Move the OSDMap decoding up a bit so that we can either snapshot or flush.
We can't do it after we take map_lock or else we'll have problems dropping
and retaking osd_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago filestore: add a snapshot command to create a snapshot of the entire store
Sage Weil [Wed, 11 May 2011 20:10:53 +0000 (13:10 -0700)]
filestore: add a snapshot command to create a snapshot of the entire store

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mon: add 'osd cluster_snap foo' command
Sage Weil [Wed, 11 May 2011 20:10:28 +0000 (13:10 -0700)]
mon: add 'osd cluster_snap foo' command

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago osdmap: add cluster_snapshot field
Sage Weil [Wed, 11 May 2011 20:10:02 +0000 (13:10 -0700)]
osdmap: add cluster_snapshot field

Add a cluster_snapshot marker in the map that is valid for a single epoch
to do a coordinated snapshot of the entire OSD cluster.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago osd: initialize oi.oloc if on-disk value is bogus
Sage Weil [Tue, 10 May 2011 15:22:34 +0000 (08:22 -0700)]
osd: initialize oi.oloc if on-disk value is bogus

If the on-disk locator is undefined (upgrade of an old cluster?) initialize
the oloc fields based on the PG::Info.

Reported-by: ar Fred <ar.fred@yahoo.com>
Tested-by: ar Fred <ar.fred@yahoo.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago client: map file stripes to acting osds
Sage Weil [Fri, 6 May 2011 22:23:44 +0000 (15:23 -0700)]
client: map file stripes to acting osds

Old result was just wrong if any osds were down.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago osd: use fixed-size types for fiemap/mapext/sparseread encoding
Sage Weil [Fri, 6 May 2011 20:42:23 +0000 (13:42 -0700)]
osd: use fixed-size types for fiemap/mapext/sparseread encoding

The client expects <uint64_t,uint64_t>, so this breaks on any 32-bit osd.

Signed-off-by: Sage Weil <sage@newdream.net>
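The encoding problem described in the fiemap commit above can be illustrated with a small sketch. This is not the Ceph encoder; the function names are hypothetical, and little-endian byte order is an assumption. It only shows why a pair encoded with a 32-bit native size type cannot be read by a client that expects <uint64_t,uint64_t>.

```python
import struct


def encode_extent_fixed(offset, length):
    """Encode an extent as two little-endian 64-bit unsigned integers."""
    return struct.pack("<QQ", offset, length)


def encode_extent_native32(offset, length):
    """Hypothetical 32-bit build: the pair is encoded as two 32-bit values."""
    return struct.pack("<II", offset, length)


def decode_extent(buf):
    """Client side: always expects <uint64_t, uint64_t> (16 bytes)."""
    return struct.unpack("<QQ", buf)


if __name__ == "__main__":
    good = encode_extent_fixed(4096, 8192)
    print(len(good), decode_extent(good))
    bad = encode_extent_native32(4096, 8192)
    print(len(bad))  # half the size the client expects
```

With fixed-size types the wire format no longer depends on the word size of the OSD that produced it.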
14 years ago osdmap: fix temp osd pg mapping
Sage Weil [Thu, 5 May 2011 23:08:58 +0000 (16:08 -0700)]
osdmap: fix temp osd pg mapping

If you feed in a raw pg (full precision) you should get the same mapping
out as when you plug in the effective/reduced precision pg.  The
raw_to_temp_osds() wasn't doing that, which gave you results like

flak:src 04:01 PM $ ./ceph pg map 0.4
2011-05-05 16:01:18.524051 mon <- [pg,map,0.4]
2011-05-05 16:01:18.524987 mon2 -> 'osdmap e11 pg 0.4 (0.4) -> up [1,0] acting [0]' (0)
flak:src 04:01 PM $ ./ceph pg map 0.7ed4
2011-05-05 16:01:21.755490 mon <- [pg,map,0.7ed4]
2011-05-05 16:01:21.755996 mon1 -> 'osdmap e11 pg 0.7ed4 (0.4) -> up [1,0] acting [1,0]' (0)

The objecter was feeding in raw pgs, so this was sending requests to the
wrong nodes.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
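A minimal sketch of the invariant the pg mapping commit above describes: a raw (full precision) pg and its reduced precision form should resolve to the same temp mapping. The helper names, the pg_temp dictionary, and pg_num = 8 are assumptions for illustration (the log output does not state pg_num); stable_mod follows the usual "stable modulo" folding and is not copied from the patch.

```python
def stable_mod(x, b, bmask):
    """Fold x into [0, b) so that growing b moves as few values as possible."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def effective_pg(raw_seed, pg_num):
    """Reduce a full-precision pg seed to the effective pg for pg_num."""
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return stable_mod(raw_seed, pg_num, bmask)


def acting_osds(pool, raw_seed, pg_num, up, pg_temp):
    """Look up the temp mapping by the effective pg, never by the raw seed."""
    key = (pool, effective_pg(raw_seed, pg_num))
    return pg_temp.get(key, up)


if __name__ == "__main__":
    pg_temp = {(0, 0x4): [0]}  # hypothetical temp mapping for pg 0.4
    for seed in (0x4, 0x7ED4):
        print(hex(seed), acting_osds(0, seed, 8, [1, 0], pg_temp))
```

Both seeds print the same acting set here, which is the behavior the fix restores for requests sent by the objecter.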
14 years ago v0.27.1 v0.27.1
Sage Weil [Thu, 5 May 2011 20:42:20 +0000 (13:42 -0700)]
v0.27.1

14 years ago mds: fix --reset-journal
Sage Weil [Thu, 5 May 2011 20:35:50 +0000 (13:35 -0700)]
mds: fix --reset-journal

Don't fork.  (Already fixed in master branch by the start_with_nonce
refactor, so this is just for 0.27.1.)

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago cfuse: encode/decode dev_t properly
Sage Weil [Tue, 3 May 2011 01:19:32 +0000 (18:19 -0700)]
cfuse: encode/decode dev_t properly

The fuse layer passes through "encoded" dev_t values (probably for
compatibility reasons or something).  I copied the encode/decode methods
from the kernel and encode/decode the st_rdev values where appropriate
(where struct stat is exposed directly or via the fuse_entry_param
struct).

Fixes: #1031
Signed-off-by: Sage Weil <sage@newdream.net>
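The cfuse commit above does not say which kernel helpers were copied. Assuming they are new_encode_dev() and new_decode_dev() from the kernel's kdev_t.h, the bit layout can be sketched as follows (a Python transcription for illustration, not the patch itself):

```python
def new_encode_dev(major, minor):
    """Pack major/minor: low 8 minor bits, then 12 major bits, then the
    remaining minor bits shifted above the major field."""
    return (minor & 0xFF) | (major << 8) | ((minor & ~0xFF) << 12)


def new_decode_dev(dev):
    """Inverse of new_encode_dev; returns (major, minor)."""
    major = (dev & 0xFFF00) >> 8
    minor = (dev & 0xFF) | ((dev >> 12) & 0xFFF00)
    return major, minor


if __name__ == "__main__":
    encoded = new_encode_dev(8, 1)
    print(hex(encoded), new_decode_dev(encoded))
```

A round trip through both functions returns the original major and minor numbers, including minors larger than 255.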
14 years ago filestore: fiemap should close the fd
Yehuda Sadeh [Fri, 29 Apr 2011 21:08:57 +0000 (14:08 -0700)]
filestore: fiemap should close the fd

14 years ago mon: make 'ceph osd (down,out,in) N' take multiple osd numbers
Sage Weil [Thu, 28 Apr 2011 19:42:23 +0000 (12:42 -0700)]
mon: make 'ceph osd (down,out,in) N' take multiple osd numbers

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago config: Remove debug output in conf_get
Wido den Hollander [Thu, 28 Apr 2011 13:03:08 +0000 (15:03 +0200)]
config: Remove debug output in conf_get

Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago osd: include (some) osd op flags in MOSDOp print method
Sage Weil [Tue, 26 Apr 2011 19:10:33 +0000 (12:10 -0700)]
osd: include (some) osd op flags in MOSDOp print method

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago osd: add RWORDERED osd op flag
Sage Weil [Tue, 26 Apr 2011 19:10:12 +0000 (12:10 -0700)]
osd: add RWORDERED osd op flag

Order this op wrt reads the same way a read-modify-write would be.
(Otherwise we may get a fast/stale read result on a not-yet-complete
write.)

This fixes a problem where the Filer was marking a probe stat as a write
to get this same effect, but the OSD would EINVAL if it was a snapped
object (which happens in certain cases where the MDS is recovering the
file size of a snapped file).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago mds: only include head dentries in check_rstats() rstat check
Sage Weil [Tue, 26 Apr 2011 17:39:46 +0000 (10:39 -0700)]
mds: only include head dentries in check_rstats() rstat check

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago mds: only move the journaler expire_pos forward
Sage Weil [Mon, 25 Apr 2011 18:44:14 +0000 (11:44 -0700)]
mds: only move the journaler expire_pos forward

We were seeing weird trim errors because expire_pos was getting moved
backwards after a standby-replay -> replay transition.  Make sure the two
places that update the expire_pos only move it forward--never backward.

Fixes: #1023
Signed-off-by: Sage Weil <sage@newdream.net>
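The forward-only rule from the expire_pos commit above amounts to a monotonic setter. A minimal sketch, with a hypothetical class and method name rather than the Journaler API:

```python
class JournalPositions:
    """Hypothetical holder for a journal's expire position."""

    def __init__(self, expire_pos=0):
        self.expire_pos = expire_pos

    def set_expire_pos(self, pos):
        """Advance expire_pos; ignore any attempt to move it backward.

        Returns True if the position changed.
        """
        if pos > self.expire_pos:
            self.expire_pos = pos
            return True
        return False


if __name__ == "__main__":
    j = JournalPositions(expire_pos=4096)
    print(j.set_expire_pos(8192), j.expire_pos)
    print(j.set_expire_pos(1024), j.expire_pos)  # stale value is ignored
```

Routing every update through one guarded setter keeps the two update sites mentioned in the commit from disagreeing about direction.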
14 years ago mds: always trim standby segments after rereading the head
Sage Weil [Mon, 25 Apr 2011 18:42:31 +0000 (11:42 -0700)]
mds: always trim standby segments after rereading the head

When we re-read the head we may get an expire_pos that has moved forward in
time.  That is the appropriate time to trim segments during standby-replay.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: only write head once after expiring logsegments
Sage Weil [Mon, 25 Apr 2011 18:31:24 +0000 (11:31 -0700)]
mds: only write head once after expiring logsegments

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: small journaler cleanups
Sage Weil [Mon, 25 Apr 2011 18:25:05 +0000 (11:25 -0700)]
mds: small journaler cleanups

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago journaler: separate out trimmed_pos setter
Sage Weil [Mon, 25 Apr 2011 18:24:22 +0000 (11:24 -0700)]
journaler: separate out trimmed_pos setter

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: wait for blacklisting osdmap on standby-replay -> replay final pass
Sage Weil [Mon, 25 Apr 2011 17:15:26 +0000 (10:15 -0700)]
mds: wait for blacklisting osdmap on standby-replay -> replay final pass

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago journaler: fix flush completion when nothing to flush
Sage Weil [Mon, 25 Apr 2011 17:01:31 +0000 (10:01 -0700)]
journaler: fix flush completion when nothing to flush

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago journaler: default to readonly; fix asserts
Sage Weil [Mon, 25 Apr 2011 16:44:17 +0000 (09:44 -0700)]
journaler: default to readonly; fix asserts

Previously we were never in a readonly state, which made all the existing
asserts meaningless.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago vstart.sh: fix -s
Sage Weil [Mon, 25 Apr 2011 16:14:09 +0000 (09:14 -0700)]
vstart.sh: fix -s

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago v0.27 v0.27
Sage Weil [Fri, 22 Apr 2011 23:52:04 +0000 (16:52 -0700)]
v0.27

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago cauthtool: -C not -c in man page
Sage Weil [Tue, 19 Apr 2011 22:33:16 +0000 (15:33 -0700)]
cauthtool: -C not -c in man page

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago osd: better debug output on replay completion
Sage Weil [Tue, 19 Apr 2011 21:32:56 +0000 (14:32 -0700)]
osd: better debug output on replay completion

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago mkcephfs: allow a prebuilt osdmap to be specified
Sage Weil [Tue, 19 Apr 2011 21:13:02 +0000 (14:13 -0700)]
mkcephfs: allow a prebuilt osdmap to be specified

Otherwise we'll create one with osdmaptool --createsimple with the default
generic settings.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago Revert "Revert "autoconf: Complain if tcmalloc is not found.""
Sage Weil [Tue, 19 Apr 2011 19:05:36 +0000 (12:05 -0700)]
Revert "Revert "autoconf: Complain if tcmalloc is not found.""

This reverts commit 05c281bfa9e9d69ea3d0197590950c8e6845a13a.

This should be okay now.

14 years ago debian: Handle missing tcmalloc on Debian lenny.
Tommi Virtanen [Tue, 19 Apr 2011 18:20:24 +0000 (11:20 -0700)]
debian: Handle missing tcmalloc on Debian lenny.

lenny doesn't have a suitable libgoogle-perftools-dev, and
release.sh edits it out of build-deps. Detect that and tell
configure that not having tcmalloc is ok.

This should make 05c281bfa9e9d69ea3d0197590950c8e6845a13a
unnecessary.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
14 years ago debian: Build without tcmalloc on non-i386/amd64.
Tommi Virtanen [Tue, 19 Apr 2011 18:04:01 +0000 (11:04 -0700)]
debian: Build without tcmalloc on non-i386/amd64.

This is not strictly needed as of 05c281bfa9e9d69ea3d0197590950c8e6845a13a,
but that reverting is hopefully only temporary.

Without this, with 05c281 undone, non-mainstream architectures
would fail to build.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
14 years ago mds: remove MDSlaveUpdate from list on deletion
Sage Weil [Tue, 19 Apr 2011 16:25:30 +0000 (09:25 -0700)]
mds: remove MDSlaveUpdate from list on deletion

These are added to the LogSegment list on the slaves, but also need to be
removed from that list when we replay a COMMIT|ROLLBACK or when the op's
fate is determined during the resolve stage.

This fixes a crash like

./include/elist.h: In function 'elist<T>::item::~item() [with T =
MDSlaveUpdate*]', in thread '0x7fb2004d5700'
./include/elist.h: 39: FAILED assert(!is_on_list())
 ceph version 0.26 (commit:9981ff90968398da43c63106694d661f5e3d07d5)
 1: (MDSlaveUpdate::~MDSlaveUpdate()+0x59) [0x4d9fe9]
 2: (ESlaveUpdate::replay(MDS*)+0x422) [0x4d2772]
 3: (MDLog::_replay_thread()+0xb90) [0x67f850]
 4: (MDLog::ReplayThread::entry()+0xd) [0x4b89ed]
 5: (()+0x7971) [0x7fb20564a971]
 6: (clone()+0x6d) [0x7fb2042e692d]

Fixes: #1019
Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago Merge commit '8038c491ba90a8cbcd569e84d4cafc8bbdff81d5' into next
Sage Weil [Mon, 18 Apr 2011 23:26:06 +0000 (16:26 -0700)]
Merge commit '8038c491ba90a8cbcd569e84d4cafc8bbdff81d5' into next

14 years ago Merge remote branch 'origin/stable' into next
Sage Weil [Mon, 18 Apr 2011 23:23:03 +0000 (16:23 -0700)]
Merge remote branch 'origin/stable' into next

14 years ago osd: make ZERO on non-existent object a no-op
Sage Weil [Mon, 18 Apr 2011 20:55:16 +0000 (13:55 -0700)]
osd: make ZERO on non-existent object a no-op

Fixes bug where oi.size gets out of sync with the object size because we
actually write zeros.  (This explains #933.)

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago clitests: fix radosgw_admin test
Colin Patrick McCabe [Mon, 18 Apr 2011 18:43:45 +0000 (11:43 -0700)]
clitests: fix radosgw_admin test

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago clitests: eliminate use of old-style section name
Colin Patrick McCabe [Mon, 18 Apr 2011 18:43:23 +0000 (11:43 -0700)]
clitests: eliminate use of old-style section name

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago MDS: move slave rename xlock handling before finish_export_inode.
Greg Farnum [Sat, 16 Apr 2011 01:00:57 +0000 (18:00 -0700)]
MDS: move slave rename xlock handling before finish_export_inode.

finish_export_inode changes states! That's not good for our checks,
so just handle unpinning and stuff before we finish_export_inode.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago improve debug printing
Greg Farnum [Fri, 15 Apr 2011 22:49:46 +0000 (15:49 -0700)]
improve debug printing

14 years ago mds: Unify migration-handling code in _commit_slave_rename.
Greg Farnum [Thu, 14 Apr 2011 22:53:09 +0000 (15:53 -0700)]
mds: Unify migration-handling code in _commit_slave_rename.

We need to handle locks and pins on exported inodes but we
were using a separate if block with its own (non-matching!) check
for no good reason.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago mds: _commit_slave_rename needs to drop auth_pins for exported xlocks.
Greg Farnum [Mon, 11 Apr 2011 23:55:09 +0000 (16:55 -0700)]
mds: _commit_slave_rename needs to drop auth_pins for exported xlocks.

Otherwise these pins are never dropped from the inode since we
don't go through our normal xlock teardown code. Now we do!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago MDS: Make _rename_apply inode import auth_pinning more intelligent.
Greg Farnum [Thu, 7 Apr 2011 00:05:26 +0000 (17:05 -0700)]
MDS: Make _rename_apply inode import auth_pinning more intelligent.

We don't want auth_pins on the locallocks (they're never auth_pinned)
and we only want new auth_pins that are for locks on the inode that we
imported -- not for each xlock that the mdr has everywhere (like,
say, on the srcdn)!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago mds: If we're a slave, clean up xlocks when we export an inode.
Greg Farnum [Thu, 31 Mar 2011 21:02:48 +0000 (14:02 -0700)]
mds: If we're a slave, clean up xlocks when we export an inode.

Because we can do an inode import during a rename that skips the usual
channels, we were getting into an odd state with the xlocks (which we
did as a slave for an inode that we exported away). Clean up the
record of these xlocks for inodes before we get into the request
cleanup (at which point we are labeled as no-longer-auth, and the
standard cleanup routines will break).

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago mds: properly drop imported xlocks.
Greg Farnum [Thu, 31 Mar 2011 00:10:05 +0000 (17:10 -0700)]
mds: properly drop imported xlocks.

Because we can do an inode import during a rename that skips the usual
channels, we were getting into an odd state with the xlocks (which
were formerly remote and are now local). Clean up the record of
those remote xlocks.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago MDS: Server takes auth_pins for xlocks on imported inodes.
Greg Farnum [Fri, 25 Mar 2011 23:41:49 +0000 (16:41 -0700)]
MDS: Server takes auth_pins for xlocks on imported inodes.

Should fix #934.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago objecter: resub ops on full->nonfull transition
Sage Weil [Mon, 18 Apr 2011 17:15:07 +0000 (10:15 -0700)]
objecter: resub ops on full->nonfull transition

This was broken a while ago during the last refactor.  Whoops!  Clean it
up to be smarter (and work at all).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago osd: show "full" or "nearfull" in osdmap summary line
Sage Weil [Mon, 18 Apr 2011 16:57:55 +0000 (09:57 -0700)]
osd: show "full" or "nearfull" in osdmap summary line

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago Merge remote branch 'origin/stable'
Sage Weil [Mon, 18 Apr 2011 16:58:15 +0000 (09:58 -0700)]
Merge remote branch 'origin/stable'

Conflicts:
src/osdc/Journaler.cc

14 years ago Merge branch 'rgw_uid'
Yehuda Sadeh [Mon, 18 Apr 2011 16:56:08 +0000 (09:56 -0700)]
Merge branch 'rgw_uid'

14 years ago rgw: remove get_user_info() and clean up
Yehuda Sadeh [Mon, 18 Apr 2011 15:56:52 +0000 (08:56 -0700)]
rgw: remove get_user_info() and clean up

rename all the get_uid_by_* to get_user_info_by_*, remove get_user_info()
and call the appropriate function instead (either the by_uid or by_access_key).

14 years ago rgw: store user info on all indexes in the same format
Yehuda Sadeh [Mon, 18 Apr 2011 15:32:09 +0000 (08:32 -0700)]
rgw: store user info on all indexes in the same format

this breaks backward compatibility, we'll have to deal with that
later.

14 years ago rgw_admin: can lookup user by access key
Yehuda Sadeh [Mon, 18 Apr 2011 15:15:11 +0000 (08:15 -0700)]
rgw_admin: can lookup user by access key

14 years ago mount.ceph: behave when CONFIG_KEYS is not compiled in
Sage Weil [Mon, 18 Apr 2011 04:58:27 +0000 (21:58 -0700)]
mount.ceph: behave when CONFIG_KEYS is not compiled in

In that case we get ENOSYS.  This also implies an old version of the client
and that we should fall back.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago radosgw_admin: Update manpage to new syntax
Wido den Hollander [Mon, 18 Apr 2011 00:40:46 +0000 (17:40 -0700)]
radosgw_admin: Update manpage to new syntax

Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Colin McCabe <cmccabe@alumni.cmu.edu>
14 years ago MDS: Fix Locker::handle_reqrdlock for xlocked locks.
Greg Farnum [Wed, 13 Apr 2011 23:02:51 +0000 (16:02 -0700)]
MDS: Fix Locker::handle_reqrdlock for xlocked locks.

We previously dropped the request but that was inappropriate for that
one case because the replica has no way to trigger a resend.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago mds: Always _open_parents when opening a new snaprealm
Sage Weil [Wed, 13 Apr 2011 20:57:49 +0000 (13:57 -0700)]
mds: Always _open_parents when opening a new snaprealm

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago mds: don't run all of try_subtree_merge on a rename across MDSes.
Greg Farnum [Mon, 11 Apr 2011 23:57:50 +0000 (16:57 -0700)]
mds: don't run all of try_subtree_merge on a rename across MDSes.

Previously we'd try and do the whole thing, which meant that
the replica got a lock twiddle before it had finished the export.
That broke things spectacularly, since we weren't respecting our
invariants about who gets remote locking messages.
Now we pass through a flag and respect our invariants.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago mds: adjust LocalLock can_xlock_local().
Greg Farnum [Thu, 7 Apr 2011 00:03:12 +0000 (17:03 -0700)]
mds: adjust LocalLock can_xlock_local().

I don't remember why we needed can_xlock_local() to begin with, but
I can tell that adding this get_xlock_by() check won't stop anything
working that was ever working to begin with (really it's still not
strong enough a check).

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago mds: Extend use of find_ino_peers.
Greg Farnum [Thu, 7 Apr 2011 00:01:53 +0000 (17:01 -0700)]
mds: Extend use of find_ino_peers.

Missed a few places that need it.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago mds: Make use of find_ino_peers
Greg Farnum [Fri, 1 Apr 2011 00:25:52 +0000 (17:25 -0700)]
mds: Make use of find_ino_peers

Previously we just had to give up on ESTALE. Now
we can attempt to recover!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago random commenting
Greg Farnum [Thu, 24 Mar 2011 21:26:46 +0000 (14:26 -0700)]
random commenting

14 years ago MDS: Remove inappropriate assert from _logged_slave_rename.
Greg Farnum [Thu, 24 Mar 2011 21:11:06 +0000 (14:11 -0700)]
MDS: Remove inappropriate assert from _logged_slave_rename.

The slave also can hold some auth pins from locks which the
master has asked it to grab. It's possible we can intelligently
determine how many, but for now just drop the assert.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago MDS: Server::handle_slave_rename_prep now accounts for dir snaplock.
Greg Farnum [Thu, 24 Mar 2011 19:23:38 +0000 (12:23 -0700)]
MDS: Server::handle_slave_rename_prep now accounts for dir snaplock.

Previously it ignored the auth pin required to hold snap xlock, which
is currently always held for a rename on a dir. This would lead to
a permanent hang on the request. Now we account for it!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago MDS: Don't move inode to snaprealms if not primary inode.
Greg Farnum [Wed, 23 Mar 2011 18:50:43 +0000 (11:50 -0700)]
MDS: Don't move inode to snaprealms if not primary inode.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago MDCache: update assert to account for being a slave.
Greg Farnum [Wed, 23 Mar 2011 17:41:36 +0000 (10:41 -0700)]
MDCache: update assert to account for being a slave.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago Server: push_projected_linkage in _link_remote
Greg Farnum [Tue, 22 Mar 2011 22:27:21 +0000 (15:27 -0700)]
Server: push_projected_linkage in _link_remote

_link_remote_finish will pop the linkage if inc==true, so we'd
better push it to match!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago Server: ensure slave mdses have full dest tree
Greg Farnum [Tue, 22 Mar 2011 21:23:33 +0000 (14:23 -0700)]
Server: ensure slave mdses have full dest tree

We were already taking rdlocks on the source tree, to make
sure that each slave MDS could traverse to the source dentry. Now,
if there are slave MDSes, we take rdlocks on each destination
ancestor to make sure the slaves can also traverse there.
This fixes an fsstress bug.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago rgw: basic support for separate uid and access key
Yehuda Sadeh [Sat, 16 Apr 2011 00:20:44 +0000 (17:20 -0700)]
rgw: basic support for separate uid and access key

14 years ago mds: fix null deref in debug
Sage Weil [Fri, 15 Apr 2011 23:32:45 +0000 (16:32 -0700)]
mds: fix null deref in debug

The *dir isn't always non-null (namely, during DISCOVERING state).

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: keep import/export subtree_map state in sync with journal
Sage Weil [Fri, 15 Apr 2011 22:51:50 +0000 (15:51 -0700)]
mds: keep import/export subtree_map state in sync with journal

We were being sloppy before with the ESubtreeMap vs import/export events.
Fix that by doing a few things:

 - add an ambig flag to the subtree map items, and set it for in-progress
   imports.  That means an ESubtreeMap followed by EImportFinish will do
   the right thing now.
 - adjust the dir_auth on EExport journaling (handle_export_dir_ack) so
   that our journaled subtree_map state is always in sync with what we
   see during replay.

Also document clearly what the dir_auth variations actually mean.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: fix export cancel during IMPORT_PREPPING
Sage Weil [Fri, 15 Apr 2011 20:53:54 +0000 (13:53 -0700)]
mds: fix export cancel during IMPORT_PREPPING

If we are in PREPPING, we need to drop the stickydirs() on the inodes, and
not the pins on the dirfrags.  Do this in the helper so we can keep the
call chains simple.

Also deal with the case where we get a cancel in PREPPED state.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: clean up trim_non_auth_subtree output
Sage Weil [Fri, 15 Apr 2011 17:05:50 +0000 (10:05 -0700)]
mds: clean up trim_non_auth_subtree output

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: cancel exports in PREPPING state on any failure
Sage Weil [Thu, 14 Apr 2011 01:36:33 +0000 (18:36 -0700)]
mds: cancel exports in PREPPING state on any failure

The prepping nodes may need to discover bounds from the failed node and
may hang indefinitely.  Meanwhile, we won't send out mds_resolve messages
until in-progress migrations complete.  Deadlock.

In certain cases the importing node can manufacture the replica.  If it
doesn't realize that right off, though, it will get hung up trying to
discover from the wrong node, get referred to the failed node, and block
waiting for recovery.  The replica forging is a bit suspect anyway, so
let's avoid the whole thing if we can!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: use helpers for import_reverse
Sage Weil [Thu, 14 Apr 2011 01:34:55 +0000 (18:34 -0700)]
mds: use helpers for import_reverse

Use helpers for common code shared between handle_export_cancel and
handle_mds_failure_or_stop.

Also include handling for IMPORT_PREPPING state, even though we don't use
it yet.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: don't skip inodes in journal that may be trimmed during replay
Sage Weil [Fri, 15 Apr 2011 17:02:46 +0000 (10:02 -0700)]
mds: don't skip inodes in journal that may be trimmed during replay

During replay we trim non-auth inodes on EExport or EImportFinish abort.
Subtree trimming may be delayed, too.

Skip parents if the diri is in the same blob, or if it is journaled in the
current segment *and* it is in a subtree that is unambiguously auth.  We can't
easily be more precise than that because the actual event we care about on
replay is EExport, but the migrator doesn't twiddle auth bits to false until
later.

Also, reset last_journaled on import.

This fixes replay bugs like

2011-04-13 18:15:18.064029 7f65588ef710 mds1.journal EImportStart.replay 10000000015 bounds []
2011-04-13 18:15:18.064034 7f65588ef710 mds1.journal EMetaBlob.replay 2 dirlumps by unknown0
2011-04-13 18:15:18.064040 7f65588ef710 mds1.journal EMetaBlob.replay dir 10000000010
2011-04-13 18:15:18.064046 7f65588ef710 mds1.journal EMetaBlob.replay missing dir ino  10000000010
mds/journal.cc: In function 'void EMetaBlob::replay(MDS*, LogSegment*)', in thread '0x7f65588ef710'
mds/journal.cc: 407: FAILED assert(0)
 ceph version 0.25-683-g653580a (commit:653580ae84c471c34872f14a0308c78af71f7243)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x53) [0xa53d26]
 2: (EMetaBlob::replay(MDS*, LogSegment*)+0x7eb) [0x7a737d]

Fixes: #994
Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago config: warn about old-style conf section names
Colin Patrick McCabe [Fri, 15 Apr 2011 22:21:36 +0000 (15:21 -0700)]
config: warn about old-style conf section names

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago man: Update cmds documentation.
Greg Farnum [Fri, 15 Apr 2011 22:54:12 +0000 (15:54 -0700)]
man: Update cmds documentation.

You always need to specify a rank if you do journal-check.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
14 years ago mkcephfs: fix check for highest osd
Sage Weil [Fri, 15 Apr 2011 22:11:25 +0000 (15:11 -0700)]
mkcephfs: fix check for highest osd

This breaks on osd0.  I was doing something stupid with sed but I can't
figure out what right now, but osdN support is going away anyway.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago vstart.sh: use new-style section names in config
Colin Patrick McCabe [Fri, 15 Apr 2011 21:45:56 +0000 (14:45 -0700)]
vstart.sh: use new-style section names in config

Use new-style section names in vstart.sh.
Also update sample.ceph.conf.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago mon: don't check for old-style monitor section name
Colin Patrick McCabe [Fri, 15 Apr 2011 21:40:49 +0000 (14:40 -0700)]
mon: don't check for old-style monitor section name

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago cconf: update man page
Colin Patrick McCabe [Fri, 15 Apr 2011 21:34:39 +0000 (14:34 -0700)]
cconf: update man page

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago mkcephfs, init-ceph: tolerate complete lack of a type
Sage Weil [Fri, 15 Apr 2011 21:03:20 +0000 (14:03 -0700)]
mkcephfs, init-ceph: tolerate complete lack of a type

We were bailing out of mkcephfs with a config with no mds's defined
(because we set -e and grep returns an error here).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago objecter: log when we defer a write because of FULL osdmap flag
Sage Weil [Fri, 15 Apr 2011 21:03:40 +0000 (14:03 -0700)]
objecter: log when we defer a write because of FULL osdmap flag

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago mkcephfs, init-ceph: tolerate complete lack of a type
Sage Weil [Fri, 15 Apr 2011 21:03:20 +0000 (14:03 -0700)]
mkcephfs, init-ceph: tolerate complete lack of a type

We were bailing out of mkcephfs with a config with no mds's defined
(because we set -e and grep returns an error here).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
14 years ago config: do not accept old-style section names
Colin Patrick McCabe [Fri, 15 Apr 2011 20:59:57 +0000 (13:59 -0700)]
config: do not accept old-style section names

Stop accepting old-style section names of the form $type$id.  Instead,
we want section names of the form $type.$id.  So [osd0] will no longer
be a valid section name; instead, use [osd.0].

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
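The naming rule in the section-name commit above can be sketched as a small check and converter. This is an illustration only: the set of daemon types and the assumption that old-style ids are numeric are mine, not taken from the patch.

```python
import re

# Assumed daemon types; the commit only gives [osd0] -> [osd.0] as an example.
_TYPES = ("mon", "mds", "osd")
_OLD_STYLE = re.compile(r"^(%s)(\d+)$" % "|".join(_TYPES))
_NEW_STYLE = re.compile(r"^(%s)\.(\S+)$" % "|".join(_TYPES))


def is_old_style(name):
    """True for $type$id names such as 'osd0'."""
    return _OLD_STYLE.match(name) is not None


def is_new_style(name):
    """True for $type.$id names such as 'osd.0'."""
    return _NEW_STYLE.match(name) is not None


def to_new_style(name):
    """Rewrite 'osd0' as 'osd.0'; leave every other name unchanged."""
    m = _OLD_STYLE.match(name)
    return "%s.%s" % (m.group(1), m.group(2)) if m else name


if __name__ == "__main__":
    for section in ("osd0", "osd.0", "mds", "global"):
        print(section, "->", to_new_style(section))
```

A converter like this could be run over an existing ceph.conf before upgrading, since [osd0] is rejected after this change.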
14 years ago cconf: fix usage; clean up some code
Colin Patrick McCabe [Fri, 15 Apr 2011 20:49:55 +0000 (13:49 -0700)]
cconf: fix usage; clean up some code

cconf: fix obsolete usage message. Add --list-all-sections flag.
Use new ceph_argparse stuff. Update tests.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago config: normalize key names, cleanup
Colin Patrick McCabe [Fri, 15 Apr 2011 19:03:12 +0000 (12:03 -0700)]
config: normalize key names, cleanup

Normalize key names in md_config_t::get_val and md_config_t::set_val

Remove unused fields from struct config_option.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago rgw: fix other err related issues
Yehuda Sadeh [Fri, 15 Apr 2011 18:15:11 +0000 (11:15 -0700)]
rgw: fix other err related issues

also remove the now redundant formatter->flush()

14 years ago rgw: adjustments to error handling
Yehuda Sadeh [Fri, 15 Apr 2011 17:52:14 +0000 (10:52 -0700)]
rgw: adjustments to error handling

fixing mixup between s3 error code and s3 error message

14 years ago libceph: implement ceph_conf_set and ceph_conf_get
Colin Patrick McCabe [Fri, 15 Apr 2011 17:37:05 +0000 (10:37 -0700)]
libceph: implement ceph_conf_set and ceph_conf_get

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago mds: init metablob MDLog* for EImportStart
Sage Weil [Thu, 14 Apr 2011 01:59:14 +0000 (18:59 -0700)]
mds: init metablob MDLog* for EImportStart

This will initialize metablob.my_offset, which makes the parent inode
journaling logic work properly.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago init-ceph: no log_dir default
Sage Weil [Thu, 14 Apr 2011 01:06:29 +0000 (18:06 -0700)]
init-ceph: no log_dir default

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: fix journal offset types
Sage Weil [Thu, 14 Apr 2011 02:01:15 +0000 (19:01 -0700)]
mds: fix journal offset types

Always uint64_t!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago mds: show migration state names on cancel
Sage Weil [Thu, 14 Apr 2011 02:08:05 +0000 (19:08 -0700)]
mds: show migration state names on cancel

Signed-off-by: Sage Weil <sage@newdream.net>
14 years ago rgw: rework error handling a bit
Colin Patrick McCabe [Thu, 14 Apr 2011 23:14:48 +0000 (16:14 -0700)]
rgw: rework error handling a bit

Rados Gateway: get rid of RGWOp::err. We already have req_state::err and
that represents the same thing.

Standardize nomenclature for errors. 'errno' is our internal
representation of the error. 'code' is what is returned by S3.
'message' is the message at the end. Improve rgw_err.

dump_errno shouldn't modify req_state, but just dump the error.
A new function set_req_state_err sets the error based on an 'errno'.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago config: add test for override ordering, comment
Colin Patrick McCabe [Thu, 14 Apr 2011 20:45:13 +0000 (13:45 -0700)]
config: add test for override ordering, comment

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago config: de-globalize reading config file
Colin Patrick McCabe [Thu, 14 Apr 2011 22:26:20 +0000 (15:26 -0700)]
config: de-globalize reading config file

Reading a config file into any md_config_t structure except g_conf used
to be impossible. This is because the config_option code used to
contain explicit references to g_conf. Those have been removed, so now
any md_config_t should be able to read a configuration file.

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
14 years ago radosgw_admin: fix make check
Colin Patrick McCabe [Thu, 14 Apr 2011 22:18:31 +0000 (15:18 -0700)]
radosgw_admin: fix make check

Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>