git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
13 years agopaxos: store stashed state _and_ incrementals
Sage Weil [Thu, 17 Nov 2011 22:10:34 +0000 (14:10 -0800)]
paxos: store stashed state _and_ incrementals

Paxos::share_state() may share a stashed state and incrementals that
follow; we need to store the same.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: elector: always start election via monitor
Sage Weil [Fri, 11 Nov 2011 05:58:53 +0000 (21:58 -0800)]
mon: elector: always start election via monitor

Don't go from active -> electing without passing (monitor) go.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agocommon: libraries should not log to stdout/stderr
Sage Weil [Thu, 17 Nov 2011 20:07:34 +0000 (12:07 -0800)]
common: libraries should not log to stdout/stderr

Certainly not by default.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoobjecter: set skipped_map if we skip a map
Sage Weil [Thu, 17 Nov 2011 19:56:37 +0000 (11:56 -0800)]
objecter: set skipped_map if we skip a map

This ensures that we resend _all_ requests, since we aren't sure which
may have mapped to a different primary and then back.  This was missed in
the original implementation in 4fe9cca5dd63a1924be2b5cb18f542fb4b97a768.

Signed-off-by: Sage Weil <sage@newdream.net>
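A minimal sketch of why a skipped map forces a full resend. This is a toy model, not Objecter code: `requests_to_resend`, the dict layouts, and the epoch numbers are all hypothetical. When the client jumps from epoch 10 to epoch 12, comparing the two endpoint maps cannot reveal that a pg briefly moved to another primary in epoch 11.

```python
def requests_to_resend(requests, old_primary, new_primary, skipped_map):
    """Return the ids of in-flight requests that must be resent.

    requests maps request id -> pg; old_primary and new_primary map
    pg -> primary osd for the last map we saw and the map we just got.
    """
    if skipped_map:
        # We never saw the intermediate epochs, so a pg may have moved
        # to another primary and back. Comparing endpoints proves nothing.
        return sorted(requests)
    return sorted(rid for rid, pg in requests.items()
                  if old_primary[pg] != new_primary[pg])


requests = {1: "pg.a", 2: "pg.b"}
primary_e10 = {"pg.a": 0, "pg.b": 3}
primary_e12 = {"pg.a": 0, "pg.b": 3}   # epoch 11 (pg.a on osd 1) was skipped

without_flag = requests_to_resend(requests, primary_e10, primary_e12, False)
with_flag = requests_to_resend(requests, primary_e10, primary_e12, True)
```

Without the flag nothing is resent, even though the request for `pg.a` may have been dropped by the interim primary; with it, every request goes out again.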
13 years agoobjecter: add is_locked() asserts
Sage Weil [Thu, 17 Nov 2011 19:39:55 +0000 (11:39 -0800)]
objecter: add is_locked() asserts

Sanity check.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoobjecter: send slow osd MPing via Connection*
Sage Weil [Thu, 17 Nov 2011 19:39:36 +0000 (11:39 -0800)]
objecter: send slow osd MPing via Connection*

This may address #1732 indirectly because we have a Connection* reference
here.  However, it's still not clear how we ended up with an OSDSession*
for an osd that doesn't exist.  :/

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: add pending_ops assert
Sage Weil [Wed, 16 Nov 2011 21:10:58 +0000 (13:10 -0800)]
osd: add pending_ops assert

Just a sanity check, hopefully helping us track down #1727.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: renamed get_latest* -> get_stashed*
Sage Weil [Wed, 16 Nov 2011 19:01:59 +0000 (11:01 -0800)]
mon: renamed get_latest* -> get_stashed*

This makes e.g. get_latest_version() vs get_last_committed() less
confusing.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: fix ver tracking for auth database
Sage Weil [Wed, 16 Nov 2011 18:57:23 +0000 (10:57 -0800)]
mon: fix ver tracking for auth database

Local variable keys_ver needs to be updated when we slurp up latest stashed
version.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: always load stashed version when version doesn't match
Sage Weil [Wed, 16 Nov 2011 18:54:59 +0000 (10:54 -0800)]
mon: always load stashed version when version doesn't match

The slurp process can happen after the monitor has started and has some
in-memory version of the state, and that process may wipe out old
incrementals and change the stashed version.  That means that in
update_from_paxos, we need to pull the stashed version if it doesn't
match what we currently have or else we may not have the incrementals we
need to get up to date.

This simplifies and cleans up that code a bit so it is not specific to
monitor startup.

Signed-off-by: Sage Weil <sage@newdream.net>
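The situation described above can be sketched with a toy model. Everything here is an assumption made for illustration: the state is a single integer, incrementals are additive deltas, and `apply_incrementals`, `update_from_paxos`, and the `store` layout are hypothetical stand-ins rather than the monitor's real interfaces.

```python
def apply_incrementals(state, store):
    """Roll state forward one version at a time up to last_committed."""
    while state["version"] < store["last_committed"]:
        nxt = state["version"] + 1
        delta = store["incrementals"][nxt]   # KeyError if it was trimmed
        state = {"version": nxt, "value": state["value"] + delta}
    return state


def update_from_paxos(state, store):
    """Reload the stashed full state whenever our version differs from it."""
    stashed_ver, stashed_value = store["stashed"]
    if state["version"] != stashed_ver:
        state = {"version": stashed_ver, "value": stashed_value}
    return apply_incrementals(state, store)


# After a slurp: old incrementals are gone and the stash moved to v10.
store = {
    "stashed": (10, 100),
    "incrementals": {11: 1, 12: 2},
    "last_committed": 12,
}
in_memory = {"version": 3, "value": 7}   # state held since startup

try:
    apply_incrementals(in_memory, store)
    missing_incremental = False
except KeyError:
    missing_incremental = True

caught_up = update_from_paxos(in_memory, store)
```

Rolling forward from the in-memory version fails because incremental 4 no longer exists; starting from the stashed version reaches the last committed version.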
13 years agorgw: don't log entries with bad utf8
Yehuda Sadeh [Tue, 15 Nov 2011 01:02:08 +0000 (17:02 -0800)]
rgw: don't log entries with bad utf8

13 years agorgw: adjust error code in swift copy failures
Yehuda Sadeh [Mon, 14 Nov 2011 22:39:18 +0000 (14:39 -0800)]
rgw: adjust error code in swift copy failures

13 years agorgw: fix swift responses encoding
Yehuda Sadeh [Mon, 14 Nov 2011 21:55:09 +0000 (13:55 -0800)]
rgw: fix swift responses encoding

13 years agorgw: Fix some merge problems uncovered by gcc warnings:
Josh Pieper [Fri, 11 Nov 2011 13:19:55 +0000 (08:19 -0500)]
rgw: Fix some merge problems uncovered by gcc warnings:

 * a refactor in e2100bce left the mod_ptr and unmod_ptr members set
   incorrectly in RGWCopyObj::init_common
 * a fix in 6752babd aggregated error returns, but then failed to do
   anything with them

Signed-off-by: Josh Pieper <jjp@pobox.com>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoResolve gcc warnings.
Josh Pieper [Fri, 11 Nov 2011 13:19:02 +0000 (08:19 -0500)]
Resolve gcc warnings.

These should have no functional changes:
 * Check errors from functions that currently cannot return any
 * Initialize variables that gcc can't determine will be initialized
   in a following function call
 * Remove unused variables

Signed-off-by: Josh Pieper <jjp@pobox.com>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: remove dead osd_max_opq code
Sage Weil [Mon, 14 Nov 2011 20:15:14 +0000 (12:15 -0800)]
osd: remove dead osd_max_opq code

This is no longer used as of a while ago!

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoworkunits: rados python workunit should be executable
Josh Durgin [Mon, 14 Nov 2011 16:18:17 +0000 (08:18 -0800)]
workunits: rados python workunit should be executable

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agocrush: send debug output to dout, not stdout/err
Sage Weil [Sun, 13 Nov 2011 22:18:19 +0000 (14:18 -0800)]
crush: send debug output to dout, not stdout/err

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agotest/run_cmd: use mkstemp instead of mkstemps
Sage Weil [Sun, 13 Nov 2011 22:16:52 +0000 (14:16 -0800)]
test/run_cmd: use mkstemp instead of mkstemps

my box didn't have mkstemps

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoceph-authtool: fix clitests
Sage Weil [Sun, 13 Nov 2011 22:07:01 +0000 (14:07 -0800)]
ceph-authtool: fix clitests

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agotest_str_list: make sure ' ' and ', ' separators work for str lists
Sage Weil [Sat, 12 Nov 2011 23:17:29 +0000 (15:17 -0800)]
test_str_list: make sure ' ' and ', ' separators work for str lists

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
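For illustration only, the kind of check this test commit describes might look like the following. `split_str_list` is a hypothetical helper, and the separator set (spaces and commas) is an assumption; the real Ceph helper may accept other delimiters.

```python
import re


def split_str_list(s):
    """Split on runs of spaces and/or commas, dropping empty tokens."""
    return [tok for tok in re.split(r"[ ,]+", s) if tok]


space_separated = split_str_list("foo bar")
comma_space_separated = split_str_list("foo, bar")
```

Both inputs are expected to yield the same two-element list.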
13 years agoceph-authtool: make error msg more helpful
Sage Weil [Sat, 12 Nov 2011 22:55:39 +0000 (14:55 -0800)]
ceph-authtool: make error msg more helpful

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agokeyring: don't print auid if it is the default
Sage Weil [Sat, 12 Nov 2011 22:55:28 +0000 (14:55 -0800)]
keyring: don't print auid if it is the default

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: implement 'fsid' command
Sage Weil [Sat, 12 Nov 2011 22:55:06 +0000 (14:55 -0800)]
mon: implement 'fsid' command

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge branch 'stable'
Sage Weil [Sat, 12 Nov 2011 22:19:26 +0000 (14:19 -0800)]
Merge branch 'stable'

13 years agomon: fix 'osd crush add ..' weight
Sage Weil [Sat, 12 Nov 2011 22:04:15 +0000 (14:04 -0800)]
mon: fix 'osd crush add ..' weight

This was changed to floating point in commit 3f67893.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosdmap: build_simple with normal osd/host/rack/pool hierarchy
Sage Weil [Sat, 12 Nov 2011 22:05:15 +0000 (14:05 -0800)]
osdmap: build_simple with normal osd/host/rack/pool hierarchy

This will be useful in the general case where the cluster is created with
an empty map and useful crush hierarchy.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: fix 'osd crush add ..' weight
Sage Weil [Sat, 12 Nov 2011 22:04:15 +0000 (14:04 -0800)]
mon: fix 'osd crush add ..' weight

This was changed to floating point in commit 3f67893.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agovstart.sh: don't generate initial osdmap explicitly
Sage Weil [Sat, 12 Nov 2011 21:42:38 +0000 (13:42 -0800)]
vstart.sh: don't generate initial osdmap explicitly

This is simpler and exercises the monitor's ability to start with a generic
osdmap and build it out as new osds are added to the cluster.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: make initial osdmap optional
Sage Weil [Sat, 12 Nov 2011 21:41:59 +0000 (13:41 -0800)]
mon: make initial osdmap optional

If an initial osdmap is not provided, we generate an empty one.  The user
can add osds on their own after that.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosdmap: build_simple: create reasonable pools when numosd==0
Sage Weil [Sat, 12 Nov 2011 21:41:05 +0000 (13:41 -0800)]
osdmap: build_simple: create reasonable pools when numosd==0

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: add '--fsid foo' arg for setting generated monmap fsid
Sage Weil [Sat, 12 Nov 2011 21:16:30 +0000 (13:16 -0800)]
mon: add '--fsid foo' arg for setting generated monmap fsid

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: take '--fsid foo' arg with --mkfs
Sage Weil [Sat, 12 Nov 2011 05:02:23 +0000 (21:02 -0800)]
mon: take '--fsid foo' arg with --mkfs

This will set the seed monmap's fsid.  This is useful if the monmap is
dynamically generated (e.g., based on ceph.conf or --mon-host list).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: fix warnings
Sage Weil [Sat, 12 Nov 2011 05:03:09 +0000 (21:03 -0800)]
osd: fix warnings

osd/ReplicatedPG.cc: In member function 'virtual void ReplicatedPG::remove_watchers_and_notifies()':
osd/ReplicatedPG.cc:1167: warning: suggest a space before ';' or explicit braces around empty body in 'for' statement
osd/ReplicatedPG.cc:1176: warning: suggest a space before ';' or explicit braces around empty body in 'for' statement

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomonmaptool: fix clitests
Sage Weil [Sat, 12 Nov 2011 04:52:28 +0000 (20:52 -0800)]
monmaptool: fix clitests

Initial map is epoch 0.  Modifications still bump epoch by one.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agopaxos: discard waiting_for_active events on reset
Sage Weil [Fri, 11 Nov 2011 23:38:24 +0000 (15:38 -0800)]
paxos: discard waiting_for_active events on reset

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomonclient: use blank fsid (instead of epoch==0) for monmap checks
Sage Weil [Fri, 11 Nov 2011 23:25:59 +0000 (15:25 -0800)]
monclient: use blank fsid (instead of epoch==0) for monmap checks

We can safely mkfs with an epoch=0 monmap as long as the fsid is set.  And
that is what commit f31825cee5300c708800a01a08201eef2bc03c0c changed.

Instead, use a zeroed fsid to tell if the monmap is valid/usable.

Signed-off-by: Sage Weil <sage@newdream.net>
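A small sketch of the validity check described above, assuming the monmap is represented as a plain dict with `epoch` and `fsid` fields. `BLANK_FSID` and `monmap_is_usable` are hypothetical names, not MonClient code.

```python
import uuid

BLANK_FSID = uuid.UUID(int=0)   # all-zero fsid means "no real map yet"


def monmap_is_usable(monmap):
    """A monmap counts as valid when its fsid is set, whatever its epoch."""
    return monmap["fsid"] != BLANK_FSID


seed_map = {"epoch": 0, "fsid": uuid.uuid4()}    # fresh mkfs seed, epoch 0
placeholder = {"epoch": 0, "fsid": BLANK_FSID}   # nothing learned yet

seed_ok = monmap_is_usable(seed_map)
placeholder_ok = monmap_is_usable(placeholder)
```

Keying the check on the fsid lets an epoch-0 seed map pass while an unset map is still rejected.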
13 years agouse libuuid for fsid
Sage Weil [Sat, 12 Nov 2011 00:38:35 +0000 (16:38 -0800)]
use libuuid for fsid

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agocrush: grammar: allow '.' in name token
Sage Weil [Fri, 11 Nov 2011 22:59:38 +0000 (14:59 -0800)]
crush: grammar: allow '.' in name token

These are now in the generated crush maps, so it seems appropriate to
recompile them :).

Reported-by: Martin Mailand <martin@tuxadero.com>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: fix seed monmap removal
Sage Weil [Fri, 11 Nov 2011 22:54:41 +0000 (14:54 -0800)]
mon: fix seed monmap removal

Remove if we previously had no latest, not based on which map we now have.
It's possible we join when monmap epoch is something much larger than 1!

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: allow monitor to automagically join cluster
Sage Weil [Fri, 11 Nov 2011 22:52:14 +0000 (14:52 -0800)]
mon: allow monitor to automagically join cluster

If a monitor starts up with the correct fsid and auth keys, it will now
add itself to the monmap (and subsequently try to join the quorum) if it
is not already in the monmap.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: pass monclient::init errors up the stack
Sage Weil [Fri, 11 Nov 2011 20:52:24 +0000 (12:52 -0800)]
osd: pass monclient::init errors up the stack

Fixes crash like

 ceph version 0.38-149-gbf254de (commit:bf254de5cf8a17ce9467d166d87f3ab93170ae13)
 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x91d97b]
 2: ./ceph-osd() [0xa05baa]
 3: (()+0xef60) [0x7fb54c87ef60]
 4: (std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::size() const+0xc) [0x8a4bc6]
 5: (std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> >::size() const+0x18) [0x8a1d32]
 6: (void encode<unsigned int>(std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&, ceph::buffer::list&)+0x1c) [0x8a0311]
 7: (MonClient::_reopen_session()+0x2c5) [0x89a425]
 8: (MonClient::authenticate(double)+0x24f) [0x898da7]
 9: (OSD::init()+0x112b) [0x807ca1]
 10: (main()+0x2c09) [0x73e406]
 11: (__libc_start_main()+0xfd) [0x7fb54b04ec4d]
 12: ./ceph-osd() [0x73b499]

due to auth_supported being NULL.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: verify fsid during probe and election
Sage Weil [Fri, 11 Nov 2011 20:37:07 +0000 (12:37 -0800)]
mon: verify fsid during probe and election

This will keep mismatched fsids out of the same quorum.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: tolerate won election while active
Sage Weil [Fri, 11 Nov 2011 20:22:37 +0000 (12:22 -0800)]
mon: tolerate won election while active

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: clean up logic a bit
Sage Weil [Fri, 11 Nov 2011 20:22:22 +0000 (12:22 -0800)]
mon: clean up logic a bit

More explicit.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: only re bootstrap if monmap actually changes
Sage Weil [Fri, 11 Nov 2011 20:22:09 +0000 (12:22 -0800)]
mon: only re bootstrap if monmap actually changes

If we go thru here just to update latest, that's fine; no need to restart
the bootstrap process.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agopaxos: fix off-by-one in share_state
Sage Weil [Fri, 11 Nov 2011 20:15:16 +0000 (12:15 -0800)]
paxos: fix off-by-one in share_state

We hit this on adding a new monitor to an existing cluster.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: fix monmap update
Sage Weil [Fri, 11 Nov 2011 20:05:01 +0000 (12:05 -0800)]
mon: fix monmap update

It's on the stack; update in place.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: properly process monmaps even when i have the latest
Sage Weil [Fri, 11 Nov 2011 20:02:52 +0000 (12:02 -0800)]
mon: properly process monmaps even when i have the latest

We may get the latest monmap when we are doing our probing, but we still
need to process it in update_from_paxos().  Consider get_latest_version()
in addition to the active map.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: fix up update_from_paxos() methods
Sage Weil [Fri, 11 Nov 2011 19:55:34 +0000 (11:55 -0800)]
mon: fix up update_from_paxos() methods

Make sure they behave when the initial state is learned from paxos.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomonmaptool: new maps get epoch 0
Sage Weil [Fri, 11 Nov 2011 19:40:20 +0000 (11:40 -0800)]
monmaptool: new maps get epoch 0

Just for consistency's sake.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: clean up mkfs seed data
Sage Weil [Fri, 11 Nov 2011 19:40:02 +0000 (11:40 -0800)]
mon: clean up mkfs seed data

And make sure the monmap/latest gets written properly.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: remove empty monstore dirs
Sage Weil [Fri, 11 Nov 2011 19:10:17 +0000 (11:10 -0800)]
mon: remove empty monstore dirs

This is sloppy, but it works well enough since we mkdir dirs as needed
too.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: create initial states after quorum is formed
Sage Weil [Fri, 11 Nov 2011 19:01:50 +0000 (11:01 -0800)]
mon: create initial states after quorum is formed

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: stage mkfs seed info in mkfs/ dir
Sage Weil [Fri, 11 Nov 2011 18:45:27 +0000 (10:45 -0800)]
mon: stage mkfs seed info in mkfs/ dir

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: eliminate PaxosService::init()
Sage Weil [Fri, 11 Nov 2011 18:34:42 +0000 (10:34 -0800)]
mon: eliminate PaxosService::init()

update_from_paxos() is sufficient

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: include monmap dump in mon_status and quorum_status
Sage Weil [Fri, 11 Nov 2011 18:19:33 +0000 (10:19 -0800)]
mon: include monmap dump in mon_status and quorum_status

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: pull initial monmap from monmap/latest OR mkfs/monmap
Sage Weil [Fri, 11 Nov 2011 18:15:23 +0000 (10:15 -0800)]
mon: pull initial monmap from monmap/latest OR mkfs/monmap

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: take explicit initial monmap -or- generate one via MonClient
Sage Weil [Fri, 11 Nov 2011 18:05:36 +0000 (10:05 -0800)]
mon: take explicit initial monmap -or- generate one via MonClient

This will simplify bootstrapping a cluster via e.g. mon_host.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agotest_filestore_idempotent: detect commit cycles due to non-idempotent ops
Sage Weil [Thu, 10 Nov 2011 21:35:57 +0000 (13:35 -0800)]
test_filestore_idempotent: detect commit cycles due to non-idempotent ops

If we do a non-idempotent op and it does a commit itself, we don't see
fs->is_committed() true ever.  Also count full commit cycles, and kill
ourselves after several of those have gone by.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilejournal: fix replay of non-idempotent ops
Sage Weil [Thu, 10 Nov 2011 21:18:51 +0000 (13:18 -0800)]
filejournal: fix replay of non-idempotent ops

- start sync thread prior to replay, so that we can commit as we replay
  operations
- keep applied_seq accurate
- pass seq (not old op_seq) to do_transactions
- carry open_ops ref so that commit blocks until we have finished applying
  the full transaction

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agotest_filestore_idempotent: transactions are individually idempotent
Sage Weil [Thu, 10 Nov 2011 21:17:10 +0000 (13:17 -0800)]
test_filestore_idempotent: transactions are individually idempotent

Make individual transactions idempotent, but their interactions
non-idempotent.  I.e. A A A A is okay, but A B A is not.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilestore: make trigger_commit() wake up sync; adjust locking
Sage Weil [Thu, 10 Nov 2011 19:31:22 +0000 (11:31 -0800)]
filestore: make trigger_commit() wake up sync; adjust locking

We need to wake up the sync thread (duh).

Also, we need to obey the FileJournal::lock -> journal_lock locking
order.

Also, lockdep is broken. :(

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilestore: document the btrfs_* fields
Sage Weil [Thu, 10 Nov 2011 18:51:07 +0000 (10:51 -0800)]
filestore: document the btrfs_* fields

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilestore: sync after non-idempotent operations
Sage Weil [Thu, 10 Nov 2011 18:49:32 +0000 (10:49 -0800)]
filestore: sync after non-idempotent operations

This is a big hammer to fix journal replay on non-btrfs fs backends (extN,
xfs, whatever).  The problem is that it is not safe to replay some journal
operations more than once, notably things like CLONE whose source data
may be changed by subsequent operations.

The simple fix is to initiate a full commit after any non-idempotent
operations prior to any subsequent operation within the same Sequencer.
This is done by calling trigger_commit() in _do_transactions(), which means
any potentially dependent operation that follows will get blocked because
a commit is about to start.

I made trigger_commit() a bit more robust for callers who are not holding
an open_ops ref: it now also succeeds if the given op_seq is already
committing.  For the current caller, that can't happen.

There are probably better performing solutions, but this one is at least
correct.

Fixes: #213
Signed-off-by: Sage Weil <sage@newdream.net>
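The replay hazard described above can be shown with a toy journal. This is a sketch under stated assumptions, not FileStore code: the store is a dict, and `apply` and the two op types are hypothetical.

```python
def apply(store, journal):
    """Apply each journaled op to the in-memory store, in order."""
    for op in journal:
        if op[0] == "write":
            _, name, data = op
            store[name] = data
        elif op[0] == "clone":
            _, src, dst = op
            store[dst] = store[src]
    return store


journal = [
    ("clone", "a", "b"),     # b should capture a's value at this point
    ("write", "a", "new"),   # a later op changes the clone source
]

store = {"a": "old"}
apply(store, journal)
after_first_pass = dict(store)

# Crash before the journal is trimmed: the same ops are replayed on the
# already-updated store, so the clone now copies the new data.
apply(store, journal)
after_replay = dict(store)

# With a commit forced right after the clone, replay starts past it.
committed = {"a": "old", "b": "old"}
apply(committed, journal[1:])
apply(committed, journal[1:])
after_safe_replay = dict(committed)
```

Replaying the write alone is harmless however often it happens; replaying the clone after the write is what corrupts `b`, which is why a commit is forced before any dependent operation can follow.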
13 years agoMerge remote branch 'gh/stable'
Sage Weil [Fri, 11 Nov 2011 04:50:31 +0000 (20:50 -0800)]
Merge remote branch 'gh/stable'

13 years agoworkunits: add workunit for running rgw and rados python tests
Josh Durgin [Fri, 11 Nov 2011 01:01:04 +0000 (17:01 -0800)]
workunits: add workunit for running rgw and rados python tests

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agorgw: remove warning
Yehuda Sadeh [Fri, 11 Nov 2011 01:10:28 +0000 (17:10 -0800)]
rgw: remove warning

13 years agotest/pybind: add test_rgw
Josh Durgin [Fri, 11 Nov 2011 00:52:01 +0000 (16:52 -0800)]
test/pybind: add test_rgw

Forgot to add this in the previous commit.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agotest/pybind: convert python rados and rgw tests to be runnable by nose
Josh Durgin [Fri, 11 Nov 2011 00:24:38 +0000 (16:24 -0800)]
test/pybind: convert python rados and rgw tests to be runnable by nose

These tests can now be run automatically more easily.

Fixes: #1653
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agorados.py: fix Snap.get_timestamp
Josh Durgin [Thu, 10 Nov 2011 23:14:20 +0000 (15:14 -0800)]
rados.py: fix Snap.get_timestamp

This now uses datetime, imports the right things, and calls the right function.

Fixes #1577
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agov0.38 v0.38
Sage Weil [Thu, 10 Nov 2011 23:07:05 +0000 (15:07 -0800)]
v0.38

13 years agocommon: return null if mc.init() unsuccessful
Samuel Just [Mon, 7 Nov 2011 23:04:02 +0000 (15:04 -0800)]
common: return null if mc.init() unsuccessful

Prevents ceph.cc from segfaulting on missing keyring.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agorbd.py: fix list when there are no images
Josh Durgin [Mon, 7 Nov 2011 17:08:00 +0000 (09:08 -0800)]
rbd.py: fix list when there are no images

It should return [], not [''].

Reported-by: Eric Chen <Eric_YH_Chen@wistron.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
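One common way to end up with `['']` is splitting an empty string on a separator. The sketch below assumes the image names arrive as a NUL-separated string; `parse_names` is a hypothetical helper and the actual change in rbd.py may be written differently.

```python
def parse_names(buf):
    """Split a NUL-separated name buffer, dropping empty entries."""
    return [name for name in buf.split("\0") if name]


naive_empty = "".split("\0")              # splitting "" yields one empty string
fixed_empty = parse_names("")
two_images = parse_names("img1\0img2\0")
```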
13 years agomon: overwrite in put_bl
Sage Weil [Thu, 10 Nov 2011 22:24:18 +0000 (14:24 -0800)]
mon: overwrite in put_bl

This fixes a situation where we accept a large value, there is some failure
and recovery, and then we commit a smaller value with the same version.

E.g.,

INFO:teuthology.task.ceph.mon.b.err:terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
INFO:teuthology.task.ceph.mon.b.err:  what():  buffer::end_of_buffer
INFO:teuthology.task.ceph.mon.b.err:*** Caught signal (Aborted) **
INFO:teuthology.task.ceph.mon.b.err: in thread 7f0a6037c700
INFO:teuthology.task.ceph.mon.b.err: ceph version 0.37-365-g5b20830 (commit:5b208302e1ad134f56933dfdbccb074e03c88be3)
INFO:teuthology.task.ceph.mon.b.err: 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x6f4d1b]
INFO:teuthology.task.ceph.mon.b.err: 2: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x7e9492]
INFO:teuthology.task.ceph.mon.b.err: 3: (()+0xfb40) [0x7f0a63bf4b40]
INFO:teuthology.task.ceph.mon.b.err: 4: (gsignal()+0x35) [0x7f0a625cdba5]
INFO:teuthology.task.ceph.mon.b.err: 5: (abort()+0x180) [0x7f0a625d16b0]
INFO:teuthology.task.ceph.mon.b.err: 6: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f0a62e716bd]
INFO:teuthology.task.ceph.mon.b.err: 7: (()+0xb9906) [0x7f0a62e6f906]
INFO:teuthology.task.ceph.mon.b.err: 8: (()+0xb9933) [0x7f0a62e6f933]
INFO:teuthology.task.ceph.mon.b.err: 9: (()+0xb9a3e) [0x7f0a62e6fa3e]
INFO:teuthology.task.ceph.mon.b.err: 10: (ceph::buffer::list::iterator::copy(unsigned int, std::string&)+0xcb) [0x7d73a7]
INFO:teuthology.task.ceph.mon.b.err: 11: (decode(std::string&, ceph::buffer::list::iterator&)+0x44) [0x5fa2e8]
INFO:teuthology.task.ceph.mon.b.err: 12: (LogEntry::decode(ceph::buffer::list::iterator&)+0xa8) [0x6ceee8]
INFO:teuthology.task.ceph.mon.b.err: 13: (LogMonitor::update_from_paxos()+0x346) [0x6cce9a]
INFO:teuthology.task.ceph.mon.b.err: 14: (PaxosService::_active()+0x13b) [0x647ab5]
INFO:teuthology.task.ceph.mon.b.err: 15: (PaxosService::C_Active::finish(int)+0x25) [0x647cb9]
INFO:teuthology.task.ceph.mon.b.err: 16: (Context::complete(int)+0x2b) [0x61a5a9]
INFO:teuthology.task.ceph.mon.b.err: 17: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x20b) [0x61a7ef]
INFO:teuthology.task.ceph.mon.b.err: 18: (Paxos::handle_last(MMonPaxos*)+0xea7) [0x63d081]
INFO:teuthology.task.ceph.mon.b.err: 19: (Paxos::dispatch(PaxosServiceMessage*)+0x29c) [0x642046]
INFO:teuthology.task.ceph.mon.b.err: 20: (Monitor::_ms_dispatch(Message*)+0xd78) [0x61636e]
INFO:teuthology.task.ceph.mon.b.err: 21: (Monitor::ms_dispatch(Message*)+0x3a) [0x61de84]
INFO:teuthology.task.ceph.mon.b.err: 22: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7c690f]
INFO:teuthology.task.ceph.mon.b.err: 23: (SimpleMessenger::dispatch_entry()+0x7c2) [0x7b0156]
INFO:teuthology.task.ceph.mon.b.err: 24: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x5fd6ac]
INFO:teuthology.task.ceph.mon.b.err: 25: (Thread::_entry_func(void*)+0x23) [0x6e9261]
INFO:teuthology.task.ceph.mon.b.err: 26: (()+0x7971) [0x7f0a63bec971]
INFO:teuthology.task.ceph.mon.b.err: 27: (clone()+0x6d) [0x7f0a6268092d]

Signed-off-by: Sage Weil <sage@newdream.net>
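One plausible reading of this failure, sketched with plain files: if a shorter value is written over a longer one without truncating, the tail of the old value survives and a later decode runs into it. This assumes a file-per-value store; `put_no_trunc` and `put_overwrite` are hypothetical helpers, not the monitor store's real `put_bl`.

```python
import os
import tempfile


def put_no_trunc(path, data):
    """Write data at offset 0 but keep whatever bytes lie beyond it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def put_overwrite(path, data):
    """Truncate first so the file holds exactly the new value."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def read_all(path):
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "value")
    put_no_trunc(path, b"large-uncommitted-value")   # accepted, then failure
    put_no_trunc(path, b"small")                     # same version, smaller
    stale = read_all(path)
    put_overwrite(path, b"small")
    clean = read_all(path)
```

In the non-truncating case the file still carries the tail of the earlier, larger value after the new one.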
13 years agoPG: mark scrubmap entry as not absent when we see an update
Samuel Just [Wed, 2 Nov 2011 18:50:29 +0000 (11:50 -0700)]
PG: mark scrubmap entry as not absent when we see an update

Previously, there would be an assert failure in _scan_list if we see an
object deleted and then recreated.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agorgw: implement swift copy, fix copy auth
Yehuda Sadeh [Thu, 10 Nov 2011 22:56:10 +0000 (14:56 -0800)]
rgw: implement swift copy, fix copy auth

13 years agoPG: gen_prefix: use osdmap_ref rather than osd->osdmap
Samuel Just [Thu, 10 Nov 2011 22:08:36 +0000 (14:08 -0800)]
PG: gen_prefix: use osdmap_ref rather than osd->osdmap

Otherwise, the debug output might not match the map used by
the pg logic.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoOSD: sync_and_flush after mkfs to create first snap
Samuel Just [Thu, 10 Nov 2011 22:07:12 +0000 (14:07 -0800)]
OSD: sync_and_flush after mkfs to create first snap

Previously, if we kill the OSD process before the filestore
does its first sync, we end up replaying the journal on top
of current and potentially hitting -EEXIST.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
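A minimal illustration of the -EEXIST case described above, using directory creation as a stand-in for a journaled operation. The `replay` helper and the single-entry journal are hypothetical; the point is only that applying the same create twice on top of the current state fails.

```python
import errno
import os
import tempfile


def replay(root, journal):
    """Apply each journaled 'create directory' op under root."""
    for name in journal:
        os.mkdir(os.path.join(root, name))


with tempfile.TemporaryDirectory() as root:
    journal = ["meta"]
    replay(root, journal)        # applied, but no sync recorded it yet
    try:
        replay(root, journal)    # restart: journal replayed on top of current
        replay_errno = 0
    except OSError as e:
        replay_errno = e.errno
```

Syncing right after mkfs gives the store a committed starting point, so the journal does not have to be replayed over state that already contains its effects.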
13 years agoPG: update info.history even if lastmap is absent
Samuel Just [Thu, 10 Nov 2011 01:16:57 +0000 (17:16 -0800)]
PG: update info.history even if lastmap is absent

Previously, we did not update same_interval_since etc if
we do not have the previous map.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMakefile: add MMonProbe.h
Sage Weil [Thu, 10 Nov 2011 00:36:48 +0000 (16:36 -0800)]
Makefile: add MMonProbe.h

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: remove useless proc_replica_log() side-effect
Sage Weil [Wed, 9 Nov 2011 23:47:35 +0000 (15:47 -0800)]
osd: remove useless proc_replica_log() side-effect

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agohadoop: update patch and Readme.
Greg Farnum [Wed, 9 Nov 2011 23:23:38 +0000 (15:23 -0800)]
hadoop: update patch and Readme.

Patch generated by Noah Watkins <noahwatkins@gmail.com>

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agorgw: swift guesses mime type if not specified
Yehuda Sadeh [Wed, 9 Nov 2011 23:29:41 +0000 (15:29 -0800)]
rgw: swift guesses mime type if not specified

13 years agoosd: comment PG::lock*(), whitespace
Sage Weil [Wed, 9 Nov 2011 22:50:09 +0000 (14:50 -0800)]
osd: comment PG::lock*(), whitespace

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge branch 'master' of github.com:NewDreamNetwork/ceph
Sage Weil [Wed, 9 Nov 2011 22:46:58 +0000 (14:46 -0800)]
Merge branch 'master' of github.com:NewDreamNetwork/ceph

Conflicts:
src/osd/PG.cc

13 years agoosd: improve last_peering_reset debugging
Sage Weil [Mon, 31 Oct 2011 18:57:14 +0000 (11:57 -0700)]
osd: improve last_peering_reset debugging

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agocrypto: make crypto handlers non-static
Sage Weil [Wed, 9 Nov 2011 22:34:30 +0000 (14:34 -0800)]
crypto: make crypto handlers non-static

These were static in auth/Crypto.cc, which was mostly fine, except when
we got a signal shutting everything down for the gcov stuff, like so:

Thread 21 (Thread 2164):
#0  0x00007f31a800b3cd in open64 () from /lib/libpthread.so.0
#1  0x000000000081dee0 in __gcov_open ()
#2  0x000000000081e3fd in gcov_exit ()
#3  0x00007f31a67e64f2 in exit () from /lib/libc.so.6
#4  0x000000000054e1ca in handle_signal (signal=<value optimized out>) at osd/OSD.cc:600
#5  <signal handler called>
#6  0x00007f31a8007a9a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#7  0x0000000000636d7b in Wait (this=0x2241000) at ./common/Cond.h:48
#8  SimpleMessenger::wait (this=0x2241000) at msg/SimpleMessenger.cc:2637
#9  0x00000000004a4e35 in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:343

and a racing thread would, say, accept a connection and then crash, like
so:

#0  0x00007f31a800ba0b in raise () from /lib/libpthread.so.0
#1  0x0000000000696eeb in reraise_fatal (signum=2164) at global/signal_handler.cc:59
#2  0x00000000006976cc in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106
#3  <signal handler called>
#4  0x00007f31a67e0ba5 in raise () from /lib/libc.so.6
#5  0x00007f31a67e46b0 in abort () from /lib/libc.so.6
#6  0x00007f31a70846bd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#7  0x00007f31a7082906 in ?? () from /usr/lib/libstdc++.so.6
#8  0x00007f31a7082933 in std::terminate() () from /usr/lib/libstdc++.so.6
#9  0x00007f31a708328f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#10 0x0000000000690e5b in CryptoKey::decrypt (this=0x7f3195a67510, in=..., out=..., error=...) at auth/Crypto.cc:404
#11 0x000000000079ccee in void decode_decrypt_enc_bl<CephXServiceTicketInfo>(CephXServiceTicketInfo&, CryptoKey, ceph::buffer::list&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#12 0x0000000000795ca3 in cephx_verify_authorizer (cct=0x2232000, keys=<value optimized out>, indata=...,
    ticket_info=<value optimized out>, reply_bl=<value optimized out>) at auth/cephx/CephxProtocol.cc:438
#13 0x00000000007a17cf in CephxAuthorizeHandler::verify_authorizer (this=<value optimized out>, cct=0x2232000, keys=0x2256000,
    authorizer_data=<value optimized out>, authorizer_reply=..., entity_name=..., global_id=@0x7f3195a67848, caps_info=...,
    auid=0x7f3195a67840) at auth/cephx/CephxAuthorizeHandler.cc:21
#14 0x00000000005577ff in OSD::ms_verify_authorizer (this=0x2267000, con=0x230da00, peer_type=<value optimized out>,
    protocol=<value optimized out>, authorizer_data=<value optimized out>, authorizer_reply=<value optimized out>,
    isvalid=@0x7f3195a67c0f) at osd/OSD.cc:2723
#15 0x0000000000611ce1 in ms_deliver_verify_authorizer (this=<value optimized out>, con=0x230da00, peer_type=4, protocol=2,
    authorizer=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f) at msg/Messenger.h:145
#16 SimpleMessenger::verify_authorizer (this=<value optimized out>, con=0x230da00, peer_type=4, protocol=2,
    authorizer=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f)
    at msg/SimpleMessenger.cc:2419
#17 0x00000000006309ab in SimpleMessenger::Pipe::accept (this=0x22ce280) at msg/SimpleMessenger.cc:756
#18 0x0000000000634711 in SimpleMessenger::Pipe::reader (this=0x22ce280) at msg/SimpleMessenger.cc:1546
#19 0x00000000004a7085 in SimpleMessenger::Pipe::Reader::entry (this=<value optimized out>) at msg/SimpleMessenger.h:208
#20 0x000000000060f252 in Thread::_entry_func (arg=0x874) at common/Thread.cc:42
#21 0x00007f31a8003971 in start_thread () from /lib/libpthread.so.0
#22 0x00007f31a689392d in clone () from /lib/libc.so.6
#23 0x0000000000000000 in ?? ()

Instead, put these on the heap.  Set them up in the ceph::crypto::init()
method, and tear them down in ceph::crypto::shutdown().

Fixes: #1633
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
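
The heap-allocation pattern described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the Ceph code: the namespace sketch_crypto, the CryptoState struct, and is_initialized() are hypothetical names, and the mutex is an assumption about how one might guard setup and teardown.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for whatever state the crypto layer shares.
struct CryptoState {
  int handle = 42;
};

namespace sketch_crypto {

static std::mutex lock;
// A raw pointer has no destructor, so nothing is torn down behind the
// back of threads that are still running when the process exits.
static CryptoState *state = nullptr;

// Allocate the shared state on the heap (idempotent).
void init() {
  std::lock_guard<std::mutex> l(lock);
  if (!state)
    state = new CryptoState;
}

// Free the shared state explicitly, at a point the caller controls.
void shutdown() {
  std::lock_guard<std::mutex> l(lock);
  delete state;
  state = nullptr;
}

bool is_initialized() {
  std::lock_guard<std::mutex> l(lock);
  return state != nullptr;
}

}  // namespace sketch_crypto
```

A caller would invoke sketch_crypto::init() once at startup and sketch_crypto::shutdown() only after every thread that uses the state has stopped.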
13 years ago PG: cache read-only reference to the current osdmap on pg lock
Samuel Just [Tue, 8 Nov 2011 18:54:57 +0000 (10:54 -0800)]
PG: cache read-only reference to the current osdmap on pg lock

Previously, we needed to grab an osd_map read lock to send messages,
among other things.  Now, we grab a reference to the osd_map on pg lock
and refer to that.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
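
The idea in the commit above, taking a reference to the current map when the PG lock is acquired, might look roughly like this. All names here (Map, MapService, Group) are hypothetical stand-ins rather than Ceph classes, and keeping the cached reference until the next lock() is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

struct Map { unsigned epoch; };              // hypothetical stand-in for OSDMap
using MapRef = std::shared_ptr<const Map>;

// Hypothetical owner of the "current" map; publishes new epochs.
class MapService {
  mutable std::mutex m;
  MapRef current;
public:
  explicit MapService(unsigned e) : current(std::make_shared<Map>(Map{e})) {}
  MapRef get() const {
    std::lock_guard<std::mutex> l(m);
    return current;                          // copies the shared_ptr only
  }
  void publish(unsigned e) {
    MapRef next = std::make_shared<Map>(Map{e});
    std::lock_guard<std::mutex> l(m);
    current = std::move(next);
  }
};

// Hypothetical stand-in for a PG: snapshots the map when locked.
class Group {
  std::mutex m;
  MapRef cached;
  const MapService &svc;
public:
  explicit Group(const MapService &s) : svc(s) {}
  void lock() {
    m.lock();
    cached = svc.get();                      // refreshed on every lock
  }
  void unlock() { m.unlock(); }
  // Call only while holding the lock; no map lock is needed here.
  unsigned epoch() const { return cached->epoch; }
};
```

While the group is locked it sees one consistent map, even if a newer epoch is published in the meantime; the next lock() picks up the newer one.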
13 years ago OSDMap,CrushWrapper: const cleanup on OSDMap
Samuel Just [Tue, 8 Nov 2011 17:45:44 +0000 (09:45 -0800)]
OSDMap,CrushWrapper: const cleanup on OSDMap

The osd's cached maps are not actually modified once cached.  Marking
these methods const (which they should be) allows us to make OSDMapRef
shared_ptr<const OSDMap>.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
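
As a small illustration of why the const cleanup above matters: once accessors are const, a cached map can be handed out as a shared_ptr to const and callers can read it but not modify it. MiniMap, MiniMapRef, and make_cached_map are hypothetical names for this sketch, not the real OSDMap interface.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <utility>

// Hypothetical, much-reduced stand-in for OSDMap.
class MiniMap {
  unsigned epoch_;
  std::set<int> up_;
public:
  MiniMap(unsigned e, std::set<int> up) : epoch_(e), up_(std::move(up)) {}

  // Read-only accessors are const, so they work through a pointer to const.
  unsigned get_epoch() const { return epoch_; }
  bool is_up(int osd) const { return up_.count(osd) != 0; }

  // A mutator; it cannot be called through MiniMapRef.
  void mark_down(int osd) { up_.erase(osd); }
};

using MiniMapRef = std::shared_ptr<const MiniMap>;

// Build a map, then freeze it by converting to a pointer-to-const.
MiniMapRef make_cached_map(unsigned epoch, std::set<int> up) {
  return std::make_shared<MiniMap>(epoch, std::move(up));
}
```

With this typedef, a call such as ref->mark_down(1) fails to compile, which is the compile-time guarantee that const-qualified methods make possible.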
13 years ago osd/: change type of osd::osdmap to a shared_ptr
Samuel Just [Tue, 8 Nov 2011 01:51:21 +0000 (17:51 -0800)]
osd/: change type of osd::osdmap to a shared_ptr

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago PG: always add backlog entry
Samuel Just [Wed, 2 Nov 2011 21:32:17 +0000 (14:32 -0700)]
PG: always add backlog entry

Previously, we did not add a backlog entry if the object already had an
entry in the log along with an entry for that entry's prior_version.
However, when scanning the log, an OSD will incorrectly conclude that it
has the prior_version's prior_version if the object is not already in
the missing set.  If there happens to be a clone entry with that version
as its prior_version, the osd will attempt to recover the clone via a
clone operation on the non-existent object.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago rbd: Fix the showmapped cmd usage
Stratos Psomadakis [Wed, 9 Nov 2011 22:05:35 +0000 (00:05 +0200)]
rbd: Fix the showmapped cmd usage

If the rbd showmapped command is given any extra arguments, rbd fails
with "assert(0)". Fix this by calling "usage_exit()" when any extra
arguments are present, instead of asserting.

Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
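
The fix above amounts to treating unexpected arguments as a usage error rather than an internal invariant violation. A minimal sketch of that check, using the hypothetical names ArgCheck and check_showmapped_args (not the actual rbd code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: either proceed, or print usage and exit.
enum class ArgCheck { Ok, Usage };

// showmapped takes no positional arguments, so anything left over after
// option parsing is a user error and should lead to the usage message.
ArgCheck check_showmapped_args(const std::vector<std::string> &extra_args) {
  return extra_args.empty() ? ArgCheck::Ok : ArgCheck::Usage;
}
```

The caller would print the usage text and exit with a nonzero status on ArgCheck::Usage, leaving assertions for conditions that indicate a bug in the program itself.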
13 years ago hadoop: return all replica hostnames
Noah Watkins [Wed, 9 Nov 2011 02:39:20 +0000 (18:39 -0800)]
hadoop: return all replica hostnames

Updates CephFileSystem to return all replica locations,
and in addition attempts to use reverse DNS to convert
the OSD IPs into hostnames. Hadoop does not do well at
comparing IPs with hostnames, so locality is otherwise lost.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
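
The reverse-DNS step mentioned above is implemented in the Hadoop (Java) bindings; the following only illustrates the same idea in C++ with the POSIX getnameinfo() call. The function name ip_to_hostname is hypothetical, the sketch handles IPv4 only, and falling back to the original string on failure is an assumption.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <cstring>
#include <string>

// Try to turn a dotted-quad IPv4 string into a hostname via reverse DNS.
// Returns the input unchanged if it is not a valid address or if the
// lookup itself fails.
std::string ip_to_hostname(const std::string &ip) {
  sockaddr_in sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  if (inet_pton(AF_INET, ip.c_str(), &sa.sin_addr) != 1)
    return ip;  // not an IPv4 literal

  char host[1025];  // large enough for any hostname the resolver returns
  int r = getnameinfo(reinterpret_cast<const sockaddr *>(&sa), sizeof(sa),
                      host, sizeof(host), nullptr, 0, 0);
  return r == 0 ? std::string(host) : ip;
}
```

With flags set to 0, getnameinfo() returns the numeric address when no name is found, so callers get a usable string either way.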
13 years ago hadoop: make listStatus quiet
Noah Watkins [Wed, 9 Nov 2011 02:39:21 +0000 (18:39 -0800)]
hadoop: make listStatus quiet

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years ago hadoop: handle new ceph_get_file_stripe_address
Noah Watkins [Wed, 9 Nov 2011 02:39:19 +0000 (18:39 -0800)]
hadoop: handle new ceph_get_file_stripe_address

Updates the Hadoop JNI/CephFileSystem to handle
the new version of ceph_get_file_stripe_address
which returns the locations of replicas in addition
to the primary.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years ago client: return stripe address replicas
Noah Watkins [Wed, 9 Nov 2011 02:39:18 +0000 (18:39 -0800)]
client: return stripe address replicas

Changes ceph_get_file_stripe_address to return a
vector of entity_addr_t's for the primary and the
replicas. libcephfs is updated to return the
associated sockaddr_storage for each address.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years ago client: fix bad perfcounter fset callers
Sage Weil [Wed, 9 Nov 2011 21:15:55 +0000 (13:15 -0800)]
client: fix bad perfcounter fset callers

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago Improve use of syncfs.
Alexandre Oliva [Wed, 9 Nov 2011 17:51:26 +0000 (15:51 -0200)]
Improve use of syncfs.

Test syncfs return value and fall back to btrfs sync and then sync.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
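
The fallback order described above (syncfs, then btrfs sync, then plain sync) can be sketched as a generic "try each strategy until one succeeds" loop. The function sync_with_fallback and the convention that each strategy returns 0 on success are assumptions for illustration; the real code calls the system interfaces directly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Run each strategy in order and stop at the first one that returns 0.
// Returns the index of the strategy that succeeded, or -1 if all failed.
int sync_with_fallback(const std::vector<std::function<int()>> &strategies) {
  for (std::size_t i = 0; i < strategies.size(); ++i) {
    if (strategies[i]() == 0)
      return static_cast<int>(i);
  }
  return -1;
}
```

In practice the list would hold hypothetical wrappers for syncfs(2), the btrfs sync ioctl, and sync(2); since sync(2) returns no status, its wrapper would simply return 0.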
13 years ago osd: fix perfcounter typo
Sage Weil [Wed, 9 Nov 2011 18:46:18 +0000 (10:46 -0800)]
osd: fix perfcounter typo

Signed-off-by: Sage Weil <sage@newdream.net>