git.apps.os.sepia.ceph.com Git - ceph.git/log
Josh Durgin [Fri, 3 Jun 2011 01:51:02 +0000 (18:51 -0700)]
Makefile.am: clean gcno and gcda files in "make clean"
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin [Mon, 6 Jun 2011 20:52:35 +0000 (13:52 -0700)]
coverage: add helper script to get coverage for a local test
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin [Tue, 31 May 2011 22:51:41 +0000 (15:51 -0700)]
mon: add all_exit and exit commands
all_exit makes each daemon exit(0), for gcov data collection.
exit causes cmon to do this.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin [Tue, 31 May 2011 22:48:21 +0000 (15:48 -0700)]
mds: allow mds to 'exit immediately'
This is temporary until shutting down cleans up well.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin [Tue, 31 May 2011 21:15:01 +0000 (14:15 -0700)]
mon: ceph tell mds * is a valid command
Previously this fell through and returned -EINVAL to the user.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin [Sat, 28 May 2011 00:40:28 +0000 (17:40 -0700)]
osd: add command to exit cleanly
This is required for gcov to work on daemons since the coverage data
is written atexit, and the function that writes the data is not
exported.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Josh Durgin [Sat, 28 May 2011 00:37:43 +0000 (17:37 -0700)]
configure: add option for building with gcov coverage support
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Colin Patrick McCabe [Tue, 31 May 2011 22:01:26 +0000 (15:01 -0700)]
rados_sync: prefix user extended attributes
Start user extended attributes with USER_XATTR_PREFIX.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Thu, 28 Apr 2011 04:28:26 +0000 (21:28 -0700)]
librados: implement aio_flush
Implement a per-ioctx flush that blocks until all previously submitted
aio operations on the ioctx are safe. Each aio gets a sequence number and
is put on a linked list attached to the ioctx. The flush operation waits
for it to drain to the watermark set when flush is first called.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
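The watermark idea in the aio_flush message can be illustrated with a toy model. This is only a sketch in Python, not the librados code: the class ToyIoCtx and its methods submit, mark_safe, and flush are hypothetical names, and a plain list stands in for the linked list attached to the ioctx.

```python
import threading


class ToyIoCtx:
    """Toy model of per-ioctx aio tracking (hypothetical, not librados)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_seq = 0
        # Sequence numbers of aios that are not yet safe, in submit order.
        self._pending = []

    def submit(self):
        """Register a new aio and return its sequence number."""
        with self._cond:
            self._next_seq += 1
            self._pending.append(self._next_seq)
            return self._next_seq

    def mark_safe(self, seq):
        """Called when the aio with this sequence number becomes safe."""
        with self._cond:
            self._pending.remove(seq)
            self._cond.notify_all()

    def flush(self, timeout=None):
        """Block until every aio submitted before this call is safe.

        The watermark is fixed on entry, so aios submitted later do not
        delay the flush. Returns False if the timeout expires first.
        """
        with self._cond:
            watermark = self._next_seq
            return self._cond.wait_for(
                lambda: not self._pending or self._pending[0] > watermark,
                timeout=timeout,
            )
```

Because entries are appended in increasing order, the head of the list is always the oldest outstanding aio, so comparing it against the watermark is enough to decide whether the flush can return.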
Sage Weil [Tue, 31 May 2011 20:45:51 +0000 (13:45 -0700)]
crushtool: error out if uniform weights vary
Fixes: #1075
Signed-off-by: Sage Weil <sage@newdream.net>
Josh Durgin [Tue, 31 May 2011 20:26:58 +0000 (13:26 -0700)]
osd: fix ScrubFinalizeWQ::_clear condition
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Sage Weil [Tue, 31 May 2011 19:58:35 +0000 (12:58 -0700)]
debian: depend on libboost-dev >= 1.34
for statechart. Partially fixes #1124.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 25 May 2011 23:22:19 +0000 (16:22 -0700)]
osd: don't leak Connection reference
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 25 May 2011 23:22:06 +0000 (16:22 -0700)]
osd: ignore old/stale heartbeat messages
If we get heartbeat messages from old epochs from peers that are not
current, drop them and mark the connection down. Even if they are peers
we _should_ have (because we haven't gotten a notify yet to learn about
a pg we should have but don't yet) we have a newer map epoch and will learn
about them shortly, reopening the connection.
Fixes: #1107
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 25 May 2011 23:20:26 +0000 (16:20 -0700)]
osd: fix map sharing due to heartbeats
- share the map with the cluster addr
- use the new {note,get}_peer_epoch helpers to do it sanely
- don't share if we're booting; see 818fa33a661
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 30 May 2011 19:37:31 +0000 (12:37 -0700)]
crushtool: add -v verbose for --test mode
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Colin Patrick McCabe [Fri, 27 May 2011 21:46:19 +0000 (14:46 -0700)]
hadoop: track Hadoop API changes
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Fri, 27 May 2011 21:04:36 +0000 (14:04 -0700)]
SimpleMessenger: allow multiple calls to shutdown
Fixes a case where radostool crashed on an error shutdown.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Fri, 27 May 2011 21:01:45 +0000 (14:01 -0700)]
common/Thread.h: const cleanup
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Fri, 27 May 2011 17:59:21 +0000 (10:59 -0700)]
Merge branch 'wip-obsync'
Sage Weil [Fri, 27 May 2011 04:37:03 +0000 (21:37 -0700)]
mkcephfs: pass config to osdmaptool
This lets OSDMap::create_simple() see g_conf.osd_pool_default_size when
creating the initial data, metadata, and rbd pools.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 27 May 2011 04:31:18 +0000 (21:31 -0700)]
drop useless cm.txt
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 27 May 2011 04:20:55 +0000 (21:20 -0700)]
osdmap: take default pool size from config
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 26 May 2011 22:07:37 +0000 (15:07 -0700)]
crushtool: update help
Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Thu, 26 May 2011 20:28:39 +0000 (13:28 -0700)]
obsync: rgw target: validate all users
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Thu, 26 May 2011 20:17:03 +0000 (13:17 -0700)]
Merge branch 'wip-obsync'
Sage Weil [Thu, 26 May 2011 20:17:12 +0000 (13:17 -0700)]
mon: remove pg_temp mappings when we delete pools
Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Thu, 26 May 2011 20:15:50 +0000 (13:15 -0700)]
test-obsync: test sync directly from s3->rgw
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Thu, 26 May 2011 20:12:04 +0000 (13:12 -0700)]
crushtool: fix --add-item weight being zero when parent bucket(s) created
Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Thu, 26 May 2011 18:21:23 +0000 (11:21 -0700)]
obsync: fix bucket creation through rgw target
The rgw: target can now create buckets. Add a test.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Thu, 26 May 2011 18:04:03 +0000 (11:04 -0700)]
Merge branch 'stable'
Colin Patrick McCabe [Thu, 26 May 2011 17:25:40 +0000 (10:25 -0700)]
test-obsync: test big objects, user-defined xattr
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Thu, 26 May 2011 17:19:04 +0000 (10:19 -0700)]
mkcephfs: set rdir for local mon setup
Fixes: #1113
Reported-by: Bernard Grymonpon <bernard@openminds.be>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 26 May 2011 16:55:37 +0000 (09:55 -0700)]
init-ceph: ssh
Another bell/whistle.
Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Thu, 26 May 2011 00:48:02 +0000 (17:48 -0700)]
obsync: fix content-type on RGWStore
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Thu, 26 May 2011 00:36:36 +0000 (17:36 -0700)]
test-obsync: compare_directory now compares xattrs
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 22:55:22 +0000 (15:55 -0700)]
ceph-pybind-test: test embedded NULLs in data
Test embedded nulls in rados data. Fix a bug in rados.Object.__str__
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 22:49:10 +0000 (15:49 -0700)]
obsync: more fixes for RgwStore
* Fix content-type handling
* add vvprint and use it in Object::equals.
* support RgwStore::prefix
* more tests
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 22:48:31 +0000 (15:48 -0700)]
pybind/rados: correctly return data with NULLs
Correctly handle returning data with embedded NULLs in it.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 22:21:37 +0000 (15:21 -0700)]
pybind/rados.py: throw NoData on ENODATA
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Wed, 25 May 2011 21:54:15 +0000 (14:54 -0700)]
mds: fix canceled lock attempt
If a client tries to lock a file, has to wait, and then cancels the attempt,
the client will send an unlock request to unwind its state.
- the unlock now removes the waiting lock attempt from the wait list
- when the lock request retries and finds it is no longer on the wait
list it will fail.
Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Wed, 25 May 2011 21:22:51 +0000 (14:22 -0700)]
pybind/rados.py: rados.Object.key should be string
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 19:58:39 +0000 (12:58 -0700)]
obsync: RgwStore: make sure destination users exist
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 19:36:37 +0000 (12:36 -0700)]
obsync: fix DST_OWNER
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Yehuda Sadeh [Wed, 25 May 2011 19:32:50 +0000 (12:32 -0700)]
rgw: return EACCES if acl xattr doesn't exist
Colin Patrick McCabe [Wed, 25 May 2011 19:05:15 +0000 (12:05 -0700)]
obsync: Add boto_retries, remove rgw_store.prefix
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 18:01:16 +0000 (11:01 -0700)]
rados python bindings: handle xattrs with NULL
Handle extended attributes that contain NULL bytes correctly, rather
than treating everything as zero-terminated C strings.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
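The difference between a zero-terminated read and a length-based read can be seen with ctypes alone. This is a minimal sketch, not the actual pybind/rados.py code; the variable names are illustrative.

```python
import ctypes

# An xattr-like value with an embedded NUL byte.
payload = b"ab\x00cd"

# The buffer holds the payload plus one trailing NUL terminator.
buf = ctypes.create_string_buffer(payload)

# Treating the buffer as a C string stops at the first NUL.
truncated = buf.value

# Reading an explicit number of bytes preserves the embedded NUL.
full = ctypes.string_at(buf, len(payload))

print(truncated)  # b'ab'
print(full)       # b'ab\x00cd'
```

Carrying the length alongside the buffer is what keeps binary values intact.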
Samuel Just [Wed, 25 May 2011 17:54:27 +0000 (10:54 -0700)]
PG: fix race in _activate_committed
Previously, _activate_committed would access the osdmap epoch racing
with handle_osd_map's osdmap update. This would allow a message to be
sent from a replica to the primary tagged with the same epoch as
last_warm_restart, though the event actually occurred before
last_warm_restart. Thus the primary would fail to ignore the event and
transition to crashed.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 17:50:15 +0000 (10:50 -0700)]
RgwStore: fix some ACL issues
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Tue, 24 May 2011 22:39:58 +0000 (15:39 -0700)]
test-obsync.py: support librgw testing
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Tue, 24 May 2011 21:15:53 +0000 (14:15 -0700)]
Rename RadosStore to RgwStore
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Thu, 19 May 2011 23:33:13 +0000 (16:33 -0700)]
test-obsync: refactor a little bit
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 18 May 2011 23:39:18 +0000 (16:39 -0700)]
Proper ACL support for rados targets
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Wed, 18 May 2011 04:29:33 +0000 (21:29 -0700)]
mds: do not shift to EXCL or MIX while rdlocked
There was an old change in file_eval() that was allowing us to switch from
SYNC to MIX or EXCL while there were rdlocks, which either caused lots of
lock thrashing or could (I think) hang things up completely. This was
from ea10a672, an ancient fix for something related that appears to have
taken out the rdlocked check by accident.
In my tests (one writer, one stat-er), this took things from long stalls
(up to 20 seconds) to very responsive stats. Yay!
Fixes: #791
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 25 May 2011 04:14:59 +0000 (21:14 -0700)]
crushtool: clean up add-item a bit; don't add item to same bucket twice
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 25 May 2011 04:05:47 +0000 (21:05 -0700)]
crushtool: fix remove-item
Scan all buckets instead of doing a tree traverse.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 25 May 2011 03:30:38 +0000 (20:30 -0700)]
radosgw_admin: update clitest
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 01:16:08 +0000 (18:16 -0700)]
mkcephfs.in: print out usage if no actions given
If the user didn't specify any actions, print out a usage message rather
than silently exiting.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Wed, 25 May 2011 00:50:24 +0000 (17:50 -0700)]
rgw: Fix RGWAccess::init_storage_provider
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Wed, 25 May 2011 00:05:30 +0000 (17:05 -0700)]
mkcephfs: error out on bad usage
Signed-off-by: Sage Weil <sage@newdream.net>
Yehuda Sadeh [Tue, 24 May 2011 23:40:12 +0000 (16:40 -0700)]
make: fix build for rgw
Yehuda Sadeh [Tue, 24 May 2011 23:33:11 +0000 (16:33 -0700)]
rgw_admin: clean warning
Yehuda Sadeh [Tue, 24 May 2011 22:30:17 +0000 (15:30 -0700)]
Merge commit 'origin/master' into rgw-multiuser
Yehuda Sadeh [Tue, 24 May 2011 21:29:50 +0000 (14:29 -0700)]
rgw_admin: add key create
Yehuda Sadeh [Tue, 24 May 2011 21:17:59 +0000 (14:17 -0700)]
rgw_admin: subuser and key removal
Sage Weil [Wed, 11 May 2011 04:35:50 +0000 (21:35 -0700)]
journaler: tolerate ENOENT when prezeroing
ENOENT is okay and expected.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Colin Patrick McCabe [Tue, 24 May 2011 19:36:07 +0000 (12:36 -0700)]
test_common.sh: skip rm before put
The rm before the put is unnecessary and actually incorrect now.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Tue, 24 May 2011 19:34:56 +0000 (12:34 -0700)]
radostool: rados put should use write_full
If "rados put" uses write instead of write_full, the resulting object on
the server may be a mishmash of old and new objects, if the old object
was longer than the new one. This is fairly counterintuitive behavior
for radostool, so remove it.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
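The effect described in the radostool message can be reproduced with a toy model of an object as a byte string. The functions below are hypothetical stand-ins written for illustration; they only mimic the semantics the commit message describes and do not call librados.

```python
def write(obj: bytes, data: bytes, offset: int = 0) -> bytes:
    """Overwrite part of an object in place, keeping bytes past the write."""
    out = bytearray(obj)
    if len(out) < offset:
        # Pad with zeros if the write starts past the current end.
        out.extend(b"\x00" * (offset - len(out)))
    out[offset:offset + len(data)] = data
    return bytes(out)


def write_full(obj: bytes, data: bytes) -> bytes:
    """Replace the whole object, discarding any previous contents."""
    return bytes(data)


old = b"AAAAAAAAAA"
new = b"bbb"

print(write(old, new))       # b'bbbAAAAAAA' (tail of the old object survives)
print(write_full(old, new))  # b'bbb'
```

With the partial write, the tail of the longer old object remains, which is the mix of old and new data the message warns about.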
Colin Patrick McCabe [Tue, 24 May 2011 19:22:30 +0000 (12:22 -0700)]
Merge branch 'wip_ceph_context'
Colin Patrick McCabe [Mon, 23 May 2011 23:25:57 +0000 (16:25 -0700)]
Create a libcommon service thread
Create a libcommon service thread. Use it to handle SIGHUP.
Handle it by means of a flag that gets set. Using a queue would raise
the complicated question of what to do when the queue was full.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
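The flag-versus-queue trade-off can be sketched as follows. This is an illustrative Python model, not the libcommon implementation: the class SighupFlag and its methods are hypothetical, and the counter stands in for whatever work a SIGHUP triggers.

```python
import threading


class SighupFlag:
    """Toy model of a service thread that reacts to a signal via a flag."""

    def __init__(self):
        self._flag = threading.Event()
        self.handled = 0  # counts how many times the work was run

    def notify(self):
        """What a signal handler would do: set the flag and nothing else."""
        self._flag.set()

    def service_once(self, timeout=None):
        """One iteration of the service loop.

        Returns True if a pending notification was handled, False if
        the timeout expired with nothing to do.
        """
        if not self._flag.wait(timeout):
            return False
        # Clear before doing the work so a later notification is not lost.
        self._flag.clear()
        self.handled += 1
        return True


svc = SighupFlag()
svc.notify()
svc.notify()                # a second signal before servicing coalesces
print(svc.service_once(0))  # True
print(svc.handled)          # 1
print(svc.service_once(0))  # False
```

A flag cannot overflow: any number of signals arriving before the service thread runs collapse into a single pass, so the question of a full queue never arises.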
Sage Weil [Tue, 24 May 2011 17:00:23 +0000 (10:00 -0700)]
librados: len should be size_t
Unsigned, and size_t because it's a buffer size.
Fixes signedness warning in testrados.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 24 May 2011 16:47:06 +0000 (09:47 -0700)]
osd: add ability to explicitly mark unfound as lost
Instead of automatically marking unfound objects lost (once we've tried
every location we can think of), do it when the administrator explicitly
says to. This avoids incorrectly marking things lost when there are
peering issues, and also allows the administrator to decide whether there
may be offline osds that are worth bringing online.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 24 May 2011 16:42:39 +0000 (09:42 -0700)]
osd: make automatically marking of unfound as lost optional
We may not want to do this automatically until we have more confidence in
the recovery code. Even then, possibly not. In particular, the OSDs may
believe they have contacted all possible homes for the data even though there
is some long-lost OSD that has the data on disk but is offline.
For now, we make the marking process explicit so that the administrator can
make the call.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 24 May 2011 16:26:40 +0000 (09:26 -0700)]
mds: clean up get_or_create_stray
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 24 May 2011 16:24:42 +0000 (09:24 -0700)]
mds: initialize stray_index on startup
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 24 May 2011 16:17:24 +0000 (09:17 -0700)]
Merge branch 'stable'
Sage Weil [Tue, 24 May 2011 04:11:44 +0000 (21:11 -0700)]
v0.28.1
Colin Patrick McCabe [Mon, 23 May 2011 21:02:15 +0000 (14:02 -0700)]
librados, libceph: store CephContext
Don't use the global g_ceph_context. Instead, store the CephContext in
the structures provided by the library user.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Mon, 23 May 2011 17:11:15 +0000 (10:11 -0700)]
Add CephContext
A CephContext represents the context held by a single library user.
There can be multiple CephContexts in the same process.
For daemons and utility programs, there will be only one CephContext.
The CephContext contains the configuration, the dout object, and
anything else that you might want to pass to libcommon with every
function call.
Move some non-config things out of md_config_t and into CephContext.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Mon, 23 May 2011 23:29:49 +0000 (16:29 -0700)]
Split common_init_daemonize from common_init_finish
Split off common_init_daemonize from common_init_finish. cfuse is a
daemon that calls common_init_finish, but handles daemonization itself.
This fixes cfuse.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Yehuda Sadeh [Mon, 23 May 2011 23:52:59 +0000 (16:52 -0700)]
rgw_admin: make interface a bit more explicit
Yehuda Sadeh [Mon, 23 May 2011 22:12:48 +0000 (15:12 -0700)]
rgw: subuser permissions
Sage Weil [Mon, 23 May 2011 21:58:26 +0000 (14:58 -0700)]
mon: verify that crush max does not exceed osd max
- when injecting a new crushmap
- when adjusting osdmap max_osd
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sun, 22 May 2011 23:25:35 +0000 (16:25 -0700)]
crushtool: add --reweight-item <name> <weight>
Reweight an individual item via crushtool.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 21 May 2011 19:55:16 +0000 (12:55 -0700)]
osdmaptool: fail --import-crush if crush max_devices > osdmap max_osd
Crush will spew non-deterministic badness if it walks off the end of
the osd_weight vector.
Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Fri, 20 May 2011 23:35:52 +0000 (16:35 -0700)]
common_init: don't init crypto until after fork
Get rid of the initialize-then-shutdown-crypto hack. We just initialize
crypto once, after it is safe to do so. There is now a single callback,
common_init_finish, which does the final stage of initialization,
including starting crypto and daemonization (if required.)
common_init_finish needs to be done before messenger::start().
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Colin Patrick McCabe [Fri, 20 May 2011 22:12:49 +0000 (15:12 -0700)]
ceph_crypto: add assert_init
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Sat, 21 May 2011 01:16:49 +0000 (18:16 -0700)]
config: delete after new
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 21 May 2011 00:10:15 +0000 (17:10 -0700)]
crush: fix signedness warnings
Signed-off-by: Sage Weil <sage@newdream.net>
Yehuda Sadeh [Fri, 20 May 2011 23:46:14 +0000 (16:46 -0700)]
rgw_admin: able to create multiple keys/subusers
Sage Weil [Fri, 20 May 2011 23:45:57 +0000 (16:45 -0700)]
crushtool: --remove-item name
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 20 May 2011 23:41:16 +0000 (16:41 -0700)]
crush: fix tree bucket encoding
I wonder how long this has been broken!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 20 May 2011 23:40:36 +0000 (16:40 -0700)]
crush: fix tree weight accessor, decompile
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 20 May 2011 22:44:15 +0000 (15:44 -0700)]
crushtool: default to hash 0 (rjenkins1)
Otherwise we get 255 which is undefined and get bad results!
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Yehuda Sadeh [Fri, 20 May 2011 22:15:48 +0000 (15:15 -0700)]
rgw: user info structure supports multiple subusers and keys
Sage Weil [Fri, 20 May 2011 22:08:06 +0000 (15:08 -0700)]
osd: update last_epoch_clean in PG::Info::History::merge()
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 20 May 2011 22:04:57 +0000 (15:04 -0700)]
osd: small cleanup
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 20 May 2011 22:04:09 +0000 (15:04 -0700)]
osd: merge history when primary sends replica new pg info
This, among other things, lets us update last_epoch_started and
last_epoch_clean.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 20 May 2011 21:45:36 +0000 (14:45 -0700)]
osd: more heartbeat rework
A few things:
- track Connection* instead of entity_inst_t for hb peers
- we can only send maps over the cluster_messenger
- if peer is still alive, do that
- if peer is not, send dying MOSDPing ping with YOU_DIED flag
Sage Weil [Fri, 20 May 2011 21:43:57 +0000 (14:43 -0700)]
msgr: don't close close_on_empty until outgoing messages are acked
Otherwise, if we close the socket, we may lose in-flight data.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>