git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
13 years agoosd: send log with backfill restart
Sage Weil [Wed, 11 Jan 2012 00:14:13 +0000 (16:14 -0800)]
osd: send log with backfill restart

This makes backfill restart less of a special case: we send an info AND
log, just like we do normally.  Code paths are more similar than before.

The main change here is that the backfill target gets a pg log with recent
history, which allows it to more reliably detect dup operations.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fail to peer if interval lacks any !incomplete replicas
Sage Weil [Tue, 10 Jan 2012 21:23:00 +0000 (13:23 -0800)]
osd: fail to peer if interval lacks any !incomplete replicas

We need at least one non-incomplete replica during a rw interval in order
to peer.  The backfilling/incomplete replicas get log entries, but not
all object writes, so they are (mostly) excluded from the peering process
(find_best_info(), in particular).

We can't do this during the PriorSet calculation because we don't have
their PG::Info yet.  But, once we get it, we need to make sure at least one
of the replicas during the last rw interval is not incomplete, or else we
should mark the pg DOWN (just like the PriorSet calculation does).

This logic mostly mirrors that of PriorSet, but additionally requires
the replicas be !incomplete.

Signed-off-by: Sage Weil <sage@newdream.net>
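
The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual PG code: `ReplicaInfo`, its fields, and `can_peer()` are hypothetical names, and "incomplete" is reduced to a single boolean standing in for last_backfill != MAX.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the per-replica info the primary
// collects during peering; not the real PG::Info layout.
struct ReplicaInfo {
  int osd;
  bool in_last_rw_interval;  // was this replica part of the last rw interval?
  bool backfill_complete;    // true when last_backfill == MAX (not incomplete)
};

// Returns true if at least one replica from the last rw interval holds a
// complete copy. When this returns false, the caller would mark the PG DOWN.
bool can_peer(const std::vector<ReplicaInfo>& infos) {
  for (const ReplicaInfo& r : infos) {
    if (r.in_last_rw_interval && r.backfill_complete)
      return true;
  }
  return false;
}
```

A complete replica that was not part of the last rw interval does not help here, which is why both conditions are tested together.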
13 years agoMerge remote branch 'gh/master' into wip-backfill
Sage Weil [Mon, 9 Jan 2012 17:13:03 +0000 (09:13 -0800)]
Merge remote branch 'gh/master' into wip-backfill

13 years agoosd: populate_obc_watchers when object pulled to primary
Sage Weil [Mon, 9 Jan 2012 00:23:55 +0000 (16:23 -0800)]
osd: populate_obc_watchers when object pulled to primary

We don't care about degraded state, only whether the object is on the
primary so that we can load the object_info_t.

In particular, this avoids problems with backfill, where an object is
not degraded and populated, is then degraded while we backfill to the
target, and then not degraded again, and populate_obc_watchers() is called
a second time.

Fixes: #1903
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: handle case where no acceptable info exists
Sage Weil [Sun, 8 Jan 2012 23:15:18 +0000 (15:15 -0800)]
osd: handle case where no acceptable info exists

This happens when the only available replica has last_backfill != MAX.

In that case, revert to up, and then set the DOWN state bit.

Instead of waiting for a new map, we should actually wait for a new info
to show up...

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/wip-osd-retry-attempt'
Sage Weil [Sun, 8 Jan 2012 18:15:41 +0000 (10:15 -0800)]
Merge remote branch 'gh/wip-osd-retry-attempt'

13 years agoMerge remote branch 'gh/wip-admin-socket'
Sage Weil [Sun, 8 Jan 2012 16:16:56 +0000 (08:16 -0800)]
Merge remote branch 'gh/wip-admin-socket'

13 years agoperfcounters: fix unittest for new admin_socket interface
Sage Weil [Sat, 7 Jan 2012 03:09:10 +0000 (19:09 -0800)]
perfcounters: fix unittest for new admin_socket interface

Broken by b389685afa1be00b5147855bf71c50042bfbfa6c.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMakefile: disable unittest_interval_tree
Sage Weil [Sat, 7 Jan 2012 04:39:05 +0000 (20:39 -0800)]
Makefile: disable unittest_interval_tree

Segfaults. Valgrind errors. Accessing uninitialized memory.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agounittest_interval_tree: make it compile
Sage Weil [Sat, 7 Jan 2012 04:38:33 +0000 (20:38 -0800)]
unittest_interval_tree: make it compile

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: clean up src_oid, src_obc map key calculation
Sage Weil [Sat, 7 Jan 2012 01:18:01 +0000 (17:18 -0800)]
osd: clean up src_oid, src_obc map key calculation

Be consistent about how we generate the src_oid and src_oloc, so that we
feed good values into find_object_context and use a consistent key for
the src_obc map<>.  This fixes a crash in do_osd_ops() due to a missing
src_obc key when get_src_oloc() normalizes the key in do_op() but not
in do_osd_ops().

Also use a nicer name.

Fixes: #1897
Signed-off-by: Sage Weil <sage@newdream.net>
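
The general idea, one helper that builds the map key for both insertion and lookup, might look like the sketch below. Everything here is hypothetical: `Locator`, `make_src_key()`, `SrcContextMap`, and the normalization rule itself are illustrative assumptions, not the real get_src_oloc() behavior.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical simplified locator: an empty key means "use the object name".
struct Locator {
  std::string key;
};

using SrcKey = std::pair<std::string, std::string>;  // (object name, locator key)

// The single place that normalizes the key, so insert and lookup always agree.
// Assumed rule for illustration: a key equal to the object name becomes empty.
SrcKey make_src_key(const std::string& name, const Locator& loc) {
  std::string key = (loc.key == name) ? std::string() : loc.key;
  return SrcKey(name, key);
}

struct SrcContextMap {
  std::map<SrcKey, int> contexts;  // int stands in for an object context

  void add(const std::string& name, const Locator& loc, int ctx) {
    contexts[make_src_key(name, loc)] = ctx;
  }

  // Returns nullptr when no context was stored under the normalized key.
  const int* find(const std::string& name, const Locator& loc) const {
    auto it = contexts.find(make_src_key(name, loc));
    return it == contexts.end() ? nullptr : &it->second;
  }
};
```

Because both paths go through `make_src_key()`, a lookup cannot miss merely because one caller normalized the locator and the other did not.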
13 years agoosd: read op should claim_append data instead of claim
Yehuda Sadeh [Sat, 7 Jan 2012 00:55:52 +0000 (16:55 -0800)]
osd: read op should claim_append data instead of claim

13 years agorgw: remove object before writing both xattrs and data
Yehuda Sadeh [Sat, 7 Jan 2012 00:51:23 +0000 (16:51 -0800)]
rgw: remove object before writing both xattrs and data

otherwise we'll leak xattrs from previous incarnation

13 years agorgw: create plain processor for small objects
Yehuda Sadeh [Sat, 7 Jan 2012 00:16:06 +0000 (16:16 -0800)]
rgw: create plain processor for small objects

13 years agorgw: fix multipart PUT
Yehuda Sadeh [Fri, 6 Jan 2012 23:07:08 +0000 (15:07 -0800)]
rgw: fix multipart PUT

latest revamp broke it, missed calling RGWPutObjProcessor::prepare(s)
where needed.

13 years agorgw: rearrange PutObj::execute()
Yehuda Sadeh [Fri, 6 Jan 2012 20:41:33 +0000 (12:41 -0800)]
rgw: rearrange PutObj::execute()

groundwork for different handling of small object PUTs

13 years agorgw: different atomic handling for small objects
Yehuda Sadeh [Thu, 5 Jan 2012 20:51:27 +0000 (12:51 -0800)]
rgw: different atomic handling for small objects

13 years agoMerge remote branch 'gh/master' into wip-backfill
Sage Weil [Sat, 7 Jan 2012 00:44:11 +0000 (16:44 -0800)]
Merge remote branch 'gh/master' into wip-backfill

13 years agomon: fix uninitialized cluster_logger_registered
Sage Weil [Fri, 6 Jan 2012 22:32:16 +0000 (14:32 -0800)]
mon: fix uninitialized cluster_logger_registered

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoobjecter: ignore replies from old request attempts
Sage Weil [Fri, 6 Jan 2012 19:38:15 +0000 (11:38 -0800)]
objecter: ignore replies from old request attempts

If we know the request attempt, ignore old attempts.

If we do not know the attempt (because the server is old), accept the
reply.  This could lead to doing some ACK callbacks we shouldn't in
extreme failure/recovery scenarios, but that is better than doing
the callbacks out of order.

Partially fixes: #1490
Signed-off-by: Sage Weil <sage@newdream.net>
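
A minimal sketch of the accept/ignore rule described above. The function name and the `ATTEMPT_UNKNOWN` constant are hypothetical; only the use of -1 for "unknown" comes from the related MOSDOp[Reply] commit.

```cpp
#include <cassert>

// Sentinel for "the server did not report an attempt number" (old server).
constexpr int ATTEMPT_UNKNOWN = -1;

// Decide whether a reply should be delivered to the caller.
// current_attempt: attempt number of the request currently in flight.
// reply_attempt:   attempt number echoed in the reply, or ATTEMPT_UNKNOWN.
bool should_accept_reply(int current_attempt, int reply_attempt) {
  if (reply_attempt == ATTEMPT_UNKNOWN)
    return true;  // cannot tell which attempt this answers, so accept it
  return reply_attempt == current_attempt;  // ignore replies to old attempts
}
```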
13 years agoosd: encode retry attempt in MOSDOp[Reply]
Sage Weil [Fri, 6 Jan 2012 20:49:13 +0000 (12:49 -0800)]
osd: encode retry attempt in MOSDOp[Reply]

In addition to the boolean flag, also encode the exact retry attempt.

Return -1 if we don't know.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: document quorum_status, mon_status
Sage Weil [Fri, 6 Jan 2012 20:20:18 +0000 (12:20 -0800)]
mon: document quorum_status, mon_status

Fixes: #1824
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: fix misplaced else
Sage Weil [Fri, 6 Jan 2012 20:19:59 +0000 (12:19 -0800)]
mon: fix misplaced else

Broken by 435c29448a10ec343f5a2b7195d94c72de5b1a25.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/wip-mon-timeouts'
Sage Weil [Fri, 6 Jan 2012 18:20:55 +0000 (10:20 -0800)]
Merge remote branch 'gh/wip-mon-timeouts'

13 years agoceph: speak new admin socket protocol
Sage Weil [Thu, 5 Jan 2012 21:58:05 +0000 (13:58 -0800)]
ceph: speak new admin socket protocol

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoadmin_socket: fix, extend admin_socket unit tests
Sage Weil [Fri, 6 Jan 2012 17:30:54 +0000 (09:30 -0800)]
admin_socket: fix, extend admin_socket unit tests

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoadmin_socket: string commands
Sage Weil [Thu, 5 Jan 2012 21:57:58 +0000 (13:57 -0800)]
admin_socket: string commands

Commands are strings.  Old __be32 works too.  'help' to list available
commands.

Signed-off-by: Sage Weil <sage@newdream.net>
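
To illustrate string commands with a built-in 'help', here is a small self-contained sketch. `CommandTable` and its methods are hypothetical and are not the AdminSocket interface; the wire protocol and the legacy __be32 path are left out entirely.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical minimal command table keyed by command string.
class CommandTable {
 public:
  using Handler = std::function<std::string()>;

  void register_command(const std::string& name, const std::string& help,
                        Handler handler) {
    entries_[name] = Entry{help, std::move(handler)};
  }

  // Dispatch a string command; "help" lists every registered command.
  std::string execute(const std::string& command) const {
    if (command == "help") {
      std::string out;
      for (const auto& kv : entries_)
        out += kv.first + ": " + kv.second.help + "\n";
      return out;
    }
    auto it = entries_.find(command);
    if (it == entries_.end())
      return "unknown command: " + command + "\n";
    return it->second.handler();
  }

 private:
  struct Entry {
    std::string help;
    Handler handler;
  };
  std::map<std::string, Entry> entries_;
};
```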
13 years agomon: elector needs to reset leader_acked on every election start
Greg Farnum [Thu, 5 Jan 2012 23:29:32 +0000 (15:29 -0800)]
mon: elector needs to reset leader_acked on every election start

Otherwise you never reset the leader_acked after a failed
election attempt, so if mon 0 is available on the first round
but then fails, you never make progress!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
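
The failure mode can be shown with a toy model. `ElectorState` and `maybe_defer()` are hypothetical, and the sketch assumes that a lower rank is the preferred leader; the real Elector carries far more state.

```cpp
#include <cassert>

// Hypothetical, heavily simplified elector state.
struct ElectorState {
  int epoch = 0;
  int leader_acked = -1;  // rank we deferred to in this round, -1 for none

  // Begin a new election round. Resetting leader_acked here is the fix:
  // an ack left over from a failed round must not carry into the next one.
  void start() {
    ++epoch;
    leader_acked = -1;
  }

  // Defer to a candidate unless we already acked an equal or lower rank.
  bool maybe_defer(int rank) {
    if (leader_acked >= 0 && leader_acked <= rank)
      return false;
    leader_acked = rank;
    return true;
  }
};
```

Without the reset in `start()`, a monitor that acked rank 0 in a failed round would keep refusing rank 1 in every later round.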
13 years agomon: instrument elector so you can stop participating in the quorum
Greg Farnum [Thu, 5 Jan 2012 23:36:37 +0000 (15:36 -0800)]
mon: instrument elector so you can stop participating in the quorum

Add new monitor commands "quorum exit" and "quorum enter" to use it.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agomon: kill client sessions when we're not in quorum
Greg Farnum [Thu, 5 Jan 2012 22:03:43 +0000 (14:03 -0800)]
mon: kill client sessions when we're not in quorum

After a timeout of 2*mon_lease length (ie, two election rounds),
kill existing client sessions so they can reconnect to a
monitor that's (hopefully) remained in the quorum. Let any
new client sessions stick around for a mon_lease interval, then
do the same to them.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
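
The two grace periods can be expressed as a small predicate. This is a sketch under assumptions: `Session`, `should_kill_session()`, and the use of plain seconds are illustrative, and the real monitor tracks sessions differently.

```cpp
#include <cassert>

// Hypothetical session record; times are in seconds.
struct Session {
  double opened_at;
};

// Sessions that existed when quorum was lost get 2 * mon_lease from that
// moment; sessions opened afterwards get one mon_lease from when they opened.
bool should_kill_session(const Session& s, double quorum_lost_at, double now,
                         double mon_lease) {
  if (s.opened_at <= quorum_lost_at)
    return now - quorum_lost_at >= 2.0 * mon_lease;
  return now - s.opened_at >= mon_lease;
}
```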
13 years agoOCF RA: fix variable name
Florian Haas [Thu, 5 Jan 2012 21:33:32 +0000 (22:33 +0100)]
OCF RA: fix variable name

13 years agodebian: build ceph-resource-agents
Florian Haas [Thu, 5 Jan 2012 21:33:31 +0000 (22:33 +0100)]
debian: build ceph-resource-agents

13 years agoosd: parameterize min/max values for backfill scanning
Sage Weil [Tue, 3 Jan 2012 17:30:42 +0000 (09:30 -0800)]
osd: parameterize min/max values for backfill scanning

For local scans, use the optimal value for the local filestore.

For remote scans, make it configurable, so we can control how frequently
we need to wait for scan requests over the wire.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoadmin_socket: refactor
Sage Weil [Thu, 5 Jan 2012 18:46:31 +0000 (10:46 -0800)]
admin_socket: refactor

Combine AdminSocketConfigObs with AdminSocket so that we can interact
with it via the cct.  Simpler class structure.  Less pointer indirection.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agorbd: add a command to delete all snapshots of an image
Josh Durgin [Thu, 5 Jan 2012 01:07:07 +0000 (17:07 -0800)]
rbd: add a command to delete all snapshots of an image

This makes deleting images with many snapshots easier.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoadmin_socket: whitespace
Sage Weil [Thu, 5 Jan 2012 17:32:49 +0000 (09:32 -0800)]
admin_socket: whitespace

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agocommon: default 'mon osd auto mark in = false'
Sage Weil [Thu, 5 Jan 2012 17:30:33 +0000 (09:30 -0800)]
common: default 'mon osd auto mark in = false'

This way an osd that was explicitly marked out will stay out, even when
it is restarted.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: log backfill restart
Sage Weil [Thu, 5 Jan 2012 17:26:12 +0000 (09:26 -0800)]
osd: log backfill restart

This is interesting, particularly in determining when a peer that was
partially backfilled needs to be restarted.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agolibrbd: don't remove an image that still has snapshots
Josh Durgin [Thu, 5 Jan 2012 01:05:49 +0000 (17:05 -0800)]
librbd: don't remove an image that still has snapshots

Return -EBUSY instead. After the header is removed, the snapshots
can't be removed or read, so make sure they're gone before proceeding.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
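
A minimal sketch of the guard, using an in-memory stand-in for an image. `Image` and `remove_image()` are hypothetical and do not reflect the librbd API; only the -EBUSY return value is taken from the commit message.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical in-memory image.
struct Image {
  bool header_present = true;
  std::vector<std::string> snapshots;
};

// Refuse to remove an image that still has snapshots: once the header is
// gone, the snapshots can no longer be read or removed.
int remove_image(Image& image) {
  if (!image.snapshots.empty())
    return -EBUSY;
  image.header_present = false;
  return 0;
}
```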
13 years agoSimpleMessenger: clarify when ms_bind_ipv6 is used
Josh Durgin [Thu, 5 Jan 2012 01:34:17 +0000 (17:34 -0800)]
SimpleMessenger: clarify when ms_bind_ipv6 is used

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoqa: add a slightly more stressful anchortable test
Greg Farnum [Thu, 5 Jan 2012 01:08:45 +0000 (17:08 -0800)]
qa: add a slightly more stressful anchortable test

This creates more than 8 links.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoqa: fix mdstable script for proper injectargs use.
Greg Farnum [Wed, 4 Jan 2012 23:36:08 +0000 (15:36 -0800)]
qa: fix mdstable script for proper injectargs use.

This script is fairly primitive, but somebody might find it useful...

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoMerge remote branch 'gh/master' into wip-backfill
Sage Weil [Thu, 5 Jan 2012 00:38:23 +0000 (16:38 -0800)]
Merge remote branch 'gh/master' into wip-backfill

13 years agorados: fix run-length option parsing for rados load-gen
Sage Weil [Thu, 5 Jan 2012 00:25:29 +0000 (16:25 -0800)]
rados: fix run-length option parsing for rados load-gen

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomove cluster protocol definitions out of ceph_fs.h
Sage Weil [Wed, 4 Jan 2012 21:54:45 +0000 (13:54 -0800)]
move cluster protocol definitions out of ceph_fs.h

Among other things, we don't recompile the whole system when we touch
these.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomsgr: explicitly specify internal cluster protocol
Sage Weil [Wed, 4 Jan 2012 21:54:11 +0000 (13:54 -0800)]
msgr: explicitly specify internal cluster protocol

Replace case statement based on my_type.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: rev cluster protocol
Sage Weil [Wed, 4 Jan 2012 21:21:36 +0000 (13:21 -0800)]
mon: rev cluster protocol

The OSDMap NEW and AUTOOUT bit additions subtly change the decoding of
the incremental maps in a reasonably harmless way in that the bits get
implicitly cleared whenever the OSD weight changes from non-zero.  The
monitors need to agree on this behavior to avoid odd behavior.  We don't
care what clients see, since those bits are informational only.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosdmap: include state names in dump()
Sage Weil [Wed, 4 Jan 2012 20:30:42 +0000 (12:30 -0800)]
osdmap: include state names in dump()

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: separately control auto-mark-in of new OSDs
Sage Weil [Wed, 4 Jan 2012 20:30:23 +0000 (12:30 -0800)]
mon: separately control auto-mark-in of new OSDs

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: maintain CEPH_OSD_NEW bit for new, unused OSDs
Sage Weil [Wed, 4 Jan 2012 20:29:14 +0000 (12:29 -0800)]
mon: maintain CEPH_OSD_NEW bit for new, unused OSDs

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: independently control whether AUTOOUT OSDs are marked in on boot
Sage Weil [Wed, 4 Jan 2012 19:31:35 +0000 (11:31 -0800)]
mon: independently control whether AUTOOUT OSDs are marked in on boot

Add separate config option to control whether the monitor will mark
AUTOOUT OSDs in on boot.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: track auto-marked-out osds
Sage Weil [Wed, 4 Jan 2012 20:56:15 +0000 (12:56 -0800)]
mon: track auto-marked-out osds

Mark OSDs that were automatically marked OUT by the monitor because they
were down for too long.  Clear the bit as soon as they are no longer out,
i.e., as soon as the weight is changed from 0.

Signed-off-by: Sage Weil <sage@newdream.net>
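
The bit's lifecycle can be sketched as below. The bit value, the weight constant, and the names `OsdEntry`, `auto_mark_out()`, and `set_weight()` are illustrative assumptions, not copied from the Ceph headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit value, chosen only for this example.
constexpr std::uint32_t STATE_AUTOOUT = 1u << 2;

struct OsdEntry {
  std::uint32_t state = 0;
  std::uint32_t weight = 0x10000;  // non-zero means "in"; 0 means "out"
};

// The monitor marks an OSD out because it was down for too long.
void auto_mark_out(OsdEntry& osd) {
  osd.weight = 0;
  osd.state |= STATE_AUTOOUT;
}

// Any weight change away from 0 clears the AUTOOUT bit.
void set_weight(OsdEntry& osd, std::uint32_t new_weight) {
  if (osd.weight == 0 && new_weight != 0)
    osd.state &= ~STATE_AUTOOUT;
  osd.weight = new_weight;
}
```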
13 years agoosd: don't add all strays in calc_acting()
Sage Weil [Wed, 4 Jan 2012 21:43:17 +0000 (13:43 -0800)]
osd: don't add all strays in calc_acting()

We weren't counting up usable strays, which meant we added all of them.
This could result in acting sets with more active replicas than we wanted.

Signed-off-by: Sage Weil <sage@newdream.net>
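
A sketch of the counting fix: stop adding strays once the acting set reaches the desired size. `build_acting()` and its parameters are hypothetical; the real calc_acting() applies more criteria when choosing members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build an acting set from the up set plus usable strays, stopping as soon
// as the desired replica count is reached.
std::vector<int> build_acting(const std::vector<int>& up,
                              const std::vector<int>& usable_strays,
                              std::size_t desired_size) {
  std::vector<int> acting;
  for (int osd : up) {
    if (acting.size() >= desired_size) break;
    acting.push_back(osd);
  }
  for (int osd : usable_strays) {
    if (acting.size() >= desired_size) break;  // the count that was missing
    acting.push_back(osd);
  }
  return acting;
}
```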
13 years agoosd: mark degraded only when < desired replica count
Sage Weil [Wed, 4 Jan 2012 21:38:03 +0000 (13:38 -0800)]
osd: mark degraded only when < desired replica count

Having extra replicas is not 'degraded' per se.  Although it's weird that
we ever do that!

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: avoid querying missing set from (full) backfill target
Sage Weil [Wed, 4 Jan 2012 20:32:10 +0000 (12:32 -0800)]
osd: avoid querying missing set from (full) backfill target

If we are doing a complete backfill, we don't care about missing; it will
clearly all be below last_backfill anyway and get ignored.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix backfill reset on activate
Sage Weil [Wed, 4 Jan 2012 20:31:30 +0000 (12:31 -0800)]
osd: fix backfill reset on activate

Look at peer's info, not our own!

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge pull request #8 from kylemarsh/master
Sage Weil [Wed, 4 Jan 2012 23:53:51 +0000 (15:53 -0800)]
Merge pull request #8 from kylemarsh/master

Remove cloudfiles requirement from obsync.

13 years agoqa: load-gen-mix-small-long
Sage Weil [Wed, 4 Jan 2012 22:21:01 +0000 (14:21 -0800)]
qa: load-gen-mix-small-long

30 minutes

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoobsync: make obsync run without cloudfiles installed 8/head
Kyle Marsh [Wed, 4 Jan 2012 22:14:17 +0000 (14:14 -0800)]
obsync: make obsync run without cloudfiles installed

Cloudfiles probably shouldn't be a requirement for running obsync, so this
commit makes it optional.

13 years agoosd: initialize backfill_target; include in PG operator<<
Sage Weil [Wed, 4 Jan 2012 17:52:35 +0000 (09:52 -0800)]
osd: initialize backfill_target; include in PG operator<<

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: initialize backfill_pos on activate
Sage Weil [Wed, 4 Jan 2012 17:42:02 +0000 (09:42 -0800)]
osd: initialize backfill_pos on activate

Handling of writes depends on backfill_pos being initialized (to know what
is between the leading and trailing edge of the backfill), so it needs to
be initialized at activate time to avoid badness on writes prior to
recovery starting.

- initialize during activate to last_backfill
- update on receiving the digest to maintain the invariant that
  backfill_pos = min(peer_backfill_info.start, backfill_info.start)
  in recover_backfill().

Signed-off-by: Sage Weil <sage@newdream.net>
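
The invariant can be illustrated with a toy model in which object positions are plain strings ordered lexicographically. `BackfillState` and its members are hypothetical; initializing both scan starts to last_backfill on activate is an assumption made so the invariant holds from the beginning.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical simplification: positions are ordered strings.
struct BackfillState {
  std::string backfill_pos;
  std::string local_scan_start;  // stands in for backfill_info.start
  std::string peer_scan_start;   // stands in for peer_backfill_info.start

  // At activate time, start everything at the peer's last_backfill so that
  // writes arriving before recovery starts see a valid backfill_pos.
  void on_activate(const std::string& last_backfill) {
    backfill_pos = last_backfill;
    local_scan_start = last_backfill;
    peer_scan_start = last_backfill;
  }

  // When a scan digest arrives from the peer, restore the invariant
  // backfill_pos = min(peer_scan_start, local_scan_start).
  void on_peer_digest(const std::string& peer_start) {
    peer_scan_start = peer_start;
    backfill_pos = std::min(peer_scan_start, local_scan_start);
  }
};
```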
13 years agoosd: do not use incomplete peer for best info/log
Sage Weil [Sun, 1 Jan 2012 04:44:05 +0000 (20:44 -0800)]
osd: do not use incomplete peer for best info/log

For one, their stats are incomplete; if we use them we'll screw up everyone
else.  For another, it doesn't do us any good if they are a bit ahead of
the peers: we/they may not even have the objects their newer log says were
updated.  The only real use is if their log extends farther back in time,
but that is a problem in general that we'll eventually solve in other ways.

On the other hand, having the pg_stats sum only through last_backfill may
not have been the best choice; we could avoid that part of things by adding
an objects_backfilled field.  But this is probably a good idea anyway.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
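
Excluding incomplete peers from the selection might be sketched like this. `PeerInfo` and `pick_best_info()` are hypothetical, and "newest last_update wins" is a deliberate simplification of what find_best_info() actually weighs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced view of a peer's info.
struct PeerInfo {
  int osd;
  unsigned last_update;  // stand-in for the newest log version
  bool incomplete;       // true when last_backfill != MAX
};

// Returns the index of the best candidate, or -1 when every peer is
// incomplete (the "no acceptable info exists" case).
int pick_best_info(const std::vector<PeerInfo>& peers) {
  int best = -1;
  for (std::size_t i = 0; i < peers.size(); ++i) {
    if (peers[i].incomplete)
      continue;  // never use an incomplete peer, even if its log is newer
    if (best < 0 || peers[i].last_update > peers[best].last_update)
      best = static_cast<int>(i);
  }
  return best;
}
```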
13 years agoosd: fix misdirect check for requests with old epochs
Sage Weil [Wed, 4 Jan 2012 18:46:35 +0000 (10:46 -0800)]
osd: fix misdirect check for requests with old epochs

get_map() assumes the epoch passed is valid.  Check here in the caller.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: check that we're supposed to be getting a PG before waitlisting requests.
Sage Weil [Wed, 4 Jan 2012 18:44:12 +0000 (10:44 -0800)]
osd: check that we're supposed to be getting a PG before waitlisting requests.

This was broken in fa722de6708d3e92037df6289cc29ece12c8ea66.

Fix it by checking if the mapping was correct in the sender's epoch, and
either drop it (if valid) or handle_misdirected_request() if not.

Also fix the documentation for op_is_queueable() to not be a gigantic lie.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agorados: gracefully report errors from 'ls'
Sage Weil [Wed, 4 Jan 2012 18:40:06 +0000 (10:40 -0800)]
rados: gracefully report errors from 'ls'

Catch the exception thrown by the iterator when the OSD returns errors.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: return EINVAL on bad PGLS[_FILTER] handle
Sage Weil [Wed, 4 Jan 2012 17:49:28 +0000 (09:49 -0800)]
osd: return EINVAL on bad PGLS[_FILTER] handle

Fixes: #1875
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agolibrados: return int64_t pool ids
Josh Durgin [Wed, 4 Jan 2012 01:11:28 +0000 (17:11 -0800)]
librados: return int64_t pool ids

468e28ee60ee2fe625d2680c792a4bcb9ef19951 missed the get_id() functions.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agorados.py: use uint64_t for auids
Josh Durgin [Wed, 4 Jan 2012 00:24:59 +0000 (16:24 -0800)]
rados.py: use uint64_t for auids

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoradosgw-admin: add eol following info
Yehuda Sadeh [Tue, 3 Jan 2012 22:06:35 +0000 (14:06 -0800)]
radosgw-admin: add eol following info

13 years agotestrados: replace testreadwrite and testsnaps with testrados
Josh Durgin [Tue, 3 Jan 2012 19:09:00 +0000 (11:09 -0800)]
testrados: replace testreadwrite and testsnaps with testrados

testrados can act as testreadwrite or testsnaps by changing the
command line options for the weight of each operation type.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoRadosModel: check for out of order replies within WriteOps
Josh Durgin [Fri, 30 Dec 2011 02:36:54 +0000 (18:36 -0800)]
RadosModel: check for out of order replies within WriteOps

A single WriteOp already does multiple aio_writes. Each aio_write
gets a unique tid that is checked upon completion. There's no reason
to loop over the ranges twice since we can use the done flag instead
of the set of completions in WriteOp::finished().

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
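
One way to check per-write ordering by tid is sketched below. `WriteTracker` is hypothetical and is not the RadosModel code; it assumes completions are expected in submission order.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical tracker for the writes issued by one logical operation.
struct WriteTracker {
  std::deque<std::uint64_t> pending;  // tids in submission order

  void submitted(std::uint64_t tid) { pending.push_back(tid); }

  // Returns false if the completion arrived out of order.
  bool completed(std::uint64_t tid) {
    if (pending.empty() || pending.front() != tid)
      return false;
    pending.pop_front();
    return true;
  }

  // True once every submitted write has completed.
  bool done() const { return pending.empty(); }
};
```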
13 years agoRadosModel: allow TestOps to pass data to their finish methods
Josh Durgin [Fri, 30 Dec 2011 02:30:57 +0000 (18:30 -0800)]
RadosModel: allow TestOps to pass data to their finish methods

This will allow nested writes to keep track of which write actually
completed.  Also remove finish() and _finish() from TestOp subclasses
that had the same implementation as the superclass.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoRadosModel: make object write ranges configurable
Josh Durgin [Fri, 30 Dec 2011 01:55:45 +0000 (17:55 -0800)]
RadosModel: make object write ranges configurable

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoosd: add a monitor timeout via MPGStatsAck messages
Greg Farnum [Sat, 24 Dec 2011 00:41:38 +0000 (16:41 -0800)]
osd: add a monitor timeout via MPGStatsAck messages

Keep track of when we have outstanding updates, and while we do, make
sure the monitor responds within a timeout (default 30 seconds). If
it doesn't, reconnect!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
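
The timeout logic amounts to remembering when the oldest unacknowledged update was sent. In this sketch `StatsAckWatch` and its methods are hypothetical names; only the 30-second default comes from the commit message.

```cpp
#include <cassert>

// Hypothetical watchdog for outstanding stats updates; times in seconds.
struct StatsAckWatch {
  double timeout = 30.0;
  bool outstanding = false;
  double oldest_unacked = 0.0;

  // Record a send; only the first unacked send starts the clock.
  void on_send(double now) {
    if (!outstanding) {
      outstanding = true;
      oldest_unacked = now;
    }
  }

  // An ack from the monitor clears the outstanding state.
  void on_ack() { outstanding = false; }

  // True when updates are outstanding and the monitor has been silent
  // for longer than the timeout.
  bool should_reconnect(double now) const {
    return outstanding && (now - oldest_unacked) > timeout;
  }
};
```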
13 years agoFix invalid docdir_SCRIPTS usage with >=automake-1.11.2
Alphat-PC [Tue, 3 Jan 2012 15:36:15 +0000 (16:36 +0100)]
Fix invalid docdir_SCRIPTS usage with >=automake-1.11.2

13 years agoosd: trigger RecoveryFinished event on recovery completion
Sage Weil [Sat, 31 Dec 2011 23:09:58 +0000 (15:09 -0800)]
osd: trigger RecoveryFinished event on recovery completion

Unconditionally trigger the RecoveryFinished event when start_recovery_ops
thinks it may be done.  This lets us trigger the acting change (if needed),
or call finish_recovery() if needed.

This fixes the case where we are backfilling with up == acting, complete,
but don't call finish_recovery() or clear the backfill|degraded bits.

At some point we may want to move the is_all_uptodate() checks to the
caller.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agolibrados: take lock in rollback
Sage Weil [Sat, 31 Dec 2011 01:04:47 +0000 (17:04 -0800)]
librados: take lock in rollback

We're poking through the osdmap; need to hold the lock here.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoobjecter: assert lock held in op_submit
Sage Weil [Sat, 31 Dec 2011 01:04:30 +0000 (17:04 -0800)]
objecter: assert lock held in op_submit

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agolibrados: call aio_operate() with lock held
Sage Weil [Sat, 31 Dec 2011 01:04:22 +0000 (17:04 -0800)]
librados: call aio_operate() with lock held

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: be a bit more verbose during backfill
Sage Weil [Sat, 31 Dec 2011 00:45:21 +0000 (16:45 -0800)]
osd: be a bit more verbose during backfill

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agocmp: fix 5-uple operator==
Sage Weil [Sat, 31 Dec 2011 00:44:53 +0000 (16:44 -0800)]
cmp: fix 5-uple operator==

Doh!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: do not backfill if any objects are missing on the primary
Sage Weil [Fri, 30 Dec 2011 23:51:45 +0000 (15:51 -0800)]
osd: do not backfill if any objects are missing on the primary

Someday we need to do something smarter so that a single unfound object
doesn't hold up replication of other objects.  For now, this is the
simplest thing to do.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agorgw: create default constructors for some structs
Yehuda Sadeh [Fri, 30 Dec 2011 22:18:40 +0000 (14:18 -0800)]
rgw: create default constructors for some structs

this will silence valgrind a bit

13 years agoosd: handle backfill_target for pick_newest_available
Sage Weil [Fri, 30 Dec 2011 20:23:02 +0000 (12:23 -0800)]
osd: handle backfill_target for pick_newest_available

The object may not be missing on the backfill_target if it is after the
last_backfill marker.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: return EINVAL if multi op specified with no src object name
Sage Weil [Fri, 30 Dec 2011 20:19:32 +0000 (12:19 -0800)]
osd: return EINVAL if multi op specified with no src object name

This avoids crashing later in do_osd_ops() with something like

osd/ReplicatedPG.cc: In function 'int ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&, ceph::bufferlist&)', in thread '7f27e2d7e700'
osd/ReplicatedPG.cc: 1386: FAILED assert(src_obc)

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
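
Validating up front rather than asserting later could look like the following. `SimpleOp` and `check_ops()` are hypothetical; the real check operates on the OSDOp vector inside the OSD.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical reduced op: whether it needs a source object, and its name.
struct SimpleOp {
  bool needs_source;
  std::string src_name;
};

// Reject the request with -EINVAL if any op that needs a source object
// arrives without a source object name.
int check_ops(const std::vector<SimpleOp>& ops) {
  for (const SimpleOp& op : ops) {
    if (op.needs_source && op.src_name.empty())
      return -EINVAL;
  }
  return 0;
}
```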
13 years agohobject_t: fix operator==, !=
Sage Weil [Fri, 30 Dec 2011 19:39:30 +0000 (11:39 -0800)]
hobject_t: fix operator==, !=

These weren't comparing key.

While we're at it, clean this up by using generic macros for writing
these operators, so we don't get it wrong half the time.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
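
The class of bug (one field left out of operator==) can be avoided by listing the fields once and deriving the operators from that list. The sketch below uses std::tie rather than the macros this commit refers to; `ObjectId` and its fields are hypothetical and not the hobject_t layout.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical object identifier with a separate key field.
struct ObjectId {
  std::string name;
  std::string key;
  unsigned hash;

  // Every field is listed exactly once; the operators below reuse this.
  std::tuple<const std::string&, const std::string&, const unsigned&>
  fields() const {
    return std::tie(name, key, hash);
  }
};

inline bool operator==(const ObjectId& a, const ObjectId& b) {
  return a.fields() == b.fields();
}

inline bool operator!=(const ObjectId& a, const ObjectId& b) {
  return !(a == b);
}
```

Two ids that differ only in `key` now compare unequal, which is the case the original operators missed.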
13 years agocmp.h: define macros for creating comparison operators
Sage Weil [Fri, 30 Dec 2011 19:37:43 +0000 (11:37 -0800)]
cmp.h: define macros for creating comparison operators

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge remote branch 'gh/master' into wip-backfill
Sage Weil [Fri, 30 Dec 2011 18:43:33 +0000 (10:43 -0800)]
Merge remote branch 'gh/master' into wip-backfill

13 years agoworkunits: update rbd test for new error format
Josh Durgin [Fri, 30 Dec 2011 18:32:05 +0000 (10:32 -0800)]
workunits: update rbd test for new error format

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoconfig: use autoconf $libdir for default rados class dir
Sage Weil [Fri, 30 Dec 2011 17:50:36 +0000 (09:50 -0800)]
config: use autoconf $libdir for default rados class dir

Fixes: #1722
Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago.gitignore: src/ocf/ceph
Sage Weil [Fri, 30 Dec 2011 17:17:06 +0000 (09:17 -0800)]
.gitignore: src/ocf/ceph

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoSpec: conditionally build ceph-resource-agents package
Florian Haas [Thu, 29 Dec 2011 19:58:02 +0000 (20:58 +0100)]
Spec: conditionally build ceph-resource-agents package

Put OCF resource agents in a separate subpackage,
to be enabled with a separate build conditional
(--with ocf).

Make the subpackage depend on the resource-agents
package, which provides the ocf-shellfuncs library
that the Ceph RAs use.

Signed-off-by: Florian Haas <florian@hastexo.com>
13 years agoAdd OCF-compliant resource agent for Ceph daemons
Florian Haas [Thu, 29 Dec 2011 19:58:01 +0000 (20:58 +0100)]
Add OCF-compliant resource agent for Ceph daemons

Add a wrapper around the ceph init script that makes
MDS, OSD and MON configurable as Open Cluster Framework
(OCF) compliant cluster resources. Allows Ceph
daemons to tie in with cluster resource managers that
support OCF, such as Pacemaker (http://www.clusterlabs.org).

Disabled by default, configure --with-ocf to enable.

Signed-off-by: Florian Haas <florian@hastexo.com>
13 years agomon: fix full ratio updates
Sage Weil [Fri, 30 Dec 2011 16:06:55 +0000 (08:06 -0800)]
mon: fix full ratio updates

- update them independently
- only if we are leader
- fix type for nearfull_ratio

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: don't ignore first full ratio update callback
Sage Weil [Fri, 30 Dec 2011 16:06:06 +0000 (08:06 -0800)]
mon: don't ignore first full ratio update callback

We get a callback on startup.  Don't ignore it.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: only update full_ratio if we're the leader
Sage Weil [Fri, 30 Dec 2011 15:45:21 +0000 (07:45 -0800)]
mon: only update full_ratio if we're the leader

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/wip-cleanup'
Sage Weil [Fri, 30 Dec 2011 15:42:20 +0000 (07:42 -0800)]
Merge remote branch 'gh/wip-cleanup'

13 years agomon: make full ratio config change callback safe
Sage Weil [Fri, 30 Dec 2011 01:15:07 +0000 (17:15 -0800)]
mon: make full ratio config change callback safe

We can't propose_pending() from any context; do this in the tick() thread,
with the proper locking.  Among other things, this fixes the crash on
startup that is now triggered due to eba235f2.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoclitests: update for new error format
Josh Durgin [Thu, 29 Dec 2011 23:43:55 +0000 (15:43 -0800)]
clitests: update for new error format

This was changed in 1f434da8a3ca4db830d1f3b0d87e5df941d85f2d

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoclitests: update monmaptool test
Josh Durgin [Thu, 29 Dec 2011 23:28:16 +0000 (15:28 -0800)]
clitests: update monmaptool test

e93961c11119942eae3a4cd14a79f779a5a4d277 changed output format.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>