git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
13 years agorgw: swift related adjustments
Yehuda Sadeh [Thu, 27 Oct 2011 21:31:16 +0000 (14:31 -0700)]
rgw: swift related adjustments

13 years agoMerge branch 'master' of github.com:NewDreamNetwork/ceph
Sage Weil [Thu, 27 Oct 2011 21:26:53 +0000 (14:26 -0700)]
Merge branch 'master' of github.com:NewDreamNetwork/ceph

13 years agofixed graphic reference and headings
Sondra.Menthers [Thu, 27 Oct 2011 21:04:56 +0000 (14:04 -0700)]
fixed graphic reference and headings

13 years agofixed image reference
Sondra.Menthers [Thu, 27 Oct 2011 21:00:57 +0000 (14:00 -0700)]
fixed image reference

13 years agofixed architecture document
Sondra.Menthers [Thu, 27 Oct 2011 20:54:31 +0000 (13:54 -0700)]
fixed architecture document

13 years agoadd images for documentation
Sondra.Menthers [Thu, 27 Oct 2011 20:43:05 +0000 (13:43 -0700)]
add images for documentation

13 years agorgw: handle swift PUT with incorrect etag
Sondra.Menthers [Thu, 27 Oct 2011 19:51:57 +0000 (12:51 -0700)]
rgw: handle swift PUT with incorrect etag

13 years agorgw: handle swift PUT with incorrect etag
Sondra.Menthers [Thu, 27 Oct 2011 19:44:37 +0000 (12:44 -0700)]
rgw: handle swift PUT with incorrect etag

13 years agorgw: handle swift PUT with incorrect etag
Sondra.Menthers [Thu, 27 Oct 2011 18:20:41 +0000 (11:20 -0700)]
rgw: handle swift PUT with incorrect etag

13 years agorgw: handle swift PUT with incorrect etag
Sondra.Menthers [Thu, 27 Oct 2011 18:20:41 +0000 (11:20 -0700)]
rgw: handle swift PUT with incorrect etag

13 years agorgw: handle swift PUT with incorrect etag
Sondra.Menthers [Thu, 27 Oct 2011 18:16:51 +0000 (11:16 -0700)]
rgw: handle swift PUT with incorrect etag

13 years agorgw: handle swift PUT with incorrect etag
Sondra.Menthers [Thu, 27 Oct 2011 18:02:23 +0000 (11:02 -0700)]
rgw: handle swift PUT with incorrect etag

13 years agoceph: refactor for generic --admin-daemon <sock> <cmd> too
Sage Weil [Thu, 27 Oct 2011 17:02:42 +0000 (10:02 -0700)]
ceph: refactor for generic --admin-daemon <sock> <cmd> too

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph: --dump-perf-counters[-schema] sockpath
Sage Weil [Thu, 27 Oct 2011 16:48:08 +0000 (09:48 -0700)]
ceph: --dump-perf-counters[-schema] sockpath

Quick and dirty way to dump perfcounters stats.  Not documenting this until
we decide this is where it should live.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilejournal: journal_replay_from
Sage Weil [Thu, 27 Oct 2011 16:47:20 +0000 (09:47 -0700)]
filejournal: journal_replay_from

Force journal replay from a point other than the op_seq recorded by the
fs.  This is useful if you want to skip bad entries in the journal (e.g.,
because they were non-idempotent and you know they were applied and the fs
operations were fully ordered).

Signed-off-by: Sage Weil <sage@newdream.net>
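
A minimal sketch of the start-point selection that this option implies. The function names and the convention that 0 means "no override" are assumptions made for illustration; the real FileJournal code may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: pick the first journal sequence number to replay.
// A replay_from_override of 0 is assumed to mean "no override".
std::uint64_t first_seq_to_replay(std::uint64_t fs_op_seq,
                                  std::uint64_t replay_from_override) {
  if (replay_from_override != 0) {
    return replay_from_override;  // operator forces the starting point
  }
  return fs_op_seq + 1;  // default: resume right after what the fs recorded
}

// Keep only the journal entries at or after the chosen starting point.
std::vector<std::uint64_t> entries_to_replay(
    const std::vector<std::uint64_t>& journal_seqs, std::uint64_t first) {
  std::vector<std::uint64_t> out;
  for (std::uint64_t seq : journal_seqs) {
    if (seq >= first) {
      out.push_back(seq);
    }
  }
  return out;
}
```

With an override of 14, entries 11 through 13 would be skipped even though the fs only recorded op_seq 10.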
13 years agoMerge branch 'stable'
Sage Weil [Thu, 27 Oct 2011 16:26:08 +0000 (09:26 -0700)]
Merge branch 'stable'

13 years agorados: improve error message
Sage Weil [Wed, 26 Oct 2011 21:56:25 +0000 (14:56 -0700)]
rados: improve error message

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoradosgw-admin: fix key create check
Sage Weil [Thu, 27 Oct 2011 04:20:18 +0000 (21:20 -0700)]
radosgw-admin: fix key create check

Also fixes warning

warning: rgw/rgw_admin.cc:812: suggest parentheses around ‘&&’ within ‘||’

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: guard checks for writes
Josh Durgin [Thu, 27 Oct 2011 00:05:34 +0000 (17:05 -0700)]
osd: guard checks for writes

fa722de6708d3e92037df6289cc29ece12c8ea66 moved these checks, and
accidentally removed the may_write() guard. This caused reading from
snapshots to fail.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
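
A sketch of the guard pattern described above. `OpSketch`, `check_op`, and the -EINVAL return value are hypothetical; the point is only that write-specific checks sit behind the may_write test, so a read from a snapshot is not rejected.

```cpp
#include <cerrno>

// Hypothetical, simplified view of an incoming op.
struct OpSketch {
  bool may_write;    // op contains at least one write
  bool on_snapshot;  // op targets a snapshot rather than the head object
};

// Checks that only make sense for writes are guarded by may_write.
int check_op(const OpSketch& op) {
  if (op.may_write) {
    if (op.on_snapshot) {
      return -EINVAL;  // assumed error code: snapshots are read-only
    }
  }
  return 0;  // reads, including reads from snapshots, pass
}
```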
13 years agorgw: handle swift PUT with incorrect etag
Yehuda Sadeh [Thu, 27 Oct 2011 00:20:51 +0000 (17:20 -0700)]
rgw: handle swift PUT with incorrect etag

13 years agorgw: rgw-admin --skip-zero-entries
Yehuda Sadeh [Wed, 26 Oct 2011 23:07:04 +0000 (16:07 -0700)]
rgw: rgw-admin --skip-zero-entries

13 years agoperfcounters: fix accessor name
Sage Weil [Wed, 26 Oct 2011 23:00:45 +0000 (16:00 -0700)]
perfcounters: fix accessor name

FreakingCamelCaps

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoobjecter: instrument with perfcounter
Sage Weil [Wed, 26 Oct 2011 22:54:15 +0000 (15:54 -0700)]
objecter: instrument with perfcounter

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agorgw: rgw-admin generate-key/access-key=false fix
Yehuda Sadeh [Wed, 26 Oct 2011 22:34:52 +0000 (15:34 -0700)]
rgw: rgw-admin generate-key/access-key=false fix

13 years agorgw: rgw-admin can show log summation
Yehuda Sadeh [Wed, 26 Oct 2011 22:34:18 +0000 (15:34 -0700)]
rgw: rgw-admin can show log summation

13 years agoosd: read_log: only list the collection once
Sage Weil [Wed, 26 Oct 2011 21:56:08 +0000 (14:56 -0700)]
osd: read_log: only list the collection once

When upgrading an old collection, we may need to list the collection to
recover the hash value.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agorgw: fix bucket suspension
Yehuda Sadeh [Wed, 26 Oct 2011 21:30:26 +0000 (14:30 -0700)]
rgw: fix bucket suspension

13 years agorgw: fix uninitialized variable warnings
Sage Weil [Wed, 26 Oct 2011 04:34:07 +0000 (21:34 -0700)]
rgw: fix uninitialized variable warnings

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge branch 'master' of ssh://github.com/NewDreamNetwork/ceph
Yehuda Sadeh [Tue, 25 Oct 2011 23:29:40 +0000 (16:29 -0700)]
Merge branch 'master' of ssh://github.com/NewDreamNetwork/ceph

Conflicts:
src/rgw/rgw_rados.cc

13 years agohadoop: bring back Java changes.
Greg Farnum [Mon, 10 Oct 2011 15:19:47 +0000 (08:19 -0700)]
hadoop: bring back Java changes.

These convert the Hadoop stuff to work on the branch-0.20 API.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agorgw: fix attr cache
Yehuda Sadeh [Tue, 25 Oct 2011 23:23:08 +0000 (16:23 -0700)]
rgw: fix attr cache

13 years agofix osdmaptool clitests
Sage Weil [Tue, 25 Oct 2011 21:15:13 +0000 (14:15 -0700)]
fix osdmaptool clitests

13 years agoMerge branch 'wip-pools'
Sage Weil [Tue, 25 Oct 2011 21:02:42 +0000 (14:02 -0700)]
Merge branch 'wip-pools'

13 years agomon: reencode routed messages
Sage Weil [Tue, 25 Oct 2011 17:52:06 +0000 (10:52 -0700)]
mon: reencode routed messages

The message encoding may depend on the target features.  Clear the
payload so that the Message gets reencoded appropriately.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMOSDMap: reencode full map embedded in Incremental, as needed
Sage Weil [Tue, 25 Oct 2011 17:51:21 +0000 (10:51 -0700)]
MOSDMap: reencode full map embedded in Incremental, as needed

The Incremental may have a bufferlist containing a full map; reencode
that too if we are reencoding for old clients.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote-tracking branch 'gh/wip-rbd-tool'
Sage Weil [Tue, 25 Oct 2011 17:13:44 +0000 (10:13 -0700)]
Merge remote-tracking branch 'gh/wip-rbd-tool'

13 years agomon: fix rare races with pool updates
Sage Weil [Mon, 24 Oct 2011 18:41:29 +0000 (11:41 -0700)]
mon: fix rare races with pool updates

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: parse 0 values properly
Sage Weil [Mon, 24 Oct 2011 18:41:13 +0000 (11:41 -0700)]
mon: parse 0 values properly

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/wip-osd-queue'
Sage Weil [Tue, 25 Oct 2011 05:51:15 +0000 (22:51 -0700)]
Merge remote branch 'gh/wip-osd-queue'

13 years agoosd: fix last_complete adjustment after recovering an object
Sage Weil [Mon, 24 Oct 2011 20:55:29 +0000 (13:55 -0700)]
osd: fix last_complete adjustment after recovering an object

After we recover each object, we try to raise the last_complete value
(and matching complete_to iterator).  If our log was purely a backlog, this
won't necessarily end up bringing last_complete all the way up to the
last_update value, and we'll fail an assert later.

If complete_to does reach the end of the log, then we fast-forward
last_complete to last_update.

The crash we were hitting was in finish_recovery(), and looked something
like

osd/PG.cc: In function 'void PG::finish_recovery(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)', in thread '0x7f4573df7700'
osd/PG.cc: 1800: FAILED assert(info.last_complete == info.last_update)
 ceph version 0.36-251-g6e29c28 (commit:6e29c2826066a7723ed05b60b8ac0433a04c3c13)
 1: (PG::finish_recovery(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x8d) [0x6ff0ed]
 2: (PG::RecoveryState::Active::react(PG::RecoveryState::ActMap const&)+0x316) [0x729196]
 3: (boost::statechart::simple_state<PG::RecoveryState::Active, PG::RecoveryState::Primary, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*)+0x21b) [0x759c0b]
 4: (boost::statechart::state_machine<PG::RecoveryState::RecoveryMachine, PG::RecoveryState::Initial, std::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&)+0x8d) [0x7423dd]
 5: (PG::RecoveryState::handle_activate_map(PG::RecoveryCtx*)+0x183) [0x711f43]
 6: (OSD::activate_map(ObjectStore::Transaction&, std::list<Context*, std::allocator<Context*> >&)+0x674) [0x579884]
 7: (OSD::handle_osd_map(MOSDMap*)+0x2270) [0x57bd50]
 8: (OSD::_dispatch(Message*)+0x4d0) [0x596bb0]
 9: (OSD::ms_dispatch(Message*)+0x17b) [0x59803b]
 10: (SimpleMessenger::dispatch_entry()+0x9c2) [0x617562]
 11: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x4a3dec]
 12: (Thread::_entry_func(void*)+0x12) [0x611a92]
 13: (()+0x7971) [0x7f457f87b971]
 14: (clone()+0x6d) [0x7f457e10b92d]

Fixes: #1609
Signed-off-by: Sage Weil <sage@newdream.net>
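
The fast-forward rule can be sketched as follows. `PGLogSketch` and `got_object` are hypothetical stand-ins that use plain integers for versions; they are not the PG code itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified PG log state.
struct PGLogSketch {
  std::vector<std::uint64_t> log;   // entry versions, oldest first
  std::set<std::uint64_t> missing;  // versions whose objects are still missing
  std::size_t complete_to = 0;      // index of the first not-yet-complete entry
  std::uint64_t last_complete = 0;
  std::uint64_t last_update = 0;
};

// Called after an object at version v has been recovered.
void got_object(PGLogSketch& pg, std::uint64_t v) {
  pg.missing.erase(v);
  // Raise last_complete past every entry that is no longer missing.
  while (pg.complete_to < pg.log.size() &&
         pg.missing.count(pg.log[pg.complete_to]) == 0) {
    pg.last_complete = pg.log[pg.complete_to];
    ++pg.complete_to;
  }
  // If the iterator reached the end of the log, fast-forward to last_update.
  if (pg.complete_to == pg.log.size()) {
    pg.last_complete = pg.last_update;
  }
}
```

Without the final step, a log whose newest entry is older than last_update would leave last_complete short of last_update, which is the condition the assert in finish_recovery() checks.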
13 years agoosd: fix generate_past_intervals maybe_went_rw on oldest interval
Sage Weil [Sun, 23 Oct 2011 06:07:10 +0000 (23:07 -0700)]
osd: fix generate_past_intervals maybe_went_rw on oldest interval

We stop working backwards when we hit last_epoch_clean, which means for the
oldest interval first_epoch may not be the _real_ first_epoch.  (We can't
continue working backward because we may have thrown out those maps
entirely.)

However, if the last_epoch_clean epoch is contained within that interval,
we know that the OSD did in fact go rw because it had to have completed
recovery (and thus peering) to set last_epoch_clean in the first place.

This fixes cases where two different nodes have slightly different
past intervals, generate different prior probe sets as a result, and
flip/flop on the acting set choice.  (It may have eventually resolved when
the wrongly excluded node's notify races and arrives in time to be
considered, but that's still clearly no good.)

This does leave the start epoch for that oldest interval incorrect.  That
doesn't currently matter except that it's confusing, but I'm not sure how
to mark it properly, or if it's worth the effort.

Signed-off-by: Sage Weil <sage@newdream.net>
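
A sketch of the rule for the oldest interval. `IntervalSketch` and `fixup_oldest_interval` are hypothetical names, and epochs are reduced to plain integers.

```cpp
#include <cstdint>

// Hypothetical, simplified past-interval record.
struct IntervalSketch {
  std::uint32_t first;  // for the oldest interval this may not be the real first epoch
  std::uint32_t last;
  bool maybe_went_rw;
};

// If last_epoch_clean falls inside the oldest interval, the OSD must have
// finished recovery (and thus peering) there, so it did go rw.
void fixup_oldest_interval(IntervalSketch& oldest,
                           std::uint32_t last_epoch_clean) {
  if (oldest.first <= last_epoch_clean && last_epoch_clean <= oldest.last) {
    oldest.maybe_went_rw = true;
  }
}
```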
13 years agoosd: MOSDPGNotify: print prettier
Sage Weil [Sun, 23 Oct 2011 05:43:33 +0000 (22:43 -0700)]
osd: MOSDPGNotify: print prettier

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: print useful debug info from choose_acting
Sage Weil [Sun, 23 Oct 2011 05:43:21 +0000 (22:43 -0700)]
osd: print useful debug info from choose_acting

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: make proc_replica_log missing dump include useful information
Sage Weil [Fri, 21 Oct 2011 16:57:52 +0000 (09:57 -0700)]
osd: make proc_replica_log missing dump include useful information

I needed to see have/need to debug a weird unfound issue turned up by
thrashing.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix/simplify op discard checks
Sage Weil [Tue, 25 Oct 2011 05:21:43 +0000 (22:21 -0700)]
osd: fix/simplify op discard checks

Use a helper to determine when we should discard an op due to the client
being disconnected.  Use this when the op is first received, (re)queued,
and dequeued.

Fix the check to keep ops that are replayed ACKs, as we should make every
effort to reapply those even when the client goes away.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
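
A sketch of what such a helper might look like. `OpSketch` and `should_discard` are hypothetical names; the same predicate would be consulted when the op is received, (re)queued, and dequeued.

```cpp
// Hypothetical, simplified view of a queued client op.
struct OpSketch {
  bool client_connected;  // is the client's connection still up?
  bool is_replayed_ack;   // op is a replay of a previously ACKed request
};

// Discard ops from disconnected clients, but always keep replayed ACKs.
bool should_discard(const OpSketch& op) {
  if (op.is_replayed_ack) {
    return false;  // reapply these even if the client went away
  }
  return !op.client_connected;
}
```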
13 years agoosd: move queue checks into enqueue_op, kill _handle_ helpers
Sage Weil [Tue, 25 Oct 2011 05:13:59 +0000 (22:13 -0700)]
osd: move queue checks into enqueue_op, kill _handle_ helpers

This simplifies things, and renames the checks to make it clear that we are
doing validation checks only, with no side-effects allowed.

Also move some checks into the parent handle_op() to further simplify the
(re)queue checks.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: move op cap check into helper
Sage Weil [Tue, 25 Oct 2011 04:59:49 +0000 (21:59 -0700)]
osd: move op cap check into helper

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: drop useless PG hooks
Sage Weil [Tue, 25 Oct 2011 04:48:50 +0000 (21:48 -0700)]
osd: drop useless PG hooks

These no longer need to be exposed to the generic OSD code.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: drop ability to disable op queue entirely
Sage Weil [Tue, 25 Oct 2011 04:46:56 +0000 (21:46 -0700)]
osd: drop ability to disable op queue entirely

This is pretty useless, and broken wrt requeueing anyway.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: handle missing/degraded in op thread
Sage Weil [Tue, 25 Oct 2011 04:44:36 +0000 (21:44 -0700)]
osd: handle missing/degraded in op thread

The _handle_op() method (and friends) are called when an op is initially
queued and when it is requeued.  In the requeue case we have to be more
careful because the caller may be in the middle of doing all sorts of
random stuff.  That means we need to limit ourselves to queueing or
discarding the op, and refrain from doing anything else with dangerous
side effects.

This fixes a crash like

osd/ReplicatedPG.cc: In function 'void ReplicatedPG::recover_primary_got(hobject_t, eversion_t)', in thread '7f21d0189700'
osd/ReplicatedPG.cc: 4109: FAILED assert(missing.num_missing() == 0)
 ceph version 0.37-105-gc2069eb (commit:c2069eb1e562ba7d753c9b5ce5c904f4f5ef6abe)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0x8ab95a]
 2: (ReplicatedPG::recover_primary_got(hobject_t, eversion_t)+0x62e) [0x767eea]
 3: (ReplicatedPG::sub_op_push(MOSDSubOp*)+0x2b79) [0x76abeb]
 4: (ReplicatedPG::do_sub_op(MOSDSubOp*)+0x1ab) [0x74761b]
 5: (OSD::dequeue_op(PG*)+0x47d) [0x820ac3]
 6: (OSD::OpWQ::_process(PG*)+0x27) [0x82cc8b]

due to an object being pushed to a replica before it is activated.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: set reqid on push/pull ops
Sage Weil [Tue, 25 Oct 2011 03:54:26 +0000 (20:54 -0700)]
osd: set reqid on push/pull ops

Not strictly necessary, but makes logs easier to follow.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: remove compatset cruft
Sage Weil [Mon, 24 Oct 2011 04:21:33 +0000 (21:21 -0700)]
mon: remove compatset cruft

The CompatSet is built on demand; it's no longer static.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoFileStore: ignore EEXIST on clones and collection creation !btrfs_snap
Samuel Just [Mon, 24 Oct 2011 23:49:45 +0000 (16:49 -0700)]
FileStore: ignore EEXIST on clones and collection creation !btrfs_snap

We need to ignore EEXIST on btrfs also when m_filestore_btrfs_snap is
disabled.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: fix snapshot directory handling in snap_trimmer
Samuel Just [Mon, 24 Oct 2011 17:40:38 +0000 (10:40 -0700)]
ReplicatedPG: fix snapshot directory handling in snap_trimmer

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agorgw: fix rgw_obj compare function
Yehuda Sadeh [Mon, 24 Oct 2011 23:43:14 +0000 (16:43 -0700)]
rgw: fix rgw_obj compare function

13 years agorgw: use a uint64_t instead of a size_t for storing the size
Greg Farnum [Mon, 24 Oct 2011 22:22:35 +0000 (15:22 -0700)]
rgw: use a uint64_t instead of a size_t for storing the size

librados uses uint64_t so that 32-bit architectures aren't hobbled.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
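
To illustrate why the width matters: on a platform where size_t is 32 bits, a size above 4 GiB silently wraps. The sketch below uses uint32_t to stand in for such a size_t; `as_32bit_size` is a hypothetical helper, not part of rgw.

```cpp
#include <cstdint>

// Simulates storing a 64-bit object size in a 32-bit size_t.
std::uint32_t as_32bit_size(std::uint64_t size) {
  return static_cast<std::uint32_t>(size);  // keeps only the low 32 bits
}
```

A 5 GiB size stored this way comes back as 1 GiB, which is why a uint64_t is the safer choice for sizes.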
13 years agoworkunits: test rbd python bindings
Josh Durgin [Mon, 24 Oct 2011 20:55:49 +0000 (13:55 -0700)]
workunits: test rbd python bindings

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agorbd.py: update python bindings for new copy interface
Josh Durgin [Mon, 24 Oct 2011 19:57:53 +0000 (12:57 -0700)]
rbd.py: update python bindings for new copy interface

It was changed to return 0 on success in d7f7a213546b599d2eec4c6617593d232b43a7d6

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agolibrados: use stored snap context for all operations
Josh Durgin [Mon, 24 Oct 2011 19:38:01 +0000 (12:38 -0700)]
librados: use stored snap context for all operations

Using an empty snap context led to the failure of
test_rbd.TestImage.test_rollback_with_resize, since clones weren't
created when deleting objects. This test now passes.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agolibrbd: resize if necessary before rolling back
Josh Durgin [Mon, 24 Oct 2011 19:36:03 +0000 (12:36 -0700)]
librbd: resize if necessary before rolling back

This is a partial fix for test_rbd.TestImage.test_rollback_with_resize

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agotest_rbd: add a test for rolling back after resizing
Josh Durgin [Mon, 24 Oct 2011 19:33:59 +0000 (12:33 -0700)]
test_rbd: add a test for rolling back after resizing

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agolibrbd: propagate error from snap_set
Josh Durgin [Fri, 21 Oct 2011 21:33:30 +0000 (14:33 -0700)]
librbd: propagate error from snap_set

Previously rbd_snap_set always returned 0, even when the snapshot did
not exist.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
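
The difference can be sketched as below. `ImageSketch`, `snap_set_inner`, and the -ENOENT return code are assumptions made for illustration, not the librbd implementation.

```cpp
#include <cerrno>
#include <set>
#include <string>

// Hypothetical, simplified image context.
struct ImageSketch {
  std::set<std::string> snaps;
  std::string current_snap;
};

// Inner operation: fails if the snapshot does not exist.
int snap_set_inner(ImageSketch& img, const std::string& name) {
  if (img.snaps.count(name) == 0) {
    return -ENOENT;  // assumed error code
  }
  img.current_snap = name;
  return 0;
}

// Old behaviour: the inner result is dropped and 0 is always returned.
int snap_set_old(ImageSketch& img, const std::string& name) {
  snap_set_inner(img, name);
  return 0;
}

// Fixed behaviour: the caller sees the error.
int snap_set_fixed(ImageSketch& img, const std::string& name) {
  return snap_set_inner(img, name);
}
```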
13 years agoworkunits: add rbd rollback and snapshot removal tests
Josh Durgin [Fri, 21 Oct 2011 21:16:30 +0000 (14:16 -0700)]
workunits: add rbd rollback and snapshot removal tests

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agorbd: remove unnecessary condition
Josh Durgin [Fri, 21 Oct 2011 21:13:25 +0000 (14:13 -0700)]
rbd: remove unnecessary condition

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoclitests: add rbd usage and invalid snap usage tests
Josh Durgin [Fri, 21 Oct 2011 20:28:30 +0000 (13:28 -0700)]
clitests: add rbd usage and invalid snap usage tests

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoworkunit: check that rbd info returns the right size for snapshots
Josh Durgin [Fri, 21 Oct 2011 20:25:48 +0000 (13:25 -0700)]
workunit: check that rbd info returns the right size for snapshots

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agolibrbd: show correct size for snapshots
Josh Durgin [Fri, 21 Oct 2011 20:11:46 +0000 (13:11 -0700)]
librbd: show correct size for snapshots

header.size is the current size of the image.
ImageCtx::get_image_size() already does the right thing for
snapshots.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agorbd: let all commands use the pool/image@snapshot format
Josh Durgin [Fri, 21 Oct 2011 20:07:33 +0000 (13:07 -0700)]
rbd: let all commands use the pool/image@snapshot format

This way you aren't forced to use '-p' or '--snap' to specify a pool
or snapshot for some commands.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
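
A sketch of how a pool/image@snapshot string can be split into its parts. `ImageSpec` and `parse_image_spec` are hypothetical names, not the rbd tool's own parser; parts that are absent are simply left empty here, so the caller can still fall back to '-p' or '--snap'.

```cpp
#include <string>

// Hypothetical result type: empty fields mean "not given in the spec".
struct ImageSpec {
  std::string pool;
  std::string image;
  std::string snap;
};

// Split "pool/image@snapshot"; the pool and snapshot parts are optional.
ImageSpec parse_image_spec(const std::string& spec) {
  ImageSpec out;
  std::string rest = spec;

  const std::string::size_type slash = rest.find('/');
  if (slash != std::string::npos) {
    out.pool = rest.substr(0, slash);
    rest = rest.substr(slash + 1);
  }

  const std::string::size_type at = rest.find('@');
  if (at != std::string::npos) {
    out.snap = rest.substr(at + 1);
    rest = rest.substr(0, at);
  }

  out.image = rest;
  return out;
}
```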
13 years agorbd: specify which commands take --snap in usage
Josh Durgin [Tue, 18 Oct 2011 23:51:36 +0000 (16:51 -0700)]
rbd: specify which commands take --snap in usage

Maybe this will be less confusing.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agorbd: check command before opening the image
Josh Durgin [Tue, 18 Oct 2011 23:50:08 +0000 (16:50 -0700)]
rbd: check command before opening the image

Now map/unmap won't use librbd, and commands that don't take --snap
will give an error when it's used.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agotest/osd: Add TestReadWrite
Samuel Just [Mon, 24 Oct 2011 18:42:30 +0000 (11:42 -0700)]
test/osd: Add TestReadWrite

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agomon: allow adjustment of per-pool crash_replay_interval
Sage Weil [Mon, 24 Oct 2011 18:27:20 +0000 (11:27 -0700)]
mon: allow adjustment of per-pool crash_replay_interval

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge branch 'rgw-dir-cleanup'
Greg Farnum [Mon, 24 Oct 2011 17:12:50 +0000 (10:12 -0700)]
Merge branch 'rgw-dir-cleanup'

13 years agorgw: fix check_disk_state; add a strip_namespace function.
Greg Farnum [Thu, 20 Oct 2011 23:26:15 +0000 (16:26 -0700)]
rgw: fix check_disk_state; add a strip_namespace function.

Use copies of the IoCtx rather than references so that
we can set locators without breaking stuff, and make use of the
on-disk locators which we just added.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agorgw: add locators to the directory objects, and functions handling them
Greg Farnum [Fri, 21 Oct 2011 00:00:12 +0000 (17:00 -0700)]
rgw: add locators to the directory objects, and functions handling them

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agorgw: rename translate_raw_obj to translate_raw_obj_to_obj_in_ns
Greg Farnum [Thu, 20 Oct 2011 20:49:45 +0000 (13:49 -0700)]
rgw: rename translate_raw_obj to translate_raw_obj_to_obj_in_ns

And document it. Because the naming is so bad that neither I nor
the author noticed it wasn't doing what we wanted it to until I ran
a test and it failed.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agolibrados: behave if shutdown is called twice
Sage Weil [Mon, 24 Oct 2011 04:07:39 +0000 (21:07 -0700)]
librados: behave if shutdown is called twice

On failure, we shut ourselves down.  If the caller calls shutdown again,
don't crash.

Fixes: #1650
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
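
A sketch of an idempotent shutdown. `ClientSketch` is a hypothetical class, not the librados client; the flag makes a second call a no-op instead of tearing state down twice.

```cpp
// Hypothetical client whose connect() shuts itself down on failure.
class ClientSketch {
 public:
  bool connect(bool simulate_failure) {
    if (simulate_failure) {
      shutdown();  // on failure, we shut ourselves down
      return false;
    }
    connected_ = true;
    return true;
  }

  void shutdown() {
    if (shut_down_) {
      return;  // already shut down: nothing left to release
    }
    shut_down_ = true;
    connected_ = false;
    ++teardown_count_;  // stands in for releasing real resources
  }

  int teardown_count() const { return teardown_count_; }

 private:
  bool connected_ = false;
  bool shut_down_ = false;
  int teardown_count_ = 0;
};
```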
13 years agomon: need to print pool id for output to be useful
Sage Weil [Mon, 24 Oct 2011 04:05:56 +0000 (21:05 -0700)]
mon: need to print pool id for output to be useful

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: PGMap::dump: fix order in totals
Sage Weil [Mon, 24 Oct 2011 03:40:21 +0000 (20:40 -0700)]
mon: PGMap::dump: fix order in totals

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: make osd dump slightly more concise
Sage Weil [Mon, 24 Oct 2011 02:01:54 +0000 (19:01 -0700)]
osd: make osd dump slightly more concise

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: pg_pool_t: set crash_replay_interval on data pool when decoding old
Sage Weil [Sun, 23 Oct 2011 23:16:03 +0000 (16:16 -0700)]
osd: pg_pool_t: set crash_replay_interval on data pool when decoding old

We want to preserve the crash_replay_interval on old clusters being
upgraded.  Kludge this by setting it to 60 (the old default) if the
crush_ruleset == 0 and owner == 0, which is normally true for just the
data pool.

This may catch other pools they created by hand, but it's still better
than having the replay interval for all pools when it is not needed.

Signed-off-by: Sage Weil <sage@newdream.net>
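
The decode-time kludge can be sketched as below. `legacy_crash_replay_interval` is a hypothetical helper; only the value 60 and the two conditions come from the description above.

```cpp
#include <cstdint>

// When decoding an old pg_pool_t that has no crash_replay_interval field,
// guess that the pool is the data pool if crush_ruleset == 0 and owner == 0.
std::uint32_t legacy_crash_replay_interval(int crush_ruleset,
                                           std::uint64_t owner) {
  const std::uint32_t old_default = 60;  // the old default
  if (crush_ruleset == 0 && owner == 0) {
    return old_default;
  }
  return 0;  // other pools get no replay window
}
```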
13 years agoosd: make osd replay interval a per-pool property
Sage Weil [Sun, 23 Oct 2011 22:32:58 +0000 (15:32 -0700)]
osd: make osd replay interval a per-pool property

Change the config value to only control the interval set when the data
pool is first created (presumably during mkfs).  Start replay interval
based on the pool property.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote-tracking branch 'gh/master' into n
Sage Weil [Sun, 23 Oct 2011 23:26:35 +0000 (16:26 -0700)]
Merge remote-tracking branch 'gh/master' into n

Conflicts:
src/osd/OSDMap.h

13 years agoosd: pg_pool_t: introduce flags, crash_replay_interval
Sage Weil [Thu, 20 Oct 2011 04:54:40 +0000 (21:54 -0700)]
osd: pg_pool_t: introduce flags, crash_replay_interval

Introduce a per-pool crash_replay_interval so we can control whether
the OSD waits for replayed ACKed but not COMMITted requests for this
PG.  For the metadata and rbd pools, for instance, the replay window
is useless.

Introduce a generic flags field, while we're modifying the encoding.

No new feature bit; piggyback on POOL3.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: pg_pool_t: normalize encoding
Sage Weil [Thu, 20 Oct 2011 04:47:50 +0000 (21:47 -0700)]
osd: pg_pool_t: normalize encoding

Normalize encoding to be less awkward.  Use a FEATURE bit to indicate
whether the new encoding is supported, and encode appropriately.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoscratchtool[pp]: fix rados_conf_set/get test of log_to_stderr
Sage Weil [Sun, 23 Oct 2011 03:44:05 +0000 (20:44 -0700)]
scratchtool[pp]: fix rados_conf_set/get test of log_to_stderr

Fix this warning

warning: scratchtool.c:142: comparison with string literal results in unspecified behavior

and flip the logic.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
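
For context on the warning: comparing a char pointer to a string literal with == compares addresses, not contents. A content comparison goes through strcmp, as in this sketch (`value_is` is a hypothetical helper, not code from scratchtool).

```cpp
#include <cstring>

// Compare the characters of two C strings rather than their addresses.
bool value_is(const char* value, const char* expected) {
  return std::strcmp(value, expected) == 0;
}
```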
13 years agoosd: fix PG::Log::copy_after wrt backlogs (again)
Sage Weil [Sun, 23 Oct 2011 03:41:03 +0000 (20:41 -0700)]
osd: fix PG::Log::copy_after wrt backlogs (again)

Commit 68fe748fc2d703623050e8f2a448a0fd31ca8a0f fixed half of this problem,
but set this->tail incorrectly.  If we read olog.tail, the entry we are
on is a backlog entry, and probably not other.tail.  Do not reset tail in
this case because we already set it to other.tail above.

OTOH if we hit v, we do want to set this->tail to the current record as it
is the one that precedes the first log entry.

This fixes an incorrect log.tail sent to other nodes, which eventually
propagates as a log bound mismatch.  For example,

2011-10-22 17:33:18.654693 7f8a2fefe700 osd.4 2788 pg[1.1f( v 1627'28 (1627'28,1627'28] n=2 ec=1 les/c 2763/2782 2788/2788/2788) [4,0] r=0 mlcod 0'0 !hml peering] merge_log log(578'5,1627'28] from osd.0 into log(1627'28,1627'28]
2011-10-22 17:33:18.654706 7f8a2fefe700 osd.4 2788 pg[1.1f( v 1627'28 (1627'28,1627'28] n=2 ec=1 les/c 2763/2782 2788/2788/2788) [4,0] r=0 mlcod 0'0 !hml peering] merge_log extending tail to 578'5
2011-10-22 17:33:18.654720 7f8a2fefe700 osd.4 2788 pg[1.1f( v 1627'28 (578'5,1627'28] n=2 ec=1 les/c 2763/2782 2788/2788/2788) [4,0] r=0 (log bound mismatch, empty) mlcod 0'0 !hml peering] merge_log result log(578'5,1627'28] missing(0) changed=1

This might fix #1526.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoradosgw: drop useless/broken set_val daemonize
Sage Weil [Fri, 21 Oct 2011 23:36:08 +0000 (16:36 -0700)]
radosgw: drop useless/broken set_val daemonize

Not sure what the intent was here anyway... but it is broken (the func
takes a string, not a bool).

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoconfig: separate --log-to-stderr and --err-to-stderr
Sage Weil [Fri, 21 Oct 2011 23:35:36 +0000 (16:35 -0700)]
config: separate --log-to-stderr and --err-to-stderr

Instead of having magic values (1 == errors only to stderr, 2 =
everything), have two booleans.

Signed-off-by: Sage Weil <sage@newdream.net>
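
A sketch of how the old magic values map onto the two booleans. `StderrFlags` and `from_legacy_value` are hypothetical names, and treating 0 as "nothing to stderr" is an assumption.

```cpp
// Two independent switches instead of one magic integer.
struct StderrFlags {
  bool log_to_stderr;  // send everything to stderr
  bool err_to_stderr;  // send errors to stderr
};

// Old convention: 1 == errors only to stderr, 2 == everything.
StderrFlags from_legacy_value(int legacy) {
  StderrFlags flags{false, false};
  if (legacy >= 1) {
    flags.err_to_stderr = true;
  }
  if (legacy >= 2) {
    flags.log_to_stderr = true;
  }
  return flags;
}
```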
13 years agorgw: fix xattrs cache
Yehuda Sadeh [Fri, 21 Oct 2011 23:14:11 +0000 (16:14 -0700)]
rgw: fix xattrs cache

13 years agoosd: trim past intervals when we complete recovery.
Sage Weil [Fri, 21 Oct 2011 22:24:18 +0000 (15:24 -0700)]
osd: trim past intervals when we complete recovery.

We weren't trimming at all, which meant these would just accumulate
indefinitely.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: move may_need_replay calculation out of PriorSet
Sage Weil [Fri, 21 Oct 2011 22:23:51 +0000 (15:23 -0700)]
osd: move may_need_replay calculation out of PriorSet

Although they both depend on past intervals, they are unrelated.  Factor
out the may_need_replay calculation from PriorSet.  Instead, do it right
before we activate when we need to decide whether to do a replay window
or not.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix last_clean interval bounds
Sage Weil [Fri, 21 Oct 2011 22:02:34 +0000 (15:02 -0700)]
osd: fix last_clean interval bounds

It was _first and _last, inclusive, but the epochs are really points in
time, so _last should have been non-inclusive.  Rename the variables
_begin and _end, print them as proper intervals [begin,end), and fix the
PriorSet calculation to interpret the end bound properly.

Also break that check out into separate cases so that it is clear what is
really happening.

Signed-off-by: Sage Weil <sage@newdream.net>
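
The half-open convention can be sketched as follows. `EpochInterval`, `contains`, and `format_interval` are hypothetical names used only to show the non-inclusive end bound.

```cpp
#include <cstdint>
#include <string>

// Half-open epoch interval [begin, end): end itself is not part of it.
struct EpochInterval {
  std::uint32_t begin;
  std::uint32_t end;
};

bool contains(const EpochInterval& i, std::uint32_t epoch) {
  return i.begin <= epoch && epoch < i.end;
}

// Print in interval notation so the open end bound is visible.
std::string format_interval(const EpochInterval& i) {
  return "[" + std::to_string(i.begin) + "," + std::to_string(i.end) + ")";
}
```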
13 years agomon: fix last_clean_interval calculation
Sage Weil [Fri, 21 Oct 2011 21:45:59 +0000 (14:45 -0700)]
mon: fix last_clean_interval calculation

This up_from == first check is old and wrong.  It may have been correct at
the time, when the OSD had a defined shutdown procedure, but that is not
currently the case.  And if/when it is, the OSD can simply provide an
accurate clean_thru value.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: eliminate CRASHED state
Sage Weil [Fri, 21 Oct 2011 21:44:56 +0000 (14:44 -0700)]
osd: eliminate CRASHED state

This was an intermediate state that indicated that replay would be needed.
It was poorly named, and not very useful.  Instead, just set the REPLAY
bit if we need replay, and then do it.  No need for a separate CRASHED.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoReplicatedPG: Include pg version in MOSDOpReply on error
Samuel Just [Fri, 21 Oct 2011 22:14:43 +0000 (15:14 -0700)]
ReplicatedPG: Include pg version in MOSDOpReply on error

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agorgw: reduce rados bucket stats (and getxattrs)
Yehuda Sadeh [Fri, 21 Oct 2011 20:23:40 +0000 (13:23 -0700)]
rgw: reduce rados bucket stats (and getxattrs)

We didn't pass the context, and there was another issue with the context map.

13 years agorgw: object removal should remove object from index anyway
Yehuda Sadeh [Fri, 21 Oct 2011 17:32:54 +0000 (10:32 -0700)]
rgw: object removal should remove object from index anyway

even if the object doesn't exist. The index might have the wrong info.

13 years agoosd: simplify finalizing scrub on replica
Sage Weil [Fri, 21 Oct 2011 16:56:19 +0000 (09:56 -0700)]
osd: simplify finalizing scrub on replica

We can simply call osr.flush() (with pg lock held) to ensure that prior
writes are visible and scrubbable.  This avoids the funky handoff to
op_applied() (which didn't seem to work for me just now, although I didn't
fully debug it).

In any case, this is much simpler.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PriorSet: acting/up membership implies still alive
Sage Weil [Fri, 21 Oct 2011 16:14:15 +0000 (09:14 -0700)]
osd: PriorSet: acting/up membership implies still alive

If the osd is in the acting or up sets, we can assume it is still alive,
even though we don't know that for sure, because if it is not, we will
rebuild PriorSet.

Note that we have a dependency here on up_thru that we could/should rebuild
PriorSet based on, IF we think it might change the value of the CRASHED
flag and IF we care enough.  Right now we don't.  Marking CRASHED when we
don't need to is conservative, and not dangerous.

Signed-off-by: Sage Weil <sage@newdream.net>