git.apps.os.sepia.ceph.com Git - ceph.git/log
13 years ago crush: remove parent maps
Sage Weil [Fri, 27 Apr 2012 05:12:40 +0000 (22:12 -0700)]
crush: remove parent maps

These were used (poorly) for forcefeeding, but they are useless now.  Which
is good, because we allow items to appear in multiple trees, which means
they have no single parent.  Good riddance!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago crush: remove forcefeed from crush_do_rule
Sage Weil [Fri, 27 Apr 2012 05:10:24 +0000 (22:10 -0700)]
crush: remove forcefeed from crush_do_rule

Remove forcefeed functionality from CRUSH.  This is an ugly misfeature that
is mostly useless.  Remove it.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago Merge remote branch 'gh/wip-filestore-misc'
Sage Weil [Sat, 28 Apr 2012 23:25:31 +0000 (16:25 -0700)]
Merge remote branch 'gh/wip-filestore-misc'

Conflicts:
src/test/filestore/run_seed_to.sh

13 years ago Merge remote branch 'gh/wip-2353'
Sage Weil [Sat, 28 Apr 2012 22:53:35 +0000 (15:53 -0700)]
Merge remote branch 'gh/wip-2353'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago Merge branch 'wip-osdmap'
Sage Weil [Sat, 28 Apr 2012 22:25:20 +0000 (15:25 -0700)]
Merge branch 'wip-osdmap'

Conflicts:
src/mon/PGMonitor.cc
src/osd/OSDMap.h

13 years ago fix file_layout.sh layouts test
Sage Weil [Sat, 28 Apr 2012 21:52:56 +0000 (14:52 -0700)]
fix file_layout.sh layouts test

preferred_osd is now gone.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago Merge branch 'wip-mon'
Sage Weil [Sat, 28 Apr 2012 21:48:51 +0000 (14:48 -0700)]
Merge branch 'wip-mon'

Reviewed-by: Gregory Farnum <gregory.farnum@dreamhost.com>
13 years ago mon: 'osd [un]set noin'
Sage Weil [Sat, 28 Apr 2012 21:48:26 +0000 (14:48 -0700)]
mon: 'osd [un]set noin'

Missed this one.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago Merge branch 'next'
Sage Weil [Sat, 28 Apr 2012 21:47:53 +0000 (14:47 -0700)]
Merge branch 'next'

13 years ago Stop rebuild of libcommon.la on "make dist"
Dan Mick [Sat, 28 Apr 2012 01:04:34 +0000 (18:04 -0700)]
Stop rebuild of libcommon.la on "make dist"

Fixes: 2356
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years ago mon: limit size of MOSDMap message sent as reply
Sage Weil [Fri, 27 Apr 2012 04:29:53 +0000 (21:29 -0700)]
mon: limit size of MOSDMap message sent as reply

We may send an MOSDMap as a reply to various requests, including

 - a failure report
 - a boot message
 - a pg_temp message
 - an up_thru message

In these cases, send a single MOSDMap message, but limit how big it gets.
All recipients here are osds, which are smart enough to request more maps
based on the MOSDMap::newest_map field.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
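
Illustration for the commit above (not part of the commit): a minimal sketch of capping one reply, assuming the cap is expressed as a number of epochs. MapReplySketch, build_reply_sketch and max_maps are hypothetical names; only the newest_map field is taken from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using epoch_t = uint32_t;

// Hypothetical summary of what a size-limited reply would carry.
struct MapReplySketch {
  epoch_t first;       // first epoch included in this message
  epoch_t last;        // last epoch included in this message
  epoch_t newest_map;  // newest epoch the monitor has; tells the OSD more exist
};

// Build one reply covering [from, newest], capped at max_maps epochs.
MapReplySketch build_reply_sketch(epoch_t from, epoch_t newest, epoch_t max_maps) {
  assert(max_maps > 0 && from <= newest);
  const epoch_t last = std::min<epoch_t>(newest, from + max_maps - 1);
  return MapReplySketch{from, last, newest};
}
```

A recipient that sees last < newest_map would then request the remaining epochs in a follow-up message.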
13 years ago ceph-object-corpus: revert rewind
Sage Weil [Sat, 28 Apr 2012 14:45:24 +0000 (07:45 -0700)]
ceph-object-corpus: revert rewind

From 92becb696bde7f0aa9687b2fe7505ed1ac9f493b

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago osdmap: fix addr dedup check
Sage Weil [Sat, 28 Apr 2012 03:54:50 +0000 (20:54 -0700)]
osdmap: fix addr dedup check

Compare *every* address for a match, or else note that it is (or might be)
different.  Previously, we falsely took diff==0 to mean that all addrs
were definitely equal, which was not necessarily the case.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago osd: fix bad map debug messages
Sage Weil [Sat, 28 Apr 2012 04:48:31 +0000 (21:48 -0700)]
osd: fix bad map debug messages

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago Stop rebuild of libcommon.la on "make dist"
Dan Mick [Sat, 28 Apr 2012 01:04:34 +0000 (18:04 -0700)]
Stop rebuild of libcommon.la on "make dist"

Fixes: 2356
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years ago filestore: fix error message
Yehuda Sadeh [Fri, 27 Apr 2012 23:05:36 +0000 (16:05 -0700)]
filestore: fix error message

error message was misleading, fixing it.

Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years ago filestore: first lock osd mount point, next detect fs type
Yehuda Sadeh [Fri, 27 Apr 2012 22:46:49 +0000 (15:46 -0700)]
filestore: first lock osd mount point, next detect fs type

Fixes #2353. Problem was that there were (at least) two osd processes
that were racing for the fs detection, which triggered some errors
in the btrfs create/remove snapshot.

Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years ago OSD: use map bl cache pinning during handle_osd_map
Samuel Just [Fri, 27 Apr 2012 17:00:36 +0000 (10:00 -0700)]
OSD: use map bl cache pinning during handle_osd_map

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago simple_cache.hpp: add pinning
Samuel Just [Fri, 27 Apr 2012 17:00:08 +0000 (10:00 -0700)]
simple_cache.hpp: add pinning

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago Merge branch 'next'
Samuel Just [Fri, 27 Apr 2012 21:00:09 +0000 (14:00 -0700)]
Merge branch 'next'

13 years ago FileJournal: simply flush by waiting for completions to empty
Samuel Just [Fri, 27 Apr 2012 04:29:45 +0000 (21:29 -0700)]
FileJournal: simply flush by waiting for completions to empty

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago PG: in GetInfo Notify handler, fix peer_info_requested filter
Samuel Just [Fri, 27 Apr 2012 18:25:19 +0000 (11:25 -0700)]
PG: in GetInfo Notify handler, fix peer_info_requested filter

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago Merge branch 'wip-lpg'
Sage Weil [Fri, 27 Apr 2012 04:57:23 +0000 (21:57 -0700)]
Merge branch 'wip-lpg'

Conflicts:
src/osd/OSDMap.h

13 years ago Merge branch 'next'
Sage Weil [Fri, 27 Apr 2012 04:53:36 +0000 (21:53 -0700)]
Merge branch 'next'

13 years ago librados: test get/set of debug levels
Sage Weil [Fri, 27 Apr 2012 04:51:55 +0000 (21:51 -0700)]
librados: test get/set of debug levels

Also do some sanity checks on the subsystem log level settings.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago config: allow {get,set}_val on subsystem debug levels
Sage Weil [Fri, 27 Apr 2012 04:51:23 +0000 (21:51 -0700)]
config: allow {get,set}_val on subsystem debug levels

This allows you to get and set subsystem debug levels via the
normal config access methods.  Among other things, this allows librados
users to set debug levels.

Fixes: #2350
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago OSD.cc: track osdmap refs using an LRU
Samuel Just [Fri, 27 Apr 2012 00:58:59 +0000 (17:58 -0700)]
OSD.cc: track osdmap refs using an LRU

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago common/: added templated simple lru implementations
Samuel Just [Wed, 25 Apr 2012 23:58:33 +0000 (16:58 -0700)]
common/: added templated simple lru implementations

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago osdmap: dedup pg_temp
Sage Weil [Thu, 26 Apr 2012 18:12:11 +0000 (11:12 -0700)]
osdmap: dedup pg_temp

We only deal with the case where the entire map is identical, since the
individual items are too small to make the pointer overhead worthwhile.
Too bad.  An in-memory btree-like structure would work better for this.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osdmap: use shared_ptr<> for pg_temp
Sage Weil [Thu, 26 Apr 2012 18:01:06 +0000 (11:01 -0700)]
osdmap: use shared_ptr<> for pg_temp

This will let us dedup later.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osd: make map dedup optional
Sage Weil [Thu, 26 Apr 2012 22:50:27 +0000 (15:50 -0700)]
osd: make map dedup optional

On by default.  This trades CPU for memory.  Some might have unlimited RAM
and not care.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osd: dedup osdmaps when added to the in-memory cache
Sage Weil [Wed, 25 Apr 2012 23:40:11 +0000 (16:40 -0700)]
osd: dedup osdmaps when added to the in-memory cache

When we add an OSDMap to our in-memory cache, dedup against an existing map
at a nearby epoch.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osdmap: drop obsolete PG_ROLE_* constants
Sage Weil [Wed, 25 Apr 2012 23:22:14 +0000 (16:22 -0700)]
osdmap: drop obsolete PG_ROLE_* constants

These are cruft from the old primary/chain/splay replication code.  All
current code says <0 is stray, 0 is primary, and >0 is replica.  That is,
the role is the acting vector position, or -1 if not in the vector.

Signed-off-by: Sage Weil <sage@newdream.net>
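
Illustration for the commit above (not part of the commit): a minimal sketch of the role convention it describes, where the role is the position in the acting vector or -1 when absent. calc_role_sketch is a hypothetical name, and the assumption that OSD ids are non-negative is mine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Role of `osd` for a PG with the given acting vector:
// 0 = primary, >0 = replica, -1 = stray (not in the vector).
int calc_role_sketch(int osd, const std::vector<int>& acting) {
  assert(osd >= 0);  // assumption: valid OSD ids are non-negative
  for (std::size_t i = 0; i < acting.size(); ++i) {
    if (acting[i] == osd) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```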
13 years ago buffer: make contents_equal() more efficient
Sage Weil [Wed, 25 Apr 2012 23:10:30 +0000 (16:10 -0700)]
buffer: make contents_equal() more efficient

Iterate both lists in parallel in terms of buffers, and use memcmp() to
do the comparison.

Signed-off-by: Sage Weil <sage@newdream.net>
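
Illustration for the commit above (not part of the commit): a minimal sketch of comparing two segmented buffers in parallel with memcmp(), without flattening either one. SegmentList is a hypothetical stand-in (a vector of strings), not Ceph's bufferlist API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a buffer list: a sequence of contiguous segments.
using SegmentList = std::vector<std::string>;

static std::size_t total_length(const SegmentList& l) {
  std::size_t n = 0;
  for (const auto& s : l) n += s.size();
  return n;
}

// Compare two segment lists byte-for-byte, one overlapping run at a time.
bool contents_equal_sketch(const SegmentList& a, const SegmentList& b) {
  if (total_length(a) != total_length(b)) return false;
  std::size_t ai = 0, bi = 0;  // current segment in each list
  std::size_t ao = 0, bo = 0;  // offset within the current segment
  while (ai < a.size() && bi < b.size()) {
    // Step past exhausted (or empty) segments.
    if (ao == a[ai].size()) { ++ai; ao = 0; continue; }
    if (bo == b[bi].size()) { ++bi; bo = 0; continue; }
    // Longest run available in both current segments.
    const std::size_t len = std::min(a[ai].size() - ao, b[bi].size() - bo);
    assert(len > 0);
    if (std::memcmp(a[ai].data() + ao, b[bi].data() + bo, len) != 0) return false;
    ao += len;
    bo += len;
  }
  return true;  // equal total lengths and every compared run matched
}
```

Segment boundaries do not need to line up between the two lists; only the byte content matters.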
13 years ago osdmap: dedup crush map
Sage Weil [Wed, 25 Apr 2012 23:01:32 +0000 (16:01 -0700)]
osdmap: dedup crush map

If the encoded crush map is identical between two versions, share the
reference.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osdmap: use shared_ptr for CrushWrapper
Sage Weil [Wed, 25 Apr 2012 22:59:24 +0000 (15:59 -0700)]
osdmap: use shared_ptr for CrushWrapper

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osdmaptool: kludge to load a range of maps into memory
Sage Weil [Wed, 25 Apr 2012 22:44:24 +0000 (15:44 -0700)]
osdmaptool: kludge to load a range of maps into memory

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osdmap: dedup addrs and addr vectors between maps
Sage Weil [Wed, 25 Apr 2012 22:44:06 +0000 (15:44 -0700)]
osdmap: dedup addrs and addr vectors between maps

Compare two maps.  If an addrs matches, share the reference.  If all
addrs match, share the entire vector.

This leads to roughly 70% drop in memory utilization for the set of
thrashed maps I'm working with.

Signed-off-by: Sage Weil <sage@newdream.net>
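
Illustration for the commit above (not part of the commit): a minimal sketch of the dedup step, assuming addresses are held as shared_ptr entries in a shared vector. Addr, AddrVec and dedup_addrs_sketch are hypothetical names. Every entry is compared explicitly, and the whole vector is shared only if no entry differed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Addr = std::string;  // hypothetical stand-in for an OSD address
using AddrVec = std::vector<std::shared_ptr<Addr>>;

// Make `newer` share storage with `older` wherever the addresses are equal.
void dedup_addrs_sketch(const std::shared_ptr<AddrVec>& older,
                        std::shared_ptr<AddrVec>& newer) {
  assert(older && newer);
  if (older == newer) return;  // already sharing the whole vector
  bool all_match = older->size() == newer->size();
  const std::size_t n = std::min(older->size(), newer->size());
  for (std::size_t i = 0; i < n; ++i) {
    const std::shared_ptr<Addr>& o = (*older)[i];
    std::shared_ptr<Addr>& nw = (*newer)[i];
    if (o && nw && *o == *nw) {
      nw = o;  // identical address: share the reference
    } else if (o || nw) {
      all_match = false;  // differs (or only one side is set)
    }
  }
  if (all_match) {
    newer = older;  // nothing differed: share the entire vector
  }
}
```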
13 years ago Merge branch 'next'
Josh Durgin [Fri, 27 Apr 2012 00:54:56 +0000 (17:54 -0700)]
Merge branch 'next'

13 years ago osdmap: filter out nonexistent osds from map
Sage Weil [Fri, 27 Apr 2012 00:42:12 +0000 (17:42 -0700)]
osdmap: filter out nonexistent osds from map

It is possible that the crush map contains device ids that do not exist as
osds.  Filter them out of the CRUSH result.

Drop the max devices assert, as that is trivially violated by adding a new
item to the crush map beyond max_osd (via 'ceph osd crush add ...').

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago librbd: the length argument of aio_discard should be uint64_t
Josh Durgin [Fri, 27 Apr 2012 00:41:27 +0000 (17:41 -0700)]
librbd: the length argument of aio_discard should be uint64_t

size_t was accidentally copy-pasted.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years ago filestore: interpret any fiemap error as EOPNOTSUPP
Sage Weil [Fri, 27 Apr 2012 00:17:32 +0000 (17:17 -0700)]
filestore: interpret any fiemap error as EOPNOTSUPP

On 2.6.32-5-amd64 (debian) and XFS I'm getting EINVAL.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago osdmap: use shared_ptr for addrs, addr vectors
Sage Weil [Wed, 25 Apr 2012 22:10:34 +0000 (15:10 -0700)]
osdmap: use shared_ptr for addrs, addr vectors

We share a lot of identical addresses between map versions because they
don't tend to change very often.  Instead of having a separate copy for
every map, use shared_ptr and share references.  Also use a reference for
the entire addr vector(s) in case no addrs differ at all.

Create new encode/decode macros for vector< shared_ptr<T> >.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osdmap: uninline a bunch of stuff
Sage Weil [Wed, 25 Apr 2012 21:55:18 +0000 (14:55 -0700)]
osdmap: uninline a bunch of stuff

This will conflict with wip-lpg, yay.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: consider pending_inc in {up,in}_ratio for can_mark_{out,down}()
Sage Weil [Thu, 26 Apr 2012 23:45:56 +0000 (16:45 -0700)]
mon: consider pending_inc in {up,in}_ratio for can_mark_{out,down}()

Consider pending changes when calculating the current up/in ratios.  Among
other things, this will make the marking of osds down->out stop once it
hits the min in ratio.

Signed-off-by: Sage Weil <sage@newdream.net>
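
Illustration for the commit above (not part of the commit): a minimal sketch of an up-ratio check that also counts pending mark-downs. OsdStateSketch and its fields are hypothetical; min_up_ratio stands in for 'mon osd min up ratio', and the strict "less than" comparison is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical snapshot of what the monitor would look at.
struct OsdStateSketch {
  int num_osds = 0;            // OSDs that exist in the map
  std::set<int> up;            // OSDs up in the committed map
  std::set<int> pending_down;  // OSDs already marked down in the pending change
  double min_up_ratio = 0.0;   // stand-in for 'mon osd min up ratio'
};

// Returns false, and sets *why, if marking another OSD down is not allowed.
bool can_mark_down_sketch(const OsdStateSketch& s, std::string* why) {
  assert(why != nullptr && s.num_osds > 0);
  int up = 0;
  for (int osd : s.up) {
    if (s.pending_down.count(osd) == 0) ++up;  // pending mark-downs do not count as up
  }
  const double up_ratio = static_cast<double>(up) / s.num_osds;
  if (up_ratio < s.min_up_ratio) {
    *why = "up ratio is below the configured minimum";
    return false;
  }
  return true;
}
```

Without the pending_down adjustment the ratio would only change once the pending map is committed, so several OSDs could be marked down in one round past the limit. A minimum of 0 never blocks, since the ratio is never negative.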
13 years ago mon: thrash pg_temp mapping, too
Sage Weil [Wed, 25 Apr 2012 20:25:30 +0000 (13:25 -0700)]
mon: thrash pg_temp mapping, too

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago filestore: fix a journal replay issue with collection_add()
Joao Eduardo Luis [Thu, 26 Apr 2012 23:31:55 +0000 (00:31 +0100)]
filestore: fix a journal replay issue with collection_add()

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago filestore: fix a journal replay issue with collection_add()
Joao Eduardo Luis [Thu, 26 Apr 2012 23:31:55 +0000 (00:31 +0100)]
filestore: fix a journal replay issue with collection_add()

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago osd: filter osds removed from probe set from peer_info_requested
Sage Weil [Wed, 25 Apr 2012 20:07:34 +0000 (13:07 -0700)]
osd: filter osds removed from probe set from peer_info_requested

peer_info_requested should be a strict subset of the probe set.  Filter
osds that are dropped from probe from peer_info_requested.  We could also
restart peering from scratch here, but this is less expensive, because we
don't have to re-probe everyone.

Once we adjust the probe and peer_info_requested sets, (re)check if we're
done: we may have been blocked on a previous peer_info_requested entry.

The situation I saw was:

  "recovery_state": [
        { "name": "Started\/Primary\/Peering\/GetInfo",
          "enter_time": "2012-04-25 14:39:56.905748",
          "requested_info_from": [
                { "osd": 193}]},
        { "name": "Started\/Primary\/Peering",
          "enter_time": "2012-04-25 14:39:56.905748",
          "probing_osds": [
                79,
                191,
                195],
          "down_osds_we_would_probe": [],
          "peering_blocked_by": []},
        { "name": "Started",
          "enter_time": "2012-04-25 14:39:56.905742"}]}

Once in this state, cycling osd.193 doesn't help, because the prior_set
is not affected.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago Merge branch 'next'
Samuel Just [Thu, 26 Apr 2012 22:53:27 +0000 (15:53 -0700)]
Merge branch 'next'

13 years ago PG: get_infos() should not post GotInfo
Samuel Just [Thu, 26 Apr 2012 22:44:21 +0000 (15:44 -0700)]
PG: get_infos() should not post GotInfo

The MNotifyRec handler also posts GotInfo under the same conditions
after calling get_infos().

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago Revert "PG: whitelist MNotifyRec in started"
Samuel Just [Thu, 26 Apr 2012 22:38:42 +0000 (15:38 -0700)]
Revert "PG: whitelist MNotifyRec in started"

This reverts commit 9579365720818125a4b15741ae65e58948b9c69f.

13 years ago test_librbd: rollback when mapped to a snapshot should fail
Josh Durgin [Thu, 26 Apr 2012 18:33:56 +0000 (11:33 -0700)]
test_librbd: rollback when mapped to a snapshot should fail

Rollback is effectively a write, and returns -EROFS when mapped to a
snapshot since 3ef3ab8a15b4a80a340ac6039f395738223df759.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago workload_generator: get rid of our lock.
Joao Eduardo Luis [Thu, 26 Apr 2012 19:19:59 +0000 (20:19 +0100)]
workload_generator: get rid of our lock.

We don't need the lock in the WorkloadGenerator class. Everything that does
need a lock is handled by TestFileStoreState, and all that remains can be
handled by an atomic_t.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago TestFileStoreState: make 'm_in_flight' var an atomic_t.
Joao Eduardo Luis [Thu, 26 Apr 2012 19:18:28 +0000 (20:18 +0100)]
TestFileStoreState: make 'm_in_flight' var an atomic_t.

This allows us to increase, decrease and retrieve its value without the
need to lock the class.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago Merge branch 'next'
Samuel Just [Thu, 26 Apr 2012 17:51:34 +0000 (10:51 -0700)]
Merge branch 'next'

13 years ago PG: whitelist MNotifyRec in started
Samuel Just [Thu, 26 Apr 2012 17:39:04 +0000 (10:39 -0700)]
PG: whitelist MNotifyRec in started

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago RefCountedObject: fix constructor warning
Samuel Just [Thu, 26 Apr 2012 17:38:45 +0000 (10:38 -0700)]
RefCountedObject: fix constructor warning

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago workload_generator: specify number of ops to run, or 0 to run forever.
Joao Eduardo Luis [Thu, 26 Apr 2012 16:00:45 +0000 (17:00 +0100)]
workload_generator: specify number of ops to run, or 0 to run forever.

New option '--test-num-ops VAL' -- if (VAL == 0) then run forever; fi

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago workload_generator: Delegate store tracking to TestFileStoreState.
Joao Eduardo Luis [Thu, 26 Apr 2012 15:58:38 +0000 (16:58 +0100)]
workload_generator: Delegate store tracking to TestFileStoreState.

We had a lot of duplicate code between the WorkloadGenerator and the
TestFileStoreState classes, and the last one is far more versatile than
what we initially had in the WorkloadGenerator. Therefore, delegate
everything we can to the TestFileStoreState class.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago TestFileStoreState: Fix issues affecting proper behavior when inherited.
Joao Eduardo Luis [Thu, 26 Apr 2012 15:29:50 +0000 (16:29 +0100)]
TestFileStoreState: Fix issues affecting proper behavior when inherited.

Fix wait_for_ready() and make the C_OnFinished class' member variables
protected instead of private (to allow proper inheritance).

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago Makefile.am: test_filestore_workloadgen doesn't need gtests lib.
Joao Eduardo Luis [Thu, 26 Apr 2012 15:26:43 +0000 (16:26 +0100)]
Makefile.am: test_filestore_workloadgen doesn't need gtests lib.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago Merge branch 'wip-2342'
Yehuda Sadeh [Wed, 25 Apr 2012 22:23:34 +0000 (15:23 -0700)]
Merge branch 'wip-2342'

13 years ago RefCountedObject: relocate from msg/Message.h to common/RefCountedObj.h
Yehuda Sadeh [Wed, 25 Apr 2012 22:12:56 +0000 (15:12 -0700)]
RefCountedObject: relocate from msg/Message.h to common/RefCountedObj.h

Following a popular request.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years ago librados: call notification under different thread context
Yehuda Sadeh [Tue, 24 Apr 2012 22:51:14 +0000 (15:51 -0700)]
librados: call notification under different thread context

This fixes #2342. We shouldn't call notify on the dispatcher
context. We should also make sure that we don't hold
the client lock while waiting for the responses.
Also, pushed the client_lock locking into the
ctx->notify().

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years ago mon: 'osd thrash <num epochs>'
Sage Weil [Wed, 25 Apr 2012 16:23:49 +0000 (09:23 -0700)]
mon: 'osd thrash <num epochs>'

Thrash the osdmap for N iterations.  Randomly mark OSDs up, down, in, out,
and up_thru in order to generate a difficult osdmap history for peering
to chew through.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osd: filter osds removed from probe set from peer_info_requested
Sage Weil [Wed, 25 Apr 2012 20:07:34 +0000 (13:07 -0700)]
osd: filter osds removed from probe set from peer_info_requested

peer_info_requested should be a strict subset of the probe set.  Filter
osds that are dropped from probe from peer_info_requested.  We could also
restart peering from scratch here, but this is less expensive, because we
don't have to re-probe everyone.

Once we adjust the probe and peer_info_requested sets, (re)check if we're
done: we may have been blocked on a previous peer_info_requested entry.

The situation I saw was:

  "recovery_state": [
        { "name": "Started\/Primary\/Peering\/GetInfo",
          "enter_time": "2012-04-25 14:39:56.905748",
          "requested_info_from": [
                { "osd": 193}]},
        { "name": "Started\/Primary\/Peering",
          "enter_time": "2012-04-25 14:39:56.905748",
          "probing_osds": [
                79,
                191,
                195],
          "down_osds_we_would_probe": [],
          "peering_blocked_by": []},
        { "name": "Started",
          "enter_time": "2012-04-25 14:39:56.905742"}]}

Once in this state, cycling osd.193 doesn't help, because the prior_set
is not affected.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years ago mon: add 'mon osd min up ratio' and 'mon osd min in ratio'
Sage Weil [Wed, 25 Apr 2012 00:21:27 +0000 (17:21 -0700)]
mon: add 'mon osd min up ratio' and 'mon osd min in ratio'

Prevent the monitor from marking osds down or out when too many are already
in that state.  At this point the cluster is already broken and there is
little point in continuing to mark things down/out.

Setting these to 0 obviously disables the feature (by setting a minimum
of 0).

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: use can_mark_*() helpers
Sage Weil [Wed, 25 Apr 2012 18:15:34 +0000 (11:15 -0700)]
mon: use can_mark_*() helpers

So we can generalize beyond NO* flags.  We'll soon be adding other reasons
to not mark things up/down/in/out.  This lets us keep all those checks in
one place.

The helper methods will tell us why we can't do the thing (e.g., "NODOWN
flag is set").  The callers will generally tell us exactly what didn't
happen (e.g., "failure report of X ignored").

Signed-off-by: Sage Weil <sage@newdream.net>
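
Illustration for the commit above (not part of the commit): a minimal sketch of the split between a helper that reports why an action is refused and a caller that reports what did not happen. The flag bit values and all function names are hypothetical; only the two example messages come from the commit text.

```cpp
#include <cassert>
#include <string>

// Hypothetical bit values; the real constants live in Ceph's headers.
enum OsdMapFlagSketch : unsigned {
  FLAG_NOUP = 1u << 0,
  FLAG_NODOWN = 1u << 1,
  FLAG_NOIN = 1u << 2,
  FLAG_NOOUT = 1u << 3,
};

// Helper: says whether marking down is allowed and, if not, why.
bool can_mark_down_sketch(unsigned flags, std::string* why) {
  assert(why != nullptr);
  if (flags & FLAG_NODOWN) {
    *why = "NODOWN flag is set";
    return false;
  }
  return true;
}

// Caller: says exactly what did not happen, reusing the helper's reason.
std::string handle_failure_report_sketch(unsigned flags, int osd) {
  std::string why;
  if (!can_mark_down_sketch(flags, &why)) {
    return "failure report of osd." + std::to_string(osd) + " ignored: " + why;
  }
  return "osd." + std::to_string(osd) + " marked down";
}
```

Further reasons to refuse (such as a ratio limit) can then be added inside the helper without touching the callers.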
13 years ago DeterministicOpSequence: add 'ceph_asserts()' where we expect != NULL.
Joao Eduardo Luis [Wed, 25 Apr 2012 15:27:41 +0000 (16:27 +0100)]
DeterministicOpSequence: add 'ceph_asserts()' where we expect != NULL.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago TestFileStoreState: distinguish between 'get_coll()' and 'get_coll_at()'
Joao Eduardo Luis [Wed, 25 Apr 2012 15:26:32 +0000 (16:26 +0100)]
TestFileStoreState: distinguish between 'get_coll()' and 'get_coll_at()'

get_coll_at(int pos) should return the collection at the map's position
'pos', but 'pos' was being used as a map key. Therefore, we add a new
function 'get_coll(int key)' to mimic this behavior, and we make
'get_coll_at()' follow its intended behavior.

This patch may affect the test_filestore_idempotent_sequence tester, since
it uses the 'get_coll_at()' function a lot, and we changed this function's
behavior.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
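
Illustration for the commit above (not part of the commit): a minimal sketch of the difference between lookup by key and lookup by position in an ordered map. CollMap and the two *_sketch functions are hypothetical stand-ins for the test class's members, with collections reduced to strings.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

using CollMap = std::map<int, std::string>;  // key -> collection name (stand-in)

// Lookup by map key; returns nullptr if the key is absent.
const std::string* get_coll_sketch(const CollMap& m, int key) {
  auto it = m.find(key);
  return it == m.end() ? nullptr : &it->second;
}

// Lookup by position in iteration order; returns nullptr if out of range.
const std::string* get_coll_at_sketch(const CollMap& m, int pos) {
  assert(pos >= 0);
  if (static_cast<std::size_t>(pos) >= m.size()) return nullptr;
  auto it = m.begin();
  std::advance(it, pos);
  return &it->second;
}
```

With keys 5 and 9 in the map, position 0 refers to the entry with key 5, while key 0 does not exist at all, which is the confusion the commit resolves.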
13 years ago run_seed_to.sh: Add valgrind support.
Joao Eduardo Luis [Wed, 25 Apr 2012 14:35:03 +0000 (15:35 +0100)]
run_seed_to.sh: Add valgrind support.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago TestFileStoreState: free memory on terminus.
Joao Eduardo Luis [Wed, 25 Apr 2012 14:34:39 +0000 (15:34 +0100)]
TestFileStoreState: free memory on terminus.

So far, it hasn't triggered any segfault, but I'm not yet convinced there
is no problem whatsoever.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago mon: decode old PGMap Incrementals differently from new ones
Greg Farnum [Tue, 24 Apr 2012 22:13:02 +0000 (15:13 -0700)]
mon: decode old PGMap Incrementals differently from new ones

We need to distinguish between the old 0 (meaning undefined) and
the new 0 (meaning switch to 0 and disable the flags). So rev the
encoding version on PGMap::Incremental, and if you decode an old
version with [near]full_ratio == 0, set the ratio to -1 instead. Then
when applying the Incremental interpret -1 as no change.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
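
Illustration for the commit above (not part of the commit): a minimal sketch of the decode-then-apply rule, assuming a single integer encoding version. kOldVersionSketch and both function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical: the last encoding version in which 0 meant "undefined".
constexpr int kOldVersionSketch = 1;

// Normalize a ratio decoded from an Incremental of the given version.
float decode_ratio_sketch(int struct_v, float encoded) {
  assert(struct_v >= 1);
  if (struct_v <= kOldVersionSketch && encoded == 0.0f) {
    return -1.0f;  // old 0 meant "undefined": translate to "no change"
  }
  return encoded;  // new 0 really means "set the ratio to 0"
}

// Apply the normalized ratio; -1 (any negative value) leaves it untouched.
void apply_ratio_sketch(float& current, float inc_ratio) {
  if (inc_ratio >= 0.0f) {
    current = inc_ratio;
  }
}
```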
13 years ago mon: do not mark osds out if NOOUT flag is set
Sage Weil [Tue, 24 Apr 2012 22:46:49 +0000 (15:46 -0700)]
mon: do not mark osds out if NOOUT flag is set

Do not mark down osds out when NOOUT flag is set.  This is more or less
equivalent to setting a very long 'mon osd down out interval', but
reversible and less annoying.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: do not mark booting osds in if NOIN flag is set
Sage Weil [Tue, 24 Apr 2012 22:45:58 +0000 (15:45 -0700)]
mon: do not mark booting osds in if NOIN flag is set

If the NOIN osdmap flag is set, do not mark booting osds in.  Normally
we would for a range of reasons (always, new, auto-marked-out), but block
them all.

Do not limit manual 'ceph osd in N' commands.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: always remove booting osds from down_pending_out
Sage Weil [Tue, 24 Apr 2012 22:28:36 +0000 (15:28 -0700)]
mon: always remove booting osds from down_pending_out

The down_pending_out tracks OSDs that are down that we may want to
auto-mark out.  If an osd boots, it should be removed from this list
because it is no longer down; it doesn't matter whether it is marked in
or not.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: prevent osd mark-down with NODOWN flag
Sage Weil [Tue, 24 Apr 2012 21:28:18 +0000 (14:28 -0700)]
mon: prevent osd mark-down with NODOWN flag

If the NODOWN osdmap flag is set,

 - ignore osd failure reports
 - do not mark osds down due to lack of osd/pg stats

We *do* still allow explicit admin 'ceph osd down N' commands, and a
booting OSD to mark the previous instance of itself down.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago osd: do not attempt to boot if NOUP
Sage Weil [Tue, 24 Apr 2012 21:22:10 +0000 (14:22 -0700)]
osd: do not attempt to boot if NOUP

If NOUP is set, do not send the boot message.

We already send onetime subscriptions to the osdmap, so we will find out
about osdmap flag changes.  If it is cleared later, we'll pass into
start_boot() and _got_boot_version() again and send it then.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: prevent osd from booting if NOUP
Sage Weil [Tue, 24 Apr 2012 21:16:42 +0000 (14:16 -0700)]
mon: prevent osd from booting if NOUP

Do not add an osd attempting to boot to the map if NOUP is set.  Instead,
send it the latest osdmap so it knows that it's not allowed to boot.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: 'osd {set,unset} {noin,noout,noup,nodown}'
Sage Weil [Tue, 24 Apr 2012 03:33:48 +0000 (20:33 -0700)]
mon: 'osd {set,unset} {noin,noout,noup,nodown}'

Move the set/unset flag code into a helper, and also use that for the
pause/unpause commands.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago osdmap: add NOUP, NODOWN, NOIN, NOOUT flags
Sage Weil [Tue, 24 Apr 2012 03:30:14 +0000 (20:30 -0700)]
osdmap: add NOUP, NODOWN, NOIN, NOOUT flags

These prevent OSDs from being marked up, down, in, or out, respectively.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago Merge remote branch 'origin/wip-rbd-snapid' into next
Josh Durgin [Tue, 24 Apr 2012 20:58:56 +0000 (13:58 -0700)]
Merge remote branch 'origin/wip-rbd-snapid' into next

Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago librbd: pass errors removing head back to user
Sage Weil [Tue, 24 Apr 2012 16:43:44 +0000 (09:43 -0700)]
librbd: pass errors removing head back to user

In particular, the OSD may return EBUSY if there are still watchers.
Ignore ENOENT, as that may indicate we are cleaning up a previously
aborted removal.

Fixes: #2311
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago mon: clean up handle_osd_timeouts a bit
Sage Weil [Tue, 24 Apr 2012 16:59:34 +0000 (09:59 -0700)]
mon: clean up handle_osd_timeouts a bit

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: fix pg stats timeout
Sage Weil [Tue, 24 Apr 2012 17:55:18 +0000 (10:55 -0700)]
mon: fix pg stats timeout

We clear out the osd entry when an osd goes up or down.  Thus, if we find
it missing from an up osd, we should start the timer.  Otherwise we get
behavior like this

2012-04-24 13:22:47.888291 7fa5bc587700 mon.peon5752@0(leader).osd e21633 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:22:50.076394 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:22:52.903558 7fa5bc587700 mon.peon5752@0(leader).osd e21638 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:15.144532 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:23:17.967118 7fa5bc587700 mon.peon5752@0(leader).osd e21663 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:22.173778 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot
2012-04-24 13:23:22.981556 7fa5bc587700 mon.peon5752@0(leader).osd e21668 OSDMonitor::handle_osd_timeouts: never got MOSDPGStat info from osd 521. Marking down!
2012-04-24 13:23:45.245380 7fa5bcd88700 log [INF] : osd.521 [2607:f298:4:2243::7088]:6806/53217 boot

when the pg stats message doesn't arrive quickly enough.

Fixes: #2341
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
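
Illustration for the commit above (not part of the commit): a minimal sketch of the timer rule, assuming the monitor keeps a per-OSD "last report" timestamp that is cleared whenever an OSD goes up or down. The map, the function name and the use of plain seconds are hypothetical.

```cpp
#include <cassert>
#include <map>

// Returns true if `osd` (known to be up) should be marked down for not
// reporting PG stats within `timeout` seconds.
bool osd_stat_timed_out_sketch(std::map<int, double>& last_report,
                               int osd, double now, double timeout) {
  assert(timeout > 0);
  auto it = last_report.find(osd);
  if (it == last_report.end()) {
    // No entry for an up OSD: it just booted, so start the timer now
    // instead of treating the missing entry as "never reported".
    last_report[osd] = now;
    return false;
  }
  return now - it->second > timeout;
}
```

Treating a missing entry as an immediate timeout is what would produce the boot / "Marking down!" loop shown in the log excerpt.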
13 years ago mon: fix whitespace
Sage Weil [Tue, 24 Apr 2012 17:49:30 +0000 (10:49 -0700)]
mon: fix whitespace

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago mon: fix pgmonitor ratio commands
Greg Farnum [Tue, 24 Apr 2012 17:30:43 +0000 (10:30 -0700)]
mon: fix pgmonitor ratio commands

The indices were set incorrectly when I whipped this up. That's what
you get for not testing nor being careful enough in review.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years ago test_rbd: add tests for snap_set and more complicated resizing
Josh Durgin [Tue, 24 Apr 2012 15:35:43 +0000 (08:35 -0700)]
test_rbd: add tests for snap_set and more complicated resizing

* snap_set to a deleted (and recreated) snapshot
* resizing down (truncating) and back up
* resizing to non-object-aligned sizes

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years ago librbd: reset needs_refresh flag before re-reading header
Josh Durgin [Tue, 24 Apr 2012 06:46:51 +0000 (23:46 -0700)]
librbd: reset needs_refresh flag before re-reading header

This way we can't miss an update if we get a notify during ictx_refresh.
Specifically, a race like this:

Thread 1               Thread 2              Process 2

ictx_refresh()
read_header()
                                             snap_create()
                       notify()
                       need_refresh = true
process header...
need_refresh = false

If this happened, we would not re-read the header with the new
snapshot, so the snapshot would not happen at the intended point
in time, but only after we re-read the header again.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
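
Illustration for the commit above (not part of the commit): a minimal sketch of the ordering fix, where the flag is cleared before the header read so that a notify arriving during the read sets it again. ImageCtxSketch, on_notify_sketch and refresh_sketch are hypothetical names, and the header is reduced to an int.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical, stripped-down image context.
struct ImageCtxSketch {
  std::atomic<bool> needs_refresh{true};
  int cached_header = 0;
};

// Called when another client changes the header (e.g. creates a snapshot).
void on_notify_sketch(ImageCtxSketch& ictx) {
  ictx.needs_refresh = true;
}

// Refresh the cached header. The flag is reset first, so a notify that
// races with read_header() leaves needs_refresh set for the next caller.
void refresh_sketch(ImageCtxSketch& ictx, const std::function<int()>& read_header) {
  assert(read_header);
  ictx.needs_refresh = false;
  ictx.cached_header = read_header();
}
```

Had the flag been cleared after the read, as in the race diagram, the notify would have been overwritten and the stale header kept until some later refresh.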
13 years ago librbd: clean up snapshot handling a bit
Josh Durgin [Tue, 24 Apr 2012 01:04:27 +0000 (18:04 -0700)]
librbd: clean up snapshot handling a bit

* snapid should determine whether our mapped snapshot is gone, not snapname
* snap_set(<nonexistent_snap>) shouldn't reset us to CEPH_NOSNAP
* snapname should be set before using it in the perfcounter name
* snapname and image name don't need to be passed as arguments since an
  ImageCtx already contains that info
* ictx_check() doesn't need to check for non-existent snaps - only I/Os care,
  so check in check_io() instead

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years ago librbd: clarify handle_sparse_read condition
Josh Durgin [Mon, 23 Apr 2012 18:58:57 +0000 (11:58 -0700)]
librbd: clarify handle_sparse_read condition

The earlier condition is >. != means < at this point, and the nesting
is unnecessary.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years ago run_seed_to.sh: rework the script, make it more flexible and broaden the tests.
Sage Weil [Mon, 23 Apr 2012 20:57:25 +0000 (13:57 -0700)]
run_seed_to.sh: rework the script, make it more flexible and broaden the tests.

Allow for '-h' and other options such as disabling the journal sync tests,
specifying that it is to be run on a btrfs FS, enabling exit on error (default is
now 'off'), and allow certain env variables to specify additional options
to each store.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
13 years ago librbd: rev version for discard addition
Sage Weil [Tue, 24 Apr 2012 02:43:25 +0000 (19:43 -0700)]
librbd: rev version for discard addition

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago osdmaptool: fix clitests for lack of localized pgs
Sage Weil [Fri, 20 Apr 2012 19:57:32 +0000 (12:57 -0700)]
osdmaptool: fix clitests for lack of localized pgs

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: load CompatSet features on startup
Sage Weil [Fri, 20 Apr 2012 19:55:35 +0000 (12:55 -0700)]
mon: load CompatSet features on startup

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: set auid for mon-created pools to 0
Sage Weil [Fri, 20 Apr 2012 19:53:41 +0000 (12:53 -0700)]
mon: set auid for mon-created pools to 0

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago mon: ignore/remove localized pgs
Sage Weil [Fri, 20 Apr 2012 19:51:30 +0000 (12:51 -0700)]
mon: ignore/remove localized pgs

This will trigger on the next OSDMap update.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago test_ioctls: remove preferred osd
Sage Weil [Fri, 20 Apr 2012 04:54:31 +0000 (21:54 -0700)]
test_ioctls: remove preferred osd

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years ago cephfs: remove preferred osd setting
Sage Weil [Fri, 20 Apr 2012 04:53:38 +0000 (21:53 -0700)]
cephfs: remove preferred osd setting

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>