git.apps.os.sepia.ceph.com Git - ceph.git/log
Josh Durgin [Wed, 16 May 2012 19:41:27 +0000 (12:41 -0700)]
librbd: check for cache flush errors

Return errors from flushing to the caller. Warn
if an error occurs during invalidation, but don't retry,
since the higher level handles these cases, namely:

* rollback (doing this with an image open is asking for trouble)
* shrink (doing this with writes in flight may create extra objects anyway)
* shutdown (qemu flushes before closing the device)

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Josh Durgin [Tue, 15 May 2012 22:21:50 +0000 (15:21 -0700)]
ObjectCacher: handle write errors

If a write error occurs, mark the BufferHead dirty again, and
pass the return value to the completion. This makes flushing
return the write error, if one occurs, since the flush callback
is passed as the write callback.
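
A minimal sketch of the behaviour described above, not the actual ObjectCacher code. `BufferHead`, its `State` enum, and `finish_write` are simplified, hypothetical stand-ins; `r` is assumed to be 0 on success or a negative errno.

```cpp
#include <functional>

// Simplified stand-in for a cached buffer; not the real ObjectCacher class.
struct BufferHead {
  enum class State { Clean, Dirty, Tx };  // Tx: write in flight
  State state = State::Dirty;
};

// Hypothetical write-completion handler. The same callback serves as the
// flush callback, so a flush observes the write error through r.
inline void finish_write(BufferHead& bh, int r,
                         const std::function<void(int)>& on_finish) {
  if (r < 0) {
    bh.state = BufferHead::State::Dirty;  // keep the data so it can be rewritten
  } else {
    bh.state = BufferHead::State::Clean;
  }
  if (on_finish) {
    on_finish(r);  // pass the return value to the completion
  }
}
```

Marking the buffer dirty again means the data is not dropped on failure, and a later flush can try the write once more.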

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Josh Durgin [Tue, 15 May 2012 17:58:59 +0000 (10:58 -0700)]
ObjectCacher: propagate read errors to the caller

Previously the return value of a read operation was ignored.  Now a
read error sets the error field, and changes the BufferHead to a new
error state. Error state BufferHeads are treated as misses so they can
be retried when requested by a user of the ObjectCacher.  When _readx
is called again internally, they're treated as hits so the error can
be returned to the user.

The error value is ignored if the BufferHead is not in the error
state.
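
The hit/miss rule above can be sketched roughly as follows. This is an illustration only: `BufferHead`, `is_hit`, `read_result`, and the `external` flag are hypothetical names, not the real ObjectCacher interface.

```cpp
// Simplified stand-in for a cached buffer; not the real ObjectCacher class.
struct BufferHead {
  enum class State { Missing, Clean, Error };
  State state = State::Missing;
  int error = 0;  // negative errno; only meaningful in the Error state
};

// external == true: a user of the cache is asking, so an errored buffer
// counts as a miss and the read can be retried.
// external == false: the internal retry after a read completes, so the
// errored buffer counts as a hit and its error can be returned.
inline bool is_hit(const BufferHead& bh, bool external) {
  switch (bh.state) {
    case BufferHead::State::Clean:   return true;
    case BufferHead::State::Error:   return !external;
    case BufferHead::State::Missing: return false;
  }
  return false;
}

// The error value is ignored unless the buffer is in the Error state.
inline int read_result(const BufferHead& bh) {
  return bh.state == BufferHead::State::Error ? bh.error : 0;
}
```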

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Josh Durgin [Wed, 16 May 2012 20:40:51 +0000 (13:40 -0700)]
librados: avoid overflow in the return value of reads

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Josh Durgin [Wed, 16 May 2012 20:40:43 +0000 (13:40 -0700)]
ObjectCacher: only perfcount reads requested by the client

_readx is called again after each bh is read by C_RetryRead. This
resulted in the read being counted many times for the internal
caller that was just checking whether it was done yet.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Sage Weil [Sun, 13 May 2012 03:03:53 +0000 (20:03 -0700)]
cephfs: pass -1 for old preferred_osd field

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 10 May 2012 17:09:30 +0000 (10:09 -0700)]
utime_t: no double ctor

error: os/FileJournal.h:48:51: call of overloaded ‘utime_t(int)’ is ambiguous

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 8 May 2012 23:19:51 +0000 (16:19 -0700)]
objectcacher: make *_max_dirty_age tunables; pass to ctor

This replaces the hard-coded 1 second writeback timer.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 8 May 2012 23:08:22 +0000 (16:08 -0700)]
librbd: set cache defaults to 32/24/16 mb

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 8 May 2012 23:04:12 +0000 (16:04 -0700)]
Merge branch 'wip-rbd-wt'

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sage Weil [Tue, 8 May 2012 04:42:51 +0000 (21:42 -0700)]
test_filestore_workloadgen: name the Mutex variable

This is for interpreting lockdep reports.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 8 May 2012 04:41:46 +0000 (21:41 -0700)]
Merge remote branch 'gh/wip-wrkldgen-throughput'

Joao Eduardo Luis [Tue, 8 May 2012 02:37:07 +0000 (19:37 -0700)]
workloadgen: time tracking using ceph's utime_t's instead of timevals.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Joao Eduardo Luis [Mon, 7 May 2012 23:20:43 +0000 (16:20 -0700)]
workloadgen: forcing the user to specify a data and journal.

These default arguments, although handy when we just want to run the test,
just mess things up when we don't actually need them. If we don't specify
them on the CLI, we'll end up using the default ones, and that is just
annoying.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Joao Eduardo Luis [Mon, 7 May 2012 22:28:18 +0000 (15:28 -0700)]
workloadgen: add option to specify the max number of in-flight txs.

Use '--test-max-in-flight VAL' (default: 50) or check '--help' for more.
Also, allow the test to work even if we don't specify a conf file.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Joao Eduardo Luis [Fri, 4 May 2012 23:42:26 +0000 (00:42 +0100)]
workloadgen: Allow finer control over what the generator does.

  Give the user finer control over:
    - the sizes of the data written by the operations;
    - which operations are suppressed from execution;
    - whether the throughput is shown;
    - the periodicity of throughput output.

For the CLI options, '--help' should suffice.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Sage Weil [Mon, 7 May 2012 22:57:31 +0000 (15:57 -0700)]
Merge branch 'wip-rgw-bench'

Conflicts:
debian/rules

Sage Weil [Mon, 7 May 2012 18:16:43 +0000 (11:16 -0700)]
libs3: trailing / does strange things to EXTRA_DIST

drwxr-xr-x 1031/1031         0 2012-05-07 11:15 ceph-0.46/src/libs3/inc/
drwxr-xr-x 1031/1031         0 2012-05-04 15:28 ceph-0.46/src/libs3/inc/inc/
-rw-r--r-- 1031/1031      2343 2012-05-04 15:28 ceph-0.46/src/libs3/inc/inc/simplexml.h

etc.  Freaking autotools!

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 7 May 2012 16:25:12 +0000 (09:25 -0700)]
Merge remote-tracking branch 'gh/wip-osd-peering'

Reviewed-by: Sam Just <sam.just@inktank.com>
Sage Weil [Sun, 6 May 2012 21:52:25 +0000 (14:52 -0700)]
Makefile: drop librgw.so unittests

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 6 May 2012 21:50:39 +0000 (14:50 -0700)]
ceph.spec: kill librgw

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 6 May 2012 21:50:30 +0000 (14:50 -0700)]
debian: kill librgw.so

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 6 May 2012 21:18:22 +0000 (14:18 -0700)]
osd: reset last_peering_interval on replica activate

There was a silent bug in the activate 'acks' that go from the replica back
to the primary.  Prior to 86aa07d7a91ac23074e76551c3a6db3a5736cffa, we
were passing same_interval_since to the callback, which meant that
sometimes _activate_committed() would ignore it and we wouldn't update
last_epoch_started.  This was mostly invisible; the next peering event would
just, in some cases, look at more past intervals than it needed to.

In 86aa07d7a91ac23074e76551c3a6db3a5736cffa we fixed this so that the check
is correct.  (We noticed because now we aren't setting the pg CLEAN flag
until after last_epoch_started is updated.)  That, in turn, revealed a
similar bug that we're fixing here: the replica's last_peering_reset could
be lower than the primary's, such that the activate 'ack' info is ignored.

To fix this, simply set last_peering_reset to the current epoch when the
replica activates; this will always be greater than the primary's.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 6 May 2012 20:23:14 +0000 (13:23 -0700)]
libs3: dist and distdir make targets

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 6 May 2012 20:22:40 +0000 (13:22 -0700)]
Makefile: include libs3/ contents in dist tarball

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 6 May 2012 19:53:15 +0000 (12:53 -0700)]
Makefile: osdc/Journaler is only used by the mds

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 6 May 2012 19:48:30 +0000 (12:48 -0700)]
Makefile: librgw.la -> librgw.a; and use it

The various rgw tools were all recompiling my_libradosgw_src files over
again.  Instead build a single .a (not .la!) and link that in.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 6 May 2012 16:32:46 +0000 (09:32 -0700)]
Makefile: libos.la -> libos.a

There is a -laio associated with this, so use a var instead of referring to
it by name.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 6 May 2012 16:23:25 +0000 (09:23 -0700)]
Makefile: libosd.la -> libosd.a

Faster build.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 6 May 2012 16:22:24 +0000 (09:22 -0700)]
Makefile: libmon.la -> libmon.a

Builds >2x as fast.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sun, 6 May 2012 15:34:01 +0000 (08:34 -0700)]
libs3: added 'make check' target

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 6 May 2012 15:31:11 +0000 (08:31 -0700)]
debian: build-depend on libxml2-dev

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 5 May 2012 21:34:54 +0000 (14:34 -0700)]
objectcacher: make cache sizes explicit

Make ObjectCacher users specify the cache size for each ObjectCacher
instance.  This avoids the confusing config namespace for the object
cache (client_oc_*), and also will make it possible to eventually have
cache sizes that vary between (say) RBD images.

- drop unused client_oc_max_sync_write
- add rbd_cache_max_size, max_dirty, target_dirty config values (these are
  the defaults for each image)

We probably want to add librbd calls to specify the cache size on a
per-image basis?  Alternatively, we should make it possible to share a
cache pool between multiple images in some explicit way.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 5 May 2012 03:17:26 +0000 (20:17 -0700)]
objectcacher: delete unused onfinish from flush_set

Once upon a time the caller would do this, but none of those have survived,
and this makes more sense.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 4 May 2012 21:11:59 +0000 (14:11 -0700)]
objectcacher: explicit write-thru mode

If the max_dirty config is 0, switch to write-thru mode, which will
explicitly flush and wait on the range we just dirtied.
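
A toy model of the mode switch described above, assuming a cache that only tracks byte counts. `CacheSketch` and its members are hypothetical; the real code flushes and waits on the dirtied range asynchronously, which this sketch collapses into a synchronous call.

```cpp
#include <cstdint>

// Toy cache that only counts bytes; not the real ObjectCacher.
struct CacheSketch {
  uint64_t max_dirty;     // 0 selects write-thru mode
  uint64_t dirty = 0;     // bytes dirtied but not yet written back
  uint64_t flushed = 0;   // bytes written back so far

  explicit CacheSketch(uint64_t md) : max_dirty(md) {}

  void write(uint64_t len) {
    dirty += len;
    if (max_dirty == 0) {
      flush(len);  // write-thru: flush the range we just dirtied
    }
  }

  // Stand-in for "flush and wait"; here it completes immediately.
  void flush(uint64_t len) {
    dirty -= len;
    flushed += len;
  }
};
```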

Closes: #2335
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 5 May 2012 03:23:07 +0000 (20:23 -0700)]
common: add C_Cond

Similar to C_SafeCond, but assume finisher already holds the relevant lock.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 5 May 2012 23:31:57 +0000 (16:31 -0700)]
objectcacher: use helper to get starting point in buffer map

A common pattern is to search for the first buffer intersecting or
following an object offset.  Use a helper for that.
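
The search pattern can be sketched with `std::map::lower_bound`. This assumes buffers are keyed by their start offset and do not overlap; `Buf`, `BufMap`, and `first_at_or_after` are hypothetical names, not the helper added by this commit.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

struct Buf {
  uint64_t len;  // length of the buffer in bytes
};

// Key: start offset of the buffer within the object.
using BufMap = std::map<uint64_t, Buf>;

// Returns the first buffer that intersects or follows `off`,
// or m.end() if there is none.
inline BufMap::const_iterator first_at_or_after(const BufMap& m, uint64_t off) {
  auto it = m.lower_bound(off);  // first buffer starting at or after off
  if (it != m.begin()) {
    auto prev = std::prev(it);
    // The previous buffer starts before off but may still cover it.
    if (prev->first + prev->second.len > off) {
      return prev;
    }
  }
  return it;
}
```

The step back to the previous entry is the part that is easy to forget when the pattern is written out by hand at each call site.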

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 5 May 2012 23:25:26 +0000 (16:25 -0700)]
objectcacher: flush range, set

Add ability to flush a range of an object, or a vector of ObjectExtents.  Flush
any buffers that intersect the specified range, or the entire object if len==0.
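
A small sketch of the range test implied above, assuming half-open byte ranges. `in_flush_range` is a hypothetical helper name used for illustration only.

```cpp
#include <cstdint>

// True if the buffer [bstart, bstart + blen) should be flushed for a
// request covering [off, off + len). len == 0 means the entire object.
inline bool in_flush_range(uint64_t bstart, uint64_t blen,
                           uint64_t off, uint64_t len) {
  if (len == 0) {
    return true;
  }
  return bstart < off + len && off < bstart + blen;
}
```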

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 5 May 2012 20:21:37 +0000 (13:21 -0700)]
mon: add safety checks for 'mds rm <gid>' command

- make sure the gid exists
- only remove it if it's inactive (state < 0)
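
The two checks can be sketched as below. The map layout and the specific error codes (`-ENOENT`, `-EBUSY`) are assumptions for illustration; the commit message only says the gid must exist and its state must be negative.

```cpp
#include <cerrno>
#include <cstdint>
#include <map>

// gid -> state; a negative state means the daemon is inactive.
using MdsStates = std::map<uint64_t, int>;

// Hypothetical removal routine with the two safety checks.
inline int mds_rm(MdsStates& mds, uint64_t gid) {
  auto it = mds.find(gid);
  if (it == mds.end()) {
    return -ENOENT;  // assumed error code: gid does not exist
  }
  if (it->second >= 0) {
    return -EBUSY;   // assumed error code: still active, refuse to remove
  }
  mds.erase(it);
  return 0;
}
```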

Fixes: #2188
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 5 May 2012 18:24:57 +0000 (11:24 -0700)]
osd: do not mark pg clean until active is durable

Do not mark a PG CLEAN or set last_epoch_clean until after the PG activate
is stable on all replicas.

This effectively means that last_epoch_clean will never fall in an interval
that follows last_epoch_started's interval.  It *can* be >
last_epoch_started when it falls within the same interval.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sat, 5 May 2012 20:07:06 +0000 (13:07 -0700)]
osd: check against last_peering_reset in _activate_committed

We are checking against last_peering_reset in _activate_committed(), so we
need to pass in that value to compare against; last_peering_reset may be
greater than same_interval_since, e.g. on a replica that learns about the
PG after the initial creation epoch.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sat, 5 May 2012 18:49:10 +0000 (11:49 -0700)]
osd: tweak slow request warnings

- always include 'slow request' in the warning string
- only summarize if we warn about anything (they all may have backed off)
- be more concise

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sat, 5 May 2012 17:13:41 +0000 (10:13 -0700)]
keyring: clean up error output

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sat, 5 May 2012 17:13:28 +0000 (10:13 -0700)]
keyring: catch key decode errors

Return EINVAL on decoding errors.

Other decode_base64() callers are already guarded.

Fixes: #2124
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sat, 5 May 2012 17:03:56 +0000 (10:03 -0700)]
debian: depend on uuid-runtime

We use uuidgen for osd creation.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sat, 5 May 2012 17:01:44 +0000 (10:01 -0700)]
safe_io: int -> ssize_t

int is 32 bits on 64-bit archs, but ssize_t is 64 bits.  This fixes overflow
when reading large (>2GB) extents.
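
To illustrate the size limit involved, here is a small check written with fixed-width types so that it does not depend on the platform's `int` or `ssize_t`. `overflows_int32` is a hypothetical helper, not part of safe_io.

```cpp
#include <cstdint>
#include <limits>

// True when a byte count cannot be represented in a 32-bit signed
// return value, i.e. when it exceeds 2^31 - 1 bytes.
inline bool overflows_int32(int64_t nbytes) {
  return nbytes > std::numeric_limits<int32_t>::max();
}
```

A 3 GB read is past the limit, which is why the return type has to be wider than `int` for such extents.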

Fixes: #2275
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Sat, 5 May 2012 03:31:01 +0000 (20:31 -0700)]
objectcacher: wait directly from writex()

This gives us access to the original ObjectExtent (useful later), and
simplifies the callers.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 4 May 2012 23:02:00 +0000 (16:02 -0700)]
mon: fix call to get_uuid() on nonexistent osd

Didn't catch this with vstart.sh testing.

Signed-off-by: Sage Weil <sage@newdream.net>
Yehuda Sadeh [Tue, 1 May 2012 17:06:19 +0000 (10:06 -0700)]
debian: add rules for rest-bench

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Mon, 30 Apr 2012 20:55:16 +0000 (13:55 -0700)]
rest-bench: build conditionally

added configure --with-rest-bench, and configure --with-system-libs3

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Fri, 27 Apr 2012 00:03:29 +0000 (17:03 -0700)]
obj_bencher: changed interface

Passing bufferlist and not const bufferlist in aio_write(). We assign
it to another object which is not const, and it doesn't work too
well.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Fri, 27 Apr 2012 00:01:36 +0000 (17:01 -0700)]
rest-bench: change thread context for libs3 calls

Apparently S3_put_object() and S3_get_object() need to
run on the same thread as S3_runall_request_context() (at least
per context). So we now call them in the workqueue thread.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Thu, 26 Apr 2012 23:53:36 +0000 (16:53 -0700)]
rest-bench: change command line arg for seconds

seconds should be a param, not a command.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Thu, 26 Apr 2012 23:47:27 +0000 (16:47 -0700)]
obj_bencher: fix data encoding

There was a bug when doing a read with multiple threads, when
one of the threads was left behind; when it returned the compared
data string might have been cluttered by newer strings that
were longer.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Thu, 26 Apr 2012 21:35:39 +0000 (14:35 -0700)]
obj_bencher: use better round robin for completion slot scan

Start where we left off last time; don't start from zero.
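
A minimal sketch of a resumable round-robin scan, assuming completion slots are modelled as a vector of flags. `SlotScanner` and `find_completed` are hypothetical names, not the obj_bencher code.

```cpp
#include <cstddef>
#include <vector>

struct SlotScanner {
  std::size_t next = 0;  // slot to examine first on the next scan

  // Returns the index of a completed slot, or -1 if none is done.
  // Each scan resumes just past the slot returned by the previous one.
  int find_completed(const std::vector<bool>& done) {
    const std::size_t n = done.size();
    for (std::size_t i = 0; i < n; ++i) {
      const std::size_t slot = (next + i) % n;
      if (done[slot]) {
        next = (slot + 1) % n;
        return static_cast<int>(slot);
      }
    }
    return -1;
  }
};
```

Resuming from the last position keeps low-numbered slots from being favoured on every scan.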

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Thu, 26 Apr 2012 06:45:42 +0000 (23:45 -0700)]
rest-bench: reuse libs3 handle

This is necessary for keep-alive to be useful. Otherwise a new
connection will be created for each request.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Thu, 26 Apr 2012 06:40:46 +0000 (23:40 -0700)]
obj_bencher: fix param order

seq benchmark was broken, passed params in wrong order.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Thu, 26 Apr 2012 03:06:42 +0000 (20:06 -0700)]
rest-bench: use refcount for req_state life cycle

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Wed, 25 Apr 2012 20:53:11 +0000 (13:53 -0700)]
rest-bench: multiple fixes

write seems to work

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Wed, 25 Apr 2012 18:02:41 +0000 (11:02 -0700)]
rest-bench: cleanups, initialization

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Wed, 25 Apr 2012 07:30:48 +0000 (00:30 -0700)]
rest-bench: create workqueue for requests dispatching

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Wed, 25 Apr 2012 06:56:51 +0000 (23:56 -0700)]
rest_bench: cleanups, implement get and put

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Tue, 24 Apr 2012 20:45:27 +0000 (13:45 -0700)]
rest_bench: some more implementation

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Sat, 21 Apr 2012 00:36:00 +0000 (17:36 -0700)]
rest_bench: initial work

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Fri, 20 Apr 2012 21:55:42 +0000 (14:55 -0700)]
rados_bencher: abstract away rados specific operations

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Fri, 20 Apr 2012 19:57:20 +0000 (12:57 -0700)]
rados_bencher -> obj_bencher

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Thu, 19 Apr 2012 23:44:59 +0000 (16:44 -0700)]
rados_bencher: fix build

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Mon, 26 Mar 2012 21:53:37 +0000 (14:53 -0700)]
rados_bencher: restructure code, create RadosBencher class

Preparing for different benchmark backend.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Fri, 20 Apr 2012 00:01:48 +0000 (17:01 -0700)]
rados_bencher: restructure code (initial work)

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Sage Weil [Fri, 4 May 2012 22:26:33 +0000 (15:26 -0700)]
librados: call safe callback on read operation

This avoids confusion for the user who isn't sure if they should wait for
complete or safe on a read aio.  It also means that you can always wait
for safe for both reads and writes, which can simplify some code.

Dup the roundtrip functional tests to verify this works.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Yehuda Sadeh <yehuda.sadeh@inktank.com>
Sage Weil [Fri, 4 May 2012 21:51:15 +0000 (14:51 -0700)]
crush: note that tree bucket size is tree size, not item count

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 4 May 2012 21:16:49 +0000 (14:16 -0700)]
Merge remote-tracking branch 'gh/wip-crush-forcefeed'

Reviewed-by: Sam Just <sam.just@inktank.com>
Sage Weil [Fri, 4 May 2012 21:15:59 +0000 (14:15 -0700)]
OpRequest: ignore all ops while the oldest one is still young.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Reviewed-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 4 May 2012 20:12:58 +0000 (13:12 -0700)]
objectcacher: don't wait for write waiters; wait after dirtying

We do three things here:

- Wait for the dirty limit to drop _after_ writing into the cache.  This
  means that an active thread can always provide its dirty data to the
  cache for potential writing without waiting (a small win).  It's also
  helpful later... (see below, and next commit)

- Don't wait for other waiters.  If another thread is dirtying 1MB and is
  waiting for it, don't wait for them too.  This prevents two threads
  writing 1MB at a time with a limit of 1MB from serializing: both can
  dirty their 1MB and initiate a flush, and once 1/2 of that has
  flushed one of them will be allowed to proceed.

- Update the flusher to add the dirty_waiting bytes to the amount to
  write so that the OPs will indeed be parallel.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 4 May 2012 19:09:28 +0000 (12:09 -0700)]
crush: update_item() should pass an error back to the caller

If you give it a nonsensical loc, it will fail check_item_loc() (false) and
then error out on insert_item().

Reported-by: Sam Just <sam.just@inktank.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 4 May 2012 18:06:27 +0000 (11:06 -0700)]
crush: improve docs/comments for check_item_loc and insert_item semantics

We don't adjust the internal hierarchy structure (currently).  This is a
bit confusing, so describe the semantics in some detail.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 4 May 2012 18:05:34 +0000 (11:05 -0700)]
crush: comment and clean up checks for check_item_loc and insert_item

- drop useless cur for check_item_loc
- comment the checks we're doing so the code is understandable
- use name_exists instead of broken get_item_id != 0 check

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 4 May 2012 03:44:20 +0000 (20:44 -0700)]
Merge branch 'wip-crush-update'

Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Fri, 4 May 2012 03:43:54 +0000 (20:43 -0700)]
Merge branch 'wip-osd-uuid'

Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Fri, 4 May 2012 03:40:20 +0000 (20:40 -0700)]
Makefile: fix $shell_scripts substitution

No spaces here, apparently!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 4 May 2012 02:47:55 +0000 (19:47 -0700)]
mon: simplify 'osd create <uuid>' command

Make the flow clearer for the three cases (exists, about to exist, new).

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 4 May 2012 03:33:35 +0000 (20:33 -0700)]
crushtool: another simple test for update

If the weight doesn't change it should be a no-op.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 4 May 2012 03:28:27 +0000 (20:28 -0700)]
crush: document return values

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 4 May 2012 03:28:21 +0000 (20:28 -0700)]
crush: compare fixed-point weights in update_item

This is less ugly than converting the quantized value back to a float and
comparing that.
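
A sketch of the comparison, assuming weights are stored as 16.16 fixed-point integers where 0x10000 represents 1.0. `kWeightOne`, `to_fixed`, and `weight_unchanged` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Assumed scale: 0x10000 represents a weight of 1.0 (16.16 fixed point).
constexpr int32_t kWeightOne = 0x10000;

// Quantize a floating-point weight to fixed point.
inline int32_t to_fixed(float w) {
  return static_cast<int32_t>(w * kWeightOne);
}

// Compare in the fixed-point domain: quantize the requested weight once
// and compare integers, rather than converting the stored value back to
// a float.
inline bool weight_unchanged(int32_t current_fixed, float requested) {
  return current_fixed == to_fixed(requested);
}
```

Comparing integers avoids a second lossy conversion, so a request for the same weight reliably compares equal.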

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 4 May 2012 01:51:03 +0000 (18:51 -0700)]
thread: remove get_num_threads() static

This looks in /proc to count threads.  Kludgey and no longer needed.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Fri, 4 May 2012 01:50:42 +0000 (18:50 -0700)]
global_init: do not count threads before daemonize()

We were verifying that there was only 1 thread (presumably main()) when
we call daemonize.  However, with the new logging code, we stop a thread
right before the check, and /proc apparently updates asynchronously such
that our attempt to count running threads gives us a bad answer.

Just remove this kludgey check; we'll have to catch this class of bugs
the hard way.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Fri, 4 May 2012 01:59:02 +0000 (18:59 -0700)]
crush: clean up check_item_loc() comments

Thanks Greg!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Joao Eduardo Luis [Tue, 1 May 2012 18:55:30 +0000 (19:55 +0100)]
OpRequest: only show a small set of the oldest messages, instead of all.

Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
Yehuda Sadeh [Thu, 3 May 2012 19:50:23 +0000 (12:50 -0700)]
rgw: update cache interface for put_obj_meta

This fixes issue #2381.
The method interface was different than the one needed in order
to override the one in RGWRados.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 3 May 2012 19:26:17 +0000 (12:26 -0700)]
doc: fix some underscores

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 3 May 2012 16:48:57 +0000 (09:48 -0700)]
osd: drop unused CEPH_OSDMAP*VERSION* #defines

It's easier to manage/rev/grok these inline.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 3 May 2012 19:16:55 +0000 (12:16 -0700)]
Merge branch 'wip-doc-rebase-2'

John Wilkins [Thu, 3 May 2012 19:14:26 +0000 (12:14 -0700)]
Fixed link to blog.

Signed-off-by: John Wilkins <john.wilkins@dreamhost.com>
John Wilkins [Thu, 3 May 2012 18:49:22 +0000 (11:49 -0700)]
Fixed another link to the blog.

Signed-off-by: John Wilkins <john.wilkins@dreamhost.com>
John Wilkins [Thu, 3 May 2012 18:42:50 +0000 (11:42 -0700)]
Fixed link.

Signed-off-by: John Wilkins <john.wilkins@dreamhost.com>
John Wilkins [Thu, 3 May 2012 18:31:37 +0000 (11:31 -0700)]
Clean up. Changed ceph.newdream.net to ceph.com.
Removed {ARCH} references.
Added link to Source.

Signed-off-by: John Wilkins <john.wilkins@dreamhost.com>
Sage Weil [Thu, 3 May 2012 18:25:12 +0000 (11:25 -0700)]
doc: more fonts

Signed-off-by: Sage Weil <sage@newdream.net>
Ross Turk [Thu, 3 May 2012 18:02:51 +0000 (11:02 -0700)]
doc: new theme

Signed-off-by: Ross Turk <ross.turk@inktank.com>
Sage Weil [Thu, 3 May 2012 17:51:02 +0000 (10:51 -0700)]
doc/install/debian: simplify more

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 3 May 2012 17:45:08 +0000 (10:45 -0700)]
doc/install: reorg, simplify

Signed-off-by: Sage Weil <sage@newdream.net>