git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
11 years agoosd/ReplicatedPG: track dirty, whiteout stat counts
Sage Weil [Tue, 17 Dec 2013 01:18:48 +0000 (17:18 -0800)]
osd/ReplicatedPG: track dirty, whiteout stat counts

These counts will be useful (even necessary!) for the cache agent, and are
generally interesting to the admin as well.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/osd_types: include num_objects_dirty, num_whiteouts in object_stat_sum_t
Sage Weil [Tue, 17 Dec 2013 00:03:56 +0000 (16:03 -0800)]
osd/osd_types: include num_objects_dirty, num_whiteouts in object_stat_sum_t

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: EBUSY on cache-evict when watchers are present
Sage Weil [Sat, 14 Dec 2013 00:39:02 +0000 (16:39 -0800)]
osd/ReplicatedPG: EBUSY on cache-evict when watchers are present

Linger operations will follow the object to the cache pool when the pool
overlay process is set.  If we evict the object, the object_info_t will
go away along with the watch state and confusing things will happen.
Prevent that from happening by returning EBUSY when you try to evict a
watched object.

Note that you *can* flush a watched object, and the dirty flag will be
cleared.  But you still can't evict it.

Signed-off-by: Sage Weil <sage@inktank.com>
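
The flush/evict distinction in the commit above can be sketched as follows. This is a minimal illustration, not the ReplicatedPG code: `CachedObject`, `cache_flush`, and `cache_evict` are hypothetical names, and the write-back to the base tier is elided.

```cpp
#include <cassert>
#include <cerrno>
#include <set>
#include <string>

// Hypothetical stand-in for a cache-tier object; not Ceph's object_info_t.
struct CachedObject {
  bool dirty = false;
  std::set<std::string> watchers;  // clients with a watch registered
};

// Flush is allowed on a watched object and clears the dirty flag.
// (The actual write-back to the base tier is elided here.)
int cache_flush(CachedObject& obj) {
  obj.dirty = false;
  return 0;
}

// Evict would drop the object together with its watch state, so refuse
// with -EBUSY while any watcher is present.
int cache_evict(const CachedObject& obj) {
  if (!obj.watchers.empty())
    return -EBUSY;
  return 0;  // caller may now remove the object from the cache
}
```

A watched dirty object can therefore be flushed (and becomes clean) but stays in the cache until its watchers go away.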
11 years agoceph_test_rados: test cache_flush, cache_try_flush, cache_evict
Sage Weil [Thu, 12 Dec 2013 21:21:31 +0000 (13:21 -0800)]
ceph_test_rados: test cache_flush, cache_try_flush, cache_evict

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: fix HitSet* test names
Sage Weil [Tue, 17 Dec 2013 18:32:07 +0000 (10:32 -0800)]
ceph_test_rados_api_tier: fix HitSet* test names

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/osd_types: debug: include size in object_info_t operator<<
Sage Weil [Fri, 13 Dec 2013 21:40:01 +0000 (13:40 -0800)]
osd/osd_types: debug: include size in object_info_t operator<<

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: debug: clean up oi printout
Sage Weil [Fri, 13 Dec 2013 21:38:13 +0000 (13:38 -0800)]
osd/ReplicatedPG: debug: clean up oi printout

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: debug: add an assert for copy-get
Sage Weil [Thu, 12 Dec 2013 23:41:04 +0000 (15:41 -0800)]
osd/ReplicatedPG: debug: add an assert for copy-get

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: fix locking for promote
Sage Weil [Fri, 13 Dec 2013 21:41:58 +0000 (13:41 -0800)]
osd/ReplicatedPG: fix locking for promote

After we get the copy-from data and unblock the obc, we still need to take
the RWWRITE lock on the object for the duration of the repop while we
actually apply the change locally.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: fix user_version preservation for copy_from
Sage Weil [Thu, 12 Dec 2013 23:40:41 +0000 (15:40 -0800)]
osd/ReplicatedPG: fix user_version preservation for copy_from

In the process of fixing this for flush, we break promote, so we need to
adjust them both here.  Basic strategy: do not set user_modify, but handle
the user_version explicitly in the callbacks.

For copy_from, we don't have a clean way to pass the result through to
finish_copyfrom in do_osd_ops; do so by putting it in user_at_version. (If
we were to call finish_copyfrom directly from the callback this might
be simpler, but let's not go there right now.)

For promote, it is a trivial fix.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: handle ECANCELED in C_CopyFrom, C_Flush
Sage Weil [Thu, 12 Dec 2013 23:05:06 +0000 (15:05 -0800)]
osd/ReplicatedPG: handle ECANCELED in C_CopyFrom, C_Flush

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: uninline CopyFromCallback, PromoteCallback
Sage Weil [Thu, 12 Dec 2013 23:02:53 +0000 (15:02 -0800)]
osd/ReplicatedPG: uninline CopyFromCallback, PromoteCallback

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/osd_types: make object_info_t::dump() dump user_version
Sage Weil [Thu, 12 Dec 2013 22:27:57 +0000 (14:27 -0800)]
osd/osd_types: make object_info_t::dump() dump user_version

Backport: emperor
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/osd_types: include user_version in operator<< object_info_t
Sage Weil [Thu, 12 Dec 2013 21:50:43 +0000 (13:50 -0800)]
osd/osd_types: include user_version in operator<< object_info_t

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agovstart.sh: --cache <pool> to set up pool cache(s) on startup
Sage Weil [Thu, 12 Dec 2013 21:33:40 +0000 (13:33 -0800)]
vstart.sh: --cache <pool> to set up pool cache(s) on startup

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoqa/workunits/rados/test_cache_pool.sh: fixes
Sage Weil [Fri, 13 Dec 2013 22:17:14 +0000 (14:17 -0800)]
qa/workunits/rados/test_cache_pool.sh: fixes

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoqa/workunits/rados: rename cache pool tests
Sage Weil [Tue, 10 Dec 2013 17:58:21 +0000 (09:58 -0800)]
qa/workunits/rados: rename cache pool tests

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoqa/workunits/rados: test cache-{flush,evict,flush-evict-all}
Sage Weil [Tue, 10 Dec 2013 17:57:57 +0000 (09:57 -0800)]
qa/workunits/rados: test cache-{flush,evict,flush-evict-all}

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agorados: add cache-flush, cache-evict, cache-flush-evict-all commands
Sage Weil [Tue, 10 Dec 2013 00:56:09 +0000 (16:56 -0800)]
rados: add cache-flush, cache-evict, cache-flush-evict-all commands

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: implement cache-flush, cache-try-flush
Sage Weil [Fri, 25 Oct 2013 05:30:50 +0000 (22:30 -0700)]
osd/ReplicatedPG: implement cache-flush, cache-try-flush

Implement a rados operation that will flush a dirty object in the cache
tier by writing it back to the base tier.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd: make obc copyfrom blocking generic
Sage Weil [Wed, 18 Dec 2013 19:23:50 +0000 (11:23 -0800)]
osd: make obc copyfrom blocking generic

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agolibrados, osd: add flags to COPY_FROM
Sage Weil [Fri, 13 Dec 2013 21:35:25 +0000 (13:35 -0800)]
librados, osd: add flags to COPY_FROM

If we initiate a COPY_FROM as part of a FLUSH operation, we will need to
set a flag so that the read side of the copy can join the existing
in-progress operation without taking additional locks.

Similarly, we need to pass flags from the client indicating whether we
should ignore overlay or cache logic while performing the copy.  These are
used by the promote and flush logic.

Note that none of these flags are exposed through librados (at least not
at this time).

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: fix promote: set oi.size
Sage Weil [Fri, 13 Dec 2013 21:08:12 +0000 (13:08 -0800)]
osd/ReplicatedPG: fix promote: set oi.size

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/osd_types: fix operator<< on copy-get operation
Sage Weil [Fri, 13 Dec 2013 21:37:01 +0000 (13:37 -0800)]
osd/osd_types: fix operator<< on copy-get operation

This was missed in 15c8267e34aaba7a6d1d316b22519982a997f5a0.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: test undirty on non-existent object
Sage Weil [Mon, 9 Dec 2013 17:48:35 +0000 (09:48 -0800)]
ceph_test_rados_api_tier: test undirty on non-existent object

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: debug: improve maybe_handle_cache() handling
Sage Weil [Tue, 10 Dec 2013 17:52:03 +0000 (09:52 -0800)]
osd/ReplicatedPG: debug: improve maybe_handle_cache() handling

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: rename invalidate_forward
Sage Weil [Tue, 10 Dec 2013 00:34:15 +0000 (16:34 -0800)]
osd/ReplicatedPG: rename invalidate_forward

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados: debug: include exists|dne in update_object_version
Sage Weil [Sat, 7 Dec 2013 23:22:12 +0000 (15:22 -0800)]
ceph_test_rados: debug: include exists|dne in update_object_version

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados: test is_dirty, undirty
Sage Weil [Sat, 7 Dec 2013 22:56:10 +0000 (14:56 -0800)]
ceph_test_rados: test is_dirty, undirty

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados: fix CopyFromOp locking
Sage Weil [Sat, 7 Dec 2013 22:22:09 +0000 (14:22 -0800)]
ceph_test_rados: fix CopyFromOp locking

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agolibrados: seek during object iteration
Sage Weil [Thu, 10 Oct 2013 18:51:16 +0000 (11:51 -0700)]
librados: seek during object iteration

Add ability to reset iterator to a specific hash position.  For now, we
just truncate this to the current PG.  In the future, this may be more
precise.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoosdc/Objecter: remove honor_cache_redirects global flag
Sage Weil [Tue, 10 Dec 2013 17:50:01 +0000 (09:50 -0800)]
osdc/Objecter: remove honor_cache_redirects global flag

We can do this on a per-op basis with CEPH_OSD_FLAG_IGNORE_OVERLAY.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: use IGNORE_OVERLAY flag for copy-from
Sage Weil [Tue, 10 Dec 2013 17:48:19 +0000 (09:48 -0800)]
osd/ReplicatedPG: use IGNORE_OVERLAY flag for copy-from

No need to use the Objecter-wide setting now.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosdc/Objecter: add CEPH_OSD_FLAG_IGNORE_OVERLAY flag
Sage Weil [Fri, 13 Dec 2013 21:15:11 +0000 (13:15 -0800)]
osdc/Objecter: add CEPH_OSD_FLAG_IGNORE_OVERLAY flag

If the flag is set, send the op to the pool specified and ignore the
overlay.  Note that this obsoletes the global Objecter flag.

It also makes these EINVAL correctly:

  rados -p base cache-flush
  rados -p base cache-evict

Signed-off-by: Sage Weil <sage@inktank.com>
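
The per-op routing decision described above might look roughly like this. It is a sketch under assumptions: `PoolInfo`, `target_pool`, and the bit value of `FLAG_IGNORE_OVERLAY` are illustrative, and a tier id of -1 is taken to mean that no overlay is set.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned FLAG_IGNORE_OVERLAY = 1u << 0;  // illustrative bit, not Ceph's value

// Simplified pool description; a tier id of -1 means "no overlay set".
struct PoolInfo {
  std::int64_t id = -1;
  std::int64_t read_tier = -1;
  std::int64_t write_tier = -1;
};

// Pick the pool an op is actually sent to.
std::int64_t target_pool(const PoolInfo& pool, bool is_write, unsigned flags) {
  if (flags & FLAG_IGNORE_OVERLAY)
    return pool.id;  // send to the pool the caller named
  const std::int64_t tier = is_write ? pool.write_tier : pool.read_tier;
  return tier >= 0 ? tier : pool.id;  // otherwise follow the overlay, if any
}
```

With the flag set, a cache-flush or cache-evict aimed at the base pool reaches the base pool itself, which is then in a position to reject it with EINVAL as the commit message describes.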
11 years agoosd: rename IGNORE_OVERLAY -> IGNORE_CACHE
Sage Weil [Fri, 13 Dec 2013 21:11:27 +0000 (13:11 -0800)]
osd: rename IGNORE_OVERLAY -> IGNORE_CACHE

This is about skipping cache logic, not the tier pool overlay property.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/osd_types: operator<< for ObjectContext::RWState
Sage Weil [Thu, 12 Dec 2013 20:33:44 +0000 (12:33 -0800)]
osd/osd_types: operator<< for ObjectContext::RWState

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: more verbose heading for process_copy_chunk
Sage Weil [Fri, 25 Oct 2013 05:23:51 +0000 (22:23 -0700)]
osd/ReplicatedPG: more verbose heading for process_copy_chunk

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: set ctx->obc in simple_repop_create
Sage Weil [Fri, 25 Oct 2013 05:23:22 +0000 (22:23 -0700)]
osd/ReplicatedPG: set ctx->obc in simple_repop_create

Strangely nobody has needed this yet, but we will shortly.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: use finish_ctx for finish_promote
Sage Weil [Fri, 25 Oct 2013 04:45:50 +0000 (21:45 -0700)]
osd/ReplicatedPG: use finish_ctx for finish_promote

Use the common code here to avoid duplicating this logic.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: use get_next_version() in finish_promote
Sage Weil [Fri, 25 Oct 2013 04:40:09 +0000 (21:40 -0700)]
osd/ReplicatedPG: use get_next_version() in finish_promote

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: split off finish_ctx from execute_ctx
Sage Weil [Fri, 25 Oct 2013 04:38:30 +0000 (21:38 -0700)]
osd/ReplicatedPG: split off finish_ctx from execute_ctx

The second part of execute_ctx() is doing some somewhat generic work to
make the prepared updates in the ctx apply, updating the obc's cached
values.  Factor it out.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: add SKIPRWLOCKS flag
Sage Weil [Fri, 25 Oct 2013 04:35:45 +0000 (21:35 -0700)]
osd/ReplicatedPG: add SKIPRWLOCKS flag

Flush puts us in a conundrum:

 - the flush eventually writes, behaving like a write
 - writes take the write lock at the start
 - to flush, we send copy-from to the base pool, which does a copy-get on
   our object
 - the copy-get is a read, which blocks on the write lock.

This flag will allow an op to skip the initial locking step.  It will need
to take it later, of course.

Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:

src/osd/ReplicatedPG.cc
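
The conundrum in the SKIPRWLOCKS commit above can be reproduced with a toy per-object lock. This is a sketch only: `ObjectLock`, `start_write_op`, and the bit value of `FLAG_SKIPRWLOCKS` are hypothetical, and the real lock state in ReplicatedPG is more involved.

```cpp
#include <cassert>

// Hypothetical, simplified per-object reader/writer lock.
struct ObjectLock {
  int readers = 0;
  bool writer = false;

  bool try_read() {
    if (writer) return false;  // a read blocks behind a writer
    ++readers;
    return true;
  }
  bool try_write() {
    if (writer || readers > 0) return false;
    writer = true;
    return true;
  }
  void put_read() { assert(readers > 0); --readers; }
  void put_write() { assert(writer); writer = false; }
};

constexpr unsigned FLAG_SKIPRWLOCKS = 1u << 0;  // illustrative bit value

// Start a write-like op. Normally the write lock is taken up front; with
// the flag set, locking is deferred and the caller must take it later.
bool start_write_op(ObjectLock& lock, unsigned flags) {
  if (flags & FLAG_SKIPRWLOCKS)
    return true;
  return lock.try_write();
}
```

Without the flag, the flush holds the write lock and the copy-get read issued on its behalf cannot proceed. With the flag, the read goes through first and the write lock is taken afterwards.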

11 years agoosd/ReplicatedPG: be consistent about ctx->obs vs ctx->obc->obs
Sage Weil [Fri, 25 Oct 2013 03:47:17 +0000 (20:47 -0700)]
osd/ReplicatedPG: be consistent about ctx->obs vs ctx->obc->obs

Just for consistency (ctx->obs == &ctx->obc->obs).

Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:

src/osd/ReplicatedPG.cc

11 years agoosd/ReplicatedPG: drop unnecessary temp vars in execute_ctx()
Sage Weil [Fri, 25 Oct 2013 03:44:30 +0000 (20:44 -0700)]
osd/ReplicatedPG: drop unnecessary temp vars in execute_ctx()

Both of these are pulled out of ctx->obs, which is not updated until the
very end; use that instead!

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: allow osds to issue writes to osds
Sage Weil [Fri, 25 Oct 2013 02:18:40 +0000 (19:18 -0700)]
osd/ReplicatedPG: allow osds to issue writes to osds

We asserted that the client was not an OSD years ago when we separated out
the client and cluster networks.  Now, we are about to allow an OSD to
trigger a copy_from on another pool (for cache flush) and the assert can
go away.  We've long since verified that the messages are going out on
the correct interfaces.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: maybe_handle_cache style
Sage Weil [Tue, 29 Oct 2013 04:29:09 +0000 (21:29 -0700)]
osd/ReplicatedPG: maybe_handle_cache style

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: skip promote for DELETE
Sage Weil [Wed, 23 Oct 2013 02:56:55 +0000 (19:56 -0700)]
osd/ReplicatedPG: skip promote for DELETE

If an op starts with DELETE there is no need to promote the old content
from the base tier.  Note that this only works if the FAILOK flag is
set.  Otherwise, we need to know whether the object existed or not to
return either 0 or -ENOENT.

Signed-off-by: Sage Weil <sage@inktank.com>
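
The rule in the commit above can be written as a small predicate. This is a sketch, assuming a simplified op representation: `OpCode`, `Op`, `must_promote`, and the bit value of `OP_FLAG_FAILOK` are illustrative names, not Ceph's definitions.

```cpp
#include <cassert>
#include <vector>

enum class OpCode { Read, Write, Delete };
constexpr unsigned OP_FLAG_FAILOK = 1u << 0;  // illustrative bit value

struct Op {
  OpCode code;
  unsigned flags;
};

// Decide whether an op sequence that misses in the cache tier must first
// promote the object from the base tier.
bool must_promote(const std::vector<Op>& ops) {
  assert(!ops.empty());
  const Op& first = ops.front();
  // A leading DELETE discards the old content anyway. Without FAILOK we
  // still need to know whether the object existed (0 vs. -ENOENT), so the
  // shortcut applies only when FAILOK is set.
  if (first.code == OpCode::Delete && (first.flags & OP_FLAG_FAILOK))
    return false;
  return true;
}
```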
11 years agoosd/ReplicatedPG: implement cache_evict
Sage Weil [Wed, 23 Oct 2013 02:41:27 +0000 (19:41 -0700)]
osd/ReplicatedPG: implement cache_evict

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agolibrados: add an aio_operate that takes a write and flags
Sage Weil [Wed, 23 Oct 2013 02:38:54 +0000 (19:38 -0700)]
librados: add an aio_operate that takes a write and flags

Until now you could only pass flags to read operations.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/osd_types: introduce helper for osd op flags -> string conversion
Greg Farnum [Tue, 12 Nov 2013 22:54:53 +0000 (14:54 -0800)]
osd/osd_types: introduce helper for osd op flags -> string conversion

Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:

src/osd/osd_types.h

11 years agolibrados, osd: add IGNORE_OVERLAY flag
Sage Weil [Wed, 23 Oct 2013 01:44:03 +0000 (18:44 -0700)]
librados, osd: add IGNORE_OVERLAY flag

Add a flag that will make the OSD bypass the cache overlay logic.  This is
needed in order to handle operations like CACHE_EVICT and CACHE_FLUSH.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agolibrados: add cache_flush(), cache_try_flush(), cache_evict() methods
Sage Weil [Wed, 23 Oct 2013 01:26:01 +0000 (18:26 -0700)]
librados: add cache_flush(), cache_try_flush(), cache_evict() methods

Not yet implemented by the OSD.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: set object_info and snapset xattrs on promote
Sage Weil [Wed, 23 Oct 2013 02:36:45 +0000 (19:36 -0700)]
osd/ReplicatedPG: set object_info and snapset xattrs on promote

For the normal write path, prepare_transaction() handles this for us.  In
this case, we need to do it explicitly.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: handle is_whiteout in do_osd_ops()
Sage Weil [Wed, 23 Oct 2013 01:06:41 +0000 (18:06 -0700)]
osd/ReplicatedPG: handle is_whiteout in do_osd_ops()

Most of the time we handle whiteouts by returning ENOENT before we even
get this far. However, for a mixed read/write transaction (e.g., a guard)
or certain ops (like create exclusive) we need to deal with the
exists == true and whiteout flag set case explicitly.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: clear whiteout when writing into cache tier
Sage Weil [Wed, 23 Oct 2013 01:02:25 +0000 (18:02 -0700)]
osd/ReplicatedPG: clear whiteout when writing into cache tier

If we have a whiteout object and then write over it, clear the whiteout
flag.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: set whiteout in cache pool on delete
Sage Weil [Wed, 23 Oct 2013 00:21:27 +0000 (17:21 -0700)]
osd/ReplicatedPG: set whiteout in cache pool on delete

If we delete an object in the cache pool, set the whiteout flag instead of
removing the on-disk object.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: verify delete creates whiteouts
Sage Weil [Wed, 23 Oct 2013 00:24:21 +0000 (17:24 -0700)]
ceph_test_rados_api_tier: verify delete creates whiteouts

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: ENOENT when deleting a whiteout
Sage Weil [Tue, 22 Oct 2013 23:30:26 +0000 (16:30 -0700)]
osd/ReplicatedPG: ENOENT when deleting a whiteout

Signed-off-by: Sage Weil <sage@inktank.com>
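
Taken together, the whiteout commits above (set on delete, clear on write, ENOENT when deleting a whiteout) suggest a small state machine. The sketch below models it with a hypothetical `CacheEntry`; it says nothing about how ReplicatedPG stores the flag on disk.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical cache-tier entry: "exists" with "whiteout" set means the
// object is logically deleted but the entry is kept in the cache.
struct CacheEntry {
  bool exists = false;
  bool whiteout = false;
  std::string data;
};

int cache_read(const CacheEntry& e, std::string* out) {
  assert(out != nullptr);
  if (!e.exists || e.whiteout) return -ENOENT;  // a whiteout reads as absent
  *out = e.data;
  return 0;
}

int cache_write(CacheEntry& e, const std::string& data) {
  e.exists = true;
  e.whiteout = false;  // writing over a whiteout clears the flag
  e.data = data;
  return 0;
}

int cache_delete(CacheEntry& e) {
  if (!e.exists || e.whiteout) return -ENOENT;  // deleting a whiteout is ENOENT
  e.whiteout = true;  // keep the entry, mark it deleted
  e.data.clear();
  return 0;
}
```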
11 years agoosd/ReplicatedPG: create whiteout on promote ENOENT
Sage Weil [Tue, 22 Oct 2013 23:14:00 +0000 (16:14 -0700)]
osd/ReplicatedPG: create whiteout on promote ENOENT

If we try to fetch an object from the base tier and it is not present, we
can create a whiteout object.

Signed-off-by: Sage Weil <sage@inktank.com>
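
A minimal sketch of promote-with-whiteout as described above, assuming both tiers can be modeled as in-memory maps. `BaseTier`, `CacheTier`, `CacheObject`, and `promote` are hypothetical names, and returning -ENOENT to the caller is an assumption of this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

struct CacheObject {
  bool whiteout = false;
  std::string data;
};

using BaseTier = std::map<std::string, std::string>;
using CacheTier = std::map<std::string, CacheObject>;

// Copy an object from the base tier into the cache. If the base tier does
// not have it, record a whiteout so the cache remembers the absence.
int promote(const BaseTier& base, CacheTier& cache, const std::string& oid) {
  CacheObject obj;
  const auto it = base.find(oid);
  if (it == base.end()) {
    obj.whiteout = true;
    cache[oid] = obj;
    return -ENOENT;
  }
  obj.data = it->second;
  cache[oid] = obj;
  return 0;
}
```

Once the whiteout is recorded, later lookups for the same object can be answered from the cache tier instead of going back to the base tier.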
11 years agoceph_test_rados_api_tier: add simple promote-on-read test
Sage Weil [Wed, 23 Oct 2013 00:23:39 +0000 (17:23 -0700)]
ceph_test_rados_api_tier: add simple promote-on-read test

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: rename tests
Sage Weil [Tue, 22 Oct 2013 22:44:32 +0000 (15:44 -0700)]
ceph_test_rados_api_tier: rename tests

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: use simple_repop_{create,submit} for finish_promote
Sage Weil [Tue, 22 Oct 2013 22:12:38 +0000 (15:12 -0700)]
osd/ReplicatedPG: use simple_repop_{create,submit} for finish_promote

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: UNDIRTY is not a user_modify
Sage Weil [Sat, 7 Dec 2013 23:20:08 +0000 (15:20 -0800)]
osd/ReplicatedPG: UNDIRTY is not a user_modify

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: move r<0 handling into finish_promote()
Sage Weil [Tue, 22 Oct 2013 22:04:44 +0000 (15:04 -0700)]
osd/ReplicatedPG: move r<0 handling into finish_promote()

Less logic in the header, and this will let us handle ENOENT with a whiteout.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoworkunits: break down cache pool tests to be more precise; expand some
Greg Farnum [Tue, 15 Oct 2013 22:43:49 +0000 (15:43 -0700)]
workunits: break down cache pool tests to be more precise; expand some

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoworkunits: check errors propagate on cache pools in caching_redirects.sh
Greg Farnum [Mon, 14 Oct 2013 20:43:07 +0000 (13:43 -0700)]
workunits: check errors propagate on cache pools in caching_redirects.sh

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoReplicatedPG: promote: handle failed promotes
Greg Farnum [Thu, 10 Oct 2013 16:58:57 +0000 (09:58 -0700)]
ReplicatedPG: promote: handle failed promotes

If we get an error back, reply to the client directly and remove
the op which triggered promotion from our blocked op queue.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoReplicatedPG: promote: add the OpRequest to the Callback
Greg Farnum [Thu, 10 Oct 2013 16:37:35 +0000 (09:37 -0700)]
ReplicatedPG: promote: add the OpRequest to the Callback

This way we can do stuff to it, and we're about to.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoReplicatedPG: promote: first draft pass at doing object promotion
Greg Farnum [Thu, 10 Oct 2013 00:48:57 +0000 (17:48 -0700)]
ReplicatedPG: promote: first draft pass at doing object promotion

This is not yet at all complete -- among other things, it will
retry forever on any object which doesn't exist in the underlying
pool. But it demonstrates the approach reasonably clearly.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoReplicatedPG: copy: don't return from finish_copyfrom
Greg Farnum [Thu, 10 Oct 2013 00:53:35 +0000 (17:53 -0700)]
ReplicatedPG: copy: don't return from finish_copyfrom

The return value is meaningless; nothing in this function can fail.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoReplicatedPG: copy: switch out the CopyCallback interface
Greg Farnum [Wed, 9 Oct 2013 23:16:36 +0000 (16:16 -0700)]
ReplicatedPG: copy: switch out the CopyCallback interface

The tuple was already unwieldy with 4 members; I didn't want to add
more. Instead, create a new CopyResults struct which contains all the
object info and completion data, and pass the retval and a CopyResults*
in the CopyCallbackResults tuple.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
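
The interface change described above, a results struct passed by pointer alongside the return value, could be shaped like this. It is only a sketch: the commit names `CopyResults` and `CopyCallbackResults`, but the fields shown here, the use of `std::tuple`, and `RecordingCallback` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical fields; the commit only says the struct holds the object
// info and completion data.
struct CopyResults {
  std::uint64_t object_size = 0;
  std::uint64_t user_version = 0;
};

// (return value, pointer to results): two members instead of a tuple that
// grows every time a new piece of data is needed.
using CopyCallbackResults = std::tuple<int, CopyResults*>;

struct CopyCallback {
  virtual ~CopyCallback() = default;
  virtual void finish(CopyCallbackResults results) = 0;
};

// Example callback that records what it was handed.
struct RecordingCallback : CopyCallback {
  int retval = 0;
  std::uint64_t size_seen = 0;

  void finish(CopyCallbackResults results) override {
    retval = std::get<0>(results);
    CopyResults* r = std::get<1>(results);
    assert(r != nullptr);
    if (retval >= 0)
      size_seen = r->object_size;
  }
};
```

Adding a field to `CopyResults` later does not change the callback signature, which appears to be the point of the refactoring.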
11 years agotest_ipaddr: add another unit test
Sage Weil [Sat, 14 Dec 2013 00:02:22 +0000 (16:02 -0800)]
test_ipaddr: add another unit test

Was checking something for kbader.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: drop unused hit_set_start_stats
Sage Weil [Sat, 14 Dec 2013 00:02:02 +0000 (16:02 -0800)]
osd/ReplicatedPG: drop unused hit_set_start_stats

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: maintain stats for the hit_set_* objects
Sage Weil [Sat, 14 Dec 2013 00:01:48 +0000 (16:01 -0800)]
osd/ReplicatedPG: maintain stats for the hit_set_* objects

We also make hit_set.current_info reflect only the on-disk 'current', not
anything that is not persisted.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: set object_info_t, SnapSet on hit_set objects
Sage Weil [Fri, 13 Dec 2013 22:54:16 +0000 (14:54 -0800)]
osd/ReplicatedPG: set object_info_t, SnapSet on hit_set objects

These are first-class user-visible rados objects and need these attrs.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agovstart.sh: --hitset <pool> <type>
Sage Weil [Fri, 13 Dec 2013 22:50:34 +0000 (14:50 -0800)]
vstart.sh: --hitset <pool> <type>

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: debug: improve hit_set func banners
Sage Weil [Fri, 13 Dec 2013 02:14:12 +0000 (18:14 -0800)]
osd/ReplicatedPG: debug: improve hit_set func banners

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: do not update current_last_update on activate
Sage Weil [Fri, 13 Dec 2013 02:13:58 +0000 (18:13 -0800)]
osd/ReplicatedPG: do not update current_last_update on activate

Don't update this when we apply the log to our in-memory hitset!  We should
only update this when we persist something to disk.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: make HitSetWrite handle pg splits
Sage Weil [Tue, 10 Dec 2013 04:53:07 +0000 (20:53 -0800)]
ceph_test_rados_api_tier: make HitSetWrite handle pg splits

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agocommon/bloom_filter: fix copy ctor
Sage Weil [Fri, 6 Dec 2013 21:51:02 +0000 (13:51 -0800)]
common/bloom_filter: fix copy ctor

We should not delete[] an uninitialized pointer.

Signed-off-by: Sage Weil <sage@inktank.com>
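
One common way this kind of bug arises is a copy constructor that delegates to operator=, which in turn calls delete[] on a member that was never initialized. The sketch below shows the safe pattern on a hypothetical `BitTable` class; it is not the bloom_filter code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class BitTable {
 public:
  explicit BitTable(std::size_t size)
      : size_(size), bits_(size ? new unsigned char[size]() : nullptr) {}

  // Initialize bits_ to nullptr *before* delegating to operator=, which
  // calls delete[] on it. Leaving it uninitialized is undefined behavior.
  BitTable(const BitTable& other) : size_(0), bits_(nullptr) { *this = other; }

  BitTable& operator=(const BitTable& other) {
    if (this != &other) {
      unsigned char* fresh = nullptr;
      if (other.size_ > 0) {
        fresh = new unsigned char[other.size_];
        std::memcpy(fresh, other.bits_, other.size_);
      }
      delete[] bits_;  // safe: bits_ is nullptr or an array we own
      bits_ = fresh;
      size_ = other.size_;
    }
    return *this;
  }

  ~BitTable() { delete[] bits_; }

  std::size_t size() const { return size_; }
  unsigned char get(std::size_t i) const { assert(i < size_); return bits_[i]; }
  void set(std::size_t i, unsigned char v) { assert(i < size_); bits_[i] = v; }

 private:
  std::size_t size_;
  unsigned char* bits_;
};
```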
11 years agoceph_test_rados_api_tier: add HitSetRead
Sage Weil [Fri, 6 Dec 2013 19:28:04 +0000 (11:28 -0800)]
ceph_test_rados_api_tier: add HitSetRead

Verify that the HitSet reflects a read (and never written) object.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: HitSetRead -> HitSetWrite
Sage Weil [Fri, 6 Dec 2013 19:25:20 +0000 (11:25 -0800)]
ceph_test_rados_api_tier: HitSetRead -> HitSetWrite

This way it will pass despite thrashing.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: add HitSet trim test
Sage Weil [Fri, 6 Dec 2013 19:01:39 +0000 (11:01 -0800)]
ceph_test_rados_api_tier: add HitSet trim test

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/HitSet: fix sealed initialization in Params ctor
Sage Weil [Fri, 6 Dec 2013 17:41:21 +0000 (09:41 -0800)]
osd/HitSet: fix sealed initialization in Params ctor

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_tier: make HitSetRead test less noisy
Sage Weil [Fri, 6 Dec 2013 17:39:21 +0000 (09:39 -0800)]
ceph_test_rados_api_tier: make HitSetRead test less noisy

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/HitSet: fix copy ctor
Sage Weil [Fri, 6 Dec 2013 06:10:09 +0000 (22:10 -0800)]
osd/HitSet: fix copy ctor

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/HitSet: fix dump() of fpp
Sage Weil [Fri, 6 Dec 2013 02:00:09 +0000 (18:00 -0800)]
osd/HitSet: fix dump() of fpp

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agotest/encoding/check-generated: test copy ctor, operator=
Sage Weil [Fri, 6 Dec 2013 02:11:10 +0000 (18:11 -0800)]
test/encoding/check-generated: test copy ctor, operator=

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph-dencoder: add 'copy' command to test operator=
Sage Weil [Fri, 6 Dec 2013 01:16:39 +0000 (17:16 -0800)]
ceph-dencoder: add 'copy' command to test operator=

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomds/Capability: no copying
Sage Weil [Fri, 6 Dec 2013 01:16:08 +0000 (17:16 -0800)]
mds/Capability: no copying

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agotest: add a HitSet unit test
Greg Farnum [Thu, 5 Dec 2013 20:58:37 +0000 (12:58 -0800)]
test: add a HitSet unit test

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoosd/HitSet: track BloomHitSet::Params fpp in micros, not as a double
Sage Weil [Wed, 4 Dec 2013 23:42:21 +0000 (15:42 -0800)]
osd/HitSet: track BloomHitSet::Params fpp in micros, not as a double

...and store it as a 32-bit value, so that it actually works!

Signed-off-by: Sage Weil <sage@inktank.com>
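
Storing the false positive probability in micros means keeping it as an integer count of millionths. A minimal sketch of the conversion, with hypothetical helper names `fpp_to_micro` and `micro_to_fpp` (the actual accessors in HitSet may differ):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Convert a probability in [0, 1] to millionths, rounded to nearest.
std::uint32_t fpp_to_micro(double fpp) {
  assert(fpp >= 0.0 && fpp <= 1.0);
  return static_cast<std::uint32_t>(std::llround(fpp * 1000000.0));
}

// Convert millionths back to a probability.
double micro_to_fpp(std::uint32_t micro) {
  return micro / 1000000.0;
}
```

For example, an fpp of 0.05 is stored as 50000. The largest possible value, 1000000, fits comfortably in 32 bits.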
11 years agoosd/ReplicatedPG: archive hit_set if it is old and not full
Sage Weil [Wed, 4 Dec 2013 23:17:57 +0000 (15:17 -0800)]
osd/ReplicatedPG: archive hit_set if it is old and not full

This matches the condition under which we call _persist().

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd: prevent zero BloomHitSet fpp
Sage Weil [Wed, 4 Dec 2013 22:42:09 +0000 (14:42 -0800)]
osd: prevent zero BloomHitSet fpp

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/HitSet: take Params as const ref to avoid confusion about ownership
Sage Weil [Wed, 4 Dec 2013 22:41:40 +0000 (14:41 -0800)]
osd/HitSet: take Params as const ref to avoid confusion about ownership

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomon/OSDMonitor: non-zero default bloom fpp
Sage Weil [Wed, 4 Dec 2013 22:41:04 +0000 (14:41 -0800)]
mon/OSDMonitor: non-zero default bloom fpp

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/HitSet: make pg_pool_t and Params operator<< less parenthetical
Sage Weil [Wed, 4 Dec 2013 22:17:03 +0000 (14:17 -0800)]
osd/HitSet: make pg_pool_t and Params operator<< less parenthetical

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 8 owner 0 crash_replay_interval 45 hit_set bloom{false_positive_probability: 0, target size: 0, seed: 0} 10s x8

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: apply log to new HitSet to capture writes after peering
Sage Weil [Wed, 4 Dec 2013 22:11:53 +0000 (14:11 -0800)]
osd/ReplicatedPG: apply log to new HitSet to capture writes after peering

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoReplicatedPG: do not seal() HitSets until we're done with them
Greg Farnum [Wed, 4 Dec 2013 20:57:44 +0000 (12:57 -0800)]
ReplicatedPG: do not seal() HitSets until we're done with them

We don't want to seal HitSets just because we're writing a
snapshot to disk; it potentially shrinks the in-memory one
we want to keep adding stuff to!

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agopg_hit_set_info_t: remove unused size, target_size members
Greg Farnum [Wed, 4 Dec 2013 20:45:33 +0000 (12:45 -0800)]
pg_hit_set_info_t: remove unused size, target_size members

Signed-off-by: Greg Farnum <greg@inktank.com>