git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Fri, 25 Oct 2013 05:30:50 +0000 (22:30 -0700)]
osd/ReplicatedPG: implement cache-flush, cache-try-flush
Implement a rados operation that will flush a dirty object in the cache
tier by writing it back to the base tier.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 18 Dec 2013 19:23:50 +0000 (11:23 -0800)]
osd: make obc copyfrom blocking generic
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 13 Dec 2013 21:35:25 +0000 (13:35 -0800)]
librados, osd: add flags to COPY_FROM
If we initiate a COPY_FROM as part of a FLUSH operation, we will need to
set a flag so that the read side of the copy can join the existing
in-progress operation without taking additional locks.
Similarly, we need to pass flags from the client indicating whether we
should ignore overlay or cache logic while performing the copy. These are
used by the promote and flush logic.
Note that none of these flags are exposed through librados (at least not
at this time).
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 13 Dec 2013 21:08:12 +0000 (13:08 -0800)]
osd/ReplicatedPG: fix promote: set oi.size
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 13 Dec 2013 21:37:01 +0000 (13:37 -0800)]
osd/osd_types: fix operator<< on copy-get operation
This was missed in
15c8267e34aaba7a6d1d316b22519982a997f5a0 .
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 9 Dec 2013 17:48:35 +0000 (09:48 -0800)]
ceph_test_rados_api_tier: test undirty on non-existent object
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 10 Dec 2013 17:52:03 +0000 (09:52 -0800)]
osd/ReplicatedPG: debug: improve maybe_handle_cache() handling
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 10 Dec 2013 00:34:15 +0000 (16:34 -0800)]
osd/ReplicatedPG: rename invalidate_forward
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 7 Dec 2013 23:22:12 +0000 (15:22 -0800)]
ceph_test_rados: debug: include exists|dne in update_object_version
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 7 Dec 2013 22:56:10 +0000 (14:56 -0800)]
ceph_test_rados: test is_dirty, undirty
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 7 Dec 2013 22:22:09 +0000 (14:22 -0800)]
ceph_test_rados: fix CopyFromOp locking
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 10 Oct 2013 18:51:16 +0000 (11:51 -0700)]
librados: seek during object iteration
Add the ability to reset the iterator to a specific hash position. For now, we
just truncate this to the current PG. In the future, this may be more
precise.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Tue, 10 Dec 2013 17:50:01 +0000 (09:50 -0800)]
osdc/Objecter: remove honor_cache_redirects global flag
We can do this on a per-op basis with CEPH_OSD_FLAG_IGNORE_OVERLAY.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 10 Dec 2013 17:48:19 +0000 (09:48 -0800)]
osd/ReplicatedPG: use IGNORE_OVERLAY flag for copy-from
No need to use the Objecter-wide setting now.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 13 Dec 2013 21:15:11 +0000 (13:15 -0800)]
osdc/Objecter: add CEPH_OSD_FLAG_IGNORE_OVERLAY flag
If the flag is set, send the op to the pool specified and ignore the
overlay. Note that this obsoletes the global Objecter flag.
It also makes these commands correctly fail with EINVAL:
rados -p base cache-flush
rados -p base cache-evict
Signed-off-by: Sage Weil <sage@inktank.com>
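A sketch of per-op target selection, assuming a pool that records an optional read and write overlay tier (-1 meaning none). OverlayPoolSketch, target_pool() and FLAG_IGNORE_OVERLAY_SKETCH are hypothetical stand-ins for the Objecter logic and the real CEPH_OSD_FLAG_IGNORE_OVERLAY value; this is an illustration, not the actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bit for illustration; not the real CEPH_OSD_FLAG_* value.
constexpr uint32_t FLAG_IGNORE_OVERLAY_SKETCH = 1u << 0;

// Simplified pool with an optional cache overlay (-1 means no overlay).
struct OverlayPoolSketch {
  int64_t id;
  int64_t read_tier;
  int64_t write_tier;
};

// Pick the pool an op is sent to: honor the overlay unless this op carries
// the ignore-overlay flag, in which case send it to the pool as specified.
int64_t target_pool(const OverlayPoolSketch& pool, bool is_write,
                    uint32_t flags) {
  if (flags & FLAG_IGNORE_OVERLAY_SKETCH) return pool.id;
  int64_t tier = is_write ? pool.write_tier : pool.read_tier;
  return tier >= 0 ? tier : pool.id;
}
```

Because the decision is made from per-op flags, two ops issued through the same client can target different pools, which is what makes a client-wide setting unnecessary.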
Sage Weil [Fri, 13 Dec 2013 21:11:27 +0000 (13:11 -0800)]
osd: rename IGNORE_OVERLAY -> IGNORE_CACHE
This is about skipping cache logic, not the tier pool overlay property.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 12 Dec 2013 20:33:44 +0000 (12:33 -0800)]
osd/osd_types: operator<< for ObjectContext::RWState
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 25 Oct 2013 05:23:51 +0000 (22:23 -0700)]
osd/ReplicatedPG: more verbose heading for process_copy_chunk
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 25 Oct 2013 05:23:22 +0000 (22:23 -0700)]
osd/ReplicatedPG: set ctx->obc in simple_repop_create
Strangely, nobody has needed this yet, but we will shortly.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 25 Oct 2013 04:45:50 +0000 (21:45 -0700)]
osd/ReplicatedPG: use finish_ctx for finish_promote
Use the common code here to avoid duplicating this logic.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 25 Oct 2013 04:40:09 +0000 (21:40 -0700)]
osd/ReplicatedPG: use get_next_version() in finish_promote
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 25 Oct 2013 04:38:30 +0000 (21:38 -0700)]
osd/ReplicatedPG: split off finish_ctx from execute_ctx
The second part of execute_ctx() does somewhat generic work: it applies
the updates prepared in the ctx and refreshes the obc's cached values.
Factor it out.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 25 Oct 2013 04:35:45 +0000 (21:35 -0700)]
osd/ReplicatedPG: add SKIPRWLOCKS flag
Flush puts us in a conundrum:
- the flush eventually writes, behaving like a write
- writes take the write lock at the start
- to flush, we send copy-from to the base pool, which does a copy-get on
our object
- the copy-get is a read, that blocks on the write.
This flag will allow an op to skip the initial locking step. It will need
to take it later, of course.
Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:
src/osd/ReplicatedPG.cc
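A minimal sketch of the conundrum above, assuming a simplified per-object lock rather than the OSD's real ObjectContext::RWState. RWStateSketch, flush_copy_get_can_proceed() and the skip_rw_locks parameter are hypothetical names. Where the OSD would leave the copy-get waiting, this model simply reports that the read could not proceed.

```cpp
#include <cassert>

// Hypothetical, simplified per-object read/write lock state.
struct RWStateSketch {
  int readers = 0;
  bool writer = false;

  bool try_get_read() {
    if (writer) return false;  // a read cannot proceed while a write is held
    ++readers;
    return true;
  }
  void put_read() { --readers; }

  bool try_get_write() {
    if (writer || readers > 0) return false;
    writer = true;
    return true;
  }
  void put_write() { writer = false; }
};

// Returns true if the copy-get read triggered by a flush can make progress.
// skip_rw_locks models the SKIPRWLOCKS flag: the flush op does not take the
// write lock up front (it would still need to take it later, before writing).
bool flush_copy_get_can_proceed(RWStateSketch& obj, bool skip_rw_locks) {
  bool took_write = false;
  if (!skip_rw_locks) {
    took_write = obj.try_get_write();  // the flush behaves like a write
  }
  // The base pool's copy-from now issues a copy-get (a read) on our object.
  bool read_ok = obj.try_get_read();
  if (read_ok) obj.put_read();
  if (took_write) obj.put_write();
  return read_ok;
}
```

With a fresh RWStateSketch, flush_copy_get_can_proceed(obj, false) returns false (the read is stuck behind the flush's own write lock) and flush_copy_get_can_proceed(obj, true) returns true.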
Sage Weil [Fri, 25 Oct 2013 03:47:17 +0000 (20:47 -0700)]
osd/ReplicatedPG: be consistent about ctx->obs vs ctx->obc->obs
Just for consistency (ctx->obs == &ctx->obc->obs).
Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:
src/osd/ReplicatedPG.cc
Sage Weil [Fri, 25 Oct 2013 03:44:30 +0000 (20:44 -0700)]
osd/ReplicatedPG: drop unnecessary temp vars in execute_ctx()
Both of these are pulled out of ctx->obs, which is not updated until the
very end; use that instead!
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 25 Oct 2013 02:18:40 +0000 (19:18 -0700)]
osd/ReplicatedPG: allow osds to issue writes to osds
We asserted that the client was not an OSD years ago when we separated out
the client and cluster networks. Now, we are about to allow an OSD to
trigger a copy_from on another pool (for cache flush) and the assert can
go away. We've long since verified that the messages are going out on
the correct interfaces.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 29 Oct 2013 04:29:09 +0000 (21:29 -0700)]
osd/ReplicatedPG: maybe_handle_cache style
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 02:56:55 +0000 (19:56 -0700)]
osd/ReplicatedPG: skip promote for DELETE
If an op starts with DELETE there is no need to promote the old content
from the base tier. Note that this only works if the FAILOK flag is
set. Otherwise, we need to know whether the object existed or not to
return either 0 or -ENOENT.
Signed-off-by: Sage Weil <sage@inktank.com>
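The promote decision described above can be sketched as a small predicate. OpCode and needs_promote() are hypothetical; real ops carry CEPH_OSD_OP_* codes and per-op flags, and the OSD's actual check lives in its cache-handling path.

```cpp
#include <cassert>
#include <vector>

// Hypothetical op codes for illustration only (not Ceph's CEPH_OSD_OP_* values).
enum class OpCode { Read, Write, Delete };

// Decide whether a cache-tier miss must promote the object from the base tier.
// If the op starts with DELETE and FAILOK is set, the old content is never
// needed and the result does not depend on whether the object existed.
// Without FAILOK we must know whether the object existed to return 0 or
// -ENOENT, so we still promote.
bool needs_promote(const std::vector<OpCode>& ops, bool failok) {
  if (!ops.empty() && ops.front() == OpCode::Delete && failok) {
    return false;
  }
  return true;
}
```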
Sage Weil [Wed, 23 Oct 2013 02:41:27 +0000 (19:41 -0700)]
osd/ReplicatedPG: implement cache_evict
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 02:38:54 +0000 (19:38 -0700)]
librados: add an aio_operate that takes a write and flags
Until now you could only pass flags to read operations.
Signed-off-by: Sage Weil <sage@inktank.com>
Greg Farnum [Tue, 12 Nov 2013 22:54:53 +0000 (14:54 -0800)]
osd/osd_types: introduce helper for osd op flags -> string conversion
Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:
src/osd/osd_types.h
Sage Weil [Wed, 23 Oct 2013 01:44:03 +0000 (18:44 -0700)]
librados, osd: add IGNORE_OVERLAY flag
Add a flag that will make the OSD bypass the cache overlay logic. This is
needed in order to handle operations like CACHE_EVICT and CACHE_FLUSH.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 01:26:01 +0000 (18:26 -0700)]
librados: add cache_flush(), cache_try_flush(), cache_evict() methods
Not yet implemented by the OSD.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 02:36:45 +0000 (19:36 -0700)]
osd/ReplicatedPG: set object_info and snapset xattrs on promote
For the normal write path, prepare_transaction() handles this for us. In
this case, we need to do it explicitly.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 01:06:41 +0000 (18:06 -0700)]
osd/ReplicatedPG: handle is_whiteout in do_osd_ops()
Most of the time we handle whiteouts by returning ENOENT before we even
get this far. However, for a mixed read/write transaction (e.g., a guard)
or certain ops (like create exclusive) we need to deal with the
exists == true and whiteout flag set case explicitly.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 01:02:25 +0000 (18:02 -0700)]
osd/ReplicatedPG: clear whiteout when writing into cache tier
If we have a whiteout object and then write over it, clear the whiteout
flag.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 00:21:27 +0000 (17:21 -0700)]
osd/ReplicatedPG: set whiteout in cache pool on delete
If we delete an object in the cache pool, set the whiteout flag instead of
removing the on-disk object.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 00:24:21 +0000 (17:24 -0700)]
ceph_test_rados_api_tier: verify delete creates whiteouts
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Oct 2013 23:30:26 +0000 (16:30 -0700)]
osd/ReplicatedPG: ENOENT when deleting a whiteout
Signed-off-by: Sage Weil <sage@inktank.com>
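Taken together, the whiteout commits above describe a small state machine for objects in the cache pool: delete keeps the on-disk object but marks it as a whiteout, a later write clears the flag, reads treat a whiteout as nonexistent, and deleting a whiteout returns ENOENT. The sketch below models only that behavior, assuming a flat in-memory map with no base tier behind it; CachePoolSketch and CachedObject are hypothetical and do not reproduce the OSD's object_info_t handling.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical model of an object stored in the cache pool.
struct CachedObject {
  std::string data;
  bool whiteout = false;  // "deleted" marker kept in the cache tier
};

class CachePoolSketch {
 public:
  // Writing over an object (including a whiteout) clears the whiteout flag.
  void write(const std::string& oid, const std::string& data) {
    CachedObject& o = objects_[oid];
    o.data = data;
    o.whiteout = false;
  }

  // Delete keeps the on-disk object but marks it as a whiteout.
  // Deleting a whiteout (or an object this model never saw) is -ENOENT.
  int remove(const std::string& oid) {
    auto it = objects_.find(oid);
    if (it == objects_.end() || it->second.whiteout) return -ENOENT;
    it->second.data.clear();
    it->second.whiteout = true;
    return 0;
  }

  // Reads treat a whiteout as a nonexistent object.
  int read(const std::string& oid, std::string* out) const {
    auto it = objects_.find(oid);
    if (it == objects_.end() || it->second.whiteout) return -ENOENT;
    *out = it->second.data;
    return 0;
  }

  bool has_on_disk(const std::string& oid) const {
    return objects_.count(oid) > 0;
  }

 private:
  std::map<std::string, CachedObject> objects_;
};
```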
Sage Weil [Tue, 22 Oct 2013 23:14:00 +0000 (16:14 -0700)]
osd/ReplicatedPG: create whiteout on promote ENOENT
If we try to fetch an object from the base tier and it is not present, we
can create a whiteout object.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 23 Oct 2013 00:23:39 +0000 (17:23 -0700)]
ceph_test_rados_api_tier: add simple promote-on-read test
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Oct 2013 22:44:32 +0000 (15:44 -0700)]
ceph_test_rados_api_tier: rename tests
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Oct 2013 22:12:38 +0000 (15:12 -0700)]
osd/ReplicatedPG: use simple_repop_{create,submit} for finish_promote
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 7 Dec 2013 23:20:08 +0000 (15:20 -0800)]
osd/ReplicatedPG: UNDIRTY is not a user_modify
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 22 Oct 2013 22:04:44 +0000 (15:04 -0700)]
osd/ReplicatedPG: move r<0 handling into finish_promote()
Less logic in the header, and this will let us handle ENOENT with a whiteout.
Signed-off-by: Sage Weil <sage@inktank.com>
Greg Farnum [Tue, 15 Oct 2013 22:43:49 +0000 (15:43 -0700)]
workunits: break down cache pool tests to be more precise; expand some
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Greg Farnum [Mon, 14 Oct 2013 20:43:07 +0000 (13:43 -0700)]
workunits: check errors propagate on cache pools in caching_redirects.sh
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Greg Farnum [Thu, 10 Oct 2013 16:58:57 +0000 (09:58 -0700)]
ReplicatedPG: promote: handle failed promotes
If we get an error back, reply to the client directly and remove
the op which triggered promotion from our blocked op queue.
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Greg Farnum [Thu, 10 Oct 2013 16:37:35 +0000 (09:37 -0700)]
ReplicatedPG: promote: add the OpRequest to the Callback
This way we can do stuff to it, and we're about to.
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Greg Farnum [Thu, 10 Oct 2013 00:48:57 +0000 (17:48 -0700)]
ReplicatedPG: promote: first draft pass at doing object promotion
This is not yet at all complete -- among other things, it will
retry forever on any object which doesn't exist in the underlying
pool. But it demonstrates the approach reasonably clearly.
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Greg Farnum [Thu, 10 Oct 2013 00:53:35 +0000 (17:53 -0700)]
ReplicatedPG: copy: don't return from finish_copyfrom
The return value is meaningless; nothing in this function can fail.
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Greg Farnum [Wed, 9 Oct 2013 23:16:36 +0000 (16:16 -0700)]
ReplicatedPG: copy: switch out the CopyCallback interface
The tuple was already unwieldy with 4 members; I didn't want to add
more. Instead, create a new CopyResults struct which contains all the
object info and completion data, and pass the retval and a CopyResults*
in the CopyCallbackResults tuple.
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
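A rough sketch of the interface change, assuming illustrative field names: the callback receives a (retval, CopyResults*) pair instead of a wider tuple. CopyResultsSketch, CopyCallbackSketch and RecordingCallback are hypothetical and do not reproduce the members of the real CopyResults struct.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for CopyResults; fields are illustrative only.
struct CopyResultsSketch {
  uint64_t object_size = 0;
  std::string data;
  bool started_temp_obj = false;
};

// The callback gets the return value plus a pointer to the results struct,
// so adding new result fields does not change the callback signature.
using CopyCallbackResultsSketch = std::tuple<int, CopyResultsSketch*>;

class CopyCallbackSketch {
 public:
  virtual ~CopyCallbackSketch() = default;
  virtual void finish(CopyCallbackResultsSketch results) = 0;
};

// Example callback that records what it was given.
class RecordingCallback : public CopyCallbackSketch {
 public:
  int last_r = 1;
  uint64_t last_size = 0;
  void finish(CopyCallbackResultsSketch results) override {
    last_r = std::get<0>(results);
    CopyResultsSketch* res = std::get<1>(results);
    if (last_r >= 0 && res) last_size = res->object_size;  // only on success
  }
};
```

The design point is that the tuple stays at two members no matter how much object and completion data the copy produces.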
Sage Weil [Sat, 14 Dec 2013 00:02:22 +0000 (16:02 -0800)]
test_ipaddr: add another unit test
Was checking something for kbader.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 14 Dec 2013 00:02:02 +0000 (16:02 -0800)]
osd/ReplicatedPG: drop unused hit_set_start_stats
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 14 Dec 2013 00:01:48 +0000 (16:01 -0800)]
osd/ReplicatedPG: maintain stats for the hit_set_* objects
We also make hit_set.current_info reflect only the on-disk 'current', not
anything that is not persisted.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 13 Dec 2013 22:54:16 +0000 (14:54 -0800)]
osd/ReplicatedPG: set object_info_t, SnapSet on hit_set objects
These are first-class user-visible rados objects and need these attrs.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 13 Dec 2013 22:50:34 +0000 (14:50 -0800)]
vstart.sh: --hitset <pool> <type>
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 13 Dec 2013 02:14:12 +0000 (18:14 -0800)]
osd/ReplicatedPG: debug: improve hit_set func banners
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 13 Dec 2013 02:13:58 +0000 (18:13 -0800)]
osd/ReplicatedPG: do not update current_last_update on activate
Don't update this when we apply the log to our in-memory hitset! We should
only update this when we persist something to disk.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 10 Dec 2013 04:53:07 +0000 (20:53 -0800)]
ceph_test_rados_api_tier: make HitSetWrite handle pg splits
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 21:51:02 +0000 (13:51 -0800)]
common/bloom_filter: fix copy ctor
We should not delete[] an uninitialized pointer.
Signed-off-by: Sage Weil <sage@inktank.com>
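The bug class fixed here (and in the companion common/bloom_filter operator= fix) is the usual rule-of-three pitfall for a class that owns a raw array: a copy constructor has no previously allocated buffer to free, while operator= must release the old one safely. A minimal sketch, with a hypothetical BitTableSketch class standing in for bloom_filter:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical class owning a raw bit table, in the style of bloom_filter.
class BitTableSketch {
 public:
  explicit BitTableSketch(std::size_t size)
      : table_(size ? new unsigned char[size]() : nullptr), size_(size) {}

  // Copy ctor: table_ has never been allocated, so there is nothing to
  // delete[]; just allocate and copy.
  BitTableSketch(const BitTableSketch& o)
      : table_(o.size_ ? new unsigned char[o.size_] : nullptr),
        size_(o.size_) {
    std::copy(o.table_, o.table_ + o.size_, table_);
  }

  // operator=: guard self-assignment, allocate the new table first, then
  // release the old one.
  BitTableSketch& operator=(const BitTableSketch& o) {
    if (this != &o) {
      unsigned char* fresh = o.size_ ? new unsigned char[o.size_] : nullptr;
      std::copy(o.table_, o.table_ + o.size_, fresh);
      delete[] table_;
      table_ = fresh;
      size_ = o.size_;
    }
    return *this;
  }

  ~BitTableSketch() { delete[] table_; }

  void set(std::size_t i) { table_[i] = 1; }
  bool test(std::size_t i) const { return table_[i] != 0; }
  std::size_t size() const { return size_; }

 private:
  unsigned char* table_;
  std::size_t size_;
};
```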
Sage Weil [Fri, 6 Dec 2013 19:28:04 +0000 (11:28 -0800)]
ceph_test_rados_api_tier: add HitSetRead
Verify that the HitSet reflects a read (and never written) object.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 19:25:20 +0000 (11:25 -0800)]
ceph_test_rados_api_tier: HitSetRead -> HitSetWrite
This way it will pass despite thrashing.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 19:01:39 +0000 (11:01 -0800)]
ceph_test_rados_api_tier: add HitSet trim test
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 17:41:21 +0000 (09:41 -0800)]
osd/HitSet: fix sealed initialization in Params ctor
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 17:39:21 +0000 (09:39 -0800)]
ceph_test_rados_api_tier: make HitSetRead test less noisy
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 06:10:09 +0000 (22:10 -0800)]
osd/HitSet: fix copy ctor
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 02:00:09 +0000 (18:00 -0800)]
osd/HitSet: fix dump() of fpp
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 02:11:10 +0000 (18:11 -0800)]
test/encoding/check-generated: test copy ctor, operator=
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 01:16:39 +0000 (17:16 -0800)]
ceph-dencoder: add 'copy' command to test operator=
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 01:16:08 +0000 (17:16 -0800)]
mds/Capability: no copying
Signed-off-by: Sage Weil <sage@inktank.com>
Greg Farnum [Thu, 5 Dec 2013 20:58:37 +0000 (12:58 -0800)]
test: add a HitSet unit test
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Wed, 4 Dec 2013 23:42:21 +0000 (15:42 -0800)]
osd/HitSet: track BloomHitSet::Params fpp in micros, not as a double
...and store it as a 32-bit value, so that it actually works!
Signed-off-by: Sage Weil <sage@inktank.com>
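Storing the probability as an integer count of millionths avoids encoding a double and fits in 32 bits. A sketch, assuming a hypothetical BloomParamsSketch struct with fpp_micro, get_fpp() and set_fpp() members; rounding to the nearest micro is a choice made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical params struct: fpp stored in millionths ("micros").
struct BloomParamsSketch {
  uint32_t fpp_micro = 0;  // e.g. 0.05 -> 50000

  void set_fpp(double fpp) {
    // Round to the nearest micro so values like 0.05 survive the conversion.
    fpp_micro = static_cast<uint32_t>(std::llround(fpp * 1000000.0));
  }
  double get_fpp() const {
    return static_cast<double>(fpp_micro) / 1000000.0;
  }
};
```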
Sage Weil [Wed, 4 Dec 2013 23:17:57 +0000 (15:17 -0800)]
osd/ReplicatedPG: archive hit_set if it is old and not full
This matches the condition under which we call _persist().
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 4 Dec 2013 22:42:09 +0000 (14:42 -0800)]
osd: prevent zero BloomHitSet fpp
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 4 Dec 2013 22:41:40 +0000 (14:41 -0800)]
osd/HitSet: take Params as const ref to avoid confusion about ownership
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 4 Dec 2013 22:41:04 +0000 (14:41 -0800)]
mon/OSDMonitor: non-zero default bloom fpp
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 4 Dec 2013 22:17:03 +0000 (14:17 -0800)]
osd/HitSet: make pg_pool_t and Params operator<< less parenthetical
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 8 owner 0 crash_replay_interval 45 hit_set bloom{false_positive_probability: 0, target size: 0, seed: 0} 10s x8
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 4 Dec 2013 22:11:53 +0000 (14:11 -0800)]
osd/ReplicatedPG: apply log to new HitSet to capture writes after peering
Signed-off-by: Sage Weil <sage@inktank.com>
Greg Farnum [Wed, 4 Dec 2013 20:57:44 +0000 (12:57 -0800)]
ReplicatedPG: do not seal() HitSets until we're done with them
We don't want to seal HitSets just because we're writing a
snapshot to disk; it potentially shrinks the in-memory one
we want to keep adding stuff to!
Signed-off-by: Greg Farnum <greg@inktank.com>
Greg Farnum [Wed, 4 Dec 2013 20:45:33 +0000 (12:45 -0800)]
pg_hit_set_info_t: remove unused size, target_size members
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Wed, 4 Dec 2013 17:39:26 +0000 (09:39 -0800)]
ceph_test_rados: hit hit_set_{list,get} rados operations
This will do a list, and then get a random HitSet.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 4 Dec 2013 17:11:02 +0000 (09:11 -0800)]
osd/ReplicatedPG: trim old hit_set objects on persist
Any time we persist a hit_set object, take the opportunity to remove any
old ones that we don't want any more.
Note that this means if the admin decreases the number of objects to track,
we won't remove them until the next time we persist something. We also
don't clean up if the HitSet tracking is disabled entirely.
Signed-off-by: Sage Weil <sage@inktank.com>
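A sketch of trim-on-persist, assuming archived HitSets are kept oldest first and identified by name; ArchiveSketch and its methods are hypothetical. As the commit notes, lowering the limit has no effect until the next persist.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical archive of persisted HitSet object names, oldest first.
class ArchiveSketch {
 public:
  explicit ArchiveSketch(std::size_t max_count) : max_count_(max_count) {}

  // Changing the limit does not trim by itself.
  void set_max_count(std::size_t n) { max_count_ = n; }

  // Persist a new archive object, then remove the oldest ones beyond the
  // limit. Returns the names of the objects that were removed.
  std::vector<std::string> persist(const std::string& name) {
    archived_.push_back(name);
    std::vector<std::string> removed;
    while (archived_.size() > max_count_) {
      removed.push_back(archived_.front());
      archived_.pop_front();
    }
    return removed;
  }

  std::size_t size() const { return archived_.size(); }

 private:
  std::size_t max_count_;
  std::deque<std::string> archived_;
};
```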
Sage Weil [Mon, 2 Dec 2013 19:27:05 +0000 (11:27 -0800)]
osd/ReplicatedPG: put hit_set objects in a configurable namespace
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sun, 6 Oct 2013 18:39:04 +0000 (11:39 -0700)]
librados: create new ceph_test_rados_api_tier target
Move the dirty/undirty test to it, and add one for HitSets.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Greg Farnum [Tue, 19 Nov 2013 00:52:50 +0000 (16:52 -0800)]
librados, osd: list and get HitSets via librados
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Sun, 6 Oct 2013 18:39:58 +0000 (11:39 -0700)]
osd/ReplicatedPG: use vectorized osd_op outdata for pg ops
This lets us put PGLS in a compound operation. Nothing does that yet, but
this would allow it.
Despite appearances, this is not a protocol change and does not require
a feature bit for clients: using the osd_ops vector mechanism stores all
the data in the same places as before; it just fills in some of the
already-decoded-but-empty data structures in the MOSDOpReply header.
<Greg note:> We may need a feature bit to let clients know they can send
compound PG ops to OSDs, though? Or maybe we can let it be covered
by supporting hitset ops.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 11 Oct 2013 23:06:07 +0000 (16:06 -0700)]
osd/ReplicatedPG: add basic HitSet tracking
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Thu, 10 Oct 2013 22:40:29 +0000 (15:40 -0700)]
mon/OSDMonitor: set hit_set fields
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Thu, 19 Sep 2013 15:48:07 +0000 (08:48 -0700)]
osd: add hit_set_* parameters to pg_pool_t
Add pool properties to control what type of HitSet we want to use, along with
some (mostly generic) parameters.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Thu, 3 Oct 2013 23:40:15 +0000 (16:40 -0700)]
osd/osd_types: include pg_hit_set_history_t in pg_info_t
Track metadata about the currently accumulating HitSet as well as
previously archived ones in the pg_info_t. This will not scale well for
extremely long histories, but does let us avoid explicitly sharing this
metadata during recovery or other normal update activity.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 3 Oct 2013 22:29:15 +0000 (15:29 -0700)]
osd/osd_types: add pg_hit_set_{info,history}_t
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 6 Dec 2013 06:19:57 +0000 (22:19 -0800)]
common/bloom_filter: fix operator=
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 3 Oct 2013 05:41:54 +0000 (22:41 -0700)]
osd_types: add generic HitSet type with bloom and explicit implementations
Track a set of hash values, either explicitly or using a bloom_filter. Hide
the implementation and allow us to transparently encode and decode.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
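A sketch of one HitSet handle over interchangeable implementations, assuming 32-bit hash values and omitting encode/decode. All class names are hypothetical, and the probe-mixing function in BloomHitSetSketch is an arbitrary choice for illustration, not Ceph's bloom_filter hashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Common interface; callers never see which implementation is in use.
class HitSetImplSketch {
 public:
  virtual ~HitSetImplSketch() = default;
  virtual void insert(uint32_t hash) = 0;
  // The bloom implementation may return false positives, never false negatives.
  virtual bool contains(uint32_t hash) const = 0;
};

// Exact tracking: remembers every hash.
class ExplicitHitSetSketch : public HitSetImplSketch {
 public:
  void insert(uint32_t hash) override { hashes_.insert(hash); }
  bool contains(uint32_t hash) const override {
    return hashes_.count(hash) > 0;
  }

 private:
  std::set<uint32_t> hashes_;
};

// Approximate tracking: k probes into a fixed bit array (bits must be > 0).
class BloomHitSetSketch : public HitSetImplSketch {
 public:
  BloomHitSetSketch(std::size_t bits, unsigned k) : bits_(bits, false), k_(k) {}
  void insert(uint32_t hash) override {
    for (unsigned i = 0; i < k_; ++i) bits_[index(hash, i)] = true;
  }
  bool contains(uint32_t hash) const override {
    for (unsigned i = 0; i < k_; ++i)
      if (!bits_[index(hash, i)]) return false;
    return true;
  }

 private:
  // Derive the i-th probe position by mixing the hash with a per-probe seed.
  std::size_t index(uint32_t hash, unsigned i) const {
    uint64_t x = (static_cast<uint64_t>(hash) + 1) * 0x9E3779B97F4A7C15ULL +
                 i * 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 31;
    return static_cast<std::size_t>(x % bits_.size());
  }
  std::vector<bool> bits_;
  unsigned k_;
};

// Generic handle that hides the implementation.
class HitSetSketch {
 public:
  explicit HitSetSketch(std::unique_ptr<HitSetImplSketch> impl)
      : impl_(std::move(impl)) {}
  void insert(uint32_t hash) { impl_->insert(hash); }
  bool contains(uint32_t hash) const { return impl_->contains(hash); }

 private:
  std::unique_ptr<HitSetImplSketch> impl_;
};
```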
Sage Weil [Fri, 4 Oct 2013 23:07:20 +0000 (16:07 -0700)]
osd/ReplicatedPG: factor out simple_repop_{create,submit} helpers
This makes it easier to create repops correctly, and should help
prevent bugs like the one we remove here in process_copy_op (we were
serializing on the wrong object!)
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Greg Farnum [Fri, 15 Nov 2013 23:16:20 +0000 (15:16 -0800)]
osd/PG: factor out get_next_version()
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Greg Farnum [Fri, 15 Nov 2013 23:48:55 +0000 (15:48 -0800)]
librados: add wait_for_latest_osdmap()
There are times when users may need to make sure the client has the
latest osdmap, for example after sending a mon command modifying
pool properties.
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Fri, 11 Oct 2013 22:34:33 +0000 (15:34 -0700)]
librados: expose methods for calculating object hash position
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Fri, 11 Oct 2013 22:34:19 +0000 (15:34 -0700)]
osdc/Objecter: expose methods for getting object hash position and pg
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Fri, 11 Oct 2013 22:33:45 +0000 (15:33 -0700)]
osd: capture hashing of objects to hash positions/pgs in pg_pool_t
The hashing is dependent on pool properties; capture (more of) it in a
method instead of having it in OSDMap.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
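For illustration, the hash-position-to-PG step can live next to pg_num in a pool-like struct. stable_mod() below follows the logic of Ceph's ceph_stable_mod(); PoolSketch and pg_num_mask() are hypothetical helpers, and the object-name hash (for example rjenkins, per the pool's object_hash) is assumed to have been computed already.

```cpp
#include <cassert>
#include <cstdint>

// Same logic as ceph_stable_mod(): map x into [0, b) so that growing b
// (pg_num) moves as few hash positions as possible.
static inline uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  if ((x & bmask) < b)
    return x & bmask;
  return x & (bmask >> 1);
}

// Hypothetical helper: smallest (2^n - 1) mask covering [0, pg_num).
static inline uint32_t pg_num_mask(uint32_t pg_num) {
  uint32_t mask = 1;
  while (mask < pg_num) mask <<= 1;
  return mask - 1;
}

// The mapping depends only on pool properties, so it sits with the pool.
struct PoolSketch {
  uint32_t pg_num;
  uint32_t hash_to_pg(uint32_t hash_position) const {
    return stable_mod(hash_position, pg_num, pg_num_mask(pg_num));
  }
};
```

With pg_num = 12 the mask is 15, so hash position 13 folds back to PG 5, while 10 maps directly to PG 10.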