git.apps.os.sepia.ceph.com Git - ceph.git/log
11 years ago - common/bloom_filter: fix operator=
Sage Weil [Fri, 6 Dec 2013 06:19:57 +0000 (22:19 -0800)]
common/bloom_filter: fix operator=

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - osd_types: add generic HitSet type with bloom and explicit implementations
Sage Weil [Thu, 3 Oct 2013 05:41:54 +0000 (22:41 -0700)]
osd_types: add generic HitSet type with bloom and explicit implementations

Track a set of hash values, either explicitly or using a bloom_filter. Hide
the implementation and allow us to transparently encode and decode.
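A minimal C++ sketch of the idea: one owning `HitSet` that forwards to either an exact set of hashes or a bloom filter, so callers never see which is in use. The class names, the 1024-bit filter size, the three probes, and the double-hashing scheme are assumptions for illustration, not Ceph's `HitSet` or `bloom_filter` code; the encode/decode support the commit mentions is omitted.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>

// Common interface: callers insert and query 32-bit object hashes.
struct HitSetImpl {
  virtual ~HitSetImpl() = default;
  virtual void insert(uint32_t h) = 0;
  virtual bool contains(uint32_t h) const = 0;
};

// Exact tracking; memory grows with the number of distinct hashes.
struct ExplicitHitSet : HitSetImpl {
  std::set<uint32_t> hits;
  void insert(uint32_t h) override { hits.insert(h); }
  bool contains(uint32_t h) const override { return hits.count(h) != 0; }
};

// Fixed-size bloom filter: no false negatives, occasional false positives.
struct BloomHitSet : HitSetImpl {
  static constexpr size_t kBits = 1024;  // assumed filter size
  static constexpr int kProbes = 3;      // assumed number of hash probes
  std::bitset<kBits> bits;

  // Double hashing: derive kProbes bit positions from two mixes of h.
  static size_t probe(uint32_t h, int i) {
    uint64_t h1 = h * 0x9E3779B97F4A7C15ULL;
    uint64_t h2 = ((h ^ (h >> 16)) * 0x85EBCA6BULL) | 1;
    return static_cast<size_t>((h1 + static_cast<uint64_t>(i) * h2) % kBits);
  }
  void insert(uint32_t h) override {
    for (int i = 0; i < kProbes; ++i) bits.set(probe(h, i));
  }
  bool contains(uint32_t h) const override {
    for (int i = 0; i < kProbes; ++i)
      if (!bits.test(probe(h, i))) return false;
    return true;
  }
};

// Owner that hides which representation is in use.
class HitSet {
  std::unique_ptr<HitSetImpl> impl;
 public:
  explicit HitSet(bool use_bloom) {
    if (use_bloom) impl.reset(new BloomHitSet);
    else impl.reset(new ExplicitHitSet);
  }
  void insert(uint32_t h) { impl->insert(h); }
  bool contains(uint32_t h) const { return impl->contains(h); }
};
```

The trade-off shows up in `contains()`: the explicit set is exact but grows with each distinct hash, while the bloom filter stays fixed in size and may report false positives.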

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
11 years ago - osd/ReplicatedPG: factor out simple_repop_{create,submit} helpers
Sage Weil [Fri, 4 Oct 2013 23:07:20 +0000 (16:07 -0700)]
osd/ReplicatedPG: factor out simple_repop_{create,submit} helpers

This makes it easier to create repops correctly and should help
prevent bugs like the one we fix here in process_copy_op (we were
serializing on the wrong object!).

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
11 years ago - osd/PG: factor out get_next_version()
Greg Farnum [Fri, 15 Nov 2013 23:16:20 +0000 (15:16 -0800)]
osd/PG: factor out get_next_version()

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
11 years ago - librados: add wait_for_latest_osdmap()
Greg Farnum [Fri, 15 Nov 2013 23:48:55 +0000 (15:48 -0800)]
librados: add wait_for_latest_osdmap()

There are times when users may need to make sure the client has the
latest osdmap, for example after sending a mon command modifying
pool properties.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
squash "librados: add wait_for_latest_osdmap()"

11 years ago - librados: expose methods for calculating object hash position
Sage Weil [Fri, 11 Oct 2013 22:34:33 +0000 (15:34 -0700)]
librados: expose methods for calculating object hash position

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - osdc/Objecter: expose methods for getting object hash position and pg
Sage Weil [Fri, 11 Oct 2013 22:34:19 +0000 (15:34 -0700)]
osdc/Objecter: expose methods for getting object hash position and pg

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - osd: capture hashing of objects to hash positions/pgs in pg_pool_t
Sage Weil [Fri, 11 Oct 2013 22:33:45 +0000 (15:33 -0700)]
osd: capture hashing of objects to hash positions/pgs in pg_pool_t

The hashing is dependent on pool properties; capture (more of) it in a
method instead of having it in OSDMap.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - osd/OSDMap: use new object_locator_t::hash to place object in a pg
Sage Weil [Thu, 3 Oct 2013 05:15:41 +0000 (22:15 -0700)]
osd/OSDMap: use new object_locator_t::hash to place object in a pg

The hash value, if provided, becomes the ps (placement seed) portion of the
pg_t, skipping any hashing of the object name (or locator key).
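The rule described above (an explicit hash, when present, becomes the ps) can be sketched as follows, assuming a locator with an optional explicit hash modeled as `int64_t hash = -1` for "unset". `ObjectLocator` and `placement_seed` are hypothetical names, and `std::hash` is only a stand-in for the hash Ceph actually applies to object names and locator keys.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

struct ObjectLocator {   // hypothetical stand-in for object_locator_t
  int64_t pool = 0;
  std::string key;       // optional locator key
  int64_t hash = -1;     // explicit hash position; -1 means "not set"
};

// Returns the placement seed (ps) portion of the pg for an object.
uint32_t placement_seed(const ObjectLocator& loc, const std::string& name) {
  if (loc.hash >= 0)
    return static_cast<uint32_t>(loc.hash);  // skip name/key hashing entirely
  const std::string& s = loc.key.empty() ? name : loc.key;
  return static_cast<uint32_t>(std::hash<std::string>{}(s));  // stand-in hash
}
```

Objects that share a locator key, or an explicit hash, land on the same ps regardless of their names.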

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - osd/osd_types: add explicit hash to object_locator_t
Greg Farnum [Fri, 15 Nov 2013 19:12:03 +0000 (11:12 -0800)]
osd/osd_types: add explicit hash to object_locator_t

Instead of hashing the object name or key, we allow the hash position to be
provided explicitly.

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
11 years ago - encoding: allow users to specify a different compatv after encoding
Greg Farnum [Thu, 21 Nov 2013 01:04:26 +0000 (17:04 -0800)]
encoding: allow users to specify a different compatv after encoding

This way we can set the compatv preferentially depending on whether
we've actually encoded new information or not.
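A sketch of deferring the compat decision, assuming a header of struct_v (1 byte), compat (1 byte), and a 32-bit little-endian length, loosely modeled on Ceph's versioned encoding. The `Encoder` type and its methods are hypothetical: the compat byte is written provisionally and patched once the encoder knows whether any new fields went out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Encoder {
  std::vector<uint8_t> buf;
  size_t header_pos = 0;

  // Write struct_v, a provisional compat, and a length placeholder.
  void start(uint8_t v, uint8_t compat) {
    header_pos = buf.size();
    buf.push_back(v);
    buf.push_back(compat);
    for (int i = 0; i < 4; ++i) buf.push_back(0);  // u32 length, set in finish()
  }
  void put_u8(uint8_t x) { buf.push_back(x); }
  // Adjust compat after the body is written, based on what was encoded.
  void set_compat(uint8_t compat) { buf[header_pos + 1] = compat; }
  // Patch the body length (bytes after the 6-byte header).
  void finish() {
    uint32_t len = static_cast<uint32_t>(buf.size() - header_pos - 6);
    for (int i = 0; i < 4; ++i)
      buf[header_pos + 2 + i] = static_cast<uint8_t>(len >> (8 * i));
  }
};
```

For example, a v2 encoder that ends up writing only v1 fields can lower compat to 1 so that older decoders still accept the blob.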

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years ago - librados: add mon_command to C++ API
Sage Weil [Thu, 10 Oct 2013 23:23:57 +0000 (16:23 -0700)]
librados: add mon_command to C++ API

This way librados users can execute monitor commands.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - librados: document aio_flush()
Sage Weil [Sun, 6 Oct 2013 19:55:16 +0000 (12:55 -0700)]
librados: document aio_flush()

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - librados: constify inbl command args
Sage Weil [Thu, 10 Oct 2013 23:13:58 +0000 (16:13 -0700)]
librados: constify inbl command args

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - osdc/Objecter: constify inbl command args
Sage Weil [Thu, 10 Oct 2013 23:13:43 +0000 (16:13 -0700)]
osdc/Objecter: constify inbl command args

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - mon/MonClient: constify inbl command args
Sage Weil [Thu, 10 Oct 2013 23:13:31 +0000 (16:13 -0700)]
mon/MonClient: constify inbl command args

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - osdc/Objecter: reimplement list_objects
Sage Weil [Thu, 10 Oct 2013 16:56:39 +0000 (09:56 -0700)]
osdc/Objecter: reimplement list_objects

Return to caller at the end of each PG.  This allows the caller to look at
the [pg_]hash_position and get something meaningful.

If there are no objects in the PG, we skip it so that every callback has
*some* data (unless the pool is totally empty!).  So the real difference
here is that we don't move on to the next PG just to reach max_entries.

This gives the client some data sooner, but may mean more callbacks into
client code.
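The new per-PG contract can be sketched as follows: each call returns up to `max_entries` objects from a single PG, skips empty PGs, and reports which PG the batch came from. `Lister` and its in-memory PG contents are hypothetical; the real Objecter issues PGLS ops against OSDs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Lister {
  const std::vector<std::vector<std::string>>& pgs;  // objects per PG
  size_t pg = 0;      // current PG (the "pg hash position")
  size_t offset = 0;  // cursor within the current PG

  explicit Lister(const std::vector<std::vector<std::string>>& p) : pgs(p) {}

  // Fills `out` with up to max_entries objects from one PG only.
  // Returns false once every PG has been listed.
  bool next_batch(size_t max_entries, std::vector<std::string>& out, size_t& pg_out) {
    out.clear();
    while (pg < pgs.size() && offset >= pgs[pg].size()) {  // skip empty/finished PGs
      ++pg;
      offset = 0;
    }
    if (pg >= pgs.size()) return false;
    pg_out = pg;
    const auto& objs = pgs[pg];
    while (offset < objs.size() && out.size() < max_entries)
      out.push_back(objs[offset++]);  // never spill into the next PG
    return true;
  }
};
```

With PGs `{"a","b"}`, `{}`, `{"c"}` and `max_entries = 10`, the caller gets two callbacks (PG 0, then PG 2) rather than one batch of three.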

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - librados: add get_pg_hash_position to determine pg while listing objects
Sage Weil [Thu, 3 Oct 2013 19:38:40 +0000 (12:38 -0700)]
librados: add get_pg_hash_position to determine pg while listing objects

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - osdc/Objecter: stick bl inside ListContext
Sage Weil [Thu, 10 Oct 2013 15:51:23 +0000 (08:51 -0700)]
osdc/Objecter: stick bl inside ListContext

This is simpler and less error-prone.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - osdc/Objecter: factor pg_read out of list_objects code
Sage Weil [Sun, 6 Oct 2013 20:30:23 +0000 (13:30 -0700)]
osdc/Objecter: factor pg_read out of list_objects code

This will get used later for other ops against PGs (instead of objects).

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - osdc/Objecter: separate explicit pg target from current target
Sage Weil [Sun, 6 Oct 2013 20:22:31 +0000 (13:22 -0700)]
osdc/Objecter: separate explicit pg target from current target

The pgid field is used to store the pg the op mapped to.  We were just
setting it directly for PGLS.  Instead, fill in a new base_pgid, and copy that
to pgid in recalc_op_target(), the same way we do when we map an object
name to a PG.

In particular, we take this opportunity to map a raw pgid to an actual
pgid.  This means the base_pg could come from a raw hash value (although
it doesn't, yet).
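Mapping a raw pgid to an actual pgid folds the raw placement seed into the pool's current `pg_num`. The sketch below follows our understanding of Ceph's `ceph_stable_mod()`, which keeps most seeds in place when `pg_num` grows; treat it as an illustration and check it against the tree. `calc_bits_mask` is a hypothetical helper for computing the mask.

```cpp
#include <cassert>
#include <cstdint>

// Fold a raw placement seed x into [0, b), where bmask is the smallest
// 2^n - 1 that is >= b - 1. Seeds that land at or past b are folded
// into the lower half of the mask.
inline uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  if ((x & bmask) < b)
    return x & bmask;
  return x & (bmask >> 1);
}

// Hypothetical helper: mask for a given pg_num (assumes pg_num >= 1).
inline uint32_t calc_bits_mask(uint32_t pg_num) {
  uint32_t mask = 1;
  while (mask < pg_num) mask <<= 1;
  return mask - 1;
}
```

With `pg_num = 12` the mask is 15: raw seed 13 folds to `13 & 7 = 5`, while seed 27 maps to `27 & 15 = 11`.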

Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
11 years ago - osdc/Objecter: drop redundant condition
Sage Weil [Thu, 10 Oct 2013 15:51:53 +0000 (08:51 -0700)]
osdc/Objecter: drop redundant condition

We are inside an if (response_size) block.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - osd/osd_types: make pref optional in pg_t constructor
Sage Weil [Sun, 6 Oct 2013 18:37:15 +0000 (11:37 -0700)]
osd/osd_types: make pref optional in pg_t constructor

We don't use preferred placements any more, so this will
make it easier to start dropping references to it in new code.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - v0.72 [v0.72]
Gary Lowell [Thu, 7 Nov 2013 20:27:35 +0000 (20:27 +0000)]
v0.72

11 years ago - rgw: deny writes to a secondary zone by non-system users
Yehuda Sadeh [Tue, 5 Nov 2013 22:54:20 +0000 (14:54 -0800)]
rgw: deny writes to a secondary zone by non-system users

Fixes: #6678
We don't want to allow regular users to write to secondary zones,
otherwise we'd end up with data inconsistencies.

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago - doc/release-notes: note crush update timeout on startup change
Sage Weil [Thu, 7 Nov 2013 04:02:09 +0000 (20:02 -0800)]
doc/release-notes: note crush update timeout on startup change

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - osdmaptool: fix cli tests
Sage Weil [Thu, 7 Nov 2013 03:59:56 +0000 (19:59 -0800)]
osdmaptool: fix cli tests

From c22c84a88c22688b6044ab37f65a3fe40dfe1983.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - Ceph: Fix memory leak in chain_flistxattr()
Li Wang [Thu, 7 Nov 2013 02:44:30 +0000 (10:44 +0800)]
Ceph: Fix memory leak in chain_flistxattr()

Free allocated memory before return.

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - ReplicatedPG: don't skip missing if sentries is empty on pgls
Samuel Just [Wed, 6 Nov 2013 22:33:03 +0000 (14:33 -0800)]
ReplicatedPG: don't skip missing if sentries is empty on pgls

Formerly, if sentries is empty, we skip missing.  In general,
we need to continue adding items from missing until we get
to next (returned from collection_list_partial) to avoid
missing any objects.
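The fix can be viewed as a merge of two sorted sources: entries listed from disk (`sentries`) and objects known to be missing locally. A simplified sketch, assuming plain string names instead of hobject_t and an empty `next` meaning the on-disk listing reached the end of the PG; `pgls_page` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Merge on-disk entries with missing objects in [start, next).
// An empty `next` is treated as "no upper bound" (an assumption of this sketch).
std::vector<std::string> pgls_page(const std::vector<std::string>& sentries,
                                   const std::set<std::string>& missing,
                                   const std::string& start,
                                   const std::string& next) {
  std::set<std::string> result(sentries.begin(), sentries.end());
  // The old behavior effectively skipped this loop when sentries was empty.
  for (auto it = missing.lower_bound(start); it != missing.end(); ++it) {
    if (!next.empty() && *it >= next) break;  // belongs to a later page
    result.insert(*it);
  }
  return std::vector<std::string>(result.begin(), result.end());
}
```

An empty on-disk page no longer hides a missing object that sorts before `next`.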

Fixes: #6633
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
11 years ago - PG: fix operator<<,log_wierdness log bound warning
Samuel Just [Wed, 6 Nov 2013 05:48:53 +0000 (21:48 -0800)]
PG: fix operator<<,log_wierdness log bound warning

Split may cause holes such that head != tail and yet
log.empty().

Fixes: #6722
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
11 years ago - PGLog::rewind_divergent_log: log may not contain newhead
Samuel Just [Wed, 6 Nov 2013 01:47:48 +0000 (17:47 -0800)]
PGLog::rewind_divergent_log: log may not contain newhead

Due to split, there may be a hole at newhead.

Fixes: #6722
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
11 years ago - Merge pull request #824 from dmick/next
Sage Weil [Wed, 6 Nov 2013 15:46:02 +0000 (07:46 -0800)]
Merge pull request #824 from dmick/next

osdmaptool: don't put progress on stdout

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - RadosModel: use sharedptr_registry for snaps_in_use
Samuel Just [Tue, 5 Nov 2013 23:40:29 +0000 (15:40 -0800)]
RadosModel: use sharedptr_registry for snaps_in_use

There might be two concurrent rollback ops each of which
adds snap x to snaps_in_use.  Between when the first
completes and the second completes, snap x may be removed
since the first would have removed snap x from snaps_in_use.
Using sharedptr_registry here avoids this by ensuring that
the snap won't be removed from snaps_in_use until all refs
are gone.

This patch also adds size() to sharedptr_registry.

Fixes: #6719
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
11 years ago - osdmaptool: don't put progress on stdout [824/head]
Dan Mick [Wed, 6 Nov 2013 00:11:10 +0000 (16:11 -0800)]
osdmaptool: don't put progress on stdout

If one requests JSON output, the progress message pollutes the output,
so send it to stderr instead.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
11 years ago - FileStore::_collection_move_rename: handle missing dst dir on replay
Samuel Just [Mon, 4 Nov 2013 19:25:31 +0000 (11:25 -0800)]
FileStore::_collection_move_rename: handle missing dst dir on replay

In case of a replay, a missing destination directory indicates that
the destination object and directory have been removed by a later
transaction.  Thus, we need to remove the src object and return
0.

Fixes: #6714
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - Merge pull request #814 from ceph/wip-da-fix-galois-warning
Loic Dachary [Tue, 5 Nov 2013 00:36:32 +0000 (16:36 -0800)]
Merge pull request #814 from ceph/wip-da-fix-galois-warning

galois.c: fix compiler warning

Reviewed-by: Loic Dachary <loic@dachary.org>
11 years ago - galois.c: fix compiler warning [814/head]
Danny Al-Gaaf [Mon, 4 Nov 2013 22:30:47 +0000 (23:30 +0100)]
galois.c: fix compiler warning

galois_create_split_w8_tables() takes no parameter, remove '8' passed
to the function in one case.

osd/ErasureCodePluginJerasure/galois.c: In function 'galois_w32_region_multiply':
osd/ErasureCodePluginJerasure/galois.c:696:5: warning: call to function 'galois_create_split_w8_tables' without a real prototype [-Wunprototyped-calls]
In file included from osd/ErasureCodePluginJerasure/galois.c:53:0:
osd/ErasureCodePluginJerasure/galois.h:71:12: note: 'galois_create_split_w8_tables' was declared here

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
11 years ago - OSD: allow project_pg_history to handle a missing map
Samuel Just [Mon, 4 Nov 2013 05:02:36 +0000 (21:02 -0800)]
OSD: allow project_pg_history to handle a missing map

If we get a peering message for an old map we don't have, we
can throw it out: the sending OSD will learn about the newer
maps and update itself accordingly, and we don't have the
information to know if the message is valid. This situation
can only happen if the sender was down for a long enough time
to create a map gap and its PGs have not yet advanced from
their boot-up maps to the current ones, so we can rely on it
to catch up.
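A sketch of the check, assuming the OSD needs every map between the message's epoch and its current epoch, and gives up (dropping the message) as soon as one of them has already been trimmed. `MapStore` and `can_project_history` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using epoch_t = uint32_t;

struct MapStore {
  std::map<epoch_t, std::string> maps;  // epoch -> encoded OSDMap (placeholder)
  bool have(epoch_t e) const { return maps.count(e) != 0; }
};

// Returns false if the peering message should be discarded.
bool can_project_history(const MapStore& store, epoch_t msg_epoch, epoch_t current) {
  for (epoch_t e = msg_epoch; e <= current; ++e)
    if (!store.have(e))
      return false;  // map gap: we can't tell whether the message is still valid
  return true;
}
```

Returning false instead of asserting leaves it to the sender to advance to newer maps and retry.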

Fixes: #6712
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - OSD: don't clear peering_wait_for_split in advance_map()
Samuel Just [Sun, 3 Nov 2013 19:06:10 +0000 (11:06 -0800)]
OSD: don't clear peering_wait_for_split in advance_map()

I really don't know why I added this...  Ops can be discarded from the
waiting_for_pg queue if we aren't primary simply because there must have
been an exchange of peering events before subops will be sent within a
particular epoch.  Thus, any events in the waiting_for_pg queue must be
client ops which should only be seen by the primary.  Peering events, on
the other hand, should only be discarded if we are in a new interval,
and that check might as well be performed in the peering wq.

Fixes: #6681
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - ReplicatedPG::recover_backfill: adjust last_backfill to HEAD if snapdir
Samuel Just [Sat, 2 Nov 2013 20:54:51 +0000 (13:54 -0700)]
ReplicatedPG::recover_backfill: adjust last_backfill to HEAD if snapdir

Otherwise, if last_backfill_started is a snapdir, we will fail to send a
transaction for a client IO creating the head object and removing the
snapdir object.  The result will be that head will eventually be
backfilled, but the snapdir object will erroneously not be removed.

Fixes: #6685
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - Merge pull request #809 from ceph/wip-pgmap
Gregory Farnum [Sun, 3 Nov 2013 17:25:28 +0000 (09:25 -0800)]
Merge pull request #809 from ceph/wip-pgmap

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - osd/erasurecode: correct one variable name in jerasure_matrix_to_bitmatrix()
Xing Lin [Sun, 3 Nov 2013 01:24:22 +0000 (19:24 -0600)]
osd/erasurecode: correct one variable name in jerasure_matrix_to_bitmatrix()

When bitmatrix is NULL, this function returns NULL.

Signed-off-by: Xing Lin <xinglin@cs.utah.edu>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - mon/PGMap: use const ref, not pass-by-value [809/head]
Sage Weil [Sat, 2 Nov 2013 06:56:45 +0000 (23:56 -0700)]
mon/PGMap: use const ref, not pass-by-value

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - Merge pull request #806 from jdurgin/wip-xfstests
Sage Weil [Sat, 2 Nov 2013 06:32:26 +0000 (23:32 -0700)]
Merge pull request #806 from jdurgin/wip-xfstests

Don't run racy xfstest 008

11 years ago - Merge pull request #807 from jdurgin/wip-rbd-map-rw
Sage Weil [Sat, 2 Nov 2013 06:31:44 +0000 (23:31 -0700)]
Merge pull request #807 from jdurgin/wip-rbd-map-rw

rbd: omit 'rw' option during map

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - Merge pull request #804 from jdurgin/wip-rgw-replica-log-next
Yehuda Sadeh [Sat, 2 Nov 2013 04:00:24 +0000 (21:00 -0700)]
Merge pull request #804 from jdurgin/wip-rgw-replica-log-next

rgw: don't turn 404 into 400 for the replicalog api

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago - rbd: omit 'rw' option during map [807/head]
Josh Durgin [Sat, 2 Nov 2013 02:02:29 +0000 (19:02 -0700)]
rbd: omit 'rw' option during map

The ro and rw options were added in linux 3.7. To be compatible with
older kernels, don't specify rw. The default will probably always be
rw, so this should not present any problems in the future.

Reported-by: nicolasc <nicolas.canceill@surfsara.nl>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years ago - qa: don't run racy xfstest 008 [806/head]
Josh Durgin [Sat, 2 Nov 2013 01:41:02 +0000 (18:41 -0700)]
qa: don't run racy xfstest 008

This test attempts to generate a random number of holes within a
particular range, but may fail because hole placement is random.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years ago - Merge pull request #802 from ceph/wip-6673b
David Zafman [Fri, 1 Nov 2013 23:36:19 +0000 (16:36 -0700)]
Merge pull request #802 from ceph/wip-6673b
(manually merged after next branch rebuilt)

OSDMonitor: be a little nicer about letting users do pg splitting

Reviewed-by: David Zafman <david.zafman@inktank.com>
11 years ago - OSDMonitor: be a little nicer about letting users do pg splitting
Greg Farnum [Fri, 1 Nov 2013 22:45:02 +0000 (15:45 -0700)]
OSDMonitor: be a little nicer about letting users do pg splitting

We were previously blocking pg splits whenever pg creations were in
progress, but we only really need to avoid splitting any pgs which are
currently being created. Let the user set a different pg_num if there
are no creating PGs on the pool in question.
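A sketch of the relaxed check, assuming PGs being created are tracked as (pool, placement seed) pairs in a sorted set. `pg_id` and `may_change_pg_num` are hypothetical names, not OSDMonitor code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

using pg_id = std::pair<int64_t, uint32_t>;  // (pool, placement seed)

// Old rule: refuse any pg_num change while *any* PG is being created.
// New rule: refuse only while a PG in *this* pool is being created.
bool may_change_pg_num(const std::set<pg_id>& creating, int64_t pool) {
  auto it = creating.lower_bound(pg_id(pool, 0));  // first entry for this pool, if any
  return it == creating.end() || it->first != pool;
}
```

Because the set is ordered by pool first, a single `lower_bound` answers whether the pool has any creating PGs.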

Fixes: #6673, take two
Signed-off-by: Greg Farnum <greg@inktank.com>
11 years ago - rgw: don't turn 404 into 400 for the replicalog api [804/head]
Josh Durgin [Fri, 1 Nov 2013 23:12:52 +0000 (16:12 -0700)]
rgw: don't turn 404 into 400 for the replicalog api

404 is not actually a problem to clients like radosgw-agent, but 400
implies something about the request was incorrect.

Backport: dumpling
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years ago - Wrap hex_to_num table into class HexTable
Ray Lv [Wed, 30 Oct 2013 03:40:54 +0000 (11:40 +0800)]
Wrap hex_to_num table into class HexTable

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago - [rgw] Set initialized to true after populating table in hex_to_num()
Ray Lv [Tue, 29 Oct 2013 11:34:51 +0000 (19:34 +0800)]
[rgw] Set initialized to true after populating table in hex_to_num()

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago - sharedptr_registry.hpp: removed ptrs need to not blast contents
Samuel Just [Thu, 31 Oct 2013 20:19:32 +0000 (13:19 -0700)]
sharedptr_registry.hpp: removed ptrs need to not blast contents

See the included unit test update.  Consider:
1) x = lookup_or_create(1, 1)
2) remove(1)
3) y = lookup_or_create(1, 2)
4) x.reset()
5) z = lookup(1)

The bug is that z will be null since x.reset() caused the
cleanup callback to remove y's key value from contents.

To fix this, contents also records the pointer value for
the weak_ptr.  The removal callback only removes the
key from contents if it matches the ptr in contents.

This works because the pointer passed to the removal callback
is still unique at that point: the object it refers to has
not yet been deleted.
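A reduced model of the fix, assuming a registry that maps keys to `weak_ptr`s and drops the key from the shared_ptr's deleter. As described above, the map also stores the raw pointer, and the deleter erases the entry only if it still refers to the object being destroyed. `Registry` is a simplified illustration, not the real `sharedptr_registry.hpp`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Simplified registry: key -> weak reference plus the raw pointer it wraps.
template <typename K, typename V>
class Registry {
  std::map<K, std::pair<std::weak_ptr<V>, V*>> contents;

 public:
  std::shared_ptr<V> lookup(const K& k) {
    auto it = contents.find(k);
    if (it == contents.end()) return std::shared_ptr<V>();
    return it->second.first.lock();  // empty if the object is being destroyed
  }

  std::shared_ptr<V> lookup_or_create(const K& k, const V& v) {
    if (auto existing = lookup(k)) return existing;
    V* raw = new V(v);
    std::shared_ptr<V> sp(raw, [this, k](V* ptr) {
      auto it = contents.find(k);
      // The fix: erase the key only if the entry still refers to this object.
      if (it != contents.end() && it->second.second == ptr)
        contents.erase(it);
      delete ptr;
    });
    contents[k] = std::make_pair(std::weak_ptr<V>(sp), raw);
    return sp;
  }

  // Forget the key; outstanding refs keep the old object alive.
  void remove(const K& k) { contents.erase(k); }
};
```

Replaying steps 1) to 5) from the message against this model, `z` finds `y`; without the pointer comparison, `x.reset()` would erase `y`'s entry and `z` would be null.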

This allowed a pg removal -> pg recreation -> pg removal
sequence to cause the second pg removal entry to be
erroneously cleared by the first pg removal's destructor
as it finally made its way through the removal queue.

Fixes: #5951
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - prio-q: initialize cur iterator
Noah Watkins [Wed, 30 Oct 2013 23:34:29 +0000 (16:34 -0700)]
prio-q: initialize cur iterator

For new SubQueues `cur` is not initialized, so front/pop_front will freak
out. I honestly have no idea how this hasn't been seen, but it was
being triggered frequently on OSX.
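A minimal sketch of the bug class and its fix, assuming (as the message suggests) that `cur` is a round-robin iterator over a map of per-client queues. Initializing `cur` to `end()` gives it a well-defined value that stays comparable as the map grows, and the first enqueue points it at a real entry. `SubQueue` here is illustrative, not Ceph's `PrioritizedQueue`.

```cpp
#include <cassert>
#include <list>
#include <map>

template <typename T, typename K>
class SubQueue {
  std::map<K, std::list<T>> q;                       // per-client FIFOs
  typename std::map<K, std::list<T>>::iterator cur;  // round-robin position

 public:
  SubQueue() : cur(q.end()) {}  // never leave `cur` singular

  void enqueue(const K& cl, const T& item) {
    q[cl].push_back(item);
    if (cur == q.end()) cur = q.begin();  // first item: start the rotation
  }
  bool empty() const { return q.empty(); }
  T& front() { return cur->second.front(); }
  void pop_front() {
    cur->second.pop_front();
    if (cur->second.empty())
      cur = q.erase(cur);  // drop the exhausted client
    else
      ++cur;               // rotate to the next client
    if (cur == q.end()) cur = q.begin();
  }
};
```

Map iterators, including `end()`, stay valid across insertions, which is what makes the `cur == q.end()` test in `enqueue` safe.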

Fixes: #6686
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years ago - PGLog: remove obsolete assert in merge_log
Samuel Just [Wed, 30 Oct 2013 23:54:39 +0000 (16:54 -0700)]
PGLog: remove obsolete assert in merge_log

This assert assumes that if olog.head != log.head, olog contains
a log entry at log.head, which may not be true since pg splitting
might have left the log with arbitrary holes.

Related: 0c2769d3321bff6e85ec57c85a08ee0b8e751bcb
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - test/osd/RadosModel.h: select and reserve roll_back_to atomically
Samuel Just [Wed, 30 Oct 2013 23:12:19 +0000 (16:12 -0700)]
test/osd/RadosModel.h: select and reserve roll_back_to atomically

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - test/rados/list.cc: we might get some objects more than once
Samuel Just [Tue, 29 Oct 2013 21:53:53 +0000 (14:53 -0700)]
test/rados/list.cc: we might get some objects more than once

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - os/chain_listxattr: fix leak fix
Sage Weil [Wed, 30 Oct 2013 20:20:46 +0000 (13:20 -0700)]
os/chain_listxattr: fix leak fix

e22347df3854a5c5ebc6631c62d70447d67d722d added a bad goto; just free
explicitly instead.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Noah Watkins <noahwatkins@gmail.com>
11 years ago - Merge branch 'next' of jenkins:ceph/ceph into next
Gary Lowell [Wed, 30 Oct 2013 18:34:42 +0000 (18:34 +0000)]
Merge branch 'next' of jenkins:ceph/ceph into next

11 years ago - ceph: Release resource before return in BackedObject::download()
Li Wang [Wed, 30 Oct 2013 13:32:34 +0000 (21:32 +0800)]
ceph: Release resource before return in BackedObject::download()

Close file before return

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - ceph: Fix memory leak in chain_listxattr
Li Wang [Wed, 30 Oct 2013 08:39:09 +0000 (16:39 +0800)]
ceph: Fix memory leak in chain_listxattr

Free allocated memory before return

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - Fix memory leak in Backtrace::print()
Li Wang [Wed, 30 Oct 2013 08:18:10 +0000 (16:18 +0800)]
Fix memory leak in Backtrace::print()

Free already allocated memory if short of memory

Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - v0.72-rc1 [v0.72-rc1]
Gary Lowell [Wed, 30 Oct 2013 00:45:10 +0000 (00:45 +0000)]
v0.72-rc1

11 years ago - Revert "ceph-crush-location: new crush location hook"
Sage Weil [Tue, 29 Oct 2013 20:58:14 +0000 (13:58 -0700)]
Revert "ceph-crush-location: new crush location hook"

This reverts commit fc49065d855cfd74cb861d294f3464dd616e82ee.

Merged to wrong branch; my bad!

11 years ago - Revert "upstart, sysvinit: use ceph-crush-location hook"
Sage Weil [Tue, 29 Oct 2013 20:58:10 +0000 (13:58 -0700)]
Revert "upstart, sysvinit: use ceph-crush-location hook"

This reverts commit 111a37efb19cb46a48d669bc9866c29b4015a889.

11 years ago - Merge pull request #779 from ceph/wip-crush-hook
Loic Dachary [Tue, 29 Oct 2013 19:24:05 +0000 (12:24 -0700)]
Merge pull request #779 from ceph/wip-crush-hook

upstart,sysvinit: allow 'osd crush location hook' script to determine osd crush position

Reviewed-by: Loic Dachary <loic@dachary.org>
11 years ago - upstart, sysvinit: use ceph-crush-location hook [779/head]
Sage Weil [Tue, 29 Oct 2013 18:08:58 +0000 (11:08 -0700)]
upstart, sysvinit: use ceph-crush-location hook

Instead of hard-coding a check in ceph.conf and some reasonable
defaults, defer this work to ceph-crush-location, and allow users to
specify their own hook with alternative logic.

This can be helpful in a number of cases, like:

 - rack (or other) information included in hostname and easily parsed
   out by a hook
 - multiple types of devices in each host, resulting in 'parallel'
   crush trees (e.g., one for hdd, one for ssd)
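As an illustration of the first case, a hook could derive the rack from the hostname and emit the key=value pairs used for the CRUSH update. The `rackN-nodeM` naming convention, the `root=default` value, and the function name are assumptions made for this sketch; a hook can be any executable, and C++ is used here only to keep one language across examples.

```cpp
#include <cassert>
#include <string>

// Builds a location such as "root=default rack=rack3 host=rack3-node12"
// from a hostname that follows a hypothetical "<rack>-<node>" convention.
std::string crush_location_for(const std::string& hostname) {
  std::string loc = "root=default";
  auto dash = hostname.find('-');
  if (dash != std::string::npos && hostname.compare(0, 4, "rack") == 0)
    loc += " rack=" + hostname.substr(0, dash);  // rack parsed from the hostname
  loc += " host=" + hostname;
  return loc;
}
```

Hosts that do not follow the convention fall back to a plain `root=default host=<name>` location.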

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - ceph-crush-location: new crush location hook
Sage Weil [Tue, 29 Oct 2013 18:03:04 +0000 (11:03 -0700)]
ceph-crush-location: new crush location hook

This generalizes the bit of code that builds a key=value pair list to
update an entity's CRUSH location.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - Merge pull request #786 from ceph/wip-6673
Sage Weil [Tue, 29 Oct 2013 17:16:52 +0000 (10:16 -0700)]
Merge pull request #786 from ceph/wip-6673

mon/PGMonitor: always send pg creations after mapping

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years ago - mon/PGMonitor: always send pg creations after mapping [786/head]
Sage Weil [Tue, 29 Oct 2013 17:10:21 +0000 (10:10 -0700)]
mon/PGMonitor: always send pg creations after mapping

At some point in the dumpling cycle I separated the map stage from the
send stage.  We can send the creates any time we have a non-zero osdmap
epoch, and are in good shape as long as we do the map step after the
osdmap is loaded (hence the post_paxos_update).

Some background:

We originally introduced the map-but-don't send in a2fe0137, at which
point all was well because we only called it on ceph-mon startup.

Later, this turned into post_paxos_update in e635c478, at which point
it was now called by a running monitor, but we didn't add in the
send_pg_creates().  This is where this bug stems from.

This particular path is responsible for the stalled test referenced in
bug #6673.

Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - mon/OSDMonitor: fix signedness warning on poolid
Sage Weil [Tue, 29 Oct 2013 15:59:06 +0000 (08:59 -0700)]
mon/OSDMonitor: fix signedness warning on poolid

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - ReplicatedPG::recover_backfill: update last_backfill to max() when backfill is complete
Samuel Just [Tue, 29 Oct 2013 06:05:30 +0000 (23:05 -0700)]
ReplicatedPG::recover_backfill: update last_backfill to max() when backfill is complete

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago - Merge pull request #780 from ceph/wip-6585
athanatos [Tue, 29 Oct 2013 04:11:27 +0000 (21:11 -0700)]
Merge pull request #780 from ceph/wip-6585

Wip 6585

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years ago - osd/ReplicatedPG: use MIN for backfill_pos [780/head]
Sage Weil [Mon, 28 Oct 2013 23:39:09 +0000 (16:39 -0700)]
osd/ReplicatedPG: use MIN for backfill_pos

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - Merge pull request #772 from ceph/wip-5612
Loic Dachary [Mon, 28 Oct 2013 23:13:34 +0000 (16:13 -0700)]
Merge pull request #772 from ceph/wip-5612

init-ceph, upstart: make crush update on osd start time out

Reviewed-by: Loic Dachary <loic@dachary.org>
11 years ago - ReplicatedPG: recover_backfill: don't prematurely adjust last_backfill
Samuel Just [Mon, 28 Oct 2013 23:09:59 +0000 (16:09 -0700)]
ReplicatedPG: recover_backfill: don't prematurely adjust last_backfill

We can't adjust last_backfill to object x until x has been fully
backfilled.  pending_backfill_updates contains all those backfills
started, but which have not yet been reflected in pinfo.last_update.
backfills_in_flight contains those backfills which have not yet
completed.  Thus, we can adjust last_backfill to the largest entry
in pending_backfill_updates not in backfills_in_flight.
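We read the rule as: walk pending_backfill_updates in sort order and advance last_backfill until reaching an object whose backfill is still in flight, i.e. the largest entry that sorts before every in-flight backfill. A sketch under that reading, with string names standing in for hobject_t and an `int` placeholder for the per-object stats; `advance_last_backfill` is a hypothetical helper, not a ReplicatedPG function.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// pending: backfills started but not yet reflected in last_backfill (sorted)
// in_flight: backfills that have not completed yet
// Returns the new last_backfill, or `current` if nothing can advance.
std::string advance_last_backfill(std::map<std::string, int>& pending,
                                  const std::set<std::string>& in_flight,
                                  const std::string& current) {
  std::string result = current;
  for (auto it = pending.begin(); it != pending.end();) {
    if (in_flight.count(it->first)) break;  // not complete: stop here
    result = it->first;                     // everything up to here is done
    it = pending.erase(it);                 // its stats are now accounted for
  }
  return result;
}
```

With pending `a, b, c, d` and only `c` in flight, last_backfill advances to `b`, and `c` and `d` stay pending even though `d` has completed.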

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago - ReplicatedPG: add empty stat when we remove an object in recover_backfill
Samuel Just [Mon, 28 Oct 2013 23:03:25 +0000 (16:03 -0700)]
ReplicatedPG: add empty stat when we remove an object in recover_backfill

Subsequent updates to that object need to have their stats added
to the backfill info stats atomically with the last_backfill
update.

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago - ReplicatedPG: replace backfill_pos with last_backfill_started
Samuel Just [Mon, 28 Oct 2013 22:53:24 +0000 (15:53 -0700)]
ReplicatedPG: replace backfill_pos with last_backfill_started

last_backfill_started reflects what pinfo.last_backfill will be
once all currently outstanding backfills complete.  backfill_pos
was tricky since we couldn't correctly initialize it without
doing the first backfill scan pair.

In recover_backfill, we rescan from last_backfill_started rather
than from backfill_pos.  This ensures that we capture all clones
created between last_backfill_started and what previously had been
backfill_pos without special handling in make_writeable.  The main
downside is that we will tend to "rescan" last_backfill_started.

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago - PG::BackfillInfo: introduce trim_to
Samuel Just [Mon, 28 Oct 2013 22:49:58 +0000 (15:49 -0700)]
PG::BackfillInfo: introduce trim_to

We'll use this to trim off last_backfill_started since it'll
often be included in rescans.
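A sketch of trim_to, assuming it drops every object up to and including the given one (consistent with last_backfill being inclusive). `BackfillInterval` here is a simplified stand-in that holds only an ordered map of object names to a version placeholder.

```cpp
#include <cassert>
#include <map>
#include <string>

struct BackfillInterval {               // simplified stand-in
  std::map<std::string, int> objects;   // object -> version placeholder

  // Drop every object <= soid, e.g. last_backfill_started after a rescan.
  void trim_to(const std::string& soid) {
    objects.erase(objects.begin(), objects.upper_bound(soid));
  }
};
```

Using `upper_bound` makes the bound inclusive in a single range erase.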

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago - PG::BackfillInterval: use trim() in pop_front()
Samuel Just [Mon, 28 Oct 2013 22:49:23 +0000 (15:49 -0700)]
PG::BackfillInterval: use trim() in pop_front()

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago - ReplicatedPG::prepare_transaction: info.last_backfill is inclusive
Samuel Just [Mon, 28 Oct 2013 22:22:37 +0000 (15:22 -0700)]
ReplicatedPG::prepare_transaction: info.last_backfill is inclusive

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago - upstart: fail osd start if crush update fails [772/head]
Sage Weil [Mon, 28 Oct 2013 22:56:36 +0000 (15:56 -0700)]
upstart: fail osd start if crush update fails

If the update for the CRUSH position fails for some reason, do not
start the OSD.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - init-ceph: make crush update on osd start time out
Sage Weil [Mon, 28 Oct 2013 22:56:15 +0000 (15:56 -0700)]
init-ceph: make crush update on osd start time out

If the monitor is not currently available, this crush update would block
forever, preventing the OSD and (potentially) the rest of the system
from starting up.  Instead, make it time out after 10 seconds and then
abort startup.  This prevents startup of an OSD if we failed to update
the CRUSH position for some reason.

In fact, do not start up the OSD if the CRUSH update fails for any
reason--not just a timeout!

Works-around: #5612
Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago - Merge pull request #778 from ceph/wip-6621
Sage Weil [Mon, 28 Oct 2013 21:28:25 +0000 (14:28 -0700)]
Merge pull request #778 from ceph/wip-6621

radosgw-admin: accept negative values for quota params

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago - radosgw-admin: accept negative values for quota params [778/head]
Yehuda Sadeh [Mon, 28 Oct 2013 20:36:45 +0000 (13:36 -0700)]
radosgw-admin: accept negative values for quota params

and document that in the usage output.

Fixes: #6621
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago - Merge pull request #760 from ceph/wip-6585
athanatos [Mon, 28 Oct 2013 20:50:34 +0000 (13:50 -0700)]
Merge pull request #760 from ceph/wip-6585

Wip 6585

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago - ReplicatedBackend: don't hold ObjectContexts in pull completion callback [760/head]
Samuel Just [Mon, 28 Oct 2013 18:02:34 +0000 (11:02 -0700)]
ReplicatedBackend: don't hold ObjectContexts in pull completion callback

We rely on flushing the sequencer to ensure that all Contexts which hold
ObjectContextRefs have been run or deleted.
C_ReplicatedBackend_OnPullComplete, however, gets queued in a second
work queue in order to avoid performing expensive push-related reads
in the FileStore finisher, so flushing the sequencer does not cover it.

Rather than keep the object contexts around, we put off removing the
object from the pulling map until the callback fires, and read the
object context out of the pulling map at that point.  This way the
ObjectContextRef will be cleaned up along with the rest of the
pulling map in on_change.

Signed-off-by: Samuel Just <sam.just@inktank.com>
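Here is a minimal model of the approach. The types below are hypothetical simplifications, not Ceph's ReplicatedBackend. The completion callback carries only the object name. The ObjectContextRef lives solely in the pulling map, so on_change releases it even if the callback is still waiting in the second work queue.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical, heavily simplified model of the idea (not Ceph's types).
struct ObjectContext {
  std::string oid;
};
using ObjectContextRef = std::shared_ptr<ObjectContext>;

struct PullInfo {
  ObjectContextRef obc;  // the ONLY long-lived reference to the context
};

struct Backend {
  std::map<std::string, PullInfo> pulling;

  // Completion callback: holds just the object name, not an
  // ObjectContextRef, so it can sit in another queue without pinning it.
  struct OnPullComplete {
    Backend* backend;
    std::string oid;
    // Returns true if the pull was still tracked when the callback fired.
    bool finish() {
      auto it = backend->pulling.find(oid);
      if (it == backend->pulling.end())
        return false;  // on_change already dropped it; nothing to do
      ObjectContextRef obc = it->second.obc;  // read the context out here
      backend->pulling.erase(it);  // only now stop tracking the pull
      // ... push/recovery work using obc would go here ...
      return obc != nullptr;
    }
  };

  // Interval change: clearing the map releases every ObjectContextRef,
  // including those for pulls whose callbacks have not run yet.
  void on_change() { pulling.clear(); }
};
```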
11 years ago ReplicatedPG: put repops even in TrimObjects
Samuel Just [Sun, 27 Oct 2013 03:21:25 +0000 (20:21 -0700)]
ReplicatedPG: put repops even in TrimObjects

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago ReplicatedPG: improved on_flushed error output
Samuel Just [Sun, 27 Oct 2013 01:24:41 +0000 (18:24 -0700)]
ReplicatedPG: improved on_flushed error output

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago PG: call on_flushed on FlushEvt
Samuel Just [Sat, 26 Oct 2013 23:46:22 +0000 (16:46 -0700)]
PG: call on_flushed on FlushEvt

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago PG,ReplicatedPG: remove the waiting_for_backfill_peer mechanism
Samuel Just [Sat, 26 Oct 2013 00:58:31 +0000 (17:58 -0700)]
PG,ReplicatedPG: remove the waiting_for_backfill_peer mechanism

See the previous patch (ReplicatedPG: have make_writeable adjust
backfill_pos), which replaces this mechanism.

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago ReplicatedPG: have make_writeable adjust backfill_pos
Samuel Just [Sat, 26 Oct 2013 00:58:10 +0000 (17:58 -0700)]
ReplicatedPG: have make_writeable adjust backfill_pos

If we are writing to backfill_pos and create a clone, we end
up failing to send the transaction creating the clone to the
backfill peer.  This is fine as long as we end up backfilling
the clone.  To that end, we simply add the clone to
backfill_info and adjust backfill_pos accordingly.  This is less
brittle than the waiting_for_backfill_pos mechanism since it
works even if we wait between that check and issuing the repop,
which can happen for copy_from.

Signed-off-by: Samuel Just <sam.just@inktank.com>
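A minimal sketch of the idea follows. As a simplification, it assumes that backfill pushes objects in key order starting at `pos`, and that a clone of the object at `pos` sorts before it. `BackfillState` and `note_new_clone` are hypothetical names. The real backfill_info tracks more than a set of keys.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified backfill state: object keys still to be pushed
// to the backfill peer, processed in increasing key order from `pos`.
struct BackfillState {
  std::set<std::string> objects;
  std::string pos;  // next object to backfill
};

// Called when a write creates `clone`. The transaction creating the clone
// was not sent to the backfill peer, so make sure backfill will push it:
// record it, and move `pos` back if the clone sorts before it.
void note_new_clone(BackfillState& bf, const std::string& clone) {
  bf.objects.insert(clone);
  if (clone < bf.pos)
    bf.pos = clone;
}
```

Because nothing has to wait, the fix still holds when time passes between the check and issuing the repop, as in the copy_from case.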
11 years ago ReplicatedBackend: fix failed push error output
Samuel Just [Sat, 26 Oct 2013 23:52:32 +0000 (16:52 -0700)]
ReplicatedBackend: fix failed push error output

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago ReplicatedPG,osd_types: move rw tracking from its own map to ObjectContext
Samuel Just [Sat, 26 Oct 2013 23:52:16 +0000 (16:52 -0700)]
ReplicatedPG,osd_types: move rw tracking from its own map to ObjectContext

We also modify recovering to hold a reference to the recovering obc
in order to ensure that our backfill_read_lock doesn't outlive the
obc.

ReplicatedPG::op_applied no longer clears repop->obc since we need
it to live until the op is finally cleaned up.  This is fine since
repop->obc is now an ObjectContextRef and can clean itself up.

Signed-off-by: Samuel Just <sam.just@inktank.com>
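Below is a simplified sketch that keeps the read/write state inside the object context. It also shows why `recovering` holds a reference. All names are hypothetical. Here readers and writers simply exclude each other, which may not match every rule of the real implementation.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical read/write state stored inside the object context.
struct RWState {
  int readers = 0;
  int writers = 0;
  bool get_read() {
    if (writers > 0) return false;  // readers wait for writers
    ++readers;
    return true;
  }
  bool get_write() {
    if (readers > 0) return false;  // writers wait for readers
    ++writers;
    return true;
  }
  void put_read() { assert(readers > 0); --readers; }
  void put_write() { assert(writers > 0); --writers; }
};

struct ObjectContext {
  std::string oid;
  RWState rwstate;  // lock state lives with the context, not in a side map
};
using ObjectContextRef = std::shared_ptr<ObjectContext>;

// Recovery keeps a reference to each recovering object's context, so a
// backfill read lock taken on it cannot outlive the context itself.
struct Recovery {
  std::map<std::string, ObjectContextRef> recovering;

  bool start(const ObjectContextRef& obc) {
    if (!obc->rwstate.get_read()) return false;
    recovering[obc->oid] = obc;  // pins obc while the lock is held
    return true;
  }
  void finish(const std::string& oid) {
    auto it = recovering.find(oid);
    if (it == recovering.end()) return;
    it->second->rwstate.put_read();
    recovering.erase(it);  // may drop the last reference
  }
};
```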
11 years ago osd_types,OpRequest: move osd_req_id into OpRequest
Samuel Just [Sat, 26 Oct 2013 00:36:40 +0000 (17:36 -0700)]
osd_types,OpRequest: move osd_req_id into OpRequest

This way I can have OpRequest included from osd_types.h.

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago OpRequest: move method implementations into cc
Samuel Just [Sat, 26 Oct 2013 00:35:49 +0000 (17:35 -0700)]
OpRequest: move method implementations into cc

I need to remove the osd_types.h include.

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago ReplicatedPG: reset new_obs and new_snapset in execute_ctx
Samuel Just [Fri, 25 Oct 2013 01:52:59 +0000 (18:52 -0700)]
ReplicatedPG: reset new_obs and new_snapset in execute_ctx

This way, if execute_ctx is rerun on the same OpContext, we
won't erroneously reuse a stale snapset/object_info.

Signed-off-by: Samuel Just <sam.just@inktank.com>
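Here is a minimal sketch of the reset with placeholder types. `OpContext` is a hypothetical stand-in with four string fields, and string concatenation stands in for applying the op. The important lines rebuild new_obs and new_snapset from the current state on every run of execute_ctx.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified OpContext: `obs`/`snapset` are the current
// state; `new_obs`/`new_snapset` are what this op will write.
struct OpContext {
  std::string obs, snapset;          // current state
  std::string new_obs, new_snapset;  // state being built by this op
};

// Each (re)run starts by resetting the "new" state from the current state,
// so a second run on the same OpContext cannot pick up values left over
// from an earlier attempt.
void execute_ctx(OpContext& ctx, const std::string& mutation) {
  ctx.new_obs = ctx.obs;
  ctx.new_snapset = ctx.snapset;
  ctx.new_obs += mutation;  // stand-in for applying the op
}
```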
11 years ago fix the bug if we set pgp_num=-1 using "ceph osd pool set data|metadata|rbd -1"
huangjun [Mon, 28 Oct 2013 04:12:26 +0000 (12:12 +0800)]
fix the bug if we set pgp_num=-1 using "ceph osd pool set data|metadata|rbd -1"

Setting pgp_num to -1 this way would set it to a huge number.

Signed-off-by: huangjun <hjwsm1989@gmail.com>
(cherry picked from commit bf198e673fd876e34006d3c83f0479454e6295aa)
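The "huge number" is ordinary signed-to-unsigned conversion. If -1 ends up stored in an unsigned 32-bit field, it wraps to 4294967295. Below is a minimal sketch of the failure and of one way to guard against it. `naive_pgp_num` and `checked_pgp_num` are hypothetical helpers, not the monitor's code. Other checks the real command may perform are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// What goes wrong: converting -1 to an unsigned 32-bit value wraps modulo
// 2^32, giving 4294967295 instead of an error.
uint32_t naive_pgp_num(int64_t requested) {
  return static_cast<uint32_t>(requested);
}

// Hypothetical guarded version: refuse values that cannot be a valid
// placement group count before converting.
std::optional<uint32_t> checked_pgp_num(int64_t requested) {
  if (requested <= 0 || requested > UINT32_MAX)
    return std::nullopt;
  return static_cast<uint32_t>(requested);
}
```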

11 years ago ReplicatedPG: take and drop read locks when doing backfill
Greg Farnum [Wed, 23 Oct 2013 18:28:45 +0000 (11:28 -0700)]
ReplicatedPG: take and drop read locks when doing backfill

All our interfaces are in place, so now we can actually take and
drop the locks.
1) Take locks in ReplicatedPG::recover_backfill. This is the entry
into the backfill code path, and covers all objects which are
added to backfills_in_flight (via prep_backfill_object_push()). If we
can't get the lock right away, we stop the backfill movement there
until we can do so.
2) Drop the locks in ReplicatedPG::on_peer_recover(), called when the
push is completed.
2b) Further drop the locks on all backfills_in_flight objects in
_clear_recovery_state(), for when we cancel peering.

Signed-off-by: Greg Farnum <greg@inktank.com>
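The three steps above can be modeled in a few lines. This is a hypothetical simplification. The method names mirror the commit message, but the lock bookkeeping (a reader count and a writer flag per object) is illustrative rather than Ceph's.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-object lock state.
struct Obj {
  int readers = 0;
  bool writer = false;
};

struct PG {
  std::map<std::string, Obj> objects;
  std::set<std::string> backfills_in_flight;

  bool get_backfill_read(const std::string& oid) {
    Obj& o = objects[oid];
    if (o.writer) return false;
    ++o.readers;
    return true;
  }
  void drop_backfill_read(const std::string& oid) {
    Obj& o = objects[oid];
    assert(o.readers > 0);
    --o.readers;
  }

  // 1) Entry point: lock objects in order; stop at the first one whose
  //    read lock is not immediately available.
  void recover_backfill(const std::vector<std::string>& to_push) {
    for (const auto& oid : to_push) {
      if (backfills_in_flight.count(oid)) continue;  // already locked
      if (!get_backfill_read(oid))
        break;  // resume later, once the writer is done
      backfills_in_flight.insert(oid);
    }
  }
  // 2) Push completed: release that object's lock.
  void on_peer_recover(const std::string& oid) {
    if (backfills_in_flight.erase(oid))
      drop_backfill_read(oid);
  }
  // 2b) Peering cancelled: release every lock still held.
  void clear_recovery_state() {
    for (const auto& oid : backfills_in_flight)
      drop_backfill_read(oid);
    backfills_in_flight.clear();
  }
};
```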