git.apps.os.sepia.ceph.com Git - ceph.git/log
11 years ago | rgw: remove memory allocation
Yehuda Sadeh [Fri, 21 Feb 2014 00:54:06 +0000 (16:54 -0800)]
rgw: remove memory allocation

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago | Merge pull request #1487 from ceph/wip-7738
Samuel Just [Mon, 17 Mar 2014 23:03:49 +0000 (16:03 -0700)]
Merge pull request #1487 from ceph/wip-7738

os/FileJournal: return errors on make_writeable() if reopen fails

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years ago | os/FileJournal: return errors on make_writeable() if reopen fails | 1487/head
Sage Weil [Mon, 17 Mar 2014 22:37:44 +0000 (15:37 -0700)]
os/FileJournal: return errors on make_writeable() if reopen fails

This is why #7738 is resulting in a crash instead of an error.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1482 from ceph/wip-7611
Sage Weil [Mon, 17 Mar 2014 15:17:53 +0000 (08:17 -0700)]
Merge pull request #1482 from ceph/wip-7611

ceph.in: do not allow using 'tell' with interactive mode

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | ceph.in: do not allow using 'tell' with interactive mode | 1482/head
Joao Eduardo Luis [Mon, 17 Mar 2014 14:37:09 +0000 (14:37 +0000)]
ceph.in: do not allow using 'tell' with interactive mode

This avoids a lot of hassle in working out which target each command
should be sent to in interactive mode, especially when multiple targets
are specified.

Instead, 'tell' commands should be used outside interactive mode.

Backport: dumpling,emperor

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years ago | Merge pull request #1474 from ceph/wip-7740
Sage Weil [Sun, 16 Mar 2014 19:12:29 +0000 (12:12 -0700)]
Merge pull request #1474 from ceph/wip-7740

OSD::handle_pg_query: on dne pg, send lb=hobject_t() if deleting

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1473 from ceph/wip-7719
Sage Weil [Sun, 16 Mar 2014 16:37:05 +0000 (09:37 -0700)]
Merge pull request #1473 from ceph/wip-7719

PG: clear want_pg_temp in clear_primary_state only if primary

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1471 from ceph/wip-7684
Sage Weil [Sun, 16 Mar 2014 04:47:10 +0000 (21:47 -0700)]
Merge pull request #1471 from ceph/wip-7684

client: force getattr when inline data is missing

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | client: force getattr when inline data is missing | 1471/head
Yan, Zheng [Sun, 16 Mar 2014 04:38:55 +0000 (12:38 +0800)]
client: force getattr when inline data is missing

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years ago | Merge pull request #1467 from ceph/wip-7684
Sage Weil [Sun, 16 Mar 2014 02:39:36 +0000 (19:39 -0700)]
Merge pull request #1467 from ceph/wip-7684

Wip 7684

http://pulpito.ceph.com/sage-2014-03-15_09:12:44-fs-wip-7684-testing-basic-plana

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | OSD::handle_pg_query: on dne pg, send lb=hobject_t() if deleting | 1474/head
Samuel Just [Sun, 16 Mar 2014 00:58:35 +0000 (17:58 -0700)]
OSD::handle_pg_query: on dne pg, send lb=hobject_t() if deleting

We will set lb=hobject_t() if we resurrect the pg.  In that case,
we need to have sent that to the primary beforehand.  If we
finish the removal before the pg is recreated, we'll just end
up backfilling it, which is ok since the pg doesn't exist anyway.

Fixes: #7740
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | Merge pull request #1469 from ceph/wip-7718
Sage Weil [Sat, 15 Mar 2014 22:30:40 +0000 (15:30 -0700)]
Merge pull request #1469 from ceph/wip-7718

PG::issue_repop: only adjust peer_info last_updates if not temp

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1468 from ceph/wip-7732
Sage Weil [Sat, 15 Mar 2014 22:29:22 +0000 (15:29 -0700)]
Merge pull request #1468 from ceph/wip-7732

PG::build_might_have_unfound: check pg_whoami, not osd whoami

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1470 from ceph/wip-7712
Samuel Just [Sat, 15 Mar 2014 20:16:16 +0000 (13:16 -0700)]
Merge pull request #1470 from ceph/wip-7712

osd/ReplicatedPG: fix enqueue_front race

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years ago | mds: include inline data in lock messages | 1467/head
Yan, Zheng [Sat, 15 Mar 2014 12:38:13 +0000 (20:38 +0800)]
mds: include inline data in lock messages

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years ago | mds: fix corner case of pushing inline data
Yan, Zheng [Sat, 15 Mar 2014 12:37:37 +0000 (20:37 +0800)]
mds: fix corner case of pushing inline data

The following sequence of events can happen:
 - The client releases an inode and queues a cap release message.
 - A 'lookup' reply brings the same inode back, but the reply doesn't
   contain inline data because the MDS hasn't received the cap release
   message yet and thinks the client already has up-to-date inline data.

The fix is to trigger a getattr if the client finds inline_version is
zero.  The getattr mask is set to CEPH_STAT_CAP_INLINE_DATA, so that the
MDS knows the client does not have the inline data.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
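A minimal C++ sketch of the client-side check described above, assuming a simplified inode type. `ClientInode`, `inline_getattr_mask` and `kStatCapInlineData` are hypothetical; the constant only stands in for CEPH_STAT_CAP_INLINE_DATA, whose real value is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mask bit standing in for CEPH_STAT_CAP_INLINE_DATA; the real
// value is defined in Ceph's headers and may differ.
constexpr int kStatCapInlineData = 1 << 10;

// Hypothetical, heavily simplified client-side inode.
struct ClientInode {
  uint64_t inline_version = 0;  // 0: the client does not hold the inline data
};

// Returns the extra getattr mask to request before trusting inline data,
// or 0 if the cached copy can be used as-is.
int inline_getattr_mask(const ClientInode& in) {
  if (in.inline_version == 0) {
    // The inode may have come back via a lookup reply without inline data
    // (the MDS had not yet seen our cap release), so ask for it explicitly.
    return kStatCapInlineData;
  }
  return 0;
}
```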
11 years ago | PG::build_might_have_unfound: check pg_whoami, not osd whoami | 1468/head
Samuel Just [Sat, 15 Mar 2014 01:00:05 +0000 (18:00 -0700)]
PG::build_might_have_unfound: check pg_whoami, not osd whoami

Otherwise, we might skip (2,0) when we are (2,1).

Fixes: #7732
Signed-off-by: Samuel Just <sam.just@inktank.com>
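To illustrate the (2,0) versus (2,1) case, here is a sketch assuming a simplified `PgShard` pair in place of pg_shard_t. `might_have_unfound` is a hypothetical helper, not the real PG::build_might_have_unfound; it only shows why the self-check must compare the full (osd, shard) pair.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Simplified stand-in for pg_shard_t: an OSD id plus the PG shard it holds.
struct PgShard {
  int osd;
  int shard;
  bool operator==(const PgShard& o) const { return osd == o.osd && shard == o.shard; }
  bool operator<(const PgShard& o) const {
    return std::tie(osd, shard) < std::tie(o.osd, o.shard);
  }
};

// Collect peers that might hold unfound objects, skipping only ourselves.
// Comparing just the OSD id would also drop another shard hosted on the
// same OSD, e.g. (2,0) when we are (2,1).
std::set<PgShard> might_have_unfound(const std::set<PgShard>& peers, PgShard pg_whoami) {
  std::set<PgShard> out;
  for (const PgShard& p : peers) {
    if (p == pg_whoami) continue;  // compare (osd, shard), not just osd
    out.insert(p);
  }
  return out;
}
```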
11 years ago | osd/ReplicatedPG: fix enqueue_front race | 1470/head
Sage Weil [Fri, 14 Mar 2014 23:32:48 +0000 (16:32 -0700)]
osd/ReplicatedPG: fix enqueue_front race

When requeuing an item at the front, we need to shuffle the items in
pg_for_processing if there is an entry for this PG there.  If so, we need
to hold the qlock for the entire requeue of the shuffled item back into
the primary queue to prevent items from being reordered.  For example,
suppose the queue contains

 A B C D

 - dequeue1 gets (pg, A), puts A in the processing list
 - dequeue1 tries to lock pg, blocks
 - enqueue_front on X takes qlock, swaps it for A, drops qlock
 - dequeue2 gets (pg, B), puts B in the processing list
 - enqueue_front pushes A back into the original list

 so we have processing: X B  queue: A C D

 - dequeue* get X, then B, then A C D

If we hold qlock for the duration of the enqueue_front, we prevent dequeue2
from sneaking in and shuffling B into the processing list before we have
crammed A back onto the front of the list.

This may have caused #7712.

Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
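A simplified model of the fixed enqueue_front, assuming a queue keyed by PG name and single-character items. `OpQueue`, `stage_next` and `take_staged` are hypothetical names, not Ceph's OSD op queue API. Because the swap and the push back onto the main queue happen under one qlock acquisition, the next dequeuer sees A ahead of B.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Simplified model of a PG-sharded op queue (hypothetical names).
// queue_ holds (pg, item) pairs; pg_for_processing_ holds items a dequeuer
// has pulled for a PG whose lock it has not acquired yet.
class OpQueue {
 public:
  void enqueue(const std::string& pg, char item) {
    std::lock_guard<std::mutex> l(qlock_);
    queue_.push_back({pg, item});
  }

  // Requeue `item` at the front. If items are staged for this PG, put `item`
  // at the front of the staged list and send the last staged item back to
  // the main queue instead. qlock_ is held for the whole operation, so no
  // other dequeuer can stage a later item in between.
  void enqueue_front(const std::string& pg, char item) {
    std::lock_guard<std::mutex> l(qlock_);
    auto it = pg_for_processing_.find(pg);
    if (it != pg_for_processing_.end()) {
      it->second.push_front(item);
      item = it->second.back();
      it->second.pop_back();
    }
    queue_.push_front({pg, item});
  }

  // Dequeue step 1: move the head of the main queue onto its PG's staged list.
  bool stage_next(std::string* pg) {
    std::lock_guard<std::mutex> l(qlock_);
    if (queue_.empty()) return false;
    std::pair<std::string, char> head = queue_.front();
    queue_.pop_front();
    pg_for_processing_[head.first].push_back(head.second);
    *pg = head.first;
    return true;
  }

  // Dequeue step 2 (once the PG lock is held): pop the oldest staged item.
  char take_staged(const std::string& pg) {
    std::lock_guard<std::mutex> l(qlock_);
    auto it = pg_for_processing_.find(pg);
    assert(it != pg_for_processing_.end() && !it->second.empty());
    char item = it->second.front();
    it->second.pop_front();
    if (it->second.empty()) pg_for_processing_.erase(it);
    return item;
  }

 private:
  std::mutex qlock_;
  std::deque<std::pair<std::string, char>> queue_;
  std::map<std::string, std::deque<char>> pg_for_processing_;
};
```

Replaying the sequence from the message (stage A, enqueue_front X, stage again) drains in the order X A B C D rather than X B A C D.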
11 years ago | PG::issue_repop: only adjust peer_info last_updates if not temp | 1469/head
Samuel Just [Fri, 14 Mar 2014 21:48:31 +0000 (14:48 -0700)]
PG::issue_repop: only adjust peer_info last_updates if not temp

Temp object repops have version eversion_t() since they don't
actually send log entries.  Updating the last_updates here
caused the peer info last_updates to be incorrect until the
next non-temp repop.

Fixes: #7718
Signed-off-by: Samuel Just <sam.just@inktank.com>
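A sketch of the guard, assuming a stripped-down `EVersion` in place of eversion_t and a plain map of peer info keyed by OSD id. `note_repop_sent` is hypothetical; it only shows why a temp repop's empty version must not be written into last_update.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simplified stand-in for eversion_t. A default value is what temp-object
// repops carry, since they send no log entries.
struct EVersion {
  uint32_t epoch = 0;
  uint64_t version = 0;
};

struct PeerInfo {
  EVersion last_update;
};

// Record the version a repop brings each peer to, but only for non-temp
// objects; applying a temp repop's EVersion{} would leave last_update wrong
// until the next non-temp repop.
void note_repop_sent(std::map<int, PeerInfo>& peer_info, const EVersion& at_version,
                     bool is_temp_object) {
  if (is_temp_object) return;
  for (auto& entry : peer_info) entry.second.last_update = at_version;
}
```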
11 years ago | RGWListBucketMultiparts: init max_uploads/default_max with 0
Danny Al-Gaaf [Wed, 12 Mar 2014 21:56:44 +0000 (22:56 +0100)]
RGWListBucketMultiparts: init max_uploads/default_max with 0

CID 717377 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
 2. uninit_member: Non-static class member "max_uploads" is not initialized
    in this constructor nor in any functions that it calls.
 4. uninit_member: Non-static class member "default_max" is not initialized
    in this constructor nor in any functions that it calls.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit b23a141d54ffb39958aba9da7f87544674fa0e50)

11 years ago | unittest_mon_pgmap: fix warnings
Sage Weil [Fri, 14 Mar 2014 19:47:59 +0000 (12:47 -0700)]
unittest_mon_pgmap: fix warnings

In file included from test/mon/PGMap.cc:15:0:
../src/gtest/include/gtest/gtest.h: In function ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = int, T2 = unsigned int]’:
../src/gtest/include/gtest/gtest.h:1300:30: instantiated from ‘static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = int, T2 = unsigned int, bool lhs_is_null_literal = false]’
test/mon/PGMap.cc:33:257: instantiated from here
warning: ../src/gtest/include/gtest/gtest.h:1263:3: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | unittest_ceph_argparse: fix warnings
Sage Weil [Fri, 14 Mar 2014 19:46:57 +0000 (12:46 -0700)]
unittest_ceph_argparse: fix warnings

In file included from test/ceph_argparse.cc:17:0:
../src/gtest/include/gtest/gtest.h: In function ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = int, T2 = long unsigned int]’:
../src/gtest/include/gtest/gtest.h:1333:30: instantiated from ‘static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = int, T2 = long unsigned int]’
test/ceph_argparse.cc:344:207: instantiated from here
warning: ../src/gtest/include/gtest/gtest.h:1263:3: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | PG: clear want_pg_temp in clear_primary_state only if primary | 1472/head 1473/head
Samuel Just [Fri, 14 Mar 2014 20:09:30 +0000 (13:09 -0700)]
PG: clear want_pg_temp in clear_primary_state only if primary

Clearing it unconditionally in on_shutdown() can cause a stray
shard to clobber the want_pg_temp value created by the primary
shard on the same OSD.  Instead, only clear it if we are the
primary.

Fixes: #7719
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | Merge pull request #1454 from ceph/wip-7709
Sage Weil [Fri, 14 Mar 2014 18:36:02 +0000 (11:36 -0700)]
Merge pull request #1454 from ceph/wip-7709

osd/ReplicatedPG: release op locks on commit+applied

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago | Merge pull request #1459 from ceph/wip-7696
Sage Weil [Fri, 14 Mar 2014 18:28:50 +0000 (11:28 -0700)]
Merge pull request #1459 from ceph/wip-7696

Wip 7696

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1461 from ceph/wip-7692
Sage Weil [Fri, 14 Mar 2014 18:05:49 +0000 (11:05 -0700)]
Merge pull request #1461 from ceph/wip-7692

mon: no timecheck on monmap 0

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years ago | Merge pull request #1460 from ceph/wip-warning
Sage Weil [Fri, 14 Mar 2014 18:03:49 +0000 (11:03 -0700)]
Merge pull request #1460 from ceph/wip-warning

PGLog: remove unused variable

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | mon: only do timecheck with known monmap | 1461/head
Sage Weil [Fri, 14 Mar 2014 18:02:30 +0000 (11:02 -0700)]
mon: only do timecheck with known monmap

If we are still on monmap epoch 0, our mon ranks cannot yet be trusted
since there is not yet a shared source of truth from paxos.  If we do
timechecks, the code gets confused about the ranks in e.g. the
timecheck_waiting map.

Fixes: #7692
Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | ceph-mon: be a bit more verbose on error
Sage Weil [Fri, 14 Mar 2014 18:00:36 +0000 (11:00 -0700)]
ceph-mon: be a bit more verbose on error

Motivated by #7489

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | PG::activate: handle peer contiguous with primary, but not auth_log | 1459/head
Samuel Just [Fri, 14 Mar 2014 01:16:19 +0000 (18:16 -0700)]
PG::activate: handle peer contiguous with primary, but not auth_log

The added case covers a situation where a replica is not contiguous with
the auth_log, but is contiguous with the primary.  Reshuffling the
active set to handle this would be tricky, so instead we just go ahead
and backfill it anyway.  This is probably preferable in any case since
the replica in question would have to be significantly behind.

Fixes: #7696
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | Merge pull request #1458 from ceph/wip-7489
Sage Weil [Fri, 14 Mar 2014 17:51:32 +0000 (10:51 -0700)]
Merge pull request #1458 from ceph/wip-7489

ceph_mon: split postfork() in two and finish postfork just before daemonize

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | ceph_mon: output error message if unable to bind. | 1458/head
Joao Eduardo Luis [Fri, 14 Mar 2014 17:45:57 +0000 (17:45 +0000)]
ceph_mon: output error message if unable to bind.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years ago | ceph_mon: all output after initial fork goes to dout/derr
Joao Eduardo Luis [Fri, 14 Mar 2014 17:44:06 +0000 (17:44 +0000)]
ceph_mon: all output after initial fork goes to dout/derr

We were doing this in some cases but not in others.  Just do it
throughout.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years ago | ceph_mon: split postfork() in two and finish postfork just before daemonize
Joao Eduardo Luis [Fri, 14 Mar 2014 17:39:00 +0000 (17:39 +0000)]
ceph_mon: split postfork() in two and finish postfork just before daemonize

We split global_init_postfork() in two, start and finish.  The first keeps
most of postfork()'s tasks except closing stderr, which we leave open
until just before we daemonize.  This lets the user see any error
messages the monitor emits before it daemonizes, which explains the
error code we were already returning.

Fixes: 7489
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years ago | osd/ReplicatedPG: release op locks on commit+applied | 1454/head
Sage Weil [Fri, 14 Mar 2014 05:02:01 +0000 (22:02 -0700)]
osd/ReplicatedPG: release op locks on commit+applied

We were releasing the op locks when we applied the update but (potentially)
before we committed it.  This means that another client can read object
state that is not yet durable.

Fixes: #7709
Signed-off-by: Sage Weil <sage@inktank.com>
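A minimal sketch of releasing the op locks only after both events, assuming a callback-based model. `InFlightWrite` and its methods are hypothetical and not part of ReplicatedPG.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Simplified in-flight write: op locks are released only once the update is
// both applied (readable) and committed (durable), not as soon as it is
// applied.
class InFlightWrite {
 public:
  explicit InFlightWrite(std::function<void()> release_locks)
      : release_locks_(std::move(release_locks)) {}

  void on_applied() { applied_ = true; maybe_release(); }
  void on_committed() { committed_ = true; maybe_release(); }
  bool locks_held() const { return !released_; }

 private:
  // Release exactly once, when the second of the two events arrives.
  void maybe_release() {
    if (applied_ && committed_ && !released_) {
      released_ = true;
      release_locks_();
    }
  }

  std::function<void()> release_locks_;
  bool applied_ = false;
  bool committed_ = false;
  bool released_ = false;
};
```

Either callback may arrive first; a reader blocked on the locks cannot observe state that is applied but not yet durable.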
11 years ago | Merge pull request #1450 from ceph/wip-7641
Loic Dachary [Fri, 14 Mar 2014 01:14:27 +0000 (02:14 +0100)]
Merge pull request #1450 from ceph/wip-7641

debian: make ceph depend on ceph-common >= 0.67

Reviewed-by: Loic Dachary <loic@dachary.org>
11 years ago | qa/workunits: misc -> fs/misc
Sage Weil [Fri, 14 Mar 2014 00:38:08 +0000 (17:38 -0700)]
qa/workunits: misc -> fs/misc

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | PGLog: remove unused variable | 1460/head
Samuel Just [Thu, 13 Mar 2014 23:44:32 +0000 (16:44 -0700)]
PGLog: remove unused variable

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | Merge pull request #1442 from ceph/wip-magic-bad
Sage Weil [Thu, 13 Mar 2014 23:43:11 +0000 (16:43 -0700)]
Merge pull request #1442 from ceph/wip-magic-bad

osd: tunables instead of hard-coded target dirty/full ratios

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago | osd: add tunables for cache_min_{flush,evict}_age | 1442/head
Sage Weil [Mon, 10 Mar 2014 20:54:34 +0000 (13:54 -0700)]
osd: add tunables for cache_min_{flush,evict}_age

Why not.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | osd: set default cache_target_{dirty,full}_ratios based on configurable
Sage Weil [Mon, 10 Mar 2014 20:52:54 +0000 (13:52 -0700)]
osd: set default cache_target_{dirty,full}_ratios based on configurable

These were hard-coded in the pg_pool_t constructor, but that was a dumb
idea.

Note that decoding legacy pg_pool_t's no longer does what it used to.  I'm
pretty sure that's okay since we care less about interim releases and
because we normally pull these out of the OSDMap, which is freshly
encoded on a regular basis (and certainly recently with real values). Also,
let's not forget that this field is meaningless on old pools anyway.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1452 from ceph/wip-7706
Samuel Just [Thu, 13 Mar 2014 23:24:19 +0000 (16:24 -0700)]
Merge pull request #1452 from ceph/wip-7706

Wip 7706

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago | Merge pull request #1447 from ceph/wip-7703
Josh Durgin [Thu, 13 Mar 2014 22:53:06 +0000 (15:53 -0700)]
Merge pull request #1447 from ceph/wip-7703

rgw: manifest holds the actual bucket used for tail objects

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years ago | Merge pull request #1449 from ceph/wip-7705
Sage Weil [Thu, 13 Mar 2014 21:52:19 +0000 (14:52 -0700)]
Merge pull request #1449 from ceph/wip-7705

ceph_test_rados: wait for commit, not ack

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
11 years ago | ceph_test_rados: wait for commit, not ack | 1448/head 1449/head
Sage Weil [Thu, 13 Mar 2014 21:49:30 +0000 (14:49 -0700)]
ceph_test_rados: wait for commit, not ack

First, this is what we wanted all along.

Second, if we wait for ACK, we may look at a user_version value that is
not stable.

Fixes: #7705
Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | MOSDOp: include reassert_version in print
Sage Weil [Thu, 13 Mar 2014 21:45:49 +0000 (14:45 -0700)]
MOSDOp: include reassert_version in print

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | config_opts: raise ms_pq_max_tokens_per_priority to 16MB | 1451/head 1452/head
Samuel Just [Thu, 13 Mar 2014 21:07:19 +0000 (14:07 -0700)]
config_opts: raise ms_pq_max_tokens_per_priority to 16MB

Recovery messages can get pretty big.

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | PrioritizedQueue: cap costs at max_tokens_per_subqueue
Samuel Just [Thu, 13 Mar 2014 21:04:19 +0000 (14:04 -0700)]
PrioritizedQueue: cap costs at max_tokens_per_subqueue

Otherwise, you can get a recovery op in the queue which has a cost
higher than the max token value.  It won't get serviced until all other
queues also do not have enough tokens and higher priority queues are
empty.

Fixes: #7706
Signed-off-by: Samuel Just <sam.just@inktank.com>
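A sketch of the cost cap, assuming a simplified per-subqueue token counter rather than the real PrioritizedQueue class; `SubQueueTokens` is hypothetical. The usage below mirrors the 16MB ms_pq_max_tokens_per_priority value from the config_opts commit above, taken here as 16 MiB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified token accounting for one priority subqueue (hypothetical).
// An item can be dequeued only when the subqueue holds at least its cost in
// tokens, and tokens never exceed max_tokens. Without a cap, an item whose
// cost exceeds max_tokens could never become affordable this way.
struct SubQueueTokens {
  uint64_t max_tokens;
  uint64_t tokens;

  // Cap an item's cost at the subqueue's token ceiling.
  uint64_t effective_cost(uint64_t cost) const { return std::min(cost, max_tokens); }

  // Add tokens, saturating at max_tokens.
  void refill(uint64_t amount) { tokens = std::min(max_tokens, tokens + amount); }

  bool can_dequeue(uint64_t cost) const { return tokens >= effective_cost(cost); }

  // Deduct the (capped) cost once the item is dequeued.
  void charge(uint64_t cost) {
    uint64_t c = effective_cost(cost);
    assert(tokens >= c);
    tokens -= c;
  }
};
```

With a 16 MiB ceiling, a 32 MiB recovery op becomes serviceable as soon as the subqueue is full, instead of waiting for every other queue to run dry.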
11 years ago | rgw: manifest holds the actual bucket used for tail objects | 1446/head 1447/head
Yehuda Sadeh [Thu, 13 Mar 2014 18:25:24 +0000 (11:25 -0700)]
rgw: manifest holds the actual bucket used for tail objects

Fixes: 7703
Objects can be copied between different buckets, so we need to keep track
of which bucket is used for naming the tail parts.  The new manifest
requires this because the older manifest simply held all the tail objects
(each containing the appropriate bucket internally).

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago | rbd-fuse: fix signed/unsigned warning
Sage Weil [Thu, 13 Mar 2014 18:22:34 +0000 (11:22 -0700)]
rbd-fuse: fix signed/unsigned warning

rbd_fuse/rbd-fuse.c: In function 'enumerate_images':
rbd_fuse/rbd-fuse.c:113:2: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1440 from ceph/wip-7649
Samuel Just [Thu, 13 Mar 2014 01:33:03 +0000 (18:33 -0700)]
Merge pull request #1440 from ceph/wip-7649

Wip 7649

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1441 from ceph/wip-7671
Sage Weil [Thu, 13 Mar 2014 00:09:31 +0000 (17:09 -0700)]
Merge pull request #1441 from ceph/wip-7671

Wip 7671

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | test/librados/watch_notify: create foo before watching | 1441/head
Samuel Just [Tue, 11 Mar 2014 21:17:47 +0000 (14:17 -0700)]
test/librados/watch_notify: create foo before watching

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | test/system/st_rados_watch: expect ENOENT for watch on non-existent object
Samuel Just [Tue, 11 Mar 2014 18:25:47 +0000 (11:25 -0700)]
test/system/st_rados_watch: expect ENOENT for watch on non-existent object

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | Merge pull request #1439 from ceph/wip-7682
Sage Weil [Wed, 12 Mar 2014 22:45:35 +0000 (15:45 -0700)]
Merge pull request #1439 from ceph/wip-7682

ReplicatedPG::already_(complete|ack) should skip temp object ops

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | ReplicatedPG::already_(complete|ack) should skip temp object ops | 1439/head
Samuel Just [Wed, 12 Mar 2014 21:07:50 +0000 (14:07 -0700)]
ReplicatedPG::already_(complete|ack) should skip temp object ops

We clearly won't get dup ops on these repops, and they don't
have meaningful versions since they don't carry log
entries.

Fixes: #7682
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | Merge pull request #1434 from ceph/wip-7695
Sage Weil [Wed, 12 Mar 2014 18:57:46 +0000 (11:57 -0700)]
Merge pull request #1434 from ceph/wip-7695

build-doc: fix checks for required commands for non-debian

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1436 from ceph/wip-7681
Sage Weil [Wed, 12 Mar 2014 17:46:54 +0000 (10:46 -0700)]
Merge pull request #1436 from ceph/wip-7681

ECBackend: when removing the temp obj, use the right shard

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1437 from ceph/wip-7650
Sage Weil [Wed, 12 Mar 2014 17:44:50 +0000 (10:44 -0700)]
Merge pull request #1437 from ceph/wip-7650

tools/rados/rados.cc: use write_full for sync_write for ec pools

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | PG: do not wait for flushed before activation | 1440/head
Samuel Just [Tue, 11 Mar 2014 21:23:10 +0000 (14:23 -0700)]
PG: do not wait for flushed before activation

This should reduce the sting of the previous commit somewhat.  We wait
for the activation transactions to clear prior to accepting IO anyway,
so we can go ahead and get that process started without waiting for the
flush.

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | PG: do not serve requests until replicas have activated
Samuel Just [Tue, 11 Mar 2014 17:31:55 +0000 (10:31 -0700)]
PG: do not serve requests until replicas have activated

There are two problems:
1) We choose the min last_update among peers with the max local-les
value as an upper bound on requests which could have been reported to
the client as committed.  We then, for ec pools, roll back to that point
to ensure that we don't inadvertently commit to an update which fewer
than K replicas actually saw.  If the primary sets local-les, accepts an
update from a client, and there is a new interval before any of the
replicas have been activated, we will end up being forced to use that
update which no other replica has seen as the new last_update.  This
will cause the object to become unfound.  We don't have this problem as
long as all active replicas agree on last_update before we accept IO.

2) Even for replicated pools, we would then immediately respond to the
request which created the primary-only update with a commit since it is
in the log and we have no outstanding repops.  If we then lose that
primary before any of the replicas in the new interval record the new
log, we will not only lose the object, but also the log entry recording
it, which will result in a lost write.

For these reasons, it seems we need to wait for the replicas to
activate before we can process new requests, because whatever update
we select as last_update is effectively regarded as committed as soon
as we accept IO.

Fixes: #7649
Signed-off-by: Samuel Just <sam.just@inktank.com>
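A sketch of the bound described in point 1, assuming plain integers for last_update and local-les. `PeerState` and `committed_upper_bound` are hypothetical. The second usage case reproduces the problem: only the primary has the newer local-les, so its primary-only update becomes the bound.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Simplified peer state gathered during peering: last_update as a plain
// counter (instead of eversion_t) and the peer's local last-epoch-started.
struct PeerState {
  uint64_t last_update;
  uint32_t local_les;
};

// Among peers reporting the highest local-les, return the smallest
// last_update: an upper bound on updates that may have been reported to
// clients as committed. Requires at least one peer.
uint64_t committed_upper_bound(const std::vector<PeerState>& peers) {
  assert(!peers.empty());
  uint32_t max_les = 0;
  for (const PeerState& p : peers) max_les = std::max(max_les, p.local_les);
  uint64_t bound = std::numeric_limits<uint64_t>::max();
  for (const PeerState& p : peers)
    if (p.local_les == max_les) bound = std::min(bound, p.last_update);
  return bound;
}
```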
11 years ago | ECBackend: when removing the temp obj, use the right shard | 1436/head
Samuel Just [Tue, 11 Mar 2014 21:41:05 +0000 (14:41 -0700)]
ECBackend: when removing the temp obj, use the right shard

Introduced in d0b1094ff7b98ef9262ecb45ee8324853003a77c
Fixes: #7681
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | osd_types: print lb if incomplete even if empty
Samuel Just [Wed, 12 Mar 2014 17:28:43 +0000 (10:28 -0700)]
osd_types: print lb if incomplete even if empty

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | build-doc: fix checks for required commands for non-debian | 1434/head
Danny Al-Gaaf [Wed, 12 Mar 2014 17:09:59 +0000 (18:09 +0100)]
build-doc: fix checks for required commands for non-debian

Fixes: 7695
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
11 years ago | Merge pull request #1412 from ceph/wip-libxfs-flag
Yehuda Sadeh [Wed, 12 Mar 2014 16:50:58 +0000 (09:50 -0700)]
Merge pull request #1412 from ceph/wip-libxfs-flag

FileStore: support compiling without libxfs

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago | Merge pull request #1362 from dachary/wip-7548
Sage Weil [Wed, 12 Mar 2014 04:54:02 +0000 (21:54 -0700)]
Merge pull request #1362 from dachary/wip-7548

doc: erasure coded pool developer and operations documentation

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1425 from ceph/wip-rbd-fuse-enumerate
Sage Weil [Wed, 12 Mar 2014 04:41:53 +0000 (21:41 -0700)]
Merge pull request #1425 from ceph/wip-rbd-fuse-enumerate

rbd-fuse: fix enumerate_images() image names buffer size issue

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1409 from enovance/wip-brag
Sage Weil [Wed, 12 Mar 2014 04:25:25 +0000 (21:25 -0700)]
Merge pull request #1409 from enovance/wip-brag

ceph-brag enhancements

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | debian: make ceph depend on ceph-common >= 0.67 | 1450/head
Sage Weil [Wed, 12 Mar 2014 04:22:57 +0000 (21:22 -0700)]
debian: make ceph depend on ceph-common >= 0.67

The older versions of ceph-common (ceph CLI, in particular) can't talk to
newer clusters.  The primary change happened with dumpling when the new
CLI and rest-api changes were made.  Although in reality ceph doesn't
care what version of ceph-common is installed, in practice this forces
ceph-common to get upgraded along with ceph and avoids some user pain.

Fixes: #7641
Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1427 from ceph/wip-6889
Sage Weil [Wed, 12 Mar 2014 02:23:27 +0000 (19:23 -0700)]
Merge pull request #1427 from ceph/wip-6889

rgw: don't log system requests in usage log

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1432 from ceph/wip-7687
Sage Weil [Wed, 12 Mar 2014 01:23:14 +0000 (18:23 -0700)]
Merge pull request #1432 from ceph/wip-7687

rgw: don't overwrite bucket entry data when syncing user stats

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | rgw: don't overwrite bucket entry data when syncing user stats | 1432/head
Yehuda Sadeh [Wed, 12 Mar 2014 01:19:44 +0000 (18:19 -0700)]
rgw: don't overwrite bucket entry data when syncing user stats

Fixes: #7687
When syncing user bucket stats we overwrote the entire entry with the
passed-in entry.  We should only look at the stats portion and not
overwrite the rest (which contains the bucket creation time).

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
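A sketch of the corrected merge, assuming simplified entry types. `BucketEntry`, `BucketStats` and `sync_entry_stats` are hypothetical stand-ins for the RGW structures, shown only to contrast copying the stats with copying the whole entry.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-ins: a user's bucket entry carries the creation time
// alongside the stats that get synced.
struct BucketStats {
  uint64_t size_bytes = 0;
  uint64_t num_objects = 0;
};

struct BucketEntry {
  std::string bucket;
  uint64_t creation_time = 0;
  BucketStats stats;
};

// Copy only the stats portion of `incoming`; creation_time and the other
// fields of `stored` stay untouched (the buggy version did stored = incoming).
void sync_entry_stats(BucketEntry& stored, const BucketEntry& incoming) {
  stored.stats = incoming.stats;
}
```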
11 years ago | Merge pull request #1400 from ceph/wip-hint-tests
Sage Weil [Wed, 12 Mar 2014 01:09:54 +0000 (18:09 -0700)]
Merge pull request #1400 from ceph/wip-hint-tests

wip-hint-tests

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | qa/workunits/cephtool/test.sh: fix thrash (more)
Sage Weil [Wed, 12 Mar 2014 00:03:23 +0000 (17:03 -0700)]
qa/workunits/cephtool/test.sh: fix thrash (more)

If I have to touch this again I will remove it.  Ugh.  This time,

ubuntu@teuthology:/var/lib/teuthworker/archive/teuthology-2014-03-11_02:30:01-rados-firefly-distro-basic-plana/125922

hit NXIO a few lines down because one of the OSDs was still down.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1417 from ceph/wip-7663
Sage Weil [Tue, 11 Mar 2014 23:38:28 +0000 (16:38 -0700)]
Merge pull request #1417 from ceph/wip-7663

Wip 7663

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1416 from ceph/wip-div
Sage Weil [Tue, 11 Mar 2014 23:05:12 +0000 (16:05 -0700)]
Merge pull request #1416 from ceph/wip-div

More log handling fixes

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | osd/ReplicatedPG: fix typo
Sage Weil [Tue, 11 Mar 2014 19:14:49 +0000 (12:14 -0700)]
osd/ReplicatedPG: fix typo

This is the object count, not the dirty object count.  Broken by
00bf3b56743830a4a9c5d6765946a4e68f530c57.

Reported-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1430 from ceph/wip-7674
Sage Weil [Tue, 11 Mar 2014 18:43:35 +0000 (11:43 -0700)]
Merge pull request #1430 from ceph/wip-7674

osd/ReplicatedPG: do not include hit_set objects in full calculation

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years ago | ReplicatedPG: CEPH_OSD_OP_WATCH return -ENOENT if !obs.exists
Samuel Just [Mon, 10 Mar 2014 20:01:36 +0000 (13:01 -0700)]
ReplicatedPG: CEPH_OSD_OP_WATCH return -ENOENT if !obs.exists

Fixes: #7671
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years ago | osd/ReplicatedPG: do not include hit_set objects in full calculation | 1430/head
Sage Weil [Tue, 11 Mar 2014 17:49:47 +0000 (10:49 -0700)]
osd/ReplicatedPG: do not include hit_set objects in full calculation

If we have a low target and there are hit_set objects (which cannot be
evicted), we can get stuck in a full state and never get out of it.

Fixes: #7674
Signed-off-by: Sage Weil <sage@inktank.com>
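A sketch of the adjusted fullness calculation, assuming a simplified per-PG counter and a fullness ratio expressed in parts per million. `CachePgStats` and `fullness_ppm` are hypothetical. A PG holding only hit_set objects now reports 0 instead of staying permanently over target.

```cpp
#include <cassert>
#include <cstdint>

// Simplified per-PG cache-tier counters. hit_set archive objects live in the
// PG but can never be evicted.
struct CachePgStats {
  uint64_t num_objects;          // all objects, including hit_set archives
  uint64_t num_hit_set_objects;  // hit_set archives (not evictable)
};

// Fullness relative to the target object count, in parts per million,
// counting only objects the agent could actually evict.
uint64_t fullness_ppm(const CachePgStats& s, uint64_t target_max_objects) {
  assert(target_max_objects > 0);
  assert(s.num_hit_set_objects <= s.num_objects);
  uint64_t evictable = s.num_objects - s.num_hit_set_objects;
  return evictable * 1000000 / target_max_objects;
}
```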
11 years ago | Merge pull request #1418 from ceph/wip-7672
Sage Weil [Tue, 11 Mar 2014 17:23:04 +0000 (10:23 -0700)]
Merge pull request #1418 from ceph/wip-7672

PG::choose_acting: filter CRUSH_ITEM_NONE out of have

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1429 from ceph/wip-7592-final
Sage Weil [Tue, 11 Mar 2014 17:18:59 +0000 (10:18 -0700)]
Merge pull request #1429 from ceph/wip-7592-final

Wip 7592 final

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1410 from ceph/wip-flock
Sage Weil [Tue, 11 Mar 2014 17:00:48 +0000 (10:00 -0700)]
Merge pull request #1410 from ceph/wip-flock

mds: fix owner check of file lock

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | Merge pull request #1385 from ceph/wip-nfs-export
Sage Weil [Tue, 11 Mar 2014 16:59:33 +0000 (09:59 -0700)]
Merge pull request #1385 from ceph/wip-nfs-export

mds: introduce LOOKUPNAME MDS request

Reviewed-by: Sage Weil <sage@inktank.com>
11 years ago | osd: hit_set_persist(): Verify all objects aren't degraded | 1429/head
David Zafman [Tue, 11 Mar 2014 02:54:57 +0000 (19:54 -0700)]
osd: hit_set_persist(): Verify all objects aren't degraded

Fixes: #7592
Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years ago | rgw: don't log system requests in usage log | 1427/head
Yehuda Sadeh [Fri, 22 Nov 2013 23:41:49 +0000 (15:41 -0800)]
rgw: don't log system requests in usage log

Fixes: 6889
System requests should not be logged in the usage log.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
11 years ago | rbd-fuse: fix enumerate_images() image names buffer size issue | 1425/head
Ilya Dryomov [Tue, 11 Mar 2014 14:00:37 +0000 (16:00 +0200)]
rbd-fuse: fix enumerate_images() image names buffer size issue

The image names buffer is fixed at 1024.  This turns out not to be enough:
there are at least two "rbd-fuse rbd_list: error %d Numerical result
out of range" reports on the mailing list.  Fix it by calling rbd_list()
twice, the first time to get the required buffer size.  Also, get rid of
the memory leak and tweak the error message while at it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
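A sketch of the two-call pattern, assuming the size-query contract described in the message: a too-small buffer yields -ERANGE along with the required size. `fake_rbd_list` is a hypothetical in-process stand-in so the example runs without a cluster; it is not librbd's rbd_list(), and `list_images` is likewise illustrative.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a librbd-style list call: names are written as
// consecutive NUL-terminated strings. If *size is too small, it returns
// -ERANGE and stores the required size in *size.
int fake_rbd_list(const std::vector<std::string>& images, char* names, size_t* size) {
  size_t needed = 0;
  for (const std::string& n : images) needed += n.size() + 1;
  if (*size < needed) {
    *size = needed;
    return -ERANGE;
  }
  char* p = names;
  for (const std::string& n : images) {
    std::memcpy(p, n.c_str(), n.size() + 1);
    p += n.size() + 1;
  }
  *size = needed;
  return static_cast<int>(needed);
}

// Probe with an empty buffer to learn the size, then allocate exactly that
// much. std::vector owns the buffer, so nothing leaks on any return path.
int list_images(const std::vector<std::string>& images, std::vector<std::string>* out) {
  size_t size = 0;
  int r = fake_rbd_list(images, nullptr, &size);
  if (r < 0 && r != -ERANGE) return r;
  std::vector<char> buf(size);
  r = fake_rbd_list(images, buf.data(), &size);
  if (r < 0) return r;
  for (size_t off = 0; off < size;) {
    std::string name(buf.data() + off);
    off += name.size() + 1;
    out->push_back(name);
  }
  return 0;
}
```

If the image list can change between the two calls, a production version would loop while the second call still returns -ERANGE.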
11 years ago | Use pgrep radosgw to determine if rados gateway is running.
Warren Usui [Sat, 1 Mar 2014 05:43:31 +0000 (21:43 -0800)]
Use pgrep radosgw to determine if rados gateway is running.

Fixes: 7528
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Warren Usui <warren.usui@inktank.com>
(cherry picked from commit 5b88856cd25a13842fa8ad0699b84fbdfbc13694)

11 years ago | Fixed get_status() to find client.radosgw fields inside of ps output.
Warren Usui [Fri, 21 Feb 2014 05:07:53 +0000 (21:07 -0800)]
Fixed get_status() to find client.radosgw fields inside of ps output.

Fixes: 7375
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Warren Usui <warren.usui@inktank.com>
(cherry picked from commit 8020dcf7791a0f459bae5e8a77d70ff1dc9c60bc)

11 years agoFix get_status() to find client.rados text inside of ps command results.
Warren Usui [Fri, 21 Feb 2014 05:11:45 +0000 (21:11 -0800)]
Fix get_status() to find client.rados text inside of ps command results.

Added the port (a fixed value in teuthology for now) to the hostname.
Fixes: 7374
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Warren Usui <warren.usui@inktank.com>
(cherry picked from commit 8200b8a02511e367370d33cb74c3d45ef85fca31)

11 years agoosd: Remove unused checkpoint code
David Zafman [Mon, 10 Mar 2014 20:35:19 +0000 (13:35 -0700)]
osd: Remove unused checkpoint code

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agomds: fix owner check of file lock 1410/head
Yan, Zheng [Sun, 9 Mar 2014 23:36:14 +0000 (07:36 +0800)]
mds: fix owner check of file lock

flock and POSIX locks do not use the process ID as the owner identifier.
The process ID of the lock holder is only reported by F_GETLK fcntl(2).
In the Linux kernel, a file lock's owner identifier is the file pointer
through which the lock is requested.

The fix is to not take 'pid_namespace' into consideration when
checking for conflicting locks. Also rename the 'pid' fields of struct
ceph_mds_request_args and struct ceph_filelock to 'owner', and rename
the 'pid_namespace' fields to 'pid'.

The kclient counterpart of this patch modifies the flock code to
assign the file pointer to the 'owner' field of the lock message. It
also sets the most significant bit of the 'owner' field, which we can
use to distinguish between old and new clients.
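
A simplified C++ sketch of the conflict check described above. The `FileLock` struct, `kNewOwnerBit`, and the helper names are hypothetical and far smaller than the real `struct ceph_filelock`; byte ranges are modeled as half-open [start, end). Ownership is decided by client and `owner` alone, with no namespace field involved, and the top bit of `owner` flags locks that came from new clients.

```cpp
#include <cassert>
#include <cstdint>

// Most significant bit of `owner`, set by new kernel clients (per this commit).
constexpr std::uint64_t kNewOwnerBit = 1ULL << 63;

// Hypothetical, simplified lock record.
struct FileLock {
  std::uint64_t client;  // MDS client id
  std::uint64_t owner;   // file pointer (new clients) or pid (old clients)
  std::uint64_t start;   // first byte covered
  std::uint64_t end;     // one past the last byte covered
  bool exclusive;        // true for a write lock
};

// True if the lock came from a client that sets the owner bit.
bool from_new_client(const FileLock& l) {
  return (l.owner & kNewOwnerBit) != 0;
}

// Same owner means same client and same owner identifier; no pid or
// namespace comparison takes part.
bool same_owner(const FileLock& a, const FileLock& b) {
  return a.client == b.client && a.owner == b.owner;
}

// Two locks conflict if their ranges overlap, at least one is exclusive,
// and they belong to different owners.
bool conflicts(const FileLock& a, const FileLock& b) {
  bool overlap = a.start < b.end && b.start < a.end;
  return overlap && (a.exclusive || b.exclusive) && !same_owner(a, b);
}
```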

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoReplicatedPG: adjust pending_attrs correctly in copy_from 1417/head
Samuel Just [Sun, 9 Mar 2014 18:43:57 +0000 (11:43 -0700)]
ReplicatedPG: adjust pending_attrs correctly in copy_from

Otherwise, subsequent reads might not get the correct cached attrs.

Fixes: #7663
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoReplicatedPG: _delete_head should adjust pending_attrs
Samuel Just [Sun, 9 Mar 2014 18:41:48 +0000 (11:41 -0700)]
ReplicatedPG: _delete_head should adjust pending_attrs

We need the old attr_cache in make_writeable for the clone,
so stage the changes in pending_attrs instead.
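
A hypothetical C++ model of the reasoning here: `attr_cache` holds the attrs as they were when the op started, while `pending_attrs` stages this op's changes, with `std::nullopt` marking a removal. The `OpContext` layout and the method names are illustrative only and are not ReplicatedPG's real interfaces.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-op state: committed attrs plus staged changes.
struct OpContext {
  std::map<std::string, std::string> attr_cache;
  std::map<std::string, std::optional<std::string>> pending_attrs;

  // Deleting the head stages removals instead of clearing attr_cache,
  // so a clone made afterwards still sees the original attrs.
  void delete_head() {
    for (const auto& kv : attr_cache)
      pending_attrs[kv.first] = std::nullopt;
  }

  // What a clone created during this op should copy: the pre-op attrs.
  std::map<std::string, std::string> clone_attrs() const { return attr_cache; }

  // Attrs visible to later reads in this op: attr_cache overlaid with
  // pending_attrs (set or removed).
  std::map<std::string, std::string> current_attrs() const {
    auto out = attr_cache;
    for (const auto& [key, value] : pending_attrs) {
      if (value)
        out[key] = *value;
      else
        out.erase(key);
    }
    return out;
  }
};
```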

Fixes: #7663
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoReplicatedPG: use pending_attrs in rollback
Samuel Just [Sun, 9 Mar 2014 18:43:00 +0000 (11:43 -0700)]
ReplicatedPG: use pending_attrs in rollback

Otherwise, we won't have the correct attr_cache in
make_writeable for the clone.

Fixes: #7663
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoPG::choose_acting: filter CRUSH_ITEM_NONE out of have 1418/head
Samuel Just [Mon, 10 Mar 2014 20:36:37 +0000 (13:36 -0700)]
PG::choose_acting: filter CRUSH_ITEM_NONE out of have
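
The commit gives no detail beyond its subject. Here is a minimal C++ sketch of the filtering, assuming `have` is a set of OSD ids built from a mapping that may contain CRUSH_ITEM_NONE placeholders. `filter_have()` is a hypothetical helper, and the constant mirrors the 0x7fffffff value defined in crush.h.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Placeholder CRUSH emits for a slot it could not map (0x7fffffff in crush.h).
constexpr int CRUSH_ITEM_NONE = 0x7fffffff;

// Hypothetical helper: collect the OSDs we actually have, skipping
// CRUSH_ITEM_NONE holes so they are never treated as real OSD ids.
std::set<int> filter_have(const std::vector<int>& osds) {
  std::set<int> have;
  for (int osd : osds)
    if (osd != CRUSH_ITEM_NONE)
      have.insert(osd);
  return have;
}
```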

Fixes: #7672
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agorbdmap: bugfix upstart script
Stephan Renatus [Mon, 10 Mar 2014 14:17:41 +0000 (15:17 +0100)]
rbdmap: bugfix upstart script

It seems the upstart script is lagging a little behind [the initscript](https://github.com/ceph/ceph/blob/master/src/init-rbdmap#L44-L49); however, this bugfix makes it actually do what it is supposed to do.

Previously, the bug caused the job to ignore all parameters, producing the following error in /var/log/upstart/rbdmap.log:

```
rbd map volumes/volume-one
rbd: add failed: (22) Invalid argument
```

Signed-off-by: Stephan Renatus <s.renatus@x-ion.de>
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoFileStore: support compiling without libxfs 1412/head
Ilya Dryomov [Mon, 10 Mar 2014 08:36:48 +0000 (10:36 +0200)]
FileStore: support compiling without libxfs

When configured with --without-libxfs, use GenericFileStoreBackend
instead of XfsFileStoreBackend for XFS.  For now, this only affects
the allocation hint op.  The default is to compile with
--with-libxfs.  (Previously it was unconditionally enabled on Linux and
disabled on non-Linux arches.)
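
A minimal C++ sketch of the compile-time selection, assuming the configure option defines a `HAVE_LIBXFS` macro (the macro name is an assumption). The backend classes are stubs named after the ones in this commit, and `make_backend()` is a hypothetical factory. Without the macro, an XFS filesystem falls back to the generic backend.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stub backends; the real classes take a FileStore* and implement many ops.
struct FileStoreBackend {
  virtual ~FileStoreBackend() = default;
  virtual std::string name() const = 0;
};

struct GenericFileStoreBackend : FileStoreBackend {
  std::string name() const override { return "generic"; }
};

#if defined(HAVE_LIBXFS)
struct XfsFileStoreBackend : GenericFileStoreBackend {
  std::string name() const override { return "xfs"; }
};
#endif

// Pick a backend for the detected filesystem type.
std::unique_ptr<FileStoreBackend> make_backend(const std::string& fs_type) {
#if defined(HAVE_LIBXFS)
  if (fs_type == "xfs")
    return std::make_unique<XfsFileStoreBackend>();
#endif
  (void)fs_type;  // silences the unused-parameter warning without libxfs
  return std::make_unique<GenericFileStoreBackend>();
}
```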

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agoMerge branch 'master' of https://github.com/enovance/ceph-brag into firefly 1409/head
Babu Shanmugam [Mon, 10 Mar 2014 06:12:58 +0000 (06:12 +0000)]
Merge branch 'master' of https://github.com/enovance/ceph-brag into firefly

11 years agoRemoved all regular expression parsing and used '-f json' instead
Babu Shanmugam [Mon, 10 Mar 2014 06:11:03 +0000 (06:11 +0000)]
Removed all regular expression parsing and used '-f json' instead

Signed-off-by: Babu Shanmugam <anbu@enovance.com>