ceph.git log (git.apps.os.sepia.ceph.com)
Sage Weil [Tue, 9 Jul 2013 05:04:10 +0000 (22:04 -0700)]
mon/PaxosService: inline trim()

This is now trivial; pull it into the caller.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 05:02:00 +0000 (22:02 -0700)]
mon/PaxosService: move paxos_service_trim_max into caller, clean up

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 04:58:13 +0000 (21:58 -0700)]
mon/PaxosService: simplify paxos_service_trim_min check

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 04:54:53 +0000 (21:54 -0700)]
mon: make service trim_to stateless

Call get_trim_to() when we need to know how much to trim (if any), and
calculate it then.  No need to keep this in a hidden trim_version
variable and remember to update it.  This drops several helpers and
accessors and makes get_trim_to() a single method that services need to
override.

Signed-off-by: Sage Weil <sage@inktank.com>
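A minimal C++ sketch of the stateless approach described above. It assumes a hypothetical base class `PaxosServiceSketch` with public `first_committed`/`last_committed` fields and a hypothetical `KeepLastN` service that retains the most recent `keep` versions. The trim target is recomputed from current state on every call instead of being cached in a `trim_version` member, so there is nothing to remember to update.

```cpp
#include <cstdint>

using version_t = uint64_t;

// Hypothetical base: services override get_trim_to() and compute the
// answer from current state each time; nothing is cached between calls.
class PaxosServiceSketch {
 public:
  virtual ~PaxosServiceSketch() = default;

  // Version to trim up to, or 0 if nothing should be trimmed.
  virtual version_t get_trim_to() const { return 0; }

  version_t first_committed = 0;
  version_t last_committed = 0;
};

// Hypothetical service that keeps only the most recent `keep` versions.
class KeepLastN : public PaxosServiceSketch {
 public:
  explicit KeepLastN(version_t keep) : keep_(keep) {}

  version_t get_trim_to() const override {
    if (last_committed <= keep_) return 0;  // not enough history yet
    version_t to = last_committed - keep_;
    return to > first_committed ? to : 0;   // already trimmed this far
  }

 private:
  version_t keep_;
};
```

For example, with `keep = 30`, `first_committed = 1`, and `last_committed = 100`, `get_trim_to()` yields 70; once `first_committed` reaches 70 it yields 0.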
Sage Weil [Tue, 9 Jul 2013 18:09:44 +0000 (11:09 -0700)]
mon/PaxosService: pass trim target into encode_trim()

This will help us in a few patches...

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 04:44:05 +0000 (21:44 -0700)]
mon/PaxosService: unwind should_trim()

Inline the single-caller helper.  This will help us in a moment...

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 04:41:55 +0000 (21:41 -0700)]
mon/PaxosService: unwind service_should_trim() helper

Nobody overloads it; put it inline in should_trim().

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 04:41:34 +0000 (21:41 -0700)]
mon/MDSMonitor: remove unnecessary service_should_trim()

We never set_trim_to(), so this is unnecessary.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 04:40:36 +0000 (21:40 -0700)]
mon/OSDMonitor: remove dup service_should_trim() implementation

This matches the parent.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 04:38:11 +0000 (21:38 -0700)]
mon/PaxosService: trim periodically instead of via propose_pending

We want to trim old states even if there is no update activity.  For
example, if a long-running rebalance finishes, all osdmap updates will
stop and we won't trim out old maps to free space.

Instead, trim at the same time as tick().  Remove the trim during
propose_pending() to force all trims through this path and avoid
introducing a new and rarely-exercised behavior.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
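A sketch of trimming from the periodic tick rather than from propose_pending(). `TrimDriver` and its fields are hypothetical. `trim_min` and `trim_max` are assumed to play the roles that later commits in this log give to `paxos_service_trim_min` (skip trims smaller than this) and `paxos_service_trim_max` (cap a single trim), with 0 disabling either check.

```cpp
#include <cstdint>
#include <functional>

using version_t = uint64_t;

// Hypothetical trim driver: tick() asks for the target every time, so
// old versions are trimmed even when no new proposals arrive.
struct TrimDriver {
  std::function<version_t()> get_trim_to;  // returns 0 => nothing to trim
  version_t first_committed = 1;
  version_t trim_min = 0;  // assumed semantics; 0 disables the minimum
  version_t trim_max = 0;  // assumed semantics; 0 disables the maximum

  // Returns the version actually trimmed to, or 0 if no trim happened.
  version_t tick() {
    version_t to = get_trim_to();
    if (to <= first_committed) return 0;          // nothing new to trim
    version_t n = to - first_committed;
    if (trim_min && n < trim_min) return 0;       // too little to bother
    if (trim_max && n > trim_max)
      to = first_committed + trim_max;            // cap one trim's size
    first_committed = to;  // stands in for encode_trim() plus a proposal
    return to;
  }
};
```

With a target of 500, `trim_min = 50`, and `trim_max = 200`, successive ticks trim to 201, 401, and 500, then do nothing, all without any other update activity.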
Sage Weil [Tue, 9 Jul 2013 04:33:37 +0000 (21:33 -0700)]
mon/PaxosService: reorder definitions

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Jul 2013 04:33:22 +0000 (21:33 -0700)]
mon/PaxosService: uninline should_trim()

Signed-off-by: Sage Weil <sage@inktank.com>
John Wilkins [Tue, 9 Jul 2013 01:11:57 +0000 (18:11 -0700)]
Merge branch 'master' of https://github.com/ceph/ceph

John Wilkins [Tue, 9 Jul 2013 01:11:25 +0000 (18:11 -0700)]
doc: Added Ceph Object Storage installation instructions for CentOS/RHEL 6.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Sage Weil [Tue, 9 Jul 2013 00:46:40 +0000 (17:46 -0700)]
mon/OSDMonitor: fix base case for loading full osdmap

Right after cluster creation, first_committed is 1 and latest stashed is 0,
but we don't have the initial full map yet.  Thereafter, we do (because we
write it with trim).  Fixes afd6c7d8247075003e5be439ad59976c3d123218.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Samuel Just [Mon, 8 Jul 2013 23:46:31 +0000 (16:46 -0700)]
Merge branch 'wip-small-object-recovery'

Conflicts:
src/include/ceph_features.h

Reviewed-by: Sage Weil <sage@inktank.com>
Fixes: #5278
Samuel Just [Wed, 19 Jun 2013 20:26:50 +0000 (13:26 -0700)]
ReplicatedPG: send compound messages to enlightened peers

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Mon, 17 Jun 2013 23:26:31 +0000 (16:26 -0700)]
ReplicatedPG: add handlers for MOSDPG(Push|Pull|PushReply)

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Mon, 17 Jun 2013 22:59:19 +0000 (15:59 -0700)]
OSD: add handlers for MOSDPG(Push|PushReply|Pull)

MOSDPG(Push|PushReply|Pull|SubOp|SubOpReply) need the
same thing checked prior to queueing the op, so they
share a templated handler.

Signed-off-by: Samuel Just <sam.just@inktank.com>
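One way to picture the shared templated handler: a single member template applies the same pre-queue check to any message type that exposes the fields it needs. The message structs, `OSDSketch`, and the check itself (refuse a message stamped with a newer map epoch than ours) are simplified, hypothetical stand-ins; the real OSD checks are not reproduced here.

```cpp
#include <cstdint>
#include <deque>
#include <string>

using epoch_t = uint32_t;

// Hypothetical message types; the real MOSDPG* messages carry much more.
struct MPush { epoch_t map_epoch; std::string name() const { return "push"; } };
struct MPull { epoch_t map_epoch; std::string name() const { return "pull"; } };

struct OSDSketch {
  epoch_t osdmap_epoch = 0;
  std::deque<std::string> queue;  // stands in for the real op queue

  // One template serves every message type that needs the same check
  // before queueing. The check here is a hypothetical simplification.
  template <typename T>
  bool handle_replica_op(const T& m) {
    if (m.map_epoch > osdmap_epoch) return false;  // we are behind; don't queue
    queue.push_back(m.name());
    return true;
  }
};
```

Adding another message type then only requires that it provide `map_epoch` and `name()`; the check is written once.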
Samuel Just [Mon, 17 Jun 2013 22:41:36 +0000 (15:41 -0700)]
messages/,osd_types: add messages for Push, PushReply, Pull

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Fri, 14 Jun 2013 23:06:16 +0000 (16:06 -0700)]
ReplicatedPG: split handle_pull out of sub_op_pull

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Fri, 14 Jun 2013 22:35:55 +0000 (15:35 -0700)]
ReplicatedPG: split handle_push_reply out of sub_op_push_reply

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Fri, 14 Jun 2013 21:58:39 +0000 (14:58 -0700)]
ReplicatedPG: send pulls en masse in recover_primary

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Fri, 14 Jun 2013 20:44:34 +0000 (13:44 -0700)]
ReplicatedPG: send pushes en masse in recover_replicas, recover_backfill

This way, the pushes might be later merged into a smaller number of
messages.

Signed-off-by: Samuel Just <sam.just@inktank.com>
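The benefit of sending pushes en masse can be sketched as a grouping step: collect the pending pushes first, then emit one compound message per peer instead of one message per object. `PushOpSketch`, `CompoundMsg`, and `batch_pushes()` are hypothetical names, not the ReplicatedPG API. Requires C++17 (structured bindings).

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a push names one object, and a compound
// message carries every push destined for the same peer OSD.
struct PushOpSketch { std::string oid; };
using CompoundMsg = std::vector<PushOpSketch>;

// Group (peer, push) pairs so each peer receives a single message.
std::map<int, CompoundMsg> batch_pushes(
    const std::vector<std::pair<int, PushOpSketch>>& pending) {
  std::map<int, CompoundMsg> by_peer;
  for (const auto& [peer, op] : pending) by_peer[peer].push_back(op);
  return by_peer;
}
```

Three pushes to two peers become two messages; the relative order of pushes to a given peer is preserved.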
Samuel Just [Wed, 12 Jun 2013 22:10:59 +0000 (15:10 -0700)]
OSD: convert handle_push to use PushOp

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 12 Jun 2013 20:28:15 +0000 (13:28 -0700)]
ReplicatedPG: pass a PushOp into handle_pull_response

This is the first step toward packaging multiple
pushes/pulls into a single message.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 12 Jun 2013 22:50:55 +0000 (15:50 -0700)]
ReplicatedPG: split send_push into build_push_op and send_push_op

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 12 Jun 2013 20:27:12 +0000 (13:27 -0700)]
ReplicatedPG: _committed_pushed_object don't pass op

Add a separate callback to handle marking the event and
the stats.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Mon, 8 Jul 2013 21:34:50 +0000 (14:34 -0700)]
ReplicatedPG: submit_push_data must take recovery_info as non-const

Signed-off-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 8 Jul 2013 22:07:57 +0000 (15:07 -0700)]
mon: implement simple 'scrub' command

Compare all keys within the sync'ed prefixes across members of the quorum
and compare the key counts and CRC for inconsistencies.

Currently this is a one-shot inefficient hammer.  We'll want to make this
work in chunks before it is usable in production environments.

Protect with a feature bit to avoid sending MMonScrub to mons who can't
decode it.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
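A sketch of the comparison step, assuming each quorum member reports, per synced prefix, a key count and a CRC over that prefix's keys and values. `ScrubSummary` and `find_inconsistent()` are hypothetical; the real MMonScrub message and its encoding are not shown. Requires C++17 (structured bindings).

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-monitor scrub result, keyed by synced prefix.
struct ScrubSummary {
  std::map<std::string, uint64_t> key_count;  // prefix -> number of keys
  std::map<std::string, uint32_t> crc;        // prefix -> CRC of keys/values
  bool operator==(const ScrubSummary& o) const {
    return key_count == o.key_count && crc == o.crc;
  }
};

// Compare every quorum member against the leader and return the ranks
// whose key counts or CRCs disagree.
std::vector<int> find_inconsistent(int leader,
                                   const std::map<int, ScrubSummary>& by_rank) {
  std::vector<int> bad;
  const ScrubSummary& ref = by_rank.at(leader);
  for (const auto& [rank, s] : by_rank)
    if (rank != leader && !(s == ref)) bad.push_back(rank);
  return bad;
}
```

This is the "one-shot hammer" shape: every member summarizes everything at once. Working in chunks would mean summarizing a bounded key range per round instead.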
Sage Weil [Mon, 8 Jul 2013 22:04:59 +0000 (15:04 -0700)]
mon: fix osdmap stash, trim to retain complete history of full maps

The current interaction between sync and stashing full osdmaps only on
active mons means that a sync can result in an incomplete osdmap_full
history:

 - mon.c starts a full sync
 - during sync, active osdmap service should_stash_full() is true and
   includes a full in the txn
 - mon.c sync finishes
 - mon.c update_from_paxos gets "latest" stashed that it got from the
   paxos txn
 - mon.c does *not* walk to previous inc maps to complete its collection
   of full maps.

To fix this, we disable the periodic/random stash of full maps by the
osdmap service.

This introduces a new problem: we must have at least one full map (the first
one) in order for a mon that just synced to build its full collection.
Extend the encode_trim() process to allow the osdmap service to include
the oldest full map with the trim txn.  This is more complex than just
writing the full maps in the txn, but cheaper--we only write the full
map at trim time.

This *might* be related to previous bugs where the full osdmap was
missing, or cases where leveldb keys seemed to 'disappear'.

Fixes: #5512
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Yehuda Sadeh [Mon, 8 Jul 2013 19:23:36 +0000 (12:23 -0700)]
Merge pull request #397 from kri5/wip-5478

rgw: Add explicit messages in radosgw init script

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Sat, 6 Jul 2013 00:21:06 +0000 (17:21 -0700)]
common/crc32c: skip cpu detection incantation on not x86_64

On i386 this fails to build with

common/crc32c-intel.c: In function 'ceph_have_crc32c_intel':
error: common/crc32c-intel.c:79:9: PIC register clobbered by 'ebx' in 'asm'

ARM had more to complain about.

Not sure where this test came from, but it is clearly not meant for
anything other than x86_64.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
athanatos [Mon, 8 Jul 2013 17:44:43 +0000 (10:44 -0700)]
Merge pull request #407 from dachary/wip-5487

unit tests for ObjectContext read/write locks

Reviewed-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Mon, 8 Jul 2013 17:14:08 +0000 (10:14 -0700)]
qa/workunits/rbd/simple_big.sh: don't ENOSPC every time

Set the count on the initial dd so we don't always ENOSPC.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 8 Jul 2013 16:58:16 +0000 (09:58 -0700)]
qa/workunits/rbd/kernel.sh: move modprobe up

Needs to happen before cleanup.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 8 Jul 2013 16:56:29 +0000 (09:56 -0700)]
qa/workunits/fs/test_o_trunc.sh: fix .sh to match new bin location

To match 83f308962c53eec10db9e496987a9e4be7c87e9b.

Signed-off-by: Sage Weil <sage@inktank.com>
Loic Dachary [Mon, 8 Jul 2013 14:18:08 +0000 (16:18 +0200)]
unit tests for ObjectContext read/write locks

unit tests for the ObjectContext methods ondisk_write_lock,
ondisk_write_unlock, ondisk_read_lock and ondisk_read_unlock.

A class derived from ::testing::Test is created with two sub-classes
(Thread_read_lock and Thread_write_lock) to provide a separate thread
that can block with cond.Wait(). usleep(3) is used in the main thread
to wait for the expected side effect with increasing delays (up to
MAX_DELAY).

http://tracker.ceph.com/issues/5487 refs #5487

Signed-off-by: Loic Dachary <loic@dachary.org>
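The polling strategy described above (check for the side effect, sleep, and retry with a growing delay up to a cap) might look like the following. The helper name, the doubling factor, and the 1-microsecond starting delay are assumptions rather than the test's actual code, and this sketch uses std::this_thread::sleep_for where the original test uses usleep(3).

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Poll `done`, sleeping 1us, 2us, 4us, ... between checks, and give up
// once the next delay would exceed max_delay.
bool wait_for_side_effect(const std::function<bool()>& done,
                          std::chrono::microseconds max_delay) {
  for (std::chrono::microseconds delay{1}; delay <= max_delay; delay *= 2) {
    if (done()) return true;
    std::this_thread::sleep_for(delay);
  }
  return done();  // one last check after the final sleep
}
```

Growing the delay keeps the test fast when the other thread reacts quickly, while still bounding the total wait.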
Sage Weil [Mon, 8 Jul 2013 04:20:34 +0000 (21:20 -0700)]
Merge branch 'next'

Sage Weil [Fri, 5 Jul 2013 23:03:49 +0000 (16:03 -0700)]
mon: remove bad assert about monmap version

It is possible to start a sync when our newest monmap is 0.  Usually we see
e0 from probe, but that isn't always published as part of the very first
paxos transaction due to the way PaxosService::_active generates its
initial commit.

In any case, having e0 here is harmless.

Fixes: #5509
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Fri, 5 Jul 2013 18:24:06 +0000 (11:24 -0700)]
qa: write a somewhat <1tb image

1TB is enough to fill up 6 plana osds.  And it takes forever.  Write less.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 5 Jul 2013 18:20:43 +0000 (11:20 -0700)]
qa/workunits/rbd/kernel.sh: modprobe rbd

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 5 Jul 2013 18:17:29 +0000 (11:17 -0700)]
qa: move test_o_trunc.sh into fs dir

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 5 Jul 2013 18:16:08 +0000 (11:16 -0700)]
qa: move fs test binary into workunits dir so teuthology can build it

Teuthology does a make in the workunits dir, so move this in there.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 5 Jul 2013 18:04:17 +0000 (11:04 -0700)]
mds/MDSTable: gracefully suicide on EBLACKLIST

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Christophe Courtaut [Fri, 5 Jul 2013 12:41:04 +0000 (14:41 +0200)]
rgw: Add explicit messages in radosgw init script

http://tracker.ceph.com/issues/5478 fixes #5478

Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
Sage Weil [Thu, 4 Jul 2013 04:53:48 +0000 (21:53 -0700)]
qa: add O_TRUNC test

From: Yan, Zheng <yan.zheng@intel.com>

Simple reproducer for #5453, modified to run for a finite number of
iterations.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 3 Jul 2013 23:56:06 +0000 (16:56 -0700)]
mon/Paxos: make 'paxos trim disabled max versions' much much larger

108000 is about 30 hours if paxos is going full-bore (1 proposal/second).
That ought to be pretty safe.  Otherwise, we start trimming too soon and a
slow sync will just have to restart when it finishes.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Wed, 3 Jul 2013 23:23:56 +0000 (16:23 -0700)]
mon: be less chatty about discarding messages

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 3 Jul 2013 22:36:39 +0000 (15:36 -0700)]
osd/OSDMap: handle case where some new osds have hb_front and others don't

Do not assume that because at least one OSD has an hb_front addr that they
all do, or else we will end up assigning garbage here and later thinking
it is an addr (or, more precisely, != entity_addr_t()).

Fixes: #5460
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Wed, 3 Jul 2013 22:37:05 +0000 (15:37 -0700)]
osd: clear hb_front if it was previously non-NULL and is now NULL

If we have a real addr for hb_front for a given osd and then a new map
has the osd coming up without an hb_front, we need to clear the addr
field.

Also, improve the debug output in add_heartbeat_peer() so we can tell if
we have no connection or a connection to a blank addr.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
John Wilkins [Wed, 3 Jul 2013 22:27:26 +0000 (15:27 -0700)]
Merge branch 'master' of https://github.com/ceph/ceph

John Wilkins [Wed, 3 Jul 2013 22:26:52 +0000 (15:26 -0700)]
doc: Added write caps. Required for auto-creating pools.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Sage Weil [Tue, 25 Jun 2013 20:16:45 +0000 (13:16 -0700)]
osd: fix race when queuing recovery ops

Previously we would sample how many ops to start under the lock, drop it,
and start that many.  This is racy because multiple threads can jump in
and we start too many ops.  Instead, claim as many slots as we can and
release them back later if we do not end up using them.

Take care to re-wake the work-queue since we are releasing more resources
for wq use.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
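A sketch of the claim-then-release pattern, using a hypothetical `RecoverySlots` pool guarded by a std::mutex. Because the check and the decrement happen under the same lock, two threads can never both start ops against the same free slot, which is the race the message describes.

```cpp
#include <mutex>

// Hypothetical pool of recovery-op slots.
class RecoverySlots {
 public:
  explicit RecoverySlots(int max) : free_(max) {}

  // Atomically claim up to `want` slots; returns how many were claimed.
  int claim(int want) {
    std::lock_guard<std::mutex> l(lock_);
    int got = want < free_ ? want : free_;
    free_ -= got;
    return got;
  }

  // Return slots that turned out to be unneeded. In the OSD this is also
  // the point where the work queue would be re-woken.
  void release(int n) {
    std::lock_guard<std::mutex> l(lock_);
    free_ += n;
  }

  int available() {
    std::lock_guard<std::mutex> l(lock_);
    return free_;
  }

 private:
  std::mutex lock_;
  int free_;
};
```

A caller claims what it might need, starts as many ops as it actually finds work for, and releases the remainder.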
Samuel Just [Wed, 12 Jun 2013 20:18:22 +0000 (13:18 -0700)]
osd_types: add PushOp, PushReplyOp, PullOp

Signed-off-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Wed, 5 Jun 2013 05:42:52 +0000 (22:42 -0700)]
osd: do not use temp_coll for single-step pushes

If we are recovering an object in a single step, there is no need to
write it to temp and then move it.  Avoiding that is a very good thing
when the FileStore has to do an fsync() for non-btrfs fs's.

Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:
src/osd/ReplicatedPG.cc

Samuel Just [Tue, 18 Jun 2013 22:14:17 +0000 (15:14 -0700)]
ObjectStore: only register non-null contexts

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 12 Jun 2013 20:18:00 +0000 (13:18 -0700)]
ObjectStore,Context: add register_on_complete

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 3 Jul 2013 18:18:33 +0000 (11:18 -0700)]
Elector.h: features are 64 bit

Fixes: #5497
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
Samuel Just [Wed, 3 Jul 2013 18:18:19 +0000 (11:18 -0700)]
ceph_features.h: declare all features as ULL

Otherwise, the first 32 get |'d together as ints.  Then, the result
((int)-1) is sign extended to ((long long int)-1) before being |'d
with the 1LL entries.  This results in ~((uint64_t)0).

Fixes: #5497
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
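The conversion behind this bug can be reproduced in isolation. This is an illustration, not the real ceph_features.h: `low` stands in for the OR of the first 32 feature bits when they are plain `int` constants, which is `-1` once all 32 bits are set. Converting `-1` to `long long` keeps the value `-1`, so every upper bit comes out set as well.

```cpp
#include <cstdint>

// int-typed low bits: the OR of all 32 bits is (int)-1, which is
// sign-extended when combined with a 64-bit constant.
uint64_t features_with_int_low_bits() {
  int low = -1;                       // all 32 low feature bits, as an int
  long long all = low | (1LL << 40);  // low becomes (long long)-1 here
  return static_cast<uint64_t>(all);  // every bit set: ~0ULL
}

// ULL-typed low bits: the same 32 bits, but nothing is sign-extended.
uint64_t features_with_ull_low_bits() {
  unsigned long long low = 0xFFFFFFFFULL;
  return low | (1ULL << 40);  // 0x100FFFFFFFF
}
```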
Samuel Just [Wed, 3 Jul 2013 04:09:36 +0000 (21:09 -0700)]
Pipe: use uint64_t not unsigned when setting features

Fixes: #5497
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
Mark Nelson [Wed, 3 Jul 2013 20:20:10 +0000 (13:20 -0700)]
Merge pull request #393 from ceph/wip-crc

use sse4.2 crc32c instruction

Gary Lowell [Wed, 3 Jul 2013 20:14:36 +0000 (13:14 -0700)]
ceph.spec.in: Fix file name typo

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Sage Weil [Wed, 3 Jul 2013 19:20:45 +0000 (12:20 -0700)]
common: autoselect crc32c based on cpu features

If the CPU supports SSE4.2, use the crc32c instructions.  Use the magic
incantation from who knows where to do this.  __builtin_cpu_supports()
is a nicer way to do it, but that is new in gcc 4.8.

Avoid static globals; they are bad.  Sadly that means we redetect the CPU
feature on every call. I assume that is reasonably efficient...

Signed-off-by: Sage Weil <sage@inktank.com>
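A sketch of CPU-based selection using `__builtin_cpu_supports()`, the GCC 4.8+ builtin (also available in Clang) that the message mentions as the nicer alternative, rather than the cpuid incantation the commit actually uses. The SSE4.2 path uses the `_mm_crc32_u8` intrinsic and is compiled only on x86_64, with a portable bitwise fallback elsewhere. The function names and the pre/post inversion convention are assumptions, not Ceph's `ceph_crc32c()` interface. Like the commit, it avoids a static global and re-detects the CPU feature on every call.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__)
#include <nmmintrin.h>
#endif

// Portable bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
static uint32_t crc32c_sw(uint32_t crc, const unsigned char* p, size_t len) {
  for (size_t i = 0; i < len; ++i) {
    crc ^= p[i];
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  }
  return crc;
}

#if defined(__x86_64__)
// Same computation using the SSE4.2 crc32 instruction, one byte at a time.
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw(uint32_t crc, const unsigned char* p, size_t len) {
  for (size_t i = 0; i < len; ++i) crc = _mm_crc32_u8(crc, p[i]);
  return crc;
}
#endif

using crc32c_fn = uint32_t (*)(uint32_t, const unsigned char*, size_t);

// Detect on every call: no static global caches the choice.
crc32c_fn choose_crc32c() {
#if defined(__x86_64__)
  if (__builtin_cpu_supports("sse4.2")) return crc32c_hw;
#endif
  return crc32c_sw;
}

// Standard CRC-32C: start from ~0 and invert the result.
uint32_t crc32c(const void* data, size_t len) {
  const auto* p = static_cast<const unsigned char*>(data);
  return ~choose_crc32c()(~0u, p, len);
}
```

Both paths compute the same value, so the standard CRC-32C check input "123456789" yields 0xE3069283 whichever implementation is selected.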
Mark Nelson [Mon, 24 Jun 2013 03:18:03 +0000 (22:18 -0500)]
Initial Intel SSE4 crc32c implementation.

Signed-off-by: Mark Nelson <mark.nelson@inktank.com>
Yan, Zheng [Sat, 29 Jun 2013 23:44:04 +0000 (07:44 +0800)]
mds: fix O_TRUNC locking

When truncating a file, we should xlock the corresponding filelock
(revoking any Fw caps from clients).

[note from sw: setattr on size also takes the filelock xlock.]

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 3 Jul 2013 18:11:19 +0000 (11:11 -0700)]
Makefile: include rbdmap in dist tarball

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 3 Jul 2013 14:03:35 +0000 (07:03 -0700)]
Merge pull request #390 from ksperis/rbdmap.init-2

init-rbdmap install

Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 3 Jul 2013 02:43:17 +0000 (19:43 -0700)]
Merge remote-tracking branch 'gh/next'

Conflicts:
src/mon/OSDMonitor.cc

Dan Mick [Tue, 2 Jul 2013 23:57:47 +0000 (16:57 -0700)]
mon: dead code removal

Remove code for 'mds cluster_fail', 'osd tell', and auth_usage()

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 2 Jul 2013 23:55:26 +0000 (16:55 -0700)]
Merge branch 'wip-ceph-disk'

Passed ceph-deploy suite.

Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Tue, 2 Jul 2013 21:43:17 +0000 (14:43 -0700)]
sysvinit, upstart: handle symlinks to dirs in /var/lib/ceph/*

Match a symlink to a dir, not just dirs.  This fixes the osd case of e.g.,
creating an osd in /data/osd$id in which ceph-disk makes a symlink from
/var/lib/ceph/osd/ceph-$id.

Fix proposed by Matt Thompson <matt.thompson@mandiant.com>; extended to
include the upstart users too.

Fixes: #5490
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Tue, 18 Jun 2013 23:21:48 +0000 (16:21 -0700)]
ceph-disk: handle /dev/foo/bar devices throughout

Assume the last component is the unique device name, even if it appears
under a subdir of /dev.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 18 Jun 2013 03:54:15 +0000 (20:54 -0700)]
ceph-disk: make is_held() smarter about full disks

Handle the case where the device is a full disk.  Make the partition
check a bit more robust (don't make assumptions about naming aside from
the device being a prefix of the partition).

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 2 Jul 2013 20:43:29 +0000 (13:43 -0700)]
osdc/Objecter: resend command map version checks on reconnect

We already do this for Ops and LingerOps, but missed this when we added
CommandOps to the mix.  The result is that an ill-timed mon disconnect will
leave a command map check (and thus the command) hanging.

This misbehavior was introduced when CommandOp was introduced, back in
commit 2e172225b01561bb7988b6d86d96ff4b9c1c5762, and we failed to fix it
in 8808ca57c652502d9cf803b0dc53673ca9dd62af.

Fixes: #5493
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Sage Weil [Tue, 2 Jul 2013 18:32:23 +0000 (11:32 -0700)]
Merge pull request #386 from ceph/wip-osd-xattr

automatically enable xattrs in omap; make size limits well defined

Reviewed-by: Samuel Just <sam.just@inktank.com>
Simon Leinen [Tue, 2 Jul 2013 11:44:41 +0000 (13:44 +0200)]
mds: man page: Fixed cut & paste error

athanatos [Tue, 2 Jul 2013 17:53:04 +0000 (10:53 -0700)]
Merge pull request #388 from dachary/master

set object_info_t pool of an ObjectContext if it is undefined or bad

Reviewed-by: Samuel Just <sam.just@inktank.com>
Loic Dachary [Tue, 2 Jul 2013 08:19:58 +0000 (10:19 +0200)]
set object_info_t pool of an ObjectContext if it is undefined or bad

When reading object_info_t from an existing object attribute, the pool
may be < 0 and should be set to the pool containing the object. This is done
on the oi object on the stack but overridden later by:

    obc->obs.oi.decode(bv);

This decode is superfluous and is removed so that it does not override
the modified value of the pool.

Signed-off-by: Loic Dachary <loic@dachary.org>
Laurent Barbe [Tue, 2 Jul 2013 14:53:11 +0000 (16:53 +0200)]
Move rbdmap file to /etc/ceph

Signed-off-by: Laurent Barbe <laurent@ksperis.com>
Laurent Barbe [Tue, 2 Jul 2013 14:50:40 +0000 (16:50 +0200)]
install rules for init-rbdmap

Signed-off-by: Laurent Barbe <laurent@ksperis.com>
Yehuda Sadeh [Tue, 2 Jul 2013 04:00:02 +0000 (21:00 -0700)]
Merge pull request #387 from ceph/wip-5346

rgw: add RGWFormatter_Plain allocation to sidestep cranky strlen()

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Tue, 2 Jul 2013 00:33:11 +0000 (17:33 -0700)]
rgw: add RGWFormatter_Plain allocation to sidestep cranky strlen()

Valgrind complains about an invalid read when we don't pad the allocation,
and because it is inlined we can't whitelist it for valgrind.  Work around
the warning by just padding our allocations a bit.

Fixes: #5346
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
John Wilkins [Tue, 2 Jul 2013 00:06:22 +0000 (17:06 -0700)]
doc: Minor fix.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Joao Eduardo Luis [Mon, 1 Jul 2013 22:18:48 +0000 (23:18 +0100)]
mon: Paxos: update first_committed on first paxos proposal

We were adding this update to a transaction that would only be applied
on the leader.  The peons would never see a first_committed > 0.

Fixes: #5484
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 1 Jul 2013 21:30:03 +0000 (14:30 -0700)]
librados: fix test warning on 32-bit platforms

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 29 Jun 2013 01:26:31 +0000 (18:26 -0700)]
os/FileStore: automatically enable 'filestore xattr use omap' as needed

Automatically enable the 'filestore xattr use omap' option if the fs
does not appear to handle large xattrs on its own.

This makes for a more pleasant user experience, as users are no longer told
to enable something that we already know they must enable in order to
continue.

Fixes: #5137
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 29 Jun 2013 00:45:21 +0000 (17:45 -0700)]
librados: add test for large and many xattrs

Verify that we can set large attrs, and large numbers of attrs, on an object.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Sat, 29 Jun 2013 05:03:21 +0000 (22:03 -0700)]
osd/PGLog: populate log_keys_debug from read_old_log()

Fixes: #5470
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Yehuda Sadeh [Mon, 1 Jul 2013 14:08:05 +0000 (07:08 -0700)]
Merge pull request #385 from kri5/wip-1779

rgw: Fix return value for swift user not found

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Christophe Courtaut [Mon, 1 Jul 2013 12:57:17 +0000 (14:57 +0200)]
rgw: Fix return value for swift user not found

http://tracker.ceph.com/issues/1779 fixes #1779

Adjust the return value from rgw_get_user_info_by_swift call
in RGW_SWIFT_Auth_Get::execute() to have the correct
return code in response.

Sage Weil [Sat, 29 Jun 2013 01:15:23 +0000 (18:15 -0700)]
osd: set maximum object attr size

Make a well-defined maximum size of an object attribute.  Since Linux has
a 64KB limit, and that is what we normally use to back this, use that as
the limit.  This means that even when leveldb is backing large xattrs
(as ext4 users must do) we will return EFBIG on >64KB setxattr attempts.

Signed-off-by: Sage Weil <sage@inktank.com>
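The limit amounts to a size check made before the value reaches either backing store. The constant and function names below are hypothetical; the check returns a negative errno, as Ceph code conventionally does.

```cpp
#include <cerrno>
#include <cstddef>

// 64KB, the limit described above (name is hypothetical).
constexpr std::size_t kMaxAttrValueSize = 64 * 1024;

// 0 if the value may be stored, -EFBIG if it is too large, regardless of
// whether the backing store (fs xattr or leveldb omap) could hold it.
int check_setxattr_size(std::size_t len) {
  return len > kMaxAttrValueSize ? -EFBIG : 0;
}
```

Applying the check up front is what makes the behavior uniform: an ext4 user whose large xattrs land in leveldb sees the same EFBIG as a user whose filesystem stores them natively.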
Sage Weil [Fri, 28 Jun 2013 21:20:28 +0000 (14:20 -0700)]
mds: log before respawning when standby-replay falls behind

Call into an MDS method so that we can write to the log.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 28 Jun 2013 21:24:08 +0000 (14:24 -0700)]
Merge remote-tracking branch 'gh/wip-5381' into wip-mds

Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Fri, 28 Jun 2013 19:21:58 +0000 (12:21 -0700)]
client: send all request puts through put_request()

Make sure all MetaRequest reference puts go through the same path that
releases inode references, including all of the error paths.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 28 Jun 2013 18:50:11 +0000 (11:50 -0700)]
client: fix remaining Inode::put() caller, and make method pseudo-private

Not sure I can make this actually private and make Client::put_inode() a
friend method (making all of Client a friend would defeat the purpose).
This works well enough, though!

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 28 Jun 2013 13:54:18 +0000 (06:54 -0700)]
librados: fix cmd OSDCommand test

If we get ENXIO, buflen will be 0.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 28 Jun 2013 04:39:35 +0000 (21:39 -0700)]
client: use put_inode on MetaRequest inode refs

When we drop the request inode refs, we need to use put_inode() to ensure
they get cleaned up properly (removed from inode_map, caps released, etc.).
Do this explicitly here (as we do with all other inode put() paths that
matter).

Fixes: #5381
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
John Wilkins [Thu, 27 Jun 2013 23:31:44 +0000 (16:31 -0700)]
doc: Created an install page for Calxeda development packages.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Greg Farnum [Thu, 27 Jun 2013 22:23:00 +0000 (15:23 -0700)]
Merge branch 'next'