git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years ago osd/PG: introduce flags to indicate explicitly requested scrubs
Sage Weil [Sat, 12 Jan 2013 17:15:16 +0000 (09:15 -0800)]
osd/PG: introduce flags to indicate explicitly requested scrubs

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago osd/PG: move scrub schedule registration into a helper
Sage Weil [Sat, 12 Jan 2013 17:14:01 +0000 (09:14 -0800)]
osd/PG: move scrub schedule registration into a helper

Simplifies callers, and will let us easily modify the decision of when
to schedule the PG for scrub.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago Revert "osdmap: spread replicas across hosts with default crush map"
Sage Weil [Mon, 14 Jan 2013 15:37:59 +0000 (07:37 -0800)]
Revert "osdmap: spread replicas across hosts with default crush map"

This reverts commit 7ea5d84fa3d0ed3db61eea7eb9fa8dbee53244b6.

In its current state, this breaks both teuthology and vstart.

12 years ago mon: OSDMonitor: don't output to stdout in plain text if json is specified
Joao Eduardo Luis [Thu, 10 Jan 2013 18:54:12 +0000 (18:54 +0000)]
mon: OSDMonitor: don't output to stdout in plain text if json is specified

Fixes: #3748
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago osdmap: spread replicas across hosts with default crush map
Sage Weil [Sat, 12 Jan 2013 01:23:22 +0000 (17:23 -0800)]
osdmap: spread replicas across hosts with default crush map

This is more often the case than not, and we don't have a good way to
magically know what size of cluster the user will be creating.  Better to
err on the side of doing the right thing for more people.

Fixes: #3785
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years ago rbd: Fix tabs
Dan Mick [Sat, 12 Jan 2013 00:18:10 +0000 (16:18 -0800)]
rbd: Fix tabs

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago doc: Updates to CRUSH paper.
John Wilkins [Fri, 11 Jan 2013 23:56:02 +0000 (15:56 -0800)]
doc: Updates to CRUSH paper.

fixes: 3329, 3707, 3711, 3389

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago rbd: make 'add' modprobe rbd so it has a chance of success
Dan Mick [Fri, 11 Jan 2013 02:46:13 +0000 (18:46 -0800)]
rbd: make 'add' modprobe rbd so it has a chance of success

Check for existence of /sys/bus/rbd first to avoid unnecessary calls

Fixes: #3784
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
12 years ago rbd: call udevadm settle on map/unmap
Dan Mick [Fri, 11 Jan 2013 02:44:44 +0000 (18:44 -0800)]
rbd: call udevadm settle on map/unmap

When we map/unmap devices, udev gets called to manage device nodes;
this will allow the command to wait for those manipulations to complete,
particularly for test runs, so that the device tree is stable by the
time the command exits.

--no-settle is also provided to avoid this behavior if desired (say,
for a series of 'map' commands, perhaps the user wants to wait for
settling only on the last of the series).

Fixes: #3635
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
12 years ago OSD: only trim up to the oldest map still in use by a pg
Samuel Just [Fri, 11 Jan 2013 19:02:15 +0000 (11:02 -0800)]
OSD: only trim up to the oldest map still in use by a pg

map_cache.cached_lb() provides us with a lower bound across
all pgs for in-use osdmaps.  We cannot trim past this since
those maps are still in use.

backport: bobtail
Fixes: #3770
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
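
The bound described in this commit can be pictured as a clamp on the trim target. The sketch below is illustrative only: `trim_stored_maps`, the `std::deque` of epochs, and the parameter names are hypothetical stand-ins, and `cached_lb` is passed in as a plain value rather than read from `map_cache.cached_lb()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

using epoch_t = uint32_t;  // assumption: epochs are plain unsigned integers here

// Drop stored epochs strictly older than min(trim_to, cached_lb).
// `stored` is assumed sorted ascending; returns how many epochs were dropped.
std::size_t trim_stored_maps(std::deque<epoch_t>& stored,
                             epoch_t trim_to, epoch_t cached_lb) {
  assert(std::is_sorted(stored.begin(), stored.end()));
  // Never trim past the oldest map that a PG may still be using.
  const epoch_t bound = std::min(trim_to, cached_lb);
  std::size_t dropped = 0;
  while (!stored.empty() && stored.front() < bound) {
    stored.pop_front();
    ++dropped;
  }
  return dropped;
}
```

Whether the real trim is inclusive or exclusive of the bound is not stated in the commit message; the strict comparison here is an assumption made for the sketch.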
12 years ago OSD: check for empty command in do_command
Samuel Just [Fri, 11 Jan 2013 18:44:04 +0000 (10:44 -0800)]
OSD: check for empty command in do_command

Fixes: #3878
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years ago Merge pull request #32 from imjustmatthew/imjustmatthew_docs
John Wilkins [Fri, 11 Jan 2013 20:09:25 +0000 (12:09 -0800)]
Merge pull request #32 from imjustmatthew/imjustmatthew_docs

Correct typo in mon docs 'ceph.com' to 'ceph.conf'

12 years ago Correct typo in mon docs 'ceph.com' to 'ceph.conf' 32/head
Matthew Roy [Fri, 11 Jan 2013 19:59:53 +0000 (14:59 -0500)]
Correct typo in mon docs 'ceph.com' to 'ceph.conf'

12 years ago mon: Monitor: only schedule a timecheck after election if we are not alone
Joao Eduardo Luis [Fri, 11 Jan 2013 18:04:01 +0000 (18:04 +0000)]
mon: Monitor: only schedule a timecheck after election if we are not alone

Fixes: #3790
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago Merge remote-tracking branch 'gh/wip-3633'
Sage Weil [Fri, 11 Jan 2013 02:05:27 +0000 (18:05 -0800)]
Merge remote-tracking branch 'gh/wip-3633'

Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago mon: Monitor: unify 'ceph health' and 'ceph status'; add json output
Joao Eduardo Luis [Wed, 2 Jan 2013 16:07:33 +0000 (16:07 +0000)]
mon: Monitor: unify 'ceph health' and 'ceph status'; add json output

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago mon: Monitor: use 'else if' on handle_command instead of bunches of 'if'
Joao Eduardo Luis [Wed, 2 Jan 2013 14:45:01 +0000 (14:45 +0000)]
mon: Monitor: use 'else if' on handle_command instead of bunches of 'if'

... when the options are mutually exclusive.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago mon: Monitor: move a couple of if's together on handle_command()
Joao Eduardo Luis [Wed, 2 Jan 2013 14:42:08 +0000 (14:42 +0000)]
mon: Monitor: move a couple of if's together on handle_command()

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago mon: Monitor: reduce indentation level; make code more readable
Joao Eduardo Luis [Wed, 2 Jan 2013 14:39:51 +0000 (14:39 +0000)]
mon: Monitor: reduce indentation level; make code more readable

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago mon: Monitor: add timecheck infrastructure to detect clock skews
Joao Eduardo Luis [Thu, 27 Dec 2012 20:11:33 +0000 (20:11 +0000)]
mon: Monitor: add timecheck infrastructure to detect clock skews

Fixes: #3633
Fixes: #3695
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago messages: add MTimeCheck
Joao Eduardo Luis [Wed, 26 Dec 2012 11:43:42 +0000 (11:43 +0000)]
messages: add MTimeCheck

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago doc: Added -a option. Should work without from server, as described.
John Wilkins [Fri, 11 Jan 2013 00:03:02 +0000 (16:03 -0800)]
doc: Added -a option. Should work without from server, as described.

fixes: #3750

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago doc: Normalized to term "drive" rather than disk. Changed "(Manual)" entry on remove...
John Wilkins [Thu, 10 Jan 2013 23:59:59 +0000 (15:59 -0800)]
doc: Normalized to term "drive" rather than disk. Changed "(Manual)" entry on remove OSD.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago Merge branch 'next'
Samuel Just [Thu, 10 Jan 2013 23:06:19 +0000 (15:06 -0800)]
Merge branch 'next'

12 years ago rados: add truncate support
Samuel Just [Fri, 4 Jan 2013 05:13:44 +0000 (21:13 -0800)]
rados: add truncate support

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years ago config_opts.h: default osd_recovery_delay_start to 0
Samuel Just [Thu, 10 Jan 2013 19:06:02 +0000 (11:06 -0800)]
config_opts.h: default osd_recovery_delay_start to 0

This setting was intended to prevent recovery from overwhelming peering traffic
by delaying the recovery_wq until osd_recovery_delay_start seconds after pgs
stop being added to it.  This should be less necessary now that recovery
messages are sent with strictly lower priority than peering messages.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Gregory Farnum <greg@inktank.com>
12 years ago ReplicatedPG: fix snapdir trimming
Samuel Just [Thu, 10 Jan 2013 03:17:23 +0000 (19:17 -0800)]
ReplicatedPG: fix snapdir trimming

The previous logic was both complicated and not correct.  Consequently,
we have been tending to drop snapcollection links in some cases.  This
has resulted in clones incorrectly not being trimmed.  This patch
replaces the logic with something less efficient but hopefully a bit
clearer.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago Revert "rgw: fix handler leak in handle_request"
Yehuda Sadeh [Thu, 10 Jan 2013 18:14:11 +0000 (10:14 -0800)]
Revert "rgw: fix handler leak in handle_request"

This reverts commit eba314a811cd98a79f483dc7a9128fe76c722c78.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years ago Merge pull request #31 from chrisglass/expose_cluster_stats_to_python
Gregory Farnum [Thu, 10 Jan 2013 18:09:25 +0000 (10:09 -0800)]
Merge pull request #31 from chrisglass/expose_cluster_stats_to_python

Added python wrapper to rados_cluster_stat

12 years ago Added python wrapper to rados_cluster_stat 31/head
Chris Glass [Thu, 10 Jan 2013 13:43:49 +0000 (14:43 +0100)]
Added python wrapper to rados_cluster_stat

The new get_cluster_stats() method on the rados.Rados object calls
the rados_cluster_stat() function in the librados library.

Signed-off-by: Christopher Glass <christopher.glass@canonical.com>
12 years ago rbd: allow copy of zero-length images. Includes simple test.
Dan Mick [Wed, 9 Jan 2013 22:50:48 +0000 (14:50 -0800)]
rbd: allow copy of zero-length images.  Includes simple test.

Fixes: #3765
Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago doc/install/debian.rst: fix typo in link ref; broke doc build
Dan Mick [Thu, 10 Jan 2013 00:10:36 +0000 (16:10 -0800)]
doc/install/debian.rst: fix typo in link ref; broke doc build

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago Merge branch 'next'
Dan Mick [Wed, 9 Jan 2013 23:11:36 +0000 (15:11 -0800)]
Merge branch 'next'

Want to get various rbd-related fixes together for upgrade testing

12 years ago ReplicatedPG: increment scrubber.errors rather than errors
Samuel Just [Wed, 9 Jan 2013 22:39:00 +0000 (14:39 -0800)]
ReplicatedPG: increment scrubber.errors rather than errors

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago librados: add aio stat tests
Filippos Giannakos [Thu, 20 Dec 2012 20:05:11 +0000 (22:05 +0200)]
librados: add aio stat tests

Implement simple write-stat test, and a write-stat-remove-stat test cycle.

Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago librados: implement aio_stat
Filippos Giannakos [Thu, 20 Dec 2012 20:05:10 +0000 (22:05 +0200)]
librados: implement aio_stat

Implement aio stat and also export this functionality to the C API.

Signed-off-by: Filippos Giannakos <philipgian@grnet.gr>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago osd: make missing head non-fatal during scrub
Sage Weil [Wed, 2 Jan 2013 17:39:26 +0000 (09:39 -0800)]
osd: make missing head non-fatal during scrub

If we encounter a scrub without a preceding head, warn instead of
crashing.  Note that this is still something we can't repair.

See #3705.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago rgw: Fix crash when FastCGI frontend doesn't set SCRIPT_URI
Sylvain Munaut [Mon, 7 Jan 2013 12:13:49 +0000 (13:13 +0100)]
rgw: Fix crash when FastCGI frontend doesn't set SCRIPT_URI

Fixes: #3735
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago rgw: fix handler leak in handle_request
caleb miles [Tue, 8 Jan 2013 20:56:00 +0000 (15:56 -0500)]
rgw: fix handler leak in handle_request

Fixes: #3682
Signed-off-by: caleb miles <caleb.miles@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago librbd: Allow get_lock_info to fail
Dan Mick [Tue, 8 Jan 2013 19:21:22 +0000 (11:21 -0800)]
librbd: Allow get_lock_info to fail

If the lock class isn't present, EOPNOTSUPP is returned for lock calls
on newer OSDs, but sadly EIO on older; we need to treat both as
acceptable failures for RBD images.  rados lock list will still fail.

Fixes #3744.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
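
The error handling this commit describes might be sketched as a small filter on the return code. `tolerate_missing_lock_class` is a hypothetical name, not a librbd function, and the sketch assumes the usual convention of 0 for success and a negative errno for failure.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper: treat "lock class not available" results as success (0)
// and pass every other return code through unchanged.
int tolerate_missing_lock_class(int r) {
  assert(r <= 0);  // assumption: 0 on success, negative errno on failure
  // Newer OSDs report -EOPNOTSUPP, older ones -EIO, per the commit message.
  if (r == -EOPNOTSUPP || r == -EIO) {
    return 0;
  }
  return r;
}
```

Accepting -EIO has a cost: a genuine I/O error from the lock call would be masked as well, which is presumably why the commit message calls the older behavior unfortunate.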
12 years ago doc/release-notes: v0.48.3argonaut
Sage Weil [Wed, 9 Jan 2013 02:21:12 +0000 (18:21 -0800)]
doc/release-notes: v0.48.3argonaut

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago doc/install: new URLs for argonaut vs bobtail
Sage Weil [Tue, 8 Jan 2013 04:51:04 +0000 (20:51 -0800)]
doc/install: new URLs for argonaut vs bobtail

Also restructure the document a bit to make the choice of packages more
clear.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago doc/release-notes: v0.56.1
Sage Weil [Tue, 8 Jan 2013 04:46:31 +0000 (20:46 -0800)]
doc/release-notes: v0.56.1

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago Merge branch 'wip-stripe-gran'
Noah Watkins [Tue, 8 Jan 2013 00:14:16 +0000 (16:14 -0800)]
Merge branch 'wip-stripe-gran'

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years ago test: enforce -ENOTCONN contract in libcephfs
Noah Watkins [Mon, 7 Jan 2013 23:50:36 +0000 (15:50 -0800)]
test: enforce -ENOTCONN contract in libcephfs

Tests all relevant calls for -ENOTCONN when used with an unmounted
ceph_mount_info param.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago libcephfs: return -ENOTCONN when call unmounted
Noah Watkins [Mon, 7 Jan 2013 23:47:38 +0000 (15:47 -0800)]
libcephfs: return -ENOTCONN when call unmounted

Adds -ENOTCONN return value for stat, fchmod, fchown, lchown.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago PG: set DEGRADED in Active AdvMap handler based on pool size
Samuel Just [Mon, 7 Jan 2013 23:02:34 +0000 (15:02 -0800)]
PG: set DEGRADED in Active AdvMap handler based on pool size

Otherwise, if the acting set does not change, the pg might
not show up as degraded if the pool size now exceeds the
acting set size.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago libcephfs: clarify interface return value
Noah Watkins [Mon, 7 Jan 2013 23:04:33 +0000 (15:04 -0800)]
libcephfs: clarify interface return value

Document that ceph_get_stripe_unit_granularity may return an error code
(e.g. -ENOTCONN). The interface requires a mount, but currently we
return a compile-time constant. Other error codes may be possible in the
future.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago Merge branch 'next'
Sage Weil [Mon, 7 Jan 2013 21:12:33 +0000 (13:12 -0800)]
Merge branch 'next'

12 years ago Merge branch 'wip-3678-b' into next
Sage Weil [Mon, 7 Jan 2013 21:04:13 +0000 (13:04 -0800)]
Merge branch 'wip-3678-b' into next

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago msg/Pipe: prepare Message data for wire under pipe_lock
Sage Weil [Sun, 6 Jan 2013 16:38:27 +0000 (08:38 -0800)]
msg/Pipe: prepare Message data for wire under pipe_lock

We cannot trust the Message bufferlists or other structures to be
stable without pipe_lock, as another Pipe may claim and modify the sent
list items while we are writing to the socket.

Related to #3678.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago msgr: update Message envelope in encode, not write_message
Sage Weil [Sun, 6 Jan 2013 16:33:01 +0000 (08:33 -0800)]
msgr: update Message envelope in encode, not write_message

Fill out the Message header, footer, and calculate CRCs during
encoding, not write_message().  This removes most modifications from
Pipe::write_message().

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago osdc/Objecter: fix linger_ops iterator invalidation on pool deletion
Sage Weil [Mon, 7 Jan 2013 20:58:39 +0000 (12:58 -0800)]
osdc/Objecter: fix linger_ops iterator invalidation on pool deletion

The call to check_linger_pool_dne() may unregister the linger request,
invalidating the iterator.  To avoid this, increment the iterator at
the top of the loop.

This mirrors the fix in 4bf9078286d58c2cd4e85cb8b31411220a377092 for
regular non-linger ops.

Fixes: #3734
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
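
The "increment the iterator at the top of the loop" pattern from this commit can be illustrated on a plain `std::map`. This is a sketch under assumptions, not the Objecter code: `LingerOps` and `unregister_ops_for_pool` are hypothetical names, and the pool check stands in for whatever `check_linger_pool_dne()` decides.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using LingerOps = std::map<int, int>;  // hypothetical: linger id -> pool id

// Remove every op that targets `deleted_pool`; returns the ids removed.
std::vector<int> unregister_ops_for_pool(LingerOps& ops, int deleted_pool) {
  const std::size_t before = ops.size();
  std::vector<int> removed;
  auto p = ops.begin();
  while (p != ops.end()) {
    auto cur = p;
    ++p;  // advance first, so erasing `cur` below cannot invalidate `p`
    if (cur->second == deleted_pool) {
      removed.push_back(cur->first);
      ops.erase(cur);  // for std::map this invalidates only `cur`
    }
  }
  assert(ops.size() + removed.size() == before);
  return removed;
}
```

Advancing before the body runs works here because erasing from a `std::map` leaves all other iterators valid; the same trick would not be safe for a `std::vector`.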
12 years ago ceph-fuse: rename ceph_ll_* to fuse_ll_*
David Zafman [Fri, 4 Jan 2013 21:37:20 +0000 (13:37 -0800)]
ceph-fuse: rename ceph_ll_* to fuse_ll_*

To not conflict with future linuxbox pull for nfs-ganesha.

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago msg/Pipe: encode message inside pipe_lock
Sage Weil [Sun, 6 Jan 2013 16:25:40 +0000 (08:25 -0800)]
msg/Pipe: encode message inside pipe_lock

This modifies bufferlists in the Message struct, and it is possible
for multiple instances of the Pipe to get references on the Message;
make sure they don't modify those bufferlists concurrently.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago msg/Pipe: associate sending msgs to con inside lock
Sage Weil [Sat, 5 Jan 2013 18:39:08 +0000 (10:39 -0800)]
msg/Pipe: associate sending msgs to con inside lock

Associate a sending message with the connection inside the pipe_lock.
This way if a racing thread tries to steal these messages it will
be sure to reset the con pointer *after* we do such that the con
pointer is valid in encode_payload() (and later).

This may be part of #3678.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago msg/Pipe: fix msg leak in requeue_sent()
Sage Weil [Sat, 5 Jan 2013 17:29:50 +0000 (09:29 -0800)]
msg/Pipe: fix msg leak in requeue_sent()

The sent list owns a reference to each message.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago os/FileJournal: include limits.h
Sage Weil [Sun, 6 Jan 2013 04:53:49 +0000 (20:53 -0800)]
os/FileJournal: include limits.h

Needed for IOV_MAX.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago java: add stripe unit granularity tests
Noah Watkins [Sat, 5 Jan 2013 19:17:58 +0000 (11:17 -0800)]
java: add stripe unit granularity tests

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago java: update javadoc comments
Noah Watkins [Sat, 5 Jan 2013 19:12:25 +0000 (11:12 -0800)]
java: update javadoc comments

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago java: fix whitespace
Noah Watkins [Sat, 5 Jan 2013 19:10:53 +0000 (11:10 -0800)]
java: fix whitespace

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago java: add support for get_stripe_unit_granularity
Joe Buck [Sat, 5 Jan 2013 01:33:26 +0000 (17:33 -0800)]
java: add support for get_stripe_unit_granularity

Signed-off-by: Joe Buck <jbbuck@gmail.com>
Reviewed-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago libcephfs: expose stripe unit granularity
Noah Watkins [Fri, 4 Jan 2013 22:57:20 +0000 (14:57 -0800)]
libcephfs: expose stripe unit granularity

Assists clients in choosing layout parameters.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
12 years ago Merge branch 'next'
Sage Weil [Sat, 5 Jan 2013 04:48:12 +0000 (20:48 -0800)]
Merge branch 'next'

12 years ago osd: special case CALL op to not have RD bit effects
Sage Weil [Sat, 5 Jan 2013 01:43:41 +0000 (17:43 -0800)]
osd: special case CALL op to not have RD bit effects

In commit 20496b8d2b2c3779a771695c6f778abbdb66d92a we treat a CALL as
different from a normal "read", but we did not adjust the behavior
determined by the RD bit in the op.  We tried to fix that in
91e941aef9f55425cc12204146f26d79c444cfae, but changing the op code breaks
compatibility, so that was reverted.

Instead, special-case CALL in the helper--the only point in the code that
actually checks for the RD bit.  (And fix one lingering user to use that
helper appropriately.)

Fixes: #3731
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
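
The approach in this commit, keeping the op code unchanged and special-casing CALL in the one helper that tests the RD bit, might look roughly like the sketch below. All constants and the helper name `op_mode_read` are hypothetical values chosen for illustration; they are not Ceph's real encodings.

```cpp
#include <cstdint>

// Hypothetical encodings for illustration only.
constexpr uint16_t MODE_RD = 0x1000;
constexpr uint16_t MODE_WR = 0x2000;
constexpr uint16_t OP_READ = MODE_RD | 1;
constexpr uint16_t OP_WRITE = MODE_WR | 1;
constexpr uint16_t OP_CALL = MODE_RD | 2;  // keeps the RD bit for wire compatibility

// The single place that interprets the RD bit: CALL carries the bit in its
// op code but must not get the behavior of a normal read.
constexpr bool op_mode_read(uint16_t op) {
  return (op & MODE_RD) != 0 && op != OP_CALL;
}

static_assert(!op_mode_read(OP_CALL), "CALL must not be treated as a read");
```

Because the op code itself is untouched, old clients and servers still agree on the value sent over the wire; only the local interpretation changes.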
12 years ago Revert "OSD: remove RD flag from CALL ops"
Sage Weil [Sat, 5 Jan 2013 04:46:48 +0000 (20:46 -0800)]
Revert "OSD: remove RD flag from CALL ops"

This reverts commit 91e941aef9f55425cc12204146f26d79c444cfae.

We cannot change this op code without breaking compatibility
with old code (client and server).  We'll have to special case
this op code instead.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years ago libcephfs: delete client after messenger shutdown
Noah Watkins [Fri, 4 Jan 2013 22:15:31 +0000 (14:15 -0800)]
libcephfs: delete client after messenger shutdown

Prevents a race in which messages are dispatched to the client after the
client has been freed.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago rbd: Don't call ProgressContext's finish() if there's an error.
Dan Mick [Sat, 5 Jan 2013 02:00:24 +0000 (18:00 -0800)]
rbd: Don't call ProgressContext's finish() if there's an error.

do_copy was different from the others; call pc.fail() on error and
do not call pc.finish().

Fixes: #3729
Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago ReplicatedPG: remove old-head optimization from push_to_replica
Samuel Just [Fri, 4 Jan 2013 20:43:52 +0000 (12:43 -0800)]
ReplicatedPG: remove old-head optimization from push_to_replica

This optimization allowed the primary to push a clone as a single push in the
case that the head object on the replica is old and happens to be at the same
version as the clone.  In general, using head in clone_subsets is tricky since
we might be writing to head during the push.  calc_clone_subsets does not
consider head (probably for this reason).  Handling the clone from head case
properly would require blocking writes on head in the interim which is probably
a bad trade off anyway.

Because the old-head optimization only comes into play if the replica's state
happens to fall on the last write to head prior to the snap that caused the
clone in question, it's not worth the complexity.

Fixes: #3698
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago Merge remote branch 'origin/wip-rbd-watch'
Josh Durgin [Fri, 4 Jan 2013 21:37:29 +0000 (13:37 -0800)]
Merge remote branch 'origin/wip-rbd-watch'

Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years ago os/FileStore: fix non-btrfs op_seq commit order
Sage Weil [Fri, 4 Jan 2013 01:15:07 +0000 (17:15 -0800)]
os/FileStore: fix non-btrfs op_seq commit order

The op_seq file is the starting point for journal replay.  For stable btrfs
commit mode, which is using a snapshot as a reference, we should write this
file before we take the snap.  We normally ignore current/ contents anyway.

On non-btrfs file systems, however, we should only write this file *after*
we do a full sync, and we should then fsync(2) it before we continue
(and potentially trim anything from the journal).

This fixes a serious bug that could cause data loss and corruption after
a power loss event.  For a 'kill -9' or crash, however, there was little
risk, since the writes were still captured by the host's cache.

Fixes: #3721
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago doc: Removed the --without-tcmalloc flag until further advised.
John Wilkins [Fri, 4 Jan 2013 00:13:13 +0000 (16:13 -0800)]
doc: Removed the --without-tcmalloc flag until further advised.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago Merge pull request #30 from rca/master
Sage Weil [Fri, 4 Jan 2013 00:07:59 +0000 (16:07 -0800)]
Merge pull request #30 from rca/master

Minor clarification in docs.

12 years ago doc: Added defaults for PGs, links to recommended settings, and updated note on split...
John Wilkins [Thu, 3 Jan 2013 22:51:33 +0000 (14:51 -0800)]
doc: Added defaults for PGs, links to recommended settings, and updated note on splitting.

Fixes: #3555
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago OSD: for old osds, dispatch peering messages immediately
Samuel Just [Thu, 3 Jan 2013 17:59:45 +0000 (09:59 -0800)]
OSD: for old osds, dispatch peering messages immediately

Normally, we batch up peering messages until the end of
process_peering_events to allow us to combine many notifies, etc
to the same osd into the same message.  However, old osds assume
that the activation message (log or info) will be _dispatched
before the first sub_op_modify of the interval.  Thus, for those
peers, we need to send the peering messages before we drop the
pg lock, lest we issue a client repop from another thread before
the activation message is sent.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago doc: Added comments on --without-tcmalloc option when building Ceph.
John Wilkins [Thu, 3 Jan 2013 21:30:14 +0000 (13:30 -0800)]
doc: Added comments on --without-tcmalloc option when building Ceph.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago Update doc/rados/configuration/filesystem-recommendations.rst 30/head
rca [Thu, 3 Jan 2013 21:30:01 +0000 (13:30 -0800)]
Update doc/rados/configuration/filesystem-recommendations.rst

Clarified when it's necessary to use the setting:

filestore xattr use omap = true

12 years ago doc: Added some packages to the copyable line.
John Wilkins [Thu, 3 Jan 2013 21:29:20 +0000 (13:29 -0800)]
doc: Added some packages to the copyable line.

Fixes: #3686
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago doc: Fixed syntax error.
John Wilkins [Thu, 3 Jan 2013 21:28:06 +0000 (13:28 -0800)]
doc: Fixed syntax error.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago Merge remote-tracking branch 'gh/wip-3714-b' into next
Sage Weil [Thu, 3 Jan 2013 20:53:07 +0000 (12:53 -0800)]
Merge remote-tracking branch 'gh/wip-3714-b' into next

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago qa/workunit: Add dbench-short.sh for nfs suite
David Zafman [Thu, 3 Jan 2013 20:44:19 +0000 (12:44 -0800)]
qa/workunit:  Add dbench-short.sh for nfs suite

    A multi-client dbench run doesn't work over NFS,
    see bug #3718.  Make single client dbench available.

Signed-off-by: David Zafman <david.zafman@inktank.com>
12 years ago osd: move common active vs booting code into consume_map
Sage Weil [Thu, 3 Jan 2013 06:38:53 +0000 (22:38 -0800)]
osd: move common active vs booting code into consume_map

Push osdmaps to PGs in separate method from activate_map() (whose name
is becoming less and less accurate).

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago osd: let pgs process map advances before booting
Sage Weil [Thu, 3 Jan 2013 06:20:06 +0000 (22:20 -0800)]
osd: let pgs process map advances before booting

The OSD deliberately consumes and processes most OSDMaps from the time it
was down before it marks itself up, as this can be slow.  The new
threading code does this asynchronously in peering_wq, though, and
does not let it drain before booting the OSD.  The OSD can get into
a situation where it marks itself up but is not responsive or useful
because of the backlog, and only makes the situation worse by
generating more osdmaps as a result.

Fix this by calling activate_map() even when booting, and when booting
draining the peering_wq on each call.  This is harmless since we are
not yet processing actual ops; we only need to be async when active.

Fixes: #3714
Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago osd: drop oldest_last_clean from activate_map
Sage Weil [Thu, 3 Jan 2013 06:04:34 +0000 (22:04 -0800)]
osd: drop oldest_last_clean from activate_map

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago osd: drop unused variables from activate_map
Sage Weil [Thu, 3 Jan 2013 06:04:08 +0000 (22:04 -0800)]
osd: drop unused variables from activate_map

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago OSDMap: fix modifed -> modified typo
Sage Weil [Thu, 3 Jan 2013 05:09:07 +0000 (21:09 -0800)]
OSDMap: fix modifed -> modified typo

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago Merge remote-tracking branch 'gh/next'
Sage Weil [Thu, 3 Jan 2013 02:13:25 +0000 (18:13 -0800)]
Merge remote-tracking branch 'gh/next'

12 years ago log: fix locking typo/stupid for dump_recent()
Sage Weil [Wed, 2 Jan 2013 21:58:44 +0000 (13:58 -0800)]
log: fix locking typo/stupid for dump_recent()

We weren't locking m_flush_mutex properly, which in turn was leading to
racing threads calling dump_recent() and garbling the crash dump output.

Backport: bobtail, argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years ago Merge branch 'master' of https://github.com/ceph/ceph
John Wilkins [Wed, 2 Jan 2013 23:59:59 +0000 (15:59 -0800)]
Merge branch 'master' of https://github.com/ceph/ceph

12 years ago doc: Added a memory profiling section. Ported from the wiki.
John Wilkins [Wed, 2 Jan 2013 23:58:03 +0000 (15:58 -0800)]
doc: Added a memory profiling section. Ported from the wiki.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago doc: Added memory profiling to the index.
John Wilkins [Wed, 2 Jan 2013 23:57:22 +0000 (15:57 -0800)]
doc: Added memory profiling to the index.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago qa/workunit: Update pjd script to use new tarball
Sam Lang [Wed, 2 Jan 2013 20:39:12 +0000 (14:39 -0600)]
qa/workunit:  Update pjd script to use new tarball

The pjd script now uses the latest version of pjd
with an additional test for opening a non-existent
file.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago fuse: Fix cleanup code path on init failure
Sam Lang [Wed, 2 Jan 2013 22:07:13 +0000 (16:07 -0600)]
fuse: Fix cleanup code path on init failure

With the changes from 856f32ab, the cfuse.init call returns
a _positive_ errno, which was getting ignored.  Also, if an
error occurs during cfuse.init(), we need to tear down the client
mount.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years ago librbd: establish watch before reading header
Josh Durgin [Wed, 2 Jan 2013 22:15:24 +0000 (14:15 -0800)]
librbd: establish watch before reading header

This eliminates a window in which a race could occur when we have an
image open but no watch established. The previous fix (using
assert_version) did not work well with resend operations.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago Merge branch 'wip-journal-aio' into next
Sage Weil [Wed, 2 Jan 2013 21:42:22 +0000 (13:42 -0800)]
Merge branch 'wip-journal-aio' into next

Reviewed-by: Samuel Just <sam.just@inktank.com>
Backport: bobtail

12 years ago test_filejournal: optionally specify journal filename as an argument
Sage Weil [Sat, 29 Dec 2012 00:48:22 +0000 (16:48 -0800)]
test_filejournal: optionally specify journal filename as an argument

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago test_filejournal: test journaling bl with >IOV_MAX segments
Sage Weil [Sat, 29 Dec 2012 00:48:05 +0000 (16:48 -0800)]
test_filejournal: test journaling bl with >IOV_MAX segments

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago os/FileJournal: limit size of aio submission
Sage Weil [Sat, 29 Dec 2012 00:47:28 +0000 (16:47 -0800)]
os/FileJournal: limit size of aio submission

Limit size of each aio submission to IOV_MAX-1 (to be safe).  Take care to
only mark the last aio with the seq to signal completion.

Signed-off-by: Sage Weil <sage@inktank.com>
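
The batching rule in this commit can be sketched as follows: split a list of buffer segments into submissions no larger than a cap, and attach the journal seq only to the final submission. `AioBatch` and `split_for_aio` are hypothetical names, the cap is a parameter (a caller following the commit would pass IOV_MAX - 1), and using a seq of 0 to mean "no completion signal" is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct AioBatch {
  std::size_t first;  // index of the first segment in this submission
  std::size_t count;  // number of segments in this submission
  uint64_t seq;       // journal seq on the final batch, 0 otherwise (assumption)
};

// Split `num_segments` segments into submissions of at most `max_iov` each.
std::vector<AioBatch> split_for_aio(std::size_t num_segments, uint64_t seq,
                                    std::size_t max_iov) {
  assert(max_iov > 0);
  std::vector<AioBatch> batches;
  for (std::size_t pos = 0; pos < num_segments;) {
    const std::size_t n = std::min(max_iov, num_segments - pos);
    const bool last = (pos + n == num_segments);
    // Only the last submission carries the seq that signals completion.
    batches.push_back(AioBatch{pos, n, last ? seq : 0});
    pos += n;
  }
  return batches;
}
```

Marking only the last batch means a completion handler that watches for the seq cannot report the entry as written until every earlier piece has been submitted.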
12 years ago Revert "librbd: ensure header is up to date after initial read"
Josh Durgin [Wed, 2 Jan 2013 20:32:33 +0000 (12:32 -0800)]
Revert "librbd: ensure header is up to date after initial read"

Using assert version for linger ops doesn't work with retries,
since the version will change after the first send.
This reverts commit e1776809031c6dad441cfb2b9fac9612720b9083.

Conflicts:

qa/workunits/rbd/watch_correct_version.sh

12 years ago doc: Minor edits.
John Wilkins [Wed, 2 Jan 2013 19:24:39 +0000 (11:24 -0800)]
doc: Minor edits.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>