git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
11 years agoosd/ReplicatedPG: check clones for degraded 1688/head
Sage Weil [Thu, 17 Apr 2014 20:11:54 +0000 (13:11 -0700)]
osd/ReplicatedPG: check clones for degraded

We check whether the head is degraded, and we check whether a clone is
unreadable, but in the case where we have a cache op on a degraded object,
we don't check.  That leads to an assert when the repop hits the replica
and the object is in the peer's missing set.

Fix this by adding a check on the clone when write_ordered is true.  Note
that checking write_ordered is better than whether it is a cache op because
we want to preserve write ordering even for reads that are flagged by the
client.

Fixes: #8048
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1687 from ceph/wip-8130
Yehuda Sadeh [Thu, 17 Apr 2014 17:50:40 +0000 (10:50 -0700)]
Merge pull request #1687 from ceph/wip-8130

osdc/Objecter: fix osd target for newly-homeless op

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
11 years agoosdc/Objecter: fix osd target for newly-homeless op 1687/head
Sage Weil [Thu, 17 Apr 2014 17:48:26 +0000 (10:48 -0700)]
osdc/Objecter: fix osd target for newly-homeless op

If we recalculate the mapping and find that there is no primary, we need
to set the 'osd' field to -1.  Otherwise, the caller will try to resend
to a dead session with bad results.

This was introduced in the refactor 860d72770c.

Fixes: #8130
Signed-off-by: Sage Weil <sage@inktank.com>
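
The osdc/Objecter fix above can be illustrated with a minimal sketch. It is not the Objecter code: `OpTarget`, `recalc_target` and the `primaries` dictionary are hypothetical names used only to show why the `osd` field has to be reset to -1 when the recalculated mapping has no primary.

```python
from dataclasses import dataclass

NO_OSD = -1  # "no target" value, as in the commit message


@dataclass
class OpTarget:
    """Hypothetical stand-in for an op's cached mapping."""
    pg: str
    osd: int = NO_OSD


def recalc_target(target, primaries):
    """Recompute the primary for target.pg; return True if the op is now homeless."""
    primary = primaries.get(target.pg)
    if primary is None:
        # Without this reset the stale OSD id would survive, and the caller
        # would try to resend to a session that no longer exists.
        target.osd = NO_OSD
        return True
    target.osd = primary
    return False


op = OpTarget(pg="1.a", osd=3)
homeless = recalc_target(op, primaries={})  # the new map has no primary for 1.a
```

After the call, `op.osd` is -1, so a caller that checks the field can park the op instead of resending it.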
11 years agoMerge pull request #1684 from onlyjob/debian
Sage Weil [Thu, 17 Apr 2014 17:07:40 +0000 (10:07 -0700)]
Merge pull request #1684 from onlyjob/debian

spelling corrections

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1671 from ceph/wip-7699
Sage Weil [Thu, 17 Apr 2014 17:05:22 +0000 (10:05 -0700)]
Merge pull request #1671 from ceph/wip-7699

mds: Fix respawn (add path resolution)

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1677 from ceph/wip-poolset-noblock
Sage Weil [Thu, 17 Apr 2014 17:03:26 +0000 (10:03 -0700)]
Merge pull request #1677 from ceph/wip-poolset-noblock

mon: Don't block on EAGAIN from `osd pool set`

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomon: EBUSY instead of EAGAIN when pgs creating 1677/head
John Spray [Thu, 17 Apr 2014 14:28:22 +0000 (15:28 +0100)]
mon: EBUSY instead of EAGAIN when pgs creating

In 69321bf, EAGAIN changed behaviour to block indefinitely
rather than returning to user.  Change the return for
`osd pool set` operations that are blocked by creating PGs
to return EBUSY instead of EAGAIN, so that they are excepted
from this blocking behaviour.

Signed-off-by: John Spray <john.spray@inktank.com>
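
A rough sketch of why the errno choice in the commit above matters, assuming a client that keeps resending while it sees EAGAIN. `run_command` and the `max_attempts` bound are hypothetical; the bound exists only so the example terminates, whereas the commit describes the real behaviour as blocking indefinitely.

```python
import errno


def run_command(send, max_attempts=3):
    """Resend while the reply is -EAGAIN; return (result, attempts_used)."""
    attempts = 0
    while True:
        attempts += 1
        ret = send()
        if ret == -errno.EAGAIN and attempts < max_attempts:
            continue  # "try again": the client keeps waiting
        return ret, attempts


busy = run_command(lambda: -errno.EBUSY)    # returned to the user at once
again = run_command(lambda: -errno.EAGAIN)  # retried until the bound is hit
```

With EBUSY the caller gets control back after a single attempt and can decide what to do while the PGs are still being created.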
11 years agoMerge pull request #1675 from guangyy/wip-bench
Gregory Farnum [Thu, 17 Apr 2014 04:57:41 +0000 (21:57 -0700)]
Merge pull request #1675 from guangyy/wip-bench

Make rados/rest bench work for multiple write instances without metadata conflict.

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agospelling corrections 1684/head
Dmitry Smirnov [Thu, 17 Apr 2014 02:43:30 +0000 (12:43 +1000)]
spelling corrections

11 years agoMerge pull request #1681 from ceph/wip-8043
Samuel Just [Thu, 17 Apr 2014 01:16:11 +0000 (18:16 -0700)]
Merge pull request #1681 from ceph/wip-8043

mon/OSDMonitor: require force argument to split a cache pool

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1682 from ceph/wip-8020
Sage Weil [Thu, 17 Apr 2014 01:13:01 +0000 (18:13 -0700)]
Merge pull request #1682 from ceph/wip-8020

OSD: split pg stats during pg split

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoOSD: split pg stats during pg split 1682/head
Samuel Just [Mon, 7 Apr 2014 23:37:46 +0000 (16:37 -0700)]
OSD: split pg stats during pg split

Fixes: #8020
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoosd_types::osd_stat_sum_t: fix floor for num_objects_omap
Samuel Just [Thu, 17 Apr 2014 01:04:35 +0000 (18:04 -0700)]
osd_types::osd_stat_sum_t: fix floor for num_objects_omap

Introduced in a130a4452e4fb159dc62fb417077d98dc9ebd621
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge branch 'wip-8100'
David Zafman [Wed, 16 Apr 2014 22:09:09 +0000 (15:09 -0700)]
Merge branch 'wip-8100'

Reviewed-by: Mark Nelson <mark.nelson@inktank.com>
11 years agocommon/obj_bencher: Fix error return check from read that is negative on error
David Zafman [Wed, 16 Apr 2014 21:02:13 +0000 (14:02 -0700)]
common/obj_bencher: Fix error return check from read that is negative on error

Fixed read return value in d99f1d9f68db41231e0ffff4082b05d6d095c231

Fixes: #8100
Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agoMerge pull request #1680 from ceph/wip-7786
Sage Weil [Wed, 16 Apr 2014 18:49:58 +0000 (11:49 -0700)]
Merge pull request #1680 from ceph/wip-7786

civetweb: update subproject

11 years agoosd/ReplicatedPG: add missing whitespace in debug output
David Zafman [Wed, 16 Apr 2014 18:08:23 +0000 (11:08 -0700)]
osd/ReplicatedPG: add missing whitespace in debug output

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agoUse string instead of char* when saving arguments for rest-bench 1675/head
Guang Yang [Wed, 16 Apr 2014 01:28:16 +0000 (01:28 +0000)]
Use string instead of char* when saving arguments for rest-bench

11 years agomon/OSDMonitor: require force argument to split a cache pool 1681/head
Sage Weil [Tue, 15 Apr 2014 20:57:21 +0000 (13:57 -0700)]
mon/OSDMonitor: require force argument to split a cache pool

There are several perils when splitting a cache pool:

 - split invalidates pg stats, which disables the agent
 - a scrub must be manually triggered post-split to rebuild stats
 - the pool may fill the OSDs during that period.
 - or, the pool may end up beyond the 'full' mark and once scrub does
   complete and the agent activates we may block IO for a long time while
   we catch up with flush/evict

Make it a bit harder for users to shoot themselves in the foot.

Fixes: #8043
Signed-off-by: Sage Weil <sage@inktank.com>
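
The guard described in the commit above might look roughly like the following. This is a sketch only: the `pool` dictionary, the `set_pg_num` helper and the choice of EPERM as the refusal code are assumptions made for illustration, not taken from the monitor code.

```python
import errno


def set_pg_num(pool, new_pg_num, force=False):
    """Return 0 on success or a negative errno (EPERM is an assumed choice)."""
    is_split = new_pg_num > pool["pg_num"]
    if pool["is_cache_pool"] and is_split and not force:
        # Refuse by default: the split invalidates pg stats and disables the agent.
        return -errno.EPERM
    pool["pg_num"] = new_pg_num
    return 0


cache = {"is_cache_pool": True, "pg_num": 8}
refused = set_pg_num(cache, 16)
forced = set_pg_num(cache, 16, force=True)
```

The operator can still split the pool, but only by stating the intent explicitly.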
11 years agomds: Fix respawn (add path resolution) 1671/head
John Spray [Mon, 14 Apr 2014 16:14:42 +0000 (17:14 +0100)]
mds: Fix respawn (add path resolution)

Previously assumed that ceph-mds executable was in
PWD - now use /proc/self/exe to find the
executable wherever it may be.  Leave in old version
as a fallback for non-linux environments.

Also add a 'respawn' command so that it's easy to test
respawn with `ceph mds tell <id> respawn`

Fixes: #7966
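
The path resolution described in the respawn commit above can be sketched as follows. The function name `find_own_executable` is hypothetical, and the fallback only approximates the "old version" by joining the current directory with the program name.

```python
import os
import sys


def find_own_executable(argv0):
    """Resolve the running binary via /proc/self/exe, else assume it is in PWD."""
    try:
        # On Linux this symlink points at the executable of the current process.
        return os.readlink("/proc/self/exe")
    except OSError:
        # Fallback for environments without /proc: the pre-fix assumption.
        return os.path.join(os.getcwd(), os.path.basename(argv0))


exe_path = find_own_executable(sys.argv[0] or "ceph-mds")
```

Either branch yields an absolute path, which is what a later exec of the same binary needs.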
11 years agoMake rados/rest bench work for multiple write instances without metadata conflict.
Guang Yang [Tue, 15 Apr 2014 07:48:37 +0000 (07:48 +0000)]
Make rados/rest bench work for multiple write instances without metadata conflict.
Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
11 years agoMerge pull request #1666 from ceph/wip-mds
Yan, Zheng [Tue, 15 Apr 2014 00:13:01 +0000 (08:13 +0800)]
Merge pull request #1666 from ceph/wip-mds

Wip mds

11 years agoMerge pull request #1673 from ceph/wip-stress-watch
Samuel Just [Mon, 14 Apr 2014 23:12:31 +0000 (16:12 -0700)]
Merge pull request #1673 from ceph/wip-stress-watch

ceph_test_stress_watch: test over cache pool

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #1667 from ceph/wip-8089
Samuel Just [Mon, 14 Apr 2014 23:11:47 +0000 (16:11 -0700)]
Merge pull request #1667 from ceph/wip-8089

osd: fix dup request handling for ENOENT and cache ops

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1654 from ceph/wip-7940
Samuel Just [Mon, 14 Apr 2014 23:10:42 +0000 (16:10 -0700)]
Merge pull request #1654 from ceph/wip-7940

Wip 7940

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1664 from ceph/wip-8085
Samuel Just [Mon, 14 Apr 2014 23:09:50 +0000 (16:09 -0700)]
Merge pull request #1664 from ceph/wip-8085

osd: handle misdirected pg command

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1660 from ceph/wip-hitset-missing
Samuel Just [Mon, 14 Apr 2014 23:07:41 +0000 (16:07 -0700)]
Merge pull request #1660 from ceph/wip-hitset-missing

osd: handle hitset-get on a missing hit_set object

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoceph_test_stress_watch: test over cache pool 1673/head
Sage Weil [Mon, 14 Apr 2014 22:57:28 +0000 (15:57 -0700)]
ceph_test_stress_watch: test over cache pool

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1661 from ceph/wip-objecter
Josh Durgin [Mon, 14 Apr 2014 22:36:10 +0000 (15:36 -0700)]
Merge pull request #1661 from ceph/wip-objecter

objecter: make linger watch the correct pool/object

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #1672 from ceph/wip-strerror
Josh Durgin [Mon, 14 Apr 2014 20:57:36 +0000 (13:57 -0700)]
Merge pull request #1672 from ceph/wip-strerror

Use cpp_strerror() wherever possible, and use autoconf for portability

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoUse cpp_strerror() wherever possible, and use autoconf for portability 1672/head
Dan Mick [Wed, 9 Apr 2014 04:06:55 +0000 (21:06 -0700)]
Use cpp_strerror() wherever possible, and use autoconf for portability

strerror_r is not portable; on GNU libc it returns char * and sometimes
does not fill in the supplied buffer.  Use autoconf to test which
version this platform uses and adapt.

Clean up the random calls to strerror and strerror_r (along with all
their private little one-use buffers) and regularize the code to use
cpp_strerror almost everywhere.  Where changed, any negation of the
error code is also removed, since cpp_strerror() will do that.

Note: some tools were using their own calls to strerror/strerror_r, so
will now get a (%d) in their output that wasn't there before; hence
the change to test/cli/monmaptool/print-nonexistent.t

Fixes: #8041
Signed-off-by: Dan Mick <dan.mick@inktank.com>
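
To make the two behaviours mentioned in the commit above concrete (callers no longer negate the error code, and output gains a "(%d)" prefix), here is a small Python model. `cpp_strerror_sketch` is a hypothetical stand-in, not the C++ helper itself, and the exact output format is assumed from the commit's note about "(%d)".

```python
import errno
import os


def cpp_strerror_sketch(err):
    """Format an errno as '(N) message', accepting either sign."""
    err = abs(err)  # callers may pass -ENOENT or ENOENT
    return "(%d) %s" % (err, os.strerror(err))


from_negative = cpp_strerror_sketch(-errno.ENOENT)
from_positive = cpp_strerror_sketch(errno.ENOENT)
```

Both calls produce the same string, which is why call sites that used to write `strerror(-r)` can pass `r` unchanged.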
11 years agoMerge pull request #1668 from ceph/wip-librados-tests
Josh Durgin [Mon, 14 Apr 2014 18:44:34 +0000 (11:44 -0700)]
Merge pull request #1668 from ceph/wip-librados-tests

ceph_test_rados_api_*: fix build warnings and memset ranges

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #1622 from dachary/wip-mailmap
Loic Dachary [Mon, 14 Apr 2014 09:52:17 +0000 (11:52 +0200)]
Merge pull request #1622 from dachary/wip-mailmap

mailmap updates

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
11 years agomds: don't modify inode when calculating client ranges 1666/head
Yan, Zheng [Mon, 14 Apr 2014 09:27:08 +0000 (17:27 +0800)]
mds: don't modify inode when calculating client ranges

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #1669 from ceph/wip-client-debug
Yan, Zheng [Mon, 14 Apr 2014 08:39:40 +0000 (16:39 +0800)]
Merge pull request #1669 from ceph/wip-client-debug

client: print inode max_size

11 years agoclient: print inode max_size 1669/head
Yan, Zheng [Mon, 14 Apr 2014 08:38:22 +0000 (16:38 +0800)]
client: print inode max_size

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoosd/ReplicatedPG: add missing whitespace in debug output
Sage Weil [Mon, 14 Apr 2014 04:59:23 +0000 (21:59 -0700)]
osd/ReplicatedPG: add missing whitespace in debug output

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_rados_api_*: fix build warnings, memset ranges 1668/head
Sage Weil [Mon, 14 Apr 2014 04:37:31 +0000 (21:37 -0700)]
ceph_test_rados_api_*: fix build warnings, memset ranges

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: handle dup ops earlier in do_op 1667/head
Sage Weil [Mon, 14 Apr 2014 04:31:35 +0000 (21:31 -0700)]
osd/ReplicatedPG: handle dup ops earlier in do_op

Currently the dup op checks happen in execute_ctx, long after we handle
cache ops or get the obc and (potentially) return ENOENT.  That means that
object deletions and cache ops both aren't properly idempotent.

This is easy to fix by moving the check earlier in do_op.

Fixes: #8089
Signed-off-by: Sage Weil <sage@inktank.com>
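
The ordering problem fixed in the do_op commit above can be shown with a toy model. `PGSketch`, its `completed` map and the string op names are hypothetical; the point is only that the duplicate check runs before the existence check, so a resent delete gets its original result rather than ENOENT.

```python
import errno


class PGSketch:
    """Toy model of a PG that remembers the result of each completed request."""

    def __init__(self):
        self.objects = set()
        self.completed = {}  # request id -> result already sent

    def do_op(self, reqid, op, name):
        # Dup check first: a resend must see the original result.
        if reqid in self.completed:
            return self.completed[reqid]
        if op != "create" and name not in self.objects:
            return -errno.ENOENT
        if op == "create":
            self.objects.add(name)
        elif op == "delete":
            self.objects.discard(name)
        self.completed[reqid] = 0
        return 0


pg = PGSketch()
created = pg.do_op("req1", "create", "foo")
deleted = pg.do_op("req2", "delete", "foo")
resent = pg.do_op("req2", "delete", "foo")   # duplicate of req2
fresh = pg.do_op("req3", "delete", "foo")    # genuinely new request
```

If the two checks were swapped, `resent` would come back as ENOENT even though the delete it duplicates had succeeded.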
11 years agomds: don't issue/revoke caps before client has caps
Yan, Zheng [Sun, 13 Apr 2014 12:07:33 +0000 (20:07 +0800)]
mds: don't issue/revoke caps before client has caps

If early reply is not allowed, MDS does not send reply to client immediately
after Locker::issue_new_caps adds new caps. So MDS can revoke the caps before
sending reply to client.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: do file recover after authpin inode
Yan, Zheng [Sun, 13 Apr 2014 07:45:33 +0000 (15:45 +0800)]
mds: do file recover after authpin inode

MDCache::do_file_recover may call Locker::eval_gather, which may change
filelock to stable state. So we should authpin the inode (for unstable
lock state) first.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoosd/ReplicatedPG: handle misdirected do_command 1664/head
Sage Weil [Sun, 13 Apr 2014 05:23:26 +0000 (22:23 -0700)]
osd/ReplicatedPG: handle misdirected do_command

We can get a query on a pg we still have but are no longer primary for.  If
that happens, do not reply.  The client will resend to the correct OSD
assuming it has the map.  Send them the latest incremental so that we know
they know there is something new.  We don't know the exact epoch they have,
unfortunately, because MCommand doesn't include it, but a newer inc is
enough to make them request the right incrementals from a mon.  Eventually
they will figure it out and Objecter will resend the request to the
correct target.

It is possible we should include epoch in the MCommand message so that we
can do this mapping "correctly" (as in, the same way MOSDOp does).  That
makes MCommand less general, though... a PG-specific command message might
be the most precise thing.  Another day...

Fixes: #8085
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomds: fix typo in Server::do_rename_rollback() 1662/head
Yan, Zheng [Sat, 12 Apr 2014 06:10:55 +0000 (14:10 +0800)]
mds: fix typo in Server::do_rename_rollback()

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #1659 from ceph/wip-8054
Sage Weil [Sat, 12 Apr 2014 05:33:27 +0000 (22:33 -0700)]
Merge pull request #1659 from ceph/wip-8054

mds: finish table servers recovery after creating newfs

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: handle missing hit_set on HITSET_GET rados op 1660/head
Sage Weil [Sat, 12 Apr 2014 00:46:44 +0000 (17:46 -0700)]
osd/ReplicatedPG: handle missing hit_set on HITSET_GET rados op

Fixes: #8081
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1655 from ceph/wip-8077
Samuel Just [Sat, 12 Apr 2014 00:33:10 +0000 (17:33 -0700)]
Merge pull request #1655 from ceph/wip-8077

osd: handle missing hit_set objects in agent_load_hit_sets

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoceph_test_rados_api_watch_notify: test over cache pool 1661/head
Sage Weil [Sat, 12 Apr 2014 00:11:24 +0000 (17:11 -0700)]
ceph_test_rados_api_watch_notify: test over cache pool

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agotest/librados/TestCase: add Param option that can set up a cache pool
Sage Weil [Sat, 12 Apr 2014 00:05:21 +0000 (17:05 -0700)]
test/librados/TestCase: add Param option that can set up a cache pool

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agotest: Add --pool-snaps option to ceph_test_rados 1654/head
David Zafman [Fri, 11 Apr 2014 22:37:23 +0000 (15:37 -0700)]
test: Add --pool-snaps option to ceph_test_rados

Fixes: #7940
Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agotest: Fix ceph_test_rados to not core dump with invalid arguments
David Zafman [Tue, 8 Apr 2014 22:19:39 +0000 (15:19 -0700)]
test: Fix ceph_test_rados to not core dump with invalid arguments

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agolibrados: Add ObjectWriteOperation::snap_rollback() for pool snapshots
David Zafman [Fri, 11 Apr 2014 00:16:33 +0000 (17:16 -0700)]
librados: Add ObjectWriteOperation::snap_rollback() for pool snapshots

snap_rollback() is the same as selfmanaged_snap_rollback() but we want an
independent interface for pool snapshots.  Should really take snapname
for consistency with other pool snapshot interfaces.

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agolibrados: Rollback interface additions
David Zafman [Fri, 11 Apr 2014 23:20:14 +0000 (16:20 -0700)]
librados: Rollback interface additions

Add C interface rados_ioctx_snap_rollback() and indicate that rados_rollback()
is deprecated.

Add C++ interface IoCtx::snap_rollback() and indicate that IoCtx::rollback()
is deprecated.

Modify snapshot test case to use new function names.

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agoMerge pull request #1658 from ceph/wip-8008
Samuel Just [Fri, 11 Apr 2014 22:50:58 +0000 (15:50 -0700)]
Merge pull request #1658 from ceph/wip-8008

osd: fix repair_object

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1657 from ceph/wip-8063
Samuel Just [Fri, 11 Apr 2014 22:50:38 +0000 (15:50 -0700)]
Merge pull request #1657 from ceph/wip-8063

ceph_test_rados_api_tier: fix scrub test

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoosd/PG: fix repair_object when missing on primary 1658/head
Sage Weil [Fri, 11 Apr 2014 22:39:23 +0000 (15:39 -0700)]
osd/PG: fix repair_object when missing on primary

If the object is missing on the primary, we need to fully populate the
missing_loc.needs_recovery_map.  This broke with the recent refactoring of
recovery for EC, somewhere around 84e2f39c557c79e9ca7c3c3f0eb0bfa4860bf899.

Fixes: #8008
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph_test_librados_tier: tolerate EAGAIN from pg scrub command 1657/head
Sage Weil [Fri, 11 Apr 2014 21:48:26 +0000 (14:48 -0700)]
ceph_test_librados_tier: tolerate EAGAIN from pg scrub command

We may get EAGAIN if the osd happens to be down, for example due to
thrashing.  Try a few times and then give up.

Note that the other place we try to scrub we don't even check the return
value as we are poking every pg in the pool.  And the scrub commands get
lost due to any peering event, etc.

Signed-off-by: Sage Weil <sage@inktank.com>
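
"Try a few times and then give up" in the commit above amounts to a bounded retry loop. The sketch below uses hypothetical names (`scrub_with_retries`, `send_scrub`) and an arbitrary default of five attempts; the sleep is injectable so the example runs instantly.

```python
import errno


def scrub_with_retries(send_scrub, attempts=5, sleep=lambda seconds: None):
    """Retry while the reply is -EAGAIN; return the last result seen."""
    ret = -errno.EAGAIN
    for _ in range(attempts):
        ret = send_scrub()
        if ret != -errno.EAGAIN:
            break
        sleep(1)  # give a thrashed OSD a moment to come back
    return ret


replies = iter([-errno.EAGAIN, -errno.EAGAIN, 0])
result = scrub_with_retries(lambda: next(replies))
gave_up = scrub_with_retries(lambda: -errno.EAGAIN, attempts=3)
```

The first call succeeds on the third attempt; the second exhausts its attempts and hands the EAGAIN back to the test.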
11 years agoMerge pull request #1656 from ceph/wip-osd-boot
Gregory Farnum [Fri, 11 Apr 2014 21:44:01 +0000 (14:44 -0700)]
Merge pull request #1656 from ceph/wip-osd-boot

mon: fix osd boot check

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agomon/OSDMonitor: fix osd epoch in boot check 1656/head
Sage Weil [Fri, 11 Apr 2014 21:32:21 +0000 (14:32 -0700)]
mon/OSDMonitor: fix osd epoch in boot check

This was introduced in 4c99e978a77a242e540cb8ccacb967d24322416c and was
incorrect; boot_epoch is the previous epoch the osd booted in, not the
latest map epoch that the OSD currently has.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: skip missing hit_sets when loading into memory 1655/head
Sage Weil [Fri, 11 Apr 2014 20:14:58 +0000 (13:14 -0700)]
osd/ReplicatedPG: skip missing hit_sets when loading into memory

We weren't handling hit_sets that were missing.

Two changes here:

1- Load the hit_sets oldest to newest.  That means that if we stop partway
   through loading, and then add another to the end of the list, and then
   try again to load some more, we will still catch them all.
2- If the object is missing, stop.  We'll try again the next time
   agent_work() is called.

Fixes: #8077
Signed-off-by: Sage Weil <sage@inktank.com>
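
The two changes listed in the hit_set commit above can be sketched together. `load_hit_sets`, `history`, `store` and `loaded` are hypothetical names; the model only shows that loading oldest to newest and stopping at the first missing object lets a later pass pick up everything.

```python
def load_hit_sets(history, store, loaded):
    """history: ids oldest to newest; store: id -> data; loaded: in-memory cache."""
    for hs_id in history:
        if hs_id in loaded:
            continue  # already in memory from an earlier pass
        if hs_id not in store:
            break  # object missing: stop and retry on the next agent pass
        loaded[hs_id] = store[hs_id]
    return loaded


history = [1, 2, 3]
store = {1: "a", 3: "c"}  # hit_set 2 is not readable yet
loaded = {}
load_hit_sets(history, store, loaded)
after_first_pass = sorted(loaded)

store[2] = "b"            # the missing object shows up
history.append(4)         # and a newer hit_set is added at the end
store[4] = "d"
load_hit_sets(history, store, loaded)
after_second_pass = sorted(loaded)
```

Because nothing newer than the gap is loaded on the first pass, the second pass cannot skip over hit_set 2.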
11 years agomds: finish table servers recovery after creating newfs 1659/head
Yan, Zheng [Fri, 11 Apr 2014 01:43:59 +0000 (09:43 +0800)]
mds: finish table servers recovery after creating newfs

Fixes: #8054
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoRevert "mds: finish table servers recovery after creating newfs"
Sage Weil [Fri, 11 Apr 2014 17:28:49 +0000 (10:28 -0700)]
Revert "mds: finish table servers recovery after creating newfs"

This reverts commit f6c20730c16a7632061639dd83be523fc6a9a44f.

This breaks single MDS startup.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1650 from dachary/wip-erasure-code-doc
Loic Dachary [Fri, 11 Apr 2014 17:20:35 +0000 (19:20 +0200)]
Merge pull request #1650 from dachary/wip-erasure-code-doc

erasure-code: document the ruleset-root profile parameter

Reviewed-by: Mark Nelson <mark.nelson@inktank.com>
11 years agoMerge pull request #1630 from ceph/wip-7450
Josh Durgin [Fri, 11 Apr 2014 17:04:57 +0000 (10:04 -0700)]
Merge pull request #1630 from ceph/wip-7450

Wip 7450

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #1635 from ceph/wip-7437
David Zafman [Fri, 11 Apr 2014 15:33:45 +0000 (08:33 -0700)]
Merge pull request #1635 from ceph/wip-7437

Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #1641 from ceph/wip-multimds
Sage Weil [Fri, 11 Apr 2014 13:59:11 +0000 (06:59 -0700)]
Merge pull request #1641 from ceph/wip-multimds

mds: guarantee message ordering when importing non-auth caps

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1645 from ceph/wip-8054
Sage Weil [Fri, 11 Apr 2014 13:56:39 +0000 (06:56 -0700)]
Merge pull request #1645 from ceph/wip-8054

mds: finish table servers recovery after creating newfs

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agodoc: Add additional information over CloudStack and RBD
Wido den Hollander [Fri, 11 Apr 2014 11:59:31 +0000 (13:59 +0200)]
doc: Add additional information over CloudStack and RBD

11 years agoerasure-code: document the ruleset-root profile parameter 1650/head
Loic Dachary [Fri, 11 Apr 2014 11:51:46 +0000 (13:51 +0200)]
erasure-code: document the ruleset-root profile parameter

If unspecified it is ruleset-root=default and will translate into

   take default

when a ruleset is created for an erasure-code pool.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #1647 from ceph/wip-lockdep
Josh Durgin [Fri, 11 Apr 2014 07:11:18 +0000 (00:11 -0700)]
Merge pull request #1647 from ceph/wip-lockdep

a couple of lockdep fixes

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoRWLock: make lockdep id mutable 1647/head
Sage Weil [Fri, 11 Apr 2014 04:36:37 +0000 (21:36 -0700)]
RWLock: make lockdep id mutable

This allows us to keep the lock/unlock methods const, as per commit
970d53fc0fefc89ffe7550880a4aaa36bd534955.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoRevert "RWLock: don't assign the lockdep id more than once"
Sage Weil [Fri, 11 Apr 2014 04:34:51 +0000 (21:34 -0700)]
Revert "RWLock: don't assign the lockdep id more than once"

This reverts commit 957ac3cbe394473f225ffd2b632461fcdaca99e6.

It's important to assign these for all operations for cases where
g_lockdep isn't yet true when the constructor runs.  This is true
for the HeartbeatMap rwlock, among other things, as that thread
is created during early startup before lockdep is enabled.  All
of the lockdep hooks assume that they can assign ids on the fly
and not tracking them here breaks things.

Conflicts:

src/common/RWLock.h

11 years agocommon_init: remove dup lockdep message
Sage Weil [Fri, 11 Apr 2014 04:26:18 +0000 (21:26 -0700)]
common_init: remove dup lockdep message

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1646 from dmick/wip-erasure-doc
John Wilkins [Fri, 11 Apr 2014 03:02:56 +0000 (20:02 -0700)]
Merge pull request #1646 from dmick/wip-erasure-doc

doc: Wordsmith the erasure-code doc a bit.

11 years agoWordsmith the erasure-code doc a bit 1646/head
Dan Mick [Thu, 3 Apr 2014 18:24:33 +0000 (11:24 -0700)]
Wordsmith the erasure-code doc a bit

Signed-off-by: Dan Mick <dan.mick@inktank.com>
11 years agomds: finish table servers recovery after creating newfs 1645/head
Yan, Zheng [Fri, 11 Apr 2014 01:43:59 +0000 (09:43 +0800)]
mds: finish table servers recovery after creating newfs

Fixes: #8054
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #1643 from ceph/wip-8062
Sage Weil [Fri, 11 Apr 2014 01:25:23 +0000 (18:25 -0700)]
Merge pull request #1643 from ceph/wip-8062

mon/OSDMonitor: ignore boot message from before last up_from

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agomds: issue new caps before starting log entry 1641/head
Yan, Zheng [Fri, 11 Apr 2014 00:21:40 +0000 (08:21 +0800)]
mds: issue new caps before starting log entry

Locker::issue_new_caps() calls Locker::eval(), which may dispatch
other requests.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agotest: Add EC testing to ceph_test_rados_api_aio 1635/head
David Zafman [Tue, 8 Apr 2014 00:00:45 +0000 (17:00 -0700)]
test: Add EC testing to ceph_test_rados_api_aio

Fixes: #7437
Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agotest: Add multiple write test cases to ceph_test_rados_api_aio
David Zafman [Mon, 7 Apr 2014 21:16:02 +0000 (14:16 -0700)]
test: Add multiple write test cases to ceph_test_rados_api_aio

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agotest, librados: aio read *return_value consistency, fix ceph_test_rados_api_aio
David Zafman [Wed, 2 Apr 2014 05:28:18 +0000 (22:28 -0700)]
test, librados: aio read *return_value consistency, fix ceph_test_rados_api_aio

test:
  Add set_completion*PP() functions to cast arg to correct class
  Add return_value checks
  Add some reads with buffers larger than object size
  Check buffer length on reads
librados:
  Make sure *return_value() has bytes read in all cases

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agotest: Add EC unaligned append write test to ceph_test_rados_api_io
David Zafman [Sat, 5 Apr 2014 02:08:54 +0000 (19:08 -0700)]
test: Add EC unaligned append write test to ceph_test_rados_api_io

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agopybind, test: Add python binding for append and add to test
David Zafman [Tue, 8 Apr 2014 17:44:47 +0000 (10:44 -0700)]
pybind, test: Add python binding for append and add to test

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agopybind: Check that "key" is a string
David Zafman [Wed, 9 Apr 2014 18:42:01 +0000 (11:42 -0700)]
pybind: Check that "key" is a string

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agolibrados, test: Have write, append and write_full return 0 on success
David Zafman [Wed, 2 Apr 2014 18:54:51 +0000 (11:54 -0700)]
librados, test: Have write, append and write_full return 0 on success

Fix consistency of write, append, write_full, all return 0 on success
Include C (rados_*) variants, C++ ctx variants
and aio get_return_value() and rados_aio_get_return_value()

Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agocivetweb: update subproject 1680/head
Yehuda Sadeh [Thu, 10 Apr 2014 22:54:01 +0000 (15:54 -0700)]
civetweb: update subproject

Fixes: #7786
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
11 years agomon/OSDMonitor: ignore boot message from before last up_from 1643/head
Sage Weil [Thu, 10 Apr 2014 20:34:58 +0000 (13:34 -0700)]
mon/OSDMonitor: ignore boot message from before last up_from

It is possible we will have a dup OSDBoot message queued up in the mon
and will process it again after that osd was marked up and then down.  If
that happens, we should ignore this message, not mark the osd back in with
the same address.

Fixes: #8062
Signed-off-by: Sage Weil <sage@inktank.com>
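
A minimal sketch of the check described in the OSDMonitor commit above, assuming the boot message carries the map epoch it was sent at and the monitor records the epoch in which the OSD was last marked up. Both parameter names are hypothetical, and the real check may compare additional fields.

```python
def should_ignore_boot(msg_epoch, last_up_from):
    """A boot message older than the last up_from is a stale duplicate."""
    return msg_epoch < last_up_from


# The OSD booted at epoch 10, was marked up at 11 and later marked down.
stale_dup = should_ignore_boot(msg_epoch=10, last_up_from=11)
real_reboot = should_ignore_boot(msg_epoch=15, last_up_from=11)
```

Only the second message reflects a boot that happened after the OSD was last marked up, so only that one should change the map.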
11 years agoMerge pull request #1624 from ceph/wip-6789
Sage Weil [Thu, 10 Apr 2014 18:01:43 +0000 (11:01 -0700)]
Merge pull request #1624 from ceph/wip-6789

mon: Monitor: suicide on start if mon has been removed from monmap

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: adjust obc + snapset_obc locking strategy
Sage Weil [Wed, 9 Apr 2014 19:38:10 +0000 (12:38 -0700)]
osd/ReplicatedPG: adjust obc + snapset_obc locking strategy

Previously we assumed that if we had snapset_obc set, !exists on the head
and if we got the snapdir lock we were good to take the head lock too.
This is not the case when:

 - delete queued
   - takes wr lock on both head and snapdir
 - delete commits (but not yet applied)
 - stat
   - tries to take wr lock on head
     - blocks, toggles w=1 state on *head only*
 - copy-from
   - tries to take wr lock on snapdir, succeeds
   - tries to take wr lock on head, fails because w=1
     - fails the assert(got)

The problem is that the read and write paths are taking different locks
and we are expecting them to operate in synchrony.

Fix this by using the same ordering for reads as well as write: if the
snapset_obc is defined, take the read lock on that too, just as we do with
a write.

Fixes: #8046
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agomon: Monitor: suicide on start if mon has been removed from monmap 1624/head
Joao Eduardo Luis [Thu, 10 Apr 2014 14:14:19 +0000 (15:14 +0100)]
mon: Monitor: suicide on start if mon has been removed from monmap

If the monitor has been marked as having been part of an existing quorum
and is no longer in the monmap, then it is safe to assume the monitor
was removed from the monmap.  In that event, do not allow the monitor
to start, as it will try to find its way into the quorum again (and
someone clearly stated they don't really want them there), unless
'mon force quorum join' is specified.

Fixes: #6789
Backport: dumpling, emperor

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
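
The start-up decision described in the monitor commit above reduces to a three-part condition. The sketch below uses hypothetical names (`may_start`, `has_joined_quorum`, `force_quorum_join`) and a plain set for the monmap; it is an illustration of the rule, not the Monitor code.

```python
def may_start(name, monmap, has_joined_quorum, force_quorum_join=False):
    """Refuse to start a monitor that was once in quorum but has left the monmap."""
    removed = has_joined_quorum and name not in monmap
    if removed and not force_quorum_join:
        return False
    return True


monmap = {"a", "b"}
removed_mon = may_start("c", monmap, has_joined_quorum=True)
forced_mon = may_start("c", monmap, has_joined_quorum=True, force_quorum_join=True)
new_mon = may_start("c", monmap, has_joined_quorum=False)
```

A monitor that has never been part of a quorum is still allowed to start, since it may simply not have been added yet.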
11 years agomds: guarantee message ordering when importing non-auth caps
Yan, Zheng [Thu, 10 Apr 2014 08:03:51 +0000 (16:03 +0800)]
mds: guarantee message ordering when importing non-auth caps

The current code allows importing non-auth caps while the inode is being
exported. This can break message ordering because the corresponding cap import
messages are sent after the flush session messages. So they can arrive
at clients after clients have already received cap import messages from
the new auth MDS of the inode.

The quick fix is to ignore MExportCaps when the inode is frozen.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #1639 from ceph/wip-multimds
Sage Weil [Thu, 10 Apr 2014 04:19:42 +0000 (21:19 -0700)]
Merge pull request #1639 from ceph/wip-multimds

Wip multimds

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomds: include truncate_seq/truncate_size in filelock's state 1639/head
Yan, Zheng [Thu, 10 Apr 2014 02:56:18 +0000 (10:56 +0800)]
mds: include truncate_seq/truncate_size in filelock's state

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: remove wrong assertion for remote frozen authpin
Yan, Zheng [Thu, 10 Apr 2014 03:09:28 +0000 (11:09 +0800)]
mds: remove wrong assertion for remote frozen authpin

For a cross-authority rename, the MDS first freezes the source inode's
authpin. It happens while the source dentry isn't locked. So when the
inode's authpin becomes frozen, the source dentry may have changed and
be linked to a different inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoosdc/Objecter: move mapping into struct, helper
Sage Weil [Thu, 10 Apr 2014 01:02:27 +0000 (18:02 -0700)]
osdc/Objecter: move mapping into struct, helper

Move the common bits of Op and LingerOp into op_target_t and separate the
actual mapping calculation into calc_target().  This hugely simplifies
recalc_*op_target() by mostly just shuffling all of the same logic into
that helper.

There is one functional change in this patch: recalc_linger_op() now is
aware of the tiering logic that was previously only handled in
recalc_op_target().

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1637 from ceph/wip-8042
Gregory Farnum [Thu, 10 Apr 2014 00:21:57 +0000 (17:21 -0700)]
Merge pull request #1637 from ceph/wip-8042

mon: fix election required_features checks

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #1636 from ceph/wip-6480
Sage Weil [Wed, 9 Apr 2014 23:25:24 +0000 (16:25 -0700)]
Merge pull request #1636 from ceph/wip-6480

fix auth races that may have led to qemu crashes

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomon: tell peers missing features during probe 1637/head
Sage Weil [Wed, 9 Apr 2014 23:03:05 +0000 (16:03 -0700)]
mon: tell peers missing features during probe

Use a new probe op to inform mons that they are missing features during
the earliest probe phase.  This prevents them from getting as far as
the sync entirely if they are too old.

We still need to refuse to speak to them if they try to call an election,
which they could do based on their replies from other peers.

Note that old clients will assert on getting a message type string they
don't understand, so we need to be careful not to send the probe reply
to older clients.  The feature bit we use is not precise in that it does
not cover recent dev releases, but it does work for dumpling and emperor.

Signed-off-by: Sage Weil <sage@inktank.com>
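
The probe-time decision described in the commit above can be sketched with plain bitmasks. The feature constants, the `handle_probe` name and the string results are all hypothetical; the sketch assumes one feature bit tells us whether the peer can parse the new "missing features" probe op.

```python
FEATURE_OLD = 1 << 0            # hypothetical feature bits
FEATURE_NEW_PROBE_OP = 1 << 1   # peer understands the new probe op
FEATURE_REQUIRED = 1 << 2       # something the quorum now requires


def handle_probe(required, peer_features):
    """Decide how to answer a probing peer based on its feature bits."""
    missing = required & ~peer_features
    if not missing:
        return "reply"
    if peer_features & FEATURE_NEW_PROBE_OP:
        return "missing_features"  # tell the peer what it lacks
    return "ignore"  # an old peer would assert on an op it cannot parse


required = FEATURE_OLD | FEATURE_REQUIRED
up_to_date = handle_probe(required, FEATURE_OLD | FEATURE_NEW_PROBE_OP | FEATURE_REQUIRED)
recent_but_short = handle_probe(required, FEATURE_OLD | FEATURE_NEW_PROBE_OP)
too_old = handle_probe(required, FEATURE_OLD)
```

Stopping the peer at probe time keeps it from reaching sync, where it would receive data it cannot interpret.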
11 years agomon: move required_features back into Monitor
Sage Weil [Wed, 9 Apr 2014 22:27:20 +0000 (15:27 -0700)]
mon: move required_features back into Monitor

This is simpler and cleaner.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomon: ignore sync clients without required_features
Sage Weil [Wed, 9 Apr 2014 21:40:44 +0000 (14:40 -0700)]
mon: ignore sync clients without required_features

If we let them sync data they don't understand they will get confused
and crash.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoauth: remove unused get_global_id() method 1636/head
Josh Durgin [Wed, 9 Apr 2014 21:23:32 +0000 (14:23 -0700)]
auth: remove unused get_global_id() method

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>