Loic Dachary [Tue, 12 Aug 2014 16:46:29 +0000 (18:46 +0200)]
erasure-code: isa plugin must link with ErasureCode.cc
Otherwise it does not get the methods it needs. A test is added to check,
from the command line, that the plugin loads as expected. The test is not
run if the isa plugin is not found, which happens on platforms that are
not supported.
Sage Weil [Mon, 11 Aug 2014 03:22:23 +0000 (20:22 -0700)]
msg/Pipe: do not wait for self in Pipe::stop_and_wait()
The fast dispatch code necessitated adding a wait for the fast dispatch
to complete when taking over sockets back in commit 2d5d3097c3998add1061ce253104154d72879237. This included mark_down()
(although I am not certain mark_down was required to fix the previous set
of races).
In any case, if the fast dispatch thread itself tries to mark down its
own connection, it will deadlock in this method waiting for itself to
return and clear reader_dispatching. Skip this wait if we are in fact
the reader thread. This avoids the deadlock.
Alternatively, we could change mark_down() to not use stop_and_wait(), but
I am less clear about the potential races there, so I'm opting for the
minimal (though ugly) fix.
Fixes: #9057
Signed-off-by: Sage Weil <sage@redhat.com>
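A minimal sketch of the self-wait check, assuming a heavily simplified pipe. Only reader_dispatching and stop_and_wait() come from the message. PipeSketch, begin_dispatch(), end_dispatch() and reader_id are hypothetical, and the real msg/Pipe identifies its reader thread in its own way.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical, heavily simplified stand-in for msg/Pipe; only what is
// needed to show the self-wait check is modeled.
struct PipeSketch {
  std::mutex lock;
  std::condition_variable cond;
  bool reader_dispatching = false;  // true while the reader fast-dispatches
  std::thread::id reader_id;        // id of the dispatching reader (assumption)

  // Called by the reader thread around a fast dispatch.
  void begin_dispatch() {
    std::lock_guard<std::mutex> l(lock);
    reader_id = std::this_thread::get_id();
    reader_dispatching = true;
  }
  void end_dispatch() {
    {
      std::lock_guard<std::mutex> l(lock);
      reader_dispatching = false;
    }
    cond.notify_all();
  }

  // Wait for an in-progress fast dispatch to finish, unless the caller
  // *is* the dispatching reader: waiting for ourselves never returns.
  void stop_and_wait() {
    std::unique_lock<std::mutex> l(lock);
    if (reader_dispatching && reader_id == std::this_thread::get_id())
      return;  // skip the wait: we are the reader thread
    cond.wait(l, [this] { return !reader_dispatching; });
  }
};
```

Without the early return, a reader that marks down its own connection from inside a dispatch would block forever in cond.wait().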
When OSDMonitor::crush_ruleset_create_erasure checks the ruleset for
existence, it must convert the ruleid into a ruleset before assigning it
back to the *ruleset parameter.
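A minimal sketch of the ruleid/ruleset distinction, assuming a simplified rule table in which a rule's index is its ruleid and its ruleset is a separate field. RuleSketch, find_ruleid() and lookup_existing_ruleset() are hypothetical, not the OSDMonitor or CRUSH API.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

// Hypothetical rule table: the index in the vector is the ruleid,
// while `ruleset` is a separate field that need not match it.
struct RuleSketch {
  std::string name;
  int ruleset;
};

// Returns the ruleid of the rule named `name`, or -1 if none exists.
int find_ruleid(const std::vector<RuleSketch>& rules, const std::string& name) {
  for (size_t i = 0; i < rules.size(); ++i)
    if (rules[i].name == name)
      return static_cast<int>(i);
  return -1;
}

// Existence check that converts the ruleid into a ruleset before
// assigning it to *ruleset.
int lookup_existing_ruleset(const std::vector<RuleSketch>& rules,
                            const std::string& name, int* ruleset) {
  int ruleid = find_ruleid(rules, name);
  if (ruleid < 0)
    return -ENOENT;
  *ruleset = rules[ruleid].ruleset;  // not `*ruleset = ruleid`
  return 0;
}
```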
Ma Jianpeng [Thu, 7 Aug 2014 13:33:18 +0000 (21:33 +0800)]
os/chain_xattr: Remove all old xattr entries when overwriting the xattr.
Ceph uses multiple xattrs to store the value of a single xattr whose size
is larger than CHAIN_XATTR_MAX_BLOCK_LEN.
However, when overwriting the content of an xattr in
chain_setxattr/chain_fsetxattr, we don't know the size of the previous
content of the xattr.
So we simply keep removing the old chained entries until the system returns -ENODATA.
Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
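A minimal in-memory sketch of the overwrite logic, assuming a std::map stands in for the file's xattrs and that chained blocks are named "name", "name@1", "name@2", and so on. The naming scheme, the tiny block length, and all function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// In-memory stand-in for a file's xattrs (hypothetical; the real code
// calls setxattr/removexattr on the file system).
using XattrStore = std::map<std::string, std::string>;

constexpr size_t MAX_BLOCK_LEN = 4;  // tiny stand-in for CHAIN_XATTR_MAX_BLOCK_LEN

// Assumed naming scheme for chained blocks.
std::string block_name(const std::string& name, int i) {
  return i == 0 ? name : name + "@" + std::to_string(i);
}

int remove_block(XattrStore& s, const std::string& key) {
  return s.erase(key) ? 0 : -ENODATA;
}

void chain_setxattr_sketch(XattrStore& s, const std::string& name,
                           const std::string& value) {
  int i = 0;
  size_t off = 0;
  // Write the new value in blocks of at most MAX_BLOCK_LEN bytes.
  do {
    s[block_name(name, i++)] = value.substr(off, MAX_BLOCK_LEN);
    off += MAX_BLOCK_LEN;
  } while (off < value.size());
  // The old value may have used more blocks and we don't know how many,
  // so keep removing until the store reports -ENODATA.
  while (remove_block(s, block_name(name, i++)) != -ENODATA) {
  }
}

// Reassemble a value by reading blocks until one is missing.
std::string chain_getxattr_sketch(const XattrStore& s, const std::string& name) {
  std::string out;
  for (int i = 0;; ++i) {
    auto it = s.find(block_name(name, i));
    if (it == s.end())
      break;
    out += it->second;
  }
  return out;
}
```

Without the removal loop, overwriting a three-block value with a one-block value would leave two stale blocks, and a later read would concatenate them onto the new value.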
OSD: introduce require_up_osd_peer() function for gating replica ops
This checks both that a Message originates from an OSD, and that the OSD
is up in the given map epoch.
We use it in handle_replica_op so that we don't inadvertently add operations
from down peers, which may or may not know that they are down.
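A minimal sketch of the gate, assuming simplified message and map types. MessageSketch, OSDMapSketch and require_up_osd_peer_sketch() are hypothetical. The real function's signature and how it obtains the map epoch are not shown in the message.

```cpp
#include <cassert>
#include <set>

// Hypothetical, minimal stand-ins for a message source and an OSDMap.
enum class EntityType { Client, Mon, OSD };

struct MessageSketch {
  EntityType source_type;
  int source_id;  // the OSD number when source_type == OSD
};

struct OSDMapSketch {
  unsigned epoch;
  std::set<int> up_osds;
  bool is_up(int osd) const { return up_osds.count(osd) > 0; }
};

// Accept a replica op only if it was sent by an OSD that is up in
// the given map epoch.
bool require_up_osd_peer_sketch(const MessageSketch& m, const OSDMapSketch& map) {
  if (m.source_type != EntityType::OSD)
    return false;                 // not from an OSD at all
  return map.is_up(m.source_id);  // from an OSD, but is it up in this epoch?
}
```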
test_librbd_fsx: also flatten as part of randomize_parent_overlap
With randomize_parent_overlap, fsx will randomly truncate base images
after they have been cloned from. This throws flatten into the mix: the
base image will be flattened with a 2/16 chance (equal to the chance of
leaving the image intact).
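The 2/16 odds can be sketched as a pick over 16 equally likely values. The 2/16 flatten and 2/16 intact weights come from the message; assigning the remaining 12/16 to truncation, and the names ParentAction and choose_parent_action(), are assumptions rather than the actual fsx code.

```cpp
#include <cassert>

// Hypothetical action chosen for a base image after it has been cloned.
enum class ParentAction { Intact, Flatten, Truncate };

// r is assumed to be uniformly distributed; only r % 16 matters.
ParentAction choose_parent_action(unsigned r) {
  switch (r % 16) {
    case 0: case 1: return ParentAction::Intact;    // 2/16
    case 2: case 3: return ParentAction::Flatten;   // 2/16
    default:        return ParentAction::Truncate;  // 12/16 (assumption)
  }
}
```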
Samuel Just [Mon, 4 Aug 2014 22:30:41 +0000 (15:30 -0700)]
OSD: move waiting_for_pg into the session structures
Each message belongs to a session. Further, no ordering is implied
between messages which arrived on different sessions. Breaking the
global waiting_for_pg structure into a per-session structure lets
us avoid the problem of taking a write lock on a global structure
(pg_map_lock) in get_pg_or_queue_for_pg at the cost of some
complexity in updating each session's waiting_for_pg structure when
we receive a new map (due to pg splits) or when we locally create
a pg.
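A minimal sketch of the trade-off, assuming each session owns its own waiting_for_pg map under its own lock. SessionSketch, take_waiters() and wake_pg_waiters_sketch() are hypothetical, and pg ids and messages are reduced to an int and a string. Queuing touches only one session, while waking a new pg has to visit every session.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using pg_id = int;            // stand-in for a PG id
using Message = std::string;  // stand-in for a queued message

// Each session has its own waiting_for_pg behind its own lock, so queuing
// a message for a missing PG never takes a global write lock.
struct SessionSketch {
  std::mutex lock;
  std::map<pg_id, std::deque<Message>> waiting_for_pg;

  void queue_for_pg(pg_id pg, Message m) {
    std::lock_guard<std::mutex> l(lock);
    waiting_for_pg[pg].push_back(std::move(m));
  }

  // Remove and return this session's waiters for `pg`, preserving order.
  std::deque<Message> take_waiters(pg_id pg) {
    std::lock_guard<std::mutex> l(lock);
    auto it = waiting_for_pg.find(pg);
    if (it == waiting_for_pg.end())
      return {};
    std::deque<Message> out = std::move(it->second);
    waiting_for_pg.erase(it);
    return out;
  }
};

// The added complexity: when a PG appears (local create or split in a new
// map), every session must be visited to wake its waiters.
std::vector<Message> wake_pg_waiters_sketch(
    std::vector<std::shared_ptr<SessionSketch>>& sessions, pg_id pg) {
  std::vector<Message> woken;
  for (auto& s : sessions)
    for (auto& m : s->take_waiters(pg))
      woken.push_back(std::move(m));
  return woken;
}
```

Order is preserved within each session, which is all that is required, since no ordering is implied between sessions.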
Samuel Just [Tue, 5 Aug 2014 20:00:01 +0000 (13:00 -0700)]
OSD: fix wake_pg_waiters revert error in _open_lock_pg
231fe1b685bfbd3db9c81709ca39a29d696b13ad erroneously reintroduced
this call to wake_pg_waiters. All _create_lock_pg callers handle
calling wake_pg_waiters after the pg lock has been dropped.
Fixes: #8691
Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Fri, 1 Aug 2014 21:04:35 +0000 (14:04 -0700)]
osd_types: s/stashed/rollback_info_completed and set on create
Originally, this flag indicated that the object had already been stashed and
that therefore recording subsequent changes is unnecessary. We want to set it
on create() as well since operations like [create, writefull] should not need
to stash the object.
Fixes: #8625
Signed-off-by: Samuel Just <sam.just@inktank.com>
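A minimal sketch of the flag's semantics, assuming a per-object transaction state that stashes the object at most once before the first mutation. ObjectTxnSketch, prepare_modify() and stash_count are hypothetical; only rollback_info_completed and the [create, writefull] case come from the message.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-object transaction state; only the flag is modeled.
struct ObjectTxnSketch {
  bool rollback_info_completed = false;  // formerly "stashed"
  int stash_count = 0;                   // how many times we stashed

  void create() {
    // Nothing existed before, so there is nothing to stash for rollback.
    rollback_info_completed = true;
  }

  // Called before any mutation that needs rollback information.
  void prepare_modify() {
    if (rollback_info_completed)
      return;  // rollback info already recorded
    ++stash_count;  // stash the object once
    rollback_info_completed = true;
  }

  void writefull(const std::string&) { prepare_modify(); }
};
```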
In lookup_pool and pool_delete, a lock is taken
before invoking wait_for_osdmap, but it is not
released if that call fails. Fix this.
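One way to make this class of leak impossible is RAII. The sketch below swaps in std::unique_lock, which releases the mutex on every return path. This is not necessarily how the actual fix was written, and ClientSketch, have_map and lookup_pool_sketch() are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <mutex>

// Hypothetical client whose wait_for_osdmap() can fail.
struct ClientSketch {
  std::mutex lock;
  bool have_map = false;

  int wait_for_osdmap() { return have_map ? 0 : -ENOTCONN; }  // illustrative

  int lookup_pool_sketch() {
    std::unique_lock<std::mutex> l(lock);
    int r = wait_for_osdmap();
    if (r < 0)
      return r;  // `l` releases the lock here too
    // ... look up the pool while still holding the lock ...
    return 0;
  }
};
```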
Sage Weil [Thu, 7 Aug 2014 00:28:45 +0000 (17:28 -0700)]
os/FileStore: force any new xattr into omap on E2BIG
If we have a huge xattr (or many little ones), the _fgetattrs() for the
inline_set will fail with E2BIG. The conditions later where we decide
whether to clean up the old xattr will then also fail. We *will* put
the xattr in omap, but the non-omap version isn't cleaned up.
Fix this by setting a flag if we get E2BIG that the inline_set is known
to be incomplete. In that case, take the conservative step of assuming
the xattr might be present and chain_fremovexattr(). Ignore any error
because it might not be there.
This is clearly harmless in the general case because it won't be there.
If it is, we will hopefully remove enough xattrs that the E2BIG
condition will go away (usually by removing some really big chained
xattr).
See original bug #7779. With this in place, we can repair objects in
the broken state if we know the rados attr(s) that are responsible.
Usually that is user.rgw.manifest, and a rados get + set of the attr
will repair things.
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Signed-off-by: Sage Weil <sage@redhat.com>
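A minimal in-memory sketch of the decision logic, assuming maps stand in for the file's inline xattrs and the omap. ObjectSketch, get_inline_set(), remove_inline() and listing_too_big are hypothetical. The normal (non-E2BIG) path is deliberately simplified and does not model FileStore's real size-based placement rules.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <set>
#include <string>

struct ObjectSketch {
  std::map<std::string, std::string> inline_xattrs;  // stored as file xattrs
  std::map<std::string, std::string> omap_xattrs;    // spilled into omap
  bool listing_too_big = false;  // simulate _fgetattrs() failing with E2BIG

  // Simulated _fgetattrs(): list inline xattr names, or fail with -E2BIG.
  int get_inline_set(std::set<std::string>* out) const {
    if (listing_too_big)
      return -E2BIG;
    for (const auto& kv : inline_xattrs)
      out->insert(kv.first);
    return 0;
  }

  int remove_inline(const std::string& name) {
    return inline_xattrs.erase(name) ? 0 : -ENODATA;
  }

  void setattr(const std::string& name, const std::string& value) {
    std::set<std::string> inline_set;
    bool incomplete_inline = false;
    if (get_inline_set(&inline_set) == -E2BIG)
      incomplete_inline = true;  // inline_set is known to be incomplete

    if (incomplete_inline) {
      omap_xattrs[name] = value;  // force the new xattr into omap
      // It might still be present inline; ignore the error if it is not.
      (void)remove_inline(name);
    } else {
      // Simplified normal path: keep it inline and drop any omap copy.
      inline_xattrs[name] = value;
      omap_xattrs.erase(name);
    }
  }
};
```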
Loic Dachary [Tue, 3 Jun 2014 20:40:31 +0000 (22:40 +0200)]
erasure-code: rework ErasureCode*::parse methods
The ErasureCode::parse virtual function is overridden in
ErasureCode{Jerasure,Isa}. It is reworked so that the parsing of arguments
common to the various techniques is shared. The logic is otherwise unmodified.
Loic Dachary [Tue, 3 Jun 2014 15:25:20 +0000 (17:25 +0200)]
erasure-code: move to ErasureCode::to_{int,bool}
The parameter parser helpers to_int and to_bool are moved from
ErasureCode{Jerasure,Isa} to ErasureCode.
The prototype is modified to return a status instead of the value. An
error ostream is provided as the last argument because ErasureCode
cannot use dout() or derr().
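A minimal sketch of the "return a status, report errors to an ostream" shape. The names to_int_sketch/to_bool_sketch, the parameter order, falling back to the default on error, and the accepted boolean spellings are assumptions, not the exact ErasureCode signatures.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

using Profile = std::map<std::string, std::string>;

// Returns 0 or -EINVAL; the value goes through `value`, errors to `ss`.
int to_int_sketch(const std::string& name, const Profile& parameters,
                  int* value, int default_value, std::ostream* ss) {
  auto it = parameters.find(name);
  if (it == parameters.end() || it->second.empty()) {
    *value = default_value;
    return 0;
  }
  const std::string& s = it->second;
  char* end = nullptr;
  errno = 0;
  long v = std::strtol(s.c_str(), &end, 10);
  if (errno != 0 || end == s.c_str() || *end != '\0' ||
      v < INT_MIN || v > INT_MAX) {
    *ss << "could not convert " << name << "=" << s << " to int\n";
    *value = default_value;
    return -EINVAL;
  }
  *value = static_cast<int>(v);
  return 0;
}

int to_bool_sketch(const std::string& name, const Profile& parameters,
                   bool* value, bool default_value, std::ostream* ss) {
  auto it = parameters.find(name);
  if (it == parameters.end() || it->second.empty()) {
    *value = default_value;
    return 0;
  }
  const std::string& s = it->second;
  if (s == "true" || s == "yes") { *value = true; return 0; }
  if (s == "false" || s == "no") { *value = false; return 0; }
  *ss << "could not convert " << name << "=" << s << " to bool\n";
  *value = default_value;
  return -EINVAL;
}
```

Passing the ostream lets the caller decide where messages go (a log, a CLI reply, or a test buffer).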
Loic Dachary [Sat, 31 May 2014 22:16:59 +0000 (00:16 +0200)]
erasure-code: move to ErasureCode::minimum_to_decode*
The ErasureCode{Jerasure,Isa}::minimum_to_decode and
ErasureCode{Jerasure,Isa}::minimum_to_decode_with_cost methods are moved
verbatim to the ErasureCode base class.
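A minimal sketch of what such shared base-class methods can look like. The read-exactly-what-is-wanted-if-available rule, the fall back to any k available chunks, and the cost variant delegating by chunk id are assumptions about the moved code. ErasureCodeSketch and K2Sketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <map>
#include <set>

class ErasureCodeSketch {
 public:
  virtual ~ErasureCodeSketch() = default;
  virtual unsigned get_data_chunk_count() const = 0;  // k

  // If every wanted chunk is available, read exactly those; otherwise
  // any k available chunks are enough to decode.
  int minimum_to_decode(const std::set<int>& want_to_read,
                        const std::set<int>& available,
                        std::set<int>* minimum) const {
    if (std::includes(available.begin(), available.end(),
                      want_to_read.begin(), want_to_read.end())) {
      *minimum = want_to_read;
      return 0;
    }
    unsigned k = get_data_chunk_count();
    if (available.size() < k)
      return -EIO;
    auto it = available.begin();
    for (unsigned j = 0; j < k; ++j, ++it)
      minimum->insert(*it);
    return 0;
  }

  // Cost-aware variant that ignores the costs and delegates by chunk id.
  int minimum_to_decode_with_cost(const std::set<int>& want_to_read,
                                  const std::map<int, int>& available,
                                  std::set<int>* minimum) const {
    std::set<int> ids;
    for (const auto& kv : available)
      ids.insert(kv.first);
    return minimum_to_decode(want_to_read, ids, minimum);
  }
};

// Example plugin with k = 2 data chunks.
class K2Sketch : public ErasureCodeSketch {
 public:
  unsigned get_data_chunk_count() const override { return 2; }
};
```

With the logic in the base class, a plugin only supplies its chunk counts; the Jerasure and Isa copies no longer need to be kept in sync by hand.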