git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
11 years agomds: issue new caps before starting log entry 1641/head
Yan, Zheng [Fri, 11 Apr 2014 00:21:40 +0000 (08:21 +0800)]
mds: issue new caps before starting log entry

Locker::issue_new_caps() calls Locker::eval(), which may dispatch
other requests.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: guarantee message ordering when importing non-auth caps
Yan, Zheng [Thu, 10 Apr 2014 08:03:51 +0000 (16:03 +0800)]
mds: guarantee message ordering when importing non-auth caps

The current code allows importing non-auth caps while the inode is being
exported. This can break message ordering because the corresponding cap
import messages are sent after the flush session messages, so they can
arrive at clients after the clients have already received cap import
messages from the new auth MDS of the inode.

The quick fix is to ignore MExportCaps when the inode is frozen.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #1639 from ceph/wip-multimds
Sage Weil [Thu, 10 Apr 2014 04:19:42 +0000 (21:19 -0700)]
Merge pull request #1639 from ceph/wip-multimds

Wip multimds

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomds: include truncate_seq/truncate_size in filelock's state 1639/head
Yan, Zheng [Thu, 10 Apr 2014 02:56:18 +0000 (10:56 +0800)]
mds: include truncate_seq/truncate_size in filelock's state

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agomds: remove wrong assertion for remote frozen authpin
Yan, Zheng [Thu, 10 Apr 2014 03:09:28 +0000 (11:09 +0800)]
mds: remove wrong assertion for remote frozen authpin

For a cross-authority rename, the MDS first freezes the source inode's
authpin. This happens while the source dentry isn't locked, so by the
time the inode's authpin becomes frozen, the source dentry may have
changed and be linked to a different inode.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #1637 from ceph/wip-8042
Gregory Farnum [Thu, 10 Apr 2014 00:21:57 +0000 (17:21 -0700)]
Merge pull request #1637 from ceph/wip-8042

mon: fix election required_features checks

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #1636 from ceph/wip-6480
Sage Weil [Wed, 9 Apr 2014 23:25:24 +0000 (16:25 -0700)]
Merge pull request #1636 from ceph/wip-6480

fix auth races that may have led to qemu crashes

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomon: tell peers missing features during probe 1637/head
Sage Weil [Wed, 9 Apr 2014 23:03:05 +0000 (16:03 -0700)]
mon: tell peers missing features during probe

Use a new probe op to inform mons that they are missing features during
the earliest probe phase.  This prevents them from getting as far as
the sync entirely if they are too old.

We still need to refuse to speak to them if they try to call an election,
which they could do based on their replies from other peers.

Note that old clients will assert on getting a message type string they
don't understand, so we need to be careful not to send the probe reply
to older clients.  The feature bit we use is not precise in that it does
not cover recent dev releases, but it does work for dumpling and emperor.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomon: move required_features back into Monitor
Sage Weil [Wed, 9 Apr 2014 22:27:20 +0000 (15:27 -0700)]
mon: move required_features back into Monitor

This is simpler and cleaner.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomon: ignore sync clients without required_features
Sage Weil [Wed, 9 Apr 2014 21:40:44 +0000 (14:40 -0700)]
mon: ignore sync clients without required_features

If we let them sync data they don't understand they will get confused
and crash.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoauth: remove unused get_global_id() method 1636/head
Josh Durgin [Wed, 9 Apr 2014 21:23:32 +0000 (14:23 -0700)]
auth: remove unused get_global_id() method

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoauth: make AuthClientHandler::validate_ticket() protected
Josh Durgin [Wed, 9 Apr 2014 21:12:58 +0000 (14:12 -0700)]
auth: make AuthClientHandler::validate_ticket() protected

It's just used internally. Make it private in the subclasses since
there's just one level of inheritance.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoauth: AuthClientHandler const cleanup
Josh Durgin [Wed, 9 Apr 2014 21:11:49 +0000 (14:11 -0700)]
auth: AuthClientHandler const cleanup

get_protocol(), build_request(), build_rotating_request(), and
build_authorizer() can all be declared const now.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoauth: CephxProtocol const cleanup
Josh Durgin [Wed, 9 Apr 2014 21:09:33 +0000 (14:09 -0700)]
auth: CephxProtocol const cleanup

need_key() and build_authorizer() can be const.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoutime: declare is_zero(), ceph_timespec(), and sleep() as const
Josh Durgin [Wed, 9 Apr 2014 21:04:49 +0000 (14:04 -0700)]
utime: declare is_zero(), ceph_timespec(), and sleep() as const

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoauth: separate writes of build_request() into prepare_build_request()
Josh Durgin [Wed, 9 Apr 2014 21:04:15 +0000 (14:04 -0700)]
auth: separate writes of build_request() into prepare_build_request()

validate_tickets() updates internal state, as does
tickets.get_handler(). Move them into a new method called before
build_request() so build_request() can be declared const.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
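
The split described above can be sketched as follows. This is a minimal illustration, not the actual AuthClientHandler code: `TicketClient`, its fields, and the returned strings are hypothetical. It only shows the pattern of moving state updates into a separate non-const step so that the request builder itself can be const.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an auth handler; names and fields are
// illustrative only and do not mirror the Ceph classes.
class TicketClient {
  int need_ = 0;              // nonzero when a ticket still has to be requested
  bool have_ticket_ = false;

public:
  void set_ticket(bool have) { have_ticket_ = have; }

  // Mutating step: refresh internal state before a request is built.
  void prepare_build_request() { need_ = have_ticket_ ? 0 : 1; }

  // Read-only step: encodes a request from the state prepared above.
  std::string build_request() const {
    return need_ ? "get_ticket" : "noop";
  }
};
```

Callers are expected to invoke `prepare_build_request()` first; `build_request()` then only reads state and can be called through a const reference.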
11 years agoRWLock: make read locking methods const
Josh Durgin [Wed, 9 Apr 2014 20:15:32 +0000 (13:15 -0700)]
RWLock: make read locking methods const

This allows methods using RWLock for reading to be declared const.
There might be cases where we'd want to take a write lock in a const
method, but right now that's unnecessary, and I'd rather get a compile
error.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoRWLock: don't assign the lockdep id more than once
Josh Durgin [Wed, 9 Apr 2014 20:13:04 +0000 (13:13 -0700)]
RWLock: don't assign the lockdep id more than once

This never does anything since lockdep_register() assigns an id >= 0
in the RWLock constructor. This also prevents methods from being
declared const.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoauth: remove unused tick() method
Josh Durgin [Wed, 9 Apr 2014 19:56:34 +0000 (12:56 -0700)]
auth: remove unused tick() method

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoauth: add rwlock to AuthClientHandler to prevent races
Josh Durgin [Wed, 2 Apr 2014 00:27:01 +0000 (17:27 -0700)]
auth: add rwlock to AuthClientHandler to prevent races

For cephx, build_authorizer reads a bunch of state (especially the
current session_key) which can be updated by the MonClient. With no
locks held, Pipe::connect() calls SimpleMessenger::get_authorizer()
which ends up calling RadosClient::get_authorizer() and then
AuthClientHandler::build_authorizer(). This unsafe usage can lead to
crashes like:

Program terminated with signal 11, Segmentation fault.
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
370 common/buffer.cc: No such file or directory.
in common/buffer.cc
(gdb) bt
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
0x00007fa0d2ddec00 in ~ptr (this=0x7f989c03b830) at ./include/buffer.h:171
ceph::buffer::list::rebuild (this=0x7f989c03b830) at common/buffer.cc:817
0x00007fa0d2ddecb9 in ceph::buffer::list::c_str (this=0x7f989c03b830) at common/buffer.cc:1045
0x00007fa0d2ea4dc2 in Pipe::connect (this=0x7fa0c4307340) at msg/Pipe.cc:907
0x00007fa0d2ea7d73 in Pipe::writer (this=0x7fa0c4307340) at msg/Pipe.cc:1518
0x00007fa0d2eb44dd in Pipe::Writer::entry (this=<value optimized out>) at msg/Pipe.h:59
0x00007fa0e0f5f9d1 in start_thread (arg=0x7f987a5e4700) at pthread_create.c:301
0x00007fa0de560b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

and

Error in `qemu-system-x86_64': invalid fastbin entry (free): 0x00007ff12887ff20
*** ======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7ff3dea1fa46]
/usr/lib/librados.so.2(+0x29eb03)[0x7ff3e3d43b03]
/usr/lib/librados.so.2(_ZNK9CryptoKey7encryptEP11CephContextRKN4ceph6buffer4listERS4_RSs+0x71)[0x7ff3e3d42661]
/usr/lib/librados.so.2(_Z21encode_encrypt_enc_blIN4ceph6buffer4listEEvP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xfe)[0x7ff3e3d417de]
/usr/lib/librados.so.2(_Z14encode_encryptIN4ceph6buffer4listEEiP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xa2)[0x7ff3e3d41912]
/usr/lib/librados.so.2(_ZN19CephxSessionHandler12sign_messageEP7Message+0x242)[0x7ff3e3d40de2]
/usr/lib/librados.so.2(_ZN4Pipe6writerEv+0x92b)[0x7ff3e3e61b2b]
/usr/lib/librados.so.2(_ZN4Pipe6Writer5entryEv+0xd)[0x7ff3e3e6c7fd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7ff3ded6ff8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff3dea99a0d]

Fix this by adding an rwlock to AuthClientHandler. A simpler fix would
be to move RadosClient::get_authorizer() into the MonClient() under
the MonClient lock, but this would not catch all uses of other
Authorizer, e.g. for verify_authorizer() and it would serialize
independent connection attempts.

This mainly matters for cephx, but none and unknown can have the
global_id reset as well.

Partially-fixes: #6480
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
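
A rough sketch of the locking scheme described above, assuming C++17 and using `std::shared_mutex` in place of Ceph's RWLock. `AuthHandlerSketch` and its members are hypothetical; the point is only that key updates take the lock exclusively while authorizer builders share it, so independent connection attempts do not serialize against each other.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>

// Illustrative only: a handler whose session key may be replaced by one
// thread while other threads build authorizers from it.
class AuthHandlerSketch {
  mutable std::shared_mutex lock_;   // mutable so const readers can lock it
  std::string session_key_;

public:
  // Writer path (for example a key refresh): exclusive lock.
  void set_session_key(const std::string& k) {
    std::unique_lock<std::shared_mutex> l(lock_);
    session_key_ = k;
  }

  // Reader path: shared lock, so concurrent readers do not block each other.
  std::string build_authorizer() const {
    std::shared_lock<std::shared_mutex> l(lock_);
    return "authorizer:" + session_key_;
  }
};
```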
11 years agomon: refresh elector required_features when they change
Sage Weil [Wed, 9 Apr 2014 18:13:31 +0000 (11:13 -0700)]
mon: refresh elector required_features when they change

Currently we only refresh required_features on Elector::start().  This
does not prevent an old peer from calling an election (even though they
won't succeed in joining the resulting quorum).

Fix this by updating the elector's features when they change.  This way we
don't allow a useless election cycle just to trigger that update in
start().

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomon/Elector: ignore ACK from peers without required features
Sage Weil [Wed, 9 Apr 2014 18:09:14 +0000 (11:09 -0700)]
mon/Elector: ignore ACK from peers without required features

If an old peer gets a PROPOSE from us, we need to be sure to ignore their
ACK.  Ignoring their PROPOSEs isn't sufficient to keep them out of a
quorum.

Fixes: #8042
Signed-off-by: Sage Weil <sage@inktank.com>
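
The check described above might look roughly like this. `ElectorSketch` and `has_required_features()` are hypothetical names, and the feature values are plain bitmasks chosen for illustration rather than real Ceph feature bits.

```cpp
#include <cassert>
#include <cstdint>

// True when the peer advertises every bit set in `required`.
inline bool has_required_features(uint64_t peer_features, uint64_t required) {
  return (peer_features & required) == required;
}

// Hypothetical elector state: counts only ACKs from capable peers.
struct ElectorSketch {
  uint64_t required_features = 0;
  int acked = 0;

  void handle_ack(uint64_t peer_features) {
    if (!has_required_features(peer_features, required_features))
      return;  // ignore the ACK so the old peer stays out of the quorum
    ++acked;
  }
};
```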
11 years agoMerge pull request #1626 from ceph/wip-8031
Samuel Just [Wed, 9 Apr 2014 17:37:26 +0000 (10:37 -0700)]
Merge pull request #1626 from ceph/wip-8031

osd: improve misdirected op checks

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1627 from ceph/wip-8001
Samuel Just [Wed, 9 Apr 2014 17:34:54 +0000 (10:34 -0700)]
Merge pull request #1627 from ceph/wip-8001

osd/PG: set CREATING pg state bit until we peer for the first time

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1631 from ceph/wip-8045
Samuel Just [Wed, 9 Apr 2014 17:34:07 +0000 (10:34 -0700)]
Merge pull request #1631 from ceph/wip-8045

osd: fix check_osdmap_features deadlock

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1632 from ceph/wip-5469
Sage Weil [Wed, 9 Apr 2014 15:14:28 +0000 (08:14 -0700)]
Merge pull request #1632 from ceph/wip-5469

librbd: fix zero length request handling

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1634 from ceph/wip-8028
Alfredo Deza [Wed, 9 Apr 2014 14:12:11 +0000 (10:12 -0400)]
Merge pull request #1634 from ceph/wip-8028

rpm: add redhat-lsb dependency

Reviewed-by: Alfredo Deza <alfredo.deza@inktank.com>
11 years agoceph.spec.in: require redhat-lsb-core 1634/head
Sage Weil [Wed, 9 Apr 2014 14:05:36 +0000 (07:05 -0700)]
ceph.spec.in: require redhat-lsb-core

We need this for /lib/lsb/init-functions.

Fixes: #8028
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1606 from ceph/wip-shrink-icache
Sage Weil [Wed, 9 Apr 2014 13:55:33 +0000 (06:55 -0700)]
Merge pull request #1606 from ceph/wip-shrink-icache

client: try shrinking kernel inode cache when trimming session caps

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1633 from ceph/wip-8004
Sage Weil [Wed, 9 Apr 2014 03:48:12 +0000 (20:48 -0700)]
Merge pull request #1633 from ceph/wip-8004

client: wake up umount waiter if receiving session open message

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoclient: wake up umount waiter if receiving session open message 1633/head
Yan, Zheng [Wed, 9 Apr 2014 03:22:04 +0000 (11:22 +0800)]
client: wake up umount waiter if receiving session open message

Wake up umount waiter if receiving session open message while
umounting. The umount waiter will re-close the session.

Fixes: #8004
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agolibrbd: fix zero length request handling 1632/head
Josh Durgin [Wed, 9 Apr 2014 00:38:50 +0000 (17:38 -0700)]
librbd: fix zero length request handling

Zero-length writes would hang because the completion was never
called. Reads would hit an assert about zero length in
Striper::file_to_extents().

Fix all of these cases by skipping zero-length extents. The completion
is created and finished when finish_adding_requests() is called. This
is slightly different from usual completions since it comes from the
same thread as the one scheduling the request, but zero-length aio
requests should never happen from things that might care about this,
like QEMU.

Writes and discards have had this bug since the beginning of
librbd. Reads might have avoided it until stripingv2 was added.

Fixes: #5469
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
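
A simplified sketch of the behaviour described above, not the librbd implementation: `CompletionSketch` and `submit()` are hypothetical. Zero-length extents are skipped, and the completion fires from `finish_adding_requests()` when nothing was issued.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical completion: fires once the caller has finished adding
// requests and no sub-request is still pending.
struct CompletionSketch {
  int pending = 0;
  bool building = true;
  bool done = false;

  void add_request() { ++pending; }
  void complete_request() { --pending; maybe_finish(); }
  void finish_adding_requests() { building = false; maybe_finish(); }

private:
  void maybe_finish() { if (!building && pending == 0) done = true; }
};

// Issues one sub-request per non-empty (offset, length) extent and
// returns how many were issued.
inline int submit(const std::vector<std::pair<uint64_t, uint64_t>>& extents,
                  CompletionSketch& c) {
  int issued = 0;
  for (const auto& e : extents) {
    if (e.second == 0)
      continue;  // skip zero-length extents instead of asserting or hanging
    c.add_request();
    ++issued;
  }
  c.finish_adding_requests();  // completes immediately if nothing was issued
  return issued;
}
```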
11 years agoosd: do not block when updating osdmap superblock features 1631/head
Sage Weil [Wed, 9 Apr 2014 00:28:54 +0000 (17:28 -0700)]
osd: do not block when updating osdmap superblock features

We are holding osd_lock in check_osdmap_features, which means we cannot
block while waiting for filestore operations to flush/apply without
risking deadlock.

The important constraint is that we commit that the feature is enabled
before also committing anything that utilizes sharded objects.  The normal
commit sequencing does that already; there is no reason to block here.

Fixes: #8045
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agodoc: Made minor changes to quick start preflight for RHEL.
John Wilkins [Tue, 8 Apr 2014 22:54:17 +0000 (15:54 -0700)]
doc: Made minor changes to quick start preflight for RHEL.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agodoc: Notes and minor modifications to gateway installation doc.
John Wilkins [Tue, 8 Apr 2014 22:53:32 +0000 (15:53 -0700)]
doc: Notes and minor modifications to gateway installation doc.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agopipe: only read AuthSessionHandler under pipe_lock
Josh Durgin [Tue, 1 Apr 2014 18:37:29 +0000 (11:37 -0700)]
pipe: only read AuthSessionHandler under pipe_lock

session_security, the AuthSessionHandler for a Pipe, is deleted and
recreated while the pipe_lock is held. read_message() is called
without pipe_lock held, and examines session_security. To make this
safe, make session_security a shared_ptr and take a reference to it
while the pipe_lock is still held, and use that shared_ptr in
read_message().

This may have caused crashes like:

*** Error in `qemu-system-x86_64': invalid fastbin entry (free): 0x00007f42a4002de0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7f452f1f3a46]
/usr/lib/x86_64-linux-gnu/libnss3.so(PK11_FreeSymKey+0xa8)[0x7f452e72ff98]
/usr/lib/librados.so.2(+0x2a18cd)[0x7f453451a8cd]
/usr/lib/librados.so.2(_ZNK9CryptoKey7encryptEP11CephContextRKN4ceph6buffer4listERS4_RSs+0x71)[0x7f4534519421]
/usr/lib/librados.so.2(_Z21encode_encrypt_enc_blIN4ceph6buffer4listEEvP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xfe)[0x7f453451859e]
/usr/lib/librados.so.2(_Z14encode_encryptIN4ceph6buffer4listEEiP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xa2)[0x7f45345186d2]
/usr/lib/librados.so.2(_ZN19CephxSessionHandler23check_message_signatureEP7Message+0x246)[0x7f4534516866]
/usr/lib/librados.so.2(_ZN4Pipe12read_messageEPP7Message+0xdcc)[0x7f453462ecbc]
/usr/lib/librados.so.2(_ZN4Pipe6readerEv+0xa5c)[0x7f453464059c]
/usr/lib/librados.so.2(_ZN4Pipe6Reader5entryEv+0xd)[0x7f4534643ecd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7f452f543f8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f452f26da0d]

Partially-fixes: #6480
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
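
The pattern described above, sketched with hypothetical types (`PipeSketch`, `SessionHandlerSketch`) rather than the real Pipe code: the reader copies the shared_ptr while the lock is held, so the handler it uses stays alive even if another thread replaces it afterwards.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>

// Illustrative handler; check() stands in for signature verification.
struct SessionHandlerSketch {
  std::string key;
  bool check(const std::string& sig) const { return sig == key; }
};

class PipeSketch {
  std::mutex pipe_lock_;
  std::shared_ptr<SessionHandlerSketch> session_security_;

public:
  // Deletes and recreates the handler while holding the lock.
  void reset_handler(const std::string& key) {
    std::lock_guard<std::mutex> l(pipe_lock_);
    session_security_ =
        std::make_shared<SessionHandlerSketch>(SessionHandlerSketch{key});
  }

  // Takes a reference under the lock; the caller may keep using the
  // returned pointer after the lock has been released.
  std::shared_ptr<SessionHandlerSketch> get_handler() {
    std::lock_guard<std::mutex> l(pipe_lock_);
    return session_security_;
  }
};
```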
11 years agoMerge pull request #1628 from ceph/wip-5835
Josh Durgin [Tue, 8 Apr 2014 21:47:21 +0000 (14:47 -0700)]
Merge pull request #1628 from ceph/wip-5835

update package descriptions

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agodebian: update ceph description 1628/head
Sage Weil [Tue, 8 Apr 2014 21:19:38 +0000 (14:19 -0700)]
debian: update ceph description

Fixes: #5835
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoceph.spec: update ceph description
Sage Weil [Tue, 8 Apr 2014 21:18:44 +0000 (14:18 -0700)]
ceph.spec: update ceph description

Fixes: #5835
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1625 from ceph/wip-8019
Samuel Just [Tue, 8 Apr 2014 19:45:28 +0000 (12:45 -0700)]
Merge pull request #1625 from ceph/wip-8019

osd: fix journal umount/mount weirdness

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoosd/PG: set CREATING pg state bit until we peer for the first time 1627/head
Sage Weil [Tue, 8 Apr 2014 19:26:19 +0000 (12:26 -0700)]
osd/PG: set CREATING pg state bit until we peer for the first time

We send PG state updates to the monitor while creating a PG before the
actual creation has been finalized and persisted.  Because those updates
do not include the CREATING bit, the mon will remove the pgid from its
creating set.  If the OSD(s) crash before persisting that PG creation, the
PG will never get created.

Fix this by leaving the CREATING bit set on the primary as long as
last_epoch_started==0.  That is, until we successfully peer for the very
first time.  Only then do we clear the bit and tell the monitor its duty
is complete.

Fixes: #8001
Signed-off-by: Sage Weil <sage@inktank.com>
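
The rule above can be written as a small predicate. This is a sketch under assumptions: the bit values and the `reported_state()` helper are hypothetical and not taken from the OSD code.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative state bits; the values are assumptions.
constexpr uint32_t STATE_CREATING = 1u << 0;
constexpr uint32_t STATE_ACTIVE   = 1u << 1;

// State to report to the monitor: CREATING stays set on the primary
// until the PG has peered successfully at least once.
inline uint32_t reported_state(uint32_t state, bool is_primary,
                               uint64_t last_epoch_started) {
  if (is_primary && last_epoch_started == 0)
    return state | STATE_CREATING;
  return state & ~STATE_CREATING;
}
```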
11 years agoos/FileStore: reset journal state on umount 1625/head
Sage Weil [Tue, 8 Apr 2014 17:52:43 +0000 (10:52 -0700)]
os/FileStore: reset journal state on umount

We observed a sequence like:

 - replay journal
   - sets JournalingObjectStore applied_op_seq
 - umount
 - mount
   - initiate commit with previous applied_op_seq
 - replay journal
   - commit finishes
   - on replay commit, we fail assert op > committed_seq

Although strictly speaking the assert failure is harmless here, in general
we should not let state leak through from a previous mount into this
mount or else assertions are in general more difficult to reason about.

Fixes: #8019
Signed-off-by: Sage Weil <sage@inktank.com>
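
A toy model of the fix described above. `StoreSketch` and `JournalSeqSketch` are hypothetical and far simpler than FileStore; the sketch only shows sequencing state being reset on umount so that a later mount and replay start from a clean slate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-mount sequencing state; not the FileStore's real fields.
struct JournalSeqSketch {
  uint64_t applied_seq = 0;
  uint64_t committed_seq = 0;
};

class StoreSketch {
  JournalSeqSketch seq_;
  bool mounted_ = false;

public:
  void mount() { mounted_ = true; }

  // Replaying an op must move past what has already been committed.
  void replay_op(uint64_t op) {
    assert(op > seq_.committed_seq);
    seq_.applied_seq = op;
  }

  void commit() { seq_.committed_seq = seq_.applied_seq; }

  void umount() {
    mounted_ = false;
    seq_ = JournalSeqSketch();  // do not leak state into the next mount
  }

  uint64_t applied() const { return seq_.applied_seq; }
  uint64_t committed() const { return seq_.committed_seq; }
};
```

Without the reset in `umount()`, replaying op 5 again after a remount would trip the assertion in `replay_op()`.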
11 years agovstart.sh: make crush location match up with what init-ceph does
Sage Weil [Tue, 8 Apr 2014 17:58:53 +0000 (10:58 -0700)]
vstart.sh: make crush location match up with what init-ceph does

This makes it so that ./init-ceph restart osd.0 won't modify the CRUSH
tree.  And in any case, the localhost/localrack thing we were doing before
was pretty useless.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1623 from ceph/wip-8026
Gregory Farnum [Tue, 8 Apr 2014 17:43:14 +0000 (10:43 -0700)]
Merge pull request #1623 from ceph/wip-8026

mds: fix shared_ptr MDRequest bugs

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #1621 from dachary/wip-7914
Sage Weil [Tue, 8 Apr 2014 17:14:46 +0000 (10:14 -0700)]
Merge pull request #1621 from dachary/wip-7914

erasure-code: thread-safe initialization of gf-complete

This looks like a good interim solution until gf-complete exposes a simpler init function
that hides this.

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoosd: drop unused same_for_*() helpers 1626/head
Sage Weil [Tue, 8 Apr 2014 16:01:14 +0000 (09:01 -0700)]
osd: drop unused same_for_*() helpers

These were all identical and mostly served to obscure the actual logic,
which is now captured by can_discard_op() and the matching Objecter
code on the client side.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd: drop previous interval ops even if primary happens to be the same
Sage Weil [Tue, 8 Apr 2014 16:00:11 +0000 (09:00 -0700)]
osd: drop previous interval ops even if primary happens to be the same

If we have two consecutive intervals with the same primary, the client
will not resend the op and the same_primary_since epoch will not change,
and all is well.

If, however, we have 3 intervals, and the primary changes away and then
back to a particular OSD, the OSD will currently still process the old
request (assuming the timing works out) because it is currently the
primary.  This is unnecessary because the client will resend the request.
It may even introduce a hard-to-hit ordering problem since whether or not
the OSD processes the message becomes dependent on how many subsequent
maps it has consumed when the request is processed.

Instead, simplify the minor tangle of helpers by making a single simple
check that discards requests from before same_primary_since.  We can then
avoid using the same_for_*() helpers and drop the check from
handle_misdirected_op(), which is also nice because the name is now accurate
(it *only* deals with ops that are in fact misdirected, not just slow to
arrive).

Signed-off-by: Sage Weil <sage@inktank.com>
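
The single check described above reduces to an epoch comparison. The helper name below is hypothetical; the real logic lives in can_discard_op(), per the commit that follows this one in the log.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = uint32_t;

// An op sent in an epoch older than the start of the current primary
// interval can be dropped, because the client will resend it.
inline bool sent_before_current_interval(epoch_t op_epoch,
                                         epoch_t same_primary_since) {
  return op_epoch < same_primary_since;
}
```

In the three-interval case from the message, an op sent at epoch 10 reaches an OSD that became primary again at epoch 30, so the op is discarded even though that OSD is once more the primary.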
11 years agoosd: make misdirected checks explicit about replicas, flags
Sage Weil [Tue, 8 Apr 2014 15:52:43 +0000 (08:52 -0700)]
osd: make misdirected checks explicit about replicas, flags

Only allow read ops to target replicas if the necessary op flags are set.
The previous checks were very sloppy.

Fixes: #8031
Signed-off-by: Sage Weil <sage@inktank.com>
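
One way to picture the explicit check, as a sketch only: the flag names and values below are assumptions made for illustration, and `op_may_target()` is a hypothetical helper rather than the OSD's actual function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical op flags (values are assumptions for illustration).
constexpr uint32_t FLAG_WRITE          = 1u << 0;
constexpr uint32_t FLAG_BALANCE_READS  = 1u << 1;
constexpr uint32_t FLAG_LOCALIZE_READS = 1u << 2;

// The primary accepts any op. A replica accepts only reads that carry
// one of the flags explicitly permitting replica reads.
inline bool op_may_target(bool is_primary, bool is_replica, uint32_t flags) {
  if (is_primary) return true;
  if (!is_replica) return false;
  if (flags & FLAG_WRITE) return false;
  return (flags & (FLAG_BALANCE_READS | FLAG_LOCALIZE_READS)) != 0;
}
```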
11 years agomds: fix shared_ptr MDRequest bugs 1623/head
Yan, Zheng [Tue, 8 Apr 2014 08:11:03 +0000 (16:11 +0800)]
mds: fix shared_ptr MDRequest bugs

The main change is to use shared_ptr instead of weak_ptr to define the
active request map. The reason is that a slave request needs to be
preserved until the master explicitly finishes it.

Fixes: #8026
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
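
The difference between the two map types can be shown with a small sketch. `RequestSketch`, `WeakTable`, and `StrongTable` are hypothetical and unrelated to the real MDRequest types.

```cpp
#include <cassert>
#include <map>
#include <memory>

struct RequestSketch { int id; };
using RequestRef = std::shared_ptr<RequestSketch>;

// A weak map does not keep requests alive: once the last outside
// reference goes away, the entry expires.
struct WeakTable {
  std::map<int, std::weak_ptr<RequestSketch>> active;
  void add(const RequestRef& r) { active[r->id] = r; }
  bool alive(int id) const {
    auto it = active.find(id);
    return it != active.end() && !it->second.expired();
  }
};

// A strong map owns a reference until the entry is explicitly erased.
struct StrongTable {
  std::map<int, RequestRef> active;
  void add(const RequestRef& r) { active[r->id] = r; }
  void finish(int id) { active.erase(id); }
  bool alive(int id) const { return active.count(id) != 0; }
};
```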
11 years agoerasure-code: thread-safe initialization of gf-complete 1621/head
Loic Dachary [Mon, 7 Apr 2014 22:20:29 +0000 (00:20 +0200)]
erasure-code: thread-safe initialization of gf-complete

Instead of relying on an implicit initialization happening during
encoding/decoding with galois.c:galois_init_default_field, call
gf.c:gf_init_easy for each w value when the plugin is loaded.

Loading the plugin is protected against race conditions by a lock.

It does not cover all possible uses of gf-complete but it is enough for
the ceph jerasure plugin.

http://tracker.ceph.com/issues/7914 fixes #7914

Signed-off-by: Loic Dachary <loic@dachary.org>
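
The approach described above, eager initialization performed once under the plugin-load lock, might be sketched like this. `FieldTables`, `PluginRegistrySketch`, and the word sizes 4, 8, 16, 32 are assumptions for illustration; the sketch does not call gf-complete.

```cpp
#include <cassert>
#include <initializer_list>
#include <mutex>
#include <set>

// Hypothetical stand-in for per-word-size table setup in a GF library.
class FieldTables {
  std::set<int> initialized_;

public:
  void init(int w) { initialized_.insert(w); }
  bool ready(int w) const { return initialized_.count(w) != 0; }
};

class PluginRegistrySketch {
  std::mutex lock_;
  bool loaded_ = false;
  FieldTables tables_;

public:
  // Initializes every word size under the lock. Returns true only for
  // the call that actually performed the load.
  bool load() {
    std::lock_guard<std::mutex> l(lock_);
    if (loaded_) return false;
    for (int w : {4, 8, 16, 32})  // assumed set of w values
      tables_.init(w);
    loaded_ = true;
    return true;
  }

  bool ready(int w) const { return tables_.ready(w); }
};
```

Because all tables exist before any encode or decode call, later users never trigger a lazy, unsynchronized initialization.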
11 years agoMerge pull request #1610 from ceph/wip-4354-shared_ptr
Sage Weil [Tue, 8 Apr 2014 04:27:50 +0000 (21:27 -0700)]
Merge pull request #1610 from ceph/wip-4354-shared_ptr

Use shared pointers for Mutations/OpRequests in the MDS

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1594 from ceph/wip-7958
Sage Weil [Tue, 8 Apr 2014 04:27:04 +0000 (21:27 -0700)]
Merge pull request #1594 from ceph/wip-7958

wip 7958

Passed sage-2014-04-07_07:04:02-fs-wip-7958-testing-basic-plana.

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoosd_types: fix pg_stat_t::encode, object_stat_sum_t::decode version
Samuel Just [Mon, 7 Apr 2014 23:40:09 +0000 (16:40 -0700)]
osd_types: fix pg_stat_t::encode, object_stat_sum_t::decode version

Introduced in a130a4452e4fb159dc62fb417077d98dc9ebd621
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoSimpleLock: Switch MutationRef& for MutationRef in get_xlock() 1610/head
Greg Farnum [Wed, 12 Mar 2014 20:14:56 +0000 (13:14 -0700)]
SimpleLock: Switch MutationRef& for MutationRef in get_xlock()

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMDCache: use raw MutationImpl* instead of MutationRef in a few places
Greg Farnum [Thu, 13 Mar 2014 03:50:19 +0000 (20:50 -0700)]
MDCache: use raw MutationImpl* instead of MutationRef in a few places

Avoid the atomic ops necessary when copying a shared_ptr.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoLocker: use raw MutationImpl* instead of MutationRef in several places
Greg Farnum [Wed, 12 Mar 2014 20:03:26 +0000 (13:03 -0700)]
Locker: use raw MutationImpl* instead of MutationRef in several places

Sadly, you can't implicitly convert non-const references to shared pointers, so avoid the atomic ops necessary when copying a shared_ptr.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoLocker: use a null_ref instead of NULL
Greg Farnum [Wed, 12 Mar 2014 21:20:52 +0000 (14:20 -0700)]
Locker: use a null_ref instead of NULL

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoLocker: Use MutationRef instead of raw pointers
Greg Farnum [Wed, 12 Mar 2014 17:53:16 +0000 (10:53 -0700)]
Locker: Use MutationRef instead of raw pointers

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoLocker: remove Mutation param from xlock_import
Greg Farnum [Mon, 7 Apr 2014 23:05:49 +0000 (16:05 -0700)]
Locker: remove Mutation param from xlock_import

It's not used.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMDCache: fix users of active_requests for use of shared_ptr
Greg Farnum [Thu, 13 Mar 2014 03:20:08 +0000 (20:20 -0700)]
MDCache: fix users of active_requests for use of shared_ptr

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMDCache: use a null_ref instead of NULL in a few places
Greg Farnum [Wed, 12 Mar 2014 20:43:20 +0000 (13:43 -0700)]
MDCache: use a null_ref instead of NULL in a few places

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMDCache: use MutationRef instead of raw pointers
Greg Farnum [Wed, 12 Mar 2014 17:33:57 +0000 (10:33 -0700)]
MDCache: use MutationRef instead of raw pointers

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoServer: use MutationRef instead of raw pointer
Greg Farnum [Wed, 12 Mar 2014 16:48:04 +0000 (09:48 -0700)]
Server: use MutationRef instead of raw pointer

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMDS: switch cache object classes to use MutationRef instead of raw pointers
Greg Farnum [Wed, 12 Mar 2014 16:42:45 +0000 (09:42 -0700)]
MDS: switch cache object classes to use MutationRef instead of raw pointers

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoceph_test_rados_api_misc: print osd_max_attr_size
Sage Weil [Mon, 7 Apr 2014 23:31:16 +0000 (16:31 -0700)]
ceph_test_rados_api_misc: print osd_max_attr_size

Very confusing results from this test in bug #8009.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1612 from ceph/wip-7919
Sage Weil [Mon, 7 Apr 2014 23:11:51 +0000 (16:11 -0700)]
Merge pull request #1612 from ceph/wip-7919

mon: MonCommands: have all 'auth' commands require 'execute' caps

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1620 from ceph/wip-8003
Sage Weil [Mon, 7 Apr 2014 23:09:40 +0000 (16:09 -0700)]
Merge pull request #1620 from ceph/wip-8003

Wip 8003

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1611 from ceph/wip-7975
Sage Weil [Mon, 7 Apr 2014 22:59:37 +0000 (15:59 -0700)]
Merge pull request #1611 from ceph/wip-7975

osd: disable agent when stats_invalid (post-split)

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agodoc: Removed --stable arg and replaced with --release arg for ceph-deploy.
John Wilkins [Mon, 7 Apr 2014 22:49:09 +0000 (15:49 -0700)]
doc: Removed --stable arg and replaced with --release arg for ceph-deploy.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoosd/ReplicatedPG: warn if invalid stats prevent us from activating agent 1611/head
Sage Weil [Mon, 7 Apr 2014 22:39:59 +0000 (15:39 -0700)]
osd/ReplicatedPG: warn if invalid stats prevent us from activating agent

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: dump agent state on pg query
Sage Weil [Mon, 7 Apr 2014 22:34:53 +0000 (15:34 -0700)]
osd/ReplicatedPG: dump agent state on pg query

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd/ReplicatedPG: kickstart the agent if scrub stats become valid
Sage Weil [Mon, 7 Apr 2014 22:21:01 +0000 (15:21 -0700)]
osd/ReplicatedPG: kickstart the agent if scrub stats become valid

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge tag 'v0.79' into firefly
Sage Weil [Mon, 7 Apr 2014 22:04:18 +0000 (15:04 -0700)]
Merge tag 'v0.79' into firefly

v0.79

11 years agoMerge pull request #1619 from ceph/wip-7659
Samuel Just [Mon, 7 Apr 2014 21:47:40 +0000 (14:47 -0700)]
Merge pull request #1619 from ceph/wip-7659

Wip 7659

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
11 years agoReplicatedPG: do not evict head while clone is being promoted 1620/head
Samuel Just [Sun, 6 Apr 2014 20:38:52 +0000 (13:38 -0700)]
ReplicatedPG: do not evict head while clone is being promoted

Fixes: #8003
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoReplicatedPG::trim_object: account evicted prev clone for stats
Samuel Just [Mon, 7 Apr 2014 00:49:20 +0000 (17:49 -0700)]
ReplicatedPG::trim_object: account evicted prev clone for stats

If the previous clone is evicted, we shouldn't adjust the stats to
account for its new clone_overlap value.

Fixes: #7964
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoReplicatedPG::make_writeable: check for evicted clone before adjusting for clone_overlap
Samuel Just [Sun, 6 Apr 2014 23:30:25 +0000 (16:30 -0700)]
ReplicatedPG::make_writeable: check for evicted clone before adjusting for clone_overlap

Fixes: #7964
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1617 from ceph/wip-7904
Sage Weil [Mon, 7 Apr 2014 21:02:58 +0000 (14:02 -0700)]
Merge pull request #1617 from ceph/wip-7904

Wip 7904

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1614 from ceph/wip-7964
Sage Weil [Mon, 7 Apr 2014 21:01:58 +0000 (14:01 -0700)]
Merge pull request #1614 from ceph/wip-7964

Wip 7964

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1616 from ceph/wip-7916
Sage Weil [Mon, 7 Apr 2014 20:59:22 +0000 (13:59 -0700)]
Merge pull request #1616 from ceph/wip-7916

ReplicatedPG: improve get_object_context debugging

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoReplicatedPG: use get_clone_bytes on evict/promote
Samuel Just [Sun, 6 Apr 2014 19:29:56 +0000 (12:29 -0700)]
ReplicatedPG: use get_clone_bytes on evict/promote

Fixes: #7964
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoReplicatedPG::_scrub: account for clone_overlap on each clone
Samuel Just [Sun, 6 Apr 2014 19:23:52 +0000 (12:23 -0700)]
ReplicatedPG::_scrub: account for clone_overlap on each clone

Otherwise, we end up subtracting off clone_overlap for evicted clones
whose sizes we did not add in.

Fixes: #7964
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoReplicatedPG::find_object_context: check obs.exists on clone obc before checking...
Samuel Just [Sun, 6 Apr 2014 18:22:04 +0000 (11:22 -0700)]
ReplicatedPG::find_object_context: check obs.exists on clone obc before checking snaps

Fixes: #7858
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoReplicatedPG::finish_promote: add debugging assert for clone_size
Samuel Just [Fri, 4 Apr 2014 20:53:22 +0000 (13:53 -0700)]
ReplicatedPG::finish_promote: add debugging assert for clone_size

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #1613 from ceph/wip-7994
Sage Weil [Mon, 7 Apr 2014 17:57:33 +0000 (10:57 -0700)]
Merge pull request #1613 from ceph/wip-7994

OSD: _share_map_outgoing whenever sending a message to a peer

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoqa: workunits: mon: auth_caps.sh: test 'auth' caps requirements 1612/head
Joao Eduardo Luis [Mon, 7 Apr 2014 17:30:56 +0000 (18:30 +0100)]
qa: workunits: mon: auth_caps.sh: test 'auth' caps requirements

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agomon: MonCommands: have all 'auth' commands require 'execute' caps
Joao Eduardo Luis [Mon, 7 Apr 2014 17:17:54 +0000 (18:17 +0100)]
mon: MonCommands: have all 'auth' commands require 'execute' caps

An earlier patch already made the entity require 'execute' caps for
read-only commands.  This patch introduces the same requirement for *all*
auth commands, read-only and read-write alike.

While the rationale behind the earlier patch for leaving read-write
operations out of this requirement still holds, we now enforce it to
stay compatible with Dumpling's handling of the 'execute' cap for auth
commands.  Note, however, that Dumpling required only the 'execute' cap
for auth commands, whether read-only or read-write, and no other caps.

Fixes: #7919
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
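
The cap requirement described in the commit above can be sketched as a small bitmask check. This is an illustration, not the MonCommands table: the `Cap` bits, `required_caps` and `is_allowed` are hypothetical names, and the assumption that read-only commands need 'r' while read-write commands need 'rw' is made here for the example only.

```cpp
// Hypothetical cap bits, for illustration only.
enum Cap : unsigned {
  CAP_R = 1u,  // read
  CAP_W = 2u,  // write
  CAP_X = 4u   // execute
};

// Caps a command needs. Assumption: read-only commands need 'r' and
// read-write commands need 'rw'; every auth command additionally
// needs 'x', whether it is read-only or read-write.
unsigned required_caps(bool is_auth_command, bool is_read_only) {
  unsigned need = is_read_only ? CAP_R : (CAP_R | CAP_W);
  if (is_auth_command) {
    need |= CAP_X;
  }
  return need;
}

// An entity may run the command only if it holds every required bit.
bool is_allowed(unsigned held, unsigned need) {
  return (held & need) == need;
}
```

Under these assumptions, an entity holding only 'rw' can no longer run any auth command, and one holding only 'x' (which the commit message says was enough on Dumpling) is rejected as well.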
11 years ago0.79 v0.79
Jenkins [Mon, 7 Apr 2014 16:48:36 +0000 (16:48 +0000)]
0.79

11 years agomds: fix uninit val in MMDSSlaveRequest
Sage Weil [Mon, 7 Apr 2014 03:26:39 +0000 (20:26 -0700)]
mds: fix uninit val in MMDSSlaveRequest

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1607 from ceph/wip-7997
Sage Weil [Mon, 7 Apr 2014 15:11:00 +0000 (08:11 -0700)]
Merge pull request #1607 from ceph/wip-7997

mon: wait for quorum for MMonGetVersion

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agoclient: pin parent dentry of inode who has ll_ref > 0 1606/head
Yan, Zheng [Sat, 5 Apr 2014 14:16:08 +0000 (22:16 +0800)]
client: pin parent dentry of inode who has ll_ref > 0

This prevents Client::trim_dentry() from unlinking the parent dentry of a
directory inode referenced by the FUSE kernel module.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoMerge pull request #1609 from ceph/wip-7739
Sage Weil [Mon, 7 Apr 2014 00:56:05 +0000 (17:56 -0700)]
Merge pull request #1609 from ceph/wip-7739

mds: fix some uninitialized message fields

Reviewed-by: Zheng Yan <zheng.z.yan@intel.com>
11 years agomds: fix uninit MMDSSlaveRequest lock_type 1609/head
Sage Weil [Mon, 7 Apr 2014 00:36:38 +0000 (17:36 -0700)]
mds: fix uninit MMDSSlaveRequest lock_type

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1608 from ceph/wip-8002
Samuel Just [Sun, 6 Apr 2014 23:32:38 +0000 (16:32 -0700)]
Merge pull request #1608 from ceph/wip-8002

osd: fix osd map subscribe on YOU_DIED osd_ping

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoosd: fix map subscription in YOU_DIED osd_ping handler 1608/head
Sage Weil [Sun, 6 Apr 2014 23:03:50 +0000 (16:03 -0700)]
osd: fix map subscription in YOU_DIED osd_ping handler

If we have epoch X and find out we died as of epoch Y, we still want to
request X+1.  Among other things, this fixes a 'stall' if Y happens to be
the most recent map published and no new maps are generated because we will
never get anything back from our subscription.

This makes this osdmap_subscribe() caller match every other caller by
passing in current epoch + 1.

Fixes: #8002
Signed-off-by: Sage Weil <sage@inktank.com>
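
The epoch choice described in the commit above can be sketched with a toy model. Only `osdmap_subscribe()` is named in the commit; `subscribe_start` and `subscription_delivers` are hypothetical helpers, and the model assumes a subscription yields maps only when its start epoch is not past the newest published epoch.

```cpp
#include <cstdint>

using epoch_t = std::uint32_t;

// Hypothetical helper: the epoch to subscribe from. The epoch at which
// we were reported dead is deliberately ignored; we always ask for the
// map after the one we currently hold.
epoch_t subscribe_start(epoch_t current_epoch, epoch_t /*died_epoch*/) {
  return current_epoch + 1;
}

// Toy model (an assumption, not the monitor's logic): a subscription
// starting at `start` delivers something only if a map with that epoch
// or later has already been published.
bool subscription_delivers(epoch_t start, epoch_t latest_published) {
  return start <= latest_published;
}
```

With X = 10 and Y = 15 as the newest published map, a subscription starting at 11 delivers maps in this model, while one starting past 15 would wait until a newer map is generated.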
11 years agomsgr: add ms_dump_on_send option
Sage Weil [Wed, 2 Apr 2014 15:49:33 +0000 (08:49 -0700)]
msgr: add ms_dump_on_send option

This is useful only for debugging.  The encoded contents of a message are
dumped to the log on message send.  This is useful when valgrind is
triggering warnings about uninitialized memory in messages because the
call chain will indicate which message type is to blame, whereas the
usual writer thread context does not tell us any useful information.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomds: fix uninitialized fields in MDiscover
Sage Weil [Sun, 6 Apr 2014 20:18:40 +0000 (13:18 -0700)]
mds: fix uninitialized fields in MDiscover

Fixes: #7739
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomon: wait for quorum for MMonGetVersion 1607/head
Sage Weil [Sat, 5 Apr 2014 23:58:55 +0000 (16:58 -0700)]
mon: wait for quorum for MMonGetVersion

We should not respond to checks for map versions when we are in the
probing or electing states or else clients will get incorrect results when
they ask what the latest map version is.

Fixes: #7997
Signed-off-by: Sage Weil <sage@inktank.com>
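
The behavior described in the commit above can be sketched with a toy monitor. This is not the Monitor class: `ToyMonitor`, `MonState` and `handle_get_version` are hypothetical names, and the queue-and-retry mechanism is an assumption about how "wait for quorum" might be modeled.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical monitor states, for illustration only.
enum class MonState { Probing, Electing, Leader, Peon };

struct ToyMonitor {
  MonState state = MonState::Probing;
  std::uint64_t latest_version = 0;
  std::deque<int> waiting_for_quorum;  // ids of deferred requests

  bool in_quorum() const {
    return state == MonState::Leader || state == MonState::Peon;
  }

  // Returns true and fills *version_out if the request can be answered
  // now. While probing or electing, the request is queued instead, so
  // the client never sees a possibly stale version.
  bool handle_get_version(int request_id, std::uint64_t* version_out) {
    if (!in_quorum()) {
      waiting_for_quorum.push_back(request_id);
      return false;
    }
    *version_out = latest_version;
    return true;
  }
};
```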
11 years agoclient: try shrinking kernel inode cache when trimming session caps
Yan, Zheng [Sat, 5 Apr 2014 10:30:02 +0000 (18:30 +0800)]
client: try shrinking kernel inode cache when trimming session caps

Notify the kernel to invalidate top-level directory entries. As a side
effect, the kernel inode cache is shrunk.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoclient: release clean pages if no open file want RDCACHE 1594/head
Yan, Zheng [Fri, 4 Apr 2014 17:06:29 +0000 (01:06 +0800)]
client: release clean pages if no open file want RDCACHE

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>