Samuel Just [Mon, 14 Nov 2016 19:50:23 +0000 (11:50 -0800)]
OSDMonitor: only reject MOSDBoot based on up_from if inst matches
If the osd actually restarts, there is no guarantee that the epoch will
advance past up_from. If the inst is different, it can't really be a
dup. At worst, it might be a queued MOSDBoot from a previous inst, but
in that case, the real inst would see itself marked up, and then back
down causing it to try booting again.
Fixes: http://tracker.ceph.com/issues/17899
Signed-off-by: Samuel Just <sjust@redhat.com>
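The rule described in this message can be sketched roughly as follows. This is only an illustration in Python, not the OSDMonitor code: the Inst class, the function name, and the exact direction of the epoch comparison are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical stand-in for an OSD instance identity."""
    addr: str
    nonce: int


def should_reject_boot(boot_epoch: int, boot_inst: Inst,
                       up_from: int, current_inst: Inst) -> bool:
    # Assumption: a boot message is "old" when the epoch it carries is
    # below the epoch at which the OSD was last marked up (up_from).
    looks_old = up_from > boot_epoch
    # Only treat it as a duplicate if it also comes from the same
    # instance; a different inst cannot really be a dup.
    return looks_old and boot_inst == current_inst
```

With this shape, a restarted OSD (new inst) is never rejected on the basis of up_from alone, even if the epoch has not advanced.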
Kefu Chai [Mon, 14 Nov 2016 08:04:48 +0000 (16:04 +0800)]
cmake:librbd: move common to the end of linked libraries
so the linked libraries are able to find the symbols in libcommon.
Also, do not `list(APPEND librbd ...)` anymore, which is wrong and
unnecessary: librbd does not depend on krbd.
Kefu Chai [Wed, 9 Nov 2016 05:11:13 +0000 (13:11 +0800)]
cmake: move ContextCompletion.cc into rbd_internal
- ContextCompletion.cc is used by TrimRequest.cc, which is included by
rbd_internal, so it's more natural to put ContextCompletion.cc into
rbd_internal as well, since rbd_internal is the only consumer of this
translation unit.
- librbd/internal.cc does not reference any symbols from util.cc, so
remove this include. Also, do not add
$<TARGET_OBJECTS:common_util_obj> to librbd.
ceph-detect-init/ceph_detect_init/freebsd/__init__.py
New file
ceph_detect_init/__init__.py: Only test for FreeBSD after ruling out Linux
If we do it the other way around, it will not work during testing on FreeBSD,
because there platform.system() is always FreeBSD.
So test for Linux first, and only then check whether it is FreeBSD.
ceph-detect-init/tests/test_all.py: add tests for FreeBSD
ceph-detect-init/run-tox.sh: Re-enable running the tests for FreeBSD
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Signed-off-by: Kefu Chai <kchai@redhat.com>
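The ordering argument in this message can be illustrated with a small sketch. It is not the ceph_detect_init code: detect_platform and its linux_probe parameter are hypothetical, and the system name is passed in explicitly so the example does not depend on the host it runs on.

```python
def detect_platform(linux_probe, system_name):
    """Return a (family, name) pair, checking for Linux first.

    linux_probe is a hypothetical callable returning a distro name, or an
    empty string when no Linux distro is detected. Tests can replace it
    with a mock, while the system name of a FreeBSD test host stays
    "FreeBSD" regardless of what is mocked.
    """
    distro = linux_probe()
    if distro:
        return ("linux", distro)
    # Only reached once Linux has been ruled out.
    if system_name == "FreeBSD":
        return ("freebsd", "FreeBSD")
    raise RuntimeError("unsupported platform: %s" % system_name)
```

If the FreeBSD check came first, a test that mocks a Linux distro on a FreeBSD host would always take the FreeBSD branch and never exercise the Linux path.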
Pan Liu [Wed, 9 Nov 2016 09:45:27 +0000 (17:45 +0800)]
BlueStore: speed up the multi-replication flow by
switching the callback order in bluestore.
In BlueStore, the ack callback and the commit callback are queued one
after the other in the function "BlueStore::_txc_finish_kv". Therefore,
only one callback needs to be called, which improves performance. We do
this by switching the callback order in bluestore, so that the callback
work is done in sub_op_modify_commit and does not need to be done again
in sub_op_modify_applied.
Ken Dreyer [Fri, 11 Nov 2016 22:44:18 +0000 (15:44 -0700)]
doc: add infernalis EOL date
Remove the estimated Infernalis EOL date, and add the actual EOL date.
As discussed on the ceph-devel mailing list, we decided to say that this
infernalis EOL date is the same date that we made the first Jewel
release available.
Sage Weil [Wed, 9 Nov 2016 17:07:24 +0000 (12:07 -0500)]
buffer: put data and metadata in a mempool
Note that for raw_combined we leak some metadata into the data pool.
Also, we do not account for non-raw metadata or anything else in buffer.h,
as that would pollute the ABI and public interface.
Drop the namespace ceph in buffer.cc (which serves no real purpose) as it
confuses the MEMPOOL_DEFINE_* macros (they cannot be used inside a
namespace).
Jeff Layton [Thu, 10 Nov 2016 11:30:27 +0000 (06:30 -0500)]
client: rename flush_caps() with no arguments to flush_caps_sync()
Per Greg's recommendation, change the name of this function to better
indicate what it does now that we always request a journal flush on
the last cap flush.
Also, add a comment above the function to better explain why we do this.
Jeff Layton [Wed, 9 Nov 2016 14:36:07 +0000 (09:36 -0500)]
mds: only update change_attr and btime when client sets appropriate feature flags
The kernel client lags the userland code a bit, and feature support for
addr2 is not quite ready. Still, we want to allow the client to set the
new flags field in a cap request before then so it can get better fsync
performance.
When we go to update the cap fields, grab the features from the peer,
and verify that the appropriate flags are set before we apply updates
to the btime and change_attr.
Also, just have the function return early if dirty is 0, since it's
a no-op in that case, and turn the comment above the function into
an assertion.
Jeff Layton [Wed, 9 Nov 2016 14:36:07 +0000 (09:36 -0500)]
client: wire up the CHECK_CAPS_SYNCHRONOUS flag
Ensure that the client will request an immediate journal flush from the
MDS when we'll end up waiting on the flush response. This patch should
fix the fsync codepath, but we may need something similar for syncfs.
Jeff Layton [Wed, 9 Nov 2016 14:36:07 +0000 (09:36 -0500)]
client: change no_delay flag to a flags field
In a later patch, we'll want to have the client set the sync flag in
the cap flush, to hint to the MDS that it should process it immediately.
We could add a second bool, but let's instead do what the kernel client
does, which is to have a flags field. With that, the existing no_delay
bool becomes CHECK_CAPS_NODELAY.
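A minimal sketch of the bool-to-flags change, written in Python rather than the client's C++. The flag names follow CHECK_CAPS_NODELAY and CHECK_CAPS_SYNCHRONOUS from these commit messages; the bit values and the describe_check_caps helper are assumptions made for illustration.

```python
from enum import IntFlag


class CheckCaps(IntFlag):
    # Bit values are assumed; only the names come from the commit messages.
    NODELAY = 1 << 0       # replaces the old no_delay bool
    SYNCHRONOUS = 1 << 1   # the sync hint added by a later patch


def describe_check_caps(flags: CheckCaps) -> dict:
    """Hypothetical helper: decode a flags value into named booleans."""
    return {
        "no_delay": bool(flags & CheckCaps.NODELAY),
        "synchronous": bool(flags & CheckCaps.SYNCHRONOUS),
    }
```

A caller that used to pass no_delay=true would pass CheckCaps.NODELAY, and new hints can be OR-ed in without changing the function signature again.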
Jeff Layton [Fri, 11 Nov 2016 11:28:29 +0000 (06:28 -0500)]
mds: do mds log flush if CLIENT_CAPS_SYNC is set
If the client has set the sync flag in a cap update, then it
is indicating that it's waiting on the reply. Ensure that we flush
the journal in that case.
xie xingguo [Fri, 11 Nov 2016 02:18:36 +0000 (10:18 +0800)]
os/bluestore: fix compiler warnings
As follows:
/home/jenkins-build/build/workspace/ceph-pull-requests/build/boost/include/boost/intrusive/pointer_plus_bits.hpp: In member function ‘bool BlueStore::ExtentMap::encode_some(uint32_t, uint32_t, ceph::bufferlist&, unsigned int*)’:
/home/jenkins-build/build/workspace/ceph-pull-requests/build/boost/include/boost/intrusive/pointer_plus_bits.hpp:76:7: warning: ‘dummy’ is used uninitialized in this function [-Wuninitialized]
n = pointer(uintptr_t(p) | (uintptr_t(n) & Mask));
^
/home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueStore.cc:1779:10: note: ‘dummy’ was declared here
Extent dummy(offset);
Sage Weil [Thu, 10 Nov 2016 18:56:24 +0000 (13:56 -0500)]
os/filestore/HashIndex: fix list_by_hash_* termination on reaching end
If we set *next to max, then the caller (a few lines up) doesn't terminate
the loop and will keep trying to list objects in every following hash
dir until it reaches the end of the collection. In fact, if we have an
end bound we will never do an efficient listing unless we hit the max
first.
For one user, this was causing OSD suicides when scrub ran because it
wasn't able to list all objects before the timeout. In general, this would
cause scrub to stall a PG for a long time and slow down requests.
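The termination problem can be modelled with a toy example. This is a simplified sketch, not the HashIndex code: hash dirs are modelled as sorted lists of integers, HASH_MAX stands in for "max", and the fixed switch exists only to contrast the two behaviours.

```python
HASH_MAX = 2 ** 32  # assumed sentinel meaning "this dir was fully listed"


def list_dir(objs, start, end, limit, out, fixed=True):
    """List hashes in [start, end) from one dir into out; return next."""
    for h in objs:
        if h < start:
            continue
        if h >= end:
            # Old behaviour: report HASH_MAX, so the caller cannot tell
            # that the end bound was reached and keeps going.
            return h if fixed else HASH_MAX
        if len(out) >= limit:
            return h
        out.append(h)
    return HASH_MAX


def list_collection(dirs, start, end, limit, fixed=True):
    """Walk the dirs in order; stop as soon as next is not HASH_MAX."""
    out, visited = [], 0
    for objs in dirs:
        visited += 1
        nxt = list_dir(objs, start, end, limit, out, fixed)
        if nxt != HASH_MAX:
            break
    return out, visited
```

In this model both variants return the same objects, but the old behaviour visits every following dir before stopping, which is the extra work that made the listing slow.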