John Spray [Mon, 27 Oct 2014 13:27:30 +0000 (13:27 +0000)]
mds: set epoch barrier on transition to active
To handle the case where MDSs restart after experiencing
a barrier-inducing operation: rather than persisting the
OSD barrier somewhere, just have the MDSs always barrier
on the latest OSD epoch at startup.
The effect is that after a restart, MDS cap issues will
be delayed in (compliant) clients until the client
sees the latest OSD map.
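A minimal Python model of the startup behavior described above. It assumes a simplified MDS that knows only its latest OSD map epoch; the classes, `on_active`, `make_cap_message` and `client_may_use_caps` are hypothetical names for illustration, not Ceph's C++ interfaces.

```python
from dataclasses import dataclass


@dataclass
class OSDMapView:
    """Hypothetical stand-in for the latest OSD map the MDS has seen."""
    epoch: int


@dataclass
class MDS:
    osdmap: OSDMapView
    osd_epoch_barrier: int = 0

    def set_osd_epoch_barrier(self, epoch: int) -> None:
        # The barrier only ever moves forward.
        self.osd_epoch_barrier = max(self.osd_epoch_barrier, epoch)

    def on_active(self) -> None:
        # Earlier barriers were not persisted, so conservatively barrier
        # on the newest OSD epoch known when becoming active.
        self.set_osd_epoch_barrier(self.osdmap.epoch)

    def make_cap_message(self) -> dict:
        # Every cap message carries the current barrier.
        return {"osd_epoch_barrier": self.osd_epoch_barrier}


def client_may_use_caps(client_osd_epoch: int, msg: dict) -> bool:
    # A compliant client waits until its own OSD map reaches the barrier.
    return client_osd_epoch >= msg["osd_epoch_barrier"]
```

The cost of not persisting anything is a possibly unnecessary delay after restart; the benefit is that no barrier state can be lost.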
John Spray [Mon, 27 Oct 2014 18:09:01 +0000 (18:09 +0000)]
messages: always encode barrier
Instead of truncating the message at the INLINE_DATA fields, send
blank fields there for clients that don't support inline data
(the kclient), so that we can include the following
epoch_barrier field.
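A sketch of why blank fields beat truncation: with placeholders present, the barrier always follows the inline-data fields, so every peer decodes it from the same layout. The byte layout here (little-endian u64 inline_version, u32-length-prefixed inline_data, u32 barrier) and the zero placeholder values are simplifying assumptions, not Ceph's exact wire format; the function names are hypothetical.

```python
import struct
from typing import Tuple


def encode_cap_tail(inline_version: int, inline_data: bytes,
                    osd_epoch_barrier: int, peer_supports_inline: bool) -> bytes:
    """Encode the trailing fields of a simplified, hypothetical cap message."""
    if not peer_supports_inline:
        # Blank placeholders instead of truncating: the field layout stays
        # fixed, so the barrier can still be appended after them.
        inline_version, inline_data = 0, b""
    out = struct.pack("<Q", inline_version)
    out += struct.pack("<I", len(inline_data)) + inline_data
    out += struct.pack("<I", osd_epoch_barrier)
    return out


def decode_cap_tail(buf: bytes) -> Tuple[int, bytes, int]:
    (inline_version,) = struct.unpack_from("<Q", buf, 0)
    (length,) = struct.unpack_from("<I", buf, 8)
    inline_data = buf[12:12 + length]
    (barrier,) = struct.unpack_from("<I", buf, 12 + length)
    return inline_version, inline_data, barrier
```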
John Spray [Wed, 8 Oct 2014 09:49:54 +0000 (10:49 +0100)]
messages: add osd_epoch_barrier to cap msgs
Extension to client-server protocol to allow clients
to release capabilities conditional on the receiver
having a particular OSD map, and the MDS to issue
caps conditional on the user having a particular
OSD map.
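On the receiving side, "conditional on having a particular OSD map" amounts to deferring an action until the local epoch reaches the message's barrier. The `EpochGate` class below is a hypothetical illustration of that rule, usable for either a client handling a cap grant or an MDS handling a release; it is not a Ceph type.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EpochGate:
    """Hypothetical helper: run actions once the local OSD map epoch
    reaches the osd_epoch_barrier carried by a message."""

    def __init__(self, epoch: int) -> None:
        self.epoch = epoch
        self._waiting: Dict[int, List[Callable[[], None]]] = defaultdict(list)

    def handle(self, osd_epoch_barrier: int, action: Callable[[], None]) -> None:
        if self.epoch >= osd_epoch_barrier:
            action()  # required map already present: act immediately
        else:
            self._waiting[osd_epoch_barrier].append(action)

    def handle_osd_map(self, new_epoch: int) -> None:
        self.epoch = max(self.epoch, new_epoch)
        # Run deferred actions in barrier order once their epoch is reached.
        for e in sorted(e for e in self._waiting if e <= self.epoch):
            for action in self._waiting.pop(e):
                action()
```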
John Spray [Wed, 1 Oct 2014 22:01:35 +0000 (23:01 +0100)]
mds: return ENOSPC on write ops while osds full
Allow removals and read-only ops, prevent others. This
is a soft policy aimed at reducing the likelihood of a
"full" (mon_osd_full_ratio) OSD becoming physically full
(at which point it can no longer accept journal writes from
the MDS, even though the MDS does not respect the FULL flag).
John Spray [Mon, 20 Oct 2014 15:41:29 +0000 (16:41 +0100)]
osdc/Objecter: add have_map method
This is for places where we're going to call wait_for_map, so
that we can easily check whether wait_for_map will *probably*
return true, and thereby avoid
allocating Contexts we won't need.
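A sketch of the calling pattern, assuming `wait_for_map` returns true (without queueing the callback) when the map is already present and false after queueing it. The `Objecter` class, `do_after_map` and the `stats` counter are hypothetical; the counter stands in for allocating a Context.

```python
from typing import Callable, List, Tuple


class Objecter:
    """Hypothetical sketch of the have_map / wait_for_map pattern."""

    def __init__(self, epoch: int) -> None:
        self.epoch = epoch
        self.waiters: List[Tuple[int, Callable[[], None]]] = []

    def have_map(self, epoch: int) -> bool:
        # Cheap, side-effect-free check.
        return self.epoch >= epoch

    def wait_for_map(self, epoch: int, ctx: Callable[[], None]) -> bool:
        # True: map already present, ctx NOT queued (caller must run it).
        # False: ctx queued until the map arrives.
        if self.epoch >= epoch:
            return True
        self.waiters.append((epoch, ctx))
        return False

    def handle_osd_map(self, epoch: int) -> None:
        self.epoch = max(self.epoch, epoch)
        ready = [ctx for e, ctx in self.waiters if e <= self.epoch]
        self.waiters = [(e, ctx) for e, ctx in self.waiters if e > self.epoch]
        for ctx in ready:
            ctx()


def do_after_map(objecter: Objecter, epoch: int,
                 work: Callable[[], None], stats: dict) -> None:
    if objecter.have_map(epoch):
        work()  # common fast path: no callback object needed
        return
    stats["contexts"] += 1  # stands in for allocating a Context
    # have_map is only a hint, so still handle a True result.
    if objecter.wait_for_map(epoch, work):
        work()
```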
John Spray [Thu, 9 Oct 2014 10:25:21 +0000 (11:25 +0100)]
osdc/ObjectCacher: invoke flush_set_callback on purge_set
For the benefit of Client, so that it can get its
caps released when using purge_set to deal with
an ENOSPC condition. Does not affect librbd because
librbd doesn't provide flush_set_callback.
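A minimal model of the change: purging a set leaves it clean just as a flush would, so the same callback fires. `ObjectSet` and `ObjectCacher` here are hypothetical simplifications, not the osdc/ObjectCacher interface.

```python
from typing import Callable, Dict, Optional


class ObjectSet:
    def __init__(self, name: str) -> None:
        self.name = name
        self.dirty: Dict[str, bytes] = {}  # object name -> unflushed data


class ObjectCacher:
    """Hypothetical sketch; not the real osdc/ObjectCacher interface."""

    def __init__(self, flush_set_callback: Optional[Callable[[ObjectSet], None]] = None) -> None:
        self.flush_set_callback = flush_set_callback

    def purge_set(self, oset: ObjectSet) -> None:
        # Drop cached data without writing it back (e.g. after ENOSPC).
        oset.dirty.clear()
        # The set is now clean, as after a flush, so notify the owner the
        # same way; a Client can then release its caps. Owners that
        # register no callback (like librbd) are unaffected.
        if self.flush_set_callback is not None:
            self.flush_set_callback(oset)
```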
Dan van der Ster [Fri, 12 Dec 2014 11:40:19 +0000 (12:40 +0100)]
ceph-disk: test re-using an existing journal partition
Add a ceph-disk test that first sets up an OSD with a separate journal
block device, then tears down the OSD (simulating a failure) and creates
a new OSD that reuses the same journal device.
Add create_dev / destroy_dev helpers that encapsulate the operations
that ensure the partition table is up to date in the kernel and the
symlinks are created as expected. In particular, they make sure the kernel
is aware that the partition table of a newly created device is
empty. If the device previously existed and the kernel was not informed
of the latest partition table updates via partprobe / partx, it may
have cached an old partition table, which can cause all sorts of
unexpected behavior, such as a failure to create the by-partuuid
symbolic links as described in http://tracker.ceph.com/issues/10146.
Refs: #10146
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Signed-off-by: Loic Dachary <ldachary@redhat.com>
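The core idea of those helpers (wipe, then force the kernel and udev to catch up) might look like the following. This is a hypothetical Python rendering, not the actual test helpers, whose exact commands may differ; `reset_partition_table` is an illustrative name. The commands used (`sgdisk --zap-all`, `partprobe`, `udevadm settle`) are standard tools. Running it for real requires root and destroys the device's partition table.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    # Real runner: fail loudly if any step fails.
    subprocess.run(list(cmd), check=True)


def reset_partition_table(dev: str, run: Runner = run_checked) -> None:
    """Hypothetical helper in the spirit of create_dev/destroy_dev."""
    # Wipe GPT and MBR structures so the device starts out empty.
    run(["sgdisk", "--zap-all", dev])
    # Make the kernel re-read the (now empty) table instead of keeping
    # a stale cached copy.
    run(["partprobe", dev])
    # Wait for pending udev events so by-partuuid symlinks are current.
    run(["udevadm", "settle"])
```

Passing a recording function as `run` makes the sequence testable without touching a real device.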
Dan van der Ster [Tue, 18 Nov 2014 14:51:46 +0000 (15:51 +0100)]
ceph-disk: don't change the journal partition uuid
We observe that the new /dev/disk/by-partuuid/<journal_uuid>
symlink is not always created by udev when reusing a journal
partition. Fix by not changing the uuid of a journal partition
in this case; instead, reuse the existing uuid (and
journal_symlink). We also now assert that the symlink
exists before further preparing the OSD.
Fixes: #10146
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Tested-by: Dan van der Ster <daniel.vanderster@cern.ch>
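A sketch of the decision, assuming the caller already knows whether a journal partition is being reused and its current uuid. `journal_symlink_for` is a hypothetical name, not a ceph-disk function; the `exists` parameter is injectable so the check can be tested.

```python
import os
import uuid
from typing import Callable, Optional

BY_PARTUUID = "/dev/disk/by-partuuid"


def journal_symlink_for(existing_uuid: Optional[str],
                        exists: Callable[[str], bool] = os.path.exists) -> str:
    """Pick the journal partition uuid and return its by-partuuid symlink."""
    if existing_uuid is not None:
        # Reusing a journal partition: keep its uuid rather than retagging
        # it, since udev does not reliably create a link for a new uuid.
        journal_uuid = existing_uuid
    else:
        # Fresh partition: a new uuid would be assigned at creation time.
        journal_uuid = str(uuid.uuid4())
    symlink = os.path.join(BY_PARTUUID, journal_uuid)
    # Refuse to continue preparing the OSD if the link is missing.
    if not exists(symlink):
        raise RuntimeError(f"journal symlink {symlink} does not exist")
    return symlink
```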
Loic Dachary [Sat, 6 Dec 2014 22:59:54 +0000 (23:59 +0100)]
documentation: simplify running make check
Encapsulate the compilation steps (install dependencies, autogen.sh,
configure, make check) in the run-make-check.sh script. Update the
developer documentation to point to this script instead of multiple
steps.
It is intended as a tool to help new developers make sure their patch is
sane. It focuses on efficiency (runs make check in parallel if possible)
and coverage (enables docker-based tests if possible).
It resolves the problem of fetching a commit that is not attached to any
ref, which is apparently not implemented in the git protocol (discussed
on irc.freenode.net#git).
Noah Watkins [Thu, 11 Dec 2014 19:03:30 +0000 (12:03 -0700)]
debian: enable libgoogle-perftools-dev on arm64
These binaries haven't landed in Ubuntu, but they are in
sid and jessie for arm64. On Saucy I've installed them by
hand from ports.ubuntu.com and things seem pretty swell.
Sage Weil [Thu, 11 Dec 2014 18:07:48 +0000 (10:07 -0800)]
common/blkdev: fix block device discard check
- fix base name calculation (do not assume sda)
- reverse sense of check (it was returning false when true before?)
- add a generic helper to get other properties, too
Fixes: #10296
Signed-off-by: Sage Weil <sage@redhat.com>
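A Python model of the corrected check. It assumes the Linux sysfs layout, where partitions appear as subdirectories of their parent under /sys/block (e.g. /sys/block/sdb/sdb1) and a nonzero queue/discard_granularity indicates discard support. The function names echo the commit's description but are hypothetical here, and the sysfs root is a parameter only so the logic can be tested.

```python
import os
from typing import Optional


def get_block_device_base(devname: str, sysfs: str = "/sys/block") -> Optional[str]:
    """Map a device or partition name (sdb1, nvme0n1p1) to its parent disk."""
    name = os.path.basename(devname)
    if os.path.isdir(os.path.join(sysfs, name)):
        return name  # already a whole disk
    for base in os.listdir(sysfs):
        # Partitions are listed inside their parent's directory.
        if os.path.isdir(os.path.join(sysfs, base, name)):
            return base
    return None


def get_block_device_int_property(devname: str, prop: str,
                                  sysfs: str = "/sys/block") -> Optional[int]:
    """Generic helper: read an integer property of the parent disk."""
    base = get_block_device_base(devname, sysfs)
    if base is None:
        return None
    try:
        with open(os.path.join(sysfs, base, prop)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def block_device_support_discard(devname: str, sysfs: str = "/sys/block") -> bool:
    granularity = get_block_device_int_property(devname, "queue/discard_granularity", sysfs)
    # True only when the property exists and is positive.
    return granularity is not None and granularity > 0
```

Looking the parent up in sysfs avoids guessing it from the name, which is what breaks for names like nvme0n1p1.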
Noah Watkins [Thu, 11 Dec 2014 01:48:43 +0000 (18:48 -0700)]
lttng: add int type definitions
The normal path through #include <lttng/tracepoint.h> doesn't
appear to include integer definitions like uint64_t that are used
in Ceph, so we add our own definitions file.
Dan Mick [Wed, 10 Dec 2014 21:19:16 +0000 (13:19 -0800)]
rados.py: remove Rados.__del__(); it just causes problems
Recent versions of Python contain a change to thread shutdown that
causes ceph to hang on exit; see http://bugs.python.org/issue21963.
As it turns out, this is relatively easy to avoid by not spawning
threads on exit, as Rados.__del__() will certainly do by calling
shutdown(); I suspect, but haven't proven, that the problem is
that shutdown() tries to start() a threading.Thread() that never
makes it all the way back to signal start().
Also add a PendingReleaseNote and extra doc comments to clarify.
Fixes: #8797
Signed-off-by: Dan Mick <dan.mick@redhat.com>
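The general pattern the change moves toward is explicit shutdown rather than a finalizer. `Cluster` below is a hypothetical stand-in, not the real `rados.Rados` class; the context-manager methods are one common way to make the explicit call hard to forget, shown as an illustration rather than as what this commit added.

```python
class Cluster:
    """Hypothetical stand-in for a librados handle (not rados.Rados)."""

    def __init__(self) -> None:
        self.state = "configuring"

    def connect(self) -> None:
        self.state = "connected"

    def shutdown(self) -> None:
        # Per the message above, the real shutdown() spawns a thread, so it
        # should be called explicitly, never during interpreter teardown.
        if self.state != "shutdown":
            self.state = "shutdown"

    def __enter__(self) -> "Cluster":
        self.connect()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.shutdown()
        return False  # do not swallow exceptions

    # Deliberately no __del__: a finalizer may run at interpreter exit,
    # when starting a thread can hang.
```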
Jason Dillaman [Wed, 10 Dec 2014 13:56:59 +0000 (08:56 -0500)]
tests: Minor cleanup to librbd test
The tests can now be repeated to increase the chance of hitting
edge-condition failures. Also add logic to immediately
fail IO tests when an error is encountered.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Sage Weil [Mon, 24 Nov 2014 02:50:51 +0000 (18:50 -0800)]
crush/CrushWrapper: fix create_or_move_item when name exists but item does not
We were using item_exists(), which simply checks if we have a name defined
for the item. Instead, use _search_item_exists(), which looks for an
instance of the item somewhere in the hierarchy. This matches what
get_item_weightf() is doing, which ensures we get a non-negative weight
that converts properly to floating point.
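A much-simplified model of the distinction. It assumes a map holding item names plus bucket contents with per-item weights; the class is hypothetical, though the method names mirror those mentioned above.

```python
from typing import Dict, Optional


class CrushMap:
    """Hypothetical, much-simplified CRUSH map: names plus bucket contents."""

    def __init__(self) -> None:
        self.names: Dict[int, str] = {}                 # item id -> name
        self.buckets: Dict[int, Dict[int, float]] = {}  # bucket id -> {item: weight}

    def item_exists(self, item: int) -> bool:
        # Only says a *name* is defined; the item may be placed nowhere.
        return item in self.names

    def _search_item_exists(self, item: int) -> bool:
        # Looks for an actual instance of the item in the hierarchy.
        return any(item in children for children in self.buckets.values())

    def get_item_weightf(self, item: int) -> Optional[float]:
        for children in self.buckets.values():
            if item in children:
                return children[item]
        return None  # not placed: no weight to carry over

    def create_or_move_item(self, item: int, weight: float, name: str, dest: int) -> str:
        if not self._search_item_exists(item):
            # Not in the hierarchy (even if its name exists): create it.
            self.names[item] = name
            self.buckets[dest][item] = weight
            return "created"
        # Move it, preserving the weight it already has.
        old_weight = self.get_item_weightf(item)
        for children in self.buckets.values():
            children.pop(item, None)
        self.buckets[dest][item] = old_weight
        return "moved"
```

Had the first check used `item_exists()`, an item with only a name would take the move path and `get_item_weightf()` would find no weight to carry over.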
Sage Weil [Sat, 22 Nov 2014 01:47:56 +0000 (17:47 -0800)]
crush/builder: prevent bucket weight underflow on item removal
It is possible to set a bucket weight that is not the sum of the item
weights if you manually modify/build the CRUSH map. Protect against any
underflow on the bucket weight when removing items.
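CRUSH weights are unsigned 32-bit 16.16 fixed-point values (0x10000 is 1.0), so a plain subtraction wraps to a huge weight when an item weighs more than its bucket. The sketch below contrasts that wrap with a clamp at zero; the helper names are hypothetical, and Python emulates the C uint32_t arithmetic with a mask.

```python
U32_MASK = 0xFFFFFFFF


def unsafe_sub(bucket_weight: int, item_weight: int) -> int:
    # What a plain uint32_t subtraction does: wrap modulo 2**32.
    return (bucket_weight - item_weight) & U32_MASK


def safe_sub(bucket_weight: int, item_weight: int) -> int:
    # Clamp at zero instead of wrapping.
    return bucket_weight - item_weight if item_weight < bucket_weight else 0


def remove_item(items: dict, bucket_weight: int, item: int) -> int:
    """Remove an item and return the new bucket weight without underflow."""
    weight = items.pop(item)
    return safe_sub(bucket_weight, weight)
```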