git.apps.os.sepia.ceph.com Git - ceph.git/log

Sage Weil [Wed, 30 Jul 2014 20:57:34 +0000 (13:57 -0700)]

osd: prevent old clients from using tiered pools

If the client is old and doesn't understand tiering, don't let them use a
tiered pool. Reply with EOPNOTSUPP.

Fixes: #8714
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
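
The check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the OSD code: `FEATURE_TIERING` and `check_pool_access` are hypothetical names and the feature bit value is made up. Only the behaviour (reply with EOPNOTSUPP when an old client targets a tiered pool) comes from the commit message.

```python
import errno

# Hypothetical feature bit; the commit message does not name the real one.
FEATURE_TIERING = 1 << 0


def check_pool_access(client_features, pool_is_tiered):
    """Return 0 if the op may proceed, or a negative errno for the reply."""
    if pool_is_tiered and not (client_features & FEATURE_TIERING):
        # Old client that does not understand tiering: refuse the op.
        return -errno.EOPNOTSUPP
    return 0
```

An old client (no feature bits set) is only refused on tiered pools; untiered pools keep working for it.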

commit | commitdiff | tree

Sage Weil [Tue, 29 Jul 2014 23:23:12 +0000 (16:23 -0700)]

Merge tag 'v0.83'

v0.83

commit | commitdiff | tree

John Spray [Tue, 29 Jul 2014 22:55:30 +0000 (23:55 +0100)]

Merge pull request #2161 from ceph/wip-jcsp-test

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

John Spray [Tue, 22 Jul 2014 01:42:15 +0000 (02:42 +0100)]

mds: remove some rogue "using namespace std;"

Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

John Spray [Tue, 22 Jul 2014 01:08:08 +0000 (02:08 +0100)]

mds: handle replaying old format journals

To get back to the reformatting procedure that otherwise
occurs during MDLog::open, introduce an MDLog::reopen call
that MDS can use in the standbyreplay->standby transition
for the special case where the journal is old.

Fixes: #8869
Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

John Spray [Mon, 21 Jul 2014 19:22:46 +0000 (20:22 +0100)]

mds: introduce explicit DaemonState instead of int

Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

John Spray [Mon, 21 Jul 2014 17:50:07 +0000 (18:50 +0100)]

mds: refactor MDS boot

* Make boot_start private.
* Define boot stages in enum, replace int with type.
* Merge steps 0 and 1, 0 always fell through to 1.
* starting_done was only ever reached by a fall through
from the previous step, so call it directly from there.

Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

John Spray [Mon, 21 Jul 2014 16:08:46 +0000 (17:08 +0100)]

mds: make MDS::replay_done clearer

... and add some assertions.

Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

John Spray [Tue, 22 Jul 2014 11:16:26 +0000 (12:16 +0100)]

mds: remove unused purge_prealloc_ino

Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

John Spray [Thu, 17 Jul 2014 23:44:38 +0000 (00:44 +0100)]

mds: separate inode recovery queue from MDCache

Refactor to:
* have somewhere to put some logic for doing
background recovery in future.
* trim a few lines from the oversized MDCache.cc
wherever we can.

Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

Sandon Van Ness [Tue, 29 Jul 2014 21:11:03 +0000 (14:11 -0700)]

python-ceph: require libcephfs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>

commit | commitdiff | tree

Jenkins [Tue, 29 Jul 2014 20:42:53 +0000 (13:42 -0700)]

0.83

commit | commitdiff | tree

Gregory Farnum [Tue, 29 Jul 2014 20:40:31 +0000 (16:40 -0400)]

Merge pull request #2159 from ceph/wip-undump

tools/cephfs: fuller header in dump/undump

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Sandon Van Ness [Mon, 28 Jul 2014 17:38:41 +0000 (10:38 -0700)]

Remove reference from mkcephfs.

A bit of collision between the spec changes for the rhel7/ceph-common
changes and Alfredo's pull request for wip-die-ceph-mkcephfs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 1526546ddcfd4230403d0d2364575c4e46970f8d)

commit | commitdiff | tree

Gregory Farnum [Tue, 29 Jul 2014 19:36:19 +0000 (15:36 -0400)]

Merge pull request #2156 from ceph/wip-upstart-nfile

upstart/ceph-osd.conf: bump nofile limit up by 10x

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Sage Weil [Tue, 29 Jul 2014 19:33:52 +0000 (12:33 -0700)]

doc/release-notes: typo

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Tue, 29 Jul 2014 19:23:33 +0000 (12:23 -0700)]

doc/release-notes: v0.80.5 release notes

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Tue, 29 Jul 2014 18:16:24 +0000 (11:16 -0700)]

Merge remote-tracking branch 'gh/next'

commit | commitdiff | tree

Greg Farnum [Tue, 29 Jul 2014 17:33:34 +0000 (10:33 -0700)]

Merge branch 'origin/wip-osd-leaks'

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Gregory Farnum [Tue, 29 Jul 2014 13:04:12 +0000 (09:04 -0400)]

Merge pull request #2139 from ceph/wip-journal-header

os/FileJournal: Update the journal header when closing journal

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Gregory Farnum [Tue, 29 Jul 2014 13:01:41 +0000 (09:01 -0400)]

Merge pull request #2146 from ceph/wip-8932

ceph_test_rados_api_tier: do fewer writes in HitSetWrite

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Gregory Farnum [Tue, 29 Jul 2014 12:58:30 +0000 (08:58 -0400)]

Merge pull request #2147 from ceph/wip-8931

osd: fix ops blocked by full cache tier dequeue

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Sage Weil [Tue, 29 Jul 2014 00:17:23 +0000 (17:17 -0700)]

unittest_crush_wrapper: fix build

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Dan Mick [Tue, 29 Jul 2014 00:06:41 +0000 (17:06 -0700)]

Merge pull request #2150 from ceph/wip-libs

don't link everything with blkid, udev, and boost_threads

commit | commitdiff | tree

Josh Durgin [Mon, 28 Jul 2014 21:30:51 +0000 (14:30 -0700)]

Merge pull request #2153 from ceph/wip-fsx-overlap

librbd API fix + wip-fsx-overlap

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

commit | commitdiff | tree

Sage Weil [Mon, 28 Jul 2014 18:41:09 +0000 (11:41 -0700)]

Merge pull request #2152 from xiaoxichen/fix_ceph_df

PGMonitor: fix bug in calculating pool avail space

Reviewed-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sandon Van Ness [Mon, 28 Jul 2014 17:38:41 +0000 (10:38 -0700)]

commit | commitdiff | tree

Xiaoxi Chen [Mon, 28 Jul 2014 16:42:10 +0000 (00:42 +0800)]

Fix some style and checking issues

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>

commit | commitdiff | tree

Sage Weil [Mon, 28 Jul 2014 16:27:20 +0000 (09:27 -0700)]

upstart/ceph-osd.conf: bump nofile limit up by 10x

This should ensure that we don't hit this limit on all but the very biggest
clusters. We have seen it hit on a ~500 OSD dumpling cluster.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Mon, 28 Jul 2014 16:22:47 +0000 (09:22 -0700)]

Merge pull request #2154 from simon3z/master

init: add systemd service files

Reviewed-by: Alfredo Deza <alfredo.deza@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

John Spray [Mon, 28 Jul 2014 14:32:12 +0000 (15:32 +0100)]

tools/cephfs: fuller header in dump/undump

There were two problems here:
* write_pos was modified through an undump/dump cycle,
because it was probed during recovery.
* stream format was being forgotten.

Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

Ilya Dryomov [Fri, 25 Jul 2014 12:04:53 +0000 (16:04 +0400)]

test_librbd_fsx: clone/flatten probabilities

Raise the clone probability to 8% and lower the flatten probability
to 2%. This should give us longer parent chains (before this we would
usually have one parent, and even then only for a few ops).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

commit | commitdiff | tree

Ilya Dryomov [Fri, 25 Jul 2014 12:04:53 +0000 (16:04 +0400)]

test_librbd_fsx: randomize_parent_overlap

Truncate base images after they have been cloned from to cover more
code paths and make sure that clients look at snapshot parent_overlap
(i.e. parent_overlap of the base image at the time the snapshot was
taken) and not that of the base image (i.e. parent_overlap of the base
image as of now).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

commit | commitdiff | tree

Ilya Dryomov [Wed, 9 Jul 2014 13:59:02 +0000 (17:59 +0400)]

test_librbd_fsx: introduce rbd_image_has_parent()

A helper to check whether the image associated with the ctx has
a parent or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

commit | commitdiff | tree

Ilya Dryomov [Tue, 8 Jul 2014 15:21:54 +0000 (19:21 +0400)]

librbd: make rbd_get_parent_info() accept NULL out params

The C++ version of rbd_get_parent_info() allows passing NULL for parent
image name, image name and snapshot name out parameters. Make C API do
the same both for consistency and to make it easier to check whether
the image at hand has a parent or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

commit | commitdiff | tree

Xiaoxi Chen [Mon, 28 Jul 2014 08:54:48 +0000 (16:54 +0800)]

PGMonitor: fix bug in calculating pool avail space

Currently, for pools that use different rules, "ceph df" cannot report
the correct available space for each of them. For a detailed assessment
of the bug, please refer to bug report #8943.

This patch fixes the bug and makes "ceph df" work correctly.

Fixes: #8943

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>

commit | commitdiff | tree

Sage Weil [Mon, 28 Jul 2014 02:39:34 +0000 (19:39 -0700)]

Merge pull request #2149 from yuyuyu101/wip-flush-set

Fix dup bh_write for TX state bh

Tested-by: Sage Weil <sage@redhat.com>
Reviewed-by: Haomai Wang <haomaiwang@gmail.com>
Original changeset

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

commit | commitdiff | tree

Sage Weil [Sun, 27 Jul 2014 23:58:08 +0000 (16:58 -0700)]

configure.ac: link libboost_thread only with json-spirit

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Sat, 19 Jul 2014 05:56:05 +0000 (22:56 -0700)]

configure: don't link blkid, udev to everything

These are already explicitly called out for libkrbd; don't need them in
LIBS.

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Haomai Wang [Sun, 27 Jul 2014 05:37:49 +0000 (13:37 +0800)]

Only write bufferhead when it's dirty

A bh in the TX state should be skipped because it is already in flight; we only
need to write dirty bhs. Both TX and dirty bhs should still be waited on until
they are flushed.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>

commit | commitdiff | tree

Josh Durgin [Mon, 21 Jul 2014 21:09:48 +0000 (14:09 -0700)]

ObjectCacher: fix bh_{add,remove} dirty_or_tx_bh accounting

tx buffers need to go on the bh_lru_rest as well, and removing erases
them from (rather than inserts them into) dirty_or_tx_bh.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

commit | commitdiff | tree

Josh Durgin [Mon, 21 Jul 2014 21:08:44 +0000 (14:08 -0700)]

ObjectCacher: fix dirty_or_tx_bh logic in bh_set_state()

The else-if chain here was wrong. Handling dirty or tx buffers and
errors should be in independent conditions.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

commit | commitdiff | tree

Haomai Wang [Wed, 16 Jul 2014 06:34:22 +0000 (14:34 +0800)]

Wait tx state buffer in flush_set

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>

commit | commitdiff | tree

Haomai Wang [Mon, 14 Jul 2014 06:27:17 +0000 (14:27 +0800)]

Add rbdcache max dirty object option

Librbd calculates the max dirty object count from rbd_cache_max_size, which
is not suitable for every case. If the user sets image order 24, the calculated
result is too small in practice. That increases the overhead of the trim call,
which is made on each read/write op.

Make this an option for tuning; by default the value is still calculated.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>

commit | commitdiff | tree

Haomai Wang [Mon, 14 Jul 2014 06:32:57 +0000 (14:32 +0800)]

Reduce ObjectCacher flush overhead

A flush op in ObjectCacher iterates over the whole active object set, and each
dirty object may also own several BufferHeads. If the object set is large,
this consumes too much time.

Use dirty_bh instead to reduce the overhead. Now only dirty BufferHeads are
checked.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
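
The idea of flushing from a dedicated dirty set rather than scanning every object can be sketched as below. This is an illustrative model under stated assumptions, not ObjectCacher itself: the `Cacher` and `BufferHead` classes and their methods are hypothetical, and only the `dirty_bh` name is taken from the commit message.

```python
class BufferHead:
    """Hypothetical stand-in for an ObjectCacher buffer head."""

    def __init__(self, oid, offset):
        self.oid = oid
        self.offset = offset
        self.dirty = False


class Cacher:
    def __init__(self):
        self.objects = {}      # oid -> list of BufferHead (the full object set)
        self.dirty_bh = set()  # only the buffer heads that currently need writing

    def add(self, bh):
        self.objects.setdefault(bh.oid, []).append(bh)

    def mark_dirty(self, bh):
        bh.dirty = True
        self.dirty_bh.add(bh)

    def mark_clean(self, bh):
        bh.dirty = False
        self.dirty_bh.discard(bh)

    def flush(self):
        """Write out dirty buffer heads without scanning self.objects."""
        written = []
        for bh in list(self.dirty_bh):  # copy: mark_clean mutates the set
            written.append((bh.oid, bh.offset))
            self.mark_clean(bh)
        return sorted(written)
```

The cost of `flush()` here depends on the number of dirty buffer heads, not on the size of `objects`, which is the trade-off the commit message describes.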

commit | commitdiff | tree

Sage Weil [Sun, 27 Jul 2014 04:19:34 +0000 (21:19 -0700)]

Revert "Merge pull request #2129 from ceph/wip-librbd-oc"

This reverts commit 74b386f03e4ca9970256db72c575589aea077534, reversing
changes made to 36265d0db0d7c0eb31d25a0f77ac233b3fd198f8.

The dirty_or_tx list is used by flush_set, which means we can
resubmit new IOs for writes that are already in progress. This
has a compounding effect that overwhelms the OSDs with dup IOs
and stalls out the client.

See, for example, the failures in this run:
/a/sage-2014-07-25_17:14:20-fs-wip-msgr-testing-basic-plana

The fix is probably pretty simple, but reverting for now to make
the tests pass.

Signed-off-by: Sage Weil <sage@inktank.com>

commit | commitdiff | tree

Sage Weil [Sat, 26 Jul 2014 04:42:35 +0000 (21:42 -0700)]

Merge remote-tracking branch 'gh/next'

Conflicts:
src/osdc/Journaler.h

commit | commitdiff | tree

John Spray [Thu, 17 Jul 2014 12:15:45 +0000 (13:15 +0100)]

mds: fix journal reformat failure in standbyreplay

In the 0.82 release, standbyreplay MDS daemons would try
to reformat the journal if they saw an older version on
disk, where this should have only been done by the active
MDS for the rank. Depending on timing, this could cause
fatal corruption of the journal.

This change handles the following cases:
* only do reformat if not in standbyreplay (else raise EAGAIN
to keep trying until an active mds reformats it)
* if journal header goes away while in standbyreplay then raise
EAGAIN (handle rewrite happening in background)
* if journal version is greater than the max supported, suicide

Fixes: #8811
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 5438500af8979fda32e61714ae40b71c7ffdfd15)
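
The three cases listed in this commit can be read as a small decision function. The sketch below is an assumption-laden illustration, not the MDLog code: `journal_open_action`, `MAX_SUPPORTED_FORMAT` and the string results are hypothetical, and the behaviour for a missing header outside standbyreplay is a guess, since the commit message does not cover it.

```python
MAX_SUPPORTED_FORMAT = 1  # hypothetical: newest journal format this daemon understands


def journal_open_action(on_disk_format, in_standby_replay,
                        max_supported=MAX_SUPPORTED_FORMAT):
    """Decide what to do when opening the journal.

    on_disk_format is the format version read from the header, or None if the
    header could not be found. Returns one of "replay", "reformat", "eagain",
    "suicide" or "error".
    """
    if on_disk_format is None:
        # Header went away: in standbyreplay the active MDS may be rewriting it.
        # The non-standby result is an assumption; the commit does not cover it.
        return "eagain" if in_standby_replay else "error"
    if on_disk_format > max_supported:
        # Newer than anything we understand: give up.
        return "suicide"
    if on_disk_format < max_supported:
        # Old format: only the active MDS may reformat; standbys retry later.
        return "eagain" if in_standby_replay else "reformat"
    return "replay"
```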

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 22:23:25 +0000 (15:23 -0700)]

Merge pull request #2112 from ceph/wip-rbd-defaults

respect rbd_default_* parameters in /usr/bin/rbd

Reviewed-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 21:48:10 +0000 (14:48 -0700)]

osd/ReplicatedPG: requeue cache full waiters if no longer writeback

If the cache is full, we block some requests, and then we change the
cache_mode to something else (say, forward), the full waiters don't get
requeued until the cache becomes un-full. In the meantime, however, later
requests will get processed and redirected, breaking the op ordering.

Fix this by requeueing any full waiters if we see that the cache_mode is
not writeback.

Fixes: #8931
Signed-off-by: Sage Weil <sage@redhat.com>
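
A rough model of the requeueing rule, assuming the blocked ops and the main op queue are both simple double-ended queues. The function name, the string cache modes and the queue layout are hypothetical; the point is only that waiters go back to the front, in their original order, as soon as the pool is no longer in writeback mode.

```python
from collections import deque


def requeue_full_waiters(cache_mode, cache_is_full, full_waiters, op_queue):
    """Move blocked ops back to the front of op_queue when they may proceed."""
    if cache_mode == "writeback" and cache_is_full:
        return  # still have to wait for the cache to drain
    # extendleft() reverses its input, so feed it the waiters back to front
    # to preserve their original order ahead of the newer ops.
    op_queue.extendleft(reversed(full_waiters))
    full_waiters.clear()
```

Putting the waiters ahead of later requests is what keeps the op ordering intact in this sketch.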

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 21:43:48 +0000 (14:43 -0700)]

osd/ReplicatedPG: fix cache full -> not full requeueing when !active

We only want to do this if is_active(). Otherwise, the normal
requeueing code will do its thing, taking care to get the queue orders
correct.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 20:51:45 +0000 (13:51 -0700)]

ceph_test_rados_api_tier: do fewer writes in HitSetWrite

We don't need to do quite so many writes. It can be slow when we are
thrashing and aren't doing anything in parallel.

Fixes: #8932
Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Dan Mick [Fri, 25 Jul 2014 20:19:42 +0000 (13:19 -0700)]

Merge pull request #2145 from ceph/wip-ref-put

common/RefCountedObject: fix use-after-free in debug print

Reviewed-by: Dan Mick <dan.mick@inktank.com>

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 20:17:32 +0000 (13:17 -0700)]

common/RefCountedObject: fix use-after-free in debug print

We could race with another thread that deletes this right after we call
dec(). Our access of cct would then become a use-after-free. Valgrind
managed to turn this up.

Copy it into a local variable before the dec() to be safe, and move the
dout line below to make this possibility explicit and obvious in the code.

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Josh Durgin [Fri, 25 Jul 2014 18:36:29 +0000 (11:36 -0700)]

Merge pull request #2143 from ceph/wip-rgw-align

Wip rgw align

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

commit | commitdiff | tree

Yehuda Sadeh [Thu, 24 Jul 2014 22:30:27 +0000 (15:30 -0700)]

rgw: object write should not exceed part size

Fixes: #8928
This can happen if the stripe size is not a multiple of the chunk size.

Backport: firefly

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>

commit | commitdiff | tree

Yehuda Sadeh [Tue, 22 Jul 2014 22:30:11 +0000 (15:30 -0700)]

rgw: align object chunk size with pool alignment

Fixes: #8442
Backport: firefly
Data pools might have strict write alignment requirements. Use pool
alignment info when setting the max_chunk_size for the write.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
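
As a hedged illustration of aligning a chunk size to a pool's write alignment: the helper below rounds the configured chunk size up to the next multiple of the alignment. The function name is hypothetical, and rounding up (rather than down) is an assumption; the commit message only says that pool alignment info is used when setting max_chunk_size.

```python
def aligned_chunk_size(max_chunk_size, alignment):
    """Return max_chunk_size adjusted to a multiple of alignment.

    An alignment of 0 is taken to mean the pool has no alignment requirement.
    """
    if alignment == 0:
        return max_chunk_size
    remainder = max_chunk_size % alignment
    if remainder == 0:
        return max_chunk_size
    # Round up to the next multiple (assumption, see lead-in).
    return max_chunk_size + (alignment - remainder)
```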

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 17:34:33 +0000 (10:34 -0700)]

Merge pull request #2141 from ceph/wip-8882

osd: set pg flag INCOMPLETE_CLONES when turning off cache pool

Reviewed-by: Greg Farnum <greg@inktank.com>
First patch Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>

commit | commitdiff | tree

John Wilkins [Fri, 25 Jul 2014 16:55:52 +0000 (09:55 -0700)]

doc: Add additional hyperlink to Cache Tiering defaults.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

commit | commitdiff | tree

John Wilkins [Fri, 25 Jul 2014 16:55:28 +0000 (09:55 -0700)]

doc: Update doc from user feedback.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 16:20:20 +0000 (09:20 -0700)]

osd: fix bad Message* defer in C_SendMap and send_map_on_destruct

We were carrying a bare Message*, which could get freed if the op was
canceled (or possibly completed). Instead, just stash the entity_name_t,
the only piece we need. The Connection is properly ref counted so no
worries there.

Fixes: #8926
Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 16:03:34 +0000 (09:03 -0700)]

Merge pull request #2142 from ceph/wip-data-pool

test: catch a straggler still using 'data' pool

Reviewed-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

John Spray [Fri, 25 Jul 2014 16:01:39 +0000 (17:01 +0100)]

test: catch a straggler still using 'data' pool

Used rbd pool instead, which is still created by default.

Signed-off-by: John Spray <john.spray@redhat.com>

commit | commitdiff | tree

Ma Jianpeng [Wed, 23 Jul 2014 17:10:38 +0000 (10:10 -0700)]

os/FileJournal: Update the journal header when closing journal

When closing the journal, check must_write_header and update the
journal header if must_write_header is already set.
This reduces needless journal replay after restarting the OSD.

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
Reviewed-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Fri, 25 Jul 2014 01:22:22 +0000 (18:22 -0700)]

msg/SimpleMessenger: drop local_connection priv link on shutdown

This breaks ref cycles between the local_connection and session, and lets
us drop the explicit set_priv() calls in OSD::shutdown().

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

John Wilkins [Thu, 24 Jul 2014 23:00:52 +0000 (16:00 -0700)]

doc: Updated mon doc per feedback. Fixed hyperlinks.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

commit | commitdiff | tree

Gregory Farnum [Thu, 24 Jul 2014 21:36:21 +0000 (14:36 -0700)]

Merge pull request #2079 from nereocystis/seq_read_bench-args

Make the declaration argument names match those in the implementation (as used by callers).

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Abhishek Lekshmanan [Thu, 24 Jul 2014 15:00:43 +0000 (20:30 +0530)]

doc: update radosgw man page with available opts

Fixes: #8112

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>

commit | commitdiff | tree

Abhishek Lekshmanan [Thu, 24 Jul 2014 15:00:42 +0000 (20:30 +0530)]

rgw: list all available options during help()

Adding the available help arguments from the man page

Fixes: #8112
Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>

commit | commitdiff | tree

Abhishek Lekshmanan [Thu, 24 Jul 2014 15:00:41 +0000 (20:30 +0530)]

rgw: format help options to align with the rest

Whitespace removal to make all help options align in a similar fashion

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>

commit | commitdiff | tree

Sage Weil [Thu, 24 Jul 2014 01:25:53 +0000 (18:25 -0700)]

osd/ReplicatedPG: observe INCOMPLETE_CLONES in is_present_clone()

We cannot assume that just because cache_mode is NONE that we will have
all clones present; check for the absence of the INCOMPLETE_CLONES flag
here too.

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Thu, 24 Jul 2014 01:24:51 +0000 (18:24 -0700)]

osd/ReplicatedPG: observe INCOMPLETE_CLONES when doing clone subsets

During recovery, we can clone subsets if we know that all clones will be
present. We skip this on caching pools because they may not be; do the
same when INCOMPLETE_CLONES is set.

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Thu, 24 Jul 2014 01:23:56 +0000 (18:23 -0700)]

osd/ReplicatedPG: do not complain about missing clones when INCOMPLETE_CLONES is set

When scrubbing, do not complain about missing clones when we are in a
caching mode *or* when the INCOMPLETE_CLONES flag is set. Both are
indicators that we may be missing clones and that that is okay.

Fixes: #8882
Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Thu, 24 Jul 2014 01:21:38 +0000 (18:21 -0700)]

osd/osd_types: add pg_pool_t FLAG_INCOMPLETE_CLONES

Set a flag on the pg_pool_t when we change cache_mode to NONE. This
is because object promotion may promote heads without all of the clones,
and when we switch the cache_mode back those objects may remain. Do
this on any cache_mode change (to or from NONE) to capture legacy
pools that were set up before this flag existed.

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Thu, 24 Jul 2014 17:06:31 +0000 (10:06 -0700)]

mon/OSDMonitor: improve no-op cache_mode set check

If we have a pending pool value but the cache_mode hasn't changed, this is
still a no-op (and we don't need to block).

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Thu, 24 Jul 2014 02:14:52 +0000 (19:14 -0700)]

Merge remote-tracking branch 'gh/next'

commit | commitdiff | tree

Sage Weil [Thu, 24 Jul 2014 02:13:55 +0000 (19:13 -0700)]

Merge pull request #2127 from ceph/wip-8701

filestore: fix collection_move behavior

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Sage Weil [Thu, 24 Jul 2014 02:13:11 +0000 (19:13 -0700)]

Merge pull request #2140 from ceph/wip-8889

osd: greedily get obc write lock in some cases

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

Sage Weil [Tue, 22 Jul 2014 13:53:41 +0000 (06:53 -0700)]

ceph_test_objectstore: clean up on finish of MoveRename

Otherwise, we leave collections around, and the next test fails.

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Mon, 21 Jul 2014 20:45:21 +0000 (13:45 -0700)]

os/LFNIndex: use FDCloser for fsync_dir

This prevents an fd leak when maybe_inject_failure() throws an exception.

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Sat, 19 Jul 2014 06:16:09 +0000 (23:16 -0700)]

os/LFNIndex: only consider alt xattr if nlink > 1

If we are doing a lookup and the main xattr fails, we'll check if there is an
alt xattr. If it exists, but the nlink on the inode is only 1, we will
kill the xattr. This cleans up the mess left over by an incomplete
lfn_unlink operation.

This resolves the problem with an lfn_link to a second long name that
hashes to the same short_name: we will ignore the old name the moment the
old link goes away.

Fixes: #8701
Signed-off-by: Sage Weil <sage@redhat.com>
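
The lookup rule from this commit can be modelled with a plain dictionary standing in for the inode's xattrs. This is a sketch under stated assumptions: the `lfn` and `lfn_alt` keys and the `lookup_matches` helper are hypothetical names, and the link count is passed in rather than read from a real file.

```python
def lookup_matches(xattrs, nlink, long_name):
    """Return True if long_name refers to this inode.

    xattrs maps "lfn" to the current long name and optionally "lfn_alt" to a
    previous one. A stale alt entry is removed as a side effect.
    """
    if xattrs.get("lfn") == long_name:
        return True
    alt = xattrs.get("lfn_alt")
    if alt is None:
        return False
    if nlink <= 1:
        # Only one link is left, so the old name is gone: the alt xattr is
        # leftover from an incomplete unlink. Clean it up and ignore it.
        del xattrs["lfn_alt"]
        return False
    return alt == long_name
```

With two links the old name still resolves; once the link count drops to one, the same lookup fails and the stale attribute disappears.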

commit | commitdiff | tree

Sage Weil [Sat, 19 Jul 2014 00:28:18 +0000 (17:28 -0700)]

os/LFNIndex: remove alt xattr after unlink

After we unlink, if the nlink on the inode is still non-zero, remove the
alt xattr. We can *only* do this after the rename or unlink operation
because we don't want to leave a file system link in place without the
matching xattr; hence the fsync_dir() call.

Note that this might leak an alt xattr if we happen to fail after the
rename/unlink but before the removexattr is committed. We'll fix that
next.

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Mon, 21 Jul 2014 20:43:42 +0000 (13:43 -0700)]

os/LFNIndex: FDCloser helper

Add a helper to close fd's when we leave scope. This is important when
injecting failures by throwing exceptions.

Signed-off-by: Sage Weil <sage@redhat.com>
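
The same scope-bound cleanup idea can be shown with a Python context manager standing in for the C++ helper. This is only an analogy: the real FDCloser is a C++ class whose interface is not shown in the commit message, so everything beyond "close the fd when leaving scope, even on an exception" is an assumption.

```python
import os


class FDCloser:
    """Close a file descriptor when the with-block exits, however it exits."""

    def __init__(self, fd):
        self.fd = fd

    def __enter__(self):
        return self.fd

    def __exit__(self, exc_type, exc, tb):
        os.close(self.fd)
        return False  # do not swallow the exception


def fsync_dir_with_injected_failure(path):
    """Open a directory fd and then fail, as a failure-injection test might."""
    fd = os.open(path, os.O_RDONLY)
    with FDCloser(fd):
        raise RuntimeError("injected failure")
```

Even though `fsync_dir_with_injected_failure` always raises, the descriptor is closed on the way out, so repeated injected failures do not leak fds.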

commit | commitdiff | tree

Sage Weil [Sat, 19 Jul 2014 00:09:07 +0000 (17:09 -0700)]

os/LFNIndex: handle long object names with multiple links (i.e., rename)

When we rename an object (collection_move_rename) to a different name, and
the name is long, we run into problems because the lfn xattr can only track
a single long name linking to the inode.  For example, suppose we have

foobar -> foo_123_0 (attr: foobar) where foobar hashes to 123.

At first, collection_add could only link a file to another file in a
different collection with the same name. Allowing collection_move_rename
to rename the file, however, means that we have to convert:

col1/foobar -> foo_123_0 (attr: foobar)

to

col1/foobaz -> foo_234_0 (attr: foobaz)

This is a problem because if we link, reset xattr, unlink we end up with

col1/foobar -> foo_123_0 (attr: foobaz)

if we restart after we reset the attr.  This will cause the initial foobar
lookup to fail since the attr doesn't match, and the file won't be able to be
looked up.

Fix this by allowing *two* (long) names to link to the same inode.  If we
lfn_link a second (different) name, move the previous name to the "alt"
xattr and set the new name.  (This works because link is always followed
by unlink.)  On lookup, check either xattr.

Don't even bother to remove the alt xattr on unlink.  This works as long
as the old name and new name don't hash to the same shortname and end up
in the same LFN chain.  (Don't worry, we'll fix that next.)

Fixes part of #8701
Signed-off-by: Sage Weil <sage@redhat.com>
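
The two-name scheme described above can be sketched with a tiny in-memory model. The `Inode` class, its `name` and `alt_name` fields, and the helper functions are hypothetical stand-ins for the lfn xattr and the "alt" xattr; the sketch only shows why a restart between "reset xattr" and "unlink" no longer breaks the lookup of the old name.

```python
class Inode:
    """Hypothetical inode carrying a primary and an alternate long name."""

    def __init__(self):
        self.name = None      # stands in for the lfn xattr
        self.alt_name = None  # stands in for the "alt" xattr


def lfn_link(inode, long_name):
    """Link a (possibly second) long name to the inode."""
    if inode.name is not None and inode.name != long_name:
        # Keep the previous name around so it can still be looked up
        # until the matching unlink happens.
        inode.alt_name = inode.name
    inode.name = long_name


def lfn_matches(inode, long_name):
    """On lookup, accept a match against either attribute."""
    return long_name == inode.name or long_name == inode.alt_name
```

After `lfn_link(inode, "foobar")` followed by `lfn_link(inode, "foobaz")`, both names resolve to the inode, which covers the window before the old link is removed.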

commit | commitdiff | tree

Sage Weil [Fri, 18 Jul 2014 22:46:58 +0000 (15:46 -0700)]

ceph_test_objectstore: fix warning

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Samuel Just [Tue, 15 Jul 2014 21:50:33 +0000 (14:50 -0700)]

store_test: add long name collection_move_rename tests

Currently fails.

Signed-off-by: Samuel Just <sam.just@inktank.com>

commit | commitdiff | tree

Dan Mick [Thu, 3 Jul 2014 23:11:24 +0000 (16:11 -0700)]

ceph.spec.in: add bash completion file for radosgw-admin

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit b70096307130bcbac176704493a63c5d039d3edc)

commit | commitdiff | tree

Dan Mick [Thu, 3 Jul 2014 23:10:55 +0000 (16:10 -0700)]

ceph.spec.in: rhel7-related changes:

udev rules: /lib -> /usr/lib
/sbin binaries move to /usr/sbin or %{_sbindir}

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 235e4c7de8f8efe491edefbdde8e5da4dfc44034)

commit | commitdiff | tree

Dan Mick [Thu, 3 Jul 2014 23:08:44 +0000 (16:08 -0700)]

Fix/add missing dependencies:

- rbd-fuse depends on librados2/librbd1
- ceph-devel depends on specific releases of libs and libcephfs_jni1
- librbd1 depends on librados2
- python-ceph does not depend on libcephfs1

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 7cf81322391b629b241da90181800ca1f138ce78)

commit | commitdiff | tree

Dan Mick [Thu, 3 Jul 2014 23:05:00 +0000 (16:05 -0700)]

ceph.spec.in: whitespace fixes

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit ec8af52a5ede78511423a1455a496d46d580c644)

commit | commitdiff | tree

Dan Mick [Thu, 3 Jul 2014 23:04:10 +0000 (16:04 -0700)]

ceph.spec.in: split out ceph-common as in Debian

Move files, postun scriptlet, and add dependencies on ceph-common
where appropriate

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit e131b9d5a5e90e87d8a8346cb96cb5a26135c144)

commit | commitdiff | tree

Sage Weil [Wed, 23 Jul 2014 17:11:59 +0000 (10:11 -0700)]

common/random_cache: fix typo

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Wed, 23 Jul 2014 16:57:59 +0000 (09:57 -0700)]

Merge pull request #2136 from yuyuyu101/fix-randomcache

common/RandomCache: Fix inconsistency between contents and count

Reviewed-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Haomai Wang [Wed, 23 Jul 2014 03:26:18 +0000 (11:26 +0800)]

common/RandomCache: Fix inconsistency between contents and count

The add/clear methods may leave count inconsistent with the real size of
contents.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
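
One way to picture the invariant this commit restores is a cache where every path that changes `contents` also updates `count`. The class below is a simplified sketch, not the Ceph RandomCache: the eviction policy (drop one random entry when full) and the method bodies are assumptions made for illustration.

```python
import random


class RandomCache:
    def __init__(self, max_size):
        self.max_size = max_size
        self.contents = {}
        self.count = 0

    def add(self, key, value):
        if key not in self.contents:
            # Only a genuinely new key changes the count.
            if self.contents and self.count >= self.max_size:
                self._evict_one()
            self.count += 1
        self.contents[key] = value

    def clear(self):
        # Reset both together so they cannot drift apart.
        self.contents.clear()
        self.count = 0

    def _evict_one(self):
        victim = random.choice(list(self.contents))
        del self.contents[victim]
        self.count -= 1
```

Re-adding an existing key overwrites the value without touching `count`, and `clear()` resets both fields, so `count == len(contents)` holds after every call.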

commit | commitdiff | tree

Sage Weil [Wed, 23 Jul 2014 01:01:14 +0000 (18:01 -0700)]

osd/ReplicatedPG: debug obc locks

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Sage Weil [Tue, 22 Jul 2014 20:16:11 +0000 (13:16 -0700)]

osd/ReplicatedPG: greedily take write_lock for copyfrom finish, snapdir

In the cases where we are taking a write lock and are careful
enough that we know we should succeed (i.e., we assert(got)),
use the get_write_greedy() variant that skips the checks for
waiters (be they ops or backfill) that are normally necessary
to avoid starvation. We don't care about starvation here
because our op is already in-progress and can't easily be
aborted, and new ops won't start because they do make those
checks.

Fixes: #8889
Signed-off-by: Sage Weil <sage@redhat.com>
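
The difference between the normal and the greedy write lock can be sketched as follows. This is a deliberately simplified model, assuming a single writer and plain counters for waiters: the `ObjectContextLock` class and its fields are hypothetical, and only the `get_write_greedy()` name and the "skip the starvation checks" behaviour come from the commit message.

```python
class ObjectContextLock:
    """Simplified stand-in for an ObjectContext read/write lock."""

    def __init__(self):
        self.readers = 0
        self.writer = False
        self.op_waiters = 0            # ops queued behind this lock
        self.backfill_waiting = False  # backfill wants the object

    def get_write(self, greedy=False):
        # Normal path: refuse if anyone is waiting, so waiters cannot starve.
        if not greedy and (self.op_waiters > 0 or self.backfill_waiting):
            return False
        if self.readers > 0 or self.writer:
            return False
        self.writer = True
        return True

    def get_write_greedy(self):
        # The caller's op is already in progress, so skip the starvation checks.
        return self.get_write(greedy=True)
```

New ops still go through `get_write()` and back off when there are waiters, which is why the greedy path does not reintroduce starvation in this model.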

commit | commitdiff | tree

Sage Weil [Tue, 22 Jul 2014 20:11:42 +0000 (13:11 -0700)]

osd: allow greedy get_write() for ObjectContext locks

There are several lockers that need to take a write lock
because there is an operation that is already in progress and
know it is safe to do so. In particular, they need to skip
the starvation checks (op waiters, backfill waiting).

Signed-off-by: Sage Weil <sage@redhat.com>

commit | commitdiff | tree

Josh Durgin [Tue, 22 Jul 2014 23:58:25 +0000 (16:58 -0700)]

Merge pull request #2120 from ceph/wip-8858

Wip 8858

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

commit | commitdiff | tree

Gregory Farnum [Tue, 22 Jul 2014 22:36:40 +0000 (15:36 -0700)]

Merge pull request #2133 from ceph/wip-8897

os: fix build warnings with name/attr len checks (fixes 8889)

Reviewed-by: Greg Farnum <greg@inktank.com>

commit | commitdiff | tree

João Eduardo Luís [Tue, 22 Jul 2014 21:10:17 +0000 (22:10 +0100)]

Merge pull request #2128 from ceph/wip-8851

mon: AuthMonitor: always encode full regardless of keyserver having keys

Reviewed-by: Gregory Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
