git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
11 years agoosd: prevent old clients from using tiered pools 2172/head
Sage Weil [Wed, 30 Jul 2014 20:57:34 +0000 (13:57 -0700)]
osd: prevent old clients from using tiered pools

If the client is old and doesn't understand tiering, don't let them use a
tiered pool.  Reply with EOPNOTSUPP.

Fixes: #8714
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
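
The check described in the commit above can be sketched as a feature-bit test. This is a minimal illustration, not the OSD code: `FEATURE_TIERING`, `PoolInfo`, and `check_client_features` are hypothetical names, and the bit value is made up.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical feature bit for illustration; not the real Ceph constant.
constexpr std::uint64_t FEATURE_TIERING = 1ULL << 0;

struct PoolInfo {
  bool tiered;  // true when the pool is a cache tier or has one
};

// Returns 0 when the op may proceed, or -EOPNOTSUPP when an old client
// (one that did not advertise the tiering feature) targets a tiered pool.
int check_client_features(const PoolInfo& pool, std::uint64_t client_features) {
  const bool understands_tiering = (client_features & FEATURE_TIERING) != 0;
  if (pool.tiered && !understands_tiering)
    return -EOPNOTSUPP;
  return 0;
}
```

Untiered pools are unaffected in this sketch, so old clients keep working against them.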
11 years agoMerge tag 'v0.83'
Sage Weil [Tue, 29 Jul 2014 23:23:12 +0000 (16:23 -0700)]
Merge tag 'v0.83'

v0.83

11 years agoMerge pull request #2161 from ceph/wip-jcsp-test
John Spray [Tue, 29 Jul 2014 22:55:30 +0000 (23:55 +0100)]
Merge pull request #2161 from ceph/wip-jcsp-test

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agomds: remove some rogue "using namespace std;" 2161/head
John Spray [Tue, 22 Jul 2014 01:42:15 +0000 (02:42 +0100)]
mds: remove some rogue "using namespace std;"

Signed-off-by: John Spray <john.spray@redhat.com>
11 years agomds: handle replaying old format journals
John Spray [Tue, 22 Jul 2014 01:08:08 +0000 (02:08 +0100)]
mds: handle replaying old format journals

To get back to the reformatting procedure that otherwise
occurs during MDLog::open, introduce an MDLog::reopen call
that MDS can use in the standbyreplay->standby transition
for the special case where the journal is old.

Fixes: #8869
Signed-off-by: John Spray <john.spray@redhat.com>
11 years agomds: introduce explicit DaemonState instead of int
John Spray [Mon, 21 Jul 2014 19:22:46 +0000 (20:22 +0100)]
mds: introduce explicit DaemonState instead of int

Signed-off-by: John Spray <john.spray@redhat.com>
11 years agomds: refactor MDS boot
John Spray [Mon, 21 Jul 2014 17:50:07 +0000 (18:50 +0100)]
mds: refactor MDS boot

* Make boot_start private.
* Define boot stages in enum, replace int with type.
* Merge steps 0 and 1, 0 always fell through to 1.
* starting_done was only ever reached by a fall through
  from the previous step, so call it directly from there.

Signed-off-by: John Spray <john.spray@redhat.com>
11 years agomds: make MDS::replay_done clearer
John Spray [Mon, 21 Jul 2014 16:08:46 +0000 (17:08 +0100)]
mds: make MDS::replay_done clearer

... and add some assertions.

Signed-off-by: John Spray <john.spray@redhat.com>
11 years agomds: remove unused purge_prealloc_ino
John Spray [Tue, 22 Jul 2014 11:16:26 +0000 (12:16 +0100)]
mds: remove unused purge_prealloc_ino

Signed-off-by: John Spray <john.spray@redhat.com>
11 years agomds: separate inode recovery queue from MDCache
John Spray [Thu, 17 Jul 2014 23:44:38 +0000 (00:44 +0100)]
mds: separate inode recovery queue from MDCache

Refactor to:
* have somewhere to put some logic for doing
  background recovery in future.
* trim a few lines from the oversized MDCache.cc
  wherever we can.

Signed-off-by: John Spray <john.spray@redhat.com>
11 years agopython-ceph: require libcephfs.
Sandon Van Ness [Tue, 29 Jul 2014 21:11:03 +0000 (14:11 -0700)]
python-ceph: require libcephfs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
11 years ago0.83 v0.83
Jenkins [Tue, 29 Jul 2014 20:42:53 +0000 (13:42 -0700)]
0.83

11 years agoMerge pull request #2159 from ceph/wip-undump
Gregory Farnum [Tue, 29 Jul 2014 20:40:31 +0000 (16:40 -0400)]
Merge pull request #2159 from ceph/wip-undump

tools/cephfs: fuller header in dump/undump

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoRemove reference from mkcephfs.
Sandon Van Ness [Mon, 28 Jul 2014 17:38:41 +0000 (10:38 -0700)]
Remove reference from mkcephfs.

A bit of collision between the spec changes for the rhel7/ceph-common
work and Alfredo's pull request for wip-die-ceph-mkcephfs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 1526546ddcfd4230403d0d2364575c4e46970f8d)

11 years agoMerge pull request #2156 from ceph/wip-upstart-nfile
Gregory Farnum [Tue, 29 Jul 2014 19:36:19 +0000 (15:36 -0400)]
Merge pull request #2156 from ceph/wip-upstart-nfile

upstart/ceph-osd.conf: bump nofile limit up by 10x

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agodoc/release-notes: typo
Sage Weil [Tue, 29 Jul 2014 19:33:52 +0000 (12:33 -0700)]
doc/release-notes: typo

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agodoc/release-notes: v0.80.5 release notes
Sage Weil [Tue, 29 Jul 2014 19:23:33 +0000 (12:23 -0700)]
doc/release-notes: v0.80.5 release notes

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Tue, 29 Jul 2014 18:16:24 +0000 (11:16 -0700)]
Merge remote-tracking branch 'gh/next'

11 years agoMerge branch 'origin/wip-osd-leaks'
Greg Farnum [Tue, 29 Jul 2014 17:33:34 +0000 (10:33 -0700)]
Merge branch 'origin/wip-osd-leaks'

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #2139 from ceph/wip-journal-header
Gregory Farnum [Tue, 29 Jul 2014 13:04:12 +0000 (09:04 -0400)]
Merge pull request #2139 from ceph/wip-journal-header

os/FileJournal: Update the journal header when closing journal

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #2146 from ceph/wip-8932
Gregory Farnum [Tue, 29 Jul 2014 13:01:41 +0000 (09:01 -0400)]
Merge pull request #2146 from ceph/wip-8932

ceph_test_rados_api_tier: do fewer writes in HitSetWrite

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #2147 from ceph/wip-8931
Gregory Farnum [Tue, 29 Jul 2014 12:58:30 +0000 (08:58 -0400)]
Merge pull request #2147 from ceph/wip-8931

osd: fix ops blocked by full cache tier dequeue

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agounittest_crush_wrapper: fix build
Sage Weil [Tue, 29 Jul 2014 00:17:23 +0000 (17:17 -0700)]
unittest_crush_wrapper: fix build

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoMerge pull request #2150 from ceph/wip-libs
Dan Mick [Tue, 29 Jul 2014 00:06:41 +0000 (17:06 -0700)]
Merge pull request #2150 from ceph/wip-libs

don't link everything with blkid, udev, and boost_threads

11 years agoMerge pull request #2153 from ceph/wip-fsx-overlap
Josh Durgin [Mon, 28 Jul 2014 21:30:51 +0000 (14:30 -0700)]
Merge pull request #2153 from ceph/wip-fsx-overlap

librbd API fix + wip-fsx-overlap

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #2152 from xiaoxichen/fix_ceph_df
Sage Weil [Mon, 28 Jul 2014 18:41:09 +0000 (11:41 -0700)]
Merge pull request #2152 from xiaoxichen/fix_ceph_df

PGMonitor: fix bug in caculating pool avail space

Reviewed-by: Sage Weil <sage@redhat.com>
11 years agoRemove reference from mkcephfs.
Sandon Van Ness [Mon, 28 Jul 2014 17:38:41 +0000 (10:38 -0700)]
Remove reference from mkcephfs.

A bit of collision between the spec changes for the rhel7/ceph-common
work and Alfredo's pull request for wip-die-ceph-mkcephfs.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
11 years agoFix some style and checking issue 2152/head
Xiaoxi Chen [Mon, 28 Jul 2014 16:42:10 +0000 (00:42 +0800)]
Fix some style and checking issue

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
11 years agoupstart/ceph-osd.conf: bump nofile limit up by 10x 2156/head
Sage Weil [Mon, 28 Jul 2014 16:27:20 +0000 (09:27 -0700)]
upstart/ceph-osd.conf: bump nofile limit up by 10x

This should ensure that we don't hit this limit on all but the very biggest
clusters.  We have seen it hit on a ~500 OSD dumpling cluster.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoMerge pull request #2154 from simon3z/master
Sage Weil [Mon, 28 Jul 2014 16:22:47 +0000 (09:22 -0700)]
Merge pull request #2154 from simon3z/master

init: add systemd service files

Reviewed-by: Alfredo Deza <alfredo.deza@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
11 years agotools/cephfs: fuller header in dump/undump 2159/head
John Spray [Mon, 28 Jul 2014 14:32:12 +0000 (15:32 +0100)]
tools/cephfs: fuller header in dump/undump

There were two problems here:
 * write_pos was modified through an undump/dump cycle,
   because it was probed during recovery.
 * stream format was being forgotten.

Signed-off-by: John Spray <john.spray@redhat.com>
11 years agotest_librbd_fsx: clone/flatten probabilities 2153/head
Ilya Dryomov [Fri, 25 Jul 2014 12:04:53 +0000 (16:04 +0400)]
test_librbd_fsx: clone/flatten probabilities

Raise the clone probability to 8% and lower the flatten probability
to 2%.  This should give us longer parent chains (before this we would
usually have one parent, and even then only for a few ops).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agotest_librbd_fsx: randomize_parent_overlap
Ilya Dryomov [Fri, 25 Jul 2014 12:04:53 +0000 (16:04 +0400)]
test_librbd_fsx: randomize_parent_overlap

Truncate base images after they have been cloned from to cover more
code paths and make sure that clients look at snapshot parent_overlap
(i.e. parent_overlap of the base image at the time the snapshot was
taken) and not that of the base image (i.e. parent_overlap of the base
image as of now).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agotest_librbd_fsx: introduce rbd_image_has_parent()
Ilya Dryomov [Wed, 9 Jul 2014 13:59:02 +0000 (17:59 +0400)]
test_librbd_fsx: introduce rbd_image_has_parent()

A helper to check whether the image associated with the ctx has
a parent or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agolibrbd: make rbd_get_parent_info() accept NULL out params
Ilya Dryomov [Tue, 8 Jul 2014 15:21:54 +0000 (19:21 +0400)]
librbd: make rbd_get_parent_info() accept NULL out params

The C++ version of rbd_get_parent_info() allows passing NULL for parent
image name, image name and snapshot name out parameters.  Make C API do
the same both for consistency and to make it easier to check whether
the image at hand has a parent or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
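
The "NULL out params" convention from the commit above can be sketched as follows. This is not the librbd implementation: `ParentInfo`, `copy_out`, and `get_parent_info` are hypothetical stand-ins, and the -ERANGE and -ENOENT return values are assumptions made for the illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical in-memory record of an image's parent.
struct ParentInfo {
  std::string pool, image, snap;
  bool present;
};

// Copies src into dst; a null dst means "caller does not want this value".
static int copy_out(const std::string& src, char* dst, std::size_t len) {
  if (!dst)
    return 0;
  if (src.size() + 1 > len)
    return -ERANGE;  // assumed error for a too-small buffer
  std::memcpy(dst, src.c_str(), src.size() + 1);
  return 0;
}

// C-style accessor: any of the three output buffers may be null.
int get_parent_info(const ParentInfo& p,
                    char* pool, std::size_t pool_len,
                    char* image, std::size_t image_len,
                    char* snap, std::size_t snap_len) {
  if (!p.present)
    return -ENOENT;  // assumed error for "no parent"
  int r;
  if ((r = copy_out(p.pool, pool, pool_len)) < 0) return r;
  if ((r = copy_out(p.image, image, image_len)) < 0) return r;
  if ((r = copy_out(p.snap, snap, snap_len)) < 0) return r;
  return 0;
}
```

With this convention a "has parent?" helper reduces to calling the accessor with all buffers null and checking the return code.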
11 years agoPGMonitor: fix bug in caculating pool avail space
Xiaoxi Chen [Mon, 28 Jul 2014 08:54:48 +0000 (16:54 +0800)]
PGMonitor: fix bug in caculating pool avail space

Currently, for pools with different rules, "ceph df" cannot report
the correct available space for each of them. For a detailed assessment
of the bug, please refer to bug report #8943.

This patch fixes the bug so that ceph df reports the space correctly.

Fixes Bug #8943

Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
11 years agoMerge pull request #2149 from yuyuyu101/wip-flush-set
Sage Weil [Mon, 28 Jul 2014 02:39:34 +0000 (19:39 -0700)]
Merge pull request #2149 from yuyuyu101/wip-flush-set

Fix dup bh_write for TX state bh

Tested-by: Sage Weil <sage@redhat.com>
Reviewed-by: Haomai Wang <haomaiwang@gmail.com>
Original changeset

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoconfigure.ac: link libboost_thread only with json-spirit 2150/head
Sage Weil [Sun, 27 Jul 2014 23:58:08 +0000 (16:58 -0700)]
configure.ac: link libboost_thread only with json-spirit

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoconfigure: don't link blkid, udev to everything
Sage Weil [Sat, 19 Jul 2014 05:56:05 +0000 (22:56 -0700)]
configure: don't link blkid, udev to everything

These are already explicitly called out for libkrbd; don't need them in
LIBS.

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoOnly write bufferhead when it's dirty 2149/head
Haomai Wang [Sun, 27 Jul 2014 05:37:49 +0000 (13:37 +0800)]
Only write bufferhead when it's dirty

A bh in TX state should be skipped because it is already in flight; we only
need to write dirty bhs. Both TX and dirty bhs must still be waited on until
they are flushed.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
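
The rule in the commit above (write only dirty buffers, but wait on both dirty and TX ones) can be sketched like this. It is a simplified model, not ObjectCacher: `BufferHead`, `FlushResult`, and `flush_set` here are stand-ins that track only state and counters.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a buffer head; only the state matters here.
struct BufferHead {
  enum State { CLEAN, DIRTY, TX };
  State state;
};

struct FlushResult {
  int issued = 0;   // writes submitted by this call
  int waiting = 0;  // buffers that must complete before the flush is done
};

FlushResult flush_set(std::vector<BufferHead>& dirty_or_tx) {
  FlushResult r;
  for (BufferHead& bh : dirty_or_tx) {
    if (bh.state == BufferHead::DIRTY) {
      bh.state = BufferHead::TX;  // submit the write; it is now in flight
      ++r.issued;
      ++r.waiting;
    } else if (bh.state == BufferHead::TX) {
      ++r.waiting;  // already in flight: wait for it, but do not resubmit
    }
  }
  return r;
}
```

Calling `flush_set` twice in a row issues no new writes the second time, which is the duplicate-IO behaviour the fix is meant to avoid.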
11 years agoObjectCacher: fix bh_{add,remove} dirty_or_tx_bh accounting
Josh Durgin [Mon, 21 Jul 2014 21:09:48 +0000 (14:09 -0700)]
ObjectCacher: fix bh_{add,remove} dirty_or_tx_bh accounting

tx buffers need to go on the bh_lru_rest as well, and removing a bh must
erase it from dirty_or_tx_bh, not insert it.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoObjectCacher: fix dirty_or_tx_bh logic in bh_set_state()
Josh Durgin [Mon, 21 Jul 2014 21:08:44 +0000 (14:08 -0700)]
ObjectCacher: fix dirty_or_tx_bh logic in bh_set_state()

The else-if chain here was wrong. Handling dirty or tx buffers and
errors should be in independent conditions.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoWait tx state buffer in flush_set
Haomai Wang [Wed, 16 Jul 2014 06:34:22 +0000 (14:34 +0800)]
Wait tx state buffer in flush_set

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
11 years agoAdd rbdcache max dirty object option
Haomai Wang [Mon, 14 Jul 2014 06:27:17 +0000 (14:27 +0800)]
Add rbdcache max dirty object option

Librbd calculates the max dirty object count from rbd_cache_max_size, which
does not suit every case. If the user sets image order 24, the calculated
result is too small in practice. That increases the overhead of the trim call,
which runs on each read/write op.

Now we make it an option for tuning; by default the value is still calculated.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
11 years agoReduce ObjectCacher flush overhead
Haomai Wang [Mon, 14 Jul 2014 06:32:57 +0000 (14:32 +0800)]
Reduce ObjectCacher flush overhead

A flush op in ObjectCacher iterates over the whole active object set, and
each dirty object may own several BufferHeads. If the object set is large,
this consumes too much time.

Use dirty_bh instead to reduce the overhead. Now only dirty BufferHeads are
checked.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
11 years agoRevert "Merge pull request #2129 from ceph/wip-librbd-oc"
Sage Weil [Sun, 27 Jul 2014 04:19:34 +0000 (21:19 -0700)]
Revert "Merge pull request #2129 from ceph/wip-librbd-oc"

This reverts commit 74b386f03e4ca9970256db72c575589aea077534, reversing
changes made to 36265d0db0d7c0eb31d25a0f77ac233b3fd198f8.

The dirty_or_tx list is used by flush_set, which means we can
resubmit new IOs for writes that are already in progress.  This
has a compounding effect that overwhelms the OSDs with dup IOs
and stalls out the client.

See, for example, the failures in this run:
  /a/sage-2014-07-25_17:14:20-fs-wip-msgr-testing-basic-plana

The fix is probably pretty simple, but reverting for now to make
the tests pass.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Sat, 26 Jul 2014 04:42:35 +0000 (21:42 -0700)]
Merge remote-tracking branch 'gh/next'

Conflicts:
src/osdc/Journaler.h

11 years agomds: fix journal reformat failure in standbyreplay
John Spray [Thu, 17 Jul 2014 12:15:45 +0000 (13:15 +0100)]
mds: fix journal reformat failure in standbyreplay

In the 0.82 release, standbyreplay MDS daemons would try
to reformat the journal if they saw an older version on
disk, where this should have only been done by the active
MDS for the rank.  Depending on timing, this could cause
fatal corruption of the journal.

This change handles the following cases:
* only do reformat if not in standbyreplay (else raise EAGAIN
to keep trying til an active mds reformats it)
* if journal header goes away while in standbyreplay then raise
EAGAIN (handle rewrite happening in background)
* if journal version is greater than the max supported, suicide

Fixes: #8811
Signed-off-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 5438500af8979fda32e61714ae40b71c7ffdfd15)
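
The three cases listed in the commit above can be read as a small decision table. The sketch below is an illustration under assumptions: the version constants, the `Action` enum, and `on_journal_open` are hypothetical, and the `Fail` outcome for a missing header outside standbyreplay is not described in the commit.

```cpp
#include <cassert>

// Hypothetical version numbers for illustration only.
constexpr int JOURNAL_FORMAT_CURRENT = 1;
constexpr int JOURNAL_FORMAT_MAX_SUPPORTED = 1;

// RetryLater corresponds to raising EAGAIN in the commit message.
enum class Action { Replay, Reformat, RetryLater, Suicide, Fail };

Action on_journal_open(bool header_present, int version, bool standby_replay) {
  if (!header_present) {
    // In standbyreplay the header may be mid-rewrite by the active MDS.
    // Fail for the other case is an assumption, not taken from the commit.
    return standby_replay ? Action::RetryLater : Action::Fail;
  }
  if (version > JOURNAL_FORMAT_MAX_SUPPORTED)
    return Action::Suicide;  // newer than this daemon understands
  if (version < JOURNAL_FORMAT_CURRENT) {
    // Only the active MDS for the rank may reformat.
    return standby_replay ? Action::RetryLater : Action::Reformat;
  }
  return Action::Replay;
}
```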

11 years agoMerge pull request #2112 from ceph/wip-rbd-defaults
Sage Weil [Fri, 25 Jul 2014 22:23:25 +0000 (15:23 -0700)]
Merge pull request #2112 from ceph/wip-rbd-defaults

respect rbd_default_* parameters in /usr/bin/rbd

Reviewed-by: Sage Weil <sage@redhat.com>
11 years agoosd/ReplicatedPG: requeue cache full waiters if no longer writeback 2147/head
Sage Weil [Fri, 25 Jul 2014 21:48:10 +0000 (14:48 -0700)]
osd/ReplicatedPG: requeue cache full waiters if no longer writeback

If the cache is full, we block some requests, and then we change the
cache_mode to something else (say, forward), the full waiters don't get
requeued until the cache becomes un-full.  In the meantime, however, later
requests will get processed and redirected, breaking the op ordering.

Fix this by requeueing any full waiters if we see that the cache_mode is
not writeback.

Fixes: #8931
Signed-off-by: Sage Weil <sage@redhat.com>
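
The ordering concern in the commit above can be sketched with two queues. This is a toy model, assuming ops are plain integers and that requeueing means putting the blocked ops back at the front of the run queue; `PGQueues` and `on_cache_mode_change` are hypothetical names.

```cpp
#include <cassert>
#include <deque>

enum class CacheMode { None, Writeback, Forward };

struct PGQueues {
  std::deque<int> op_queue;                    // ops ready to run, in order
  std::deque<int> waiting_for_cache_not_full;  // older ops blocked on a full cache
};

// Called when the pool's cache_mode changes.
void on_cache_mode_change(PGQueues& pg, CacheMode mode) {
  if (mode == CacheMode::Writeback)
    return;  // still a writeback cache: waiters stay blocked until space frees up
  // Requeue at the front so the blocked (older) ops run before newer ones.
  pg.op_queue.insert(pg.op_queue.begin(),
                     pg.waiting_for_cache_not_full.begin(),
                     pg.waiting_for_cache_not_full.end());
  pg.waiting_for_cache_not_full.clear();
}
```

Without the requeue, op 3 in the usage below would be processed and redirected before ops 1 and 2, which is the ordering break the commit describes.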
11 years agoosd/ReplicatedPG: fix cache full -> not full requeueing when !active
Sage Weil [Fri, 25 Jul 2014 21:43:48 +0000 (14:43 -0700)]
osd/ReplicatedPG: fix cache full -> not full requeueing when !active

We only want to do this if is_active().  Otherwise, the normal
requeueing code will do its thing, taking care to get the queue orders
correct.

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoceph_test_rados_api_tier: do fewer writes in HitSetWrite 2146/head
Sage Weil [Fri, 25 Jul 2014 20:51:45 +0000 (13:51 -0700)]
ceph_test_rados_api_tier: do fewer writes in HitSetWrite

We don't need to do quite so many writes.  It can be slow when we are
thrashing and aren't doing anything in parallel.

Fixes: #8932
Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoMerge pull request #2145 from ceph/wip-ref-put
Dan Mick [Fri, 25 Jul 2014 20:19:42 +0000 (13:19 -0700)]
Merge pull request #2145 from ceph/wip-ref-put

common/RefCountedObject: fix use-after-free in debug print

Reviewed-by: Dan Mick <dan.mick@inktank.com>
11 years agocommon/RefCountedObject: fix use-after-free in debug print 2145/head
Sage Weil [Fri, 25 Jul 2014 20:17:32 +0000 (13:17 -0700)]
common/RefCountedObject: fix use-after-free in debug print

We could race with another thread that deletes this right after we call
dec().  Our access of cct would then become a use-after-free.  Valgrind
managed to turn this up.

Copy it into a local variable before the dec() to be safe, and move the
dout line below to make this possibility explicit and obvious in the code.

Signed-off-by: Sage Weil <sage@redhat.com>
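
The pattern described above (copy the member into a local before the decrement, log afterwards) can be sketched as follows. `Context` and `RefCounted` are simplified stand-ins, not the Ceph classes, and the logging uses `std::clog` in place of the real debug macros.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>

struct Context { int debug_refs; };  // stand-in for the real context object

class RefCounted {
  std::atomic<int> nref{1};
  Context* cct;

public:
  explicit RefCounted(Context* c) : cct(c) {}
  virtual ~RefCounted() = default;

  void get() { ++nref; }

  // Drops one reference and returns the remaining count.
  int put() {
    Context* local_cct = cct;  // read the member while *this is still alive
    const int v = --nref;
    if (v == 0)
      delete this;
    // From here on another thread may have freed *this, so touch only locals.
    if (local_cct && local_cct->debug_refs > 0)
      std::clog << "put: " << (v + 1) << " -> " << v << '\n';
    return v;
  }
};
```

Placing the log statement after the decrement makes the hazard visible: anything that dereferences `this` below that point would be a bug.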
11 years agoMerge pull request #2143 from ceph/wip-rgw-align
Josh Durgin [Fri, 25 Jul 2014 18:36:29 +0000 (11:36 -0700)]
Merge pull request #2143 from ceph/wip-rgw-align

Wip rgw align

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agorgw: object write should not exceed part size 2143/head
Yehuda Sadeh [Thu, 24 Jul 2014 22:30:27 +0000 (15:30 -0700)]
rgw: object write should not exceed part size

Fixes: #8928
This can happen if the stripe size is not a multiple of the chunk size.

Backport: firefly

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
11 years agorgw: align object chunk size with pool alignment
Yehuda Sadeh [Tue, 22 Jul 2014 22:30:11 +0000 (15:30 -0700)]
rgw: align object chunk size with pool alignment

Fixes: #8442
Backport: firefly
Data pools might have strict write alignment requirements. Use pool
alignment info when setting the max_chunk_size for the write.

Signed-off-by: Yehuda Sadeh <yehuda@redhat.com>
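
One way to derive a chunk size that respects a pool's write alignment is sketched below. The rounding policy (round up to the next multiple, and never go below one alignment unit) is an assumption made for illustration, and `aligned_chunk_size` is a hypothetical helper, not the rgw function.

```cpp
#include <cassert>
#include <cstdint>

// alignment == 0 means the pool imposes no write alignment.
std::uint64_t aligned_chunk_size(std::uint64_t configured, std::uint64_t alignment) {
  if (alignment == 0)
    return configured;
  if (configured <= alignment)
    return alignment;
  // Round up to the next multiple of the alignment (an assumed policy).
  return ((configured + alignment - 1) / alignment) * alignment;
}
```

For example, a configured size of 5000 bytes with a 4096-byte alignment becomes 8192, while an already aligned 4 MiB value is returned unchanged.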
11 years agoMerge pull request #2141 from ceph/wip-8882
Sage Weil [Fri, 25 Jul 2014 17:34:33 +0000 (10:34 -0700)]
Merge pull request #2141 from ceph/wip-8882

osd: set pg flag INCOMPLETE_CLONES when turning off cache pool

Reviewed-by: Greg Farnum <greg@inktank.com>
First patch Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>

11 years agodoc: Add additional hyperlink to Cache Tiering defaults.
John Wilkins [Fri, 25 Jul 2014 16:55:52 +0000 (09:55 -0700)]
doc: Add additional hyperlink to Cache Tiering defaults.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agodoc: Update doc from user feedback.
John Wilkins [Fri, 25 Jul 2014 16:55:28 +0000 (09:55 -0700)]
doc: Update doc from user feedback.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoosd: fix bad Message* defer in C_SendMap and send_map_on_destruct
Sage Weil [Fri, 25 Jul 2014 16:20:20 +0000 (09:20 -0700)]
osd: fix bad Message* defer in C_SendMap and send_map_on_destruct

We were carrying a bare Message*, which could get freed if the op was
canceled (or possibly completed).  Instead, just stash the entity_name_t,
the only piece we need.  The Connection is properly ref counted so no
worries there.

Fixes: #8926
Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoMerge pull request #2142 from ceph/wip-data-pool
Sage Weil [Fri, 25 Jul 2014 16:03:34 +0000 (09:03 -0700)]
Merge pull request #2142 from ceph/wip-data-pool

test: catch a straggler still using 'data' pool

Reviewed-by: Sage Weil <sage@redhat.com>
11 years agotest: catch a straggler still using 'data' pool 2142/head
John Spray [Fri, 25 Jul 2014 16:01:39 +0000 (17:01 +0100)]
test: catch a straggler still using 'data' pool

Used rbd pool instead, which is still created by default.

Signed-off-by: John Spray <john.spray@redhat.com>
11 years agoos/FileJournal: Update the journal header when closing journal 2139/head
Ma Jianpeng [Wed, 23 Jul 2014 17:10:38 +0000 (10:10 -0700)]
os/FileJournal: Update the journal header when closing journal

When closing the journal, check must_write_header and update the
journal header if must_write_header is already set.
This avoids needless journal replay after restarting the osd.

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
Reviewed-by: Sage Weil <sage@redhat.com>
11 years agomsg/SimpleMessenger: drop local_conneciton priv link on shutdwon
Sage Weil [Fri, 25 Jul 2014 01:22:22 +0000 (18:22 -0700)]
msg/SimpleMessenger: drop local_conneciton priv link on shutdwon

This breaks ref cycles between the local_connection and session, and lets
us drop the explicit set_priv() calls in OSD::shutdown().

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agodoc: Updated mon doc per feedback. Fixed hyperlinks.
John Wilkins [Thu, 24 Jul 2014 23:00:52 +0000 (16:00 -0700)]
doc: Updated mon doc per feedback. Fixed hyperlinks.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoMerge pull request #2079 from nereocystis/seq_read_bench-args
Gregory Farnum [Thu, 24 Jul 2014 21:36:21 +0000 (14:36 -0700)]
Merge pull request #2079 from nereocystis/seq_read_bench-args

Make the declaration argument names match those in the implementation (as used by callers).

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agodoc: update radosgw man page with available opts
Abhishek Lekshmanan [Thu, 24 Jul 2014 15:00:43 +0000 (20:30 +0530)]
doc: update radosgw man page with available opts

Fixes: #8112

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
11 years agorgw: list all available options during help()
Abhishek Lekshmanan [Thu, 24 Jul 2014 15:00:42 +0000 (20:30 +0530)]
rgw: list all available options during help()

Adding the available help arguments from the man page

Fixes: #8112
Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
11 years agorgw: format help options to align with the rest
Abhishek Lekshmanan [Thu, 24 Jul 2014 15:00:41 +0000 (20:30 +0530)]
rgw: format help options to align with the rest

Whitespace removal to make all help options align in a similar fashion

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@gmail.com>
11 years agoosd/ReplicatedPG: observe INCOMPLETE_CLONES in is_present_clone() 2141/head
Sage Weil [Thu, 24 Jul 2014 01:25:53 +0000 (18:25 -0700)]
osd/ReplicatedPG: observe INCOMPLETE_CLONES in is_present_clone()

We cannot assume that just because cache_mode is NONE that we will have
all clones present; check for the absence of the INCOMPLETE_CLONES flag
here too.

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoosd/ReplicatedPG: observed INCOMPLETE_CLONES when doing clone subsets
Sage Weil [Thu, 24 Jul 2014 01:24:51 +0000 (18:24 -0700)]
osd/ReplicatedPG: observed INCOMPLETE_CLONES when doing clone subsets

During recovery, we can clone subsets if we know that all clones will be
present.  We skip this on caching pools because they may not be; do the
same when INCOMPLETE_CLONES is set.

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoosd/ReplicatedPG: do not complain about missing clones when INCOMPLETE_CLONES is set
Sage Weil [Thu, 24 Jul 2014 01:23:56 +0000 (18:23 -0700)]
osd/ReplicatedPG: do not complain about missing clones when INCOMPLETE_CLONES is set

When scrubbing, do not complain about missing clones when we are in a
caching mode *or* when the INCOMPLETE_CLONES flag is set.  Both are
indicators that we may be missing clones and that that is okay.

Fixes: #8882
Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoosd/osd_types: add pg_pool_t FLAG_COMPLETE_CLONES
Sage Weil [Thu, 24 Jul 2014 01:21:38 +0000 (18:21 -0700)]
osd/osd_types: add pg_pool_t FLAG_COMPLETE_CLONES

Set a flag on the pg_pool_t when we change cache_mode to NONE.  This
is because object promotion may promote heads without all of the clones,
and when we switch the cache_mode back those objects may remain.  Do
this on any cache_mode change (to or from NONE) to capture legacy
pools that were set up before this flag existed.

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agomon/OSDMonitor: improve no-op cache_mode set check
Sage Weil [Thu, 24 Jul 2014 17:06:31 +0000 (10:06 -0700)]
mon/OSDMonitor: improve no-op cache_mode set check

If we have a pending pool value but the cache_mode hasn't changed, this is
still a no-op (and we don't need to block).

Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Thu, 24 Jul 2014 02:14:52 +0000 (19:14 -0700)]
Merge remote-tracking branch 'gh/next'

11 years agoMerge pull request #2127 from ceph/wip-8701
Sage Weil [Thu, 24 Jul 2014 02:13:55 +0000 (19:13 -0700)]
Merge pull request #2127 from ceph/wip-8701

filestore: fix collection_move behavior

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #2140 from ceph/wip-8889
Sage Weil [Thu, 24 Jul 2014 02:13:11 +0000 (19:13 -0700)]
Merge pull request #2140 from ceph/wip-8889

osd: greedily get obc write lock in some cases

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoceph_test_objectstore: clean up on finish of MoveRename 2127/head
Sage Weil [Tue, 22 Jul 2014 13:53:41 +0000 (06:53 -0700)]
ceph_test_objectstore: clean up on finish of MoveRename

Otherwise, we leave collections around, and the next test fails.

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoos/LFNIndex: use FDCloser for fsync_dir
Sage Weil [Mon, 21 Jul 2014 20:45:21 +0000 (13:45 -0700)]
os/LFNIndex: use FDCloser for fsync_dir

This prevents an fd leak when maybe_inject_failure() throws an exception.

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoos/LFNIndex: only consider alt xattr if nlink > 1
Sage Weil [Sat, 19 Jul 2014 06:16:09 +0000 (23:16 -0700)]
os/LFNIndex: only consider alt xattr if nlink > 1

If we are doing a lookup and the main xattr fails, we'll check if there is an
alt xattr.  If it exists, but the nlink on the inode is only 1, we will
kill the xattr.  This cleans up the mess left over by an incomplete
lfn_unlink operation.

This resolves the problem with an lfn_link to a second long name that
hashes to the same short_name: we will ignore the old name the moment the
old link goes away.

Fixes: #8701
Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoos/LFNIndex: remove alt xattr after unlink
Sage Weil [Sat, 19 Jul 2014 00:28:18 +0000 (17:28 -0700)]
os/LFNIndex: remove alt xattr after unlink

After we unlink, if the nlink on the inode is still non-zero, remove the
alt xattr.  We can *only* do this after the rename or unlink operation
because we don't want to leave a file system link in place without the
matching xattr; hence the fsync_dir() call.

Note that this might leak an alt xattr if we happen to fail after the
rename/unlink but before the removexattr is committed.  We'll fix that
next.

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoos/LFNIndex: FDCloser helper
Sage Weil [Mon, 21 Jul 2014 20:43:42 +0000 (13:43 -0700)]
os/LFNIndex: FDCloser helper

Add a helper to close fd's when we leave scope.  This is important when
injecting failures by throwing exceptions.

Signed-off-by: Sage Weil <sage@redhat.com>
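
A scope guard of the kind described above can be sketched in a few lines. This is a POSIX-only illustration: the `FDCloser` shown here, `maybe_inject_failure(bool)`, and `fsync_dir(...)` with its extra parameters are simplified stand-ins, not the LFNIndex code.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <stdexcept>
#include <unistd.h>

// Closes a file descriptor when the enclosing scope exits, including
// when the scope is left because an exception was thrown.
struct FDCloser {
  int fd;
  explicit FDCloser(int f) : fd(f) {}
  FDCloser(const FDCloser&) = delete;
  FDCloser& operator=(const FDCloser&) = delete;
  ~FDCloser() {
    if (fd >= 0)
      ::close(fd);
  }
};

// Hypothetical stand-in for a failure-injection hook that may throw.
void maybe_inject_failure(bool fail) {
  if (fail)
    throw std::runtime_error("injected failure");
}

// Opens a directory and fsyncs it; the fd is closed on every exit path.
// used_fd (optional) reports which descriptor was opened.
int fsync_dir(const char* path, bool fail, int* used_fd = nullptr) {
  int fd = ::open(path, O_RDONLY);
  if (fd < 0)
    return -errno;
  FDCloser closer(fd);
  if (used_fd)
    *used_fd = fd;
  maybe_inject_failure(fail);
  if (::fsync(fd) < 0)
    return -errno;
  return 0;
}
```

Because the guard lives on the stack, no explicit `close()` is needed on the early-return or exception paths.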
11 years agoos/LFNIndex: handle long object names with multiple links (i.e., rename)
Sage Weil [Sat, 19 Jul 2014 00:09:07 +0000 (17:09 -0700)]
os/LFNIndex: handle long object names with multiple links (i.e., rename)

When we rename an object (collection_move_rename) to a different name, and
the name is long, we run into problems because the lfn xattr can only track
a single long name linking to the inode.  For example, suppose we have

foobar -> foo_123_0 (attr: foobar) where foobar hashes to 123.

At first, collection_add could only link a file to another file in a
different collection with the same name. Allowing collection_move_rename
to rename the file, however, means that we have to convert:

col1/foobar -> foo_123_0 (attr: foobar)

to

col1/foobaz -> foo_234_0 (attr: foobaz)

This is a problem because if we link, reset xattr, unlink we end up with

col1/foobar -> foo_123_0 (attr: foobaz)

if we restart after we reset the attr.  This will cause the initial foobar
lookup to fail since the attr doesn't match, and the file won't be able to be
looked up.

Fix this by allowing *two* (long) names to link to the same inode.  If we
lfn_link a second (different) name, move the previous name to the "alt"
xattr and set the new name.  (This works because link is always followed
by unlink.)  On lookup, check either xattr.

Don't even bother to remove the alt xattr on unlink.  This works as long
as the old name and new name don't hash to the same shortname and end up
in the same LFN chain.  (Don't worry, we'll fix that next.)

Fixes part of #8701
Signed-off-by: Sage Weil <sage@redhat.com>
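
The two-name scheme from the commit above can be modelled with a toy inode that carries a primary and an "alt" long-name attribute. `Inode`, `lfn_link`, and `lfn_matches` are hypothetical simplifications; real xattrs, hashing, and the later nlink-based cleanup are not modelled.

```cpp
#include <cassert>
#include <string>

// Toy model of one on-disk file and the long-name xattrs stored on it.
struct Inode {
  std::string lfn;      // primary long-name xattr
  std::string alt_lfn;  // "alt" xattr holding the previous long name, if any
  int nlink = 0;
};

// Linking a second, different name moves the previous name to the alt slot.
void lfn_link(Inode& ino, const std::string& long_name) {
  if (!ino.lfn.empty() && ino.lfn != long_name)
    ino.alt_lfn = ino.lfn;  // keep the old name reachable
  ino.lfn = long_name;
  ++ino.nlink;
}

// On lookup, either xattr may match.
bool lfn_matches(const Inode& ino, const std::string& long_name) {
  return ino.lfn == long_name ||
         (!ino.alt_lfn.empty() && ino.alt_lfn == long_name);
}
```

In this model, a restart between the link of foobaz and the unlink of foobar still leaves foobar resolvable, because its name survives in the alt slot.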
11 years agoceph_test_objectstore: fix warning
Sage Weil [Fri, 18 Jul 2014 22:46:58 +0000 (15:46 -0700)]
ceph_test_objectstore: fix warning

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agostore_test: add long name collection_move_rename tests
Samuel Just [Tue, 15 Jul 2014 21:50:33 +0000 (14:50 -0700)]
store_test: add long name collection_move_rename tests

Currently fails.

Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agoceph.spec.in: add bash completion file for radosgw-admin
Dan Mick [Thu, 3 Jul 2014 23:11:24 +0000 (16:11 -0700)]
ceph.spec.in: add bash completion file for radosgw-admin

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit b70096307130bcbac176704493a63c5d039d3edc)

11 years agoceph.spec.in: rhel7-related changes:
Dan Mick [Thu, 3 Jul 2014 23:10:55 +0000 (16:10 -0700)]
ceph.spec.in: rhel7-related changes:

udev rules: /lib -> /usr/lib
/sbin binaries move to /usr/sbin or %{_sbindir}

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 235e4c7de8f8efe491edefbdde8e5da4dfc44034)

11 years agoFix/add missing dependencies:
Dan Mick [Thu, 3 Jul 2014 23:08:44 +0000 (16:08 -0700)]
Fix/add missing dependencies:

- rbd-fuse depends on librados2/librbd1
- ceph-devel depends on specific releases of libs and libcephfs_jni1
- librbd1 depends on librados2
- python-ceph does not depend on libcephfs1

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 7cf81322391b629b241da90181800ca1f138ce78)

11 years agoceph.spec.in: whitespace fixes
Dan Mick [Thu, 3 Jul 2014 23:05:00 +0000 (16:05 -0700)]
ceph.spec.in: whitespace fixes

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit ec8af52a5ede78511423a1455a496d46d580c644)

11 years agoceph.spec.in: split out ceph-common as in Debian
Dan Mick [Thu, 3 Jul 2014 23:04:10 +0000 (16:04 -0700)]
ceph.spec.in: split out ceph-common as in Debian

Move files, postun scriptlet, and add dependencies on ceph-common
where appropriate

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit e131b9d5a5e90e87d8a8346cb96cb5a26135c144)

11 years agocommon/random_cache: fix typo
Sage Weil [Wed, 23 Jul 2014 17:11:59 +0000 (10:11 -0700)]
common/random_cache: fix typo

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoMerge pull request #2136 from yuyuyu101/fix-randomcache
Sage Weil [Wed, 23 Jul 2014 16:57:59 +0000 (09:57 -0700)]
Merge pull request #2136 from yuyuyu101/fix-randomcache

common/RandomCache: Fix inconsistence between contents and count

Reviewed-by: Sage Weil <sage@redhat.com>
11 years agocommon/RandomCache: Fix inconsistence between contents and count 2136/head
Haomai Wang [Wed, 23 Jul 2014 03:26:18 +0000 (11:26 +0800)]
common/RandomCache: Fix inconsistence between contents and count

The add/clear methods may leave count inconsistent with the real size of
contents.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
11 years agoosd/ReplicatedPG: debug obc locks 2140/head
Sage Weil [Wed, 23 Jul 2014 01:01:14 +0000 (18:01 -0700)]
osd/ReplicatedPG: debug obc locks

Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoosd/ReplicatedPG: greedily take write_lock for copyfrom finish, snapdir
Sage Weil [Tue, 22 Jul 2014 20:16:11 +0000 (13:16 -0700)]
osd/ReplicatedPG: greedily take write_lock for copyfrom finish, snapdir

In the cases where we are taking a write lock and are careful
enough that we know we should succeed (i.e., we assert(got)),
use the get_write_greedy() variant that skips the checks for
waiters (be they ops or backfill) that are normally necessary
to avoid starvation.  We don't care about starvation here
because our op is already in-progress and can't easily be
aborted, and new ops won't start because they do make those
checks.

Fixes: #8889
Signed-off-by: Sage Weil <sage@redhat.com>
11 years agoosd: allow greedy get_write() for ObjectContext locks
Sage Weil [Tue, 22 Jul 2014 20:11:42 +0000 (13:11 -0700)]
osd: allow greedy get_write() for ObjectContext locks

There are several lockers that need to take a write lock
because an operation is already in progress and they
know it is safe to do so.  In particular, they need to skip
the starvation checks (op waiters, backfill waiting).

Signed-off-by: Sage Weil <sage@redhat.com>
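
The greedy variant described above can be sketched as a flag that bypasses the starvation checks. `RWState` here is a simplified stand-in for the lock state on an object context; the field names and the exact conditions are assumptions.

```cpp
#include <cassert>

// Simplified stand-in for the read/write lock state on an object context.
struct RWState {
  enum State { NONE, READ, WRITE };
  State state = NONE;
  int count = 0;                  // current holders
  int waiters = 0;                // ops queued behind this lock
  bool backfill_waiting = false;  // backfill wants to read the object

  // greedy == true skips the starvation checks.
  bool get_write(bool greedy = false) {
    if (!greedy && (waiters > 0 || backfill_waiting))
      return false;  // let queued ops or backfill go first
    if (state == READ)
      return false;  // held by readers
    state = WRITE;
    ++count;
    return true;
  }

  void put_write() {
    assert(state == WRITE && count > 0);
    if (--count == 0)
      state = NONE;
  }
};
```

New ops still call the non-greedy path, so they queue behind the waiters; only an op that is already in progress uses the greedy path.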
11 years agoMerge pull request #2120 from ceph/wip-8858
Josh Durgin [Tue, 22 Jul 2014 23:58:25 +0000 (16:58 -0700)]
Merge pull request #2120 from ceph/wip-8858

Wip 8858

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #2133 from ceph/wip-8897
Gregory Farnum [Tue, 22 Jul 2014 22:36:40 +0000 (15:36 -0700)]
Merge pull request #2133 from ceph/wip-8897

os: fix build warnings with name/attr len checks (fixes 8889)

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #2128 from ceph/wip-8851
João Eduardo Luís [Tue, 22 Jul 2014 21:10:17 +0000 (22:10 +0100)]
Merge pull request #2128 from ceph/wip-8851

mon: AuthMonitor: always encode full regardless of keyserver having keys

Reviewed-by: Gregory Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>