git.apps.os.sepia.ceph.com Git - ceph.git/log
Josh Durgin [Fri, 22 Mar 2013 19:17:43 +0000 (12:17 -0700)]
ObjectCacher: remove NULL checks in flush_set()

Callers will always pass a callback, so assert this and remove the
checks for it being NULL.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 41568b904de6d155e5ee87c68e9c31cbb69508e5)

Josh Durgin [Fri, 22 Mar 2013 19:13:36 +0000 (12:13 -0700)]
ObjectCacher: always complete flush_set() callback

This removes the last remnants of
b5e9995f59d363ba00d9cac413d9b754ee44e370. If there's nothing to flush,
immediately call the callback instead of deleting it. Callers were
assuming they were responsible for completing the callback whenever
flush_set() returned true, and always called complete(0) in this
case. Simplify the interface and just do this in flush_set(), so that
it always calls the callback.

Since C_GatherBuilder deletes its finisher if there are no subs,
only set its finisher when subs are present. This way we can still
call ->complete() for the callback.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 92db06c05dc2cad8ed31648cb08866781aee2855)

Conflicts:

src/client/Client.cc
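A minimal C++ sketch of the simplified flush_set() contract described above, assuming a stripped-down stand-in for Ceph's Context, where complete() runs finish() and then deletes the object. Context, Counting, GatherCounter, the in_flight vector and this flush_set() signature are all hypothetical simplifications, not the real ObjectCacher API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a Ceph-style callback: complete() runs
// finish() and then deletes the object, so it must be called exactly once.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
  void complete(int r) { finish(r); delete this; }
};

// Example callback that counts how many times it was completed.
struct Counting : Context {
  explicit Counting(int* hits) : hits_(hits) {}
  void finish(int) override { ++*hits_; }
  int* hits_;
};

// Hypothetical gather: completes `onfinish` once `remaining` subs are done.
struct GatherCounter {
  Context* onfinish;
  int remaining;
  void sub_done() {
    if (--remaining == 0) {
      onfinish->complete(0);
      delete this;
    }
  }
};

// Returns true if nothing needed flushing. Either way the callback is
// completed by flush_set() itself, never by the caller.
bool flush_set(const std::vector<int>& dirty, Context* onfinish,
               std::vector<std::function<void()>>& in_flight) {
  assert(onfinish != nullptr);  // callers always pass a callback
  if (dirty.empty()) {
    onfinish->complete(0);      // nothing to flush: finish immediately
    return true;
  }
  // The gather (which then owns onfinish) is only created when there are subs.
  auto* gather = new GatherCounter{onfinish, static_cast<int>(dirty.size())};
  for (std::size_t i = 0; i < dirty.size(); ++i)
    in_flight.push_back([gather] { gather->sub_done(); });  // simulated write ack
  return false;
}
```

Because the empty case completes the callback inside flush_set(), a caller no longer has to inspect the return value just to decide whether to call complete(0), and nothing is ever deleted twice.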

Josh Durgin [Tue, 29 Jan 2013 22:22:15 +0000 (14:22 -0800)]
ObjectCacher: fix flush_set when no flushing is needed

C_GatherBuilder takes ownership of the Context we pass it. Deleting it
in flush_set after constructing the C_GatherBuilder results in a
double delete.

Fixes: #3946
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sam Lang <sam.lang@inktank.com>
(cherry picked from commit 3bc21143552b35698c9916c67494336de8964d2a)

Sam Lang [Fri, 18 Jan 2013 20:59:12 +0000 (14:59 -0600)]
objectcacher: Remove commit_set, use flush_set

commit_set() and flush_set() are identical in functionality,
so use flush_set everywhere and remove commit_set from
the code.

Also fixes a bug in flush_set where the finisher context was
getting freed twice if no objects needed to be flushed.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
(cherry picked from commit 72147fd3a1da8ecbcb31ddf6b66a158d71933909)

Josh Durgin [Wed, 13 Mar 2013 16:42:43 +0000 (09:42 -0700)]
librbd: make aio_writes to the cache always non-blocking by default

When the ObjectCacher's writex blocks, it affects the thread requesting
the aio, which can cause starvation for other I/O when used by QEMU.

Preserve the old behavior via a config option in case this has any
bad side-effects, like too much memory usage under heavy write loads.

Fixes: #4091
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 03ac01fa6a94fa7a66ede057e9267e0a562c3cdb)

Josh Durgin [Wed, 13 Mar 2013 16:37:21 +0000 (09:37 -0700)]
ObjectCacher: optionally make writex always non-blocking

Add a callback argument to writex, and a finisher to run the
callbacks. Move the check for dirty+tx > max_dirty into a helper that
can be called from a wrapper around the callbacks from writex, or from
the current place in _wait_for_write().

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit c21250406eced8e5c467f492a2148c57978634f4)

Josh Durgin [Thu, 28 Mar 2013 00:30:42 +0000 (17:30 -0700)]
librbd: flush cache when set_snap() is called

If there are writes pending, they should be sent while the image
is still writeable. If the image becomes read-only, flushing the
cache will just mark everything dirty again due to -EROFS.

Fixes: #4525
Backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 613b7085bb48cde1e464b7a97c00b8751e0e917f)

Josh Durgin [Sat, 16 Mar 2013 00:28:13 +0000 (17:28 -0700)]
librbd: optionally wait for a flush before enabling writeback

Older guests may not send flushes properly (i.e. never), so if this is
enabled, rbd_cache=true is transparently safe for them.

Disable this by default, since it would unnecessarily slow down booting
newer guests and would prevent writeback caching for things that don't
need to send flushes, like the command line tool.

Refs: #3817
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 1597b3e3a1d776b56e05c57d7c3de396f4f2b5b2)

Josh Durgin [Sat, 9 Mar 2013 02:57:24 +0000 (18:57 -0800)]
librbd: invalidate cache when flattening

The cache stores which objects don't exist. Flatten bypasses the cache
when doing its copyups, so when it is done the -ENOENT from the cache
is treated as zeroes instead of 'need to read from parent'.

Clients that have the image open need to forget about the cached
non-existent objects as well. Do this during ictx_refresh, while the
parent_lock is held exclusively, so that no new reads from the parent
can happen until the updated parent metadata is visible.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 46e8fc00b2dc8eb17d8777b6ef5ad1cfcc389cea)

Josh Durgin [Sat, 9 Mar 2013 01:53:31 +0000 (17:53 -0800)]
ObjectCacher: add a method to clear -ENOENT caching

Clear the exists and complete flags for any objects that have exists
set to false, and force any in-flight reads to retry if they get
-ENOENT instead of generating zeros.

This is useful for getting the cache into a consistent state for rbd
after an image has been flattened, since many objects which previously
did not exist and went up to the parent to retrieve data may now exist
in the child.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit f2a23dc0b092c5ac081893e8f28c6d4bcabd0c2e)

Josh Durgin [Sat, 9 Mar 2013 01:49:27 +0000 (17:49 -0800)]
ObjectCacher: keep track of outstanding reads on an object

Reads always use C_ReadFinish as a callback (and they are the only
user of this callback). Keep an xlist of these for each object, so
they can remove themselves as they finish. To prevent races between
these requests and discard removing objects from the cache, clear the
xlist in the object destructor, so if the Object is still valid the
set_item will still be on the list.

Make the C_ReadFinish constructor take an Object* instead of the pool
and object id, which are derived from the Object* anyway.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit f6f876fe51e40570596c25ac84ba3689f72776c2)

Josh Durgin [Tue, 26 Feb 2013 00:09:26 +0000 (16:09 -0800)]
test_rbd: move flatten tests back into TestClone

They need the same setup, and it's easy enough to run specific
subtests. Making them a separate subclass accidentally duplicated
tests from TestClone.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 9c693d7e8312026f6d8d9586381b026ada35d808)

Josh Durgin [Tue, 26 Feb 2013 21:20:08 +0000 (13:20 -0800)]
librbd: fix rollback size

The duplicate calls to get_image_size() and get_snap_size() replaced
by 5806226cf0743bb44eaf7bc815897c6846d43233 uncovered this. The first
call was using the currently set snap_id instead of the snapshot being
rolled back to.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit d6c126e2131fefab6df676f2b9d0addf78f7a488)

Josh Durgin [Mon, 25 Feb 2013 20:05:16 +0000 (12:05 -0800)]
Merge branch 'wip-4249' into wip-4249-master

Make snap_rollback() only take a read lock on snap_lock, since
it does not modify snapshot-related fields.

Conflicts:
src/librbd/internal.cc
(cherry picked from commit db5fc2270f91aae220fc3c97b0c62e92e263527b)

Josh Durgin [Thu, 21 Feb 2013 19:26:45 +0000 (11:26 -0800)]
librbd: make sure racing flattens don't crash

The only way for a parent to disappear is a racing flatten completing,
or possibly in the future the image being forcibly removed. In either
case, continuing to flatten makes no sense, so stop early.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit a1ae8562877d1b902918e866a1699214090c40bd)

Josh Durgin [Thu, 21 Feb 2013 19:17:18 +0000 (11:17 -0800)]
librbd: use rwlocks instead of mutexes for several fields

Image metadata like snapshots, size, and parent is frequently read,
but rarely updated. During flatten, we were depending on the parent
lock to prevent the parent ImageCtx from disappearing out from under
us while we read from it. The copy-up path also needed the parent lock
to be able to read from the parent image, which led to a deadlock.

Convert parent_lock, snap_lock, and md_lock to RWLocks, and change
their use to read instead of exclusive locks where appropriate. The
main place exclusive locks are needed is in ictx_refresh, so this is
pretty simple. This fixes the deadlock, since parent_lock is only
needed for read access in both flatten and the copy-up operation.

cache_lock and refresh_lock are only really used for exclusive access,
so leave them as regular mutexes.

One downside to this is that there's no way to assert is_locked()
for RWLocks, so we'll have to be very careful about changing code
in the future.

Fixes: #3665
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 995ff0e3eaa560b242da8c019a2e11e735e854f7)

Josh Durgin [Thu, 21 Feb 2013 19:15:41 +0000 (11:15 -0800)]
common: add lockers for RWLocks

This makes them easier to use, especially instead of existing mutexes.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit e0f8e5a80d6d22bd4dee79a4996ea7265d11b0c1)
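Ceph's lockers wrap its own RWLock class. As an analogy only, not Ceph's API, the C++17 standard library offers the same RAII pattern: std::shared_lock takes a read lock and std::unique_lock a write lock on a std::shared_mutex, and both release automatically at scope exit, which is what makes such lockers convenient replacements for mutex lockers. ImageMeta is a hypothetical class standing in for frequently read, rarely updated image metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical image metadata guarded by a reader/writer lock:
// many concurrent readers, occasional exclusive writers.
class ImageMeta {
 public:
  uint64_t size() const {
    std::shared_lock<std::shared_mutex> rl(lock_);  // read locker
    return size_;
  }  // read lock released here

  void refresh(uint64_t new_size) {
    std::unique_lock<std::shared_mutex> wl(lock_);  // write locker
    size_ = new_size;
  }  // write lock released here

 private:
  mutable std::shared_mutex lock_;
  uint64_t size_ = 0;
};
```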

Josh Durgin [Fri, 22 Feb 2013 07:22:59 +0000 (23:22 -0800)]
objecter: initialize linger op snapid

Since they are write ops now, it must be CEPH_NOSNAP or the OSD
returns EINVAL.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 15bb9ba9fbb4185708399ed6deee070d888ef6d2)

Sage Weil [Thu, 21 Feb 2013 23:44:19 +0000 (15:44 -0800)]
objecter: separate out linger_read() and linger_mutate()

A watch is a mutation, while a notify is a read.  The mutations need to
pass in a proper snap context to be fully correct.

Also, make the WRITE flag implicit so the caller doesn't need to pass it
in.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 6c08c7c1c6d354d090eb16df279d4b63ca7a355a)

Sage Weil [Thu, 21 Feb 2013 23:31:08 +0000 (15:31 -0800)]
osd: make watch OSDOp print sanely

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit de4fa95f03b99a55b5713911c364d7e2a4588679)

Sage Weil [Thu, 21 Feb 2013 21:28:47 +0000 (13:28 -0800)]
osdc/Objecter: unwatch is a mutation, not a read

This was causing librados to unblock after the ACK on unwatch, which meant
that librbd users raced and tried to delete the image before the unwatch
change was committed, and got EBUSY.  See #3958.

The watch operation has a similar problem.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit fea77682a6cf9c7571573bc9791c03373d1d976d)

Conflicts:

src/librados/IoCtxImpl.cc

Sage Weil [Thu, 21 Feb 2013 19:15:58 +0000 (11:15 -0800)]
osd: an interval can't go readwrite if its acting is empty

Let's not forget that min_size can be zero.

Fixes: #4159
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4277265d99647c9fe950ba627e5d86234cfd70a9)

Sage Weil [Tue, 19 Feb 2013 16:29:53 +0000 (08:29 -0800)]
mon: restrict pool size to 1..10

See: #4159
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 30b8d653751acb4bc4be5ca611f154e19afe910a)

Sage Weil [Fri, 19 Apr 2013 20:05:43 +0000 (13:05 -0700)]
init-ceph: do not stop start on first failure

When starting we often loop over many daemon instances.  Currently we stop
on the first error and do not try to start other daemons.

Instead, try them all, but return a failure if anything did not start.

Fixes: #2545
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
(cherry picked from commit d395aa521e8a4b295ed2b08dd7cfb7d9f995fcf7)

Conflicts:

src/init-ceph.in
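The actual change is in the init-ceph shell script; purely to illustrate the control flow, here is a C++ sketch. start_all() and the start callback are hypothetical: every daemon is tried, failures are recorded instead of aborting the loop, and a nonzero status is returned if anything failed to start.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Try to start every daemon; `start` is a hypothetical callback that
// returns 0 on success. Names of daemons that failed go into *failed.
int start_all(const std::vector<std::string>& daemons,
              const std::function<int(const std::string&)>& start,
              std::vector<std::string>* failed) {
  int rc = 0;
  for (const auto& d : daemons) {
    if (start(d) != 0) {
      failed->push_back(d);
      rc = 1;  // remember the failure, but keep starting the rest
    }
  }
  return rc;
}
```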

Josh Durgin [Thu, 11 Apr 2013 20:00:27 +0000 (13:00 -0700)]
Merge pull request #210 from dalgaaf/wip-da-bobtail-pybind

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

Danny Al-Gaaf [Fri, 5 Apr 2013 13:55:34 +0000 (15:55 +0200)]
rados.py: fix create_pool()

Call rados_pool_create_with_all() only if both auid and crush_rule
are set. If only crush_rule is set, call
rados_pool_create_with_crush_rule() on librados, not the other
way around.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit 94a1f25e7230a700f06a2699c9c2b99ec1bf7144)
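A C++ sketch of the dispatch the fix describes, choosing which librados pool-creation call to make from optional arguments (the real code is the Python binding). The commit only spells out the auid+crush_rule and crush_rule-only cases; the auid-only and neither branches (rados_pool_create_with_auid() and rados_pool_create()) are assumptions about the rest of the binding, and pick_create_call() itself is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Returns the name of the librados C function a binding would call.
std::string pick_create_call(std::optional<uint64_t> auid,
                             std::optional<uint8_t> crush_rule) {
  if (auid && crush_rule) return "rados_pool_create_with_all";
  if (crush_rule) return "rados_pool_create_with_crush_rule";
  if (auid) return "rados_pool_create_with_auid";  // assumed branch
  return "rados_pool_create";                      // assumed branch
}
```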

Dan Mick [Mon, 8 Apr 2013 20:52:32 +0000 (13:52 -0700)]
mon: Use _daemon version of argparse functions

Allow argparse functions to fail if no argument is given, by using
special versions that avoid the default CLI behavior of "cerr/exit".

Fixes: #4678
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit be801f6c506d9fbfb6c06afe94663abdb0037be5)

Conflicts:
src/mon/Monitor.cc

Dan Mick [Mon, 8 Apr 2013 20:49:22 +0000 (13:49 -0700)]
ceph_argparse: add _daemon versions of argparse calls

mon needs to call argparse for a couple of -- options, and the
argparse_witharg routines were attempting to cerr/exit on missing
arguments.  This is appropriate for the CLI usage, but not the daemon
usage.  Add a 'cli' flag that can be set false for the daemon usage
(and cause the parsing routine to return false instead of exit).

The daemon's parsing code is due for a rewrite soon.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit c76bbc2e6df16d283cac3613628a44937e38bed8)
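A hypothetical C++ sketch of the 'cli' flag idea: one parsing helper either exits on a missing argument (CLI use) or returns false so the caller can handle the error (daemon use). parse_witharg(), its signature and the "--example-opt" option are illustrative only, not the actual ceph_argparse functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

// If args[i] == name, consume the following value into *out.
// cli == true: a missing value is fatal (print to cerr and exit).
// cli == false: a missing value just makes the call return false.
bool parse_witharg(const std::vector<std::string>& args, std::size_t& i,
                   const std::string& name, std::string* out, bool cli) {
  if (i >= args.size() || args[i] != name) return false;
  if (i + 1 >= args.size()) {
    if (cli) {
      std::cerr << "Option " << name << " requires an argument." << std::endl;
      std::exit(1);
    }
    return false;  // daemon usage: let the caller decide what to do
  }
  *out = args[++i];
  return true;
}
```

This sketch returns false both for "option not present" and "value missing" in daemon mode; a real helper might want to distinguish the two.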

Alexandre Oliva [Wed, 6 Feb 2013 17:27:13 +0000 (15:27 -0200)]
silence logrotate some more

I was getting email with logrotate error output from “which invoke-rc.d”
on systems without an invoke-rc.d.  This patch silences it.

Silence stderr from which when running logrotate

From: Alexandre Oliva <oliva@gnu.org>

Signed-off-by: Alexandre Oliva <oliva@gnu.org>
(cherry picked from commit d02340d90c9d30d44c962bea7171db3fe3bfba8e)

Samuel Just [Fri, 29 Mar 2013 19:14:22 +0000 (12:14 -0700)]
Merge remote-tracking branch 'upstream/bobtail-4556' into bobtail

Reviewed-by: Samuel Just <sam.just@inktank.com>

Samuel Just [Thu, 14 Feb 2013 22:03:56 +0000 (14:03 -0800)]
OSD: always activate_map in advance_pgs, only send messages if up

We should always handle_activate_map() after handle_advance_map() in
order to kick the pg into a valid peering state for processing requests
prior to dropping the lock.

Additionally, we would prefer to avoid sending irrelevant messages
during boot, so only send if we are up according to the current service
osdmap.

Fixes: #4572
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4dfcad44431855ba7d68a1ccb41dc3cb5db6bb50)

Samuel Just [Thu, 28 Mar 2013 21:09:17 +0000 (14:09 -0700)]
PG: update PGPool::name in PGPool::update

Fixes: #4471
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f804892d725cfa25c242bdc577b12ee81dcc0dcc)

Samuel Just [Tue, 26 Mar 2013 22:10:37 +0000 (15:10 -0700)]
ReplicatedPG: send entire stats on OP_BACKFILL_FINISH

Otherwise, we update the stat.stat structure, but not the
stat.invalid_stats part.  This will result in a recently
split primary propagating the invalid stats but not the
invalid marker.  Sending the whole pg_stat_t structure
also mirrors MOSDSubOp.

Fixes: #4557
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 76b296f01fd0d337c8fc9f79013883e62146f0c6)

Sage Weil [Wed, 27 Mar 2013 20:19:03 +0000 (13:19 -0700)]
osd: disallow classes with flags==0

They must be RD, WR, or something....

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 89c69016e1dddb9f3ca40fd699e4a995ef1e3eee)

Sage Weil [Wed, 27 Mar 2013 19:59:41 +0000 (12:59 -0700)]
osd: EINVAL when rmw_flags is 0

A broken client (e.g., v0.56) can send a request that ends up with an
rmw_flags of 0.  Treat this as invalid and return EINVAL.

Fixes: #4556
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f2dda43c9ed4fda9cfa87362514985ee79e0ae15)

Sage Weil [Wed, 27 Mar 2013 20:08:38 +0000 (13:08 -0700)]
osd: fix detection of non-existent class method

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 50b831e3641c21cd5b145271688189e199f432d1)

Sage Weil [Wed, 27 Mar 2013 20:12:38 +0000 (13:12 -0700)]
osd: tolerate rmw_flags==0

This lets the OSD return a proper error instead of asserting.

This is effectively a backport of c313423cfda55a2231e000cd5ff20729310867f8.

Signed-off-by: Sage Weil <sage@inktank.com>

Josh Durgin [Fri, 22 Feb 2013 01:39:19 +0000 (17:39 -0800)]
test_librbd_fsx: fix image closing

Always close the image we opened in check_clone(), and check the
return code of the rbd_close() called before cloning.

Refs: #3958
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 94ae72546507799667197fd941633bb1fd2520c2)

Josh Durgin [Thu, 14 Mar 2013 00:05:42 +0000 (17:05 -0700)]
rbd: remove fiemap use from import

On some kernels and filesystems fiemap can be racy and provide
incorrect data even after an fsync. Later we can use SEEK_HOLE and
SEEK_DATA, but for now just detect zero runs like we do with stdin.

Basically this adapts import from stdin to work in the case of a file
or block device, and gets rid of other cruft in the import that used
fiemap.

Fixes: #4388
Backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 3091283895e8ffa3e4bda13399318a6e720d498f)
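A sketch of zero-run detection as a replacement for fiemap, assuming the input is processed in fixed-size chunks: only chunks containing a nonzero byte produce a write extent, and all-zero chunks are skipped as holes. nonzero_extents() and the chunk size are hypothetical; the real import also has to read from stdin, files and block devices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (offset, length) extents that must be written; everything
// else is all zeroes and can be left as a hole in the image.
std::vector<std::pair<std::size_t, std::size_t>> nonzero_extents(
    const std::vector<char>& data, std::size_t chunk) {
  assert(chunk > 0);
  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t off = 0; off < data.size(); off += chunk) {
    std::size_t len = std::min(chunk, data.size() - off);
    auto first = data.begin() + static_cast<std::ptrdiff_t>(off);
    auto last = first + static_cast<std::ptrdiff_t>(len);
    bool all_zero = std::all_of(first, last, [](char c) { return c == 0; });
    if (!all_zero) out.emplace_back(off, len);  // this chunk must be written
  }
  return out;
}
```

Detecting zeroes from the data itself avoids trusting filesystem extent maps, at the cost of reading every byte.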

Gary Lowell [Mon, 25 Mar 2013 18:02:31 +0000 (11:02 -0700)]
v0.56.4

Yehuda Sadeh [Mon, 25 Mar 2013 16:50:33 +0000 (09:50 -0700)]
rgw: bucket index ops on system buckets shouldn't do anything

Fixes: #4508
Backport: bobtail
On certain bucket index operations we didn't check whether
the bucket was a system bucket, which caused the operations
to fail. This triggered an error message on bucket removal
operations.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 70e0ee8ba955322832f0c366537ddf7a0288761e)

Josh Durgin [Mon, 25 Feb 2013 23:02:50 +0000 (15:02 -0800)]
systest: restrict list error acceptance

Only ignore errors after the midway point if the midway_sem_post is
defined.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 5b24a68b6e7d57bac688021b822fb2f73494c3e9)

Josh Durgin [Mon, 25 Feb 2013 22:55:34 +0000 (14:55 -0800)]
systest: fix race with pool deletion

The second test had pool deletion and object listing wait on the same
semaphore to connect and start. This led to errors sometimes when the
pool was deleted before it could be opened by the listing process. Add
another semaphore so the pool deletion happens only after the listing
has begun.

Fixes: #4147
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit b0271e390564119e998e18189282252d54f75eb6)

Sage Weil [Tue, 19 Mar 2013 21:26:16 +0000 (14:26 -0700)]
os/FileJournal: fix aio self-throttling deadlock

This block of code tries to limit the number of aios in flight by waiting
for the amount of data to be written to grow relative to a function of the
number of aios.  Strictly speaking, the condition we are waiting for is a
function of both aio_num and the write queue, but we are only woken by
changes in aio_num, and were (in rare cases) waiting when aio_num == 0 and
there was no possibility of being woken.

Fix this by verifying that aio_num > 0, and restructuring the loop to
recheck that condition on each wakeup.

Fixes: #4079
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit e5940da9a534821d0d8f872c13f9ac26fb05a0f5)
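A C++ sketch of the fixed wait loop, assuming a simplified throttle: the waiter sleeps only while aios are in flight and the queued write data is still small relative to the number of aios. The aio_num > 0 guard ensures we never sleep when no completion can wake us, and the while loop rechecks the condition after every wakeup. AioThrottle and bytes_per_aio are hypothetical stand-ins for the real FileJournal state and threshold function; queued_bytes is passed as a snapshot, mirroring the commit's point that only aio completions wake the waiter.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

class AioThrottle {
 public:
  // Wait until enough data is queued relative to the aios in flight,
  // but never sleep when aio_num == 0 (nothing could wake us).
  void wait_for_more_data(uint64_t queued_bytes, uint64_t bytes_per_aio) {
    std::unique_lock<std::mutex> l(lock_);
    while (aio_num_ > 0 && queued_bytes < aio_num_ * bytes_per_aio)
      cond_.wait(l);  // condition is rechecked on every wakeup
  }
  void aio_submitted() {
    std::lock_guard<std::mutex> l(lock_);
    ++aio_num_;
  }
  void aio_completed() {
    std::lock_guard<std::mutex> l(lock_);
    --aio_num_;
    cond_.notify_all();  // the only event that wakes the waiter
  }
  uint64_t in_flight() {
    std::lock_guard<std::mutex> l(lock_);
    return aio_num_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  uint64_t aio_num_ = 0;
};
```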

Sage Weil [Fri, 22 Mar 2013 20:25:49 +0000 (13:25 -0700)]
common/MemoryModel: remove logging to /tmp/memlog

This was a hack for dev purposes ages ago; remove it.  The predictable
filename is a security issue.

CVE-2013-1882

Reported-by: Michael Scherer <misc@zarb.org>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit c524e2e01da41ab5b6362c117939ea1efbd98095)

Sage Weil [Fri, 22 Mar 2013 20:25:43 +0000 (13:25 -0700)]
init-ceph: clean up temp ceph.conf filename on exit

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 6a7ad2eac1db6abca3d7edb23ca9b80751400a23)

Sage Weil [Fri, 22 Mar 2013 20:25:33 +0000 (13:25 -0700)]
init-ceph: push temp conf file to a unique location on remote host

The predictable file name is a security problem.

CVE-2013-1882

Reported-by: Michael Scherer <misc@zarb.org>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 051734522fea92878dd8139f28ec4e6b01371ede)

Sage Weil [Fri, 22 Mar 2013 20:25:23 +0000 (13:25 -0700)]
mkcephfs: make remote temp directory name unique

The predictable file name is a security problem.

CVE-2013-1882

Reported-by: Michael Scherer <misc@zarb.org>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit f463ef78d77b11b5ad78b31e9a3a88d0a6e62bca)

Samuel Just [Fri, 22 Mar 2013 20:51:14 +0000 (13:51 -0700)]
PG::GetMissing: need to check need_up_thru in MLogRec handler

Backport: bobtail
Fixes: #4534
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4fe4deafbe1758a6b3570048aca57485bd562440)

Samuel Just [Fri, 22 Mar 2013 20:48:49 +0000 (13:48 -0700)]
PG,osd_types: improve check_new_interval debugging

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d611eba9caf45f2d977c627b123462a073f523a4)

Samuel Just [Wed, 6 Mar 2013 00:06:20 +0000 (16:06 -0800)]
FileStore: fix reversed collection_empty return value

Backport: bobtail
Fixes: #4380
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 7a434d10da0f77e1b329de0b06b6645cd73cc81b)

Conflicts:
src/os/FileStore.cc

Samuel Just [Mon, 11 Feb 2013 20:52:07 +0000 (12:52 -0800)]
FileStore: set replay guard on create_collection

This should prevent sequences like:

rmcoll a
mkcoll a
touch a foo
<crash>

from causing trouble by preventing the rmcoll
and mkcoll from being replayed.

Fixes: #4064
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 411770c45734c9827745ddc4018d86c14f2858a6)

Samuel Just [Mon, 11 Feb 2013 20:24:14 +0000 (12:24 -0800)]
FileStore: _split_collection should not create the collection

This will simplify adding a replay guard to create_collection.

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit b184ff581a08c9e6ce5b858f06ccbe9d0e2a170b)

Sage Weil [Fri, 22 Feb 2013 23:15:27 +0000 (15:15 -0800)]
client: use 4MB f_bsize and f_frsize for statfs

Old stat(1) reports:

  Block size: 1048576    Fundamental block size: 1048576

and the df(1) arithmetic works out.  New stat(1) reports:

  Block size: 1048576    Fundamental block size: 4096

which is what we are shoving into statvfs, but we have the b_size and
fr_size arithmetic swapped.  However, doing the *correct* reporting would
then break the old stat by making both sizes appear to be 4KB (or
whatever).

Sidestep the issue by making *both* values 4MB, which is both large enough
to report large FS sizes, and also the default stripe size and thus a
"reasonable" value to report for a block size.

Perhaps in the future, when we no longer care about old userland, we can
report the page size for f_bsize, which is probably the "most correct"
thing to do.

Fixes: #3794. See also #3793.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 7c94083643891c9d66a117352f312b268bdb1135)
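POSIX defines f_blocks, f_bfree and f_bavail in units of f_frsize, so total sizes come out as count * f_frsize. A sketch of filling struct statvfs with both block sizes set to 4MB, as the commit describes; fill_statvfs() and its byte-count arguments are hypothetical, not the Ceph client's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/statvfs.h>

// Report f_bsize and f_frsize as the same 4MB value, so tools that use
// either field compute the same sizes. Counts are in f_frsize units.
void fill_statvfs(struct statvfs* st, uint64_t total_bytes,
                  uint64_t free_bytes) {
  const uint64_t kBlock = 4ull << 20;  // 4MB
  st->f_bsize = kBlock;
  st->f_frsize = kBlock;
  st->f_blocks = total_bytes / kBlock;
  st->f_bfree = free_bytes / kBlock;
  st->f_bavail = free_bytes / kBlock;
}
```

For example, a 100GB filesystem reports 25600 blocks of 4MB each.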

Sage Weil [Tue, 19 Feb 2013 01:39:46 +0000 (17:39 -0800)]
os/FileStore: check replay guard on src for collection rename

This avoids a problematic sequence like:

     - rename A/ -> B/
     - remove B/1...100
     - destroy B/
     - create A/
     - write A/101...
     <crash>
     - replay A/ -> B/
     - remove B/1...100  (fails but tolerated)
     - destroy B/        (fails with ENOTEMPTY)

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 5fc83c8d9887d2a916af11436ccc94fcbfe59b7a)

Samuel Just [Fri, 22 Feb 2013 22:12:28 +0000 (14:12 -0800)]
PG::proc_replica_log: oinfo.last_complete must be *before* first entry in omissing

Fixes: #4189
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 2dae6a68ee85a20220ee940dbe33a2144d43457b)

Sage Weil [Fri, 22 Feb 2013 01:55:21 +0000 (17:55 -0800)]
osd/PG: fix typo, missing -> omissing

From ce7ffc34408bf32c66dc07e6f42d54b7ec489d41.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit dc181224abf6fb8fc583730ae3d90acdf0b80f39)

Samuel Just [Thu, 21 Feb 2013 23:31:36 +0000 (15:31 -0800)]
PG::proc_replica_log: adjust oinfo.last_complete based on omissing

Otherwise, search_for_missing may neglect to check the missing
set for some objects assuming that if the need version is
prior to last_complete, the replica must have it.

Fixes: #4994
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit ce7ffc34408bf32c66dc07e6f42d54b7ec489d41)

Sage Weil [Sat, 9 Feb 2013 08:05:33 +0000 (00:05 -0800)]
osd: fix load_pgs collection handling

On a _TEMP pg, is_pg() would succeed, which meant we weren't actually
hitting the cleanup checks.  Instead, restructure this loop as positive
checks and handle each type of collection we understand.

This fixes _TEMP cleanup.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit b19b6dced85617d594c15631571202aab2f94ae8)

Sage Weil [Sat, 9 Feb 2013 08:04:29 +0000 (00:04 -0800)]
osd: fix load_pgs handling of pg dirs without a head

If there is a pgid that passes coll_t::is_pg() but there is no head, we
will populate the pgs map but then fail later when we try to do
read_state.  This is a side-effect of 55f8579.

Take explicit note of _head collections we see, and then warn when we
find stray snap collections.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 1f80a0b576c0af1931f743ad988b6293cbf2d6d9)

Samuel Just [Thu, 7 Feb 2013 21:34:47 +0000 (13:34 -0800)]
OSD::load_pgs: first scan colls before initing PGs

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 073f58ede2e473af91f76d01679631c169274af7)

David Zafman [Wed, 9 Jan 2013 03:24:13 +0000 (19:24 -0800)]
osd: Add digest of omap for deep-scrub

Add ScrubMap encode/decode v4 message with omap digest.
Compute a digest of the header and key/value pairs.  Use a bufferlist
to reflect the structure and compute as we go, clearing the
bufferlist to reduce memory usage.

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 509a93e89f04d7e9393090563cf7be8e0ea53891)

Samuel Just [Fri, 15 Mar 2013 22:13:46 +0000 (15:13 -0700)]
OSD: split temp collection as well

Otherwise, when we eventually remove the temp collection, there might be
objects in the temp collection which were independently pulled into the child
pg collection.  Thus, removing the old stale parent link from its temp
collection also blasts the omap entries and snap mappings for the real child
object.

Backport: bobtail
Fixes: #4452
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit f8d66e87a5c155b027cc6249006b83b4ac9b6c9b)

Samuel Just [Fri, 15 Mar 2013 02:59:36 +0000 (19:59 -0700)]
PG: ignore non MISSING pg query in ReplicaActive

1) Replica sends notify
2) Prior to processing notify, primary queues query to replica
3) Primary processes notify and activates sending MOSDPGLog
to replica.
4) Primary does do_notifies at end of process_peering_events
and sends the Query.
5) Replica sees MOSDPGLog and activates
6) Replica sees Query and asserts.

In the above case, the Replica should simply ignore the old
Query.

Fixes: #4050
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8222cbc8f35c359a35f8381ad90ff0eed5615dac)

12 years agoFileJournal: queue_pos \in [get_top(), header.max_size)
Samuel Just [Wed, 13 Mar 2013 23:04:23 +0000 (16:04 -0700)]
FileJournal: queue_pos \in [get_top(), header.max_size)

If queue_pos == header.max_size when we create the entry
header magic, the entry will be rejected at get_top() on
replay.

Fixes: #4436
Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit de8edb732e3a5ce4471670e43cfe6357ae6a2758)
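A sketch of the invariant in the subject line, under stated assumptions. The usable journal area is taken to be [top, max_size), and a write position that lands exactly on max_size is wrapped to top before the next entry header is written there. `JournalGeometry` and `advance_queue_pos` are hypothetical, not FileJournal's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical circular journal geometry: usable space is [top, max_size).
struct JournalGeometry {
  uint64_t top;       // first usable byte after the journal header
  uint64_t max_size;  // one past the last usable byte
};

// Advance a write position by len bytes, keeping it inside
// [top, max_size).  A position equal to max_size is not a valid place
// to start the next entry, so it is wrapped back to top.
uint64_t advance_queue_pos(const JournalGeometry &g, uint64_t pos,
                           uint64_t len) {
  assert(pos >= g.top && pos < g.max_size);
  uint64_t span = g.max_size - g.top;
  uint64_t next = g.top + (pos - g.top + len) % span;
  assert(next >= g.top && next < g.max_size);
  return next;
}
```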

12 years agoOSD: expand_pg_num after pg removes
Samuel Just [Fri, 15 Mar 2013 01:52:02 +0000 (18:52 -0700)]
OSD: expand_pg_num after pg removes

Otherwise:
1) expand_pg_num removes a splitting pg entry
2) peering thread grabs pg lock and starts split
3) OSD::consume_map grabs pg lock and starts removal

At step 2), we run afoul of the assert(is_splitting)
check in split_pgs.  This way, the would-be splitting
pg is marked as removed prior to the splitting state
being updated.

Backport: bobtail
Fixes: #4449
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f1b031b3cf195cf6df3d3c47c7d606fba63ed4c4)

12 years agoosd: update snap collections for sub_op_modify log records conditionally
Sage Weil [Mon, 11 Feb 2013 14:23:54 +0000 (06:23 -0800)]
osd: update snap collections for sub_op_modify log records conditionally

The only remaining caller is sub_op_modify().  If we do have a non-empty
op transaction, we want to do this update, regardless of what we think
last_backfill is (our notion may not be completely in sync with the
primary).  In particular, our last_backfill may be the same object but
a different snapid, but the primary disagrees and is pushing an op
transaction through.

Instead, update the collections if we have a non-empty transaction.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 31e911b63d326bdd06981ec4029ad71b7479ed70)

12 years agoosd: include snaps in pg_log_entry_t::dump()
Sage Weil [Mon, 11 Feb 2013 01:02:45 +0000 (17:02 -0800)]
osd: include snaps in pg_log_entry_t::dump()

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 715d8717a0e8a08fbe97a3e7d3ffd33aa9529d90)

12 years agoosd: unconditionally encode snaps buffer
Sage Weil [Mon, 11 Feb 2013 00:59:48 +0000 (16:59 -0800)]
osd: unconditionally encode snaps buffer

Previously we would only encode the updated snaps vector for CLONE ops.
This doesn't work for MODIFY ops generated by the snap trimmer, which
may also adjust the clone collections.  It is also possible that other
operations may need to populate this field in the future (e.g.,
LOST_REVERT may, although it currently does not).

Fixes: #4071, and possibly #4051.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 54b6dd924fea3af982f3d729150b6449f318daf2)

12 years agoosd: improve debug output on snap collections
Sage Weil [Sun, 10 Feb 2013 18:57:12 +0000 (10:57 -0800)]
osd: improve debug output on snap collections

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 8b05492ca5f1479589bb19c1ce058b0d0988b74f)

12 years agoPG: check_recovery_sources must happen even if not active
Samuel Just [Thu, 7 Mar 2013 20:53:51 +0000 (12:53 -0800)]
PG: check_recovery_sources must happen even if not active

missing_loc/missing_loc_sources also must be cleaned up
if a peer goes down during peering:

1) pg is in GetInfo, acting is [3,1]
2) we find object A on osd [0] in GetInfo
3) 0 goes down, no new peering interval since it is neither up nor
acting, but peer_missing[0] is removed.
4) pg goes active and tries to pull A from 0 since missing_loc did not get
cleaned up.

Backport: bobtail
Fixes: #4371
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit de22b186c497ce151217aecf17a8d35cdbf549bb)
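A minimal sketch of the cleanup described above. It models missing_loc as a map from object name to the set of OSD ids believed to hold it. When a peer goes down, that OSD is removed from every set, and any object left with no source is dropped. This way recovery never tries to pull from the dead OSD. `MissingLoc` and `remove_down_source` are hypothetical names. The real check_recovery_sources also maintains missing_loc_sources and other state that this sketch ignores.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins: object names as strings, OSD ids as ints.
using MissingLoc = std::map<std::string, std::set<int>>;

// Forget every location hint that points at a peer that went down, and
// drop objects left with no known source.  Returns the objects that lost
// their last source and must be found elsewhere.
std::set<std::string> remove_down_source(MissingLoc &missing_loc,
                                         int down_osd) {
  std::set<std::string> orphaned;
  for (auto it = missing_loc.begin(); it != missing_loc.end();) {
    it->second.erase(down_osd);
    if (it->second.empty()) {
      orphaned.insert(it->first);
      it = missing_loc.erase(it);
    } else {
      ++it;
    }
  }
  return orphaned;
}
```

In the scenario above, object A's only source is osd.0, so after osd.0 goes down A appears in the returned set instead of leaving a stale pull target behind.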

12 years agoHashIndex: _collection_list_partial must tolerate NULL next
Samuel Just [Tue, 5 Mar 2013 23:49:26 +0000 (15:49 -0800)]
HashIndex: _collection_list_partial must tolerate NULL next

Backport: bobtail
Fixes: #4379
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit ce4432adc67dc2fc06dd21ea08e59d179496bcc6)

12 years agoOSD: lock not needed in ~DeletingState()
Samuel Just [Tue, 5 Mar 2013 22:35:39 +0000 (14:35 -0800)]
OSD: lock not needed in ~DeletingState()

No further refs to the object can remain at this point.
Furthermore, the callbacks might lock mutexes of their
own.

Backport: bobtail
Fixes: #4378
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit e4bf1bcab159d7c5b720f5da01877c0f67c16d16)

12 years agoReplicatedPG: don't leak reservation on removal
Samuel Just [Sun, 10 Mar 2013 19:50:01 +0000 (12:50 -0700)]
ReplicatedPG: don't leak reservation on removal

Fixes: #4431
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 32bf131e0141faf407b5ff993f75f97516b27c12)

Conflicts:

src/osd/ReplicatedPG.cc

12 years agorgw: set up curl with CURL_NOSIGNAL
Yehuda Sadeh [Tue, 12 Mar 2013 19:56:01 +0000 (12:56 -0700)]
rgw: set up curl with CURL_NOSIGNAL

Fixes: #4425
Backport: bobtail
Apparently, libcurl needs this in order to be thread safe. A side
effect is that, if libcurl is not compiled with c-ares support,
domain name lookups will not time out.
The issue affected Keystone.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 88725316ddcfa02ff110e659f7a8131dc1ea2cfc)

12 years agoosd: mark down connections from old peers
Sage Weil [Fri, 8 Mar 2013 16:56:44 +0000 (08:56 -0800)]
osd: mark down connections from old peers

Close out any connection with an old peer.  This avoids a race like:

- peer marked down
- we get map, mark down the con
- they reconnect and try to send us some stuff
- we share our map to tell them they are old and dead, but leave the con
  open
...
- peer marks itself up a few times, eventually reuses the same port
- sends messages on their fresh con
- we discard because of our old con

This could cause a tight reconnect loop, but it is better than wrong
behavior.

Other possible fixes:
 - make addr nonce truly unique (augment pid in nonce)
 - make a smarter 'disposable' msgr state (bleh)

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 881e9d850c6762290f8be24da9e74b9dc112f1c9)

12 years agoosd/PG: rename require_same_or_newer_map -> is_same_or_newer_map
Sage Weil [Fri, 8 Mar 2013 16:53:40 +0000 (08:53 -0800)]
osd/PG: rename require_same_or_newer_map -> is_same_or_newer_map

This avoids confusion with the OSD method of the same name, and better
matches what the function tests (and does not do).

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ba7e815a18cad110525f228db1b3fe39e011409e)

Conflicts:

src/osd/ReplicatedPG.cc

12 years agolog: drop default 'log max recent' from 100k -> 10k
Sage Weil [Mon, 11 Mar 2013 23:25:16 +0000 (16:25 -0700)]
log: drop default 'log max recent' from 100k -> 10k

Use less memory.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c021c5ccf0c063cccd7314964420405cea6406de)

12 years agoFix radosgw actually reloading after rotating logs.
Jan Harkes [Fri, 8 Mar 2013 17:45:57 +0000 (12:45 -0500)]
Fix radosgw actually reloading after rotating logs.

The --signal argument to Debian's start-stop-daemon doesn't
make it send a signal; it only defines which signal should be sent
when --stop is specified.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
(cherry picked from commit 44f1cc5bc42f9bb6d5a386037408d2de17dc5413)

12 years agocommon: reduce default in-memory logs for non-daemons
Josh Durgin [Thu, 7 Mar 2013 01:42:03 +0000 (17:42 -0800)]
common: reduce default in-memory logs for non-daemons

The default of 100000 can result in hundreds of MBs of extra memory
used. This was most obvious when using librbd with caching enabled,
since there was a dout(0) accidentally left in the ObjectCacher.

refs: #4352
backport: bobtail
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 7c208d2f8e3f28f4055a4ae51eceae892dcef1dc)

12 years agoosd: allow (some) log trim when degraded, but not during recovery
Sage Weil [Sat, 23 Feb 2013 01:01:53 +0000 (17:01 -0800)]
osd: allow (some) log trim when degraded, but not during recovery

We allow some trimming while degraded, although we keep more entries
around to improve the chances that a restarting OSD can do log-based
recovery.

Trimming is still disallowed during recovery.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 6d89b34e5608c71b49ef33ab58340e90bd8da6e4)

12 years agoosd: restructure calc_trim
Sage Weil [Mon, 25 Feb 2013 23:33:35 +0000 (15:33 -0800)]
osd: restructure calc_trim

No functional change, except that we log more debug, yay!

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 86df164d04f6e31a0f20bbb94dbce0599c0e8b3d)

12 years agoosd: allow pg log trim during (non-classic) scrub
Sage Weil [Sat, 23 Feb 2013 00:48:02 +0000 (16:48 -0800)]
osd: allow pg log trim during (non-classic) scrub

Chunky (and deep) scrub do not care about PG log trimming.  Classic scrub
still does.

Deep scrub can take a long time, so not trimming the log during that period
may eat lots of RAM; avoid that!

Might fix: #4179
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 0ba8db6b664205348d5499937759916eac0997bf)

12 years agomsgr: drop messages on cons with CLOSED Pipes
Sage Weil [Thu, 28 Feb 2013 20:46:00 +0000 (12:46 -0800)]
msgr: drop messages on cons with CLOSED Pipes

Back in commit 6339c5d43974f4b495f15d199e01a141e74235f5, we tried to make
this deal with a race between a faulting pipe and new messages being
queued.  The sequence is

- fault starts on pipe
- fault drops pipe_lock to unregister the pipe
- user (objecter) queues new message on the con
- submit_message reopens a Pipe (due to this bug)
- the message managed to make it out over the wire
- fault finishes faulting, calls ms_reset
- user (objecter) closes the con
- user (objecter) resends everything

It appears as though the previous patch *meant* to drop *m on the floor in
this case, which is what this patch does.  And that fixes the crash I am
hitting; see #4271.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 0f42eddef5da6c1babe9ed51ceaa3212a42c2ec4)

12 years agoFix output of 'ceph osd tree --format=json'
Concubidated [Fri, 8 Mar 2013 21:44:39 +0000 (13:44 -0800)]
Fix output of 'ceph osd tree --format=json'

Signed-off-by: Tyler Brekke <tyler.brekke@inktank.com>
(cherry picked from commit 9bcba944c6586ad5f007c0a30e69c6b5a886510b)

12 years agodeb: Add ceph-coverage to ceph-test deb package
Sam Lang [Tue, 12 Feb 2013 17:32:29 +0000 (11:32 -0600)]
deb: Add ceph-coverage to ceph-test deb package

Teuthology uses the ceph-coverage script extensively
and expects it to be installed by the ceph task.  Add
the script to the ceph-test debian package so that it
gets installed for that use case.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
(cherry picked from commit 376cca2d4d4f548ce6b00b4fc2928d2e6d41038f)

12 years agorgw: set attrs on various list bucket xml results (swift)
Yehuda Sadeh [Fri, 22 Feb 2013 23:04:37 +0000 (15:04 -0800)]
rgw: set attrs on various list bucket xml results (swift)

Fixes: #4247
The list buckets operation was missing some attrs on the different
xml result entities. This fixes it.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 4384e59ad046afc9ec53a2d2f1fff6a86e645505)

12 years agoformatter: add the ability to dump attrs in xml entities
Yehuda Sadeh [Fri, 22 Feb 2013 23:02:02 +0000 (15:02 -0800)]
formatter: add the ability to dump attrs in xml entities

xml entities may have attrs assigned to them. Add the ability
to set them. A usage example:

formatter->open_array_section_with_attrs("container",
     FormatterAttrs("name", "foo", NULL));

This will generate the following xml entity:
<container name="foo">

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 7cb6ee28073824591d8132a87ea09a11c44efd66)

Conflicts:
src/common/Formatter.cc
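A sketch of how a NULL-terminated name/value list like the one in the usage example can be collected and rendered into an opening tag. It is modeled on that example and is not Ceph's Formatter code. `XmlAttrsSketch` and `open_tag` are hypothetical names, and attribute values are not XML-escaped.

```cpp
#include <cassert>
#include <cstdarg>
#include <string>
#include <utility>
#include <vector>

// Hypothetical NULL-terminated attribute list, modeled on the
// FormatterAttrs usage shown above.
struct XmlAttrsSketch {
  std::vector<std::pair<std::string, std::string>> attrs;

  // Arguments alternate name, value, name, value, ... and end with a
  // null const char *.
  explicit XmlAttrsSketch(const char *name, ...) {
    va_list ap;
    va_start(ap, name);
    while (name) {
      const char *value = va_arg(ap, const char *);
      if (!value) break;  // ignore a trailing name without a value
      attrs.emplace_back(name, value);
      name = va_arg(ap, const char *);
    }
    va_end(ap);
  }
};

// Render an opening tag such as <container name="foo">.
std::string open_tag(const std::string &elem, const XmlAttrsSketch &a) {
  std::string out = "<" + elem;
  for (const auto &kv : a.attrs)
    out += " " + kv.first + "=\"" + kv.second + "\"";
  return out + ">";
}
```

Usage: `XmlAttrsSketch a("name", "foo", static_cast<const char *>(nullptr));` followed by `open_tag("container", a)` yields `<container name="foo">`. The sketch passes a typed null pointer because plain `NULL` passed through `...` is only safe where `NULL` is pointer-sized.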

12 years agorgw: don't iterate through all objects when in namespace
Yehuda Sadeh [Thu, 7 Mar 2013 03:32:21 +0000 (19:32 -0800)]
rgw: don't iterate through all objects when in namespace

Fixes: #4363
Backport: argonaut, bobtail
When listing objects in a namespace, don't iterate through all the
objects; only go through the ones that start with the namespace
prefix.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 6669e73fa50e3908ec825ee030c31a6dbede6ac0)
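A minimal sketch of prefix-bounded listing over a sorted index. A std::map stands in for the bucket index, and the key names are hypothetical. Because keys sharing a prefix are contiguous in sorted order, the scan can seek straight to the first key at or after the prefix and stop at the first key that no longer matches, instead of visiting every object.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Return all keys starting with prefix, visiting only that key range.
std::vector<std::string> list_in_namespace(
    const std::map<std::string, int> &index, const std::string &prefix) {
  std::vector<std::string> out;
  for (auto it = index.lower_bound(prefix);
       it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    out.push_back(it->first);
  }
  return out;
}
```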

12 years agoObjectCacher: fix debug log level in split
Josh Durgin [Thu, 28 Feb 2013 20:13:45 +0000 (12:13 -0800)]
ObjectCacher: fix debug log level in split

Level 0 should never be used for this kind of debugging.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit cb3ee33532fb60665f39f6ccb1d69d67279fd5e1)

12 years agorados: remove unused "check_stdio" parameter
Dan Mick [Thu, 24 Jan 2013 21:38:25 +0000 (13:38 -0800)]
rados: remove unused "check_stdio" parameter

Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit bb860e49a7faeaf552538a9492ef0ba738c99760)

12 years agorados: obey op_size for 'get'
Sage Weil [Thu, 24 Jan 2013 05:31:11 +0000 (21:31 -0800)]
rados: obey op_size for 'get'

Otherwise we try to read the whole object in one go, which doesn't bode
well for large objects (either non-optimal or simply broken).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 234becd3447a679a919af458440bc31c8bd6b84f)
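A sketch of reading an object in op_size-sized pieces rather than in one huge read. `FakeObject` is a hypothetical in-memory stand-in for a RADOS object, and `get_in_chunks` is illustrative, not the rados tool's code. The loop treats a short or empty read as the end of the object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory object; read() returns at most len bytes
// starting at off, like a ranged read.
struct FakeObject {
  std::string data;
  std::string read(size_t off, size_t len) const {
    if (off >= data.size()) return "";
    return data.substr(off, len);
  }
};

// Fetch the whole object in op_size-byte pieces.  Stops on the first
// short (or empty) read, which marks the end of the object.
std::string get_in_chunks(const FakeObject &obj, size_t op_size,
                          size_t *num_reads) {
  assert(op_size > 0);
  std::string out;
  size_t off = 0;
  *num_reads = 0;
  while (true) {
    std::string chunk = obj.read(off, op_size);
    ++*num_reads;
    out += chunk;
    off += chunk.size();
    if (chunk.size() < op_size) break;
  }
  return out;
}
```

When the object size is an exact multiple of op_size, one extra empty read is needed to detect the end. A real client could avoid that by checking the object size first.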

12 years agoFileJournal::wrap_read_bl: adjust pos before returning
Samuel Just [Thu, 28 Feb 2013 00:58:45 +0000 (16:58 -0800)]
FileJournal::wrap_read_bl: adjust pos before returning

Otherwise, we may feed an offset past the end of the journal to
check_header in read_entry and incorrectly determine that the entry is
corrupt.

Fixes: #4296
Backport: bobtail
Backport: argonaut
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 5d54ab154ca790688a6a1a2ad5f869c17a23980a)
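A sketch of a wrapping read that normalizes its position before returning, as the subject line describes. `RingJournal` is a hypothetical in-memory ring whose usable region is [top, size). When a read ends exactly at the end of the journal, the position is wrapped to top. A later header check then never receives an offset past the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical circular journal held in memory: bytes [top, size) are
// the usable region; reads that run past the end continue at top.
struct RingJournal {
  std::string buf;  // whole device image
  uint64_t top;     // first usable byte

  // Read len bytes starting at *pos, wrapping at the end, and leave *pos
  // at the byte after the last one read, normalized into [top, size).
  std::string wrap_read(uint64_t *pos, uint64_t len) const {
    const uint64_t size = buf.size();
    assert(*pos >= top && *pos < size);
    std::string out;
    while (len > 0) {
      uint64_t n = std::min<uint64_t>(len, size - *pos);
      out.append(buf, *pos, n);
      *pos += n;
      len -= n;
      if (*pos == size) *pos = top;  // wrap before the next read or return
    }
    return out;
  }
};
```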

12 years agoosd: leave osd_lock locked in shutdown()
Sage Weil [Wed, 16 Jan 2013 21:14:00 +0000 (13:14 -0800)]
osd: leave osd_lock locked in shutdown()

No callers expect the lock to be dropped.

Fixes: #3816
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 98a763123240803741ac9f67846b8f405f1b005b)

12 years agomsg: fix entity_addr_t::is_same_host() for IPv6
Sage Weil [Tue, 26 Feb 2013 22:07:12 +0000 (14:07 -0800)]
msg: fix entity_addr_t::is_same_host() for IPv6

We weren't checking the memcmp return value properly!  Aie...

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c8dd2b67b39a8c70e48441ecd1a5cc3c6200ae97)
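The commit message only says the memcmp return value was not checked properly. A common form of this bug is treating memcmp's result as a truth value, even though it returns 0 when the buffers are equal. The sketch below shows the correct comparison of two IPv6 addresses, ignoring ports. `same_ipv6_host` is a hypothetical helper, not entity_addr_t::is_same_host().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <cstring>

// Same host means identical 128-bit addresses; ports are ignored.
// memcmp() returns 0 on equality, so the test must be "== 0".
bool same_ipv6_host(const sockaddr_in6 &a, const sockaddr_in6 &b) {
  return memcmp(&a.sin6_addr, &b.sin6_addr, sizeof(a.sin6_addr)) == 0;
}
```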

12 years agoosd: requeue pg waiters at the front of the finished queue
Sage Weil [Mon, 18 Feb 2013 06:35:50 +0000 (22:35 -0800)]
osd: requeue pg waiters at the front of the finished queue

We could have a sequence like:

- op1
- notify
- op2

in the finished queue.  Op1 gets put on waiting_for_pg, the notify
creates the pg and requeues op1 (at the end), op2 is handled, and
finally op1 is handled.  That breaks ordering; see #2947.

Instead, when we wake up a pg, queue the waiting messages at the front
of the dispatch queue.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 56c5a07708d52de1699585c9560cff8b4e993d0a)
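A toy replay of the op1/notify/op2 sequence above, for illustration only. It uses a std::deque as the finished queue and plain strings as messages. When "notify" creates the pg, the parked op is requeued either at the back (old behavior) or at the front (new behavior). `dispatch` is a hypothetical name.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Returns the order in which ops are actually handled.
std::vector<std::string> dispatch(bool requeue_front) {
  std::deque<std::string> finished = {"op1", "notify", "op2"};
  std::vector<std::string> waiting_for_pg, handled;
  bool pg_exists = false;
  while (!finished.empty()) {
    std::string m = finished.front();
    finished.pop_front();
    if (m == "notify") {
      pg_exists = true;
      // Wake the waiters, preserving their relative order.
      if (requeue_front)
        finished.insert(finished.begin(), waiting_for_pg.begin(),
                        waiting_for_pg.end());
      else
        finished.insert(finished.end(), waiting_for_pg.begin(),
                        waiting_for_pg.end());
      waiting_for_pg.clear();
    } else if (!pg_exists) {
      waiting_for_pg.push_back(m);  // no pg yet: park the op
    } else {
      handled.push_back(m);
    }
  }
  return handled;
}
```

With back requeueing, op2 is handled before op1. With front requeueing, the original op1, op2 order is preserved.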

12 years agoosd: pull requeued requests off one at a time
Sage Weil [Mon, 18 Feb 2013 04:49:52 +0000 (20:49 -0800)]
osd: pull requeued requests off one at a time

Pull items off the finished queue one at a time.  In certain cases, an
event may result in new items getting added to the finished queue that
will be put at the *front* instead of the back.  See the latest
incarnation of #2947.

Note that this is a significant change in behavior in that we can
theoretically starve if an event keeps resulting in new events getting
generated.  Beware!

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit f1841e4189fce70ef5722d508289e516faa9af6a)
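A toy comparison of batch draining and one-at-a-time draining, for illustration only. Here an event pushes follow-up work to the front of the queue. If the whole queue is first swapped into a local batch, that follow-up only runs after the rest of the batch. Pulling one item at a time lets it run next. `run`, "wake", "woken_op" and "later_op" are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Returns the order in which items are handled under each strategy.
std::vector<std::string> run(bool one_at_a_time) {
  std::deque<std::string> q = {"wake", "later_op"};
  std::vector<std::string> order;
  auto handle = [&](const std::string &m) {
    order.push_back(m);
    if (m == "wake") q.push_front("woken_op");  // follow-up goes in front
  };
  if (one_at_a_time) {
    while (!q.empty()) {
      std::string m = q.front();
      q.pop_front();
      handle(m);
    }
  } else {
    while (!q.empty()) {
      std::deque<std::string> batch;
      batch.swap(q);  // take everything queued so far
      for (const auto &m : batch) handle(m);
    }
  }
  return order;
}
```

The one-at-a-time loop also shows the starvation risk noted above: if every handled item pushed another item to the front, the original later_op would never run.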

12 years agomds: open mydir after replay
Sage Weil [Fri, 18 Jan 2013 06:00:42 +0000 (22:00 -0800)]
mds: open mydir after replay

In certain cases, we may replay the journal and not end up with the
dirfrag for mydir open.  This is fine--we just need to open it up and
fetch it below.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e51299fbce6bdc3d6ec736e949ba8643afc965ec)

12 years agomds: use inode_t::layout for dir layout policy
Greg Farnum [Thu, 21 Feb 2013 17:21:01 +0000 (09:21 -0800)]
mds: use inode_t::layout for dir layout policy

Remove the default_file_layout struct, which was just a ceph_file_layout,
and store it in the inode_t.  Rip out all the annoying code that put this
on the heap.

To aid in this usage, add a clear_layout() function to inode_t.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

12 years agomds: parse ceph.*.layout vxattr key/value content
Sage Weil [Mon, 21 Jan 2013 05:53:37 +0000 (21:53 -0800)]
mds: parse ceph.*.layout vxattr key/value content

Use qi to parse a strictly formatted set of key/value pairs.  Be picky
about whitespace.  Any subset of recognized keys is allowed.  Parse the
same set of keys as the ceph.*.layout.* vxattrs.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5551aa5b3b5c2e9e7006476b9cd8cc181d2c9a04)
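A rough sketch of a strict key/value parser in the spirit described above. It uses plain std::string scanning instead of the Boost.Spirit Qi grammar the commit uses. The key set (stripe_unit, stripe_count, object_size, pool) is assumed to mirror the ceph.*.layout.* vxattr names. Pairs must be separated by exactly one space, and any non-empty subset of known keys is accepted. Numeric keys must be all digits. Unknown keys, duplicates, empty values and stray whitespace are rejected. `parse_layout` is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// Parse "key=value key=value ..." strictly; nullopt on any violation.
std::optional<std::map<std::string, std::string>> parse_layout(
    const std::string &s) {
  static const std::set<std::string> known = {"stripe_unit", "stripe_count",
                                              "object_size", "pool"};
  std::map<std::string, std::string> out;
  size_t pos = 0;
  while (pos < s.size()) {
    size_t eq = s.find('=', pos);
    if (eq == std::string::npos) return std::nullopt;
    size_t end = s.find(' ', eq);
    if (end == std::string::npos) end = s.size();
    std::string key = s.substr(pos, eq - pos);  // any space makes it unknown
    std::string val = s.substr(eq + 1, end - eq - 1);
    if (!known.count(key) || val.empty() ||
        val.find('=') != std::string::npos || out.count(key))
      return std::nullopt;
    if (key != "pool" &&
        val.find_first_not_of("0123456789") != std::string::npos)
      return std::nullopt;  // numeric fields must be plain digits
    out[key] = val;
    if (end == s.size()) break;
    pos = end + 1;
    if (pos == s.size()) return std::nullopt;  // trailing space
  }
  if (out.empty()) return std::nullopt;  // require at least one pair
  return out;
}
```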