git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
11 years ago rgw: add NextMarker param for bucket listing
Yehuda Sadeh [Thu, 17 Jul 2014 18:24:51 +0000 (11:24 -0700)]
rgw: add NextMarker param for bucket listing

Partially fixes #8858.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 924686f0b6593deffcd1d4e80ab06b1e7af00dcb)

11 years ago rgw: improve delimited listing of bucket
Yehuda Sadeh [Wed, 16 Jul 2014 22:21:09 +0000 (15:21 -0700)]
rgw: improve delimited listing of bucket

If we find a prefix, calculate a string greater than it so that the next
request can skip to it. This is still not the most efficient way to
do it. It'll be better to push it down to the objclass, but that'll
require a much bigger change.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit e6cf618c257f26f97f60a4c1df1d23a14496cab0)
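
The "string greater than the prefix" idea can be sketched as follows. This is a minimal Python illustration, not the rgw code: the helper name `marker_after_prefix` is hypothetical, and it assumes keys are compared bytewise.

```python
import bisect


def marker_after_prefix(prefix: bytes) -> bytes:
    """Return the smallest byte string greater than every key starting with prefix.

    Hypothetical helper: bump the last byte that is not 0xFF and drop what follows.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        raise ValueError("no upper bound exists for this prefix")
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


# Skip every key under "photos/" in a sorted listing with one bisect.
keys = sorted([b"a.txt", b"photos/1.jpg", b"photos/2.jpg", b"photos0", b"zebra"])
start = bisect.bisect_left(keys, marker_after_prefix(b"photos/"))
remaining = keys[start:]  # first key outside the "photos/" prefix onwards
```

As the commit message notes, doing this on the client side of the listing is a stopgap; pushing it down to the objclass would avoid the extra round trips.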

11 years ago rgw: don't try to wait for pending if list is empty
Yehuda Sadeh [Wed, 16 Jul 2014 19:23:31 +0000 (12:23 -0700)]
rgw: don't try to wait for pending if list is empty

Fixes: #8846
Backport: firefly, dumpling

This was broken at ea68b9372319fd0bab40856db26528d36359102e. We ended
up calling wait_pending_front() when the pending list was empty.
This commit also moves the need_to_wait check to a different place,
where we actually throttle (and not just drain completed IOs).

Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit f9f2417d7db01ecf2425039539997901615816a9)

11 years ago Use new git mirror for qemu-iotests
Warren Usui [Thu, 24 Apr 2014 19:55:26 +0000 (12:55 -0700)]
Use new git mirror for qemu-iotests

Fixes: 8191
Signed-off-by: Warren Usui <warren.usui@inktank.com>
(cherry picked from commit ddf37d903f826f3e153d8009c716780453b68b05)

11 years ago Support latest qemu iotest code
Warren Usui [Wed, 23 Apr 2014 20:20:14 +0000 (13:20 -0700)]
Support latest qemu iotest code

Modified qemu-iotests workunit script to check for versions
that use the latest qemu (currently only Trusty).  Limit the
tests to those that are applicable to rbd.

Fixes: 7882
Signed-off-by: Warren Usui <warren.usui@inktank.com>
(cherry picked from commit 606e725eb5204e76e602d26ffd113e40af2ee812)

11 years ago librbd: skip zeroes when copying an image
Josh Durgin [Mon, 31 Mar 2014 21:53:31 +0000 (14:53 -0700)]
librbd: skip zeroes when copying an image

This is the simple coarse-grained solution, but it works well in
common cases like a small base image resized with a bunch of empty
space at the end. Finer-grained sparseness can be copied by using rbd
{export,import}-diff.

Fixes: #6257
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 824da2029613a6f4b380b6b2f16a0bd0903f7e3c)
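
The coarse-grained approach described above can be sketched like this. It is a minimal Python illustration, not the librbd code: the function name and the chunk size are assumptions, and the destination is assumed to read back zeroes for regions that were never written.

```python
import io


def copy_skipping_zeroes(src, dst, chunk_size=4 * 1024 * 1024):
    """Copy src to dst chunk by chunk, skipping chunks that contain only zero bytes."""
    offset = 0
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if any(chunk):  # at least one non-zero byte in this chunk
            dst.seek(offset)
            dst.write(chunk)
            written += len(chunk)
        offset += len(chunk)
    return written


# Tiny demonstration with in-memory streams and a 4-byte chunk size.
src = io.BytesIO(b"head" + bytes(8) + b"tail")
dst = io.BytesIO()
written = copy_skipping_zeroes(src, dst, chunk_size=4)
```

Only whole chunks are skipped, so a chunk with a single non-zero byte is still copied in full; that is the coarseness the commit message refers to.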

11 years ago Revert "qa/workunits/suites/fsx.sh: don't use zero range"
Greg Farnum [Tue, 1 Jul 2014 22:19:21 +0000 (15:19 -0700)]
Revert "qa/workunits/suites/fsx.sh: don't use zero range"

This reverts commit 583e6e3ef7f28bf34fe038e8a2391f9325a69adf.

We're using a different fsx source, which doesn't support the
same options as our git-based one does.

Signed-off-by: Greg Farnum <greg@inktank.com>

11 years ago qa/workunits/suites/fsx.sh: don't use zero range
Sage Weil [Mon, 30 Jun 2014 14:05:04 +0000 (07:05 -0700)]
qa/workunits/suites/fsx.sh: don't use zero range

Zero range is not supported by cephfs.

Fixes: #8542
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 2dec8a810060f65d022c06e82090b4aa5ccec0cb)

11 years ago Merge pull request #2014 from ceph/wip-scrub-dumpling
Loic Dachary [Mon, 30 Jun 2014 17:19:24 +0000 (19:19 +0200)]
Merge pull request #2014 from ceph/wip-scrub-dumpling

osd: scrub priority updates for dumpling

Reviewed-by: Loic Dachary <loic@dachary.org>

11 years ago rgw: allocate enough space for bucket instance id
Yehuda Sadeh [Mon, 16 Jun 2014 18:48:24 +0000 (11:48 -0700)]
rgw: allocate enough space for bucket instance id

Fixes: #8608
Backport: dumpling, firefly
Bucket instance id is a concatenation of zone name, rados instance id,
and a running counter. We need to allocate enough space to account for the
zone name length.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit d2e86a66ca55685e04ffbfaa58452af59f381277)

11 years ago ceph-disk: partprobe before settle when preparing dev
Sage Weil [Thu, 8 May 2014 15:52:51 +0000 (08:52 -0700)]
ceph-disk: partprobe before settle when preparing dev

Two users have reported this fixes a problem with using --dmcrypt.

Fixes: #6966
Tested-by: Eric Eastman <eric0e@aol.com>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0f196265f049d432e399197a3af3f90d2e916275)

11 years ago osd: fix filestore perf stats update
Sage Weil [Tue, 17 Jun 2014 20:33:14 +0000 (13:33 -0700)]
osd: fix filestore perf stats update

Update the struct we are about to send, not the (unlocked!) one we will
send the next time around.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4afffb4a10a0bbf7f2018ef3ed6b167c7921e46b)

11 years ago common/WorkQueue: allow io priority to be set for wq 2014/head
Sage Weil [Wed, 18 Jun 2014 18:02:09 +0000 (11:02 -0700)]
common/WorkQueue: allow io priority to be set for wq

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5e4b3b1f1cb870f39fc7cfb3adeae93e078d9057)

Conflicts:
src/common/WorkQueue.cc

11 years ago common/Thread: allow io priority to be set for a Thread
Sage Weil [Wed, 18 Jun 2014 18:01:42 +0000 (11:01 -0700)]
common/Thread: allow io priority to be set for a Thread

Ideally, set this before starting the thread.  If you set it after, we
could potentially race with create() itself.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 01533183e7455b713640e001962339907fb6f980)

11 years ago common/io_priority: wrap ioprio_set() and gettid()
Sage Weil [Wed, 18 Jun 2014 18:01:09 +0000 (11:01 -0700)]
common/io_priority: wrap ioprio_set() and gettid()

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 705713564bebd84ad31cc91698311cf2fbd51a48)

Conflicts:
src/common/Makefile.am

11 years ago osd: introduce simple sleep during scrub
Sage Weil [Tue, 17 Jun 2014 17:47:24 +0000 (10:47 -0700)]
osd: introduce simple sleep during scrub

This option is similar to osd_snap_trim_sleep: simply inject an optional
sleep in the thread that is doing scrub work.  This is a very kludgey and
coarse knob for limiting the impact of scrub on the cluster, but can help
until we have a more robust and elegant solution.

Only sleep if we are in the NEW_CHUNK state to avoid delaying processing of
an in-progress chunk.  In this state nothing is blocked on anything.
Conveniently, chunky_scrub() requeues itself for each new chunk.

Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c4e8451cc5b4ec5ed07e09c08fb13221e31a7ac6)
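
The rule "only sleep in the NEW_CHUNK state" can be sketched as below. This is a minimal Python illustration, not the OSD code: the function name, the `scrub_sleep` parameter, and the injectable `sleep` callable are assumptions made for the example.

```python
import time


def maybe_sleep_before_chunk(state, scrub_sleep, sleep=time.sleep):
    """Inject the optional delay only when a new chunk is about to start.

    Returns True if a sleep was injected. Any state other than NEW_CHUNK
    means a chunk is in progress, so it is never delayed.
    """
    if state == "NEW_CHUNK" and scrub_sleep > 0:
        sleep(scrub_sleep)
        return True
    return False
```

With a value of 0 the function never sleeps, which mirrors how such knobs usually default to leaving behavior unchanged.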

11 years ago Merge pull request #1963 from dachary/wip-8599-ruleset-dumpling
Sage Weil [Mon, 16 Jun 2014 16:27:03 +0000 (09:27 -0700)]
Merge pull request #1963 from dachary/wip-8599-ruleset-dumpling

mon: pool set <pool> crush_ruleset must not use rule_exists (dumpling)

Reviewed-by: Sage Weil <sage@inktank.com>

11 years ago mon: pool set <pool> crush_ruleset must not use rule_exists 1963/head
John Spray [Tue, 20 May 2014 15:50:18 +0000 (16:50 +0100)]
mon: pool set <pool> crush_ruleset must not use rule_exists

Implement CrushWrapper::ruleset_exists that iterates over the existing
rulesets to find the one matching the ruleset argument.

ceph osd pool set <pool> crush_ruleset must not use
CrushWrapper::rule_exists, which checks for a *rule* existing, whereas
the value being set is a *ruleset*. (cherry picked from commit
fb504baed98d57dca8ec141bcc3fd021f99d82b0)

A test via ceph osd pool set data crush_ruleset verifies the ruleset
argument is accepted.

http://tracker.ceph.com/issues/8599 fixes: #8599

Backport: firefly, emperor, dumpling
Signed-off-by: John Spray <john.spray@inktank.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit d02d46e25080d5f7bb8ddd4874d9019a078b816b)

Conflicts:
src/mon/OSDMonitor.cc
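
The difference between checking for a rule and checking for a ruleset can be sketched as follows. This is a minimal Python illustration, not CrushWrapper itself: the `Rule` record and its field names are hypothetical stand-ins for the CRUSH map entries.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: int  # identifier of the rule itself
    ruleset: int  # ruleset the rule belongs to


def rule_exists(rules, rule_id):
    """True if some rule has this rule id."""
    return any(r.rule_id == rule_id for r in rules)


def ruleset_exists(rules, ruleset):
    """True if some rule belongs to this ruleset (iterates over all rules)."""
    return any(r.ruleset == ruleset for r in rules)


rules = [Rule(rule_id=0, ruleset=0), Rule(rule_id=1, ruleset=3)]
```

Here ruleset 3 exists although there is no rule with id 3, so validating a crush_ruleset value with `rule_exists` would wrongly reject it.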

11 years ago osd: 'status' admin socket command
Sage Weil [Mon, 3 Mar 2014 15:03:01 +0000 (07:03 -0800)]
osd: 'status' admin socket command

Basic stuff, like what state is the OSD in, and what osdmap epoch are
we on.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 09099c9e4c7d2aa31eb8a0b7c18e43272fae7ce2)

11 years ago init-ceph: continue after failure doing osd data mount
Sage Weil [Mon, 9 Jun 2014 03:18:49 +0000 (20:18 -0700)]
init-ceph: continue after failure doing osd data mount

If we are starting many daemons and hit an error, we normally note it and
move on.  Do the same when doing the pre-mount step.

Fixes: #8554
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 6a7e20147cc39ed4689809ca7d674d3d408f2a17)

11 years ago rgw: cut short object read if a chunk returns error
Yehuda Sadeh [Tue, 6 May 2014 18:06:29 +0000 (11:06 -0700)]
rgw: cut short object read if a chunk returns error

Fixes: #8289
Backport: firefly, dumpling
When reading an object, if we hit an error when trying to read one of
the rados objects then we should just stop. Otherwise we're just going
to continue reading the rest of the object, and since it can't be sent
back to the client (as we have a hole in the middle), we end up
accumulating everything in memory.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 03b0d1cfb7bd30a77fedcf75eb06476b21b14e95)
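
The "stop at the first failed chunk" behavior can be sketched as below. This is a minimal Python illustration, not the rgw read path: `read_chunk` is a hypothetical callable that returns an `(errno, data)` pair, with a negative errno meaning failure.

```python
def read_object(chunk_names, read_chunk):
    """Read chunks in order and stop at the first error instead of buffering the rest."""
    parts = []
    for name in chunk_names:
        ret, data = read_chunk(name)
        if ret < 0:
            # Later chunks could not be sent to the client anyway (there would
            # be a hole), so reading them would only accumulate memory.
            return ret, b"".join(parts)
        parts.append(data)
    return 0, b"".join(parts)


# Demonstration with a fake backend where the second chunk fails.
store = {"obj.0": (0, b"aa"), "obj.1": (-5, b""), "obj.2": (0, b"cc")}
calls = []


def fake_read(name):
    calls.append(name)
    return store[name]


ret, data = read_object(["obj.0", "obj.1", "obj.2"], fake_read)
```

The third chunk is never requested, which is the point of the fix.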

11 years ago Merge pull request #1931 from ceph/wip-7068-dumpling
Sage Weil [Mon, 9 Jun 2014 04:12:32 +0000 (21:12 -0700)]
Merge pull request #1931 from ceph/wip-7068-dumpling

Wip 7068 dumpling

Reviewed-by: Sage Weil <sage@inktank.com>

11 years ago Merge remote-tracking branch 'origin/wip-8269-dumpling' into dumpling
Yehuda Sadeh [Fri, 6 Jun 2014 15:45:58 +0000 (08:45 -0700)]
Merge remote-tracking branch 'origin/wip-8269-dumpling' into dumpling

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

11 years ago doc: Added requiretty comment to preflight checklist.
John Wilkins [Thu, 5 Jun 2014 18:41:41 +0000 (11:41 -0700)]
doc: Added requiretty comment to preflight checklist.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

11 years ago doc: Added Disable requiretty to quick start.
John Wilkins [Thu, 5 Jun 2014 18:34:46 +0000 (11:34 -0700)]
doc: Added Disable requiretty to quick start.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>

11 years ago ReplicatedPG: lock snapdir obc during write 1931/head
Samuel Just [Thu, 3 Oct 2013 01:00:04 +0000 (18:00 -0700)]
ReplicatedPG: lock snapdir obc during write

Otherwise, we won't block properly in prep_push_backfill_object.

Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit b87bc2311aa4da065477f402a869e2edc1558e2f)

Conflicts:
src/osd/ReplicatedPG.h

11 years ago 0.67.9 v0.67.9
Jenkins [Wed, 21 May 2014 16:57:02 +0000 (16:57 +0000)]
0.67.9

11 years ago msg: Fix inconsistent message sequence negotiation during connection reset
Guang Yang [Fri, 9 May 2014 09:21:23 +0000 (09:21 +0000)]
msg: Fix inconsistent message sequence negotiation during connection reset

Backport: firefly, emperor, dumpling

Signed-off-by: Guang Yang <yguang@yahoo-inc.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit bdee119076dd0eb65334840d141ccdf06091e3c9)

11 years ago OSD::handle_pg_query: on dne pg, send lb=hobject_t() if deleting
Sage Weil [Tue, 20 May 2014 17:46:34 +0000 (10:46 -0700)]
OSD::handle_pg_query: on dne pg, send lb=hobject_t() if deleting

We will set lb=hobject_t() if we resurrect the pg.  In that case,
we need to have sent that to the primary beforehand.  If we
finish the removal before the pg is recreated, we'll just end
up backfilling it, which is ok since the pg doesn't exist anyway.

Fixes: #7740
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 04de781765dd5ac0e28dd1a43cfe85020c0854f8)

Conflicts:

src/osd/OSD.cc

11 years ago mon/MonClient: remove stray _finish_hunting() calls
Sage Weil [Fri, 2 May 2014 21:48:35 +0000 (14:48 -0700)]
mon/MonClient: remove stray _finish_hunting() calls

Calling _finish_hunting() clears out the bool hunting flag, which means we
don't retry by connecting to another mon periodically.  Instead, we send
keepalives every 10s.  But, since we aren't yet in state HAVE_SESSION, we
don't check that the keepalives are getting responses.  This means that an
ill-timed connection reset (say, after we get a MonMap, but before we
finish authenticating) can drop the monc into a black hole that does not
retry.

Instead, we should *only* call _finish_hunting() when we complete the
authentication handshake.

Fixes: #8278
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 77a6f0aefebebf057f02bfb95c088a30ed93c53f)

11 years ago Merge pull request #1826 from ceph/wip-8162-dumpling
Sage Weil [Tue, 20 May 2014 17:19:00 +0000 (10:19 -0700)]
Merge pull request #1826 from ceph/wip-8162-dumpling

Wip 8162 dumpling

Reviewed-by: Sage Weil <sage@inktank.com>

11 years ago OSD: fix an osdmap_subscribe interface misuse
Greg Farnum [Thu, 15 May 2014 23:50:43 +0000 (16:50 -0700)]
OSD: fix an osdmap_subscribe interface misuse

When calling osdmap_subscribe, you have to pass an epoch newer than the
current map's. _maybe_boot() was not doing this correctly -- we would
fail a check for being *in* the monitor's existing map range, and then
pass along the map prior to the monitor's range. But if we were exactly
one behind, that value would be our current epoch, and the request would
get dropped. So instead, make sure we are not *in contact* with the monitor's
existing map range.

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 290ac818696414758978b78517b137c226110bb4)

11 years ago Merge pull request #1827 from ceph/wip-6565-dumpling
Sage Weil [Mon, 19 May 2014 20:45:50 +0000 (13:45 -0700)]
Merge pull request #1827 from ceph/wip-6565-dumpling

Wip 6565 dumpling

Reviewed-by: Sage Weil <sage@inktank.com>

11 years ago OSD: check for splitting when processing recover/backfill reservations 1827/head
Samuel Just [Wed, 16 Oct 2013 17:07:37 +0000 (10:07 -0700)]
OSD: check for splitting when processing recover/backfill reservations

Fixes: 6565
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 15ec5332ba4154930a0447e2bcf1acec02691e97)

11 years ago ReplicatedPG::recover_backfill: do not update last_backfill prematurely 1826/head
Samuel Just [Thu, 8 May 2014 20:25:32 +0000 (13:25 -0700)]
ReplicatedPG::recover_backfill: do not update last_backfill prematurely

Previously, we would update last_backfill on the backfill peer to

backfills_in_flight.empty() ? backfill_pos :
  backfills_in_flight.begin()->first

which is actually the next backfill to complete.  We want to update
last_backfill to the largest completed backfill instead.

We use the pending_backfill_updates mapping to identify the most
recently completed backfill.  Due to the previous patch, deletes
will also be included in that mapping.

Related sha1s from master:
4139e75d63b0503dbb7fea8036044eda5e8b7cf1
7a06a71e0f2023f66d003dfb0168f4fe51eaa058

We don't really want to backport those due to the changes in:
9ec35d5ccf6a86c380865c7fc96017a1f502560a

This patch does essentially the same thing, but using backfill_pos.

Fixes: #8162
Signed-off-by: Samuel Just <sam.just@inktank.com>

11 years ago ReplicatedPG: add empty stat when we remove an object in recover_backfill
Samuel Just [Mon, 28 Oct 2013 23:03:25 +0000 (16:03 -0700)]
ReplicatedPG: add empty stat when we remove an object in recover_backfill

Subsequent updates to that object need to have their stats added
to the backfill info stats atomically with the last_backfill
update.

Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit ecddd12b01be120fba87f5ac60539f98f2c69a28)

11 years ago rgw: don't error out on empty owner when setting acls
Yehuda Sadeh [Wed, 27 Nov 2013 21:34:00 +0000 (13:34 -0800)]
rgw: don't error out on empty owner when setting acls

Fixes: #6892
Backport: dumpling, emperor
s3cmd specifies empty owner field when trying to set acls on object
/ bucket. We errored out as it didn't match the current owner name, but
with this change we ignore it.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 14cf4caff58cc2c535101d48c53afd54d8632104)

11 years ago rgw: send user manifest header field
Yehuda Sadeh [Mon, 21 Apr 2014 22:34:04 +0000 (15:34 -0700)]
rgw: send user manifest header field

Fixes: #8170
Backport: firefly
If user manifest header exists (swift) send it as part of the object
header data.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 5cc5686039a882ad345681133c9c5a4a2c2fd86b)

11 years ago client: add asok command to kick sessions that were remote reset
Yan, Zheng [Fri, 11 Apr 2014 07:03:37 +0000 (15:03 +0800)]
client: add asok command to kick sessions that were remote reset

Fixes: #8021
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
(cherry picked from commit 09a1bc5a4601d356b9cc69be8541e6515d763861)

11 years ago osd: throttle snap trimming with simple delay
Sage Weil [Fri, 18 Apr 2014 20:50:11 +0000 (13:50 -0700)]
osd: throttle snap trimming with simple delay

This is not particularly smart, but it is *a* knob that lets you make
the snap trimmer slow down.  It's a float and a simple delay, so it is
adjustable at runtime.  Default is 0 (no change in behavior).

Partial solution for #6278.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4413670d784efc2392359f0f22bca7c9056188f4)

11 years ago PG: only complete replicas should count toward min_size
Sage Weil [Tue, 1 Apr 2014 23:01:28 +0000 (16:01 -0700)]
PG: only complete replicas should count toward min_size

Backport: emperor,dumpling,cuttlefish
Fixes: #7805
Signed-off-by: Samuel Just <sam.just@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0d5d3d1a30685e7c47173b974caa12076c43a9c4)

11 years ago rgw: don't allow multiple writers to same multiobject part
Yehuda Sadeh [Sat, 3 May 2014 00:06:05 +0000 (17:06 -0700)]
rgw: don't allow multiple writers to same multiobject part

Fixes: #8269
A client might need to retry a multipart part write. The original thread
might race with the new one, trying to clean up after it, clobbering the
part's data.
The fix is to detect whether an original part already existed, and if so
use a different part name for it.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
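
The "use a different part name if the original already exists" idea can be sketched as below. This is a minimal Python illustration, not the rgw code: the naming scheme (`<upload_id>.<part_num>` plus a `.retryN` suffix) is entirely hypothetical and only shows how a retried write can avoid sharing an object with the original writer.

```python
import itertools


def part_object_name(existing, upload_id, part_num):
    """Pick an object name for a part write that no earlier attempt is using."""
    base = f"{upload_id}.{part_num}"
    if base not in existing:
        return base
    # A previous attempt already owns the base name; choose a fresh one so the
    # original writer's cleanup cannot clobber the retried part's data.
    for attempt in itertools.count(1):
        candidate = f"{base}.retry{attempt}"
        if candidate not in existing:
            return candidate
```

Because each attempt writes to its own name, a late cleanup by the first writer only touches its own object.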

11 years ago mon/PGMonitor: set tid on no-op PGStatsAck
Sage Weil [Fri, 2 May 2014 22:10:43 +0000 (15:10 -0700)]
mon/PGMonitor: set tid on no-op PGStatsAck

The OSD needs to know the tid.  Both generally, and specifically because
the flush_pg_stats may be blocking on it.

Fixes: #8280
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 5a6ae2a978dcaf96ef89de3aaa74fe951a64def6)

11 years ago 0.67.8 v0.67.8
Jenkins [Thu, 1 May 2014 11:18:24 +0000 (11:18 +0000)]
0.67.8

11 years ago Merge pull request #1743 from ceph/wip-mon-backports.dumpling
Sage Weil [Wed, 30 Apr 2014 22:15:48 +0000 (15:15 -0700)]
Merge pull request #1743 from ceph/wip-mon-backports.dumpling

mon: OSDMonitor: HEALTH_WARN on 'mon osd down out interval == 0'

Reviewed-by: Sage Weil <sage@inktank.com>

11 years ago mon: OSDMonitor: HEALTH_WARN on 'mon osd down out interval == 0' 1743/head
Joao Eduardo Luis [Wed, 30 Apr 2014 16:13:30 +0000 (17:13 +0100)]
mon: OSDMonitor: HEALTH_WARN on 'mon osd down out interval == 0'

A 'status' or 'health' request will return a HEALTH_WARN whenever the
monitor handling the request has the option set to zero.

Fixes: 7784
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit b2112d5087b449d3b019678cb266ff6fa897897e)

11 years ago Make symlink of librbd to qemu's folder so it can detect it.
Sandon Van Ness [Wed, 5 Mar 2014 00:15:15 +0000 (16:15 -0800)]
Make symlink of librbd to qemu's folder so it can detect it.

Per issue #7293.

Signed-off-by: Sandon Van Ness <sandon@inktank.com>
(cherry picked from commit 65f3354903fdbdb81468a84b8049ff19c00f91ba)

11 years ago rgw: fix url escaping
Yehuda Sadeh [Fri, 25 Apr 2014 21:11:27 +0000 (14:11 -0700)]
rgw: fix url escaping

Fixes: #8202
This fixes the radosgw side of issue #8202. Needed to cast value
to unsigned char, otherwise it'd get padded.

Backport: dumpling

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit bcf92c496aba0dfde432290fc2df5620a2767313)
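
The padding mentioned above is what one would expect from sign extension. The following minimal Python sketch emulates it; it assumes the value was a signed char widened to a 32-bit int before hex formatting, which the commit message does not spell out. Both function names are hypothetical.

```python
def escape_byte_signed(byte):
    """Emulate hex-formatting a byte held in a signed char widened to 32 bits."""
    signed = byte - 256 if byte >= 128 else byte  # reinterpret as signed char
    return "%%%X" % (signed & 0xFFFFFFFF)         # sign extension adds F digits


def escape_byte_unsigned(byte):
    """Emulate the same formatting after a cast to unsigned char."""
    return "%%%02X" % byte
```

For the byte 0xE9 the signed variant yields `%FFFFFFE9`, while the unsigned variant yields the expected `%E9`; bytes below 0x80 come out the same either way.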

11 years ago Merge pull request #1700 from xanpeng/patch-1
Sage Weil [Fri, 25 Apr 2014 23:00:24 +0000 (16:00 -0700)]
Merge pull request #1700 from xanpeng/patch-1

Fix error in mkcephfs.rst

Signed-off-by: Xan Peng <xanpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>

11 years ago Update mkcephfs.rst 1700/head
xanpeng [Mon, 21 Apr 2014 03:30:42 +0000 (11:30 +0800)]
Update mkcephfs.rst

There should be no spaces between mount options.

11 years ago auth: add rwlock to AuthClientHandler to prevent races
Josh Durgin [Wed, 2 Apr 2014 00:27:01 +0000 (17:27 -0700)]
auth: add rwlock to AuthClientHandler to prevent races

For cephx, build_authorizer reads a bunch of state (especially the
current session_key) which can be updated by the MonClient. With no
locks held, Pipe::connect() calls SimpleMessenger::get_authorizer()
which ends up calling RadosClient::get_authorizer() and then
AuthClientHandler::build_authorizer(). This unsafe usage can lead to
crashes like:

Program terminated with signal 11, Segmentation fault.
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
370 common/buffer.cc: No such file or directory.
in common/buffer.cc
(gdb) bt
0x00007fa0d2ddb7cb in ceph::buffer::ptr::release (this=0x7f987a5e3070) at common/buffer.cc:370
0x00007fa0d2ddec00 in ~ptr (this=0x7f989c03b830) at ./include/buffer.h:171
ceph::buffer::list::rebuild (this=0x7f989c03b830) at common/buffer.cc:817
0x00007fa0d2ddecb9 in ceph::buffer::list::c_str (this=0x7f989c03b830) at common/buffer.cc:1045
0x00007fa0d2ea4dc2 in Pipe::connect (this=0x7fa0c4307340) at msg/Pipe.cc:907
0x00007fa0d2ea7d73 in Pipe::writer (this=0x7fa0c4307340) at msg/Pipe.cc:1518
0x00007fa0d2eb44dd in Pipe::Writer::entry (this=<value optimized out>) at msg/Pipe.h:59
0x00007fa0e0f5f9d1 in start_thread (arg=0x7f987a5e4700) at pthread_create.c:301
0x00007fa0de560b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

and

Error in `qemu-system-x86_64': invalid fastbin entry (free): 0x00007ff12887ff20
*** ======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7ff3dea1fa46]
/usr/lib/librados.so.2(+0x29eb03)[0x7ff3e3d43b03]
/usr/lib/librados.so.2(_ZNK9CryptoKey7encryptEP11CephContextRKN4ceph6buffer4listERS4_RSs+0x71)[0x7ff3e3d42661]
/usr/lib/librados.so.2(_Z21encode_encrypt_enc_blIN4ceph6buffer4listEEvP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xfe)[0x7ff3e3d417de]
/usr/lib/librados.so.2(_Z14encode_encryptIN4ceph6buffer4listEEiP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xa2)[0x7ff3e3d41912]
/usr/lib/librados.so.2(_ZN19CephxSessionHandler12sign_messageEP7Message+0x242)[0x7ff3e3d40de2]
/usr/lib/librados.so.2(_ZN4Pipe6writerEv+0x92b)[0x7ff3e3e61b2b]
/usr/lib/librados.so.2(_ZN4Pipe6Writer5entryEv+0xd)[0x7ff3e3e6c7fd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7ff3ded6ff8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff3dea99a0d]

Fix this by adding an rwlock to AuthClientHandler. A simpler fix would
be to move RadosClient::get_authorizer() into the MonClient() under
the MonClient lock, but this would not catch all uses of other
Authorizer, e.g. for verify_authorizer() and it would serialize
independent connection attempts.

This mainly matters for cephx, but none and unknown can have the
global_id reset as well.

Partially-fixes: #6480
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 2cc76bcd12d803160e98fa73810de2cb916ef1ff)

11 years ago pipe: only read AuthSessionHandler under pipe_lock
Josh Durgin [Tue, 1 Apr 2014 18:37:29 +0000 (11:37 -0700)]
pipe: only read AuthSessionHandler under pipe_lock

session_security, the AuthSessionHandler for a Pipe, is deleted and
recreated while the pipe_lock is held. read_message() is called
without pipe_lock held, and examines session_security. To make this
safe, make session_security a shared_ptr and take a reference to it
while the pipe_lock is still held, and use that shared_ptr in
read_message().

This may have caused crashes like:

*** Error in `qemu-system-x86_64': invalid fastbin entry (free): 0x00007f42a4002de0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x80a46)[0x7f452f1f3a46]
/usr/lib/x86_64-linux-gnu/libnss3.so(PK11_FreeSymKey+0xa8)[0x7f452e72ff98]
/usr/lib/librados.so.2(+0x2a18cd)[0x7f453451a8cd]
/usr/lib/librados.so.2(_ZNK9CryptoKey7encryptEP11CephContextRKN4ceph6buffer4listERS4_RSs+0x71)[0x7f4534519421]
/usr/lib/librados.so.2(_Z21encode_encrypt_enc_blIN4ceph6buffer4listEEvP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xfe)[0x7f453451859e]
/usr/lib/librados.so.2(_Z14encode_encryptIN4ceph6buffer4listEEiP11CephContextRKT_RK9CryptoKeyRS2_RSs+0xa2)[0x7f45345186d2]
/usr/lib/librados.so.2(_ZN19CephxSessionHandler23check_message_signatureEP7Message+0x246)[0x7f4534516866]
/usr/lib/librados.so.2(_ZN4Pipe12read_messageEPP7Message+0xdcc)[0x7f453462ecbc]
/usr/lib/librados.so.2(_ZN4Pipe6readerEv+0xa5c)[0x7f453464059c]
/usr/lib/librados.so.2(_ZN4Pipe6Reader5entryEv+0xd)[0x7f4534643ecd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e)[0x7f452f543f8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f452f26da0d]

Partially-fixes: #6480
Backport: dumpling, emperor
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 1d74170a4c252f35968ccfbec8e432582e92f638)
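
The pattern of taking a reference while the lock is held and using it after the lock is released can be sketched as below. This is a minimal Python illustration, not the Pipe code: Python object references stand in for the shared_ptr, and the handler is modeled as a plain callable, which is an assumption made for the example.

```python
import threading


class Pipe:
    def __init__(self, handler):
        self.pipe_lock = threading.Lock()
        self.session_security = handler

    def replace_handler(self, handler):
        # The handler is only swapped while pipe_lock is held.
        with self.pipe_lock:
            self.session_security = handler

    def read_message(self, payload):
        with self.pipe_lock:
            handler = self.session_security  # take our own reference under the lock
        # The lock is released here. The local reference keeps the old handler
        # alive even if replace_handler() runs concurrently.
        return handler(payload)
```

The reader never touches `self.session_security` outside the lock, so a concurrent swap cannot leave it holding a destroyed object.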

11 years ago osdc/ObjectCacher: back off less during flush
Sage Weil [Fri, 3 Jan 2014 20:51:15 +0000 (12:51 -0800)]
osdc/ObjectCacher: back off less during flush

In cce990efc8f2a58c8d0fa11c234ddf2242b1b856 we added a limit to avoid
holding the lock for too long.  However, if we back off, we currently
wait for a full second, which is probably a bit much--we really just want
to give other threads a chance.

Backport: emperor
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e2ee52879e9de260abbf5eacbdabbd71973a6a83)

11 years ago osdc/ObjectCacher: limit writeback IOs generated while holding lock
Sage Weil [Tue, 1 Oct 2013 16:28:29 +0000 (09:28 -0700)]
osdc/ObjectCacher: limit writeback IOs generated while holding lock

While analyzing a log from Mike Dawson I saw a long stall while librbd's
objectcacher was starting lots (many hundreds) of IOs.  Limit the amount of
time we spend doing this at a time to allow IO replies to be processed so
that the cache remains responsive.

I'm not sure this warrants a tunable (which we would need to add for both
libcephfs and librbd).

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit cce990efc8f2a58c8d0fa11c234ddf2242b1b856)

11 years ago os/FileStore: reset journal state on umount
Sage Weil [Tue, 8 Apr 2014 17:52:43 +0000 (10:52 -0700)]
os/FileStore: reset journal state on umount

We observed a sequence like:

 - replay journal
   - sets JournalingObjectStore applied_op_seq
 - umount
 - mount
   - initiate commit with previous applied_op_seq
 - replay journal
   - commit finishes
   - on replay commit, we fail assert op > committed_seq

Although strictly speaking the assert failure is harmless here, in general
we should not let state leak through from a previous mount into this
mount or else assertions are in general more difficult to reason about.

Fixes: #8019
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4de49e8676748b6ab4716ff24fd0a465548594fc)
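
The principle that no state should leak from one mount into the next can be sketched as below. This is a toy Python model, not FileStore: the class and its `replay` method are hypothetical, and only the two sequence counters named in the commit message are modeled.

```python
class JournalingStore:
    """Toy model of sequence numbers that must not survive an unmount."""

    def __init__(self):
        self.applied_op_seq = 0
        self.committed_seq = 0

    def replay(self, last_seq):
        # Replaying the journal advances the applied sequence number.
        self.applied_op_seq = last_seq

    def umount(self):
        # Reset everything so the next mount starts from a clean slate and
        # assertions only have to reason about the current mount.
        self.applied_op_seq = 0
        self.committed_seq = 0
```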

11 years ago rgw: deny writes to a secondary zone by non-system users
Yehuda Sadeh [Tue, 5 Nov 2013 22:54:20 +0000 (14:54 -0800)]
rgw: deny writes to a secondary zone by non-system users

Fixes: #6678
We don't want to allow regular users to write to secondary zones,
otherwise we'd end up with data inconsistencies.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 6961b5254f16ac3362c3a51f5490328d23640dbf)

Conflicts:
src/rgw/rgw_rados.h

11 years ago mon: wait for quorum for MMonGetVersion
Sage Weil [Sat, 5 Apr 2014 23:58:55 +0000 (16:58 -0700)]
mon: wait for quorum for MMonGetVersion

We should not respond to checks for map versions when we are in the
probing or electing states or else clients will get incorrect results when
they ask what the latest map version is.

Fixes: #7997
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 67fd4218d306c0d2c8f0a855a2e5bf18fa1d659e)
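
Deferring version queries until the monitor is in quorum can be sketched as below. This is a minimal Python illustration, not the monitor code: the class, the state strings, and the callback-style replies are assumptions made for the example.

```python
class Monitor:
    def __init__(self):
        self.state = "probing"
        self.latest_version = 0
        self.waiting = []  # replies deferred until we are in quorum

    def handle_get_version(self, reply):
        if self.state in ("probing", "electing"):
            self.waiting.append(reply)  # answering now could return a stale version
            return
        reply(self.latest_version)

    def finish_election(self, latest_version):
        self.state = "in_quorum"
        self.latest_version = latest_version
        waiting, self.waiting = self.waiting, []
        for reply in waiting:
            reply(self.latest_version)
```

A client that asks during an election simply gets its answer later, once the version is known to be current.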

11 years ago rgw: fix swift range response
Yehuda Sadeh [Wed, 19 Feb 2014 16:59:07 +0000 (08:59 -0800)]
rgw: fix swift range response

Fixes: #7099
Backport: dumpling
The range response header was broken in swift.

Reported-by: Julien Calvet <julien.calvet@neurea.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 0427f61544529ab4e0792b6afbb23379fe722de1)

11 years ago rgw: don't log system requests in usage log
Yehuda Sadeh [Fri, 22 Nov 2013 23:41:49 +0000 (15:41 -0800)]
rgw: don't log system requests in usage log

Fixes: 6889
System requests should not be logged in the usage log.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 42ef8ba543c7bf13c5aa3b6b4deaaf8a0f9c58b6)

11 years ago OSD: _share_map_outgoing whenever sending a message to a peer
Greg Farnum [Fri, 4 Apr 2014 23:06:05 +0000 (16:06 -0700)]
OSD: _share_map_outgoing whenever sending a message to a peer

This ensures that they get new maps before an op which requires them (that
they would then request from the monitor).

Signed-off-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 232ac1a52a322d163d8d8dbc4a7da4b6a9acb709)

11 years ago msgr: fix rebind() race
Xihui He [Mon, 30 Dec 2013 04:04:10 +0000 (12:04 +0800)]
msgr: fix rebind() race

Stop the accepter and mark all pipes down before rebinding to avoid a race.

Fixes: #6992
Signed-off-by: Xihui He <xihuihe@gmail.com>
(cherry picked from commit f8e413f9c79a3a2a12801f5f64a2f612de3f06a0)

11 years ago PG: retry GetLog() each time we get a notify in Incomplete
Samuel Just [Tue, 26 Nov 2013 21:20:21 +0000 (13:20 -0800)]
PG: retry GetLog() each time we get a notify in Incomplete

If for some reason there are no up OSDs in the history which
happen to have usable copies of the pg, it's possible that
there is a usable copy elsewhere on the cluster which will
become known to the primary if it waits.

Fixes: #6909
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 964c8e978f86713e37a13b4884a6c0b9b41b5bae)

11 years ago os/FileJournal: return errors on make_writeable() if reopen fails
Sage Weil [Mon, 17 Mar 2014 22:37:44 +0000 (15:37 -0700)]
os/FileJournal: return errors on make_writeable() if reopen fails

This is why #7738 is resulting in a crash instead of an error.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit aed074401d2834a5b04edd1b7f6b4f36336f6293)

11 years ago mon/Paxos: commit only after entire quorum acks
Sage Weil [Mon, 17 Mar 2014 23:21:17 +0000 (16:21 -0700)]
mon/Paxos: commit only after entire quorum acks

If a subset of the quorum accepts the proposal and we commit, we will start
sharing the new state.  However, the mon that didn't yet reply with the
accept may still be sharing the old and stale value.

The simplest way to prevent this is not to commit until the entire quorum
replies.  In the general case, there are no failures and this is just fine.
In the failure case, we will call a new election and have a smaller quorum
of (live) nodes and will recommit the same value.

A more performant solution would be to have a separate message invalidate
the old state and commit once we have all invalidations and a majority of
accepts.  This will lower latency a bit in the non-failure case, but not
change the failure case significantly.  Later!

Fixes: #7736
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit fa1d957c115a440e162dba1b1002bc41fc1eac43)
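
A minimal sketch of the commit rule described above, not the monitor's actual Paxos code. The function names and the set-based representation are illustrative assumptions. It contrasts a majority-based check with the stricter rule that waits for every quorum member to accept.

```python
def can_commit_majority(quorum, accepted):
    """Old rule (illustrative): commit once a majority of the quorum accepted."""
    return len(accepted & quorum) > len(quorum) // 2


def can_commit_full_quorum(quorum, accepted):
    """New rule (illustrative): commit only after every quorum member accepted."""
    return quorum <= accepted


quorum = {"a", "b", "c"}
accepted = {"a", "b"}  # "c" has not replied and may still share the old value

print(can_commit_majority(quorum, accepted))     # True
print(can_commit_full_quorum(quorum, accepted))  # False
```

In the failure case, a new election would shrink `quorum` to the live nodes, after which the full-quorum check can pass again.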

11 years ago PrioritizedQueue: cap costs at max_tokens_per_subqueue
Samuel Just [Thu, 13 Mar 2014 21:04:19 +0000 (14:04 -0700)]
PrioritizedQueue: cap costs at max_tokens_per_subqueue

Otherwise, you can get a recovery op in the queue which has a cost
higher than the max token value.  It won't get serviced until all other
queues also do not have enough tokens and higher priority queues are
empty.

Fixes: #7706
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 2722a0a487e77ea2aa0d18caec0bdac50cb6a264)
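
The starvation case can be illustrated with a small token-bucket queue. This is a hedged sketch, not Ceph's PrioritizedQueue: the class `SubQueue` and its methods are hypothetical, and only the cap on the enqueued cost reflects the change described above.

```python
from collections import deque


class SubQueue:
    """Hypothetical token-bucket subqueue; not Ceph's PrioritizedQueue."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.tokens = max_tokens
        self.items = deque()

    def enqueue(self, cost, item):
        # Cap the cost so a full bucket can always pay for the front item.
        self.items.append((min(cost, self.max_tokens), item))

    def try_dequeue(self):
        """Return the front item if there are enough tokens, else None."""
        if not self.items or self.items[0][0] > self.tokens:
            return None
        cost, item = self.items.popleft()
        self.tokens -= cost
        return item

    def refill(self, amount):
        # Tokens never exceed max_tokens.
        self.tokens = min(self.max_tokens, self.tokens + amount)


q = SubQueue(max_tokens=4)
q.enqueue(10, "recovery-op")  # stored with cost 4 instead of 10
print(q.try_dequeue())        # recovery-op
```

Without the `min()` in `enqueue`, the item would carry a cost of 10 against a bucket that can never hold more than 4 tokens, so `try_dequeue` would keep returning None.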

11 years ago Fix byte-order dependency in calculation of initial challenge
Dan Mick [Thu, 3 Apr 2014 20:59:59 +0000 (13:59 -0700)]
Fix byte-order dependency in calculation of initial challenge

Fixes: #7977
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4dc62669ecd679bc4d0ef2b996b2f0b45b8b4dc7)

11 years ago rbd: return 0 and an empty list when pool is entirely empty
Josh Durgin [Wed, 1 Jan 2014 01:00:06 +0000 (17:00 -0800)]
rbd: return 0 and an empty list when pool is entirely empty

rbd_list will return -ENOENT when no rbd_directory object
exists. Handle this in the cli tool and interpret it as success with
an empty list.

Add this to the release notes since it changes command line behavior.

Fixes: #6693
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit ac547a5b7dc94282f079aef78e66348d99d9d5e9)

Conflicts:
PendingReleaseNotes
src/rbd.cc
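
The error mapping described in this commit might look roughly like the following. This is an illustrative sketch only: `list_images` and the `rbd_list` callable (returning a `(ret, names)` pair) are assumptions, not the real librbd or CLI interface.

```python
import errno


def list_images(rbd_list):
    """Call rbd_list() and map -ENOENT to an empty result.

    rbd_list is a hypothetical callable returning (ret, names), where ret is
    0 on success or a negative errno value on failure.
    """
    ret, names = rbd_list()
    if ret == -errno.ENOENT:
        # No rbd_directory object yet: the pool simply has no images.
        return 0, []
    return ret, names


def empty_pool():
    """Stand-in for a pool without an rbd_directory object."""
    return -errno.ENOENT, None


print(list_images(empty_pool))  # (0, [])
```

Other error codes pass through unchanged, so only the empty-pool case changes command-line behavior.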

11 years ago test: use older names for module setup/teardown
Josh Durgin [Thu, 21 Nov 2013 02:35:34 +0000 (18:35 -0800)]
test: use older names for module setup/teardown

setUp and tearDown require nosetests 0.11, but 0.10.4 is the latest on
centos. Rename to use the older aliases, which still work with newer
versions of nosetests as well.

Fixes: #6368
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit f753d56a9edba6ce441520ac9b52b93bd8f1b5b4)

11 years ago OSD: don't clear peering_wait_for_split in advance_map()
Samuel Just [Sun, 3 Nov 2013 19:06:10 +0000 (11:06 -0800)]
OSD: don't clear peering_wait_for_split in advance_map()

I really don't know why I added this...  Ops can be discarded from the
waiting_for_pg queue if we aren't primary simply because there must have
been an exchange of peering events before subops will be sent within a
particular epoch.  Thus, any events in the waiting_for_pg queue must be
client ops which should only be seen by the primary.  Peering events, on
the other hand, should only be discarded if we are in a new interval,
and that check might as well be performed in the peering wq.

Fixes: #6681
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 9ab513334c7ff9544bac07bd420c6d5d200cf535)

11 years ago Merge remote-tracking branch 'gh/wip-7888-dumpling' into dumpling
Sage Weil [Wed, 2 Apr 2014 19:57:30 +0000 (12:57 -0700)]
Merge remote-tracking branch 'gh/wip-7888-dumpling' into dumpling

11 years ago PG: fix operator<<,log_wierdness log bound warning
Samuel Just [Wed, 6 Nov 2013 05:48:53 +0000 (21:48 -0800)]
PG: fix operator<<,log_wierdness log bound warning

Split may cause holes such that head != tail and yet
log.empty().

Fixes: #6722
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit c6826c1e8a301b2306530c6e5d0f4a3160c4e691)

11 years ago PGLog::rewind_divergent_log: log may not contain newhead
Samuel Just [Wed, 6 Nov 2013 01:47:48 +0000 (17:47 -0800)]
PGLog::rewind_divergent_log: log may not contain newhead

Due to split, there may be a hole at newhead.

Fixes: #6722
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit f4648bc6fec89c870e0c47b38b2f13496742b10f)

11 years ago qa/workunits/fs/misc/layout_vxattrs: ceph.file.layout is not listed
Sage Weil [Sat, 29 Mar 2014 21:23:21 +0000 (14:23 -0700)]
qa/workunits/fs/misc/layout_vxattrs: ceph.file.layout is not listed

As of 08a3d6bd428c5e78dd4a10e6ee97540f66f9729c.  A similar change was made
in the kernel.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4f9f7f878953b29cd5f56a8e0834832d6e3a9cec)

11 years ago Merge pull request #1519 from ceph/wip-6951-dumpling
Sage Weil [Sat, 29 Mar 2014 01:01:08 +0000 (18:01 -0700)]
Merge pull request #1519 from ceph/wip-6951-dumpling

rgw: reset objv tracker on bucket recreation

11 years ago Merge pull request #1559 from ceph/wip-7881-dumpling
Sage Weil [Sat, 29 Mar 2014 00:02:39 +0000 (17:02 -0700)]
Merge pull request #1559 from ceph/wip-7881-dumpling

Wip 7881 dumpling

Reviewed-by: Sage Weil <sage@inktank.com>

11 years ago mon/MonClient: use keepalive2 to verify the mon session is live
Sage Weil [Fri, 28 Mar 2014 04:33:21 +0000 (21:33 -0700)]
mon/MonClient: use keepalive2 to verify the mon session is live

Verify that the mon is responding by checking the keepalive2 reply
timestamp.  We cannot rely solely on TCP timing out and returning an
error.

Fixes: #7888
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 056151a6334c054505c54e59af40f203a0721f28)

11 years ago msgr: add KEEPALIVE2 feature
Sage Weil [Fri, 28 Mar 2014 04:09:13 +0000 (21:09 -0700)]
msgr: add KEEPALIVE2 feature

This is similar to KEEPALIVE, except a timestamp is also exchanged.  It is
sent with the KEEPALIVE, and then returned with the ACK.  The last
received stamp is stored in the Connection so that it can be queried for
liveness.  Since all of the users of keepalive are already regularly
triggering a keepalive, they can check the liveness at the same time.

See #7888.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d747d79fd5ea8662a809c5636dfd2eaaa9bf8f5d)

Conflicts:

src/include/ceph_features.h
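
A rough sketch of the timestamp exchange described above, assuming a hypothetical `Connection` class. The real messenger types, wire format, and timeouts are not shown. The sender stamps each keepalive, the peer echoes the stamp in its ack, and the connection keeps the last echoed stamp for liveness queries.

```python
import time


class Connection:
    """Hypothetical stand-in for a messenger connection."""

    def __init__(self):
        self.last_keepalive_ack = None  # stamp echoed by the peer

    def handle_keepalive_ack(self, stamp):
        # The peer returns the timestamp that was sent with the keepalive.
        self.last_keepalive_ack = stamp

    def is_live(self, now, timeout):
        """True if an ack stamped within the last `timeout` seconds was seen."""
        if self.last_keepalive_ack is None:
            return False
        return now - self.last_keepalive_ack <= timeout


def peer_echo(stamp):
    """A responsive peer simply sends the stamp back."""
    return stamp


conn = Connection()
sent = time.time()
conn.handle_keepalive_ack(peer_echo(sent))
print(conn.is_live(now=sent + 5, timeout=30))   # True
print(conn.is_live(now=sent + 60, timeout=30))  # False
```

A caller that already sends keepalives on a timer can call `is_live` at the same point, which avoids relying on TCP to report a dead peer.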

11 years ago Pipe: rename keepalive->send_keepalive
Greg Farnum [Wed, 26 Mar 2014 22:58:10 +0000 (15:58 -0700)]
Pipe: rename keepalive->send_keepalive

Signed-off-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 38d4c71a456c1cc9a5044dbcae5378836a34484d)

11 years ago client: pin Inode during readahead
Sage Weil [Thu, 27 Mar 2014 04:52:00 +0000 (21:52 -0700)]
client: pin Inode during readahead

Make sure the Inode does not go away while a readahead is in progress.  In
particular:

 - read_async
   - start a readahead
   - get actual read from cache, return
 - close/release
   - call ObjectCacher::release_set() and get unclean > 0, assert

Fixes: #7867
Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f1c7b4ef0cd064a9cb86757f17118d17913850db)

11 years ago osdc/ObjectCacher: call read completion even when no target buffer
Sage Weil [Fri, 28 Mar 2014 19:34:07 +0000 (12:34 -0700)]
osdc/ObjectCacher: call read completion even when no target buffer

If we do not assemble a target bl, we still want to return a valid return
code with the number of bytes read-ahead so that the C_RetryRead completion
will see this as a finish and call the caller's provided Context.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 032d4ec53e125ad91ad27ce58da6f38dcf1da92e)

11 years ago PGLog: remove obsolete assert in merge_log 1559/head
Samuel Just [Wed, 30 Oct 2013 23:54:39 +0000 (16:54 -0700)]
PGLog: remove obsolete assert in merge_log

This assert assumes that if olog.head != log.head, olog contains
a log entry at log.head, which may not be true since pg splitting
might have left the log with arbitrary holes.

Related: 0c2769d3321bff6e85ec57c85a08ee0b8e751bcb
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 353813b2e1a98901b876790c7c531f8a202c661d)

11 years ago PGLog: on split, leave log head alone
Samuel Just [Mon, 30 Sep 2013 22:54:27 +0000 (15:54 -0700)]
PGLog: on split, leave log head alone

This way last_update doesn't go backwards.

Fixes: 6447
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 0c2769d3321bff6e85ec57c85a08ee0b8e751bcb)

11 years ago Merge pull request #1539 from ceph/wip-6910-dumpling
Sage Weil [Thu, 27 Mar 2014 00:18:24 +0000 (17:18 -0700)]
Merge pull request #1539 from ceph/wip-6910-dumpling

PG: don't query unfound on empty pgs

11 years ago PG: don't query unfound on empty pgs 1539/head
Samuel Just [Wed, 27 Nov 2013 03:17:59 +0000 (19:17 -0800)]
PG: don't query unfound on empty pgs

When the replica responds, it responds with a notify
rather than a log, which the primary then ignores since
it is already in the peer_info map.  Rather than fix that,
we'll simply not send queries to peers we already know to
have no unfound objects.

Fixes: #6910
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit 838b6c8387087543ce50837277f7f6b52ae87d00)

11 years ago Merge pull request #1313 from ceph/dumpling-osd-subscribe
Sage Weil [Fri, 21 Mar 2014 21:53:23 +0000 (14:53 -0700)]
Merge pull request #1313 from ceph/dumpling-osd-subscribe

Dumpling backport: clean up osd subscriptions

11 years ago Merge pull request #1485 from ceph/wip-7212.dumpling
Sage Weil [Fri, 21 Mar 2014 21:52:20 +0000 (14:52 -0700)]
Merge pull request #1485 from ceph/wip-7212.dumpling

backport 7212 fixes to dumpling

11 years ago rgw: reset objv tracker on bucket recreation 1519/head
Yehuda Sadeh [Wed, 19 Feb 2014 16:11:56 +0000 (08:11 -0800)]
rgw: reset objv tracker on bucket recreation

Fixes: #6951
If we cannot create a new bucket (because it already exists), we need to
read the old bucket's info. However, this was failing because we were still
holding the objv tracker that we created for the bucket creation. We need to
clear it, as any subsequent read using it will fail.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 859ed33ed7f9a96f4783dfb3e130d5eb60c622dd)

11 years ago ReplicatedPG: don't skip missing if sentries is empty on pgls
Samuel Just [Wed, 6 Nov 2013 22:33:03 +0000 (14:33 -0800)]
ReplicatedPG: don't skip missing if sentries is empty on pgls

Formerly, if sentries is empty, we skip missing.  In general,
we need to continue adding items from missing until we get
to next (returned from collection_list_partial) to avoid
missing any objects.

Fixes: #6633
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit c7a30b881151e08b37339bb025789921e7115288)

11 years ago mon/Elector: bootstrap on timeout 1485/head
Sage Weil [Sat, 15 Feb 2014 16:59:51 +0000 (08:59 -0800)]
mon/Elector: bootstrap on timeout

Currently if an election times out we call a new
election.  If we have never joined a quorum, bootstrap
instead. This is heavier weight, but captures the case
where, during bootstrap:

 - a and b have learned each other's addresses
 - everybody calls an election
 - a and b form a quorum
 - c loops trying to call an election, but is ignored
   because a and b don't see its address in the monmap

See logs:
  ubuntu@teuthology:/var/lib/teuthworker/archive/sage-2014-02-14_13:50:04-ceph-deploy-wip-7212-sage-b-testing-basic-plana/83194

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a4bcb1f8129a4ece97bd3419abf1ff45d260ad8e)
(cherry picked from commit 143ec0281aa8b640617a3fe19a430248ce3b514c)

11 years ago mon: tell MonmapMonitor first about winning an election
Sage Weil [Fri, 14 Feb 2014 19:25:52 +0000 (11:25 -0800)]
mon: tell MonmapMonitor first about winning an election

It is important in the bootstrap case that the very first paxos round
also codify the contents of the monmap itself in order to avoid any manner
of confusing scenarios where subsequent elections are called and people
try to recover and modify paxos without agreeing on who the quorum
participants are.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ad7f5dd481a7f45dfe6b50d27ad45abc40950510)
(cherry picked from commit e073a062d56099b5fb4311be2a418f7570e1ffd9)

11 years ago mon: only learn peer addresses when monmap == 0
Sage Weil [Fri, 14 Feb 2014 19:13:26 +0000 (11:13 -0800)]
mon: only learn peer addresses when monmap == 0

It is only safe to dynamically update the address for a peer mon in our
monmap if we are in the midst of the initial quorum formation (i.e.,
monmap.epoch == 0).  If it is a later epoch, we have formed our initial
quorum and any and all monmap changes need to be agreed upon by the quorum
and committed via paxos.

Fixes: #7212
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7bd2104acfeff0c9aa5e648d82ed372f901f767f)
(cherry picked from commit 1996fd89fb3165a63449b135e05841579695aabd)
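
The guard described above can be sketched as follows. `MonMap` and `maybe_learn_peer_addr` are simplified, hypothetical stand-ins rather than the monitor's real types. Only the `epoch == 0` condition is taken from the commit message.

```python
class MonMap:
    """Hypothetical, simplified monitor map."""

    def __init__(self, epoch, addrs):
        self.epoch = epoch
        self.addrs = dict(addrs)  # monitor name -> address


def maybe_learn_peer_addr(monmap, name, addr):
    """Update a peer address in place only during initial quorum formation."""
    if monmap.epoch != 0:
        # A quorum has already formed: changes must go through paxos instead.
        return False
    monmap.addrs[name] = addr
    return True


initial = MonMap(epoch=0, addrs={"a": "0.0.0.0:0"})
print(maybe_learn_peer_addr(initial, "a", "10.0.0.1:6789"))  # True

formed = MonMap(epoch=3, addrs={"a": "10.0.0.1:6789"})
print(maybe_learn_peer_addr(formed, "a", "10.0.0.9:6789"))   # False
```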

11 years ago ceph.in: do not allow using 'tell' with interactive mode
Joao Eduardo Luis [Mon, 17 Mar 2014 14:37:09 +0000 (14:37 +0000)]
ceph.in: do not allow using 'tell' with interactive mode

This avoids the hassle of working out which target each command should be
sent to in interactive mode, even more so if multiple targets are specified.

Instead, 'tell' commands should be issued from within interactive mode.

Backport: dumpling,emperor

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit e39c213c1d230271d23b74086664c2082caecdb9)

11 years ago RGWListBucketMultiparts: init max_uploads/default_max with 0
Danny Al-Gaaf [Wed, 12 Mar 2014 21:56:44 +0000 (22:56 +0100)]
RGWListBucketMultiparts: init max_uploads/default_max with 0

CID 717377 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
 2. uninit_member: Non-static class member "max_uploads" is not initialized
    in this constructor nor in any functions that it calls.
 4. uninit_member: Non-static class member "default_max" is not initialized
    in this constructor nor in any functions that it calls.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit b23a141d54ffb39958aba9da7f87544674fa0e50)

11 years ago ceph_test_rados: wait for commit, not ack
Sage Weil [Thu, 13 Mar 2014 21:49:30 +0000 (14:49 -0700)]
ceph_test_rados: wait for commit, not ack

First, this is what we wanted in the first place.

Second, if we wait for ACK, we may look at a user_version value that is
not stable.

Fixes: #7705
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f2124c5846f1e9cb44e66eb2e957b8c7df3e19f4)

Conflicts:

src/test/osd/RadosModel.h

11 years ago test-upgrade-firefly: skip watch-notify system test
Josh Durgin [Thu, 13 Mar 2014 16:50:16 +0000 (09:50 -0700)]
test-upgrade-firefly: skip watch-notify system test

This also fails on mixed version clusters due to watch on a
non-existent object returning ENOENT in firefly and 0 in dumpling.

Reviewed-by: Sage Weil <sage.weil@inktank.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>

11 years ago qa/workunit/rados/test-upgrade-firefly: skip watch-notify test
Sage Weil [Thu, 13 Mar 2014 04:30:12 +0000 (21:30 -0700)]
qa/workunit/rados/test-upgrade-firefly: skip watch-notify test

A watch on a non-existent object now returns ENOENT in firefly; skip this
test as it will fail on a hybrid or upgraded cluster.

Signed-off-by: Sage Weil <sage@inktank.com>

11 years ago Merge pull request #1411 from ceph/wip-7076-dumpling
Sage Weil [Wed, 12 Mar 2014 04:33:40 +0000 (21:33 -0700)]
Merge pull request #1411 from ceph/wip-7076-dumpling

dumpling backport of watchers check for rbd_remove()

11 years ago rgw: off-by-one in rgw_trim_whitespace()
Ray Lv [Wed, 26 Feb 2014 13:17:32 +0000 (21:17 +0800)]
rgw: off-by-one in rgw_trim_whitespace()

Fixes: #7543
Backport: dumpling

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Ray Lv <raylv@yahoo-inc.com>
(cherry picked from commit 195d53a7fc695ed954c85022fef6d2a18f68fe20)

11 years ago rbd: check for watchers before trimming an image on 'rbd rm' 1411/head
Ilya Dryomov [Wed, 29 Jan 2014 14:12:01 +0000 (16:12 +0200)]
rbd: check for watchers before trimming an image on 'rbd rm'

Check for watchers before trimming image data to try to avoid getting
into the following situation:

  - user does 'rbd rm' on a mapped image with an fs mounted from it
  - 'rbd rm' trims (removes) all image data, only header is left
  - 'rbd rm' tries to remove a header and fails because krbd has a
    watcher registered on the header
  - at this point image cannot be unmapped because of the mounted fs
  - fs cannot be unmounted because all its data and metadata are gone

Unfortunately, this fix cannot rule the situation out entirely (the
required atomicity isn't there), but it's a big improvement over the
status quo.

Fixes: http://tracker.ceph.com/issues/7076
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
(cherry picked from commit 0a553cfa81b06e75585ab3c39927e307ec0f4cb6)
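
The ordering described above, sketched with hypothetical callables. `remove_image`, `ImageBusy`, and the three function arguments are illustrative assumptions and not the rbd tool's real interface. The point is only that the watcher check runs before any data is trimmed.

```python
class ImageBusy(Exception):
    """Raised when an image still has watchers (hypothetical error type)."""


def remove_image(list_watchers, trim_data, remove_header):
    """Remove an image, refusing early if anyone is still watching it.

    All three arguments are hypothetical callables standing in for the
    real operations.
    """
    if list_watchers():
        # Bail out before any data is destroyed.
        raise ImageBusy("image still has watchers")
    # A watcher could still appear after this point; the check is not atomic.
    trim_data()
    remove_header()


steps = []
try:
    remove_image(lambda: ["krbd client"],
                 lambda: steps.append("trim"),
                 lambda: steps.append("remove header"))
except ImageBusy as exc:
    print("refused:", exc)
print(steps)  # []
```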

11 years ago Merge pull request #1407 from dachary/wip-7188-dumpling
Sage Weil [Sun, 9 Mar 2014 17:56:31 +0000 (10:56 -0700)]
Merge pull request #1407 from dachary/wip-7188-dumpling

common: ping existing admin socket before unlink (dumpling)

Reviewed-by: Sage Weil <sage@inktank.com>