git.apps.os.sepia.ceph.com Git - ceph.git/log
9 years ago  librbd: Fix rebase with new io flow  [5465/head]
Haomai Wang [Fri, 20 Nov 2015 04:27:27 +0000 (12:27 +0800)]
librbd: Fix rebase with new io flow

Signed-off-by: Haomai Wang <haomai@xsky.com>
9 years ago  librbd: fix lttng tracing argument mismatch
Haomai Wang [Fri, 20 Nov 2015 04:27:18 +0000 (12:27 +0800)]
librbd: fix lttng tracing argument mismatch

Signed-off-by: Haomai Wang <haomai@xsky.com>
9 years ago  librbd: Add set_event_notify to AioImageRequestWQ
Haomai Wang [Wed, 18 Nov 2015 07:43:51 +0000 (15:43 +0800)]
librbd: Add set_event_notify to AioImageRequestWQ

Signed-off-by: Haomai Wang <haomai@xsky.com>
9 years ago  librbd: Remove unneeded set_event_notify
Haomai Wang [Mon, 9 Nov 2015 15:05:01 +0000 (23:05 +0800)]
librbd: Remove unneeded set_event_notify

The xlist clear() method invoked below is enough to disassociate the
item from the xlist, so the remove_myself() call in the destructor is
safe to invoke: the item has already been removed from the list.

Signed-off-by: Haomai Wang <haomai@xsky.com>
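A minimal Python model of the reasoning in the commit above, assuming an intrusive list in which each item records the list it is on. `XList`, `Item`, and `owner` are hypothetical stand-ins for the C++ xlist and its items; only `clear()` and `remove_myself()` are named in the commit. Because `clear()` unlinks every item, a later `remove_myself()` finds nothing to do.

```python
class XList:
    """Minimal intrusive list: each item records which list owns it."""

    def __init__(self):
        self._items = []

    def push_back(self, item):
        assert item.owner is None, "item is already on a list"
        item.owner = self
        self._items.append(item)

    def clear(self):
        # Disassociate every item so later remove_myself() calls are no-ops.
        for item in self._items:
            item.owner = None
        self._items.clear()

    def __len__(self):
        return len(self._items)


class Item:
    def __init__(self):
        self.owner = None  # the XList this item is linked into, if any

    def remove_myself(self):
        # Safe at any time: does nothing if clear() already unlinked us.
        if self.owner is not None:
            self.owner._items.remove(self)
            self.owner = None
```

Calling `remove_myself()` on an item after `clear()` leaves the list untouched, which is why the destructor path is safe.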
9 years ago  Makefile: Add noinst header files
Haomai Wang [Mon, 9 Nov 2015 15:01:45 +0000 (23:01 +0800)]
Makefile: Add noinst header files

Signed-off-by: Haomai Wang <haomai@xsky.com>
9 years ago  librbd: Make rbd header file use independent enum definition
Haomai Wang [Mon, 9 Nov 2015 14:58:32 +0000 (22:58 +0800)]
librbd: Make rbd header file use independent enum definition

Signed-off-by: Haomai Wang <haomai@xsky.com>
9 years ago  Librbd: Make AioCompletion complete doesn't unlock if callback
Haomai Wang [Sat, 7 Nov 2015 07:55:47 +0000 (15:55 +0800)]
Librbd: Make AioCompletion complete doesn't unlock if callback

Signed-off-by: Haomai Wang <haomai@xsky.com>
9 years ago  Librbd: fix return code of EventSocket init and notify
Haomai Wang [Sat, 7 Nov 2015 07:54:53 +0000 (15:54 +0800)]
Librbd: fix return code of EventSocket init and notify

Signed-off-by: Haomai Wang <haomai@xsky.com>
9 years ago  librbd: Add ictx check to avoid AIO_TYPE_NONE completion
Haomai Wang [Fri, 21 Aug 2015 06:54:56 +0000 (14:54 +0800)]
librbd: Add ictx check to avoid AIO_TYPE_NONE completion

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  librbd: normalize notify return code
Haomai Wang [Wed, 19 Aug 2015 08:15:40 +0000 (16:15 +0800)]
librbd: normalize notify return code

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  librbd: Fix incorrect api declaration
Haomai Wang [Thu, 6 Aug 2015 03:14:46 +0000 (11:14 +0800)]
librbd: Fix incorrect api declaration

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  librbd: check event_notify to avoid extra logic
Haomai Wang [Wed, 5 Aug 2015 14:59:32 +0000 (22:59 +0800)]
librbd: check event_notify to avoid extra logic

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  EventSocket: Add new event type pipe support
Haomai Wang [Tue, 4 Aug 2015 09:50:36 +0000 (17:50 +0800)]
EventSocket: Add new event type pipe support

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  test: Add tests for getting arg of completion
Haomai Wang [Tue, 4 Aug 2015 09:33:39 +0000 (17:33 +0800)]
test: Add tests for getting arg of completion

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  librbd: Add interface to let users get private data from comp
Haomai Wang [Tue, 4 Aug 2015 09:31:43 +0000 (17:31 +0800)]
librbd: Add interface to let users get private data from comp

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  tests: Add tests for user io event notify
Haomai Wang [Tue, 4 Aug 2015 09:22:56 +0000 (17:22 +0800)]
tests: Add tests for user io event notify

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  librbd: Add event notify interfaces
Haomai Wang [Tue, 4 Aug 2015 09:22:40 +0000 (17:22 +0800)]
librbd: Add event notify interfaces

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
9 years ago  EventSocket: Add EventSocket structure used for event notification
Haomai Wang [Tue, 4 Aug 2015 09:20:36 +0000 (17:20 +0800)]
EventSocket: Add EventSocket structure used for event notification

EventSocket wraps different user event notification methods, such as
Linux eventfd and Solaris ports. Callers can use it instead of signals.

Signed-off-by: Haomai Wang <haomaiwang@gmail.com>
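The EventSocket commits describe wrapping a platform notification primitive (eventfd, Solaris ports, or a pipe) behind one interface. The following is a hedged Python sketch of only the pipe variant; `PipeEventSocket` and its methods are hypothetical names, not Ceph's API. The notifier writes one byte, and the consumer polls the read end instead of relying on a signal.

```python
import os
import select


class PipeEventSocket:
    """Sketch of a pipe-backed notifier (a portable fallback to eventfd)."""

    def __init__(self):
        self.read_fd, self.write_fd = os.pipe()

    def notify(self):
        # Writing a single byte makes read_fd readable for any poller.
        os.write(self.write_fd, b"c")

    def wait(self, timeout):
        """Return True if a notification arrived within `timeout` seconds."""
        readable, _, _ = select.select([self.read_fd], [], [], timeout)
        if readable:
            os.read(self.read_fd, 4096)  # drain pending notification bytes
            return True
        return False

    def close(self):
        os.close(self.read_fd)
        os.close(self.write_fd)
```

In an event loop, `read_fd` would be registered alongside other descriptors, so a completion wakes the loop without signal handlers.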
9 years ago  Merge pull request #6533 from ghost/wip-fix-trivial-bug
Sage Weil [Thu, 26 Nov 2015 01:15:13 +0000 (20:15 -0500)]
Merge pull request #6533 from ghost/wip-fix-trivial-bug

osd: fix trivial scrub bug

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago  Merge pull request #6612 from H3C/wip-yrf-destroy_collection
Sage Weil [Thu, 26 Nov 2015 01:14:45 +0000 (20:14 -0500)]
Merge pull request #6612 from H3C/wip-yrf-destroy_collection

osd: fix FileStore::_destroy_collection error return code

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Haomai Wang <haomai@xsky.com>
9 years ago  Merge pull request #6660 from chengyli/master
Sage Weil [Thu, 26 Nov 2015 01:13:40 +0000 (20:13 -0500)]
Merge pull request #6660 from chengyli/master

mon: fix ceph df pool available calculation for 0-weighted OSDs

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago  Merge pull request #6473 from H3C/wip-osd-bugfix1
Sage Weil [Wed, 25 Nov 2015 22:46:01 +0000 (17:46 -0500)]
Merge pull request #6473 from H3C/wip-osd-bugfix1

auth: fail if rotating key is missing (do not spam log)

Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago  Merge pull request #6278 from XinzeChi/wip-failinfo-mon
Sage Weil [Wed, 25 Nov 2015 22:44:56 +0000 (17:44 -0500)]
Merge pull request #6278 from XinzeChi/wip-failinfo-mon

osd: cancel failure reports if we fail to rebind network

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago  Merge pull request #6675 from rohanmars/wip-aix-librados-port
Sage Weil [Wed, 25 Nov 2015 22:43:44 +0000 (17:43 -0500)]
Merge pull request #6675 from rohanmars/wip-aix-librados-port

aix gcc librados port

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago  aix shared library build  [6675/head]
Rohan Mars [Wed, 25 Nov 2015 23:30:31 +0000 (18:30 -0500)]
aix shared library build

Signed-off-by: Rohan Mars <code@rohanmars.com>
9 years ago  Merge branch 'wip-13800' of git://github.com/ukernel/ceph
Greg Farnum [Wed, 25 Nov 2015 22:21:38 +0000 (14:21 -0800)]
Merge branch 'wip-13800' of git://github.com/ukernel/ceph

client: fix deadlock related to async pagecache invalidation

Conflicts:
src/client/Client.cc
Fixed a conflict with the earlier page cache invalidate
changes in 73beb7f9378182cc3901fe86c4f1f5d5d98169a6.

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
9 years ago  Merge pull request #6454 from H3C/wip-mds
Gregory Farnum [Wed, 25 Nov 2015 22:10:58 +0000 (17:10 -0500)]
Merge pull request #6454 from H3C/wip-mds

mds: repair the command option "--hot-standby"

Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
9 years ago  Merge pull request #6269 from jcsp/wip-client-mark-down
Gregory Farnum [Wed, 25 Nov 2015 22:07:56 +0000 (17:07 -0500)]
Merge pull request #6269 from jcsp/wip-client-mark-down

client: close mds sessions in shutdown()

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
9 years ago  Merge pull request #6380 from ukernel/wip-client-keep-cache
Gregory Farnum [Wed, 25 Nov 2015 22:02:06 +0000 (17:02 -0500)]
Merge pull request #6380 from ukernel/wip-client-keep-cache

client: don't invalidate page cache when inode is no longer used

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
9 years ago  Merge pull request #6253 from jcsp/wip-client-availability
Gregory Farnum [Wed, 25 Nov 2015 22:00:04 +0000 (17:00 -0500)]
Merge pull request #6253 from jcsp/wip-client-availability

client: a better check for MDS availability

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
9 years ago  Merge pull request #6051 from clever215/master
Yehuda Sadeh [Wed, 25 Nov 2015 17:37:40 +0000 (09:37 -0800)]
Merge pull request #6051 from clever215/master

rgw: add an inspection to the field of type when assigning user caps

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
9 years ago  rgw: add an inspection to the field of type when assigning user caps  [6051/head]
clever215 [Wed, 25 Nov 2015 16:31:48 +0000 (11:31 -0500)]
rgw: add an inspection to the field of type when assigning user caps

Bug #13096

This change validates the type field of a user's capabilities; previous versions accepted any value. The type is now limited to nine values: "users|buckets|metadata|usage|zone|bilog|mdlog|datalog|ops". These nine values are taken from the Ceph documentation and source code.

Signed-off-by: Kongming Wu <wu.kongming@h3c.com>
9 years ago  Merge pull request #6700 from dillaman/wip-librbd-32bit-support
Josh Durgin [Wed, 25 Nov 2015 16:24:37 +0000 (08:24 -0800)]
Merge pull request #6700 from dillaman/wip-librbd-32bit-support

librbd: simplify IO method signatures for 32bit environments

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago  librbd: simplify IO method signatures for 32bit environments  [6700/head]
Jason Dillaman [Wed, 25 Nov 2015 14:23:54 +0000 (09:23 -0500)]
librbd: simplify IO method signatures for 32bit environments

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years ago  Merge pull request #6625 from dillaman/wip-12698
Josh Durgin [Wed, 25 Nov 2015 03:01:14 +0000 (19:01 -0800)]
Merge pull request #6625 from dillaman/wip-12698

librbd: integrate journaling for maintenance operations

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago  Merge pull request #6687 from dillaman/wip-journal-replay-fixes
Josh Durgin [Wed, 25 Nov 2015 00:37:52 +0000 (16:37 -0800)]
Merge pull request #6687 from dillaman/wip-journal-replay-fixes

journal: support replaying beyond skipped splay objects

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
9 years ago  Merge pull request #6685 from dachary/wip-erasure-code-benchmark
Loic Dachary [Tue, 24 Nov 2015 20:25:42 +0000 (21:25 +0100)]
Merge pull request #6685 from dachary/wip-erasure-code-benchmark

qa: erasure-code benchmark plugin selection

Reviewed-by: Andreas Peters <andreas.joachim.peters@cern.ch>
9 years ago  Merge pull request #6292 from dx9/wip-12406-res_nquery
Yehuda Sadeh [Tue, 24 Nov 2015 20:09:10 +0000 (12:09 -0800)]
Merge pull request #6292 from dx9/wip-12406-res_nquery

rgw/rgw_resolve: fallback to res_query when res_nquery not implemented

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
9 years ago  Merge pull request #6236 from guangyy/err-msg
Sage Weil [Tue, 24 Nov 2015 20:00:07 +0000 (15:00 -0500)]
Merge pull request #6236 from guangyy/err-msg

osd: use pg id (without shard) when referring the PG

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago  Merge pull request #6610 from ktdreyer/wip-build-doc-lxml
Sage Weil [Tue, 24 Nov 2015 17:49:08 +0000 (12:49 -0500)]
Merge pull request #6610 from ktdreyer/wip-build-doc-lxml

admin/build-doc: add lxml dependencies on debian

Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago  client: s/close_sessions/_close_sessions/  [6269/head]
John Spray [Tue, 24 Nov 2015 17:48:14 +0000 (17:48 +0000)]
client: s/close_sessions/_close_sessions/

Signed-off-by: John Spray <john.spray@redhat.com>
9 years ago  Merge pull request #6323 from dingshang/wip-cephfs-dingshang
Gregory Farnum [Tue, 24 Nov 2015 15:16:47 +0000 (10:16 -0500)]
Merge pull request #6323 from dingshang/wip-cephfs-dingshang

pybind/cephfs: add symlink and its unit test

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
9 years ago  client: close mds sessions in shutdown()
John Spray [Thu, 15 Oct 2015 00:31:16 +0000 (01:31 +0100)]
client: close mds sessions in shutdown()

Usually this happens in unmount(), but when we
have instantiated Client without mounting (to
send MDS commands), we need to handle closing
any open sessions in shutdown as well.

This is the correct replacement for the mark_down()
call that was removed from handle_command_reply
in the last commit.

Signed-off-by: John Spray <john.spray@redhat.com>
9 years ago  Merge tag 'v10.0.0'
Sage Weil [Tue, 24 Nov 2015 13:41:04 +0000 (08:41 -0500)]
Merge tag 'v10.0.0'

v10.0.0

9 years ago  Merge pull request #6684 from jcsp/wip-fix-scrub
Yan, Zheng [Tue, 24 Nov 2015 02:56:30 +0000 (10:56 +0800)]
Merge pull request #6684 from jcsp/wip-fix-scrub

mds: fix scrub_path

9 years ago  journal: support replay past skipped splay objects  [6687/head]
Jason Dillaman [Mon, 23 Nov 2015 22:46:55 +0000 (17:46 -0500)]
journal: support replay past skipped splay objects

It's possible for a splay object within a set to be skipped
if the set is closed due to a full object within the set.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
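A sketch of the replay behaviour described in the two journal commits, assuming objects are numbered `object_set * splay_width + splay_offset` and each entry carries a tid. The function and argument names are hypothetical. The point is that a missing object inside a set is skipped rather than treated as the end of the journal; replay stops only at a set that has no objects at all.

```python
def replay(objects, splay_width):
    """Return journal payloads in tid order, tolerating skipped splay objects.

    objects: dict mapping object number -> list of (tid, payload) tuples.
    Object numbering (set * splay_width + offset) is an assumption here.
    """
    entries = []
    object_set = 0
    while True:
        first = object_set * splay_width
        present = [n for n in range(first, first + splay_width) if n in objects]
        if not present:
            break  # an entirely empty set marks the end of the journal
        for n in present:  # a never-created object is simply not in `present`
            entries.extend(objects[n])
        object_set += 1
    return [payload for _, payload in sorted(entries)]
```

With a splay width of 2, object 2 never being created does not hide the entry stored in object 3 of the same set.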
9 years ago  tests: verify that journal player can handle skipped journal objects
Jason Dillaman [Mon, 23 Nov 2015 19:35:43 +0000 (14:35 -0500)]
tests: verify that journal player can handle skipped journal objects

It's possible for a journal object to not exist if another journal object
within the same object set filled up before records were written.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
9 years ago  Merge pull request #6605 from yuyuyu101/wip-13797
Gregory Farnum [Mon, 23 Nov 2015 22:33:20 +0000 (17:33 -0500)]
Merge pull request #6605 from yuyuyu101/wip-13797

ceph_test_msgr: Use send_message instead of keepalive to wakeup connection

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago  Merge pull request #6495 from objoo/master
Loic Dachary [Mon, 23 Nov 2015 22:31:06 +0000 (23:31 +0100)]
Merge pull request #6495 from objoo/master

Mailmap updates for infernalis.

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago  mailmap: Jenkins affiliation  [6495/head]
Yann Dupont [Sun, 8 Nov 2015 17:40:20 +0000 (18:40 +0100)]
mailmap: Jenkins affiliation

Signed-off-by: Yann Dupont <yann@objoo.org>
9 years ago  mailmap: Burkhard Linke affiliation
Yann Dupont [Sun, 8 Nov 2015 20:39:40 +0000 (21:39 +0100)]
mailmap: Burkhard Linke affiliation

Signed-off-by: Yann Dupont <yann@objoo.org>
9 years ago  mailmap: Chen Dihao affiliation
Yann Dupont [Sun, 8 Nov 2015 17:11:09 +0000 (18:11 +0100)]
mailmap: Chen Dihao affiliation

Signed-off-by: Yann Dupont <yann@objoo.org>
9 years ago  mailmap: Wei Qian affiliation
Yann Dupont [Sun, 8 Nov 2015 15:10:36 +0000 (16:10 +0100)]
mailmap: Wei Qian affiliation

Signed-off-by: Yann Dupont <yann@objoo.org>
9 years ago  qa: erasure-code-benchmark technique and plugin selection  [6685/head]
Loic Dachary [Mon, 23 Nov 2015 19:59:28 +0000 (20:59 +0100)]
qa: erasure-code-benchmark technique and plugin selection

Update the PLUGINS variable that was no longer used. Add the TECHNIQUES
variable to control which techniques are compared.

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago  qa: erasure-code has --erasure-code-dir
Loic Dachary [Mon, 23 Nov 2015 19:21:42 +0000 (20:21 +0100)]
qa: erasure-code has --erasure-code-dir

It replaces the obsolete --parameter directory= option for specifying
the directory that contains the erasure code plugins.

Signed-off-by: Loic Dachary <loic@dachary.org>
9 years ago  add aix compile warning
Rohan Mars [Mon, 23 Nov 2015 18:09:24 +0000 (13:09 -0500)]
add aix compile warning

Signed-off-by: Rohan Mars <code@rohanmars.com>
9 years ago  initialized backtrace variables
Rohan Mars [Mon, 23 Nov 2015 17:47:02 +0000 (12:47 -0500)]
initialized backtrace variables

Signed-off-by: Rohan Mars <code@rohanmars.com>
9 years ago  mds: fix scrub_path  [6684/head]
John Spray [Mon, 23 Nov 2015 17:39:14 +0000 (17:39 +0000)]
mds: fix scrub_path

This was tripping up over calling
validate_disk_state with no ScrubHeader.

Signed-off-by: John Spray <john.spray@redhat.com>
9 years ago  Merge pull request #6679 from suckowbiz/patch-1
Loic Dachary [Mon, 23 Nov 2015 16:33:52 +0000 (17:33 +0100)]
Merge pull request #6679 from suckowbiz/patch-1

Fixed typos

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago  doc/release-notes: fix typo
Sage Weil [Mon, 23 Nov 2015 16:02:58 +0000 (11:02 -0500)]
doc/release-notes: fix typo

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  doc/release-notes: final v10.0.0 notes
Sage Weil [Mon, 23 Nov 2015 16:00:29 +0000 (11:00 -0500)]
doc/release-notes: final v10.0.0 notes

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon: do not ignore a failure report cancellation from osd  [6278/head]
Xinze Chi [Fri, 20 Nov 2015 12:59:35 +0000 (20:59 +0800)]
mon: do not ignore a failure report cancellation from osd

Do not ignore a failure report cancellation from an OSD, even if it
is down.

Signed-off-by: Xinze Chi <xinze@xsky.com>
9 years ago  mon: fix osd failure info in mon
Xinze Chi [Fri, 20 Nov 2015 12:59:16 +0000 (20:59 +0800)]
mon: fix osd failure info in mon

When the network adapter on node A fails, the OSDs on that node will
also tell the mon that other OSDs' heartbeats have timed out. So when
rebinding still fails after 3 retries, the OSD should cancel the
in-flight failure reports it previously sent to the mon.

Signed-off-by: Xinze Chi <xinze@xsky.com>
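An OSD-side sketch of the rebind fix, assuming the OSD keeps the set of peers it has already reported and a callback that sends messages to the mon. The class, the callback, and the "failure"/"cancel" message kinds are hypothetical; only the retry count of 3 comes from the commit message.

```python
class OSDFailureReporter:
    """Remember in-flight failure reports so they can be withdrawn if the
    local network, not the peers, turns out to be the problem."""

    MAX_REBIND_RETRIES = 3  # from "rebind fail after retry 3 times"

    def __init__(self, send_to_mon):
        self.send_to_mon = send_to_mon  # callable(kind, target_osd)
        self.pending = set()            # peers reported as failed so far

    def report_failure(self, target):
        if target not in self.pending:  # each peer is reported only once
            self.pending.add(target)
            self.send_to_mon("failure", target)

    def on_rebind(self, try_rebind):
        for _ in range(self.MAX_REBIND_RETRIES):
            if try_rebind():
                return True
        # Our own NIC is likely broken: cancel every in-flight report.
        for target in sorted(self.pending):
            self.send_to_mon("cancel", target)
        self.pending.clear()
        return False
```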
9 years ago  doc: fix message typos in systemd  [6679/head]
suckowbiz [Mon, 23 Nov 2015 11:17:45 +0000 (12:17 +0100)]
doc: fix message typos in systemd

Signed-off-by: Tobias Suckow <tobias@suckow.biz>
9 years ago  Merge branch 'master' of github.com:ceph/ceph
Sage Weil [Mon, 23 Nov 2015 14:01:30 +0000 (09:01 -0500)]
Merge branch 'master' of github.com:ceph/ceph

9 years ago  Merge pull request #6666 from dachary/wip-release-notes
Sage Weil [Mon, 23 Nov 2015 14:01:48 +0000 (09:01 -0500)]
Merge pull request #6666 from dachary/wip-release-notes

release-notes: draft v10.0.0 release notes

9 years ago  Merge branch 'wip-bigbang'
Sage Weil [Mon, 23 Nov 2015 13:39:46 +0000 (08:39 -0500)]
Merge branch 'wip-bigbang'

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago  test/mon/osd-crush.sh: escape ceph tell mon.*
Sage Weil [Fri, 20 Nov 2015 15:17:37 +0000 (10:17 -0500)]
test/mon/osd-crush.sh: escape ceph tell mon.*

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  osd: make some of the pg_temp methods/fields private
Sage Weil [Mon, 16 Nov 2015 17:17:48 +0000 (12:17 -0500)]
osd: make some of the pg_temp methods/fields private

Reported-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  osdc/Objecter: call notify completion only once
Sage Weil [Mon, 16 Nov 2015 16:32:34 +0000 (11:32 -0500)]
osdc/Objecter: call notify completion only once

If we race with a reconnect we could get a second notify message
before the notify linger op is torn down.  Ensure we only ever
call the notify completion once to prevent a segfault.

Fixes: #13805
Signed-off-by: Sage Weil <sage@redhat.com>
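A minimal model of the Objecter fix, assuming the completion is a plain callback guarded by a lock and a flag; `NotifyOp` and `handle_notify_reply` are hypothetical names, not the Objecter interface. A duplicate notify message that arrives after a reconnect is ignored instead of invoking the completion a second time.

```python
import threading


class NotifyOp:
    """A notify op whose completion fires at most once."""

    def __init__(self, on_complete):
        self._on_complete = on_complete
        self._lock = threading.Lock()
        self._completed = False

    def handle_notify_reply(self, result):
        # A reconnect race can deliver a second notify message before the
        # linger op is torn down; only the first one runs the completion.
        with self._lock:
            if self._completed:
                return False
            self._completed = True
            callback, self._on_complete = self._on_complete, None
        callback(result)  # invoked outside the lock
        return True
```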
9 years ago  mon: change mon_osd_min_down_reporters from 1 -> 2
Sage Weil [Sat, 14 Nov 2015 03:34:12 +0000 (22:34 -0500)]
mon: change mon_osd_min_down_reporters from 1 -> 2

This makes more sense to me.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/OSDMonitor: simplify failure reporters vs reports logic
Sage Weil [Sat, 14 Nov 2015 03:27:14 +0000 (22:27 -0500)]
mon/OSDMonitor: simplify failure reporters vs reports logic

Since each OSD only sends a failure report for a given peer once,
we don't need to count reports vs reporters separately.  (This was
probably a bad idea anyway.)  Remove this logic and the associated
config option.

Reported-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
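The two OSDMonitor commits above reduce failure accounting to counting distinct reporters and raise `mon_osd_min_down_reporters` to 2. This hypothetical sketch models only that counting rule; everything else the monitor considers before marking an OSD down is omitted, and `FailureTracker` is an illustrative name.

```python
from collections import defaultdict

MON_OSD_MIN_DOWN_REPORTERS = 2  # the new default from the commit above


class FailureTracker:
    """Count distinct reporters per target OSD; repeat reports add nothing."""

    def __init__(self, min_reporters=MON_OSD_MIN_DOWN_REPORTERS):
        self.min_reporters = min_reporters
        self.reporters = defaultdict(set)  # target osd -> reporting osds

    def report(self, target, reporter):
        self.reporters[target].add(reporter)
        return self.should_mark_down(target)

    def cancel(self, target, reporter):
        self.reporters[target].discard(reporter)

    def should_mark_down(self, target):
        return len(self.reporters[target]) >= self.min_reporters
```

A single OSD repeating its report never reaches the threshold on its own; a second, independent reporter does.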
9 years ago  osd: simplify pg creation
Sage Weil [Sat, 14 Nov 2015 03:11:17 +0000 (22:11 -0500)]
osd: simplify pg creation

We used to have a complicated pg creation process in which we
would query any previous mappings for the pg before we created the
new 'empty' pg locally.  The tracking of the prior mappings was
very simple (and broken), but it didn't really matter because the
mon would resend pg create messages periodically.  Now it doesn't,
so that broke.

However, none of this is necessary: the PG peering process does
all of the same things.  Namely, it

- enumerates past intervals
- determines which ones may have been rw
- queries OSDs from each one to gather any potential changes

This is a more robust version of what the creation code was doing (or
should have been doing).  So, let's rip it all out and let
peering handle it.  As long as the newly instantiated PG sets
last_epoch_started and _clean to the created epoch we will probe
and consider all of these prior mappings and find any previous
instance of the PG (if one existed).

Yay for removing unnecessary code!

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/MonClient: make _sub_got behave if we "got" old stuff
Sage Weil [Fri, 13 Nov 2015 18:03:16 +0000 (13:03 -0500)]
mon/MonClient: make _sub_got behave if we "got" old stuff

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/OSDMonitor: fix oldest_map in send_incremental
Sage Weil [Wed, 11 Nov 2015 03:19:48 +0000 (22:19 -0500)]
mon/OSDMonitor: fix oldest_map in send_incremental

This should be the oldest map on the sender (like every other
place that generates an MOSDMap message).

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: avoid useless pg gets when pool is deleted
Sage Weil [Mon, 12 Oct 2015 02:06:33 +0000 (22:06 -0400)]
mon/PGMonitor: avoid useless pg gets when pool is deleted

If the .0 pg no longer exists, we know the entire pool was
deleted, and can avoid querying every other pg.  (This is a good
thing because leveldb and rocksdb can be very slow to query
missing keys.)

Signed-off-by: Sage Weil <sage@redhat.com>
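A sketch of the short-circuit described above, using a Python dict (wrapped to count membership checks) as a stand-in for the monitor's key/value store. The key format `<pool>.<ps in hex>` and the names `CountingStore` and `lookup_pool_pgs` are assumptions for illustration. If `<pool>.0` is missing, the remaining lookups are skipped.

```python
class CountingStore(dict):
    """dict that counts membership checks, standing in for slow key lookups."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __contains__(self, key):
        self.lookups += 1
        return super().__contains__(key)


def lookup_pool_pgs(store, pool, pg_num):
    """Return the stored stats of each pg in `pool`, or [] if the pool is gone."""
    if f"{pool}.0" not in store:
        return []  # pool deleted: skip the other pg_num - 1 missing-key queries
    keys = [f"{pool}.{ps:x}" for ps in range(pg_num)]
    return [store[key] for key in keys if key in store]
```

For a deleted pool with 64 pgs, the store is queried once instead of 64 times.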
9 years ago  mon/PGMonitor: revamp how pg creates are tracked
Sage Weil [Thu, 8 Oct 2015 16:13:40 +0000 (12:13 -0400)]
mon/PGMonitor: revamp how pg creates are tracked

Previously we were calculating and managing in-core state that
wasn't committed as part of the pg_map, leading to all sorts of
ugliness that didn't really work.  Instead,

 * set mapping in all creating pgs in the committed pg_map
 * make all pg create message sending be based on committed state
 * update mappings for creating pgs every time we consume a new
   osdmap, so that we have a reliable/stable epoch to attach to
   it.

In particular, having that stable epoch means we have a reference
we can put in the pg create message that will also be used for
the subscription version.  That way OSDs get consistent creates
from any mon.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: only send pg create messages to up osds
Sage Weil [Thu, 8 Oct 2015 16:12:34 +0000 (12:12 -0400)]
mon/PGMonitor: only send pg create messages to up osds

If the OSD is down it will ignore the message.  If it gets marked up, we
will eventually consume that map and call check_subs().

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: only churn mapping_epoch if the primary changes
Sage Weil [Wed, 7 Oct 2015 05:07:34 +0000 (01:07 -0400)]
mon/PGMonitor: only churn mapping_epoch if the primary changes

This results in fewer resent pg create messages.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: a bunch of cosmetic cleanup
Sage Weil [Fri, 9 Oct 2015 21:25:00 +0000 (17:25 -0400)]
mon/PGMonitor: a bunch of cosmetic cleanup

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: drop old creating_pgs_by_osd
Sage Weil [Wed, 7 Oct 2015 04:39:41 +0000 (00:39 -0400)]
mon/PGMonitor: drop old creating_pgs_by_osd

Obsoleted by creating_pgs_by_osd_epoch.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  osd: reduce mon_subscribe messages
Sage Weil [Sat, 14 Nov 2015 17:57:05 +0000 (12:57 -0500)]
osd: reduce mon_subscribe messages

1. MonClient remembers our subscriptions; only indicate we want
osd_pg_creates once, in init.

2. We don't need to re-request the latest osdmap each time we
reconnect.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/MonClient: only send new subscriptions
Sage Weil [Wed, 7 Oct 2015 04:09:18 +0000 (00:09 -0400)]
mon/MonClient: only send new subscriptions

Instead of resending all subscriptions, only send the new ones.  This
avoids races like

 - ask for 4+
 - mon sends maps 4-50
 - ask for 4+ and something else
 - mon has to resend same maps and the other thing

Signed-off-by: Sage Weil <sage@redhat.com>
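A minimal model of the MonClient change, assuming the client keeps one map of subscriptions already sent and one of subscriptions not yet sent; `SubscriptionClient` and its methods are hypothetical names. Re-asking for `osdmap` at the same start epoch is a no-op, so adding an unrelated subscription no longer makes the mon resend maps 4 to 50.

```python
class SubscriptionClient:
    """Remember what was already sent; only new subscriptions go out."""

    def __init__(self):
        self.sub_sent = {}  # name -> start epoch, already sent to the mon
        self.sub_new = {}   # name -> start epoch, not yet sent

    def sub_want(self, what, start):
        if self.sub_sent.get(what) == start or self.sub_new.get(what) == start:
            return False  # unchanged: avoid a redundant request
        self.sub_new[what] = start
        return True

    def renew_subs(self):
        """Return the subscribe payload, containing only new subscriptions."""
        msg, self.sub_new = self.sub_new, {}
        self.sub_sent.update(msg)
        return msg
```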
9 years ago  mon/PGMonitor: send pg creates via persistent subscriptions, not spam
Sage Weil [Wed, 7 Oct 2015 01:39:33 +0000 (21:39 -0400)]
mon/PGMonitor: send pg creates via persistent subscriptions, not spam

Generate and send pg create messages only for those OSDs that have
subscribed to this monitor.  This is N times more efficient (where there
are N monitors) than the previous method.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: only map and send pg creates post paxos update
Sage Weil [Wed, 7 Oct 2015 03:57:50 +0000 (23:57 -0400)]
mon/PGMonitor: only map and send pg creates post paxos update

These other call sites are no longer needed.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: remove map_pg_creates, send_pg_creates commands
Sage Weil [Fri, 9 Oct 2015 21:22:01 +0000 (17:22 -0400)]
mon/PGMonitor: remove map_pg_creates, send_pg_creates commands

These shouldn't be triggered manually.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  messages/MOSDPGCreate: make it more readable
Sage Weil [Wed, 7 Oct 2015 03:58:28 +0000 (23:58 -0400)]
messages/MOSDPGCreate: make it more readable

1- include the epoch
2- drop the 'pg'
3- hide the timestamp

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  osd: subscribe to all pg creates, not just once on start
Sage Weil [Wed, 7 Oct 2015 00:48:38 +0000 (20:48 -0400)]
osd: subscribe to all pg creates, not just once on start

We want to know about all future pg creations, not just those pending
when we start.  (This only helps once the mon knows how to do this...)

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: track creating_pgs_by_osd_epoch
Sage Weil [Wed, 7 Oct 2015 00:37:06 +0000 (20:37 -0400)]
mon/PGMonitor: track creating_pgs_by_osd_epoch

Track pg creations, grouped by the first epoch they mapped to a particular
OSD.  This will be necessary to send messages only for new creations.

Signed-off-by: Sage Weil <sage@redhat.com>
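A hypothetical sketch of `creating_pgs_by_osd_epoch` as a nested mapping (OSD, then the first epoch a pg mapped to it, then pg ids). Paired with the subscription-based sending described in the nearby commits, a mon could answer an OSD's subscription with only the creates at or after the epoch the OSD asked for. Only the structure's name comes from the commit; `CreatingPGs` and its methods are illustrative.

```python
from collections import defaultdict


class CreatingPGs:
    """osd -> first mapped epoch -> set of creating pg ids."""

    def __init__(self):
        self.by_osd_epoch = defaultdict(lambda: defaultdict(set))

    def map_create(self, pgid, osd, epoch):
        # Keep only the first epoch at which this pg mapped to this osd.
        for pgs in self.by_osd_epoch[osd].values():
            if pgid in pgs:
                return
        self.by_osd_epoch[osd][epoch].add(pgid)

    def creates_for(self, osd, since_epoch):
        """PG creates an OSD subscribed from `since_epoch` still needs."""
        return sorted(
            pgid
            for epoch, pgs in self.by_osd_epoch[osd].items()
            if epoch >= since_epoch
            for pgid in pgs
        )
```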
9 years ago  mon/PGMap: assert our pg counts don't go negative
Sage Weil [Thu, 8 Oct 2015 16:15:01 +0000 (12:15 -0400)]
mon/PGMap: assert our pg counts don't go negative

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/OSDMonitor: do not prime pg_temp for creating pgs
Sage Weil [Thu, 8 Oct 2015 16:14:49 +0000 (12:14 -0400)]
mon/OSDMonitor: do not prime pg_temp for creating pgs

It will be less work for the old primary to ignore the create message
and the new one to query it and find nothing than for the slightly more
complicated peering and removal process to happen.  Also, this reduces
bloat in the OSDMap a bit.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon/PGMonitor: note mapping_epoch for creating pgs
Sage Weil [Tue, 6 Oct 2015 22:52:22 +0000 (18:52 -0400)]
mon/PGMonitor: note mapping_epoch for creating pgs

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon: let peon mons send the osdmap replies
Sage Weil [Thu, 17 Sep 2015 01:44:04 +0000 (21:44 -0400)]
mon: let peon mons send the osdmap replies

Currently the leader mon often replies to OSDs by sending a set of
incremental OSDmaps (e.g., in response to an osd boot or failure).

Instead, send a small message to the proxying peon mon (if any)
with the epoch to start from and let *them* generate a suitable
reply.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  msg/simple/Pipe: show keepalives at level 2
Sage Weil [Tue, 6 Oct 2015 19:37:31 +0000 (15:37 -0400)]
msg/simple/Pipe: show keepalives at level 2

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon: set mon_subscribe_interval to a day
Sage Weil [Tue, 6 Oct 2015 19:35:58 +0000 (15:35 -0400)]
mon: set mon_subscribe_interval to a day

This is only needed for legacy clients to avoid confusing them--
we don't actually need the renewals at all.  Make them infrequent
to reduce mon load.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon: only ack subscriptions (and renew) if client or mon is old
Sage Weil [Tue, 6 Oct 2015 19:25:02 +0000 (15:25 -0400)]
mon: only ack subscriptions (and renew) if client or mon is old

Old clients expect an ack so they can schedule renewal; send it for
them only.

Old mons expect renewals.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon: remove old subscribe renewal-based timeouts
Sage Weil [Tue, 6 Oct 2015 19:19:33 +0000 (15:19 -0400)]
mon: remove old subscribe renewal-based timeouts

This is no longer needed/used.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon: small cleanup in _ms_dispatch
Sage Weil [Tue, 6 Oct 2015 19:18:21 +0000 (15:18 -0400)]
mon: small cleanup in _ms_dispatch

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago  mon: new session_timeout mechanism that is not subscribe-based
Sage Weil [Tue, 6 Oct 2015 19:11:03 +0000 (15:11 -0400)]
mon: new session_timeout mechanism that is not subscribe-based

Simplify the session liveness detection:

 - renew on any message
 - renew on keepalive[2] messages (lightweight ping in msgr)

Signed-off-by: Sage Weil <sage@redhat.com>
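A sketch of the simplified liveness rule, assuming timestamps in seconds and a configurable timeout; `MonSession` and its methods are hypothetical names. Any inbound message, including a keepalive2 ping, refreshes the session, so no subscribe renewal is needed to keep it alive.

```python
class MonSession:
    """Session liveness that does not depend on subscribe renewals."""

    def __init__(self, now):
        self.last_seen = now

    def on_message(self, now):
        # Any message, including a msgr keepalive2 ping, renews the session.
        self.last_seen = now

    def is_expired(self, now, session_timeout):
        return now - self.last_seen > session_timeout
```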
9 years ago  msg: make last_keepalive[_ack] lock safe
Sage Weil [Tue, 6 Oct 2015 19:10:02 +0000 (15:10 -0400)]
msg: make last_keepalive[_ack] lock safe

Signed-off-by: Sage Weil <sage@redhat.com>