git.apps.os.sepia.ceph.com Git - ceph.git/log
9 years ago - SubmittingPatches: there is no next; only jewel [6811/head]
Nathan Cutler [Sat, 5 Dec 2015 16:18:12 +0000 (17:18 +0100)]
SubmittingPatches: there is no next; only jewel

Signed-off-by: Nathan Cutler <ncutler@suse.com>
9 years ago - test: use sequential journal_tid for object cacher test
Josh Durgin [Thu, 26 Nov 2015 04:24:30 +0000 (20:24 -0800)]
test: use sequential journal_tid for object cacher test

This matches the real usage by librbd.

Fixes: #13877
Signed-off-by: Josh Durgin <jdurgin@redhat.com>
(cherry picked from commit 9331e031bd2a719463920581a47c13f0606e9971)

9 years ago - Merge pull request #6778 from liewegas/wip-13962
Sage Weil [Fri, 4 Dec 2015 00:38:22 +0000 (19:38 -0500)]
Merge pull request #6778 from liewegas/wip-13962

osd: call on_new_interval on newly split child PG

Reviewed-by: Samuel Just <sjust@redhat.com>
9 years ago - Merge pull request #6767 from oritwas/wip-13529-jewel
Yehuda Sadeh [Thu, 3 Dec 2015 17:31:59 +0000 (09:31 -0800)]
Merge pull request #6767 from oritwas/wip-13529-jewel

rgw: use smart pointer for C_Reinitwatch

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
9 years ago - osd: call on_new_interval on newly split child PG [6778/head]
Sage Weil [Wed, 2 Dec 2015 19:50:28 +0000 (14:50 -0500)]
osd: call on_new_interval on newly split child PG

We must call on_new_interval() on any interval change *and* on the
creation of the PG.  Currently we call it from PG::init() and
PG::start_peering_interval().  However, PG::split_into() did not
do so for the child PG, which meant that the new child feature
bits were not properly initialized and the bitwise/nibblewise
debug bit was not correctly set.  That, in turn, could lead to
various misbehaviors, the most obvious of which is scrub errors
due to the sort order mismatch.

Fixes: #13962
Signed-off-by: Sage Weil <sage@redhat.com>
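A minimal sketch of the invariant this commit restores, assuming a heavily simplified, hypothetical PG type (the member names and the feature/sort-order fields below are illustrative, not Ceph's real PG class): every path that creates a PG or starts a new interval, including the child created by a split, goes through on_new_interval().

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, heavily simplified stand-in for a placement group.
struct PG {
  uint64_t acting_features = 0;     // feature bits shared by the acting set
  bool sort_bitwise = false;        // stand-in for the bitwise/nibblewise bit
  bool interval_initialized = false;

  // Recompute per-interval state from the current map (simplified).
  void on_new_interval(uint64_t map_features, bool map_sort_bitwise) {
    acting_features = map_features;
    sort_bitwise = map_sort_bitwise;
    interval_initialized = true;
  }

  // Creation path.
  void init(uint64_t features, bool bitwise) { on_new_interval(features, bitwise); }

  // Interval-change path.
  void start_peering_interval(uint64_t features, bool bitwise) {
    on_new_interval(features, bitwise);
  }

  // Split path: the child is a brand-new PG, so it needs the same call.
  std::unique_ptr<PG> split_into(uint64_t features, bool bitwise) const {
    auto child = std::make_unique<PG>();
    child->on_new_interval(features, bitwise);  // the call the fix adds
    return child;
  }
};
```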
9 years ago - Merge pull request #6761 from mikulely/jewel
Yehuda Sadeh [Wed, 2 Dec 2015 17:28:45 +0000 (09:28 -0800)]
Merge pull request #6761 from mikulely/jewel

rgw: fix partial read issue in rgw_admin and rgw_tools

Reviewed-by: Yehuda Sadeh <yehuda@redhat.com>
9 years ago - rgw: use smart pointer for C_Reinitwatch [6767/head]
Orit Wasserman [Mon, 9 Nov 2015 12:05:27 +0000 (13:05 +0100)]
rgw: use smart pointer for C_Reinitwatch

Fixes: #13529
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
9 years ago - rgw: fix partial read mime map issue [6761/head]
Jiaying Ren [Wed, 2 Dec 2015 07:52:12 +0000 (15:52 +0800)]
rgw: fix partial read mime map issue

Signed-off-by: Jiaying Ren <mikulely@gmail.com>
9 years ago - rgw: fix rgw_admin partial read issue
Jiaying Ren [Mon, 30 Nov 2015 03:26:30 +0000 (11:26 +0800)]
rgw: fix rgw_admin partial read issue

Signed-off-by: Jiaying Ren <mikulely@gmail.com>
9 years ago - Merge pull request #6691 from SUSE/wip-13858
Ken Dreyer [Tue, 1 Dec 2015 16:27:45 +0000 (09:27 -0700)]
Merge pull request #6691 from SUSE/wip-13858

ceph.spec.in: limit _smp_mflags when lowmem_builder is set in SUSE's OBS

Reviewed-by: Ken Dreyer <kdreyer@redhat.com>
9 years ago - Merge pull request #6747 from ukernel/wip-osd-getfilter
John Spray [Tue, 1 Dec 2015 16:04:35 +0000 (16:04 +0000)]
Merge pull request #6747 from ukernel/wip-osd-getfilter

osd: fix ClassHandler::ClassData::get_filter()

Reviewed-by: John Spray <john.spray@redhat.com>
9 years ago - osd: fix ClassHandler::ClassData::get_filter() [6746/head 6747/head]
Yan, Zheng [Tue, 1 Dec 2015 12:18:44 +0000 (20:18 +0800)]
osd: fix ClassHandler::ClassData::get_filter()

Signed-off-by: Yan, Zheng <zyan@redhat.com>
9 years ago - Merge pull request #6738 from ceph/wip-rbd-cli-misc
Jason Dillaman [Mon, 30 Nov 2015 19:43:42 +0000 (14:43 -0500)]
Merge pull request #6738 from ceph/wip-rbd-cli-misc

rbd: fixes for refactored CLI and related tests

Reviewed-by: Jason Dillaman <dillaman@redhat.com>
9 years ago - rbd: bail if too many arguments provided [6738/head]
Ilya Dryomov [Sun, 29 Nov 2015 20:46:41 +0000 (21:46 +0100)]
rbd: bail if too many arguments provided

The code has a catch clause for that, but it was being rendered useless
by the preceding

    if (command_spec.size() > matching_spec->size())
      positional_options.add(at::POSITIONAL_ARGUMENTS.c_str(), -1);

which names all (both expected and extraneous) positional arguments.

Change it to name only expected arguments, deriving the number of
expected arguments from the length of positional_opts vector, supplied
by each action.  This works for all actions except "feature enable" and
"feature disable" which are specified as multitoken, so keep on passing
in -1 for those.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
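The fix can be modeled without Boost as below. This is a sketch under assumptions: the Action struct and both helper functions are hypothetical; only the convention that a max_count of -1 means "unlimited" comes from boost::program_options::positional_options_description::add(). Each action accepts as many positionals as its positional_opts list names, except multitoken actions such as "feature enable" and "feature disable".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of an rbd action (not the real rbd CLI types).
struct Action {
  std::vector<std::string> positional_opts;  // names of expected positionals
  bool multitoken = false;                   // e.g. "feature enable"
};

// The max_count that would be passed to positional_options_description::add():
// the number of expected positionals, or -1 ("any number") for multitoken actions.
int positional_max_count(const Action& action) {
  return action.multitoken ? -1
                           : static_cast<int>(action.positional_opts.size());
}

// True when the user supplied more positionals than the action names,
// i.e. the case where the CLI should bail out with an error.
bool too_many_arguments(const Action& action, std::size_t provided) {
  const int max_count = positional_max_count(action);
  return max_count >= 0 && provided > static_cast<std::size_t>(max_count);
}
```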
9 years ago - rbd: don't append an extra newline after some errors
Ilya Dryomov [Mon, 30 Nov 2015 16:19:12 +0000 (17:19 +0100)]
rbd: don't append an extra newline after some errors

Don't append an extra newline after program_options-generated errors,
like "unrecognised option" or "the argument for option is invalid".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
9 years ago - tests: update unmap.t CLI test
Ilya Dryomov [Mon, 30 Nov 2015 15:36:43 +0000 (16:36 +0100)]
tests: update unmap.t CLI test

Fix up the exit code - the old CLI tried to differentiate between CLI
errors and action errors by returning EXIT_FAILURE in the former case.
Also remove a test that relied on a special case check in the old CLI.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
9 years ago - cmake: librbd needs libjournal and libcls_journal_client
Ilya Dryomov [Mon, 30 Nov 2015 15:29:56 +0000 (16:29 +0100)]
cmake: librbd needs libjournal and libcls_journal_client

Commit 4719696cadd1 ("cmake: updates for refactored librbd IO path")
fixed file lists but missed the link dependency - librbd now needs
libjournal and libcls_journal_client.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
9 years ago - mon/PGMonitor: MAX AVAIL is 0 if some OSDs' weight is 0
Chengyuan Li [Fri, 20 Nov 2015 05:29:39 +0000 (22:29 -0700)]
mon/PGMonitor: MAX AVAIL is 0 if some OSDs' weight is 0

In get_rule_avail(), even if p->second is 0, it can still be used
as a divisor; the quotient is then infinity, which is converted to
a negative integer.
So we should check the p->second value before the calculation.

It fixes bug #13840.

Signed-off-by: Chengyuan Li <chengyli@ebay.com>
(cherry picked from commit 18713e60edd1fe16ab571f7c83e6de026db483ca)
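A hypothetical sketch of the corrected projection. It assumes each OSD in the rule carries a weight equal to its share of the rule's data, and that MAX AVAIL is the minimum over OSDs of free space divided by that share; the function and parameter names are illustrative, not the actual get_rule_avail() signature. Zero weights are skipped before dividing, because converting an infinite double to an integer is undefined behavior in C++ (on x86 it typically yields a large negative value, matching the symptom described).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// weight_by_osd: hypothetical per-OSD share of the rule's data (0.0 .. 1.0).
// avail_by_osd:  hypothetical per-OSD free space.
// Returns the projected pool-wide free space, or -1 if no OSD is usable.
int64_t rule_max_avail(const std::map<int, double>& weight_by_osd,
                       const std::map<int, int64_t>& avail_by_osd) {
  int64_t min_proj = -1;
  for (const auto& entry : weight_by_osd) {
    const int osd = entry.first;
    const double weight = entry.second;
    // Guard before dividing: avail / 0.0 is infinity, and casting that to
    // an integer is undefined behavior.
    if (weight <= 0.0) {
      continue;
    }
    const auto it = avail_by_osd.find(osd);
    if (it == avail_by_osd.end()) {
      continue;  // no stats for this OSD
    }
    // If this OSD receives `weight` of the data, the pool can grow by
    // avail / weight before this OSD fills up.
    const auto proj =
        static_cast<int64_t>(static_cast<double>(it->second) / weight);
    if (min_proj < 0 || proj < min_proj) {
      min_proj = proj;
    }
  }
  return min_proj;
}
```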

9 years ago - os: FileStore::_destroy_collection may hide the real mistake.
Ruifeng Yang [Tue, 17 Nov 2015 03:18:27 +0000 (11:18 +0800)]
os: FileStore::_destroy_collection may hide the real mistake.

Signed-off-by: Ruifeng Yang <yangruifeng.09209@h3c.com>
(cherry picked from commit 9e9770ca87720781264d2e283739fc9e197706c9)

9 years ago - Fix mon routed_request_tids leak
Ning Yao [Thu, 8 Oct 2015 08:24:50 +0000 (16:24 +0800)]
Fix mon routed_request_tids leak

Signed-off-by: Ning Yao <zay11022@gmail.com>
(cherry picked from commit ba3c64ca705590f833806300461f3b98de0e62f8)

9 years ago - pybind: decode empty string in conf_parse_argv() correctly
Josh Durgin [Thu, 26 Nov 2015 05:37:23 +0000 (21:37 -0800)]
pybind: decode empty string in conf_parse_argv() correctly

cretargs is an array of c_char_p, which means ctypes has already
converted it to Python byte strings. decode_cstr() would misinterpret
the empty string as a NULL c_char_p(), and convert it to None by
accident, resulting in errors when running commands like
'ceph config-key put foo ""'.

Since this is the only place we use arrays of c_char_p, just decode
it directly in conf_parse_argv(). Tested with python 2 and 3.

Signed-off-by: Josh Durgin <jdurgin@redhat.com>
(cherry picked from commit f76d5d6fe6a9f92b1ec17191f5504e2f61e0ff28)

9 years ago - ceph_test_keyvaluedb_iterators: Fix broken test
Haomai Wang [Mon, 16 Nov 2015 04:41:50 +0000 (12:41 +0800)]
ceph_test_keyvaluedb_iterators: Fix broken test

Introduced by #6312
Signed-off-by: Haomai Wang <haomai@xsky.com>
(cherry picked from commit ce0369444558b6308e1e0ffa49ae68faaac2bd1d)

9 years ago - Merge pull request #6704 from liewegas/wip-up-thru
Sage Weil [Thu, 26 Nov 2015 22:31:16 +0000 (17:31 -0500)]
Merge pull request #6704 from liewegas/wip-up-thru

mon: block 'ceph osd pg-temp ...' if pg_temp update is already pending

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
9 years ago - mon/OSDMonitor: block 'ceph osd pg-temp ...' if update is pending [6704/head]
Sage Weil [Wed, 25 Nov 2015 21:40:13 +0000 (16:40 -0500)]
mon/OSDMonitor: block 'ceph osd pg-temp ...' if update is pending

The OSD expects its pg_temp update requests to succeed.  If it
races with an ill-timed admin request, it can get stuck in
WaitActingChange indefinitely.

This is only a real problem now that the OSD/mon interaction has
been updated with wip-bigbang; previously we would retry (although
it would take a while).  Backporting is optional.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - ceph.spec.in: make --with lowmem_builder limit _smp_mflags [6691/head]
Nathan Cutler [Mon, 23 Nov 2015 12:22:28 +0000 (13:22 +0100)]
ceph.spec.in: make --with lowmem_builder limit _smp_mflags

The limit, -j8, may seem arbitrary but works nicely in the openSUSE Build
Service.

http://tracker.ceph.com/issues/13858 Fixes: #13858

Signed-off-by: Nathan Cutler <ncutler@suse.com>
9 years ago - Merge tag 'v10.0.0'
Sage Weil [Tue, 24 Nov 2015 13:41:04 +0000 (08:41 -0500)]
Merge tag 'v10.0.0'

v10.0.0

9 years ago - Merge pull request #6684 from jcsp/wip-fix-scrub
Yan, Zheng [Tue, 24 Nov 2015 02:56:30 +0000 (10:56 +0800)]
Merge pull request #6684 from jcsp/wip-fix-scrub

mds: fix scrub_path

9 years ago - Merge pull request #6605 from yuyuyu101/wip-13797
Gregory Farnum [Mon, 23 Nov 2015 22:33:20 +0000 (17:33 -0500)]
Merge pull request #6605 from yuyuyu101/wip-13797

ceph_test_msgr: Use send_message instead of keepalive to wakeup connection

Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #6495 from objoo/master
Loic Dachary [Mon, 23 Nov 2015 22:31:06 +0000 (23:31 +0100)]
Merge pull request #6495 from objoo/master

Mailmap updates for infernalis.

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago - mailmap: Jenkins affiliation [6495/head]
Yann Dupont [Sun, 8 Nov 2015 17:40:20 +0000 (18:40 +0100)]
mailmap: Jenkins affiliation

Signed-off-by: Yann Dupont <yann@objoo.org>
9 years ago - mailmap: Burkhard Linke affiliation
Yann Dupont [Sun, 8 Nov 2015 20:39:40 +0000 (21:39 +0100)]
mailmap: Burkhard Linke affiliation

Signed-off-by: Yann Dupont <yann@objoo.org>
9 years ago - mailmap: Chen Dihao affiliation
Yann Dupont [Sun, 8 Nov 2015 17:11:09 +0000 (18:11 +0100)]
mailmap: Chen Dihao affiliation

Signed-off-by: Yann Dupont <yann@objoo.org>
9 years ago - mailmap: Wei Qian affiliation
Yann Dupont [Sun, 8 Nov 2015 15:10:36 +0000 (16:10 +0100)]
mailmap: Wei Qian affiliation

Signed-off-by: Yann Dupont <yann@objoo.org>
9 years ago - mds: fix scrub_path [6684/head]
John Spray [Mon, 23 Nov 2015 17:39:14 +0000 (17:39 +0000)]
mds: fix scrub_path

This was tripping up when calling
validate_disk_state with no ScrubHeader.

Signed-off-by: John Spray <john.spray@redhat.com>
9 years ago - Merge pull request #6679 from suckowbiz/patch-1
Loic Dachary [Mon, 23 Nov 2015 16:33:52 +0000 (17:33 +0100)]
Merge pull request #6679 from suckowbiz/patch-1

Fixed typos

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago - doc/release-notes: fix typo
Sage Weil [Mon, 23 Nov 2015 16:02:58 +0000 (11:02 -0500)]
doc/release-notes: fix typo

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - doc/release-notes: final v10.0.0 notes
Sage Weil [Mon, 23 Nov 2015 16:00:29 +0000 (11:00 -0500)]
doc/release-notes: final v10.0.0 notes

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - doc: fix message typos in systemd [6679/head]
suckowbiz [Mon, 23 Nov 2015 11:17:45 +0000 (12:17 +0100)]
doc: fix message typos in systemd

Signed-off-by: Tobias Suckow <tobias@suckow.biz>
9 years ago - Merge branch 'master' of github.com:ceph/ceph
Sage Weil [Mon, 23 Nov 2015 14:01:30 +0000 (09:01 -0500)]
Merge branch 'master' of github.com:ceph/ceph

9 years ago - Merge pull request #6666 from dachary/wip-release-notes
Sage Weil [Mon, 23 Nov 2015 14:01:48 +0000 (09:01 -0500)]
Merge pull request #6666 from dachary/wip-release-notes

release-notes: draft v10.0.0 release notes

9 years ago - Merge branch 'wip-bigbang'
Sage Weil [Mon, 23 Nov 2015 13:39:46 +0000 (08:39 -0500)]
Merge branch 'wip-bigbang'

Reviewed-by: Joao Eduardo Luis <joao@suse.de>
Reviewed-by: Greg Farnum <gfarnum@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
9 years ago - test/mon/osd-crush.sh: escape ceph tell mon.*
Sage Weil [Fri, 20 Nov 2015 15:17:37 +0000 (10:17 -0500)]
test/mon/osd-crush.sh: escape ceph tell mon.*

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: make some of the pg_temp methods/fields private
Sage Weil [Mon, 16 Nov 2015 17:17:48 +0000 (12:17 -0500)]
osd: make some of the pg_temp methods/fields private

Reported-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osdc/Objecter: call notify completion only once
Sage Weil [Mon, 16 Nov 2015 16:32:34 +0000 (11:32 -0500)]
osdc/Objecter: call notify completion only once

If we race with a reconnect we could get a second notify message
before the notify linger op is torn down.  Ensure we only ever
call the notify completion once to prevent a segfault.

Fixes: #13805
Signed-off-by: Sage Weil <sage@redhat.com>
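One way to guarantee "at most once" is sketched below, assuming a hypothetical linger-op type that owns its completion callback (not the real Objecter classes): the callback is swapped out under a lock, so a second notify that races with teardown finds it empty and does nothing.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical, simplified notify linger op.
class NotifyLingerOp {
 public:
  explicit NotifyLingerOp(std::function<void(int)> on_notify_finish)
      : on_notify_finish_(std::move(on_notify_finish)) {}

  // May be invoked more than once if a reconnect races with teardown.
  void handle_notify_complete(int result) {
    std::function<void(int)> cb;
    {
      std::lock_guard<std::mutex> guard(lock_);
      cb.swap(on_notify_finish_);  // take it; the member is now empty
    }
    if (cb) {
      cb(result);  // run the completion outside the lock
    }
  }

 private:
  std::mutex lock_;
  std::function<void(int)> on_notify_finish_;
};
```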
9 years ago - mon: change mon_osd_min_down_reporters from 1 -> 2
Sage Weil [Sat, 14 Nov 2015 03:34:12 +0000 (22:34 -0500)]
mon: change mon_osd_min_down_reporters from 1 -> 2

This makes more sense to me.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/OSDMonitor: simplify failure reporters vs reports logic
Sage Weil [Sat, 14 Nov 2015 03:27:14 +0000 (22:27 -0500)]
mon/OSDMonitor: simplify failure reporters vs reports logic

Since each OSD only sends a failure report for a given peer once,
we don't need to count reports vs reporters separately.  (This was
probably a bad idea anyway.)  Remove this logic and the associated
config option.

Reported-by: Greg Farnum <gfarnum@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
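Since each reporter counts once, the decision reduces to counting distinct reporters per target, as in this hypothetical tracker (class and method names are illustrative). With mon_osd_min_down_reporters = 2, the value set by "mon: change mon_osd_min_down_reporters from 1 -> 2", a single OSD repeating its report is not enough to mark a peer down.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical failure tracker: counts distinct reporters only.
class FailureTracker {
 public:
  explicit FailureTracker(unsigned min_down_reporters)
      : min_down_reporters_(min_down_reporters) {}

  // Record that `reporter` says `target` is down. Returns true once enough
  // distinct reporters agree that `target` should be marked down.
  bool report(int target, int reporter) {
    auto& reporters = reporters_by_target_[target];
    reporters.insert(reporter);  // a repeated report changes nothing
    return reporters.size() >= min_down_reporters_;
  }

 private:
  unsigned min_down_reporters_;
  std::map<int, std::set<int>> reporters_by_target_;
};
```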
9 years ago - osd: simplify pg creation
Sage Weil [Sat, 14 Nov 2015 03:11:17 +0000 (22:11 -0500)]
osd: simplify pg creation

We used to have a complicated pg creation process in which we
would query any previous mappings for the pg before we created the
new 'empty' pg locally.  The tracking of the prior mappings was
very simple (and broken), but it didn't really matter because the
mon would resend pg create messages periodically.  Now it doesn't,
so that broke.

However, none of this is necessary: the PG peering process does
all of the same things.  Namely, it

- enumerates past intervals
- determines which ones may have been rw
- queries OSDs from each one to gather any potential changes

This is a more robust version of what the creation code was doing (or
should have been doing).  So, let's rip it all out and let
peering handle it.  As long as the newly instantiated PG sets
last_epoch_started and _clean to the created epoch we will probe
and consider all of these prior mappings and find any previous
instance of the PG (if one existed).

Yay for removing unnecessary code!

Signed-off-by: Sage Weil <sage@redhat.com>
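The first of the peering steps listed above, enumerating past intervals, can be sketched as follows. This assumes, for illustration only, that an interval ends whenever the acting set changes; real peering also considers other changes (such as the up set and the primary), and the types here are hypothetical. Starting the history at the created epoch is what lets peering see every prior mapping.

```cpp
#include <cassert>
#include <vector>

// Hypothetical input: the acting set a PG maps to in one OSDMap epoch.
struct EpochMapping {
  unsigned epoch;
  std::vector<int> acting;
};

// A run of consecutive epochs with the same acting set.
struct Interval {
  unsigned first, last;
  std::vector<int> acting;
};

// Split consecutive epochs (starting at the PG's created epoch) into
// intervals. Peering would then query the OSDs of each interval that
// may have gone read/write.
std::vector<Interval> past_intervals(const std::vector<EpochMapping>& history) {
  std::vector<Interval> out;
  for (const auto& m : history) {
    if (out.empty() || out.back().acting != m.acting) {
      out.push_back({m.epoch, m.epoch, m.acting});  // a new interval starts
    } else {
      out.back().last = m.epoch;  // same acting set: extend the interval
    }
  }
  return out;
}
```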
9 years ago - mon/MonClient: make _sub_got behave if we "got" old stuff
Sage Weil [Fri, 13 Nov 2015 18:03:16 +0000 (13:03 -0500)]
mon/MonClient: make _sub_got behave if we "got" old stuff

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/OSDMonitor: fix oldest_map in send_incremental
Sage Weil [Wed, 11 Nov 2015 03:19:48 +0000 (22:19 -0500)]
mon/OSDMonitor: fix oldest_map in send_incremental

This should be the oldest map on the sender (like every other
place that generates an MOSDMap message).

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: avoid useless pg gets when pool is deleted
Sage Weil [Mon, 12 Oct 2015 02:06:33 +0000 (22:06 -0400)]
mon/PGMonitor: avoid useless pg gets when pool is deleted

If the .0 pg no longer exists, we know the entire pool was
deleted, and can avoid querying every other pg.  (This is a good
thing because leveldb and rocksdb can be very slow to query
missing keys.)

Signed-off-by: Sage Weil <sage@redhat.com>
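A sketch of the shortcut, assuming a hypothetical in-memory store keyed by (pool, pg seed); Ceph's actual PGMonitor storage layout differs. When the pool is gone, one probe for seed 0 replaces pg_num probes for missing keys.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical key: (pool id, pg seed).
using PgKey = std::pair<int64_t, uint32_t>;

// Return the stored PGs of `pool`, counting lookups in *lookups.
std::vector<PgKey> existing_pgs(const std::map<PgKey, int>& store,
                                int64_t pool, uint32_t pg_num,
                                unsigned* lookups) {
  std::vector<PgKey> found;
  *lookups = 1;
  // Probe the .0 PG first: if it is gone, assume the whole pool was deleted.
  if (store.count({pool, 0}) == 0) {
    return found;  // skip pg_num - 1 lookups of keys that are not there
  }
  found.push_back({pool, 0});
  for (uint32_t seed = 1; seed < pg_num; ++seed) {
    ++*lookups;
    if (store.count({pool, seed}) != 0) {
      found.push_back({pool, seed});
    }
  }
  return found;
}
```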
9 years ago - mon/PGMonitor: revamp how pg creates are tracked
Sage Weil [Thu, 8 Oct 2015 16:13:40 +0000 (12:13 -0400)]
mon/PGMonitor: revamp how pg creates are tracked

Previously we were calculating and managing in-core state that
wasn't committed as part of the pg_map, leading to all sorts of
ugliness that didn't really work.  Instead,

 * set mapping in all creating pgs in the committed pg_map
 * make all pg create message sending be based on committed state
 * update mappings for creating pgs every time we consume a new
   osdmap, so that we have a reliable/stable epoch to attach to
   it.

In particular, having that stable epoch means we have a reference
we can put in the pg create message that will also be used for
the subscription version.  That way OSDs get consistent creates
from any mon.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: only send pg create messages to up osds
Sage Weil [Thu, 8 Oct 2015 16:12:34 +0000 (12:12 -0400)]
mon/PGMonitor: only send pg create messages to up osds

If the OSD is down it will ignore the message.  If it gets marked up, we
will eventually consume that map and call check_subs().

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: only churn mapping_epoch if the primary changes
Sage Weil [Wed, 7 Oct 2015 05:07:34 +0000 (01:07 -0400)]
mon/PGMonitor: only churn mapping_epoch if the primary changes

This results in fewer resent pg create messages.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: a bunch of cosmetic cleanup
Sage Weil [Fri, 9 Oct 2015 21:25:00 +0000 (17:25 -0400)]
mon/PGMonitor: a bunch of cosmetic cleanup

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: drop old creating_pgs_by_osd
Sage Weil [Wed, 7 Oct 2015 04:39:41 +0000 (00:39 -0400)]
mon/PGMonitor: drop old creating_pgs_by_osd

Obsoleted by creating_pgs_by_osd_epoch.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: reduce mon_subscribe messages
Sage Weil [Sat, 14 Nov 2015 17:57:05 +0000 (12:57 -0500)]
osd: reduce mon_subscribe messages

1. MonClient remembers our subscriptions; only indicate we want
osd_pg_creates once, in init.

2. We don't need to re-request the latest osdmap each time we
reconnect.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/MonClient: only send new subscriptions
Sage Weil [Wed, 7 Oct 2015 04:09:18 +0000 (00:09 -0400)]
mon/MonClient: only send new subscriptions

Instead of resending all subscriptions, only send the new ones.  This
avoids races like

 - ask for 4+
 - mon sends maps 4-50
 - ask for 4+ and something else
 - mon has to resend same maps and the other thing

Signed-off-by: Sage Weil <sage@redhat.com>
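A hypothetical sketch of "only send new subscriptions": wanted subscriptions are split into not-yet-sent and already-sent maps, so asking again for osdmap 4+ is a no-op and only the other item goes out, avoiding the race listed above. The class is illustrative, and moving everything back to "new" on a fresh mon session is an assumption of this sketch, not something stated in the commit.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical subscription bookkeeping: name -> start epoch.
class Subscriptions {
 public:
  // Returns true if this changes what we want, i.e. a send is needed.
  bool want(const std::string& what, unsigned start) {
    auto sent = sent_.find(what);
    if (sent != sent_.end() && sent->second == start) return false;
    auto pending = new_.find(what);
    if (pending != new_.end() && pending->second == start) return false;
    new_[what] = start;
    return true;
  }

  // Contents of the next subscribe message: only entries not yet sent.
  std::map<std::string, unsigned> take_new() {
    std::map<std::string, unsigned> msg;
    msg.swap(new_);
    for (const auto& kv : msg) sent_[kv.first] = kv.second;
    return msg;
  }

  // Assumption: a new mon session has seen nothing, so resend everything once.
  void reset_session() {
    for (const auto& kv : sent_) new_.insert(kv);  // keeps newer pending values
    sent_.clear();
  }

 private:
  std::map<std::string, unsigned> new_;   // wanted, not yet sent
  std::map<std::string, unsigned> sent_;  // sent on this session
};
```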
9 years ago - mon/PGMonitor: send pg creates via persistent subscriptions, not spam
Sage Weil [Wed, 7 Oct 2015 01:39:33 +0000 (21:39 -0400)]
mon/PGMonitor: send pg creates via persistent subscriptions, not spam

Generate and send pg create messages only for those OSDs that have
subscribed on this monitor.  This is N times more efficient (where there
are N monitors) than the previous method.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: only map and send pg creates post paxos update
Sage Weil [Wed, 7 Oct 2015 03:57:50 +0000 (23:57 -0400)]
mon/PGMonitor: only map and send pg creates post paxos update

These other call sites are no longer needed.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: remove map_pg_creates, send_pg_creates commands
Sage Weil [Fri, 9 Oct 2015 21:22:01 +0000 (17:22 -0400)]
mon/PGMonitor: remove map_pg_creates, send_pg_creates commands

These shouldn't be triggered manually.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - messages/MOSDPGCreate: make it more readable
Sage Weil [Wed, 7 Oct 2015 03:58:28 +0000 (23:58 -0400)]
messages/MOSDPGCreate: make it more readable

1- include the epoch
2- drop the 'pg'
3- hide the timestamp

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: subscribe to all pg creates, not just once on start
Sage Weil [Wed, 7 Oct 2015 00:48:38 +0000 (20:48 -0400)]
osd: subscribe to all pg creates, not just once on start

We want to know about all future pg creations, not just those pending
when we start.  (This only helps once the mon knows how to do this...)

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: track creating_pgs_by_osd_epoch
Sage Weil [Wed, 7 Oct 2015 00:37:06 +0000 (20:37 -0400)]
mon/PGMonitor: track creating_pgs_by_osd_epoch

Track pg creations, grouped by the first epoch they mapped to a particular
OSD.  This will be necessary to send messages only for new creations.

Signed-off-by: Sage Weil <sage@redhat.com>
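A minimal sketch of the grouping, assuming string PG ids and a hypothetical CreatingPgs class: creations are indexed by OSD and then by the first epoch they mapped to that OSD, so the creates an OSD has not yet seen can be found with a single lower_bound() on the epoch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using PgId = std::string;  // hypothetical, e.g. "1.2a"

class CreatingPgs {
 public:
  // Record that `pg` first mapped to `osd` in `epoch`.
  void add(int osd, unsigned epoch, const PgId& pg) {
    by_osd_epoch_[osd][epoch].insert(pg);
  }

  // PGs grouped under epochs >= `next` for this OSD, i.e. creations it
  // has not been told about yet if it has seen everything before `next`.
  std::set<PgId> creates_since(int osd, unsigned next) const {
    std::set<PgId> out;
    auto it = by_osd_epoch_.find(osd);
    if (it == by_osd_epoch_.end()) return out;
    for (auto e = it->second.lower_bound(next); e != it->second.end(); ++e) {
      out.insert(e->second.begin(), e->second.end());
    }
    return out;
  }

 private:
  std::map<int, std::map<unsigned, std::set<PgId>>> by_osd_epoch_;
};
```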
9 years ago - mon/PGMap: assert our pg counts don't go negative
Sage Weil [Thu, 8 Oct 2015 16:15:01 +0000 (12:15 -0400)]
mon/PGMap: assert our pg counts don't go negative

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/OSDMonitor: do not prime pg_temp for creating pgs
Sage Weil [Thu, 8 Oct 2015 16:14:49 +0000 (12:14 -0400)]
mon/OSDMonitor: do not prime pg_temp for creating pgs

It will be less work for the old primary to ignore the create message
and the new one to query it and find nothing than for the slightly more
complicated peering and removal process to happen.  Also, this reduces
bloat in the OSDMap a bit.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/PGMonitor: note mapping_epoch for creating pgs
Sage Weil [Tue, 6 Oct 2015 22:52:22 +0000 (18:52 -0400)]
mon/PGMonitor: note mapping_epoch for creating pgs

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon: let peon mons send the osdmap replies
Sage Weil [Thu, 17 Sep 2015 01:44:04 +0000 (21:44 -0400)]
mon: let peon mons send the osdmap replies

Currently the leader mon often replies to OSDs by sending a set of
incremental OSDmaps (e.g., in response to an osd boot or failure).

Instead, send a small message to the proxying peon mon (if any)
with the epoch to start from and let *them* generate a suitable
reply.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - msg/simple/Pipe: show keepalives at level 2
Sage Weil [Tue, 6 Oct 2015 19:37:31 +0000 (15:37 -0400)]
msg/simple/Pipe: show keepalives at level 2

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon: set mon_subscribe_interval to a day
Sage Weil [Tue, 6 Oct 2015 19:35:58 +0000 (15:35 -0400)]
mon: set mon_subscribe_interval to a day

This is only needed for legacy clients to avoid confusing them--
we don't actually need the renewals at all.  Make them infrequent
to reduce mon load.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon: only ack subscriptions (and renew) if client or mon is old
Sage Weil [Tue, 6 Oct 2015 19:25:02 +0000 (15:25 -0400)]
mon: only ack subscriptions (and renew) if client or mon is old

Old clients expect an ack so they can schedule renewal; send it for
them only.

Old mons expect renewals.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon: remove old subscribe renewal-based timeouts
Sage Weil [Tue, 6 Oct 2015 19:19:33 +0000 (15:19 -0400)]
mon: remove old subscribe renewal-based timeouts

This is no longer needed/used.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon: small cleanup in _ms_dispatch
Sage Weil [Tue, 6 Oct 2015 19:18:21 +0000 (15:18 -0400)]
mon: small cleanup in _ms_dispatch

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon: new session_timeout mechanism that is not subscribe-based
Sage Weil [Tue, 6 Oct 2015 19:11:03 +0000 (15:11 -0400)]
mon: new session_timeout mechanism that is not subscribe-based

Simplify the session liveness detection:

 - renew on any message
 - renew on keepalive[2] messages (lightweight ping in msgr)

Signed-off-by: Sage Weil <sage@redhat.com>
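A hypothetical sketch of the new liveness rule: any message or keepalive[2] simply refreshes a timestamp, and a session is stale once no traffic has been seen for longer than the timeout. The timeout is a constructor parameter here, not Ceph's actual option or default.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical session liveness tracker.
class SessionLiveness {
 public:
  using Clock = std::chrono::steady_clock;

  explicit SessionLiveness(Clock::duration timeout) : timeout_(timeout) {}

  // Call on every incoming message and on every keepalive/keepalive2 ping.
  void renew(Clock::time_point now) { last_seen_ = now; }

  // Stale once nothing has arrived for longer than the timeout.
  bool expired(Clock::time_point now) const {
    return now - last_seen_ > timeout_;
  }

 private:
  Clock::duration timeout_;
  Clock::time_point last_seen_{};
};
```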
9 years ago - msg: make last_keepalive[_ack] lock safe
Sage Weil [Tue, 6 Oct 2015 19:10:02 +0000 (15:10 -0400)]
msg: make last_keepalive[_ack] lock safe

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - msg: track stamp of last keepalive[2] received
Sage Weil [Tue, 6 Oct 2015 19:08:57 +0000 (15:08 -0400)]
msg: track stamp of last keepalive[2] received

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - common: mirror leveldb default tuning w/ rocksdb
Sage Weil [Tue, 6 Oct 2015 18:38:47 +0000 (14:38 -0400)]
common: mirror leveldb default tuning w/ rocksdb

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon/MonClient: don't send log if we're reconnecting
Sage Weil [Tue, 6 Oct 2015 18:38:30 +0000 (14:38 -0400)]
mon/MonClient: don't send log if we're reconnecting

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - mon: disabled rocksdb compression when used as the backend
Sage Weil [Fri, 2 Oct 2015 13:15:33 +0000 (09:15 -0400)]
mon: disabled rocksdb compression when used as the backend

This significantly reduced CPU utilization on the bigbang scale
testing cluster at CERN.  Note that it is already disabled for
leveldb by default (in ceph_mon.cc).

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: cap adjusted max mon report interval at 2/3 of timeout
Sage Weil [Fri, 2 Oct 2015 13:06:29 +0000 (09:06 -0400)]
osd: cap adjusted max mon report interval at 2/3 of timeout

This ensures that we don't throttle back mon reports so much that
the mon times out due to no pg stat reports.  Since there is
little value in having a lower max anyway, just set this at an
upper bound (relative to the mon's timeout value).

Signed-off-by: Sage Weil <sage@redhat.com>
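A sketch of the cap, with hypothetical names and an assumed doubling backoff (the commit does not say how the interval is grown): whatever the backoff computes, the adjusted max never exceeds 2/3 of the mon's timeout, so at least one report should land within each timeout window. For a 900 s timeout the cap is 900 * 2/3 = 600 s.

```cpp
#include <algorithm>
#include <cassert>

// All values in seconds; parameter names are illustrative, not Ceph options.
double next_max_report_interval(double current, double min_interval,
                                double configured_max,
                                double mon_report_timeout) {
  // Upper bound: never back off past 2/3 of the mon's timeout.
  const double cap = std::min(configured_max, mon_report_timeout * 2.0 / 3.0);
  // Assumed backoff policy for this sketch: double, but not below the minimum.
  const double backed_off = std::max(current * 2.0, min_interval);
  return std::min(backed_off, cap);
}
```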
9 years ago - osd: protect mon reporting with mon_report_lock
Sage Weil [Wed, 30 Sep 2015 01:03:53 +0000 (21:03 -0400)]
osd: protect mon reporting with mon_report_lock

We need an exclusive lock over paths that update state related to
mon reports, lest they step on fields like up_thru_*, *stats_ack*,
last_mon_report, and so on.  Everybody still needs a read lock
on map_lock too to get a stable OSDMap epoch.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: fix reconnect behavior from booting state
Sage Weil [Mon, 23 Nov 2015 13:38:44 +0000 (08:38 -0500)]
osd: fix reconnect behavior from booting state

We don't need to restart the boot process unless we are in preboot;
if we are in booting state we just need to resend the boot
message.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: move the monitor report to OSD::tick_without_osd_lock
Guang Yang [Tue, 22 Sep 2015 20:59:28 +0000 (20:59 +0000)]
osd: move the monitor report to OSD::tick_without_osd_lock

Fixes: #12722
Reviewed-by: Guang Yang <yguang@yahoo-inc.com>
9 years ago - osd: _got_mon_epochs - refactor the lock scope to avoid a race (which fail make check)
Guang Yang [Tue, 29 Sep 2015 22:26:14 +0000 (22:26 +0000)]
osd: _got_mon_epochs - refactor the lock scope to avoid a race (which fail make check)

Reviewed-by: Guang Yang <yguang@yahoo-inc.com>
9 years ago - osd: don't send dup subscribes so much
Sage Weil [Mon, 28 Sep 2015 21:22:01 +0000 (17:22 -0400)]
osd: don't send dup subscribes so much

The subscribe MonClient service is stateful--we don't need to
force a new subscribe send unless sub_want() says we need to.

Keep forcing it for instances where we request an *old* map.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: introduce explicit preboot stage
Sage Weil [Wed, 23 Sep 2015 21:58:15 +0000 (17:58 -0400)]
osd: introduce explicit preboot stage

We want to separate the stage where we do a bunch of work
prior to booting (but intend to eventually boot), like when we
get maps and wait to be healthy, from the point after we've sent
the boot message while we are just waiting for a response (so that
we can avoid resending that boot message needlessly).

- start at PREBOOT in start_boot()
- transition to BOOTING in _send_boot()
- only call _preboot() while in PREBOOT state

Signed-off-by: Sage Weil <sage@redhat.com>
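The three bullets map onto a small state machine, sketched here with hypothetical names (the real OSD has more states and does real work in each step): _preboot()-style work only runs in PREBOOT, and sending the boot message moves the OSD to BOOTING, so repeated preboot checks cannot resend it.

```cpp
#include <cassert>

// Hypothetical, reduced set of boot states.
enum class BootState { INITIALIZING, PREBOOT, BOOTING };

class OsdBoot {
 public:
  BootState state() const { return state_; }
  int boot_messages_sent() const { return boot_msgs_; }

  // start_boot()-style entry point: begin the pre-boot phase.
  void start_boot() { state_ = BootState::PREBOOT; }

  // _preboot()-style step (get maps, wait to be healthy, then boot).
  // It is a no-op unless we are still in PREBOOT.
  void preboot(bool ready_to_boot) {
    if (state_ != BootState::PREBOOT) return;
    if (ready_to_boot) send_boot();
  }

 private:
  // _send_boot()-style step: the only transition into BOOTING.
  void send_boot() {
    state_ = BootState::BOOTING;
    ++boot_msgs_;
  }

  BootState state_ = BootState::INITIALIZING;
  int boot_msgs_ = 0;
};
```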
9 years ago - osd: skip osdmap version query if we can
Sage Weil [Wed, 23 Sep 2015 21:31:50 +0000 (17:31 -0400)]
osd: skip osdmap version query if we can

If we get OSDmaps from the mon we *also* learn the oldest/newest
map epochs; no need to query again.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: make [_]maybe_boot lockless variant
Sage Weil [Wed, 23 Sep 2015 21:33:28 +0000 (17:33 -0400)]
osd: make [_]maybe_boot lockless variant

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: only send boot if booting on getversion completion
Sage Weil [Tue, 22 Sep 2015 15:16:15 +0000 (11:16 -0400)]
osd: only send boot if booting on getversion completion

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: do not resend pg_temp requests
Sage Weil [Fri, 18 Sep 2015 17:27:49 +0000 (13:27 -0400)]
osd: do not resend pg_temp requests

Send each pg_temp request once (per mon session); no need to
resend everything that is pending every time.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: do not send dup failure reports
Sage Weil [Fri, 18 Sep 2015 01:45:16 +0000 (21:45 -0400)]
osd: do not send dup failure reports

If a failure report is already pending, we do not need to resend
it.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: resend pending failure reports with a new mon session
Sage Weil [Fri, 18 Sep 2015 01:48:30 +0000 (21:48 -0400)]
osd: resend pending failure reports with a new mon session

Signed-off-by: Sage Weil <sage@redhat.com>
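The "do not send dup failure reports" and "resend pending failure reports with a new mon session" commits suggest the bookkeeping below, sketched with a hypothetical class: a peer is queued at most once, each pending report is sent at most once per mon session, and starting a new session makes every still-pending report eligible to be sent again.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical failure-report queue.
class FailureReporter {
 public:
  // Queue a report for `peer`; returns false if one is already pending.
  bool queue_failure(int peer) { return pending_.insert(peer).second; }

  // Reports to send now: pending ones not yet sent on this session.
  std::vector<int> take_unsent() {
    std::vector<int> out;
    for (int peer : pending_) {
      if (sent_.insert(peer).second) out.push_back(peer);
    }
    return out;
  }

  // New mon session: the new mon has not seen any of our reports.
  void new_session() { sent_.clear(); }

  // Peer is reachable again (or was marked down): forget the report.
  void cancel(int peer) {
    pending_.erase(peer);
    sent_.erase(peer);
  }

 private:
  std::set<int> pending_;
  std::set<int> sent_;
};
```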
9 years ago - osd: fix send_failures() locking
Sage Weil [Fri, 18 Sep 2015 01:42:53 +0000 (21:42 -0400)]
osd: fix send_failures() locking

It is unsafe to check failure_queue.empty() without the lock.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: backoff the max reporting interval, too
Sage Weil [Thu, 17 Sep 2015 21:47:54 +0000 (17:47 -0400)]
osd: backoff the max reporting interval, too

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: no need for regular send_pg_temps
Sage Weil [Thu, 17 Sep 2015 21:47:43 +0000 (17:47 -0400)]
osd: no need for regular send_pg_temps

This is done by process_peering_events.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: just send alive when it is queue
Sage Weil [Fri, 18 Sep 2015 18:24:27 +0000 (14:24 -0400)]
osd: just send alive when it is queue

No need to futz with last_mon_report or resend it again later.

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - osd: fix pg stat reporting
Sage Weil [Wed, 16 Sep 2015 15:00:57 +0000 (11:00 -0400)]
osd: fix pg stat reporting

Signed-off-by: Sage Weil <sage@redhat.com>
9 years ago - Merge pull request #6565 from chenji-kael/patch-1
Loic Dachary [Mon, 23 Nov 2015 09:30:13 +0000 (10:30 +0100)]
Merge pull request #6565 from chenji-kael/patch-1

Update .organizationmap

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago - Merge pull request #6673 from dzafman/wip-rand-scrub-fix
David Zafman [Mon, 23 Nov 2015 01:11:47 +0000 (17:11 -0800)]
Merge pull request #6673 from dzafman/wip-rand-scrub-fix

Reviewed-by: Loic Dachary <ldachary@redhat.com>
9 years ago - osd: Only add random deep scrubs when NOT user initiated scrub [6673/head]
David Zafman [Sun, 22 Nov 2015 18:14:12 +0000 (10:14 -0800)]
osd: Only add random deep scrubs when NOT user initiated scrub

Signed-off-by: David Zafman <dzafman@redhat.com>
9 years ago - Revert "test: osd-scrub-snaps.sh: Randomized deep-scrubs can now happen during a...
David Zafman [Sun, 22 Nov 2015 18:13:18 +0000 (10:13 -0800)]
Revert "test: osd-scrub-snaps.sh: Randomized deep-scrubs can now happen during a scrub"

This reverts commit 0fe26c25c5fefd12628fcdba67be047f640b4afc.