git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
11 years agoOSD: use the osdmap_subscribe helper 1223/head
Greg Farnum [Tue, 11 Feb 2014 20:51:19 +0000 (12:51 -0800)]
OSD: use the osdmap_subscribe helper

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoOSD: create a helper for handling OSDMap subscriptions, and clean them up
Greg Farnum [Tue, 11 Feb 2014 21:34:39 +0000 (13:34 -0800)]
OSD: create a helper for handling OSDMap subscriptions, and clean them up

We've had some trouble with not clearing out subscription requests and
overloading the monitors (though only because of other bugs). Write a
helper for handling subscription requests that we can use to centralize
safety logic. Clear out the subscription whenever we get a map that covers
it; if there are more maps available than we received, we will issue another
subscription request based on "m->newest_map" at the end of handle_osd_map().

Notice that the helper will no longer request old maps which we already have,
and that unless forced it will not dispatch multiple subscribe requests
to a single monitor.
Skipping old maps is safe:
1) we only trim old maps when the monitor tells us to,
2) we do not send messages to our peers until we have updated our maps
from the monitor.
That means only old and broken OSDs will send us messages based on maps
in our past, and we can (and should) ignore any directives from them anyway.

Signed-off-by: Greg Farnum <greg@inktank.com>
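
The rules described in this commit message can be sketched roughly as follows. This is a minimal Python model, not the OSD code: the class name `OsdmapSubscriber`, its fields, and the `send` callback are hypothetical, and re-requesting from `last + 1` is an assumption about how the follow-up request is formed. Only the behaviors named above are modeled: old maps are not requested, duplicate requests are suppressed unless forced, and the subscription is cleared once a received map covers it.

```python
class OsdmapSubscriber:
    """Toy model of an OSDMap subscription helper (all names hypothetical)."""

    def __init__(self, send):
        self.send = send        # callback that delivers a subscribe request
        self.have_epoch = 0     # newest map epoch we already hold
        self.requested = None   # start epoch of the outstanding request, if any

    def subscribe(self, epoch, force=False):
        """Ask for maps starting at `epoch`; return True if a request was sent."""
        # Never ask for maps we already have.
        if self.have_epoch >= epoch:
            return False
        # Only move the requested start epoch forward, never backward.
        increased = self.requested is None or self.requested < epoch
        if increased:
            self.requested = epoch
        # Without `force`, an unchanged subscription is not sent again.
        if increased or force:
            self.send(self.requested)
            return True
        return False

    def handle_map(self, first, last, newest_map):
        """Record received maps [first, last]; follow up if more are available."""
        self.have_epoch = max(self.have_epoch, last)
        # Clear the subscription once a received map covers it.
        if self.requested is not None and self.requested <= last:
            self.requested = None
        # The monitor has newer maps than it sent us: subscribe again.
        if newest_map > last:
            self.subscribe(last + 1)
```

Keeping the "already have it" and "already asked for it" checks in one place is the point of the helper: callers can request an epoch freely without flooding the monitor.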
11 years agomonc: new sub_want_increment() function to make handling subscriptions easier
Greg Farnum [Tue, 11 Feb 2014 21:31:26 +0000 (13:31 -0800)]
monc: new sub_want_increment() function to make handling subscriptions easier

Provide a subscription-modifying function which will not decrement
the start version.

Signed-off-by: Greg Farnum <greg@inktank.com>
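
A subscription-modifying function that never decrements the start version might look like the sketch below. The dictionary layout and the name `want_increment` are assumptions made for illustration; the real MonClient data structures are not shown in this log.

```python
def want_increment(subs, what, start, flags=0):
    """Raise the start version for `what`; never lower it.

    `subs` maps a subscription name to {"start": ..., "flags": ...}.
    Returns True if the subscription was created or advanced.
    """
    current = subs.get(what)
    if current is None or current["start"] < start:
        subs[what] = {"start": start, "flags": flags}
        return True
    # An equal or older start version leaves the subscription untouched.
    return False
```

The boolean return value lets a caller decide whether anything changed and therefore whether a new request needs to go to the monitor.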
11 years agoMerge pull request #1205 from ceph/wip-7334
Josh Durgin [Mon, 10 Feb 2014 21:09:40 +0000 (13:09 -0800)]
Merge pull request #1205 from ceph/wip-7334

use `partx` for CentOS/RHEL instead of `partprobe`
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoMerge pull request #1204 from ceph/wip-fsetpipesz-fix
Josh Durgin [Mon, 10 Feb 2014 20:59:03 +0000 (12:59 -0800)]
Merge pull request #1204 from ceph/wip-fsetpipesz-fix

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
11 years agoalert the user about error messages from partx 1205/head
Alfredo Deza [Mon, 10 Feb 2014 20:07:55 +0000 (15:07 -0500)]
alert the user about error messages from partx

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
11 years agouse partx for red hat or centos instead of partprobe
Alfredo Deza [Fri, 7 Feb 2014 16:55:01 +0000 (11:55 -0500)]
use partx for red hat or centos instead of partprobe

Signed-off-by: Alfredo Deza <alfredo@deza.pe>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Mon, 10 Feb 2014 18:19:55 +0000 (10:19 -0800)]
Merge remote-tracking branch 'gh/next'

11 years agocommon/buffer: fix build breakage for CEPH_HAVE_SETPIPE_SZ 1204/head
Ilya Dryomov [Mon, 10 Feb 2014 17:34:44 +0000 (19:34 +0200)]
common/buffer: fix build breakage for CEPH_HAVE_SETPIPE_SZ

common/buffer.cc fails to build if CEPH_HAVE_SETPIPE_SZ is defined.
Fix it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agoconfigure: fix F_SETPIPE_SZ detection
Ilya Dryomov [Mon, 10 Feb 2014 17:34:44 +0000 (19:34 +0200)]
configure: fix F_SETPIPE_SZ detection

Currently CEPH_HAVE_SETPIPE_SZ is not set even if F_SETPIPE_SZ is
available, because AC_COMPILE_IFELSE test program as written always
fails to compile.  F_SETPIPE_SZ is a macro, so use AC_EGREP_CPP which
works on the preprocessor output instead of trying to compile.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agoconfigure: don't check for arpa/nameser_compat.h twice
Ilya Dryomov [Mon, 10 Feb 2014 17:34:44 +0000 (19:34 +0200)]
configure: don't check for arpa/nameser_compat.h twice

Nuke redundant check and move the real one into the common
AC_CHECK_HEADERS stanza.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
11 years agoMerge remote-tracking branch 'gh/wip-7329' into next
Sage Weil [Sun, 9 Feb 2014 18:37:47 +0000 (10:37 -0800)]
Merge remote-tracking branch 'gh/wip-7329' into next

11 years agoceph_test_rados_api_tier: try harder to trigger the flush vs try-flush race
Sage Weil [Sun, 9 Feb 2014 04:20:21 +0000 (20:20 -0800)]
ceph_test_rados_api_tier: try harder to trigger the flush vs try-flush race

It seems to be reasonably easy to complete a flush before the next client
request is processed.  Crazy...

Same with the flush vs write race.

Fixes: #7329
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1201 from ceph/wip-7370
Loic Dachary [Sun, 9 Feb 2014 00:12:46 +0000 (01:12 +0100)]
Merge pull request #1201 from ceph/wip-7370

crush: fix tries/retries bug that was recently introduced

Reviewed-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #1115 from jcsp/tell_cleanup
Loic Dachary [Sat, 8 Feb 2014 23:58:39 +0000 (00:58 +0100)]
Merge pull request #1115 from jcsp/tell_cleanup

Remove some almost-duplicate COMMAND definitions

Reviewed-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #1127 from dmsimard/log_links
Loic Dachary [Sat, 8 Feb 2014 23:33:42 +0000 (00:33 +0100)]
Merge pull request #1127 from dmsimard/log_links

Doc: Fix 404 broken links to logging and debug configuration

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
11 years agocrush: allow crush rules to set (re)tries counts to 0 1201/head
Sage Weil [Sat, 8 Feb 2014 20:23:05 +0000 (12:23 -0800)]
crush: allow crush rules to set (re)tries counts to 0

These two fields are misnomers; they are *retry* counts.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agocrush: fix off-by-one errors in total_tries refactor
Sage Weil [Tue, 4 Feb 2014 20:14:14 +0000 (12:14 -0800)]
crush: fix off-by-one errors in total_tries refactor

Back in 27f4d1f6bc32c2ed7b2c5080cbd58b14df622607 we refactored the CRUSH
code to allow adjustment of the retry counts on a per-pool basis.  That
commit had an off-by-one bug: the previous "tries" counter was a *retry*
count, not a *try* count, but the new code was passing in 1 meaning
there should be no retries.

Fix the ftotal vs tries comparison to use < instead of <= to fix the
problem.  Note that the original code used <= here, which means the
global "choose_total_tries" tunable is actually counting retries.
Compensate for that by adding 1 in crush_do_rule when we pull the tunable
into the local variable.

This was noticed looking at output from a user provided osdmap.
Unfortunately the map doesn't illustrate the change in mapping behavior
and I haven't managed to construct one yet that does.  Inspection of the
crush debug output now aligns with prior versions, though.

Signed-off-by: Sage Weil <sage@inktank.com>
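
The difference between a *try* count and a *retry* count can be illustrated with a simplified loop. This is only a model of the comparison discussed above, not the CRUSH code; the function name `attempts` and the value 50 used in the usage note are illustrative assumptions.

```python
def attempts(limit, inclusive):
    """Count how many placement attempts are made when every attempt fails.

    `limit` is compared against the failure counter with `<=` (inclusive)
    or `<` (exclusive), mirroring the two comparisons in the message above.
    """
    ftotal = 0  # failures so far
    made = 0    # attempts made so far
    while True:
        made += 1    # try once
        ftotal += 1  # the attempt failed
        retry = ftotal <= limit if inclusive else ftotal < limit
        if not retry:
            return made
```

With `<=`, a limit of N allows N retries, so N + 1 attempts in total. With `<`, the same value allows only N attempts, which is why adding 1 to the tunable keeps the old behavior: `attempts(50, inclusive=True)` and `attempts(50 + 1, inclusive=False)` both make 51 attempts.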
11 years agocrushtool: add cli test for off-by-one tries vs retries bug
Sage Weil [Sat, 8 Feb 2014 20:21:26 +0000 (12:21 -0800)]
crushtool: add cli test for off-by-one tries vs retries bug

See bug #7370.  This passes on dumpling and breaks prior to the #7370 fix.

Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Sat, 8 Feb 2014 16:23:12 +0000 (08:23 -0800)]
Merge remote-tracking branch 'gh/next'

11 years agoqa/workunits/rest: use larger max_file_size
Sage Weil [Sat, 8 Feb 2014 16:22:29 +0000 (08:22 -0800)]
qa/workunits/rest: use larger max_file_size

64k is the min.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoscript to test rgw multi part uploads using s3 interface
tamil [Sat, 8 Feb 2014 06:15:11 +0000 (22:15 -0800)]
script to test rgw multi part uploads using s3 interface

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
(cherry picked from commit 5d59dd9cd67834d991b038323bcbc3e8f8612229)

11 years agoscript to test rgw multi part uploads using s3 interface
tamil [Sat, 8 Feb 2014 06:15:11 +0000 (22:15 -0800)]
script to test rgw multi part uploads using s3 interface

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
11 years agoMerge branch 'next' of github.com:ceph/ceph into next
tamil [Sat, 8 Feb 2014 01:10:10 +0000 (17:10 -0800)]
Merge branch 'next' of github.com:ceph/ceph into next

11 years agoadded script to test rgw user quota
tamil [Sat, 8 Feb 2014 01:09:30 +0000 (17:09 -0800)]
added script to test rgw user quota

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
11 years agoscript to test rgw user quota functionality
tamil [Fri, 7 Feb 2014 23:34:05 +0000 (15:34 -0800)]
script to test rgw user quota functionality

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
11 years agoMerge pull request #1197 from ceph/wip-osdmap-primary
Gregory Farnum [Fri, 7 Feb 2014 18:21:40 +0000 (10:21 -0800)]
Merge pull request #1197 from ceph/wip-osdmap-primary

osd/OSDMap: populate *primary when pool dne

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agoosd/OSDMap: populate *primary when pool dne 1197/head
Sage Weil [Fri, 7 Feb 2014 17:38:37 +0000 (09:38 -0800)]
osd/OSDMap: populate *primary when pool dne

This fixes a valgrind error from OSD::handle_osd_map where primary is not
initialized and is compared after the call to pg_to_acting_osds().

We are still not distinguishing between "no mapping" and "pool doesn't exist,
no mapping".  That is a somewhat larger change, though.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1173 from ceph/wip-7109
Gregory Farnum [Fri, 7 Feb 2014 17:26:38 +0000 (09:26 -0800)]
Merge pull request #1173 from ceph/wip-7109

Fix #7109: Prevent removal of default data pool

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agorgw: initialize variable before call
Yehuda Sadeh [Wed, 5 Feb 2014 23:19:51 +0000 (15:19 -0800)]
rgw: initialize variable before call

Need to initialize the truncated variable, as we sometimes ignore error
response (e.g., with ENOENT), and in such cases we can't expect it to be
set.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 9ecf3467a3efc071954c4f69a23b51e677742c1f)

11 years agoMerge pull request #1191 from ceph/wip-rgw-vg
Sage Weil [Fri, 7 Feb 2014 16:43:21 +0000 (08:43 -0800)]
Merge pull request #1191 from ceph/wip-rgw-vg

rgw: initialize variable before call

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1194 from dachary/wip-erasure-code-directory
Loic Dachary [Fri, 7 Feb 2014 16:10:29 +0000 (17:10 +0100)]
Merge pull request #1194 from dachary/wip-erasure-code-directory

erasure code directory

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
11 years agoMerge pull request #1178 from ceph/wip-osdmaptool
Sage Weil [Fri, 7 Feb 2014 15:12:11 +0000 (07:12 -0800)]
Merge pull request #1178 from ceph/wip-osdmaptool

osdmaptool: add --test-map-pgs mode

Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
11 years agoosdmaptool: tests for --test-map-pgs 1178/head
Loic Dachary [Fri, 7 Feb 2014 11:13:27 +0000 (12:13 +0100)]
osdmaptool: tests for --test-map-pgs

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoosdmaptool: test --import/export-crush
Loic Dachary [Fri, 7 Feb 2014 11:10:47 +0000 (12:10 +0100)]
osdmaptool: test --import/export-crush

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoosdmaptool: s/simple.t/missing-argument.t/
Loic Dachary [Fri, 7 Feb 2014 08:36:07 +0000 (09:36 +0100)]
osdmaptool: s/simple.t/missing-argument.t/

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoosdmaptool: fix cli tests
Sage Weil [Tue, 4 Feb 2014 01:47:18 +0000 (17:47 -0800)]
osdmaptool: fix cli tests

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosdmaptool: allow a completely random placement
Sage Weil [Tue, 4 Feb 2014 00:32:48 +0000 (16:32 -0800)]
osdmaptool: allow a completely random placement

This is useful for comparison purposes and sanity-checking the results.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosdmaptool: add --test-map-pgs mode
Sage Weil [Tue, 4 Feb 2014 00:19:07 +0000 (16:19 -0800)]
osdmaptool: add --test-map-pgs mode

This command will map all pgs from all pools (or just one pool) to osds
and summarize the placement and calculate the actual standard deviation and
the expected value.

Signed-off-by: Sage Weil <sage@inktank.com>
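
The kind of summary described above can be sketched as follows. This is not the osdmaptool implementation: the function name, the input format, and the choice of baseline (the standard deviation of a binomial count, as if every placement picked an OSD uniformly at random) are assumptions for illustration.

```python
import math
from collections import Counter


def placement_summary(pg_to_osds, num_osds):
    """Summarize how many PG replicas land on each OSD.

    `pg_to_osds` maps a PG id to the list of OSD ids it maps to.
    """
    counts = Counter()
    for osds in pg_to_osds.values():
        counts.update(osds)
    per_osd = [counts.get(i, 0) for i in range(num_osds)]
    total = sum(per_osd)
    avg = total / num_osds
    # Actual (population) standard deviation of the per-OSD counts.
    stddev = math.sqrt(sum((c - avg) ** 2 for c in per_osd) / num_osds)
    # Assumed baseline: std dev of a binomial(total, 1/num_osds) count.
    p = 1 / num_osds
    expected = math.sqrt(total * p * (1 - p))
    return {"per_osd": per_osd, "avg": avg,
            "stddev": stddev, "expected_stddev": expected}
```

Comparing the measured deviation with a random-placement baseline gives a quick sense of whether a mapping is more or less even than chance.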
11 years agorest/test.py: use larger max_file_size for mds set test
Sage Weil [Fri, 7 Feb 2014 14:11:57 +0000 (06:11 -0800)]
rest/test.py: use larger max_file_size for mds set test

Current min is 64k.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoRevert test case of "mon: OSDMonitor: do not allow changing an erasure-coded pool...
David Zafman [Fri, 7 Feb 2014 03:13:20 +0000 (19:13 -0800)]
Revert test case of "mon: OSDMonitor: do not allow changing an erasure-coded pool's size"

This reverts part of commit c8c4cc6e81816069886af6bff968712993554759.

Fixes: #7355
Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agoerasure-code: move test files to a dedicated directory 1194/head
Loic Dachary [Thu, 6 Feb 2014 10:28:21 +0000 (11:28 +0100)]
erasure-code: move test files to a dedicated directory

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoerasure-code: move source files to a dedicated directory
Loic Dachary [Thu, 6 Feb 2014 10:00:33 +0000 (11:00 +0100)]
erasure-code: move source files to a dedicated directory

The src/erasure-code directory contains the erasure-code plugin system
and the jerasure plugin. It is moved out of OSD because it now belongs
to a convenience library ( LIBERASURE_CODE ) which is used both by OSDs
and MONs.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #1188 from dachary/wip-7339
Loic Dachary [Thu, 6 Feb 2014 00:38:42 +0000 (01:38 +0100)]
Merge pull request #1188 from dachary/wip-7339

DNM: define pg_pool_t::stripe_width and set from erasure-code plugin on erasure pool creation

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agorgw: initialize variable before call 1191/head
Yehuda Sadeh [Wed, 5 Feb 2014 23:19:51 +0000 (15:19 -0800)]
rgw: initialize variable before call

Need to initialize the truncated variable, as we sometimes ignore error
response (e.g., with ENOENT), and in such cases we can't expect it to be
set.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
11 years agoMerge pull request #1190 from ceph/wip-snaptest-next
Sage Weil [Wed, 5 Feb 2014 21:04:44 +0000 (13:04 -0800)]
Merge pull request #1190 from ceph/wip-snaptest-next

qa/workunits/snaps: New allow_new_snaps syntax

11 years agoqa/workunits/snaps: New allow_new_snaps syntax 1190/head
John Spray [Wed, 5 Feb 2014 18:44:40 +0000 (18:44 +0000)]
qa/workunits/snaps: New allow_new_snaps syntax

These were probably just obscuring other failures.

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: John Spray <john.spray@inktank.com>
11 years agomon: test osd pool create pg_pool_t::stripe_width behavior 1188/head
Loic Dachary [Wed, 5 Feb 2014 19:44:32 +0000 (20:44 +0100)]
mon: test osd pool create pg_pool_t::stripe_width behavior

* Check that the default from the configuration options is found in the
  output of osd dump
* Check that specifying an undersized osd_pool_erasure_code_stripe_width
  value is taken into account and padded.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: osd pool create sets pg_pool_t::stripe_width
Loic Dachary [Wed, 5 Feb 2014 19:40:46 +0000 (20:40 +0100)]
mon: osd pool create sets pg_pool_t::stripe_width

It does nothing if the pool is replicated. Otherwise it takes
osd_pool_erasure_code_stripe_width as the desired stripe width and passes
it to get_chunk_size() on the erasure code plugin to get the actual
stripe_width. The result is always >= the desired stripe_width, padded
to match the alignment constraints imposed by the erasure code plugin.

Signed-off-by: Loic Dachary <loic@dachary.org>
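
The padding step can be illustrated with a small calculation. The function below is a sketch under stated assumptions: the name `padded_stripe_width` is hypothetical, and the alignment is taken to be a multiple of the number of data chunks `k`, which may not hold for every plugin.

```python
def padded_stripe_width(desired, k, alignment):
    """Round `desired` up to the next multiple of `alignment`.

    The result is never smaller than `desired` and splits evenly
    into `k` data chunks (given the assumption checked below).
    """
    if alignment % k != 0:
        raise ValueError("this sketch assumes alignment is a multiple of k")
    stripes = -(-desired // alignment)  # ceiling division
    return stripes * alignment
```

For example, a desired width of 4000 bytes with k=3 and an assumed alignment of 96 is padded to 4032, which is 1344 bytes per data chunk.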
11 years agocommon: add osd_pool_erasure_code_stripe_width
Loic Dachary [Wed, 5 Feb 2014 19:34:37 +0000 (20:34 +0100)]
common: add osd_pool_erasure_code_stripe_width

and document it.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agounittests: update osdmaptools with stripe_width
Loic Dachary [Wed, 5 Feb 2014 19:33:21 +0000 (20:33 +0100)]
unittests: update osdmaptools with stripe_width

stripe_width 0 now shows on every osd dump.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: add erasure-code pg_pool_t::stripe_width
Loic Dachary [Wed, 5 Feb 2014 19:31:18 +0000 (20:31 +0100)]
mon: add erasure-code pg_pool_t::stripe_width

Contains the actual stripe size used by erasure coded pools. It is
initialized to zero by default and has to be explicitly set by erasure
coded pools. get/set methods are added inline.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #1187 from ceph/port/fixes
Sage Weil [Wed, 5 Feb 2014 19:19:54 +0000 (11:19 -0800)]
Merge pull request #1187 from ceph/port/fixes

portability fixes

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoosd: fix type mismatch warning 1187/head
Noah Watkins [Wed, 5 Feb 2014 16:37:24 +0000 (08:37 -0800)]
osd: fix type mismatch warning

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
11 years agoos/kvstore: remove unused var
Noah Watkins [Wed, 5 Feb 2014 16:37:08 +0000 (08:37 -0800)]
os/kvstore: remove unused var

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
11 years agoos/kvstore: trivial portability fixes
Noah Watkins [Wed, 5 Feb 2014 16:36:56 +0000 (08:36 -0800)]
os/kvstore: trivial portability fixes

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
11 years agocommon: simpler erasure code technique
Loic Dachary [Wed, 5 Feb 2014 16:27:09 +0000 (17:27 +0100)]
common: simpler erasure code technique

Change the default technique from Cauchy to ReedSolomon. Although it is
less efficient, the alignment constraints are more intuitive.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #1183 from ceph/wip-7336
Sage Weil [Wed, 5 Feb 2014 05:33:54 +0000 (21:33 -0800)]
Merge pull request #1183 from ceph/wip-7336

rgw: fix rgw_read_user_buckets() use of max param

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agorgw: fix rgw_read_user_buckets() use of max param 1183/head
Yehuda Sadeh [Tue, 4 Feb 2014 18:34:02 +0000 (10:34 -0800)]
rgw: fix rgw_read_user_buckets() use of max param

Fixes: #7336
The rgw_read_user_buckets() treated the max param as the max number of
entries to request in a single op, but always fetched the entire list
of buckets. This is wrong, as it should have treated it as the total
number of entries requested. All the callers assume the latter.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
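
Treating `max` as the total number of entries requested, rather than a per-operation page size, can be sketched like this. The names `read_user_buckets`, `fetch_page`, and `make_backend` are hypothetical stand-ins, and the backend is a fake in-memory list, not the rgw API.

```python
def read_user_buckets(fetch_page, max_total, page_size=1000):
    """Return at most `max_total` bucket names, fetched page by page.

    `fetch_page(marker, count)` returns (names, truncated), where
    `truncated` is True when more entries remain after this page.
    """
    buckets = []
    marker = ""
    truncated = True
    while truncated and len(buckets) < max_total:
        # Never request more than the caller still wants in total.
        want = min(page_size, max_total - len(buckets))
        names, truncated = fetch_page(marker, want)
        if not names:
            break
        buckets.extend(names)
        marker = names[-1]
    return buckets, truncated


def make_backend(all_names):
    """Build a fake, sorted, in-memory listing backend for the sketch."""
    def fetch_page(marker, count):
        rest = [n for n in all_names if n > marker]
        page = rest[:count]
        return page, len(rest) > len(page)
    return fetch_page
```

The page size only controls how many entries each operation asks for; the loop stops as soon as the caller's total has been reached.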
11 years agoMerge pull request #1182 from ceph/wip-mds-cluster
Sage Weil [Tue, 4 Feb 2014 17:18:48 +0000 (09:18 -0800)]
Merge pull request #1182 from ceph/wip-mds-cluster

mds: avoid sending duplicated discovers during recovery

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1177 from dachary/wip-erasure-code-command
Loic Dachary [Tue, 4 Feb 2014 16:34:37 +0000 (08:34 -0800)]
Merge pull request #1177 from dachary/wip-erasure-code-command

erasure code command

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
11 years agomon: MDSMonitor: Forbid removal of first data pool 1173/head
John Spray [Fri, 31 Jan 2014 17:22:13 +0000 (17:22 +0000)]
mon: MDSMonitor: Forbid removal of first data pool

Because inodes from other pools now store their backtrace
on the first/default data pool, it is special and may not
be removed.

Fixes: 7109
Signed-off-by: John Spray <john.spray@inktank.com>
11 years agomon: OSDMonitor: Refuse to delete CephFS pools
John Spray [Fri, 31 Jan 2014 16:25:42 +0000 (16:25 +0000)]
mon: OSDMonitor: Refuse to delete CephFS pools

To avoid confusing CephFS, don't permit deletion
of pools which are in use as the metadata pool
or any of the data pools.

Signed-off-by: John Spray <john.spray@inktank.com>
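
The two guards described in the commits above amount to simple membership checks. The sketch below uses hypothetical function names and plain integers for pool ids; it is not the monitor code.

```python
def may_delete_pool(pool_id, metadata_pool, data_pools):
    """Refuse to delete a pool that CephFS uses. Returns (ok, reason)."""
    if pool_id == metadata_pool:
        return False, "pool is in use as the CephFS metadata pool"
    if pool_id in data_pools:
        return False, "pool is in use as a CephFS data pool"
    return True, ""


def may_remove_data_pool(pool_id, data_pools):
    """Refuse to remove the first (default) data pool from the filesystem."""
    if data_pools and pool_id == data_pools[0]:
        # Inodes in other pools store their backtrace here.
        return False, "first data pool may not be removed"
    return True, ""
```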
11 years agoerasure-code: add ceph_erasure_code debug command 1177/head
Loic Dachary [Mon, 3 Feb 2014 13:00:41 +0000 (14:00 +0100)]
erasure-code: add ceph_erasure_code debug command

It loads a designated erasure-code plugin and calls its
methods. It is convenient to figure out and tune the number of data
chunks, the size of an aligned chunk etc. For instance:

ceph_erasure_code \
      --parameter erasure-code-plugin=jerasure \
      --parameter erasure-code-directory=.libs \
      --parameter erasure-code-technique=reed_sol_van \
      --parameter erasure-code-k=2 \
      --parameter erasure-code-m=2 \
      --all

displays the chunk size when encoding an object of 1024 bytes.

get_chunk_size(1024) 512
get_data_chunk_count 2
get_chunk_count 4

Signed-off-by: Loic Dachary <loic@dachary.org>
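
The three values in the example output are related in a simple way, which the toy model below reproduces. The class `ToyErasureCode` and the alignment of 64 bytes used in the usage note are assumptions for illustration; the real plugin computes its own alignment.

```python
class ToyErasureCode:
    """Toy stand-in for an erasure code plugin with k data and m coding chunks."""

    def __init__(self, k, m, alignment):
        self.k = k
        self.m = m
        self.alignment = alignment  # assumed to be a multiple of k

    def get_data_chunk_count(self):
        return self.k

    def get_chunk_count(self):
        return self.k + self.m

    def get_chunk_size(self, object_size):
        # Pad the object to the alignment, then split it across k data chunks.
        padded = -(-object_size // self.alignment) * self.alignment
        return padded // self.k


def report(code, object_size):
    """Format the values in the same shape as the output shown above."""
    return [
        "get_chunk_size(%d) %d" % (object_size, code.get_chunk_size(object_size)),
        "get_data_chunk_count %d" % code.get_data_chunk_count(),
        "get_chunk_count %d" % code.get_chunk_count(),
    ]
```

Calling `report(ToyErasureCode(k=2, m=2, alignment=64), 1024)` yields the same three lines as the example: a 1024-byte object split over 2 data chunks gives 512 bytes per chunk, and 2 + 2 gives 4 chunks in total.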
11 years agomds: avoid sending duplicated discovers during recovery 1182/head
Yan, Zheng [Fri, 10 Jan 2014 01:55:50 +0000 (09:55 +0800)]
mds: avoid sending duplicated discovers during recovery

If MDS just entered the rejoin state, it should not kick discovers
because the discovers were just sent. Similarly, if MDS just entered
the clientreplay state, it should not call MDS::handle_mds_recovery()
because MDS::recovery_done() has already recovered the table server.

Also make MDCache::handle_mds_recovery() not wake the discover waiters
up. Because the MDCache::kick_discovers re-sends the discovers, their
replies will wake the discover waiter up.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agoerasure-code: benchmark moves to a dedicated directory
Loic Dachary [Mon, 3 Feb 2014 11:50:20 +0000 (12:50 +0100)]
erasure-code: benchmark moves to a dedicated directory

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #1180 from dachary/wip-7313
Loic Dachary [Tue, 4 Feb 2014 11:21:46 +0000 (03:21 -0800)]
Merge pull request #1180 from dachary/wip-7313

mon: check cluster features before rule create-erasure

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomon: check cluster features before rule create-erasure 1180/head
Loic Dachary [Tue, 4 Feb 2014 00:31:52 +0000 (01:31 +0100)]
mon: check cluster features before rule create-erasure

Encapsulate the logic used when creating an erasure coded pool into the
check_cluster_features helper.

check_cluster_features(CEPH_FEATURE_CRUSH_V2) is required for crush rule
create-erasure because it is expected that the erasure code plugin will
use indep instead of firstn and expect the V2 behavior and not the
legacy behavior.

CEPH_FEATURE_CRUSH_V2 is added to CEPH_FEATURE_OSD_ERASURE_CODES
when an erasure coded pool is created. It is necessary because the pool
won't function properly if given an indep ruleset that does not
implement the V2 behavior.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge pull request #1144 from dachary/wip-7146
Loic Dachary [Tue, 4 Feb 2014 10:38:24 +0000 (02:38 -0800)]
Merge pull request #1144 from dachary/wip-7146

osd crush rule create-erasure

gitbuilder is green/yellow for wip-7146

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
11 years agomon: OSDMonitor: do not allow changing an erasure-coded pool's size 1144/head
Joao Eduardo Luis [Sun, 2 Feb 2014 14:02:17 +0000 (14:02 +0000)]
mon: OSDMonitor: do not allow changing an erasure-coded pool's size

Fixes: 7277
Reviewed-by: Loic Dachary <loic@dachary.org>
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agomon: osd-pool-create test no longer use hardcoded ruleset
Loic Dachary [Sun, 2 Feb 2014 10:35:13 +0000 (11:35 +0100)]
mon: osd-pool-create test no longer use hardcoded ruleset

For erasure-code the ruleset must be specified instead of relying on a
hardcoded value. Adapt the test to this for tests that do not otherwise
change behavior.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: osd-pool-create test EAGAIN when pending
Loic Dachary [Mon, 3 Feb 2014 21:59:30 +0000 (22:59 +0100)]
mon: osd-pool-create test EAGAIN when pending

Test that if the ruleset is found in the pending paxos proposal, the
pool creation will try again.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: test erasure code pool creation
Loic Dachary [Sun, 2 Feb 2014 09:40:37 +0000 (10:40 +0100)]
mon: test erasure code pool creation

* The sequence now is a) create ruleset, b) create pool.
* Check that not specifying the ruleset when an erasure coded pool is
  created fails
* Check that specifying a nonexistent ruleset when a pool is created
  fails
* Check that osd dump shows the expected ruleset. It assumes ruleset
  numbers are allocated in sequence, i.e. the ruleset will be 1.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: osd-pool-create test enforce -e
Loic Dachary [Sun, 2 Feb 2014 09:37:18 +0000 (10:37 +0100)]
mon: osd-pool-create test enforce -e

Use

   ! grep foo || exit 1

instead of

   grep foo && exit 1

so that all commands have a successful exit code. Otherwise set -e is
supposed to fail on them.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: osd-pool-create test must kill -9
Loic Dachary [Sun, 2 Feb 2014 09:34:43 +0000 (10:34 +0100)]
mon: osd-pool-create test must kill -9

If a MON is stuck, kill -15 won't be enough and the test will hang
forever.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: osd-pool-create test initialization
Loic Dachary [Sun, 2 Feb 2014 09:33:20 +0000 (10:33 +0100)]
mon: osd-pool-create test initialization

* reduce the paxos propose interval to speed up tests
* load erasure code plugins from sources

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: osd-pool-create shows logs on error
Loic Dachary [Sun, 2 Feb 2014 09:31:48 +0000 (10:31 +0100)]
mon: osd-pool-create shows logs on error

Extracting the logs would otherwise require a modification of the test
file to not clobber the directory containing the logs when it exits.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agovstart: load erasure code plugins from sources
Loic Dachary [Tue, 4 Feb 2014 00:12:31 +0000 (01:12 +0100)]
vstart: load erasure code plugins from sources

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agounittests: reduce paxos propose interval to increase speed
Loic Dachary [Sun, 2 Feb 2014 08:38:23 +0000 (09:38 +0100)]
unittests: reduce paxos propose interval to increase speed

The MONs are stressed more often and there is less aggregation of the
pending requests. But the unit tests are only meant to verify that a
known code path exists and performs as expected, therefore it will not
make a difference. And if it does, it is a bug that needs fixing.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agopybind: osd crush rule create-erasure tests
Loic Dachary [Wed, 29 Jan 2014 14:01:00 +0000 (15:01 +0100)]
pybind: osd crush rule create-erasure tests

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agopybind: cosmetic changes to tests
Loic Dachary [Wed, 29 Jan 2014 13:59:46 +0000 (14:59 +0100)]
pybind: cosmetic changes to tests

* untabify
* re-indent
* add 2014 to copyright notice

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agopybind: fix tests that do not fail as expected
Loic Dachary [Wed, 29 Jan 2014 13:52:22 +0000 (14:52 +0100)]
pybind: fix tests that do not fail as expected

A missing argument does make the test fail, but the intended test is
to demonstrate something else (character validation, excess
arguments, etc.). The result is {} instead of None, which is what should
have been expected in the first place.

Ideally there would be a more verbose way to check for syntactic errors
to make such mistakes less probable.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: osd crush rule create-erasure
Loic Dachary [Sun, 26 Jan 2014 17:52:50 +0000 (18:52 +0100)]
mon: osd crush rule create-erasure

Delegates the creation of the rule to the erasure code plugin associated
with the specified pool.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: compute the ruleset of erasure-coded pools
Loic Dachary [Sun, 2 Feb 2014 09:05:59 +0000 (10:05 +0100)]
mon: compute the ruleset of erasure-coded pools

The default ruleset of an erasure coded pool may depend on the
parameters used to configure it. In the case of a pyramidal /
hierarchical plugin, the desired ruleset will, for instance, choose from
datacenters and then from racks and disperse local coding chunks among
them.

For this reason the default ruleset cannot be hardcoded in config_opts
as it is for replicated pools. Instead, the "crush_ruleset" property is
interpreted to be the name of an existing crush ruleset to be used.

If the corresponding ruleset is found in a pending crushmap, the
prepare_pool_crush_ruleset will return EAGAIN. The "osd pool create"
caller is modified to handle the EAGAIN error and reschedules the message.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
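
The EAGAIN handling described above can be sketched as follows. The name prepare_pool_crush_ruleset comes from the commit message, but its signature here, the `create_pool` wrapper, the dictionaries standing in for the committed and pending crushmaps, and the use of ENOENT for an unknown ruleset are all assumptions for illustration.

```python
import errno


def prepare_pool_crush_ruleset(name, committed, pending):
    """Look up a ruleset by name. Returns (error, ruleset_id)."""
    if name in committed:
        return 0, committed[name]
    if name in pending:
        # The ruleset exists only in a pending crushmap: try again later.
        return -errno.EAGAIN, None
    return -errno.ENOENT, None


def create_pool(pool, ruleset_name, committed, pending, retry_queue):
    """Create a pool, rescheduling the request if the ruleset is pending."""
    err, _ruleset = prepare_pool_crush_ruleset(ruleset_name, committed, pending)
    if err == -errno.EAGAIN:
        retry_queue.append((pool, ruleset_name))  # reschedule the message
    return err
```

Once the pending crushmap is committed, replaying the queued request finds the ruleset and succeeds.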
11 years agomon: compute the size of erasure-coded pools
Loic Dachary [Sun, 2 Feb 2014 09:04:48 +0000 (10:04 +0100)]
mon: compute the size of erasure-coded pools

It is K+M ( data chunks + coding chunks ) as returned by the
get_chunk_count() method of the erasure code plugin.

http://tracker.ceph.com/issues/7277 refs #7277

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: erasure code plugin loader helper
Loic Dachary [Sun, 2 Feb 2014 08:59:52 +0000 (09:59 +0100)]
mon: erasure code plugin loader helper

The get_erasure_code helper loads the erasure code plugin found in the
erasure-code-plugin string of the properties argument. It is meant to be
used to query the plugin to determine the desired size of a pool, the
more suitable ruleset to use etc.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: pool create helper for crush ruleset
Loic Dachary [Sun, 2 Feb 2014 08:56:13 +0000 (09:56 +0100)]
mon: pool create helper for crush ruleset

The crush ruleset of replicated pools is by default set to
osd_pool_default_crush_replicated_ruleset but it may vary depending on
the pool type. Create a helper to compute the crush ruleset.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: pool creation helper for size
Loic Dachary [Sun, 2 Feb 2014 08:51:50 +0000 (09:51 +0100)]
mon: pool creation helper for size

The size of replicated pools is by default set to
osd_pool_default_size but it may vary depending on the pool type. Create
a helper to compute the pool size.

http://tracker.ceph.com/issues/7277 refs #7277

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: no default ruleset except for replicated pools
Loic Dachary [Sat, 1 Feb 2014 09:21:00 +0000 (10:21 +0100)]
mon: no default ruleset except for replicated pools

Remove the hardcoded default ruleset for erasure coded pools and only
keep it for replicated pools. Move the logic up in the  prepare_new_pool
method so that an error code can be returned before allocating the new
pending pool in case the ruleset is not initialized.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agomon: helper for pool properties parsing
Loic Dachary [Sat, 1 Feb 2014 09:09:12 +0000 (10:09 +0100)]
mon: helper for pool properties parsing

Add the prepare_pool_properties to convert the properties vector into a
properties map suitable for either initializing the pg_pool_t member or
an erasure code plugin.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
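
Converting a properties vector into a map can be sketched as below. The "key=value" string form is an assumption (it matches the `--parameter erasure-code-k=2` style seen elsewhere in this log), and mapping a bare key to an empty value is a choice made for this sketch only.

```python
def prepare_pool_properties(properties):
    """Turn ["key=value", ...] into {"key": "value", ...}.

    An item without "=" maps to an empty string. If a key repeats,
    the last occurrence wins.
    """
    result = {}
    for item in properties:
        key, _sep, value = item.partition("=")
        result[key] = value
    return result
```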
11 years agoerasure-code: test ErasureCodeJerasure::create_ruleset
Loic Dachary [Wed, 29 Jan 2014 14:10:58 +0000 (15:10 +0100)]
erasure-code: test ErasureCodeJerasure::create_ruleset

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoerasure-code: implement ErasureCodeJerasure::create_ruleset
Loic Dachary [Wed, 29 Jan 2014 14:08:01 +0000 (15:08 +0100)]
erasure-code: implement ErasureCodeJerasure::create_ruleset

It is based on CrushWrapper::add_simple_ruleset, using a "default" root
and "host" failure domain by default. They can be overridden with
erasure-code parameters ( erasure-code-ruleset-root and
erasure-code-ruleset-failure-domain respectively ).

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
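
The defaulting behavior described above can be sketched with a plain dictionary of erasure-code parameters. The function name `ruleset_placement` is hypothetical; the parameter names and default values are the ones given in the commit message.

```python
def ruleset_placement(parameters):
    """Pick the CRUSH root and failure domain for a new ruleset."""
    root = parameters.get("erasure-code-ruleset-root", "default")
    failure_domain = parameters.get(
        "erasure-code-ruleset-failure-domain", "host")
    return root, failure_domain
```

With no overrides this returns `("default", "host")`; setting `erasure-code-ruleset-failure-domain` to, say, `rack` changes only the second value.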
11 years agoerasure-code: implement example create_ruleset
Loic Dachary [Wed, 29 Jan 2014 14:06:41 +0000 (15:06 +0100)]
erasure-code: implement example create_ruleset

And the associated unit tests.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoerasure-code: add crush ruleset creation API
Loic Dachary [Sun, 26 Jan 2014 17:54:37 +0000 (18:54 +0100)]
erasure-code: add crush ruleset creation API

Because only the erasure code plugin knows enough to create a ruleset
that is best suited for a given set of parameters.

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoerasure-code: the plugin is in a convenience library
Loic Dachary [Sun, 26 Jan 2014 17:51:08 +0000 (18:51 +0100)]
erasure-code: the plugin is in a convenience library

So that it can be used by mon without linking with libosd

Reviewed-By: Christophe Courtaut <christophe.courtaut@gmail.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Tue, 4 Feb 2014 06:20:47 +0000 (22:20 -0800)]
Merge remote-tracking branch 'gh/next'

11 years agodoc/release-notes: v0.77 draft notes
Sage Weil [Tue, 4 Feb 2014 05:55:45 +0000 (21:55 -0800)]
doc/release-notes: v0.77 draft notes

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agodoc/release-notes: v0.76
Sage Weil [Tue, 4 Feb 2014 05:40:59 +0000 (21:40 -0800)]
doc/release-notes: v0.76

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoclient: fix warnings
Sage Weil [Tue, 4 Feb 2014 05:12:41 +0000 (21:12 -0800)]
client: fix warnings

client/Client.cc: In member function 'int Client::_read(Fh*, int64_t, uint64_t, ceph::bufferlist*)':
warning: client/Client.cc:5893:27: comparison between signed and unsigned integer expressions [-Wsign-compare]
client/Client.cc: In member function 'int Client::_write(Fh*, int64_t, uint64_t, const char*)':
warning: client/Client.cc:6235:30: comparison between signed and unsigned integer expressions [-Wsign-compare]

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoos/KeyValueStore: fix warning
Sage Weil [Tue, 4 Feb 2014 00:54:52 +0000 (16:54 -0800)]
os/KeyValueStore: fix warning

Signed-off-by: Sage Weil <sage@inktank.com>