Loic Dachary [Thu, 28 May 2015 08:35:51 +0000 (10:35 +0200)]
debian: ceph-dbg steals ceph-objectstore-tool from ceph-test-dbg
When ceph-objectstore-tool was moved from ceph-test to
ceph by 61cf5da0b51e2d9578c7b4bca85184317e30f4ca, the ceph package in
debian/control was updated accordingly, as recommended by
https://www.debian.org/doc/debian-policy/ch-relationships.html#s-replaces
The same must be done for the ceph-dbg package because
/usr/lib/debug/usr/bin/ceph-objectstore-tool is no longer in
ceph-test-dbg.
Although the change was merged on May 6th, 2015
(8f23382064c189b657564d58c3f9d17720e891ed), teuthology jobs were not
always failing because packages were not systematically upgraded during
the installation. The missing dependencies that were responsible for
this upgrade problem were fixed by
f898ec1e4e3472b0202280f09653a769fc62c8d3 on May 18th, 2015, and all
upgrade tests relying on ceph-*-dbg packages started to fail
systematically after this date.
Samuel Just [Wed, 27 May 2015 18:00:54 +0000 (11:00 -0700)]
ReplicatedPG: start_flush: use filtered snapset
Otherwise, we might send our deletes based on deleted snaps. This is
problematic since we may have trimmed the clones to which those snaps
belong, causing us to send them at an earlier snap than we used before.
The specific situation was
78:[78, 70, 63, 5a, 58, 57]:[64(63), 58(58, 57)]
with 58 already clean. To flush 64, we send:
delete@58
delete@59
copyfrom@62
Then, snap 63 is trimmed leaving us with a snapset of:
78:[78, 70, 63, 5a, 58, 57]:[58(58, 57)]
since trim_object doesn't filter the head object snapset snaps. This
isn't really a bug since in general all snapset users must be aware
that there may be trimmed snaps in snapset::snaps. However, here
it becomes a problem when we go to flush head:
delete@58 -- ignored due to snapc
delete@59 -- ignored due to snapc
copyfrom@78 -- not ignored
The base pool head is at snap seq 62, so it clones that value into
clone 78(78, 70) instead of forgetting it. What should have happened
is that we should have based our flushes on filtered snapset:
78:[78, 70, 58, 57]:[58(58, 57)]
Causing us to instead send:
delete@58 -- ignored due to snapc
delete@69 -- not ignored, causes no clone to be made
copyfrom@78 -- not ignored, updates head such that a subsequent clone
will leave 70 out of the clone snaps vector.
Fixes: 11787
Signed-off-by: Samuel Just <sjust@redhat.com>
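The filtering step this message relies on can be sketched as follows. This is a minimal Python illustration, not the ReplicatedPG code: the function name `filter_snapset` and the explicit set of trimmed snaps are assumptions made for the example. Snap ids are written in hex, as in the message.

```python
def filter_snapset(snaps, trimmed):
    """Return snaps without the trimmed entries, keeping newest-first order."""
    return [s for s in snaps if s not in trimmed]


# Snap ids quoted in the commit message (hex).
snaps = [0x78, 0x70, 0x63, 0x5A, 0x58, 0x57]
# Assumption: 63 and 5a are the trimmed snaps, because they are the ones
# absent from the filtered snapset quoted in the message.
trimmed = {0x63, 0x5A}

filtered = filter_snapset(snaps, trimmed)
print([format(s, "x") for s in filtered])
```

Basing the flush on `filtered` rather than on `snaps` is what keeps trimmed snaps from influencing the operations sent to the base pool.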
Ilya Dryomov [Fri, 15 May 2015 18:44:27 +0000 (21:44 +0300)]
doc: fix crush-ruleset-name param description
A specified crush-ruleset-name is required to exist; implicit creation
happens only if crush-ruleset-name was not specified on the command
line. While at it, note that pool-name is very much a required param.
Validate osd_pool_default_crush_{replicated_ruleset,rule} config
options, in particular when creating pools. Otherwise "ceph osd pool
create foo <pg_num>" may end up creating pools with non-existent
rulesets.
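A check of this kind might look like the sketch below. It is a hedged Python illustration only: `check_default_ruleset` and its arguments are hypothetical names, not the monitor's actual code, and the choice of ENOENT as the error code is an assumption.

```python
import errno


def check_default_ruleset(default_ruleset, existing_rulesets):
    """Return (0, "") if the configured default exists, else (-ENOENT, message)."""
    if default_ruleset not in existing_rulesets:
        return (-errno.ENOENT,
                f"default crush ruleset {default_ruleset} does not exist")
    return 0, ""


# Hypothetical cluster with rulesets 0 and 1 and a default set to 5.
rc, msg = check_default_ruleset(5, {0, 1})
print(rc, msg)
```

Rejecting the request at this point means a pool is never created with a reference to a ruleset that is not there.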
Ilya Dryomov [Thu, 21 May 2015 15:52:52 +0000 (18:52 +0300)]
OSDMap: respect default replicated ruleset config opt in build_simple()
Use id provided by osd_pool_default_crush_{replicated_ruleset,rule}
config options when creating a simple replicated ruleset for an initial
osdmap instead of always making it ruleset 0. Not doing so may leave
default created pools (currently "rbd") in a broken state with their
crush_ruleset pointing to a non-existent ruleset.
Ilya Dryomov [Fri, 22 May 2015 12:50:07 +0000 (15:50 +0300)]
tests: a couple tweaks to osd-pool-create.sh
In TEST_default_deprectated_*(), make expected/unexpected vars local
and actually check that rbd, being a default created pool, is set to
use the ruleset specified by conf.
INVALIDRULESET thing in TEST_replicated_pool() is redundant - it is
checked in TEST_replicated_pool_with_ruleset() a bit earlier.
A first cut of a `ceph-release-notes` script is added; it looks at merge
commits and picks out issue numbers. It is best suited to the backport
release workflow, where commit messages always follow a specific
pattern, but it is also partly useful for preparing release notes for
normal releases.
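The idea of picking issue numbers out of commit messages can be sketched as follows. This is not the `ceph-release-notes` script itself: the function name and the regular expression are assumptions, modelled on the `Fixes:` trailers that appear elsewhere in this log.

```python
import re

# Assumption: issues are referenced as "Fixes: 11787" or "Fixes: #11056".
FIXES_RE = re.compile(r"Fixes:\s*#?(\d+)")


def issue_numbers(messages):
    """Collect unique issue numbers from commit messages, in first-seen order."""
    seen = []
    for message in messages:
        for number in FIXES_RE.findall(message):
            if number not in seen:
                seen.append(number)
    return seen


messages = [
    "ReplicatedPG: start_flush: use filtered snapset\nFixes: 11787",
    "librbd: avoid blocking AIO API methods\nFixes: #11056",
    "doc: update the development workflow",
]
print(issue_numbers(messages))
```

Messages without a recognizable trailer are skipped, which is why such a tool works best when commit messages follow a fixed pattern.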
Jason Dillaman [Wed, 8 Apr 2015 23:06:52 +0000 (19:06 -0400)]
librbd: avoid blocking AIO API methods
Enqueue all AIO API methods within the new librbd thread pool to
reduce the possibility of any blocking operations. To maintain
backwards compatibility with the legacy return codes of the API's
AIO methods, it's still possible to block attempting to acquire
the snap_lock.
Fixes: #11056
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
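The general pattern of queueing asynchronous calls on a thread pool so that the caller returns immediately can be sketched as below. This is an illustrative Python model only, not librbd: the `Image` class, its `aio_write` signature and the completion callback are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Image:
    """Toy image whose AIO calls are queued on a worker pool."""

    def __init__(self, workers=1):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._data = bytearray()

    def aio_write(self, payload, on_complete):
        """Queue the write and return at once; on_complete(rc) runs later."""
        def work():
            with self._lock:
                self._data.extend(payload)
            on_complete(0)

        self._pool.submit(work)
        return 0

    def contents(self):
        with self._lock:
            return bytes(self._data)

    def close(self):
        # Wait for queued work to finish before tearing the pool down.
        self._pool.shutdown(wait=True)


done = threading.Event()
image = Image()
image.aio_write(b"abc", lambda rc: done.set())
done.wait(timeout=5)
image.close()
```

The caller only pays for the enqueue; any potentially blocking work happens on the pool thread.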
Loic Dachary [Tue, 26 May 2015 10:53:46 +0000 (12:53 +0200)]
doc: update the development workflow
* use HOWTO_monitor_the_automated_tests_AKA_nightlies to explain the nightlies
* replace references to Severity with Backport tracker
* add links to the backporter manual and the release page
* s/0.95/9.0.0/
* unify release names to be lowercase
* replace lifecycle with release cycle and end of life with retirement
* prefer LTS or Long Term Stable over Long Term Support
Loic Dachary [Sun, 17 May 2015 13:28:52 +0000 (15:28 +0200)]
erasure-code: implement consistent error stream
The error stream in the erasure code path is broken and the error
message is sometimes not reported back to the user. For instance, the
ErasureCodePlugin::factory method has no error stream: when an error
happens the user is left with a cryptic error code that has to be
looked up in the sources to be understood.
The error stream is made more systematic by:
* always pass it as ostream *ss (instead of sometimes passing it as
a reference and sometimes as a stringstream)
* ostream *ss is added to ErasureCodePlugin::factory
* define the ErasureCodeInterface::init pure virtual. It is
already implemented by all plugins, only in slightly different
ways. The ostream *ss is added so the init function has a way to
report errors in a human readable way to the caller, in addition to
the error code.
The ErasureCodePluginJerasure::init return value was incorrectly ignored
when called from ErasureCodePluginJerasure::factory; the factory now
returns as soon as init fails.
The ErasureCodeLrc::layers_init method is given ostream *ss for error
messages instead of printing them via derr.
The ErasureCodePluginLrc::factory method no longer prints errors via
derr: this workaround is made unnecessary by the ostream *ss argument.
The ErasureCodeShec::init ostream *ss argument is ignored. The
ErasureCodeShec::parse method entirely relies on derr to report errors
and converting it goes beyond the scope of this cleanup. There is a
slight risk of getting it wrong and it deserves a separate commit and
careful and independent review.
The PGBackend, OSDMonitor.{cc,h} changes are only about prototype
changes.
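The convention described in this message, an error code plus a caller-supplied stream for the human readable message, can be sketched as follows. This is a Python analogue for illustration, not the C++ interfaces: `ExamplePlugin`, the required key `k` and the use of EINVAL are assumptions.

```python
import errno
import io


class ExamplePlugin:
    """Hypothetical plugin: init() reports problems through the stream ss."""

    def init(self, profile, ss):
        if "k" not in profile:
            ss.write("profile is missing the required key 'k'\n")
            return -errno.EINVAL
        return 0


def factory(profile, ss):
    """Create and initialize a plugin; never ignore init()'s return value."""
    plugin = ExamplePlugin()
    rc = plugin.init(profile, ss)
    if rc != 0:
        return rc, None
    return 0, plugin


ss = io.StringIO()
rc, plugin = factory({}, ss)
print(rc, ss.getvalue().strip())
```

The caller gets both the numeric code and a message explaining it, so nothing has to be looked up in the sources.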
Loic Dachary [Mon, 25 May 2015 13:44:53 +0000 (15:44 +0200)]
erasure-code: lrc size test depends on layer semantic
When the lrc layers are defined, only the semantics of the D, c and _
characters are defined; every other character is undefined. The test
that verifies the guard against layers of different sizes uses the A
character, which is undefined. Depending on the implementation, the
size test could fail for the wrong reason once a guard forbidding
undefined characters is added. Replace A with D so that the undefined
character A does not interfere with the test.
This may seem like nitpicking but it actually caused problems after a
code refactor that will appear a few commits from here.
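The interaction between the two guards can be sketched as below. This is a hypothetical Python model, not the ErasureCodeLrc code: `check_layers`, the error messages and the order of the guards are assumptions chosen to show why an undefined character can mask the size check.

```python
import errno
import io

# Only these characters have defined semantics in a layer description.
DEFINED = set("Dc_")


def check_layers(layers, ss):
    """Return 0 if all layers are valid, else -EINVAL with a message in ss."""
    for layer in layers:
        undefined = sorted(set(layer) - DEFINED)
        if undefined:
            ss.write(f"layer {layer!r} uses undefined characters {undefined}\n")
            return -errno.EINVAL
    sizes = sorted({len(layer) for layer in layers})
    if len(sizes) > 1:
        ss.write(f"layers have different sizes: {sizes}\n")
        return -errno.EINVAL
    return 0


ss = io.StringIO()
check_layers(["DDc", "AAAA"], ss)   # fails on the character guard
check_layers(["DDc", "DDDD"], ss)   # fails on the size guard, as intended
print(ss.getvalue())
```

With A in the input the size guard is never reached in this model, so a test meant to exercise it would be checking the wrong failure.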
Loic Dachary [Sat, 16 May 2015 22:46:38 +0000 (00:46 +0200)]
erasure-code: define the ErasureCodeProfile type
Instead of map<string,string>. Make it non-const when initializing
an ErasureCodeInterface instance so that it can be modified.
Rename parameters into profile for consistency with the user
documentation. The parameters name was chosen before the user interface
was defined. This cosmetic update is made in the context of larger
functional changes to improve error reporting and user interface
consistency.
The init() methods are made to accept non-const parameters. It is
desirable for them to be able to modify the profile so that it
accurately reflects the values that are used. The caller may use this
information for better error reporting.
Kefu Chai [Fri, 22 May 2015 07:54:22 +0000 (15:54 +0800)]
tests/test-erasure-code: spin off eio tests into another testsuite
* since the eio tests crash some of the OSD nodes, before this
change the tests tried to undo the crash before moving on, so it
would not interfere with the following tests. a more robust and
cleaner way to do this is to isolate individual tests in a sandbox,
so each eio test has its own:
setup + inject + verify crash + teardown
cycle. this change helps to remove the cleanup/undo steps in
individual tests.
* update the disabled tests accordingly.
* use a minimum set of OSDs and R-S(2,1) for the testing to speed
up the test.
* add the new testsuite to check_SCRIPTS
Kefu Chai [Fri, 22 May 2015 07:58:10 +0000 (15:58 +0800)]
tests: fix the get_config()
* the "daemon" parameter was not respected.
* update the test_get_config() to check the overridden option instead of
the default one.
* add set_config()
Casey Bodley [Fri, 22 May 2015 14:38:29 +0000 (10:38 -0400)]
cmake: skip man/CMakeLists.txt
man pages have to be preprocessed now, and can't be installed directly.
skip installing them until we add the cmake-fu to copy what man/Makefile.am
is doing
David Disseldorp [Fri, 22 May 2015 15:22:51 +0000 (17:22 +0200)]
tests: don't choke on deleted losetup paths
If a file has been deleted with a loopback device attached, then the
`losetup --all` output will carry:
/dev/loopX: [0032]:344213 (/.../src/test-ceph-disk/vdf.disk (deleted))
This causes the losetup parsing in reset_leftover_dev() to throw an
error, e.g.:
reset_leftover_dev: 430: test
'(/home/ddiss/ceph/src/test-ceph-disk/vdf.disk' '(deleted))' =
'(/home/ddiss/ceph/src/test-ceph-disk/vdf.disk)'
test/ceph-disk.sh: line 430: test: too many arguments
Fix this by quoting the path variable for the string comparison.
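Why quoting helps can be shown with a small model of shell word splitting. This Python sketch only imitates how an unquoted variable is split on whitespace before `test` sees it; `argv_for_test` and the example path are hypothetical.

```python
def argv_for_test(path, expected, quoted):
    """Model the arguments `test` receives for: test $path = "$expected".

    An unquoted expansion is split on whitespace; a quoted one stays whole.
    """
    words = [path] if quoted else path.split()
    return [*words, "=", expected]


# Hypothetical losetup field for a backing file that has been deleted.
path = "(/tmp/vdf.disk (deleted))"
expected = "(/tmp/vdf.disk)"

print(argv_for_test(path, expected, quoted=False))  # four words reach test
print(argv_for_test(path, expected, quoted=True))   # three words reach test
```

With three arguments `test` performs an ordinary string comparison, which simply evaluates to false for a deleted path instead of raising an error.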