Kefu Chai [Fri, 22 May 2015 07:54:22 +0000 (15:54 +0800)]
tests/test-erasure-code: spin off eio tests into another testsuite
* since the eio tests crash some of the OSD nodes, before this
change the tests tried to undo the crash before moving on, so it
would not interfere with the following tests. a more robust/clean way to
do this is to isolate individual tests in a sandbox, so each eio
test will have its own:
setup + inject + verify crash + teardown
cycle. this change helps to remove the cleanup/undo steps in
individual tests.
* update the disabled tests accordingly.
* use a minimum set of OSDs and R-S(2,1) for the testing to speed
up the test.
* add the new testsuite to check_SCRIPTS
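The per-test cycle described in the first item could be sketched roughly as follows. This is only an illustration: `run_in_sandbox` and `fake_eio_test` are hypothetical names, not functions from the test suite, and a temporary directory stands in for the real cluster setup.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: give every test its own throwaway directory so
# that a crash injected by one test cannot leak into the next one.
run_in_sandbox() {
    local test_func=$1
    local dir status=0
    dir=$(mktemp -d) || return 1        # setup
    "$test_func" "$dir" || status=$?    # inject + verify crash
    rm -rf "$dir"                       # teardown, even when the test fails
    return "$status"
}

# Stand-in for a real eio test (assumption, for illustration only).
fake_eio_test() {
    local dir=$1
    touch "$dir/injected"       # pretend to inject an error
    test -e "$dir/injected"     # pretend to verify the outcome
}

run_in_sandbox fake_eio_test && echo "fake_eio_test passed"
```

Because teardown runs unconditionally, no test needs its own undo steps.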
Kefu Chai [Fri, 22 May 2015 07:58:10 +0000 (15:58 +0800)]
tests: fix the get_config()
* the "daemon" parameter was not respected.
* update the test_get_config() to check the overridden option instead of
the default one.
* add set_config()
David Disseldorp [Fri, 22 May 2015 15:22:51 +0000 (17:22 +0200)]
tests: don't choke on deleted losetup paths
If a file has been deleted with a loopback device attached, then the
`losetup --all` output will carry:
/dev/loopX: [0032]:344213 (/.../src/test-ceph-disk/vdf.disk (deleted))
This causes the losetup parsing in reset_leftover_dev() to throw an
error, e.g.:
rreset_leftover_dev: 430: test
'(/home/ddiss/ceph/src/test-ceph-disk/vdf.disk' '(deleted))' =
'(/home/ddiss/ceph/src/test-ceph-disk/vdf.disk)'
test/ceph-disk.sh: line 430: test: too many arguments
Fix this by quoting the path variable for the string comparison.
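A minimal sketch of why the quoting matters, using the path from the trace above. The variable names are assumptions and are not taken from test/ceph-disk.sh.

```bash
#!/usr/bin/env bash
# Value as reported by losetup for a deleted backing file.
path='(/home/ddiss/ceph/src/test-ceph-disk/vdf.disk (deleted))'
expected='(/home/ddiss/ceph/src/test-ceph-disk/vdf.disk)'

# Unquoted, $path is split into two words and test is handed one
# argument too many:
#   test $path = $expected
# Quoted, each side of the comparison stays a single word:
if test "$path" = "$expected"; then
    result=same
else
    result=different
fi
echo "$result"
```

With the quotes in place the comparison simply evaluates to false for the deleted file instead of aborting with an error.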
Loic Dachary [Thu, 21 May 2015 14:45:07 +0000 (16:45 +0200)]
tests: CEPH_CLI_TEST_DUP_COMMAND=1 for qa/workunits/cephtool/test.sh
Run cephtool-test-{mon,osd,mds}.sh with CEPH_CLI_TEST_DUP_COMMAND=1 to
detect idempotency related problems during make check. This is how
ceph-qa-suite/tasks/workunit.py will run
suites/rados/singleton/all/cephtool.yaml and it's easier to fix when
make check fails rather than later on when a fully populated rados suite
has one failed job.
Loic Dachary [Thu, 21 May 2015 14:39:30 +0000 (16:39 +0200)]
tests: ceph create may consume more than one id
When CEPH_CLI_TEST_DUP_COMMAND=1 is set, ceph osd create will consume
two osd ids and return the latter. Fix the test to account for that and
not assume the osd id being allocated by osd create is always the
next available osd id.
The other osd create tests do not suffer from the same variation because
they provide a UUID argument that guarantees the same osd id is going to
be returned every time.
Ken Dreyer [Thu, 21 May 2015 18:54:30 +0000 (12:54 -0600)]
doc: recommend opening entire 6800-7300 port range
Prior to this commit, the Network Configuration Reference guide and
Troubleshooting guide recommended opening a number of ports that
depended on the number of daemons being run.
This doesn't really cover all use cases. Users can easily restart
daemons in ways that cause the daemons to bind to higher ports. This
leads to OSDs or MDSs binding to ports that are firewalled.
Update the Network Configuration Reference guide and Troubleshooting
guides to simply recommend that users open all the ports between 6800
and 7300 on their OSDs and MDSs.
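As an illustration, a firewall rule covering the whole range might look like the sketch below. The interface name is a placeholder, and the rule is only printed for review rather than applied. Adapt it to the local firewall setup.

```bash
#!/usr/bin/env bash
# Placeholder interface name (assumption); replace with the real one.
iface=eth0

# Accept TCP traffic on every port from 6800 through 7300.
rule=(iptables -A INPUT -i "$iface" -p tcp --dport 6800:7300 -j ACCEPT)

# Print the command instead of running it, since it needs root.
echo "sudo ${rule[*]}"
```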
Ken Dreyer [Thu, 21 May 2015 18:53:43 +0000 (12:53 -0600)]
doc: update OSD port range to 6800-7300
The upper limit for OSD/MDS ports changed from 7100 to 7300 in commit f9ec5a7945518089ffae540649b77ac06f98df5f. Update the Quick Start
Preflight documentation to reflect this change.
Ken Dreyer [Tue, 14 Apr 2015 13:58:17 +0000 (07:58 -0600)]
debian: move ceph_argparse into ceph-common
Prior to this commit, if a user installed the "ceph-common" Debian
package without installing "ceph", then /usr/bin/ceph would crash
because it was missing the ceph_argparse library.
Ship the ceph_argparse library in "ceph-common" instead of "ceph". (This
was the intention of the original commit that moved argparse to "ceph", 2a23eac54957e596d99985bb9e187a668251a9ec)
http://tracker.ceph.com/issues/11388
Refs: #11388
Reported-by: Jens Rosenboom <j.rosenboom@x-ion.de>
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
Kefu Chai [Mon, 9 Mar 2015 08:42:34 +0000 (16:42 +0800)]
osd: randomize scrub times to avoid scrub wave
- to avoid a scrub wave when osd_scrub_max_interval is reached on a
high-load OSD, the scrub time is randomized.
- extract scrub_load_below_threshold() out of scrub_should_schedule()
- schedule an automatic scrub job at a time which is uniformly distributed
over [now+osd_scrub_min_interval,
now+osd_scrub_min_interval*(1+osd_scrub_time_limit)]. before
this change such scrubs were performed once the hard interval
had ended or the system load was below the threshold. with this change, the
jobs are performed as long as the load is low or the interval of
the scheduled scrubs is longer than conf.osd_scrub_max_interval. all
automatic jobs should be performed in the configured time period, otherwise
they are postponed.
- a requested scrub job is scheduled right away. before this change
it was queued with the timestamp of `now` and postponed until after
osd_scrub_min_interval.
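The randomized scheduling window described above can be illustrated with a small shell sketch. It is not the OSD implementation: `random_scrub_time` is a hypothetical helper, and the third argument is assumed to be a ratio playing the role of osd_scrub_time_limit in the interval.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pick a time uniformly from
#   [now + min_interval, now + min_interval * (1 + ratio))
# All arguments are in seconds except ratio, which is a fraction.
random_scrub_time() {
    local now=$1 min_interval=$2 ratio=$3
    awk -v now="$now" -v min="$min_interval" -v ratio="$ratio" \
        -v seed="$RANDOM" \
        'BEGIN { srand(seed); printf "%d\n", now + min + rand() * min * ratio }'
}

# Example: now=1000, min interval=100, ratio=0.5
random_scrub_time 1000 100 0.5
```

Spreading the scheduled times over this window keeps many placement groups from becoming due at the same moment.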
Douglas Fuller [Tue, 19 May 2015 00:37:00 +0000 (17:37 -0700)]
rbd: expunged xfstests generic/078
This tests RENAME_WHITEOUT, which was enabled for xfs in kernel commit 7dcf5c3e4527cfa2807567b00387cf2ed5e07f00. At first execution, it throws a BUG.
Subsequent executions appear to work correctly. This issue manifests for disks
and RBD instances.
Signed-off-by: Douglas Fuller <dfuller@redhat.com>
John Spray [Mon, 18 May 2015 15:15:07 +0000 (16:15 +0100)]
mds: fix handling missing mydir dirfrag
This was broken by 96992466 aka "mds: handle missing mydir dirfrag"
The previous code was mistakenly treating a not-yet-loaded
dirfrag as a non-existent dirfrag, resulting in
inconsistent fragstats even when no objects had
actually been lost.
Fixes: #11641
Signed-off-by: John Spray <john.spray@redhat.com>
to check_SCRIPTS. Their output is captured in .log file when running
with a recent automake. This reduces the output of make check by an
order of magnitude.
Use ceph-helpers.sh instead of mon/mon-test-helpers.sh.
* modifying the .asok and .log names to match the ceph-helpers.sh
conventions
* use explicit ports 7300 and 7301 instead of +1 so that grep
will show that 7301 is used. This reduces the odds of a
port collision when looking for a port that's not already
used by an existing test.
Instead of using mon-test-helpers.sh, primarily because the kill_daemon
function implemented in mon-test-helpers.sh is not as good as the one in
ceph-helpers.sh.
Instead of having tests that share the same monitor, each test now runs
on a fresh monitor. The test writer no longer has to worry that it will
be re-using the pool or profile from a previous test. That causes
problems that are difficult to diagnose and the overhead of running a
new monitor is not so high.
Loic Dachary [Sat, 16 May 2015 13:32:20 +0000 (15:32 +0200)]
tests: ceph-helpers.sh do not hardcode id a in run_mon
Fix hardcoding of id a in the run_mon function. The directory
in which the mon data is stored must be a sub-directory of the
directory given in argument.
If mon_initial_members is set, the rbd pool cannot be redefined, which
is ok because this is rare and it's only an optimization to reduce the
number of PGs.
Loic Dachary [Sat, 16 May 2015 09:12:16 +0000 (11:12 +0200)]
tests: ceph-helpers.sh implement wait_for_osd
The wait_for_osd function, which waits for an osd to go up or down, is
needed internally after running an osd. Move the inline snippet from
run_osd into a function so that it can be used by scripts as well.
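A polling helper of this kind might look like the sketch below. It is an assumption-laden illustration, not the code from ceph-helpers.sh: the name `wait_for_osd_state`, the retry count argument and the `OSD_DUMP_CMD` override are hypothetical, and the grep pattern assumes that `ceph osd dump` prints lines beginning with `osd.<id> <state>`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: poll until osd.<id> is reported in <state>.
# OSD_DUMP_CMD lets a caller substitute another command for
# "ceph osd dump", e.g. when no cluster is available.
wait_for_osd_state() {
    local state=$1 id=$2 tries=${3:-300}
    local i
    for ((i = 0; i < tries; i++)); do
        if ${OSD_DUMP_CMD:-ceph osd dump} | grep -q "osd\.$id $state"; then
            return 0
        fi
        sleep 1
    done
    return 1    # the osd never reached the requested state
}
```

A script would call it as `wait_for_osd_state up 0` right after starting osd.0 and fail the test if it returns non-zero.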
init-ceph.in: Create osd data dir before fs_type check.
One host in the cluster crashed and was rebuilt, but then failed to start its osds
because the data dir did not exist.
Varada Kari [Fri, 15 May 2015 14:16:26 +0000 (19:46 +0530)]
KeyValueStore: Fix the prefix comparison to avoid object leaks.
Iterator becomes invalid due to a partial prefix comparison in
rmkeys_by_prefix, resulting in objects not being deleted from the backend.
Modified the comparison to use the given prefix.
Signed-off-by: Varada Kari <varada.kari@sandisk.com>
John Spray [Tue, 12 May 2015 16:24:58 +0000 (17:24 +0100)]
mds: validate the state+rank in MDS map
Especially:
* once I have been assigned a rank, it
can't be taken away without restarting
the daemon.
* once I have entered standby, I can
only go upwards through the states.
Fixes: #11481
Signed-off-by: John Spray <john.spray@redhat.com>
Kefu Chai [Thu, 14 May 2015 10:51:22 +0000 (18:51 +0800)]
doc: use @name to define a group, not @group
we are able to output a specified group using the `doxygengroup`
directive in breathe. this directive prints out the
description of the group, but it's not realistic to enumerate
all groups defined in source code in the rst files. the
doxygen command @name also helps to group functions together.
the downside of this approach is that we cannot add more
items to a group later on. that should be fine for us,
since in our case all the grouped items live in a single
header file.