Loic Dachary [Sun, 8 Dec 2013 13:38:59 +0000 (14:38 +0100)]
crush: fix map->choose_tries boundary test
CrushWrapper::start_choose_profile allocates map->choose_tries with
choose_total_tries elements. When crush_choose_firstn sets a value, it
tests against map->choose_local_tries which could lead to memory
corruption if map->choose_total_tries is smaller than
map->choose_local_tries.
Another indesirable but non fatal side effect is that the output crushtool
--show-choose-tries will be truncated to choose_local_tries which is
set to a lower value than choose_total_tries by the default tuneables.
Noah Watkins [Sat, 7 Dec 2013 17:59:13 +0000 (09:59 -0800)]
librbd: rename howmany to avoid conflict
A howmany macro exists on some platforms in standard headers, but there
really isn't any sort of standard that I've found. We just avoid the
conflict entirely this way.
Checking for fdatasync uses the same approach as the qemu configure
script. The relevant commit is d1722a27f552a22561104210e0afad4577878e53.
Here is a copy of the commit message which explains the check:
Under Darwin, a symbol exists for the fdatasync() function, so that our
link test succeeds. However _POSIX_SYNCHRONIZED_IO is set to '-1'.
According to POSIX:2008, a value of -1 means the feature is not
supported.
A value of 0 means supported at compilation time, and a value greater 0
means supported at both compilation and run time.
Enable fdatasync() only if _POSIX_SYNCHRONIZED_IO is '>0'.
Loic Dachary [Fri, 6 Dec 2013 23:31:54 +0000 (00:31 +0100)]
crush: detach_bucket must test item >= 0 not > 0
Since detach_bucket is a private helper solely used by move_bucket which
contains another ( correct ) safeguard, the code cannot be reached and
the problem can never happen. If another function uses detach_bucket,
it may happen.
// un-set the device name so we can use add_item later
build_rmap(name_map, name_rmap);
name_map.erase(id);
name_rmap.erase(id_name);
when insert_item refused to move a bucket for which a name already
exists. It was changed in 2013 by 4e2557a038dc1e8c68993ad8571d74e2eb8ea90a and now supports it. The
TestCrushWrapper unittest for move_bucket pass.
Sage Weil [Tue, 3 Dec 2013 16:33:55 +0000 (08:33 -0800)]
crush/mapper: apply chooseleaf_tries to firstn mode too
Parameterize the attempts for the _firstn choose method, and apply the
rule-specified tries count to firstn mode as well. Note that we have
slightly different behavior here than with indep:
If the firstn value is not specified for firstn, we pass through the
normal attempt count. This maintains compatibility with legacy behavior.
Note that this is usually *not* actually N^2 work, though, because of the
descend_once tunable. However, descend_once is unfortunately *not* the
same thing as 1 chooseleaf try because it is only checked on a reject but
not on a collision. Sigh.
In contrast, for indep, if tries is not specified we default to 1
recursive attempt, because that is simply more sane, and we have the
option to do so. The descend_once tunable has no effect for indep.
Loic Dachary [Thu, 5 Dec 2013 18:41:50 +0000 (19:41 +0100)]
crush: check for invalid names in loc[]
Add the is_valid_crush_loc helper to test for invalid crush names in
insert_item and update_item, before performing any side
effect. Implement the associated unit tests.
This is (as near to) a trivial ObjectStore backend for the OSD as we can
get at the moment. Everything is stored in memory. We are slightly
tricky with the locking, but not overly so.
On umount we dump everything out to disk, and on mount we load it all in
again, so we have some very coarse persistence/durability... just enough
to make this usable in a non-failure environment.
Merge pull request #900 from ceph/wip-mon-mds-trim
mon: MDSMonitor: trim versions and let PaxosService decide whether to propose
We were not trimming mdsmap versions and were generating a new map every time
we modified the pending value.
Now we not only make sure that MDSMonitor will trim old maps (configurable
option allowing us to set the maximum number of maps to keep, defaulting to 500,
much like other services do) but we also delegate to PaxosService the decision on
whether to propose our pending value.
We also perform several modifications to 'ceph-kvstore-tool', allowing one to obtain
the contents of a given prefix:key and have them outputted to a file instead of stdout,
and also add support for getting the size of a given prefix:key's value.
'ceph report' was also modified so that we always output the first and last
committed versions for all services; up until this point, we would only output the
first committed version on all services, and only a few were also outputting the
last committed version.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
mon: MDSMonitor: implement 'get_trim_to()' to let the mon trim mdsmaps
This commit also adds two options to the MDSMonitor:
- mon_max_mdsmap_epochs: the maximum amount of maps we'll keep (def: 500)
- mon_mds_force_trim: the version we want to trim to
This results in 'get_trim_to()' returning the possible values:
- if we have set mon_mds_force_trim, and this value is greater than the
last committed version, trim to mon_mds_force_trim
- if we hold more than the max number of maps, trim to last - max
- if we have set mon_mds_force_trim and if we hold more than the max
number of maps, and mon_mds_force_trim is lower than last - max,
then trim to last - max
Backport: dumpling
Backport: emperor
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Loic Dachary [Thu, 5 Dec 2013 12:01:00 +0000 (13:01 +0100)]
crush: remove redundant test in insert_item
A year after the last modification of test to check if an item was added
twice to the same bucket, the subtree_contains test was added a few
lines above it, making it redundant.
Loic Dachary [Thu, 5 Dec 2013 08:54:37 +0000 (09:54 +0100)]
crush: insert_item returns on error if bucket name is invalid
A bucket name may be created as a side effect of insert_item. All names
in the loc argument are checked for validity at the beginning of the
method and an error is returned immediately if one is found. This allows
to not check for errors when setting the name of an item later on.