Matt Benjamin [Sat, 5 Nov 2016 17:13:47 +0000 (13:13 -0400)]
rgw: add bucket size limit check to radosgw-admin
The change adds a new list of all buckets x all users, with
fields for bucket name, tenant name, current num_objects,
current num_shards, current objects per shard, and the
corresponding fill_status--the latter consisting of 'OK',
'WARN <n>%', or 'OVER <n>%'.
The warning check is relative to two new tunables. The threshold
max objects per shard is set as rgw_bucket_safe_max_objects_per_shard,
which defaults to 100K. The value rgw_bucket_warning_threshold is
a percent of the current safe max at which to warn (defaults to
90% of full).
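The check described above can be sketched roughly as follows. This is a minimal illustration, not the radosgw-admin implementation: the function name, the treatment of an unsharded bucket (num_shards == 0) as a single shard, and the exact boundary comparisons are assumptions. Only the two defaults (100K objects per shard, warn at 90%) come from the text.

```python
def fill_status(num_objects, num_shards,
                safe_max_objects_per_shard=100_000,  # default of rgw_bucket_safe_max_objects_per_shard
                warning_threshold_pct=90):           # default of rgw_bucket_warning_threshold
    """Return 'OK', 'WARN <n>%' or 'OVER <n>%' for one bucket (hypothetical helper)."""
    # Assumption: a bucket reporting 0 shards is treated as having one shard.
    shards = max(num_shards, 1)
    objects_per_shard = num_objects / shards
    # Percent-of-fill relative to the safe maximum (not 100 minus percent-of-fill).
    fill_pct = 100.0 * objects_per_shard / safe_max_objects_per_shard
    if fill_pct > 100:
        return f"OVER {fill_pct:.0f}%"
    if fill_pct >= warning_threshold_pct:
        return f"WARN {fill_pct:.0f}%"
    return "OK"
```

For example, a bucket with 950,000 objects in 10 shards holds 95,000 objects per shard, which is 95% of the default safe maximum and so falls in the WARN range.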
From review:
* fix indentation (rgw_admin)
* if a user_id is provided, check only buckets for that user
* update shard warn pct to be pct-of-fill (not 100 - pct-of-fill)
* print only buckets near or over per-shard limit, if --warnings-only
* s/bucket limitcheck/bucket limit check */
* sanity shard limit should be 90, not 10 (because that changed)
* fixes for memleaks and other points found by cbodley
Fixes: http://tracker.ceph.com/issues/17925
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
doc: correct the package name in ceph/admin/build-doc
In the documentation, the package is called 'libxml2-devel', one of the
dependencies on CentOS/RHEL7, but 'build-doc' checks for 'libxml-devel'.
The package is also named 'libxml2-devel' in the yum repositories.
So correct the checked package name in ceph/admin/build-doc.
doc: update dependent packages links and remove a 'important' tip
Some links in the documentation are dead, such as
http://rpmfind.net/linux/centos/7.0.1406/, so update them.
The 'fc21, fc24, fc26(newest)' 'ditaa' rpms can be used on
CentOS/RHEL7 to build the docs, so remove the '...important' tip.
osd,OSDMonitor: try to protect against ec overwrites with filestore
This isn't perfect, but it's better than nothing. Prevent enabling the
allow_ec_overwrites flag if any of a sample of pgs in the pool map to
osds using filestore. This mainly protects filestore-only clusters
from enabling it.
If a filestore osd is started later, warn in the cluster log when it
gets a pg with ec overwrites enabled.
Josh Durgin [Tue, 14 Feb 2017 08:04:12 +0000 (00:04 -0800)]
osd, OSDMonitor, qa: mark ec overwrites non-experimental
Keep the pool flag around so we can distinguish between a pool that
should maintain hashes for each chunk (where a missing hash is a bug)
and an overwrites pool where we rely on bluestore checksums for
detecting corruption.
Josh Durgin [Tue, 14 Feb 2017 01:42:33 +0000 (17:42 -0800)]
OSDMonitor: get stripe_width via stripe_unit in ec profile
With bluestore, making the smallest write match min_alloc_size avoids
write amplification. With EC pools this is the stripe unit, or
stripe_width / num_data_chunks. Rather than requiring people to divide
by k to get the smallest ec write, allow it to be specified directly
via stripe_unit. Store it in the ec profile so changing a monitor
config option isn't necessary to set it.
This is particularly important for ec overwrites since they allow random i/o
which should match bluestore's checksum granularity (aka min_alloc_size).
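The relationship between the two values is simple arithmetic; the sketch below only illustrates it. The helper names and the 4096-byte figure are illustrative assumptions, not values taken from the patch.

```python
def stripe_width_from_unit(stripe_unit, k):
    """stripe_width is the stripe unit times the number of data chunks (k)."""
    return stripe_unit * k


def stripe_unit_from_width(stripe_width, k):
    """The division users previously had to do by hand: stripe_width / k."""
    if stripe_width % k != 0:
        raise ValueError("stripe_width must be a multiple of k")
    return stripe_width // k


# Illustrative values only: a 4096-byte stripe unit with k=4 data chunks.
assert stripe_width_from_unit(4096, 4) == 16384
assert stripe_unit_from_width(16384, 4) == 4096
```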
Piotr Dałek [Wed, 19 Apr 2017 10:57:38 +0000 (12:57 +0200)]
test/osd/osd-dup.sh: warn on low open file limit
This test fails badly when the open file limit is low. Increasing it to
around 1536 seems to do the trick, so warn the user with an appropriate
message and try to proceed anyway.
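The actual check lives in a shell script; the same idea, sketched in Python under the assumption of a Unix-like system, looks roughly like this. The function name and message wording are hypothetical; only the 1536 figure comes from the text above.

```python
import resource

REQUIRED_NOFILE = 1536  # approximate limit the commit message reports as sufficient


def nofile_warning(soft_limit, required=REQUIRED_NOFILE):
    """Return a warning string if the soft open-file limit looks too low, else None."""
    # An unlimited soft limit is never too low.
    if soft_limit == resource.RLIM_INFINITY:
        return None
    if soft_limit < required:
        return (f"warning: open file limit is {soft_limit}, "
                f"test may fail below {required}")
    return None


# Warn but proceed anyway, as the commit describes.
soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
message = nofile_warning(soft)
if message:
    print(message)
```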
Signed-off-by: Piotr Dałek <piotr.dalek@corp.ovh.com>
test/rgw: update and fixes for test-rgw-multisite.sh
the script was incomplete and unused, but it seems useful in itself
to bring up a simple multisite cluster without having to go through
test_multi.py. it's also a good test for functions in the other
test-rgw-*.sh scripts
two separate PRs had done refactoring around RGWMPObj, and it ended up
in two different places. remove the one in rgw_multi.h, because
rgw_rados.h now depends on its definition
Commit d1f2c557 incorrectly changed the order of variables within
the payload. This resulted in breaking the resize RPC message
with older versions of Ceph.
Fixes: http://tracker.ceph.com/issues/19636
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Previously, errors stuck indelibly to the inode, which
meant that a close call would see an error even if the
user already dutifully fsync()'d and handled it.
We should emit each error only once per file handle.
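One way to picture "once per file handle" is a toy model in which the inode keeps an append-only list of errors and each handle remembers how many it has already reported. This is a sketch of the idea only, not the client code: the class and method names are hypothetical, and the choice that a handle opened after an error does not see that error is an assumption.

```python
import errno


class Inode:
    """Toy inode holding an append-only log of asynchronous write errors."""

    def __init__(self):
        self.errors = []

    def record_error(self, errno_value):
        self.errors.append(errno_value)


class FileHandle:
    """Toy file handle that reports each inode error at most once."""

    def __init__(self, inode):
        self.inode = inode
        # Assumption: errors recorded before open are not reported to this handle.
        self.reported = len(inode.errors)

    def _take_error(self):
        pending = self.inode.errors[self.reported:]
        self.reported = len(self.inode.errors)  # mark everything as seen
        return -pending[0] if pending else 0

    def fsync(self):
        return self._take_error()

    def close(self):
        return self._take_error()
```

In this model, a handle that already saw the error from fsync() gets 0 from close(), while a second handle on the same inode still receives the error once.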
crush: disable modification API when choose_args is not empty
Adding, removing or moving items / buckets via the CrushWrapper API when
choose_args is not empty is unlikely to produce the desired outcome. The
caller should instead add, remove or move items / buckets in a
decompiled crushmap, update the associated choose_args and upload the new
crushmap.
crush: implement weight and id overrides for straw2
bucket_straw2_choose needs to use weights that may be different from
item_weights, for instance to compensate for an uneven distribution
caused by a low number of values, or to fix the probability bias
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).
We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
its weight when picking the second replica (second position). For instance the weight
matrix for a given bucket containing items 3, 7 and 13 could be as
follows:
When crush_do_rule picks the first of two replicas (position 0), items 3
and 7 are each four times more likely to be chosen by bucket_straw2_choose
than item 13. When choosing the second replica (position 1), item 3 is ten
times more likely to be chosen than items 7 and 13.
By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.
bucket_straw2_choose compares items by using their id. The same ids are
also used to index buckets and they must be unique. For each item in a
bucket an array of replacement ids can be provided for placement purposes;
they are used instead of the items' own ids. If no replacement ids are
provided, the legacy behavior is preserved.
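The per-position weights and replacement ids described above can be sketched as follows. This is a simplified model, not the CRUSH code: SHA-256 stands in for the real CRUSH hash, floating-point math replaces the fixed-point log table, and the weights are hypothetical values chosen only to reproduce the 4:4:1 and 10:1:1 ratios described in the text.

```python
import hashlib
import math


def _unit_hash(x, item_id, r):
    """Stand-in hash mapping (x, id, r) to a float in (0, 1]; NOT the CRUSH hash."""
    digest = hashlib.sha256(f"{x}:{item_id}:{r}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 1)


def straw2_choose(items, weight_set, x, r, position, ids=None):
    """Pick one item: the largest ln(hash) / weight wins, as in straw2."""
    # Without replacement ids, hash on the items' own ids (legacy behavior).
    ids = items if ids is None else ids
    # Use the weights for this position; reuse the last row for later positions.
    weights = weight_set[min(position, len(weight_set) - 1)]
    best_item, best_draw = None, -math.inf
    for index, item in enumerate(items):
        weight = weights[index]
        if weight <= 0:
            continue  # zero-weight items are never chosen
        draw = math.log(_unit_hash(x, ids[index], r)) / weight
        if draw > best_draw:
            best_item, best_draw = item, draw
    return best_item


ITEMS = [3, 7, 13]
# Hypothetical weights: row 0 is position 0 (4:4:1), row 1 is position 1 (10:1:1).
WEIGHT_SET = [[4, 4, 1], [10, 1, 1]]
```

Because each draw behaves like an exponential race, an item's chance of winning is proportional to its weight at that position, so changing one row of the weight set changes the distribution for that replica only.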
mds: drop partial entry and adjust write_pos when opening PurgeQueue
At the tail of the journal, there can be a partially written entry. Before
appending new entries to the journal, we need to drop any partially written
entry and adjust write_pos. For the mds log, a partially written entry is
detected and dropped when replaying the journal.
For the PurgeQueue journal, we don't replay the whole journal when the MDS
starts. Before appending a new entry to the journal, we need to drop
any partially written entry and adjust write_pos.
The previous patch aligns the journal header write_pos to the boundary
of a fully flushed entry. We can start looking for a partially written
entry from the journal header write_pos. It should be fast even when the
purge queue is very large.
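The scan can be pictured with a toy journal in which every entry is a 4-byte little-endian length followed by its payload. That framing is an assumption made for illustration and is not the on-disk format used by the MDS journaler; the function name is likewise hypothetical.

```python
import struct

HEADER = struct.Struct("<I")  # hypothetical framing: 4-byte length, then payload


def recover_write_pos(journal, header_write_pos):
    """Walk complete entries from header_write_pos; return where appends may resume."""
    pos = header_write_pos
    while pos + HEADER.size <= len(journal):
        (length,) = HEADER.unpack_from(journal, pos)
        end = pos + HEADER.size + length
        if end > len(journal):
            break  # partially written entry: stop and drop it
        pos = end
    return pos


# Two complete entries followed by one whose payload was cut short.
journal = (HEADER.pack(3) + b"abc" +
           HEADER.pack(2) + b"de" +
           HEADER.pack(5) + b"xy")
new_write_pos = recover_write_pos(journal, 0)
journal = journal[:new_write_pos]  # drop the partial tail before appending
```

Starting from a header write_pos that already sits on an entry boundary means only the entries written since the last header update are scanned, which is why the size of the whole queue does not matter.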