James McClune [Thu, 29 Nov 2018 03:10:33 +0000 (22:10 -0500)]
doc: broken link on troubleshooting-mon page
Fixes: https://tracker.ceph.com/issues/37451
This fix involves adding :ref: labels to the add-or-rm-mons.rst
page. I also added :ref: labels for other headings within
add-or-rm-mons.rst (for future reference).
J. Eric Ivancich [Tue, 20 Nov 2018 18:32:54 +0000 (13:32 -0500)]
rgw: perf -- remove bucket shards asynchronously rather than synchronously
We can now take advantage of the new asynchronous bucket shard removal
code: where we used to remove each shard synchronously, we now remove
them asynchronously. This is a big win when a bucket has tens of
thousands of shards.
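As a rough illustration of the approach (not the actual RGW code), here is a
librados-level sketch that queues all shard removals asynchronously and then
reaps the completions; the function name is made up:

    #include <rados/librados.hpp>
    #include <cerrno>
    #include <string>
    #include <vector>

    // Remove a set of bucket index shard objects asynchronously instead of
    // issuing one synchronous remove per shard, then reap the completions.
    int remove_shards_async(librados::IoCtx& ioctx,
                            const std::vector<std::string>& shard_oids)
    {
      std::vector<librados::AioCompletion*> completions;
      for (const auto& oid : shard_oids) {
        librados::AioCompletion* c = librados::Rados::aio_create_completion();
        int r = ioctx.aio_remove(oid, c);
        if (r < 0) {                    // failed to even queue the remove
          c->release();
          return r;
        }
        completions.push_back(c);
      }
      int ret = 0;
      for (auto* c : completions) {     // reap results once all are in flight
        c->wait_for_complete();
        int r = c->get_return_value();
        if (r < 0 && r != -ENOENT)      // ignore shards that are already gone
          ret = r;
        c->release();
      }
      return ret;
    }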
J. Eric Ivancich [Tue, 20 Nov 2018 14:52:39 +0000 (09:52 -0500)]
rgw: clean up unused bucket index shards
Clean up old bucket index shards when a resharding is complete. Also,
when a resharding fails, clean up unfinished bucket index shards. Do
both clean-ups asynchronously.
xie xingguo [Wed, 21 Nov 2018 01:36:21 +0000 (09:36 +0800)]
osd/OSDMap: add pg-existence sanity check
The reason get_pg_pool_size(pg) or get_pg_pool_crush_rule(pg) fails is
that the pg no longer exists. So it generally makes sense to check
pg_exists(pg) before moving further.
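A minimal, self-contained sketch of the intended check order; OSDMap is the
real class, but the stub below only models the slice needed here:

    #include <cerrno>
    #include <cstdint>
    #include <iostream>
    #include <set>

    struct pg_t { uint64_t pool; uint32_t seed; };   // simplified stand-in

    // Stand-in for the small part of OSDMap used in this example.
    struct OSDMapStub {
      std::set<uint64_t> pools;                      // pools that still exist
      bool pg_exists(const pg_t& pg) const { return pools.count(pg.pool) > 0; }
      int get_pg_pool_size(const pg_t& pg) const { return 3; }
    };

    // Sanity-check that the pg still exists before asking for pool
    // properties; otherwise the pool lookups would fail.
    int pg_pool_size_checked(const OSDMapStub& osdmap, const pg_t& pg) {
      if (!osdmap.pg_exists(pg))
        return -ENOENT;                              // pg (or its pool) is gone
      return osdmap.get_pg_pool_size(pg);
    }

    int main() {
      OSDMapStub m;
      m.pools = {1};
      std::cout << pg_pool_size_checked(m, {1, 0}) << " "
                << pg_pool_size_checked(m, {2, 0}) << "\n";   // prints: 3 -2
    }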
xie xingguo [Wed, 20 Jun 2018 01:04:19 +0000 (09:04 +0800)]
osd/OSDMap.cc: remove pg_upmap/pg_upmap_items too if osd is gone
If an osd is gone or has been moved out of the specified crush rule, we
should also cancel any pg_upmap/pg_upmap_items entries still bound to
that osd.
The original code does not handle this case because
get_parent_of_type() will fail if that osd does not belong to the
crush_rule passed in, and hence hits an assert.
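A simplified, self-contained sketch of the cleanup being described (the real
code lives in OSDMap and also consults crush; the container types and names
below are stand-ins):

    #include <cstdint>
    #include <map>
    #include <set>
    #include <utility>
    #include <vector>

    using pg_id = uint64_t;   // simplified stand-in for pg_t

    // Drop any pg_upmap / pg_upmap_items entries that still reference an osd
    // which no longer exists (or is no longer reachable via the pool's rule).
    void clean_upmaps(const std::set<int>& valid_osds,
                      std::map<pg_id, std::vector<int>>& pg_upmap,
                      std::map<pg_id, std::vector<std::pair<int,int>>>& pg_upmap_items)
    {
      for (auto it = pg_upmap.begin(); it != pg_upmap.end(); ) {
        bool ok = true;
        for (int osd : it->second)
          if (!valid_osds.count(osd)) { ok = false; break; }
        if (ok) ++it; else it = pg_upmap.erase(it);
      }
      for (auto it = pg_upmap_items.begin(); it != pg_upmap_items.end(); ) {
        bool ok = true;
        for (const auto& p : it->second)             // p = {from_osd, to_osd}
          if (!valid_osds.count(p.first) || !valid_osds.count(p.second)) {
            ok = false; break;
          }
        if (ok) ++it; else it = pg_upmap_items.erase(it);
      }
    }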
Sort through and batch bucket instances so that repeated reads of the
current bucket info and repeated locking can be avoided. In the most
trivial case, when the bucket is already deleted, we exit early with all
the stale instances. When a bucket reshard is in progress we only
process the stale entries whose status is done; if the bucket is
available for locking, we take the lock and mark the other instances as
well.
Conflicts:
src/rgw/rgw_bucket.cc
Get rid of the following C++17isms:
- split_tenant auto return type -> trailing return type
- structured binding of the split_tenant tuple -> std::tie
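For illustration, the two rewrites in one self-contained example (the body of
split_tenant here is a made-up stand-in; only the signature and the call-site
style matter):

    #include <string>
    #include <tuple>

    // Trailing return type instead of C++17 `auto` return type deduction.
    static auto split_tenant(const std::string& user_id)
        -> std::tuple<std::string, std::string>
    {
      auto pos = user_id.find('$');
      if (pos == std::string::npos)
        return std::make_tuple(std::string(), user_id);
      return std::make_tuple(user_id.substr(0, pos), user_id.substr(pos + 1));
    }

    int main() {
      std::string tenant, user;
      // std::tie instead of a C++17 structured binding
      // (auto [tenant, user] = split_tenant(...);):
      std::tie(tenant, user) = split_tenant("tenant$user");
      return 0;
    }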
rgw: implement listing stale instances from a reshard
Dynamic resharding used to leave behind stale bucket instances; walk
through the metadata pool and identify these instances by comparing
their reshard status. If the reshard status is done, these instances are
safe to clear. For a reshard status of none, we compare against the
bucket entry point to ensure that we don't match the current instance.
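A rough, self-contained sketch of that classification (the real code reads
bucket instance metadata from the pool; the types and the helper below are
stand-ins):

    #include <string>
    #include <vector>

    enum class ReshardStatus { NONE, IN_PROGRESS, DONE };

    struct BucketInstance {
      std::string instance_id;          // simplified instance identifier
      ReshardStatus reshard_status;
    };

    // Given all instances found for a bucket and the instance id the bucket
    // entry point currently refers to, return the ones safe to report as stale.
    std::vector<BucketInstance>
    find_stale_instances(const std::vector<BucketInstance>& instances,
                         const std::string& current_instance_id)
    {
      std::vector<BucketInstance> stale;
      for (const auto& inst : instances) {
        if (inst.reshard_status == ReshardStatus::DONE) {
          // an old source instance left behind by a completed reshard
          stale.push_back(inst);
        } else if (inst.reshard_status == ReshardStatus::NONE &&
                   inst.instance_id != current_instance_id) {
          // not mid-reshard, and not the instance the entry point points at
          stale.push_back(inst);
        }
      }
      return stale;
    }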
J. Eric Ivancich [Wed, 17 Oct 2018 17:43:24 +0000 (13:43 -0400)]
rgw: recover from incomplete reshard attempt
In case a reshard attempt is left in an incomplete state, i.e., flags
still show resharding even though the bucket reshard lock isn't being
held, try to recover by taking the bucket reshard lock and clearing
flags associated with resharding.
This change requires access to an RGWBucketInfo object, so callers of
this function should provide one to prevent unnecessary work; changes
were made to provide this object.
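A hedged sketch of the recovery flow, with stub types standing in for
RGWBucketInfo and the bucket reshard lock:

    #include <cerrno>

    struct BucketInfoStub {
      bool resharding_in_progress = true;   // reshard flag as recorded on disk
    };
    struct ReshardLockStub {
      bool lock()   { return true; }        // take the bucket reshard lock
      void unlock() {}
    };

    // If the flags claim a reshard is underway but nobody holds the reshard
    // lock, take the lock ourselves and clear the flags so writes can resume.
    int clear_stale_reshard_flags(BucketInfoStub& bucket_info,
                                  ReshardLockStub& reshard_lock)
    {
      if (!bucket_info.resharding_in_progress)
        return 0;                           // nothing to recover
      if (!reshard_lock.lock())
        return -EBUSY;                      // someone really is resharding
      bucket_info.resharding_in_progress = false;
      // ...persist bucket_info and reset the per-shard headers here...
      reshard_lock.unlock();
      return 0;
    }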
J. Eric Ivancich [Tue, 16 Oct 2018 20:40:03 +0000 (16:40 -0400)]
rgw: move RGWReshardBucket lock to its own separate class
There are other processes beyond resharding that need to take a bucket
reshard lock (e.g., correcting bucket resharding flags in the event of a
crash, or tools that remove bucket shard information left over from
earlier versions of Ceph). Pulling this logic out of RGWBucketReshard
allows the code to be re-used.
J. Eric Ivancich [Fri, 12 Oct 2018 22:07:24 +0000 (18:07 -0400)]
rgw: failed resharding clears resharding status from shard heads
Previously, when resharding failed, we restored the resharding status on
the bucket info object, but the status on each of the shards was left
indicating a reshard was underway. This prevented some write operations
from taking place, as they would wait for resharding to complete. This
adds the missing functionality. It also makes the functionality
available to other classes via static functions in RGWBucketReshard.
J. Eric Ivancich [Fri, 12 Oct 2018 14:24:32 +0000 (10:24 -0400)]
rgw: change the bucket reshard lock to exclusive-ephemeral
The bucket reshard lock was simply an exclusive lock that existed on an
object solely for the purpose of representing the lock. It is now an
exclusive-ephemeral lock, so as not to leave these objects behind.
J. Eric Ivancich [Fri, 12 Oct 2018 14:23:57 +0000 (10:23 -0400)]
cls: add exclusive ephemeral locks that auto-clean
Add a new type of cls lock, exclusive-ephemeral, for which the object
exists only to represent the lock and should be deleted at unlock. This
prevents the accumulation of unneeded objects in the cluster by cleaning
them up automatically.
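A sketch of how such a lock might be taken via the cls lock client; the
method names are recalled from cls_lock_client.h and may differ slightly, and
the lock name, cookie and duration here are illustrative:

    #include <string>
    #include "cls/lock/cls_lock_client.h"   // rados::cls::lock::Lock
    #include "include/rados/librados.hpp"
    #include "include/utime.h"

    // Take an exclusive-ephemeral lock: the object exists only to represent
    // the lock and is removed again when the holder unlocks it.
    int take_ephemeral_lock(librados::IoCtx& ioctx, const std::string& oid)
    {
      rados::cls::lock::Lock l("reshard-lock");   // lock name (illustrative)
      l.set_cookie("cookie-123");                 // identifies this locker
      l.set_duration(utime_t(120, 0));            // auto-expire after 120s
      int r = l.lock_exclusive_ephemeral(&ioctx, oid);
      if (r < 0)
        return r;                                 // e.g. -EBUSY if already held
      // ... do the work guarded by the lock ...
      return l.unlock(&ioctx, oid);               // also deletes the object
    }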
J. Eric Ivancich [Thu, 27 Sep 2018 17:31:57 +0000 (13:31 -0400)]
rgw: renew resharding locks to prevent expiration
Fix a lock expiration problem with resharding. The resharding process
now renews its bucket lock (and logshard lock if necessary) when half
the remaining time is left on the lock. If the lock has expired and
cannot be renewed, the process fails and errors out appropriately.
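A minimal sketch of that renewal policy (timing logic only, using std::chrono
rather than the real lock code):

    #include <chrono>
    #include <iostream>

    using Clock = std::chrono::steady_clock;

    // Renew once half of the lock's duration has elapsed since it was last
    // acquired or renewed.
    bool should_renew(Clock::time_point acquired,
                      std::chrono::seconds duration,
                      Clock::time_point now)
    {
      return now >= acquired + duration / 2;
    }

    int main() {
      auto t0 = Clock::now();
      std::cout << should_renew(t0, std::chrono::seconds(60),
                                t0 + std::chrono::seconds(31)) << "\n";  // 1
    }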
cls: add semantics for cls locks to require renewal without expiring
Add the ability to *require* renewal of an existing lock in addition to
the existing ability to *allow* renewal of an existing lock. The key
difference is that MUST_RENEW will fail if the lock has expired, where
MAY_RENEW will succeed. This gives calling code the ability to verify
that a lock was held continually and never lost/expired.
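A simplified, self-contained sketch of the semantic difference (the real
implementation lives in cls_lock and uses its own flags and types):

    #include <cerrno>
    #include <ctime>

    enum RenewMode { MAY_RENEW, MUST_RENEW };

    struct LockState {
      bool held = false;
      std::time_t expires = 0;       // absolute expiration time
    };

    // MUST_RENEW fails if the existing lock has lapsed, so the caller learns
    // the lock was not held continuously; MAY_RENEW simply (re)acquires it.
    int renew_lock(LockState& lock, RenewMode mode,
                   std::time_t now, std::time_t new_expiration)
    {
      const bool expired = !lock.held || now >= lock.expires;
      if (expired && mode == MUST_RENEW)
        return -ENOENT;              // renewal refused: the lock had expired
      lock.held = true;
      lock.expires = new_expiration;
      return 0;
    }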
Noah Watkins [Mon, 1 Oct 2018 23:54:19 +0000 (16:54 -0700)]
luminous: doc: show edit on github links and version warnings
Backport of #24452, which adds edit-on-GitHub links to the documentation
and notification banners that display warnings when old documentation is
being viewed.
This is not a cherry-pick: it removes from the original patch the
dynamic generation of the release schedule from a YAML database file.
Backporting that portion would require modifying the patch to deal with
a different file/directory structure in luminous, with no real added
value.
Add a basic test in test_multi that creates a new zonegroup and zone and
removes them. Period update after the zone deletion will now fail,
lessening the chance of the period referring to a nonexistent
master_zone. Subsequent zonegroup deletion will allow things to pass.
rgw: period update: check for dangling master zone references
If we are deleting the master zone of a zonegroup, fail the period
update. If the deletion was intentional, either creating/modifying
another zone as master or, in the case of deletions, deleting the
zonegroup itself will let the period update work correctly. Without this
check, the period commit succeeds but a subsequent
RGWRados::init_complete() will fail.
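A simplified illustration of the sanity check (stand-in types; the real code
walks the period's zonegroup map):

    #include <cerrno>
    #include <map>
    #include <set>
    #include <string>

    struct ZoneGroupStub {
      std::string master_zone;          // zone id the zonegroup says is master
      std::set<std::string> zones;      // zone ids actually present
    };

    // Fail the period update if any zonegroup's master zone no longer exists.
    int check_dangling_master_zones(
        const std::map<std::string, ZoneGroupStub>& zonegroups)
    {
      for (const auto& zg : zonegroups) {
        const ZoneGroupStub& g = zg.second;
        if (!g.master_zone.empty() && g.zones.count(g.master_zone) == 0)
          return -EINVAL;               // dangling master zone reference
      }
      return 0;
    }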
rgw: allow init complete to proceed in case of erroneous zone deletes
Currently, deleting a master zone in a zonegroup and then committing the
period renders RGWRados unusable. Check whether the zonegroup is empty
and continue initialization in these cases so that removal can proceed.
Matt Benjamin [Wed, 17 Oct 2018 14:43:01 +0000 (10:43 -0400)]
radosgw-admin: translate reshard status codes (trivial)
Fixes: http://tracker.ceph.com/issues/36486
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 731fca4f921e8227e907b204dec9f1016d66b8c3)
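A sketch of the kind of translation involved; the numeric values follow the
cls_rgw reshard status enum, and the exact strings radosgw-admin prints may
differ:

    #include <string>

    // Map the numeric reshard status stored in the bucket index header to a
    // human-readable string instead of printing the raw number.
    std::string reshard_status_to_str(int status)
    {
      switch (status) {
      case 0:  return "not-resharding";
      case 1:  return "in-progress";
      case 2:  return "done";
      default: return "unknown";
      }
    }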
Nathan Cutler [Tue, 20 Nov 2018 12:45:11 +0000 (13:45 +0100)]
mgr: Ignore daemon if no metadata was returned
It can happen that the Mgr does not return any metadata for a given
daemon because it is not available at that moment.
get_metadata() then returns None, and both the Influx and Telegraf
modules should ignore that daemon in their statistics and continue on to
the next daemon.
Sage Weil [Wed, 17 Oct 2018 22:12:34 +0000 (17:12 -0500)]
os/bluestore: handle spurious read errors
Some kernels (4.9+) sometimes fail to return data when reading from a
block device under memory pressure. This patch retries the read if
checksum verification fails; tests show that the first retried read
succeeds in ~99.5% of cases, so 3 attempts are made by default before
giving up on the data. A standalone sketch of the retry idea follows the
conflicts list below.
Works-around: http://tracker.ceph.com/issues/22464
Signed-off-by: Paul Emmerich <paul.emmerich@croit.io>
(cherry picked from commit cffcbc73aaaa874829d5fc9091af3042b887f9a7)
Conflicts:
src/common/legacy_config_opts.h
- adjacent options
src/common/options.cc
- no RUNTIME flag in luminous
src/os/bluestore/BlueStore.cc
src/os/bluestore/BlueStore.h
- adjacent perfcounter
src/test/objectstore/store_test.cc
- adjacent tests, no #ifdef
- g_conf, not g_conf()
- no create_new_collection
- queue_transaction etc take osr, not ch
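As promised above, a standalone sketch of the retry idea (the real change
lives in BlueStore's read path and the attempt count comes from a config
option; the types and helper below are stand-ins):

    #include <cerrno>
    #include <cstdint>
    #include <vector>

    struct ReadResult {
      std::vector<uint8_t> data;
      bool checksum_ok = false;
    };

    // Retry the read a few times when checksum verification fails, since the
    // failure is usually a transient kernel issue rather than bad media.
    template <typename ReadFn>
    int read_with_retries(ReadFn&& read_once, ReadResult& out,
                          int max_attempts = 3)
    {
      for (int attempt = 0; attempt < max_attempts; ++attempt) {
        out = read_once();
        if (out.checksum_ok)
          return 0;                 // good data, possibly after a retry
        // spurious read under memory pressure: try again
      }
      return -EIO;                  // every attempt failed verification
    }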
Jan Fajerski [Fri, 16 Nov 2018 08:22:06 +0000 (09:22 +0100)]
ceph-volume: rename Device property valid to available
This flag is used in inventory reporting, and "available" is deemed more
appropriate. Furthermore, this fixes a bug where rejected_reasons
accumulated duplicate entries.
Fixes: http://tracker.ceph.com/issues/36701
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 8a80990471108b0920d1d8aa1239733ae2b20e9c)