Yan, Zheng [Sat, 23 May 2015 12:01:46 +0000 (20:01 +0800)]
client: hold reference for returned inode
Client::make_request() returns a pointer to the target inode, but it does not
increase the reference count of the returned inode. The inode may get freed
when Client::make_request() releases the MetaRequest.
The fix is to hold a reference for the returned inode. Since many places
use Client::make_request() directly or indirectly, it is easy to leak
references. This patch uses intrusive_ptr to track the reference.
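As a rough illustration (a minimal sketch with hypothetical names, not the
actual Client/Inode code), an intrusive_ptr takes a reference when it is
constructed, so the inode stays alive even after the MetaRequest drops its
own ref:

    #include <boost/intrusive_ptr.hpp>

    struct Inode {
      int ref = 0;  // embedded refcount (the real one is atomic)
    };

    // hooks boost::intrusive_ptr uses to bump/drop the refcount
    inline void intrusive_ptr_add_ref(Inode *in) { ++in->ref; }
    inline void intrusive_ptr_release(Inode *in) {
      if (--in->ref == 0)
        delete in;
    }

    using InodeRef = boost::intrusive_ptr<Inode>;

    // returning InodeRef instead of a raw Inode* holds a reference for
    // the caller, so releasing the request cannot free the inode
    InodeRef make_request_result(Inode *target) {
      return InodeRef(target);  // takes a reference
    }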
xiexingguo [Tue, 13 Oct 2015 06:04:20 +0000 (14:04 +0800)]
OSD: reset primary and up_primary fields when beginning a new past_interval.
Reset the primary and up_primary fields when we start over a new past_interval in OSD::build_past_intervals_parallel().
Fixes: #13471
Signed-off-by: xie.xingguo@zte.com.cn
(cherry picked from commit 65064ca05bc7f8b6ef424806d1fd14b87add62a4)
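A minimal sketch of the fix's shape (simplified, hypothetical types; the
real change lives in OSD::build_past_intervals_parallel()):

    // per-interval fields must be re-initialized whenever a new interval
    // begins, otherwise values from the previous interval leak forward
    struct IntervalState {
      int primary = -1;     // -1 means "no primary determined yet"
      int up_primary = -1;
    };

    IntervalState begin_new_past_interval() {
      return IntervalState{};  // fresh defaults, nothing carried over
    }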
Jason Dillaman [Wed, 21 Oct 2015 17:12:48 +0000 (13:12 -0400)]
librbd: potential assertion failure during cache read
It's possible for a cache read from a clone to trigger a writeback if a
previous read op determined the object doesn't exist in the clone,
followed by a cached write to the non-existent clone object, followed
by another read request to the same object. This causes the cache to
flush the pending writeback ops without holding the owner lock.
Sage Weil [Thu, 1 Oct 2015 18:50:34 +0000 (14:50 -0400)]
osdc/Objecter: distinguish between multiple notify completions
We may send a notify to the cluster multiple times due to OSDMap
changes. In some cases, earlier notify attempts may complete with
an error, while later attempts succeed. We need to pay attention
only to the most recently sent notify's completion.
Do this by making note of the notify_id in the initial ACK (only
present when talking to newer OSDs). When we get a notify
completion, match it against our expected notify_id (if we have
one) or else discard it.
This is important because in some cases an early notify completion
may be an error while a later one succeeds.
Note that if we are talking to an old cluster we will simply not record a
notify_id and our behavior will be the same as before (we will trust any
notify completion we get).
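A minimal sketch of the matching rule (hypothetical field names; the real
logic lives in Objecter::handle_watch_notify):

    #include <cstdint>

    struct LingerOp {
      uint64_t notify_id = 0;  // 0 until the initial ACK reports one
    };

    // decide whether a notify completion should be delivered
    bool accept_notify_complete(const LingerOp& op, uint64_t completion_id) {
      if (op.notify_id == 0)
        return true;  // old OSD: no id recorded, trust any completion
      return op.notify_id == completion_id;  // otherwise require a match
    }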
Conflicts:
src/osdc/Objecter.cc
In Objecter::handle_watch_notify, there was a conflict due to a comment modified by commit 47277c51db7bb2725ea117e4e8834869ae93e006, which was not backported
Sage Weil [Thu, 1 Oct 2015 18:50:00 +0000 (14:50 -0400)]
osd: reply to notify request with our unique notify_id
The OSD assigns a unique ID to each notify it queues for
processing. Include this in the reply to the notifier so that
it can match the reply up with the eventual completions it receives.
This is necessary to distinguish between the multiple completions
it may receive if there is PG peering and the notify is resent.
In particular, an earlier notify attempt may return an error while a later
attempt succeeds.
This is forward and backward compatible: new clients will make use of
this reply payload, while older clients ignore it.
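A sketch of the compatibility idea (hypothetical wire layout, not the
actual encoding): the reply may carry a trailing notify_id, and clients
that predate the field simply never read past the fields they know:

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // returns the notify_id if present, else 0 (a reply from an old OSD)
    uint64_t decode_notify_id(const std::vector<uint8_t>& payload,
                              size_t known_fields_len) {
      uint64_t notify_id = 0;
      if (payload.size() >= known_fields_len + sizeof(notify_id))
        std::memcpy(&notify_id, payload.data() + known_fields_len,
                    sizeof(notify_id));
      return notify_id;
    }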
Zhiqiang Wang [Thu, 18 Jun 2015 01:05:28 +0000 (09:05 +0800)]
osd: implement hit_set_remove_all
When the hit set is not configured, either at startup or after a
configuration change, remove all previous hit sets.
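A minimal sketch of the cleanup's shape (hypothetical helper names; the
real code lives in ReplicatedPG):

    #include <string>
    #include <vector>

    struct HitSetArchive {
      std::string object_name;  // one archived hit set object
    };

    // stand-in for the real object-removal transaction
    void remove_archive_object(const HitSetArchive&) { /* enqueue delete */ }

    void hit_set_remove_all(std::vector<HitSetArchive>& history) {
      for (const auto& a : history)
        remove_archive_object(a);  // delete each on-disk hit set object
      history.clear();             // and forget the in-memory history
    }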
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
(cherry picked from commit be28319bf3dc54b4b6f400d83405a29facfe3bd4)
Conflicts:
src/osd/ReplicatedPG.cc
pass the new `p->using_gmt` argument to get_hit_set_archive_object()
gmt_hitset is enabled by default in the ctor of pg_pool_t; this
is intentional, because we want to remove this setting and make
gmt_hitset=true the default in the future. But this forces us to
disable it explicitly when preparing a new pool if any OSD does
not support GMT hitsets.
Kefu Chai [Fri, 5 Jun 2015 13:06:48 +0000 (21:06 +0800)]
osd: use GMT time for the object name of hitsets
* bump the encoding version of pg_hit_set_info_t to 2, so we can
tell if the corresponding hit_set is named using localtime or
GMT
* bump the encoding version of pg_pool_t to 20, so we can tell
whether a pool is using GMT to name the hit_set archive, and
whether the current cluster allows OSDs that do not support
GMT mode.
* add an option named `osd_pool_use_gmt_hitset`. If enabled,
the cluster will try to use GMT mode when creating a new pool
if all the up OSDs support GMT mode. If any of the
pools in the cluster is using GMT mode, then only OSDs
supporting GMT mode are allowed to join the cluster.
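To illustrate why the naming mode matters (a sketch using standard C time
APIs, not the actual archive-naming code): the same timestamp yields one
name everywhere under GMT, but timezone-dependent names under localtime:

    #include <stddef.h>
    #include <time.h>

    void format_hit_set_name(time_t stamp, int using_gmt,
                             char *out, size_t len) {
      struct tm bdt;
      if (using_gmt)
        gmtime_r(&stamp, &bdt);     // identical on every OSD
      else
        localtime_r(&stamp, &bdt);  // depends on each host's timezone
      strftime(out, len, "hit_set_%Y-%m-%d_%H:%M:%S", &bdt);
    }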
Conflicts:
src/include/ceph_features.h
src/osd/ReplicatedPG.cc
src/osd/osd_types.cc
src/osd/osd_types.h
fill pg_pool_t with default settings in master branch.
The KeyServer class has a public method get_auth() that returns a boolean
value. This value is checked here; fix the conditional so that it triggers
when get_auth() returns false.
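A minimal sketch of the corrected check (simplified, hypothetical KeyServer
with the same boolean contract):

    #include <iostream>
    #include <map>
    #include <string>

    struct KeyServer {
      std::map<std::string, std::string> secrets;
      // returns false when the entity has no registered secret
      bool get_auth(const std::string& entity, std::string *secret) const {
        auto it = secrets.find(entity);
        if (it == secrets.end())
          return false;
        *secret = it->second;
        return true;
      }
    };

    int check_entity(const KeyServer& ks, const std::string& entity) {
      std::string secret;
      if (!ks.get_auth(entity, &secret)) {  // the fix: error path on false
        std::cerr << "no secret found for " << entity << "\n";
        return -1;
      }
      return 0;
    }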
tests: robust test for the pool create crushmap test
The test that goes with f1e86be589803596e86acc964ac5c5c03b4038d8 to
verify that a bogus crush ruleset will prevent the creation of a pool
trying to use it was fragile: it depends on the implementation of the
erasure code lrc plugin and turned out not to work on i386.
The test is modified to use a fake crushtool that always returns false
and to validate that this prevents the creation of a pool, which demonstrates
that crushtool is used for crushmap validation prior to pool creation.
Kefu Chai [Mon, 9 Mar 2015 08:42:34 +0000 (16:42 +0800)]
osd: randomize scrub times to avoid scrub wave
- to avoid a scrub wave when osd_scrub_max_interval is reached on a
high-load OSD, the scrub time is randomized.
- extract scrub_load_below_threshold() out of scrub_should_schedule()
- schedule an automatic scrub job at a time which is uniformly distributed
over [now+osd_scrub_min_interval,
now+osd_scrub_min_interval*(1+osd_scrub_time_limit)]; see the sketch
after this list. Before this change, this sort of scrub was performed once
the hard interval ended or the system load fell below the threshold; with
this change, the jobs are performed as long as the load is low or the
interval of the scheduled scrubs is longer than conf.osd_scrub_max_interval.
All automatic jobs should be performed in the configured time period,
otherwise they are postponed.
- a requested scrub job is scheduled right away; before this change it was
queued with the timestamp of `now` and hence postponed by
osd_scrub_min_interval.
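A sketch of the randomized scheduling formula (parameter names mirror the
config options; the real scheduler tracks more state):

    #include <ctime>
    #include <random>

    time_t next_scrub_stamp(time_t now,
                            double min_interval,       // osd_scrub_min_interval
                            double randomize_ratio) {  // osd_scrub_time_limit
      static std::mt19937 rng{std::random_device{}()};
      // uniform over [now + min, now + min * (1 + ratio)]
      std::uniform_real_distribution<double> dist(
          min_interval, min_interval * (1.0 + randomize_ratio));
      return now + static_cast<time_t>(dist(rng));
    }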
Conflicts:
src/crush/CrushTester.cc
in hammer the crushtool validation is via a shell
and not via an internal subprocess utility
src/tools/crushtool.cc
ceph_argparse_withint is preferred to ceph_argparse_witharg
Several callers create messengers using exactly the same parameters:
- reading the ms type from cct that is also passed in
- a default entity_name_t::CLIENT
- the default features
Additionally, the nonce should be randomized and not depend on
e.g. pid, as it does in several callers now. Clients running in
containers can easily have pid collisions, leading to hangs, so
randomize the nonce in this simplified constructor rather than
duplicating that logic in every caller.
Daemons have meaningful entity_name_ts, and monitors currently depend
on using 0 as a nonce, so make this simple constructor
client-specific.
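A sketch of the nonce portion (standard C++ only; the constructor wiring
itself is omitted):

    #include <cstdint>
    #include <random>

    // a random nonce instead of getpid(): clients in containers can
    // share a pid namespace, and colliding nonces lead to hangs
    uint64_t make_client_nonce() {
      static std::mt19937_64 rng{std::random_device{}()};
      return rng();
    }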
Make ISO 8601 parsing more flexible by not restricting the length of the
seconds field to 5 characters; this allows timestamps to be specified with
either millisecond or microsecond precision. Newer keystone backends such as
the fernet token backend default to microseconds when publishing ISO 8601
timestamps, so this change allows those timestamps to be accepted when
specifying the token expiry time.
Fixes: #12761
Reported-by: Ian Unruh <ianunruh@gmail.com>
Signed-off-by: Abhishek Lekshmanan <abhishek.lekshmanan@ril.com>
(cherry picked from commit 136242b5612b8bbf260910b1678389361e86d22a)
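A sketch of the relaxed parsing (standard C++ only, not the actual
rgw/keystone parser): accept any number of fractional-second digits, so
both millisecond and microsecond timestamps parse:

    #include <cctype>
    #include <string>

    // parses an optional ".<digits>" fractional-seconds suffix at `pos`;
    // 3 digits (ms) and 6 digits (us) are both accepted
    bool parse_fractional_seconds(const std::string& s, size_t& pos,
                                  double& frac) {
      frac = 0.0;
      if (pos >= s.size() || s[pos] != '.')
        return true;  // no fractional part at all is also valid
      ++pos;
      double scale = 0.1;
      while (pos < s.size() && std::isdigit((unsigned char)s[pos])) {
        frac += (s[pos] - '0') * scale;
        scale /= 10.0;
        ++pos;
      }
      return scale < 0.1;  // require at least one digit after the dot
    }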
The radosgw init script is unable to start the radosgw daemon,
as it relies on requiretty being disabled; when the init script
starts the daemon with sudo, the daemon fails to start.
Changing 'sudo' to 'su' fixes this issue and will also help with
running the radosgw daemon under our new "ceph" UID project.