Zhiqiang Wang [Wed, 25 Mar 2015 08:32:44 +0000 (16:32 +0800)]
Objecter: failed assert(tick_event==NULL) at osdc/Objecter.cc
When the Objecter timer erases the tick_event from its events queue and
calls tick() to dispatch it, if the Objecter::rwlock is held by shutdown(),
it waits there to get the rwlock. However, inside the shutdown function,
it checks the tick_event and tries to cancel it. The cancel_event function
returns false since tick_event is already removed from the events queue. Thus
tick_event is not set to NULL in shutdown(). Later the tick function returns
early and doesn't set tick_event to NULL either. This leads to the assertion
failure.
This is a regression introduced by an incorrect conflict resolution when d790833 was backported.
Fixes: #11183
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
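The race described above can be reproduced in a small model. This is only a sketch: Timer, Client and run_race are hypothetical stand-ins, not the Objecter code, and clearing tick_event at the top of tick() is one possible remedy assumed here for illustration.

```python
class Timer:
    """Toy timer; cancel_event() fails once the event has been dequeued."""

    def __init__(self):
        self.events = set()

    def add_event(self, ev):
        self.events.add(ev)

    def cancel_event(self, ev):
        if ev in self.events:
            self.events.remove(ev)
            return True
        return False  # already removed from the queue for dispatch


class Client:
    """Hypothetical stand-in for the object that owns tick_event."""

    def __init__(self, timer, clear_in_tick):
        self.timer = timer
        self.clear_in_tick = clear_in_tick
        self.initialized = True
        self.tick_event = object()
        timer.add_event(self.tick_event)

    def shutdown(self):
        self.initialized = False
        # tick_event is only cleared when the cancel succeeds.
        if self.tick_event is not None and self.timer.cancel_event(self.tick_event):
            self.tick_event = None

    def tick(self):
        if self.clear_in_tick:
            self.tick_event = None  # assumed remedy: clear before any early return
        if not self.initialized:
            return  # early return after shutdown()
        self.tick_event = None


def run_race(clear_in_tick):
    """Replay the interleaving: dequeue, shutdown(), then tick()."""
    timer = Timer()
    client = Client(timer, clear_in_tick)
    timer.events.discard(client.tick_event)  # timer erased the event for dispatch
    client.shutdown()                        # cancel_event() returns False
    client.tick()                            # returns early
    return client.tick_event is None
```

With clear_in_tick=False the event pointer survives both calls, which is the state the assertion catches.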
Matt Richards [Thu, 8 Jan 2015 21:16:17 +0000 (13:16 -0800)]
librados: Translate operation flags from C APIs
The operation flags in the public C API are a distinct enum
and need to be translated to Ceph OSD flags, as happens in
the C++ API. It seems like the C enum and the C++ enum consciously
use the same values, so I reused the C++ translation function.
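A flag translation of this kind can be sketched as a bit-by-bit mapping. The constants and the translate_flags name below are hypothetical and chosen only for illustration; they are not the real librados or OSD values.

```python
# Hypothetical flag values, for illustration only.
API_FLAG_EXCL = 0x1
API_FLAG_FAILOK = 0x2
OSD_FLAG_EXCL = 0x10
OSD_FLAG_FAILOK = 0x40

_API_TO_OSD = {
    API_FLAG_EXCL: OSD_FLAG_EXCL,
    API_FLAG_FAILOK: OSD_FLAG_FAILOK,
}


def translate_flags(api_flags):
    """Map each public API flag bit that is set to its OSD counterpart."""
    osd_flags = 0
    for api_bit, osd_bit in _API_TO_OSD.items():
        if api_flags & api_bit:
            osd_flags |= osd_bit
    return osd_flags
```

Passing the public flags straight through would set unrelated OSD bits whenever the two numbering schemes differ, which is why the mapping is explicit.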
shutdown() resets initialized to 0, but we can still receive messages
after this point, so fix message handlers to skip messages in this
case instead of asserting.
Also read initialized while holding Objecter::rwlock to avoid races
where e.g. handle_osd_map() checks initialized -> 1, continues,
shutdown() is called, sets initialized to 0, then handle_osd_map()
goes about its business and calls op_submit(), which would fail the
assert(initialized.read()) check. Similar races existed in other
message handlers which change Objecter state.
The Objecter is not destroyed until after its Messenger in
the MDS, OSD, and librados, so this should be safe.
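The pattern of checking the initialized flag under the same lock that shutdown() takes can be sketched as follows. Handler and its methods are hypothetical names, and a plain mutex stands in for Objecter::rwlock.

```python
import threading


class Handler:
    """Hypothetical message handler guarded by a single lock."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for Objecter::rwlock
        self.initialized = True
        self.handled = []

    def shutdown(self):
        with self.lock:
            self.initialized = False

    def handle_message(self, msg):
        # The flag is read while holding the lock, so shutdown() cannot
        # flip it between the check and the state change below.
        with self.lock:
            if not self.initialized:
                return False  # skip the message instead of asserting
            self.handled.append(msg)
            return True
```

A message that arrives after shutdown() is dropped quietly rather than tripping an assertion.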
Xiaoxi Chen [Wed, 20 Aug 2014 07:35:44 +0000 (15:35 +0800)]
CrushWrapper: pick a ruleset same as rule_id
Originally, the add_simple_ruleset function reused rule_id but did not
reuse ruleset_id. So after some adds and removes of rules, a newly
created rule is likely to have ruleset != rule_id.
We don't want this to happen because we are trying to maintain the
constraint that ruleset == rule_id.
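One way to keep ruleset == rule_id is to pick the lowest id that is free in both namespaces and use it for both. The sketch below assumes that approach; pick_rule_id is a hypothetical helper, not the CrushWrapper code.

```python
def pick_rule_id(rule_ids, ruleset_ids):
    """Return the lowest id used neither as a rule id nor as a ruleset id."""
    used = set(rule_ids) | set(ruleset_ids)
    candidate = 0
    while candidate in used:
        candidate += 1
    return candidate
```

For example, with rule ids {0, 2} and ruleset ids {0, 1, 2}, id 1 is free as a rule id but taken as a ruleset id, so the helper skips it and returns 3.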
Samuel Just [Fri, 20 Feb 2015 21:43:46 +0000 (13:43 -0800)]
DBObjectMap: lock header_lock on sync()
Otherwise, we can race with another thread updating state.seq
resulting in the old, smaller value getting persisted. If there
is a crash at that time, we will reuse a sequence number, resulting
in an inconsistent node tree and bug #9891.
Fixes: 9891
Backport: giant, firefly, dumpling
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 2b63dd25fc1c73fa42e52e9ea4ab5a45dd9422a0)
Conflicts:
src/os/DBObjectMap.cc
because we have state.v = 1; instead of state.v = 2;
Zhiqiang Wang [Tue, 28 Oct 2014 01:37:11 +0000 (09:37 +0800)]
osd: cache tiering: fix the atime logic of the eviction
Reported-by: Xinze Chi <xmdxcxz@gmail.com>
Signed-off-by: Zhiqiang Wang <zhiqiang.wang@intel.com>
(cherry picked from commit 622c5ac41707069ef8db92cb67c9185acf125d40)
Samuel Just [Fri, 14 Nov 2014 23:44:20 +0000 (15:44 -0800)]
PG: always clear_primary_state when leaving Primary
Otherwise, entries from the log collection process might leak into the next
epoch, where we might end up choosing a different authoritative log. In this
case, it resulted in us not rolling back to log entries on one of the replicas
prior to trying to recover from an affected object due to the peer_missing not
being cleared.
Fixes: #10059
Backport: giant, firefly, dumpling
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit c87bde64dfccb5d6ee2877cc74c66fc064b1bcd7)
Greg Farnum [Tue, 2 Dec 2014 23:17:57 +0000 (15:17 -0800)]
SimpleMessenger: allow RESETSESSION whenever we forget an endpoint
In the past (e229f8451d37913225c49481b2ce2896ca6788a2) we decided to disable
reset of lossless Pipes, because lossless peers resetting caused trouble and
they can't forget about each other. But they actually can: if mark_down()
is called.
I can't figure out how else we could forget about a remote endpoint, so I think
it's okay if we tell them we reset in order to clean up state. That's desirable
so that we don't get into strange situations with out-of-whack counters.
Samuel Just [Fri, 6 Feb 2015 17:52:29 +0000 (09:52 -0800)]
FileJournal: fix journalq population in do_read_entry()
Fixes: 6003
Backport: dumpling, firefly, giant
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit bae1f3eaa09c4747b8bfc6fb5dc673aa6989b695)
Conflicts:
src/os/FileJournal.cc
because reinterpret_cast was added near two hunks after firefly
Loic Dachary [Wed, 18 Mar 2015 23:32:39 +0000 (00:32 +0100)]
doc,tests: force checkout of submodules
When updating submodules, always checkout even if the HEAD is the
desired commit hash (update --force) to avoid the following:
* a directory gmock exists in hammer
* a submodule gmock replaces the directory gmock in master
* checkout master + submodule update : gmock/.git is created
* checkout hammer : the gmock directory still contains the .git from
master because it did not exist at the time and checkout won't
remove untracked directories
* checkout master + submodule update : git rev-parse HEAD is
at the desired commit although the content of the gmock directory
is from hammer
Samuel Just [Thu, 11 Dec 2014 21:05:54 +0000 (13:05 -0800)]
ReplicatedPG::scan_range: an object can disappear between the list and the attr get
The first item in the range is often last_backfill, upon which writes
can be occurring. It's trimmed off on the primary side anyway.
Fixes: 10150
Backport: dumpling, firefly, giant
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit dce6f288ad541fe7f0ef8374301cd712dd3bfa39)
mon: Paxos: reset accept timeout before submitting work to the store
Otherwise we may trigger the timeout while waiting for the work to be
committed to the store -- and it would only take the write to take a bit
longer than 10 seconds (default accept timeout).
We do wait for the work to be properly committed to the store before
extending the lease though.
If probability is set to a value greater than 0, just before applying
the transaction, the store will decide whether to inject a delay,
randomly choosing a value between 0 and the max.
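The delay decision can be sketched as below. This assumes the probability is a fraction between 0 and 1 and that the delay is drawn uniformly; pick_injected_delay and its parameters are hypothetical names, not the store's actual options.

```python
import random


def pick_injected_delay(probability, max_delay, rng=random):
    """Return the delay in seconds to inject before applying a transaction.

    Assumes probability is a fraction in [0, 1]; 0 disables injection.
    """
    if probability > 0 and rng.random() < probability:
        return rng.uniform(0, max_delay)
    return 0.0
```

A caller would sleep for the returned duration just before applying the transaction; passing a seeded random.Random instance as rng makes a test run repeatable.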
Samuel Just [Tue, 10 Feb 2015 01:11:38 +0000 (17:11 -0800)]
WorkQueue: make wait timeout on empty queue configurable
Fixes: 10817
Backport: giant, firefly, dumpling
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 5aa6f910843e98a05bfcabe6f29d612cf335edbf)
mon: MonCap: take EntityName instead when expanding profiles
entity_name_t is tightly coupled to the messenger, while EntityName is
tied to auth. When expanding profiles we want to tie the profile
expansion to the entity that was authenticated. Otherwise we may run
into weird behavior, such as caps validation failing because a given
client messenger inst does not match the auth entity it used.
For example, a client can have entity_name_t 'client.12345' and EntityName 'osd.0'. Using
entity_name_t during profile expansion would not allow the client access
to daemon-private/osd.X/foo (client.12345 != osd.X).
Sage Weil [Thu, 5 Feb 2015 11:07:50 +0000 (03:07 -0800)]
mon: ignore osd failures from before up_from
If the failure was generated for an instance of the OSD prior to when
it came up, ignore it.
This probably causes a fair bit of unnecessary flapping in the wild...
Backport: giant, firefly
Fixes: #10762
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 400ac237d35d0d1d53f240fea87e8483c0e2a7f5)
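The check amounts to comparing the epoch a failure report refers to with the epoch at which the current OSD instance came up. A minimal sketch, assuming both are plain integer epochs; should_ignore_failure and its parameter names are hypothetical.

```python
def should_ignore_failure(report_epoch, up_from_epoch):
    """True when the report targets an OSD instance older than the current one."""
    return report_epoch < up_from_epoch
```

A report from epoch 40 about an OSD that came back up at epoch 42 describes the previous instance and would be dropped.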
Josh Durgin [Tue, 10 Feb 2015 04:50:23 +0000 (20:50 -0800)]
rados.py: keep reference to python callbacks
If we don't keep a reference to these, the librados aio calls will
segfault since the python-level callbacks will have been garbage
collected. Passing them to aio_create_completion() does not take a
reference to them. Keep a reference in the python Completion object
associated with the request, since they need the same lifetime.
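The lifetime rule can be shown without librados. In this sketch, Callback, Completion and make_completion are hypothetical stand-ins for the binding's classes, and a weak reference is used only to observe when the callback is freed.

```python
import gc
import weakref


class Callback:
    """Stand-in for a Python-level callback handed to native code."""

    def __call__(self, completion):
        return None


class Completion:
    """Holds strong references so the callbacks live as long as the request."""

    def __init__(self, oncomplete, onsafe):
        self.oncomplete = oncomplete
        self.onsafe = onsafe


def make_completion():
    cb = Callback()
    # After this function returns, the Completion holds the only strong reference.
    return Completion(cb, None), weakref.ref(cb)
```

While the Completion is alive the weak reference still resolves; once the Completion is dropped, the callback can be collected.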
Billy Olsen [Mon, 2 Feb 2015 23:24:59 +0000 (16:24 -0700)]
Fix memory leak in python rados bindings
A circular reference was inadvertently created when using the
CFUNCTYPE binding for callbacks for the asynchronous i/o callbacks.
This commit refactors the usage of the callbacks such that the
Ioctx object does not have a class reference to the callbacks.
Fixes: #10723
Backport: giant, firefly, dumpling
Signed-off-by: Billy Olsen <billy.olsen@gmail.com>
Reviewed-by: Dan Mick <dmick@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
(cherry picked from commit 60b019f69aa0e39d276c669698c92fc890599f50)
Sage Weil [Mon, 12 Jan 2015 01:28:04 +0000 (17:28 -0800)]
osd: requeue blocked op before flush it was blocked on
If we have request A (say, cache-flush) that blocks things, and then
request B that gets blocked on it, and we have an interval change, then we
need to requeue B first, then A, so that the resulting queue will keep
A before B and preserve the order.
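Because requeueing pushes onto the front of the queue, the op pushed last ends up first. A small sketch with a deque, where requeue_in_order is a hypothetical helper and the op names are placeholders:

```python
from collections import deque


def requeue_in_order(queue, blocker, blocked):
    """Requeue at the front so that blocker stays ahead of blocked."""
    queue.appendleft(blocked)  # B first
    queue.appendleft(blocker)  # then A, which lands in front of B
    return queue
```

Starting from a queue holding only C, requeueing B and then A yields A, B, C; doing it the other way round would yield B, A, C and reorder the two requests.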
Yehuda Sadeh [Wed, 7 Jan 2015 21:56:14 +0000 (13:56 -0800)]
rgw: index swift keys appropriately
Fixes: #10471
Backport: firefly, giant
We need to index the swift keys by the full uid:subuser when decoding
the json representation, to keep it in line with how we store it when
creating it through other mechanisms.
Loic Dachary [Wed, 17 Dec 2014 15:06:55 +0000 (16:06 +0100)]
crush: set_choose_tries = 100 for erasure code rulesets
It is common for people to try to map 9 OSDs in a Ceph cluster that has only
9 OSDs in total. The default tries (50) will frequently lead to bad mappings for
this use case. Changing it to 100 makes no significant CPU performance
difference, as tested manually by running crushtool on one million
mappings.
Samuel Just [Fri, 5 Dec 2014 23:29:52 +0000 (15:29 -0800)]
osd_types: op_queue_age_hist and fs_perf_stat should be in osd_stat_t::operator==
Fixes: 10259
Backport: giant, firefly, dumpling
Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 1ac17c0a662e6079c2c57edde2b4dc947f547f57)
mon: PGMonitor: available size 0 if no osds on pool's ruleset
get_rule_avail() may return a negative value, which we were using blindly,
assuming it would always be unsigned. We would end up with weird
values if the ruleset had no osds.
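The "weird values" come from reading a negative number as unsigned. The sketch below assumes a 64-bit unsigned interpretation; as_uint64 and available_size are hypothetical helpers written for illustration.

```python
def as_uint64(value):
    """Reinterpret a signed integer as an unsigned 64-bit value."""
    return value & 0xFFFFFFFFFFFFFFFF


def available_size(rule_avail):
    """Report 0 when the rule yields a negative (no OSDs) result."""
    return 0 if rule_avail < 0 else rule_avail
```

Here as_uint64(-1) evaluates to 18446744073709551615, whereas clamping reports an available size of 0.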
Sage Weil [Tue, 2 Dec 2014 02:15:59 +0000 (18:15 -0800)]
osd: tolerate sessionless con in fast dispatch path
We can now get a session cleared from a Connection at any time. Change
the assert to an if in ms_fast_dispatch to cope. It's pretty rare, but it
can happen, especially with delay injection. In particular, a racing
thread can call mark_down() on us.