Sage Weil [Sat, 8 Nov 2014 01:12:54 +0000 (17:12 -0800)]
librados: tell watcher if we cause a notify timeout
If we are a watcher and we fail to notify in a timely manner, or
circumstances otherwise conspire to prevent our ack from arriving in time,
initiate a callback.
Sage Weil [Fri, 17 Oct 2014 03:19:00 +0000 (20:19 -0700)]
osdc/Objecter: send regular PING ops
Send a full PING op to the object to ensure we are still connected. For
now just use the existing ping interval; we may want to change this in
the future.
Sage Weil [Fri, 17 Oct 2014 03:16:07 +0000 (20:16 -0700)]
osdc/Objecter: separate WATCH from RECONNECT
Use WATCH op for the initial registration. This is idempotent in that
it will succeed whether the watch information has been persisted or not.
It is used by the client if it does not know that it is registered.
The RECONNECT op is used for any subsequent session reconnect. It will
fail if the watch state isn't already persisted on the OSD.
Sage Weil [Fri, 17 Oct 2014 03:11:43 +0000 (20:11 -0700)]
librados: add infrastructure to deliver an error notification
Use a reusable context in the WatchNotifyInfo to trigger an error
event, delivered via the existing Finisher thread. Re-lookup the cookie
in the thread to cope with races with unregister (just as we do with
notify events).
Sage Weil [Fri, 17 Oct 2014 02:28:53 +0000 (19:28 -0700)]
osd/ReplicatedPG: handle PING and RECONNECT watch ops
The ping will essentially assert that a watch is still valid.
A reconnect will reestablish session state *only* if the watch is
still persistent. If not, it will fail (and the client will know it
may have missed something).
Note that the only difference here is that a PING is a bit lighter
weight; it will not reestablish the session state (which should already
be established). We could use a single op here but the unique op
code makes the messages easier to understand and simplifies the code
path a bit for PING.
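The difference between the three watch ops described in these commits can be summarized with a toy model. This is only a sketch and not the ReplicatedPG code: ToyWatchState, WatchOp, and the choice of -ENOTCONN as the failure code are assumptions made for illustration.

```cpp
#include <cerrno>
#include <cstdint>
#include <set>

// Hypothetical op codes mirroring the three watch ops named in the log.
enum class WatchOp { WATCH, RECONNECT, PING };

// Toy stand-in for the per-object watch state kept by the OSD.
struct ToyWatchState {
  std::set<std::uint64_t> persisted;  // cookies whose watch is stored with the object
  std::set<std::uint64_t> connected;  // cookies that currently have session state

  int handle(WatchOp op, std::uint64_t cookie) {
    switch (op) {
    case WatchOp::WATCH:
      // Idempotent: succeeds whether or not the watch was already persisted.
      persisted.insert(cookie);
      connected.insert(cookie);
      return 0;
    case WatchOp::RECONNECT:
      // Re-establish session state only if the watch is still persisted.
      if (!persisted.count(cookie))
        return -ENOTCONN;  // assumed error code: the client may have missed something
      connected.insert(cookie);
      return 0;
    case WatchOp::PING:
      // Lighter weight: checks the watch is still valid, touches no session state.
      return persisted.count(cookie) ? 0 : -ENOTCONN;
    }
    return -EINVAL;
  }
};
```

In this model a client that does not know whether it is registered sends WATCH, while a client recovering a session sends RECONNECT and treats a failure as a sign that notifications may have been lost.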
Sage Weil [Fri, 10 Oct 2014 01:14:15 +0000 (18:14 -0700)]
librados: use new watch op codes; simplify Objecter helpers
- drop the useless add_watch() helper; do it explicitly
- drop the unused var arg everywhere
- make a separate notify member of the union that excludes
the other unused fields
Sage Weil [Thu, 21 Aug 2014 21:46:35 +0000 (14:46 -0700)]
osd: implement notify ack payloads
If the notified watchers send back reply payloads, pass them back to the
notifier.
Note that this changes the on-wire behavior of the watch completion
message a bit: instead of sending the original notify payload back to the
notifier, we send a map from each notified watcher to its reply. Only
users of the new API will know what to do with the notify acknowledgement
information. The old API users never saw the original payload anyway; we
were uselessly sending it over the wire.
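A minimal sketch of the behavior described here, assuming a single integer identifies each watcher. ToyNotify and WatcherId are hypothetical names; the real completion message and its key type are not shown in this log.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using WatcherId = std::uint64_t;  // assumed identifier type for a watcher

// Toy model of one in-flight notify and the acks collected for it.
class ToyNotify {
public:
  explicit ToyNotify(std::string payload) : payload_(std::move(payload)) {}

  // What each watcher receives in its notify callback.
  const std::string& payload() const { return payload_; }

  // A watcher acknowledges, optionally attaching a reply payload.
  void ack(WatcherId who, std::string reply = std::string()) {
    replies_[who] = std::move(reply);
  }

  // What goes back to the notifier: the replies, not the original payload.
  const std::map<WatcherId, std::string>& completion() const { return replies_; }

private:
  std::string payload_;
  std::map<WatcherId, std::string> replies_;
};
```

A watcher that acks without a payload still appears in the map, with an empty reply.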
Sage Weil [Thu, 21 Aug 2014 21:32:48 +0000 (14:32 -0700)]
librados: define updated watch/notify interface
- new notify callback with the correct values:
- notify_id
- watch handle
- payload
- new notify_ack call
- not implicit when the callback returns (for new api only)
- optional payload
- new watch2 call
- that provides the new callback
- new notify2 call
- with the right arguments, and optional timeout
A couple refactors in here:
- IoCtx notify_ack is now called unlocked (Note: this will soon change
with pending Objecter locking changes)
- Objecter notify_ack takes a buffer
TODO:
- no timeout on the individual watch, yet...
Yan, Zheng [Thu, 4 Dec 2014 04:18:47 +0000 (12:18 +0800)]
osdc/Filer: use finisher to execute C_Probe and C_PurgeRange
Currently contexts C_Probe/C_PurgeRange are executed while holding
OSDSession::completion_lock. C_Probe and C_PurgeRange may call
Objecter::stat() and Objecter::remove() respectively, which acquire
Objecter::rwlock. This can cause a deadlock because there is an intermediate
dependency between Objecter::rwlock and OSDSession::completion_lock:
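As a separate illustration of the fix named in the subject line, here is a sketch of deferring a completion to a finisher so that it no longer runs under the session lock. ToyFinisher, ToySession, ToyObjecter, and complete_probe are hypothetical stand-ins, and drain() is called explicitly here where a real finisher would presumably run queued work from its own thread.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical minimal finisher: callbacks are queued under its own lock
// and executed later with no caller lock held.
class ToyFinisher {
public:
  void queue(std::function<void()> fn) {
    std::lock_guard<std::mutex> l(lock_);
    queue_.push_back(std::move(fn));
  }
  // Run everything queued so far; the callbacks run with lock_ released.
  void drain() {
    std::vector<std::function<void()>> work;
    {
      std::lock_guard<std::mutex> l(lock_);
      work.swap(queue_);
    }
    for (auto& fn : work) fn();
  }
private:
  std::mutex lock_;
  std::vector<std::function<void()>> queue_;
};

struct ToySession {
  std::mutex completion_lock;  // stand-in for OSDSession::completion_lock
};

struct ToyObjecter {
  std::mutex rwlock;           // stand-in for Objecter::rwlock
  int stats = 0;
  void stat() { std::lock_guard<std::mutex> l(rwlock); ++stats; }
};

// The completion is handed to the finisher instead of being called while
// completion_lock is held, so stat() never nests inside that lock.
void complete_probe(ToySession& s, ToyObjecter& o, ToyFinisher& f) {
  std::lock_guard<std::mutex> l(s.completion_lock);
  f.queue([&o] { o.stat(); });
}
```

The point of the design is that the two locks are never held at the same time on this path, which removes the ordering dependency between them.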
Ken Dreyer [Tue, 2 Dec 2014 01:24:22 +0000 (18:24 -0700)]
heap_profiler: support new gperftools header locations
The google/ headers location has been deprecated as of gperftools 2.0.
As of gperftools 2.2rc, the google/ headers will now give deprecation
warnings, and they will probably disappear in a future gperftools
update.
Jianpeng Ma [Wed, 3 Dec 2014 02:26:26 +0000 (10:26 +0800)]
test/perf_counters: replace perfcounters_dump with perf dump
The perfcounters_dump and 'perf dump' commands do the same thing, but
the output of 'ceph --admin-daemon help' only lists 'perf dump', so use
that in the test.
To keep things consistent for existing users, perfcounters_dump is still
kept in the code.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Sage Weil [Mon, 24 Nov 2014 17:22:30 +0000 (09:22 -0800)]
osd: require SNAPMAPPER feature from peers
This was introduced before cuttlefish. We require users to upgrade first
to a newer release, so there is no need to support a mixed cluster with
such old code.
Ken Dreyer [Tue, 2 Dec 2014 22:52:58 +0000 (15:52 -0700)]
doc: clarify "B" flag in os recommendations page
We don't exactly do continuous builds on all the platforms marked with
"B", but we have published binary RPMs for them. Adjust the "B"
footnote definition to reflect this.
Loic Dachary [Tue, 2 Dec 2014 00:07:34 +0000 (01:07 +0100)]
erasure-code: enforce chunk size alignment
Suppose the ErasureCode::encode function is given a 4096-byte
bufferlist made of a 1249-byte bufferptr followed by a 2847-byte
bufferptr, both properly starting on a SIMD_ALIGN address. When
bufferlist::substr_of extracts the second 2048-byte chunk, its address
starts 799 bytes after the beginning of the 2847-byte bufferptr and is
not SIMD_ALIGN'ed, so the second chunk has to be reallocated.
The ErasureCode::encode must enforce a size alignment based on the chunk
size in addition to the memory alignment required by SIMD operations,
using the bufferlist::rebuild_aligned_size_and_memory function instead of
bufferlist::rebuild_aligned.
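The arithmetic in the example above can be checked with a small sketch. It assumes SIMD_ALIGN is 32 bytes and that every segment starts on an aligned address; locate, chunk_start_is_aligned, and kSimdAlign are hypothetical names and not part of the bufferlist API.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kSimdAlign = 32;  // assumed value of SIMD_ALIGN

struct Position {
  std::size_t segment;  // index of the segment holding the byte
  std::size_t offset;   // offset of the byte inside that segment
};

// Find which segment of a segmented buffer contains byte_offset.
Position locate(const std::vector<std::size_t>& segment_lengths,
                std::size_t byte_offset) {
  std::size_t seg = 0;
  while (seg < segment_lengths.size() && byte_offset >= segment_lengths[seg]) {
    byte_offset -= segment_lengths[seg];
    ++seg;
  }
  return {seg, byte_offset};
}

// Assuming each segment itself begins on a kSimdAlign boundary, a chunk is
// aligned only if its offset inside the segment is a multiple of kSimdAlign.
bool chunk_start_is_aligned(const std::vector<std::size_t>& segment_lengths,
                            std::size_t chunk_start) {
  return locate(segment_lengths, chunk_start).offset % kSimdAlign == 0;
}
```

With segments of 1249 and 2847 bytes, the chunk starting at byte 2048 lands at offset 799 of the second segment, which is not a multiple of 32. Once the data is rebuilt into a single aligned 4096-byte segment, the same chunk starts at offset 2048 and is aligned.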
Loic Dachary [Tue, 2 Dec 2014 01:04:14 +0000 (02:04 +0100)]
common: allow size alignment that is not a power of two
Do not assume the alignment is a power of two in the is_n_align_sized()
predicate. When used in the context of erasure code it is common
for chunks to not be powers of two.
The function bufferlist::rebuild_aligned checks memory and size
alignment with the same variable. It is however useful to separate
memory alignment constraints from size alignment constraints. For
instance rebuild_aligned could be called to allocate an erasure coded
buffer where each 2048-byte chunk needs to start on a memory address
aligned on 32 bytes.
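To illustrate why a power-of-two assumption breaks for erasure-code chunk sizes, here is a sketch comparing a bit-mask test with a modulo test, plus a check that takes the size and memory alignments separately. The function names are hypothetical and do not reproduce the bufferlist implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Bit-mask test: only correct when align is a power of two.
inline bool is_sized_mask(std::size_t len, std::size_t align) {
  return (len & (align - 1)) == 0;
}

// Modulo test: correct for any non-zero align.
inline bool is_sized_mod(std::size_t len, std::size_t align) {
  return len % align == 0;
}

// Size alignment and memory alignment checked with separate values, e.g. a
// chunk size that is not a power of two and a 32-byte memory alignment.
inline bool is_aligned_size_and_memory(const void* p, std::size_t len,
                                       std::size_t align_size,
                                       std::size_t align_memory) {
  return reinterpret_cast<std::uintptr_t>(p) % align_memory == 0 &&
         len % align_size == 0;
}
```

For a 3072-byte chunk size the mask test gives both kinds of wrong answer: it accepts 4096, which is not a multiple of 3072, and rejects 6144, which is.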
Sage Weil [Tue, 2 Dec 2014 02:15:59 +0000 (18:15 -0800)]
osd: tolerate sessionless con in fast dispatch path
We can now get a session cleared from a Connection at any time. Change
the assert to an if in ms_fast_dispatch to cope. It's pretty rare, but it
can happen, especially with delay injection. In particular, a racing
thread can call mark_down() on us.
Fixes: #10209
Backport: giant
Signed-off-by: Sage Weil <sage@redhat.com>
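The change from an assert to an if in the fast dispatch path can be sketched as follows. ToyConnection, ToySession, and fast_dispatch are hypothetical stand-ins and not the OSD types; the sketch shows only the control flow and is not itself thread-safe.

```cpp
#include <memory>

struct ToySession {
  int dispatched = 0;
};

// The session may be cleared at any time, for example by a racing
// mark_down(), so the pointer can be null when a message arrives.
struct ToyConnection {
  std::shared_ptr<ToySession> session;
};

// Returns true if the message was handled, false if it was dropped
// because the connection no longer has a session.
bool fast_dispatch(ToyConnection& con) {
  std::shared_ptr<ToySession> s = con.session;  // take one reference up front
  if (!s)                                       // previously: assert(s)
    return false;
  ++s->dispatched;
  return true;
}
```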