Sage Weil [Mon, 17 Nov 2014 20:46:51 +0000 (12:46 -0800)]
osd/ReplicatedPG: drop unnecessary cache_mode checks
This currently enumerates all cache modes except none, and we don't
arrive in this function when caching is disabled. And creating a whiteout
is not cache_mode dependent. Simplify!
Sage Weil [Thu, 8 Jan 2015 19:10:45 +0000 (11:10 -0800)]
osd: assert there is a peering event
This became conditional way back in 12e22b3d44eba51a70d8babebc2684f0c46575a7
for unclear reasons. It probably predates the in_use checks. In any case,
at this point, we should only arrive here if the PG was queued, implying
that there will always be an event to process.
Sage Weil [Thu, 8 Jan 2015 21:34:52 +0000 (13:34 -0800)]
osd: requeue PG when we skip handling a peering event
If we don't handle the event, we need to put the PG back into the peering
queue or else the event won't get processed until the next event is
queued, at which point we'll be processing events with a delay.
The queue_null is not necessary (and is a waste of effort) because the
event is still in pg->peering_queue and the PG is queued.
Note that this only triggers when we exceed osd_map_max_advance, usually
when there is a lot of peering and recovery activity going on. A
workaround is to increase that value, but if you exceed osd_map_cache_size
you expose yourself to cache thrashing by the peering work queue, which
can cause serious problems with heavily degraded clusters and bit lots of
people on dumpling.
Backport: giant, firefly
Fixes: #10431
Signed-off-by: Sage Weil <sage@redhat.com>
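The requeue behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the OSD code: the PG and PeeringWQ types, their members, and process_one are hypothetical stand-ins, and max_advance only plays the role of osd_map_max_advance.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical, heavily simplified stand-in for a placement group.
struct PG {
  uint32_t map_epoch = 0;         // epoch this PG has advanced to
  std::deque<int> peering_queue;  // pending peering events
  int handled = 0;                // number of events processed so far
};

// Hypothetical stand-in for the peering work queue.
struct PeeringWQ {
  std::deque<PG*> queue;
  uint32_t max_advance = 10;  // plays the role of osd_map_max_advance

  // Process the PG at the head of the queue against the current map epoch.
  void process_one(uint32_t osd_epoch) {
    if (queue.empty())
      return;
    PG* pg = queue.front();
    queue.pop_front();

    if (osd_epoch > pg->map_epoch &&
        osd_epoch - pg->map_epoch > max_advance) {
      // Too far behind: advance part of the way and skip the event.
      pg->map_epoch += max_advance;
      // The event is still in pg->peering_queue, so no null event is
      // needed; putting the PG back on the queue is enough.
      queue.push_back(pg);
      return;
    }

    pg->map_epoch = osd_epoch;
    if (!pg->peering_queue.empty()) {
      pg->peering_queue.pop_front();
      ++pg->handled;
    }
  }
};
```

Without the push_back in the skip branch, the pending event in this sketch would sit in pg->peering_queue until something else queued the PG again.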
Matt Richards [Thu, 8 Jan 2015 21:16:17 +0000 (13:16 -0800)]
librados: Translate operation flags from C APIs
The operation flags in the public C API are a distinct enum
and need to be translated to Ceph OSD flags, as happens in
the C++ API. It seems like the C enum and the C++ enum consciously
use the same values, so I reused the C++ translation function.
Signed-off-by: Matthew Richards <mattjrichards@gmail.com>
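A minimal sketch of this kind of flag translation follows. The enum names and values are made up for illustration and are not the librados or OSD definitions; the point is only that public API bits are mapped one by one onto internal bits rather than passed through unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical public API flags (names and values are illustrative only).
enum ApiFlag : uint32_t {
  API_NOFLAG = 0,
  API_BALANCE_READS = 1,
  API_LOCALIZE_READS = 2,
  API_ORDER_READS_WRITES = 4,
};

// Hypothetical internal OSD flags, deliberately using different bits.
enum OsdFlag : uint32_t {
  OSD_BALANCE_READS = 0x100,
  OSD_LOCALIZE_READS = 0x2000,
  OSD_RWORDERED = 0x4000,
};

// Map each known public flag onto its internal counterpart.
// Unknown bits are dropped instead of leaking into the OSD flags.
uint32_t translate_flags(uint32_t api_flags) {
  uint32_t osd_flags = 0;
  if (api_flags & API_BALANCE_READS)
    osd_flags |= OSD_BALANCE_READS;
  if (api_flags & API_LOCALIZE_READS)
    osd_flags |= OSD_LOCALIZE_READS;
  if (api_flags & API_ORDER_READS_WRITES)
    osd_flags |= OSD_RWORDERED;
  return osd_flags;
}
```

If both the C and C++ entry points call one such function, the two APIs cannot drift apart in how they interpret a flag.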
John Spray [Wed, 7 Jan 2015 12:37:40 +0000 (12:37 +0000)]
mon/MDSMonitor: fix `mds fail` for standby MDSs
This command takes a gid, rank or name, but
in the name case it would previously only work if
the named daemon had a rank assigned (mds_info->rank >= 0),
otherwise it would fail silently.
John Spray [Wed, 7 Jan 2015 11:47:34 +0000 (11:47 +0000)]
mon/MDSMonitor: respect MDSMAP_DOWN when promoting standbys
Previously, a standby could become active even if 'cluster_down'
had been run. This was awkward, because it would get you a
"laggy or crashed" mds for the standby that was actually
up and running, just being ignored because of cluster_down.
Loic Dachary [Fri, 19 Dec 2014 14:54:33 +0000 (15:54 +0100)]
init-ceph: stop returns before daemons are dead
The existence of the pidfile must be checked outside of the loop to send
a signal to the daemon. Otherwise the daemon will remove the pidfile and
stop can return before the process is dead because it only checks
/proc/$pid if the pidfile exists.
Dong Yuan [Wed, 10 Dec 2014 10:02:51 +0000 (10:02 +0000)]
osd: build fields for Transaction::iterator when tbl is used
When tbl is used (for compatibility), the Transaction::begin method needs
to build all fields used by the iterator. That includes coll_index,
object_index, data_bl, op_bl, etc.
Dong Yuan [Tue, 2 Dec 2014 17:08:44 +0000 (17:08 +0000)]
osd: Transaction::append & Transaction::swap
Finish append and swap for the new Transaction encode/decode layout.
Since append now modifies op_bl, we changed the order of append
and swap in ReplicatedBackend::sub_op_modify and
ReplicatedBackend::submit_transaction to avoid calling append on op_t, so
that op_t can be encoded in the message.
Dong Yuan [Mon, 1 Dec 2014 10:58:56 +0000 (10:58 +0000)]
osd: new Transaction::iterator interface
This patch adds a new Transaction::iterator interface that follows the new
encode/decode layout. The new iterator returns the whole Op struct from a
single decode_op method.
All ObjectStore implementations (FileStore/MemStore/KeyValueStore) are also
changed to use the new interface.
Dong Yuan [Thu, 27 Nov 2014 14:52:36 +0000 (14:52 +0000)]
osd: add encode/decode impl for new layout
When use_tbl is true, Transaction::encode gives the same result as
before. When use_tbl is false, Transaction::encode uses the new
fields and logic, and all related methods, such as
get_encoded_bytes and get_data_offset, do the same.
Dong Yuan [Wed, 26 Nov 2014 17:58:50 +0000 (17:58 +0000)]
osd: new format for Transaction encode/decode
This patch adds a new fixed-size struct Transaction::Op to represent
all actions.
All coll and ghobject instances used by the transaction are kept in two maps:
coll: map<coll_t, __le32> coll_index;
object: map<ghobject_t, __le32> object_index;
The Op struct uses the map value (__le32) to refer to a coll or object,
so each coll and object only needs to be encoded once per transaction.
Other variable-size fields (key/value/data) are encoded in the bufferlist
data_bl.
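The indexing scheme can be sketched as follows. This is an illustration under simplifying assumptions, not the Transaction implementation: Coll and Object are plain strings standing in for coll_t and ghobject_t, uint32_t stands in for __le32 (endianness is ignored), and intern and add_op are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for coll_t and ghobject_t.
using Coll = std::string;
using Object = std::string;

// Fixed-size op record: refers to its collection and object by index only.
struct Op {
  uint32_t op;   // operation code
  uint32_t cid;  // value taken from coll_index
  uint32_t oid;  // value taken from object_index
};

struct Transaction {
  std::map<Coll, uint32_t> coll_index;
  std::map<Object, uint32_t> object_index;
  std::vector<Op> ops;

  // Return the id already assigned to key, or assign the next free one.
  template <typename K>
  static uint32_t intern(std::map<K, uint32_t>& index, const K& key) {
    auto it = index.find(key);
    if (it != index.end())
      return it->second;
    uint32_t id = static_cast<uint32_t>(index.size());
    index.emplace(key, id);
    return id;
  }

  // Record an op; the collection and object are stored at most once each.
  void add_op(uint32_t code, const Coll& c, const Object& o) {
    ops.push_back(Op{code, intern(coll_index, c), intern(object_index, o)});
  }
};
```

In this sketch, three ops that touch one collection and two objects leave one entry in coll_index and two in object_index, while the ops vector holds three fixed-size records.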
Loic Dachary [Fri, 19 Dec 2014 11:26:37 +0000 (12:26 +0100)]
tests: resolve ceph-helpers races
Some tests were racing against the monitor. They passed on a fast machine,
but on slower machines (or sometimes when running in parallel) the monitor
lags behind. Use wait_for_clean to make sure the monitor is in the
desired state for the test to succeed.
Ken Dreyer [Tue, 6 Jan 2015 15:16:20 +0000 (08:16 -0700)]
qa: drop tiobench suite
The tiobench software has been abandoned upstream for years. Fedora and
Debian are no longer shipping the tiobench package, so we've had to
carry the package ourselves in the Ceph project, and we're trying to
slim down our dependencies where it makes sense to do so.
Xiaoxi Chen [Fri, 21 Nov 2014 00:34:54 +0000 (08:34 +0800)]
Add MOSDRepOp and MOSDRepOpReply
Add the two new message types and change the corresponding code flow as well.
The basic idea of MOSDRepOp is to separate the read/write subops
from the other subops (pull/push, etc.), so that we can clean up some unused fields in
the message type and save some encoding/decoding overhead.
Backward compatibility is retained: when talking to an older OSD that
doesn't support osd_client_subop/subopreply, we fall back to osd_subop/subopreply.
Sage: rename MOSDClientSubOp -> MOSDRepOp
Signed-off-by: Xiaoxi Chen <xiaoxi.chen@intel.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Ken Dreyer [Mon, 5 Jan 2015 19:11:00 +0000 (12:11 -0700)]
configure: show pkg names when libkeyutils is missing
Prior to this commit, when ./configure couldn't find libkeyutils, it would
bail out with a terse error message.
Some of the other library checks helpfully print the DEB and RPM package
names in parentheses. Add the DEB and RPM package names to the
libkeyutils check.
Reported-by: Pankaj Garg <Pankaj.Garg@caviumnetworks.com>
Signed-off-by: Ken Dreyer <kdreyer@redhat.com>