Loic Dachary [Sun, 16 Mar 2014 11:14:30 +0000 (12:14 +0100)]
erasure-code: remove dependency on the global context
Instead of relying on derr to display error messages, write them to an
ostream parameter passed to load() and factory(). The erasure
code convenience library no longer depends on the global context that is
indirectly referenced by debug.h.
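The pattern described above can be sketched as follows. This is only an illustration: load_plugin() and load_error() are hypothetical names, not the actual load() and factory() signatures, and the sketch does not open any shared library.

```cpp
#include <cassert>
#include <cerrno>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical loader: reports failures through the caller's stream
// rather than through a process-wide logger such as derr.
int load_plugin(const std::string &name, std::ostream &ss) {
  if (name.empty()) {
    ss << "load_plugin: plugin name must not be empty";
    return -EINVAL;
  }
  // A real loader would open the plugin here; this sketch only shows
  // how the error text is handed back to the caller.
  return 0;
}

// Convenience wrapper showing how a caller collects the message.
std::string load_error(const std::string &name) {
  std::ostringstream ss;
  int r = load_plugin(name, ss);
  assert(r <= 0);  // zero on success, negative errno on failure
  return ss.str();
}
```

Because the message travels in the stream argument, the caller decides where it ends up (a log, a command reply, a test assertion), and the library needs no global state.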
Loic Dachary [Sun, 16 Mar 2014 11:09:51 +0000 (12:09 +0100)]
common,erasure-code,mon: s/erasure-code-//
The parameters to erasure code do not need to be prefixed with the
erasure-code- string. There are only erasure-code parameters and the
prefix was originally intended to disambiguate the erasure-code
properties, assuming that the properties map could be used for other
purposes.
Loic Dachary [Mon, 3 Mar 2014 14:40:13 +0000 (15:40 +0100)]
mon: tests for pool create erasure implicit ruleset creation
* Remove the tests checking that a missing or wrong crush_ruleset
parameter triggers an error.
* Add a test checking that a ruleset with the same name as the pool is
created implicitly when no crush_ruleset is specified.
Loic Dachary [Mon, 3 Mar 2014 14:25:21 +0000 (15:25 +0100)]
mon: pool create erasure implicit ruleset creation
If the crush_ruleset parameter is missing, set it to the pool name.
If the crush_ruleset parameter is set to a name that does not match any
of the existing rulesets, create one using the pool creation parameters.
If the ruleset exists and is in the pending map or if the ruleset was
just created (meaning it exists in the pending map), the
prepare_pool_crush_ruleset method returns EAGAIN so that the pool
creation message is retried after the pending map is proposed.
If the ruleset exists, it is used to initialize the newly created pool,
as before.
Create the ruleset and branch depending on the result:
* If it succeeds, wait
* If it already exists and is pending (-EALREADY), wait
* If it already exists (-EEXIST), return immediately
* If it fails for other reasons, return immediately
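The branching listed above can be sketched as a small decision function. The enum and function names are hypothetical and exist only for illustration; the sketch assumes the creation call returns 0 on success or a negative errno.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical outcome of the pool-creation step.
enum class Next { Retry, Proceed, Fail };

// create_result is the value returned by the (hypothetical) ruleset
// creation call: 0 on success, or a negative errno.
Next after_ruleset_create(int create_result) {
  assert(create_result <= 0);
  switch (create_result) {
  case 0:          // just created: it only exists in the pending map
  case -EALREADY:  // exists, but is still pending
    return Next::Retry;    // corresponds to returning EAGAIN
  case -EEXIST:    // already exists: usable right away
    return Next::Proceed;
  default:         // any other error is reported to the caller
    return Next::Fail;
  }
}
```

Retry stands for the "wait" cases: the pool creation message is retried once the pending map has been proposed.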
Loic Dachary [Mon, 3 Mar 2014 13:36:50 +0000 (14:36 +0100)]
mon: create crush_ruleset_create_erasure helper
Move the code block verbatim, from "osd crush rule create-erasure" to the
new crush_ruleset_create_erasure() helper method. This step helps
separate the code changes from the code moving around unmodified.
Sage Weil [Sat, 22 Feb 2014 17:35:27 +0000 (09:35 -0800)]
mon/PGMap: only recalculate min_last_epoch_clean if incremental touches old min
If the Incremental updates a value that used to equal the old min, we may
have raised it and need to recalculate it at the end. Otherwise, we can
avoid recalculating at all!
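A minimal sketch of this optimization, using a hypothetical MinTracker type rather than the real PGMap structures. It assumes every stored epoch is nonzero, so that 0 can stand for "no entries".

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical tracker: per-key epochs plus a cached minimum.
struct MinTracker {
  std::map<int, uint32_t> epochs;
  uint32_t cached_min = 0;  // 0 means "no entries" (epochs are nonzero)
  int recalcs = 0;          // counts full scans, for illustration

  void recalc() {
    ++recalcs;
    cached_min = 0;
    for (const auto &p : epochs)
      if (cached_min == 0 || p.second < cached_min)
        cached_min = p.second;
  }

  // Apply a batch of updates; rescan only if one of them overwrote a
  // value equal to the old minimum (which may therefore have risen).
  void apply(const std::map<int, uint32_t> &inc) {
    bool touched_min = false;
    for (const auto &p : inc) {
      auto it = epochs.find(p.first);
      if (it != epochs.end() && it->second == cached_min)
        touched_min = true;
      epochs[p.first] = p.second;
      // A new value below the cached minimum lowers it directly.
      if (cached_min == 0 || p.second < cached_min)
        cached_min = p.second;
    }
    if (touched_min)
      recalc();
    assert(epochs.empty() || cached_min > 0);
  }
};
```

Updates that leave the entry holding the minimum alone never trigger the full scan, which is the saving the commit message describes.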
Sage Weil [Fri, 14 Mar 2014 19:46:57 +0000 (12:46 -0700)]
unittest_ceph_argparse: fix warnings
In file included from test/ceph_argparse.cc:17:0:
../src/gtest/include/gtest/gtest.h: In function ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = int, T2 = long unsigned int]’:
../src/gtest/include/gtest/gtest.h:1333:30: instantiated from ‘static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, const T2&) [with T1 = int, T2 = long unsigned int]’
test/ceph_argparse.cc:344:207: instantiated from here
warning: ../src/gtest/include/gtest/gtest.h:1263:3: comparison between signed and unsigned integer expressions [-Wsign-compare]
Sage Weil [Fri, 14 Mar 2014 18:02:30 +0000 (11:02 -0700)]
mon: only do timecheck with known monmap
If we are still on monmap epoch 0, our mon ranks cannot yet be trusted
since there is not yet a shared source of truth from paxos. If we do
timechecks, the code gets confused about the ranks in e.g. the
timecheck_waiting map.
Fixes: #7692
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Fri, 14 Mar 2014 01:16:19 +0000 (18:16 -0700)]
PG::activate: handle peer contiguous with primary, but not auth_log
The added case covers a situation where a replica is not contiguous with
the auth_log, but is contiguous with the primary. Reshuffling the
active set to handle this would be tricky, so instead we just go ahead
and backfill it anyway. This is probably preferable in any case since
the replica in question would have to be significantly behind.
Fixes: #7696
Signed-off-by: Samuel Just <sam.just@inktank.com>
ceph_mon: split postfork() in two and finish postfork just before daemonize
We split global_init_postfork() in two: start and finish, with the first
keeping much of postfork()'s tasks except closing stderr, which we leave
open until just before we daemonize. This allows the user to see any
error messages that the monitor may spit out before it daemonizes, making
sense of the error code (which we were already returning).
Fixes: 7489
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Fri, 14 Mar 2014 05:02:01 +0000 (22:02 -0700)]
osd/ReplicatedPG: release op locks on commit+applied
We were releasing the op locks when we applied the update but (potentially)
before we committed it. This means that another client can read object
state that is not yet durable.
Fixes: #7709
Signed-off-by: Sage Weil <sage@inktank.com>
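The "release only after both events" rule can be sketched like this. OpCompletion is a hypothetical class, not the ReplicatedPG code, and the sketch is single-threaded: real callbacks arriving from different threads would need their own synchronization.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical per-op state: the release callback runs exactly once,
// after both the applied and the committed notifications arrived.
class OpCompletion {
  bool applied = false, committed = false, released = false;
  std::function<void()> release;

  void maybe_release() {
    if (applied && committed && !released) {
      released = true;
      release();
    }
  }

public:
  explicit OpCompletion(std::function<void()> r) : release(std::move(r)) {
    assert(static_cast<bool>(release));  // a callback is required
  }
  void on_applied()   { applied = true;   maybe_release(); }
  void on_committed() { committed = true; maybe_release(); }
};
```

Releasing in on_applied() alone would reproduce the bug: a reader could observe state that has not been committed yet.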
Sage Weil [Mon, 10 Mar 2014 20:52:54 +0000 (13:52 -0700)]
osd: set default cache_target_{dirty,full}_ratios based on configurable
These were hard-coded in the pg_pool_t constructor, but that was a dumb
idea.
Note that decoding legacy pg_pool_t's no longer does what it used to. I'm
pretty sure that's okay since we care less about interim releases and
because we are pulling these normally out of OSDMap, which is freshly
encoded on a regular basis (and certainly recently with real values). Also,
let's not forget that this field is meaningless on old pools anyway.
Samuel Just [Thu, 13 Mar 2014 21:04:19 +0000 (14:04 -0700)]
PrioritizedQueue: cap costs at max_tokens_per_subqueue
Otherwise, you can get a recovery op in the queue which has a cost
higher than the max token value. It won't get serviced until all other
queues also do not have enough tokens and higher priority queues are
empty.
Fixes: #7706
Signed-off-by: Samuel Just <sam.just@inktank.com>
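A minimal sketch of the capping idea, with hypothetical helper names; only max_tokens_per_subqueue comes from the commit subject. It assumes an item is dequeued when the bucket holds at least as many tokens as the item's cost.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: clamp an item's cost so that a full token
// bucket is always enough to pay for it.
unsigned capped_cost(unsigned cost, unsigned max_tokens_per_subqueue) {
  assert(max_tokens_per_subqueue > 0);
  return std::min(cost, max_tokens_per_subqueue);
}

// An item can be dequeued once the bucket holds at least its cost.
bool can_dequeue(unsigned tokens, unsigned cost) {
  return tokens >= cost;
}
```

With a bucket limited to 4 tokens, an item of cost 10 could never be paid for; after capping, its cost is 4 and a full bucket suffices.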
Yehuda Sadeh [Thu, 13 Mar 2014 18:25:24 +0000 (11:25 -0700)]
rgw: manifest holds the actual bucket used for tail objects
Fixes: 7703
Objects can be copied between different buckets, so we need to keep track
of which bucket is used for naming the tail parts. The new manifest
requires that because older manifests just held all the tail objects
(each containing the appropriate bucket internally).
Sage Weil [Thu, 13 Mar 2014 18:22:34 +0000 (11:22 -0700)]
rbd-fuse: fix signed/unsigned warning
rbd_fuse/rbd-fuse.c: In function 'enumerate_images':
rbd_fuse/rbd-fuse.c:113:2: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Danny Al-Gaaf [Thu, 13 Mar 2014 17:48:00 +0000 (18:48 +0100)]
mds/Mutation.h: init export_dir with NULL in ctor
CID 1188167 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member "export_dir" is not initialized in
this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Thu, 13 Mar 2014 17:39:32 +0000 (18:39 +0100)]
mds/Migrator.h: init some members of import_state_t in ctor
CID 1188166 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member "state" is not initialized in this
constructor nor in any functions that it calls.
4. uninit_member: Non-static class member "peer" is not initialized in this
constructor nor in any functions that it calls.
6. uninit_member: Non-static class member "tid" is not initialized in this
constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Thu, 13 Mar 2014 17:30:54 +0000 (18:30 +0100)]
mds/Migrator.h: init some export_state_t members in ctor
CID 1188165 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member "state" is not initialized in
this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member "peer" is not initialized in this
constructor nor in any functions that it calls.
6. uninit_member: Non-static class member "tid" is not initialized in this
constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Thu, 13 Mar 2014 16:21:53 +0000 (17:21 +0100)]
test_filejournal.cc: use strncpy and terminate with '\0'
CID 966632 (#1 of 1): Copy into fixed size buffer (STRING_OVERFLOW)
2. fixed_size_dest: You might overrun the 200 byte fixed-size string
"path" by copying "args[0UL]" without checking the length.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
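The fix named in the subject line, strncpy followed by an explicit terminator, can be sketched as below. copy_bounded() is a hypothetical helper written for illustration; it is not the code from test_filejournal.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy src into a fixed-size buffer, truncating if necessary and
// always leaving a terminating '\0'.
void copy_bounded(char *dst, std::size_t dst_size, const char *src) {
  assert(dst != nullptr && src != nullptr && dst_size > 0);
  std::strncpy(dst, src, dst_size - 1);  // may leave dst unterminated
  dst[dst_size - 1] = '\0';              // so terminate explicitly
}
```

strncpy alone does not write a terminator when the source is at least as long as the limit, which is why the last byte is set by hand.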
Sharif Olorin [Thu, 13 Mar 2014 07:36:00 +0000 (18:36 +1100)]
Add unit test for race condition in libnss
This isn't in test/crypto.cc because common_init_finish is called prior
to running any tests. Will not build the test function if Ceph hasn't
been configured with NSS.
Sharif Olorin [Wed, 12 Mar 2014 08:01:00 +0000 (19:01 +1100)]
Work around race condition in libnss
This change prevents a segfault in ceph::crypto::init when using NSS and
calling rados_connect from multiple threads simultaneously on different
rados_t objects (and updates the documentation for rados_connect to
reflect the fix).
It's pretty simple, just one static mutex wrapping the
NSS definition of ceph::crypto::init. More details regarding the race
condition are in this[0] commit (and pull request #1424).
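The static-mutex approach can be sketched as follows. unsafe_init() is a stand-in for a library initialization that is not thread-safe; all names here are hypothetical and nothing in the sketch calls NSS or librados.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

static int init_calls = 0;        // stands in for library-global state
static bool initialized = false;

// Stand-in for a one-time initialization that is not thread-safe.
static void unsafe_init() {
  if (!initialized) {
    ++init_calls;
    initialized = true;
  }
}

// All callers are serialized by one function-local static mutex.
void guarded_init() {
  static std::mutex init_lock;
  std::lock_guard<std::mutex> guard(init_lock);
  unsafe_init();
}

// Call guarded_init() from n threads at once; returns init_calls.
int run_concurrently(int n) {
  assert(n > 0);
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i)
    threads.emplace_back(guarded_init);
  for (auto &t : threads)
    t.join();
  return init_calls;
}
```

The lock is held for the whole initialization, so concurrent callers simply wait and then find the work already done.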
To reproduce the race condition in the existing codebase, the below[1]
C program will work (depending on the number of cores and probably other
things, the number of threads needed to reliably reproduce it varies, but
the more the better; in my environment five is sufficient, with four
cores).
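#include <pthread.h>

/* NTHREAD and init() are defined in the part of the program not shown here. */
int main() {
    pthread_t ts[NTHREAD];
    int i;
    /* Start all threads so that their init() calls overlap. */
    for (i = 0; i < NTHREAD; i++) {
        pthread_create(&ts[i], NULL, init, NULL);
    }
    /* Wait for every thread; the return value is not needed. */
    for (i = 0; i < NTHREAD; i++) {
        pthread_join(ts[i], NULL);
    }
    return 0;
}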
Florian Haas [Thu, 13 Mar 2014 10:32:05 +0000 (11:32 +0100)]
doc: fix formatting on PG recommendation
Previous commit (047287afbe0ddfaaafd05e9dbf25c1c7dea9a1be) broke
formatting on the formula, and also mixed formula and text oddly,
which on second thought didn't look too good.
Add the note about the power of two to the following paragraph
instead, in prose.
Danny Al-Gaaf [Wed, 12 Mar 2014 21:56:44 +0000 (22:56 +0100)]
RGWListBucketMultiparts: init max_uploads/default_max with 0
CID 717377 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member "max_uploads" is not initialized
in this constructor nor in any functions that it calls.
4. uninit_member: Non-static class member "default_max" is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 12 Mar 2014 21:37:12 +0000 (22:37 +0100)]
AbstractWrite: initialize m_snap_seq with 0
CID 717223 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member "m_snap_seq" is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 12 Mar 2014 20:03:25 +0000 (21:03 +0100)]
AdminSocket: initialize m_getdescs_hook in the constructor
CID 717212 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member "m_getdescs_hook" is not
initialized in this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 12 Mar 2014 19:27:57 +0000 (20:27 +0100)]
RGWPutCORS_ObjStore_S3::get_params: check data before dereference
CID 1063697 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
5. var_deref_model: Passing null pointer "data" to function
"RGWXMLParser::parse(char const *, int, int)", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 12 Mar 2014 19:09:22 +0000 (20:09 +0100)]
mds/Server.cc: check straydn before dereference
CID 1019554 (#1 of 1): Dereference after null check (FORWARD_NULL)
13. var_deref_model: Passing null pointer "straydn" to function
"MDSCacheObject::is_auth() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Florian Haas [Wed, 12 Mar 2014 18:31:54 +0000 (19:31 +0100)]
doc: Add "nearest power of two" to PG rule-of-thumb
Following an IRC discussion, it emerged that it would be helpful
to explain the merit of choosing a number of PGs per pool that is
a power of two, to keep PGs at roughly equal sizes in case of
PG splits.
See http://irclogs.ceph.widodh.nl/index.php?date=2014-03-12 for the
original discussion.
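Rounding a candidate PG count to the nearest power of two can be sketched as below. The function name and the tie-breaking rule (ties go to the larger value) are assumptions of this sketch, not something the documentation change prescribes.

```cpp
#include <cassert>
#include <cstdint>

// Round n to the nearest power of two; ties go to the larger value.
// The tie-breaking rule is an assumption of this sketch.
uint64_t nearest_power_of_two(uint64_t n) {
  assert(n >= 1 && n <= (UINT64_C(1) << 62));
  uint64_t lower = 1;
  while (lower * 2 <= n)
    lower *= 2;                // largest power of two <= n
  uint64_t upper = lower * 2;  // smallest power of two > n
  return (n - lower < upper - n) ? lower : upper;
}
```

For example, a computed value of 333 rounds down to 256, while 400 rounds up to 512.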
Samuel Just [Tue, 11 Mar 2014 21:23:10 +0000 (14:23 -0700)]
PG: do not wait for flushed before activation
This should reduce the sting of the previous commit somewhat. We wait
for the activation transactions to clear prior to accepting IO anyway,
so we can go ahead and get that process started without waiting for the
flush.