Jianpeng Ma [Mon, 13 Oct 2014 05:33:38 +0000 (13:33 +0800)]
FileStore: Round the fiemap offset down to a CEPH_PAGE_SIZE boundary.
There is a fiemap bug on xfs: if the offset is unaligned, the result of
fiemap will leak some data.
Kernel commit eedf32bfcace7d8e20cc66757d74fc68f3439ff7 fixes this bug.
To avoid the bug on kernels that do not have this commit, Ceph rounds
the offset down to a multiple of CEPH_PAGE_SIZE.
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
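The rounding described in this commit can be sketched as follows. This is a minimal illustration, not the FileStore code: the 4096-byte page size, the name kPageSize and the helper align_fiemap_range are assumptions, and growing the length by the rounded-off amount is an assumed way to keep the original range covered.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumed page size for illustration; Ceph uses CEPH_PAGE_SIZE.
constexpr uint64_t kPageSize = 4096;

// Hypothetical helper: round `offset` down to a page boundary and grow
// `len` by the same amount, so the aligned request still covers the
// caller's original byte range. Returns {aligned offset, adjusted length}.
std::pair<uint64_t, uint64_t> align_fiemap_range(uint64_t offset, uint64_t len) {
  const uint64_t aligned = offset & ~(kPageSize - 1);
  return {aligned, len + (offset - aligned)};
}
```

With these assumptions, a request for 100 bytes at offset 5000 becomes a request for 1004 bytes at offset 4096.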
Jianpeng Ma [Mon, 29 Sep 2014 03:00:25 +0000 (11:00 +0800)]
os/FileStore: use FIEMAP_FLAGS_SYNC instead of fsync() before calling
fiemap.
Calling fiemap requires the file to be synced first. Until now fsync()
was used for this, but fiemap has a flag, FIEMAP_FLAGS_SYNC, which does
the same thing as fsync().
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
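A rough sketch of a fiemap request that asks the kernel to sync the file itself is shown here. It is Linux-only and is not the FileStore code; the helper name init_synced_fiemap is hypothetical. Note that the Linux header <linux/fiemap.h> spells the flag FIEMAP_FLAG_SYNC.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <linux/fiemap.h>  // struct fiemap, FIEMAP_FLAG_SYNC (Linux only)

// Hypothetical helper: fill in the header of a FIEMAP request so that the
// kernel syncs the file before mapping it, instead of the caller running
// fsync() first. fm_extent_count = 0 asks only for the number of extents;
// no extent records are returned.
void init_synced_fiemap(struct fiemap& req, uint64_t offset, uint64_t len) {
  std::memset(&req, 0, sizeof(req));
  req.fm_start = offset;
  req.fm_length = len;
  req.fm_flags = FIEMAP_FLAG_SYNC;
  req.fm_extent_count = 0;
}
```

The filled request would then be passed to ioctl(fd, FS_IOC_FIEMAP, &req), with FS_IOC_FIEMAP coming from <linux/fs.h>.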
Johnu George [Wed, 24 Sep 2014 16:32:50 +0000 (09:32 -0700)]
Crush: Ensuring at most num-rep osds are selected
Crush temporary buffers are allocated according to the replica size
configured by the user. When a rule would select more final osds than
there are replicas, the buffers overlap and this causes a crash. Now at
most num-rep osds are selected, even if the rule allows more.
Fixes: #9492
Signed-off-by: Johnu George <johnugeo@cisco.com>
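The idea of capping the selection at num-rep can be illustrated with a small sketch. It is not the CRUSH mapper code; select_osds and its arguments are hypothetical stand-ins for "candidates the rule would allow" and "replica count the buffers were sized for".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep at most `num_rep` of the candidate osds, so a
// result buffer sized for `num_rep` entries can never be overrun.
std::vector<int> select_osds(const std::vector<int>& candidates,
                             std::size_t num_rep) {
  const std::size_t n = std::min(candidates.size(), num_rep);
  return std::vector<int>(candidates.begin(), candidates.begin() + n);
}
```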
documentation: revise placement group number guide
When a cluster has few OSDs (fewer than 50), propose a preselection of
values: as long as the number of placement groups is neither too small
nor too large, it won't make much of a difference anyway.
Users of small clusters tend to blindly apply the (OSD*100)/(pool size)
formula and worry about choosing a wrong value because they do not
understand the tradeoffs. The preselection will hopefully save them from
this uncertainty.
Add an explanation of how placement groups relate to OSDs, CRUSH and
pools to help understand the tradeoffs. Explain the tradeoffs
(durability, distribution and resource usage) with examples.
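For reference, the (OSD*100)/(pool size) formula mentioned above works out as in the following sketch. The function name is hypothetical, and the result is only the raw formula value, not the preselected values the revised guide proposes for small clusters.

```cpp
#include <cassert>
#include <cstdint>

// (OSD * 100) / (pool size), using integer division.
// `pool_size` is the pool's replica count and must be non-zero.
constexpr uint64_t pg_count_from_formula(uint64_t num_osds, uint64_t pool_size) {
  return (num_osds * 100) / pool_size;
}
```

For example, 9 OSDs and a pool size of 3 give 300.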
Otherwise statfs may fail if mkfs hasn't been run yet or if the monitor
data directory does not exist. There are checks to account for the mon
data dir not existing and we should wait for them to clear before we go
ahead and check the fs stats.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
There are two new plugins (isa and lrc). When upgrading a cluster, there
must be a protection against the following scenario:
* the mons are upgraded but not the osds
* a new pool is created using plugin isa
* the osds fail to load the isa plugin because they have not been
upgraded
A feature bit is added: PLUGINS_V2. The monitor will only agree to
create an erasure code profile for the isa or lrc plugin if all OSDs
support PLUGINS_V2. Once such an erasure code profile is stored in the
OSDMap, an OSD can only boot if it supports the PLUGINS_V2 feature,
which means it is able to load the isa and lrc plugins.
The monitors will only activate the PLUGINS_V2 feature if all monitors
in the quorum support it. It protects against the following scenario:
* the leader is upgraded, the peons are not upgraded
* the leader creates a pool with plugin=lrc because all OSDs have
the PLUGINS_V2 feature
* the leader goes down and a non upgraded peon becomes the leader
* an old OSD tries to join the cluster
* the new leader will let the OSD boot because it does not contain
the logic that would exclude it
* the old OSD will fail when required to load the plugin lrc
This is going to be needed each time new plugins are added, which is
impractical. A more generic plugin upgrade support should be added
instead, as described in http://tracker.ceph.com/issues/7291.
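The gating described in this commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the monitor code: the bit position of FEATURE_PLUGINS_V2 and the helper names are hypothetical, and feature sets are modelled as plain 64-bit masks.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit position; the real value is defined by Ceph.
constexpr uint64_t FEATURE_PLUGINS_V2 = 1ULL << 0;

// Only the two new plugins require the new feature bit.
bool plugin_needs_v2(const std::string& plugin) {
  return plugin == "isa" || plugin == "lrc";
}

// Monitor side: accept a profile for isa or lrc only if every OSD
// advertises PLUGINS_V2.
bool can_create_profile(const std::string& plugin,
                        const std::vector<uint64_t>& osd_features) {
  if (!plugin_needs_v2(plugin)) return true;
  for (uint64_t f : osd_features) {
    if ((f & FEATURE_PLUGINS_V2) == 0) return false;
  }
  return true;
}

// Boot check: once such a profile is stored in the map, an OSD without
// the feature is refused.
bool can_boot(bool map_has_v2_profile, uint64_t osd_features) {
  return !map_has_v2_profile || (osd_features & FEATURE_PLUGINS_V2) != 0;
}
```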
mon: LogMonitor: appropriately expand channel meta variables
We must only expand the log file's channel meta variables upon requiring
a channel's log file. As we may have a 'default' channel that will
cover all channels, we must wait to expand channels as they come in and
do so if they haven't yet been expanded. Expanding the 'log_file' in
place would have the unfortunate side effect of expanding, say,
default=/tmp/whatever.$channel.log
to
default=/tmp/whatever.default.log
which would not be what we wanted upon receiving a message that should
go into channel 'foo' -- assuming we specified no such channel in the
options, channel 'foo' should go into '/tmp/whatever.foo.log'.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
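The lazy expansion described above can be sketched as follows. The class LogFiles and the function expand_channel are hypothetical and much simpler than the LogMonitor code. The point is that the configured templates stay untouched and each channel gets its own expanded copy on first use.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Replace every "$channel" in `tmpl` with `channel`.
std::string expand_channel(std::string tmpl, const std::string& channel) {
  const std::string var = "$channel";
  for (std::size_t pos = tmpl.find(var); pos != std::string::npos;
       pos = tmpl.find(var, pos + channel.size())) {
    tmpl.replace(pos, var.size(), channel);
  }
  return tmpl;
}

class LogFiles {
 public:
  explicit LogFiles(std::map<std::string, std::string> templates)
      : templates_(std::move(templates)) {}

  // Expand on first request for a channel. A channel without its own
  // entry falls back to the "default" template, expanded with the
  // requesting channel's name.
  const std::string& file_for(const std::string& channel) {
    auto hit = expanded_.find(channel);
    if (hit != expanded_.end()) return hit->second;
    auto t = templates_.find(channel);
    if (t == templates_.end()) t = templates_.find("default");
    std::string path =
        (t == templates_.end()) ? "" : expand_channel(t->second, channel);
    return expanded_.emplace(channel, std::move(path)).first->second;
  }

 private:
  std::map<std::string, std::string> templates_;  // never modified
  std::map<std::string, std::string> expanded_;   // per-channel results
};
```

With only default=/tmp/whatever.$channel.log configured, file_for("foo") yields /tmp/whatever.foo.log in this sketch.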
common: LogEntry: if channel is missing, default to "cluster"
Keeps backward compatibility when there are entities that do not know
what a channel is. This way we ensure that those messages are logged as
they were expected to be before channels were introduced: to the cluster
log.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
Danny Al-Gaaf [Fri, 19 Sep 2014 10:25:07 +0000 (12:25 +0200)]
rgw_main.cc: add missing virtual destructor for RGWRequest
CID 1160858 (#1 of 1): Non-virtual destructor (VIRTUAL_DTOR)
nonvirtual_dtor: Class RGWLoadGenRequest has a destructor
and a pointer to it is upcast to class RGWRequest which doesn't
have a virtual destructor.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
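The class of defect reported here can be illustrated with simplified stand-ins. Request and LoadGenRequest below are hypothetical miniatures, not the RGW classes. They show why the base class needs a virtual destructor when objects are destroyed through a base pointer.

```cpp
#include <cassert>
#include <memory>

struct Request {
  // Without `virtual`, destroying a derived object through a Request*
  // is undefined behaviour and the derived destructor may not run.
  virtual ~Request() = default;
};

struct LoadGenRequest : Request {
  explicit LoadGenRequest(bool* destroyed) : destroyed_(destroyed) {}
  ~LoadGenRequest() override { *destroyed_ = true; }
  bool* destroyed_;
};

// Destroys a LoadGenRequest through a base-class pointer and reports
// whether the derived destructor ran.
bool derived_destructor_runs() {
  bool destroyed = false;
  {
    std::unique_ptr<Request> r(new LoadGenRequest(&destroyed));
  }
  return destroyed;
}
```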
Danny Al-Gaaf [Fri, 19 Sep 2014 10:06:49 +0000 (12:06 +0200)]
os/GenericObjectMap.cc: pass big parameter by reference
CID 1188142 (#1 of 1): Big parameter passed by value (PASS_BY_VALUE)
pass_by_value: Passing parameter header of type
GenericObjectMap::_Header (size 176 bytes) by value.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1021214 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking cb_args suggests that it may be
null, but it has already been dereferenced on all paths leading to
the check.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 17 Sep 2014 17:31:13 +0000 (19:31 +0200)]
mds/Beacon.*: fix UNINIT_CTOR cases
CID 1238905 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member want_state is not initialized
in this constructor nor in any functions that it calls.
uninit_member: Non-static class member last_send is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1238903 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member data_chunk_count is not
initialized in this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
If the file size does not fit in 32 bits the (unsigned) cast will
overflow. Cast to uint64_t which is the type of the value returned by
get_total_chunk_size.
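A small sketch of the truncation described above follows. The constant names are hypothetical, and uint32_t is used to make the width explicit, since unsigned is 32 bits wide on common platforms.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit size that does not fit in 32 bits (5 GiB).
constexpr uint64_t kFiveGiB = 5ULL * 1024 * 1024 * 1024;

// Casting to a 32-bit unsigned type drops the high bits (modulo 2^32).
constexpr uint32_t kTruncated = static_cast<uint32_t>(kFiveGiB);

// Casting to uint64_t keeps the full value.
constexpr uint64_t kPreserved = static_cast<uint64_t>(kFiveGiB);
```

Here kTruncated is 1073741824 (1 GiB), while kPreserved is still 5368709120.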
Add a trivial osd health test at the beginning of each group of
tests. An intermittent failure is difficult to diagnose when the
cluster appears to be missing an OSD but there is no indication as to
when the OSDs were last up.
The tests are now only run after all OSDs are up.
These checks can be disabled with --no-sanity-check to allow running
some tests that have fewer requirements than running all the tests.
Sage Weil [Thu, 18 Sep 2014 21:23:36 +0000 (14:23 -0700)]
mon: re-bootstrap if we get probed by a mon that is way ahead
During bootstrap we verify that our paxos commits overlap with the other
mons we will form a quorum with. If they do not, we do a sync.
However, it is possible we pass those checks, then fail to join a quorum
before the quorum moves ahead in time such that we no longer overlap.
Currently nothing kicks us back into a probing state to discover we need
to sync; we will just keep trying to call or join an election instead.
Fix this by jumping back to bootstrap if we get a probe that is ahead of
us. Only do this from states other than probe or sync, as such probes
will be common there; it is only the active and electing states that
matter (and probably just electing!).
Fixes: #9301
Backport: giant, firefly
Signed-off-by: Sage Weil <sage@redhat.com>
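The decision described in this commit can be sketched as a small predicate. This is a hypothetical model, not the monitor code: the state names, the function names and the exact meaning of "ahead" (the prober's oldest retained commit is newer than our newest commit) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

enum class MonState { Probing, Synchronizing, Electing, Leader, Peon };

// Assumed definition: a probe is "ahead" when the sender's first
// committed version is beyond our last committed version, so the two
// commit ranges no longer overlap.
bool probe_is_ahead(uint64_t our_last_committed, uint64_t peer_first_committed) {
  return peer_first_committed > our_last_committed;
}

// Jump back to bootstrap only outside the probing and synchronizing states.
bool should_rebootstrap(MonState state, uint64_t our_last_committed,
                        uint64_t peer_first_committed) {
  if (state == MonState::Probing || state == MonState::Synchronizing) {
    return false;
  }
  return probe_is_ahead(our_last_committed, peer_first_committed);
}
```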