erasure-code: test isa encode/decode with various object sizes
Create an encode_decode() helper method to be called from the
encode_decode test function with various object size arguments. The
helper method is a copy/paste of the previous test that was using a
single object of a fixed size. The test is slightly adapted to
accommodate different object sizes but the logic is not modified.
The object sizes tested are either smaller than the required alignment
or span multiple pages, both aligned and unaligned.
erasure-code: isa encode tests adapted to per chunk alignment
The encode tests rely on the alignment constraints. The alignment is now
enforced on a per chunk basis instead of computing a more expensive
object alignment constraint. The test function is modified to take the
change into account but the logic is otherwise unmodified.
erasure-code: isa uses per chunk alignment constraints
Copy code from the jerasure plugin to enforce alignment constraints per
chunk instead of using the total object size. It is simpler and reduces
the size of the chunks. See
https://github.com/ceph/ceph/commit/c7daaaf5e63d0bd1d444385e62611fe276f6ce29
for more information.
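A minimal sketch of per chunk alignment, for illustration only. The function names are hypothetical and the default of 32 bytes is an assumption borrowed from the 32-byte alignment commit listed in this log; the plugin itself is C++ and may compute this differently.

```python
def round_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment


def chunk_size_per_chunk(object_size: int, k: int, alignment: int = 32) -> int:
    """Split the object into k data chunks, then pad each chunk.

    The padding is applied to the chunk, not to the whole object.
    """
    chunk = (object_size + k - 1) // k  # ceiling division
    return round_up(chunk, alignment)
```

With k=2, a 4096 byte object needs no padding, while a 4097 byte object is padded up to the next 32-byte boundary in each chunk.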
Andreas Peters [Thu, 25 Sep 2014 14:48:47 +0000 (16:48 +0200)]
erasure-code: [ISA] modify get_alignment function to imply a platform/compiler independent alignment constraint of 32-byte aligned buffer addresses & length
Otherwise statfs may fail if mkfs hasn't been run yet or if the monitor
data directory does not exist. There are checks to account for the mon
data dir not existing and we should wait for them to clear before we go
ahead and check the fs stats.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
There are two new plugins (isa and lrc). When upgrading a cluster, there
must be a protection against the following scenario:
* the mon are upgraded but not the osd
* a new pool is created using plugin isa
* the osd fail to load the isa plugin because they have not been
upgraded
A feature bit is added: PLUGINS_V2. The monitor will only agree to
create an erasure code profile for the isa or lrc plugin if all OSDs
support PLUGINS_V2. Once such an erasure code profile is stored in the
OSDMap, an OSD can only boot if it supports the PLUGINS_V2 feature,
which means it is able to load the isa and lrc plugins.
The monitors will only activate the PLUGINS_V2 feature if all monitors
in the quorum support it. It protects against the following scenario:
* the leader is upgraded, the peons are not upgraded
* the leader creates a pool with plugin=lrc because all OSD have
the PLUGINS_V2 feature
* the leader goes down and a non upgraded peon becomes the leader
* an old OSD tries to join the cluster
* the new leader will let the OSD boot because it does not contain
the logic that would exclude it
* the old OSD will fail when required to load the plugin lrc
This is going to be needed each time new plugins are added, which is
impractical. A more generic plugin upgrade support should be added
instead, as described in http://tracker.ceph.com/issues/7291.
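The checks described above can be sketched roughly as follows. This is an illustration, not the monitor code: the bit value of PLUGINS_V2 and all function names are hypothetical.

```python
PLUGINS_V2 = 1 << 0  # hypothetical bit position, for illustration only
GATED_PLUGINS = {"isa", "lrc"}


def all_support(feature_masks, bit):
    """True if every feature mask in the list has the given bit set."""
    return all(mask & bit for mask in feature_masks)


def can_create_profile(plugin, osd_features, quorum_mon_features):
    """Allow an isa/lrc profile only if all mons in quorum and all OSDs support PLUGINS_V2."""
    if plugin not in GATED_PLUGINS:
        return True
    return (all_support(quorum_mon_features, PLUGINS_V2)
            and all_support(osd_features, PLUGINS_V2))


def can_boot(osd_features, stored_plugins):
    """Refuse to boot an OSD lacking PLUGINS_V2 once a gated profile is stored."""
    if GATED_PLUGINS & set(stored_plugins):
        return bool(osd_features & PLUGINS_V2)
    return True
```

An old OSD (feature mask 0) can still boot as long as no isa or lrc profile exists, which matches the upgrade scenario above.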
mon: LogMonitor: appropriately expand channel meta variables
We must only expand the log file's channel meta variables upon requiring
a channel's log file. As we may have a 'default' channel that will
cover all channels, we must wait to expand channels as they come in and
do so if they haven't yet been expanded. Expanding the 'log_file' in
place would have the unfortunate side effect of expanding, say,
default=/tmp/whatever.$channel.log
to
default=/tmp/whatever.default.log
which would not be what we wanted upon receiving a message that should
go into channel 'foo' -- assuming we specified no such channel in the
options, channel 'foo' should go into '/tmp/whatever.foo.log'.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
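A small sketch of the lazy expansion described in the LogMonitor commit above. The class and method names are hypothetical; the point is only that the configured template stays untouched and the expansion happens per channel, on first use.

```python
class ChannelLogFiles:
    """Keep log file templates unexpanded and expand them per channel on demand."""

    def __init__(self, templates):
        self.templates = dict(templates)  # e.g. {"default": "/tmp/whatever.$channel.log"}
        self.expanded = {}                # channel name -> expanded path

    def log_file(self, channel):
        if channel not in self.expanded:
            # Fall back to the 'default' template when the channel has none.
            template = self.templates.get(channel, self.templates.get("default", ""))
            self.expanded[channel] = template.replace("$channel", channel)
        return self.expanded[channel]
```

With only a 'default' template configured, asking for channel 'foo' yields '/tmp/whatever.foo.log' and the template itself still contains $channel.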
common: LogEntry: if channel is missing, default to "cluster"
Keeps backward compatibility when there are entities that do not know
what a channel is. This way we ensure that those messages are logged as
they were expected to be before channels were introduced: to the cluster
log.
Signed-off-by: Joao Eduardo Luis <joao@redhat.com>
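The fallback in the LogEntry commit above amounts to something like the following sketch, where the dict-based entry and the helper name are hypothetical stand-ins for the real structure.

```python
DEFAULT_CHANNEL = "cluster"


def channel_of(entry):
    """Return the entry's channel, or "cluster" if it is missing or empty."""
    return entry.get("channel") or DEFAULT_CHANNEL
```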
Danny Al-Gaaf [Fri, 19 Sep 2014 10:25:07 +0000 (12:25 +0200)]
rgw_main.cc: add missing virtual destructor for RGWRequest
CID 1160858 (#1 of 1): Non-virtual destructor (VIRTUAL_DTOR)
nonvirtual_dtor: Class RGWLoadGenRequest has a destructor
and a pointer to it is upcast to class RGWRequest which doesn't
have a virtual destructor.
Danny Al-Gaaf [Fri, 19 Sep 2014 10:06:49 +0000 (12:06 +0200)]
os/GenericObjectMap.cc: pass big parameter by reference
CID 1188142 (#1 of 1): Big parameter passed by value (PASS_BY_VALUE)
pass_by_value: Passing parameter header of type
GenericObjectMap::_Header (size 176 bytes) by value.
Danny Al-Gaaf [Wed, 17 Sep 2014 17:31:13 +0000 (19:31 +0200)]
mds/Beacon.*: fix UNINIT_CTOR cases
CID 1238905 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member want_state is not initialized
in this constructor nor in any functions that it calls.
uninit_member: Non-static class member last_send is not initialized
in this constructor nor in any functions that it calls.
CID 1238903 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member data_chunk_count is not
initialized in this constructor nor in any functions that it calls.
If the file size does not fit in 32 bits the (unsigned) cast will
overflow. Cast to uint64_t which is the type of the value returned by
get_total_chunk_size.
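To illustrate the truncation: Python integers do not overflow, so the sketch below emulates the C++ casts with bit masks, assuming a 32-bit unsigned. The helper names are hypothetical.

```python
def as_uint32(value):
    """Emulate a cast to a 32-bit unsigned integer."""
    return value & 0xFFFFFFFF


def as_uint64(value):
    """Emulate a cast to uint64_t."""
    return value & 0xFFFFFFFFFFFFFFFF


five_gib = 5 * 2**30
truncated = as_uint32(five_gib)  # high bits are lost
preserved = as_uint64(five_gib)  # value is kept intact
```

Here a 5 GiB size is silently reduced to 1 GiB by the 32-bit cast, while the 64-bit cast keeps it unchanged.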
We have been setting it to the old head value. This is usually
harmless since the new head will virtually always be ahead of the
old head for claim_log_and_clear_rollback_info, but can cause trouble
in some edge cases.
Fixes: #9481
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Mon, 15 Sep 2014 23:53:21 +0000 (16:53 -0700)]
PG::find_best_info: let history.last_epoch_started provide a lower bound
If we find an info.history.last_epoch_started above any
info.last_epoch_started, we must be missing updates and
min_last_update_acceptable should provisionally be max().
Fixes: #9482
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
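A rough sketch of the lower bound described in the find_best_info commit above. It is a simplification under stated assumptions: versions are plain integers, sys.maxsize stands in for max(), and the Info class and field names are hypothetical rather than the OSD's actual types.

```python
import sys
from dataclasses import dataclass

VERSION_MAX = sys.maxsize  # stand-in for the provisional max()


@dataclass
class Info:
    last_epoch_started: int
    history_last_epoch_started: int
    last_update: int


def min_last_update_acceptable(infos):
    # history.last_epoch_started also raises the bound, not only info.last_epoch_started.
    max_les = 0
    for info in infos:
        max_les = max(max_les, info.history_last_epoch_started, info.last_epoch_started)

    # Only infos that actually reached max_les may lower the provisional max().
    acceptable = VERSION_MAX
    for info in infos:
        if info.last_epoch_started >= max_les:
            acceptable = min(acceptable, info.last_update)
    return acceptable
```

If some history reports epoch 10 but no info started later than epoch 8, no info qualifies and the result stays at the provisional maximum.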
The implicit creation of a ruleset when creating a pool is convenient
when nothing is specified. However, if the caller sets a ruleset name,
it should not implicitly create it but return ENOENT instead. Silently
creating a ruleset when there is a typo in the ruleset name is
confusing.
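The intended behavior can be sketched as below. The function name, the dict of known rulesets and the create_default callback are hypothetical; returning a negative errno only mirrors the usual convention.

```python
import errno


def resolve_ruleset(name, rulesets, create_default):
    """Return a ruleset id, or -ENOENT if a named ruleset does not exist."""
    if not name:
        # Nothing specified: implicit creation is still convenient.
        return create_default()
    if name not in rulesets:
        # A name was given: never create it silently, it may be a typo.
        return -errno.ENOENT
    return rulesets[name]
```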
Otherwise the FDCache will keep a file descriptor to a file that was
removed from the file system. This may create various types of errors
because the OSD checking the FDCache will assume the file that contains
information for an object exists although it does not. For instance in
the following:
* rados put object file
* rm file from the primary
* repair the pg to which the object is mapped
if the FDCache is not cleared, repair will incorrectly pull a copy from
a replica and write it to the now unlinked file. Later on, it will
assume the file exists on the primary and only be partially correct:
the data can still be accessed via the file descriptor but any operation
using the path name will fail.
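A toy illustration of the idea, assuming a POSIX file system: the cache entry is dropped before the file is unlinked, so no stale descriptor survives the removal. The FDCache class here is a hypothetical stand-in, not the OSD's implementation.

```python
import os
import tempfile


class FDCache:
    """Toy cache mapping a path to an open file descriptor."""

    def __init__(self):
        self._fds = {}

    def lookup(self, path):
        if path not in self._fds:
            self._fds[path] = os.open(path, os.O_RDONLY)
        return self._fds[path]

    def clear(self, path):
        fd = self._fds.pop(path, None)
        if fd is not None:
            os.close(fd)

    def __contains__(self, path):
        return path in self._fds


def remove_object(cache, path):
    """Clear the cached descriptor, then unlink the file."""
    cache.clear(path)
    os.unlink(path)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    cache = FDCache()
    cache.lookup(path)
    remove_object(cache, path)
    return path in cache, os.path.exists(path)
```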
osd: subscribe to the newest osdmap when reconnecting to a monitor
This is mostly relevant in testing clusters, but it ensures that an OSD
disconnecting from the monitor at the wrong time will still see any recent
map updates and prevent accidental loss of map injection into the OSD cluster.
Fixes: #9219
Signed-off-by: Greg Farnum <greg@inktank.com>
Sage Weil [Mon, 15 Sep 2014 23:45:19 +0000 (16:45 -0700)]
osdc/Objecter: fix command op cancellation race
Cancel the command op timeout event before we clear out the op from the
session struct. This isn't strictly necessary because command_op_cancel
will "gracefully" handle the case where the tid is no longer present, but
this avoids that noise and is cleaner.
Sage Weil [Mon, 15 Sep 2014 23:40:39 +0000 (16:40 -0700)]
osdc/Objecter: cancel timeout before clearing op->session
The C_CancelOp path assumes op->session != NULL. Cancel that op before
we clear it. This fixes a crash like
#0 pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:39
#1 0x00007fc82690a4b1 in RWLock::get_write (this=0x18, lockdep=<optimized out>) at ./common/RWLock.h:88
#2 0x00007fc8268f4d79 in Objecter::op_cancel (this=0x1f61830, s=0x0, tid=0, r=-110) at osdc/Objecter.cc:1850
#3 0x00007fc8268ba449 in Context::complete (this=0x1f68c20, r=<optimized out>) at ./include/Context.h:64
#4 0x00007fc8269769aa in RWTimer::timer_thread (this=0x1f61950) at common/Timer.cc:268
#5 0x00007fc82697a85d in RWTimerThread::entry (this=<optimized out>) at common/Timer.cc:200
#6 0x00007fc82651ce9a in start_thread (arg=0x7fc7e3fff700) at pthread_create.c:308
Sage Weil [Mon, 15 Sep 2014 22:29:08 +0000 (15:29 -0700)]
ceph-disk: mount xfs with inode64 by default
We did this forever ago with mkcephfs, but ceph-disk didn't. Note that for
modern XFS this option is obsolete, but for older kernels it was not the
default.
Backport: firefly
Signed-off-by: Sage Weil <sage@redhat.com>
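As a sketch of what a per file system default might look like: the table name, the helper and the accompanying noatime option are assumptions for illustration; only inode64 for xfs comes from the commit message above.

```python
# Hypothetical defaults table; only "inode64" for xfs is taken from the commit.
DEFAULT_MOUNT_OPTIONS = {
    "xfs": "noatime,inode64",
}


def mount_options(fstype, configured=None):
    """Prefer explicitly configured options, else fall back to the defaults."""
    if configured:
        return configured
    return DEFAULT_MOUNT_OPTIONS.get(fstype, "")
```

Explicitly configured options still take precedence, so the default only applies when nothing else is set.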