Josh Durgin [Tue, 4 Jun 2013 20:23:36 +0000 (13:23 -0700)]
rados.py: correct some C types
trunc was getting size_t instead of uint64_t, leading to bad results
in 32-bit environments. Explicitly cast to the desired type
everywhere, so it's clear the correct type is being used.
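The width problem can be illustrated with ctypes alone. This is a minimal sketch, not the rados.py code: the helper name as_uint64 and the constant FIVE_GIB are made up for illustration, and c_uint32 is used only to imitate what a 4-byte size_t would do to a large size.

```python
import ctypes

FIVE_GIB = 5 * 1024 ** 3  # a size that does not fit in 32 bits


def as_uint64(value):
    """Hypothetical helper: explicitly wrap a Python int as uint64_t."""
    return ctypes.c_uint64(value)


# c_uint64 is 8 bytes on every platform; c_size_t follows the platform
# word size (4 bytes on 32-bit builds, 8 bytes on 64-bit builds).
uint64_width = ctypes.sizeof(ctypes.c_uint64)
size_t_width = ctypes.sizeof(ctypes.c_size_t)

# ctypes integer types do no overflow checking, so a 32-bit type
# silently keeps only the low 32 bits of the value.
truncated = ctypes.c_uint32(FIVE_GIB).value
preserved = as_uint64(FIVE_GIB).value

print(uint64_width, size_t_width, truncated, preserved)
```

Casting explicitly at every call site makes the intended width visible regardless of how the platform defines size_t.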
Loic Dachary [Sun, 2 Jun 2013 10:53:48 +0000 (12:53 +0200)]
unit tests for PGLog::merge_old_entry
The tests cover 100% of the LOC of merge_old_entry. They are broken down
into 13 cases to enumerate all the situations it must address. Each case
is isolated in an independent code block where the conditions are
reproduced. For instance:
Sage Weil [Mon, 3 Jun 2013 04:21:51 +0000 (21:21 -0700)]
ceph-fuse: create finisher threads after fork()
The ObjectCacher and MonClient classes both instantiate Finisher
threads. We need to make sure they are created *after* the fork(2)
or else the process will fail to join() them on shutdown, and the
threads will not exist while fuse is doing useful work.
Put CephFuse on the heap and move all this initialization into the child
block, and make sure errors are passed back to the parent.
Fix-proposed-by: Alexandre Marangone <alexandre.maragone@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
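The underlying fork(2) behaviour (only the calling thread exists in the child) can be observed with a small POSIX-only sketch. It is not the ceph-fuse code: helper_threads_seen_by_child and start_helper are hypothetical names, and the helper thread is only a stand-in for a Finisher thread.

```python
import os
import threading


def helper_threads_seen_by_child(start_before_fork):
    """Fork once and return how many helper threads exist in the child."""
    stop = threading.Event()

    def start_helper():
        # Stand-in for a Finisher thread: it waits until told to stop.
        thread = threading.Thread(target=stop.wait, daemon=True)
        thread.start()

    read_fd, write_fd = os.pipe()
    if start_before_fork:
        start_helper()

    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() survives the fork.
        os.close(read_fd)
        if not start_before_fork:
            start_helper()
        helpers = threading.active_count() - 1  # exclude the forking thread
        os.write(write_fd, bytes([helpers]))
        os._exit(0)

    # Parent: read the child's report, reap it, and stop our own helper.
    os.close(write_fd)
    helpers = os.read(read_fd, 1)[0]
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    return helpers
```

Calling the function with True reports no helper thread in the child, while calling it with False reports one, which is why the thread creation has to move into the child block.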
Sage Weil [Sat, 1 Jun 2013 00:09:19 +0000 (17:09 -0700)]
mon: start lease timer from peon_init()
In the scenario:
- leader wins, peons lose
- leader sees it is too far behind on paxos and bootstraps
- leader tries to sync with someone, waits for a quorum of the others
- peons sit around forever waiting
The problem is that they never time out because paxos never issues a lease,
which is the normal timeout that lets them detect a leader failure.
Avoid this by starting the lease timeout as soon as we lose the election.
The timeout callback just does a bootstrap and does not rely on any other
state.
I see one possible danger here: there may be some "normal" cases where the
leader takes a long time to issue its first lease that we currently
tolerate, but won't with this new check in place. I hope that raising
the lease interval/timeout or reducing the allowed paxos drift will make
that a non-issue. If it is problematic, we will need a separate explicit
"i am alive" from the leader while it is getting ready to issue the lease
to prevent a live-lock.
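The timer behaviour described above can be sketched as follows. This is a simplified model with an explicit clock, not the monitor code: the Peon class, its fields and the tick() method are assumptions made for illustration; only the name peon_init comes from the commit subject.

```python
class Peon:
    """Toy model of a peon that arms its lease timeout on losing an election."""

    def __init__(self, lease_timeout):
        self.lease_timeout = lease_timeout
        self.deadline = None   # no timer armed yet
        self.bootstraps = 0    # counts how often the timeout fired

    def peon_init(self, now):
        # Arm the timeout immediately, before any lease has been received.
        self.deadline = now + self.lease_timeout

    def handle_lease(self, now):
        # A lease from the leader pushes the deadline out again.
        self.deadline = now + self.lease_timeout

    def tick(self, now):
        # The timeout callback relies on no other state: it just bootstraps.
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.bootstraps += 1


silent_leader = Peon(lease_timeout=5.0)
silent_leader.peon_init(now=0.0)
silent_leader.tick(now=4.0)   # still within the timeout
silent_leader.tick(now=5.0)   # leader never issued a lease: bootstrap

healthy_leader = Peon(lease_timeout=5.0)
healthy_leader.peon_init(now=0.0)
healthy_leader.handle_lease(now=3.0)
healthy_leader.tick(now=5.0)  # deadline was extended to 8.0, nothing fires
```

The second case also shows the danger noted above: if the first lease arrives later than lease_timeout after the election, the peon bootstraps even though the leader is alive.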
Sage Weil [Fri, 31 May 2013 05:52:21 +0000 (22:52 -0700)]
mon: discard messages from disconnected clients
If the client is not connected, discard the message. They will
reconnect and resend anyway, so there is no point in processing it
twice (now and later).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
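A minimal sketch of the check, assuming a hypothetical dispatch() function, a set of connected client names, and messages represented as plain dicts (none of these are the monitor's real types):

```python
def dispatch(message, connected_clients, handler):
    """Process a message only if its sender is still connected.

    Returns True if the message was handled, False if it was discarded.
    """
    if message["client"] not in connected_clients:
        # The client will reconnect and resend, so drop it now.
        return False
    handler(message)
    return True


handled = []
connected = {"client.a"}
kept = dispatch({"client": "client.a", "op": "get"}, connected, handled.append)
dropped = dispatch({"client": "client.b", "op": "get"}, connected, handled.append)
```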
- trim more at a time (by an order of magnitude)
- rename fields to paxos_trim_{min,max}; only trim when there are min items
that are trimmable, and trim at most max items at a time.
- adjust the paxos_service_trim_{min,max} values up by a factor of 2.
Since we are compacting every time we trim, adjusting these up means less
frequent compactions and less overall work for the monitor.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
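The min/max rule can be sketched as a small function. The name entries_to_trim and its parameters are hypothetical, and the numbers used below are arbitrary examples rather than the configured defaults.

```python
def entries_to_trim(first_committed, trim_to, trim_min, trim_max):
    """Return how many entries to trim in one pass.

    Nothing is trimmed until at least trim_min entries are trimmable,
    and a single pass never trims more than trim_max entries.
    """
    trimmable = trim_to - first_committed
    if trimmable < trim_min:
        return 0
    return min(trimmable, trim_max)


too_few = entries_to_trim(0, 100, trim_min=250, trim_max=500)
in_range = entries_to_trim(0, 300, trim_min=250, trim_max=500)
capped = entries_to_trim(0, 2000, trim_min=250, trim_max=500)
```

Raising trim_min makes each pass rarer, which is what reduces the number of compactions.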
Loic Dachary [Tue, 28 May 2013 21:34:59 +0000 (23:34 +0200)]
unit tests for pg_missing_t
All lines of code are tested. The conditions under which some methods
could corrupt the content of a pg_missing_t object have not been
investigated. Since the data members are public, the caller is
ultimately responsible for the consistency of the object and the
methods have no way to enforce it.
The semantics of is_missing have been discussed in
http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/15280
Samuel Just [Mon, 15 Apr 2013 23:33:48 +0000 (16:33 -0700)]
PG: don't write out pg map epoch every handle_activate_map
We don't actually need to write out the pg map epoch on every
activate_map as long as:
a) the osd does not trim past the oldest pg map persisted
b) the pg does update the persisted map epoch from time
to time.
To that end, we now keep a reference to the last map persisted.
The OSD already does not trim past the oldest live OSDMapRef.
Second, handle_activate_map will trim if the difference between
the current map and the last_persisted_map is large enough.
Fixes: #4731
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 2c5a9f0e178843e7ed514708bab137def840ab89)
Conflicts:
src/common/config_opts.h
src/osd/PG.cc
- last_persisted_osdmap_ref gets set in the non-static
PG::write_info
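The idea can be sketched with a toy model. It reads "large enough" as a configurable threshold and persists the epoch when that threshold is reached; the PG class below, max_epoch_lag, and trim_floor are hypothetical names, not the OSD code.

```python
class PG:
    """Toy PG that persists its map epoch only when it lags far enough."""

    def __init__(self, max_epoch_lag):
        self.max_epoch_lag = max_epoch_lag
        self.last_persisted_epoch = 0
        self.writes = 0

    def handle_activate_map(self, current_epoch):
        # Skip the write unless the persisted epoch has fallen far behind.
        if current_epoch - self.last_persisted_epoch >= self.max_epoch_lag:
            self.last_persisted_epoch = current_epoch
            self.writes += 1


def trim_floor(pgs):
    """Oldest persisted epoch across PGs; maps older than this may be trimmed."""
    return min(pg.last_persisted_epoch for pg in pgs)


pg = PG(max_epoch_lag=10)
for epoch in range(1, 31):
    pg.handle_activate_map(epoch)
```

Thirty map changes result in three writes instead of thirty, while trim_floor() keeps every map a PG might still need on restart.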
Danny Al-Gaaf [Fri, 31 May 2013 17:07:45 +0000 (19:07 +0200)]
mds/Server.cc: fix dereference after null check
CID 716927 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "diri->snaprealm" to function
"SnapRealm::resolve_snapname(std::string const &
Make sure not to dereference diri->snaprealm.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 14:13:47 +0000 (16:13 +0200)]
mds/Server.cc: fix dereference after null check
CID 716926 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing "mdr" to function
"Server::apply_allocated_inos(MDRequest *)", which dereferences
null "mdr->session".
Add assert for 'mdr' and assert for session in apply_allocated_inos().
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:51:48 +0000 (15:51 +0200)]
mds/Server.cc: fix dereference after null check
Add assert to fix:
CID 716925 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "straydn" to function
"CDentry::get_dir() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:46:53 +0000 (15:46 +0200)]
mds/Migrator.cc: fix dereference after null check
Add asserts to check for 'dir' to fix:
CID 716924 (#1-5 of 5): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "dir" to function "operator
<<(std::ostream &, CDir &)", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:37:31 +0000 (15:37 +0200)]
mds/Migrator.cc: fix dereference after null check
Add assert for 'le' to fix:
CID 716923 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "le" to function
"LogEvent::get_start_off() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:24:39 +0000 (15:24 +0200)]
mds/MDCache.cc: fix dereference null return value
Add assert to fix:
CID 716994 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir" when
calling "CDir::mark_dirty(version_t, LogSegment *)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:14:28 +0000 (15:14 +0200)]
mds/MDCache.cc: fix dereference null return value
CID 716993 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "operator <<(std::ostream &, CInode &)".
Add assert for 'in'.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:04:13 +0000 (15:04 +0200)]
mds/MDCache.cc: fix dereference after null check
Add assert for 'parent' before calling assert on parent->is_auth().
CID 716922 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "parent" to function
"MDSCacheObject::is_auth() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:56:21 +0000 (18:56 +0200)]
mds/MDCache.cc: fix dereference after null check
CID 716921 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "dir" to function
"operator <<(std::ostream &, CDir &)", which dereferences it.
CID 716992 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir"
when calling "MDSCacheObject::is_auth() const".
Add assert for 'dir' before using it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:49:30 +0000 (18:49 +0200)]
mds/Locker.cc: fix dereference after null check
CID 716919 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "cap" to function
"Capability::inc_suppress()", which dereferences it.
Check for 'cap' before using it, as in other places of the function.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:42:20 +0000 (18:42 +0200)]
mds/Locker.cc: fix dereference after null check
CID 716918 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "in" to function
"MDSCacheObject::state_test(unsigned int) const", which
dereferences it.
Add assert that 'in' is not NULL before using it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:39:43 +0000 (18:39 +0200)]
mds/Locker.cc: fix dereference after null check
CID 716917 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "in" to function
"MDSCacheObject::state_test(unsigned int) const", which
dereferences it.
Add assert that 'in' is not NULL before using it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 28 May 2013 12:41:30 +0000 (14:41 +0200)]
bench/dumb_backend.cc: check return value of lseek()
CID 743395 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
check_return: Calling function "lseek(fd, offset, 0)" without checking
return value. This library function may fail and return an error code.
unchecked_value: No check of the return value of "lseek(fd, offset, 0)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 28 May 2013 12:26:29 +0000 (14:26 +0200)]
bench/dumb_backend.cc: check return value of posix_fadvise()
CID 743396 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
check_return: Calling function "posix_fadvise(fd, offset, bl->length(), 4)"
without checking return value. This library function may fail and return
an error code.
unchecked_value: No check of the return value of
"posix_fadvise(fd, offset, bl->length(), 4)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 28 May 2013 12:08:09 +0000 (14:08 +0200)]
small_io_bench_fs.cc: check return value of FileStore::mkfs/mount()
CID 743398 (#1 of 1): Unchecked return value (CHECKED_RETURN)
check_return: Calling function "FileStore::mount()" without
checking return value (as is done elsewhere 4 out of 5 times).
unchecked_value: No check of the return value of "fs.FileStore::mount()"
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:28:07 +0000 (18:28 +0200)]
mds/Locker.cc: fix explicit null dereferenced
CID 716916 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
var_deref_model: Passing null pointer "in" to function
"CInode::is_head()", which dereferences it.
Add assert.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 28 May 2013 10:55:19 +0000 (12:55 +0200)]
mds/Server.cc: fix explicit null dereferenced
CID 716928 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
var_deref_model: Passing null pointer "session" to function
"Session::trim_completed_requests(tid_t)", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Yehuda Sadeh [Thu, 30 May 2013 19:58:11 +0000 (12:58 -0700)]
rgw: only append prefetched data if reading from head
Fixes: #5209
Backport: bobtail, cuttlefish
If the head object wrongfully contains data, but according to the
manifest we don't read from the head, we shouldn't copy the prefetched
data. Also fix the length calculation for that data.
Yehuda Sadeh [Thu, 30 May 2013 16:34:21 +0000 (09:34 -0700)]
rgw: don't copy object idtag when copying object
Fixes: #5204
When copying object we ended up also copying the original
object idtag which overrode the newly generated one. When
refcount put is called with the wrong idtag the count
doesn't go down.
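The failure mode can be modelled in a few lines. This is an illustrative sketch only: the attribute key "idtag", the dict-based attributes, and the set-based refcount are assumptions, not the rgw data structures.

```python
ID_TAG = "idtag"  # hypothetical attribute key


def copy_object_attrs(src_attrs, new_idtag):
    """Copy attributes to the new object without carrying over the idtag."""
    dst_attrs = {k: v for k, v in src_attrs.items() if k != ID_TAG}
    dst_attrs[ID_TAG] = new_idtag
    return dst_attrs


def refcount_put(refs, tag):
    """Drop the reference held under tag; a wrong tag changes nothing."""
    if tag in refs:
        refs.remove(tag)
        return True
    return False


src = {ID_TAG: "tag-src", "content-type": "text/plain"}
dst = copy_object_attrs(src, new_idtag="tag-dst")

refs = {"tag-src", "tag-dst"}
wrong = refcount_put(refs, "tag-stale")  # unknown tag: count stays at 2
right = refcount_put(refs, dst[ID_TAG])  # matching tag: count drops to 1
```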
Alex Elder [Thu, 30 May 2013 23:10:46 +0000 (18:10 -0500)]
rbd/kernel.sh: quit looking for snapshot sysfs entries
The sysfs entries for snapshots went away a while ago, and this
script used them to verify sizes matched what was expected.
Instead, look at the mapped size of the snapshot in the places
that used to look for the image's snapshot sysfs files.
Also, switch over to using "udevadm settle" rather than a delay to
wait for udev to do its thing. Insert them at more appropriate
places--right after "rbd map" commands and before and after the
"rbd unmap" calls.
Stop doing the manual refresh calls as well. The osd will trigger
refreshes whenever the image size or snapshot context changes.
Finally, the cleanup routine is called initially, when there really
isn't expected to be anything to clean up. Change the rbd commands
to run there conditionally, only if the target of the command
already exists.
Loic Dachary [Wed, 22 May 2013 12:14:26 +0000 (14:14 +0200)]
move log, ondisklog, missing from PG to PGLog
PG::log, PG::ondisklog, PG::missing are moved from PG to a new PGLog
class and are made protected data members. It is a preliminary step
before writing unit tests to cover the methods that have side effects
on these data members and define a clean PGLog API. It improves
encapsulation and does not change any of the logic already in
place.
Possible issues:
* an additional reference (PG->PGLog->IndexedLog instead of
PG->IndexedLog for instance) is introduced: is it optimized?
* rewriting log.log into pg_log.get_log().log affects the readability
but should be optimized and have no impact on performance
The guidelines followed for this patch are:
* const access to the data members is preserved, no attempt is made
to define accessors
* all non const methods are in PGLog, no access to non const methods of
PGLog::log, PGLog::logondisk and PGLog::missing are provided
* when methods are moved from PG to PGLog the change to their
implementation is restricted to the minimum.
* the PG::OndiskLog and PG::IndexedLog sub classes are moved
to PGLog sub classes unmodified and remain public
A const version of the pg_log_t::find_entry method was added.
A const accessor is provided for PGLog::get_log, PGLog::get_missing,
PGLog::get_ondisklog but no non-const accessor.
Arguments are added to most of the methods moved from PG to PGLog so
that they can get access to PG data members such as info or log_oid.
The PGLog methods are sorted according to the data member they modify.
//////////////////// missing ////////////////////
* The pg_missing_t::{got,have,need,add,rm} methods are wrapped as
PGLog::missing_{got,have,need,add,rm}
//////////////////// log ////////////////////
* PGLog::get_tail, PGLog::get_head getters are created
* PGLog::set_tail, PGLog::set_head, PGLog::set_last_requested setters
are created
* PGLog::index, PGLog::unindex, PGLog::add wrappers,
PGLog::reset_recovery_pointers are created
* PGLog::claim_log is created with code extracted from
PG::RecoveryState::Stray::react.
* PGLog::split_into is created with code extracted from
PG::split_into.
* PGLog::recover_got is created with code extracted from
ReplicatedPG::recover_got.
* PGLog::activate_not_complete is created with code extracted
from PG::active
* PGLog::proc_replica_log is created with code extracted from
PG::proc_replica_log
* PGLog::write_log is created with code extracted from
PG::write_log
* PGLog::merge_old_entry replaces PG::merge_old_entry
The remove_snap argument is used to collect hobject_t
* PGLog::rewind_divergent_log replaces PG::rewind_divergent_log
The remove_snap argument is used to collect hobject_t
A new PG::rewind_divergent_log method is added to call
remove_snap_mapped_object on each of the remove_snap
elements
* PGLog::merge_log replaces PG::merge_log
The remove_snap argument is used to collect hobject_t
A new PG::merge_log method is added to call
remove_snap_mapped_object on each of the remove_snap
elements
* PGLog::write_log is created with code extracted from PG::write_log. A
non-static version is created for convenience but is a simple
wrapper.
* PGLog::read_log replaces PG::read_log. A non-static version is
created for convenience but is a simple wrapper.
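The remove_snap pattern used by merge_log and rewind_divergent_log can be sketched as follows. The class shapes are heavily simplified, and the rule deciding which objects land in remove_snap (a "divergent" flag on each entry) is invented for illustration; only the method names come from the description above.

```python
class PGLog:
    """Owns the log; exposes it read-only and collects side effects."""

    def __init__(self):
        self._log = []

    def get_log(self):
        # Read-only view, standing in for the const accessor.
        return tuple(self._log)

    def merge_log(self, new_entries, remove_snap):
        # PGLog has no access to PG state, so it only records which
        # objects need their snap mapping removed.
        for oid, divergent in new_entries:
            if divergent:
                remove_snap.append(oid)
            else:
                self._log.append(oid)


class PG:
    def __init__(self):
        self.pg_log = PGLog()
        self.removed = []

    def remove_snap_mapped_object(self, oid):
        self.removed.append(oid)

    def merge_log(self, new_entries):
        # Thin wrapper: delegate to PGLog, then act on what it collected.
        remove_snap = []
        self.pg_log.merge_log(new_entries, remove_snap)
        for oid in remove_snap:
            self.remove_snap_mapped_object(oid)


pg = PG()
pg.merge_log([("obj1", False), ("obj2", True), ("obj3", False)])
```

Keeping the PG-side work in the wrapper is what lets the PGLog methods be unit tested without a full PG.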