Sage Weil [Wed, 5 Jun 2013 14:55:46 +0000 (07:55 -0700)]
mon: upgrade auth database on leader
If we are the leader, and the auth database has not yet been upgraded,
do so. The upgrade consists of translating old-style (pre-v0.64) caps
to new-style caps (e.g., 'allow profile bootstrap-osd'). This happens
once and the conversion takes the form of a normal paxos transaction.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Dan Mick [Tue, 4 Jun 2013 20:13:02 +0000 (13:13 -0700)]
librados, rados.py: add rados_create2/init2
librados clients, particularly the ceph tool, need to be able
to specify a full 'name'; rados_create enforced 'client.<param>'
with no workaround. New interface. Python Rados().__init__ selects
appropriate create function depending on whether name or id is
supplied.
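The selection described above might look roughly like the following. This is a sketch only: `pick_create_call` is a hypothetical helper, the returned tuples merely stand in for the underlying librados calls, and rejecting the case where both arguments are given is an assumption.

```python
def pick_create_call(rados_id=None, name=None):
    """Return which create function to use and the argument to hand it.

    Hypothetical helper; the tuples stand in for the real librados calls.
    """
    if rados_id is not None and name is not None:
        # Assumption: supplying both is treated as a caller error.
        raise ValueError("specify either id or name, not both")
    if name is not None:
        # A full entity name such as 'mon.a' needs the new interface.
        return ("rados_create2", name)
    # Legacy path: rados_create enforces the 'client.<param>' form itself.
    return ("rados_create", rados_id)


full_name = pick_create_call(name="mon.a")
legacy_id = pick_create_call(rados_id="admin")
```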
Dan Mick [Mon, 3 Jun 2013 19:32:14 +0000 (12:32 -0700)]
ceph: backward-compatibility hack: blank line before JSON output
Many JSON commands (osd dump, et al.) used to print a status
line first before the actual output; this has been fixed, but there
are scripts/tools/etc. that expect it. A simple compatibility hack
is to output a blank line, which won't confuse properly-written
JSON parsers, but will allow the tools-with-workarounds to continue
to work.
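As an illustration of why the blank line is harmless, here is a small sketch using Python's standard json module. The `emit_json` helper and the payload are made up for the example; they are not taken from the ceph tool.

```python
import json


def emit_json(payload):
    """Return JSON output preceded by one blank line (hypothetical helper)."""
    return "\n" + json.dumps(payload)


output = emit_json({"epoch": 12, "osds": []})

# A properly written parser skips leading whitespace, blank line included.
parsed = json.loads(output)

# A tool with the old workaround drops the first line (formerly the
# status line) and parses the rest, which still works.
legacy = json.loads(output.split("\n", 1)[1])
```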
Dan Mick [Fri, 31 May 2013 04:58:46 +0000 (21:58 -0700)]
ceph: various cleanups
- make base class valid() do useful work
- remove valid from CephPoolname; pool need not exist for create
- add --user as alias for --id
- remove vestige of special --keyring handling
- be sure childargs is an empty list rather than None
- remove -- from childargs if present (to stop interpreting -- args)
- handle connection timeout cleanly
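The two childargs items could be sketched as below. `normalize_childargs` is a hypothetical name, and removing only the first `--` is an assumption about the intended behavior.

```python
def normalize_childargs(childargs):
    """Return child arguments as a list, never None, without the '--' marker."""
    # Treat None (no child arguments at all) as an empty list.
    args = list(childargs) if childargs else []
    # Drop the '--' separator so it is not passed on as an argument.
    # Assumption: only the first occurrence is removed.
    if "--" in args:
        args.remove("--")
    return args
```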
Dan Mick [Tue, 4 Jun 2013 04:06:34 +0000 (21:06 -0700)]
cmdparse, mon: add cmd_vartype_stringify for _allowed_command
cmd_vartype are not all strings, and need a type-variant function
to turn them into strings for authorization against caps. Use
boost::apply_visitor to get this behavior.
New parsing function to extract any known arguments from a vector
and return any unknowns; useful for ceph CLI to allow librados
first dibs on arguments so it doesn't have to reproduce the
argument recognition
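For readers unfamiliar with the visitor approach, a rough Python analogue of a type-variant stringify function is shown here, using functools.singledispatch in place of boost::apply_visitor. The set of handled types and the comma-joined form for lists are assumptions made for the sketch, not the actual cmd_vartype definition.

```python
from functools import singledispatch


@singledispatch
def stringify(value):
    # Fallback for types the sketch does not know about.
    raise TypeError("unsupported argument type: %s" % type(value).__name__)


@stringify.register(str)
def _(value):
    return value


@stringify.register(int)
def _(value):
    return str(value)


@stringify.register(float)
def _(value):
    return str(value)


@stringify.register(list)
def _(value):
    # Assumption: lists are flattened to a comma-separated string.
    return ",".join(stringify(v) for v in value)
```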
Dan Mick [Wed, 29 May 2013 01:38:16 +0000 (18:38 -0700)]
MDSMonitor, cmdparse: increase resiliency of bad cmd_getval()
MDSMonitor: check for and handle bad maxmds get
cmdparse.h: Use gcc demangler to print bad boost::variant typenames,
add backtrace in case of bad boost::variant get
Sage Weil [Sat, 1 Jun 2013 00:09:19 +0000 (17:09 -0700)]
mon: start lease timer from peon_init()
In the scenario:
- leader wins, peons lose
- leader sees it is too far behind on paxos and bootstraps
- leader tries to sync with someone, waits for a quorum of the others
- peons sit around forever waiting
The problem is that they never time out because paxos never issues a lease,
which is the normal timeout that lets them detect a leader failure.
Avoid this by starting the lease timeout as soon as we lose the election.
The timeout callback just does a bootstrap and does not rely on any other
state.
I see one possible danger here: there may be some "normal" cases where the
leader takes a long time to issue its first lease that we currently
tolerate, but won't with this new check in place. I hope that raising
the lease interval/timeout or reducing the allowed paxos drift will make
that a non-issue. If it is problematic, we will need a separate explicit
"i am alive" from the leader while it is getting ready to issue the lease
to prevent a live-lock.
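The timing behavior can be illustrated with a toy model. The `Peon` class, its method names other than peon_init, and the use of plain integers for time are all assumptions made for the sketch; it is not the monitor code.

```python
class Peon:
    """Toy peon that bootstraps if no lease arrives before the timeout."""

    def __init__(self, lease_timeout):
        self.lease_timeout = lease_timeout
        self.deadline = None
        self.bootstrapped = False

    def peon_init(self, now):
        # Arm the timer as soon as the election is lost,
        # rather than waiting for the first lease.
        self.deadline = now + self.lease_timeout

    def handle_lease(self, now):
        # Each lease from the leader pushes the deadline out again.
        self.deadline = now + self.lease_timeout

    def tick(self, now):
        # The timeout callback only bootstraps; it relies on no other state.
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            self.bootstrapped = True


stuck = Peon(lease_timeout=10)
stuck.peon_init(now=0)
stuck.tick(now=10)      # the leader never issued a lease

healthy = Peon(lease_timeout=10)
healthy.peon_init(now=0)
healthy.handle_lease(now=5)
healthy.tick(now=10)    # deadline moved to 15, so nothing fires
```

The second case also shows the danger mentioned above: a leader whose first lease arrives later than the timeout would push an otherwise healthy peon into a bootstrap.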
Sage Weil [Fri, 31 May 2013 05:52:21 +0000 (22:52 -0700)]
mon: discard messages from disconnected clients
If the client is not connected, discard the message. They will
reconnect and resend anyway, so there is no point in processing it
twice (now and later).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
- trim more at a time (by an order of magnitude)
- rename fields to paxos_trim_{min,max}; only trim when there are min items
that are trimmable, and trim at most max items at a time.
- adjust the paxos_service_trim_{min,max} values up by a factor of 2.
Since we are compacting every time we trim, adjusting these up means less
frequent compactions and less overall work for the monitor.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
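The min/max rule can be sketched as follows. The function name and the numbers in the examples are illustrative only and are not the configured defaults.

```python
def items_to_trim(first, last_trimmable, trim_min, trim_max):
    """Return how many items to trim in one pass (0 means skip this round)."""
    available = last_trimmable - first
    # Only trim once at least trim_min items are trimmable.
    if available < trim_min:
        return 0
    # Never trim more than trim_max items at a time.
    return min(available, trim_max)


too_few = items_to_trim(0, 100, trim_min=250, trim_max=500)
in_range = items_to_trim(0, 300, trim_min=250, trim_max=500)
capped = items_to_trim(0, 2000, trim_min=250, trim_max=500)
```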
Samuel Just [Mon, 15 Apr 2013 23:33:48 +0000 (16:33 -0700)]
PG: don't write out pg map epoch every handle_activate_map
We don't actually need to write out the pg map epoch on every
activate_map as long as:
a) the osd does not trim past the oldest pg map persisted
b) the pg does update the persisted map epoch from time
to time.
To that end, we now keep a reference to the last map persisted.
The OSD already does not trim past the oldest live OSDMapRef.
Second, handle_activate_map will trim if the difference between
the current map and the last_persisted_map is large enough.
Fixes: #4731
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 2c5a9f0e178843e7ed514708bab137def840ab89)
Conflicts:
src/common/config_opts.h
src/osd/PG.cc
- last_persisted_osdmap_ref gets set in the non-static
PG::write_info
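A toy model of the idea follows. It reads the description as "persist the map epoch only once the gap to the last persisted map is large enough", which is an interpretation; the class, the `max_epoch_lag` threshold, and the write counter are hypothetical.

```python
class PG:
    """Toy PG that persists its map epoch only when it has become stale."""

    def __init__(self, max_epoch_lag):
        self.max_epoch_lag = max_epoch_lag   # hypothetical threshold
        self.last_persisted_epoch = 0
        self.writes = 0

    def handle_activate_map(self, current_epoch):
        # Skip the write unless the persisted epoch has fallen far behind.
        if current_epoch - self.last_persisted_epoch > self.max_epoch_lag:
            self.last_persisted_epoch = current_epoch
            self.writes += 1


pg = PG(max_epoch_lag=10)
for epoch in range(1, 31):
    pg.handle_activate_map(epoch)
```

With a lag of 10, thirty map activations cause two writes (at epochs 11 and 22) instead of thirty.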
Danny Al-Gaaf [Fri, 31 May 2013 17:07:45 +0000 (19:07 +0200)]
mds/Server.cc: fix dereference after null check
CID 716927 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "diri->snaprealm" to function
"SnapRealm::resolve_snapname(std::string const &
Make sure not to dereference diri->snaprealm.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 14:13:47 +0000 (16:13 +0200)]
mds/Server.cc: fix dereference after null check
CID 716926 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing "mdr" to function
"Server::apply_allocated_inos(MDRequest *)", which dereferences
null "mdr->session".
Add assert for 'mdr' and assert for session in apply_allocated_inos().
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:51:48 +0000 (15:51 +0200)]
mds/Server.cc: fix dereference after null check
Add assert to fix:
CID 716925 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "straydn" to function
"CDentry::get_dir() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:46:53 +0000 (15:46 +0200)]
mds/Migrator.cc: fix dereference after null check
Add asserts to check for 'dir' to fix:
CID 716924 (#1-5 of 5): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "dir" to function "operator
<<(std::ostream &, CDir &)", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:37:31 +0000 (15:37 +0200)]
mds/Migrator.cc: fix dereference after null check
Add assert for 'le' to fix:
CID 716923 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "le" to function
"LogEvent::get_start_off() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:24:39 +0000 (15:24 +0200)]
mds/MDCache.cc: fix dereference null return value
Add assert to fix:
CID 716994 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir" when
calling "CDir::mark_dirty(version_t, LogSegment *)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:14:28 +0000 (15:14 +0200)]
mds/MDCache.cc: fix dereference null return value
CID 716993 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "operator <<(std::ostream &, CInode &)".
Add assert for 'in'.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 29 May 2013 13:04:13 +0000 (15:04 +0200)]
mds/MDCache.cc: fix dereference after null check
Add assert for 'parent' before calling assert on parent->is_auth().
CID 716922 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "parent" to function
"MDSCacheObject::is_auth() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:56:21 +0000 (18:56 +0200)]
mds/MDCache.cc: fix dereference after null check
CID 716921 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "dir" to function
"operator <<(std::ostream &, CDir &)", which dereferences it.
CID 716992 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir"
when calling "MDSCacheObject::is_auth() const".
Add assert for 'dir' before using it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:49:30 +0000 (18:49 +0200)]
mds/Locker.cc: fix dereference after null check
CID 716919 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "cap" to function
"Capability::inc_suppress()", which dereferences it.
Check for 'cap' before using it, as in other places of the function.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:42:20 +0000 (18:42 +0200)]
mds/Locker.cc: fix dereference after null check
CID 716918 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "in" to function
"MDSCacheObject::state_test(unsigned int) const", which
dereferences it.
Add assert that 'in' is not NULL before using it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:39:43 +0000 (18:39 +0200)]
mds/Locker.cc: fix dereference after null check
CID 716917 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "in" to function
"MDSCacheObject::state_test(unsigned int) const", which
dereferences it.
Add assert that 'in' is not NULL before using it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 28 May 2013 12:41:30 +0000 (14:41 +0200)]
bench/dumb_backend.cc: check return value of lseek()
CID 743395 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
check_return: Calling function "lseek(fd, offset, 0)" without checking
return value. This library function may fail and return an error code.
unchecked_value: No check of the return value of "lseek(fd, offset, 0)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 28 May 2013 12:26:29 +0000 (14:26 +0200)]
bench/dumb_backend.cc: check return value of posix_fadvise()
CID 743396 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
check_return: Calling function "posix_fadvise(fd, offset, bl->length(), 4)"
without checking return value. This library function may fail and return
an error code.
unchecked_value: No check of the return value of
"posix_fadvise(fd, offset, bl->length(), 4)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 28 May 2013 12:08:09 +0000 (14:08 +0200)]
small_io_bench_fs.cc: check return value of FileStore::mkfs/mount()
CID 743398 (#1 of 1): Unchecked return value (CHECKED_RETURN)
check_return: Calling function "FileStore::mount()" without
checking return value (as is done elsewhere 4 out of 5 times).
unchecked_value: No check of the return value of "fs.FileStore::mount()"
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 31 May 2013 16:28:07 +0000 (18:28 +0200)]
mds/Locker.cc: fix explicit null dereferenced
CID 716916 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
var_deref_model: Passing null pointer "in" to function
"CInode::is_head()", which dereferences it.
Add assert.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 28 May 2013 10:55:19 +0000 (12:55 +0200)]
mds/Server.cc: fix explicit null dereferenced
CID 716928 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
var_deref_model: Passing null pointer "session" to function
"Session::trim_completed_requests(tid_t)", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Yehuda Sadeh [Thu, 30 May 2013 19:58:11 +0000 (12:58 -0700)]
rgw: only append prefetched data if reading from head
Fixes: #5209
Backport: bobtail, cuttlefish
If the head object wrongfully contains data, but according to the
manifest we don't read from the head, we shouldn't copy the prefetched
data. Also fix the length calculation for that data.
Yehuda Sadeh [Thu, 30 May 2013 16:34:21 +0000 (09:34 -0700)]
rgw: don't copy object idtag when copying object
Fixes: #5204
When copying object we ended up also copying the original
object idtag which overrode the newly generated one. When
refcount put is called with the wrong idtag the count
doesn't go down.
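The effect of a mismatched tag can be shown with a toy tag-based reference counter. This is a simplified model invented for illustration, not the rgw refcount implementation; the class and the tag strings are hypothetical.

```python
class RefCountedObject:
    """Toy object whose references are tracked by tag."""

    def __init__(self):
        self.refs = set()

    def get(self, tag):
        self.refs.add(tag)

    def put(self, tag):
        # An unknown tag matches nothing, so no reference is released.
        self.refs.discard(tag)
        return len(self.refs)


tail = RefCountedObject()
tail.get("new-tag")                   # reference taken with the generated tag
after_wrong = tail.put("orig-tag")    # copied original idtag: no match
after_right = tail.put("new-tag")     # correct tag releases the reference
```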
Alex Elder [Thu, 30 May 2013 23:10:46 +0000 (18:10 -0500)]
rbd/kernel.sh: quit looking for snapshot sysfs entries
The sysfs entries for snapshots went away a while ago, and this
script used them to verify sizes matched what was expected.
Instead, look at the mapped size of the snapshot in the places
that used to look for the image's snapshot sysfs files.
Also, switch over to using "udevadm settle" rather than a delay to
wait for udev to do its thing. Insert them at more appropriate
places--right after "rbd map" commands and before and after the
"rbd unmap" calls.
Stop doing the manual refresh calls as well. The osd will trigger
refreshes whenever the image size or snapshot context changes.
Finally, the cleanup routine is called initially, when there really
isn't expected to be anything to clean up. Change the rbd commands
to run there conditionally, only if the target of the command
already exists.
Loic Dachary [Wed, 22 May 2013 12:14:26 +0000 (14:14 +0200)]
move log, ondisklog, missing from PG to PGLog
PG::log, PG::ondisklog, PG::missing are moved from PG to a new PGLog
class and are made protected data members. It is a preliminary step
before writing unit tests to cover the methods that have side effects
on these data members and define a clean PGLog API. It improves
encapsulation and does not change any of the logic already in
place.
Possible issues:
* an additional reference (PG->PGLog->IndexedLog instead of
PG->IndexedLog for instance) is introduced: is it optimized?
* rewriting log.log into pg_log.get_log().log affects the readability
but should be optimized and have no impact on performance
The guidelines followed for this patch are:
* const access to the data members are preserved, no attempt is made
to define accessors
* all non const methods are in PGLog, no access to non const methods of
PGLog::log, PGLog::ondisklog and PGLog::missing are provided
* when methods are moved from PG to PGLog the change to their
implementation is restricted to the minimum.
* the PG::OndiskLog and PG::IndexedLog sub classes are moved
to PGLog sub classes unmodified and remain public
A const version of the pg_log_t::find_entry method was added.
A const accessor is provided for PGLog::get_log, PGLog::get_missing,
PGLog::get_ondisklog but no non-const accessor.
Arguments are added to most of the methods moved from PG to PGLog so
that they can get access to PG data members such as info or log_oid.
The PGLog methods are sorted according to the data member they modify.
//////////////////// missing ////////////////////
* The pg_missing_t::{got,have,need,add,rm} methods are wrapped as
PGLog::missing_{got,have,need,add,rm}
//////////////////// log ////////////////////
* PGLog::get_tail, PGLog::get_head getters are created
* PGLog::set_tail, PGLog::set_head, PGLog::set_last_requested setters
are created
* PGLog::index, PGLog::unindex, PGLog::add wrappers,
PGLog::reset_recovery_pointers are created
* PGLog::claim_log is created with code extracted from
PG::RecoveryState::Stray::react.
* PGLog::split_into is created with code extracted from
PG::split_into.
* PGLog::recover_got is created with code extracted from
ReplicatedPG::recover_got.
* PGLog::activate_not_complete is created with code extracted
from PG::active
* PGLog::proc_replica_log is created with code extracted from
PG::proc_replica_log
* PGLog::write_log is created with code extracted from
PG::write_log
* PGLog::merge_old_entry replaces PG::merge_old_entry
The remove_snap argument is used to collect hobject_t
* PGLog::rewind_divergent_log replaces PG::rewind_divergent_log
The remove_snap argument is used to collect hobject_t
A new PG::rewind_divergent_log method is added to call
remove_snap_mapped_object on each of the remove_snap
elements
* PGLog::merge_log replaces PG::merge_log
The remove_snap argument is used to collect hobject_t
A new PG::merge_log method is added to call
remove_snap_mapped_object on each of the remove_snap
elements
* PGLog::write_log is created with code extracted from PG::write_log. A
non-static version is created for convenience but is a simple
wrapper.
* PGLog::read_log replaces PG::read_log. A non-static version is
created for convenience but is a simple wrapper.
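The remove_snap pattern used by rewind_divergent_log and merge_log can be sketched in miniature. Everything here is a simplification: log entries are plain integers standing in for versions, the read-only tuple stands in for a const accessor, and the bodies are invented for illustration rather than taken from the patch.

```python
class PGLog:
    """Toy log owner: mutation goes through methods, reads are read-only."""

    def __init__(self):
        self._log = []

    def get_log(self):
        # Read-only view, standing in for a const accessor.
        return tuple(self._log)

    def add(self, entry):
        self._log.append(entry)

    def rewind_divergent_log(self, newhead, remove_snap):
        # Collect dropped entries in remove_snap instead of touching PG state.
        while self._log and self._log[-1] > newhead:
            remove_snap.append(self._log.pop())


class PG:
    def __init__(self):
        self.pg_log = PGLog()
        self.removed = []

    def remove_snap_mapped_object(self, obj):
        self.removed.append(obj)

    def rewind_divergent_log(self, newhead):
        # Thin wrapper: delegate to PGLog, then act on what it collected.
        remove_snap = []
        self.pg_log.rewind_divergent_log(newhead, remove_snap)
        for obj in remove_snap:
            self.remove_snap_mapped_object(obj)


pg = PG()
for version in range(1, 6):
    pg.pg_log.add(version)
pg.rewind_divergent_log(3)
```

Keeping the PG side effects in the wrapper is what lets the PGLog method be exercised in a unit test without a full PG.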
Sage Weil [Thu, 30 May 2013 21:36:41 +0000 (14:36 -0700)]
mon: make compaction bounds overlap
When we trim items N to M, compact over range (N-1) to M so that the
items in the queue will share bounds and get merged. There is no harm in
compacting over a larger range here when the lower bound is a key that
doesn't exist anyway.
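A small sketch of why the overlap helps is given below, assuming a compaction queue that merges a new range into the previous one when their bounds touch. The queue representation and `queue_compaction` are hypothetical.

```python
def queue_compaction(queue, first, last):
    """Queue a compaction for trimmed items first..last, merging if possible."""
    # Widen the lower bound by one so consecutive trims share a bound.
    start, end = first - 1, last
    if queue and queue[-1][1] == start:
        # Shared bound: extend the previous range instead of adding a new one.
        queue[-1] = (queue[-1][0], end)
    else:
        queue.append((start, end))


queue = []
queue_compaction(queue, 1, 100)     # queued as (0, 100)
queue_compaction(queue, 101, 200)   # (100, 200) merges into (0, 200)
```

Without the widened lower bound the second range would be (101, 200), which shares no bound with (1, 100), so the two would stay separate.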