git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years ago.gitignore: add directory from coverity tools 337/head
Danny Al-Gaaf [Wed, 29 May 2013 14:25:50 +0000 (16:25 +0200)]
.gitignore: add directory from coverity tools

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Server.cc: fix dereference after null check
Danny Al-Gaaf [Fri, 31 May 2013 17:07:45 +0000 (19:07 +0200)]
mds/Server.cc: fix dereference after null check

CID 716927 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "diri->snaprealm" to function
  "SnapRealm::resolve_snapname(std::string const &

Make sure not to dereference diri->snaprealm.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Server.cc: fix dereference after null check
Danny Al-Gaaf [Wed, 29 May 2013 14:13:47 +0000 (16:13 +0200)]
mds/Server.cc: fix dereference after null check

CID 716926 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing "mdr" to function
  "Server::apply_allocated_inos(MDRequest *)", which dereferences
  null "mdr->session".

Add assert for 'mdr' and assert for session in apply_allocated_inos().

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
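
The assert-based fixes in this series share one pattern: state the non-null precondition before the first dereference. A minimal sketch, assuming hypothetical stand-in types (`Session`, `MDRequest` and the function body below are illustrative, not the Ceph definitions):

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real Ceph classes.
struct Session { int pending_inos = 0; };
struct MDRequest { Session *session = nullptr; };

// State the precondition with asserts before the first dereference,
// so a null pointer fails at the check instead of somewhere deeper.
int apply_allocated_inos_sketch(MDRequest *mdr) {
  assert(mdr != nullptr);
  Session *session = mdr->session;
  assert(session != nullptr);
  return session->pending_inos;
}
```

Note that the standard assert() is compiled out when NDEBUG is defined, so this only documents and checks the invariant in builds that keep asserts enabled.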
12 years agomds/Server.cc: fix dereference after null check
Danny Al-Gaaf [Wed, 29 May 2013 13:51:48 +0000 (15:51 +0200)]
mds/Server.cc: fix dereference after null check

Add assert to fix:

CID 716925 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "straydn" to function
  "CDentry::get_dir() const", which dereferences it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Migrator.cc: fix dereference after null check
Danny Al-Gaaf [Wed, 29 May 2013 13:46:53 +0000 (15:46 +0200)]
mds/Migrator.cc: fix dereference after null check

Add asserts to check for 'dir' to fix:

CID 716924 (#1-5 of 5): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "dir" to function "operator
  <<(std::ostream &, CDir &)", which dereferences it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Migrator.cc: fix dereference after null check
Danny Al-Gaaf [Wed, 29 May 2013 13:37:31 +0000 (15:37 +0200)]
mds/Migrator.cc: fix dereference after null check

Add assert for 'le' to fix:

CID 716923 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "le" to function
  "LogEvent::get_start_off() const", which dereferences it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/MDCache.cc: fix dereference null return value
Danny Al-Gaaf [Wed, 29 May 2013 13:24:39 +0000 (15:24 +0200)]
mds/MDCache.cc: fix dereference null return value

Add assert to fix:

CID 716994 (#1 of 1): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "dir" when
  calling "CDir::mark_dirty(version_t, LogSegment *)".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/MDCache.cc: fix dereference null return value
Danny Al-Gaaf [Wed, 29 May 2013 13:14:28 +0000 (15:14 +0200)]
mds/MDCache.cc: fix dereference null return value

CID 716993 (#1 of 2): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "in" when
  calling "operator <<(std::ostream &, CInode &)".

Add assert for 'in'.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/MDCache.cc: fix dereference after null check
Danny Al-Gaaf [Wed, 29 May 2013 13:04:13 +0000 (15:04 +0200)]
mds/MDCache.cc: fix dereference after null check

Add an assert for 'parent' before calling assert on parent->is_auth().

CID 716922 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "parent" to function
  "MDSCacheObject::is_auth() const", which dereferences it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/MDCache.cc: fix dereference after null check
Danny Al-Gaaf [Fri, 31 May 2013 16:56:21 +0000 (18:56 +0200)]
mds/MDCache.cc: fix dereference after null check

CID 716921 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "dir" to function
  "operator <<(std::ostream &, CDir &)", which dereferences it.

CID 716992 (#1 of 1): Dereference null return value (NULL_RETURNS)
  dereference: Dereferencing a pointer that might be null "dir"
  when calling "MDSCacheObject::is_auth() const".

Add an assert for 'dir' before using it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Locker.cc: fix dereference after null check
Danny Al-Gaaf [Fri, 31 May 2013 16:49:30 +0000 (18:49 +0200)]
mds/Locker.cc: fix dereference after null check

CID 716919 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "cap" to function
  "Capability::inc_suppress()", which dereferences it.

Check 'cap' before using it, as in other places in the function.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
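
Where a null pointer is a legitimate state rather than a bug, a guard at each use is the alternative to an assert. A small sketch, assuming a hypothetical `Capability` stand-in and helper name (neither is taken from the Ceph sources):

```cpp
#include <cassert>

// Hypothetical stand-in; not the real Ceph Capability class.
struct Capability {
  int suppress = 0;
  void inc_suppress() { ++suppress; }
  void dec_suppress() { --suppress; }
};

// When null is a valid state, guard each use instead of asserting.
// Returns true if a capability was present and its counter was touched.
bool with_suppressed_cap(Capability *cap) {
  if (cap)
    cap->inc_suppress();
  // ... work that may run with or without a capability ...
  if (cap)
    cap->dec_suppress();
  return cap != nullptr;
}
```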
12 years agomds/Locker.cc: fix dereference after null check
Danny Al-Gaaf [Fri, 31 May 2013 16:42:20 +0000 (18:42 +0200)]
mds/Locker.cc: fix dereference after null check

CID 716918 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "in" to function
  "MDSCacheObject::state_test(unsigned int) const", which
  dereferences it.

Add assert for 'in == NULL' before using it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Locker.cc: fix dereference after null check
Danny Al-Gaaf [Fri, 31 May 2013 16:39:43 +0000 (18:39 +0200)]
mds/Locker.cc: fix dereference after null check

CID 716917 (#1 of 1): Dereference after null check (FORWARD_NULL)
  var_deref_model: Passing null pointer "in" to function
  "MDSCacheObject::state_test(unsigned int) const", which
  dereferences it.

Add assert for in == NULL before using it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agobench/dumb_backend.cc: check return value of lseek()
Danny Al-Gaaf [Tue, 28 May 2013 12:41:30 +0000 (14:41 +0200)]
bench/dumb_backend.cc: check return value of lseek()

CID 743395 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
  check_return: Calling function "lseek(fd, offset, 0)" without checking
   return value. This library function may fail and return an error code.
  unchecked_value: No check of the return value of "lseek(fd, offset, 0)".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
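
A checked lseek() call can be sketched as follows. The helper name and the negative-errno return convention are assumptions for illustration; the `0` in the Coverity report corresponds to SEEK_SET on common platforms.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <unistd.h>

// Seek to an absolute offset; return 0 on success or -errno on failure.
// lseek() signals failure by returning (off_t)-1 and setting errno.
int seek_to(int fd, off_t offset) {
  off_t r = ::lseek(fd, offset, SEEK_SET);
  if (r == static_cast<off_t>(-1))
    return -errno;
  return 0;
}
```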
12 years agobench/dumb_backend.cc: check return value of posix_fadvise()
Danny Al-Gaaf [Tue, 28 May 2013 12:26:29 +0000 (14:26 +0200)]
bench/dumb_backend.cc: check return value of posix_fadvise()

CID 743396 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
  check_return: Calling function "posix_fadvise(fd, offset, bl->length(), 4)"
   without checking return value. This library function may fail and return
   an error code.
  unchecked_value: No check of the return value of
   "posix_fadvise(fd, offset, bl->length(), 4)".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
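
Unlike lseek(), posix_fadvise() returns the error number directly and does not set errno, which is easy to get wrong when adding the check. A sketch under stated assumptions: the helper name is hypothetical, and the advice value `4` in the report is presumed to be POSIX_FADV_DONTNEED, as it is on most Linux targets.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>

// Advise the kernel that a byte range is no longer needed.
// posix_fadvise() returns 0 on success or a positive error number,
// so normalize it to the negative-errno convention used in this sketch.
int drop_cache_range(int fd, off_t offset, off_t len) {
  int r = ::posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
  return r == 0 ? 0 : -r;
}
```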
12 years agosmall_io_bench_fs.cc: check return value of FileStore::mkfs/mount()
Danny Al-Gaaf [Tue, 28 May 2013 12:08:09 +0000 (14:08 +0200)]
small_io_bench_fs.cc: check return value of FileStore::mkfs/mount()

CID 743398 (#1 of 1): Unchecked return value (CHECKED_RETURN)
  check_return: Calling function "FileStore::mount()" without
   checking return value (as is done elsewhere 4 out of 5 times).
  unchecked_value: No check of the return value of "fs.FileStore::mount()"

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Locker.cc: fix explicit null dereferenced
Danny Al-Gaaf [Fri, 31 May 2013 16:28:07 +0000 (18:28 +0200)]
mds/Locker.cc: fix explicit null dereferenced

CID 716916 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
  var_deref_model: Passing null pointer "in" to function
  "CInode::is_head()", which dereferences it.

Add assert.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomds/Server.cc: fix explicit null dereferenced
Danny Al-Gaaf [Tue, 28 May 2013 10:55:19 +0000 (12:55 +0200)]
mds/Server.cc: fix explicit null dereferenced

CID 716928 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
  var_deref_model: Passing null pointer "session" to function
  "Session::trim_completed_requests(tid_t)", which dereferences it.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoMerge branch 'wip-5046'
Samuel Just [Fri, 31 May 2013 05:39:12 +0000 (22:39 -0700)]
Merge branch 'wip-5046'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agodoc: Updated to reflect glossary usage.
John Wilkins [Fri, 31 May 2013 03:28:22 +0000 (20:28 -0700)]
doc: Updated to reflect glossary usage.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated title and syntax to reflect glossary usage.
John Wilkins [Fri, 31 May 2013 03:27:42 +0000 (20:27 -0700)]
doc: Updated title and syntax to reflect glossary usage.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated to reflect glossary usage.
John Wilkins [Fri, 31 May 2013 03:27:01 +0000 (20:27 -0700)]
doc: Updated to reflect glossary usage.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated title to reflect glossary usage.
John Wilkins [Fri, 31 May 2013 03:26:03 +0000 (20:26 -0700)]
doc: Updated title to reflect glossary usage.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated conf with ServerAlias for S3 subdomains.
John Wilkins [Fri, 31 May 2013 03:25:25 +0000 (20:25 -0700)]
doc: Updated conf with ServerAlias for S3 subdomains.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated object storage quick start for S3-style subdomains.
John Wilkins [Fri, 31 May 2013 03:24:55 +0000 (20:24 -0700)]
doc: Updated object storage quick start for S3-style subdomains.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Updated text with new glossary terms.
John Wilkins [Fri, 31 May 2013 03:22:58 +0000 (20:22 -0700)]
doc: Updated text with new glossary terms.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Removed FAQ from the index.
John Wilkins [Fri, 31 May 2013 03:21:48 +0000 (20:21 -0700)]
doc: Removed FAQ from the index.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Removed FAQ doc. It's now in the wiki.
John Wilkins [Fri, 31 May 2013 03:21:20 +0000 (20:21 -0700)]
doc: Removed FAQ doc. It's now in the wiki.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agorbd/kernel.sh: quit looking for snapshot sysfs entries
Alex Elder [Thu, 30 May 2013 23:10:46 +0000 (18:10 -0500)]
rbd/kernel.sh: quit looking for snapshot sysfs entries

The sysfs entries for snapshots went away a while ago, and this
script used them to verify sizes matched what was expected.

Instead, look at the mapped size of the snapshot in the places
that used to look for the image's snapshot sysfs files.

Also, switch over to using "udevadm settle" rather than a delay to
wait for udev to do its thing.  Insert them at more appropriate
places--right after "rbd map" commands and before and after the
"rbd unmap" calls.

Stop doing the manual refresh calls as well.  The osd will trigger
refreshes whenever the image size or snapshot context changes.

Finally, the cleanup routine is called initially, when there really
isn't expected to be anything to clean up.  Change the rbd commands
to run there conditionally, only if the target of the command
already exists.

Signed-off-by: Alex Elder <elder@inktank.com>
12 years agomove log, ondisklog, missing from PG to PGLog
Loic Dachary [Wed, 22 May 2013 12:14:26 +0000 (14:14 +0200)]
move log, ondisklog, missing from PG to PGLog

PG::log, PG::ondisklog, PG::missing are moved from PG to a new PGLog
class and are made protected data members. It is a preliminary step
before writing unit tests to cover the methods that have side effects
on these data members and define a clean PGLog API. It improves
encapsulation and does not change any of the logic already in
place.

Possible issues:

* an additional reference (PG->PGLog->IndexedLog instead of
  PG->IndexedLog, for instance) is introduced: is it optimized?

* rewriting log.log into pg_log.get_log().log affects readability
  but should be optimized and have no impact on performance

The guidelines followed for this patch are:

* const access to the data members is preserved, no attempt is made
  to define accessors

* all non-const methods are in PGLog, no access to non-const methods of
  PGLog::log, PGLog::logondisk and PGLog::missing is provided

* when methods are moved from PG to PGLog the change to their
  implementation is restricted to the minimum.

* the PG::OndiskLog and PG::IndexedLog sub classes are moved
  to PGLog sub classes unmodified and remain public

A const version of the pg_log_t::find_entry method was added.

A const accessor is provided for PGLog::get_log, PGLog::get_missing,
PGLog::get_ondisklog but no non-const accessor.
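
The encapsulation described here (protected data, const getters, named mutators) can be sketched in miniature. The class below is a hypothetical stand-in with simplified types; only the method naming scheme follows the commit message.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical miniature of the layout; not the real Ceph definitions.
class PGLogSketch {
public:
  using Missing = std::map<std::string, int>;  // object name -> needed version

  // Read-only view for callers; no non-const accessor is offered.
  const Missing &get_missing() const { return missing; }

  // Mutation goes through named wrappers instead of the raw member.
  void missing_add(const std::string &oid, int need) { missing[oid] = need; }
  void missing_rm(const std::string &oid) { missing.erase(oid); }

protected:
  Missing missing;
};
```

Because every write passes through a wrapper, a unit test can exercise the side effects without reaching into the data member.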

Arguments are added to most of the methods moved from PG to PGLog so
that they can get access to PG data members such as info or log_oid.

The PGLog methods are sorted according to the data member they modify.

//////////////////// missing ////////////////////

* The pg_missing_t::{got,have,need,add,rm} methods are wrapped as
  PGLog::missing_{got,have,need,add,rm}

//////////////////// log ////////////////////

* PGLog::get_tail, PGLog::get_head getters are created

* PGLog::set_tail, PGLog::set_head, PGLog::set_last_requested setters
  are created

* PGLog::index, PGLog::unindex, PGLog::add wrappers,
  PGLog::reset_recovery_pointers are created

* PGLog::clear_info_log replaces PG::clear_info_log

* PGLog::trim replaces PG::trim

//////////////////// log & missing ////////////////////

* PGLog::claim_log is created with code extracted from
  PG::RecoveryState::Stray::react.

* PGLog::split_into is created with code extracted from
  PG::split_into.

* PGLog::recover_got is created with code extracted from
  ReplicatedPG::recover_got.

* PGLog::activate_not_complete is created with code extracted
  from PG::active

* PGLog::proc_replica_log is created with code extracted from
  PG::proc_replica_log

* PGLog::write_log is created with code extracted from
  PG::write_log

* PGLog::merge_old_entry replaces PG::merge_old_entry
  The remove_snap argument is used to collect hobject_t

* PGLog::rewind_divergent_log replaces PG::rewind_divergent_log
  The remove_snap argument is used to collect hobject_t
  A new PG::rewind_divergent_log method is added to call
  remove_snap_mapped_object on each of the remove_snap
  elements

* PGLog::merge_log replaces PG::merge_log
  The remove_snap argument is used to collect hobject_t
  A new PG::merge_log method is added to call
  remove_snap_mapped_object on each of the remove_snap
  elements

* PGLog::write_log is created with code extracted from PG::write_log. A
  non-static version is created for convenience but is a simple
  wrapper.

* PGLog::read_log replaces PG::read_log. A non-static version is
  created for convenience but is a simple wrapper.

* PGLog::read_log_old replaces PG::read_log_old.
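
The remove_snap arrangement in the list above (the log layer collects objects, the PG layer acts on them) can be sketched as follows. All types are hypothetical stand-ins: `ObjectId` replaces hobject_t, and the bodies are illustrative only.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using ObjectId = std::string;  // stand-in for hobject_t

// Log layer: knows nothing about snap mapping; it only reports which
// objects need cleanup through the remove_snap output argument.
struct LogSketch {
  void merge_log(const std::vector<ObjectId> &divergent,
                 std::set<ObjectId> &remove_snap) {
    for (const auto &oid : divergent)
      remove_snap.insert(oid);
  }
};

// PG layer: calls the log, then acts on every collected object.
struct PGSketch {
  LogSketch log;
  std::vector<ObjectId> removed;  // records calls, for illustration

  void remove_snap_mapped_object(const ObjectId &oid) { removed.push_back(oid); }

  void merge_log(const std::vector<ObjectId> &divergent) {
    std::set<ObjectId> remove_snap;
    log.merge_log(divergent, remove_snap);
    for (const auto &oid : remove_snap)
      remove_snap_mapped_object(oid);
  }
};
```

Passing a collection out of the log layer keeps it free of PG dependencies, which fits the stated goal of unit-testing the log methods in isolation.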

http://tracker.ceph.com/issues/5046 refs #5046

Signed-off-by: Loic Dachary <loic@dachary.org>
12 years agoos/WBThrottle: remove asserts in clear()
Samuel Just [Thu, 30 May 2013 22:27:27 +0000 (15:27 -0700)]
os/WBThrottle: remove asserts in clear()

cur_ios, etc. may not be zero due to an in-progress
flush.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agodoc: note openstack changes for Grizzly
Josh Durgin [Thu, 30 May 2013 21:17:35 +0000 (14:17 -0700)]
doc: note openstack changes for Grizzly

These are just for the cinder configuration, nothing else changed.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoAdded -r option to usage
Christophe Courtaut [Wed, 29 May 2013 09:07:24 +0000 (11:07 +0200)]
Added -r option to usage

Added the -r option, which starts radosgw and apache2 to access it,
to the usage message.

Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
12 years agorbd/concurrent.sh: probe rbd module at start
Alex Elder [Thu, 30 May 2013 15:10:16 +0000 (10:10 -0500)]
rbd/concurrent.sh: probe rbd module at start

There's no guarantee the rbd module is loaded when this script is
run, so add a line that loads it if necessary.

Signed-off-by: Alex Elder <elder@inktank.com>
12 years agoMerge pull request #331 from ceph/wip-osd-interfacecheck
Sage Weil [Thu, 30 May 2013 05:45:37 +0000 (22:45 -0700)]
Merge pull request #331 from ceph/wip-osd-interfacecheck

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge branch 'next'
Sage Weil [Thu, 30 May 2013 05:44:40 +0000 (22:44 -0700)]
Merge branch 'next'

12 years agoosd: wait for healthy pings from peers in waiting-for-healthy state 331/head
Sage Weil [Wed, 29 May 2013 20:26:45 +0000 (13:26 -0700)]
osd: wait for healthy pings from peers in waiting-for-healthy state

If we are (wrongly) marked down, we need to go into the waiting-for-healthy
state and verify that our network interfaces are working before trying to
rejoin the cluster.

 - make _is_healthy() check require positive proof of pings working
 - do heartbeat checks and updates in this state
 - reset the random peers every heartbeat_interval, in case we keep picking
   bad ones

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: distinguish between definitely healthy and definitely not unhealthy
Sage Weil [Wed, 29 May 2013 20:15:41 +0000 (13:15 -0700)]
osd: distinguish between definitely healthy and definitely not unhealthy

is_unhealthy() will assume they are healthy for some period after we
send our first ping attempt.  is_healthy() is now a strict check that we
know they are healthy.

Switch the failure report check to use is_unhealthy(); use is_healthy()
everywhere else, including the waiting-for-healthy pre-boot checks.

Signed-off-by: Sage Weil <sage@inktank.com>
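
The difference between the strict and the lenient check can be sketched with a simplified heartbeat record. The struct, its fields and the use of plain seconds are assumptions for illustration; `cutoff` stands for "now minus the grace period".

```cpp
#include <cassert>

// Hypothetical heartbeat record; times are plain seconds, 0 means "never".
struct HeartbeatPeerSketch {
  double first_tx = 0;  // when we first pinged this peer
  double last_rx = 0;   // when we last got a reply

  // Strict: true only with positive proof, i.e. a reply newer than cutoff.
  bool is_healthy(double cutoff) const { return last_rx > cutoff; }

  // Lenient: a peer with no reply yet gets the benefit of the doubt
  // until its first ping is older than cutoff.
  bool is_unhealthy(double cutoff) const {
    if (last_rx > cutoff)
      return false;  // recent reply
    if (last_rx == 0 && (first_tx == 0 || first_tx > cutoff))
      return false;  // still within the initial grace window
    return true;
  }
};
```

A freshly added peer is therefore neither healthy nor unhealthy for a while, which is why failure reports and pre-boot checks need different predicates.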
12 years agoosd: remove down hb peers
Sage Weil [Wed, 29 May 2013 19:24:28 +0000 (12:24 -0700)]
osd: remove down hb peers

If a (say, random) peer goes down, filter it out.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: only add pg peers if active
Sage Weil [Wed, 29 May 2013 19:24:04 +0000 (12:24 -0700)]
osd: only add pg peers if active

We will soon be in this method for the waiting-for-healthy state.  As
a consequence, we need to remove any down peers.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: factor out _remove_heartbeat_peer
Sage Weil [Wed, 29 May 2013 19:16:28 +0000 (12:16 -0700)]
osd: factor out _remove_heartbeat_peer

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: augment osd heartbeat peers with neighbors and randoms, to up some min
Sage Weil [Wed, 29 May 2013 18:27:38 +0000 (11:27 -0700)]
osd: augment osd heartbeat peers with neighbors and randoms, to up some min

- always include our neighbors to ensure we have a fully-connected
  graph
- include some random neighbors to get at least some min number of peers.

Signed-off-by: Sage Weil <sage@inktank.com>
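
One way to read the two rules above is sketched below. The function name, the ring-style neighbor rule (next and previous id) and the seeded generator are all assumptions made for illustration, not the OSD's actual selection code.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>

// Build a heartbeat peer set for OSD `self` in a cluster of ids 0..num_osds-1.
std::set<int> pick_heartbeat_peers(int self, int num_osds,
                                   const std::set<int> &pg_peers,
                                   std::size_t min_peers, unsigned seed) {
  std::set<int> peers(pg_peers);
  peers.erase(self);
  if (num_osds < 2)
    return peers;

  // Always add the next and previous ids so the graph stays connected.
  peers.insert((self + 1) % num_osds);
  peers.insert((self + num_osds - 1) % num_osds);

  // Top up with random picks until we reach the minimum or run out of OSDs.
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> dist(0, num_osds - 1);
  const std::size_t limit = static_cast<std::size_t>(num_osds - 1);
  while (peers.size() < min_peers && peers.size() < limit) {
    int candidate = dist(rng);
    if (candidate != self)
      peers.insert(candidate);
  }
  return peers;
}
```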
12 years agoosd: initialize new_state field when we use it
Sage Weil [Wed, 29 May 2013 23:50:04 +0000 (16:50 -0700)]
osd: initialize new_state field when we use it

If we use operator[] on a new int field its value is undefined; avoid
reading it or using |= et al until we initialize it.

Fixes: #4967
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
12 years agoMerge branch 'wip_osd_throttle'
Samuel Just [Wed, 29 May 2013 22:06:18 +0000 (15:06 -0700)]
Merge branch 'wip_osd_throttle'

Fixes: #4782
Reviewed-by: Sage Weil
12 years agoWBThrottle: add some comments and some asserts
Samuel Just [Wed, 29 May 2013 22:05:51 +0000 (15:05 -0700)]
WBThrottle: add some comments and some asserts

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoWBThrottle: rename replica nocache
Samuel Just [Wed, 29 May 2013 22:05:34 +0000 (15:05 -0700)]
WBThrottle: rename replica nocache

We may want to influence the caching behavior for other
reasons.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: move health checks into a single helper
Sage Weil [Mon, 27 May 2013 22:27:59 +0000 (15:27 -0700)]
osd: move health checks into a single helper

For now we still only look at the internal heartbeats.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: avoid duplicate mon requests for a new osdmap
Sage Weil [Wed, 29 May 2013 20:16:24 +0000 (13:16 -0700)]
osd: avoid duplicate mon requests for a new osdmap

sub_want() returns true if this is a new sub; only renew then.

Signed-off-by: Sage Weil <sage@inktank.com>
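
The "only renew when the subscription is new" rule can be sketched with a toy subscription table. `SubsSketch` and `request_osdmap` are hypothetical; only the sub_want()/renew relationship comes from the commit message.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical subscription table; not the real monitor client interface.
struct SubsSketch {
  std::map<std::string, unsigned> wanted;  // what -> start epoch
  int renew_count = 0;

  // Returns true only if this call changed what we are subscribed to.
  bool sub_want(const std::string &what, unsigned start) {
    auto it = wanted.find(what);
    if (it != wanted.end() && it->second == start)
      return false;
    wanted[what] = start;
    return true;
  }

  void renew_subs() { ++renew_count; }  // stands in for a message to the mon
};

// Send a renewal only when the subscription is actually new.
void request_osdmap(SubsSketch &monc, unsigned epoch) {
  if (monc.sub_want("osdmap", epoch))
    monc.renew_subs();
}
```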
12 years agoosd: tell peers that ping us if they are dead
Sage Weil [Wed, 29 May 2013 20:16:01 +0000 (13:16 -0700)]
osd: tell peers that ping us if they are dead

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: simplify is_healthy() check during boot
Sage Weil [Mon, 27 May 2013 22:24:56 +0000 (15:24 -0700)]
osd: simplify is_healthy() check during boot

This has a slight behavior change in that we ask the mon for the latest
osdmap if our internal heartbeat is failing.  That isn't useful yet, but
will be shortly.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomds: stay in SCAN state in file_eval
Sage Weil [Tue, 28 May 2013 17:51:11 +0000 (10:51 -0700)]
mds: stay in SCAN state in file_eval

If we are in the SCAN state, stay there until the recovery finishes.  Do
not jump to another state from file_eval().

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0071b8e75bd3f5a09cc46e2225a018f6d1ef0680)

12 years agomds: stay in SCAN state in file_eval
Sage Weil [Tue, 28 May 2013 17:51:11 +0000 (10:51 -0700)]
mds: stay in SCAN state in file_eval

If we are in the SCAN state, stay there until the recovery finishes.  Do
not jump to another state from file_eval().

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMakefile: include new message header files
Sage Weil [Tue, 28 May 2013 22:52:46 +0000 (15:52 -0700)]
Makefile: include new message header files

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'yan/wip-mds'
Sage Weil [Wed, 29 May 2013 17:26:56 +0000 (10:26 -0700)]
Merge remote-tracking branch 'yan/wip-mds'

Reviewed-by: Sage Weil <sage@inktank.com>
Conflicts:
src/mds/MDCache.cc

12 years agoosd: do not assume head obc object exists when getting snapdir
Sage Weil [Wed, 29 May 2013 16:49:11 +0000 (09:49 -0700)]
osd: do not assume head obc object exists when getting snapdir

For a list-snaps operation on the snapdir, do not assume that the obc for the
head means the object exists.  This fixes a race between a head deletion and
a list-snaps that wrongly returns ENOENT, triggered by the DiffIterateStress
test when thrashing OSDs.

Fixes: #5183
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge pull request #329 from javacruft/wip-fuse-deps
Sage Weil [Wed, 29 May 2013 15:14:27 +0000 (08:14 -0700)]
Merge pull request #329 from javacruft/wip-fuse-deps

Use new fuse package instead of fuse-utils

12 years agoUse new fuse package instead of fuse-utils 329/head
James Page [Wed, 29 May 2013 09:57:17 +0000 (10:57 +0100)]
Use new fuse package instead of fuse-utils

The fuse-utils package was deprecated a while ago.

Switch the primary dependency for fuse tools to use
the preferred 'fuse' package.

Signed-off-by: James Page <james.page@ubuntu.com>
12 years agomon: disable tdump by default
Sage Weil [Wed, 29 May 2013 05:13:11 +0000 (22:13 -0700)]
mon: disable tdump by default

Grr.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/last'
Sage Weil [Wed, 29 May 2013 05:10:21 +0000 (22:10 -0700)]
Merge remote-tracking branch 'gh/last'

12 years agoMerge branch 'wip-5172'
Sage Weil [Wed, 29 May 2013 03:44:48 +0000 (20:44 -0700)]
Merge branch 'wip-5172'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: fix note_down_osd
Sage Weil [Wed, 29 May 2013 03:38:43 +0000 (20:38 -0700)]
osd: fix note_down_osd

Fix bug introduced in 27381c0c6259ac89f5f9c592b4bfb585937a1cfc.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: fix hb con failure handler
Sage Weil [Wed, 29 May 2013 03:39:30 +0000 (20:39 -0700)]
osd: fix hb con failure handler

Fix a few bugs introduced by 27381c0c6259ac89f5f9c592b4bfb585937a1cfc:

- check against both front and back cons; either one may have failed.
- close *both* front and back before reopening either.  this is
  overkill, but slightly simpler code.
- fix leak of con when marking down
- handle race against osdmap update and note_down_osd

Fixes: #5172
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #319 from dalgaaf/wip-da-pylint-3
Sage Weil [Wed, 29 May 2013 02:52:41 +0000 (19:52 -0700)]
Merge pull request #319 from dalgaaf/wip-da-pylint-3

Fix some smaller Python issues

12 years agoMerge pull request #326 from dalgaaf/wip-da-CID-727978
Sage Weil [Tue, 28 May 2013 22:48:11 +0000 (15:48 -0700)]
Merge pull request #326 from dalgaaf/wip-da-CID-727978

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years agov0.63 v0.63
Gary Lowell [Tue, 28 May 2013 20:58:22 +0000 (13:58 -0700)]
v0.63

12 years agoHashIndex: sync top directory during start_split,merge,col_split
Samuel Just [Tue, 28 May 2013 18:10:05 +0000 (11:10 -0700)]
HashIndex: sync top directory during start_split,merge,col_split

Otherwise, the links might be ordered after the in progress
operation tag write.  We need the in progress operation tag to
correctly recover from an interrupted merge, split, or col_split.

Fixes: #5180
Backport: cuttlefish, bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
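
Syncing a directory so that its links are durable before a later write is a standard POSIX technique: open the directory and fsync() its descriptor. A minimal sketch, with a hypothetical helper name and a negative-errno convention chosen for illustration:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Flush a directory's entries (new links, renames) to stable storage.
// Returns 0 on success or -errno on failure.
int fsync_dir(const char *path) {
  int fd = ::open(path, O_RDONLY);
  if (fd < 0)
    return -errno;
  int err = 0;
  if (::fsync(fd) < 0)
    err = -errno;
  ::close(fd);
  return err;
}
```

In the ordering the commit describes, such a call would presumably complete before the in-progress operation tag is written, so that recovery never sees the tag without the links it refers to.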
12 years agodoc/dev/osd_internals: add wbthrottle.rst 332/head
Samuel Just [Fri, 24 May 2013 20:35:14 +0000 (13:35 -0700)]
doc/dev/osd_internals: add wbthrottle.rst

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoWBThrottle: add perfcounters
Samuel Just [Tue, 28 May 2013 17:41:52 +0000 (10:41 -0700)]
WBThrottle: add perfcounters

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge pull request #325 from dalgaaf/wip-da-CID-727980
Sage Weil [Tue, 28 May 2013 17:27:56 +0000 (10:27 -0700)]
Merge pull request #325 from dalgaaf/wip-da-CID-727980

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years agoMerge pull request #324 from dalgaaf/wip-da-CID-727979
Sage Weil [Tue, 28 May 2013 17:27:25 +0000 (10:27 -0700)]
Merge pull request #324 from dalgaaf/wip-da-CID-727979

kv_flat_btree_async.cc: fix AioCompletion resource leak

12 years agoosd/OSDMap: fix Incremental dump
Sage Weil [Tue, 28 May 2013 16:16:17 +0000 (09:16 -0700)]
osd/OSDMap: fix Incremental dump

The front hb addr entry may not be present.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #322 from guilhem/patch-1
Sage Weil [Tue, 28 May 2013 15:43:10 +0000 (08:43 -0700)]
Merge pull request #322 from guilhem/patch-1

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agokv_flat_btree_async.cc: fix AioCompletion resource leak 326/head
Danny Al-Gaaf [Tue, 28 May 2013 10:43:12 +0000 (12:43 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer needed.

CID 727978 (#1-2 of 2): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "obj_aioc" going out of scope leaks the
  storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
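
The leak pattern behind these reports is an early return that skips release(). The sketch below uses a hypothetical ref-counted `CompletionSketch` that mimics only the release() contract; it is not the librados class.

```cpp
#include <cassert>

// Hypothetical completion object that must be handed back via release().
struct CompletionSketch {
  static int live;  // number of completions not yet released
  CompletionSketch() { ++live; }
  void release() { --live; delete this; }
private:
  ~CompletionSketch() = default;  // force callers through release()
};
int CompletionSketch::live = 0;

// Every exit path must release the completion, including early returns.
int do_async_op(bool fail_early) {
  CompletionSketch *aioc = new CompletionSketch;
  if (fail_early) {
    aioc->release();  // the kind of path a leak checker flags when missing
    return -1;
  }
  // ... submit the operation and wait for it ...
  aioc->release();
  return 0;
}
```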
12 years agokv_flat_btree_async.cc: fix AioCompletion resource leak 324/head
Danny Al-Gaaf [Tue, 28 May 2013 10:38:57 +0000 (12:38 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer needed.

CID 727979 (#1-2 of 2): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "a" going out of scope leaks the storage
  it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agokv_flat_btree_async.cc: fix AioCompletion resource leak 325/head
Danny Al-Gaaf [Tue, 28 May 2013 10:27:37 +0000 (12:27 +0200)]
kv_flat_btree_async.cc: fix AioCompletion resource leak

Call AioCompletion::release() if the completion is no longer
needed.

CID 727980 (#1-4 of 4): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "aioc" going out of scope leaks
  the storage it points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoRemove mon socket in post-stop 322/head
Guilhem Lettron [Mon, 27 May 2013 10:41:53 +0000 (12:41 +0200)]
Remove mon socket in post-stop

If ceph-mon segfaults, the socket file isn't removed.

Adding a remove in post-stop lets upstart clean the run directory properly.

Signed-off-by: Guilhem Lettron <guilhem@lettron.fr>
12 years agomds: use "open-by-ino" function to open remote link
Yan, Zheng [Sun, 26 May 2013 11:04:34 +0000 (19:04 +0800)]
mds: use "open-by-ino" function to open remote link

Also add a new config option "mds_open_remote_link_mode". The anchor
approach is used by default. If the mode is non-zero, use the
open-by-ino function. If the open-by-ino function fails and the mode
is 1, retry using the anchor approach; otherwise trigger an assertion.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
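
The mode semantics described above reduce to two small decisions. The enum and function names below are hypothetical; only the meaning of the values 0, 1 and "other non-zero" comes from the commit message.

```cpp
#include <cassert>

enum class OpenMethod { Anchor, OpenByIno };

// First attempt: mode 0 uses the anchor approach (the default),
// any non-zero mode uses open-by-ino.
OpenMethod first_attempt(int mode) {
  return mode == 0 ? OpenMethod::Anchor : OpenMethod::OpenByIno;
}

// After an open-by-ino failure: only mode 1 may fall back to the anchor
// approach; for any other non-zero mode the caller would assert instead.
bool may_retry_with_anchor(int mode) {
  return mode == 1;
}
```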
12 years agomds: open missing cap inodes
Yan, Zheng [Sat, 25 May 2013 13:30:38 +0000 (21:30 +0800)]
mds: open missing cap inodes

When a recovering MDS enters the reconnect stage, clients send reconnect
messages to it. Each message lists open files, their paths, and issued
caps. If an inode is not in the cache, the recovering MDS uses the
path the client provides to determine whether it is the inode's
authority. If not, the recovering MDS exports the inode's caps to
another MDS. The issue here is that the path the client provides isn't
always accurate.

The fix is to use the recently added "open inode by ino" function to
open any missing cap inodes when the recovering MDS enters the rejoin
stage. Send cache rejoin messages to other MDSs after all caps'
authorities are determined.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years agomds: bump the protocol version
Yan, Zheng [Fri, 24 May 2013 05:42:15 +0000 (13:42 +0800)]
mds: bump the protocol version

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years agomds: open inode by ino
Yan, Zheng [Wed, 15 May 2013 02:28:58 +0000 (10:28 +0800)]
mds: open inode by ino

This patch adds an "open-by-ino" helper. It uses the backtrace to find
an inode's path and open the inode. The algorithm looks like:

1. Check MDS peers. If any MDS has the inode in its cache, goto step 6.
2. Fetch the backtrace. If a backtrace was previously fetched and we get
   the same backtrace again, return -EIO.
3. Traverse the path in the backtrace. If the inode is found, goto step 6;
   if a non-auth dirfrag is encountered, goto the next step. If we fail
   to find the inode in its parent dir, goto step 1.
4. Request MDS peers to traverse the path in the backtrace. If the inode
   is found, goto step 6. If an MDS peer encounters a non-auth dirfrag,
   it stops traversing. If any MDS peer fails to find the inode in its
   parent dir, goto step 1.
5. Use the same algorithm to open the inode's parent. Goto step 3 if
   this succeeds; goto step 1 if it fails.
6. Return the inode's auth MDS ID.

The algorithm has two main assumptions:
1. If an inode is in its auth MDS's cache, its on-disk backtrace
   can be out of date.
2. If an inode is not in any MDS's cache, its on-disk backtrace
   must be up to date.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
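
Since several steps jump back to step 1, the repeated-backtrace check in step 2 appears to be what stops the loop when no progress is possible. A sketch of just that check, with a hypothetical state struct and the backtrace reduced to a path string:

```cpp
#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical per-lookup state; not the real MDCache bookkeeping.
struct OpenInoSketch {
  bool fetched = false;
  std::string last_backtrace;

  // Step 2: accept a freshly fetched backtrace, or return -EIO when the
  // same backtrace comes back again, since retrying it cannot help.
  int on_backtrace_fetched(const std::string &bt) {
    if (fetched && bt == last_backtrace)
      return -EIO;
    fetched = true;
    last_backtrace = bt;
    return 0;
  }
};
```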
12 years agomds: move fetch_backtrace() to class MDCache
Yan, Zheng [Fri, 17 May 2013 21:49:22 +0000 (05:49 +0800)]
mds: move fetch_backtrace() to class MDCache

We may want to fetch a backtrace while the corresponding inode isn't
instantiated. MDCache::fetch_backtrace() will be used by a later
patch.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: remove old backtrace handling
Yan, Zheng [Fri, 17 May 2013 08:11:27 +0000 (16:11 +0800)]
mds: remove old backtrace handling

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: update backtraces when unlinking inodes
Yan, Zheng [Sat, 18 May 2013 09:16:03 +0000 (17:16 +0800)]
mds: update backtraces when unlinking inodes

Unlink moves inodes to the stray dir; it is a special form of rename.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: bring back old style backtrace handling
Yan, Zheng [Fri, 17 May 2013 08:43:01 +0000 (16:43 +0800)]
mds: bring back old style backtrace handling

To queue a backtrace update, the current code allocates a BacktraceInfo
structure and adds it to the log segment's update_backtraces list. The
main issue with this approach is that BacktraceInfo is independent
of the inode, so it is very inconvenient to find the pending backtrace
updates for a given inode. When exporting inodes from one MDS to
another, we need to find and cancel all pending backtrace updates on
the source MDS.

This patch brings back the old backtrace handling code and adapts it
to the current backtrace format. The basic idea behind the old code
is: when an inode's backtrace becomes dirty, add the inode to the
log segment's dirty_parent_inodes list.
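
A minimal sketch of why tracking the inode itself helps, assuming a
toy inode that remembers which log segment holds its pending update.
ToyInode, ToyLogSegment and both helper functions are hypothetical;
only the list name dirty_parent_inodes comes from the text above.

```cpp
#include <cassert>
#include <set>

struct ToyLogSegment;

// Toy inode: remembers the segment that holds its pending update.
struct ToyInode {
  unsigned long ino = 0;
  ToyLogSegment* dirty_parent_seg = nullptr;
};

struct ToyLogSegment {
  std::set<ToyInode*> dirty_parent_inodes;
};

// The backtrace became dirty: attach the inode to this segment,
// detaching it from any segment it was on before (an assumption of
// this sketch).
void mark_dirty_parent(ToyInode& in, ToyLogSegment& ls) {
  if (in.dirty_parent_seg)
    in.dirty_parent_seg->dirty_parent_inodes.erase(&in);
  in.dirty_parent_seg = &ls;
  ls.dirty_parent_inodes.insert(&in);
}

// Cancel the pending update, e.g. before exporting the inode. No
// search over unrelated queue entries is needed.
void clear_dirty_parent(ToyInode& in) {
  if (!in.dirty_parent_seg)
    return;
  in.dirty_parent_seg->dirty_parent_inodes.erase(&in);
  in.dirty_parent_seg = nullptr;
}
```

With a queue of entries that are independent of the inode, the same
cancellation would require scanning the queue for matching entries.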

Compared to the current backtrace handling, another difference is
that the backtrace update is journalled in EMetaBlob::fullbit.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: rename last_renamed_version to backtrace_version
Yan, Zheng [Fri, 17 May 2013 06:24:57 +0000 (14:24 +0800)]
mds: rename last_renamed_version to backtrace_version

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: journal backtrace update in EMetaBlob::fullbit
Yan, Zheng [Fri, 17 May 2013 08:02:03 +0000 (16:02 +0800)]
mds: journal backtrace update in EMetaBlob::fullbit

The current way to journal a backtrace update is to set
EMetaBlob::update_bt to true. The problem is that an EMetaBlob can
include several inodes. If an EMetaBlob's update_bt is true, the
journal replay code has to queue backtrace updates for all inodes in
the EMetaBlob.

This patch adds two new flags to class EMetaBlob::fullbit so that it
can journal backtrace updates.
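
The difference between a blob-wide flag and a per-inode flag can be
sketched as below. The types are toy stand-ins, and kToyDirtyParent is
a hypothetical name: the message does not say what the two new flags
are called, so only one illustrative flag is modelled.

```cpp
#include <cassert>
#include <vector>

using Ino = unsigned long;

// Hypothetical per-inode flag, for illustration only.
constexpr unsigned kToyDirtyParent = 1u << 0;

struct ToyFullBit {    // one journalled inode
  Ino ino;
  unsigned state;      // per-inode flags
};

struct ToyMetaBlob {
  bool update_bt;      // old scheme: one flag for the whole blob
  std::vector<ToyFullBit> inodes;
};

// Old scheme: replay must queue an update for every inode in the blob.
std::vector<Ino> replay_blob_wide(const ToyMetaBlob& blob) {
  std::vector<Ino> queued;
  if (blob.update_bt)
    for (const ToyFullBit& fb : blob.inodes)
      queued.push_back(fb.ino);
  return queued;
}

// New scheme: replay queues only the inodes that carry the flag.
std::vector<Ino> replay_per_inode(const ToyMetaBlob& blob) {
  std::vector<Ino> queued;
  for (const ToyFullBit& fb : blob.inodes)
    if (fb.state & kToyDirtyParent)
      queued.push_back(fb.ino);
  return queued;
}
```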

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: reorder EMetaBlob::add_primary_dentry's parameters
Yan, Zheng [Thu, 9 May 2013 03:27:53 +0000 (11:27 +0800)]
mds: reorder EMetaBlob::add_primary_dentry's parameters

Prepare for adding new state parameters such as 'dirty_parent'.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: warn on unconnected snap realms
Yan, Zheng [Wed, 15 May 2013 03:24:36 +0000 (11:24 +0800)]
mds: warn on unconnected snap realms

When there is more than one active MDS, restarting an MDS triggers the
assertion "reconnected_snaprealms.empty()" quite often. If there
is no snapshot in the FS, the items left in reconnected_snaprealms
should be the other MDSs' mdsdirs. I think it's harmless.

If there are snapshots in the FS, the assertion can probably catch
real bugs. But at present the snapshot feature is broken, and fixing
it is non-trivial. So replace the assertion with a warning.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: silence MDCache::trim_non_auth()
Yan, Zheng [Wed, 22 May 2013 23:37:40 +0000 (07:37 +0800)]
mds: silence MDCache::trim_non_auth()

No need to output the function's debug message to the console.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: fix check for base inode discovery
Yan, Zheng [Sat, 11 May 2013 10:47:49 +0000 (18:47 +0800)]
mds: fix check for base inode discovery

If an MDiscover message is for discovering a base inode, want_base_dir
should be false and the path should be empty.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: Fix replica's allowed caps for filelock in SYNC_LOCK state
Yan, Zheng [Mon, 6 May 2013 02:18:36 +0000 (10:18 +0800)]
mds: Fix replica's allowed caps for filelock in SYNC_LOCK state

For a replica, a filelock in the LOCK_LOCK state doesn't allow the Fc
cap. So a filelock in the LOCK_SYNC_LOCK or LOCK_EXCL_LOCK state
shouldn't allow the Fc cap either.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: defer releasing cap if necessary
Yan, Zheng [Mon, 6 May 2013 01:09:59 +0000 (09:09 +0800)]
mds: defer releasing cap if necessary

When an inode is freezing or frozen, we defer processing MClientCaps
messages and cap releases embedded in requests. The same deferral
logic should also cover MClientCapRelease messages.
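
The deferral pattern described above might look like this in
miniature. FreezableInode, handle_cap_release() and unfreeze() are
hypothetical names for a toy model; the real handler and waiter
machinery in the MDS are not shown in this message.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy inode with a frozen flag and a list of deferred work.
struct FreezableInode {
  bool freezing_or_frozen = false;
  int caps_held = 1;
  std::vector<std::function<void()>> unfreeze_waiters;
};

// Process a cap release, or park it until the inode is unfrozen.
void handle_cap_release(FreezableInode& in) {
  if (in.freezing_or_frozen) {
    // Retry the same handler later instead of touching caps now.
    in.unfreeze_waiters.push_back([&in] { handle_cap_release(in); });
    return;
  }
  in.caps_held = 0;
}

// Unfreeze the inode and run everything that was deferred.
void unfreeze(FreezableInode& in) {
  in.freezing_or_frozen = false;
  std::vector<std::function<void()>> waiters = std::move(in.unfreeze_waiters);
  in.unfreeze_waiters.clear();
  for (auto& w : waiters)
    w();
}
```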

12 years ago mds: fix Locker::request_inode_file_caps()
Yan, Zheng [Thu, 16 May 2013 17:44:23 +0000 (01:44 +0800)]
mds: fix Locker::request_inode_file_caps()

After sending a cache rejoin message, a replica needs to notify the
auth MDS when cap_wanted changes. But it can send the MInodeFileCaps
message only after receiving the auth MDS's rejoin ack.
Locker::request_inode_file_caps() has the correct wait logic, but it
skips sending the MInodeFileCaps message if the auth MDS is still in
the rejoin state.

The fix is to defer sending the MInodeFileCaps message until the auth
MDS is active. This makes the function's wait logic less tricky.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: notify auth MDS when cap_wanted changes
Yan, Zheng [Mon, 6 May 2013 01:17:01 +0000 (09:17 +0800)]
mds: notify auth MDS when cap_wanted changes

So the auth MDS can choose locks' states based on our cap_wanted.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: export CInode::mds_caps_wanted
Yan, Zheng [Mon, 6 May 2013 01:06:52 +0000 (09:06 +0800)]
mds: export CInode::mds_caps_wanted

CInode::mds_caps_wanted keeps track of the caps wanted by non-auth
MDSs. The auth MDS checks it when choosing locks' states.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: export CInode::STATE_NEEDSRECOVER
Yan, Zheng [Mon, 6 May 2013 01:00:19 +0000 (09:00 +0800)]
mds: export CInode::STATE_NEEDSRECOVER

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: send slave request after target MDS is active
Yan, Zheng [Mon, 8 Apr 2013 08:17:11 +0000 (16:17 +0800)]
mds: send slave request after target MDS is active

When the failure of a peer is detected, MDCache::handle_mds_failure()
checks if there are requests waiting for slave replies from the
failed peer, and adds them to the "wait for active peer" list.
The "retry request" logic only covers slave requests sent before
MDCache::handle_mds_failure() is called. If a slave request is
sent while the peer isn't up, we wait for its reply forever.
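
One way to picture the behaviour named in the subject line, as a toy
model: check the target's state before sending, and park the request
on a per-peer waiter list otherwise. ToyMds and its members are
hypothetical names, not the MDS classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using MdsId = int;

// Toy model of "send only once the target is active".
struct ToyMds {
  std::map<MdsId, bool> active;
  std::map<MdsId, std::vector<std::function<void()>>> waiting_for_active;
  std::vector<MdsId> sent;  // targets that actually received a request

  bool is_active(MdsId m) const {
    auto it = active.find(m);
    return it != active.end() && it->second;
  }

  void send_slave_request(MdsId target) {
    if (!is_active(target)) {
      // Peer is not up: queue a retry rather than sending a request
      // whose reply would never arrive.
      waiting_for_active[target].push_back(
          [this, target] { send_slave_request(target); });
      return;
    }
    sent.push_back(target);
  }

  void mark_active(MdsId m) {
    active[m] = true;
    std::vector<std::function<void()>> waiters =
        std::move(waiting_for_active[m]);
    waiting_for_active.erase(m);
    for (auto& w : waiters)
      w();
  }
};
```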

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: unfreeze inode after rename rollback finishes
Yan, Zheng [Sat, 6 Apr 2013 22:35:56 +0000 (06:35 +0800)]
mds: unfreeze inode after rename rollback finishes

We should not wake up the unfreeze waiter while the inode is still
linked to a non-auth dirfrag.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: remove buggy cache rejoin code
Yan, Zheng [Tue, 7 May 2013 00:56:11 +0000 (08:56 +0800)]
mds: remove buggy cache rejoin code

I previously added code to handle a corner case of cache rejoin: an
entire subtree, together with the inode the subtree root belongs to,
was trimmed between sending the cache rejoin and receiving the rejoin
ack. In this case, we should send a cache expire message to the
subtree's auth MDS. But the code is completely broken, so remove it
temporarily.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years ago mds: fix typo in Server::do_rename_rollback
Yan, Zheng [Sun, 7 Apr 2013 06:49:53 +0000 (14:49 +0800)]
mds: fix typo in Server::do_rename_rollback

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>