git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agoceph.in: reject --admin-daemon so it can't do harm
Dan Mick [Mon, 22 Jul 2013 18:31:09 +0000 (11:31 -0700)]
ceph.in: reject --admin-daemon so it can't do harm

Fixes: #3944
Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agoFileJournal: fix posix_fallocate error handling
Noah Watkins [Sun, 21 Jul 2013 17:54:00 +0000 (10:54 -0700)]
FileJournal: fix posix_fallocate error handling

From the man page for posix_fallocate:

    posix_fallocate() returns zero on success, or an error
    number on failure.  Note that errno is not set.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
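The return convention quoted from the man page can be illustrated with a small Python model. `fake_posix_fallocate` is a hypothetical stand-in, not the libc call and not the FileJournal code; it only mimics the documented contract so that the two ways of checking the result can be compared.

```python
import errno


def fake_posix_fallocate(fd, offset, length):
    """Hypothetical stand-in: 0 on success, else a positive error number."""
    if fd < 0:
        return errno.EBADF
    if offset < 0 or length <= 0:
        return errno.EINVAL
    return 0


def failed_buggy(ret):
    # Wrong: assumes the "-1 and look at errno" convention of most syscalls.
    return ret < 0


def failed_correct(ret):
    # Right: any nonzero return value is itself the error number.
    return ret != 0
```

With this model, a bad descriptor yields `errno.EBADF`, which `failed_buggy` silently treats as success while `failed_correct` reports the failure.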
12 years agoOSD::_make_pg: use createmap, not osdmap
Samuel Just [Mon, 22 Jul 2013 18:08:04 +0000 (11:08 -0700)]
OSD::_make_pg: use createmap, not osdmap

The osd lock is not held at this point, so we must use
the createmap passed in.

Fixes: #5656
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agorgw: read attributes when reading bucket entry point
Yehuda Sadeh [Sat, 20 Jul 2013 05:54:46 +0000 (22:54 -0700)]
rgw: read attributes when reading bucket entry point

Fixes: #5691
We need to also read the attributes, as the bucket might be a legacy
bucket and might have all bucket instance info in that object.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Tested-by: Faidon Liambotis <faidon@wikimedia.org>
12 years agoOSD::RemoveWQ: do not apply_transaction while blocking _try_resurrect_pg
Samuel Just [Fri, 19 Jul 2013 22:56:52 +0000 (15:56 -0700)]
OSD::RemoveWQ: do not apply_transaction while blocking _try_resurrect_pg

Some callbacks take the osd lock, so we need to avoid blocking an
osd lock holding thread while waiting on a filestore callback.
Instead, just queue the transaction, and allow _try_resurrect_pg
to cancel us while we are waiting for the transaction to go through
(CLEARING_WAITING).

Fixes: #5672
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoFileStore: use complete() instead of finish() and delete
Samuel Just [Sat, 20 Jul 2013 00:35:22 +0000 (17:35 -0700)]
FileStore: use complete() instead of finish() and delete

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoFinisher: use complete() not finish() and delete
Samuel Just [Sat, 20 Jul 2013 00:34:53 +0000 (17:34 -0700)]
Finisher: use complete() not finish() and delete

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agocommon/Cond.h: add a simpler C_SaferCond Context
Samuel Just [Fri, 19 Jul 2013 22:55:08 +0000 (15:55 -0700)]
common/Cond.h: add a simpler C_SaferCond Context

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoceph.spec.in: Obsolete ceph-libs
Gary Lowell [Wed, 3 Jul 2013 18:28:28 +0000 (11:28 -0700)]
ceph.spec.in:  Obsolete ceph-libs

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-mon-caps' into next
Sage Weil [Mon, 22 Jul 2013 16:27:35 +0000 (09:27 -0700)]
Merge remote-tracking branch 'gh/wip-mon-caps' into next

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoMerge pull request #453 from dalgaaf/wip-da-SCA-cppcheck-7
Sage Weil [Mon, 22 Jul 2013 04:42:07 +0000 (21:42 -0700)]
Merge pull request #453 from dalgaaf/wip-da-SCA-cppcheck-7

Fix SCA and CID issues

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #451 from dalgaaf/wip-da-SCA-cppcheck-6-v2
Sage Weil [Mon, 22 Jul 2013 04:40:22 +0000 (21:40 -0700)]
Merge pull request #451 from dalgaaf/wip-da-SCA-cppcheck-6-v2

Fix some issues from SCA - v2 - against ceph:next

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agocls_replica_log_types.h: pass const std::list<> by reference 453/head
Danny Al-Gaaf [Sat, 20 Jul 2013 18:14:20 +0000 (20:14 +0200)]
cls_replica_log_types.h: pass const std::list<> by reference

Pass the const std::list<> parameter by reference to
cls_replica_log_progress_marker().

From cppcheck:
 [src/cls/replica_log/cls_replica_log_types.h:64]: (performance)
  Function parameter 'b' should be passed by reference.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agomon/PGMonitor.cc: reduce scope of local 'num_slow_osds' variable
Danny Al-Gaaf [Sat, 20 Jul 2013 18:00:13 +0000 (20:00 +0200)]
mon/PGMonitor.cc: reduce scope of local 'num_slow_osds' variable

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agorgw/rgw_bucket.cc: use static_cast<>() instead of C-Style cast
Danny Al-Gaaf [Sat, 20 Jul 2013 17:51:10 +0000 (19:51 +0200)]
rgw/rgw_bucket.cc: use static_cast<>() instead of C-Style cast

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agotest_cls_version.cc: don't free object twice, free the right one
Danny Al-Gaaf [Sat, 20 Jul 2013 17:36:32 +0000 (19:36 +0200)]
test_cls_version.cc: don't free object twice, free the right one

Object 'librados::ObjectWriteOperation *op' is freed twice in the TEST
test_version_inc_read. Free instead 'librados::ObjectReadOperation *rop'

Related cppcheck warning:
 [src/test/cls_version/test_cls_version.cc:79]: (error) Memory
  pointed to by 'op' is freed twice.

This should also fix:

CID 1049247 (#1 of 1): Use after free (USE_AFTER_FREE)
  deref_arg: Calling "librados::ObjectWriteOperation::~ObjectWriteOperation()"
  dereferences freed pointer "op". (The dereference happens because this is
  a virtual function call.)
CID 1049218 (#4 of 4): Resource leak (RESOURCE_LEAK)
  leaked_storage: Variable "rop" going out of scope leaks the storage it
  points to.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agorgw/rgw_metadata.cc: use static_cast<>() instead of C-Style cast
Danny Al-Gaaf [Sat, 20 Jul 2013 17:13:15 +0000 (19:13 +0200)]
rgw/rgw_metadata.cc: use static_cast<>() instead of C-Style cast

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agorgw: change RGWOp::name() to return string instead of char*
Danny Al-Gaaf [Sat, 20 Jul 2013 17:00:50 +0000 (19:00 +0200)]
rgw: change RGWOp::name() to return string instead of char*

Return 'const string' instead of 'const char *' from RGWOp::name() to
avoid the usage of std::string::c_str() to return 'const char *' in
some cases in rgw_rest_replica_log.h.

Returning the result of c_str() from a function is dangerous, since the
pointer becomes invalid once the related string object is destroyed or
goes out of scope (which is the case with return). So you may end up
with garbage in this case.

Related warning from cppcheck:
 [src/rgw/rgw_rest_replica_log.h:39]: (error) Dangerous usage of
  c_str(). The value returned by c_str() is invalid after this call.
 [src/rgw/rgw_rest_replica_log.h:59]: (error) Dangerous usage of
  c_str(). The value returned by c_str() is invalid after this call.
 [src/rgw/rgw_rest_replica_log.h:79]: (error) Dangerous usage of
  c_str(). The value returned by c_str() is invalid after this call

This should also fix:

CID 1049250 (#1 of 1): Wrapper object use after free (WRAPPER_ESCAPE)
  escape: The internal representation of "s" escapes, but is destroyed
  when it exits scope.
CID 1049251 (#1 of 1): Wrapper object use after free (WRAPPER_ESCAPE)
  escape: The internal representation of "s" escapes, but is destroyed
  when it exits scope.
CID 1049252 (#1 of 1): Wrapper object use after free (WRAPPER_ESCAPE)
  escape: The internal representation of "s" escapes, but is destroyed
  when it exits scope.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoqa/workunits/mon/caps.sh: clean up users; rename 450/head
Sage Weil [Sat, 20 Jul 2013 04:50:06 +0000 (21:50 -0700)]
qa/workunits/mon/caps.sh: clean up users; rename

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon/MonCap: simplify rwx match logic
Sage Weil [Sat, 20 Jul 2013 04:48:26 +0000 (21:48 -0700)]
mon/MonCap: simplify rwx match logic

Make this a positive check instead of double negative.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: fix command caps check
Sage Weil [Sat, 20 Jul 2013 04:44:26 +0000 (21:44 -0700)]
mon: fix command caps check

We must require something or else the caps check is going to pass in
a degenerate sense.  Use X for commands.

Signed-off-by: Sage Weil <sage@inktank.com>
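The degenerate case described above can be shown with a minimal Python model of a bitmask permission check. The bit values and the `caps_allow` helper are illustrative assumptions, not MonCap's actual constants or code.

```python
# Illustrative permission bits (assumed values, not taken from MonCap).
R, W, X = 0x1, 0x2, 0x4


def caps_allow(granted, required):
    # Positive form of the check: every required bit must be granted.
    return (granted & required) == required
```

If a command requires nothing (`required == 0`), `caps_allow(0, 0)` is true even for a client holding no caps at all. Requiring `X` for commands makes the same client fail the check.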
12 years agoqa: workunits: mon: test mon caps permissions
Joao Eduardo Luis [Sat, 20 Jul 2013 03:30:59 +0000 (04:30 +0100)]
qa: workunits: mon: test mon caps permissions

set env var TEST_EXIT_ON_ERROR=0 to obtain all errors instead of exiting
with return 1 on first error found.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
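The switch described above can be sketched in Python as follows. This is not the workunit script itself; `run_checks` is a hypothetical helper, and treating an unset variable as "exit on first error" is an assumption drawn from the wording of the commit message.

```python
import os


def run_checks(checks, env=os.environ):
    """Run (name, callable) pairs; return the names of the failed checks."""
    # Assumed default: stop at the first failure unless the variable is "0".
    exit_on_error = env.get("TEST_EXIT_ON_ERROR", "1") != "0"
    failures = []
    for name, check in checks:
        if check():
            continue
        failures.append(name)
        if exit_on_error:
            break
    return failures
```

Passing `env={"TEST_EXIT_ON_ERROR": "0"}` collects every failing check instead of stopping at the first one.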
12 years agoMerge remote-tracking branch 'gh/wip-swift' into next
Sage Weil [Sat, 20 Jul 2013 04:08:18 +0000 (21:08 -0700)]
Merge remote-tracking branch 'gh/wip-swift' into next

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomon/PaxosService: update on_active() docs to clarify calling rules
Sage Weil [Fri, 19 Jul 2013 23:59:15 +0000 (16:59 -0700)]
mon/PaxosService: update on_active() docs to clarify calling rules

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon/OSDMonitor: discard failure waiters, info on shutdown
Sage Weil [Fri, 19 Jul 2013 23:55:03 +0000 (16:55 -0700)]
mon/OSDMonitor: discard failure waiters, info on shutdown

This would prevent a leak, if we didn't assert before that in the
failure_reporter_t dtor.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: OSDMonitor: only thrash and propose if we are the leader
Sage Weil [Fri, 19 Jul 2013 23:36:01 +0000 (16:36 -0700)]
mon: OSDMonitor: only thrash and propose if we are the leader

'thrash_map' is only set if we are the leader, so we would thrash and
propose the pending value if we are the leader.  However, we should keep
the 'is_leader()' check not only for clarity's sake (an unfamiliar reader
may cry OMGBUG, prompting a patch much like this), but also because
we may lose a subsequent election and become a peon instead, while still
holding a 'thrash_map' value > 0 -- and we really don't want to propose
while being a peon.

[This is a rebased version of 5eac38797d9eb5a59fcff1d81571cff7a2f10e66,
complete with the typo fix in d656aed599ee754646e16386ce5a4ab0117f2d6e.]

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomon/OSDMonitor: do not wait for readable in send_latest()
Sage Weil [Fri, 19 Jul 2013 23:35:02 +0000 (16:35 -0700)]
mon/OSDMonitor: do not wait for readable in send_latest()

send_latest() checks for readable and, if untrue, will wait before sending
out the latest OSDMap.  This is completely unnecessary; I think it is a
hold-over from when we had independent paxos states.  An audit of all
callers confirms that everyone would be happy with whatever is committed,
even if we are in the process of committing an even newer version.

Effectively, everyone waits *above* this layer in the usual PaxosService
traps for whether we are readable or not.  This means that waiting_for_map
and send_to_waiting() go away entirely, which is nice.

This addresses, among other things: send_to_waiting() is called from
update_from_paxos(), which can be called when we are not readable due to
the paxos commit/finish timing changes in f1ce8d7c955a24 and
c711203c0d4b.  If no subsequent update happens, those waiters never get
their maps.

Instead, we send them immediately--we know they are committed and old
history is as good as future history.

Fixes: #5643
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoRevert "mon/OSDMonitor: send_to_waiting() in on_active()"
Sage Weil [Fri, 19 Jul 2013 23:39:47 +0000 (16:39 -0700)]
Revert "mon/OSDMonitor: send_to_waiting() in on_active()"

This reverts commit f06a124a7fa0717ef8c523408b31d814df57caca.

On peons, on_active() is only called when we *first* become active after an
election.  Only on the leader is it called after each commit/update.  This
makes this change cause other problems (broken subscriptions on peons, in
particular).  We possibly should fix that, but there is also a simpler fix
for the original problem we were trying to solve.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoRevert "mon: OSDMonitor: only thrash and propose if we are the leader"
Sage Weil [Fri, 19 Jul 2013 23:23:04 +0000 (16:23 -0700)]
Revert "mon: OSDMonitor: only thrash and propose if we are the leader"

This reverts commit 5eac38797d9eb5a59fcff1d81571cff7a2f10e66.

12 years agoRevert "mon/OSDMonitor: fix typo"
Sage Weil [Fri, 19 Jul 2013 23:22:48 +0000 (16:22 -0700)]
Revert "mon/OSDMonitor: fix typo"

This reverts commit d656aed599ee754646e16386ce5a4ab0117f2d6e.

12 years agoceph_rest_api.py: remove unused imports
Dan Mick [Thu, 18 Jul 2013 23:33:43 +0000 (16:33 -0700)]
ceph_rest_api.py: remove unused imports

Fixes: #5684
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoceph.in: better error message when daemon command returns nothing
Dan Mick [Wed, 17 Jul 2013 05:14:15 +0000 (22:14 -0700)]
ceph.in: better error message when daemon command returns nothing

Fixes: #5683
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomon: improve osdmap subscription debug output
Sage Weil [Fri, 19 Jul 2013 21:50:03 +0000 (14:50 -0700)]
mon: improve osdmap subscription debug output

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-stats' into next
Sage Weil [Fri, 19 Jul 2013 21:49:25 +0000 (14:49 -0700)]
Merge remote-tracking branch 'gh/wip-stats' into next

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge branch 'wip-rgw-next-2' into next
Greg Farnum [Fri, 19 Jul 2013 20:25:48 +0000 (13:25 -0700)]
Merge branch 'wip-rgw-next-2' into next

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agorgw: remove extra unused param from RGWRados::get_attr()
Yehuda Sadeh [Fri, 19 Jul 2013 20:06:53 +0000 (13:06 -0700)]
rgw: remove extra unused param from RGWRados::get_attr()

No user for the extra obj_version param.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agocls_rgw: quiet down verbose log message
Yehuda Sadeh [Fri, 19 Jul 2013 18:19:05 +0000 (11:19 -0700)]
cls_rgw: quiet down verbose log message

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: replace logic that compares regions
Yehuda Sadeh [Fri, 19 Jul 2013 16:44:43 +0000 (09:44 -0700)]
rgw: replace logic that compares regions

The logic was a bit broken. Basically, we want to make sure
that region names are the same. However, if region name is not
set then we need to check whether it's the master region. This
can happen in upgrade cases where originally we didn't have
a region name set.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
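The comparison rule described above might look like the following Python sketch. The function and parameter names are hypothetical, and it assumes the possibly unset name is the one recorded on the bucket (as would be the case for a bucket created before an upgrade).

```python
def same_region(bucket_region, our_region, we_are_master):
    """Decide whether a bucket belongs to the region handling the request."""
    if bucket_region:
        # Normal case: both names are known, so compare them directly.
        return bucket_region == our_region
    # No region name recorded: fall back to asking whether we are the master.
    return we_are_master
```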
12 years agorgw-admin: link / unlink should report errors
Yehuda Sadeh [Wed, 17 Jul 2013 23:14:02 +0000 (16:14 -0700)]
rgw-admin: link / unlink should report errors

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: fix time parsing in replica log
Yehuda Sadeh [Fri, 19 Jul 2013 04:50:51 +0000 (21:50 -0700)]
rgw: fix time parsing in replica log

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: bucket entry point object ver fixes
Yehuda Sadeh [Fri, 19 Jul 2013 00:40:52 +0000 (17:40 -0700)]
rgw: bucket entry point object ver fixes

Multiple fixes:
 - sync master, secondary entry point ver on creation
 - use correct entry point version when removing entry point
 - check correct version on bucket removal

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: remove s->objv_tracker
Yehuda Sadeh [Thu, 18 Jul 2013 20:07:55 +0000 (13:07 -0700)]
rgw: remove s->objv_tracker

It was never initialized correctly anyway. It was only supposed to
be used for buckets, but it was never initialized in that case.
Using s->bucket_info.objv_tracker instead.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: forward delete bucket request to master after removal
Yehuda Sadeh [Thu, 18 Jul 2013 18:16:15 +0000 (11:16 -0700)]
rgw: forward delete bucket request to master after removal

We can only forward the bucket removal to the master if it was
successfully removed locally.
The master region has no knowledge about whether the
bucket can be removed or not, e.g., when there are still objects in the
bucket. If we send it to the master first, then it'll happily remove it
even though the local removal might fail in the end.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
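The ordering argued for above can be sketched in Python. The callables and the negative-errno return convention are assumptions made for illustration; this is not the rgw code.

```python
def delete_bucket(remove_locally, forward_to_master, we_are_master):
    """Remove the bucket locally first; forward only after that succeeds."""
    ret = remove_locally()
    if ret < 0:
        # For example, the bucket is not empty. The master is never
        # contacted, so it cannot drop a bucket that still exists here.
        return ret
    if not we_are_master:
        return forward_to_master()
    return 0
```

The design choice is that only the local region can tell whether the removal is legal, so it must succeed locally before the master is told.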
12 years agorgw: adjust error for bucket removal on secondary region
Yehuda Sadeh [Thu, 18 Jul 2013 17:48:39 +0000 (10:48 -0700)]
rgw: adjust error for bucket removal on secondary region

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: forward x_amz_meta headers when forwarding a request
Yehuda Sadeh [Thu, 18 Jul 2013 00:20:30 +0000 (17:20 -0700)]
rgw: forward x_amz_meta headers when forwarding a request

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: fix bucket re-creation on secondary region
Yehuda Sadeh [Wed, 17 Jul 2013 23:34:50 +0000 (16:34 -0700)]
rgw: fix bucket re-creation on secondary region

We had a problem with bucket recreation, where we identified
that the bucket already existed, but missed the fact that it's
the same bucket, so removal of the bucket index was wrong.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agomon/MonClient: fix small leak
Sage Weil [Thu, 18 Jul 2013 23:58:50 +0000 (16:58 -0700)]
mon/MonClient: fix small leak

We need to delete the version_req_d here.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsgr: mark addr-based [lazy_]send_message and get_connection deprecated
Sage Weil [Thu, 18 Jul 2013 22:05:22 +0000 (15:05 -0700)]
msgr: mark addr-based [lazy_]send_message and get_connection deprecated

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoclient: mark_down by con
Sage Weil [Thu, 18 Jul 2013 21:50:32 +0000 (14:50 -0700)]
client: mark_down by con

We have the con handy; use it.  This avoids generating a spurious RESET
event, which we do not need or do anything useful with.  Note that in this
case we are not attaching anything to the Connection priv field.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: mark_down session by con, not addr
Sage Weil [Thu, 18 Jul 2013 21:46:57 +0000 (14:46 -0700)]
mon: mark_down session by con, not addr

We have the ConnectionRef here; use it.  This avoids generating a spurious
RESET event for the connection.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: break con <-> session ref cycle in mon even if shutting down
Sage Weil [Thu, 18 Jul 2013 21:44:17 +0000 (14:44 -0700)]
mon: break con <-> session ref cycle in mon even if shutting down

If we get a reset during shutdown, we should still break the cycle to avoid
tripping the valgrind leak detection.  Note that we are touching no
internal Monitor state here and the locking has not changed.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsg/SimpleMessenger: remove duplicated interface docs
Sage Weil [Thu, 18 Jul 2013 18:28:09 +0000 (11:28 -0700)]
msg/SimpleMessenger: remove duplicated interface docs

Document these in the interface, not the implementation; having two copies
clutters the header and invites them to get out of sync.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsgr: update docs for mark_down, mark_down_all semantics
Sage Weil [Thu, 18 Jul 2013 17:53:04 +0000 (10:53 -0700)]
msgr: update docs for mark_down, mark_down_all semantics

* RESET events
* note that the reset detection only happens if it is enabled in the
  policy.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsgr: generate reset event on mark_down to addr (not con)
Sage Weil [Wed, 17 Jul 2013 05:43:26 +0000 (22:43 -0700)]
msgr: generate reset event on mark_down to addr (not con)

If the caller is marking down an addr, they presumably don't have the
Connection* handy, so we should generate a reset event to help them
clean up con <-> session ref cycles.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/ReplicatedPG: fix obc leak on invalid LIST_SNAPS op
Sage Weil [Thu, 18 Jul 2013 22:02:07 +0000 (15:02 -0700)]
osd/ReplicatedPG: fix obc leak on invalid LIST_SNAPS op

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: break con <-> session cycle when marking down old peers
Sage Weil [Thu, 18 Jul 2013 22:02:02 +0000 (15:02 -0700)]
osd: break con <-> session cycle when marking down old peers

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: make ms_handle_reset debug more useful
Sage Weil [Thu, 18 Jul 2013 22:01:53 +0000 (15:01 -0700)]
osd: make ms_handle_reset debug more useful

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agomon/PGMap: don't mangle stamp_delta in clear_delta()
Sage Weil [Fri, 19 Jul 2013 17:55:02 +0000 (10:55 -0700)]
mon/PGMap: don't mangle stamp_delta in clear_delta()

This is a delta, not a timestamp.

This was triggered when a cluster was idle for twice the
mon_delta_reset_interval, and required a mon restart to fix.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: log PG state changes at level 5
Sage Weil [Wed, 10 Jul 2013 19:54:18 +0000 (12:54 -0700)]
osd: log PG state changes at level 5

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agomon/PGMap: avoid negative pg stats when calculating rates 446/head
Sage Weil [Fri, 19 Jul 2013 17:37:16 +0000 (10:37 -0700)]
mon/PGMap: avoid negative pg stats when calculating rates

We periodically see strange values come out of the estimated cluster
throughput and recovery rates.  Pretty sure this is caused by feeding
negative values into the rate arithmetic and then giving the si_t
helpers mangled (sign-extended + bit shifted) values.

Signed-off-by: Sage Weil <sage@inktank.com>
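One way a negative delta can turn into a wildly wrong rate is shown in the Python model below. It emulates 64-bit unsigned arithmetic with a mask; the helper names are hypothetical and the actual PGMap code path may differ.

```python
U64_MASK = (1 << 64) - 1


def delta_unsigned(new, old):
    """Model a C-style uint64_t subtraction, which wraps instead of going negative."""
    return (new - old) & U64_MASK


def delta_clamped(new, old):
    """Keep the delta nonnegative before it enters the rate arithmetic."""
    return max(new - old, 0)
```

A counter that drops from 7 to 5 gives `delta_unsigned(5, 7) == 2**64 - 2`, while `delta_clamped(5, 7)` is simply 0.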
12 years agomon/PGMap: use signed values for calculated rates
Sage Weil [Fri, 19 Jul 2013 17:39:17 +0000 (10:39 -0700)]
mon/PGMap: use signed values for calculated rates

si_t (and friends) does not handle signed values, but at least we can
give the Formatters unmangled values.  This shouldn't happen (tm), but
if it does this will make things a bit less confusing and make the code
a bit less fragile.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoReplicatedPG: track temp collection contents, clear during on_change
Samuel Just [Fri, 19 Jul 2013 02:26:02 +0000 (19:26 -0700)]
ReplicatedPG: track temp collection contents, clear during on_change

We also assert in on_flushed() that the temp collection is actually
empty.

Fixes: #5670
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoPG, ReplicatedPG: pass a transaction down to ReplicatedPG::on_change
Samuel Just [Fri, 19 Jul 2013 02:25:14 +0000 (19:25 -0700)]
PG, ReplicatedPG: pass a transaction down to ReplicatedPG::on_change

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoosd: add floor() method to pg/osd stat structs
Sage Weil [Thu, 18 Jul 2013 04:52:50 +0000 (21:52 -0700)]
osd: add floor() method to pg/osd stat structs

We often want to maintain a nonnegative value.  We generalize
this to floors other than zero only because it makes the function
call make intuitive sense; I don't think it is at all useful.

Signed-off-by: Sage Weil <sage@inktank.com>
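A floor() method of the kind described above could be modelled in Python like this. `StatSum` and its two fields are hypothetical placeholders, not the real pg/osd stat structs.

```python
from dataclasses import dataclass, fields


@dataclass
class StatSum:
    # Hypothetical counters standing in for the real stat fields.
    num_bytes: int = 0
    num_objects: int = 0

    def floor(self, f=0):
        """Raise every field that is below f up to f."""
        for fld in fields(self):
            if getattr(self, fld.name) < f:
                setattr(self, fld.name, f)
```

Calling `floor()` with the default of zero is the common case: it keeps every counter nonnegative.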
12 years agoosd: make pool_stat_t *log_size fields signed
Sage Weil [Thu, 18 Jul 2013 04:47:14 +0000 (21:47 -0700)]
osd: make pool_stat_t *log_size fields signed

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon/MonClient: better debugging on version requests
Sage Weil [Fri, 19 Jul 2013 16:59:25 +0000 (09:59 -0700)]
mon/MonClient: better debugging on version requests

From leak hunting, but useful.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsg/Pipe: work around incorrect features reported by earlier versions
Sage Weil [Thu, 18 Jul 2013 23:24:00 +0000 (16:24 -0700)]
msg/Pipe: work around incorrect features reported by earlier versions

If we see a peer reporting features ~0ull, we know they are deluded in a
particular way and should infer what features they *actually* have.  Do
this right when the features come over the wire to catch all users.

Fixes: #5655
Signed-off-by: Samuel Just <sam.just@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
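The workaround described above amounts to a substitution applied as soon as the peer's feature word is read. The Python sketch below illustrates the idea; `ASSUMED_LEGACY_FEATURES` is a made-up placeholder value, not the mask used by msg/Pipe.

```python
ALL_FEATURES = (1 << 64) - 1  # the ~0ull value a confused peer reports

# Hypothetical placeholder for "what such a peer actually supports".
ASSUMED_LEGACY_FEATURES = (1 << 33) - 1


def sanitize_peer_features(reported):
    """Replace the impossible all-ones value with the inferred feature set."""
    if reported == ALL_FEATURES:
        return ASSUMED_LEGACY_FEATURES
    return reported
```

Doing the substitution at the point where the value arrives means every later consumer sees the corrected mask.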
12 years agoMessage,OSD,PG: make Connection::features private
Sage Weil [Fri, 19 Jul 2013 15:08:02 +0000 (08:08 -0700)]
Message,OSD,PG: make Connection::features private

Use has_feature() method too.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agotest: update cli test for radosgw-admin
Yehuda Sadeh [Fri, 19 Jul 2013 14:47:51 +0000 (07:47 -0700)]
test: update cli test for radosgw-admin

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agoMerge pull request #448 from kri5/wip-5416
Yehuda Sadeh [Fri, 19 Jul 2013 14:20:51 +0000 (07:20 -0700)]
Merge pull request #448 from kri5/wip-5416

rgw: Adds --rgw-zone --rgw-region help text.

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agorgw: Adds --rgw-zone --rgw-region help text. 448/head
Christophe Courtaut [Fri, 19 Jul 2013 08:13:51 +0000 (10:13 +0200)]
rgw: Adds --rgw-zone --rgw-region help text.

Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
12 years agomon/MonClient: fix small leak
Sage Weil [Thu, 18 Jul 2013 23:58:50 +0000 (16:58 -0700)]
mon/MonClient: fix small leak

We need to delete the version_req_d here.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #445 from ceph/wip-osd-leaks
Sage Weil [Fri, 19 Jul 2013 01:03:48 +0000 (18:03 -0700)]
Merge pull request #445 from ceph/wip-osd-leaks

fix msgr issues causing osd leaks on shutdown

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoinit-ceph: don't activate-all for vstart clusters
Sage Weil [Fri, 19 Jul 2013 00:10:51 +0000 (17:10 -0700)]
init-ceph: don't activate-all for vstart clusters

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon/PGMonitor: fix 'pg map' output key names
Sage Weil [Thu, 18 Jul 2013 23:53:23 +0000 (16:53 -0700)]
mon/PGMonitor: fix 'pg map' output key names

This got lost in a big file of fixes a while back.  :/

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoPG: add perf counter for peering latency
Samuel Just [Thu, 18 Jul 2013 21:33:37 +0000 (14:33 -0700)]
PG: add perf counter for peering latency

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomsgr: mark addr-based [lazy_]send_message and get_connection deprecated 445/head
Sage Weil [Thu, 18 Jul 2013 22:05:22 +0000 (15:05 -0700)]
msgr: mark addr-based [lazy_]send_message and get_connection deprecated

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoclient: mark_down by con
Sage Weil [Thu, 18 Jul 2013 21:50:32 +0000 (14:50 -0700)]
client: mark_down by con

We have the con handy; use it.  This avoids generating a spurious RESET
event, which we do not need or do anything useful with.  Note that in this
case we are not attaching anything to the Connection priv field.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: mark_down session by con, not addr
Sage Weil [Thu, 18 Jul 2013 21:46:57 +0000 (14:46 -0700)]
mon: mark_down session by con, not addr

We have the ConnectionRef here; use it.  This avoids generating a spurious
RESET event for the connection.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: break con <-> session ref cycle in mon even if shutting down
Sage Weil [Thu, 18 Jul 2013 21:44:17 +0000 (14:44 -0700)]
mon: break con <-> session ref cycle in mon even if shutting down

If we get a reset during shutdown, we should still break the cycle to avoid
tripping the valgrind leak detection.  Note that we are touching no
internal Monitor state here and the locking has not changed.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsg/SimpleMessenger: remove duplicated interface docs
Sage Weil [Thu, 18 Jul 2013 18:28:09 +0000 (11:28 -0700)]
msg/SimpleMessenger: remove duplicated interface docs

Document these in the interface, not the implementation; having two copies
clutters the header and invites them to get out of sync.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsgr: update docs for mark_down, mark_down_all semantics
Sage Weil [Thu, 18 Jul 2013 17:53:04 +0000 (10:53 -0700)]
msgr: update docs for mark_down, mark_down_all semantics

* RESET events
* note that the reset detection only happens if it is enabled in the
  policy.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsgr: generate reset event on mark_down to addr (not con)
Sage Weil [Wed, 17 Jul 2013 05:43:26 +0000 (22:43 -0700)]
msgr: generate reset event on mark_down to addr (not con)

If the caller is marking down an addr, they presumably don't have the
Connection* handy, so we should generate a reset event to help them
clean up con <-> session ref cycles.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd/ReplicatedPG: fix obc leak on invalid LIST_SNAPS op
Sage Weil [Thu, 18 Jul 2013 22:02:07 +0000 (15:02 -0700)]
osd/ReplicatedPG: fix obc leak on invalid LIST_SNAPS op

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: break con <-> session cycle when marking down old peers
Sage Weil [Thu, 18 Jul 2013 22:02:02 +0000 (15:02 -0700)]
osd: break con <-> session cycle when marking down old peers

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: make ms_handle_reset debug more useful
Sage Weil [Thu, 18 Jul 2013 22:01:53 +0000 (15:01 -0700)]
osd: make ms_handle_reset debug more useful

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agocls_lock: fix duration test
Sage Weil [Thu, 18 Jul 2013 21:06:41 +0000 (14:06 -0700)]
cls_lock: fix duration test

It's possible for us to just be really slow when getting the reply to the
first op or doing the second op, resulting in a successful lock.  If we
do get a success, assert that at least that amount of time has passed to
avoid any false positives.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
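The relaxed assertion described above can be sketched in Python. The function name and the explicit timestamps are assumptions for illustration; the real test is written against the cls_lock client API.

```python
def outcome_is_valid(second_lock_ok, started, now, duration):
    """Check that the second lock attempt's result is consistent with the lock duration."""
    if not second_lock_ok:
        # Expected path: the first lock is still held, so we were refused.
        return True
    # A success is legitimate only if the first lock had time to expire.
    return (now - started) >= duration
```

A slow test run that succeeds after the duration has elapsed no longer counts as a failure, while an early success still does.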
12 years agomds: tracedn should be NULL for LOOKUPINO/LOOKUPHASH reply
Yan, Zheng [Thu, 18 Jul 2013 02:01:09 +0000 (10:01 +0800)]
mds: tracedn should be NULL for LOOKUPINO/LOOKUPHASH reply

Fixes: #5658
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoFileStore: add global replay guard for split, collection_rename
Samuel Just [Thu, 18 Jul 2013 17:12:17 +0000 (10:12 -0700)]
FileStore: add global replay guard for split, collection_rename

In the event of a split or collection rename, we need to ensure that
we don't replay any operations on objects within those collections
prior to that point.  Thus, we mark a global replay guard on the
collection after doing a syncfs and make sure to check that in
_check_replay_guard() for all object operations.

Fixes: #5154
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
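The idea of a collection-wide replay guard can be modelled in a few lines of Python. This is a simplified sketch with hypothetical names and plain integer sequence numbers; it is not FileStore's `_check_replay_guard()`.

```python
class Collection:
    def __init__(self):
        self.guard_seq = None  # set once a split or rename has been recorded
        self.applied = []      # operations actually applied during replay


def set_global_replay_guard(coll, seq):
    """Record the journal position of the split or rename on the collection."""
    coll.guard_seq = seq


def replay_op(coll, seq, op):
    """Apply op during replay unless it is at or before the guard position."""
    if coll.guard_seq is not None and seq <= coll.guard_seq:
        return False  # already reflected on disk; replaying would be wrong
    coll.applied.append(op)
    return True
```

After `set_global_replay_guard(c, 10)`, an operation at position 7 is skipped and one at position 11 is applied.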
12 years agomsg/Pipe: do not hold pipe_lock for verify_authorizer()
Sage Weil [Thu, 18 Jul 2013 16:55:43 +0000 (09:55 -0700)]
msg/Pipe: do not hold pipe_lock for verify_authorizer()

We shouldn't hold the pipe_lock while doing the ms_verify_authorizer
upcalls.

Fix by unlocking a bit earlier, and verifying our state is still correct
in the failure path.

This regression was introduced by ecab4bb9513385bd765cca23e4e2fadb7ac4bac2.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years ago mon: fix off-by-one in check for when sync falls behind
Sage Weil [Thu, 18 Jul 2013 04:31:46 +0000 (21:31 -0700)]
mon: fix off-by-one in check for when sync falls behind

This is what e213b1bc25a212ffe42623c1d4b4eadf9f69319e intended to do
but managed to bungle by using >= instead of >.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago Merge pull request #444 from ceph/wip-osd-latency
Sage Weil [Thu, 18 Jul 2013 05:03:07 +0000 (22:03 -0700)]
Merge pull request #444 from ceph/wip-osd-latency

osd: include op queue age histogram in osd_stat_t

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago rgw: drop unused assignment
Sage Weil [Wed, 17 Jul 2013 04:33:28 +0000 (21:33 -0700)]
rgw: drop unused assignment

rgw/rgw_rados.cc: In member function 'virtual int RGWPutObjProcessor_Atomic::handle_data(ceph::bufferlist&, off_t, void**)':
rgw/rgw_rados.cc:648:5: warning: parameter 'ofs' set but not used [-Wunused-but-set-parameter]

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
12 years ago mon: make 'health' warn about slow requests 444/head
Sage Weil [Wed, 17 Jul 2013 22:49:16 +0000 (15:49 -0700)]
mon: make 'health' warn about slow requests

Currently we see slow request warnings go by in the cluster log, but they
are not reflected by 'ceph health'.  Use the new op queue histograms to
raise a flag there as well.

For example:

HEALTH_WARN 59 requests are blocked > 32 sec; 2 osds have slow requests
21 ops are blocked > 65.536 sec
38 ops are blocked > 32.768 sec
16 ops are blocked > 65.536 sec on osd.1
23 ops are blocked > 32.768 sec on osd.1
5 ops are blocked > 65.536 sec on osd.2
15 ops are blocked > 32.768 sec on osd.2
2 osds have slow requests

Fixes: #5505
Signed-off-by: Sage Weil <sage@inktank.com>
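
The summary line in the example above can be reproduced with a small Python sketch. It is not the monitor's implementation: `summarize_slow_ops`, its `per_osd` input shape (OSD name mapped to bucket lower bound in seconds and op count), and the `warn_sec` threshold are assumptions made for illustration.

```python
def summarize_slow_ops(per_osd, warn_sec=32.0):
    """Build a one-line health summary from per-OSD op age buckets.

    per_osd maps an OSD name to {bucket lower bound in seconds: op count}.
    """
    total_by_bucket = {}
    slow_osds = []
    for osd, buckets in sorted(per_osd.items()):
        # Keep only buckets at or beyond the warning threshold.
        slow = {b: n for b, n in buckets.items() if b >= warn_sec and n > 0}
        if slow:
            slow_osds.append(osd)
        for b, n in slow.items():
            total_by_bucket[b] = total_by_bucket.get(b, 0) + n
    total = sum(total_by_bucket.values())
    if total == 0:
        return "HEALTH_OK"
    return (f"HEALTH_WARN {total} requests are blocked > {warn_sec:g} sec; "
            f"{len(slow_osds)} osds have slow requests")
```

Fed the per-OSD counts from the example (16 and 23 ops on osd.1, 5 and 15 on osd.2), the sketch yields the same 59-request total.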
12 years ago osd: include op queue age histogram in osd_stat_t
Sage Weil [Wed, 17 Jul 2013 21:21:40 +0000 (14:21 -0700)]
osd: include op queue age histogram in osd_stat_t

This includes a simple power-of-2 histogram of op ages in the op queue
inside osd_stat_t.  This can be used for a coarse view of overall cluster
performance (it will get summed by the mon), to identify specific outlier
osds that have a higher latency than the others, or to identify stuck ops.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
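
A power-of-2 histogram of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the structure stored in osd_stat_t: `Pow2Histogram` is a hypothetical name, ages are taken to be whole milliseconds, and bucket `i` counts values whose bit length is `i` (so bucket 0 holds 0, bucket 1 holds 1, bucket 2 holds 2 to 3, bucket 3 holds 4 to 7, and so on).

```python
class Pow2Histogram:
    """Coarse histogram whose bucket widths double at each step."""

    def __init__(self):
        self.buckets = []

    def add(self, age_ms):
        if age_ms < 0:
            raise ValueError("age must be non-negative")
        idx = int(age_ms).bit_length()  # bucket index grows with log2(age)
        if idx >= len(self.buckets):
            self.buckets.extend([0] * (idx + 1 - len(self.buckets)))
        self.buckets[idx] += 1

    def merge(self, other):
        """Sum another histogram into this one, bucket by bucket."""
        if len(other.buckets) > len(self.buckets):
            self.buckets.extend([0] * (len(other.buckets) - len(self.buckets)))
        for i, n in enumerate(other.buckets):
            self.buckets[i] += n
```

Because merging is a bucket-wise sum, per-OSD histograms can be combined into a cluster-wide view, and a single OSD with counts in unusually high buckets stands out as an outlier.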
12 years ago qa/workunits/cephtool/test.sh: test 'osd create <uuid>'
Sage Weil [Thu, 18 Jul 2013 01:17:29 +0000 (18:17 -0700)]
qa/workunits/cephtool/test.sh: test 'osd create <uuid>'

Make sure it gives us back the same id.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years ago PG: start flush on primary only after we process the master log
Samuel Just [Wed, 17 Jul 2013 22:04:10 +0000 (15:04 -0700)]
PG: start flush on primary only after we process the master log

Once we start serving reads, stray objects must have already
been removed.  Therefore, we have to flush all operations
up to the transaction writing out the authoritative log.
On replicas, we flush in Stray() if we will not eventually
be activated and in ReplicaActive if we are in the acting
set.  This way a replica won't serve a replica read until
the store is consistent.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago ReplicatedPG: replace clean_up_local with a debug check
Samuel Just [Wed, 17 Jul 2013 19:51:19 +0000 (12:51 -0700)]
ReplicatedPG: replace clean_up_local with a debug check

Stray objects should have been cleaned up in the merge_log
transactions.  Only on the primary have those operations
necessarily been flushed at activate().

Fixes: #5084
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago msgr: fix a typo/goto-cross from dd4addef2d
Greg Farnum [Wed, 17 Jul 2013 22:23:12 +0000 (15:23 -0700)]
msgr: fix a typo/goto-cross from dd4addef2d

We didn't build or review carefully enough!

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago Merge pull request #441 from ceph/wip-5626
Sage Weil [Wed, 17 Jul 2013 21:50:41 +0000 (14:50 -0700)]
Merge pull request #441 from ceph/wip-5626

msgr fixes for lossless peer sessions

Reviewed-by: Greg Farnum <greg@inktank.com>