Samuel Just [Mon, 29 Jul 2013 16:36:04 +0000 (09:36 -0700)]
OSD: suspend tp timeout while taking pg lock in OpWQ
If N op_tp threads are configured, and recovery_max_active
is set to a sufficiently large number, all N op_tp threads
might grab a MOSDPGPush op off of the queue for the same PG.
The last thread to get the lock will have waited
N*time_to_handle_push before completing its item and pinging
the heartbeat timeout. If that time exceeds the timeout
and there are enough ops waiting, each thread subsequently
will end up exceeding the timeout before completing an
item, preventing the OSD from heartbeating indefinitely.
We prevent this by suspending the timeout while we try to
get the PG lock. Even if we do block for an excessive
period of time attempting to get the lock, hopefully,
the thread holding the lock will cause the threadpool
to time out.
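The idea above can be sketched roughly as follows. This is a minimal illustration in Python, not the OSD code: the Heartbeat class, its grace period, and process_item() are hypothetical names chosen for the example.

```python
import threading
import time


class Heartbeat:
    """Hypothetical per-thread watchdog: healthy while now < deadline."""

    def __init__(self, grace):
        self.grace = grace
        self.deadline = time.monotonic() + grace

    def reset(self):
        # Ping: push the deadline out by one grace period.
        self.deadline = time.monotonic() + self.grace

    def suspend(self):
        # No deadline while suspended, so waiting cannot trip the watchdog.
        self.deadline = None

    def is_healthy(self, now=None):
        if self.deadline is None:
            return True
        now = time.monotonic() if now is None else now
        return now < self.deadline


def process_item(hb, pg_lock, handle):
    """Handle one queued item, suspending the timeout while waiting for the lock."""
    hb.suspend()
    pg_lock.acquire()
    try:
        hb.reset()  # the clock starts only once we hold the lock
        handle()
    finally:
        pg_lock.release()
        hb.reset()
```

With this shape, only the thread that actually holds the lock is subject to the timeout, so a slow lock holder can still be detected.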
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Danny Al-Gaaf [Sun, 28 Jul 2013 21:25:58 +0000 (23:25 +0200)]
ceph_authtool.cc: update help/usage text
Added implemented but not listed commands to the help/usage text:
* -g shortcut for --gen-key
* -a shortcut for --add-key
* -u/--set-uid to set auid
* --gen-print-key
* --import-keyring
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Dan Mick [Sat, 27 Jul 2013 00:47:32 +0000 (17:47 -0700)]
ceph-rest-api: clean up options/environment
ceph-rest-api:
* create app from wrapper by calling generate_app()
* pass args to generate_app() (early parsed in wrapper)
* parse -i/--id here as well
* set addr:port on returned app object
* handle only EnvironmentError exceptions; let others spew traceback
* turn off debug when running singlethreaded server
ceph_rest_api.py:
* put glob.* on app.ceph_* instead; pass around app in init code
* drop conf parsing (let librados do its job)
Documentation updated to match.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
CID 1058391 (#1 of 1): Out-of-bounds access (OVERRUN)
32. alloc_strlen: Allocating insufficient memory for the terminating null of the string.
CID 1058390 (#1 of 1): Unchecked return value from library (CHECKED_RETURN)
13. check_return: Calling function "this->class_handler->open_all_classes()" without checking return value. It wraps a library function that may fail and return an error code.
14. unchecked_value: No check of the return value of "this->class_handler->open_all_classes()".
Dan Mick [Tue, 23 Jul 2013 07:50:15 +0000 (00:50 -0700)]
ceph_rest_api.py: obtain and handle tell <osd-or-pgid> commands
Contact an OSD that's up to get a list of the commands, and use
them to add to the URL map.
Special treatment throughout for these commands:
* hack the help signature dump
* keep a 'flavor' per command to allow special handler() processing
* strip off 'tell/<target>' when constructing command
* allow multiple dicts with the same url
(the parameters and get/put methods can change)
* because of above, method must be validated in handler()
* validate the given OSD
* calculate target for command (mon, osd, pg)
Unrelated: make method_dict into global METHOD_DICT
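A rough sketch of two of the points above: several command dicts sharing one URL with the method validated in the handler, and stripping 'tell/<target>' when building the command. The names URL_MAP, add_command(), handler(), and command_prefix() are hypothetical and the HTTP status codes are an assumption; this is not the ceph_rest_api.py code.

```python
from collections import defaultdict

# Hypothetical URL map: each URL may carry several command dicts.
URL_MAP = defaultdict(list)


def add_command(url, methods, flavor):
    """Register one command dict under a URL; duplicates of the URL are allowed."""
    URL_MAP[url].append({"methods": methods, "flavor": flavor})


def handler(url, method):
    """Pick the command dict whose methods include the request method."""
    candidates = URL_MAP.get(url)
    if not candidates:
        return 404, None
    for cmd in candidates:
        if method in cmd["methods"]:
            return 200, cmd
    # The URL exists, but no dict accepts this method.
    return 405, None


def command_prefix(url_path):
    """Turn a URL path into a command prefix, dropping 'tell/<target>' if present."""
    parts = url_path.strip("/").split("/")
    if parts[0] == "tell" and len(parts) > 2:
        parts = parts[2:]
    return " ".join(parts)
```

Because the URL alone no longer identifies a single command, the method check has to happen inside the handler instead of at routing time.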
Sage Weil [Fri, 26 Jul 2013 22:25:12 +0000 (15:25 -0700)]
mon/PGMonitor: reset in-core PGMap if on-disk format changes
We might have a sequence like:
- start mon, load pgmap 100
- sync
  - including a format upgrade at, say, v 150
- refresh
  - see format_version==1, and try to read pgmap:101 as new format
This simply clears our in-memory state if we see that the format has
changed. That will make update_from_paxos reload the latest and prevent
it from walking through the old and useless inc updates.
Note: this does not affect the auth monitor because we unconditionally
load the latest map in update_from_paxos on upgrade. Also, the upgrade
there wasn't a format change--just a translation of cap strings from the
old to new style.
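The behaviour described above can be sketched as follows. This is a simplified Python model with hypothetical names (InCoreMap, update_from_store, and a dict standing in for the on-disk store), not the PGMonitor code.

```python
class InCoreMap:
    """Hypothetical in-memory map that is normally advanced by incrementals."""

    def __init__(self, version=0, format_version=0):
        self.version = version
        self.format_version = format_version

    def update_from_store(self, store):
        # If the on-disk format changed, drop the in-memory state so the
        # incremental path is skipped and the latest full map is loaded.
        if store["format_version"] != self.format_version:
            self.version = 0
            self.format_version = store["format_version"]

        if self.version == 0:
            # Nothing usable in memory: load the latest full map.
            self.version = store["last_committed"]
            return "full"

        # Otherwise walk forward one incremental at a time.
        applied = 0
        while self.version < store["last_committed"]:
            self.version += 1
            applied += 1
        return "incremental:%d" % applied
```

In the sequence above, a map loaded at version 100 in the old format would take the "full" path after the sync instead of trying to read incremental 101 in the new format.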
Fixes: #5764
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Danny Al-Gaaf [Fri, 26 Jul 2013 21:28:44 +0000 (23:28 +0200)]
rgw/rgw_metadata.cc: delete md_log (RGWMetadataLog) in destructor
Call delete on md_log in the destructor.
CID 1054826 (#1 of 1): Resource leak in object (CTOR_DTOR_LEAK)
1. alloc_new: Allocating memory by calling "new RGWMetadataLog(_cct, _store)".
2. var_assign: Assigning: "this->md_log" = "new RGWMetadataLog(_cct, _store)".
3. ctor_dtor_leak: The constructor allocates field "md_log" of
"RGWMetadataManager" but the destructor and whatever functions it calls
do not free it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Sage Weil [Fri, 26 Jul 2013 20:58:46 +0000 (13:58 -0700)]
osd: load all classes on startup
This avoids creating a wide window between when ceph-osd is started and
when a request arrives needing a class and it is loaded. In particular,
upgrading the packages in that window may cause linkage errors (if the
class API has changed, for example).
Fixes: #5752
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Dan Mick [Wed, 24 Jul 2013 00:23:50 +0000 (17:23 -0700)]
Formatter, admin_socket: make default formatter be json-pretty
If not given, default to json-pretty; if given but not equal to one
of the formatter choices, return NULL as before. Remove defaulting
code in admin_socket.cc in favor of this.
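A minimal Python sketch of that defaulting rule. The FORMATTERS table and its two entries are illustrative assumptions; the real set of formatter choices may differ.

```python
import json

# Hypothetical formatter table for the example.
FORMATTERS = {
    "json": lambda obj: json.dumps(obj),
    "json-pretty": lambda obj: json.dumps(obj, indent=4),
}


def new_formatter(fmt=None):
    """Return a formatter; default to json-pretty, None for unknown names."""
    if not fmt:
        fmt = "json-pretty"
    # dict.get returns None when the name is not a known formatter.
    return FORMATTERS.get(fmt)
```

Keeping the default inside new_formatter() means callers such as the admin socket do not each need their own fallback.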
Dan Mick [Thu, 18 Jul 2013 21:38:33 +0000 (14:38 -0700)]
AdminSocket users: use generic formatting
All call() routines get a format parameter; all places where
JSONFormatter was created get a new_formatter() instead.
'plain' formatting is unsupported, and help is forced to be
'json-pretty' as it was.
Dan Mick [Fri, 26 Jul 2013 02:40:26 +0000 (19:40 -0700)]
ceph_rest_api.py: return error in nonformatted mode
When a nonformatted request is made, currently the only text in the
response is the (probably empty) response buffer. Add the statusmsg
as well, where the error is likely to be explained. This lets
the http client get a clue what happened.
Sage Weil [Thu, 25 Jul 2013 18:10:53 +0000 (11:10 -0700)]
mon/Paxos: share uncommitted value when leader is/was behind
If the leader has an older lc than we do, and we are sharing states to
bring them up to date, we still want to also share our uncommitted value.
This particular case was broken by b26b7f6e, which was only contemplating
the case where the leader was ahead of us or at the same point as us, but
not the case where the leader was behind. Note that the call to
share_state() a few lines up will bring them fully up to date, so
after they receive and store_state() for this message they will be at the
same lc as we are.
Fixes: #5750
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
rgw: expose the version of synced items to the poster
To support this, we add an optional out argument to
RGWMetadataManager::put() and fill in the read_version. When the
function returns, that contains whatever the current on-disk version
of the object is (either what already existed or what we just wrote).
Add new STATUS_APPLIED, then specify the RGWX_UPDATE_STATUS header
based on that return code when doing metadata puts.
Add a send_response() function to RGWOp_Metadata_Put in order to
support sending back our new headers. Move the translation from
STATUS_NO_APPLY from set_req_state_err() to this function, so we
can turn different sync results into failures if necessary elsewhere.
rgw: add preliminary support for sync update policies on metadata sync
We want to be able to conditionally apply new updates:
1) If we already have a newer version than the sync is applying for some
reason (replay of logs?), we don't want to go back in time.
2) If both zones were active at the same time, then we'd like to be
able to do a merge based on timestamps.
In order to support this, we add a sync_type flag to the implementations of
RGWMetadataHandler::put, and then check the version or the mtime of the
incoming put against what we have on disk, and refuse the update if needed.
We return the 204 NoContent success code when refusing sync; for the
moment the conversion is automatic but we're going to pull it out in
the next couple commits.
This commit does not complete the feature: we don't provide an interface
for specifying a different sync protocol.
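The conditional-apply check described above might look roughly like this. SyncType, should_apply(), put(), and the dict-based store are hypothetical; only the 204 return for a refused update comes from the description, and the 200 for an applied update is an assumption of the sketch.

```python
from enum import Enum


class SyncType(Enum):
    """Hypothetical sync policies for the example."""
    ALWAYS = 1
    IF_NEWER_VERSION = 2
    IF_NEWER_MTIME = 3


def should_apply(sync_type, incoming, on_disk):
    """Decide whether an incoming put may replace what is on disk."""
    if on_disk is None:
        return True  # nothing stored yet, so there is nothing to protect
    if sync_type is SyncType.IF_NEWER_VERSION:
        return incoming["version"] > on_disk["version"]
    if sync_type is SyncType.IF_NEWER_MTIME:
        return incoming["mtime"] > on_disk["mtime"]
    return True


def put(store, key, incoming, sync_type):
    """Apply the update or refuse it; a refusal is reported as 204."""
    if not should_apply(sync_type, incoming, store.get(key)):
        return 204
    store[key] = incoming
    return 200
```

The version policy covers case 1 (not going back in time on a log replay), and the mtime policy covers case 2 (merging by timestamp).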
Also increase fd limit defaults to accommodate the larger number
of fds.
Fixes: #5692
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Mark Nelson <mark.nelson@inktank.com>
Samuel Just [Tue, 23 Jul 2013 20:51:26 +0000 (13:51 -0700)]
FileStore::_collection_rename: fix global replay guard
If the operation is being replayed, we might have already
performed the rename; if so, skip it. Also, we must set the
collection replay guard only after we have done the
rename.
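The ordering constraint can be illustrated with a small Python model. This is a sketch under assumptions: collections are keys in a dict, guards is a separate dict of sequence numbers, and collection_rename() is a hypothetical helper, not the FileStore code.

```python
def collection_rename(fs, guards, src, dst, seq):
    """Rename src to dst at sequence number seq; safe to call again on replay."""
    # Guard check: an equal or newer sequence number is already recorded for
    # the destination, so this operation was applied before. Skip it.
    if guards.get(dst, -1) >= seq:
        return False

    # On replay the rename itself may already have happened, in which case
    # the source no longer exists and there is nothing to move.
    if src in fs:
        fs[dst] = fs.pop(src)

    # Record the guard only after the rename is done. Recording it first and
    # then failing before the rename would make the replay skip the rename.
    guards[dst] = seq
    return True
```

Calling the function a second time with the same arguments leaves both the collections and the guards unchanged.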
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Danny Al-Gaaf [Thu, 25 Jul 2013 17:12:44 +0000 (19:12 +0200)]
rgw/rgw_metadata.h: init prefix in initialization list
For performance reasons: init 'prefix' with META_LOG_OBJ_PREFIX
in the initialization list of RGWMetadataLog instead of assigning
the value in the constructor body.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>