From: Greg Farnum
Date: Tue, 27 Aug 2013 22:16:29 +0000 (-0700)
Subject: doc: include plan for new user_version support
X-Git-Tag: v0.69~40^2~16
X-Git-Url: http://git-server-git.apps.pok.os.sepia.ceph.com/?a=commitdiff_plain;h=295a84b9d947cf257479e429d7ab01f3dcbd0197;p=ceph.git

doc: include plan for new user_version support

Signed-off-by: Greg Farnum
---

diff --git a/doc/dev/versions.rst b/doc/dev/versions.rst
index 0de563a4bf1..ae383873106 100644
--- a/doc/dev/versions.rst
+++ b/doc/dev/versions.rst
@@ -44,3 +44,49 @@ but in unusual circumstances it might be different.
 So far no users expect that version to have any relationship to the
 reassert_version, though; they just want get_current_version() to be
 monotonically increasing.
+
+Plan for new user_version support
+=================================
+
+In order to properly support caching pools, we would like to separate
+the user_version from the underlying pg log versions. This is
+necessary to support moving objects back and forth between different
+pools while still maintaining the version semantics we have previously
+provided. Obviously old clients won't be able to handle the caching
+pools, so we don't need to worry about them when we do these. But we
+*do* need them to interoperate with new clients on non-caching
+pools. This obviously means:
+
+*) new and old clients must use and spread the same version on
+operations in normal pools.
+
+However, the old clients have only one version (the reassert_version),
+which they must use both for replay and as the version they spread
+(eg, in notifies). That implies:
+1) we cannot fix the conflict in reassert_version on watch ops for old
+clients.
+2) the reassert version must match the user version seen by new
+clients for all ops on normal pools
+ 2b) the user version and the replay version must be the same for ops
+ on normal pools, except on watch ops
+3) we need to keep track of the old-style user version and return that
+in the reassert_version from the OSD
+
+Conclusions:
+We add two new fields to MOSDOpReply: user_version and
+replay_version. The old replay_version becomes bad_replay_version (or
+similar) and stays in the same place it used to be in the encoding for
+old clients to use. New clients reference replay_version and
+user_version as appropriate. The OSD updates user_version on all
+user-visible writes (ie, non-watch write ops) to the max of
+at_version and user_version+1, and fills in the bad_replay_version
+based on the user_version instead of the OpContext::at_version. For
+normal pools, the at_version will always be the higher value and so
+old clients see the same behavior they always have. New clients,
+seeing a user_version matching the bad_replay_version which the old
+clients are consuming, interoperate successfully on normal pools.
+On caching pools, we don't need to worry about old clients, and new
+clients are happy because our MAX algorithm ensures they never see
+versions go backwards. Hurray!
+
+
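The update rule in the Conclusions paragraph can be illustrated with a small sketch. It is not part of the commit and is not Ceph code: `ObjectState`, `Reply`, and `apply_op` are hypothetical names, versions are modeled as plain integers rather than the OSD's real version types, and filling `bad_replay_version` from the unchanged `user_version` on watch ops is one reading of the plan, not something the text spells out.

```python
from dataclasses import dataclass


@dataclass
class ObjectState:
    """Hypothetical per-object state; only user_version comes from the plan."""
    user_version: int = 0


@dataclass
class Reply:
    """Hypothetical stand-in for the three version fields of MOSDOpReply."""
    user_version: int
    replay_version: int
    bad_replay_version: int


def apply_op(obj: ObjectState, at_version: int, is_watch: bool = False) -> Reply:
    """Apply one write op that the PG log records at at_version.

    Assumption: at_version is a plain int here; the real value is a
    richer type carried in OpContext::at_version.
    """
    if not is_watch:
        # User-visible write: take the max of at_version and user_version+1,
        # so the user-facing version never moves backwards.
        obj.user_version = max(at_version, obj.user_version + 1)
    return Reply(
        user_version=obj.user_version,
        # New clients replay against the PG log version.
        replay_version=at_version,
        # Old clients read this field; it is filled from user_version
        # instead of at_version (assumed to hold for watch ops as well).
        bad_replay_version=obj.user_version,
    )


obj = ObjectState()
first = apply_op(obj, at_version=5)   # normal pool: at_version is the max
second = apply_op(obj, at_version=9)  # normal pool again
moved = apply_op(obj, at_version=3)   # object now lives in a PG with a lower log version
```

In this sketch `first` and `second` report a `user_version` equal to `at_version`, which is the normal-pool case where old and new clients see the same number. For `moved`, the log version drops to 3 but `user_version` advances to 10, which is the caching-pool case the MAX rule is meant to cover.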