Samuel Just [Wed, 30 May 2012 23:01:38 +0000 (16:01 -0700)]
OSD,FileStore: clean up filestore conversion
Previously, we messed with the filestore_update_collections config
option to enable upgrades in the filestore. We now pass that in as a
parameter to the FileStore and IndexManager constructors.
Further, the user must now specify the version to which to update in
order to prevent accidental updates.
Samuel Just [Wed, 30 May 2012 00:08:45 +0000 (17:08 -0700)]
ReplicatedPG: adjust missing at push_start
When we start receiving an object, we remove the old copy. This will
prevent the primary from using that old copy after that point.
We do the same on the pushee.
Samuel Just [Sat, 26 May 2012 02:18:41 +0000 (19:18 -0700)]
DBObjectMap: restructure for unique hobject_t's
Previously, the ObjectStore operated in terms of (coll_t,hobject_t)
tuples. Now that hobject_t's are globally unique within the
ObjectStore, it is no longer necessary to support multiple names for the
same DBObjectMap node.
Signed-off-by: Samuel Just <sam.just@dreamhost.com>
Samuel Just [Fri, 25 May 2012 22:06:55 +0000 (15:06 -0700)]
FileStore,DBObjectMap: remove ObjectMap link method
hobject_t's are now globally unique in filestore. Essentially, there is
a 1-to-1 mapping from inodes to hobject_t's. The entry in the
DBObjectMap is now tied to the inode/hobject_t. Thus, links needn't be
tracked. Rather, we delete the ObjectMap entry when nlink == 0.
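The rule above can be illustrated with a toy model. This is a sketch in Python, not the FileStore code: ToyStore and its methods are hypothetical names, and the link count is tracked in a dict instead of being read from the inode.

```python
class ToyStore:
    """Toy model: one omap entry per object, dropped when nlink reaches 0."""

    def __init__(self):
        self.nlink = {}  # object id -> number of links to it
        self.omap = {}   # object id -> omap key/value pairs

    def create(self, oid, values=None):
        self.nlink[oid] = 1
        self.omap[oid] = dict(values or {})

    def link(self, oid):
        # Adding a link needs no omap bookkeeping: the entry is tied
        # to the object itself, not to each of its names.
        self.nlink[oid] += 1

    def unlink(self, oid):
        self.nlink[oid] -= 1
        if self.nlink[oid] == 0:
            # Last link gone: delete the omap entry with the object.
            del self.nlink[oid]
            self.omap.pop(oid, None)
```

The point of the sketch is that link() never touches the omap; only the final unlink() does.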
Samuel Just [Thu, 24 May 2012 17:57:22 +0000 (10:57 -0700)]
src/: Add namespace and pool fields to hobject_t
From this point, hobjects in the ObjectStore will be globally unique. This
will allow us to avoid including the collection in the ObjectMap key encoding
and thereby enable efficient collection renames and, eventually, collection
splits.
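A minimal sketch of why a collection-free key makes renames cheap, written in Python for illustration. HObject, omap_key, ToyStore, and the key layout are all assumptions made for this example; they do not reproduce the real hobject_t fields or the DBObjectMap encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HObject:
    """Illustrative stand-in for hobject_t with pool and namespace fields."""
    pool: int
    nspace: str
    name: str
    hash: int


def omap_key(obj):
    # Hypothetical encoding: the collection is deliberately absent, because
    # pool + namespace + hash + name already identify the object.
    return f"{obj.pool}/{obj.nspace}/{obj.hash:08X}/{obj.name}"


class ToyStore:
    def __init__(self):
        self.collections = {}  # collection name -> set of HObject
        self.omap = {}         # omap_key(obj) -> key/value pairs

    def add(self, coll, obj, values):
        self.collections.setdefault(coll, set()).add(obj)
        self.omap[omap_key(obj)] = dict(values)

    def rename_collection(self, old, new):
        # Only the collection name changes; no omap key is rewritten.
        self.collections[new] = self.collections.pop(old)
```

If the collection were part of the key, rename_collection would have to rewrite one omap key per object in the collection.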
Sage Weil [Sat, 2 Jun 2012 22:19:28 +0000 (15:19 -0700)]
upstart: simplify start; allow group stop via an abstract job
Use a 'ceph-mds' or 'ceph-mon' event to start instances instead of
explicitly calling start. This avoids the ugly is-this-already-running
check. [Thanks Guilhem Lettron for that!]
Make the -all job abstract (which means it stays started and can be
stopped). Trigger a helper task (-all-starter) to trigger instance
start. Make instances stop with the -all task. This allows you to do
Samuel Just [Fri, 1 Jun 2012 22:39:41 +0000 (15:39 -0700)]
ReplicatedPG: fix pgls listing, add max listing size
Previously, a client requesting a large pgls could tie up the
osd for an unacceptable amount of time. Also, the osd may
return fewer than the requested number of entries anyway, so
a short reply does not mean the listing is finished. We now
return 1 when we have completed the listing.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 1 Jun 2012 22:44:51 +0000 (15:44 -0700)]
objecter: fix pgls
First problem: if the osd returns more entries than we ask for, max_entries
was going negative, and we were requesting (u64)(-small number) on the
next iteration, slamming the OSD when the PG was big. We fix that by
finishing when response_size >= max_entries.
Second problem: AFAICS we were not requesting the second chunk on a large
PG at all, here, if the OSD returned less than what we wanted. Fix this
by asking for more in that case.
That means we detect the end of a PG in two ways:
* if the OSD sets the return value to 1 (instead of 0)
* if we get 0 items in the response
Another patch will change the OSD behavior to return 1, and all will be
well. If we run against an old OSD, we'll send an extra request for each
PG and get nothing back before we realize we've hit the end and move on.
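The client-side loop described above can be sketched as follows. This is an illustration in Python, not the Objecter code: list_pg, FakeOSD, and the pgls() signature are hypothetical, and the per-reply cap in FakeOSD is an assumption used to force short replies.

```python
def list_pg(osd, max_entries):
    """Collect up to max_entries names from one PG (illustrative only)."""
    results = []
    cursor = 0
    while True:
        remaining = max_entries - len(results)
        if remaining <= 0:
            # Finish once we have enough; never send a negative count,
            # which would wrap to a huge unsigned value.
            break
        rval, entries, cursor = osd.pgls(cursor, remaining)
        results.extend(entries)
        if rval == 1 or not entries:
            # End of PG: a newer OSD returns 1, an older OSD is detected
            # by an empty reply to one extra request.
            break
        # Otherwise the reply was short; ask for the next chunk.
    return results


class FakeOSD:
    """Hypothetical OSD that caps each reply at `chunk` entries."""

    def __init__(self, names, chunk, signals_end):
        self.names = names
        self.chunk = chunk
        self.signals_end = signals_end  # True: return 1 at the end of the PG
        self.requests = 0

    def pgls(self, cursor, count):
        self.requests += 1
        end = min(cursor + min(count, self.chunk), len(self.names))
        done = end == len(self.names)
        rval = 1 if (done and self.signals_end) else 0
        return rval, self.names[cursor:end], end
```

With ten objects and a cap of four per reply, an OSD that returns 1 is drained in three requests, while one that always returns 0 needs a fourth, empty reply before the client moves on.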
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
Sage Weil [Fri, 1 Jun 2012 20:54:28 +0000 (13:54 -0700)]
mon: fix slurp latest race
It is possible for the latest version to get out in front of the
last_committed version:
a- start slurping
a- slurp a bunch of states, through X
a- get them back, write them out
b- monitor commits many new states
a- slurp latest, X+100 say, but only get some of those states due to the
slurp per-message byte limit
a- write latest + some (but not all) prior states
a- call back into slurp(), update_from_paxos(), trigger assert
This fix ensures that we make note of the source's new latest, so that on
the next pass through slurp() we will grab any missing states.
We *also* explicitly require that we get everything up through what we have
stashed, in defense against some future kludging that might only require that
we be nearly (but not completely) in sync before finishing the slurp.
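A toy model of the fixed loop, in Python. It is a sketch under stated assumptions rather than the monitor code: slurp(), FakeSource, and fetch() are hypothetical, the per-message byte limit is modelled as a cap on the number of states per reply, and new commits are injected between requests.

```python
def slurp(source, limit):
    """Fetch states until we hold everything up to the stashed latest."""
    last_committed = 0
    stashed_latest = 0
    passes = 0
    while True:
        passes += 1
        states, source_latest = source.fetch(last_committed + 1, limit)
        if states:
            last_committed = states[-1]
        # Note the source's new latest so the next pass grabs what is missing.
        stashed_latest = max(stashed_latest, source_latest)
        # Require everything through what we stashed, not merely "close".
        if last_committed >= stashed_latest:
            return last_committed, passes


class FakeSource:
    """Hypothetical peer whose latest version can advance between requests."""

    def __init__(self, latest, growth):
        self.latest = latest
        self.growth = list(growth)  # commits that land before each request

    def fetch(self, first, limit):
        if self.growth:
            self.latest += self.growth.pop(0)
        last = min(self.latest, first + limit - 1)
        return list(range(first, last + 1)), self.latest
```

In this model, a peer that jumps from version 10 to 110 while replies are capped at 50 states takes three passes to drain, and the loop does not finish while a gap remains.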
Fixes: #2379
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Sage Weil [Thu, 31 May 2012 03:28:51 +0000 (20:28 -0700)]
admin_socket: pass args separately
This avoids making the callbacks parse off the command portion on their own.
It also lets them assert that the command portion is in the set of
registered commands.
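A minimal sketch of the idea in Python. ToyAdminSocket, register_command, and dispatch are hypothetical names chosen for this illustration and are not the admin socket's actual interface; matching on the longest registered prefix is also an assumption.

```python
class ToyAdminSocket:
    """Splits a request into (command, args) before calling the hook."""

    def __init__(self):
        self.hooks = {}  # registered command -> callback(command, args)

    def register_command(self, command, hook):
        if command in self.hooks:
            raise ValueError(f"already registered: {command}")
        self.hooks[command] = hook

    def dispatch(self, line):
        # Try longer commands first so "perf schema" wins over "perf".
        for command in sorted(self.hooks, key=len, reverse=True):
            if line == command or line.startswith(command + " "):
                args = line[len(command):].strip()
                return self.hooks[command](command, args)
        raise KeyError(f"unknown command: {line}")


def echo_hook(command, args):
    # The hook receives the command and its args separately, so it can
    # check the command against what it registered without parsing.
    assert command in ("perf", "perf schema")
    return command, args
```

For example, after registering both "perf" and "perf schema", dispatch("perf schema osd") calls the hook with command "perf schema" and args "osd".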
Yehuda Sadeh [Wed, 30 May 2012 22:40:39 +0000 (15:40 -0700)]
rgw: put_bucket_info does not override attrs
This fixes #2487. When writing bucket info we wrote
only the object content, which overwrote any attrs
the object already contained (that is, it corrupted
the ACLs).
Sage Weil [Tue, 29 May 2012 18:05:51 +0000 (11:05 -0700)]
admin_socket: initialize explicitly on startup; disallow changes
There is an annoying dependency between the config lock and the admin
socket lock due to the fact that we initialize (or reinitialize) the socket
via a config observer.
Instead, explicitly initialize on startup. Do not allow the admin socket
location to be changed at runtime.