Sage Weil [Tue, 2 Apr 2013 20:04:48 +0000 (13:04 -0700)]
mds: initialize tableservers/clients on mds creation
The handle_mds_recovery(who) path initializes the anchorclients by having
the server send a 'ready' message on recovery when the server is active
and a peer becomes active. Similarly, recovery_done() does the same when
the server becomes active. However, this misses the creation path. Handle
that explicitly in boot_create.
Fixes: #4619
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 2 Apr 2013 15:58:35 +0000 (08:58 -0700)]
mds: trigger tableserver active/recovery hook even for self
The tableserver now sends a READY message to clients when they go active;
we need to do this even for our own local tableclients, or else they do
not initialize and hang on first use after bringing up a fresh cluster.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Danny Al-Gaaf [Tue, 2 Apr 2013 13:43:12 +0000 (15:43 +0200)]
rgw/rgw_cors.cc: fix inefficient usage of string::find()
Fix warning from cppcheck:
[src/rgw/rgw_cors.cc:70]: (performance) Inefficient usage of
string::find() in condition; string::compare() would be faster.
Instead of string::find() use boost::algorithm::starts_with().
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
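To illustrate the cppcheck warning above, here is a minimal sketch of the two ways to test a prefix. It is not the code from rgw_cors.cc: the helper names are hypothetical, and it uses std::string::compare() (the alternative cppcheck names) rather than boost::algorithm::starts_with() so that it builds without Boost.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration only; not the rgw_cors.cc code.

// find() may scan the whole string looking for a match anywhere,
// even though only a match at position 0 is of interest.
bool starts_with_find(const std::string& s, const std::string& prefix) {
  return s.find(prefix) == 0;
}

// compare() looks only at the first prefix.size() characters of s.
bool starts_with_compare(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Both forms give the same answer; only the amount of scanning differs.
bool origin_matches(const std::string& origin, const std::string& prefix) {
  const bool result = starts_with_compare(origin, prefix);
  assert(result == starts_with_find(origin, prefix));  // debug-build sanity check
  return result;
}
```

When the prefix is absent, the find() form walks the entire string before reporting failure, which is the inefficiency the warning points at.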
caleb miles [Mon, 25 Mar 2013 15:46:34 +0000 (11:46 -0400)]
rgw: Create RESTful endpoint for user and bucket administration.
Expose the following operations through a RESTful endpoint:
user create
user modify
user remove
subuser create
subuser modify
subuser remove
key create
key remove
bucket list
bucket stats
bucket link
bucket unlink
bucket check
bucket remove
remove object
building on the existing /{admin} endpoint.
Signed-off-by: caleb miles <caleb.miles@inktank.com>
This is a quick workaround for the next branch. A more complete fix
will be done for the master branch. This does not affect correctness,
just what qa runs with lockdep enabled do.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
Samuel Just [Tue, 26 Mar 2013 20:08:29 +0000 (13:08 -0700)]
PG: make _select_auth_object smarter
Previously, we just picked the first one to have the object in
question. Now, we will attempt to choose one that has as
much of the following as possible:
1) has the object (there must be one)
2) has an object_info attr
3) has a valid object_info attr
4) has an object_info whose size matches the scrubbed size
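The ordered preference above can be sketched as a score per replica. This is a rough sketch under stated assumptions, not the actual PG::_select_auth_object code: the ReplicaView struct, its fields, and both function names are hypothetical, and the sketch assumes each criterion only counts if the previous ones hold.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of what scrub saw on one replica.
struct ReplicaView {
  int osd;                 // replica identifier
  bool has_object;         // criterion 1
  bool has_info_attr;      // criterion 2
  bool info_valid;         // criterion 3
  uint64_t info_size;      // size recorded in object_info
  uint64_t scrubbed_size;  // size observed by scrub (criterion 4)
};

// Number of criteria satisfied, stopping at the first one that fails.
int score(const ReplicaView& r) {
  if (!r.has_object) return 0;
  if (!r.has_info_attr) return 1;
  if (!r.info_valid) return 2;
  if (r.info_size != r.scrubbed_size) return 3;
  return 4;
}

// Pick the replica with the highest score; ties go to the earliest one.
const ReplicaView& select_auth(const std::vector<ReplicaView>& replicas) {
  const ReplicaView* best = nullptr;
  int best_score = 0;
  for (const ReplicaView& r : replicas) {
    const int s = score(r);
    if (s > best_score) {
      best_score = s;
      best = &r;
    }
  }
  assert(best != nullptr);  // at least one replica must have the object
  return *best;
}
```

With a strict comparison, the earliest replica wins among equals, so the old "first one to have the object" behavior remains the fallback when no replica does better.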
We've changed quite a lot of the restart behavior, as well as one
of the message encodings. This is cheaper and easier than using feature bits,
and CephFS is still a tech preview or whatever, so let's cover them using this.
Yan, Zheng [Sun, 31 Mar 2013 06:19:17 +0000 (14:19 +0800)]
mds: don't roll back prepared table updates
When the table server is recovering, it re-sends 'agree' messages for
prepared table updates. The table client may receive an 'agree' message
before it commits the corresponding update. Don't send a 'rollback'
message back to the server in this case.
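The decision described above could look roughly like the following. This is a simplified sketch, not the real MDSTableClient: the struct, the two sets, and the Reply enum are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical outcomes of handling an 'agree' message.
enum class Reply { Accept, Ignore, Rollback };

// Hypothetical, simplified table client state.
struct TableClientSketch {
  std::set<uint64_t> pending_prepare;  // prepare sent, 'agree' not yet received
  std::set<uint64_t> prepared;         // 'agree' received, commit not yet sent

  Reply handle_agree(uint64_t tid) {
    // In this sketch a transaction id is never in both sets at once.
    assert(!(pending_prepare.count(tid) && prepared.count(tid)));
    if (pending_prepare.erase(tid)) {
      prepared.insert(tid);
      return Reply::Accept;   // first 'agree' for this update
    }
    if (prepared.count(tid)) {
      return Reply::Ignore;   // re-sent by a recovering server: keep the update
    }
    return Reply::Rollback;   // update unknown to this client
  }
};
```

The middle branch is the case the commit addresses: an update that is prepared but not yet committed must not be answered with a rollback.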
Yan, Zheng [Thu, 14 Mar 2013 04:24:54 +0000 (12:24 +0800)]
mds: fix export cancel notification
The comment says that if the importer is dead, bystanders think the
exporter is the only auth, as per mdcache->handle_mds_failure(). But
there is no such code in MDCache::handle_mds_failure().
Yan, Zheng [Tue, 12 Mar 2013 08:51:53 +0000 (16:51 +0800)]
mds: send lock action message when auth MDS is in proper state.
For a rejoining object, don't send the lock ACK message because lock states
are still uncertain. The lock ACK may confuse the object's auth MDS and
trigger an assertion.
If the object's auth MDS is not active, just skip sending NUDGE, REQRDLOCK
and REQSCATTER messages. MDCache::handle_mds_recovery() will take care
of them.
Also defer the caps release message until clientreplay or active.
Yan, Zheng [Tue, 12 Mar 2013 08:27:22 +0000 (16:27 +0800)]
mds: share inode max size after MDS recovers
The MDS may crash after journaling the new max size, but before sending
the new max size to the client. Later when the MDS recovers, the client
re-requests the new max size, but the MDS finds max size unchanged. So
the client waits for the new max size forever. This issue can be avoided
by checking the client cap's last_sent and sharing the inode max size if it
is zero (a reconnected cap's last_sent is zero).
Yan, Zheng [Thu, 14 Mar 2013 12:06:27 +0000 (20:06 +0800)]
mds: handle linkage mismatch during cache rejoin
In an MDS cluster, not all file system namespace operations that impact
multiple MDS use two-phase commit. Some operations use dentry link/unlink
messages to update the replica dentry's linkage after they are committed by
the master MDS. It's possible the master MDS crashes after journaling an
operation, but before sending the dentry link/unlink messages. Later, when
the MDS recovers and receives cache rejoin messages from the surviving
MDS, it will find a linkage mismatch.
The original cache rejoin code does not properly handle the case where
dentry unlink messages were missing. Unlinked inodes were linked to stray
dentries, so the cache rejoin ack message needs to push replicas of these
stray dentries to the surviving MDS.
This patch also adds code that handles cache expiration in the middle of
cache rejoining.
Yan, Zheng [Wed, 13 Mar 2013 12:58:26 +0000 (20:58 +0800)]
mds: encode dirfrag base in cache rejoin ack
The cache rejoin ack message already encodes the inode base; make it also encode
the dirfrag base. This allows the message to replicate stray dentries like the
MDentryUnlink message. The function will be used by a later patch.
Yan, Zheng [Wed, 13 Mar 2013 12:47:11 +0000 (20:47 +0800)]
mds: include replica nonce in MMDSCacheRejoin::inode_strong
So the recovering MDS can properly handle cache expire messages.
Also increase the nonce value when sending the cache rejoin acks.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Also update the MMDSCacheRejoin encoding to the new format.
Signed-off-by: Greg Farnum <greg@inktank.com>
Yan, Zheng [Wed, 13 Mar 2013 11:23:18 +0000 (19:23 +0800)]
mds: remove MDCache::rejoin_fetch_dirfrags()
In commit 77946dcdae (mds: fetch missing inodes from disk), I introduced
MDCache::rejoin_fetch_dirfrags(). But it basically duplicates the function
of MDCache::open_undef_dirfrags(), so just remove rejoin_fetch_dirfrags()
and make open_undef_dirfrags() also handle undefined inodes.
In an MDS cluster, a rename operation may involve multiple MDS. The
rename source's auth MDS may crash after some witness MDS have prepared
the rename but before the rename is committing. Later, when the MDS
recovers, its subtree map and linkages are different from the prepared
MDS'. This causes problems for both subtree resolve and cache rejoin.
The solution is, if the rename source's auth MDS fails, the prepared
witness MDS queries the master MDS whether the operation is committing. If
it's not, it rolls back the rename, then sends a resolve message to the
recovering MDS.
Another similar case is when a prepared witness MDS crashes while the
rename source's auth MDS has prepared or is preparing the operation.
When the witness recovers, the master just delays sending the resolve
ack message until it commits the operation.
This patch also updates Server::handle_client_rename(), making preparing
the rename source's auth MDS the final step before committing the
rename.
Yan, Zheng [Fri, 15 Mar 2013 02:34:09 +0000 (10:34 +0800)]
mds: don't send MDentry{Link,Unlink} before receiving cache rejoin
The active MDS calls MDCache::rejoin_scour_survivor_replicas() when it
receives the cache rejoin message. The function will remove the objects
replicated by MDentry{Link,Unlink} from replica map.
Yan, Zheng [Thu, 14 Mar 2013 16:08:39 +0000 (00:08 +0800)]
mds: set resolve/rejoin gather MDS set in advance
An active MDS may receive a resolve/rejoin message before receiving
the mdsmap message that claims the MDS cluster is in resolving/rejoining
state. So instead of setting the gather MDS set when receiving the mdsmap,
set it in advance when detecting an MDS failure.
Yan, Zheng [Thu, 14 Mar 2013 04:27:51 +0000 (12:27 +0800)]
mds: don't send resolve message between active MDS
When the MDS cluster is resolving, the current behavior is to send a subtree resolve
message to all other MDS and wait for all other MDS' resolve messages.
The problem is that active MDS can have different subtree maps due to rename.
Besides, gathering active MDS' resolve messages is also racy. The only
function of these messages is to disambiguate other MDS' imports. We can
replace them with the import finish notification.