-- test exports w/ new capability stuff
+/- finish hashed_subset notify business
+- hashed readdir
+- carefully define/document frozen wrt dir_auth vs hashing
-KNOWN BUGS to fix after fast
-- RDWR on synthetic client results in
-fakesyn: mds/MDS.cc:2334: void MDS::handle_client_close(MClientRequest*, CInode*): Assertion `cur->softlock.can_write(true)' failed.
+/- lacc
+/- streakwave
+/- mapreduce
+
+- make logstream.flush align itself to stripes
+
-- hard links!
+KNOWN BUGS to fix after fast
+- hard links!
- implement truncate() for real
/- redo client capability stuff
- finish buffer cache
-- hash directories!!
-
- plan out osd replication, recovery structures
-- redo CDir hash in terms of const char * in CDentry?
+- redo CDir hash_map in terms of const char * in CDentry?
+ - or try google's hash library!!
- what to do about fuse direct_io and mmap()?
and will forward relevant messages on to the authority.
When a replica is expired from cache, an expire is sent to the
-authority. The expire incudes the serial number issued when the
-replica was originally created.
+authority. The expire includes the serial number issued when the
+replica was originally created to disambiguate potentially concurrent
+replication activity.
-Exports are tricky:
+EXPORTS
- The old authority suddenly becomes a replica. It's serial is well
defined. It also becomes a CACHEPROXY, which means its cached_by
can cease CACHEPROXY responsibilities and become a regular replica.
At this point it's cached_by is no longer defined.
+- Replicas always know who the authority for the inode is, OR they
+  know a prior owner acting as a CACHEPROXY.  (They don't know which it
+  is.)
+CACHED_BY
+The authority always has an inclusive list of nodes who cache an item.
+As such it can confidently send updates to replicas for locking,
+invalidating, etc. When a replica is expired from cache, an expire is
+sent to the authority. If the serial matches, the node is removed
+from the cached_by list.
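
The serial check described above can be sketched as follows. This is a minimal illustration, assuming the authority stores, per caching node, the serial it issued when the replica was created. `ReplicaSet`, `add_replica`, and `handle_expire` are hypothetical names, not the MDS's actual interface.

```cpp
#include <map>

// Hypothetical stand-in for the authority's cached_by bookkeeping.
struct ReplicaSet {
  std::map<int, int> cached_by;  // node id -> serial issued at replication
  int next_serial = 1;

  // Replicate to `node`; returns the serial sent along with the replica.
  int add_replica(int node) {
    int serial = next_serial++;
    cached_by[node] = serial;
    return serial;
  }

  // Process an expire from `node`.  A stale serial means the node was
  // replicated to again after the expire was sent, so the entry stays.
  bool handle_expire(int node, int serial) {
    std::map<int, int>::iterator it = cached_by.find(node);
    if (it == cached_by.end() || it->second != serial)
      return false;
    cached_by.erase(it);
    return true;
  }
};
```

An expire carrying an old serial is ignored, which is what lets the authority tell apart an expire that raced with a fresh replication from one that really ends the node's caching.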
-- Replicas always know who the authority for the inode is, OR they
- know prior owner acting as a CACHEPROXY. (They don't know which it
- is.)
-Because the authority always knows who caches an item, it can
-confidently send updates to replicas for locking, invalidating, etc.
+
+SUBTREE AUTHORITY DELEGATION: imports versus hashing
+
+Authority is generally defined recursively: an inode's authority
+matches the containing directory, and a directory's authority matches
+the directory inode's. Thus the authority delegation chain can be
+broken/redefined in two ways:
+
+ - Imports and exports redefine the directory inode -> directory
+ linkage, such that the directory authority is explicitly specified
+ via dir.dir_auth:
+
+ dir.dir_auth == -1 -> directory matches its inode
+ dir.dir_auth >= 0 -> directory authority is dir.dir_auth
+
+ - Hashed directories redefine the directory -> inode linkage. In
+ non-hashed directories, inodes match their containing directory.
+ In hashed directories, each dentry's authority is defined by a hash
+ function.
+
+ inode.hash_seed == 0 -> inode matches containing directory
+ inode.hash_seed > 0 -> defined by hash(hash_seed, dentry)
+
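
The two rules above can be read as a pair of small lookup functions. The sketch below is an assumption-laden illustration: `DirAuth`, `dir_authority`, and this `hash_dentry` are hypothetical, the hash itself is a placeholder (the notes do not specify it), and the seed is kept alongside the directory state purely for brevity.

```cpp
#include <string>

// Hypothetical, simplified view of a directory's authority state.
struct DirAuth {
  int inode_auth;      // authority of the directory's own inode
  int dir_auth;        // -1: follow the inode; >= 0: explicit authority
  unsigned hash_seed;  // 0: not hashed; > 0: dentries are hashed
};

// Placeholder hash only; the real function is not given in the notes.
inline int hash_dentry(unsigned seed, const std::string& name, int num_nodes) {
  unsigned h = seed;
  for (std::string::size_type i = 0; i < name.size(); ++i)
    h = h * 31u + static_cast<unsigned char>(name[i]);
  return static_cast<int>(h % static_cast<unsigned>(num_nodes));
}

// Import/export rule: an explicit dir_auth overrides the inode's authority.
inline int dir_authority(const DirAuth& d) {
  return d.dir_auth >= 0 ? d.dir_auth : d.inode_auth;
}

// Hashing rule: in a hashed directory each dentry is placed by the hash;
// otherwise it follows the containing directory.
inline int dentry_authority(const DirAuth& d, const std::string& name,
                            int num_nodes) {
  if (d.hash_seed > 0)
    return hash_dentry(d.hash_seed, name, num_nodes);
  return dir_authority(d);
}
```

Keeping the two checks in separate functions mirrors the point that dir_auth handling and hashing are independent of each other.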
+A directory's "containing_import" (bad name, FIXME) is either the
+import or hashed directory that is responsible for delegating a
+subtree. Note that the containing_import of a directory may be itself
+because it is an import, but it cannot be itself because it is hashed.
+
+Thus:
+
+ - Import and export operations' manipulation of dir_auth is
+ completely orthogonal to hashing operations. Hashing methods can
+ ignore dir_auth, except when they create imports/exports (and break
+ the inode<->dir auth linkage).
+
+ - Hashdirs act sort of like imports in that they bound an
+ authoritative region. That is, either hashdirs or imports can be
+ the key for nested_exports. In some cases, a dir may be both an
+ import and a hash.
+
+ - Export_dir won't export a hashdir. This is because it's tricky
+ (tho not necessarily impossible) due to the way nested_exports is
+ used with imports versus hashdirs.
+
+
+
+
+FREEZING
+
+There are two types of freezing:
+
+ - TREE: recursively freezes everything nested beneath a directory,
+   until an export or edge of cache is reached.
+ - DIR: freezes the contents of a single directory.
+
+Some notes:
+
+ - Occurs on the authoritative node only.
+
+ - Used for suspending critical operations while migrating authority
+ between nodes or hashing/unhashing directories.
+
+ - Freezes the contents of the cache such that items may not be added,
+   auth pinned, or subsequently reexported.  The namespace of the
+   affected portions of the hierarchy may not change.
+ The content of inodes and other orthogonal operations
+ (e.g. replication, inode locking and modification) are unaffected.
+
+Two states are defined: freezing and frozen. The freezing state is
+used while waiting for auth_pins to be removed. Once all auth_pins
+are gone, the state is changed to frozen. New auth_pins cannot be
+added while freezing or frozen.
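
The freezing/frozen transition can be sketched as a tiny state machine. This is only an illustration for a single item: `Freezable` and its methods are hypothetical names, and the recursive pin tracking across a subtree is left out.

```cpp
// Hypothetical freeze state machine for one item; names are illustrative.
enum FreezeState { UNFROZEN, FREEZING, FROZEN };

struct Freezable {
  FreezeState state = UNFROZEN;
  int auth_pins = 0;

  // New auth pins are refused while freezing or frozen.
  bool can_auth_pin() const { return state == UNFROZEN; }

  bool auth_pin() {
    if (!can_auth_pin()) return false;
    ++auth_pins;
    return true;
  }

  void auth_unpin() {
    if (auth_pins > 0) --auth_pins;
    // Last pin dropped while freezing: the freeze can complete.
    if (state == FREEZING && auth_pins == 0) state = FROZEN;
  }

  // Freeze immediately if unpinned, otherwise wait in the freezing state.
  void freeze() {
    if (state != UNFROZEN) return;
    state = (auth_pins == 0) ? FROZEN : FREEZING;
  }

  void unfreeze() { state = UNFROZEN; }
};
```

Because pins are refused as soon as freezing starts, the pin count can only fall, so the freezing state ends once the existing pin holders release.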
+
+
+AUTH PINS
+
+An auth pin keeps a given item on the authoritative node until it is
+removed. The pins are tracked recursively, so that a subtree cannot
+be frozen if it contains any auth pins.
+
+If a pin is placed on a non-authoritative item, the item is allowed to
+become authoritative; the specific restriction is that it cannot be
+frozen, which happens only during export-type operations.
+
+
+TYPES OF EXPORTS
+
+- Actual export of a subtree from one node to another
+- A rename between directories on different nodes exports the renamed
+_inode_. (If it is a directory, it becomes an export such that the
+directory itself does not move.)
+- A hash or unhash operation will migrate inodes within the directory
+either to or from the directory's main authority.
+
+EXPORT PROCESS
+
+
+
+
+HASHING
+
+- All nodes discover and open directory
+
+- Prep message distributes subdir inode replicas for exports so that
+ peers can open those dirs. This is necessary because subdirs are
+ converted into exports or imports as needed to avoid migrating
+ anything except the hashed dir itself. The prep is needed for the
+  same reasons it's important with exports: the inode authority must
+ always have the exported dir open so that it gets accurate dir
+ authority updates, and can keep the inode->dir_auth up to date.
+
+- MHashDir message distributes the directory contents.
+
+- While auth is frozen_dir, we can't get_or_open_dir. Otherwise the
+ Prep messages won't be inclusive of all dirs, and the
+ imports/exports won't get set up properly.
+
+TODO
+readdir
+
+
+- subtrees stop at hashed dir. hashed dir's dir_auth follows parent
+ subtree, unless the dir is also an explicit import. thus a hashed
+ dir can also be an import dir.
-Expiration:
+bananas
+apples
+blueberries
+green pepper
+carrots
+celery
-When a replica is expired from cache, an expire is sent to the
-authority. If the receiving node is the authority, it simply removes
-the node from the cached_by list.
-If the receiving node is not the replica, it is acting as a CACHEPROXY
-(because it recently exported the data).
int main(int oargc, char **oargv)
{
- //cerr << "fakesyn starting" << endl;
+ cerr << "fakesyn starting" << endl;
int argc;
char **argv;
-
#ifndef __CONTEXT_H
#define __CONTEXT_H
#include "config.h"
-#include <assert.h>
#include <list>
#include <iostream>
using namespace std;
-class MDS;
-// Context, for retaining context of a message being processed..
-// pure abstract!
+/*
+ * Context - abstract callback class
+ */
class Context {
- private:
- int result;
-
public:
virtual ~Context() {} // we want a virtual destructor!!!
-
virtual void finish(int r) = 0;
- //virtual void fail(int r) = 0;
-
- virtual bool can_redelegate() {
- return false;
- }
- virtual void redelegate(MDS *mds, int newmds) {
- assert(false);
- }
-
};
+/*
+ * finish and destroy a list of Contexts
+ */
inline void finish_contexts(list<Context*>& finished,
int result = 0)
{
}
}
+/*
+ * C_Contexts - set of Contexts
+ */
+class C_Contexts : public Context {
+ list<Context*> clist;
+
+public:
+ void add(Context* c) {
+ clist.push_back(c);
+ }
+ void take(list<Context*>& ls) {
+ clist.splice(clist.end(), ls);
+ }
+ void finish(int r) {
+ finish_contexts(clist, r);
+ }
+};
+
+
#endif
namespace __gnu_cxx {
template<> struct hash<unsigned long long> {
- size_t operator()(unsigned long long __x) const { return __x; }
+ size_t operator()(unsigned long long __x) const {
+ static hash<unsigned long> H;
+ return H((__x >> 32) ^ (__x & 0xffffffff));
+ }
};
template<> struct hash< std::string >
{
size_t operator()( const std::string& x ) const
{
- return hash< char >()( (x.c_str())[0] );
+ static hash<const char*> H;
+ return H(x.c_str());
}
};
}
#include <ext/hash_map>
using namespace __gnu_cxx;
+class MDS;
class Anchor {
public:
#include "CDir.h"
#include "CDentry.h"
#include "CInode.h"
-#include "MDStore.h"
+
#include "MDS.h"
#include "MDCluster.h"
#undef dout
#define dout(x) if (x <= g_conf.debug) cout << "mds" << mds->get_nodeid() << " cdir: "
-map<int,int> cdir_pins;
+
+// PINS
+int cdir_pins[CDIR_NUM_PINS] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0 };
static char* cdir_pin_names[CDIR_NUM_PINS] = {
"child",
"opened",
- "hashed",
"waiter",
"import",
"export",
+  "freeze",
"proxy",
"authpin",
"imping",
- "impgex",
- "reqpins",
- "dirty"
+ "impex",
+ "hashed",
+ "hashing",
+ "dirty",
+ "reqpins"
};
+
ostream& operator<<(ostream& out, CDir& dir)
{
string path;
if (dir.is_dirty()) out << " dirty";
if (dir.is_import()) out << " import";
if (dir.is_export()) out << " export";
+ if (dir.is_hashed()) out << " hashed";
if (dir.is_auth()) {
out << " auth";
if (dir.is_open_by_anyone())
}
+// -------------------------------------------------------------------
// CDir
CDir::CDir(CInode *in, MDS *mds, bool auth)
ref = 0;
// auth
+ dir_auth = -1;
assert(in->is_dir());
if (auth)
state |= CDIR_STATE_AUTH;
- if (in->dir_is_hashed())
+ if (in->dir_is_hashed()) {
+ assert(0); // when does this happen?
state |= CDIR_STATE_HASHED;
+ }
auth_pins = 0;
nested_auth_pins = 0;
}
-inodeno_t CDir::ino() {
- return inode->ino();
-}
-
-
-CDir *CDir::get_parent_dir()
-{
- return inode->get_parent_dir();
-}
-
-
-
-int CDir::get_rep_count(MDCluster *mdc)
-{
- if (dir_rep == CDIR_REP_NONE)
- return 1;
- if (dir_rep == CDIR_REP_LIST)
- return 1 + dir_rep_by.size();
- if (dir_rep == CDIR_REP_ALL)
- return mdc->get_num_mds();
- assert(2+2==5);
-}
-
-
-
-
-
-CDentry* CDir::lookup(const string& n) {
- //cout << " lookup " << n << " in " << this << endl;
-
- map<string,CDentry*>::iterator iter = items.find(n);
- if (iter != items.end()) return iter->second;
-
- return NULL;
-}
-
+/***
+ * linking fun
+ */
CDentry* CDir::add_dentry( const string& dname, inodeno_t ino)
{
-// state encoding
-
-/*
-
-what DIR state is encoded when
-
-- dir open / discover
- nonce
- dir_auth
- dir_rep/by
-
-- dir update
- dir_rep/by
-
-- export
- dir_rep/by
- nitems
- version
- state
- popularity
-
-
-*/
-
-crope CDir::encode_basic_state()
-{
- crope r;
-
- // dir rep
- r.append((char*)&dir_rep, sizeof(int));
-
- // dir_rep_by
- int n = dir_rep_by.size();
- r.append((char*)&n, sizeof(int));
- for (set<int>::iterator it = dir_rep_by.begin();
- it != dir_rep_by.end();
- it++) {
- int j = *it;
- r.append((char*)&j, sizeof(j));
- }
-
- return r;
-}
-
-int CDir::decode_basic_state(crope r, int off)
-{
- // dir_rep
- r.copy(off, sizeof(int), (char*)&dir_rep);
- off += sizeof(int);
-
- // dir_rep_by
- int n;
- r.copy(off, sizeof(int), (char*)&n);
- off += sizeof(int);
- for (int i=0; i<n; i++) {
- int j;
- r.copy(off, sizeof(int), (char*)&j);
- dir_rep_by.insert(j);
- off += sizeof(int);
- }
-
- return off;
-}
-
-
-
-// wiating
+/****************************************
+ * WAITING
+ */
bool CDir::waiting_for(int tag)
{
void CDir::add_waiter(int tag, Context *c) {
// hierarchical?
if (tag & CDIR_WAIT_ATFREEZEROOT && (is_freezing() || is_frozen())) {
- if (is_freezing_tree_root() || is_frozen_tree_root()) {
+ if (is_freezing_tree_root() || is_frozen_tree_root() ||
+ is_freezing_dir() || is_frozen_dir()) {
// it's us, pin here. (fall thru)
} else {
// pin parent!
-// authority
+/********************************
+ * AUTHORITY
+ */
/*
-void CDir::update_auth(int whoami) {
- if (inode->dir_is_auth(whoami) && !is_auth())
- state_set(CDIR_STATE_AUTH);
- if (!inode->dir_is_auth(whoami) && is_auth())
- state_clear(CDIR_STATE_AUTH);
-}
-*/
-
+ * simple rule: if dir_auth isn't explicit, auth is the same as the inode.
+ */
int CDir::authority()
{
- if (dir_auth >= 0)
- return dir_auth;
+ if (get_dir_auth() >= 0)
+ return get_dir_auth();
+ /*
CDir *parent = inode->get_parent_dir();
if (parent)
return parent->authority();
-
+
// root, or dangling
- assert(inode->is_root() || inode->is_dangling());
+ assert(inode->is_root()); // no dirs under danglers!?
+ //assert(inode->is_root() || inode->is_dangling());
+ */
+
return inode->authority();
}
int CDir::dentry_authority(const string& dn )
{
+ // hashing -- subset of nodes have hashed the contents
+ if (is_hashing() && !hashed_subset.empty()) {
+ int hashauth = mds->get_cluster()->hash_dentry( inode->ino(), dn ); // hashed
+ if (hashed_subset.count(hashauth))
+ return hashauth;
+ }
+
+ // hashed
if (is_hashed()) {
return mds->get_cluster()->hash_dentry( inode->ino(), dn ); // hashed
}
- if (dir_auth == CDIR_AUTH_PARENT) {
- dout(15) << "dir_auth = parent at " << *this << endl;
+ if (get_dir_auth() == CDIR_AUTH_PARENT) {
+ //dout(15) << "dir_auth = parent at " << *this << endl;
return inode->authority(); // same as my inode
}
// it's explicit for this whole dir
- dout(15) << "dir_auth explicit " << dir_auth << " at " << *this << endl;
- return dir_auth;
+ //dout(15) << "dir_auth explicit " << dir_auth << " at " << *this << endl;
+ return get_dir_auth();
}
+void CDir::set_dir_auth(int d)
+{
+ dout(10) << "setting dir_auth=" << d << " from " << dir_auth << " on " << *this << endl;
+ dir_auth = d;
+}
-// auth pins
+/*****************************************
+ * AUTH PINS
+ */
void CDir::auth_pin() {
if (auth_pins == 0)
}
}
+
+
+/*****************************************************************************
+ * FREEZING
+ */
+
void CDir::on_freezeable()
{
// check for anything pending freezeable
finish_waiting(CDIR_WAIT_FREEZEABLE);
}
-
-// FREEZE
+// FREEZE TREE
class C_MDS_FreezeTree : public Context {
CDir *dir;
while (1) {
if (dir->is_freezing_tree_root()) return true;
if (dir->is_import()) return false;
+ if (dir->is_hashed()) return false;
if (dir->inode->parent)
dir = dir->inode->parent->dir;
else
while (1) {
if (dir->is_frozen_tree_root()) return true;
if (dir->is_import()) return false;
+ if (dir->is_hashed()) return false;
if (dir->inode->parent)
dir = dir->inode->parent->dir;
else
-
+// -----------------------------------------------------------------
// debug shite
}
-
-void CDir::dump_to_disk(MDS *mds)
-{
- map<string,CDentry*>::iterator iter = items.begin();
- while (iter != items.end()) {
- CDentry* d = iter->second;
- if (d->inode->dir != NULL) {
- dout(10) << "dump2disk: " << d->inode->inode.ino << " " << d->name << '/' << endl;
- d->inode->dump_to_disk(mds);
- }
- iter++;
- }
-
- dout(10) << "dump2disk: writing dir " << inode->inode.ino << endl;
- mds->mdstore->commit_dir(inode->dir, NULL);
-}
#include <ext/hash_map>
using namespace __gnu_cxx;
-class CInode;
+
+#include "CInode.h"
+
class CDentry;
class MDS;
class MDCluster;
#define CDIR_STATE_COMMITTING (1<<8) // mid-commit
#define CDIR_STATE_FETCHING (1<<9) // currenting fetching
-#define CDIR_STATE_IMPORT (1<<10) // flag set if this is an import.
-#define CDIR_STATE_EXPORT (1<<11)
+#define CDIR_STATE_DELETED (1<<10)
+
+#define CDIR_STATE_IMPORT (1<<11) // flag set if this is an import.
+#define CDIR_STATE_EXPORT (1<<12)
+#define CDIR_STATE_IMPORTINGEXPORT (1<<13)
-#define CDIR_STATE_HASHED (1<<12) // if hashed. only hashed+auth on auth node.
-#define CDIR_STATE_HASHING (1<<13)
-#define CDIR_STATE_UNHASHING (1<<14)
+#define CDIR_STATE_HASHED (1<<14) // if hashed
+#define CDIR_STATE_HASHING (1<<15)
+#define CDIR_STATE_UNHASHING (1<<16)
-#define CDIR_STATE_SYNCBYME (1<<15)
-#define CDIR_STATE_PRESYNC (1<<16)
-#define CDIR_STATE_SYNCBYAUTH (1<<17)
-#define CDIR_STATE_WAITONUNSYNC (1<<18)
-#define CDIR_STATE_AUTHMOVING (1<<19) // dir replica bystander
-#define CDIR_STATE_IMPORTINGEXPORT (1<<20)
-#define CDIR_STATE_DELETED (1<<21)
// these state bits are preserved by an import/export
#define CDIR_PIN_CHILD 0
#define CDIR_PIN_OPENED 1 // open by another node
-#define CDIR_PIN_HASHED 2 // hashed
-#define CDIR_PIN_WAITER 3 // waiter(s)
+#define CDIR_PIN_WAITER 2 // waiter(s)
-#define CDIR_PIN_IMPORT 4
-#define CDIR_PIN_EXPORT 5
-#define CDIR_PIN_FREEZE 6
-#define CDIR_PIN_PROXY 7 // auth just changed.
+#define CDIR_PIN_IMPORT 3
+#define CDIR_PIN_EXPORT 4
+#define CDIR_PIN_FREEZE 5
+#define CDIR_PIN_PROXY 6 // auth just changed.
-#define CDIR_PIN_AUTHPIN 8
+#define CDIR_PIN_AUTHPIN 7
-#define CDIR_PIN_IMPORTING 9
-#define CDIR_PIN_IMPORTINGEXPORT 10
+#define CDIR_PIN_IMPORTING 8
+#define CDIR_PIN_IMPORTINGEXPORT 9
-#define CDIR_PIN_DIRTY 11
+#define CDIR_PIN_HASHED 10
+#define CDIR_PIN_HASHING 11
+#define CDIR_PIN_DIRTY 12
-#define CDIR_PIN_REQUEST 12
+#define CDIR_PIN_REQUEST 13
-#define CDIR_NUM_PINS 13
+#define CDIR_NUM_PINS 14
// waiter export_dir
// trigger handel_export_dir_prep_ack
+#define CDIR_WAIT_HASHED (1<<19) // hash finish
+
#define CDIR_WAIT_DNREAD (1<<20)
#define CDIR_WAIT_DNLOCK (1<<21)
#define CDIR_WAIT_DNUNPINNED (1<<22)
typedef map<string, CDentry*> CDir_map_t;
-extern map<int, int> cdir_pins; // counts
+extern int cdir_pins[CDIR_NUM_PINS];
class CDir {
int nested_auth_pins;
int request_pins;
+ // hashed dirs
+ set<int> hashed_subset; // HASHING: subset of mds's that are hashed
+
// context
MDS *mds;
int dir_rep;
set<int> dir_rep_by; // if dir_rep == CDIR_REP_LIST
-
- // sync (for hashed dirs)
- set<int> sync_waiting_for_ack;
-
+ // popularity
DecayCounter popularity[MDS_NPOP];
+ // friends
friend class CInode;
friend class MDCache;
friend class MDiscover;
// -- accessors --
- CInode *get_inode() { return inode; }
- CDir *get_parent_dir();
- inodeno_t ino();
+ inodeno_t ino() { return inode->ino(); }
+ CInode *get_inode() { return inode; }
+ CDir *get_parent_dir() { return inode->get_parent_dir(); }
CDir_map_t::iterator begin() { return items.begin(); }
CDir_map_t::iterator end() { return items.end(); }
*/
- // -- manipulation --
- CDentry* lookup(const string& n);
-
- // dentries and inodes
+ // -- dentries and inodes --
public:
+ CDentry* lookup(const string& n) {
+ map<string,CDentry*>::iterator iter = items.find(n);
+ if (iter == items.end())
+ return 0;
+ else
+ return iter->second;
+ }
+
CDentry* add_dentry( const string& dname, CInode *in=0 );
CDentry* add_dentry( const string& dname, inodeno_t ino );
void remove_dentry( CDentry *dn ); // delete dentry
int authority();
int dentry_authority(const string& d);
int get_dir_auth() { return dir_auth; }
+ void set_dir_auth(int d);
bool is_open_by_anyone() { return !open_by.empty(); }
bool is_open_by(int mds) { return open_by.count(mds); }
if (dir_rep == CDIR_REP_NONE) return false;
return true;
}
- int get_rep_count(MDCluster *mdc);
-
- void update_auth(int whoami);
+
// -- dirtyness --
bool is_clean() { return !state_test(CDIR_STATE_DIRTY); }
- // -- encoded state --
- crope encode_basic_state();
- int decode_basic_state(crope r, int off=0);
-
// -- reference counting --
if (request_pins == 0) put(CDIR_PIN_REQUEST);
}
-
- // -- sync --
- bool is_sync() { return is_syncbyme() || is_syncbyauth(); }
- bool is_syncbyme() { return state & CDIR_STATE_SYNCBYME; }
- bool is_syncbyauth() { return state & CDIR_STATE_SYNCBYAUTH; }
- bool is_presync() { return state & CDIR_STATE_PRESYNC; }
- bool is_waitonnsync() { return state & CDIR_STATE_WAITONUNSYNC; }
-
-
+
// -- waiters --
bool waiting_for(int tag);
bool waiting_for(int tag, const string& dn);
// debuggin bs
void dump(int d = 0);
- void dump_to_disk(MDS *m);
};
inodeno_t get_ino() { return ino; }
- void _rope(crope& r) {
- r.append((char*)&ino, sizeof(ino));
- r.append((char*)&nonce, sizeof(nonce));
- r.append((char*)&dir_auth, sizeof(dir_auth));
- r.append((char*)&dir_rep, sizeof(dir_rep));
-
- int nrep_by = rep_by.size();
- r.append((char*)&nrep_by, sizeof(nrep_by));
-
- // rep_by
- for (set<int>::iterator it = rep_by.begin();
- it != rep_by.end();
- it++) {
- int m = *it;
- r.append((char*)&m, sizeof(int));
- }
+ void _encode(bufferlist& bl) {
+ bl.append((char*)&ino, sizeof(ino));
+ bl.append((char*)&nonce, sizeof(nonce));
+ bl.append((char*)&dir_auth, sizeof(dir_auth));
+ bl.append((char*)&dir_rep, sizeof(dir_rep));
+ ::_encode(rep_by, bl);
}
- int _unrope(crope s, int off = 0) {
- s.copy(off, sizeof(ino), (char*)&ino);
+ void _decode(bufferlist& bl, int& off) {
+ bl.copy(off, sizeof(ino), (char*)&ino);
off += sizeof(ino);
- s.copy(off, sizeof(nonce), (char*)&nonce);
+ bl.copy(off, sizeof(nonce), (char*)&nonce);
off += sizeof(nonce);
- s.copy(off, sizeof(dir_auth), (char*)&dir_auth);
+ bl.copy(off, sizeof(dir_auth), (char*)&dir_auth);
off += sizeof(dir_auth);
- s.copy(off, sizeof(dir_rep), (char*)&dir_rep);
+ bl.copy(off, sizeof(dir_rep), (char*)&dir_rep);
off += sizeof(dir_rep);
-
- int nrep_by;
- s.copy(off, sizeof(int), (char*)&nrep_by);
- off += sizeof(int);
-
- // open_by
- for (int i=0; i<nrep_by; i++) {
- int m;
- s.copy(off, sizeof(int), (char*)&m);
- off += sizeof(int);
- rep_by.insert(m);
- }
-
- return off;
+ ::_decode(rep_by, bl, off);
}
};
//dir->nitems = st.nitems;
dir->version = st.version;
- dir->state = (dir->state & CDIR_MASK_STATE_IMPORT_KEPT) | // remember import flag, etc.
- (st.state & CDIR_MASK_STATE_EXPORTED);
+ if (dir->state & CDIR_STATE_HASHED)
+ dir->state |= CDIR_STATE_AUTH; // just inherit auth flag when hashed
+ else
+ dir->state = (dir->state & CDIR_MASK_STATE_IMPORT_KEPT) | // remember import flag, etc.
+ (st.state & CDIR_MASK_STATE_EXPORTED);
dir->dir_auth = st.dir_auth;
dir->dir_rep = st.dir_rep;
if (dir) return dir;
+ // can't open a dir if we're frozen_dir, bc of hashing stuff.
+ assert(!is_frozen_dir());
+
// only auth can open dir alone.
assert(is_auth());
set_dir( new CDir(this, mds, true) );
return false;
}
+bool CInode::is_frozen_dir()
+{
+ if (parent && parent->dir->is_frozen_dir())
+ return true;
+ return false;
+}
+
bool CInode::is_freezing()
{
if (parent && parent->dir->is_freezing())
}
-
+CInodeDiscover* CInode::replicate_to( int rep )
+{
+ // relax locks?
+ if (!is_cached_by_anyone())
+ replicate_relax_locks();
+
+ // return the thinger
+ int nonce = cached_by_add( rep );
+ return new CInodeDiscover( this, nonce );
+}
// debug crap -----------------------------
dir->dump(dep);
}
-void CInode::dump_to_disk(MDS *mds)
-{
- if (dir)
- dir->dump_to_disk(mds);
-}
-
-
using namespace __gnu_cxx;
-// crap
-/*
-#define CINODE_SYNC_START 1 // starting sync
-#define CINODE_SYNC_LOCK 2 // am synced
-#define CINODE_SYNC_FINISH 4 // finishing
-
-#define CINODE_MASK_SYNC (CINODE_SYNC_START|CINODE_SYNC_LOCK|CINODE_SYNC_FINISH)
-
-#define CINODE_MASK_IMPORT 16
-#define CINODE_MASK_EXPORT 32
-*/
// pins for keeping an item in cache (and debugging)
#define CINODE_PIN_DIR 0
class MDCluster;
class Message;
class CInode;
+class CInodeDiscover;
//class MInodeSyncStart;
set<int>::iterator cached_by_end() { return cached_by.end(); }
set<int>& get_cached_by() { return cached_by; }
+ CInodeDiscover* replicate_to(int rep);
+
// -- waiting --
bool waiting_for(int tag);
// -- freeze --
bool is_frozen();
+ bool is_frozen_dir();
bool is_freezing();
// dbg
void dump(int d = 0);
- void dump_to_disk(MDS *m);
};
CInodeDiscover(CInode *in, int nonce) {
inode = in->inode;
replica_nonce = nonce;
+
hardlock_state = in->hardlock.get_replica_state();
softlock_state = in->softlock.get_replica_state();
}
in->softlock.set_state(softlock_state);
}
- void _rope(crope& r) {
- r.append((char*)&inode, sizeof(inode));
- r.append((char*)&replica_nonce, sizeof(replica_nonce));
- r.append((char*)&hardlock_state, sizeof(hardlock_state));
- r.append((char*)&softlock_state, sizeof(softlock_state));
+ void _encode(bufferlist& bl) {
+ bl.append((char*)&inode, sizeof(inode));
+ bl.append((char*)&replica_nonce, sizeof(replica_nonce));
+ bl.append((char*)&hardlock_state, sizeof(hardlock_state));
+ bl.append((char*)&softlock_state, sizeof(softlock_state));
}
- int _unrope(crope s, int off = 0) {
- s.copy(off,sizeof(inode_t), (char*)&inode);
+ void _decode(bufferlist& bl, int& off) {
+ bl.copy(off,sizeof(inode_t), (char*)&inode);
off += sizeof(inode_t);
- s.copy(off, sizeof(int), (char*)&replica_nonce);
+ bl.copy(off, sizeof(int), (char*)&replica_nonce);
off += sizeof(int);
- s.copy(off, sizeof(hardlock_state), (char*)&hardlock_state);
+ bl.copy(off, sizeof(hardlock_state), (char*)&hardlock_state);
off += sizeof(hardlock_state);
- s.copy(off, sizeof(softlock_state), (char*)&softlock_state);
+ bl.copy(off, sizeof(softlock_state), (char*)&softlock_state);
off += sizeof(softlock_state);
- return off;
}
};
st.is_dirty = in->is_dirty();
cached_by = in->cached_by;
cached_by_nonce = in->cached_by_nonce;
+
hardlock = in->hardlock;
softlock = in->softlock;
if (tail + size > sync_pos) {
size = sync_pos - tail;
dout(15) << "wait_for_next_event ugh.. read_pos is " << read_pos << ", tail is " << tail << ", sync_pos only " << sync_pos << ", flush_pos " << flush_pos << ", append_pos " << append_pos << endl;
- assert(size > 0); // bleh, wait for sync, etc.
+
+ if (size == 0) {
+ // assert(size > 0); // bleh, wait for sync, etc.
+ // just do it. communication is ordered, right? FIXME SOMEDAY this is totally gross blech
+ //size = flush_pos - tail;
+ // read tiny bit, kill some time
+ assert(flush_pos > sync_pos);
+ size = 1;
+ }
}
dout(15) << "wait_for_next_event reading from pos " << tail << " len " << size << endl;
if (!in) continue;
if (!in->is_dir()) continue;
if (!in->dir) continue; // clearly not popular
- if (mds->mdcache->exports.count(in->dir)) continue;
+
+ if (in->dir->is_export()) continue;
+ if (in->dir->is_hashed()) continue;
if (already_exporting.count(in->dir)) continue;
if (in->dir->is_frozen()) continue; // can't export this right now!
if (in->dir->get_size() == 0) continue; // don't export empty dirs, even if they're not complete. for now!
+ // how popular?
double pop = in->dir->popularity[MDS_POP_CURDOM].get();
-
//cout << " in " << in->inode.ino << " " << pop << endl;
if (pop < minchunk) continue;
{
int db = 7; //debug level
- int num = mds->mdcache->imports.size();
- if (num == 0) {
- dout(db) << "no imports/exports" << endl;
+
+ if (mds->mdcache->imports.empty() &&
+ mds->mdcache->hashdirs.empty()) {
+ dout(db) << "no imports/exports/hashdirs" << endl;
return;
}
- dout(db) << "imports/exports:" << endl;
+ dout(db) << "imports/exports/hashdirs:" << endl;
set<CDir*> ecopy = mds->mdcache->exports;
- for (set<CDir*>::iterator it = mds->mdcache->imports.begin();
- it != mds->mdcache->imports.end();
- it++) {
+ set<CDir*>::iterator it = mds->mdcache->hashdirs.begin();
+ while (1) {
+ if (it == mds->mdcache->hashdirs.end()) it = mds->mdcache->imports.begin();
+ if (it == mds->mdcache->imports.end() ) break;
+
CDir *im = *it;
- dout(db) << " + import (" << im->popularity[MDS_POP_CURDOM].get() << "/" << im->popularity[MDS_POP_ANYDOM].get() << ") " << *im << endl;
- assert( im->is_import() );
- assert( im->is_auth() );
+
+ if (im->is_import()) {
+ dout(db) << " + import (" << im->popularity[MDS_POP_CURDOM].get() << "/" << im->popularity[MDS_POP_ANYDOM].get() << ") " << *im << endl;
+ assert( im->is_auth() );
+ }
+ else if (im->is_hashed()) {
+ if (im->is_import()) continue; // if import AND hash, list as import.
+ dout(db) << " + hash (" << im->popularity[MDS_POP_CURDOM].get() << "/" << im->popularity[MDS_POP_ANYDOM].get() << ") " << *im << endl;
+ }
for (set<CDir*>::iterator p = mds->mdcache->nested_exports[im].begin();
p != mds->mdcache->nested_exports[im].end();
p++) {
CDir *exp = *p;
- dout(db) << " - ex (" << exp->popularity[MDS_POP_NESTED].get() << ", " << exp->popularity[MDS_POP_ANYDOM].get() << ") " << *exp << " to " << exp->dir_auth << endl;
- assert( exp->is_export() );
- assert( !exp->is_auth() );
-
- if ( mds->mdcache->get_containing_import(exp) != im ) {
- dout(1) << "uh oh, containing import is " << mds->mdcache->get_containing_import(exp) << endl;
- dout(1) << "uh oh, containing import is " << *mds->mdcache->get_containing_import(exp) << endl;
- assert( mds->mdcache->get_containing_import(exp) == im );
+ if (exp->is_hashed()) {
+ assert(0); // we don't do it this way actually
+ dout(db) << " - hash (" << exp->popularity[MDS_POP_NESTED].get() << ", " << exp->popularity[MDS_POP_ANYDOM].get() << ") " << *exp << " to " << exp->dir_auth << endl;
+ assert( exp->is_auth() );
+ } else {
+ dout(db) << " - ex (" << exp->popularity[MDS_POP_NESTED].get() << ", " << exp->popularity[MDS_POP_ANYDOM].get() << ") " << *exp << " to " << exp->dir_auth << endl;
+ assert( exp->is_export() );
+ assert( !exp->is_auth() );
+ }
+
+ if ( mds->mdcache->get_auth_container(exp) != im ) {
+ dout(1) << "uh oh, auth container is " << mds->mdcache->get_auth_container(exp) << endl;
+ dout(1) << "uh oh, auth container is " << *mds->mdcache->get_auth_container(exp) << endl;
+ assert( mds->mdcache->get_auth_container(exp) == im );
}
if (ecopy.count(exp) != 1) {
- dout(1) << " nested_export " << *exp << " not in exports" << endl;
+ dout(1) << "***** nested_export " << *exp << " not in exports" << endl;
assert(0);
}
ecopy.erase(exp);
}
+
+ it++;
}
if (ecopy.size()) {
for (set<CDir*>::iterator it = ecopy.begin();
it != ecopy.end();
it++)
- dout(1) << " stray item in exports: " << **it << endl;
+ dout(1) << "***** stray item in exports: " << **it << endl;
assert(ecopy.size() == 0);
}
}
#include "messages/MExportDirNotifyAck.h"
#include "messages/MExportDirFinish.h"
+#include "messages/MHashDirDiscover.h"
+#include "messages/MHashDirDiscoverAck.h"
+#include "messages/MHashDirPrep.h"
+#include "messages/MHashDirPrepAck.h"
#include "messages/MHashDir.h"
+#include "messages/MHashDirNotify.h"
+#include "messages/MHashDirAck.h"
+
+#include "messages/MUnhashDirPrep.h"
+#include "messages/MUnhashDirPrepAck.h"
#include "messages/MUnhashDir.h"
#include "messages/MUnhashDirAck.h"
+#include "messages/MUnhashDirNotify.h"
+#include "messages/MUnhashDirNotifyAck.h"
//#include "messages/MInodeUpdate.h"
#include "messages/MDirUpdate.h"
lru.lru_set_midpoint(g_conf.mds_cache_mid);
did_shutdown_exports = false;
+
+ shutdown_commits = 0;
}
MDCache::~MDCache()
in->dir->state_clear(CDIR_STATE_IMPORT);
in->dir->put(CDIR_PIN_IMPORT);
- in->dir->dir_auth = CDIR_AUTH_PARENT;
- dout(7) << " fixing dir_auth to be " << in->dir->dir_auth << endl;
+ in->dir->set_dir_auth( CDIR_AUTH_PARENT );
+ dout(7) << " fixing dir_auth to be " << in->dir->get_dir_auth() << endl;
// move my nested imports to in's containing import
- CDir *con = get_containing_import(in->dir);
+ CDir *con = get_auth_container(in->dir);
assert(con);
for (set<CDir*>::iterator p = nested_exports[in->dir].begin();
p != nested_exports[in->dir].end();
// inode was ours, still ours.
dout(7) << "inode was ours, still ours." << endl;
assert(!in->dir->is_import());
- assert(in->dir->dir_auth == CDIR_AUTH_PARENT);
+ assert(in->dir->get_dir_auth() == CDIR_AUTH_PARENT);
// move any exports nested beneath me?
- CDir *newcon = get_containing_import(in->dir);
+ CDir *newcon = get_auth_container(in->dir);
assert(newcon);
- CDir *oldcon = get_containing_import(srcdir);
+ CDir *oldcon = get_auth_container(srcdir);
assert(oldcon);
if (newcon != oldcon) {
dout(7) << "moving nested exports under new container" << endl;
in->dir->state_set(CDIR_STATE_IMPORT);
in->dir->get(CDIR_PIN_IMPORT);
- in->dir->dir_auth = mds->get_nodeid();
- dout(7) << " fixing dir_auth to be " << in->dir->dir_auth << endl;
+ in->dir->set_dir_auth( mds->get_nodeid() );
+ dout(7) << " fixing dir_auth to be " << in->dir->get_dir_auth() << endl;
// find old import
- CDir *oldcon = get_containing_import(srcdir);
+ CDir *oldcon = get_auth_container(srcdir);
assert(oldcon);
dout(7) << " oldcon is " << *oldcon << endl;
assert(in->dir->is_import());
// verify dir_auth
- assert(in->dir->dir_auth == mds->get_nodeid()); // me, because i'm auth for dir.
- assert(in->authority() != in->dir->dir_auth); // inode not me.
+ assert(in->dir->get_dir_auth() == mds->get_nodeid()); // me, because i'm auth for dir.
+ assert(in->authority() != in->dir->get_dir_auth()); // inode not me.
}
assert(in->dir->is_import());
in->dir->get(CDIR_PIN_EXPORT);
assert(dir_auth >= 0); // better be defined
- in->dir->dir_auth = dir_auth;
- dout(7) << " fixing dir_auth to be " << in->dir->dir_auth << endl;
+ in->dir->set_dir_auth( dir_auth );
+ dout(7) << " fixing dir_auth to be " << in->dir->get_dir_auth() << endl;
- CDir *newcon = get_containing_import(in->dir);
+ CDir *newcon = get_auth_container(in->dir);
assert(newcon);
nested_exports[newcon].insert(in->dir);
// sanity
assert(in->dir->is_export());
- assert(in->dir->dir_auth >= 0);
- assert(in->dir->dir_auth != in->authority());
+ assert(in->dir->get_dir_auth() >= 0);
+ assert(in->dir->get_dir_auth() != in->authority());
// moved under new import?
- CDir *oldcon = get_containing_import(srcdir);
- CDir *newcon = get_containing_import(in->dir);
+ CDir *oldcon = get_auth_container(srcdir);
+ CDir *newcon = get_auth_container(in->dir);
if (oldcon != newcon) {
dout(7) << "moving myself under new import " << *newcon << endl;
nested_exports[oldcon].erase(in->dir);
in->dir->state_clear(CDIR_STATE_EXPORT);
in->dir->put(CDIR_PIN_EXPORT);
- CDir *oldcon = get_containing_import(srcdir);
+ CDir *oldcon = get_auth_container(srcdir);
assert(oldcon);
assert(nested_exports[oldcon].count(in->dir) == 1);
nested_exports[oldcon].erase(in->dir);
// simplify dir_auth
if (in->authority() == in->dir->authority()) {
- in->dir->dir_auth = CDIR_AUTH_PARENT;
+ in->dir->set_dir_auth( CDIR_AUTH_PARENT );
dout(7) << "simplified dir_auth to -1, inode auth is (also) " << in->authority() << endl;
} else {
- assert(in->dir->dir_auth >= 0); // someone else's export,
+ assert(in->dir->get_dir_auth() >= 0); // someone else's export,
}
} else {
// fix dir_auth?
if (in->authority() == dir_auth)
- in->dir->dir_auth = CDIR_AUTH_PARENT;
+ in->dir->set_dir_auth( CDIR_AUTH_PARENT );
else
- in->dir->dir_auth = dir_auth;
+ in->dir->set_dir_auth( dir_auth );
dout(7) << " fixing dir_auth to be " << dir_auth << endl;
// do nothing.
map<int, MCacheExpire*> expiremap;
+ dout(7) << "trim" << endl;
+ assert(expiremap.empty());
+
while (lru.lru_get_size() > (unsigned)max) {
CInode *in = (CInode*)lru.lru_expire();
if (!in) break; //return false;
{
int auth = in->authority();
if (auth != mds->get_nodeid()) {
+ assert(!in->is_auth());
dout(7) << "sending expire to mds" << auth << " on " << *in << endl;
if (expiremap.count(auth) == 0) expiremap[auth] = new MCacheExpire(mds->get_nodeid());
expiremap[auth]->add_inode(in->ino(), in->replica_nonce);
- }
+ } else {
+ assert(in->is_auth());
+ }
}
CInode *diri = NULL;
if (in->parent)
{
dout(1) << "shutdown_start" << endl;
- shutdown_commits = 0;
- if (g_conf.mds_commit_on_shutdown) {
- dout(1) << "shutdown_start committing all dirty dirs" << endl;
-
- for (hash_map<inodeno_t, CInode*>::iterator it = inode_map.begin();
- it != inode_map.end();
- it++) {
- CInode *in = it->second;
-
- // commit any dirty dir that's ours
- if (in->is_dir() && in->dir && in->dir->is_auth() && in->dir->is_dirty()) {
- mds->mdstore->commit_dir(in->dir, new C_MDC_ShutdownCommit(this));
- shutdown_commits++;
- }
- }
- }
-
}
bool MDCache::shutdown_pass()
return true;
}
- // commits?
- if (g_conf.mds_commit_on_shutdown &&
- shutdown_commits > 0) {
- dout(7) << "shutdown_commits still waiting for " << shutdown_commits << endl;
+ // unhash dirs?
+ if (!hashdirs.empty()) {
+ // unhash any of my dirs?
+ for (set<CDir*>::iterator it = hashdirs.begin();
+ it != hashdirs.end();
+ it++) {
+ CDir *dir = *it;
+ if (!dir->is_auth()) continue;
+ if (dir->is_unhashing()) continue;
+ unhash_dir(dir);
+ }
+
+ dout(7) << "waiting for dirs to unhash" << endl;
return false;
}
+ // commit dirs?
+ if (g_conf.mds_commit_on_shutdown) {
+
+ if (shutdown_commits < 0) {
+ dout(1) << "shutdown_pass committing all dirty dirs" << endl;
+ shutdown_commits = 0;
+
+ for (hash_map<inodeno_t, CInode*>::iterator it = inode_map.begin();
+ it != inode_map.end();
+ it++) {
+ CInode *in = it->second;
+
+ // commit any dirty dir that's ours
+ if (in->is_dir() && in->dir && in->dir->is_auth() && in->dir->is_dirty()) {
+ mds->mdstore->commit_dir(in->dir, new C_MDC_ShutdownCommit(this));
+ shutdown_commits++;
+ }
+ }
+ }
+
+ // commits?
+ if (shutdown_commits > 0) {
+ dout(7) << "shutdown_commits still waiting for " << shutdown_commits << endl;
+ return false;
+ }
+ }
+
// flush anything we can from the cache
trim(0);
dout(5) << "cache size now " << lru.lru_get_size() << endl;
// waiting for imports? (e.g. root?)
if (exports.size()) {
dout(7) << "still have " << exports.size() << " exports" << endl;
+ show_cache();
return false;
}
// imports?
if (!imports.empty()) {
dout(7) << "still have " << imports.size() << " imports" << endl;
+ show_cache();
return false;
}
// done?
if (lru.lru_get_size() > 0) {
dout(7) << "there's still stuff in the cache: " << lru.lru_get_size() << endl;
- //show_cache();
+ show_cache();
//dump();
return false;
}
// root directory too
assert(root->dir == NULL);
root->set_dir( new CDir(root, mds, true) );
- root->dir->dir_auth = 0; // me!
+ root->dir->set_dir_auth( 0 ); // me!
root->dir->dir_rep = CDIR_REP_NONE;
// root is sort of technically an import (from a vacuum)
}
-CDir *MDCache::get_containing_import(CDir *dir)
+
+/*
+ * some import/export helpers
+ */
+
+/** con = get_auth_container(dir)
+ * Returns the directory in which authority is delegated for *dir.
+ * This may be because a directory is an import, or because it is hashed
+ * and we are nested underneath an inode in that dir (that hashes to us).
+ * Thus do not assume con->is_auth()! The container satisfies is_auth() || is_hashed().
+ */
+CDir *MDCache::get_auth_container(CDir *dir)
{
CDir *imp = dir; // might be *dir
- // find the underlying import!
- while (imp &&
- !imp->is_import()) {
+ // find the underlying import or hash that delegates dir
+ while (true) {
+ if (imp->is_import()) break; // import
imp = imp->get_parent_dir();
+ assert(imp);
+ if (imp->is_hashed()) break; // hash
}
- assert(imp);
return imp;
}
-CDir *MDCache::get_containing_export(CDir *dir)
+
+void MDCache::find_nested_exports(CDir *dir, set<CDir*>& s)
+{
+ CDir *import = get_auth_container(dir);
+ find_nested_exports_under(import, dir, s);
+}
+
+void MDCache::find_nested_exports_under(CDir *import, CDir *dir, set<CDir*>& s)
{
- CDir *ex = dir; // might be *dir
+ dout(10) << "find_nested_exports for " << *dir << endl;
+ dout(10) << "find_nested_exports_under import " << *import << endl;
- // find the underlying import!
- while (ex && // while not at root,
- exports.count(ex) == 0) { // we didn't find an export,
- ex = ex->get_parent_dir();
+ if (import == dir) {
+ // yay, my job is easy!
+ for (set<CDir*>::iterator p = nested_exports[import].begin();
+ p != nested_exports[import].end();
+ p++) {
+ CDir *nested = *p;
+ s.insert(nested);
+ dout(10) << "find_nested_exports " << *dir << " " << *nested << endl;
+ }
+ return;
}
- return ex;
+ // ok, my job is annoying.
+ for (set<CDir*>::iterator p = nested_exports[import].begin();
+ p != nested_exports[import].end();
+ p++) {
+ CDir *nested = *p;
+
+ dout(12) << "find_nested_exports checking " << *nested << endl;
+
+ // trace back to import, or dir
+ CDir *cur = nested->get_parent_dir();
+ while (!cur->is_import() || cur == dir) {
+ if (cur == dir) {
+ s.insert(nested);
+ dout(10) << "find_nested_exports " << *dir << " " << *nested << endl;
+ break;
+ } else {
+ cur = cur->get_parent_dir();
+ }
+ }
+ }
}
break;
+ // hashing
+ case MSG_MDS_HASHDIRDISCOVER:
+ handle_hash_dir_discover((MHashDirDiscover*)m);
+ break;
+ case MSG_MDS_HASHDIRDISCOVERACK:
+ handle_hash_dir_discover_ack((MHashDirDiscoverAck*)m);
+ break;
+ case MSG_MDS_HASHDIRPREP:
+ handle_hash_dir_prep((MHashDirPrep*)m);
+ break;
+ case MSG_MDS_HASHDIRPREPACK:
+ handle_hash_dir_prep_ack((MHashDirPrepAck*)m);
+ break;
+ case MSG_MDS_HASHDIR:
+ handle_hash_dir((MHashDir*)m);
+ break;
+ case MSG_MDS_HASHDIRACK:
+ handle_hash_dir_ack((MHashDirAck*)m);
+ break;
+ case MSG_MDS_HASHDIRNOTIFY:
+ handle_hash_dir_notify((MHashDirNotify*)m);
+ break;
+
+ // unhashing
+ case MSG_MDS_UNHASHDIRPREP:
+ handle_unhash_dir_prep((MUnhashDirPrep*)m);
+ break;
+ case MSG_MDS_UNHASHDIRPREPACK:
+ handle_unhash_dir_prep_ack((MUnhashDirPrepAck*)m);
+ break;
+ case MSG_MDS_UNHASHDIR:
+ handle_unhash_dir((MUnhashDir*)m);
+ break;
+ case MSG_MDS_UNHASHDIRACK:
+ handle_unhash_dir_ack((MUnhashDirAck*)m);
+ break;
+ case MSG_MDS_UNHASHDIRNOTIFY:
+ handle_unhash_dir_notify((MUnhashDirNotify*)m);
+ break;
+ case MSG_MDS_UNHASHDIRNOTIFYACK:
+ handle_unhash_dir_notify_ack((MUnhashDirNotifyAck*)m);
+ break;
+
+
default:
dout(7) << "cache unknown message " << m->get_type() << endl;
// open dir
if (!cur->dir) {
if (cur->dir_is_auth()) {
+ // parent dir frozen_dir?
+ if (cur->is_frozen_dir()) {
+ dout(7) << "traverse: " << *cur->get_parent_dir() << " is frozen_dir, waiting" << endl;
+ cur->get_parent_dir()->add_waiter(CDIR_WAIT_UNFREEZE, ondelay);
+ if (onfinish) delete onfinish;
+ return 1;
+ }
+
cur->get_or_open_dir(mds);
assert(cur->dir);
} else {
}
// frozen?
+ /*
if (cur->dir->is_frozen()) {
// doh!
// FIXME: traverse is allowed?
if (onfinish) delete onfinish;
return 1;
}
-
+ */
+
// must read directory hard data (permissions, x bit) to traverse
if (!noperm && !inode_hard_read_try(cur, ondelay)) {
if (onfinish) delete onfinish;
//fullpath = dis->get_want();
- if (!root->is_cached_by_anyone())
- root->replicate_relax_locks();
-
// add root
reply = new MDiscoverReply(0);
- reply->add_inode( new CInodeDiscover( root,
- root->cached_by_add( dis->get_asker() ) ) );
+ reply->add_inode( root->replicate_to( dis->get_asker() ) );
dout(10) << "added root " << *root << endl;
cur = root;
cur = get_inode(dis->get_base_ino());
assert(cur);
- /*string p;
- cur->make_path(p);
- p += "/";
- p += dis->get_want().get_path();
- fullpath = p;
- */
-
if (dis->wants_base_dir()) {
dout(7) << "discover from mds" << dis->get_asker() << " has " << *cur << " wants dir+" << dis->get_want().get_path() << endl;
} else {
return;
}
- cur->get_or_open_dir(mds);
- assert(cur);
+ // frozen_dir?
+ if (!cur->dir && cur->is_frozen_dir()) {
+ dout(7) << "is frozen_dir, waiting" << endl;
+ cur->get_parent_dir()->add_waiter(CDIR_WAIT_UNFREEZE,
+ new C_MDS_RetryMessage(mds, dis));
+ return;
+ }
+
+ if (!cur->dir)
+ cur->get_or_open_dir(mds);
+ assert(cur->dir);
dout(10) << "dir is " << *cur->dir << endl;
break;
}
- cur->get_or_open_dir(mds);
-
- // frozen?
- /* hmmm do we care, actually?
- if (dir->is_frozen()) {
- dout(7) << *dir << " frozen, waiting" << endl;
- dir->add_waiter(new C_MDS_RetryMessage( dis, mds ));
- delete reply;
- return;
+ // did we hit a frozen_dir?
+ if (!cur->dir && cur->is_frozen_dir()) {
+ dout(7) << *cur << " is frozen_dir, stopping" << endl;
+ break;
}
- */
-
+
+ if (!cur->dir) cur->get_or_open_dir(mds);
+
reply->add_dir( new CDirDiscover( cur->dir,
cur->dir->open_by_add( dis->get_asker() ) ) );
dout(7) << "added dir " << *cur->dir << endl;
CInode *next = dn->inode;
assert(next->is_auth());
- // relax inode lock before we replicate?
- if (!next->is_cached_by_anyone()) {
- next->replicate_relax_locks();
- }
-
// add inode
- int nonce = next->cached_by_add(dis->get_asker());
- reply->add_inode( new CInodeDiscover(next,
- nonce) );
- dout(7) << "added inode " << *next << " nonce=" << nonce<< endl;
+ reply->add_inode( next->replicate_to( dis->get_asker() ) );
+ dout(7) << "added inode " << *next << endl;
// descend
cur = next;
}
-/*
-void MDCache::handle_inode_writer_closed(MInodeWriterClosed *m)
-{
- CInode *in = get_inode(m->get_ino());
- assert(in);
- assert(in->is_auth() || in->is_proxy());
-
- int from = m->get_from();
-
- if (in->is_proxy()) {
- int newauth = in->authority();
- assert(newauth >= 0);
- dout(7) << "handle_inode_writer_closed " << m->get_ino() << " from " << from << ": proxy, fw to " << newauth << endl;
- mds->messenger->send_message(m,
- MSG_ADDR_MDS(newauth), MDS_PORT_CACHE,
- MDS_PORT_CACHE);
- return;
- }
-
- dout(7) << "handle_inode_wrtier_closed " << *in << " from " << from << endl;
-
- // remove from my set
- inode_soft_eval(in);
-
- delete m;
-}
-*/
-
int MDCache::send_dir_updates(CDir *dir, bool bcast)
{
MDS_PORT_CACHE);
}
- //g_conf.debug = 10;
-
return 0;
}
// encode and export inode state
bufferlist inode_state;
encode_export_inode(in, inode_state, destauth);
- // HACK FIXME
- crope inode_state_rope;
- inode_state_rope.append(inode_state.c_str(), inode_state.length());
// send
MRename *m = new MRename(initiator,
srcdir->ino(), srcdn->name, destdirino, destname,
- inode_state_rope);
+ inode_state);
mds->messenger->send_message(m,
MSG_ADDR_MDS(destauth), MDS_PORT_CACHE, MDS_PORT_CACHE);
int off = 0;
// HACK
bufferlist bufstate;
- bufstate.append(m->get_inode_state().c_str(), m->get_inode_state().length());
+ bufstate.claim_append(m->get_inode_state());
decode_import_inode(destdn, bufstate, off, m->get_source());
CInode *in = destdn->inode;
-
+// ==========================================================
// IMPORT/EXPORT
-void MDCache::find_nested_exports(CDir *dir, set<CDir*>& s)
-{
- CDir *import = get_containing_import(dir);
- find_nested_exports_under(import, dir, s);
-}
-
-void MDCache::find_nested_exports_under(CDir *import, CDir *dir, set<CDir*>& s)
-{
-
- dout(10) << "find_nested_exports for " << *dir << endl;
- dout(10) << "find_nested_exports under import " << *import << endl;
-
- if (import == dir) {
- // yay, my job is easy!
- for (set<CDir*>::iterator p = nested_exports[import].begin();
- p != nested_exports[import].end();
- p++) {
- CDir *nested = *p;
- s.insert(nested);
- dout(10) << "find_nested_exports " << *dir << " " << *nested << endl;
- }
- return;
- }
-
- // ok, my job is annoying.
- for (set<CDir*>::iterator p = nested_exports[import].begin();
- p != nested_exports[import].end();
- p++) {
- CDir *nested = *p;
-
- dout(12) << "find_nested_exports checking " << *nested << endl;
-
- // trace back to import, or dir
- CDir *cur = nested->get_parent_dir();
- while (!cur->is_import() || cur == dir) {
- if (cur == dir) {
- s.insert(nested);
- dout(10) << "find_nested_exports " << *dir << " " << *nested << endl;
- break;
- } else {
- cur = cur->get_parent_dir();
- }
- }
- }
-}
-class C_MDS_ExportFreeze : public Context {
+class C_MDC_ExportFreeze : public Context {
MDS *mds;
CDir *ex; // dir i'm exporting
int dest;
public:
- C_MDS_ExportFreeze(MDS *mds, CDir *ex, int dest) {
+ C_MDC_ExportFreeze(MDS *mds, CDir *ex, int dest) {
this->mds = mds;
this->ex = ex;
this->dest = dest;
};
-class C_MDS_ExportGo : public Context {
- MDS *mds;
- CDir *ex; // dir i'm exporting
- int dest;
-
-public:
- C_MDS_ExportGo(MDS *mds, CDir *ex, int dest) {
- this->mds = mds;
- this->ex = ex;
- this->dest = dest;
- }
- virtual void finish(int r) {
- mds->mdcache->export_dir_go(ex, dest);
- }
-};
-
-
-class C_MDS_ExportFinish : public Context {
- MDS *mds;
- CDir *ex; // dir i'm exporting
-
-public:
- // contexts for waiting operations on the affected subtree
- list<Context*> will_redelegate;
- list<Context*> will_fail;
-
- C_MDS_ExportFinish(MDS *mds, CDir *ex, int dest) {
- this->mds = mds;
- this->ex = ex;
- }
-
- // suck up and categorize waitlists
- void assim_waitlist(list<Context*>& ls) {
- for (list<Context*>::iterator it = ls.begin();
- it != ls.end();
- it++) {
- dout(7) << "assim_waitlist context " << *it << endl;
- if ((*it)->can_redelegate())
- will_redelegate.push_back(*it);
- else
- will_fail.push_back(*it);
- }
- ls.clear();
- }
- void assim_waitlist(hash_map< string, list<Context*> >& cmap) {
- for (hash_map< string, list<Context*> >::iterator hit = cmap.begin();
- hit != cmap.end();
- hit++) {
- for (list<Context*>::iterator lit = hit->second.begin(); lit != hit->second.end(); lit++) {
- dout(7) << "assim_waitlist context " << *lit << endl;
- if ((*lit)->can_redelegate())
- will_redelegate.push_back(*lit);
- else
- will_fail.push_back(*lit);
- }
- }
- cmap.clear();
- }
-
-
- virtual void finish(int r) {
- if (r >= 0) {
-
- finish_contexts(will_fail);
- finish_contexts(will_redelegate);
- return;
-
- // THIS IS ALL STUPID: (???)
- /*
- // redelegate
- list<Context*>::iterator it;
- for (it = will_redelegate.begin(); it != will_redelegate.end(); it++) {
- (*it)->redelegate(mds, ex->authority());
- delete *it; // delete context
- }
-
- // fail
- // this happens with:
- // - commit_dir
- // - ?
- for (it = will_fail.begin(); it != will_fail.end(); it++) {
- Context *c = *it;
- dout(7) << "failing context " << c << endl;
- //assert(false);
- c->finish(-1); // fail
- delete c; // delete context
- }
- */
- } else {
- assert(false); // now what?
- }
- }
-};
-
+/** export_dir(dir, dest)
+ * public method to initiate an export.
+ * will fail if the directory is freezing, frozen, unpinnable, hashed, or root.
+ */
void MDCache::export_dir(CDir *dir,
int dest)
{
+ dout(7) << "export_dir " << *dir << " to " << dest << endl;
assert(dest != mds->get_nodeid());
-
+ assert(!dir->is_hashed());
+
if (dir->inode->is_root()) {
dout(7) << "i won't export root" << endl;
assert(0);
dout(7) << " can't export, freezing|frozen. wait for other exports to finish first." << endl;
return;
}
+ if (dir->is_hashed()) {
+ dout(7) << "can't export hashed dir right now. implement me carefully later." << endl;
+ return;
+ }
+
// pin path?
vector<CDentry*> trace;
return;
}
+ // ok, let's go.
+
// send ExportDirDiscover (ask target)
- dout(7) << "export_dir " << *dir << " to " << dest << ", sending ExportDirDiscover" << endl;
+ export_gather[dir].insert(dest);
mds->messenger->send_message(new MExportDirDiscover(dir->inode),
dest, MDS_PORT_CACHE, MDS_PORT_CACHE);
- dir->auth_pin(); // pin dir, to hang up our freeze
-
- // take away popularity (and pass it on to the context, MExportDir request later)
- mds->balancer->subtract_export(dir);
-
- // freeze the subtree
- dir->freeze_tree(new C_MDS_ExportFreeze(mds, dir, dest));
+ dir->auth_pin(); // pin dir, to hang up our freeze (unpin on prep ack)
- // get waiter ready to do actual export
- dir->add_waiter(CDIR_WAIT_EXPORTPREPACK,
- new C_MDS_ExportGo(mds, dir, dest));
+ // take away the popularity we're sending. FIXME: do this later?
+ mds->balancer->subtract_export(dir);
- // drop any sync or lock if sticky
- /*
- if (g_conf.mds_cache_sticky_sync_normal ||
- g_conf.mds_cache_sticky_sync_softasync)
- export_dir_dropsync(dir);
- */
- // NOTE: we don't need to worry about hard locks; those aren't sticky (yet?).
+
+ // freeze the subtree
+ dir->freeze_tree(new C_MDC_ExportFreeze(mds, dir, dest));
}
-
+/*
+ * called on receipt of MExportDirDiscoverAck
+ * the importer now has the directory's _inode_ in memory, and pinned.
+ */
void MDCache::handle_export_dir_discover_ack(MExportDirDiscoverAck *m)
{
CInode *in = get_inode(m->get_ino());
assert(in);
CDir *dir = in->dir;
assert(dir);
-
- dout(7) << "export_dir_discover_ack " << *dir << ", releasing auth_pin" << endl;
- dir->auth_unpin(); // unpin to allow freeze to complete
+ int from = m->get_source();
+ assert(export_gather[dir].count(from));
+ export_gather[dir].erase(from);
- // done
- delete m;
+ if (export_gather[dir].empty()) {
+ dout(7) << "export_dir_discover_ack " << *dir << ", releasing auth_pin" << endl;
+ dir->auth_unpin(); // unpin to allow freeze to complete
+ } else {
+ dout(7) << "export_dir_discover_ack " << *dir << ", still waiting for " << export_gather[dir] << endl;
+ }
+
+ delete m; // done
}
dout(5) << " added " << *in << endl;
prep->add_inode( in->parent->dir->ino(),
in->parent->name,
- new CInodeDiscover(in, in->cached_by_add(dest)) );
+ in->replicate_to(dest) );
}
}
dout(7) << "export_dir_prep_ack " << *dir << ", starting export" << endl;
- dir->finish_waiting(CDIR_WAIT_EXPORTPREPACK);
+ // start export.
+ export_dir_go(dir, m->get_source());
// done
delete m;
// update imports/exports
- CDir *containing_import = get_containing_import(dir);
+ CDir *containing_import = get_auth_container(dir);
if (containing_import == dir) {
dout(7) << " i'm rexporting a previous import" << endl;
+ assert(dir->is_import());
imports.erase(dir);
dir->state_clear(CDIR_STATE_IMPORT);
dir->put(CDIR_PIN_IMPORT); // unpin, no longer an import
// discard nested exports (that we're handing off
- // NOTE: possible concurrent modification bug?
for (set<CDir*>::iterator p = nested_exports[dir].begin();
- p != nested_exports[dir].end();
- p++) {
+ p != nested_exports[dir].end(); ) {
CDir *nested = *p;
+ p++;
// add to export message
req->add_export(nested);
if (nested == dir) continue; // ignore myself
// container of parent; otherwise we get ourselves.
- CDir *containing_export = get_containing_export(nested->get_parent_dir());
+ CDir *containing_export = nested->get_parent_dir();
+ while (containing_export && !containing_export->is_export())
+ containing_export = containing_export->get_parent_dir();
if (!containing_export) continue;
if (containing_export == dir) {
req->add_export(nested);
} else {
dout(12) << " export " << *nested << " is under other export " << *containing_export << ", which is unrelated" << endl;
- assert(get_containing_import(containing_export) != containing_import);
+ assert(get_auth_container(containing_export) != containing_import);
}
}
}
// note new authority (locally)
if (dir->inode->authority() == dest)
- dir->dir_auth = CDIR_AUTH_PARENT;
+ dir->set_dir_auth( CDIR_AUTH_PARENT );
else
- dir->dir_auth = dest;
+ dir->set_dir_auth( dest );
// make list of nodes i expect an export_dir_notify_ack from
// (everyone w/ this dir open, but me!)
assert(export_notify_ack_waiting[dir].count( dest ));
// fill export message with cache data
- C_MDS_ExportFinish *fin = new C_MDS_ExportFinish(mds, dir, dest);
+ C_Contexts *fin = new C_Contexts;
int num_exported_inodes = export_dir_walk( req,
fin,
dir, // base
MSG_ADDR_CLIENT(it->first));
}
- // relax locks
+ // relax locks?
if (!in->is_cached_by_anyone())
in->replicate_relax_locks();
-
+
// add inode
+ assert(in->cached_by.count(mds->get_nodeid()) == 0);
CInodeExport istate( in );
istate._encode( enc_state );
int MDCache::export_dir_walk(MExportDir *req,
- C_MDS_ExportFinish *fin,
- CDir *basedir,
- CDir *dir,
- int newauth)
+ C_Contexts *fin,
+ CDir *basedir,
+ CDir *dir,
+ int newauth)
{
int num_exported = 0;
assert(dir->is_auth());
dir->state_clear(CDIR_STATE_AUTH);
dir->replica_nonce = CDIR_NONCE_EXPORT;
-
- if (dir->is_dirty()) {
- dir->mark_clean();
- }
- // discard most dir state
- dir->state &= CDIR_MASK_STATE_EXPORT_KEPT; // i only retain a few things.
-
// proxy
dir->state_set(CDIR_STATE_PROXY);
dir->get(CDIR_PIN_PROXY);
export_proxy_dirinos[basedir].push_back(dir->ino());
-
- // suck up all waiters
- list<Context*> waiting;
- dir->take_waiting(CDIR_WAIT_ANY, waiting); // all dir waiters
- fin->assim_waitlist(waiting);
-
-
- // inodes
list<CDir*> subdirs;
- CDir_map_t::iterator it;
- for (it = dir->begin(); it != dir->end(); it++) {
- CDentry *dn = it->second;
- CInode *in = dn->inode;
-
- num_exported++;
+ if (dir->is_hashed()) {
+ // fix state
+ dir->state_clear( CDIR_STATE_AUTH );
- // -- dentry
- dout(7) << "export_dir_walk exporting " << *dn << endl;
- _encode(it->first, enc_dir);
+ } else {
- if (dn->is_dirty())
- enc_dir.append("D", 1); // dirty
- else
- enc_dir.append("C", 1); // clean
-
- // null dentry?
- if (dn->is_null()) {
- enc_dir.append("N", 1); // null dentry
- assert(dn->is_sync());
- continue;
- }
-
- if (dn->is_remote()) {
- // remote link
- enc_dir.append("L", 1); // remote link
-
- inodeno_t ino = dn->get_remote_ino();
- enc_dir.append((char*)&ino, sizeof(ino));
- continue;
- }
-
- // primary link
- // -- inode
- enc_dir.append("I", 1); // inode dentry
+ if (dir->is_dirty())
+ dir->mark_clean();
- encode_export_inode(in, enc_dir, newauth); // encode, and (update state for) export
+ // discard most dir state
+ dir->state &= CDIR_MASK_STATE_EXPORT_KEPT; // i only retain a few things.
- // directory?
- if (in->is_dir() && in->dir) {
- if (in->dir->is_auth()) {
- // nested subdir
- assert(in->dir->dir_auth == CDIR_AUTH_PARENT);
- subdirs.push_back(in->dir); // it's ours, recurse (later)
-
- } else {
- // nested export
- assert(in->dir->dir_auth >= 0);
- dout(7) << " encountered nested export " << *in->dir << " dir_auth " << in->dir->dir_auth << "; removing from exports" << endl;
- assert(exports.count(in->dir) == 1);
- exports.erase(in->dir); // discard nested export (nested_exports updated above)
-
- in->dir->state_clear(CDIR_STATE_EXPORT);
- in->dir->put(CDIR_PIN_EXPORT);
-
- // simplify dir_auth?
- if (in->dir->dir_auth == newauth)
- in->dir->dir_auth = CDIR_AUTH_PARENT;
- }
- }
+ // suck up all waiters
+ list<Context*> waiting;
+ dir->take_waiting(CDIR_WAIT_ANY, waiting); // all dir waiters
+ fin->take(waiting);
- // add to proxy
- export_proxy_inos[basedir].push_back(in->ino());
- in->state_set(CINODE_STATE_PROXY);
- in->get(CINODE_PIN_PROXY);
+ // inodes
- // waiters
- list<Context*> waiters;
- in->take_waiting(CINODE_WAIT_ANY, waiters);
- fin->assim_waitlist(waiters);
+ CDir_map_t::iterator it;
+ for (it = dir->begin(); it != dir->end(); it++) {
+ CDentry *dn = it->second;
+ CInode *in = dn->inode;
+
+ num_exported++;
+
+ // -- dentry
+ dout(7) << "export_dir_walk exporting " << *dn << endl;
+ _encode(it->first, enc_dir);
+
+ if (dn->is_dirty())
+ enc_dir.append("D", 1); // dirty
+ else
+ enc_dir.append("C", 1); // clean
+
+ // null dentry?
+ if (dn->is_null()) {
+ enc_dir.append("N", 1); // null dentry
+ assert(dn->is_sync());
+ continue;
+ }
+
+ if (dn->is_remote()) {
+ // remote link
+ enc_dir.append("L", 1); // remote link
+
+ inodeno_t ino = dn->get_remote_ino();
+ enc_dir.append((char*)&ino, sizeof(ino));
+ continue;
+ }
+
+ // primary link
+ // -- inode
+ enc_dir.append("I", 1); // inode dentry
+
+ encode_export_inode(in, enc_dir, newauth); // encode, and (update state for) export
+
+ // directory?
+ if (in->is_dir() && in->dir) {
+ if (in->dir->is_auth()) {
+ // nested subdir
+ assert(in->dir->get_dir_auth() == CDIR_AUTH_PARENT);
+ subdirs.push_back(in->dir); // it's ours, recurse (later)
+
+ } else {
+ // nested export
+ assert(in->dir->get_dir_auth() >= 0);
+ dout(7) << " encountered nested export " << *in->dir << " dir_auth " << in->dir->get_dir_auth() << "; removing from exports" << endl;
+ assert(exports.count(in->dir) == 1);
+ exports.erase(in->dir); // discard nested export (nested_exports updated above)
+
+ in->dir->state_clear(CDIR_STATE_EXPORT);
+ in->dir->put(CDIR_PIN_EXPORT);
+
+ // simplify dir_auth?
+ if (in->dir->get_dir_auth() == newauth)
+ in->dir->set_dir_auth( CDIR_AUTH_PARENT );
+ }
+ }
+
+ // add to proxy
+ export_proxy_inos[basedir].push_back(in->ino());
+ in->state_set(CINODE_STATE_PROXY);
+ in->get(CINODE_PIN_PROXY);
+
+ // waiters
+ list<Context*> waiters;
+ in->take_waiting(CINODE_WAIT_ANY, waiters);
+ fin->take(waiters);
+ }
}
req->add_dir( enc_dir );
-
+
// subdirs
for (list<CDir*>::iterator it = subdirs.begin(); it != subdirs.end(); it++)
num_exported += export_dir_walk(req, fin, basedir, *it, newauth);
if (!in->dir) {
dout(5) << " opening nested export on " << *in << endl;
-
- // open (send discover back to old auth for fw to dir auth)
- filepath want;
- mds->messenger->send_message(new MDiscover(mds->get_nodeid(),
- in->ino(),
- want,
- true),
- MSG_ADDR_MDS(in->authority()), MDS_PORT_CACHE, MDS_PORT_CACHE);
-
- // wait
- in->add_waiter(CINODE_WAIT_DIR,
- new C_MDS_RetryMessage(mds, m));
+ open_remote_dir(in,
+ new C_MDS_RetryMessage(mds, m));
}
}
} else {
show_imports();
- // note new authority (locally) in inode
+ // note new authority (locally)
if (dir->inode->is_auth())
- dir->dir_auth = CDIR_AUTH_PARENT;
+ dir->set_dir_auth( CDIR_AUTH_PARENT );
else
- dir->dir_auth = mds->get_nodeid();
- dout(10) << " set dir_auth to " << dir->dir_auth << endl;
+ dir->set_dir_auth( mds->get_nodeid() );
+ dout(10) << " set dir_auth to " << dir->get_dir_auth() << endl;
// update imports/exports
CDir *containing_import;
dir->state_clear(CDIR_STATE_EXPORT);
dir->put(CDIR_PIN_EXPORT); // unpin, no longer an export
- containing_import = get_containing_import(dir);
+ containing_import = get_auth_container(dir);
dout(7) << " it is nested under import " << *containing_import << endl;
nested_exports[containing_import].erase(dir);
} else {
}
nested_exports.erase(ex); // de-list under old import
- ex->dir_auth = CDIR_AUTH_PARENT;
+ ex->set_dir_auth( CDIR_AUTH_PARENT );
ex->put(CDIR_PIN_IMPORT); // imports are pinned, no longer import
} else {
// assimilate state
dstate.update_dir( dir );
- if (diri->is_auth()) dir->dir_auth = CDIR_AUTH_PARENT; // update_dir may hose dir_auth
+ if (diri->is_auth())
+ dir->set_dir_auth( CDIR_AUTH_PARENT ); // update_dir may hose dir_auth
// mark (may already be marked from get_or_open_dir() above)
if (!dir->is_auth())
if (dir->is_open_by(mds->get_nodeid()))
dir->open_by_remove(mds->get_nodeid());
- // take all waiters on this dir
- // NOTE: a pass of imported data is guaranteed to get all of my waiters because
- // a replica's presense in my cache implies/forces it's presense in authority's.
- list<Context*> waiters;
- dir->take_waiting(CDIR_WAIT_ANY, waiters);
- for (list<Context*>::iterator it = waiters.begin();
- it != waiters.end();
- it++)
- import_root->add_waiter(CDIR_WAIT_IMPORTED, *it);
-
- dout(15) << "doing contents" << endl;
-
- // contents
- int num_imported = 0;
- for (long nden = dstate.get_nden(); nden>0; nden--) {
-
- num_imported++;
-
- // dentry
- string dname;
- _decode(dname, bl, off);
- dout(15) << "dname is " << dname << endl;
-
- char dirty;
- bl.copy(off, 1, &dirty);
- off++;
-
- char icode;
- bl.copy(off, 1, &icode);
- off++;
-
- CDentry *dn = dir->lookup(dname);
- if (!dn)
- dn = dir->add_dentry(dname); // null
-
- // mark dn dirty _after_ we link the inode (scroll down)
+ if (dir->is_hashed()) {
- if (icode == 'N') {
+ // do nothing; dir is hashed
+ return 0;
+ } else {
+ // take all waiters on this dir
+ // NOTE: a pass of imported data is guaranteed to get all of my waiters because
+ // a replica's presence in my cache implies/forces its presence in authority's.
+ list<Context*> waiters;
+
+ dir->take_waiting(CDIR_WAIT_ANY, waiters);
+ for (list<Context*>::iterator it = waiters.begin();
+ it != waiters.end();
+ it++)
+ import_root->add_waiter(CDIR_WAIT_IMPORTED, *it);
+
+ dout(15) << "doing contents" << endl;
+
+ // contents
+ int num_imported = 0;
+ long nden = dstate.get_nden();
- // null dentry
- assert(dn->is_null());
+ for (; nden>0; nden--) {
+
+ num_imported++;
+
+ // dentry
+ string dname;
+ _decode(dname, bl, off);
+ dout(15) << "dname is " << dname << endl;
+
+ char dirty;
+ bl.copy(off, 1, &dirty);
+ off++;
+
+ char icode;
+ bl.copy(off, 1, &icode);
+ off++;
+
+ CDentry *dn = dir->lookup(dname);
+ if (!dn)
+ dn = dir->add_dentry(dname); // null
+
+ // mark dn dirty _after_ we link the inode (scroll down)
+
+ if (icode == 'N') {
+ // null dentry
+ assert(dn->is_null());
+
+ // fall thru
+ }
+ else if (icode == 'L') {
+ // remote link
+ inodeno_t ino;
+ bl.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+ dir->link_inode(dn, ino);
+ }
+ else if (icode == 'I') {
+ // inode
+ decode_import_inode(dn, bl, off, oldauth);
+ }
+
+ // mark dentry dirty? (only _after_ we link the inode!)
+ if (dirty == 'D') dn->mark_dirty();
- // fall thru
- }
- else if (icode == 'L') {
- // remote link
- inodeno_t ino;
- bl.copy(off, sizeof(ino), (char*)&ino);
- off += sizeof(ino);
- dir->link_inode(dn, ino);
}
- else if (icode == 'I') {
- // inode
- decode_import_inode(dn, bl, off, oldauth);
- }
-
- // mark dentry dirty? (only _after_ we link the inode!)
- if (dirty == 'D') dn->mark_dirty();
-
- }
-
- return num_imported;
-}
-
-
-void MDCache::got_hashed_replica(CDir *import,
- inodeno_t dir_ino,
- inodeno_t replica_ino)
-{
-
- dout(7) << "got_hashed_replica for import " << *import << " ino " << replica_ino << " in dir " << dir_ino << endl;
-
- // remove from import_hashed_replicate_waiting.
- for (multimap<inodeno_t,inodeno_t>::iterator it = import_hashed_replicate_waiting.find(dir_ino);
- it != import_hashed_replicate_waiting.end();
- it++) {
- if (it->second == replica_ino) {
- import_hashed_replicate_waiting.erase(it);
- break;
- } else
- assert(it->first == dir_ino); // it better be here!
- }
-
- // last one for that dir?
- CInode *diri = get_inode(dir_ino);
- assert(diri && diri->dir);
- if (import_hashed_replicate_waiting.count(dir_ino) > 0)
- return; // still more
-
- // done with this dir!
- diri->dir->unfreeze_dir();
-
- // remove from import_hashed_frozen_waiting
- for (multimap<inodeno_t,inodeno_t>::iterator it = import_hashed_frozen_waiting.find(import->ino());
- it != import_hashed_frozen_waiting.end();
- it++) {
- if (it->second == dir_ino) {
- import_hashed_frozen_waiting.erase(it);
- break;
- } else
- assert(it->first == import->ino()); // it better be here!
- }
-
- // last one for this import?
- if (import_hashed_frozen_waiting.count(import->ino()) == 0) {
- // all done, we can finish import!
-
-
- // THISIS BROKEN FOR HASHED... FIXME
- // mds->mdcache->import_dir_finish(import);
+ return num_imported;
}
}
if (!ndir) continue;
int boundauth = ndir->authority();
- dout(7) << "export_dir_notify bound " << *ndir << " was dir_auth " << ndir->dir_auth << " (" << boundauth << ")" << endl;
- if (ndir->dir_auth == CDIR_AUTH_PARENT) {
+ dout(7) << "export_dir_notify bound " << *ndir << " was dir_auth " << ndir->get_dir_auth() << " (" << boundauth << ")" << endl;
+ if (ndir->get_dir_auth() == CDIR_AUTH_PARENT) {
if (boundauth != m->get_new_auth())
- ndir->dir_auth = boundauth;
+ ndir->set_dir_auth( boundauth );
else assert(dir->authority() == m->get_new_auth()); // apparently we already knew!
} else {
if (boundauth == m->get_new_auth())
- ndir->dir_auth = CDIR_AUTH_PARENT;
+ ndir->set_dir_auth( CDIR_AUTH_PARENT );
}
}
// update dir_auth
if (in->authority() == m->get_new_auth()) {
dout(7) << "handle_export_dir_notify on " << *in << ": inode auth is the same, setting dir_auth -1" << endl;
- dir->dir_auth = -1;
+ dir->set_dir_auth( CDIR_AUTH_PARENT );
assert(!in->is_auth());
assert(!dir->is_auth());
} else {
- dir->dir_auth = m->get_new_auth();
+ dir->set_dir_auth( m->get_new_auth() );
}
assert(dir->authority() != mds->get_nodeid());
assert(!dir->is_auth());
CInode *diri = get_inode(*it);
if (!diri) continue; // don't have it, don't care
if (!diri->dir) continue;
- dout(10) << "handle_export_dir_notify checking subdir " << *diri->dir << " is auth " << diri->dir->dir_auth << endl;
+ dout(10) << "handle_export_dir_notify checking subdir " << *diri->dir << " is auth " << diri->dir->get_dir_auth() << endl;
assert(diri->dir != dir); // base shouldn't be in subdir list
- if (diri->dir->dir_auth != CDIR_AUTH_PARENT) {
- dout(7) << "*** weird value for dir_auth " << diri->dir->dir_auth << " on " << *diri->dir << ", should have been -1 probably??? ******************" << endl;
+ if (diri->dir->get_dir_auth() != CDIR_AUTH_PARENT) {
+ dout(7) << "*** weird value for dir_auth " << diri->dir->get_dir_auth() << " on " << *diri->dir << ", should have been -1 probably??? ******************" << endl;
assert(0); // bad news!
- //dir->dir_auth = -1;
+ //dir->set_dir_auth( CDIR_AUTH_PARENT );
}
assert(diri->dir->authority() == m->get_new_auth());
}
+
+// =======================================================================
// HASHING
+
+void MDCache::import_hashed_content(CDir *dir, bufferlist& bl, int nden, int oldauth)
+{
+ int off = 0;
+
+ for (; nden>0; nden--) {
+ // dentry
+ string dname;
+ _decode(dname, bl, off);
+ dout(15) << "dname is " << dname << endl;
+
+ char icode;
+ bl.copy(off, 1, &icode);
+ off++;
+
+ CDentry *dn = dir->lookup(dname);
+ if (!dn)
+ dn = dir->add_dentry(dname); // null
+
+ // mark dn dirty _after_ we link the inode (scroll down)
+
+ if (icode == 'N') {
+
+ // null dentry
+ assert(dn->is_null());
+
+ // fall thru
+ }
+ else if (icode == 'L') {
+ // remote link
+ inodeno_t ino;
+ bl.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+ dir->link_inode(dn, ino);
+ }
+ else if (icode == 'I') {
+ // inode
+ decode_import_inode(dn, bl, off, oldauth);
+
+ // fix up subdir export?
+ if (dn->inode->dir) {
+ assert(dn->inode->dir->state_test(CDIR_STATE_IMPORTINGEXPORT));
+ dn->inode->dir->put(CDIR_PIN_IMPORTINGEXPORT);
+ dn->inode->dir->state_clear(CDIR_STATE_IMPORTINGEXPORT);
+
+ if (dn->inode->dir->is_auth()) {
+ // mine. must have been an import.
+ assert(dn->inode->dir->is_import());
+ dout(7) << "unimporting subdir now that inode is mine " << *dn->inode->dir << endl;
+ dn->inode->dir->set_dir_auth( CDIR_AUTH_PARENT );
+ imports.erase(dn->inode->dir);
+ dn->inode->dir->put(CDIR_PIN_IMPORT);
+ dn->inode->dir->state_clear(CDIR_STATE_IMPORT);
+
+ // move nested under hashdir
+ for (set<CDir*>::iterator it = nested_exports[dn->inode->dir].begin();
+ it != nested_exports[dn->inode->dir].end();
+ it++)
+ nested_exports[dir].insert(*it);
+ nested_exports.erase(dn->inode->dir);
+
+ // now it matches the inode
+ dn->inode->dir->set_dir_auth( CDIR_AUTH_PARENT );
+ }
+ else {
+ // not mine. make it an export.
+ dout(7) << "making subdir into export " << *dn->inode->dir << endl;
+ dn->inode->dir->get(CDIR_PIN_EXPORT);
+ dn->inode->dir->state_set(CDIR_STATE_EXPORT);
+ exports.insert(dn->inode->dir);
+ nested_exports[dir].insert(dn->inode->dir);
+
+ if (dn->inode->dir->get_dir_auth() == CDIR_AUTH_PARENT)
+ dn->inode->dir->set_dir_auth( oldauth ); // no longer matches inode
+ assert(dn->inode->dir->get_dir_auth() >= 0);
+ }
+ }
+ }
+
+ // mark dentry dirty? (only _after_ we link the inode!)
+ dn->mark_dirty();
+ }
+}
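
A minimal sketch of the record layout that import_hashed_content walks: a dentry name, a one-byte type code ('N', 'L', or 'I'), then a payload that depends on the code. The name encoding (a 32-bit length prefix), the `Record` struct, and both helper functions are assumptions for illustration. The real `_decode` and `decode_import_inode` formats are not shown in this diff, so the 'I' payload is modeled here as a bare inode number.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for inodeno_t.
typedef uint64_t ino_sketch_t;

struct Record {
  std::string name;
  char code;         // 'N' null dentry, 'L' remote link, 'I' inode
  ino_sketch_t ino;  // unused (0) for null dentries
};

// Append one record using the assumed layout: len, name, code, [ino].
inline void encode_record(std::string& buf, const Record& r) {
  uint32_t len = static_cast<uint32_t>(r.name.size());
  buf.append(reinterpret_cast<const char*>(&len), sizeof(len));
  buf.append(r.name);
  buf.append(1, r.code);
  if (r.code != 'N')
    buf.append(reinterpret_cast<const char*>(&r.ino), sizeof(r.ino));
}

// Decode exactly nden records, advancing an offset the way the original does.
inline std::vector<Record> decode_records(const std::string& buf, int nden) {
  std::vector<Record> out;
  size_t off = 0;
  for (; nden > 0; nden--) {
    Record r;

    // name (assumed length-prefixed)
    uint32_t len = 0;
    assert(off + sizeof(len) <= buf.size());
    std::memcpy(&len, buf.data() + off, sizeof(len));
    off += sizeof(len);
    assert(off + len <= buf.size());
    r.name.assign(buf, off, len);
    off += len;

    // one-byte type code
    assert(off < buf.size());
    r.code = buf[off];
    off++;

    // payload only for remote links and inodes
    r.ino = 0;
    if (r.code == 'L' || r.code == 'I') {
      assert(off + sizeof(r.ino) <= buf.size());
      std::memcpy(&r.ino, buf.data() + off, sizeof(r.ino));
      off += sizeof(r.ino);
    }
    out.push_back(r);
  }
  return out;
}
```

Nothing in the stream marks its end, so the count passed to the decoder has to match the number of records that were actually encoded.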
+
/*
- interaction of hashing and export/import:
+ notes on interaction of hashing and export/import:
- dir->is_auth() is completely independent of hashing. for a hashed dir,
- all nodes are partially authoritative
- all nodes dir->is_hashed() == true
- all nodes dir->inode->dir_is_hashed() == true
- - one node dir->is_auth == true, the rest == false
- - dir_auth for all items in a hashed dir will likely be explicit.
+ - one node dir->is_auth() == true, the rest == false
+ - dir_auth for all subdirs in a hashed dir will (likely?) be explicit.
+
+ - remember simple rule: dir auth follows inode, unless dir_auth is explicit.
- - export_dir_walk and import_dir_block take care with dir_auth:
+ - export_dir_walk and import_dir_block take care with dir_auth: (for import/export)
- on export, -1 is changed to mds->get_nodeid()
- on import, nothing special, actually.
- - hashed dir files aren't included in export
- - hashed dir dirs ARE included in export, but as replicas. this is important
- because dirs are needed to tie together hierarchy, for auth to know about
+ - hashed dir files aren't included in export; subdirs are converted to imports
+ or exports as necessary.
+ - hashed dir subdirs are discovered on export. this is important
+ because dirs are needed to tie together auth hierarchy, for auth to know about
imports/exports, etc.
- - if exporter is auth, adds importer to cached_by
- - if importer is auth, importer will be fine
- - if third party is auth, sends MExportReplicatedHashed to auth
- - auth sends MExportReplicatedHashedAck to importer, who can proceed
- (ie send export ack) when all such messages are received.
- - dir state is preserved
- - COMPLETE and DIRTY aren't transferred
- - new auth should already know the dir is hashed.
-
+ - dir state is maintained on auth.
+ - COMPLETE and HASHED are transferred to importers.
+ - DIRTY is set everywhere.
+
+ - hashed dir is like an import: hashed dir used for nested_exports map.
+ - nested_exports is updated appropriately on auth and replicas.
+ - a subtree terminates as a hashed dir, since the hashing explicitly
+ redelegates all inodes. thus export_dir_walk includes hashed dirs, but
+ not their inodes.
*/
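
The "dir auth follows inode, unless dir_auth is explicit" rule in the notes above can be sketched as a walk up the hierarchy. This is a simplified model, not the MDCache code: `DirSketch`, `resolve_authority`, and the `root_auth` fallback are hypothetical names, and the model ignores hashing, where an inode's authority comes from the dentry hash rather than from the parent dir.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for CDIR_AUTH_PARENT (the diff's log text suggests it is -1).
const int AUTH_PARENT_SKETCH = -1;

struct DirSketch {
  int dir_auth;             // explicit MDS id, or AUTH_PARENT_SKETCH
  const DirSketch* parent;  // NULL at the root
};

// Walk upward until some dir names its authority explicitly.
inline int resolve_authority(const DirSketch* dir, int root_auth) {
  assert(dir != NULL);
  for (const DirSketch* d = dir; d != NULL; d = d->parent) {
    if (d->dir_auth != AUTH_PARENT_SKETCH)
      return d->dir_auth;   // explicit dir_auth wins
  }
  return root_auth;         // nothing explicit anywhere: assumed default
}
```

Under this model, an export or import only has to touch the dir_auth at the subtree boundary; everything below it inherits the new answer.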
// HASH on auth
-/*void MDCache::drop_sync_in_dir(CDir *dir)
-{
- for (CDir_map_t::iterator it = dir->begin(); it != dir->end(); it++) {
- CInode *in = it->second->inode;
- if (in->is_auth() &&
- in->is_syncbyme()) {
- dout(7) << "dropping sticky(?) sync on " << *in << endl;
- inode_sync_release(in);
- }
- }
-}
-*/
-
-class C_MDS_HashFreeze : public Context {
+class C_MDC_HashFreeze : public Context {
public:
MDS *mds;
CDir *dir;
- C_MDS_HashFreeze(MDS *mds, CDir *dir) {
+ C_MDC_HashFreeze(MDS *mds, CDir *dir) {
this->mds = mds;
this->dir = dir;
}
virtual void finish(int r) {
- mds->mdcache->hash_dir_finish(dir);
+ mds->mdcache->hash_dir_frozen(dir);
}
};
-class C_MDS_HashComplete : public Context {
+class C_MDC_HashComplete : public Context {
public:
MDS *mds;
CDir *dir;
- C_MDS_HashComplete(MDS *mds, CDir *dir) {
+ C_MDC_HashComplete(MDS *mds, CDir *dir) {
this->mds = mds;
this->dir = dir;
}
}
};
+
+/** hash_dir(dir)
+ * start hashing a directory.
+ */
void MDCache::hash_dir(CDir *dir)
{
- assert(!dir->is_hashing());
+ dout(7) << "hash_dir " << *dir << endl;
+
assert(!dir->is_hashed());
assert(dir->is_auth());
dout(7) << " can't hash, freezing|frozen." << endl;
return;
}
-
- dout(7) << "hash_dir " << *dir << endl;
- // fix state
+ // pin path?
+ vector<CDentry*> trace;
+ make_trace(trace, dir->inode);
+ if (!path_pin(trace, 0, 0)) {
+ dout(7) << "hash_dir couldn't pin path, failing." << endl;
+ return;
+ }
+
+ // ok, go
dir->state_set(CDIR_STATE_HASHING);
- dir->auth_pin();
+ dir->get(CDIR_PIN_HASHING);
+ assert(dir->hashed_subset.empty());
+ // discover on all mds
+ assert(hash_gather.count(dir) == 0);
+ for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
+ if (i == mds->get_nodeid()) continue; // except me
+ hash_gather[dir].insert(i);
+ mds->messenger->send_message(new MHashDirDiscover(dir->inode),
+ MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ }
+ dir->auth_pin(); // pin until discovers are all acked.
+
// start freeze
- dir->freeze_dir(new C_MDS_HashFreeze(mds, dir));
+ dir->freeze_dir(new C_MDC_HashFreeze(mds, dir));
// make complete
if (!dir->is_complete()) {
dout(7) << "hash_dir " << *dir << " not complete, fetching" << endl;
mds->mdstore->fetch_dir(dir,
- new C_MDS_HashComplete(mds, dir));
+ new C_MDC_HashComplete(mds, dir));
} else
hash_dir_complete(dir);
+}
- // drop any sync or lock if sticky
- /*
- if (g_conf.mds_cache_sticky_sync_normal ||
- g_conf.mds_cache_sticky_sync_softasync)
- drop_sync_in_dir(dir);
- */
+
+/*
+ * wait for everybody to discover and open the hashing dir
+ * then auth_unpin, to let the freeze happen
+ */
+void MDCache::handle_hash_dir_discover_ack(MHashDirDiscoverAck *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
+
+ int from = m->get_source();
+ assert(hash_gather[dir].count(from));
+ hash_gather[dir].erase(from);
+
+ if (hash_gather[dir].empty()) {
+ hash_gather.erase(dir);
+ dout(7) << "hash_dir_discover_ack " << *dir << ", releasing auth_pin" << endl;
+ dir->auth_unpin(); // unpin to allow freeze to complete
+ } else {
+ dout(7) << "hash_dir_discover_ack " << *dir << ", still waiting for " << hash_gather[dir] << endl;
+ }
+
+ delete m; // done
}
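
The ack handling above follows a gather-set pattern that recurs throughout the hashing code: record every peer a reply is expected from, erase each peer as its ack arrives, and proceed once the set drains. A minimal sketch, with `GatherSketch` and its methods as hypothetical names rather than anything in MDCache, and an `int` key standing in for the `CDir*`:

```cpp
#include <cassert>
#include <map>
#include <set>

class GatherSketch {
  std::map<int, std::set<int> > waiting;  // key -> peers still outstanding
public:
  // Register that we expect a reply from peer for this key.
  void expect(int key, int peer) { waiting[key].insert(peer); }

  // Record an ack. Returns true if it was the last one for this key.
  bool ack(int key, int peer) {
    std::map<int, std::set<int> >::iterator it = waiting.find(key);
    assert(it != waiting.end());          // we must be gathering for key
    assert(it->second.count(peer) == 1);  // and expecting this peer
    it->second.erase(peer);
    if (!it->second.empty())
      return false;                       // still waiting on others
    waiting.erase(it);                    // drop the drained entry
    return true;
  }

  bool pending(int key) const { return waiting.count(key) > 0; }
};
```

The sketch uses `find` rather than `operator[]` when checking state, because `operator[]` creates an empty entry as a side effect, and the code above asserts `hash_gather.count(dir) == 0` before starting a new round.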
+
+
+/*
+ * once the dir is completely in memory,
+ * mark all migrating inodes dirty (to pin in cache)
+ */
void MDCache::hash_dir_complete(CDir *dir)
{
- assert(dir->is_hashing());
+ dout(7) << "hash_dir_complete " << *dir << ", dirtying inodes" << endl;
+
assert(!dir->is_hashed());
assert(dir->is_auth());
-
+
// mark dirty to pin in cache
- for (CDir_map_t::iterator it = dir->begin(); it != dir->end(); it++) {
+ for (CDir_map_t::iterator it = dir->begin();
+ it != dir->end();
+ it++) {
CInode *in = it->second->inode;
- int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->ino(), it->first );
- if (dentryhashcode == mds->get_nodeid())
- in->mark_dirty();
+ in->mark_dirty();
}
- hash_dir_finish(dir);
+ if (dir->is_frozen_dir())
+ hash_dir_go(dir);
}
-void MDCache::hash_dir_finish(CDir *dir)
+
+/*
+ * once the dir is frozen,
+ * make sure it's complete
+ * send the prep messages!
+ */
+void MDCache::hash_dir_frozen(CDir *dir)
{
- /*
- assert(dir->is_hashing());
+ dout(7) << "hash_dir_frozen " << *dir << endl;
+
assert(!dir->is_hashed());
assert(dir->is_auth());
+ assert(dir->is_frozen_dir());
- if (!dir->is_frozen_dir()) {
- dout(7) << "hash_dir_finish !frozen yet " << *dir->inode << endl;
- return;
- }
if (!dir->is_complete()) {
- dout(7) << "hash_dir_finish !complete, waiting still " << *dir->inode << endl;
+ dout(7) << "hash_dir_frozen !complete, waiting still on " << *dir << endl;
return;
}
- dout(7) << "hash_dir_finish " << *dir << endl;
+ // send prep messages w/ export directories to open
+ vector<MHashDirPrep*> msgs(mds->get_cluster()->get_num_mds());
+
+ // check for subdirs
+ for (CDir_map_t::iterator it = dir->begin();
+ it != dir->end();
+ it++) {
+ CDentry *dn = it->second;
+ CInode *in = dn->inode;
+
+ if (!in->is_dir()) continue;
+ if (!in->dir) continue;
+
+ int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->inode.hash_seed, it->first );
+ if (dentryhashcode == mds->get_nodeid()) continue;
+
+ // msg?
+ if (msgs[dentryhashcode] == 0) {
+ msgs[dentryhashcode] = new MHashDirPrep(dir->ino());
+ }
+ msgs[dentryhashcode]->add_inode(it->first, in->replicate_to(dentryhashcode));
+ }
+
+ // send them!
+ assert(hash_gather[dir].empty());
+ for (unsigned i=0; i<msgs.size(); i++) {
+ if (msgs[i]) {
+ mds->messenger->send_message(msgs[i], MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ hash_gather[dir].insert(i);
+ }
+ }
+
+ if (hash_gather[dir].empty()) {
+ // no subdirs! continue!
+ hash_gather.erase(dir);
+ hash_dir_go(dir);
+ } else {
+ // wait!
+ }
+}
+
+/*
+ * wait for peers to open all subdirs
+ */
+void MDCache::handle_hash_dir_prep_ack(MHashDirPrepAck *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
+
+ int from = m->get_source();
+
+ assert(hash_gather[dir].count(from) == 1);
+ hash_gather[dir].erase(from);
+
+ if (hash_gather[dir].empty()) {
+ hash_gather.erase(dir);
+ dout(7) << "handle_hash_dir_prep_ack on " << *dir << ", last one" << endl;
+ hash_dir_go(dir);
+ } else {
+ dout(7) << "handle_hash_dir_prep_ack on " << *dir << ", waiting for " << hash_gather[dir] << endl;
+ }
+
+ delete m;
+}
+
+
+/*
+ * once the dir is frozen,
+ * make sure it's complete
+ * do the hashing!
+ */
+void MDCache::hash_dir_go(CDir *dir)
+{
+ dout(7) << "hash_dir_go " << *dir << endl;
+ assert(!dir->is_hashed());
+ assert(dir->is_auth());
+ assert(dir->is_frozen_dir());
+
// get messages to other nodes ready
- vector<MHashDir*> msgs;
- string path;
- dir->inode->make_path(path);
+ vector<MHashDir*> msgs(mds->get_cluster()->get_num_mds());
+ for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
+ if (i == mds->get_nodeid()) continue;
+ msgs[i] = new MHashDir(dir->ino());
+ }
+
+ // pick a hash seed.
+ dir->inode->inode.hash_seed = dir->ino();
+
+ // suck up all waiters
+ C_Contexts *fin = new C_Contexts;
+ list<Context*> waiting;
+ dir->take_waiting(CDIR_WAIT_ANY, waiting); // all dir waiters
+ fin->take(waiting);
+
+ // get containing import. might be me.
+ CDir *containing_import = get_auth_container(dir);
+ assert(containing_import != dir || dir->is_import());
+
+ // divvy up contents
+ for (CDir_map_t::iterator it = dir->begin();
+ it != dir->end();
+ it++) {
+ CDentry *dn = it->second;
+ CInode *in = dn->inode;
+
+ int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->inode.hash_seed, it->first );
+ if (dentryhashcode == mds->get_nodeid()) {
+ continue; // still mine!
+ }
+
+ bufferlist *bl = msgs[dentryhashcode]->get_state_ptr();
+ assert(bl);
+
+ // -- dentry
+ dout(7) << "hash_dir_go sending to " << dentryhashcode << " dn " << *dn << endl;
+ _encode(it->first, *bl);
+
+ // null dentry?
+ if (dn->is_null()) {
+ bl->append("N", 1); // null dentry
+ assert(dn->is_sync());
+ continue;
+ }
+
+ if (dn->is_remote()) {
+ // remote link
+ bl->append("L", 1); // remote link
+
+ inodeno_t ino = dn->get_remote_ino();
+ bl->append((char*)&ino, sizeof(ino));
+ continue;
+ }
+
+ // primary link
+ // -- inode
+ bl->append("I", 1); // inode dentry
+
+ encode_export_inode(in, *bl, dentryhashcode); // encode, and (update state for) export
+ msgs[dentryhashcode]->inc_nden();
+
+ if (dn->is_dirty())
+ dn->mark_clean();
+
+ // add to proxy
+ hash_proxy_inos[dir].push_back(in);
+ in->state_set(CINODE_STATE_PROXY);
+ in->get(CINODE_PIN_PROXY);
+
+ // fix up subdirs
+ if (in->dir) {
+ if (in->dir->is_auth()) {
+ // mine. make it into an import.
+ dout(7) << "making subdir into import " << *in->dir << endl;
+ in->dir->set_dir_auth( mds->get_nodeid() );
+ imports.insert(in->dir);
+ in->dir->get(CDIR_PIN_IMPORT);
+ in->dir->state_set(CDIR_STATE_IMPORT);
+
+ // fix nested bits
+ for (set<CDir*>::iterator it = nested_exports[containing_import].begin();
+ it != nested_exports[containing_import].end(); ) {
+ CDir *ex = *it;
+ it++;
+ if (get_auth_container(ex) == in->dir) {
+ dout(10) << "moving nested export " << *ex << endl;
+ nested_exports[containing_import].erase(ex);
+ nested_exports[in->dir].insert(ex);
+ }
+ }
+ }
+ else {
+ // not mine.
+ dout(7) << "un-exporting subdir that's being hashed away " << *in->dir << endl;
+ assert(in->dir->is_export());
+ in->dir->put(CDIR_PIN_EXPORT);
+ in->dir->state_clear(CDIR_STATE_EXPORT);
+ exports.erase(in->dir);
+ nested_exports[containing_import].erase(in->dir);
+ if (in->dir->authority() == dentryhashcode)
+ in->dir->set_dir_auth( CDIR_AUTH_PARENT );
+ else
+ in->dir->set_dir_auth( in->dir->authority() );
+ }
+ }
+
+ // waiters
+ list<Context*> waiters;
+ in->take_waiting(CINODE_WAIT_ANY, waiters);
+ fin->take(waiters);
+ }
+
+ // dir state
+ dir->state_set(CDIR_STATE_HASHED);
+ dir->get(CDIR_PIN_HASHED);
+ hashdirs.insert(dir);
+ dir->mark_dirty();
+
+ // inode state
+ if (dir->inode->is_auth()) {
+ dir->inode->mark_dirty();
+ mds->mdlog->submit_entry(new EInodeUpdate(dir->inode));
+ }
+
+ // fix up nested_exports?
+ if (containing_import != dir) {
+ dout(7) << "moving nested exports under hashed dir" << endl;
+ for (set<CDir*>::iterator it = nested_exports[containing_import].begin();
+ it != nested_exports[containing_import].end(); ) {
+ CDir *ex = *it;
+ it++;
+ if (get_auth_container(ex) == dir) {
+ dout(7) << " moving nested export under hashed dir: " << *ex << endl;
+ nested_exports[containing_import].erase(ex);
+ nested_exports[dir].insert(ex);
+ } else {
+ dout(7) << " NOT moving nested export under hashed dir: " << *ex << endl;
+ }
+ }
+ }
+
+ // send hash messages
+ assert(hash_gather[dir].empty());
+ assert(hash_notify_gather[dir].empty());
+ assert(dir->hashed_subset.empty());
for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
- msgs.push_back(new MHashDir(path));
+ // all nodes hashed locally..
+ dir->hashed_subset.insert(i);
+
+ if (i == mds->get_nodeid()) continue;
+
+ // init hash_gather and hash_notify_gather sets
+ hash_gather[dir].insert(i);
+
+ assert(hash_notify_gather[dir][i].empty());
+ for (int j=0; j<mds->get_cluster()->get_num_mds(); j++) {
+ if (j == mds->get_nodeid()) continue;
+ if (j == i) continue;
+ hash_notify_gather[dir][i].insert(j);
+ }
+
+ mds->messenger->send_message(msgs[i],
+ MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
}
+
+ // wait for all the acks.
+}
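
The divide step in hash_dir_go assigns each dentry to an MDS by hashing its name together with the directory's seed. A sketch of that partitioning follows. `hash_dentry_sketch` is a stand-in built on `std::hash` (C++11), since the real `hash_dentry` is not shown in this diff, and `partition_names` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Assumed mixing function: any deterministic hash of (seed, name) would do.
inline int hash_dentry_sketch(uint64_t seed, const std::string& name, int num_mds) {
  assert(num_mds > 0);
  size_t h = std::hash<std::string>()(name)
           ^ static_cast<size_t>(seed * 0x9e3779b97f4a7c15ULL);
  return static_cast<int>(h % static_cast<size_t>(num_mds));
}

// Bucket dentry names by owning MDS; slot i holds the names that hash to i.
inline std::vector<std::vector<std::string> >
partition_names(uint64_t seed, const std::vector<std::string>& names, int num_mds) {
  std::vector<std::vector<std::string> > out(num_mds);
  for (size_t i = 0; i < names.size(); i++)
    out[hash_dentry_sketch(seed, names[i], num_mds)].push_back(names[i]);
  return out;
}
```

Because every node computes the same function from the same seed, no lookup table has to be exchanged: the names in the local node's slot stay put and the rest are shipped to their owners.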
+
+
+void MDCache::handle_hash_dir_ack(MHashDirAck *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
+
+ assert(dir->is_hashed());
+ assert(dir->is_hashing());
+
+ int from = m->get_source();
+ assert(hash_gather[dir].count(from) == 1);
+ hash_gather[dir].erase(from);
+
+ if (hash_gather[dir].empty()) {
+ dout(7) << "handle_hash_dir_ack on " << *dir << ", last one" << endl;
+
+ if (hash_notify_gather[dir].empty()) {
+ dout(7) << "got notifies too, all done" << endl;
+ hash_dir_finish(dir);
+ } else {
+ dout(7) << "waiting on notifies" << endl;
+ }
+
+ } else {
+ dout(7) << "handle_hash_dir_ack on " << *dir << ", waiting for " << hash_gather[dir] << endl;
+ }
+
+ delete m;
+}
+
+
+void MDCache::hash_dir_finish(CDir *dir)
+{
+ dout(7) << "hash_dir_finish finishing " << *dir << endl;
+ assert(dir->is_hashed());
+ assert(dir->is_hashing());
+
+ // dir state
+ hash_gather.erase(dir);
+ dir->state_clear(CDIR_STATE_HASHING);
+ dir->put(CDIR_PIN_HASHING);
+ dir->hashed_subset.clear();
+
+ // unproxy inodes
+ // this _could_ happen sooner, on a per-peer basis, but no harm in waiting a few more seconds.
+ for (list<CInode*>::iterator it = hash_proxy_inos[dir].begin();
+ it != hash_proxy_inos[dir].end();
+ it++) {
+ CInode *in = *it;
+ in->state_clear(CINODE_STATE_PROXY);
+ in->put(CINODE_PIN_PROXY);
+ }
+ hash_proxy_inos.erase(dir);
+
+ // unpin path
+ vector<CDentry*> trace;
+ make_trace(trace, dir->inode);
+ path_unpin(trace, 0);
+
+ // unfreeze
+ dir->unfreeze_dir();
+
+ show_imports();
+ assert(hash_gather.count(dir) == 0);
+
+ // stats
+ //mds->logger->inc("nh", 1);
+
+}
+
+
+
+
+// HASH on auth and non-auth
+
+void MDCache::handle_hash_dir_notify(MHashDirNotify *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
+ assert(dir->is_hashing());
+
+ dout(5) << "handle_hash_dir_notify " << *dir << endl;
+ int from = m->get_from();
+
+ if (dir->is_auth()) {
+ // gather notifies
+ assert(dir->is_hashed());
+
+ assert( hash_notify_gather[dir][from].count(m->get_source()) );
+ hash_notify_gather[dir][from].erase(m->get_source());
+
+ if (hash_notify_gather[dir][from].empty()) {
+ dout(7) << "last notify from " << from << endl;
+ hash_notify_gather[dir].erase(from);
+
+ if (hash_notify_gather[dir].empty()) {
+ dout(7) << "last notify!" << endl;
+ hash_notify_gather.erase(dir);
+
+ if (hash_gather[dir].empty()) {
+ dout(7) << "got acks too, all done" << endl;
+ hash_dir_finish(dir);
+ } else {
+ dout(7) << "still waiting on acks from " << hash_gather[dir] << endl;
+ }
+ } else {
+ dout(7) << "still waiting for notify gathers from " << hash_notify_gather[dir].size() << " others" << endl;
+ }
+ } else {
+ dout(7) << "still waiting for notifies from " << from << " via " << hash_notify_gather[dir][from] << endl;
+ }
+
+ // delete msg
+ delete m;
+ } else {
+ // update dir hashed_subset
+ assert(dir->hashed_subset.count(from) == 0);
+ dir->hashed_subset.insert(from);
+
+ // update open subdirs
+ for (CDir_map_t::iterator it = dir->begin();
+ it != dir->end();
+ it++) {
+ CDentry *dn = it->second;
+ CInode *in = dn->get_inode();
+ if (!in) continue;
+ if (!in->dir) continue;
+
+ int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->inode.hash_seed, it->first );
+ if (dentryhashcode != from) continue; // we'll import these in a minute
+
+ if (in->dir->authority() != dentryhashcode)
+ in->dir->set_dir_auth( in->dir->authority() );
+ else
+ in->dir->set_dir_auth( CDIR_AUTH_PARENT );
+ }
+
+ // remove from notify gather set
+ assert(hash_gather[dir].count(from));
+ hash_gather[dir].erase(from);
+
+ // last notify?
+ if (hash_gather[dir].empty()) {
+ dout(7) << "gathered all the notifies, finishing hash of " << *dir << endl;
+ hash_gather.erase(dir);
+
+ dir->state_clear(CDIR_STATE_HASHING);
+ dir->put(CDIR_PIN_HASHING);
+ dir->hashed_subset.clear();
+ } else {
+ dout(7) << "still waiting for notify from " << hash_gather[dir] << endl;
+ }
+
+ // fw notify to auth
+ mds->messenger->send_message(m, MSG_ADDR_MDS(dir->authority()), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ }
+}
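
On the auth, notifies are gathered at two levels: for each importing peer, the auth waits for that peer's notify to be forwarded back by every other non-auth node. A sketch of how that nested set is populated, mirroring the loop in hash_dir_go; `build_notify_gather` is a hypothetical helper, not a function in MDCache.

```cpp
#include <cassert>
#include <map>
#include <set>

// For each importer i (everyone but the auth), the set of third parties j
// (everyone but the auth and i) expected to forward i's notify to the auth.
inline std::map<int, std::set<int> > build_notify_gather(int num_mds, int auth) {
  assert(num_mds > 0);
  std::map<int, std::set<int> > gather;
  for (int i = 0; i < num_mds; i++) {
    if (i == auth) continue;
    for (int j = 0; j < num_mds; j++) {
      if (j == auth) continue;
      if (j == i) continue;
      gather[i].insert(j);  // entry is created only when a forwarder exists
    }
  }
  return gather;
}
```

In this sketch an entry is created only once a forwarder exists, so a two-node cluster yields an empty map rather than a map holding one empty set that no notify would ever clear.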
+
+
+
+
+// HASH on non-auth
+
+/*
+ * discover step:
+ * each peer needs to open up the directory and pin it before we start
+ */
+class C_MDC_HashDirDiscover : public Context {
+ MDCache *mdc;
+ MHashDirDiscover *m;
+public:
+ vector<CDentry*> trace;
+ C_MDC_HashDirDiscover(MDCache *mdc, MHashDirDiscover *m) {
+ this->mdc = mdc;
+ this->m = m;
+ }
+ void finish(int r) {
+ CInode *in = 0;
+ if (r >= 0) in = trace[trace.size()-1]->get_inode();
+ mdc->handle_hash_dir_discover_2(m, in, r);
+ }
+};
+
+void MDCache::handle_hash_dir_discover(MHashDirDiscover *m)
+{
+ assert(m->get_source() != mds->get_nodeid());
+
+ dout(7) << "handle_hash_dir_discover on " << m->get_path() << endl;
+
+ // must discover it!
+ C_MDC_HashDirDiscover *onfinish = new C_MDC_HashDirDiscover(this, m);
+ filepath fpath(m->get_path());
+ path_traverse(fpath, onfinish->trace, true,
+ m, new C_MDS_RetryMessage(mds,m), // on delay/retry
+ MDS_TRAVERSE_DISCOVER,
+ onfinish); // on completion|error
+}
+
+void MDCache::handle_hash_dir_discover_2(MHashDirDiscover *m, CInode *in, int r)
+{
+ // yay!
+ if (in) {
+ dout(7) << "handle_hash_dir_discover_2 has " << *in << endl;
+ }
+
+ if (r < 0 || !in->is_dir()) {
+ dout(7) << "handle_hash_dir_discover_2 failed to discover or not dir " << m->get_path() << ", NAK" << endl;
+ assert(0); // this shouldn't happen if the auth pins its path properly!
+ }
+ assert(in->is_dir());
+
+ // is dir open?
+ if (!in->dir) {
+ dout(7) << "handle_hash_dir_discover_2 opening dir " << *in << endl;
+ open_remote_dir(in,
+ new C_MDS_RetryMessage(mds, m));
+ return;
+ }
+ CDir *dir = in->dir;
+
+ // pin dir, set hashing flag
+ dir->state_set(CDIR_STATE_HASHING);
+ dir->get(CDIR_PIN_HASHING);
+ assert(dir->hashed_subset.empty());
+
+ // inode state
+ dir->inode->inode.hash_seed = dir->ino();
+ if (dir->inode->is_auth())
+ dir->inode->mark_dirty();
+
+ // get gather set ready for notifies
+ assert(hash_gather[dir].empty());
+ for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
+ if (i == mds->get_nodeid()) continue;
+ if (i == dir->authority()) continue;
+ hash_gather[dir].insert(i);
+ }
+
+ // reply
+ dout(7) << " sending hash_dir_discover_ack on " << *dir << endl;
+ mds->messenger->send_message(new MHashDirDiscoverAck(dir->ino()),
+ m->get_source(), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ delete m;
+}
+
+/*
+ * prep step:
+ * peers need to open up all subdirs of the hashed dir
+ */
+
+void MDCache::handle_hash_dir_prep(MHashDirPrep *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
- // divy up contents
- for (CDir_map_t::iterator it = dir->begin(); it != dir->end(); it++) {
- CInode *in = it->second->inode;
-
- int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->ino(), it->first );
- if (dentryhashcode == mds->get_nodeid())
- continue; // still mine!
+ dout(7) << "handle_hash_dir_prep " << *dir << endl;
- // giving it away.
- in->version++; // so log entries are ignored, etc.
-
- // mark my children explicitly mine
- if (in->dir_auth == CDIR_AUTH_PARENT)
- in->dir_auth = mds->get_nodeid();
-
- // add dentry and inode to message
- msgs[dentryhashcode]->dir_rope.append( it->first.c_str(), it->first.length()+1 );
- msgs[dentryhashcode]->dir_rope.append( in->encode_export_state() );
-
- // fix up my state
- if (in->is_dirty()) in->mark_clean();
- in->cached_by_clear();
-
- assert(in->auth == true);
- in->set_auth(false);
+ if (!m->did_assim()) {
+ m->mark_assim(); // only do this the first time!
- // there should be no waiters.
- }
+ // assimilate dentry+inodes for exports
+ for (map<string,CInodeDiscover*>::iterator it = m->get_inodes().begin();
+ it != m->get_inodes().end();
+ it++) {
+ CInode *in = get_inode( it->second->get_ino() );
+ if (in) {
+ it->second->update_inode(in);
+ dout(5) << " updated " << *in << endl;
+ } else {
+ in = new CInode(false);
+ it->second->update_inode(in);
+ add_inode(in);
+
+ // link
+ dir->add_dentry( it->first, in );
+ dout(5) << " added " << *in << endl;
+ }
- // send them
- for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
- mds->messenger->send_message(msgs[i],
- MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ // open!
+ if (!in->dir) {
+ dout(5) << " opening nested export on " << *in << endl;
+ open_remote_dir(in,
+ new C_MDS_RetryMessage(mds, m));
+ }
+ }
}
- // inode state
- dir->inode->inode.isdir = INODE_DIR_HASHED;
- if (dir->inode->is_auth())
- dir->inode->mark_dirty();
+ // verify!
+ int waiting_for = 0;
+ for (map<string,CInodeDiscover*>::iterator it = m->get_inodes().begin();
+ it != m->get_inodes().end();
+ it++) {
+ CInode *in = get_inode( it->second->get_ino() );
+ assert(in);
- // dir state
- dir->state_set(CDIR_STATE_HASHED);
- dir->state_clear(CDIR_STATE_HASHING);
- dir->mark_dirty();
+ if (in->dir) {
+ if (!in->dir->state_test(CDIR_STATE_IMPORTINGEXPORT)) {
+ dout(5) << " pinning nested export " << *in->dir << endl;
+ in->dir->get(CDIR_PIN_IMPORTINGEXPORT);
+ in->dir->state_set(CDIR_STATE_IMPORTINGEXPORT);
+ } else {
+ dout(5) << " already pinned nested export " << *in << endl;
+ }
+ } else {
+ dout(5) << " waiting for nested export dir on " << *in << endl;
+ waiting_for++;
+ }
+ }
- // FIXME: log!
+ if (waiting_for) {
+ dout(5) << "waiting for " << waiting_for << " dirs to open" << endl;
+ return;
+ }
- // unfreeze
- dir->unfreeze_dir();
-*/
+ // ack!
+ mds->messenger->send_message(new MHashDirPrepAck(dir->ino()),
+ m->get_source(), MDS_PORT_CACHE, MDS_PORT_CACHE);
+
+ // done.
+ delete m;
}
/*
-hmm, not going to need to do this for now!
-
-void handle_hash_dir_ack(MHashDirAck *m)
-{
- CInode *in =
-
- // done
- delete m;
-}
-*/
+ * hash step:
+ */
void MDCache::handle_hash_dir(MHashDir *m)
{
- /*
- // traverse to node
- vector<CInode*> trav;
- int r = path_traverse(m->get_path(), trav, m, MDS_TRAVERSE_DISCOVER);
- if (r > 0) return; // fw or delay
-
- CInode *diri = trav[trav.size()-1];
- CDir *dir = diri->get_dir(mds->get_nodeid());
-
- dout(7) << "handle_hash_dir " << *dir << endl;
-
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
assert(!dir->is_auth());
assert(!dir->is_hashed());
+ assert(dir->is_hashing());
- // dir state
- dir->state_set(CDIR_STATE_HASHING);
-
- // assimilate contents
+ dout(5) << "handle_hash_dir " << *dir << endl;
int oldauth = m->get_source();
- const char *p = m->dir_rope.c_str();
- const char *pend = p + m->dir_rope.length();
- while (p < pend) {
- CInode *in = import_dentry_inode(dir, p, oldauth);
- in->mark_dirty(); // pin in cache
- }
+
+ // content
+ import_hashed_content(dir, m->get_state(), m->get_nden(), oldauth);
// dir state
- dir->state_clear(CDIR_STATE_HASHING);
dir->state_set(CDIR_STATE_HASHED);
-
+ dir->get(CDIR_PIN_HASHED);
+ hashdirs.insert(dir);
+ dir->hashed_subset.insert(mds->get_nodeid());
+
// dir is complete
dir->mark_complete();
dir->mark_dirty();
- // inode state
- diri->inode.isdir = INODE_DIR_HASHED;
- if (diri->is_auth())
- diri->mark_dirty();
-
- // FIXME: log
+ // commit
+ mds->mdstore->commit_dir(dir, 0);
+
+ // send notifies
+ dout(7) << "sending notifies" << endl;
+ for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
+ if (i == mds->get_nodeid()) continue;
+ if (i == m->get_source()) continue;
+ mds->messenger->send_message(new MHashDirNotify(dir->ino(), mds->get_nodeid()),
+ MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ }
+ // ack
+ dout(7) << "acking" << endl;
+ mds->messenger->send_message(new MHashDirAck(dir->ino()),
+ m->get_source(), MDS_PORT_CACHE, MDS_PORT_CACHE);
+
// done.
delete m;
- */
+
+ show_imports();
}
-// UNHASHING
-class C_MDS_UnhashFreeze : public Context {
+// UNHASH on auth
+
+class C_MDC_UnhashFreeze : public Context {
public:
MDS *mds;
CDir *dir;
- C_MDS_UnhashFreeze(MDS *mds, CDir *dir) {
+ C_MDC_UnhashFreeze(MDS *mds, CDir *dir) {
this->mds = mds;
this->dir = dir;
}
virtual void finish(int r) {
- mds->mdcache->unhash_dir_finish(dir);
+ mds->mdcache->unhash_dir_frozen(dir);
}
};
-class C_MDS_UnhashComplete : public Context {
+class C_MDC_UnhashComplete : public Context {
public:
MDS *mds;
CDir *dir;
- C_MDS_UnhashComplete(MDS *mds, CDir *dir) {
+ C_MDC_UnhashComplete(MDS *mds, CDir *dir) {
this->mds = mds;
this->dir = dir;
}
}
};
-/*
+
void MDCache::unhash_dir(CDir *dir)
{
+ dout(7) << "unhash_dir " << *dir << endl;
+
assert(dir->is_hashed());
assert(!dir->is_unhashing());
assert(dir->is_auth());
+ assert(hash_gather.count(dir)==0);
+
+ // pin path?
+ vector<CDentry*> trace;
+ make_trace(trace, dir->inode);
+ if (!path_pin(trace, 0, 0)) {
+ dout(7) << "unhash_dir couldn't pin path, failing." << endl;
+ return;
+ }
+
+ // twiddle state
+ dir->state_set(CDIR_STATE_UNHASHING);
+
+ // first, freeze the dir.
+ dir->freeze_dir(new C_MDC_UnhashFreeze(mds, dir));
+
+ // make complete
+ if (!dir->is_complete()) {
+ dout(7) << "unhash_dir " << *dir << " not complete, fetching" << endl;
+ mds->mdstore->fetch_dir(dir,
+ new C_MDC_UnhashComplete(mds, dir));
+ } else
+ unhash_dir_complete(dir);
+
+}
+
+void MDCache::unhash_dir_frozen(CDir *dir)
+{
+ dout(7) << "unhash_dir_frozen " << *dir << endl;
+
+ assert(dir->is_hashed());
+ assert(dir->is_auth());
+ assert(dir->is_frozen_dir());
- if (dir->is_frozen() ||
- dir->is_freezing()) {
- dout(7) << " can't un_hash, freezing|frozen." << endl;
+ if (!dir->is_complete()) {
+ dout(7) << "unhash_dir_frozen !complete, waiting still on " << *dir << endl;
+ } else
+ unhash_dir_prep(dir);
+}
+
+
+/*
+ * ask peers to freeze and complete hashed dir
+ */
+void MDCache::unhash_dir_prep(CDir *dir)
+{
+ dout(7) << "unhash_dir_prep " << *dir << endl;
+ assert(dir->is_hashed());
+ assert(dir->is_auth());
+ assert(dir->is_frozen_dir());
+ assert(dir->is_complete());
+
+ if (!hash_gather[dir].empty()) return; // already been here..freeze must have been instantaneous
+
+ // send unhash prep to all peers
+ assert(hash_gather[dir].empty());
+ for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
+ if (i == mds->get_nodeid()) continue;
+ hash_gather[dir].insert(i);
+ mds->messenger->send_message(new MUnhashDirPrep(dir->ino()),
+ MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ }
+}
+
+/*
+ * wait for peers to freeze and complete hashed dirs
+ */
+void MDCache::handle_unhash_dir_prep_ack(MUnhashDirPrepAck *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
+
+ int from = m->get_source();
+ dout(7) << "handle_unhash_dir_prep_ack from " << from << " " << *dir << endl;
+
+ if (!m->did_assim()) {
+ m->mark_assim(); // only do this the first time!
+
+ // assimilate dentry+inodes for exports
+ for (map<string,CInodeDiscover*>::iterator it = m->get_inodes().begin();
+ it != m->get_inodes().end();
+ it++) {
+ CInode *in = get_inode( it->second->get_ino() );
+ if (in) {
+ it->second->update_inode(in);
+ dout(5) << " updated " << *in << endl;
+ } else {
+ in = new CInode(false);
+ it->second->update_inode(in);
+ add_inode(in);
+
+ // link
+ dir->add_dentry( it->first, in );
+ dout(5) << " added " << *in << endl;
+ }
+
+ // open!
+ if (!in->dir) {
+ dout(5) << " opening nested export on " << *in << endl;
+ open_remote_dir(in,
+ new C_MDS_RetryMessage(mds, m));
+ }
+ }
+ }
+
+ // verify!
+ int waiting_for = 0;
+ for (map<string,CInodeDiscover*>::iterator it = m->get_inodes().begin();
+ it != m->get_inodes().end();
+ it++) {
+ CInode *in = get_inode( it->second->get_ino() );
+ assert(in);
+
+ if (in->dir) {
+ if (!in->dir->state_test(CDIR_STATE_IMPORTINGEXPORT)) {
+ dout(5) << " pinning nested export " << *in->dir << endl;
+ in->dir->get(CDIR_PIN_IMPORTINGEXPORT);
+ in->dir->state_set(CDIR_STATE_IMPORTINGEXPORT);
+ } else {
+ dout(5) << " already pinned nested export " << *in << endl;
+ }
+ } else {
+ dout(5) << " waiting for nested export dir on " << *in << endl;
+ waiting_for++;
+ }
+ }
+
+ if (waiting_for) {
+ dout(5) << "waiting for " << waiting_for << " dirs to open" << endl;
return;
+ }
+
+ // ok, done with this PrepAck
+ assert(hash_gather[dir].count(from) == 1);
+ hash_gather[dir].erase(from);
+
+ if (hash_gather[dir].empty()) {
+ hash_gather.erase(dir);
+ dout(7) << "handle_unhash_dir_prep_ack on " << *dir << ", last one" << endl;
+ unhash_dir_go(dir);
+ } else {
+ dout(7) << "handle_unhash_dir_prep_ack on " << *dir << ", waiting for " << hash_gather[dir] << endl;
}
- dout(7) << "unhash_dir " << *dir << endl;
+ delete m;
+}
- // fix state
- dir->state_set(CDIR_STATE_UNHASHING);
- // freeze
- dir->freeze_dir(new C_MDS_UnhashFreeze(mds, dir));
+/*
+ * auth:
+ * send out MUnhashDir's to peers
+ */
+void MDCache::unhash_dir_go(CDir *dir)
+{
+ dout(7) << "unhash_dir_go " << *dir << endl;
+ assert(dir->is_hashed());
+ assert(dir->is_auth());
+ assert(dir->is_frozen_dir());
+ assert(dir->is_complete());
- // request unhash from other nodes
- string path;
- dir->inode->make_path(path);
+ // send unhash to all peers
+ assert(hash_gather[dir].empty());
for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
if (i == mds->get_nodeid()) continue;
- mds->messenger->send_message(new MUnhashDir(path),
+ hash_gather[dir].insert(i);
+ mds->messenger->send_message(new MUnhashDir(dir->ino()),
MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
- unhash_waiting.insert(pair<CDir*,int>(dir,i));
}
+}
+
+/*
+ * auth:
+ * assimilate unhashing content
+ */
+void MDCache::handle_unhash_dir_ack(MUnhashDirAck *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
- // make complete
- if (!dir->is_complete()) {
- dout(7) << "hash_dir " << *dir << " not complete, fetching" << endl;
- mds->mdstore->fetch_dir(dir->inode,
- new C_MDS_UnhashComplete(mds, dir));
- } else
- unhash_dir_complete(dir);
+ dout(7) << "handle_unhash_dir_ack " << *dir << endl;
+ assert(dir->is_hashed());
+
+ // assimilate content
+ int from = m->get_source();
+ import_hashed_content(dir, m->get_state(), m->get_nden(), from);
+ delete m;
- // drop any sync or lock if sticky
- if (g_conf.mds_cache_sticky_sync_normal ||
- g_conf.mds_cache_sticky_sync_softasync)
- drop_sync_in_dir(dir);
+ // done?
+ assert(hash_gather[dir].count(from));
+ hash_gather[dir].erase(from);
+
+ if (!hash_gather[dir].empty()) {
+ dout(7) << "still waiting for unhash acks from " << hash_gather[dir] << endl;
+ return;
+ }
+
+ // done!
+
+ // fix up nested_exports
+ CDir *containing_import = get_auth_container(dir);
+ if (containing_import != dir) {
+ for (set<CDir*>::iterator it = nested_exports[dir].begin();
+ it != nested_exports[dir].end();
+ it++) {
+ dout(7) << "moving nested export out from under hashed dir : " << **it << endl;
+ nested_exports[containing_import].insert(*it);
+ }
+ nested_exports.erase(dir);
+ }
+
+ // dir state
+ //dir->state_clear(CDIR_STATE_UNHASHING); //later
+ dir->state_clear(CDIR_STATE_HASHED);
+ dir->put(CDIR_PIN_HASHED);
+ hashdirs.erase(dir);
+
+ // commit!
+ assert(dir->is_complete());
+ //dir->mark_complete();
+ dir->mark_dirty();
+ mds->mdstore->commit_dir(dir, 0);
+
+ // inode state
+ dir->inode->inode.hash_seed = 0;
+ if (dir->inode->is_auth()) {
+ dir->inode->mark_dirty();
+ mds->mdlog->submit_entry(new EInodeUpdate(dir->inode));
+ }
+
+ // notify
+ assert(hash_gather[dir].empty());
+ for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
+ if (i == mds->get_nodeid()) continue;
+
+ hash_gather[dir].insert(i);
+
+ mds->messenger->send_message(new MUnhashDirNotify(dir->ino()),
+ MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ }
}
-
-void MDCache::unhash_dir_complete(CDir *dir)
+
+/*
+ * sent by peer to flush mds links. unfreeze when all gathered.
+ */
+void MDCache::handle_unhash_dir_notify_ack(MUnhashDirNotifyAck *m)
{
- // mark all my inodes dirty (to avoid a race)
- for (CDir_map_t::iterator it = dir->begin(); it != dir->end(); it++) {
- CInode *in = it->second->inode;
- int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->ino(), it->first );
- if (dentryhashcode == mds->get_nodeid())
- in->mark_dirty();
- }
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
- unhash_dir_finish(dir);
+ dout(7) << "handle_unhash_dir_notify_ack " << *dir << endl;
+ assert(!dir->is_hashed());
+ assert(dir->is_unhashing());
+ assert(dir->is_frozen_dir());
+
+ // done?
+ int from = m->get_source();
+ assert(hash_gather[dir].count(from));
+ hash_gather[dir].erase(from);
+ delete m;
+
+ if (!hash_gather[dir].empty()) {
+ dout(7) << "still waiting for notifyack from " << hash_gather[dir] << " on " << *dir << endl;
+ } else {
+ unhash_dir_finish(dir);
+ }
}
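The auth-side handlers above share one bookkeeping pattern: each peer is recorded in `hash_gather[dir]` when a message goes out, erased when its ack arrives, and the next step runs once the set is empty. A minimal standalone sketch of that pattern, assuming integer MDS ranks; `GatherSketch` and its method names are hypothetical and not part of this patch:

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical tracker mirroring how hash_gather is used:
// one set of outstanding peer ranks per key (a CDir* in the patch).
template <typename Key>
class GatherSketch {
  std::map<Key, std::set<int>> waiting_;
public:
  // Record every rank except our own as outstanding.
  void start(const Key& k, int num_mds, int self) {
    for (int i = 0; i < num_mds; i++)
      if (i != self) waiting_[k].insert(i);
  }
  // Remove one peer; returns true when it was the last one outstanding.
  bool ack(const Key& k, int from) {
    auto it = waiting_.find(k);
    assert(it != waiting_.end() && it->second.count(from) == 1);
    it->second.erase(from);
    if (!it->second.empty()) return false;
    waiting_.erase(it);  // drop the empty entry, as hash_gather.erase(dir) does
    return true;
  }
  bool pending(const Key& k) const { return waiting_.count(k) > 0; }
};
```

The caller would run its next phase only when `ack()` returns true, which corresponds to the `hash_gather[dir].empty()` checks in the handlers.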
+/*
+ * all mds links are flushed. unfreeze dir!
+ */
void MDCache::unhash_dir_finish(CDir *dir)
{
- if (!dir->is_frozen_dir()) {
- dout(7) << "unhash_dir_finish still waiting for freeze on " << *dir->inode << endl;
- return;
- }
- if (!dir->is_complete()) {
- dout(7) << "unhash_dir_finish still waiting for complete on " << *dir->inode << endl;
- return;
- }
- if (unhash_waiting.count(dir) > 0) {
- dout(7) << "unhash_dir_finish still waiting for all acks on " << *dir->inode << endl;
- return;
- }
-
dout(7) << "unhash_dir_finish " << *dir << endl;
-
- // dir state
- dir->state_clear(CDIR_STATE_HASHED);
+ hash_gather.erase(dir);
+
+ // unpin path
+ vector<CDentry*> trace;
+ make_trace(trace, dir->inode);
+ path_unpin(trace, 0);
+
+ // state
dir->state_clear(CDIR_STATE_UNHASHING);
- dir->mark_dirty();
- dir->mark_complete();
-
- // inode state
- dir->inode->inode.hash_seed = 0;
- dir->inode->mark_dirty();
- // unfreeze!
+ // unfreeze
dir->unfreeze_dir();
+
}
-*/
-void MDCache::handle_unhash_dir_ack(MUnhashDirAck *m)
-{
- /*
- CInode *diri = get_inode(m->get_ino());
- assert(diri && diri->dir);
- assert(diri->dir->is_auth());
- assert(diri->dir->is_hashed());
- assert(diri->dir->is_unhashing());
- dout(7) << "handle_unhash_dir_ack " << *diri->dir << endl;
+
+// UNHASH on all
+
+/*
+ * hashed dir is complete.
+ * mark all migrating inodes dirty (to pin in cache)
+ * if frozen too, then go to next step (depending on auth)
+ */
+void MDCache::unhash_dir_complete(CDir *dir)
+{
+ dout(7) << "unhash_dir_complete " << *dir << ", dirtying inodes" << endl;
- // assimilate contents
- int oldauth = m->get_source();
- const char *p = m->dir_rope.c_str();
- const char *pend = p + m->dir_rope.length();
- while (p < pend) {
- CInode *in = import_dentry_inode(diri->dir, p, oldauth);
- in->mark_dirty(); // pin in cache
+ assert(dir->is_hashed());
+ assert(dir->is_complete());
+
+ // mark dirty to pin in cache
+ for (CDir_map_t::iterator it = dir->begin();
+ it != dir->end();
+ it++) {
+ CInode *in = it->second->inode;
+ if (in->is_auth())
+ in->mark_dirty();
}
-
- // remove from waiting list
- multimap<CDir*,int>::iterator it = unhash_waiting.find(diri->dir);
- while (it->second != oldauth) {
- it++;
- assert(it->first == diri->dir);
+
+ if (!dir->is_frozen_dir()) {
+ dout(7) << "dir complete but !frozen, waiting " << *dir << endl;
+ } else {
+ if (dir->is_auth())
+ unhash_dir_prep(dir); // auth
+ else
+ unhash_dir_prep_finish(dir); // nonauth
}
- unhash_waiting.erase(it);
-
- unhash_dir_finish(diri->dir); // try to finish
-
- // done.
- delete m;
- */
}
-// unhash on non-auth
-
-class C_MDS_HandleUnhashFreeze : public Context {
-public:
- MDS *mds;
- CDir *dir;
- int auth;
- C_MDS_HandleUnhashFreeze(MDS *mds, CDir *dir, int auth) {
- this->mds = mds;
- this->dir = dir;
- this->auth = auth;
- }
- virtual void finish(int r) {
- mds->mdcache->handle_unhash_dir_finish(dir, auth);
- }
-};
+// UNHASH on non-auth
-class C_MDS_HandleUnhashComplete : public Context {
+class C_MDC_UnhashPrepFreeze : public Context {
public:
MDS *mds;
CDir *dir;
- int auth;
- C_MDS_HandleUnhashComplete(MDS *mds, CDir *dir, int auth) {
+ C_MDC_UnhashPrepFreeze(MDS *mds, CDir *dir) {
this->mds = mds;
this->dir = dir;
- this->auth = auth;
}
virtual void finish(int r) {
- mds->mdcache->handle_unhash_dir_complete(dir, auth);
+ mds->mdcache->unhash_dir_prep_frozen(dir);
}
};
/*
-void MDCache::handle_unhash_dir(MUnhashDir *m)
+ * peers need to freeze their dir and make it complete
+ */
+void MDCache::handle_unhash_dir_prep(MUnhashDirPrep *m)
{
- // traverse to node
- vector<CInode*> trav;
- int r = path_traverse(m->get_path(), trav, m, MDS_TRAVERSE_DISCOVER);
- if (r > 0) return; // fw or delay
-
- CInode *diri = trav[trav.size()-1];
- if (!diri->dir) diri->dir = new CDir(diri, mds->get_nodeid());
- CDir *dir = diri->dir;
-
- dout(7) << "handle_unhash_dir " << *diri->dir << endl;
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
+ dout(7) << "handle_unhash_dir_prep " << *dir << endl;
assert(dir->is_hashed());
-
- int auth = m->get_source();
-
- // fix state
- dir->state_set(CDIR_STATE_UNHASHING);
// freeze
- dir->freeze_dir(new C_MDS_HandleUnhashFreeze(mds, dir, auth));
+ dir->freeze_dir(new C_MDC_UnhashPrepFreeze(mds, dir));
// make complete
if (!dir->is_complete()) {
- dout(7) << "handle_unhash_dir " << *dir << " not complete, fetching" << endl;
- mds->mdstore->fetch_dir(dir->inode,
- new C_MDS_HandleUnhashComplete(mds, dir, auth));
- } else
- handle_unhash_dir_complete(dir, auth);
-
- // drop any sync or lock if sticky
- if (g_conf.mds_cache_sticky_sync_normal ||
- g_conf.mds_cache_sticky_sync_softasync)
- drop_sync_in_dir(dir);
-
- // done with message
+ dout(7) << "handle_unhash_dir_prep " << *dir << " not complete, fetching" << endl;
+ mds->mdstore->fetch_dir(dir,
+ new C_MDC_UnhashComplete(mds, dir));
+ } else {
+ unhash_dir_complete(dir);
+ }
+
delete m;
}
-*/
-void MDCache::handle_unhash_dir_complete(CDir *dir, int auth)
+/*
+ * peer has hashed dir frozen.
+ * complete too?
+ */
+void MDCache::unhash_dir_prep_frozen(CDir *dir)
{
- // mark all my inodes dirty (to avoid a race)
- for (CDir_map_t::iterator it = dir->begin(); it != dir->end(); it++) {
- CInode *in = it->second->inode;
- int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->ino(), it->first );
- if (dentryhashcode == mds->get_nodeid())
- in->mark_dirty();
- }
+ dout(7) << "unhash_dir_prep_frozen " << *dir << endl;
+
+ assert(dir->is_hashed());
+ assert(dir->is_frozen_dir());
+ assert(!dir->is_auth());
- handle_unhash_dir_finish(dir, auth);
+ if (!dir->is_complete()) {
+ dout(7) << "unhash_dir_prep_frozen !complete, waiting still on " << *dir << endl;
+ } else
+ unhash_dir_prep_finish(dir);
}
-void MDCache::handle_unhash_dir_finish(CDir *dir, int auth)
-{
/*
- assert(dir->is_unhashing());
+ * peer has hashed dir complete and frozen. ack.
+ */
+void MDCache::unhash_dir_prep_finish(CDir *dir)
+{
+ dout(7) << "unhash_dir_prep_finish " << *dir << endl;
assert(dir->is_hashed());
+ assert(!dir->is_auth());
+ assert(dir->is_frozen());
+ assert(dir->is_complete());
+
+ // twiddle state
+ if (dir->is_unhashing())
+ return; // already replied.
+ dir->state_set(CDIR_STATE_UNHASHING);
- if (!dir->is_complete()) {
- dout(7) << "still waiting for complete on " << *dir->inode << endl;
- return;
- }
- if (!dir->is_frozen_dir()) {
- dout(7) << "still waiting for frozen_dir on " << *dir->inode << endl;
- return;
+ // send subdirs back to auth
+ MUnhashDirPrepAck *ack = new MUnhashDirPrepAck(dir->ino());
+ int auth = dir->authority();
+
+ for (CDir_map_t::iterator it = dir->begin();
+ it != dir->end();
+ it++) {
+ CDentry *dn = it->second;
+ CInode *in = dn->inode;
+
+ if (!in->is_dir()) continue;
+ if (!in->dir) continue;
+
+ int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->inode.hash_seed, it->first );
+ if (dentryhashcode != mds->get_nodeid()) continue;
+
+ // msg?
+ ack->add_inode(it->first, in->replicate_to(auth));
}
+
+ // ack
+ mds->messenger->send_message(ack,
+ MSG_ADDR_MDS(auth), MDS_PORT_CACHE, MDS_PORT_CACHE);
+}
- assert(dir->is_frozen_dir());
- assert(dir->is_complete());
- dout(7) << "handle_unhash_dir_finish " << *dir->inode << endl;
- // okay, we are complete and frozen.
+
+/*
+ * peer needs to send hashed dir content back to auth.
+ * unhash dir.
+ */
+void MDCache::handle_unhash_dir(MUnhashDir *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
- // get message to auth ready
- MUnhashDirAck *msg = new MUnhashDirAck(dir->inode->ino());
+ dout(7) << "handle_unhash_dir " << *dir << endl;
+ assert(dir->is_hashed());
+ assert(dir->is_unhashing());
+ assert(!dir->is_auth());
- // include contents
- for (CDir_map_t::iterator it = dir->begin(); it != dir->end(); it++) {
- CInode *in = it->second->inode;
-
- int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->ino(), it->first );
-
- if (dentryhashcode != mds->get_nodeid())
- continue; // not mine
+ // get message ready
+ bufferlist bl;
+ int nden = 0;
+
+ // suck up all waiters
+ C_Contexts *fin = new C_Contexts;
+ list<Context*> waiting;
+ dir->take_waiting(CDIR_WAIT_ANY, waiting); // all dir waiters
+ fin->take(waiting);
+
+ // divvy up contents
+ for (CDir_map_t::iterator it = dir->begin();
+ it != dir->end();
+ it++) {
+ CDentry *dn = it->second;
+ CInode *in = dn->inode;
- // give it away.
- in->version++; // so log entries are ignored, etc.
+ int dentryhashcode = mds->get_cluster()->hash_dentry( dir->inode->inode.hash_seed, it->first );
+ if (dentryhashcode != mds->get_nodeid()) {
+ // not mine!
+ // twiddle dir_auth?
+ if (in->dir) {
+ if (in->dir->authority() != dir->authority())
+ in->dir->set_dir_auth( in->dir->authority() );
+ else
+ in->dir->set_dir_auth( CDIR_AUTH_PARENT );
+ }
+ continue;
+ }
- // add dentry and inode to message
- msg->dir_rope.append( it->first.c_str(), it->first.length()+1 );
- msg->dir_rope.append( in->encode_export_state() );
+ // -- dentry
+ dout(7) << "handle_unhash_dir sending to " << dentryhashcode << " dn " << *dn << endl;
+ _encode(it->first, bl);
- if (in->dir_auth == auth)
- in->dir_auth = CDIR_AUTH_PARENT;
+ // null dentry?
+ if (dn->is_null()) {
+ bl.append("N", 1); // null dentry
+ assert(dn->is_sync());
+ continue;
+ }
+
+ if (dn->is_remote()) {
+ // remote link
+ bl.append("L", 1); // remote link
+
+ inodeno_t ino = dn->get_remote_ino();
+ bl.append((char*)&ino, sizeof(ino));
+ continue;
+ }
- // fix up my state
- if (in->is_dirty()) in->mark_clean();
- in->cached_by_clear();
+ // primary link
+ // -- inode
+ bl.append("I", 1); // inode dentry
- assert(in->auth == true);
- in->set_auth(false);
+ encode_export_inode(in, bl, dentryhashcode); // encode, and (update state for) export
+ nden++;
+
+ if (dn->is_dirty())
+ dn->mark_clean();
+
+ // proxy
+ in->state_set(CINODE_STATE_PROXY);
+ in->get(CINODE_PIN_PROXY);
+ hash_proxy_inos[dir].push_back(in);
- // there should be no waiters.
+ if (in->dir) {
+ if (in->dir->is_auth()) {
+ // mine. make it into an import.
+ dout(7) << "making subdir into import " << *in->dir << endl;
+ in->dir->set_dir_auth( mds->get_nodeid() );
+ imports.insert(in->dir);
+ in->dir->get(CDIR_PIN_IMPORT);
+ in->dir->state_set(CDIR_STATE_IMPORT);
+ }
+ else {
+ // not mine.
+ dout(7) << "un-exporting subdir that's being unhashed away " << *in->dir << endl;
+ assert(in->dir->is_export());
+ in->dir->put(CDIR_PIN_EXPORT);
+ in->dir->state_clear(CDIR_STATE_EXPORT);
+ exports.erase(in->dir);
+ nested_exports[dir].erase(in->dir);
+ }
+ }
+
+ // waiters
+ list<Context*> waiters;
+ in->take_waiting(CINODE_WAIT_ANY, waiters);
+ fin->take(waiters);
}
- // send back to auth
- mds->messenger->send_message(msg,
- MSG_ADDR_MDS(auth), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ // we should have no nested exports; we're not auth for the dir!
+ assert(nested_exports[dir].empty());
+ nested_exports.erase(dir);
+
+ // dir state
+ //dir->state_clear(CDIR_STATE_UNHASHING); // later
+ dir->state_clear(CDIR_STATE_HASHED);
+ dir->put(CDIR_PIN_HASHED);
+ hashdirs.erase(dir);
+ dir->mark_clean();
// inode state
- dir->inode->inode.isdir = INODE_DIR_NORMAL;
+ dir->inode->inode.hash_seed = 0;
if (dir->inode->is_auth())
dir->inode->mark_dirty();
- // dir state
- dir->state_clear(CDIR_STATE_HASHED);
- dir->state_clear(CDIR_STATE_UNHASHING);
- dir->mark_clean(); // it's not mine.
+ // init gather set
+ hash_gather[dir] = mds->get_cluster()->get_mds_set();
+ hash_gather[dir].erase(mds->get_nodeid());
- // FIXME log
+ // send unhash message
+ mds->messenger->send_message(new MUnhashDirAck(dir->ino(), bl, nden),
+ MSG_ADDR_MDS(dir->authority()), MDS_PORT_CACHE, MDS_PORT_CACHE);
+}
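handle_unhash_dir writes each dentry as a name followed by a one-byte tag: `N` for a null dentry, `L` for a remote link (followed by the raw inode number), and `I` for a primary inode. A rough sketch of that tagged layout for the `N` and `L` cases, using `std::string` as the buffer. The length-prefixed name encoding is an assumption, since `_encode()` is not shown in this patch, and all function names here are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Assumed name layout: 32-bit length followed by the bytes.
void put_name(std::string& bl, const std::string& name) {
  uint32_t len = static_cast<uint32_t>(name.size());
  bl.append(reinterpret_cast<const char*>(&len), sizeof(len));
  bl.append(name);
}

// Null dentry: name + 'N'.
void put_null(std::string& bl, const std::string& name) {
  put_name(bl, name);
  bl.append("N", 1);
}

// Remote link: name + 'L' + raw inode number.
void put_remote(std::string& bl, const std::string& name, uint64_t ino) {
  put_name(bl, name);
  bl.append("L", 1);
  bl.append(reinterpret_cast<const char*>(&ino), sizeof(ino));
}

// Reads one record starting at off; returns the tag, fills name (and ino for 'L').
char get_record(const std::string& bl, std::size_t& off,
                std::string& name, uint64_t& ino) {
  uint32_t len;
  std::memcpy(&len, bl.data() + off, sizeof(len));
  off += sizeof(len);
  name.assign(bl, off, len);
  off += len;
  char tag = bl[off++];
  if (tag == 'L') {
    std::memcpy(&ino, bl.data() + off, sizeof(ino));
    off += sizeof(ino);
  }
  return tag;
}
```

The `I` case is left out because its payload comes from `encode_export_inode`, whose format is not visible here.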
+
+
+/*
+ * first notify comes from auth.
+ * send notifies to all other peers, with peer = self
+ * if we get notify from peer=other, remove from our gather list.
+ * when we've gotten notifies from everyone,
+ * unpin proxies,
+ * send notify_ack to auth.
+ * this ensures that all mds links are flushed of cache_expire type messages.
+ */
+void MDCache::handle_unhash_dir_notify(MUnhashDirNotify *m)
+{
+ CInode *in = get_inode(m->get_ino());
+ assert(in);
+ CDir *dir = in->dir;
+ assert(dir);
+
+ dout(7) << "handle_unhash_dir_notify " << *dir << endl;
+ assert(!dir->is_hashed());
+ assert(dir->is_unhashing());
+ assert(!dir->is_auth());
+ int from = m->get_source();
+ assert(hash_gather[dir].count(from) == 1);
+ hash_gather[dir].erase(from);
+ delete m;
+
+ // did we send our shout out?
+ if (from == dir->authority()) {
+ // send notify to everyone else in weird chatter storm
+ for (int i=0; i<mds->get_cluster()->get_num_mds(); i++) {
+ if (i == from) continue;
+ if (i == mds->get_nodeid()) continue;
+ mds->messenger->send_message(new MUnhashDirNotify(dir->ino()),
+ MSG_ADDR_MDS(i), MDS_PORT_CACHE, MDS_PORT_CACHE);
+ }
+ }
+
+ // are we done?
+ if (!hash_gather[dir].empty()) {
+ dout(7) << "still waiting for notify from " << hash_gather[dir] << endl;
+ return;
+ }
+ hash_gather.erase(dir);
+
+ // all done!
+ dout(7) << "all mds links flushed, unpinning unhash proxies" << endl;
+
+ // unpin proxies
+ for (list<CInode*>::iterator it = hash_proxy_inos[dir].begin();
+ it != hash_proxy_inos[dir].end();
+ it++) {
+ CInode *in = *it;
+ in->state_clear(CINODE_STATE_PROXY);
+ in->put(CINODE_PIN_PROXY);
+ }
+
// unfreeze
dir->unfreeze_dir();
-*/
+
+ // ack
+ dout(7) << "sending notify_ack to auth for unhash of " << *dir << endl;
+ mds->messenger->send_message(new MUnhashDirNotifyAck(dir->ino()),
+ MSG_ADDR_MDS(dir->authority()), MDS_PORT_CACHE, MDS_PORT_CACHE);
+
}
+
+
+
+
+
+
+// ==============================================================
// debug crap
void MDCache::show_imports()
{
- //mds->balancer->show_imports(true);
+ mds->balancer->show_imports();
}
void MDCache::show_cache()
{
+ dout(7) << "show_cache" << endl;
for (inode_map_t::iterator it = inode_map.begin();
it != inode_map.end();
it++) {
class MCacheExpire;
class MDirUpdate;
class MDentryUnlink;
-class MInodeWriterClosed;
class MLock;
class MRenameWarning;
class MRenameReq;
class MRenameAck;
-class C_MDS_ExportFinish;
-
class MClientRequest;
+class MHashDirDiscover;
+class MHashDirDiscoverAck;
+class MHashDirPrep;
+class MHashDirPrepAck;
class MHashDir;
class MHashDirAck;
+class MHashDirNotify;
+
+class MUnhashDirPrep;
+class MUnhashDirPrepAck;
class MUnhashDir;
class MUnhashDirAck;
+class MUnhashDirNotify;
+class MUnhashDirNotifyAck;
// MDCache
// root
list<Context*> waiting_for_root;
- // imports and exports
+ // imports, exports, and hashes.
set<CDir*> imports; // includes root (on mds0)
set<CDir*> exports;
- map<CDir*,set<CDir*> > nested_exports;
+ set<CDir*> hashdirs;
+ map<CDir*,set<CDir*> > nested_exports; // exports nested under imports _or_ hashdirs
// export fun
map<CDir*, set<int> > export_notify_ack_waiting; // nodes i am waiting to get export_notify_ack's from
}
protected:
- CDir *get_containing_import(CDir *in);
- CDir *get_containing_export(CDir *in);
+ CDir *get_auth_container(CDir *in);
+ void find_nested_exports(CDir *dir, set<CDir*>& s);
+ void find_nested_exports_under(CDir *import, CDir *dir, set<CDir*>& s);
// adding/removing
void handle_inode_link_ack(class MInodeLinkAck *m);
// == messages ==
+ public:
int proc_message(Message *m);
+ protected:
// -- replicas --
void handle_discover(MDiscover *dis);
void handle_discover_reply(MDiscoverReply *m);
- void handle_inode_writer_closed(MInodeWriterClosed *m);
// -- namespace --
// these handle logging, cache sync themselves.
+ // UNLINK
+ public:
void dentry_unlink(CDentry *in, Context *c);
+ protected:
void dentry_unlink_finish(CDentry *in, CDir *dir, Context *c);
void handle_dentry_unlink(MDentryUnlink *m);
void handle_inode_unlink(class MInodeUnlink *m);
void handle_inode_unlink_ack(class MInodeUnlinkAck *m);
+ friend class C_MDC_DentryUnlink;
+ // RENAME
// initiator
+ public:
void file_rename(CDentry *srcdn, CDentry *destdn, Context *c);
+ protected:
void handle_rename_ack(MRenameAck *m); // dest -> init (almost always)
void file_rename_finish(CDir *srcdir, CInode *in, Context *c);
+ friend class C_MDC_RenameAck;
// src
void handle_rename_req(MRenameReq *m); // dest -> src
void file_rename_warn(CInode *in, set<int>& notify);
void handle_rename_notify_ack(MRenameNotifyAck *m); // bystanders -> src
void file_rename_ack(CInode *in, int initiator);
+ friend class C_MDC_RenameNotifyAck;
// dest
void handle_rename_prep(MRenamePrep *m); // init -> dest
// -- file i/o --
+ public:
__uint64_t issue_file_data_version(CInode *in);
Capability* issue_file_caps(CInode *in, int mode, MClientRequest *req);
void eval_file_caps(CInode *in);
+ protected:
void handle_client_file_caps(class MClientFileCaps *m);
// -- import/export --
- bool is_import(CDir *dir) {
- assert(dir->is_import() == imports.count(dir));
- return dir->is_import();
- }
- bool is_export(CDir *dir) {
- return exports.count(dir);
- }
- void find_nested_exports(CDir *dir, set<CDir*>& s);
- void find_nested_exports_under(CDir *import, CDir *dir, set<CDir*>& s);
-
// exporter
+ public:
void export_dir(CDir *dir,
int mds);
- void export_dir_dropsync(CDir *dir);
+ protected:
+ map< CDir*, set<int> > export_gather;
void handle_export_dir_discover_ack(MExportDirDiscoverAck *m);
void export_dir_frozen(CDir *dir, int dest);
void handle_export_dir_prep_ack(MExportDirPrepAck *m);
void export_dir_go(CDir *dir,
int dest);
int export_dir_walk(MExportDir *req,
- class C_MDS_ExportFinish *fin,
+ class C_Contexts *fin,
CDir *basedir,
CDir *dir,
int newauth);
void encode_export_inode(CInode *in, bufferlist& enc_state, int newauth);
+ friend class C_MDC_ExportFreeze;
// importer
void handle_export_dir_discover(MExportDirDiscover *m);
void decode_import_inode(CDentry *dn, bufferlist& bl, int &off, int oldauth);
+ friend class C_MDC_ExportDirDiscover;
+
// bystander
void handle_export_dir_warning(MExportDirWarning *m);
void handle_export_dir_notify(MExportDirNotify *m);
// -- hashed directories --
+
+ // HASH
+ public:
+ void hash_dir(CDir *dir); // on auth
+ protected:
+ map< CDir*, set<int> > hash_gather;
+ map< CDir*, map< int, set<int> > > hash_notify_gather;
+ map< CDir*, list<CInode*> > hash_proxy_inos;
+
// hash on auth
- void hash_dir(CDir *dir);
+ void handle_hash_dir_discover_ack(MHashDirDiscoverAck *m);
void hash_dir_complete(CDir *dir);
+ void hash_dir_frozen(CDir *dir);
+ void handle_hash_dir_prep_ack(MHashDirPrepAck *m);
+ void hash_dir_go(CDir *dir);
+ void handle_hash_dir_ack(MHashDirAck *m);
void hash_dir_finish(CDir *dir);
+ friend class C_MDC_HashFreeze;
+ friend class C_MDC_HashComplete;
+
+ // auth and non-auth
+ void handle_hash_dir_notify(MHashDirNotify *m);
// hash on non-auth
+ void handle_hash_dir_discover(MHashDirDiscover *m);
+ void handle_hash_dir_discover_2(MHashDirDiscover *m, CInode *in, int r);
+ void handle_hash_dir_prep(MHashDirPrep *m);
void handle_hash_dir(MHashDir *m);
+ friend class C_MDC_HashDirDiscover;
+
+ // UNHASH
+ public:
+ void unhash_dir(CDir *dir); // on auth
+ protected:
+ map< CDir*, list<MUnhashDirAck*> > unhash_content;
+ void import_hashed_content(CDir *dir, bufferlist& bl, int nden, int oldauth);
// unhash on auth
- void unhash_dir(CDir *dir);
- void unhash_dir_complete(CDir *dir);
+ void unhash_dir_frozen(CDir *dir);
+ void unhash_dir_prep(CDir *dir);
+ void handle_unhash_dir_prep_ack(MUnhashDirPrepAck *m);
+ void unhash_dir_go(CDir *dir);
void handle_unhash_dir_ack(MUnhashDirAck *m);
+ void handle_unhash_dir_notify_ack(MUnhashDirNotifyAck *m);
void unhash_dir_finish(CDir *dir);
+ friend class C_MDC_UnhashFreeze;
+ friend class C_MDC_UnhashComplete;
+
+ // unhash on all
+ void unhash_dir_complete(CDir *dir);
// unhash on non-auth
+ void handle_unhash_dir_prep(MUnhashDirPrep *m);
+ void unhash_dir_prep_frozen(CDir *dir);
+ void unhash_dir_prep_finish(CDir *dir);
void handle_unhash_dir(MUnhashDir *m);
- void handle_unhash_dir_complete(CDir *dir, int auth);
- void handle_unhash_dir_finish(CDir *dir, int auth);
-
- void drop_sync_in_dir(CDir *dir);
+ void handle_unhash_dir_notify(MUnhashDirNotify *m);
+ friend class C_MDC_UnhashPrepFreeze;
// -- updates --
//int send_inode_updates(CInode *in);
// -- locks --
// high level interface
+ public:
bool inode_hard_read_try(CInode *in, Context *con);
bool inode_hard_read_start(CInode *in, MClientRequest *m);
void inode_hard_read_finish(CInode *in);
bool inode_soft_write_start(CInode *in, MClientRequest *m);
void inode_soft_write_finish(CInode *in);
+ void inode_hard_eval(CInode *in);
+ void inode_soft_eval(CInode *in);
+
+ protected:
void inode_hard_mode(CInode *in, int mode);
void inode_soft_mode(CInode *in, int mode);
void inode_soft_lock(CInode *in);
void inode_soft_async(CInode *in);
- void inode_hard_eval(CInode *in);
- void inode_soft_eval(CInode *in);
-
// messengers
void handle_lock(MLock *m);
void handle_lock_inode_hard(MLock *m);
void handle_lock_dir(MLock *m);
// dentry locks
+ public:
bool dentry_xlock_start(CDentry *dn,
Message *m, CInode *ref);
void dentry_xlock_finish(CDentry *dn, bool quiet=false);
// == crap fns ==
+ public:
CInode* hack_get_file(string& fn);
vector<CInode*> hack_add_file(string& fn, CInode* in);
void show_imports();
void show_cache();
- void dump_to_disk(MDS *m) {
- if (root) root->dump_to_disk(m);
- }
};
#include <iostream>
using namespace std;
+#include <ext/hash_map>
+using namespace __gnu_cxx;
+
#include <sys/types.h>
#include <unistd.h>
*/
int MDCluster::hash_dentry( inodeno_t dirino, const string& dn )
{
+ static hash<const char*> H;
unsigned r = dirino;
- for (unsigned i=0; i<dn.length(); i++)
- r += (dn[r] ^ i);
-
+ if (1) {
+ r += H(dn.c_str());
+ } else {
+ for (unsigned i=0; i<dn.length(); i++)
+ r += (dn[i] ^ (r+i));
+ }
+
r %= num_mds;
- dout(12) << "hash_dentry(" << dirino << ", " << dn << ") -> " << r;
+ dout(22) << "hash_dentry(" << dirino << ", " << dn << ") -> " << r << endl;
return r;
}
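hash_dentry maps a (seed, dentry name) pair to an MDS rank by adding a string hash to the seed and reducing modulo the cluster size. A minimal standalone sketch of the same idea, assuming FNV-1a in place of the `__gnu_cxx::hash<const char*>` functor used in the patch; `hash_dentry_sketch` is a hypothetical name, and its results will not match the real function:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for MDCluster::hash_dentry.
// FNV-1a is an assumption; the patch uses __gnu_cxx::hash<const char*>.
int hash_dentry_sketch(uint64_t seed, const std::string& dn, int num_mds) {
  assert(num_mds > 0);
  uint32_t h = 2166136261u;            // FNV-1a 32-bit offset basis
  for (unsigned char c : dn) {
    h ^= c;
    h *= 16777619u;                    // FNV-1a 32-bit prime
  }
  // Mix in the seed the way the patch adds the hash to dirino (unsigned wraparound).
  uint32_t r = static_cast<uint32_t>(seed) + h;
  return static_cast<int>(r % static_cast<uint32_t>(num_mds));
}
```

Every MDS must compute the same rank for the same inputs, so whatever hash is chosen has to be deterministic across nodes.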
#include "messages/MClientReply.h"
#include "messages/MLock.h"
-#include "messages/MInodeWriterClosed.h"
#include "messages/MInodeLink.h"
switch (m->get_dest_port()) {
+ /*
case MDS_PORT_STORE:
mdstore->proc_message(m);
break;
+ */
case MDS_PORT_ANCHORMGR:
anchormgr->proc_message(m);
*/
- // flush log
+ // flush log to disk after every op. for now.
mdlog->flush();
// trim cache
num_bal_times--;
}
+ static map<int,int> didhash;
+ if (0 && elapsed.sec() > 15 && !didhash[whoami]) {
+ CInode *in = mdcache->get_inode(100000010);
+ if (in && in->dir) {
+ if (in->dir->is_auth())
+ mdcache->hash_dir(in->dir);
+ didhash[whoami] = 1;
+ }
+ }
+ if (0 && elapsed.sec() > 25 && didhash[whoami] == 1) {
+ CInode *in = mdcache->get_inode(100000010);
+ if (in && in->dir) {
+ if (in->dir->is_auth())
+ mdcache->unhash_dir(in->dir);
+ didhash[whoami] = 2;
+ }
+ }
+
+
}
// hack
- if (whoami == 0) {
+ if (false && whoami == 0) {
static bool didit = false;
// 7 to 1
messenger->send_message(new MClientReply(req, r),
MSG_ADDR_CLIENT(req->get_client()), 0,
MDS_PORT_SERVER);
+
+ // <HACK>
+ if (refpath.last_bit() == ".hash" &&
+ refpath.depth() > 1) {
+ dout(1) << "got explicit hash command " << refpath << endl;
+ CDir *dir = trace[trace.size()-1]->get_inode()->dir;
+ if (!dir->is_hashed() &&
+ !dir->is_hashing() &&
+ dir->is_auth())
+ mdcache->hash_dir(dir);
+ }
+ // </HACK>
+
+
delete req;
return;
}
+bool MDS::try_open_dir(CInode *in, MClientRequest *req)
+{
+ if (!in->dir && in->is_frozen_dir()) {
+ // doh!
+ dout(10) << " dir inode is frozen, can't open dir, waiting " << *in << endl;
+ assert(in->get_parent_dir());
+ in->get_parent_dir()->add_waiter(CDIR_WAIT_UNFREEZE,
+ new C_MDS_RetryRequest(this, req, in));
+ return false;
+ }
+
+ in->get_or_open_dir(this);
+ return true;
+}
// DIRECTORY and NAMESPACE OPS
+void MDS::encode_dir_contents(CDir *dir, list<c_inode_info*>& items, int& numfiles)
+{
+ for (CDir_map_t::iterator it = dir->begin();
+ it != dir->end();
+ it++) {
+ CDentry *dn = it->second;
+
+ // hashed?
+ if (dir->is_hashed() &&
+ whoami != get_cluster()->hash_dentry( dir->inode->inode.hash_seed, it->first ))
+ continue;
+
+ // is dentry readable?
+ if (dn->is_xlocked()) {
+ // ***** FIXME *****
+ dout(10) << "warning, returning xlocked dentry, we are technically WRONG" << endl;
+ }
+
+ CInode *in = dn->inode;
+ if (!in) continue; // null dentry?
+
+ dout(12) << "including inode " << *in << endl;
+
+ // add this item
+ // note: c_inode_info makes note of whether inode data is readable.
+ items.push_back( new c_inode_info(in, whoami, it->first) );
+ numfiles++;
+ }
+}
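Note on the "hashed?" check in encode_dir_contents: each node returns only the dentries whose hash maps to itself, so a complete listing of a hashed dir is the union of the per-node slices. A minimal sketch of that filter follows. toy_hash_dentry and my_slice are hypothetical names, and the hash itself is a placeholder, since the real get_cluster()->hash_dentry() is not shown in this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical placeholder for the cluster's hash_dentry(); the real
// function is not part of this patch, so the mixing step is an assumption.
int toy_hash_dentry(unsigned seed, const std::string& name, int num_mds) {
  assert(num_mds > 0);
  unsigned h = seed;
  for (std::size_t i = 0; i < name.size(); i++)
    h = h * 31u + static_cast<unsigned char>(name[i]);
  return static_cast<int>(h % static_cast<unsigned>(num_mds));
}

// Return only the names whose hash maps to node `whoami`.
std::vector<std::string> my_slice(const std::vector<std::string>& names,
                                  unsigned seed, int whoami, int num_mds) {
  std::vector<std::string> mine;
  for (std::size_t i = 0; i < names.size(); i++)
    if (toy_hash_dentry(seed, names[i], num_mds) == whoami)
      mine.push_back(names[i]);
  return mine;
}
```

Every name lands in exactly one slice, so a client (or a coordinating node) would have to merge the slices from all nodes to see the whole directory.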
+
+
void MDS::handle_client_readdir(MClientRequest *req,
CInode *cur)
{
return;
}
- cur->get_or_open_dir(this);
- assert(cur->dir->is_auth());
-
- // frozen?
- if (cur->dir->is_frozen()) {
- // doh!
- dout(10) << " dir is frozen, waiting" << endl;
- cur->dir->add_waiter(CDIR_WAIT_UNFREEZE,
- new C_MDS_RetryRequest(this, req, cur));
+ if (!try_open_dir(cur, req))
return;
- }
+ assert(cur->dir->is_auth());
// check perm
if (!mdcache->inode_hard_read_start(cur,req))
return;
}
- // yay, reply
- MClientReply *reply = new MClientReply(req);
-
// build dir contents
- CDir_map_t::iterator it;
+ list<c_inode_info*> items;
int numfiles = 0;
- for (it = cur->dir->begin(); it != cur->dir->end(); it++) {
- CDentry *dn = it->second;
-
- // is dentry readable?
- if (dn->is_xlocked()) {
- // ***** FIXME *****
- dout(10) << "warning, returning xlocked dentry, we are technically WRONG" << endl;
- }
-
- CInode *in = dn->inode;
- if (!in) continue; // null dentry?
-
- dout(12) << "including inode " << *in << endl;
-
- // add this item
- // note: c_inode_info makes note of whether inode data is readable.
- reply->add_dir_item(new c_inode_info(in, whoami, it->first));
- numfiles++;
- }
+ encode_dir_contents(cur->dir, items, numfiles);
+ // yay, reply
+ MClientReply *reply = new MClientReply(req);
+ reply->take_dir_items(items);
+
dout(10) << "reply to " << *req << " readdir " << numfiles << " files" << endl;
reply->set_result(0);
return 0;
}
- CDir *dir = diri->get_or_open_dir(this);
+ if (!try_open_dir(diri, req)) return 0;
+ CDir *dir = diri->dir;
// make sure it's my dentry
int dnauth = dir->dentry_authority(name);
// ok, done passing buck.
+ // frozen?
+ if (dir->is_frozen()) {
+ dout(7) << "dir is frozen " << *dir << endl;
+ dir->add_waiter(CDIR_WAIT_UNFREEZE,
+ new C_MDS_RetryRequest(this, req, diri));
+ return 0;
+ }
+
// make sure name doesn't already exist
CDentry *dn = dir->lookup(name);
if (dn) {
return;
}
- CDir *dir = ref->get_or_open_dir(this);
+ if (!try_open_dir(ref, req)) return;
+ CDir *dir = ref->dir;
dout(7) << "handle_client_link dir is " << *dir << endl;
// make sure it's my dentry
return;
}
- CDir *dir = diri->get_or_open_dir(this);
+ if (!try_open_dir(diri, req)) return;
+ CDir *dir = diri->dir;
int dnauth = dir->dentry_authority(name);
// does it exist?
// rmdir
// open dir?
- if (in->is_auth() && !in->dir) in->get_or_open_dir(this);
+ if (in->is_auth() && !in->dir) {
+ if (!try_open_dir(in, req)) return;
+ }
// not dir auth? (or not open, which implies the same!)
if (!in->dir) {
return;
}
- CDir *srcdir = srcdiri->get_or_open_dir(this);
+ if (!try_open_dir(srcdiri, req)) return;
+ CDir *srcdir = srcdiri->dir;
dout(7) << "handle_client_rename srcdir is " << *srcdir << endl;
// make sure it's my dentry
if (trace.size() == destpath.depth()) {
if (d->is_dir()) {
// mv /some/thing /to/some/dir
- destdir = d->get_or_open_dir(this); // /to/some/dir
+ if (!try_open_dir(d, req)) return;
+ destdir = d->dir; // /to/some/dir
destname = req->get_filepath().last_bit(); // thing
destpath.add_dentry(destname);
} else {
else if (trace.size() == destpath.depth()-1) {
if (d->is_dir()) {
// mv /some/thing /to/some/place_that_maybe_dne (we might be replica)
- destdir = d->get_or_open_dir(this); // /to/some
- destname = destpath.last_bit(); // place_that_MAYBE_dne
+ if (!try_open_dir(d, req)) return;
+ destdir = d->dir; // /to/some
+ destname = destpath.last_bit(); // place_that_MAYBE_dne
} else {
dout(7) << "dest dne" << endl;
reply_request(req, -EINVAL);
newi->inode.mode |= INODE_MODE_DIR;
// init dir to be empty
+ assert(!newi->is_frozen_dir()); // because mknod worked
CDir *newdir = newi->get_or_open_dir(this);
newdir->mark_complete();
newdir->mark_dirty();
// update soft metadata
if (cap->issued() & CAP_FILE_WR) {
- assert(cur->softlock.can_write(true)); // otherwise we're toast???
+
+ // FIXME THIS IS BROKEN
+ //assert(cur->softlock.can_write(true)); // otherwise we're toast???
+
if (!mdcache->inode_soft_write_start(cur, req))
return; // wait
LogEvent *event,
LogEvent *event2 = 0);
+ bool try_open_dir(CInode *in, MClientRequest *req);
+
// special message types
void handle_ping(class MPing *m);
CInode *ref);
// namespace
+ void encode_dir_contents(CDir *dir, list<class c_inode_info*>& items, int& numfiles);
void handle_client_readdir(MClientRequest *req, CInode *ref);
void handle_client_mknod(MClientRequest *req, CInode *ref);
void handle_client_link(MClientRequest *req, CInode *ref);
#define dout(l) if (l<=g_conf.debug) cout << "mds" << mds->get_nodeid() << ".store "
-
-
-
-void MDStore::proc_message(Message *m)
-{
- switch (m->get_type()) {
-
- default:
- dout(7) << "store unknown message " << m->get_type() << endl;
- assert(0);
- }
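+/*
+ * separate hashed dir slices into "regions": slice N starts at byte offset N * 2^30 (1 GB).
+ * note that an unhashed dir (hashcode < 0) and slice 0 both start at offset 0.
+ */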
+size_t get_hash_offset(int hashcode) {
+ if (hashcode < 0)
+ return 0; // not hashed
+ else
+ return (size_t)(1<<30) * (size_t)hashcode;
}
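Note on get_hash_offset: the multiplication is done in size_t, so on a build where size_t is 32 bits any hashcode of 4 or more would wrap. A minimal sketch of the same arithmetic with a fixed 64-bit type follows. hash_region_offset and HASH_REGION_BYTES are hypothetical names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// 1 GiB per slice, same constant as (1<<30) in get_hash_offset().
static const uint64_t HASH_REGION_BYTES = uint64_t(1) << 30;

// Hypothetical 64-bit variant of get_hash_offset().
uint64_t hash_region_offset(int hashcode) {
  if (hashcode < 0)
    return 0;  // not hashed: the whole dir starts at offset 0
  return HASH_REGION_BYTES * static_cast<uint64_t>(hashcode);
}
```

Whether the filer's read/write offsets can carry a 64-bit value is not visible here, so this only illustrates the arithmetic.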
-// == fetch_dir
+
+// ==========================================================================
+// FETCH
-class MDFetchDirContext : public Context {
+class C_MDS_Fetch : public Context {
protected:
MDStore *ms;
inodeno_t ino;
public:
- MDFetchDirContext(MDStore *ms, inodeno_t ino) : Context() {
+ C_MDS_Fetch(MDStore *ms, inodeno_t ino) : Context() {
this->ms = ms;
this->ino = ino;
}
}
};
-
+/** fetch_dir(dir, context)
+ * public call to fetch a dir.
+ */
void MDStore::fetch_dir( CDir *dir,
Context *c )
{
dout(7) << "fetch_dir " << *dir << " context is " << c << endl;
- if (c)
- dir->add_waiter(CDIR_WAIT_COMPLETE, c);
-
- assert(dir->is_auth());
+ assert(dir->is_auth() ||
+ dir->is_hashed());
+ // wait
+ if (c) dir->add_waiter(CDIR_WAIT_COMPLETE, c);
+
// already fetching?
if (dir->state_test(CDIR_STATE_FETCHING)) {
dout(7) << "already fetching " << *dir << "; waiting" << endl;
return;
}
+ // state
dir->state_set(CDIR_STATE_FETCHING);
-
+
// stats
mds->logger->inc("fdir");
-
- // create return context
- MDFetchDirContext *fin = new MDFetchDirContext( this, dir->ino() );
+ // create return context
+ Context *fin = new C_MDS_Fetch( this, dir->ino() );
if (dir->is_hashed())
- do_fetch_dir( dir, fin, mds->get_nodeid()); // hashed
+ fetch_dir_hash( dir, fin, mds->get_nodeid()); // hashed
else
- do_fetch_dir( dir, fin ); // normal
+ fetch_dir_hash( dir, fin ); // normal
}
+/*
+ * called by low level fn when it's fetched.
+ * fix up dir state.
+ */
void MDStore::fetch_dir_2( int result,
inodeno_t ino)
{
CInode *idir = mds->mdcache->get_inode(ino);
- if (result < 0)
- dout(7) << "fetch_dir_2 failed on " << ino << endl;
- if (!idir) return;
+ if (!idir || result < 0) return; // hmm! nevermind i guess.
assert(idir);
- assert(idir->dir);
+ CDir *dir = idir->dir;
+ assert(dir);
// dir is now complete
- if (result >= 0)
- idir->dir->state_set(CDIR_STATE_COMPLETE);
- idir->dir->state_clear(CDIR_STATE_FETCHING);
+ dir->state_set(CDIR_STATE_COMPLETE);
+ dir->state_clear(CDIR_STATE_FETCHING);
// finish
list<Context*> finished;
- idir->dir->take_waiting(CDIR_WAIT_COMPLETE|CDIR_WAIT_DENTRY, finished);
+ dir->take_waiting(CDIR_WAIT_COMPLETE|CDIR_WAIT_DENTRY, finished);
+ finish_contexts(finished, result);
+}
+
+
+/** low level methods **/
+
+class C_MDS_FetchHash : public Context {
+protected:
+ MDS *mds;
+ inodeno_t ino;
+ int hashcode;
+ Context *context;
+
+public:
+ bufferlist bl;
+ bufferlist bl2;
+
+ C_MDS_FetchHash(MDS *mds, inodeno_t ino, Context *c, int hashcode) : Context() {
+ this->mds = mds;
+ this->ino = ino;
+ this->hashcode = hashcode;
+ this->context = c;
+ }
+
+ void finish(int result) {
+ assert(result>0);
+
+ // combine bufferlists bl + bl2 -> bl
+ bl.claim_append(bl2);
+
+ // did i get the whole thing?
+ size_t size;
+ bl.copy(0, sizeof(size_t), (char*)&size);
+ size_t got = bl.length() - sizeof(size);
+ size_t left = size - got;
+ size_t from = bl.length();
+
+ // what part of dir are we getting?
+ from += get_hash_offset(hashcode);
+
+ if (got >= size) {
+ // done.
+ mds->mdstore->fetch_dir_hash_2( bl, ino, context, hashcode );
+ }
+ else {
+ // read the rest!
+ dout(12) << "fetch_dir_hash_2 dir size is " << size << ", got " << got << ", reading remaining " << left << " from off " << from << endl;
+
+ // create return context
+ C_MDS_FetchHash *fin = new C_MDS_FetchHash( mds, ino, context, hashcode );
+ fin->bl.claim( bl );
+ mds->filer->read(ino,
+ g_OSD_MDDirLayout,
+ left, from,
+ &fin->bl2,
+ fin );
+ return;
+ }
+ }
+};
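Note on C_MDS_FetchHash::finish: the first read grabs one stripe, the leading size_t says how long the payload is, and a second read is issued only if the stripe did not cover it. A minimal sketch of that decision on a plain byte string follows. FollowUpRead and plan_follow_up are hypothetical names, and the returned offset is relative to the start of the slice's region (the patch adds get_hash_offset(hashcode) on top).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical description of the follow-up read, if one is needed.
struct FollowUpRead {
  bool needed;         // false when the first read already covered the payload
  std::size_t length;  // bytes still missing
  std::size_t offset;  // where the next read starts, relative to the region
};

// `buf` holds what has been read so far: a size_t length header followed by
// as much of the payload as fit into the first read.
FollowUpRead plan_follow_up(const std::string& buf) {
  assert(buf.size() >= sizeof(std::size_t));
  std::size_t size;
  std::memcpy(&size, buf.data(), sizeof(size));
  std::size_t got = buf.size() - sizeof(size);

  FollowUpRead r;
  r.needed = got < size;
  r.length = r.needed ? size - got : 0;  // computed only when it cannot underflow
  r.offset = r.needed ? buf.size() : 0;
  return r;
}
```

Computing the remaining length only in the "needed" branch avoids the unsigned underflow that `size - got` would otherwise produce when the first read returns more than the payload.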
+
+/** fetch_dir_hash
+ * low level method.
+ * fetch part of a dir. either the whole thing if hashcode is -1, or a specific
+ * hash segment.
+ */
+void MDStore::fetch_dir_hash( CDir *dir,
+ Context *c,
+ int hashcode)
+{
+ dout(11) << "fetch_dir_hash hashcode " << hashcode << " " << *dir << endl;
- finish_contexts(finished);
+ // create return context
+ C_MDS_FetchHash *fin = new C_MDS_FetchHash( mds, dir->ino(), c, hashcode );
- // trim cache?
- mds->mdcache->trim();
+ // grab first stripe bit (which had better be more than 16 bytes!)
+ assert(g_OSD_MDDirLayout.stripe_size >= 16);
+ mds->filer->read(dir->ino(),
+ g_OSD_MDDirLayout,
+ g_OSD_MDDirLayout.stripe_size, get_hash_offset(hashcode),
+ &fin->bl,
+ fin );
}
+void MDStore::fetch_dir_hash_2( bufferlist& bl,
+ inodeno_t ino,
+ Context *c,
+ int hashcode)
+{
+ CInode *idir = mds->mdcache->get_inode(ino);
+ if (!idir) {
+ dout(7) << "fetch_dir_hash_2 on ino " << ino << " but no longer in our cache!" << endl;
+ c->finish(-1);
+ delete c;
+ return;
+ }
+
+ if (!idir->dir_is_auth() ||
+ !idir->dir) {
+ dout(7) << "fetch_dir_hash_2 on " << *idir << ", but i'm not auth, or dir not open" << endl;
+ c->finish(-1);
+ delete c;
+ return;
+ }
+
+ // make sure we have a CDir
+ CDir *dir = idir->get_or_open_dir(mds);
+
+ // do it
+ dout(7) << "fetch_dir_hash_2 hashcode " << hashcode << " dir " << *dir << endl;
+
+ // parse buffer contents into cache
+ dout(15) << "bl is " << bl << endl;
+ size_t size;
+ bl.copy(0, sizeof(size), (char*)&size);
+ assert(bl.length() >= size + sizeof(size));
+
+ int n;
+ bl.copy(sizeof(size), sizeof(n), (char*)&n);
+
+ char *buffer = bl.c_str(); // contiguous ptr to whole buffer(list)
+ size_t buflen = bl.length();
+ size_t p = sizeof(size_t);
+
+ __uint32_t num = *(__uint32_t*)(buffer + p);
+ p += sizeof(num);
+
+ dout(10) << " " << num << " items in " << size << " bytes" << endl;
+
+ unsigned parsed = 0;
+ while (parsed < num) {
+ assert(p < buflen && num > 0);
+ parsed++;
+
+ dout(24) << " " << parsed << "/" << num << " pos " << p-8 << endl;
+
+ // dentry
+ string dname = buffer+p;
+ p += dname.length() + 1;
+ dout(24) << "parse filename '" << dname << "'" << endl;
+
+ CDentry *dn = dir->lookup(dname); // existing dentry?
+
+ if (*(buffer+p) == 'L') {
+ // hard link, we don't do that yet.
+ p++;
+
+ inodeno_t ino = *(inodeno_t*)(buffer+p);
+ p += sizeof(ino);
+
+ // what to do?
+ if (hashcode >= 0) {
+ int dentryhashcode = mds->get_cluster()->hash_dentry( dir->ino(), dname );
+ assert(dentryhashcode == hashcode);
+ }
+
+ if (dn) {
+ if (dn->get_inode() == 0) {
+ // negative dentry?
+ dout(12) << "readdir had NEG dentry " << dname << endl;
+ } else {
+ // had dentry
+ dout(12) << "readdir had dentry " << dname << endl;
+ }
+ continue;
+ }
+ // (remote) link
+ CDentry *dn = dir->add_dentry( dname, ino );
+ // link to inode?
+ CInode *in = mds->mdcache->get_inode(ino); // we may or may not have it.
+ if (in) {
+ dn->link_remote(in);
+ dout(12) << "readdir got remote link " << ino << " which we have " << *in << endl;
+ } else {
+ dout(12) << "readdir got remote link " << ino << " (don't have it)" << endl;
+ }
+ }
+ else if (*(buffer+p) == 'I') {
+ // inode
+ p++;
+
+ // parse out inode
+ inode_t *inode = (inode_t*)(buffer+p);
+ p += sizeof(inode_t);
+
+ string symlink;
+ if ((inode->mode & INODE_TYPE_MASK) == INODE_MODE_SYMLINK) {
+ symlink = (char*)(buffer+p);
+ p += symlink.length() + 1;
+ }
+
+ // what to do?
+ if (hashcode >= 0) {
+ int dentryhashcode = mds->get_cluster()->hash_dentry( dir->ino(), dname );
+ assert(dentryhashcode == hashcode);
+ }
+
+ if (dn) {
+ if (dn->get_inode() == 0) {
+ // negative dentry?
+ dout(12) << "readdir had NEG dentry " << dname << endl;
+ } else {
+ // had dentry
+ dout(12) << "readdir had dentry " << dname << endl;
+ }
+ continue;
+ }
+
+ // add inode
+ CInode *in = 0;
+ if (mds->mdcache->have_inode(inode->ino)) {
+ in = mds->mdcache->get_inode(inode->ino);
+ dout(12) << "readdir got (but i already had) " << *in << " mode " << in->inode.mode << " mtime " << in->inode.mtime << endl;
+ } else {
+ // inode
+ in = new CInode();
+ memcpy(&in->inode, inode, sizeof(inode_t));
+
+ // symlink?
+ if (in->is_symlink()) {
+ in->symlink = symlink;
+ }
+
+ // add
+ mds->mdcache->add_inode( in );
+ }
+ // link
+ dir->add_dentry( dname, in );
+ dout(12) << "readdir got " << *in << " mode " << in->inode.mode << " mtime " << in->inode.mtime << endl;
+ }
+ else {
+ dout(1) << "corrupt directory, i got tag char '" << *(buffer+p) << "' val " << (int)(*(buffer+p)) << " at pos " << p << endl;
+ assert(0);
+ }
+ }
+ dout(15) << "parsed " << parsed << endl;
+
+ if (c) {
+ c->finish(0);
+ delete c;
+ }
+}
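Note on the record format parsed by fetch_dir_hash_2: after the size header comes a 32-bit entry count, then for each entry a NUL-terminated name, a tag byte ('L' or 'I'), and a tag-specific body. A minimal sketch of a bounds-checked parser for the 'L' case only follows. It starts at the count (the size_t header is assumed to be stripped already), assumes an 8-byte inode number, and skips 'I' records because the inode_t layout is not shown here. RemoteLink and parse_links are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, uint64_t> RemoteLink;  // (name, ino)

// Parse records of the form: NUL-terminated name, tag 'L', 8-byte ino.
// Returns false on truncated or unrecognised input instead of asserting.
bool parse_links(const std::string& buf, std::vector<RemoteLink>& out) {
  std::size_t p = 0;
  uint32_t num;
  if (buf.size() < sizeof(num)) return false;
  std::memcpy(&num, buf.data(), sizeof(num));
  p += sizeof(num);

  for (uint32_t i = 0; i < num; i++) {
    // name: the terminating NUL must lie inside the buffer
    std::size_t end = buf.find('\0', p);
    if (end == std::string::npos) return false;
    std::string name = buf.substr(p, end - p);
    p = end + 1;

    // tag byte
    if (p >= buf.size() || buf[p] != 'L') return false;
    p++;

    // inode number, copied out rather than cast to avoid unaligned access
    uint64_t ino;
    if (buf.size() - p < sizeof(ino)) return false;
    std::memcpy(&ino, buf.data() + p, sizeof(ino));
    p += sizeof(ino);

    out.push_back(RemoteLink(name, ino));
  }
  return true;
}
```

The patch itself asserts on corrupt input, which is reasonable for a prototype. Returning an error is just one alternative for data that comes back from disk.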
-// ----------------------------
-// commit_dir
+// ==================================================================
+// COMMIT
class C_MDS_CommitDirVerify : public Context {
public:
inodeno_t ino;
__uint64_t version;
Context *c;
+
C_MDS_CommitDirVerify( MDS *mds,
inodeno_t ino,
__uint64_t version,
this->version = version;
this->ino = ino;
}
+
virtual void finish(int r) {
if (r >= 0) {
};
-
-
void MDStore::commit_dir( CDir *dir,
Context *c )
{
assert(dir->is_dirty());
-
- if (!dir->is_dirty()) {
- dout(7) << "commit_dir not dirty " << *dir << endl;
- if (c) {
- c->finish(0);
- delete c;
- }
- }
-
+
// commit thru current version
commit_dir(dir, dir->get_version(), c);
}
if (dir->is_hashed()) {
// hashed
- do_commit_dir( dir, fin, mds->get_nodeid() );
+ commit_dir_slice( dir, fin, mds->get_nodeid() );
} else {
// non-hashed
- do_commit_dir( dir, fin );
+ commit_dir_slice( dir, fin );
}
}
// low-level committer (hashed or normal)
-class MDDoCommitDirContext : public Context {
+class C_MDS_CommitSlice : public Context {
protected:
MDStore *ms;
CDir *dir;
public:
bufferlist bl;
- MDDoCommitDirContext(MDStore *ms, CDir *dir, Context *c, int w) : Context() {
+ C_MDS_CommitSlice(MDStore *ms, CDir *dir, Context *c, int w) : Context() {
this->ms = ms;
this->dir = dir;
this->c = c;
}
void finish(int result) {
- ms->do_commit_dir_2( result, dir, c, version, hashcode );
+ ms->commit_dir_slice_2( result, dir, c, version, hashcode );
}
};
-void MDStore::do_commit_dir( CDir *dir,
- Context *c,
- int hashcode)
+void MDStore::commit_dir_slice( CDir *dir,
+ Context *c,
+ int hashcode)
{
- assert(dir->is_auth());
-
- dout(11) << "do_commit_dir hashcode " << hashcode << " " << *dir << " version " << dir->get_version() << endl;
+ if (hashcode >= 0) {
+ assert(dir->is_hashed());
+ dout(11) << "commit_dir_slice hashcode " << hashcode << " " << *dir << " version " << dir->get_version() << endl;
+ } else {
+ assert(dir->is_auth());
+ dout(11) << "commit_dir_slice (whole dir) " << *dir << " version " << dir->get_version() << endl;
+ }
// get continuation ready
- MDDoCommitDirContext *fin = new MDDoCommitDirContext(this, dir, c, hashcode);
+ C_MDS_CommitSlice *fin = new C_MDS_CommitSlice(this, dir, c, hashcode);
// fill buffer
__uint32_t num = 0;
}
-void MDStore::do_commit_dir_2( int result,
- CDir *dir,
- Context *c,
- __uint64_t committed_version,
- int hashcode )
+void MDStore::commit_dir_slice_2( int result,
+ CDir *dir,
+ Context *c,
+ __uint64_t committed_version,
+ int hashcode )
{
- dout(11) << "do_commit_dir_2 hashcode " << hashcode << " " << *dir << endl;
+ dout(11) << "commit_dir_slice_2 hashcode " << hashcode << " " << *dir << endl;
// mark inodes and dentries clean too (if we committed them!)
list<CDentry*> null_clean;
-class MDDoFetchDirContext : public Context {
- protected:
- MDS *mds;
- inodeno_t ino;
- int hashcode;
- Context *context;
- public:
- bufferlist bl;
- bufferlist bl2;
- MDDoFetchDirContext(MDS *mds, inodeno_t ino, Context *c, int which) : Context() {
- this->mds = mds;
- this->ino = ino;
- this->hashcode = which;
- this->context = c;
- }
-
- void finish(int result) {
- assert(result>0);
-
- // combine bufferlists bl + bl2 -> bl
- bl.claim_append(bl2);
-
- // did i get the whole thing?
- size_t size;
- bl.copy(0, sizeof(size_t), (char*)&size);
- size_t got = bl.length() - sizeof(size);
- size_t left = size - got;
- size_t from = bl.length();
-
- if (got >= size) {
- // done.
- mds->mdstore->do_fetch_dir_2( bl, ino, context, hashcode );
- }
- else {
- // read the rest!
- cout << "do_fetch_dir_2 dir size is " << size << ", got " << got << ", reading remaniing " << left << " from off " << from << endl;
-
- // create return context
- MDDoFetchDirContext *fin = new MDDoFetchDirContext( mds, ino, context, hashcode );
- fin->bl.claim( bl );
- mds->filer->read(ino,
- g_OSD_MDDirLayout,
- left, from,
- &fin->bl2,
- fin );
- return;
- }
- }
-};
-
-
-void MDStore::do_fetch_dir( CDir *dir,
- Context *c,
- int hashcode)
-{
-
- dout(11) << "fetch_hashed_dir hashcode " << hashcode << " " << *dir << " context is " << c << endl;
-
- // create return context
- MDDoFetchDirContext *fin = new MDDoFetchDirContext( mds, dir->ino(), c, hashcode );
-
- // read first bit
- mds->filer->read(dir->ino(),
- g_OSD_MDDirLayout,
- //FILE_OBJECT_SIZE, 0, // get first object's bit
- //16, 0, // just get front bit
- g_OSD_MDDirLayout.stripe_size, 0, // grab first stripe bit (better be more than 16 bytes!)
- &fin->bl,
- fin );
-}
-
-void MDStore::do_fetch_dir_2( bufferlist& bl,
- inodeno_t ino,
- Context *c,
- int hashcode)
-{
- CInode *idir = mds->mdcache->get_inode(ino);
- if (!idir) {
- dout(7) << "do_fetch_dir_2 on ino " << ino << " but no longer in our cache!" << endl;
- c->finish(-1);
- delete c;
- return;
- }
-
- if (!idir->dir_is_auth()) {
- dout(7) << "do_fetch_dir_2 on " << *idir << ", but i'm not auth" << endl;
- c->finish(-1);
- delete c;
- return;
- }
-
- // make sure we have a CDir
- CDir *dir = idir->get_or_open_dir(mds);
-
- // do it
- dout(7) << "do_fetch_dir_2 hashcode " << hashcode << " dir " << *dir << endl;
-
- // parse buffer contents into cache
- dout(15) << "bl is " << bl << endl;
- size_t size;
- bl.copy(0, sizeof(size), (char*)&size);
- assert(bl.length() >= size + sizeof(size));
-
- int n;
- bl.copy(sizeof(size), sizeof(n), (char*)&n);
-
- char *buffer = bl.c_str(); // contiguous ptr to whole buffer(list)
- size_t buflen = bl.length();
- size_t p = sizeof(size_t);
-
- __uint32_t num = *(__uint32_t*)(buffer + p);
- p += sizeof(num);
-
- dout(10) << " " << num << " items in " << size << " bytes" << endl;
-
- unsigned parsed = 0;
- while (parsed < num) {
- assert(p < buflen && num > 0);
- parsed++;
-
- dout(24) << " " << parsed << "/" << num << " pos " << p-8 << endl;
-
- // dentry
- string dname = buffer+p;
- p += dname.length() + 1;
- dout(24) << "parse filename '" << dname << "'" << endl;
-
- CDentry *dn = dir->lookup(dname); // existing dentry?
-
-
- if (*(buffer+p) == 'L') {
- // hard link, we don't do that yet.
- p++;
-
- inodeno_t ino = *(inodeno_t*)(buffer+p);
- p += sizeof(ino);
-
- // what to do?
- if (hashcode >= 0) {
- int dentryhashcode = mds->get_cluster()->hash_dentry( dir->ino(), dname );
- assert(dentryhashcode == hashcode);
- }
-
- if (dn) {
- if (dn->get_inode() == 0) {
- // negative dentry?
- dout(12) << "readdir had NEG dentry " << dname << endl;
- } else {
- // had dentry
- dout(12) << "readdir had dentry " << dname << endl;
- }
- continue;
- }
-
- // (remote) link
- CDentry *dn = dir->add_dentry( dname, ino );
-
- // link to inode?
- CInode *in = mds->mdcache->get_inode(ino); // we may or may not have it.
- if (in) {
- dn->link_remote(in);
- dout(12) << "readdir got remote link " << ino << " which we have " << *in << endl;
- } else {
- dout(12) << "readdir got remote link " << ino << " (dont' have it)" << endl;
- }
- }
- else if (*(buffer+p) == 'I') {
- // inode
- p++;
-
- // parse out inode
- inode_t *inode = (inode_t*)(buffer+p);
- p += sizeof(inode_t);
-
- string symlink;
- if ((inode->mode & INODE_TYPE_MASK) == INODE_MODE_SYMLINK) {
- symlink = (char*)(buffer+p);
- p += symlink.length() + 1;
- }
-
- // what to do?
- if (hashcode >= 0) {
- int dentryhashcode = mds->get_cluster()->hash_dentry( dir->ino(), dname );
- assert(dentryhashcode == hashcode);
- }
-
- if (dn) {
- if (dn->get_inode() == 0) {
- // negative dentry?
- dout(12) << "readdir had NEG dentry " << dname << endl;
- } else {
- // had dentry
- dout(12) << "readdir had dentry " << dname << endl;
- }
- continue;
- }
-
- // add inode
- CInode *in = 0;
- if (mds->mdcache->have_inode(inode->ino)) {
- in = mds->mdcache->get_inode(inode->ino);
- dout(12) << "readdir got (but i already had) " << *in << " mode " << in->inode.mode << " mtime " << in->inode.mtime << endl;
- } else {
- // inode
- in = new CInode();
- memcpy(&in->inode, inode, sizeof(inode_t));
-
- // symlink?
- if (in->is_symlink()) {
- in->symlink = symlink;
- }
-
- // add
- mds->mdcache->add_inode( in );
- }
-
- // link
- dir->add_dentry( dname, in );
- dout(12) << "readdir got " << *in << " mode " << in->inode.mode << " mtime " << in->inode.mtime << endl;
- }
- else {
- dout(1) << "corrupt directory, i got tag char '" << *(buffer+p) << "' val " << (int)(*(buffer+p)) << " at pos " << p << endl;
- assert(0);
- }
- }
- dout(15) << "parsed " << parsed << endl;
-
- if (c) {
- c->finish(0);
- delete c;
- }
-}
-
-
-
-
-// hashing
-/*
-class C_MDS_HashDir : public Context {
-public:
- MDS *mds;
- CInode *in;
- Context *c;
- set<int> *left;
- int hashcode;
-
- C_MDS_HashDir(MDS *mds, CInode *in, Context *c, set<int> *left, int hashcode) {
- this->mds = mds;
- this->in = in;
- this->c = c;
- this->left = left;
- this->hashcode = hashcode;
- }
-
- virtual void finish(int r) {
- mds->mdstore->hash_dir_2( in, c, left, hashcode );
- }
-
-};
-
-
-
-void MDStore::hash_dir( CInode *in,
- Context *c )
-{
- assert(in->dir->state_test(CDIR_STATE_HASHING));
-
- // do them all!
- dout(5) << "hash_dir writing all segments of dir " << *in << endl;
- int nummds = mds->get_cluster()->get_num_mds();
- set<int> *left = new set<int>;
-
- for (int i=0; i<nummds; i++)
- left->insert(i);
-
- for (int i=0; i<nummds; i++) {
- Context *fin = new C_MDS_HashDir(mds, in, c, left, i);
- do_commit_dir(in, fin, i);
- }
-}
-
-void MDStore::hash_dir_2( CInode *in,
- Context *c,
- set<int> *left,
- int hashcode)
-{
- assert(in->dir->state_test(CDIR_STATE_HASHING));
-
- left->erase(hashcode);
- dout(5) << "hash_dir_2 wrote segment " << hashcode << " of dir " << *in << endl;
-
- if (!left->empty()) return;
-
- // all items written!
- dout(5) << "hash_dir_2 wrote all segments of dir " << *in << endl;
- delete left;
-
- // done
- if (c) {
- c->finish(0);
- delete c;
- }
-}
-
-
-void MDStore::unhash_dir( CInode *in,
- Context *c )
-{
- assert(in->dir->state_test(CDIR_STATE_UNHASHING));
-
- // do them all!
- dout(5) << "unhash_dir reading all other segments of dir " << *in << endl;
- int nummds = mds->get_cluster()->get_num_mds();
- set<int> *left = new set<int>;
-
- for (int i=0; i<nummds; i++)
- if (i != mds->get_nodeid()) left->insert(i);
-
- for (int i=0; i<nummds; i++) {
- Context *fin = new C_MDS_HashDir(mds, in, c, left, i);
- do_commit_dir(in, fin, i);
- }
-
-}
-
-void MDStore::unhash_dir_2( CInode *in,
- Context *c,
- set<int> *left,
- int hashcode)
-{
-*/
#define __MDSTORE_H
#include "include/types.h"
-using namespace std;
-
-#include <ext/rope>
-using namespace __gnu_cxx;
-
#include "include/bufferlist.h"
class MDS;
class CDir;
class Context;
-class Message;
class MDStore {
protected:
MDS *mds;
-
+
public:
MDStore(MDS *m) {
mds = m;
}
- // basic interface (normal or unhashed)
- void fetch_dir( CDir *dir,
- Context *c );
- void fetch_dir_2( int result,
- inodeno_t ino );
+ // fetch
+ public:
+ void fetch_dir( CDir *dir, Context *c );
+ protected:
+ void fetch_dir_2( int result, inodeno_t ino );
+
+ void fetch_dir_hash( CDir *dir,
+ Context *c,
+ int hashcode = -1);
+ void fetch_dir_hash_2( bufferlist &bl,
+ inodeno_t ino,
+ Context *c,
+ int which);
+ friend class C_MDS_Fetch;
+ friend class C_MDS_FetchHash;
+
+ // commit
+ public:
void commit_dir( CDir *dir, Context *c ); // commit current dir version to disk.
void commit_dir( CDir *dir, __uint64_t version, Context *c ); // commit specified version to disk
+ protected:
void commit_dir_2( int result, CDir *dir, __uint64_t committed_version );
- // low level committer
- void do_commit_dir( CDir *dir,
- Context *c,
- int hashcode = -1);
- void do_commit_dir_2( int result,
- CDir *dir,
+ // low level committers
+ void commit_dir_slice( CDir *dir,
Context *c,
- __uint64_t version,
- int hashcode );
-
- void do_fetch_dir( CDir *dir,
- Context *c,
- int hashcode = -1);
- void do_fetch_dir_2( bufferlist &bl,
- inodeno_t ino,
- Context *c,
- int which);
-
- /*
- // hashing
- void hash_dir( CInode *in, Context *c );
- void unhash_dir( CInode *in, Context *c );
- */
-
- // process a message
- void proc_message( Message *m );
+ int hashcode = -1);
+ void commit_dir_slice_2( int result,
+ CDir *dir,
+ Context *c,
+ __uint64_t version,
+ int hashcode );
+ friend class C_MDS_CommitDirFinish;
+ friend class C_MDS_CommitSlice;
};
#endif
void add_dir_item(c_inode_info *c) {
dir_contents.push_back(c);
}
+ void take_dir_items(list<c_inode_info*>& l) {
+ for (list<c_inode_info*>::iterator it = l.begin();
+ it != l.end();
+ it++) {
+ dir_contents.push_back(*it);
+ }
+ l.clear();
+ }
void set_trace_dist(CInode *in, int whoami) {
while (in) {
// ...
- virtual void decode_payload(crope& r, int& off) {
- r.copy(off, sizeof(base_ino), (char*)&base_ino);
+ virtual void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(base_ino), (char*)&base_ino);
off += sizeof(base_ino);
- r.copy(off, sizeof(bool), (char*)&no_base_dir);
+ payload.copy(off, sizeof(bool), (char*)&no_base_dir);
off += sizeof(bool);
- r.copy(off, sizeof(bool), (char*)&no_base_dentry);
+ payload.copy(off, sizeof(bool), (char*)&no_base_dentry);
off += sizeof(bool);
- // r.copy(off, sizeof(bool), (char*)&flag_forward);
+ // payload.copy(off, sizeof(bool), (char*)&flag_forward);
//off += sizeof(bool);
- r.copy(off, sizeof(bool), (char*)&flag_error_dn);
+ payload.copy(off, sizeof(bool), (char*)&flag_error_dn);
off += sizeof(bool);
- error_dentry = r.c_str() + off;
- off += error_dentry.length() + 1;
- r.copy(off, sizeof(bool), (char*)&flag_error_dir);
+
+ _decode(error_dentry, payload, off);
+ payload.copy(off, sizeof(bool), (char*)&flag_error_dir);
off += sizeof(bool);
// dirs
int n;
- r.copy(off, sizeof(int), (char*)&n);
+ payload.copy(off, sizeof(int), (char*)&n);
off += sizeof(int);
for (int i=0; i<n; i++) {
dirs.push_back( new CDirDiscover() );
- off = dirs[i]->_unrope(r, off);
+ dirs[i]->_decode(payload, off);
}
//dout(12) << n << " dirs out" << endl;
// inodes
- r.copy(off, sizeof(int), (char*)&n);
+ payload.copy(off, sizeof(int), (char*)&n);
off += sizeof(int);
for (int i=0; i<n; i++) {
inodes.push_back( new CInodeDiscover() );
- off = inodes[i]->_unrope(r, off);
+ inodes[i]->_decode(payload, off);
}
//dout(12) << n << " inodes out" << endl;
// filepath
- path._unrope(r, off);
+ path._decode(payload, off);
//dout(12) << path.depth() << " dentries out" << endl;
// path_xlock
- r.copy(off, sizeof(int), (char*)&n);
+ payload.copy(off, sizeof(int), (char*)&n);
off += sizeof(int);
for (int i=0; i<n; i++) {
bool b;
- r.copy(off, sizeof(bool), (char*)&b);
+ payload.copy(off, sizeof(bool), (char*)&b);
off += sizeof(bool);
path_xlock.push_back(b);
}
}
- virtual void encode_payload(crope& r) {
- r.append((char*)&base_ino, sizeof(base_ino));
- r.append((char*)&no_base_dir, sizeof(bool));
- r.append((char*)&no_base_dentry, sizeof(bool));
- // r.append((char*)&flag_forward, sizeof(bool));
- r.append((char*)&flag_error_dn, sizeof(bool));
+ void encode_payload() {
+ payload.append((char*)&base_ino, sizeof(base_ino));
+ payload.append((char*)&no_base_dir, sizeof(bool));
+ payload.append((char*)&no_base_dentry, sizeof(bool));
+ // payload.append((char*)&flag_forward, sizeof(bool));
+ payload.append((char*)&flag_error_dn, sizeof(bool));
- r.append((char*)error_dentry.c_str());
- r.append((char)0);
-
- r.append((char*)&flag_error_dir, sizeof(bool));
+ _encode(error_dentry, payload);
+ payload.append((char*)&flag_error_dir, sizeof(bool));
// dirs
int n = dirs.size();
- r.append((char*)&n, sizeof(int));
+ payload.append((char*)&n, sizeof(int));
for (vector<CDirDiscover*>::iterator it = dirs.begin();
it != dirs.end();
it++)
- (*it)->_rope( r );
+ (*it)->_encode( payload );
//dout(12) << n << " dirs in" << endl;
// inodes
n = inodes.size();
- r.append((char*)&n, sizeof(int));
+ payload.append((char*)&n, sizeof(int));
for (vector<CInodeDiscover*>::iterator it = inodes.begin();
it != inodes.end();
it++)
- (*it)->_rope( r );
+ (*it)->_encode( payload );
//dout(12) << n << " inodes in" << endl;
// path
- path._rope( r );
+ path._encode( payload );
//dout(12) << path.depth() << " dentries in" << endl;
// path_xlock
n = path_xlock.size();
- r.append((char*)&n, sizeof(int));
+ payload.append((char*)&n, sizeof(int));
for (vector<bool>::iterator it = path_xlock.begin();
it != path_xlock.end();
it++) {
bool b = *it;
- r.append((char*)&b, sizeof(bool));
+ payload.append((char*)&b, sizeof(bool));
}
}
}
- virtual void decode_payload(crope& s, int& off) {
- s.copy(off, sizeof(ino), (char*)&ino);
+ virtual void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
off += sizeof(ino);
// exports
int ne;
- s.copy(off, sizeof(int), (char*)&ne);
+ payload.copy(off, sizeof(int), (char*)&ne);
off += sizeof(int);
for (int i=0; i<ne; i++) {
inodeno_t ino;
- s.copy(off, sizeof(ino), (char*)&ino);
+ payload.copy(off, sizeof(ino), (char*)&ino);
off += sizeof(ino);
exports.push_back(ino);
}
// inodes
int ni;
- s.copy(off, sizeof(int), (char*)&ni);
+ payload.copy(off, sizeof(int), (char*)&ni);
off += sizeof(int);
for (int i=0; i<ni; i++) {
// inode
CInodeDiscover *in = new CInodeDiscover;
- off = in->_unrope(s, off);
+ in->_decode(payload, off);
inodes.push_back(in);
// dentry
- string d = s.c_str() + off;
- off += d.length() + 1;
- inode_dentry.insert(pair<inodeno_t, string>(in->get_ino(), d));
+ string d;
+ _decode(d, payload, off);
+ inode_dentry[in->get_ino()] = d;
// dir ino
inodeno_t dino;
- s.copy(off, sizeof(dino), (char*)&dino);
+ payload.copy(off, sizeof(dino), (char*)&dino);
off += sizeof(dino);
- inode_dirino.insert(pair<inodeno_t,inodeno_t>(in->get_ino(),dino));
+ inode_dirino[in->get_ino()] = dino;
}
// dirs
int nd;
- s.copy(off, sizeof(int), (char*)&nd);
+ payload.copy(off, sizeof(int), (char*)&nd);
off += sizeof(int);
for (int i=0; i<nd; i++) {
CDirDiscover *dir = new CDirDiscover;
- off = dir->_unrope(s, off);
- dirs.insert(pair<inodeno_t,CDirDiscover*>(dir->get_ino(), dir));
+ dir->_decode(payload, off);
+ dirs[dir->get_ino()] = dir;
}
}
- virtual void encode_payload(crope& s) {
- s.append((char*)&ino, sizeof(ino));
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
// exports
int ne = exports.size();
- s.append((char*)&ne, sizeof(int));
+ payload.append((char*)&ne, sizeof(int));
for (list<inodeno_t>::iterator it = exports.begin();
it != exports.end();
it++) {
inodeno_t ino = *it;
- s.append((char*)&ino, sizeof(ino));
+ payload.append((char*)&ino, sizeof(ino));
}
// inodes
int ni = inodes.size();
- s.append((char*)&ni, sizeof(int));
+ payload.append((char*)&ni, sizeof(int));
for (list<CInodeDiscover*>::iterator iit = inodes.begin();
iit != inodes.end();
iit++) {
- (*iit)->_rope(s);
+ (*iit)->_encode(payload);
// dentry
- s.append(inode_dentry[(*iit)->get_ino()].c_str());
- s.append((char)0);
+ _encode(inode_dentry[(*iit)->get_ino()], payload);
// dir ino
inodeno_t ino = inode_dirino[(*iit)->get_ino()];
- s.append((char*)&ino, sizeof(ino));
+ payload.append((char*)&ino, sizeof(ino));
}
// dirs
int nd = dirs.size();
- s.append((char*)&nd, sizeof(int));
+ payload.append((char*)&nd, sizeof(int));
for (map<inodeno_t,CDirDiscover*>::iterator dit = dirs.begin();
dit != dirs.end();
dit++)
- dit->second->_rope(s);
+ dit->second->_encode(payload);
}
};
#define __MHASHDIR_H
#include "msg/Message.h"
-
-#include <ext/rope>
-using namespace std;
+#include "include/bufferlist.h"
class MHashDir : public Message {
- string path;
+ inodeno_t ino;
+ bufferlist state;
+ int nden;
public:
- crope dir_rope;
-
MHashDir() {}
- MHashDir(string& path) :
+ MHashDir(inodeno_t ino) :
Message(MSG_MDS_HASHDIR) {
- this->path = path;
+ this->ino = ino;
+ nden = 0;
}
virtual char *get_type_name() { return "Ha"; }
- string& get_path() { return path; }
- crope& get_state() { return dir_rope; }
+ inodeno_t get_ino() { return ino; }
+ bufferlist& get_state() { return state; }
+ bufferlist* get_state_ptr() { return &state; }
+ int get_nden() { return nden; }
- virtual void decode_payload(crope& s) {
- path = s.c_str();
- dir_rope = s.substr(path.length() + 1, s.length() - path.length() - 1);
+ void set_nden(int n) { nden = n; }
+ void inc_nden() { nden++; }
+
+ void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+ payload.copy(off, sizeof(nden), (char*)&nden);
+ off += sizeof(nden);
+
+ size_t len;
+ payload.copy(off, sizeof(len), (char*)&len);
+ off += sizeof(len);
+ state.substr_of(payload, off, len);
}
- virtual void encode_payload(crope& s) {
- s.append(path.c_str(), path.length() + 1);
- s.append(dir_rope);
+ void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ payload.append((char*)&nden, sizeof(nden));
+ size_t size = state.length();
+ payload.append((char*)&size, sizeof(size));
+ payload.claim_append(state);
}
};
--- /dev/null
+#ifndef __MHASHDIRACK_H
+#define __MHASHDIRACK_H
+
+#include "MHashDir.h"
+
+class MHashDirAck : public Message {
+ inodeno_t ino;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+
+ MHashDirAck() {}
+ MHashDirAck(inodeno_t ino) :
+ Message(MSG_MDS_HASHDIRACK) {
+ this->ino = ino;
+ }
+ virtual char *get_type_name() { return "HAck"; }
+
+ virtual void decode_payload() {
+ payload.copy(0, sizeof(ino), (char*)&ino);
+ }
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ }
+
+};
+
+#endif
--- /dev/null
+#ifndef __MHASHDIRDISCOVER_H
+#define __MHASHDIRDISCOVER_H
+
+#include "msg/Message.h"
+#include "mds/CInode.h"
+#include "include/types.h"
+
+class MHashDirDiscover : public Message {
+ inodeno_t ino;
+ string path;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+ string& get_path() { return path; }
+
+ MHashDirDiscover() {}
+ MHashDirDiscover(CInode *in) :
+ Message(MSG_MDS_HASHDIRDISCOVER) {
+ in->make_path(path);
+ ino = in->ino();
+ }
+ virtual char *get_type_name() { return "HDis"; }
+
+
+ void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+ _decode(path, payload, off);
+ }
+
+ void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ _encode(path, payload);
+ }
+};
+
+#endif
--- /dev/null
+#ifndef __MHASHDIRDISCOVERACK_H
+#define __MHASHDIRDISCOVERACK_H
+
+#include "msg/Message.h"
+#include "mds/CInode.h"
+#include "include/types.h"
+
+class MHashDirDiscoverAck : public Message {
+ inodeno_t ino;
+ bool success;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+ bool is_success() { return success; }
+
+ MHashDirDiscoverAck() {}
+ MHashDirDiscoverAck(inodeno_t ino, bool success=true) :
+ Message(MSG_MDS_HASHDIRDISCOVERACK) {
+ this->ino = ino;
+ this->success = success;
+ }
+ virtual char *get_type_name() { return "HDisA"; }
+
+
+ void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+ payload.copy(off, sizeof(success), (char*)&success);
+ off += sizeof(success);
+ }
+
+ void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ payload.append((char*)&success, sizeof(success));
+ }
+};
+
+#endif
--- /dev/null
+#ifndef __MHASHDIRNOTIFY_H
+#define __MHASHDIRNOTIFY_H
+
+#include "msg/Message.h"
+
+class MHashDirNotify : public Message {
+ inodeno_t ino;
+ int from;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+ int get_from() { return from; }
+
+ MHashDirNotify() {}
+ MHashDirNotify(inodeno_t ino, int from) :
+ Message(MSG_MDS_HASHDIRNOTIFY) {
+ this->ino = ino;
+ this->from = from;
+ }
+ virtual char *get_type_name() { return "HN"; }
+
+ virtual void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+ payload.copy(off, sizeof(from), (char*)&from);
+ off += sizeof(from);
+ }
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ payload.append((char*)&from, sizeof(from));
+ }
+
+};
+
+#endif
--- /dev/null
+#ifndef __MHASHDIRPREP_H
+#define __MHASHDIRPREP_H
+
+#include "msg/Message.h"
+#include "mds/CInode.h"
+#include "include/types.h"
+
+class MHashDirPrep : public Message {
+ inodeno_t ino;
+ bool assim;
+
+ // subdir dentry names + inodes
+ map<string,CInodeDiscover*> inodes;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+ map<string,CInodeDiscover*>& get_inodes() { return inodes; }
+
+ bool did_assim() { return assim; }
+ void mark_assim() { assert(!assim); assim = true; }
+
+ MHashDirPrep() : assim(false) { }
+ MHashDirPrep(inodeno_t ino) :
+ Message(MSG_MDS_HASHDIRPREP),
+ assim(false) {
+ this->ino = ino;
+ }
+ ~MHashDirPrep() {
+ for (map<string,CInodeDiscover*>::iterator it = inodes.begin();
+ it != inodes.end();
+ it++)
+ delete it->second;
+ }
+
+
+ virtual char *get_type_name() { return "HP"; }
+
+ void add_inode(const string& dentry, CInodeDiscover *in) {
+ inodes[dentry] = in;
+ }
+
+ void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+
+ // inodes
+ int ni;
+ payload.copy(off, sizeof(int), (char*)&ni);
+ off += sizeof(int);
+ for (int i=0; i<ni; i++) {
+ // dentry
+ string dname;
+ _decode(dname, payload, off);
+
+ // inode
+ CInodeDiscover *in = new CInodeDiscover;
+ in->_decode(payload, off);
+
+ inodes[dname] = in;
+ }
+ }
+
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+
+ // inodes
+ int ni = inodes.size();
+ payload.append((char*)&ni, sizeof(int));
+ for (map<string,CInodeDiscover*>::iterator iit = inodes.begin();
+ iit != inodes.end();
+ iit++) {
+ _encode(iit->first, payload); // dentry
+ iit->second->_encode(payload); // inode
+ }
+ }
+};
+
+#endif
--- /dev/null
+#ifndef __MHASHDIRPREPACK_H
+#define __MHASHDIRPREPACK_H
+
+#include "msg/Message.h"
+#include "include/types.h"
+
+class MHashDirPrepAck : public Message {
+ inodeno_t ino;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+
+ MHashDirPrepAck() {}
+ MHashDirPrepAck(inodeno_t ino) :
+ Message(MSG_MDS_HASHDIRPREPACK) {
+ this->ino = ino;
+ }
+
+ virtual char *get_type_name() { return "HPAck"; }
+
+ void decode_payload() {
+ payload.copy(0, sizeof(ino), (char*)&ino);
+ }
+ void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ }
+};
+
+#endif
+++ /dev/null
-#ifndef __MINODEWRITERCLOSED_H
-#define __MINODEWRITERCLOSED_H
-
-typedef struct {
- inodeno_t ino;
- int from;
-} MInodeWriterClosed_st;
-
-class MInodeWriterClosed : public Message {
- MInodeWriterClosed_st st;
-
- public:
- inodeno_t get_ino() { return st.ino; }
- int get_from() { return st.from; }
-
- MInodeWriterClosed() {}
- MInodeWriterClosed(inodeno_t ino, int from) :
- Message(MSG_MDS_INODEWRITERCLOSED) {
- st.ino = ino;
- st.from = from;
- }
- virtual char *get_type_name() { return "IWrCl";}
-
- virtual void decode_payload(crope& s) {
- s.copy(0, sizeof(st), (char*)&st);
- }
- virtual void encode_payload(crope& s) {
- s.append((char*)&st,sizeof(st));
- }
-};
-
-#endif
string destname;
int initiator;
- crope inode_state;
+ bufferlist inode_state;
public:
int get_initiator() { return initiator; }
string& get_srcname() { return srcname; }
inodeno_t get_destdirino() { return destdirino; }
string& get_destname() { return destname; }
- crope& get_inode_state() { return inode_state; }
+ bufferlist& get_inode_state() { return inode_state; }
MRename() {}
MRename(int initiator,
const string& srcname,
inodeno_t destdirino,
const string& destname,
- crope& inode_state) :
+ bufferlist& inode_state) :
Message(MSG_MDS_RENAME) {
this->initiator = initiator;
this->srcdirino = srcdirino;
this->srcname = srcname;
this->destdirino = destdirino;
this->destname = destname;
- this->inode_state = inode_state;
+ this->inode_state.claim( inode_state );
}
virtual char *get_type_name() { return "Rn";}
- virtual void decode_payload(crope& s, int& off) {
- s.copy(off, sizeof(initiator), (char*)&initiator);
+ virtual void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(initiator), (char*)&initiator);
off += sizeof(initiator);
- s.copy(off, sizeof(srcdirino), (char*)&srcdirino);
+ payload.copy(off, sizeof(srcdirino), (char*)&srcdirino);
off += sizeof(srcdirino);
- s.copy(off, sizeof(destdirino), (char*)&destdirino);
+ payload.copy(off, sizeof(destdirino), (char*)&destdirino);
off += sizeof(destdirino);
- _unrope(srcname, s, off);
- _unrope(destname, s, off);
+ _decode(srcname, payload, off);
+ _decode(destname, payload, off);
size_t len;
- s.copy(off, sizeof(len), (char*)&len);
+ payload.copy(off, sizeof(len), (char*)&len);
off += sizeof(len);
- inode_state = s.substr(off, len);
+ inode_state.substr_of(payload, off, len);
off += len;
}
- virtual void encode_payload(crope& s) {
- s.append((char*)&initiator,sizeof(initiator));
- s.append((char*)&srcdirino,sizeof(srcdirino));
- s.append((char*)&destdirino,sizeof(destdirino));
- _rope(srcname, s);
- _rope(destname, s);
+ virtual void encode_payload() {
+ payload.append((char*)&initiator,sizeof(initiator));
+ payload.append((char*)&srcdirino,sizeof(srcdirino));
+ payload.append((char*)&destdirino,sizeof(destdirino));
+ _encode(srcname, payload);
+ _encode(destname, payload);
size_t len = inode_state.length();
- s.append((char*)&len, sizeof(len));
- s.append(inode_state);
+ payload.append((char*)&len, sizeof(len));
+ payload.claim_append(inode_state);
}
};
#include "msg/Message.h"
-#include <ext/rope>
-using namespace std;
-
class MUnhashDir : public Message {
- string path;
+ inodeno_t ino;
+
+ public:
+ inodeno_t get_ino() { return ino; }
- public:
MUnhashDir() {}
- MUnhashDir(string& path) :
+ MUnhashDir(inodeno_t ino) :
Message(MSG_MDS_UNHASHDIR) {
- this->path = path;
- }
- virtual char *get_type_name() { return "UHa"; }
-
- string& get_path() { return path; }
+ this->ino = ino;
+ }
+ virtual char *get_type_name() { return "UH"; }
- virtual int decode_payload(crope s) {
- path = s.c_str();
- return 0;
+ virtual void decode_payload() {
+ payload.copy(0, sizeof(ino), (char*)&ino);
}
- virtual crope get_payload() {
- crope s;
- s.append(path.c_str(), path.length()+1);
- return s;
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
}
};
#define __MUNHASHDIRACK_H
#include "msg/Message.h"
-
-#include <ext/rope>
-using namespace std;
+#include "include/bufferlist.h"
class MUnhashDirAck : public Message {
inodeno_t ino;
+ bufferlist state;
+ int nden;
public:
- crope dir_rope;
-
MUnhashDirAck() {}
- MUnhashDirAck(inodeno_t ino) :
+ MUnhashDirAck(inodeno_t ino, bufferlist& bl, int nden) :
Message(MSG_MDS_UNHASHDIRACK) {
this->ino = ino;
+ state.claim(bl);
+ this->nden = nden;
}
- virtual char *get_type_name() { return "UHaAck"; }
+ virtual char *get_type_name() { return "UHaA"; }
inodeno_t get_ino() { return ino; }
- crope& get_state() { return dir_rope; }
+ bufferlist& get_state() { return state; }
+ bufferlist* get_state_ptr() { return &state; }
+ int get_nden() { return nden; }
- virtual int decode_payload(crope s) {
- s.copy(0, sizeof(ino), (char*)&ino);
- dir_rope = s.substr(sizeof(ino), s.length() - sizeof(ino));
- return 0;
+ //void set_nden(int n) { nden = n; }
+ //void inc_nden() { nden++; }
+
+ void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+ payload.copy(off, sizeof(nden), (char*)&nden);
+ off += sizeof(nden);
+
+ size_t len;
+ payload.copy(off, sizeof(len), (char*)&len);
+ off += sizeof(len);
+ state.substr_of(payload, off, len);
}
- virtual crope get_payload() {
- crope s;
- s.append((char*)&ino, sizeof(ino));
- s.append(dir_rope);
- return s;
+ void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ payload.append((char*)&nden, sizeof(nden));
+ size_t size = state.length();
+ payload.append((char*)&size, sizeof(size));
+ payload.claim_append(state);
}
};
--- /dev/null
+#ifndef __MUNHASHDIRNOTIFY_H
+#define __MUNHASHDIRNOTIFY_H
+
+#include "msg/Message.h"
+
+class MUnhashDirNotify : public Message {
+ inodeno_t ino;
+ //int peer;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+ //int get_peer() { return peer; }
+
+ MUnhashDirNotify() {}
+ MUnhashDirNotify(inodeno_t ino/*, int peer*/) :
+ Message(MSG_MDS_UNHASHDIRNOTIFY) {
+ this->ino = ino;
+ //this->peer = peer;
+ }
+ virtual char *get_type_name() { return "UHN"; }
+
+ virtual void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+ //payload.copy(off, sizeof(peer), (char*)&peer);
+ //off += sizeof(peer);
+ }
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ //payload.append((char*)&peer, sizeof(peer));
+ }
+
+};
+
+#endif
--- /dev/null
+#ifndef __MUNHASHDIRNOTIFYACK_H
+#define __MUNHASHDIRNOTIFYACK_H
+
+#include "msg/Message.h"
+
+class MUnhashDirNotifyAck : public Message {
+ inodeno_t ino;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+
+ MUnhashDirNotifyAck() {}
+ MUnhashDirNotifyAck(inodeno_t ino) :
+ Message(MSG_MDS_UNHASHDIRNOTIFYACK) {
+ this->ino = ino;
+ }
+ virtual char *get_type_name() { return "UHNa"; }
+
+ virtual void decode_payload() {
+ payload.copy(0, sizeof(ino), (char*)&ino);
+ }
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ }
+
+};
+
+#endif
--- /dev/null
+#ifndef __MUNHASHDIRPREP_H
+#define __MUNHASHDIRPREP_H
+
+#include "msg/Message.h"
+
+class MUnhashDirPrep : public Message {
+ inodeno_t ino;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+
+ MUnhashDirPrep() {}
+ MUnhashDirPrep(inodeno_t ino) :
+ Message(MSG_MDS_UNHASHDIRPREP) {
+ this->ino = ino;
+ }
+ virtual char *get_type_name() { return "UHP"; }
+
+ virtual void decode_payload() {
+ payload.copy(0, sizeof(ino), (char*)&ino);
+ }
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+ }
+
+};
+
+#endif
--- /dev/null
+#ifndef __MUNHASHDIRPREPACK_H
+#define __MUNHASHDIRPREPACK_H
+
+#include "msg/Message.h"
+#include "mds/CInode.h"
+#include "include/types.h"
+
+class MUnhashDirPrepAck : public Message {
+ inodeno_t ino;
+ bool assim;
+
+ // subdir dentry names + inodes
+ map<string,CInodeDiscover*> inodes;
+
+ public:
+ inodeno_t get_ino() { return ino; }
+ map<string,CInodeDiscover*>& get_inodes() { return inodes; }
+
+ bool did_assim() { return assim; }
+ void mark_assim() { assert(!assim); assim = true; }
+
+ MUnhashDirPrepAck() : assim(false) { }
+ MUnhashDirPrepAck(inodeno_t ino) :
+ Message(MSG_MDS_UNHASHDIRPREPACK),
+ assim(false) {
+ this->ino = ino;
+ }
+ ~MUnhashDirPrepAck() {
+ for (map<string,CInodeDiscover*>::iterator it = inodes.begin();
+ it != inodes.end();
+ it++)
+ delete it->second;
+ }
+
+
+ virtual char *get_type_name() { return "UHPAck"; }
+
+ void add_inode(const string& dentry, CInodeDiscover *in) {
+ inodes[dentry] = in;
+ }
+
+ void decode_payload() {
+ int off = 0;
+ payload.copy(off, sizeof(ino), (char*)&ino);
+ off += sizeof(ino);
+
+ // inodes
+ int ni;
+ payload.copy(off, sizeof(int), (char*)&ni);
+ off += sizeof(int);
+ for (int i=0; i<ni; i++) {
+ // dentry
+ string dname;
+ _decode(dname, payload, off);
+
+ // inode
+ CInodeDiscover *in = new CInodeDiscover;
+ in->_decode(payload, off);
+
+ inodes[dname] = in;
+ }
+ }
+
+ virtual void encode_payload() {
+ payload.append((char*)&ino, sizeof(ino));
+
+ // inodes
+ int ni = inodes.size();
+ payload.append((char*)&ni, sizeof(int));
+ for (map<string,CInodeDiscover*>::iterator iit = inodes.begin();
+ iit != inodes.end();
+ iit++) {
+ _encode(iit->first, payload); // dentry
+ iit->second->_encode(payload); // inode
+ }
+ }
+};
+
+#endif
#define MSG_MDS_EXPORTDIRFINISH 158
-#define MSG_MDS_HASHDIR 160
-#define MSG_MDS_HASHDIRACK 161
-#define MSG_MDS_UNHASHDIR 162
-#define MSG_MDS_UNHASHDIRACK 163
-
-#define MSG_MDS_INODEWRITERCLOSED 170
+#define MSG_MDS_HASHDIRDISCOVER 160
+#define MSG_MDS_HASHDIRDISCOVERACK 161
+#define MSG_MDS_HASHDIRPREP 162
+#define MSG_MDS_HASHDIRPREPACK 163
+#define MSG_MDS_HASHDIR 164
+#define MSG_MDS_HASHDIRACK 165
+#define MSG_MDS_HASHDIRNOTIFY 166
+#define MSG_MDS_HASHDIRFINISH 167
+
+#define MSG_MDS_UNHASHDIRPREP 170
+#define MSG_MDS_UNHASHDIRPREPACK 171
+#define MSG_MDS_UNHASHDIR 172
+#define MSG_MDS_UNHASHDIRACK 173
+#define MSG_MDS_UNHASHDIRNOTIFY 174
+#define MSG_MDS_UNHASHDIRNOTIFYACK 175
#define MSG_MDS_DENTRYUNLINK 200
#include "messages/MClientRequest.h"
#include "messages/MClientReply.h"
#include "messages/MClientFileCaps.h"
-//#include "messages/MClientInodeAuthUpdate.h"
#include "messages/MDirUpdate.h"
#include "messages/MDiscover.h"
#include "messages/MExportDirNotifyAck.h"
#include "messages/MExportDirFinish.h"
+#include "messages/MHashDirDiscover.h"
+#include "messages/MHashDirDiscoverAck.h"
+#include "messages/MHashDirPrep.h"
+#include "messages/MHashDirPrepAck.h"
+#include "messages/MHashDir.h"
+#include "messages/MHashDirAck.h"
+#include "messages/MHashDirNotify.h"
+
+#include "messages/MUnhashDirPrep.h"
+#include "messages/MUnhashDirPrepAck.h"
+#include "messages/MUnhashDir.h"
+#include "messages/MUnhashDirAck.h"
+#include "messages/MUnhashDirNotify.h"
+#include "messages/MUnhashDirNotifyAck.h"
+
#include "messages/MRenameWarning.h"
#include "messages/MRenameNotify.h"
#include "messages/MRenameNotifyAck.h"
case MSG_CLIENT_FILECAPS:
m = new MClientFileCaps();
break;
- // case MSG_CLIENT_INODEAUTHUPDATE:
- //m = new MClientInodeAuthUpdate();
- //break;
// mds
case MSG_MDS_DIRUPDATE:
m = new MExportDirWarning();
break;
+
+ case MSG_MDS_HASHDIRDISCOVER:
+ m = new MHashDirDiscover();
+ break;
+ case MSG_MDS_HASHDIRDISCOVERACK:
+ m = new MHashDirDiscoverAck();
+ break;
+ case MSG_MDS_HASHDIRPREP:
+ m = new MHashDirPrep();
+ break;
+ case MSG_MDS_HASHDIRPREPACK:
+ m = new MHashDirPrepAck();
+ break;
+ case MSG_MDS_HASHDIR:
+ m = new MHashDir();
+ break;
+ case MSG_MDS_HASHDIRACK:
+ m = new MHashDirAck();
+ break;
+ case MSG_MDS_HASHDIRNOTIFY:
+ m = new MHashDirNotify();
+ break;
+
+ case MSG_MDS_UNHASHDIRPREP:
+ m = new MUnhashDirPrep();
+ break;
+ case MSG_MDS_UNHASHDIRPREPACK:
+ m = new MUnhashDirPrepAck();
+ break;
+ case MSG_MDS_UNHASHDIR:
+ m = new MUnhashDir();
+ break;
+ case MSG_MDS_UNHASHDIRACK:
+ m = new MUnhashDirAck();
+ break;
+ case MSG_MDS_UNHASHDIRNOTIFY:
+ m = new MUnhashDirNotify();
+ break;
+ case MSG_MDS_UNHASHDIRNOTIFYACK:
+ m = new MUnhashDirNotifyAck();
+ break;
+
case MSG_MDS_RENAMEWARNING:
m = new MRenameWarning();
break;