-- how to reliably deliver cache expire messages?
- - how should proxy behave?
- - exporter failure
- - all cacheexpire info has been passed on up until point where export is permanent. no impact.
- - importer failure
- - exporter collects expire info, so that it can reverse.
- - ???
- - maybe hosts should double-up expires until after export is known to have committed?
-
-/- exporter recovery if importer fails during EXPORT_EXPORTING stage
-- importer recovery if exporter fails
-
-/?- delay response to sending import_map if export in progress?
-/?- finish export before sending import_map?
-/- ambiguous imports on active node should include in-progress imports!
-/- how to effectively trim cache after resolve but before rejoin
-/ - we need to eliminate unneed non-auth metadata, without hosing potentially useful auth metadata
-
-- osd needs a set_floor_and_read op for safe failover/STOGITH-like semantics.
-
-- failures during recovery stages (resolve, rejoin)... make sure rejoin still works!
-
-- fix mds initial osdmap weirdness (which will currently screw up on standby -> almost anything)
-
-
-importmap only sent after exports have completed.
-failures update export ack waitlists, so exports will compelte if unrelated nodes fail.
-importmap can be sent regardless of import status -- pending import is just flagged ambiguous.
-failure of exporter induces some cleanup on importer. importer will disambiguate when it gets an importmap on exporter recovery.
-failure of importer induces cleanup on exporter. no ambiguity.
-
-
-/- no new mds may join if cluster is in a recovery state. starting -> standby (unless failed)
-/ - make sure creating -> standby, and are not included in recovery set?
-
-
-mdsmap notes
-- mds don't care about intervening states, except rejoin > active, and
- that transition requires active involvement. thus, no need worry
- about delivering/processing the full sequence of maps.
-
-blech:
-- EMetablob should return 'expired' if they have
- higher versions (and are thus described by a newer journal entry)
-
-mds
-- mds falure vs clients
- - clean up client op redirection
- - idempotent ops
-- journal+recovery
- - unlink
- - open+create
- - file capabilities i/o
- - link
- - rename
+monitor
+- finish generic paxos
-- should auth_pins really go to the root?
- - FIXME: auth_pins on importer versus import beneath an authpinned region?
+osdmon
+- distribute w/ paxos framework
+- allow fresh replacement osds. add osd_created in osdmap, probably
+- monitor needs to monitor some osds...
+- monitor pg states, notify on out?
+- watch osd utilization; adjust overload in cluster map
+mdsmon
+- distribute w/ paxos framework
journaler
- fix up for large events (e.g. imports)
- need to truncate at detected (valid) write_pos to clear out any other partial trailing writes
-- lnet?
-- crush
- - xml import/export?
- - crush tools
-
+crush
+- xml import/export?
+- crush tools
-locks
- namespace
- path pins -- read lock
- dentry xlock -- write lock
- inode
- hard/file rd start/stop -- read lock
- hard/file wr start/stop -- write lock
-
rados+ebofs
- purge replicated writes from cache. (with exception of partial tail blocks.)
-rados paper todo
+rados paper todo?
- better experiments
- berkeleydb objectstore?
- flush log only in response to subsequent read or write?
- add connection retry.
-
-monitor
-?- monitor user lib that handles resending, redirection of mon requests.
-- elector
-/- organize monitor store
-
-osdmon
-- distribute
-- recovery: store elector epochs with maps..
-- monitor needs to monitor some osds...
-- monitor pgs, notify on out
-- watch osd utilization; adjust overload in cluster map
-
-mdsmon
+objecter
+- read+floor_lockout
osd/rados
+- read+floor_lockout for clean STOGITH-like/fencing semantics after failover.
+- separate out replication code into a PG class, to pave way for RAID
+
- efficiently replicate clone() objects
- pg_num instead of pg_bits
- flag missing log entries on crash recovery --> WRNOOP? or WRLOST?
- pg_bit/pg_num changes
- report crashed pgs?
-messenger
-/- share same tcp socket for sender and receiver
-/- graceful connection teardown
+simplemessenger
- close idle connections
-- generalize out a transport layer?
- - eg reliable tcp for most things, connectionless unreliable datagrams for monitors?
- - or, aggressive connection closing on monitors? or just max_connections and an lru?
-- osds: forget idle client addrs
-
-objecter
+- retry, timeout on connection or transmission failure
objectcacher
- ocacher caps transitions vs locks
- test read locks
reliability
-- heartbeat vs ping
+- heartbeat vs ping?
- osdmonitor, filter
ebofs
- metadata in nvram? flash?
-
-bugs/stability
-- figure out weird 40ms latency with double log entries
-
-
-
remaining hard problems
- how to cope with file size changes and read/write sharing
-- mds STOGITH...
crush
client
-- mixed lazy and non-lazy io will clobber each others' caps in the buffer cache
-
-- test client caps with meta exports
-- some heuristic behavior to consolidate caps to inode auth
-- client will re-tx anything it needed to say upon rx of new mds notification (?)
-
-
-
+- fstat
+- make_request: cope with mds failure
+- mixed lazy and non-lazy io will clobber each others' caps in the buffer cache.. how to isolate..
+- test client caps migration w/ mds exports
+- some heuristic behavior to consolidate caps to inode auth?
-CLIENT TODO
-
-- statfs
-
-
- dump active config in run output somewhere
+
+
+
+
+
+
+
+
+==== MDS RECOVERY ====
+
+- how to reliably deliver cache expire messages?
+ - how should proxy behave?
+ - exporter failure
+ - all cacheexpire info has been passed on up until point where export is permanent. no impact.
+ - importer failure
+ - exporter collects expire info, so that it can reverse.
+ - ???
+ - maybe hosts should double-up expires until after export is known to have committed?
+--> just send expires to both nodes. dir_auth+dir_auth2. clean up export ack/notify process. :)
+
+*** dar... no, separate bystander dir_auth updates from the prepare/ack/commit cycle!
+- expire should go to both old and new auth
+- set_dir_auth should take optional second auth, and authority() should optionally set/return a second possible auth
+- does inode need its own replica list? no!
+- dirslices.
+
+
+/- exporter recovery if importer fails during EXPORT_EXPORTING stage
+- importer recovery if exporter fails
+
+/?- delay response to sending import_map if export in progress?
+/?- finish export before sending import_map?
+/- ambiguous imports on active node should include in-progress imports!
+/- how to effectively trim cache after resolve but before rejoin
+/ - we need to eliminate unneeded non-auth metadata, without hosing potentially useful auth metadata
+
+- osd needs a set_floor_and_read op for safe failover/STOGITH-like semantics.
+
+- failures during recovery stages (resolve, rejoin)... make sure rejoin still works!
+
+- fix mds initial osdmap weirdness (which will currently screw up on standby -> almost anything)
+
+
+importmap only sent after exports have completed.
+failures update export ack waitlists, so exports will complete if unrelated nodes fail.
+importmap can be sent regardless of import status -- pending import is just flagged ambiguous.
+failure of exporter induces some cleanup on importer. importer will disambiguate when it gets an importmap on exporter recovery.
+failure of importer induces cleanup on exporter. no ambiguity.
+
+
+/- no new mds may join if cluster is in a recovery state. starting -> standby (unless failed)
+/ - make sure creating -> standby, and are not included in recovery set?
+
+
+mdsmap notes
+- mds don't care about intervening states, except rejoin > active, and
+ that transition requires active involvement. thus, no need to worry
+ about delivering/processing the full sequence of maps.
+
+blech:
+- EMetablob should return 'expired' if they have
+ higher versions (and are thus described by a newer journal entry)
+
+mds
+- mds failure vs clients
+ - clean up client op redirection
+ - idempotent ops
+
+- journal+recovery
+ - unlink
+ - open(wr cap), open+create
+ - file capabilities i/o
+ - link
+ - rename
+
+- should auth_pins really go to the root?
+ - FIXME: auth_pins on importer versus import beneath an authpinned region?
+