Samuel Just [Thu, 3 Apr 2014 19:52:38 +0000 (12:52 -0700)]
ReplicatedPG::do_op: use get_object_context for list-snaps
find_object_context provides some niceties which we don't need since we know
the oid of the clones. Problematically, it also returns ENOENT if the snap
requested happens to have been removed. Even in such a case, the clone may
well still exist for other snaps. Rather than modify find_object_context to
avoid this situation for this caller, we'll simply do it inline in do_op.
Fixes: #7858 Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Thu, 27 Mar 2014 23:34:20 +0000 (16:34 -0700)]
ReplicatedPG: do not create snapdir on head eviction
Head eviction implies that no clones are present. Also, add
an exists flag to SnapSetContext in order to prevent an ssc left over from
a recent eviction from preventing a snap read from activating
the promotion machinery.
Fixes: #7858 Signed-off-by: Samuel Just <sam.just@inktank.com>
qa: test_alloc_hint: set ec ruleset-failure-domain to osd
Create a custom profile with ruleset-failure-domain=osd. (The default
ruleset-failure-domain=host won't do because this script assumes and
works only if all osds are on the same host.) While at it, set k and m
explicitly to avoid trouble in the future.
stop.sh: unmap rbd images when stopping the whole cluster
Unmap rbd images when stopping the whole cluster. Not doing so results
in images that cannot be unmapped until the same cluster is brought
back up. Issue a warning if we failed to unmap all images.
Sage Weil [Wed, 2 Apr 2014 23:43:10 +0000 (16:43 -0700)]
lockdep: reset state on shutdown
If we shut down, clear out all of the lockdep state. This ensures that if
we start up again on another cct, we will not be confused by old type ids
and dependency state.
Sage Weil [Wed, 2 Apr 2014 23:46:30 +0000 (16:46 -0700)]
lockdep: do not initialize if already started
If we have already registered a cct for lockdep, do not accept another one.
We already check that the cct matches when we shut down. This way, lockdep
runs for the life span of a single cct and no longer.
Fixes: #7965 Signed-off-by: Sage Weil <sage@inktank.com>
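Taken together, the two lockdep changes above describe a single-owner registry that is wiped on shutdown. A minimal C++ sketch, assuming a hypothetical `LockdepRegistry` class and a stand-in `Context` type rather than Ceph's actual lockdep globals and CephContext:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a CephContext; only its address (identity) matters.
struct Context {};

// Sketch of a lockdep-style registry; not Ceph's lockdep implementation.
class LockdepRegistry {
 public:
  // Accept the first context only; a second registration is refused.
  bool register_context(const Context* cct) {
    if (owner_ != nullptr) {
      return false;  // already started: do not accept another context
    }
    owner_ = cct;
    return true;
  }

  // Assign (or look up) a type id for a lock name.
  int lock_id(const std::string& name) {
    auto it = ids_.find(name);
    if (it != ids_.end()) return it->second;
    int id = next_id_++;
    ids_.emplace(name, id);
    return id;
  }

  // Record that lock `a` was held while lock `b` was acquired.
  void add_dependency(int a, int b) { deps_.insert({a, b}); }
  bool has_dependency(int a, int b) const { return deps_.count({a, b}) > 0; }

  // Shutdown must come from the registered context; all state is cleared so a
  // later context starts with fresh type ids and no stale dependencies.
  void unregister_context(const Context* cct) {
    assert(cct == owner_);
    owner_ = nullptr;
    ids_.clear();
    deps_.clear();
    next_id_ = 0;
  }

  const Context* owner() const { return owner_; }

 private:
  const Context* owner_ = nullptr;
  std::map<std::string, int> ids_;
  std::set<std::pair<int, int>> deps_;
  int next_id_ = 0;
};
```

Clearing the id map on shutdown is what keeps a later context from inheriting type ids that were assigned under the previous one.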
Sage Weil [Wed, 2 Apr 2014 23:03:37 +0000 (16:03 -0700)]
OSDMap: bump snap_epoch when adding a tier
When we make an existing pool a tier, we start copying the snap metadata
from the base tier. That includes removed_snaps. In order for the OSD
to recognize that this value is changing for the first time, we need to
set snap_epoch, or else the OSD doesn't update its in-memory PGPool
with removed snaps and we eventually hit an assertion failure because
PGPool::cached_remove_snaps is incorrect (e.g., empty).
Fix this by bumping snap_epoch when we add the new tier.
Fixes: #7915 Signed-off-by: Sage Weil <sage@inktank.com>
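The failure mode can be illustrated with a toy model in which the OSD-side cache only re-reads removed snaps when the pool's snap_epoch equals the current map epoch. This is a sketch under assumptions: `Pool`, `CachedPool` and `add_tier` are hypothetical simplifications, not Ceph's pg_pool_t, PGPool or OSDMap code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using epoch_t = uint32_t;  // assumption: simplified stand-in for Ceph's epoch type

// Hypothetical, heavily simplified pool record (not Ceph's pg_pool_t).
struct Pool {
  std::set<uint64_t> removed_snaps;
  epoch_t snap_epoch = 0;  // map epoch in which snap metadata last changed
  int64_t tier_of = -1;    // -1: not a tier of any pool
};

// Make `tier` a tier of pool `base_id` in map epoch `epoch`.
void add_tier(const Pool& base, int64_t base_id, Pool& tier, epoch_t epoch) {
  assert(tier.tier_of == -1);               // tier must not already be a tier
  tier.tier_of = base_id;
  tier.removed_snaps = base.removed_snaps;  // snap metadata copied from the base
  tier.snap_epoch = epoch;                  // the fix: mark snap metadata as changed now
}

// Hypothetical OSD-side cache: removed snaps are re-read only when the
// pool's snap_epoch equals the map epoch being applied.
struct CachedPool {
  std::set<uint64_t> cached_removed_snaps;

  void update(const Pool& p, epoch_t map_epoch) {
    if (p.snap_epoch == map_epoch) {
      cached_removed_snaps = p.removed_snaps;
    }
  }
};
```

Without the bump, a pool whose removed_snaps were copied in this epoch keeps a stale snap_epoch, so the cache in this model stays empty, matching the "incorrect (e.g., empty)" symptom described above.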
* Require "$remote_fs" since it guarantees /usr availability
(rbd executable is in /usr/bin/rbd)
* Speed up init.d rbd mapping on machines acting as MON/OSD
by starting rbdmap after /init.d/ceph (when possible) and
shutting down rbd before ceph.
* Map rbd devices before starting X (helpful when /home is mounted from rbd).
mds: add dentries in dirfrag to LRU in reverse order
Files in a dirfrag are usually processed in the order of readdir
results. Files at the beginning of a dirfrag are more likely to be used in
the future than files at the end.
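Why reverse order helps: if every insertion goes to the top of the LRU, inserting in reverse readdir order leaves the first readdir entries nearest the top, so they are trimmed last. A small C++ sketch, assuming a hypothetical `SimpleLRU` built on std::list rather than the MDS's actual LRU class:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Minimal LRU sketch (not Ceph's LRU): front = most recently used,
// back = next entry to be trimmed.
class SimpleLRU {
 public:
  void insert_top(const std::string& name) { items_.push_front(name); }

  // Evict and return the least recently used entry.
  std::string expire() {
    assert(!items_.empty());
    std::string victim = items_.back();
    items_.pop_back();
    return victim;
  }

  std::size_t size() const { return items_.size(); }

 private:
  std::list<std::string> items_;
};

// Add one dirfrag's dentries (given in readdir order) in reverse, so the
// first readdir entries end up nearest the top and are trimmed last.
void add_dirfrag_dentries(SimpleLRU& lru,
                          const std::vector<std::string>& readdir_order) {
  for (auto it = readdir_order.rbegin(); it != readdir_order.rend(); ++it) {
    lru.insert_top(*it);
  }
}
```

For readdir order a, b, c this yields expiry order c, b, a.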
For a cross-authority rename, the MDS first freezes the source inode's
authpin. This happens while the source dentry isn't locked, so by the time
the inode's authpin becomes frozen, the source dentry may have changed and
been linked to a different inode.
mds: treat cluster as degraded when there is clientreplay MDS
This forbids exporting subtrees and fragmenting dirfrags when there
is an MDS in clientreplay state. While replaying client requests, the
MDS may need to authpin some remote objects, and exporting subtrees and
fragmenting dirfrags slows down the replay of client requests.
Yan, Zheng [Mon, 31 Mar 2014 01:46:58 +0000 (09:46 +0800)]
mds: don't start new segment while finishing disambiguate imports
This avoids inserting an ESubtreeMap among the EImportFinish events that
finish disambiguate imports, because the ESubtreeMap reflects the
subtree state after all EImportFinish events are replayed.
Sage Weil [Tue, 1 Apr 2014 21:27:31 +0000 (14:27 -0700)]
osd/ReplicatedPG: mark_unrollbackable when _rollback_to head
We fell into the case in _rollback_to where we just set ctx->modify = true
and don't explicitly mark the ctx as unrollbackable. Later, we screw up
in proc_replica_log as a result because we think we can rollback this
update to the head when in reality we cannot.
Fixes: #7907 Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 1 Apr 2014 18:04:47 +0000 (11:04 -0700)]
osd/ReplicatedPG: handle snapdir properly during scrub
Handle snapdir similarly to how head is treated when updating the
next_clone info. Also, add a warning when we have a snapdir object and
head_exists == true (the converse of the existing check).
Fixes: #7937 Signed-off-by: Sage Weil <sage@inktank.com>
Josh Durgin [Mon, 31 Mar 2014 21:53:31 +0000 (14:53 -0700)]
librbd: skip zeroes when copying an image
This is the simple coarse-grained solution, but it works well in
common cases like a small base image resized with a bunch of empty
space at the end. Finer-grained sparseness can be copied by using rbd
{export,import}-diff.
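The coarse-grained approach can be pictured as a chunked copy that skips chunks consisting entirely of zeroes. This is a hypothetical in-memory illustration in C++: `copy_skip_zeroes` and the chunk size are assumptions, not librbd's copy code, and it assumes the destination already reads back as zeroes wherever nothing is written (as unwritten extents of a new image do).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy `src` into `dst` chunk by chunk, skipping all-zero chunks.
// Assumes `dst` is the same size as `src` and already zero-filled.
// Returns the number of chunks actually written.
std::size_t copy_skip_zeroes(const std::vector<char>& src,
                             std::vector<char>& dst, std::size_t chunk) {
  assert(chunk > 0 && dst.size() == src.size());
  std::size_t written = 0;
  for (std::size_t off = 0; off < src.size(); off += chunk) {
    std::size_t len = std::min(chunk, src.size() - off);
    auto first = src.begin() + static_cast<std::ptrdiff_t>(off);
    auto last = first + static_cast<std::ptrdiff_t>(len);
    bool all_zero = std::all_of(first, last, [](char c) { return c == 0; });
    if (all_zero) {
      continue;  // leave a hole; dst already reads back as zeroes here
    }
    std::copy(first, last, dst.begin() + static_cast<std::ptrdiff_t>(off));
    ++written;
  }
  return written;
}
```

Sparseness is only detected at chunk granularity: a chunk containing a single non-zero byte is copied whole, which is why finer-grained sparseness is left to rbd {export,import}-diff.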
By default, we don't send out maps with primary_temp mappings because
there is no infrastructure in place that would make sure that the
entire cluster knows about primary_temp. Add an option to allow
primary_temp mappings, for development purposes.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 31 Mar 2014 17:42:23 +0000 (10:42 -0700)]
mon/PGMap: clear pool sum when last pg is deleted
Use the x.0 pg as a sentinel for the existence of the pool. Note that we
have to clean up in two paths: apply_incremental (which is actually
deprecated) and the normal PGMonitor refresh.
Fixes: #7912 Signed-off-by: Sage Weil <sage@inktank.com>
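To make the sentinel rule concrete, here is a small C++ sketch that drops a pool's aggregate once pg <pool>.0 no longer exists. `MiniPGMap`, `PgId` and `PoolSum` are hypothetical simplifications, not Ceph's PGMap API, and only an object count is aggregated.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical simplified pg id: (pool, pg seed); "x.0" is {x, 0}.
using PgId = std::pair<int64_t, uint32_t>;

struct PoolSum {
  uint64_t num_objects = 0;
};

struct MiniPGMap {
  std::map<PgId, uint64_t> pg_objects;  // per-pg object counts
  std::map<int64_t, PoolSum> pool_sum;  // per-pool aggregates

  void add_pg(PgId pg, uint64_t objects) {
    pg_objects[pg] = objects;
    pool_sum[pg.first].num_objects += objects;
  }

  void remove_pg(PgId pg) {
    auto it = pg_objects.find(pg);
    if (it == pg_objects.end()) return;
    auto sum = pool_sum.find(pg.first);
    if (sum != pool_sum.end()) sum->second.num_objects -= it->second;
    pg_objects.erase(it);
    // Sentinel check: once pg <pool>.0 is gone, treat the pool as deleted
    // and drop its aggregate instead of leaving a stale entry behind.
    if (pg_objects.count(PgId{pg.first, 0}) == 0) pool_sum.erase(pg.first);
  }
};
```

In the real monitor the same check would have to run in both update paths named above; the sketch only models a single removal path.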
Loic Dachary [Sat, 29 Mar 2014 10:30:42 +0000 (11:30 +0100)]
doc: pgbackend dev doc outdated notice
* Warn the reader that the implementation is ahead and may differ
* Update the links to the Firefly branch
* Remove links to issues used during development to avoid confusion
Loic Dachary [Sat, 29 Mar 2014 10:27:00 +0000 (11:27 +0100)]
doc: erasure code developer notes updates
Update the introduction to explain erasure code profiles. Remove
obsolete explanations about partial writes etc. Remove links to tickets
used during development. Update permalinks to be closer to
Firefly (v0.78).
Yan, Zheng [Sun, 30 Mar 2014 01:21:57 +0000 (09:21 +0800)]
fuse: implement 'access' low level function
Add an empty 'access' function to the fuse low-level functions. This
allows us to use ceph-fuse with fuse_default_permissions = false.
'fuse_default_permissions = false' can significantly improve the
speed of creating/removing large numbers of files.
When fuse_default_permissions is true, the fuse kernel module sends
a getattr request whenever the kernel needs to check a directory's
permission. getattr (STAT_CAP_INODE_ALL) can be very slow if the
directory was just modified.