Greg Farnum [Wed, 14 Dec 2016 20:09:44 +0000 (12:09 -0800)]
client: fix the cross-quota rename boundary check conditions
We were previously rejecting a rename if either of the involved directories
was a quota root, even if the other directory was part of the same quota
"tree". What we really want to do is identify the correct quota root
(whether local or ancestral) for each directory and compare them. So
now we do.
osd/ReplicatedPG: limit the number of concurrently trimming pgs
This patch introduces an AsyncReserver for snap trimming to limit the
number of pgs on any single OSD which can be trimming, as with backfill.
Unlike backfill, we don't take remote reservations on the assumption
that the set of pgs with trimming work to do is already well
distributed, so it doesn't seem worth the implementation overhead to get
reservations from the peers as well.
Sage Weil [Fri, 3 Mar 2017 03:20:08 +0000 (21:20 -0600)]
osdc/Objecter: resend RWORDERED ops on full
Our condition for respecting the FULL flag is complex, and involves
the WRITE | RWORDERED flags vs the FULL_FORCE | FULL_TRY flags. Previously,
we could block a read bc of RWORDRED but not resend it later.
Fix by capturing the complex condition in a respects_full() bool and using
it both for the blocking-on-send and resending-on-possibly-notfull-later
checks.
Conflicts:
debian/ceph-common.install (retain /etc/init.d/rbdmap so jewel users can choose sysvinit or systemd)
debian/rules (retain /etc/init.d/rbdmap so jewel users can choose sysvinit or systemd)
qa/tasks/workunit.py: use "overrides" as the default settings of workunit
otherwise the settings in "workunit" tasks are always overridden by the
settings in template config. so we'd better follow the way of how
"install" task updates itself with the "overrides" settings: it uses the
"overrides" as the *defaults*.
Kefu Chai [Thu, 30 Mar 2017 04:37:01 +0000 (12:37 +0800)]
tasks/workunit.py: specify the branch name when cloning a branch
c1309fb failed to specify a branch when cloning using --depth=1, which
by default clones the HEAD. and we can not "git checkout" a specific
sha1 if it is not HEAD, after cloning using '--depth=1', so in this
change, we dispatch "tag", "branch", "HEAD" using three Refspec classes.
Dan Mick [Wed, 29 Mar 2017 03:08:13 +0000 (20:08 -0700)]
tasks/workunit.py: when cloning, use --depth=1
Help avoid killing git.ceph.com. A depth 1 clone takes about
7 seconds, whereas a full one takes about 3:40 (much of it
waiting for the server to create a huge compressed pack)
Erwan Velu [Fri, 31 Mar 2017 12:54:33 +0000 (14:54 +0200)]
ceph-disk: Adding retry loop in get_partition_dev()
There is very rare cases where get_partition_dev() is called before the actual partition is available in /sys/block/<device>.
It appear that waiting a very short is usually enough to get the partition beein populated.
Analysis:
update_partition() is supposed to be enough to avoid any racing between events sent by parted/sgdisk/partprobe and
the actual creation on the /sys/block/<device>/* entrypoint.
On our CI that race occurs pretty often but trying to reproduce it locally never been possible.
This patch is almost a workaround rather than a fix to the real problem.
It offer retrying after a very short to be make a chance the device to appear.
This approach have been succesful on the CI.
Note his patch is not changing the timing when the device is perfectly created on time and just differ by a 1/5th up to 2 seconds when the bug occurs.
A typical output from the build running on a CI with that code.
command_check_call: Running command: /usr/bin/udevadm settle --timeout=600
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_partition_dev: Try 1/10 : partition 2 for /dev/sda does not in /sys/block/sda
get_partition_dev: Found partition 2 for /dev/sda after 1 tries
get_dm_uuid: get_dm_uuid /dev/sda uuid path is /sys/dev/block/8:0/dm/uuid
get_dm_uuid: get_dm_uuid /dev/sda2 uuid path is /sys/dev/block/8:2/dm/uuid
Erwan Velu [Wed, 22 Mar 2017 09:11:44 +0000 (10:11 +0100)]
ceph-disk: Reporting /sys directory in get_partition_dev()
When get_partition_dev() fails, it reports the following message :
ceph_disk.main.Error: Error: partition 2 for /dev/sdb does not appear to exist
The code search for a directory inside the /sys/block/get_dev_name(os.path.realpath(dev)).
The issue here is the error message doesn't report that path when failing while it might be involved in.
This patch is about reporting where the code was looking at when trying to estimate if the partition was available.
Matt Benjamin [Tue, 28 Feb 2017 20:49:06 +0000 (15:49 -0500)]
rgw_file: use fh_hook::is_linked() to check residence
Previously we assumed that !deleted handles were resident--there
is an observed case where a !deleted handle is !linked. Since
we currently use safe_link mode, an is_linked() check is
available, and exhaustive.
Fixes: http://tracker.ceph.com/issues/19111 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit c0aa515f8d8c57ec5ee09e3b60df3cac60453c40)
Matt Benjamin [Wed, 1 Mar 2017 01:24:12 +0000 (20:24 -0500)]
rgw_file: RGWFileHandle dtor must also cond-unlink from FHCache
Formerly masked in part by the reclaim() action, direct-delete now
substitutes for reclaim() iff its LRU lane is over its high-water
mark, and in particular, like reclaim() the destructor is certain
to see handles still interned on the FHcache when nfs-ganesha is
recycling objects from its own LRU.
Fixes: http://tracker.ceph.com/issues/19112 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit d51a3b1224ba62bb53c6c2c7751fcf7853c35a4b)
Matt Benjamin [Tue, 17 Jan 2017 16:23:45 +0000 (11:23 -0500)]
rgw_file: split last argv on ws, if provided
This is intended to allow an "extra" unparsed argument string
containing various cmdline options to be passed as the last argument
in the argv array of librgw_create(), which nfs-ganesha is
expecting to happen.
Matt Benjamin [Mon, 13 Feb 2017 01:18:26 +0000 (20:18 -0500)]
rgw_file: fix hiwat behavior
Removed logic to skip reclaim processing conditionally on hiwat,
this probably meant to be related to a lowat value, which does
not exist.
Having exercised the hiwat reclaim behavior, noticed that the
path which moves unreachable objects to LRU, could and probably
should remove them altogether when q.size exceeds hiwat. Now
the max unreachable float is lane hiwat, for all lanes.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit b8791b2217e9ca87b2d17b51f283fa14bd68b581) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Matt Benjamin [Sun, 12 Feb 2017 23:20:43 +0000 (18:20 -0500)]
rgw_file: refcnt bugfixes
This change includes 3 related changes:
1. add required lock flags for FHCache updates--this is a crash
bug under concurrent update/lookup
2. omit to inc/dec refcnt on root filehandles in 2 places--the
root handle current is not on the lru list, so it's not
valid to do so
3. based on observation of LRU behavior during creates/deletes,
update (cohort) LRU unref to move objects to LRU when their
refcount falls to SENTINEL_REFCNT--this cheaply primes the
current reclaim() mechanism, so very significanty improves
space use (e.g., after deletes) in the absence of scans
(which is common due to nfs-ganesha caching)
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit beaeff059375b44188160dbde8a81dd4f4f8c6eb) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Matt Benjamin [Sat, 11 Feb 2017 21:38:05 +0000 (16:38 -0500)]
rgw_file: add refcount dout traces at debuglevel 17
These are helpful for checking RGWFileHandle refcnt invariants.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 462034e17f919fb783ee33e2c9fa8089f93fd97d) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Matt Benjamin [Fri, 10 Feb 2017 22:14:16 +0000 (17:14 -0500)]
rgw_file: add pretty-print for RGWFileHandle
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit ef330f385d3407af5f470b5093145f59cc4dcc79) Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Matt Benjamin [Mon, 20 Feb 2017 20:05:18 +0000 (15:05 -0500)]
rgw_file: fix marker computation
Fixes: http://tracker.ceph.com/issues/19018 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 4454765e7dd08535c50d24205858e18dba4b454c)
Matt Benjamin [Mon, 20 Feb 2017 01:34:31 +0000 (20:34 -0500)]
rgw_file: rgw_readdir can't list multi-segment dirs
This issue has one root cause in librgw, namely that the marker
argument to these requests was incorrectly formatted (though the
marker cache was working as intended).
Secondarily, for nfs-ganesha users, there is a compounding issue
that the RGW fsal was required by "temporary" convention to
populate the entire dirent cache for a directory on a single
readdir() invocation--the cache_inode/mdcache implementations
invariantly pass (before future 2.5 changesets, currently in
progress) a null pointer for the start cookie offset, intended
to convey this.
Fixes: http://tracker.ceph.com/issues/18991 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 2cd60ee9712291b906123aca1704288b18a9742b)
Matt Benjamin [Sun, 19 Feb 2017 23:21:06 +0000 (18:21 -0500)]
rgw_file: allow setattr on placeholder directories
When a POSIX path <bucket>/foo/ is known only as an implicit path
segment from other objects (e.g., <bucket>/foo/bar.txt), a case
that would usually arise from S3 upload of such an object, an
RGWFileHandle object representing "<bucket>/foo/" will be constructed
as needed, with no backing in RGW.
This is by design, but subsequently, if a setattr is performed on
such a handle, we must be ready to create the object inline with
storing the attributes.
Fixes: http://tracker.ceph.com/issues/18989 Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 55eec1c0a0e136736961423b7b6244d0f3693c1a)