Sage Weil [Thu, 12 Oct 2017 20:56:01 +0000 (15:56 -0500)]
osdc/Objecter: apply removed_snaps from gap to in-flight requests
If we are so laggy that we aren't contiguous with the mon's latest
map, the mon will provide a summary of removed_snaps for the gap.
Apply those to our in-flight ops.
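As an illustration only (a Python sketch, not the Objecter's C++ code), applying a gap summary to pending requests could look like the following. `InFlightOp` and `apply_removed_snaps` are hypothetical names, and a plain set stands in for whatever structure the mon actually sends.

```python
from dataclasses import dataclass, field


@dataclass
class InFlightOp:
    """Hypothetical stand-in for a pending request and its snap context."""
    tid: int
    snaps: list = field(default_factory=list)  # snap ids, newest first


def apply_removed_snaps(ops, removed):
    """Drop every snap id in `removed` from each in-flight op's snap list."""
    removed = set(removed)
    for op in ops:
        op.snaps = [s for s in op.snaps if s not in removed]


ops = [InFlightOp(1, [9, 7, 4]), InFlightOp(2, [4, 2])]
# Snaps 4 and 7 were removed during the epochs we skipped.
apply_removed_snaps(ops, removed={4, 7})
```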
Sage Weil [Wed, 11 Oct 2017 19:17:39 +0000 (14:17 -0500)]
osd/OSDMap: track newly removed and purged snaps in each epoch
Instead of maintaining a set of snapids that have been removed over
all time, note just the newly removed and newly purged snaps
in each OSDMap epoch. This is easier to consume for both the Objecter
and OSD.
Also keep the interval of snaps that have been removed but not purged
in each OSDMap. This is extremely convenient because it frees the OSDs
from having to maintain this information in parallel even when they may
not have PGs belonging to those pools. These structures will be large
right when the upgrade happens and the pg_pool_t::removed_snaps gets copied
to the new fields, but in the steady state they will be relatively small,
reflecting only the set of snaps that are currently being removed.
This also provides convenient visibility into the "trimming snaps" set
that the cluster is working on.
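The bookkeeping described above can be sketched roughly as follows. This is a simplified Python model, not the OSDMap code: `SnapTracker` and `apply_epoch` are hypothetical names, and a plain set is used where the commit speaks of an interval of snaps.

```python
class SnapTracker:
    """Hypothetical model of per-epoch snap bookkeeping for one pool."""

    def __init__(self):
        # Snaps that have been removed but whose data is not yet purged.
        self.removed_not_purged = set()

    def apply_epoch(self, new_removed, new_purged):
        """Fold one epoch's newly removed and newly purged snaps into the set."""
        self.removed_not_purged |= set(new_removed)
        self.removed_not_purged -= set(new_purged)
        return self.removed_not_purged


t = SnapTracker()
t.apply_epoch(new_removed={3, 4}, new_purged=set())
t.apply_epoch(new_removed={5}, new_purged={3})
state = sorted(t.removed_not_purged)  # the "trimming snaps" still in progress
```

Because each epoch carries only deltas, the running set shrinks back toward empty as purges complete, which matches the steady-state behaviour the commit describes.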
Sage Weil [Thu, 16 Nov 2017 20:26:27 +0000 (14:26 -0600)]
osd/PG: restart recovery if NotRecovering and unfound found
If we are in recovery_unfound state waiting for unfound objects, and we
find them, we need to restart the recovery reservation process so that we
can recover. Do this by queueing a DoRecovery event instead of calling
queue_recovery() (which won't do anything since we're not in
recovering|backfilling pg states).
Make the parent Active state ignore DoRecovery so that if we are already
in some phase of recovery/backfill the event gets ignored. It is already
handled by the other important substates that care, like Clean (for
repair's benefit).
I'm not sure why states like Activating are paying attention to this event...
Fixes: http://tracker.ceph.com/issues/22145
Signed-off-by: Sage Weil <sage@redhat.com>
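A rough Python sketch of the dispatch rule this relies on: an event goes to the innermost state first and falls through to the parent when the substate has no reaction. The table below is hypothetical (the reaction labels and the `Recovering` row are illustrative, not the PG state machine's actual definitions).

```python
def dispatch(state, event, reactions, parents):
    """Walk from `state` up through its parents until one reacts to `event`."""
    while state is not None:
        reaction = reactions.get((state, event))
        if reaction is not None:
            return reaction
        state = parents.get(state)
    return "unhandled"


# Hypothetical hierarchy: both substates live under Active.
parents = {"NotRecovering": "Active", "Recovering": "Active", "Active": None}
reactions = {
    # The idle substate reacts by restarting the reservation process.
    ("NotRecovering", "DoRecovery"): "restart_reservation",
    # The parent swallows the event for substates already doing recovery.
    ("Active", "DoRecovery"): "discard",
}

idle = dispatch("NotRecovering", "DoRecovery", reactions, parents)
busy = dispatch("Recovering", "DoRecovery", reactions, parents)
```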
Adam C. Emerson [Wed, 29 Nov 2017 18:17:12 +0000 (13:17 -0500)]
Merge pull request #18954 from adamemerson/wip-hole-in-the-bucket-dear-liza
rgw: Add try_refresh_bucket_info function
rgw: Add retry_raced_bucket_write
rgw: Handle stale bucket info in RGWPutMetadataBucket
rgw: Handle stale bucket info in RGWSetBucketVersioning
rgw: Handle stale bucket info in RGWSetBucketWebsite
rgw: Handle stale bucket info in RGWDeleteBucketWebsite
rgw: Handle stale bucket info in RGWPutBucketPolicy
rgw: Handle stale bucket info in RGWDeleteBucketPolicy
rgw: Expire entries in bucket info cache
Kefu Chai [Tue, 28 Nov 2017 10:00:37 +0000 (18:00 +0800)]
cmake,rpm,deb: update to accommodate SPDK v17.10
* cmake/modules/BuildSPDK.cmake: add lvol
* cmake/modules/BuildDPDK.cmake: add pci and bus_pci
* ceph.spec.in, cmake/modules/BuildSPDK.cmake, debian/control:
re-introduce libuuid dependency, as 17.07 added lvol, and the latter
depends on uuid.
* cmake/modules/BuildSPDK.cmake: avoid introducing local variable of
`iface_libs`.
* cmake/modules/patch-dpdk-conf.sh: disable
CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES. this option introduces the
balanced allocation of memory, but it also requires libnuma-dev.
let's disable it for now.
Kefu Chai [Tue, 28 Nov 2017 06:42:31 +0000 (14:42 +0800)]
qa/ceph-disk: enlarge the simulated SCSI disk
100MB will be allocated for journal, and the remaining 100MB is for data
device. taking the inode into consideration, there will be approximately
87988 kB available for the activated OSD, and it will complain with a
"nearfull" state.
Kefu Chai [Mon, 27 Nov 2017 08:35:56 +0000 (16:35 +0800)]
ceph-disk: path_set_context() after rename()
it does not matter if we chown/restorecon before or after the rename,
but the logging message looks better this way: instead of fixing the
.tmp files, we are updating the attributes of the dest files w/o
.${pid}.tmp extension.
Kefu Chai [Mon, 27 Nov 2017 04:08:04 +0000 (12:08 +0800)]
qa/workunits/ceph-disk: do not redirect stderr to stdout
normally, if we care about the output of ceph-disk, we expect a json
string, and ceph-disk sends the output to stdout, and errors/warnings
to stderr, so everything works as expected. the test should also
follow this tradition. for example, if deprecation warnings are printed,
the warning message should not be collected along with the json string.
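a minimal sketch of the pattern, assuming a Python test harness. the child process here is a stand-in built with `sys.executable`, not a real ceph-disk invocation, and its JSON payload is made up for illustration.

```python
import json
import subprocess
import sys

# Stand-in for a ceph-disk call: JSON on stdout, a warning on stderr.
fake_cmd = [
    sys.executable, "-c",
    "import sys, json; "
    "sys.stderr.write('DeprecationWarning: example\\n'); "
    "print(json.dumps([{'path': '/dev/sdb'}]))",
]

# Capture the two streams separately instead of using stderr=subprocess.STDOUT,
# so warnings never end up in the text handed to json.loads().
proc = subprocess.run(fake_cmd, stdout=subprocess.PIPE,
                      stderr=subprocess.PIPE, check=True)
disks = json.loads(proc.stdout.decode())
warnings = proc.stderr.decode()
```

with the streams merged, the warning line would precede the JSON and `json.loads()` would raise.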
Adam C. Emerson [Fri, 17 Nov 2017 22:15:26 +0000 (17:15 -0500)]
rgw: Expire entries in bucket info cache
To bound the degree to which an RGW instance can go out to lunch if
the watch/notify breaks down, force refresh of any cache entry over a
certain age.
Fifteen minutes by default, and expiration can be turned off entirely.
This is separate from the LRU. The LRU removes entries based on the
last time of access. This expiration patch forces refresh based on the
last time they were updated.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
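The difference between the two policies can be shown with a small Python sketch. This is not the RGW cache; `ExpiringLRU` and its parameters are hypothetical, and an injectable clock is used so the example is deterministic.

```python
import time
from collections import OrderedDict


class ExpiringLRU:
    """Hypothetical cache: LRU eviction by access, expiry by last update."""

    def __init__(self, max_entries, max_age, clock=time.monotonic):
        self.max_entries = max_entries
        self.max_age = max_age  # seconds; None turns expiration off
        self.clock = clock
        self.entries = OrderedDict()  # key -> (value, last_update_time)

    def put(self, key, value):
        self.entries[key] = (value, self.clock())
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, updated = item
        if self.max_age is not None and self.clock() - updated > self.max_age:
            del self.entries[key]  # too old: caller has to refetch
            return None
        self.entries.move_to_end(key)  # access changes LRU order only
        return value


now = [0.0]
cache = ExpiringLRU(max_entries=2, max_age=900, clock=lambda: now[0])
cache.put("bucket-a", "info-v1")
now[0] = 600
fresh = cache.get("bucket-a")  # within fifteen minutes of the last update
now[0] = 901
stale = cache.get("bucket-a")  # the access at t=600 did not extend its life
```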
Adam C. Emerson [Thu, 16 Nov 2017 19:42:58 +0000 (14:42 -0500)]
rgw: Add try_refresh_bucket_info function
Sometimes operations fail with -ECANCELED. This means we got raced. If
this happens we should update our bucket info from cache and try again.
Some user reports suggest that our cache may be getting and staying
out of sync. This is a bug and should be fixed, but it would also be
nice if we were robust enough to notice the problem and refresh.
So in that case, we invalidate the cache and fetch direct from the
OSD, putting a warning in the log.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
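The retry pattern can be sketched in Python as below. This is an illustration under assumptions, not the RGW implementation: `retry_raced_write`, the `write`/`refresh` callbacks, and the `max_retries` bound are all hypothetical.

```python
import errno


def retry_raced_write(write, refresh, max_retries=3):
    """Call write(); on -ECANCELED refresh the bucket info and try again."""
    ret = write()
    for _ in range(max_retries):
        if ret != -errno.ECANCELED:
            break
        refresh()
        ret = write()
    return ret


# Simulated race: our cached version lags behind what is stored.
state = {"cached_version": 1, "stored_version": 2, "refreshes": 0}


def write():
    if state["cached_version"] != state["stored_version"]:
        return -errno.ECANCELED  # someone else updated the bucket first
    return 0


def refresh():
    state["cached_version"] = state["stored_version"]
    state["refreshes"] += 1


result = retry_raced_write(write, refresh)
```

Bounding the number of retries keeps a persistently stale cache from turning into an endless loop; the caller still sees -ECANCELED if every attempt is raced.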
Nathan Cutler [Mon, 27 Nov 2017 14:09:39 +0000 (15:09 +0100)]
cmake: mgr: exclude .gitignore
Fixes RPMLINT warning:
ceph-mgr.x86_64: W: version-control-internal-file /usr/lib64/ceph/mgr/.gitignore
ceph-mgr.x86_64: W: version-control-internal-file /usr/lib64/ceph/mgr/dashboard/static/AdminLTE-2.3.7/.gitignore
You have included file(s) internally used by a version control system in the
package. Move these files out of the package and rebuild it.
Note: the backslash has to be doubled up for the regex to make
it through CMake.
Yingxin [Mon, 27 Nov 2017 10:02:25 +0000 (05:02 -0500)]
blkin: fix unconditional tracing
Blkin trace will be triggered unconditionally at OSD `issue_op`, even if
op->pg_trace is not initialized. This issue introduces unnecessary
overhead and confusing tracing records when blkin tracing is ON.
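In outline, the fix amounts to guarding the trace call on the trace being initialized. A Python sketch of that guard, with a hypothetical `Trace` class and `issue_op` function standing in for the OSD's C++ code:

```python
class Trace:
    """Hypothetical trace handle; valid only once it has been initialized."""

    def __init__(self, name=None):
        self.name = name

    def valid(self):
        return self.name is not None


def issue_op(pg_trace, emitted):
    """Record a trace event only when the op actually carries a trace."""
    if pg_trace.valid():
        emitted.append(f"{pg_trace.name}: issue_op")


emitted = []
issue_op(Trace(), emitted)          # uninitialized trace: nothing recorded
issue_op(Trace("osd op"), emitted)  # initialized trace: one record
```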
Kefu Chai [Mon, 27 Nov 2017 04:39:41 +0000 (12:39 +0800)]
cmake: silence CMP0054 warning
see https://gitlab.kitware.com/cmake/cmake/issues/17381 and
https://gitlab.kitware.com/cmake/cmake/commit/a8be8b1b54fe1922a1d1fc0365c3ae5c918b6654,
so until the updated cmake is released and packaged, we should
add this setting.