Jason Dillaman [Tue, 25 Apr 2017 19:45:18 +0000 (15:45 -0400)]
rbd-mirror: new state machine for preparing local image
This state machine will be invoked before the bootstrap state machine
and will be responsible for detecting if the local image is already
primary or if it needs to be deleted.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
currently, only the plugin based on isa-l is installed. archs other than
amd64 will not have this directory or the plugin(s) residing in it.
hence dh_install will fail when trying to copy a nonexistent file/dir.
* debian/ceph-common.install: chmod +x, and only install crypto on amd64
so dh_install can filter the install list using dh-exec
* debian/control: depends on dh-exec now. dh-exec v0.13 introduces support
for filtering based on architecture. see dh-exec's changelog for more
details. but trusty only offers dh-exec v0.12, so do not require
"(>= 0.13)" at this moment.
```
* Perform a graceful shutdown of NSPR. PR_Cleanup() may be called by
* the primordial thread near the end of the main() function.
```
this helps to silence some warnings from valgrind. but it does not hurt
in practice, because the process is about to die. and the freed memory
chunks are only allocated once in NSPR.
Sage Weil [Tue, 11 Apr 2017 21:54:57 +0000 (17:54 -0400)]
os/bluestore: restructure deferred writes
Explicitly aggregate deferred writes into a batch. When we
submit, take the opportunity to coalesce contiguous writes.
Handle aio completion independently from the original txcs.
Note that this paves the way for a few additional steps:
1- we could make deallocations cancel deferred writes.
2- we could drop the txc deferred states entirely and rely on
the explicit deferred write batch machinery instead... if we
build an alternative way to complete the SharedBlob writes
and ensure the lifecycle issues are dealt with. (I'm not sure
it would be worth it, but it might be.)
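A minimal sketch of the coalescing step described above, not the BlueStore code itself. It assumes a batch is just a list of hypothetical `(offset, data)` pairs with no overlapping extents, and merges writes whose extents touch end-to-start:

```python
from typing import List, Tuple

# Hypothetical shape of one deferred write: (offset, data).
Write = Tuple[int, bytes]


def coalesce(writes: List[Write]) -> List[Write]:
    """Merge writes whose extents are contiguous, in offset order.

    Assumes the extents in the batch do not overlap.
    """
    merged: List[Tuple[int, bytearray]] = []
    for off, data in sorted(writes, key=lambda w: w[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == off:
            # This write starts exactly where the previous one ends.
            merged[-1][1].extend(data)
        else:
            merged.append((off, bytearray(data)))
    return [(off, bytes(buf)) for off, buf in merged]


batch = [(4096, b"b" * 4), (0, b"a" * 4096), (8192, b"c")]
result = coalesce(batch)
```

Here the first two writes collapse into a single IO starting at offset 0, while the write at 8192 stays separate because there is a gap before it.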
Casey Bodley [Tue, 21 Mar 2017 16:19:01 +0000 (12:19 -0400)]
rgw: remove rgw_realm_reconfigure_delay
when the master zone is changed, this config variable was increasing the
window of time where the old master zone would continue to handle
requests to modify metadata. those changes would not be reflected by the
new metadata master zone, and would be lost to the cluster
it was an attempt to optimize for the unlikely case of multiple period
changes in a short period of time, but the logic in reload() handles this
case correctly as is
Casey Bodley [Tue, 21 Mar 2017 20:10:27 +0000 (16:10 -0400)]
rgw: require --yes-i-really-mean-it to promote zone with stale metadata
if a zone is promoted to master before it has a chance to sync from the
previous master zone, any metadata entries after its sync position will
be lost
print an error if 'period commit' is trying to promote a zone that is
more than one period behind the current master, and only allow the
commit to proceed if the --yes-i-really-mean-it flag is provided
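The guard described above could look roughly like the following. This is only a sketch: the function name is hypothetical, and measuring "periods behind" as a difference of realm epochs is an assumption, since the message does not say how rgw computes it.

```python
from typing import Optional


def check_promotion(zone_epoch: int, master_epoch: int,
                    yes_i_really_mean_it: bool = False) -> Optional[str]:
    """Return an error message if the commit must be refused, else None.

    zone_epoch / master_epoch are assumed stand-ins for how far each
    zone has progressed; the real check in rgw may differ.
    """
    periods_behind = master_epoch - zone_epoch
    if periods_behind > 1 and not yes_i_really_mean_it:
        return ("zone is %d periods behind the current master; "
                "pass --yes-i-really-mean-it to promote it anyway"
                % periods_behind)
    return None


refused = check_promotion(zone_epoch=3, master_epoch=6)
forced = check_promotion(zone_epoch=3, master_epoch=6,
                         yes_i_really_mean_it=True)
```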
Casey Bodley [Fri, 17 Mar 2017 13:55:47 +0000 (09:55 -0400)]
rgw: store realm epoch with sync status markers
sync status markers can't be compared between periods, so we need to
record the current period's realm epoch with its markers. when the
rgw_meta_sync_info.realm_epoch is more recent than the marker's
realm_epoch, we must treat the marker as empty
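As an illustration of the rule above, a small sketch with a hypothetical `Marker` type (the real structures live in rgw and are not shown in this message): a marker recorded under an older realm epoch is treated as empty.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    # Hypothetical stand-in for a sync status marker.
    realm_epoch: int
    position: str


def effective_position(sync_realm_epoch: int, marker: Marker) -> str:
    """Return the marker position to resume from.

    If the sync info's realm epoch is more recent than the one stored
    with the marker, the marker belongs to an older period and cannot
    be compared, so it is treated as empty.
    """
    if sync_realm_epoch > marker.realm_epoch:
        return ""
    return marker.position


stale = effective_position(3, Marker(realm_epoch=2, position="1_1490000000"))
fresh = effective_position(2, Marker(realm_epoch=2, position="1_1490000000"))
```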
mgr: pass python interpreter's path to embedded python
* also prevent python from registering its own signal handlers; they do
not make sense in our embedded use case.
* pass python interpreter's path to embedded python before initializing
it. python uses this path to look up the "site" modules, etc., so it
can use another interpreter even if that one is not installed into $PATH.
otherwise the "python" in $PATH will always be used, even if the
PYTHON_EXECUTABLE in CMake's cache is pointing to another python
interpreter.
if one wants to debug ceph-mgr with a custom-built python, this would
be helpful.
msg/async: return right away in NetHandler::set_priority() if not supported
* SO_PRIORITY is linux specific, so no need to check __linux__
* early return if priority is less than 0 (maybe we should also return if
it's higher than 6?), which also reduces indentation.
* store errno if setting SO_PRIORITY fails, before printing log messages.
* guard the whole function with '#ifdef SO_PRIORITY' so on platforms
where this option is not supported, this function will be a no-op.
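The control flow in the list above can be sketched as follows. This is an analogue using the standard `socket` module, not the NetHandler code: the `hasattr`-style check stands in for `#ifdef SO_PRIORITY`, and the return convention (0 or a negative errno) is an assumption.

```python
import socket


def set_priority(sock: socket.socket, priority: int) -> int:
    """Set SO_PRIORITY on sock; no-op where the option is unavailable."""
    so_priority = getattr(socket, "SO_PRIORITY", None)
    if so_priority is None:
        # Platform without SO_PRIORITY: the whole function is a no-op.
        return 0
    if priority < 0:
        # Nothing requested: return right away.
        return 0
    try:
        sock.setsockopt(socket.SOL_SOCKET, so_priority, priority)
    except OSError as e:
        # Capture errno before logging, so logging cannot clobber it.
        err = e.errno or 0
        print("couldn't set SO_PRIORITY to %d: errno %d" % (priority, err))
        return -err
    return 0


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
skipped = set_priority(s, -1)
applied = set_priority(s, 3)
s.close()
```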
test/librados/c_operations: add cmpext tests
Dispatch compare-and-read and compare-and-write compound requests, and
confirm expected behaviour under compare and miscompare conditions.
Signed-off-by: Zhengyong Wang <wangzhengyong@cmss.chinamobile.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
ceph_test_rados_api_aio: add cmpext tests
Write a buffer and compare it with a matching and non-matching buffer
via cmpext. Do this using rados_aio_cmpext(), ioctx.aio_cmpext() and
ioctx.aio_operate(op.cmpext())
Signed-off-by: Zhengyong Wang <wangzhengyong@cmss.chinamobile.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
librados: add cmpext API
The compare-extent (cmpext) operation allows callers to compare existing
object contents with an arbitrary buffer. cmpext requests can be
compounded with read and write operations, allowing for atomic object
content updates. Returns 0 on success, a negative error code
on failure, and (-MAX_ERRNO - mismatch_off) on mismatch.
This commit is based on Mike Christie's initial C++ API, with the
addition of AIO support and a C API. Response marshalling was also
reworked, so that the miscompare offset is unmarshalled transparently to
the caller.
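To make the return convention concrete, here is a caller-side sketch that splits a result into its three cases. The helper name is hypothetical, and the value 4095 for MAX_ERRNO is an assumption borrowed from the usual kernel definition; check the headers for the value actually used.

```python
from typing import Optional, Tuple

MAX_ERRNO = 4095  # assumed value, not taken from this commit


def decode_cmpext_result(ret: int) -> Tuple[str, Optional[int]]:
    """Classify a cmpext return value.

    0                         -> ("match", None)
    -MAX_ERRNO - mismatch_off -> ("mismatch", mismatch_off)
    other negative values     -> ("error", errno)
    """
    if ret == 0:
        return ("match", None)
    if ret <= -MAX_ERRNO:
        return ("mismatch", -ret - MAX_ERRNO)
    if ret < 0:
        return ("error", -ret)
    raise ValueError("unexpected positive return value: %d" % ret)


mismatch = decode_cmpext_result(-MAX_ERRNO - 10)
failure = decode_cmpext_result(-2)
```

Folding the offset into the return value keeps the API to a single integer while leaving ordinary errno values distinguishable from a miscompare.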
Signed-off-by: Zhengyong Wang <wangzhengyong@cmss.chinamobile.com>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Zhengyong Wang <wangzhengyong@cmss.chinamobile.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
[ddiss@suse.de: add rados_cmpext() test coverage]
Reviewed-by: David Disseldorp <ddiss@suse.de>
ceph osd: add support for new op cmpext
This adds support for a new op cmpext. The request carries
extent.length bytes and compares them to the extent.length bytes at
extent.offset on disk. Returns 0 on success, a negative error code
on failure, and (-MAX_ERRNO - mismatch_off) on mismatch.
rbd will use this in a multi op request to implement the
SCSI COMPARE_AND_WRITE request which is used by VMware for
its atomic test and set request.
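A toy model of the compare-then-write compound, using an in-memory `bytearray` in place of an object. It is only a sketch of the semantics: the function names are hypothetical, MAX_ERRNO = 4095 is an assumed value, reporting the mismatch offset relative to the start of the compared buffer is an assumption, and so is treating bytes past the end of the object as zeros.

```python
MAX_ERRNO = 4095  # assumed value, not taken from this commit


def cmpext(obj: bytearray, offset: int, expected: bytes) -> int:
    """Compare expected against obj[offset:offset+len(expected)].

    Bytes past the end of obj are treated as zeros (an assumption).
    """
    current = bytes(obj[offset:offset + len(expected)])
    current = current.ljust(len(expected), b"\0")
    for i, (a, b) in enumerate(zip(current, expected)):
        if a != b:
            return -MAX_ERRNO - i
    return 0


def compare_and_write(obj: bytearray, offset: int,
                      expected: bytes, new_data: bytes) -> int:
    """Write new_data only if the compare succeeds, as one step."""
    ret = cmpext(obj, offset, expected)
    if ret != 0:
        return ret  # the write is skipped on miscompare
    end = offset + len(new_data)
    if len(obj) < end:
        obj.extend(bytes(end - len(obj)))
    obj[offset:end] = new_data
    return 0


obj = bytearray(b"AAAABBBB")
first = compare_and_write(obj, 4, b"BBBB", b"CCCC")
second = compare_and_write(obj, 4, b"CCXC", b"DDDD")
```

In the real system the atomicity comes from the OSD executing both ops in one request; the single function call here only stands in for that.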
Signed-off-by: Zhengyong Wang <wangzhengyong@cmss.chinamobile.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
[ddiss@suse.de: ReplicatedPG -> PrimaryLogPG]
Reviewed-by: David Disseldorp <ddiss@suse.de>